
Challenge 41: Custom Skills in Azure AI Search

Estimated Time

60-90 min | Cost: ~$0.50 (Free tier Search + Function App consumption) | Domain: Knowledge Mining & Extraction (15-20%)

Exam skills covered

| Skill | Weight |
| --- | --- |
| Implement a custom skill for Azure AI Search | High |
| Design the custom skill input/output contract | High |
| Integrate a custom skill into a skillset | High |
| Deploy an Azure Function for custom skill processing | Medium |

Overview

While built-in skills cover common scenarios (entity recognition, key phrases, OCR), custom skills let you add any processing logic to the enrichment pipeline. A custom skill is a Web API endpoint (typically an Azure Function) that conforms to the custom skill interface contract.

Custom Skill Contract

The custom Web API skill expects:

  • Input: JSON with a values array, each containing a recordId and data object
  • Output: JSON with a values array, each containing recordId, data (enriched results), errors, and warnings

// INPUT (from indexer to your function)
{
  "values": [
    {
      "recordId": "1",
      "data": {
        "text": "The contract was signed on January 15, 2024 for $50,000."
      }
    }
  ]
}

// OUTPUT (from your function back to indexer)
{
  "values": [
    {
      "recordId": "1",
      "data": {
        "contractDate": "2024-01-15",
        "contractAmount": 50000.00
      },
      "errors": [],
      "warnings": []
    }
  ]
}
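
The contract above can be checked mechanically before wiring the function into an indexer. The following is a minimal sketch, not part of the challenge code: `validate_skill_response` is a hypothetical helper, and it assumes that a record reporting errors may omit its `data` object while a successful record must include one.

```python
from typing import Any


def validate_skill_response(request_body: dict[str, Any],
                            response_body: dict[str, Any]) -> list[str]:
    """Return a list of contract problems; an empty list means the response looks valid."""
    returned = response_body.get("values")
    if not isinstance(returned, list):
        return ["response has no 'values' array"]

    problems: list[str] = []
    expected_ids = [v["recordId"] for v in request_body.get("values", [])]
    returned_ids = [v.get("recordId") for v in returned]

    # Every input record must come back with the same recordId.
    for record_id in expected_ids:
        if record_id not in returned_ids:
            problems.append(f"recordId {record_id!r} missing from response")

    for value in returned:
        record_id = value.get("recordId")
        if record_id not in expected_ids:
            problems.append(f"unexpected recordId {record_id!r}")
        # Assumption of this sketch: records without errors must carry a data object.
        if not value.get("errors") and not isinstance(value.get("data"), dict):
            problems.append(f"recordId {record_id!r} has no 'data' object")
    return problems
```

Running it against a captured request/response pair gives a quick signal that every `recordId` round-trips before the indexer ever calls the endpoint.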

Prerequisites

  • Completed Challenge 40 (or equivalent AI Search setup)
  • Azure Functions Core Tools v4
  • Python 3.9+ or .NET 8 SDK
  • azure-search-documents>=11.4.0

Implementation

Task 1: Create the Azure Function (Custom Skill)

The function extracts custom metadata — in this example, word count and reading time.

# function_app.py
import azure.functions as func
import json
import logging

app = func.FunctionApp()


@app.route(route="custom-skill", methods=["POST"], auth_level=func.AuthLevel.FUNCTION)
def custom_skill(req: func.HttpRequest) -> func.HttpResponse:
    """Custom skill: calculates word count and estimated reading time."""
    logging.info("Custom skill invoked")

    try:
        body = req.get_json()
    except ValueError:
        return func.HttpResponse("Invalid JSON", status_code=400)

    values = body.get("values", [])
    results = []

    # Return exactly one output record per input record, matched by recordId
    for record in values:
        record_id = record["recordId"]
        text = record.get("data", {}).get("text", "")

        try:
            word_count = len(text.split()) if text else 0
            reading_time_minutes = round(word_count / 200, 1)  # avg 200 wpm

            results.append({
                "recordId": record_id,
                "data": {
                    "wordCount": word_count,
                    "readingTimeMinutes": reading_time_minutes
                },
                "errors": [],
                "warnings": []
            })
        except Exception as e:
            # Report the failure for this record only; other records still succeed
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": str(e)}],
                "warnings": []
            })

    return func.HttpResponse(
        json.dumps({"values": results}),
        mimetype="application/json"
    )

# requirements.txt
azure-functions

// host.json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": { "isEnabled": true }
    }
  }
}
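
The contract also carries a `warnings` array, which the function above always leaves empty. As a hedged variation (not the challenge's implementation), the per-record logic could be pulled into a pure function that emits a warning instead of silently returning zero when the `text` input is missing. `process_record` and `WORDS_PER_MINUTE` are illustrative names introduced here.

```python
from typing import Any

WORDS_PER_MINUTE = 200  # same average reading speed the function above assumes


def process_record(record: dict[str, Any]) -> dict[str, Any]:
    """Build one output record, warning (not failing) when the text input is missing."""
    record_id = record.get("recordId")
    text = (record.get("data") or {}).get("text")

    warnings = []
    if not isinstance(text, str) or not text.strip():
        # A warning is surfaced to the indexer but does not fail the document.
        warnings.append({"message": "Input 'text' was missing or empty; counts default to 0."})
        text = ""

    word_count = len(text.split())
    return {
        "recordId": record_id,
        "data": {
            "wordCount": word_count,
            "readingTimeMinutes": round(word_count / WORDS_PER_MINUTE, 1),
        },
        "errors": [],
        "warnings": warnings,
    }
```

Keeping this logic free of `azure.functions` types also makes it straightforward to unit test without the Functions host.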

Task 2: Deploy the Azure Function

# Create Function App
FUNC_APP="func-customskill-$(openssl rand -hex 4)"
FUNC_STORAGE="stfunc$(openssl rand -hex 4)"

az storage account create \
  --name "$FUNC_STORAGE" \
  --resource-group "$RG" \
  --location "$LOCATION" \
  --sku Standard_LRS

az functionapp create \
  --name "$FUNC_APP" \
  --resource-group "$RG" \
  --storage-account "$FUNC_STORAGE" \
  --consumption-plan-location "$LOCATION" \
  --runtime python \
  --runtime-version 3.11 \
  --functions-version 4 \
  --os-type Linux

# Deploy the function (from the function project directory)
func azure functionapp publish $FUNC_APP

# Get the function URL with key
# Note: in the Python v2 programming model the function name is the Python
# function name (custom_skill), while the route (custom-skill) appears in the URL.
FUNC_URL=$(az functionapp function show \
  --resource-group "$RG" \
  --name "$FUNC_APP" \
  --function-name "custom_skill" \
  --query "invokeUrlTemplate" -o tsv)

FUNC_KEY=$(az functionapp function keys list \
  --resource-group "$RG" \
  --name "$FUNC_APP" \
  --function-name "custom_skill" \
  --query "default" -o tsv)

SKILL_URI="${FUNC_URL}?code=${FUNC_KEY}"
echo "Custom skill URI: $SKILL_URI"

Task 3: Test the custom skill endpoint

# Test locally or against deployed function
curl -X POST "$SKILL_URI" \
  -H "Content-Type: application/json" \
  -d '{
    "values": [
      {
        "recordId": "test-1",
        "data": {
          "text": "Azure AI Search provides indexing and querying over content stored in various data sources. It supports AI enrichment through skillsets."
        }
      }
    ]
  }'

Expected response:

{
  "values": [
    {
      "recordId": "test-1",
      "data": {
        "wordCount": 20,
        "readingTimeMinutes": 0.1
      },
      "errors": [],
      "warnings": []
    }
  ]
}
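
The same smoke test can be driven from Python using only the standard library. This is a sketch under assumptions: `build_request` and `post_skill` are hypothetical helper names, and the URI passed in is whatever `SKILL_URI` resolved to in Task 2.

```python
import json
import urllib.request


def build_request(texts: list[str]) -> dict:
    """Wrap plain strings in the custom skill input envelope."""
    return {
        "values": [
            {"recordId": f"test-{i}", "data": {"text": text}}
            for i, text in enumerate(texts, start=1)
        ]
    }


def post_skill(uri: str, body: dict, timeout: float = 30.0) -> dict:
    """POST the envelope to the skill endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        uri,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `post_skill(skill_uri, build_request(["first document", "second document"]))` sends two records in one batch, which also exercises the per-record loop in the function.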

Task 4: Integrate custom skill into the skillset

from azure.search.documents.indexes.models import (
    CognitiveServicesAccountKey,
    SearchIndexerSkillset,
    WebApiSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
)

# SKILL_URI is the function URL with key from Task 2; indexer_client, AI_KEY,
# and the built-in skill objects are assumed to exist from Challenge 40.
custom_skill = WebApiSkill(
    name="word-count-skill",
    description="Calculates word count and reading time",
    context="/document",
    uri=SKILL_URI,
    http_method="POST",
    timeout="PT30S",
    batch_size=10,
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/content")
    ],
    outputs=[
        # name must match a key in the function's "data" object;
        # target_name is the node created under the skill context
        OutputFieldMappingEntry(name="wordCount", target_name="wordCount"),
        OutputFieldMappingEntry(name="readingTimeMinutes", target_name="readingTime"),
    ]
)

# Add to existing skillset (alongside built-in skills from Challenge 40)
skillset = SearchIndexerSkillset(
    name="document-skillset",
    description="Enrichment with built-in and custom skills",
    skills=[key_phrase_skill, entity_skill, language_skill, custom_skill],
    cognitive_services_account=CognitiveServicesAccountKey(key=AI_KEY)
)

indexer_client.create_or_update_skillset(skillset)
print("Skillset updated with custom skill")
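
For comparison with the SDK call, the skill can also be expressed as the JSON fragment that appears in a skillset's REST definition. The sketch below builds that fragment as a Python dictionary, mirroring the values used in this task. The helper name `web_api_skill_definition` and the placeholder URI are illustrative only.

```python
import json


def web_api_skill_definition(uri: str) -> dict:
    """Return the skillset JSON fragment for the word-count custom skill."""
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "word-count-skill",
        "description": "Calculates word count and reading time",
        "context": "/document",
        "uri": uri,
        "httpMethod": "POST",
        "timeout": "PT30S",
        "batchSize": 10,
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [
            {"name": "wordCount", "targetName": "wordCount"},
            {"name": "readingTimeMinutes", "targetName": "readingTime"},
        ],
    }


# Placeholder URI for illustration; substitute the real skill URI with its key.
print(json.dumps(web_api_skill_definition("https://example.invalid/api/custom-skill"), indent=2))
```

Seeing the camelCase property names (`batchSize`, `targetName`, `httpMethod`) next to the SDK's snake_case arguments helps when reading skillset definitions exported from the portal.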

Task 5: Update the index and re-run the indexer

from azure.search.documents.indexes.models import (
    FieldMapping,
    SearchFieldDataType,
    SimpleField,
)

# Add new fields to the index for custom skill output
index = index_client.get_index("documents-index")
index.fields.append(
    SimpleField(name="wordCount", type=SearchFieldDataType.Int32, filterable=True, sortable=True)
)
index.fields.append(
    SimpleField(name="readingTime", type=SearchFieldDataType.Double, filterable=True, sortable=True)
)
index_client.create_or_update_index(index)

# Update indexer output field mappings
indexer = indexer_client.get_indexer("document-indexer")
# output_field_mappings can be None when the indexer has no mappings yet
indexer.output_field_mappings = indexer.output_field_mappings or []
indexer.output_field_mappings.append(
    FieldMapping(source_field_name="/document/wordCount", target_field_name="wordCount")
)
indexer.output_field_mappings.append(
    FieldMapping(source_field_name="/document/readingTime", target_field_name="readingTime")
)
indexer_client.create_or_update_indexer(indexer)

# Reset and re-run
indexer_client.reset_indexer("document-indexer")
indexer_client.run_indexer("document-indexer")
print("Indexer reset and re-running with custom skill")
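
`run_indexer` returns before the run completes, so it can help to poll for the outcome. The following is a rough sketch rather than part of the challenge: `wait_for_indexer` is a hypothetical helper, it relies on the client's `get_indexer_status` method and the `last_result` property of the returned status, and the status strings `"success"` and `"transientFailure"` reflect my understanding of the service's execution status values.

```python
import time
from typing import Any, Callable

# Assumed terminal states of an indexer execution.
FINISHED_STATES = {"success", "transientFailure"}


def wait_for_indexer(client: Any, name: str, poll_seconds: float = 10.0,
                     max_polls: int = 30,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll get_indexer_status until the latest run reaches a finished state."""
    for _ in range(max_polls):
        last = client.get_indexer_status(name).last_result
        if last is not None:
            # The status may be an enum or a plain string depending on the SDK version.
            status = getattr(last.status, "value", last.status)
            if status in FINISHED_STATES:
                return last
        sleep(poll_seconds)
    raise TimeoutError(f"Indexer {name!r} did not finish within {max_polls} polls")
```

With the real client this would be called as `result = wait_for_indexer(indexer_client, "document-indexer")`, after which `result.errors` and `result.warnings` can be inspected for per-document failures reported by the custom skill.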

Expected Output

After re-indexing, documents should contain custom skill enrichments:

{
  "value": [
    {
      "id": "aHR0cHM6Ly9...",
      "content": "Azure AI services provide cloud-based AI capabilities...",
      "wordCount": 23,
      "readingTime": 0.1,
      "keyphrases": ["cloud-based AI capabilities", "cognitive services"]
    }
  ]
}

Break & fix

| # | Scenario | Symptom | Root Cause | Fix |
| --- | --- | --- | --- | --- |
| 1 | Function returns 401 | Indexer shows "Web API skill response was not valid" | Function key in skill URI is incorrect or expired | Regenerate function key and update the skill URI |
| 2 | Skill timeout errors | WebApiSkillExecutionError with timeout message | Function cold start exceeds default 30s timeout | Increase timeout to PT230S (max) or use Premium plan to avoid cold starts |
| 3 | Enrichment output is empty | Custom fields are null in index, but no errors | targetName in skill output doesn't match the outputFieldMapping source path | Ensure /document/{targetName} in outputFieldMappings matches skill output targetName exactly |
| 4 | Batch processing fails | Some records succeed, others show errors | Function doesn't handle the values array and processes only the first record | Ensure function iterates ALL records in values array and returns matching recordId for each |
| 5 | CORS/network error | "Unable to connect to custom skill endpoint" | Function App has IP restrictions or network isolation enabled | Add Search service outbound IPs to Function App allowed list, or use Private Endpoint |
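
Scenario 3 is easy to miss because it produces no errors. As a small illustration (a hypothetical helper, not an SDK feature), the skill's target names can be compared against the indexer's output field mapping sources before running the indexer. The sketch assumes paths are compared as exact, case-sensitive strings.

```python
def find_unmapped_outputs(context: str, target_names: list[str],
                          mapping_sources: list[str]) -> list[str]:
    """Return enrichment paths produced by the skill that no output field mapping reads."""
    produced = [f"{context.rstrip('/')}/{name}" for name in target_names]
    return [path for path in produced if path not in mapping_sources]


# Example: the mapping mistakenly uses the skill's output name instead of its target name.
missing = find_unmapped_outputs(
    "/document",
    ["wordCount", "readingTime"],
    ["/document/wordCount", "/document/readingTimeMinutes"],
)
print(missing)
```

Here the mismatch is between `readingTime` (the target name) and `readingTimeMinutes` (the name returned by the function), which is exactly the kind of slip that leaves the index field null.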

Knowledge Check

1. Your custom Web API skill processes documents but the indexer reports errors for some records. What is the MOST LIKELY cause if the function returns HTTP 200 but certain records show errors?

2. What is the maximum timeout value you can set for a custom Web API skill?

3. You want your custom skill to process 25 documents per request for efficiency. Which property controls this behavior?

4. A custom skill must match each input record with its corresponding output. What field links them together?

5. Which @odata.type value identifies a custom Web API skill in the skillset JSON definition?

Cleanup

az group delete --name rg-ai102-search --yes --no-wait
