Challenge 42: Search Queries — Syntax and Filters
45-60 min | Cost: ~$0.10 (queries against existing index) | Domain: Knowledge Mining & Extraction (15-20%)
Exam skills covered
| Skill | Weight |
|---|---|
| Query an index using simple syntax | High |
| Query an index using full Lucene syntax | High |
| Apply filters with OData expressions | High |
| Implement sorting, paging, and field selection | Medium |
| Implement faceted navigation | Medium |
| Use wildcards and fuzzy search | Medium |
Overview
Azure AI Search supports two query parsers:
| Parser | Syntax | Use case |
|---|---|---|
| Simple (default) | +term -term "phrase" prefix* | User-facing search boxes |
| Full Lucene | field:term~2 /regex/ term^boost | Advanced developer queries |
Key query parameters:
- search: the search text (simple or Lucene syntax)
- $filter: OData filter expression for exact matching
- $orderby: sort results
- $select: choose which fields to return
- $top / $skip: pagination
- $count: include the total count in the response
- facets: aggregate field values for navigation
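As a rough sketch of how these parameters combine, the helper below assembles the JSON body for a POST to `/docs/search`, leaving out anything that is unset. The function name `build_search_body` is invented for illustration; the property names are the ones used in the REST request bodies later in this challenge (in a POST body they are written without the `$` prefix).

```python
import json


def build_search_body(search="*", *, filter=None, orderby=None, select=None,
                      top=None, skip=None, count=False, facets=None,
                      query_type=None):
    """Assemble a search request body, omitting parameters that are not set."""
    body = {"search": search}
    if query_type is not None:
        body["queryType"] = query_type          # "simple" or "full"
    if filter is not None:
        body["filter"] = filter                 # OData expression
    if orderby:
        body["orderby"] = ", ".join(orderby)    # comma-separated sort clauses
    if select:
        body["select"] = ",".join(select)       # comma-separated field names
    if top is not None:
        body["top"] = top
    if skip is not None:
        body["skip"] = skip
    if count:
        body["count"] = True
    if facets:
        body["facets"] = list(facets)
    return body


body = build_search_body(
    "Azure",
    filter="language eq 'en'",
    orderby=["wordCount desc"],
    select=["metadata_storage_name", "language"],
    top=5,
    count=True,
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the `-d` payload of the curl commands shown in the tasks below.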
Prerequisites
- Completed Challenge 40 (index with enriched documents)
- Python 3.9+ with azure-search-documents>=11.4.0
- .NET 8 with Azure.Search.Documents
- At least 10 documents indexed for meaningful results
Implementation
Task 1: Simple query syntax
- Python SDK
- C# SDK
- REST API
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Assumption: the service name and key are exported as environment variables,
# matching the ${SEARCH_SERVICE} and ${SEARCH_KEY} variables in the curl examples
SEARCH_SERVICE = os.environ["SEARCH_SERVICE"]
SEARCH_KEY = os.environ["SEARCH_KEY"]

endpoint = f"https://{SEARCH_SERVICE}.search.windows.net"
credential = AzureKeyCredential(SEARCH_KEY)
search_client = SearchClient(endpoint=endpoint, index_name="documents-index", credential=credential)

# Simple search: with the default searchMode (any), this matches documents
# containing "Azure" OR "cognitive"; documents with both terms score higher
results = search_client.search(
    search_text="Azure cognitive",
    include_total_count=True,
    top=5
)
print(f"Total matching documents: {results.get_count()}")
for result in results:
    print(f" Score: {result['@search.score']:.4f} | {result['metadata_storage_name']}")

# Phrase search: exact phrase match
results = search_client.search(search_text='"Azure AI services"')
for result in results:
    print(f" Phrase match: {result['metadata_storage_name']}")

# Boolean operators in simple syntax (+ required, - excluded, | OR)
results = search_client.search(search_text="+Azure -deprecated | cognitive")
using Azure; // AzureKeyCredential lives in the Azure namespace
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
var searchClient = new SearchClient(
new Uri($"https://{searchService}.search.windows.net"),
"documents-index",
new AzureKeyCredential(searchKey));
// Simple search
var options = new SearchOptions
{
IncludeTotalCount = true,
Size = 5
};
var results = await searchClient.SearchAsync<SearchDocument>("Azure cognitive", options);
Console.WriteLine($"Total: {results.Value.TotalCount}");
await foreach (var result in results.Value.GetResultsAsync())
{
Console.WriteLine($" Score: {result.Score:F4} | {result.Document["metadata_storage_name"]}");
}
// Phrase search
var phraseResults = await searchClient.SearchAsync<SearchDocument>("\"Azure AI services\"");
# Simple search
curl -s "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs?api-version=2024-07-01&search=Azure+cognitive&\$count=true&\$top=5" \
-H "api-key: ${SEARCH_KEY}" | python -m json.tool
# Phrase search
curl -s "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs?api-version=2024-07-01&search=%22Azure+AI+services%22&\$count=true" \
-H "api-key: ${SEARCH_KEY}" | python -m json.tool
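To make the simple-syntax operators concrete, here is a minimal sketch that composes a query string from required, excluded, and optional terms. The helper names `quote_if_phrase` and `build_simple_query` are hypothetical and not part of any SDK; the output is just a string you could pass as `search_text`.

```python
def quote_if_phrase(term):
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    return f'"{term}"' if " " in term else term


def build_simple_query(required=(), excluded=(), optional=()):
    """Combine terms using the simple-syntax + (required) and - (excluded) operators."""
    parts = [f"+{quote_if_phrase(t)}" for t in required]
    parts += [f"-{quote_if_phrase(t)}" for t in excluded]
    parts += [quote_if_phrase(t) for t in optional]
    return " ".join(parts)


query = build_simple_query(
    required=["Azure"],
    excluded=["deprecated"],
    optional=["AI services"],
)
print(query)  # +Azure -deprecated "AI services"
```

Keep in mind that under the default searchMode (any) an excluded term widens rather than narrows the result set, so queries that rely on `-term` are usually sent with `search_mode="all"` in the Python SDK.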
Task 2: Full Lucene syntax
- Python SDK
- C# SDK
- REST API
from azure.search.documents.models import QueryType
# Fuzzy search — finds "cognitive" even if user types "cogntive" (edit distance 1)
results = search_client.search(
search_text="cogntive~1",
query_type=QueryType.FULL
)
for result in results:
    print(f" Fuzzy match: {result['metadata_storage_name']}")
# Wildcard search — prefix matching
results = search_client.search(
search_text="micro*",
query_type=QueryType.FULL
)
# Proximity search — "Azure" and "services" within 3 words of each other
results = search_client.search(
search_text='"Azure services"~3',
query_type=QueryType.FULL
)
# Boosted terms: matches on "AI" get a boost factor of 4 relative to the unboosted term "cloud"
results = search_client.search(
search_text="AI^4 cloud",
query_type=QueryType.FULL
)
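# Field-scoped search: search only in the keyphrases field.
# The phrase must be quoted; unquoted, only "machine" is scoped to the field
# and "learning" is searched across all searchable fields.
results = search_client.search(
    search_text='keyphrases:"machine learning"',
    query_type=QueryType.FULL
)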
// Fuzzy search
var fuzzyOptions = new SearchOptions { QueryType = SearchQueryType.Full };
var fuzzyResults = await searchClient.SearchAsync<SearchDocument>("cogntive~1", fuzzyOptions);
// Wildcard search
var wildcardResults = await searchClient.SearchAsync<SearchDocument>("micro*", fuzzyOptions);
// Proximity search
var proximityResults = await searchClient.SearchAsync<SearchDocument>(
"\"Azure services\"~3", fuzzyOptions);
// Boosted terms
var boostedResults = await searchClient.SearchAsync<SearchDocument>("AI^4 cloud", fuzzyOptions);
// Field-scoped search
var fieldResults = await searchClient.SearchAsync<SearchDocument>(
"keyphrases:\"machine learning\"", fuzzyOptions);
# Fuzzy search (queryType=full enables Lucene syntax)
curl -s -X POST "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs/search?api-version=2024-07-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"search": "cogntive~1",
"queryType": "full",
"count": true
}'
# Boosted-term search
curl -s -X POST "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs/search?api-version=2024-07-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"search": "AI^4 cloud",
"queryType": "full",
"count": true
}'
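Because full Lucene syntax gives meaning to many punctuation characters, raw user input usually needs escaping before it is embedded in a query. The sketch below backslash-escapes the reserved characters and builds a fuzzy term; `escape_lucene` and `fuzzy_term` are illustrative names, and the 0 to 2 range for the edit distance reflects the documented fuzzy-search limit.

```python
import re

# Characters that carry special meaning in full Lucene query syntax
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Prefix each reserved character with a backslash so it is matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def fuzzy_term(term, distance=1):
    """Build a fuzzy query term such as cogntive~1 from a single word."""
    if not 0 <= distance <= 2:
        raise ValueError("fuzzy edit distance must be 0, 1, or 2")
    return f"{escape_lucene(term)}~{distance}"


print(escape_lucene("C++ (draft)"))
print(fuzzy_term("cogntive"))
```

The escaped string is only meaningful when the request also sets `queryType` to `full`; under the simple parser most of these characters are treated differently.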
Task 3: OData filters
- Python SDK
- C# SDK
- REST API
# Filter by language
results = search_client.search(
search_text="*",
filter="language eq 'en'",
include_total_count=True
)
print(f"English documents: {results.get_count()}")
# Filter with collection — any keyphrase matches
results = search_client.search(
search_text="*",
filter="keyphrases/any(k: k eq 'machine learning')"
)
# Combine search + filter
results = search_client.search(
search_text="Azure",
filter="language eq 'en' and wordCount gt 100",
order_by=["wordCount desc"],
select=["metadata_storage_name", "language", "wordCount"]
)
for result in results:
    print(f" {result['metadata_storage_name']} | Words: {result.get('wordCount', 'N/A')}")
# Comparison operators: eq, ne, gt, ge, lt, le
# Logical operators: and, or, not
# Collection operators: any(), all()
# Functions: search.in(), geo.distance(), geo.intersects()
results = search_client.search(
search_text="*",
filter="search.in(language, 'en,fr,de', ',')"
)
// Filter by language
var filterOptions = new SearchOptions
{
Filter = "language eq 'en'",
IncludeTotalCount = true
};
var filtered = await searchClient.SearchAsync<SearchDocument>("*", filterOptions);
Console.WriteLine($"English documents: {filtered.Value.TotalCount}");
// Collection filter
var collectionOptions = new SearchOptions
{
Filter = "keyphrases/any(k: k eq 'machine learning')"
};
var collResults = await searchClient.SearchAsync<SearchDocument>("*", collectionOptions);
// Combined search + filter + sort + select
var combinedOptions = new SearchOptions
{
Filter = "language eq 'en' and wordCount gt 100",
IncludeTotalCount = true
};
combinedOptions.OrderBy.Add("wordCount desc");
combinedOptions.Select.Add("metadata_storage_name");
combinedOptions.Select.Add("language");
combinedOptions.Select.Add("wordCount");
var combined = await searchClient.SearchAsync<SearchDocument>("Azure", combinedOptions);
# Filter with search
curl -s -X POST "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs/search?api-version=2024-07-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"search": "Azure",
"filter": "language eq '\''en'\'' and wordCount gt 100",
"orderby": "wordCount desc",
"select": "metadata_storage_name,language,wordCount",
"count": true
}'
# Collection filter
curl -s -X POST "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs/search?api-version=2024-07-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"search": "*",
"filter": "keyphrases/any(k: k eq '\''machine learning'\'')",
"count": true
}'
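Filter strings are easy to get wrong when values come from user input, mainly because string literals use single quotes and an embedded quote must be doubled. The following sketch wraps the filter shapes used in this task in small helpers; all function names are hypothetical, and the field names are simply those from the examples above.

```python
def odata_str(value):
    """Quote a string literal for OData; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def eq(field, value):
    """Equality on a scalar string field."""
    return f"{field} eq {odata_str(value)}"


def any_eq(collection_field, value):
    """True when any element of a string collection equals the value."""
    return f"{collection_field}/any(k: k eq {odata_str(value)})"


def search_in(field, values, delimiter=","):
    """Match a field against a list of values using search.in()."""
    if any(delimiter in v for v in values):
        raise ValueError("a value contains the delimiter; choose another one")
    return f"search.in({field}, {odata_str(delimiter.join(values))}, {odata_str(delimiter)})"


def all_of(*clauses):
    """AND clauses together, parenthesizing each one."""
    return " and ".join(f"({c})" for c in clauses)


print(all_of(eq("language", "en"), "wordCount gt 100"))
print(any_eq("keyphrases", "machine learning"))
print(search_in("language", ["en", "fr", "de"]))
```

The returned strings can be passed directly as the `filter` argument in the Python SDK or as the `filter` property of a REST request body.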
Task 4: Pagination and field selection
- Python SDK
- C# SDK
- REST API
# Paginated results — page 1 (items 1-10)
page1 = search_client.search(
search_text="*",
top=10,
skip=0,
include_total_count=True,
select=["metadata_storage_name", "language", "keyphrases"]
)
print(f"Total: {page1.get_count()}")
for doc in page1:
    print(f" {doc['metadata_storage_name']}")
# Page 2 (items 11-20)
page2 = search_client.search(
search_text="*",
top=10,
skip=10,
select=["metadata_storage_name", "language", "keyphrases"]
)
# Sorting by multiple fields
results = search_client.search(
search_text="*",
order_by=["language asc", "metadata_storage_name asc"],
top=20
)
// Paginated results
var pageOptions = new SearchOptions
{
Size = 10,
Skip = 0,
IncludeTotalCount = true
};
pageOptions.Select.Add("metadata_storage_name");
pageOptions.Select.Add("language");
pageOptions.Select.Add("keyphrases");
var page1 = await searchClient.SearchAsync<SearchDocument>("*", pageOptions);
Console.WriteLine($"Total: {page1.Value.TotalCount}");
// Page 2
pageOptions.Skip = 10;
var page2 = await searchClient.SearchAsync<SearchDocument>("*", pageOptions);
// Multi-field sort
var sortOptions = new SearchOptions { Size = 20 };
sortOptions.OrderBy.Add("language asc");
sortOptions.OrderBy.Add("metadata_storage_name asc");
var sorted = await searchClient.SearchAsync<SearchDocument>("*", sortOptions);
# Paginated query
curl -s -X POST "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs/search?api-version=2024-07-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"search": "*",
"top": 10,
"skip": 0,
"count": true,
"select": "metadata_storage_name,language,keyphrases",
"orderby": "language asc, metadata_storage_name asc"
}'
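The page arithmetic above (page 1 is skip 0, page 2 is skip 10) generalizes to a small helper. This is a sketch only: `page_params` and `total_pages` are made-up names, and the two limits are the service limits as documented at the time of writing (up to 1,000 results per request and a maximum skip of 100,000), so verify them against current documentation.

```python
import math

MAX_TOP = 1000       # assumed per-request limit for top
MAX_SKIP = 100_000   # assumed upper bound for skip


def page_params(page, page_size=10):
    """Translate a 1-based page number into top/skip values."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if not 1 <= page_size <= MAX_TOP:
        raise ValueError(f"page_size must be between 1 and {MAX_TOP}")
    skip = (page - 1) * page_size
    if skip > MAX_SKIP:
        raise ValueError(f"skip {skip} exceeds the {MAX_SKIP} limit")
    return {"top": page_size, "skip": skip}


def total_pages(total_count, page_size=10):
    """Number of pages needed to show total_count results."""
    return math.ceil(total_count / page_size)


print(page_params(1))   # {'top': 10, 'skip': 0}
print(page_params(2))   # {'top': 10, 'skip': 10}
print(total_pages(42))  # 5
```

With the Python SDK the result can be unpacked straight into the call, for example `search_client.search(search_text="*", **page_params(2))`.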
Task 5: Faceted navigation
- Python SDK
- C# SDK
- REST API
# Facets — aggregate values for building filter UI
results = search_client.search(
search_text="*",
facets=["language,count:10", "keyphrases,count:20"],
include_total_count=True
)
print(f"Total results: {results.get_count()}")
print("\nLanguage facets:")
for facet in results.get_facets().get("language", []):
    print(f" {facet['value']}: {facet['count']} documents")
print("\nTop keyphrases:")
for facet in results.get_facets().get("keyphrases", []):
    print(f" {facet['value']}: {facet['count']} documents")
# Combine facets with a filter (drill-down)
results = search_client.search(
search_text="*",
filter="language eq 'en'",
facets=["keyphrases,count:10"],
)
print("\nTop keyphrases (English only):")
for facet in results.get_facets().get("keyphrases", []):
    print(f" {facet['value']}: {facet['count']}")
var facetOptions = new SearchOptions
{
IncludeTotalCount = true
};
facetOptions.Facets.Add("language,count:10");
facetOptions.Facets.Add("keyphrases,count:20");
var facetResults = await searchClient.SearchAsync<SearchDocument>("*", facetOptions);
Console.WriteLine($"Total: {facetResults.Value.TotalCount}");
foreach (var facet in facetResults.Value.Facets["language"])
{
Console.WriteLine($" {facet.Value}: {facet.Count}");
}
// Drill-down with filter
var drillOptions = new SearchOptions { Filter = "language eq 'en'" };
drillOptions.Facets.Add("keyphrases,count:10");
var drillResults = await searchClient.SearchAsync<SearchDocument>("*", drillOptions);
# Faceted search
curl -s -X POST "https://${SEARCH_SERVICE}.search.windows.net/indexes/documents-index/docs/search?api-version=2024-07-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"search": "*",
"facets": ["language,count:10", "keyphrases,count:20"],
"count": true
}'
Expected Output
{
"@odata.count": 42,
"@search.facets": {
"language": [
{"value": "en", "count": 35},
{"value": "fr", "count": 4},
{"value": "de", "count": 3}
],
"keyphrases": [
{"value": "machine learning", "count": 12},
{"value": "Azure AI", "count": 10},
{"value": "cognitive services", "count": 8}
]
},
"value": [...]
}
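One way to use a response like this is to turn each facet bucket into the filter that a click on it would apply (the drill-down pattern from Task 5). The sketch below parses a copy of the sample response with an empty `value` array. `facet_filter` and `COLLECTION_FIELDS` are illustrative names, and treating `keyphrases` as a string collection is an assumption based on the filters used earlier.

```python
import json

response = json.loads("""
{
  "@odata.count": 42,
  "@search.facets": {
    "language": [
      {"value": "en", "count": 35},
      {"value": "fr", "count": 4},
      {"value": "de", "count": 3}
    ],
    "keyphrases": [
      {"value": "machine learning", "count": 12},
      {"value": "Azure AI", "count": 10},
      {"value": "cognitive services", "count": 8}
    ]
  },
  "value": []
}
""")

# Assumption: these fields are string collections and need any() in filters
COLLECTION_FIELDS = {"keyphrases"}


def facet_filter(field, value):
    """Build the OData filter that narrows results to one facet bucket."""
    literal = "'" + str(value).replace("'", "''") + "'"
    if field in COLLECTION_FIELDS:
        return f"{field}/any(k: k eq {literal})"
    return f"{field} eq {literal}"


for field, buckets in response["@search.facets"].items():
    for bucket in buckets:
        print(f"{field}: {bucket['value']} ({bucket['count']}) -> "
              f"{facet_filter(field, bucket['value'])}")
```

Each generated filter can be sent with the same search text to reproduce the drill-down query for that bucket.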
Break & fix
| # | Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|---|
| 1 | Filter on non-filterable field | HTTP 400: "Field 'content' is not filterable" | The content field was defined with filterable: false | Filter on a field that is filterable, or add a new filterable field; changing filterable on an existing field generally requires dropping and rebuilding the index |
| 2 | Facet on non-facetable field | HTTP 400: "Field is not facetable" | Field lacks the facetable attribute in the index definition | Facet on a field that is facetable, or add a new facetable field; enabling facetable on an existing field generally requires rebuilding the index |
| 3 | Full Lucene syntax not working | Wildcards/fuzzy treated as literal text | Missing queryType=full — defaults to simple | Set query_type=QueryType.FULL (Python) or QueryType = SearchQueryType.Full (C#) |
| 4 | $orderby fails | "Cannot sort on field 'keyphrases'" | Collection fields (Collection(Edm.String)) cannot be sorted | Sort on a scalar field only; use scoring profiles for relevance tuning |
| 5 | Pagination returns duplicates | Same documents appear on different pages | Index was modified between page requests, so $skip offsets shifted | Sort on a unique, sortable field with $orderby and page with a range filter on the last value seen instead of a large $skip, or accept eventual consistency |
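Scenarios 1, 2, and 4 all come down to a field lacking the attribute an operation needs, which can be checked before a request is sent. The sketch below does that against a hand-written attribute map; the `FIELDS` values are hypothetical and would normally be read from the actual index definition, and `check_request` is an invented helper name.

```python
# Hypothetical attribute map; in practice derive it from the index definition
FIELDS = {
    "content":    {"filterable": False, "facetable": False, "sortable": False},
    "language":   {"filterable": True,  "facetable": True,  "sortable": True},
    "keyphrases": {"filterable": True,  "facetable": True,  "sortable": False},
    "wordCount":  {"filterable": True,  "facetable": True,  "sortable": True},
}


def check_request(filter_fields=(), facet_fields=(), orderby_fields=()):
    """Return a list of problems that would likely cause an HTTP 400."""
    problems = []
    checks = [
        ("filterable", filter_fields),
        ("facetable", facet_fields),
        ("sortable", orderby_fields),
    ]
    for attribute, names in checks:
        for name in names:
            field = FIELDS.get(name)
            if field is None:
                problems.append(f"{name}: unknown field")
            elif not field[attribute]:
                problems.append(f"{name}: not {attribute}")
    return problems


print(check_request(filter_fields=["content"], orderby_fields=["keyphrases"]))
print(check_request(filter_fields=["language"], facet_fields=["keyphrases"]))
```

An empty list means the request passes these particular checks; it does not validate the filter expression itself.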
Knowledge Check
1. You want users to search for 'programing' and still find documents containing 'programming'. Which query syntax supports this?
2. You need to filter documents where ANY keyphrase equals 'machine learning'. Which OData filter is correct?
3. What is the maximum value allowed for $skip in Azure AI Search?
4. A field is defined as 'searchable: true, filterable: false, facetable: true'. Which operation will FAIL?
5. You configure facets=['language,count:5']. What does the 'count:5' parameter control?
Cleanup
No additional resources created in this challenge (uses existing index from Challenge 40).
# If you want to clean up everything:
az group delete --name rg-ai102-search --yes --no-wait