Challenge 42: Search Queries — Syntax and Filters

Estimated Time

45-60 min | Cost: ~$0.10 (queries against existing index) | Domain: Knowledge Mining & Extraction (15-20%)

Exam skills covered

Skill | Weight
Query an index using simple syntax | High
Query an index using full Lucene syntax | High
Apply filters with OData expressions | High
Implement sorting, paging, and field selection | Medium
Implement faceted navigation | Medium
Use wildcards and fuzzy search | Medium

Overview

Azure AI Search supports two query parsers:

Parser | Syntax | Use case
Simple (default) | +term -term "phrase" prefix* | User-facing search boxes
Full Lucene | field:term term~2 /regex/ term^boost | Advanced developer queries

Key query parameters:

  • search: The search text (simple or Lucene syntax)
  • $filter: OData filter expression for exact matching
  • $orderby: Sort results
  • $select: Choose which fields to return
  • $top / $skip: Pagination
  • $count: Include total count in response
  • facets: Aggregate field values for navigation
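
The parameters above map onto keyword arguments of SearchClient.search() in the Python SDK, using the same names that appear in the tasks below. As a minimal sketch, the helper here (build_search_kwargs is a hypothetical name, not part of the SDK) collects only the options that were actually set, so one dictionary can be reused across calls:

```python
from typing import Any, Dict, List, Optional


def build_search_kwargs(
    search_text: str = "*",
    filter_expr: Optional[str] = None,
    order_by: Optional[List[str]] = None,
    select: Optional[List[str]] = None,
    top: Optional[int] = None,
    skip: Optional[int] = None,
    count: bool = False,
    facets: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: gather query options under the SDK keyword names."""
    kwargs: Dict[str, Any] = {"search_text": search_text}
    optional = {
        "filter": filter_expr,   # $filter
        "order_by": order_by,    # $orderby
        "select": select,        # $select
        "top": top,              # $top
        "skip": skip,            # $skip
        "facets": facets,        # facets
    }
    # Leave out anything that was not provided so service defaults apply.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    if count:
        kwargs["include_total_count"] = True  # $count
    return kwargs


params = build_search_kwargs("Azure", filter_expr="language eq 'en'", top=5, count=True)
print(params)
```

With a client such as the one created in Task 1, the result would be passed as search_client.search(**params).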

Prerequisites

  • Completed Challenge 40 (index with enriched documents)
  • Python 3.9+ with azure-search-documents>=11.4.0
  • .NET 8 with Azure.Search.Documents
  • At least 10 documents indexed for meaningful results

Implementation

Task 1: Simple query syntax

import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Assumption: the service name and query key are supplied via environment variables.
SEARCH_SERVICE = os.environ["SEARCH_SERVICE"]
SEARCH_KEY = os.environ["SEARCH_KEY"]

endpoint = f"https://{SEARCH_SERVICE}.search.windows.net"
credential = AzureKeyCredential(SEARCH_KEY)
search_client = SearchClient(endpoint=endpoint, index_name="documents-index", credential=credential)

# Simple search: with the default search_mode ("any"), this matches documents
# containing "Azure" OR "cognitive". Pass search_mode="all" to require both terms.
results = search_client.search(
    search_text="Azure cognitive",
    include_total_count=True,
    top=5
)

print(f"Total matching documents: {results.get_count()}")
for result in results:
    print(f"  Score: {result['@search.score']:.4f} | {result['metadata_storage_name']}")

# Phrase search: exact phrase match
results = search_client.search(search_text='"Azure AI services"')
for result in results:
    print(f"  Phrase match: {result['metadata_storage_name']}")

# Boolean operators in simple syntax (+ required, - excluded, | OR)
results = search_client.search(search_text="+Azure -deprecated | cognitive")

Task 2: Full Lucene syntax

from azure.search.documents.models import QueryType

# Fuzzy search: finds "cognitive" even if the user types "cogntive" (edit distance 1)
results = search_client.search(
    search_text="cogntive~1",
    query_type=QueryType.FULL
)
for result in results:
    print(f"  Fuzzy match: {result['metadata_storage_name']}")

# Wildcard search: prefix matching
results = search_client.search(
    search_text="micro*",
    query_type=QueryType.FULL
)

# Proximity search: "Azure" and "services" within 3 words of each other
results = search_client.search(
    search_text='"Azure services"~3',
    query_type=QueryType.FULL
)

# Boosted terms: matches on "AI" are boosted by a factor of 4 relative to "cloud"
results = search_client.search(
    search_text="AI^4 cloud",
    query_type=QueryType.FULL
)

# Field-scoped search: search only in the keyphrases field.
# The phrase is quoted so that both words are scoped to the field; without the
# quotes, only "machine" is scoped and "learning" is searched in all fields.
results = search_client.search(
    search_text='keyphrases:"machine learning"',
    query_type=QueryType.FULL
)
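
Because operators such as ~, *, ^ and : are interpreted by the full Lucene parser, raw user input containing them can change the meaning of a query. The sketch below shows one way to backslash-escape the special characters before appending an operator. The function names are hypothetical, and the character list is the standard Lucene set as I understand it, so it is worth checking against the service documentation:

```python
import re

# Characters that carry special meaning in full Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text: str) -> str:
    """Prefix every Lucene special character with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def fuzzy_term(term: str, distance: int = 1) -> str:
    """Build a fuzzy query term; Lucene edit distances range from 0 to 2."""
    if distance not in (0, 1, 2):
        raise ValueError("fuzzy edit distance must be 0, 1, or 2")
    return f"{escape_lucene(term)}~{distance}"


print(escape_lucene("C++ 1:1"))
print(fuzzy_term("cogntive", 1))
```

The escaped string would then be passed as search_text together with query_type=QueryType.FULL.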

Task 3: OData filters

# Filter by language
results = search_client.search(
    search_text="*",
    filter="language eq 'en'",
    include_total_count=True
)
print(f"English documents: {results.get_count()}")

# Filter with collection: any keyphrase matches
results = search_client.search(
    search_text="*",
    filter="keyphrases/any(k: k eq 'machine learning')"
)

# Combine search + filter
results = search_client.search(
    search_text="Azure",
    filter="language eq 'en' and wordCount gt 100",
    order_by=["wordCount desc"],
    select=["metadata_storage_name", "language", "wordCount"]
)

for result in results:
    print(f"  {result['metadata_storage_name']} | Words: {result.get('wordCount', 'N/A')}")

# Comparison operators: eq, ne, gt, ge, lt, le
# Logical operators: and, or, not
# Collection operators: any(), all()
# Functions: search.in(), geo.distance(), geo.intersects()
results = search_client.search(
    search_text="*",
    filter="search.in(language, 'en,fr,de', ',')"
)
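
The filters above embed string values directly in the expression. When values come from user input, a single quote inside the value has to be doubled to stay a valid OData string literal. A minimal sketch of helpers that do this follows; all three function names are hypothetical and nothing here is part of the SDK:

```python
from typing import Iterable


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field: str, value: str) -> str:
    """Build an equality filter such as: language eq 'en'."""
    return f"{field} eq {odata_string(value)}"


def search_in_filter(field: str, values: Iterable[str], delimiter: str = "|") -> str:
    """Build a search.in() filter over a list of candidate values."""
    values = list(values)
    # A value containing the delimiter would be split into two candidates.
    if any(delimiter in v for v in values):
        raise ValueError("a value contains the delimiter; choose a different one")
    joined = delimiter.join(values)
    return f"search.in({field}, {odata_string(joined)}, {odata_string(delimiter)})"


print(eq_filter("language", "en"))
print(search_in_filter("language", ["en", "fr", "de"], ","))
```

The returned strings can be passed as the filter argument of search_client.search().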

Task 4: Pagination and field selection

# Paginated results: page 1 (items 1-10)
page1 = search_client.search(
    search_text="*",
    top=10,
    skip=0,
    include_total_count=True,
    select=["metadata_storage_name", "language", "keyphrases"]
)
print(f"Total: {page1.get_count()}")
for doc in page1:
    print(f"  {doc['metadata_storage_name']}")

# Page 2 (items 11-20)
page2 = search_client.search(
    search_text="*",
    top=10,
    skip=10,
    select=["metadata_storage_name", "language", "keyphrases"]
)

# Sorting by multiple fields
results = search_client.search(
    search_text="*",
    order_by=["language asc", "metadata_storage_name asc"],
    top=20
)
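
The top and skip values for any page follow from the page number and page size. The sketch below computes them and rejects values beyond the service limits as I understand them (1,000 for $top and 100,000 for $skip). The helper name and constants are hypothetical, and the limits should be confirmed against current service documentation:

```python
MAX_TOP = 1000       # assumed largest page size per request
MAX_SKIP = 100_000   # assumed largest $skip value the service accepts


def page_window(page: int, page_size: int = 10) -> dict:
    """Return top/skip keyword arguments for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if not 1 <= page_size <= MAX_TOP:
        raise ValueError(f"page_size must be between 1 and {MAX_TOP}")
    skip = (page - 1) * page_size
    if skip > MAX_SKIP:
        raise ValueError(f"skip={skip} exceeds the {MAX_SKIP} limit")
    return {"top": page_size, "skip": skip}


print(page_window(1))      # page 1: items 1-10
print(page_window(2))      # page 2: items 11-20
print(page_window(3, 25))  # page 3 with 25 items per page
```

The result can be unpacked into a query, for example search_client.search(search_text="*", **page_window(2)).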

Task 5: Faceted navigation

# Facets: aggregate values for building filter UI
results = search_client.search(
    search_text="*",
    facets=["language,count:10", "keyphrases,count:20"],
    include_total_count=True
)

print(f"Total results: {results.get_count()}")
print("\nLanguage facets:")
for facet in results.get_facets().get("language", []):
    print(f"  {facet['value']}: {facet['count']} documents")

print("\nTop keyphrases:")
for facet in results.get_facets().get("keyphrases", []):
    print(f"  {facet['value']}: {facet['count']} documents")

# Combine facets with a filter (drill-down)
results = search_client.search(
    search_text="*",
    filter="language eq 'en'",
    facets=["keyphrases,count:10"],
)
print("\nTop keyphrases (English only):")
for facet in results.get_facets().get("keyphrases", []):
    print(f"  {facet['value']}: {facet['count']}")
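
In a faceted UI, the drill-down filter is usually assembled from whichever facet values the user has ticked. As a rough sketch (facet_filter is a hypothetical helper, not an SDK function), values within one field can be joined with "or" and different fields with "and", using any() for collection fields such as keyphrases:

```python
from typing import Dict, FrozenSet, List


def _quote(value: str) -> str:
    """Quote a value as an OData string literal."""
    return "'" + value.replace("'", "''") + "'"


def facet_filter(
    selections: Dict[str, List[str]],
    collection_fields: FrozenSet[str] = frozenset(),
) -> str:
    """Turn selected facet values into an OData filter expression."""
    clauses = []
    for field, values in selections.items():
        if not values:
            continue  # nothing selected for this facet
        if field in collection_fields:
            parts = [f"{field}/any(x: x eq {_quote(v)})" for v in values]
        else:
            parts = [f"{field} eq {_quote(v)}" for v in values]
        clause = " or ".join(parts)
        # Parenthesize multi-value clauses so "and" binds between fields.
        clauses.append(f"({clause})" if len(parts) > 1 else clause)
    return " and ".join(clauses)


selected = {"language": ["en", "fr"], "keyphrases": ["machine learning"]}
print(facet_filter(selected, frozenset({"keyphrases"})))
```

The resulting string can be passed as the filter argument alongside the same facets list, so the facet counts reflect the narrowed result set.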

Expected Output

{
  "@odata.count": 42,
  "@search.facets": {
    "language": [
      {"value": "en", "count": 35},
      {"value": "fr", "count": 4},
      {"value": "de", "count": 3}
    ],
    "keyphrases": [
      {"value": "machine learning", "count": 12},
      {"value": "Azure AI", "count": 10},
      {"value": "cognitive services", "count": 8}
    ]
  },
  "value": [...]
}

Break & fix

# | Scenario | Symptom | Root Cause | Fix
1 | Filter on non-filterable field | HTTP 400: "Field 'content' is not filterable" | The content field was defined with filterable: false | Filter on a field that is filterable, or rebuild the index with filterable: true on that field (changing this attribute on an existing field typically requires an index rebuild)
2 | Facet on non-facetable field | HTTP 400: "Field is not facetable" | Field lacks the facetable attribute in the index definition | Define the field as facetable in the index (changing an existing field's attributes typically requires an index rebuild)
3 | Full Lucene syntax not working | Wildcards/fuzzy operators treated as literal text | Missing queryType=full, so the query defaults to the simple parser | Set query_type=QueryType.FULL (Python) or QueryType = SearchQueryType.Full (C#)
4 | $orderby fails | "Cannot sort on field 'keyphrases'" | Collection fields (Collection(Edm.String)) cannot be sorted | Sort on a scalar field only; use scoring profiles for relevance tuning
5 | Pagination returns duplicates | Same documents appear on different pages | Index was modified between page requests, or tied scores gave an unstable order; $skip paging does not guarantee a consistent snapshot | Add a unique sortable field as a tie-breaker in $orderby; for deep paging, sort on a unique field and use a range filter instead of large $skip values, or accept eventual consistency

Knowledge Check

1. You want users to search for 'programing' and still find documents containing 'programming'. Which query syntax supports this?

2. You need to filter documents where ANY keyphrase equals 'machine learning'. Which OData filter is correct?

3. What is the maximum value allowed for $skip in Azure AI Search?

4. A field is defined as 'searchable: true, filterable: false, facetable: true'. Which operation will FAIL?

5. You configure facets=['language,count:5']. What does the 'count:5' parameter control?

Cleanup

No additional resources created in this challenge (uses existing index from Challenge 40).

# If you want to clean up everything:
az group delete --name rg-ai102-search --yes --no-wait

Learn More