
Knowledge Mining & Extraction

This domain covers building intelligent search solutions and extracting structured data from documents using Azure AI Search and Azure Document Intelligence. It represents 15–20% of the AI-102 exam.

You'll create search indexes, build enrichment pipelines with AI skillsets, implement vector and hybrid search, and extract structured data from invoices, receipts, and custom documents. These skills are critical for retrieval-augmented generation (RAG): Domain 2's retrieval layer depends on the indexing and search skills you build here.
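Choosing a Document Intelligence model is largely a matter of matching the document type to a model ID. The sketch below uses real prebuilt model IDs (`prebuilt-invoice`, `prebuilt-receipt`, `prebuilt-idDocument`, `prebuilt-layout`, `prebuilt-read`). The `choose_model` helper, its category keys, the rule that a custom model takes precedence, and the fallback to layout are hypothetical design choices for illustration, not service behavior. Available IDs also vary by API version, so check them against the version you target.

```python
from typing import Optional

# Real prebuilt model IDs; the short category keys on the left are illustrative only.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",    # vendor, customer, totals, line items
    "receipt": "prebuilt-receipt",    # merchant, transaction date, totals
    "id": "prebuilt-idDocument",      # passports, driver's licenses
    "layout": "prebuilt-layout",      # text, tables, selection marks, structure
    "read": "prebuilt-read",          # OCR text (lines, words, languages) only
}


def choose_model(doc_type: str, custom_model_id: Optional[str] = None) -> str:
    """Hypothetical helper: pick the model ID to pass to an analyze request."""
    # In this sketch a custom or composed model, when supplied, takes precedence,
    # because prebuilt models only return the fields they were built to extract.
    if custom_model_id:
        return custom_model_id
    # Unknown document types fall back to layout analysis (structure, no named fields).
    return PREBUILT_MODELS.get(doc_type.lower(), PREBUILT_MODELS["layout"])


print(choose_model("Invoice"))                     # prebuilt-invoice
print(choose_model("purchase-order"))              # prebuilt-layout
print(choose_model("purchase-order", "po-model"))  # po-model
```

Here `po-model` stands in for a hypothetical custom model ID that you would have trained yourself.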

The exam tests your understanding of the full AI Search pipeline: data sources → indexers → skillsets → index → queries. Know how to configure each stage, troubleshoot failures, and optimize for relevance. Document Intelligence questions focus on selecting the right prebuilt model and understanding custom model training.
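The pipeline stages map onto four REST resources: a data source, a skillset, an index, and an indexer that ties them together. The following minimal sketch assumes a blob container and the built-in key phrase skill. It writes each request body as a plain Python dict and checks that the indexer's references line up. The resource names (`docs-ds`, `docs-index`, and so on) and the `check_pipeline` helper are hypothetical. The property names follow the Azure AI Search REST API, but confirm them against the API version you target.

```python
# Hypothetical resource names; property names follow the Azure AI Search REST API.
data_source = {
    "name": "docs-ds",
    "type": "azureblob",
    "credentials": {"connectionString": "<storage-connection-string>"},
    "container": {"name": "docs"},
}

skillset = {
    "name": "docs-skillset",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "keyPhrases", "targetName": "keyphrases"}],
        }
    ],
}

index = {
    "name": "docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "keyphrases", "type": "Collection(Edm.String)",
         "searchable": True, "filterable": True, "facetable": True},
    ],
}

indexer = {
    "name": "docs-indexer",
    "dataSourceName": data_source["name"],  # where documents come from
    "skillsetName": skillset["name"],       # enrichment applied during indexing
    "targetIndexName": index["name"],       # where documents land
    "fieldMappings": [
        # Source field -> index field; base64-encode the blob path into a valid key.
        {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id",
         "mappingFunction": {"name": "base64Encode"}},
    ],
    "outputFieldMappings": [
        # Enriched node created by the skill -> index field.
        {"sourceFieldName": "/document/keyphrases", "targetFieldName": "keyphrases"},
    ],
    "schedule": {"interval": "PT2H"},  # ISO 8601 duration: run every 2 hours
}


def check_pipeline(ds, ss, idx, ixr):
    """Return a list of wiring problems; an empty list means the references line up."""
    problems = []
    if ixr["dataSourceName"] != ds["name"]:
        problems.append("indexer.dataSourceName does not match the data source")
    if ixr.get("skillsetName") != ss["name"]:
        problems.append("indexer.skillsetName does not match the skillset")
    if ixr["targetIndexName"] != idx["name"]:
        problems.append("indexer.targetIndexName does not match the index")
    index_fields = {f["name"] for f in idx["fields"]}
    for m in ixr.get("fieldMappings", []) + ixr.get("outputFieldMappings", []):
        if m["targetFieldName"] not in index_fields:
            problems.append(f"mapping target {m['targetFieldName']!r} is not an index field")
    # Enrichment-tree paths created by skills: context + "/" + targetName.
    # This simple check only recognizes paths produced directly by a skill.
    skill_outputs = {
        f"{s.get('context', '/document')}/{o['targetName']}"
        for s in ss["skills"] for o in s["outputs"]
    }
    for m in ixr.get("outputFieldMappings", []):
        if m["sourceFieldName"] not in skill_outputs:
            problems.append(f"output mapping source {m['sourceFieldName']!r} is not produced by any skill")
    return problems


# The indexer is created last because it references the other three resources.
for path, body in [("datasources", data_source), ("skillsets", skillset),
                   ("indexes", index), ("indexers", indexer)]:
    print(f"PUT /{path}/{body['name']}")
print(check_pipeline(data_source, skillset, index, indexer))  # []
```

Most indexer failures on the exam trace back to exactly these links: a mapping that targets a field the index lacks, or an output mapping whose source path no skill produces.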

What You'll Learn

  • Design and create Azure AI Search index schemas
  • Configure indexers and data sources for automated indexing
  • Build AI enrichment pipelines with built-in and custom skillsets
  • Implement vector search with embedding fields
  • Write effective search queries (simple, full Lucene, vector, hybrid)
  • Extract data from documents with prebuilt and custom models
  • Implement knowledge stores for downstream analytics
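The four query types differ mainly in the request body sent to the search endpoint. This sketch builds each body as a dict. It assumes an API version that supports `vectorQueries` (2023-11-01 or later) and hypothetical field names `content` and `contentVector`. The helper functions are illustrative and not part of any SDK.

```python
# Request bodies for POST /indexes/{index-name}/docs/search?api-version=...
# Assumes hypothetical fields "content" (searchable text) and "contentVector" (vector).


def simple_query(text, mode="any", top=10):
    # Simple syntax (the default queryType): +, -, |, quoted phrases, * prefix matching.
    return {"search": text, "queryType": "simple", "searchMode": mode, "top": top}


def lucene_query(expression, top=10):
    # Full Lucene syntax: fielded terms, fuzzy (~), proximity, regex, term boosting (^).
    return {"search": expression, "queryType": "full", "top": top}


def vector_query(embedding, field="contentVector", k=5):
    # Pure vector search: no "search" text, just the k nearest neighbors in one vector field.
    return {"vectorQueries": [{"kind": "vector", "vector": embedding, "fields": field, "k": k}]}


def hybrid_query(text, embedding, field="contentVector", k=50, top=10):
    # Hybrid: keyword and vector queries run in one request and the service merges
    # them with Reciprocal Rank Fusion. k sets how many vector matches enter the
    # fusion; top caps the merged result list.
    body = simple_query(text, top=top)
    body.update(vector_query(embedding, field, k))
    return body


queries = {
    # searchMode "all" makes "-draft" act as AND NOT rather than OR NOT.
    "simple": simple_query("invoice +overdue -draft", mode="all"),
    "lucene": lucene_query('title:"vector search" AND content:hnsw~1'),
    "vector": vector_query([0.12, -0.03, 0.88]),  # toy 3-dimensional embedding
    "hybrid": hybrid_query("overdue invoices", [0.12, -0.03, 0.88]),
}
for name, body in queries.items():
    print(name, sorted(body))
```

In a real request the embedding must come from the same model used at indexing time and match the vector field's dimensions. The toy vector here only shows the shape of the body.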

Skills Measured

  • Create and manage Azure AI Search indexes
  • Implement an indexing pipeline with data sources and indexers
  • Implement AI enrichment with skillsets (built-in and custom)
  • Implement vector search and hybrid search
  • Query an Azure AI Search index with multiple query types
  • Analyze documents with Azure Document Intelligence
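A vector field needs three pieces wired together: the field itself (type, dimensions, profile), a vector search profile, and an algorithm configuration such as HNSW. This sketch assumes 1536-dimensional embeddings and hypothetical names (`contentVector`, `hnsw-profile`, `hnsw-config`). The HNSW values are believed to match the service defaults but should be checked against current documentation. The `validate_vector_config` helper is hypothetical and only catches broken references and dimension mismatches.

```python
EMBEDDING_DIMENSIONS = 1536  # assumption: output size of the embedding model you use

vector_index = {
    "name": "docs-vector-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",  # float32 vector
            "searchable": True,
            "dimensions": EMBEDDING_DIMENSIONS,
            "vectorSearchProfile": "hnsw-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "hnsw-config",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,                # bi-directional links per graph node
                    "efConstruction": 400, # candidate list size while building the graph
                    "efSearch": 500,       # candidate list size at query time
                    "metric": "cosine",
                },
            }
        ],
        "profiles": [{"name": "hnsw-profile", "algorithm": "hnsw-config"}],
    },
}


def validate_vector_config(index, embedding):
    """Check profile/algorithm references and that an embedding fits every vector field."""
    vs = index.get("vectorSearch", {})
    algorithms = {a["name"] for a in vs.get("algorithms", [])}
    profiles = {p["name"]: p["algorithm"] for p in vs.get("profiles", [])}
    for field in index["fields"]:
        if field["type"] != "Collection(Edm.Single)":
            continue  # this sketch only checks float32 vector fields
        profile = field.get("vectorSearchProfile")
        if profile not in profiles or profiles[profile] not in algorithms:
            return False
        if len(embedding) != field["dimensions"]:
            return False
    return True


print(validate_vector_config(vector_index, [0.0] * EMBEDDING_DIMENSIONS))  # True
print(validate_vector_config(vector_index, [0.0] * 3))                     # False
```

Raising `efSearch` generally improves recall at the cost of query latency, while larger `m` and `efConstruction` values trade index size and build time for a better-connected graph.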

Challenges

| # | Title | Key Topics |
|---|-------|------------|
| 40 | Create an AI Search Index | Index schema, fields, data types, analyzers |
| 41 | Scoring Profiles & Relevance | Scoring profiles, boosting, freshness functions |
| 42 | Indexers & Data Sources | Blob storage, SQL, change detection, schedule |
| 43 | Incremental Enrichment | Enrichment cache, partial updates, debug sessions |
| 44 | Built-in AI Skills | Entity recognition, key phrases, OCR, image analysis |
| 45 | Custom Skills & Knowledge Store | Azure Functions, projections, power skills |
| 46 | Vector Search & Hybrid Queries | Vector fields, HNSW config, hybrid ranking |
| 47 | Advanced Queries & Filters | Lucene syntax, facets, filters, autocomplete |
| 48 | Document Intelligence Models | Prebuilt invoice/receipt, custom models, composed models |
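For hybrid queries, Azure AI Search merges the keyword (BM25) and vector result lists with Reciprocal Rank Fusion (RRF). RRF scores each document by summing 1 / (k + rank) across every list it appears in. The sketch below re-implements only the fusion formula to show why documents found by both retrievers rise to the top. It assumes k = 60, the constant from the original RRF paper, and does not reproduce the service's exact scores.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; a document absent from a list gets nothing from that list.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_results = ["a", "b", "c"]  # e.g. BM25 order from the text query
vector_results = ["c", "a", "d"]   # e.g. nearest-neighbor order from the vector query
fused = reciprocal_rank_fusion([keyword_results, vector_results])
print([doc_id for doc_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

Here `a` and `c` appear in both lists, so they outrank `b` and `d`, each of which was found by only one retriever. Because RRF uses ranks rather than raw scores, it never has to reconcile BM25 scores with vector similarity values.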

Prerequisites

  • Completed Domain 1 (Plan & Manage) or equivalent knowledge
  • Completed Domain 2 (Generative AI) — vector concepts from Challenges 14–15
  • Azure AI Search resource provisioned
  • Understanding of JSON schema design
  • Basic knowledge of search relevance concepts