
Challenge 43: Knowledge Store Projections

Estimated Time

45-60 min | Cost: ~$0.30 (Search + Storage transactions) | Domain: Knowledge Mining & Extraction (15-20%)

Exam skills covered

Skill | Weight
Define a knowledge store in a skillset | High
Create table projections for structured data | High
Create object projections for JSON documents | Medium
Create file projections for normalized images | Medium
Query knowledge store data from Azure Storage | Medium

Overview

A Knowledge Store is a persistent storage destination for enriched content created by an AI Search skillset. While the search index serves queries, the knowledge store preserves enrichment output in Azure Storage for downstream analytics (Power BI, data science, custom apps).

Three projection types:

Type | Storage destination | Format | Use case
Table | Azure Table Storage | Rows/columns | Power BI, tabular analytics
Object | Blob Storage | JSON documents | Custom apps, further processing
File | Blob Storage | Binary (images) | Normalized images from OCR pipeline

Shaper skill

The Shaper skill creates custom JSON shapes from enrichment outputs. It's commonly used to prepare data for projections with clean, well-structured output.
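To make the idea concrete, here is a minimal pure-Python sketch of what shaping amounts to: each input name is paired with a source path, and the output is one object holding the values found at those paths. It does not call the service; the `resolve` and `shape` helpers and the sample enrichment tree are hypothetical, written only for illustration.

```python
from typing import Any


def resolve(tree: dict[str, Any], path: str) -> Any:
    """Walk an enrichment path such as '/document/language' through nested dicts."""
    node: Any = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def shape(tree: dict[str, Any], inputs: dict[str, str]) -> dict[str, Any]:
    """Build one object: keys are input names, values come from the source paths."""
    return {name: resolve(tree, source) for name, source in inputs.items()}


# Hypothetical enrichment tree for a single document (illustrative values only).
enriched = {
    "document": {
        "metadata_storage_name": "sample-doc.txt",
        "language": "en",
        "keyphrases": ["cloud-based AI capabilities", "cognitive services"],
    }
}

document_shape = shape(
    enriched,
    {
        "fileName": "/document/metadata_storage_name",
        "language": "/document/language",
        "keyphrases": "/document/keyphrases",
    },
)
print(document_shape)
```

The real skill does the same kind of consolidation inside the enrichment pipeline and writes the result to a new node that projections can use as their source.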

Prerequisites

  • Completed Challenge 40 (AI Search with skillset)
  • Azure Storage Account (from Challenge 40 or create new)
  • Python 3.9+ with azure-search-documents>=11.4.0
  • .NET 8 with Azure.Search.Documents

Implementation

Task 1: Add a Shaper skill to prepare projection data

The Shaper skill consolidates enrichments into a structured shape suitable for projections.

from azure.search.documents.indexes.models import (
    ShaperSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
)

shaper_skill = ShaperSkill(
    name="shaper-skill",
    description="Shape enriched data for knowledge store projections",
    context="/document",
    inputs=[
        InputFieldMappingEntry(name="fileName", source="/document/metadata_storage_name"),
        InputFieldMappingEntry(name="content", source="/document/content"),
        InputFieldMappingEntry(name="keyphrases", source="/document/keyphrases"),
        InputFieldMappingEntry(name="organizations", source="/document/organizations"),
        InputFieldMappingEntry(name="language", source="/document/language"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="output", target_name="documentShape")
    ],
)

Task 2: Define the knowledge store with table projections

from azure.search.documents.indexes.models import (
    SearchIndexerSkillset,
    SearchIndexerKnowledgeStore,
    SearchIndexerKnowledgeStoreProjection,
    SearchIndexerKnowledgeStoreTableProjectionSelector,
    SearchIndexerKnowledgeStoreObjectProjectionSelector,
    SearchIndexerKnowledgeStoreFileProjectionSelector,
    CognitiveServicesAccountKey,
)

# Define knowledge store with table + object projections
knowledge_store = SearchIndexerKnowledgeStore(
    storage_connection_string=STORAGE_CONN,
    projections=[
        SearchIndexerKnowledgeStoreProjection(
            tables=[
                SearchIndexerKnowledgeStoreTableProjectionSelector(
                    table_name="documentsTable",
                    generated_key_name="documentId",
                    source="/document/documentShape",
                ),
                SearchIndexerKnowledgeStoreTableProjectionSelector(
                    table_name="keyphrasesTable",
                    generated_key_name="keyphraseId",
                    source="/document/documentShape/keyphrases/*",
                ),
            ],
            objects=[
                SearchIndexerKnowledgeStoreObjectProjectionSelector(
                    storage_container="knowledge-objects",
                    generated_key_name="objectId",
                    source="/document/documentShape",
                )
            ],
            files=[],
        )
    ],
)

# Update the skillset with knowledge store
skillset = SearchIndexerSkillset(
    name="document-skillset",
    description="Enrichment with knowledge store projections",
    skills=[key_phrase_skill, entity_skill, language_skill, shaper_skill],
    knowledge_store=knowledge_store,
    cognitive_services_account=CognitiveServicesAccountKey(key=AI_KEY),
)

indexer_client.create_or_update_skillset(skillset)
print("Skillset updated with knowledge store projections")
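The two table projections above sit in the same projection group, which is what lets them be joined later. The following is a simplified local model of that relationship, not the service's actual implementation: the parent row gets a generated `documentId`, and each child row repeats that value next to its own `keyphraseId`. The function name, the UUID keys, and the `keyphrase` column name are assumptions made for illustration.

```python
import uuid
from typing import Any


def project_tables(shape: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Return (documentsTable rows, keyphrasesTable rows) for one shaped document."""
    document_id = uuid.uuid4().hex  # stand-in for the service-generated key
    # Only scalar values become columns in the parent table in this model.
    scalars = {k: v for k, v in shape.items() if not isinstance(v, (list, dict))}
    documents = [{"documentId": document_id, **scalars}]
    keyphrases = [
        # Each child row carries the parent's key, which acts as the join column.
        {"keyphraseId": uuid.uuid4().hex, "documentId": document_id, "keyphrase": phrase}
        for phrase in shape.get("keyphrases", [])
    ]
    return documents, keyphrases


documents_rows, keyphrase_rows = project_tables(
    {
        "fileName": "sample-doc.txt",
        "language": "en",
        "keyphrases": ["cloud-based AI capabilities", "cognitive services"],
    }
)
for row in keyphrase_rows:
    print(row["documentId"] == documents_rows[0]["documentId"], row["keyphrase"])
```

In Power BI, the shared `documentId` column is what a relationship between the two tables would be built on.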

Task 3: Add file projections for images

# File projections store binary content (normalized images from the OCR pipeline).
# First, configure the indexer to extract and normalize images.

from azure.search.documents.indexes.models import (
    SearchIndexer,
    IndexingParameters,
    IndexingParametersConfiguration,
)

# Update indexer to extract images
indexer = SearchIndexer(
    name="document-indexer",
    data_source_name="blob-datasource",
    target_index_name="documents-index",
    skillset_name="document-skillset",
    parameters=IndexingParameters(
        configuration=IndexingParametersConfiguration(
            image_action="generateNormalizedImages",
            # Some SDK versions send a default queryTimeout that blob data
            # sources reject; clearing it avoids that error.
            query_timeout=None,
        )
    ),
)
indexer_client.create_or_update_indexer(indexer)

# For file projections, add a projection group to the knowledge store:
file_projection = SearchIndexerKnowledgeStoreProjection(
    tables=[],
    objects=[],
    files=[
        SearchIndexerKnowledgeStoreFileProjectionSelector(
            storage_container="knowledge-images",
            generated_key_name="imageId",
            source="/document/normalized_images/*",
        )
    ],
)

# Attach the new group to the knowledge store from Task 2 and push the
# updated skillset; otherwise the file projection is never applied.
knowledge_store.projections.append(file_projection)
indexer_client.create_or_update_skillset(skillset)
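The Break & fix table later in this challenge refers to properties by their REST names (`storageConnectionString`, `generatedKeyName`), while the SDK code uses snake_case arguments. As a rough guide to how the two line up, this sketch builds the equivalent `knowledgeStore` JSON body as a plain dictionary. The connection string is a placeholder and the layout of two projection groups mirrors Tasks 2 and 3; treat it as an illustration rather than output captured from the service.

```python
import json

# REST-style (camelCase) view of the knowledge store defined in Tasks 2 and 3.
knowledge_store_body = {
    "storageConnectionString": "<storage-connection-string>",  # placeholder
    "projections": [
        {
            # Group 1: related tables plus the JSON object projection.
            "tables": [
                {
                    "tableName": "documentsTable",
                    "generatedKeyName": "documentId",
                    "source": "/document/documentShape",
                },
                {
                    "tableName": "keyphrasesTable",
                    "generatedKeyName": "keyphraseId",
                    "source": "/document/documentShape/keyphrases/*",
                },
            ],
            "objects": [
                {
                    "storageContainer": "knowledge-objects",
                    "generatedKeyName": "objectId",
                    "source": "/document/documentShape",
                }
            ],
            "files": [],
        },
        {
            # Group 2: normalized images written as binary blobs.
            "tables": [],
            "objects": [],
            "files": [
                {
                    "storageContainer": "knowledge-images",
                    "generatedKeyName": "imageId",
                    "source": "/document/normalized_images/*",
                }
            ],
        },
    ],
}

print(json.dumps({"knowledgeStore": knowledge_store_body}, indent=2))
```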

Task 4: Query the Knowledge Store

After running the indexer, verify projected data in Azure Storage:

# List tables in storage account
az storage table list --account-name "$STORAGE_ACCOUNT" --query "[].name" -o tsv

# Query the documents table (first 10 entities, two columns)
az storage entity query \
    --table-name "documentsTable" \
    --account-name "$STORAGE_ACCOUNT" \
    --select fileName language \
    --num-results 10

# List blobs in knowledge-objects container
az storage blob list \
    --container-name "knowledge-objects" \
    --account-name "$STORAGE_ACCOUNT" \
    --query "[].name" -o tsv

# Download and inspect a projected JSON object
az storage blob download \
    --container-name "knowledge-objects" \
    --account-name "$STORAGE_ACCOUNT" \
    --name "<blob-name>" \
    --file projected-doc.json

python -m json.tool projected-doc.json
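Beyond pretty-printing, it can help to confirm that the downloaded object contains every field the Shaper skill was asked to produce. The sketch below uses only the standard library and assumes the blob is a flat JSON object keyed by the Shaper input names from Task 1; the `missing_fields` helper is hypothetical.

```python
import json
from pathlib import Path

# Input names given to the Shaper skill in Task 1.
EXPECTED_FIELDS = ("fileName", "content", "keyphrases", "organizations", "language")


def missing_fields(path: str) -> list[str]:
    """Return the expected field names that are absent from a projected JSON blob."""
    document = json.loads(Path(path).read_text(encoding="utf-8"))
    return [name for name in EXPECTED_FIELDS if name not in document]


# Only runs the check if the file from the download step is present.
if Path("projected-doc.json").exists():
    print("Missing fields:", missing_fields("projected-doc.json"))
```

An empty list suggests the shape reached the object projection intact; a missing name usually points back at the corresponding skill output or Shaper input path.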

Expected Output

Table projection (documentsTable):

documentId | fileName | language | content (truncated)
abc123 | sample-doc.txt | en | Azure AI services provide...
def456 | report.pdf | en | Quarterly report showing...

Object projection (JSON blob):

{
    "fileName": "sample-doc.txt",
    "content": "Azure AI services provide cloud-based AI capabilities...",
    "keyphrases": ["cloud-based AI capabilities", "cognitive services"],
    "organizations": ["Microsoft"],
    "language": "en"
}

Break & fix

# | Scenario | Symptom | Root Cause | Fix
1 | Knowledge store tables not created | No tables appear in storage after indexer runs | Storage connection string in skillset is incorrect or missing | Verify storageConnectionString in the knowledge store definition
2 | Projection source path invalid | Indexer warning: "Could not project to knowledge store" | Source path doesn't match any enrichment node (e.g., missing Shaper skill) | Add a Shaper skill or correct the source path to match an existing enrichment output
3 | Related tables not linked | Table projections exist but documentsTable can't be joined to keyphrasesTable in Power BI | Tables defined in different projection groups don't share keys | Put related tables in the same projection group; the child table then carries the parent's generatedKeyName column, which links the two
4 | Images not projected | File projections container is empty | Indexer not configured with imageAction: generateNormalizedImages | Set indexer parameter configuration.imageAction to generateNormalizedImages
5 | Storage container doesn't exist | Error: "Container not found" | The knowledge store creates containers for object/file projections automatically, so the error usually points to a wrong connection string or insufficient permissions | Check the storage connection string and permissions instead of creating the container by hand
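Scenario 2 can often be caught before the indexer runs by comparing projection source paths with the enrichment nodes the skillset is expected to produce. The helper below is a hypothetical client-side check, not part of the SDK: it only compares path prefixes, and the set of known nodes is an assumption you would maintain by hand from your skill outputs.

```python
def unresolved_sources(sources: list[str], known_nodes: set[str]) -> list[str]:
    """Return source paths that neither equal nor sit beneath a known enrichment node."""
    unresolved = []
    for source in sources:
        # A trailing '/*' means "each item of this collection"; compare the collection itself.
        base = source[:-2] if source.endswith("/*") else source
        if not any(base == node or base.startswith(node + "/") for node in known_nodes):
            unresolved.append(source)
    return unresolved


# Source paths used by the projections in Tasks 2 and 3.
projection_sources = [
    "/document/documentShape",
    "/document/documentShape/keyphrases/*",
    "/document/normalized_images/*",
]

# With the Shaper skill present, every path resolves.
print(unresolved_sources(projection_sources, {"/document/documentShape", "/document/normalized_images"}))

# Without the Shaper skill, the documentShape paths are flagged.
print(unresolved_sources(projection_sources, {"/document/normalized_images"}))
```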

Knowledge Check

1. You want to analyze enriched documents in Power BI with rows and columns. Which knowledge store projection type should you use?

2. Two table projections (documentsTable and keyphrasesTable) need a foreign key relationship. How do you ensure they can be joined?

3. What is the purpose of the Shaper skill in a knowledge store pipeline?

4. What indexer parameter must be set for file projections to store images extracted from documents?

5. Where does an object projection physically store its data?

Cleanup

az group delete --name rg-ai102-search --yes --no-wait

Learn More