Challenge 18: Azure AI Language and Speech Services

Estimated Time

25-30 min | Cost: Free | Domain: Natural Language Processing (15-20%)

Exam skills covered

  • Identify Azure AI Language service capabilities
  • Identify Azure AI Speech service capabilities
  • Describe features and uses for conversational language understanding (CLU)
  • Describe features and uses for question answering

Overview

Azure provides two primary services for natural language processing: Azure AI Language for text-based NLP and Azure AI Speech for audio-based processing. Understanding which service handles which capability — and when to combine them — is essential for the AI-900 exam.

Azure AI Language is the text analytics powerhouse. Its pre-built capabilities (sentiment analysis, named entity recognition, key phrase extraction, language detection, PII detection, and text summarization) work without any training. Its custom capabilities include Conversational Language Understanding (CLU) for building intent-recognition models, custom question answering for FAQ-style bots, custom text classification, and custom named entity recognition. Think of it as "everything you can do with written text."

Azure AI Speech handles the spoken word. Beyond basic speech-to-text and text-to-speech, it provides speech translation, speaker recognition (identifying who is speaking), keyword recognition (wake words like "Hey Cortana"), and pronunciation assessment. Think of it as "everything you can do with audio/voice."

Explore

Task 1: Map Azure AI Language capabilities

Azure AI Language provides both pre-built (ready-to-use) and custom (trainable) capabilities:

Pre-built capabilities (no training required):

Capability | What it does
Sentiment analysis | Determines positive/negative/neutral/mixed sentiment
Named entity recognition | Identifies people, places, organizations, dates
Key phrase extraction | Extracts main talking points from text
Language detection | Identifies which language text is written in
PII detection | Finds personally identifiable information (SSNs, emails, phone numbers)
Text summarization | Generates concise summaries of documents
Entity linking | Connects entities to Wikipedia knowledge base entries
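
To make "no training required" concrete, here is a minimal sketch of the JSON body a client might send for one of these pre-built capabilities. The `kind` strings and body layout are assumptions based on the Language REST `analyze-text` operation and should be checked against the current reference; the names `PREBUILT_KINDS` and `build_analyze_text_request` are hypothetical helpers, and nothing is sent over the network. Language detection and summarization are left out because their request shapes differ.

```python
import json

# Assumed mapping from a capability in the table above to the task "kind"
# string of the analyze-text REST operation. Verify before relying on it.
PREBUILT_KINDS = {
    "sentiment": "SentimentAnalysis",
    "entities": "EntityRecognition",
    "key_phrases": "KeyPhraseExtraction",
    "pii": "PiiEntityRecognition",
    "entity_linking": "EntityLinking",
}


def build_analyze_text_request(capability, texts, language="en"):
    """Return the request body (a dict) for one pre-built capability."""
    if capability not in PREBUILT_KINDS:
        raise ValueError(f"Unsupported capability: {capability}")
    # Each document needs its own id so results can be matched back to inputs.
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    return {
        "kind": PREBUILT_KINDS[capability],
        "analysisInput": {"documents": documents},
    }


if __name__ == "__main__":
    body = build_analyze_text_request("pii", ["Email me at jane@example.com"])
    print(json.dumps(body, indent=2))
```

Only the `kind` value changes between capabilities, which is why one Language resource can serve all of them.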

Custom capabilities (require training data):

Capability | What it does
Conversational Language Understanding (CLU) | Recognizes user intents and extracts entities from natural language
Custom question answering | Builds FAQ-style knowledge bases for Q&A bots
Custom text classification | Classifies text into your own categories
Custom named entity recognition | Extracts domain-specific entities you define

Task 2: Understand Conversational Language Understanding (CLU)

CLU, the successor to Language Understanding (LUIS), helps you build applications that understand natural language commands.

Key concepts:

  • Utterance — What the user says: "Book a flight to Paris next Friday"
  • Intent — What the user wants to do: "BookFlight"
  • Entity — Important details: "Paris" (destination), "next Friday" (date)

Example project:

Utterance | Intent | Entities
"Turn on the living room lights" | TurnOn | Device: lights, Room: living room
"Set temperature to 72 degrees" | SetTemperature | Temperature: 72
"What's the weather in Seattle?" | GetWeather | Location: Seattle
"Play some jazz music" | PlayMusic | Genre: jazz

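The example project above can be pictured as plain data. The sketch below is one possible way to model labeled utterances, storing each entity as a character offset and length within the utterance text. The class and function names (`EntityLabel`, `LabeledUtterance`, `label`) are hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityLabel:
    category: str  # e.g. "Room"
    offset: int    # index of the entity's first character
    length: int    # number of characters in the entity


@dataclass
class LabeledUtterance:
    text: str
    intent: str
    entities: list = field(default_factory=list)


def label(text, intent, **entity_values):
    """Build a LabeledUtterance, locating each entity value in the text."""
    entities = []
    for category, value in entity_values.items():
        offset = text.lower().find(value.lower())
        if offset < 0:
            raise ValueError(f"{value!r} does not occur in {text!r}")
        entities.append(EntityLabel(category, offset, len(value)))
    return LabeledUtterance(text, intent, entities)


examples = [
    label("Turn on the living room lights", "TurnOn",
          Device="lights", Room="living room"),
    label("Set temperature to 72 degrees", "SetTemperature", Temperature="72"),
    label("What's the weather in Seattle?", "GetWeather", Location="Seattle"),
    label("Play some jazz music", "PlayMusic", Genre="jazz"),
]

for example in examples:
    # Slice the text with each label to show what the span covers.
    spans = {e.category: example.text[e.offset:e.offset + e.length]
             for e in example.entities}
    print(example.intent, spans)
```
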
Training workflow:

  1. Define intents (what users want to do)
  2. Define entities (important information to extract)
  3. Add example utterances (labeled with intents and entities)
  4. Train and test the model
  5. Deploy and integrate with your application
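
Step 5 is where your application starts sending utterances to the deployed model. The sketch below shows roughly what that exchange could look like, assuming the request and response shapes of the Language REST `analyze-conversations` operation. The project name `SmartHome`, the deployment name `production`, and both helper functions are hypothetical, and the sample response is hand-written for illustration rather than captured from the service.

```python
def build_clu_request(utterance, project_name, deployment_name):
    """Return the body for a prediction request against a deployed CLU model."""
    return {
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "id": "1",
                "participantId": "user",
                "text": utterance,
            }
        },
        "parameters": {
            "projectName": project_name,
            "deploymentName": deployment_name,
        },
    }


def summarize_prediction(response):
    """Pull the top intent and the entity texts out of a prediction response."""
    prediction = response["result"]["prediction"]
    entities = {e["category"]: e["text"] for e in prediction.get("entities", [])}
    return prediction["topIntent"], entities


# Hand-written sample in the assumed response shape, not real service output.
sample_response = {
    "kind": "ConversationResult",
    "result": {
        "query": "Turn off the kitchen lights",
        "prediction": {
            "topIntent": "TurnOff",
            "intents": [{"category": "TurnOff", "confidenceScore": 0.97}],
            "entities": [
                {"category": "Room", "text": "kitchen", "offset": 13, "length": 7},
                {"category": "Device", "text": "lights", "offset": 21, "length": 6},
            ],
        },
    },
}

if __name__ == "__main__":
    print(build_clu_request("Turn off the kitchen lights", "SmartHome", "production"))
    print(summarize_prediction(sample_response))
```

The application then branches on the top intent and uses the entities as parameters for the action.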

Task 3: Explore custom question answering

Custom question answering, the successor to QnA Maker, builds a knowledge base from content you already have.

Sources it can import:

  • FAQ web pages
  • PDF documents
  • Word documents
  • Manual question-answer pairs

How it works:

  1. Import content (FAQ pages, documents)
  2. The service extracts question-answer pairs automatically
  3. Add custom Q&A pairs and alternative phrasings
  4. Test and refine responses
  5. Deploy as a REST endpoint for chatbots
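
Once deployed (step 5), a chatbot queries the knowledge base over REST. The following is a minimal sketch of building such a query and picking an answer from the response, assuming the `query-knowledgebases` operation, its `2021-10-01` API version, and an `answers` list with `confidenceScore` fields. The endpoint, project name, and helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_qna_query(endpoint, project_name, deployment_name, question, top=1):
    """Return (url, body) for a question answering query. Nothing is sent."""
    query = urlencode({
        "projectName": project_name,
        "deploymentName": deployment_name,
        "api-version": "2021-10-01",  # assumed version; check the reference
    })
    url = f"{endpoint.rstrip('/')}/language/:query-knowledgebases?{query}"
    body = {"question": question, "top": top}
    return url, body


def best_answer(response, minimum_score=0.5):
    """Return the highest-scoring answer text, or None if none is confident enough."""
    answers = [a for a in response.get("answers", [])
               if a.get("confidenceScore", 0) >= minimum_score]
    if not answers:
        return None
    return max(answers, key=lambda a: a["confidenceScore"])["answer"]


if __name__ == "__main__":
    url, body = build_qna_query(
        "https://my-language-resource.cognitiveservices.azure.com",
        "support-faq", "production", "How do I reset my password?")
    print(url)
    print(body)
```

Filtering on a minimum score lets the bot fall back to a default reply instead of returning a weak match.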

Example knowledge base:

Question | Answer
What are your business hours? | We're open Monday-Friday, 9 AM to 5 PM EST.
How do I reset my password? | Go to the login page, click "Forgot Password," and follow the email instructions.
Do you offer free shipping? | Free shipping is available on orders over $50.

Task 4: Navigate Language Studio vs Speech Studio

Compare the two studios side by side:

Azure AI Language Studio (language.cognitive.azure.com):

  • Classify text (sentiment, custom classification)
  • Extract information (entities, key phrases, PII, summarization)
  • Understand questions and conversational language (CLU, Q&A)

Azure AI Speech Studio (speech.microsoft.com):

  • Speech-to-text (real-time and batch)
  • Text-to-speech (voice gallery, custom voices)
  • Speech translation
  • Speaker recognition
  • Pronunciation assessment
  • Custom keyword recognition

Decision guide — Which service do I need?

I want to... | Use
Analyze text for sentiment | Azure AI Language
Transcribe audio recordings | Azure AI Speech
Build a chatbot that answers FAQs | Azure AI Language (Question Answering)
Create a voice assistant | Azure AI Speech + Azure AI Language
Detect PII in documents | Azure AI Language
Add a wake word ("Hey Assistant") | Azure AI Speech (Keyword Recognition)
Understand user commands in a smart home app | Azure AI Language (CLU)
Identify who is speaking in a recording | Azure AI Speech (Speaker Recognition)

Azure CLI Alternative

# Show the name, kind, SKU, and endpoint of your Language resource
az cognitiveservices account show \
  --name my-language-resource \
  --resource-group myResourceGroup \
  --query "{name:name, kind:kind, sku:sku.name, endpoint:properties.endpoint}"
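
The "Create a voice assistant" row in the decision guide is the one case that combines both services. A minimal sketch of that flow follows: Speech turns audio into text, Language (CLU) turns text into an intent, and Speech turns the reply back into audio. The function `handle_voice_command` and the three `fake_*` stand-ins are hypothetical; they exist only so the sketch runs without any Azure resource.

```python
def handle_voice_command(audio, transcribe, understand, synthesize):
    """Run one turn: audio in, spoken reply (as audio bytes) out."""
    text = transcribe(audio)             # Azure AI Speech: speech-to-text
    intent, entities = understand(text)  # Azure AI Language: CLU
    details = ", ".join(f"{k}={v}" for k, v in sorted(entities.items()))
    reply = f"Running {intent}" + (f" with {details}" if details else "")
    return synthesize(reply)             # Azure AI Speech: text-to-speech


# Stand-ins for the real services, for illustration only.
def fake_transcribe(audio):
    return audio.decode("utf-8")


def fake_understand(text):
    if "lights" in text.lower():
        return "TurnOn", {"Device": "lights"}
    return "None", {}


def fake_synthesize(text):
    return text.encode("utf-8")


if __name__ == "__main__":
    print(handle_voice_command(b"Turn on the lights",
                               fake_transcribe, fake_understand, fake_synthesize))
```

Passing the three steps in as functions keeps the service boundary visible: each stand-in marks a call that a real assistant would make to Speech or Language.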

Key Concepts

Concept | Definition
Azure AI Language | Service for text-based NLP: sentiment, NER, CLU, Q&A, summarization, PII detection
Azure AI Speech | Service for audio-based processing: STT, TTS, speech translation, speaker recognition
Conversational Language Understanding (CLU) | Custom model that recognizes intents and entities in natural language input
Intent | What the user wants to accomplish (e.g., BookFlight, GetWeather)
Custom question answering | Knowledge base service for building FAQ-style Q&A experiences
Speaker recognition | Identifying or verifying a person's identity based on their voice

Common Misconceptions

Misconception | Reality
Azure AI Language and Azure AI Speech are the same service | They are separate services: Language handles text, Speech handles audio
CLU replaces all NLP capabilities | CLU is specifically for understanding intents and entities in conversational input; other capabilities (sentiment, NER) remain separate
Question answering requires programming a chatbot from scratch | You can import existing FAQ content and the service automatically creates Q&A pairs
Speaker recognition identifies what someone says | Speaker recognition identifies WHO is speaking, not what they say; transcribing the words is speech-to-text
You need separate Azure resources for each NLP capability | A single Azure AI Language resource provides access to all Language capabilities (sentiment, NER, CLU, etc.)

Knowledge Check

1. A company wants to build a smart home app that understands commands like "turn off the kitchen lights" and "set the thermostat to 70 degrees." Which capability should they use?

2. A company has a 50-page FAQ document and wants to create a chatbot that answers customer questions from it. Which Azure AI capability should they use?

3. Which capability is part of Azure AI Speech (NOT Azure AI Language)?

4. In Conversational Language Understanding, what is an "intent"?

5. A company wants to automatically detect and redact Social Security numbers and email addresses from customer documents. Which Azure AI Language capability should they use?

Learn More