Challenge 18: Azure AI Language and Speech Services
25-30 min | Cost: Free | Domain: Natural Language Processing (15-20%)
Exam skills covered
- Identify Azure AI Language service capabilities
- Identify Azure AI Speech service capabilities
- Describe features and uses for conversational language understanding (CLU)
- Describe features and uses for question answering
Overview
Azure provides two primary services for natural language processing: Azure AI Language for text-based NLP and Azure AI Speech for audio-based processing. Understanding which service handles which capability — and when to combine them — is essential for the AI-900 exam.
Azure AI Language is the text analytics powerhouse. Its pre-built capabilities (sentiment analysis, entity recognition, key phrase extraction, language detection, PII detection, and text summarization) work without any training. Its custom capabilities, which you train on your own data, include Conversational Language Understanding (CLU) for intent recognition, custom question answering for FAQ-style bots, custom text classification, and custom named entity recognition. Think of it as "everything you can do with written text."
Azure AI Speech handles the spoken word. Beyond basic speech-to-text and text-to-speech, it provides speech translation, speaker recognition (identifying who is speaking), keyword recognition (wake words like "Hey Cortana"), and pronunciation assessment. Think of it as "everything you can do with audio/voice."
Explore
Task 1: Map Azure AI Language capabilities
Azure AI Language provides both pre-built (ready-to-use) and custom (trainable) capabilities:
Pre-built capabilities (no training required):
| Capability | What it does |
|---|---|
| Sentiment analysis | Determines positive/negative/neutral/mixed sentiment |
| Named entity recognition | Identifies people, places, organizations, dates |
| Key phrase extraction | Extracts main talking points from text |
| Language detection | Identifies which language text is written in |
| PII detection | Finds personally identifiable information (SSNs, emails, phone numbers) |
| Text summarization | Generates concise summaries of documents |
| Entity linking | Connects entities to Wikipedia knowledge base entries |
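Because the pre-built capabilities need no training, calling one mostly comes down to sending text with the right task name. The sketch below only builds the request body and sends nothing. The `kind` strings and the body layout reflect the Language REST API as commonly documented, but treat them as assumptions and confirm them against the current API reference. `build_analyze_text_body` and `PREBUILT_KINDS` are hypothetical names for this example. Summarization is left out because it is typically submitted as a long-running job with a different request shape.

```python
import json

# Task "kind" strings as commonly documented for the Language REST API.
# Assumption: verify these against the API version you target.
PREBUILT_KINDS = {
    "sentiment": "SentimentAnalysis",
    "entities": "EntityRecognition",
    "key_phrases": "KeyPhraseExtraction",
    "language": "LanguageDetection",
    "pii": "PiiEntityRecognition",
    "entity_linking": "EntityLinking",
}


def build_analyze_text_body(capability: str, texts: list[str], language: str = "en") -> dict:
    """Build a JSON-serializable request body for one pre-built capability."""
    kind = PREBUILT_KINDS[capability]  # raises KeyError for unknown names
    documents = []
    for i, text in enumerate(texts, start=1):
        doc = {"id": str(i), "text": text}
        # Language detection is not told the language up front.
        if capability != "language":
            doc["language"] = language
        documents.append(doc)
    return {"kind": kind, "analysisInput": {"documents": documents}}


if __name__ == "__main__":
    body = build_analyze_text_body("sentiment", ["The room was great, but the food was cold."])
    print(json.dumps(body, indent=2))
```

Switching from sentiment to PII detection changes only the `capability` argument, which mirrors the point of the table: one resource, many pre-built tasks.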
Custom capabilities (require training data):
| Capability | What it does |
|---|---|
| Conversational Language Understanding (CLU) | Recognizes user intents and extracts entities from natural language |
| Custom question answering | Builds FAQ-style knowledge bases for Q&A bots |
| Custom text classification | Classifies text into your own categories |
| Custom named entity recognition | Extracts domain-specific entities you define |
Task 2: Understand Conversational Language Understanding (CLU)
CLU, the successor to Language Understanding (LUIS), helps you build applications that understand natural language commands.
Key concepts:
- Utterance — What the user says: "Book a flight to Paris next Friday"
- Intent — What the user wants to do: "BookFlight"
- Entity — Important details: "Paris" (destination), "next Friday" (date)
Example project:
| Utterance | Intent | Entities |
|---|---|---|
| "Turn on the living room lights" | TurnOn | Device: lights, Room: living room |
| "Set temperature to 72 degrees" | SetTemperature | Temperature: 72 |
| "What's the weather in Seattle?" | GetWeather | Location: Seattle |
| "Play some jazz music" | PlayMusic | Genre: jazz |
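To make the utterance/intent/entity split concrete, here is a minimal sketch of how an application might read a CLU-style prediction. The `SAMPLE` dictionary is hand-written and its scores are invented. Its field names (`topIntent`-style `intents`, `confidenceScore`, `category`) follow the prediction response as commonly documented, so treat the shape as an assumption. `parse_prediction` and the fallback to a `None` intent below a threshold are illustrative choices, not part of the service.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    category: str
    text: str


@dataclass
class Prediction:
    utterance: str
    top_intent: str
    confidence: float
    entities: list[Entity]


def parse_prediction(response: dict, min_confidence: float = 0.5) -> Prediction:
    """Pull the top intent and entities out of a CLU-style response dict.

    Falls back to the intent "None" when the best score is below min_confidence.
    """
    result = response["result"]
    prediction = result["prediction"]
    best = max(prediction["intents"], key=lambda i: i["confidenceScore"])
    intent = best["category"] if best["confidenceScore"] >= min_confidence else "None"
    entities = [Entity(e["category"], e["text"]) for e in prediction["entities"]]
    return Prediction(result["query"], intent, best["confidenceScore"], entities)


# Hand-written sample shaped like a prediction response (scores are made up).
SAMPLE = {
    "result": {
        "query": "Turn on the living room lights",
        "prediction": {
            "intents": [
                {"category": "TurnOn", "confidenceScore": 0.92},
                {"category": "SetTemperature", "confidenceScore": 0.05},
            ],
            "entities": [
                {"category": "Room", "text": "living room"},
                {"category": "Device", "text": "lights"},
            ],
        },
    }
}

if __name__ == "__main__":
    p = parse_prediction(SAMPLE)
    print(p.top_intent, [(e.category, e.text) for e in p.entities])
```

The application code then branches on `top_intent` and reads the entities as parameters, for example switching on the device named `lights` in the room named `living room`.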
Training workflow:
1. Define intents (what users want to do)
2. Define entities (important information to extract)
3. Add example utterances (labeled with intents and entities)
4. Train and test the model
5. Deploy and integrate with your application
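The labeling step in the workflow above is where most of the effort goes, so it helps to see what a labeled utterance might look like. The sketch below stores each entity as a character span and runs two basic sanity checks before training. The dictionary layout, `entity_texts`, and `check_dataset` are hypothetical and illustrative only; this is not the service's import format.

```python
from collections import Counter

# Each example: utterance text, intent label, and entity labels as
# (category, offset, length) character spans. Illustrative layout only.
LABELED = [
    {"text": "Turn on the living room lights", "intent": "TurnOn",
     "entities": [("Room", 12, 11), ("Device", 24, 6)]},
    {"text": "Set temperature to 72 degrees", "intent": "SetTemperature",
     "entities": [("Temperature", 19, 2)]},
    {"text": "Play some jazz music", "intent": "PlayMusic",
     "entities": [("Genre", 10, 4)]},
]


def entity_texts(example: dict) -> dict[str, str]:
    """Resolve each labeled span back to the substring it covers."""
    text = example["text"]
    return {cat: text[off:off + length] for cat, off, length in example["entities"]}


def check_dataset(examples: list[dict], min_per_intent: int = 1) -> list[str]:
    """Return a list of problems: under-represented intents or bad spans."""
    problems = []
    counts = Counter(ex["intent"] for ex in examples)
    for intent, n in counts.items():
        if n < min_per_intent:
            problems.append(f"intent {intent!r} has only {n} example(s)")
    for ex in examples:
        for cat, off, length in ex["entities"]:
            if off < 0 or length <= 0 or off + length > len(ex["text"]):
                problems.append(f"span for {cat!r} is out of range in {ex['text']!r}")
    return problems


if __name__ == "__main__":
    for ex in LABELED:
        print(ex["intent"], entity_texts(ex))
    print(check_dataset(LABELED, min_per_intent=2))
```

Raising `min_per_intent` flags every intent here, which reflects a practical point: a model trained on a single example per intent will generalize poorly, so each intent needs several varied utterances.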
Task 3: Explore custom question answering
Custom question answering, the successor to QnA Maker, builds a knowledge base of question-answer pairs from existing content.
Sources it can import:
- FAQ web pages
- PDF documents
- Word documents
- Manual question-answer pairs
How it works:
1. Import content (FAQ pages, documents)
2. The service extracts question-answer pairs automatically
3. Add custom Q&A pairs and alternative phrasings
4. Test and refine responses
5. Deploy as a REST endpoint for chatbots
Example knowledge base:
| Question | Answer |
|---|---|
| What are your business hours? | We're open Monday-Friday, 9 AM to 5 PM EST. |
| How do I reset my password? | Go to the login page, click "Forgot Password," and follow the email instructions. |
| Do you offer free shipping? | Free shipping is available on orders over $50. |
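The role of alternative phrasings is easier to see with a toy matcher. The sketch below loads the three pairs from the example knowledge base, adds one hypothetical alternative phrasing to each, and picks an answer by word overlap (Jaccard similarity). This scoring is a deliberately naive stand-in so the example runs offline; the actual service uses its own ranking, and `best_answer`, `tokens`, and the 0.3 threshold are assumptions made for illustration.

```python
import re

# Questions and answers from the example table; the second phrasing in
# each list is a hypothetical alternative added for illustration.
KNOWLEDGE_BASE = [
    {"questions": ["What are your business hours?", "When are you open?"],
     "answer": "We're open Monday-Friday, 9 AM to 5 PM EST."},
    {"questions": ["How do I reset my password?", "I forgot my password"],
     "answer": 'Go to the login page, click "Forgot Password," and follow the email instructions.'},
    {"questions": ["Do you offer free shipping?", "Is shipping free?"],
     "answer": "Free shipping is available on orders over $50."},
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_answer(query: str, threshold: float = 0.3) -> str:
    """Return the answer whose stored phrasing overlaps most with the query.

    The score is Jaccard similarity over word sets, a naive stand-in for
    the service's ranking.
    """
    q = tokens(query)
    best_score, best = 0.0, None
    for pair in KNOWLEDGE_BASE:
        for phrasing in pair["questions"]:
            p = tokens(phrasing)
            score = len(q & p) / len(q | p) if q | p else 0.0
            if score > best_score:
                best_score, best = score, pair["answer"]
    return best if best is not None and best_score >= threshold else "No answer found."


if __name__ == "__main__":
    print(best_answer("When are you open?"))
    print(best_answer("Tell me a joke"))
```

Without the added phrasing "When are you open?", that query shares too few words with "What are your business hours?" to clear the threshold, which is why adding alternative phrasings is a distinct step in the workflow.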
Task 4: Navigate Language Studio vs Speech Studio
Compare the two studios side by side:
Azure AI Language Studio (language.cognitive.azure.com):
- Classify text (sentiment, custom classification)
- Extract information (entities, key phrases, PII, summarization)
- Understand questions and conversational language (CLU, Q&A)
Azure AI Speech Studio (speech.microsoft.com):
- Speech-to-text (real-time and batch)
- Text-to-speech (voice gallery, custom voices)
- Speech translation
- Speaker recognition
- Pronunciation assessment
- Custom keyword recognition
Decision guide — Which service do I need?
| I want to... | Use |
|---|---|
| Analyze text for sentiment | Azure AI Language |
| Transcribe audio recordings | Azure AI Speech |
| Build a chatbot that answers FAQs | Azure AI Language (Question Answering) |
| Create a voice assistant | Azure AI Speech + Azure AI Language |
| Detect PII in documents | Azure AI Language |
| Add a wake word ("Hey Assistant") | Azure AI Speech (Keyword Recognition) |
| Understand user commands in a smart home app | Azure AI Language (CLU) |
| Identify who is speaking in a recording | Azure AI Speech (Speaker Recognition) |
# Show the name, kind, SKU, and endpoint of your Language resource
az cognitiveservices account show \
--name my-language-resource \
--resource-group myResourceGroup \
--query "{name:name, kind:kind, sku:sku.name, endpoint:properties.endpoint}"
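The `kind` field in that command's output tells you which service a resource belongs to, which is a quick way to confirm you are pointing at a Language resource rather than a Speech one. The sketch below parses a sample of the JSON the query would print. `SAMPLE_OUTPUT` is hypothetical, with placeholder SKU and endpoint values, and the `kind` strings in `KIND_TO_SERVICE` are the values commonly reported by Azure, so treat them as assumptions to verify.

```python
import json

# Resource "kind" values as commonly reported by Azure (assumptions to verify).
KIND_TO_SERVICE = {
    "TextAnalytics": "Azure AI Language",
    "SpeechServices": "Azure AI Speech",
    "CognitiveServices": "Azure AI services (multi-service)",
}


def describe_resource(cli_json: str) -> str:
    """Summarize the JSON printed by the account show query."""
    info = json.loads(cli_json)
    service = KIND_TO_SERVICE.get(info["kind"], f"unknown kind {info['kind']!r}")
    return f"{info['name']}: {service}, SKU {info['sku']}, endpoint {info['endpoint']}"


# Hypothetical output; the SKU and endpoint are placeholders.
SAMPLE_OUTPUT = """
{
  "name": "my-language-resource",
  "kind": "TextAnalytics",
  "sku": "F0",
  "endpoint": "https://my-language-resource.cognitiveservices.azure.com/"
}
"""

if __name__ == "__main__":
    print(describe_resource(SAMPLE_OUTPUT))
```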
Key Concepts
| Concept | Definition |
|---|---|
| Azure AI Language | Service for text-based NLP: sentiment, NER, CLU, Q&A, summarization, PII detection |
| Azure AI Speech | Service for audio-based processing: STT, TTS, speech translation, speaker recognition |
| Conversational Language Understanding (CLU) | Custom model that recognizes intents and entities in natural language input |
| Intent | What the user wants to accomplish (e.g., BookFlight, GetWeather) |
| Custom question answering | Knowledge base service for building FAQ-style Q&A experiences |
| Speaker recognition | Identifying or verifying a person's identity based on their voice |
Common Misconceptions
| Misconception | Reality |
|---|---|
| Azure AI Language and Azure AI Speech are the same service | They are separate services — Language handles text, Speech handles audio |
| CLU replaces all NLP capabilities | CLU is specifically for understanding intents and entities in conversational input; other capabilities (sentiment, NER) remain separate |
| Question answering requires programming a chatbot from scratch | You can import existing FAQ content and the service automatically creates Q&A pairs |
| Speaker recognition identifies what someone says | Speaker recognition identifies WHO is speaking, not what they say — that's speech-to-text |
| You need separate Azure resources for each NLP capability | A single Azure AI Language resource provides access to all Language capabilities (sentiment, NER, CLU, etc.) |
Knowledge Check
1. A company wants to build a smart home app that understands commands like "turn off the kitchen lights" and "set the thermostat to 70 degrees." Which capability should they use?
2. A company has a 50-page FAQ document and wants to create a chatbot that answers customer questions from it. Which Azure AI capability should they use?
3. Which capability is part of Azure AI Speech (NOT Azure AI Language)?
4. In Conversational Language Understanding, what is an "intent"?
5. A company wants to automatically detect and redact Social Security numbers and email addresses from customer documents. Which Azure AI Language capability should they use?