Challenge 31: Text Analytics - Key Phrases, Entities, Sentiment
Estimated Time
45 min | Cost: $1-3 (estimated) | Domain: Implement NLP Solutions (15-20%)
Exam skills covered
- Extract key phrases from text
- Recognize named entities and linked entities
- Determine sentiment with opinion mining
- Detect language
Overview
Azure AI Language (Text Analytics) provides NLP capabilities:
| Feature | Description |
|---|---|
| Sentiment Analysis | Positive, neutral, negative, or mixed labels with confidence scores, plus optional opinion mining |
| Key Phrase Extraction | Identify main talking points |
| Named Entity Recognition (NER) | Detect entities (Person, Location, Organization, DateTime, etc.) |
| Entity Linking | Disambiguate entities and link them to Wikipedia entries |
| Language Detection | Identify language of text |
The client supports batch operations: send multiple documents in one request for efficiency.
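As a rough illustration of batching, the sketch below splits a larger list of documents into request-sized groups before they are sent. The helper name `chunk_documents` is hypothetical, and the default `batch_size` of 10 is an assumption rather than a value from this challenge; per-request document limits differ by feature, so check the current service limits before relying on it.

```python
from typing import Iterator, List


def chunk_documents(documents: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of `documents`, each at most `batch_size` long.

    batch_size=10 is an assumed placeholder; use the limit documented
    for the feature you are calling.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Example: 23 documents become batches of 10, 10, and 3.
reviews = [f"Review number {i}" for i in range(23)]
batch_sizes = [len(batch) for batch in chunk_documents(reviews)]
```

Each batch could then be passed to a client call such as `client.analyze_sentiment(batch, language="en")`. Keep the batch offset if results need to be mapped back to positions in the original list.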
Prerequisites
- Azure subscription
- Azure AI Language resource (or multi-service)
- Python 3.9+ or .NET 8
- Package: azure-ai-textanalytics (v5.3+)
Implementation
Task 1: Create Language Resource
az group create --name rg-ai102-nlp --location eastus2
az cognitiveservices account create \
  --name language-ai102 \
  --resource-group rg-ai102-nlp \
  --kind TextAnalytics \
  --sku S \
  --location eastus2
ENDPOINT=$(az cognitiveservices account show --name language-ai102 --resource-group rg-ai102-nlp --query properties.endpoint -o tsv)
KEY=$(az cognitiveservices account keys list --name language-ai102 --resource-group rg-ai102-nlp --query key1 -o tsv)
# The SDK samples read these variable names from the environment
export AZURE_AI_ENDPOINT="$ENDPOINT"
export AZURE_AI_KEY="$KEY"
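The SDK samples in this challenge read `AZURE_AI_ENDPOINT` and `AZURE_AI_KEY` from the environment. A minimal sketch of failing early with a readable message when either is missing might look like this; `load_language_settings` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Tuple


def load_language_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (endpoint, key) or raise an error naming the missing variables."""
    required = ("AZURE_AI_ENDPOINT", "AZURE_AI_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so the endpoint joins cleanly with URL paths.
    return env["AZURE_AI_ENDPOINT"].rstrip("/"), env["AZURE_AI_KEY"]
```

Passing the mapping as a parameter keeps the check easy to exercise with a plain dictionary instead of the real environment.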
Task 2: Analyze Sentiment with Opinion Mining
- Python SDK
- C# SDK
- REST API
import os
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"])
)
documents = [
    "The hotel room was clean and spacious, but the service was slow and unfriendly.",
    "I absolutely love this product! Fast delivery and excellent quality.",
    "The meeting was scheduled for 3 PM."
]
# Sentiment analysis with opinion mining
results = client.analyze_sentiment(
    documents,
    show_opinion_mining=True,
    language="en"
)
for idx, result in enumerate(results):
    if result.is_error:
        print(f"Doc {idx}: Error - {result.error.message}")
        continue
    print(f"Document {idx}: '{documents[idx][:50]}...'")
    print(f" Overall: {result.sentiment} "
          f"(pos={result.confidence_scores.positive:.3f}, "
          f"neu={result.confidence_scores.neutral:.3f}, "
          f"neg={result.confidence_scores.negative:.3f})")
    for sentence in result.sentences:
        print(f" Sentence: '{sentence.text[:40]}...' → {sentence.sentiment}")
        # Opinion mining - aspect-based sentiment
        for mined_opinion in sentence.mined_opinions:
            target = mined_opinion.target
            print(f" Target: '{target.text}' ({target.sentiment})")
            for assessment in mined_opinion.assessments:
                print(f" Assessment: '{assessment.text}' ({assessment.sentiment})")
    print()
using Azure;
using Azure.AI.TextAnalytics;

var client = new TextAnalyticsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_AI_KEY")));
var documents = new List<string>
{
    "The hotel room was clean and spacious, but the service was slow.",
    "I absolutely love this product! Fast delivery and excellent quality."
};
var options = new AnalyzeSentimentOptions { IncludeOpinionMining = true };
// The call returns Response<AnalyzeSentimentResultCollection>; .Value is the enumerable collection
var results = client.AnalyzeSentimentBatch(documents, "en", options).Value;
foreach (var result in results)
{
    // Accessing DocumentSentiment on a failed document throws, so check first
    if (result.HasError)
    {
        Console.WriteLine($"Doc {result.Id}: Error - {result.Error.Message}");
        continue;
    }
    Console.WriteLine($"Sentiment: {result.DocumentSentiment.Sentiment}");
    foreach (var sentence in result.DocumentSentiment.Sentences)
    {
        Console.WriteLine($" '{sentence.Text}' -> {sentence.Sentiment}");
        foreach (var opinion in sentence.Opinions)
        {
            Console.WriteLine($" Target: '{opinion.Target.Text}' ({opinion.Target.Sentiment})");
            foreach (var assessment in opinion.Assessments)
                Console.WriteLine($" '{assessment.Text}' ({assessment.Sentiment})");
        }
    }
}
ENDPOINT="https://<resource>.cognitiveservices.azure.com"
KEY="<your-key>"
curl -s "${ENDPOINT}/language/:analyze-text?api-version=2023-04-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "SentimentAnalysis",
    "parameters": {"opinionMining": true},
    "analysisInput": {
      "documents": [
        {"id": "1", "language": "en", "text": "The hotel was great but the food was terrible."}
      ]
    }
  }' | jq '.results.documents[0]'
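The request body in the curl example follows a fixed shape: a `kind`, optional `parameters`, and a list of documents with string IDs. A small sketch of assembling that payload in Python is shown here; `build_analyze_text_body` is a hypothetical helper, and the sequential 1-based IDs are an illustrative choice rather than a service requirement.

```python
import json
from typing import Any, Dict, List, Optional


def build_analyze_text_body(
    kind: str,
    texts: List[str],
    language: str = "en",
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a payload in the same shape as the curl example's JSON body."""
    body: Dict[str, Any] = {
        "kind": kind,
        "analysisInput": {
            "documents": [
                # String IDs let each result be matched back to its input text.
                {"id": str(i), "language": language, "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
    }
    if parameters:
        body["parameters"] = parameters
    return body


payload = build_analyze_text_body(
    "SentimentAnalysis",
    ["The hotel was great but the food was terrible."],
    parameters={"opinionMining": True},
)
payload_json = json.dumps(payload)  # string to send as the request body
```

The same helper covers the other `kind` values used in this challenge, such as `KeyPhraseExtraction` and `EntityRecognition`, by leaving `parameters` unset.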
Task 3: Extract Key Phrases and Named Entities
- Python SDK
- REST API
documents = [
    "Microsoft CEO Satya Nadella announced Azure AI updates at the Build 2024 conference in Seattle on May 21.",
    "The quarterly revenue increased by 15% to $62 billion, driven by cloud services growth."
]
# Key phrase extraction
key_phrases_results = client.extract_key_phrases(documents, language="en")
print("=== KEY PHRASES ===")
for idx, result in enumerate(key_phrases_results):
    if not result.is_error:
        print(f"Doc {idx}: {result.key_phrases}")
# Named Entity Recognition
ner_results = client.recognize_entities(documents, language="en")
print("\n=== NAMED ENTITIES ===")
for idx, result in enumerate(ner_results):
    if not result.is_error:
        print(f"Doc {idx}:")
        for entity in result.entities:
            print(f" '{entity.text}' → {entity.category}"
                  f"{f'/{entity.subcategory}' if entity.subcategory else ''}"
                  f" (confidence: {entity.confidence_score:.3f})")
# Entity Linking (to Wikipedia)
linked_results = client.recognize_linked_entities(documents, language="en")
print("\n=== LINKED ENTITIES ===")
for idx, result in enumerate(linked_results):
    if not result.is_error:
        for entity in result.entities:
            print(f" '{entity.name}' → {entity.url}")
            print(f" Data source: {entity.data_source}, ID: {entity.data_source_entity_id}")
# Key phrases
curl -s "${ENDPOINT}/language/:analyze-text?api-version=2023-04-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "KeyPhraseExtraction",
    "analysisInput": {
      "documents": [{"id": "1", "language": "en", "text": "Microsoft announced Azure AI updates at Build 2024 in Seattle."}]
    }
  }' | jq '.results.documents[0].keyPhrases'
# Named entities
curl -s "${ENDPOINT}/language/:analyze-text?api-version=2023-04-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "EntityRecognition",
    "analysisInput": {
      "documents": [{"id": "1", "language": "en", "text": "Microsoft CEO Satya Nadella announced Azure AI updates at Build 2024 in Seattle on May 21."}]
    }
  }' | jq '.results.documents[0].entities[] | {text, category, confidenceScore}'
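Once the entity list has been pulled out of the REST response, it is often easier to work with when grouped by category. The sketch below assumes each entity is a dictionary carrying the `text`, `category`, and `confidenceScore` fields selected by the jq filter above; `group_entities` is a hypothetical helper and the sample list is hand-written, not real service output.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_entities(
    entities: List[Dict[str, Any]], min_confidence: float = 0.0
) -> Dict[str, List[str]]:
    """Group entity text by category, dropping entries below `min_confidence`."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity.get("confidenceScore", 0.0) >= min_confidence:
            grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample in the response shape; not real service output.
sample_entities = [
    {"text": "Microsoft", "category": "Organization", "confidenceScore": 0.99},
    {"text": "Seattle", "category": "Location", "confidenceScore": 0.95},
    {"text": "Build", "category": "Event", "confidenceScore": 0.40},
]
confident = group_entities(sample_entities, min_confidence=0.8)
```

The confidence threshold is a design choice for the caller; a stricter value trades recall for fewer false positives.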
Task 4: Language Detection
- Python SDK
# Language detection
multilingual_docs = [
    "Hello, how are you today?",
    "Bonjour, comment allez-vous?",
    "こんにちは、元気ですか?",
    "Hola, ¿cómo estás?"
]
lang_results = client.detect_language(multilingual_docs)
print("=== LANGUAGE DETECTION ===")
for idx, result in enumerate(lang_results):
    if not result.is_error:
        lang = result.primary_language
        print(f" '{multilingual_docs[idx][:30]}...' → {lang.name} ({lang.iso6391_name}) "
              f"confidence: {lang.confidence_score:.3f}")
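The loops in Tasks 3 and 4 skip any document whose `is_error` flag is set, which hides failures. One possible way to keep both outcomes is to partition the batch, as sketched below. `partition_results` is a hypothetical helper that relies only on the `is_error` and `error.message` attributes used earlier in this challenge, and the stand-in objects are fabricated for illustration.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def partition_results(
    results: Iterable[Any],
) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, str]]]:
    """Split results into (index, result) successes and (index, message) failures."""
    succeeded: List[Tuple[int, Any]] = []
    failed: List[Tuple[int, str]] = []
    for idx, result in enumerate(results):
        if result.is_error:
            failed.append((idx, result.error.message))
        else:
            succeeded.append((idx, result))
    return succeeded, failed


# Stand-in objects that mimic only the attributes used above.
fake_results = [
    SimpleNamespace(is_error=False, primary_language=SimpleNamespace(name="English")),
    SimpleNamespace(is_error=True, error=SimpleNamespace(message="Document text is empty.")),
]
ok, errors = partition_results(fake_results)
```

Keeping the original index with each entry makes it possible to report which input document failed and to retry only those.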
Expected Output
Document 0: 'The hotel room was clean and spacious, but the ser...'
Overall: mixed (pos=0.450, neu=0.100, neg=0.450)
Sentence: 'The hotel room was clean and spacious, b...' → mixed
Target: 'room' (positive)
Assessment: 'clean' (positive)
Assessment: 'spacious' (positive)
Target: 'service' (negative)
Assessment: 'slow' (negative)
Assessment: 'unfriendly' (negative)
=== KEY PHRASES ===
Doc 0: ['Microsoft CEO Satya Nadella', 'Azure AI updates', 'Build 2024 conference', 'Seattle']
=== NAMED ENTITIES ===
Doc 0:
'Microsoft' → Organization (confidence: 0.990)
'Satya Nadella' → Person (confidence: 0.985)
'Azure AI' → Product (confidence: 0.920)
'Build 2024' → Event (confidence: 0.880)
'Seattle' → Location (confidence: 0.995)
'May 21' → DateTime/Date (confidence: 0.970)
=== LANGUAGE DETECTION ===
'Hello, how are you today?...' → English (en) confidence: 1.000
'Bonjour, comment allez-vous?...' → French (fr) confidence: 1.000
'こんにちは、元気ですか?...' → Japanese (ja) confidence: 1.000
'Hola, ¿cómo estás?...' → Spanish (es) confidence: 1.000
Break & fix
| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| Mixed results on clear text | Unexpected mixed sentiment | Text expresses both positive and negative sentiment, so the document-level label is mixed | Inspect sentence-level sentiment and opinion mining targets for granularity |
| Empty key phrases | No phrases returned | Text too short or generic | Provide substantive text (10+ words recommended) |
| Entity category Unknown | Unrecognized entities | Domain-specific terms not in model | Use custom NER model for specialized entities |
| Batch error on one doc | InvalidDocument in results | Document exceeds 5,120 characters | Split long documents; check is_error per document |
| Wrong language detection | Incorrect language | Mixed-language text confuses detection | Separate text by language; use longer samples |
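For the oversized-document row above, a minimal sketch of splitting text before submission follows. `split_document` is a hypothetical helper that breaks on spaces where it can and falls back to a hard cut otherwise. It measures length with Python's `len`, which counts code points; this is an assumption and may not match exactly how the service counts characters, so leave some headroom for text with emoji or combining marks.

```python
from typing import List


def split_document(text: str, max_chars: int = 5120) -> List[str]:
    """Split `text` into chunks of at most `max_chars`, preferring space breaks."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Look for the last space inside the allowed window.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no usable space found: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Splitting on sentence boundaries would preserve more context for sentiment analysis; the space-based split here is only the simplest variant.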
Knowledge Check
1. What does opinion mining add to standard sentiment analysis?
2. What is the maximum document size for a single text analytics request?
3. What is the difference between Named Entity Recognition (NER) and Entity Linking?
4. How should you handle errors in batch text analytics results?
5. What confidence score format does language detection return?
Cleanup
az group delete --name rg-ai102-nlp --yes --no-wait