Challenge 31: Text Analytics - Key Phrases, Entities, Sentiment

Estimated Time

45 min | Cost: $1-3 (estimated) | Domain: Implement NLP Solutions (15-20%)

Exam skills covered

  • Extract key phrases from text
  • Recognize named entities and linked entities
  • Determine sentiment with opinion mining
  • Detect language

Overview

Azure AI Language (Text Analytics) provides NLP capabilities:

Feature | Description
Sentiment Analysis | Positive/neutral/negative with confidence + opinion mining
Key Phrase Extraction | Identify main talking points
Named Entity Recognition (NER) | Detect entities (Person, Location, Organization, DateTime, etc.)
Entity Linking | Link entities to Wikipedia knowledge base
Language Detection | Identify language of text

The client supports batch operations: send multiple documents in one request for efficiency, then check each returned result individually.
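A minimal sketch of client-side batching, assuming the service caps the number of documents per request. The helper name `batched` and the default of 10 documents per batch are illustrative assumptions, not values from this lab; check the current service limits for the feature you call.

```python
from typing import Iterator, List, Sequence


def batched(documents: Sequence[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of `documents`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield list(documents[start:start + batch_size])


# Usage: each batch would go to one client call, e.g. client.analyze_sentiment(batch).
docs = [f"review {n}" for n in range(23)]
for batch in batched(docs):
    print(len(batch))
```

Batching on the client side keeps each request within the cap while preserving document order, so result indexes can still be mapped back to the inputs.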

Prerequisites

  • Azure subscription
  • Azure AI Language resource (or multi-service)
  • Python 3.9+ or .NET 8
  • Package: azure-ai-textanalytics (v5.3+)

Implementation

Task 1: Create Language Resource

az group create --name rg-ai102-nlp --location eastus2

az cognitiveservices account create \
  --name language-ai102 \
  --resource-group rg-ai102-nlp \
  --kind TextAnalytics \
  --sku S \
  --location eastus2

# Export under the variable names the Python code in Task 2 reads
export AZURE_AI_ENDPOINT=$(az cognitiveservices account show --name language-ai102 --resource-group rg-ai102-nlp --query properties.endpoint -o tsv)
export AZURE_AI_KEY=$(az cognitiveservices account keys list --name language-ai102 --resource-group rg-ai102-nlp --query key1 -o tsv)
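The Python tasks read the endpoint and key from the `AZURE_AI_ENDPOINT` and `AZURE_AI_KEY` environment variables. The sketch below fails early with a readable message if either is missing; the helper name `load_language_settings` is hypothetical and is not part of the SDK.

```python
import os
from typing import Mapping, Tuple


def load_language_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (endpoint, key) from the environment, or raise a clear error."""
    required = ("AZURE_AI_ENDPOINT", "AZURE_AI_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["AZURE_AI_ENDPOINT"], env["AZURE_AI_KEY"]
```

Calling this before constructing `TextAnalyticsClient` turns a bare `KeyError` into a message that names the variable to set.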

Task 2: Analyze Sentiment with Opinion Mining

import os
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

documents = [
    "The hotel room was clean and spacious, but the service was slow and unfriendly.",
    "I absolutely love this product! Fast delivery and excellent quality.",
    "The meeting was scheduled for 3 PM.",
]

# Sentiment analysis with opinion mining
results = client.analyze_sentiment(
    documents,
    show_opinion_mining=True,
    language="en",
)

for idx, result in enumerate(results):
    # Each document succeeds or fails independently within the batch
    if result.is_error:
        print(f"Doc {idx}: Error - {result.error.message}")
        continue

    print(f"Document {idx}: '{documents[idx][:50]}...'")
    print(f" Overall: {result.sentiment} "
          f"(pos={result.confidence_scores.positive:.3f}, "
          f"neu={result.confidence_scores.neutral:.3f}, "
          f"neg={result.confidence_scores.negative:.3f})")

    for sentence in result.sentences:
        print(f" Sentence: '{sentence.text[:40]}...' → {sentence.sentiment}")

        # Opinion mining - aspect-based sentiment
        for mined_opinion in sentence.mined_opinions:
            target = mined_opinion.target
            print(f" Target: '{target.text}' ({target.sentiment})")
            for assessment in mined_opinion.assessments:
                print(f" Assessment: '{assessment.text}' ({assessment.sentiment})")
    print()
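To use the mined opinions programmatically rather than printing them, one option is to collect assessments per target. The sketch below relies only on the attributes used above (`sentences`, `mined_opinions`, `target.text`, `assessments`); the helper name `opinions_by_target` and the `SimpleNamespace` stand-in are illustrative and are not real service output.

```python
from types import SimpleNamespace
from typing import Dict, List


def opinions_by_target(result) -> Dict[str, List[str]]:
    """Map each opinion target to the assessment texts attached to it."""
    summary: Dict[str, List[str]] = {}
    for sentence in result.sentences:
        # Treat a missing opinion list as empty (for example when opinion mining is off)
        for opinion in sentence.mined_opinions or []:
            texts = [assessment.text for assessment in opinion.assessments]
            summary.setdefault(opinion.target.text, []).extend(texts)
    return summary


# Stand-in shaped like one analyze_sentiment result (hypothetical test double).
fake_result = SimpleNamespace(sentences=[SimpleNamespace(mined_opinions=[
    SimpleNamespace(
        target=SimpleNamespace(text="room"),
        assessments=[SimpleNamespace(text="clean"), SimpleNamespace(text="spacious")],
    ),
])])
print(opinions_by_target(fake_result))  # {'room': ['clean', 'spacious']}
```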

Task 3: Extract Key Phrases and Named Entities

documents = [
    "Microsoft CEO Satya Nadella announced Azure AI updates at the Build 2024 conference in Seattle on May 21.",
    "The quarterly revenue increased by 15% to $62 billion, driven by cloud services growth.",
]

# Key phrase extraction
key_phrases_results = client.extract_key_phrases(documents, language="en")
print("=== KEY PHRASES ===")
for idx, result in enumerate(key_phrases_results):
    if not result.is_error:
        print(f"Doc {idx}: {result.key_phrases}")

# Named Entity Recognition
ner_results = client.recognize_entities(documents, language="en")
print("\n=== NAMED ENTITIES ===")
for idx, result in enumerate(ner_results):
    if not result.is_error:
        print(f"Doc {idx}:")
        for entity in result.entities:
            # Append the subcategory only when the service returns one
            subcategory = f"/{entity.subcategory}" if entity.subcategory else ""
            print(f" '{entity.text}' → {entity.category}{subcategory}"
                  f" (confidence: {entity.confidence_score:.3f})")

# Entity Linking (to Wikipedia)
linked_results = client.recognize_linked_entities(documents, language="en")
print("\n=== LINKED ENTITIES ===")
for idx, result in enumerate(linked_results):
    if not result.is_error:
        for entity in result.entities:
            print(f" '{entity.name}' → {entity.url}")
            print(f" Data source: {entity.data_source}, ID: {entity.data_source_entity_id}")
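Downstream code often needs entities grouped by category rather than printed one by one. The following sketch uses only the `text`, `category`, and `confidence_score` attributes shown above; the helper name `group_by_category`, the `min_confidence` parameter, and the stand-in entities are assumptions made for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def group_by_category(entities: Iterable, min_confidence: float = 0.0) -> Dict[str, List[str]]:
    """Group entity texts by category, skipping entities below `min_confidence`."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity.confidence_score >= min_confidence:
            grouped[entity.category].append(entity.text)
    return dict(grouped)


# Stand-ins shaped like recognize_entities output (hypothetical values, not service results).
sample_entities = [
    SimpleNamespace(text="Microsoft", category="Organization", confidence_score=0.99),
    SimpleNamespace(text="Seattle", category="Location", confidence_score=0.95),
    SimpleNamespace(text="Build", category="Event", confidence_score=0.40),
]
print(group_by_category(sample_entities, min_confidence=0.5))
```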

Task 4: Language Detection

# Language detection
multilingual_docs = [
    "Hello, how are you today?",
    "Bonjour, comment allez-vous?",
    "こんにちは、元気ですか?",
    "Hola, ¿cómo estás?",
]

lang_results = client.detect_language(multilingual_docs)
print("=== LANGUAGE DETECTION ===")
for idx, result in enumerate(lang_results):
    if not result.is_error:
        lang = result.primary_language
        print(f" '{multilingual_docs[idx][:30]}...' → {lang.name} ({lang.iso6391_name}) "
              f"confidence: {lang.confidence_score:.3f}")
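A common follow-up is to feed the detected language into later calls as the `language=` argument. The sketch below falls back to a default when detection confidence is low; the helper name `language_hint`, the 0.8 threshold, and the `"en"` fallback are illustrative assumptions rather than recommended values.

```python
from types import SimpleNamespace


def language_hint(detection, min_confidence: float = 0.8, fallback: str = "en") -> str:
    """Return the detected ISO 639-1 code, or `fallback` when confidence is low."""
    lang = detection.primary_language
    if lang.confidence_score < min_confidence:
        return fallback
    return lang.iso6391_name


# Stand-in shaped like one detect_language result (hypothetical test double).
french = SimpleNamespace(
    primary_language=SimpleNamespace(iso6391_name="fr", confidence_score=0.99)
)
print(language_hint(french))  # fr
```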

Expected Output

Abridged; labels and confidence scores are illustrative and vary with the model version.

Document 0: 'The hotel room was clean and spacious, but the ser...'
 Overall: mixed (pos=0.450, neu=0.100, neg=0.450)
 Sentence: 'The hotel room was clean and spacious, b...' → mixed
 Target: 'room' (positive)
 Assessment: 'clean' (positive)
 Assessment: 'spacious' (positive)
 Target: 'service' (negative)
 Assessment: 'slow' (negative)
 Assessment: 'unfriendly' (negative)

=== KEY PHRASES ===
Doc 0: ['Microsoft CEO Satya Nadella', 'Azure AI updates', 'Build 2024 conference', 'Seattle']

=== NAMED ENTITIES ===
Doc 0:
 'Microsoft' → Organization (confidence: 0.990)
 'Satya Nadella' → Person (confidence: 0.985)
 'Azure AI' → Product (confidence: 0.920)
 'Build 2024' → Event (confidence: 0.880)
 'Seattle' → Location (confidence: 0.995)
 'May 21' → DateTime/Date (confidence: 0.970)

=== LANGUAGE DETECTION ===
 'Hello, how are you today?...' → English (en) confidence: 1.000
 'Bonjour, comment allez-vous?...' → French (fr) confidence: 1.000
 'こんにちは、元気ですか?...' → Japanese (ja) confidence: 1.000
 'Hola, ¿cómo estás?...' → Spanish (es) confidence: 1.000

Break & fix

Scenario | Symptom | Root Cause | Fix
Mixed results on clear text | Unexpected mixed sentiment | Opinion mining detects opposing opinions | Use sentence-level sentiment for granularity
Empty key phrases | No phrases returned | Text too short or generic | Provide substantive text (10+ words recommended)
Entity category Unknown | Unrecognized entities | Domain-specific terms not in model | Use custom NER model for specialized entities
Batch error on one doc | InvalidDocument in results | Document exceeds 5,120 characters | Split long documents; check is_error per document
Wrong language detection | Incorrect language | Mixed-language text confuses detection | Separate text by language; use longer samples
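The "split long documents" fix can be sketched as a small pre-processing step. This version breaks at spaces where it can and uses the 5,120-character limit from the table as its default. The helper name `split_document` is hypothetical, and Python's `len` counts code points, which may differ from how the service measures length for some scripts, so leave headroom if that matters.

```python
from typing import List


def split_document(text: str, limit: int = 5120) -> List[str]:
    """Split `text` into chunks of at most `limit` characters, preferring space boundaries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for the last space that keeps the chunk within the limit
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own document; per-chunk results need to be aggregated by the caller, since sentiment or entities are computed per chunk rather than for the original text as a whole.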

Knowledge Check

1. What does opinion mining add to standard sentiment analysis?

2. What is the maximum document size for a single text analytics request?

3. What is the difference between Named Entity Recognition (NER) and Entity Linking?

4. How should you handle errors in batch text analytics results?

5. What confidence score format does language detection return?

Cleanup

az group delete --name rg-ai102-nlp --yes --no-wait

Learn More