Challenge 33: Text and Document Translation

Estimated Time

50 min | Cost: $2-5 (estimated) | Domain: Implement NLP Solutions (15-20%)

Exam skills covered

  • Translate text using Azure Translator service
  • Translate documents preserving formatting
  • Implement custom translation models for domain-specific terms

Overview

Azure Translator provides:

Feature | Description
Text Translation | Real-time translation of text (up to 50,000 characters per request)
Document Translation | Translate entire documents while preserving layout
Custom Translator | Train models for domain terminology
Transliteration | Convert between scripts (e.g., Japanese kanji → romaji)
Language Detection | Auto-detect the source language
Dictionary | Look up alternative translations

Text translation uses the global endpoint: https://api.cognitive.microsofttranslator.com
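As a rough sketch of how the features in the table map onto that endpoint, the snippet below builds request URLs for several Translator v3.0 routes without sending anything. The helper name build_url is hypothetical, and authentication headers are left out; the routes shown are the v3.0 REST paths as I understand them, so check them against the current API reference.

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.cognitive.microsofttranslator.com"

def build_url(path, **params):
    """Return the full request URL for a Translator v3.0 route (hypothetical helper)."""
    query = {"api-version": "3.0", **params}
    # doseq=True expands list values into repeated parameters (to=es&to=fr)
    return f"{ENDPOINT}{path}?{urlencode(query, doseq=True)}"

print(build_url("/translate", to=["es", "fr"]))
print(build_url("/detect"))
# "from" is a Python keyword, so it has to be passed through a dict
print(build_url("/dictionary/lookup", **{"from": "en", "to": "es"}))
print(build_url("/languages", scope="transliteration"))
```

Every route shares the same host and api-version; only the path and query parameters change.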

Prerequisites

  • Azure subscription
  • Azure Translator resource
  • Python 3.9+ with requests library
  • For Document Translation: Azure Blob Storage container

Implementation

Task 1: Create Translator Resource

az group create --name rg-ai102-translator --location eastus2

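# --custom-domain gives the resource the name-based endpoint that
# Document Translation (Task 3) uses; the name must be globally unique
az cognitiveservices account create \
  --name translator-ai102 \
  --resource-group rg-ai102-translator \
  --kind TextTranslation \
  --sku S1 \
  --location eastus2 \
  --custom-domain translator-ai102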

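# Export under the variable names that the Python code in the following tasks reads
export AZURE_TRANSLATOR_KEY=$(az cognitiveservices account keys list \
  --name translator-ai102 \
  --resource-group rg-ai102-translator \
  --query key1 -o tsv)
export AZURE_TRANSLATOR_REGION="eastus2"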

Task 2: Translate Text

import os
import requests
import uuid

key = os.environ["AZURE_TRANSLATOR_KEY"]
region = os.environ["AZURE_TRANSLATOR_REGION"]
endpoint = "https://api.cognitive.microsofttranslator.com"

def translate_text(texts, target_languages, source_language=None):
    """Translate text to one or more target languages"""
    path = "/translate"
    params = {
        "api-version": "3.0",
        "to": target_languages  # a list is sent as repeated "to" parameters
    }
    if source_language:
        params["from"] = source_language

    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4())
    }

    body = [{"text": t} for t in texts]

    response = requests.post(
        endpoint + path,
        params=params,
        headers=headers,
        json=body
    )
    response.raise_for_status()
    return response.json()

# Translate to multiple languages simultaneously
texts = [
"Azure AI services make it easy to build intelligent applications.",
"The weather in Seattle is rainy today."
]

results = translate_text(texts, target_languages=["es", "fr", "ja"])

for i, result in enumerate(results):
    # detectedLanguage is only present when no source language was given
    detected = result.get("detectedLanguage", {})
    print(f"\nSource: '{texts[i]}'")
    if detected:
        print(f" Detected language: {detected['language']} ({detected['score']:.2f})")
    for translation in result["translations"]:
        print(f" → [{translation['to']}] {translation['text']}")
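The overview cites a 50,000-character limit per request, so a long list of texts has to be split before it is sent. The following is a minimal sketch of one way to do that. The name batch_texts is hypothetical, and two details are assumptions to verify against the service's request-limits documentation: that characters count once per target language, and that a request body holds at most 1,000 array elements.

```python
def batch_texts(texts, num_targets=1, max_chars=50_000, max_items=1_000):
    """Yield lists of texts that each fit within the assumed per-request limits."""
    batch, used = [], 0
    for text in texts:
        # Assumption: each character counts once per target language
        cost = len(text) * num_targets
        if cost > max_chars:
            raise ValueError("a single text exceeds the per-request character limit")
        # Start a new batch when adding this text would break either limit
        if batch and (used + cost > max_chars or len(batch) >= max_items):
            yield batch
            batch, used = [], 0
        batch.append(text)
        used += cost
    if batch:
        yield batch

batches = list(batch_texts(["a" * 20_000, "b" * 20_000, "c" * 20_000]))
print([len(b) for b in batches])  # [2, 1]
```

Each yielded batch can then be passed to translate_text with the same target languages.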

Task 3: Document Translation (Batch)

import time

# Document Translation requires Azure Blob Storage
# Source container: contains documents to translate
# Target container: receives translated documents

translator_endpoint = os.environ.get(
    "AZURE_TRANSLATOR_DOCUMENT_ENDPOINT",
    "https://translator-ai102.cognitiveservices.azure.com",
)

def translate_documents(source_url, target_url, target_language):
    """Translate all documents in source container to target container"""
    path = "/translator/document/batches"
    url = translator_endpoint + path

    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json"
    }

    body = {
        "inputs": [
            {
                "source": {
                    "sourceUrl": source_url,
                    "language": "en"
                },
                "targets": [
                    {
                        "targetUrl": target_url,
                        "language": target_language
                    }
                ]
            }
        ]
    }

    response = requests.post(url, headers=headers, json=body, params={"api-version": "2024-05-01"})

    # 202 Accepted means the batch was queued; the status URL is in the response headers
    if response.status_code == 202:
        operation_url = response.headers["Operation-Location"]
        print(f"Translation started: {operation_url}")
        return operation_url
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

def poll_translation_status(operation_url):
    """Poll until the translation reaches a terminal status"""
    headers = {"Ocp-Apim-Subscription-Key": key}

    while True:
        response = requests.get(operation_url, headers=headers)
        response.raise_for_status()
        result = response.json()
        status = result["status"]
        print(f" Status: {status}")

        # ValidationFailed is also terminal; without it the loop would never exit
        if status in ["Succeeded", "Failed", "Cancelled", "ValidationFailed"]:
            return result
        time.sleep(5)

# Example usage (requires storage SAS URLs)
source_sas = "https://storage.blob.core.windows.net/source-docs?sv=...&sig=..."
target_sas = "https://storage.blob.core.windows.net/translated-es?sv=...&sig=..."

# operation_url = translate_documents(source_sas, target_sas, "es")
# result = poll_translation_status(operation_url)
print("Document translation configured (requires Blob Storage SAS URLs)")
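A polling loop with no upper bound can hang a script if a batch stalls. The sketch below shows one possible way to add a timeout, with the status lookup passed in as a function so the logic can be tried without a live resource. The names poll_with_timeout and TERMINAL are hypothetical, and the set of terminal statuses is an assumption based on the status values the service documents.

```python
import time

# Assumed terminal statuses for a Document Translation batch
TERMINAL = {"Succeeded", "Failed", "Cancelled", "ValidationFailed"}

def poll_with_timeout(fetch_status, interval=5.0, timeout=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal status or the timeout passes."""
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result["status"] in TERMINAL:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"still {result['status']} after {timeout} seconds")
        sleep(interval)

# Demo with a fake status source standing in for the HTTP call
fake = iter([{"status": "NotStarted"}, {"status": "Running"}, {"status": "Succeeded"}])
final = poll_with_timeout(lambda: next(fake), sleep=lambda seconds: None)
print(final["status"])  # Succeeded
```

In real use, fetch_status would wrap the GET request against the Operation-Location URL and return the parsed JSON.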

Task 4: Transliteration

def transliterate(texts, language, from_script, to_script):
    """Convert text from one script to another"""
    path = "/transliterate"
    params = {
        "api-version": "3.0",
        "language": language,
        "fromScript": from_script,
        "toScript": to_script
    }
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-type": "application/json"
    }
    body = [{"text": t} for t in texts]

    response = requests.post(endpoint + path, params=params, headers=headers, json=body)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Convert Japanese to Latin script
results = transliterate(["こんにちは世界"], "ja", "Jpan", "Latn")
for r in results:
    print(f"Transliterated: {r['text']}")  # "konnichiwa sekai"

# Convert Hindi Devanagari to Latin
results = transliterate(["नमस्ते दुनिया"], "hi", "Deva", "Latn")
for r in results:
    print(f"Transliterated: {r['text']}")  # "namaste duniya"
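Script codes such as Jpan and Latn are only valid in certain combinations per language. As an illustration, the sketch below extracts the allowed pairs from a response of the /languages route with scope=transliteration. The function name script_pairs is hypothetical, and the sample dictionary is a hand-written, trimmed stand-in for the assumed response shape, not real service output.

```python
def script_pairs(languages_response, language):
    """Return (fromScript, toScript) code pairs listed for one language."""
    entry = languages_response.get("transliteration", {}).get(language, {})
    return [
        (script["code"], target["code"])
        for script in entry.get("scripts", [])
        for target in script.get("toScripts", [])
    ]

# Hand-written sample in the assumed response shape (not real service output)
sample = {
    "transliteration": {
        "ja": {
            "name": "Japanese",
            "scripts": [
                {"code": "Jpan", "toScripts": [{"code": "Latn"}]},
                {"code": "Latn", "toScripts": [{"code": "Jpan"}]},
            ],
        }
    }
}
print(script_pairs(sample, "ja"))  # [('Jpan', 'Latn'), ('Latn', 'Jpan')]
print(script_pairs(sample, "xx"))  # []
```

Checking a pair against this list before calling transliterate avoids a round trip that would fail on an unsupported script code.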

Expected Output

Source: 'Azure AI services make it easy to build intelligent applications.'
Detected language: en (1.00)
→ [es] Los servicios de Azure AI facilitan la creación de aplicaciones inteligentes.
→ [fr] Les services Azure AI facilitent la création d'applications intelligentes.
→ [ja] Azure AIサービスを使用すると、インテリジェントなアプリケーションを簡単に構築できます。

Source: 'The weather in Seattle is rainy today.'
Detected language: en (1.00)
→ [es] El clima en Seattle está lluvioso hoy.
→ [fr] Le temps à Seattle est pluvieux aujourd'hui.
→ [ja] 今日のシアトルの天気は雨です。

Transliterated: konnichiwa sekai
Transliterated: namaste duniya

Break & fix

Scenario | Symptom | Root Cause | Fix
401 Unauthorized | Auth failed | Missing region header | Include the Ocp-Apim-Subscription-Region header
Empty translations | No results (request rejected with 400) | Missing to parameter | Specify at least one target language
Wrong language detected | Mistranslation | Short or ambiguous text | Specify the from parameter when the source language is known
Document translation 400 | Bad request | Invalid SAS token or container | Verify the SAS grants read and list (source) and write and list (target) permissions
Transliteration error | Script not supported | Invalid script code | Check supported scripts per language via the /languages endpoint
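SAS permission problems in the Document Translation row can be caught before a batch is submitted. The sketch below inspects the sp (signed permissions) field of a SAS URL and reports which required letters are missing. The helper name missing_sas_permissions and the example URLs are made up for illustration, and the requirement of read plus list on the source and write plus list on the target is an assumption to confirm against the Document Translation documentation.

```python
from urllib.parse import urlparse, parse_qs

def missing_sas_permissions(sas_url, required):
    """Return the required permission letters absent from the SAS token's sp field."""
    query = parse_qs(urlparse(sas_url).query)
    granted = set(query.get("sp", [""])[0])
    return sorted(set(required) - granted)

# Made-up URLs for illustration; the signatures are not valid
source = "https://example.blob.core.windows.net/source-docs?sv=2022-11-02&sp=r&sig=abc"
target = "https://example.blob.core.windows.net/translated-es?sv=2022-11-02&sp=wl&sig=abc"
print(missing_sas_permissions(source, "rl"))  # ['l']
print(missing_sas_permissions(target, "wl"))  # []
```

This only checks what the token claims to grant; an expired or incorrectly signed token still fails at request time.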

Knowledge Check

1. What is the global endpoint for the Azure Translator text API?

2. Which header is required in addition to the subscription key for Translator requests?

3. How does Document Translation differ from Text Translation?

4. What does transliteration do?

5. How many target languages can you specify in a single text translation request?

Cleanup

az group delete --name rg-ai102-translator --yes --no-wait

Learn More