
Challenge 39: Custom Translation Models

Estimated Time

60 min | Cost: $5-15 (estimated) | Domain: Implement NLP Solutions (15-20%)

Exam skills covered

  • Implement custom text translation models
  • Train and evaluate custom translation with parallel data
  • Publish and consume custom translation models
  • Implement multi-language question answering

Overview

Custom Translator trains domain-specific translation models on your parallel data (aligned source-target sentence pairs). This can improve accuracy for specialized terminology compared with the general model:

| Concept | Description |
|---|---|
| Parallel data | Aligned sentence pairs in source and target languages |
| BLEU score | Translation quality metric (0-100, higher is better) |
| Category ID | Identifier used to route requests to your custom model |
| Baseline | Microsoft's general translation model (comparison point) |
| Training | Fine-tuning the baseline with your parallel data |
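BLEU measures how many n-grams (word sequences of length 1 to 4) a machine translation shares with a human reference translation, and it penalizes output that is shorter than the reference. The sketch below is a minimal, unsmoothed sentence-level version meant only for intuition. Custom Translator computes BLEU itself on your test set at corpus level, and its tokenization and smoothing may differ. The function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed BLEU for one hypothesis/reference pair, scaled to 0-100."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference
        matches = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if matches == 0:
            return 0.0  # a single zero precision makes unsmoothed BLEU zero
        log_precision_sum += math.log(matches / sum(hyp_counts.values()))
    # Brevity penalty: penalize hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the patient presents with acute bronchitis today"
    hypothesis = "the patient presents with acute bronchitis"
    # about 84.6: every n-gram matches, so only the brevity penalty applies
    print(round(sentence_bleu(hypothesis, reference), 1))
```

Because this version is unsmoothed, any sentence with no matching 4-gram scores 0, which is one reason BLEU is normally reported over a whole test set rather than per sentence.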

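Multi-language question answering lets a single Language resource host question answering projects in different languages. This is enabled with the multilingualResource setting when the resource's first project is created; each project's knowledge base is still authored in one language.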

Portal-Based Operations

Some Custom Translator operations (project creation, file upload) are primarily done via the Custom Translator portal. This challenge documents the workflow and programmatic consumption of trained models.

Prerequisites

  • Azure subscription
  • Azure Translator resource (S1 tier for custom translation)
  • Parallel training data (TMX, XLIFF, TSV, or TXT files)
  • Custom Translator portal access
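The tasks below read their settings from environment variables. A quick pre-flight check can catch missing values before any request is sent. This is a minimal sketch; the variable names are the ones used in the tasks, while the helper name and the required/optional split are illustrative assumptions.

```python
import os

# Environment variables read by the tasks in this challenge
REQUIRED_VARS = [
    "AZURE_TRANSLATOR_KEY",
    "AZURE_TRANSLATOR_REGION",
    "AZURE_AI_ENDPOINT",
    "AZURE_AI_KEY",
]
# Tasks 2 and 3 fall back to a default when this one is unset
OPTIONAL_VARS = ["CUSTOM_TRANSLATOR_CATEGORY_ID"]


def missing_vars(names, env=os.environ):
    """Return the names that are unset or empty in env."""
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(REQUIRED_VARS)
    if missing:
        print("Missing required settings:", ", ".join(missing))
    else:
        print("All required settings present.")
    for name in missing_vars(OPTIONAL_VARS):
        print(f"Optional {name} not set; the task default will be used.")
```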

Implementation

Task 1: Prepare Parallel Training Data

import os

# Custom translation requires parallel data - aligned sentences
# Format: Tab-separated source and target (or separate aligned files)

# Example: Medical domain English-to-Spanish parallel data
training_data_tsv = """The patient presents with acute bronchitis.\tEl paciente presenta bronquitis aguda.
Administer 500mg amoxicillin three times daily.\tAdministrar 500mg de amoxicilina tres veces al día.
Blood pressure reading is 120 over 80.\tLa lectura de presión arterial es 120 sobre 80.
The MRI shows no abnormalities.\tLa resonancia magnética no muestra anomalías.
Schedule a follow-up appointment in two weeks.\tProgramar una cita de seguimiento en dos semanas.
Patient reports chest pain and shortness of breath.\tEl paciente reporta dolor en el pecho y dificultad para respirar.
Prescribe ibuprofen 400mg as needed for pain.\tRecetar ibuprofeno 400mg según sea necesario para el dolor.
The biopsy results are benign.\tLos resultados de la biopsia son benignos.
Apply topical antibiotic ointment twice daily.\tAplicar ungüento antibiótico tópico dos veces al día.
Refer patient to cardiology for further evaluation.\tReferir al paciente a cardiología para evaluación adicional."""

# Save training file
with open("medical-training-en-es.tsv", "w", encoding="utf-8") as f:
    f.write(training_data_tsv)

# Tuning data (separate set for validation)
tuning_data = """Patient exhibits symptoms of type 2 diabetes.\tEl paciente exhibe síntomas de diabetes tipo 2.
Recommend physical therapy twice a week.\tRecomendar fisioterapia dos veces por semana.
Lab results indicate elevated cholesterol.\tLos resultados del laboratorio indican colesterol elevado."""

with open("medical-tuning-en-es.tsv", "w", encoding="utf-8") as f:
    f.write(tuning_data)

# Testing data (for BLEU evaluation)
test_data = """Administer insulin injection before meals.\tAdministrar inyección de insulina antes de las comidas.
The X-ray reveals a hairline fracture.\tLa radiografía revela una fractura capilar."""

with open("medical-test-en-es.tsv", "w", encoding="utf-8") as f:
    f.write(test_data)

# Report the number of sentence pairs (one pair per line) in each file
print("Training data files created:")
for name, data in [("medical-training-en-es.tsv", training_data_tsv),
                   ("medical-tuning-en-es.tsv", tuning_data),
                   ("medical-test-en-es.tsv", test_data)]:
    print(f" {name} ({len(data.splitlines())} sentence pairs)")
print("\nNote: Custom Translator requires at least 10,000 parallel training sentences;")
print("these small files only illustrate the format.")
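Before uploading, it can help to check the files locally. The sketch below assumes the one-pair-per-line, tab-separated layout used above. It flags lines that do not split into exactly two non-empty fields, and it lists source sentences shared between the training set and the tuning or test sets, since evaluating on sentences the model trained on would inflate the BLEU score. The function names are hypothetical.

```python
import os


def parse_pairs(text):
    """Split tab-separated text into (source, target) pairs.

    Returns the pairs plus the 1-based numbers of malformed lines
    (no tab, more than one tab, or an empty side)."""
    pairs, bad_lines = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split("\t")
        if len(parts) != 2 or not all(p.strip() for p in parts):
            bad_lines.append(line_no)
            continue
        pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs, bad_lines


def overlapping_sources(a, b):
    """Source sentences that occur in both pair lists (e.g. training and test)."""
    return sorted({src for src, _ in a} & {src for src, _ in b})


if __name__ == "__main__":
    names = ["medical-training-en-es.tsv", "medical-tuning-en-es.tsv", "medical-test-en-es.tsv"]
    loaded = {}
    for name in names:
        if not os.path.exists(name):
            continue
        with open(name, encoding="utf-8") as f:
            loaded[name], bad = parse_pairs(f.read())
        print(f"{name}: {len(loaded[name])} pairs, malformed lines: {bad or 'none'}")
    # Compare the training set against the tuning and test sets
    train = loaded.get(names[0], [])
    for other in names[1:]:
        shared = overlapping_sources(train, loaded.get(other, []))
        if shared:
            print(f"Sentences in both training and {other}: {shared}")
```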

Task 2: Custom Translator Workflow (Portal + API)

import os

# Custom Translator portal workflow:
# 1. Create a workspace at https://portal.customtranslator.azure.ai
# 2. Create a project (specify language pair: en → es)
# 3. Upload parallel documents (training, tuning, testing)
# 4. Train the model
# 5. Publish (deploy) the model so translation requests can use it

# The project's Category ID is shown in the portal's project details;
# pass it as the category parameter to route requests to the published model

CATEGORY_ID = os.environ.get("CUSTOM_TRANSLATOR_CATEGORY_ID", "your-category-id")

# The BLEU score comparison after training:
print(f"""
Custom Translation Training Results (example):
================================================
Model: Medical-EN-ES-v1
Language pair: English → Spanish
Training sentences: 10,000
BLEU Score (baseline): 42.5
BLEU Score (custom): 58.3 (+15.8 improvement)
Status: Published
Category ID: {CATEGORY_ID}

Interpretation:
- BLEU < 30: Low quality (general model may be better for this pair)
- BLEU 30-40: Reasonable quality
- BLEU 40-60: Good quality
- BLEU > 60: Excellent quality
""")

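To apply the interpretation bands above programmatically (for example, when comparing several training runs), a small helper could look like this. The thresholds are the rough bands listed in this challenge, not an official Microsoft scale, and the boundary handling (a score of exactly 40 counts as Good) is an assumption. The function names are illustrative.

```python
def bleu_band(score):
    """Map a BLEU score (0-100) to the rough bands used in this challenge."""
    if score < 30:
        return "Low"
    if score < 40:
        return "Reasonable"
    if score <= 60:
        return "Good"
    return "Excellent"


def summarize(baseline, custom):
    """Return the improvement over the baseline and the band of each score."""
    delta = round(custom - baseline, 1)  # round away floating-point noise
    return {
        "baseline_band": bleu_band(baseline),
        "custom_band": bleu_band(custom),
        "improvement": delta,
        "custom_beats_baseline": delta > 0,
    }


if __name__ == "__main__":
    # The illustrative numbers from the example results above
    print(summarize(42.5, 58.3))
```

A custom model that lands in the same band as the baseline, as in this example, can still be worth publishing if the gain is concentrated on domain terminology, so the raw improvement is reported alongside the bands.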
Task 3: Consume Custom Translation Model

import os
import requests
import uuid

key = os.environ["AZURE_TRANSLATOR_KEY"]
region = os.environ["AZURE_TRANSLATOR_REGION"]
endpoint = "https://api.cognitive.microsofttranslator.com"
category_id = os.environ.get("CUSTOM_TRANSLATOR_CATEGORY_ID", "general")

def translate_with_custom_model(texts, source_lang, target_lang, category=None):
    """Translate texts; pass a Category ID to route to a custom model."""
    path = "/translate"
    params = {
        "api-version": "3.0",
        "from": source_lang,
        "to": target_lang,
    }
    if category:
        params["category"] = category  # Routes to the custom model

    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4()),
    }

    body = [{"text": t} for t in texts]
    response = requests.post(endpoint + path, params=params, headers=headers, json=body, timeout=30)
    response.raise_for_status()
    return response.json()

# Test sentences with medical terminology
medical_texts = [
"The patient presents with acute myocardial infarction.",
"Administer epinephrine 0.3mg intramuscularly immediately.",
"Schedule an echocardiogram to assess ventricular function."
]

# Compare general vs custom model
print("=== General Model (baseline) ===")
general_results = translate_with_custom_model(medical_texts, "en", "es", category="general")
for text, result in zip(medical_texts, general_results):
    print(f" EN: {text}")
    print(f" ES: {result['translations'][0]['text']}\n")

print("=== Custom Model (medical domain) ===")
custom_results = translate_with_custom_model(medical_texts, "en", "es", category=category_id)
for text, result in zip(medical_texts, custom_results):
    print(f" EN: {text}")
    print(f" ES: {result['translations'][0]['text']}\n")
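The function above sends every text in a single request. For larger document sets you would typically split the input, because the Translator API caps both the number of array elements and the total characters per request. The sketch below batches texts under two limits; the default values are deliberately conservative placeholders, not the service's documented limits, so check the Translator documentation for the actual values. Characters are measured with Python's len(). The helper name is hypothetical.

```python
def batch_texts(texts, max_items=100, max_chars=10_000):
    """Yield lists of texts that stay within per-request item and character limits.

    max_items and max_chars are placeholders; check the Translator
    documentation for the service's actual request limits."""
    batch, chars = [], 0
    for text in texts:
        if len(text) > max_chars:
            raise ValueError("a single text exceeds max_chars; split it first")
        # Start a new batch if adding this text would break either limit
        if batch and (len(batch) >= max_items or chars + len(text) > max_chars):
            yield batch
            batch, chars = [], 0
        batch.append(text)
        chars += len(text)
    if batch:
        yield batch


# Hypothetical usage with translate_with_custom_model from Task 3:
# results = []
# for batch in batch_texts(all_texts):
#     results.extend(translate_with_custom_model(batch, "en", "es", category=category_id))
```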

Task 4: Multi-Language Question Answering

import os
from azure.ai.language.questionanswering import QuestionAnsweringClient
from azure.core.credentials import AzureKeyCredential

# Multi-language QA: a multilingual resource (multilingualResource=True on its
# first project) can hold projects in different languages, but each project's
# knowledge base is authored in a single language

qa_client = QuestionAnsweringClient(
endpoint=os.environ["AZURE_AI_ENDPOINT"],
credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"])
)

# Query in different languages against the same knowledge base
multilingual_queries = [
("What is Azure AI?", "en"),
("¿Qué es Azure AI?", "es"),
("Azure AIとは何ですか?", "ja"),
("Qu'est-ce qu'Azure AI?", "fr")
]

print("=== Multi-Language QA ===")
for question, lang in multilingual_queries:
    # get_answers has no per-query language parameter; the project's
    # language is fixed when the project is created
    response = qa_client.get_answers(
        question=question,
        project_name="faq-knowledge-base",
        deployment_name="production",
    )

    if response.answers:
        top_answer = response.answers[0]
        print(f"\n[{lang}] Q: {question}")
        print(f" A: {top_answer.answer[:80]}...")
        print(f" Confidence: {top_answer.confidence:.3f}")
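Because a question answering project is authored in one language, one common way to serve users in other languages is to translate the question into the project's language, query the project, and translate the answer back. The sketch below shows only that orchestration. The translation and query calls are injected as functions so the logic can be tested without Azure; the function name is hypothetical, and the wiring to the Task 3 and Task 4 clients in the comments is illustrative.

```python
from typing import Callable, Optional


def answer_in_user_language(
    question: str,
    user_lang: str,
    kb_lang: str,
    translate: Callable[[str, str, str], str],
    ask: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Translate the question into the knowledge base language, query it,
    and translate the answer back into the user's language.

    translate(text, from_lang, to_lang) and ask(question) are supplied by the caller."""
    kb_question = question if user_lang == kb_lang else translate(question, user_lang, kb_lang)
    answer = ask(kb_question)
    if answer is None or user_lang == kb_lang:
        return answer
    return translate(answer, kb_lang, user_lang)


# Hypothetical wiring to the clients from Tasks 3 and 4:
# def translate(text, src, dst):
#     return translate_with_custom_model([text], src, dst)[0]["translations"][0]["text"]
# def ask(q):
#     r = qa_client.get_answers(question=q, project_name="faq-knowledge-base",
#                               deployment_name="production")
#     return r.answers[0].answer if r.answers else None
# print(answer_in_user_language("¿Qué es Azure AI?", "es", "en", translate, ask))
```

Keeping the calls injectable also makes it easy to swap in a custom Category ID for domain-specific knowledge bases.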

Expected Output

Training data files created:
medical-training-en-es.tsv (10 sentence pairs)
medical-tuning-en-es.tsv (3 sentence pairs)
medical-test-en-es.tsv (2 sentence pairs)

Custom Translation Training Results (example):
================================================
Model: Medical-EN-ES-v1
BLEU Score (baseline): 42.5
BLEU Score (custom): 58.3 (+15.8 improvement)

=== General Model (baseline) ===
EN: The patient presents with acute myocardial infarction.
ES: El paciente se presenta con infarto agudo de miocardio.

=== Custom Model (medical domain) ===
EN: The patient presents with acute myocardial infarction.
ES: El paciente presenta infarto agudo al miocardio.

=== Multi-Language QA ===
[en] Q: What is Azure AI?
A: Azure AI Services is a collection of cloud-based AI APIs that help developers...
Confidence: 0.953
[es] Q: ¿Qué es Azure AI?
A: Azure AI Services is a collection of cloud-based AI APIs...
Confidence: 0.891
[ja] Q: Azure AIとは何ですか?
A: Azure AI Services is a collection of cloud-based AI APIs...
Confidence: 0.845

Break & fix

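| Scenario | Symptom | Root cause | Fix |
|---|---|---|---|
| Custom model not used | General translations returned | Category ID not specified or incorrect | Verify the category parameter matches the published model's Category ID |
| Low BLEU score | No improvement over baseline | Insufficient training data or poor alignment | Provide 10,000+ aligned sentence pairs; verify alignment quality |
| Upload or training fails | Document rejected or training errors | Unsupported file format, or too few usable sentence pairs | Use supported formats (TMX, XLIFF, TSV, aligned TXT); meet the 10,000-sentence training minimum |
| Category not found | 400 error on translation | Model not published or expired | Publish the model in the Custom Translator portal; check expiration |
| Multi-language QA poor | Low confidence for questions in other languages | Question language differs from the project's language | Create a project in that language (requires a multilingual resource, set with multilingualResource: true on the first project) or translate questions into the project's language |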
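When a call from Task 3 fails, response.raise_for_status() raises requests.HTTPError, and the Translator v3 error body has the form {"error": {"code": ..., "message": ...}}. The sketch below turns that into a readable message with a next step. The hints are rough mappings based on general HTTP status semantics and the table above, not an official error catalog; the helper name is hypothetical.

```python
import json

# Rough next steps keyed by HTTP status (illustrative, not exhaustive)
HINTS = {
    400: "Check request parameters, including that category matches a published model's Category ID.",
    401: "Check the subscription key and that the region header matches the resource.",
    429: "Request rate or quota exceeded; retry with backoff.",
}


def diagnose(status_code, body_text):
    """Summarize a Translator error response and suggest a next step."""
    try:
        error = json.loads(body_text).get("error", {})
    except (ValueError, AttributeError):
        error = {}  # body was not a JSON object
    code = error.get("code", "unknown")
    message = error.get("message", body_text)
    hint = HINTS.get(status_code, "See the Translator error reference for this code.")
    return f"HTTP {status_code} (code {code}): {message}\nHint: {hint}"


# Hypothetical usage around the Task 3 call:
# try:
#     translate_with_custom_model(medical_texts, "en", "es", category=category_id)
# except requests.HTTPError as exc:
#     print(diagnose(exc.response.status_code, exc.response.text))
```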

Knowledge Check

1. How do you route a translation request to your custom model?

2. What is the BLEU score used for in Custom Translator?

3. What type of training data does Custom Translator require?

4. How does multi-language Question Answering work?

5. What is the minimum recommended amount of parallel training data for meaningful improvement?

Cleanup

az group delete --name rg-ai102-translator --yes --no-wait

Learn More