
Challenge 25: Custom Vision - Image Classification

Estimated Time

60 min | Cost: $2-5 (estimated) | Domain: Implement Computer Vision Solutions (10-15%)

Exam skills covered

  • Choose between image classification and object detection
  • Label images for training custom models
  • Train a custom image model
  • Evaluate model metrics (precision, recall, AP)
  • Publish and consume custom model iterations

Overview

Custom Vision lets you train image models tailored to a specific domain without ML expertise. It offers two project types, classification and object detection, and classification comes in multiclass and multilabel variants:

| Type | Use Case | Output |
|---|---|---|
| Classification - Multiclass | Image belongs to ONE category | One tag per image |
| Classification - Multilabel | Image can have MULTIPLE categories | Multiple tags per image |
| Object Detection | Locate objects with bounding boxes | Tags + coordinates |
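
The table can be read as a two-question decision. The sketch below encodes it in a hypothetical helper, `choose_project_type`; the function name and its two boolean parameters are illustrative assumptions and are not part of the Custom Vision SDK.

```python
def choose_project_type(needs_location: bool, multiple_tags_per_image: bool) -> str:
    """Map two yes/no questions onto the project types listed in the table."""
    if needs_location:
        # Only object detection returns bounding-box coordinates.
        return "Object Detection"
    if multiple_tags_per_image:
        # Several categories can apply to the same image.
        return "Classification - Multilabel"
    # Exactly one category applies to each image.
    return "Classification - Multiclass"


# A fruit sorter that assigns one fruit name per photo:
print(choose_project_type(needs_location=False, multiple_tags_per_image=False))
```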

Training produces iterations with metrics: Precision (of the positive predictions, how many are correct), Recall (of the actual positives, how many were found), and AP (Average Precision, the area under the precision-recall curve).
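
To make these definitions concrete, here is a minimal pure-Python sketch that computes the three metrics for a single tag. The labels, scores, and the 0.5 threshold are made-up illustration data, and the AP formula is one common non-interpolated approximation of the area under the precision-recall curve. The source does not specify how the service computes AP internally, so treat this as a way to build intuition, not as a reproduction of the service's numbers.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall from binary ground truth and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores):
    """Mean of the precision values measured at each true positive, by rank."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, truth in ranked if truth)
    if total_positives == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, truth) in enumerate(ranked, start=1):
        if truth:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Illustrative data for one tag: 1 = image really has the tag.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"AP={average_precision(y_true, scores):.3f}")
```

Raising the threshold usually trades recall for precision, while AP summarizes the ranking quality across all thresholds at once.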

Prerequisites

  • Azure subscription
  • Custom Vision Training + Prediction resources
  • Python 3.9+
  • Packages: azure-cognitiveservices-vision-customvision (training + prediction)

Implementation

Task 1: Create Custom Vision Resources

az group create --name rg-ai102-customvision --location eastus2

# Training resource
az cognitiveservices account create \
  --name cv-training-ai102 \
  --resource-group rg-ai102-customvision \
  --kind CustomVision.Training \
  --sku S0 \
  --location eastus2

# Prediction resource
az cognitiveservices account create \
  --name cv-prediction-ai102 \
  --resource-group rg-ai102-customvision \
  --kind CustomVision.Prediction \
  --sku S0 \
  --location eastus2

# Get keys and endpoints, exported under the names the Python code in Task 2 reads
export CUSTOM_VISION_TRAINING_KEY=$(az cognitiveservices account keys list --name cv-training-ai102 --resource-group rg-ai102-customvision --query key1 -o tsv)
export CUSTOM_VISION_TRAINING_ENDPOINT=$(az cognitiveservices account show --name cv-training-ai102 --resource-group rg-ai102-customvision --query properties.endpoint -o tsv)
export CUSTOM_VISION_PREDICTION_KEY=$(az cognitiveservices account keys list --name cv-prediction-ai102 --resource-group rg-ai102-customvision --query key1 -o tsv)
export CUSTOM_VISION_PREDICTION_ENDPOINT=$(az cognitiveservices account show --name cv-prediction-ai102 --resource-group rg-ai102-customvision --query properties.endpoint -o tsv)
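
The Python code in Task 2 reads its endpoints and keys from four environment variables, and a missing one surfaces only as a `KeyError`. A small pre-flight check, sketched below, can report all missing names at once. The variable names are taken from the Task 2 script; the helper `missing_settings` is a hypothetical name introduced for illustration.

```python
import os

# Names read by the Task 2 script via os.environ[...]
REQUIRED_VARS = (
    "CUSTOM_VISION_TRAINING_ENDPOINT",
    "CUSTOM_VISION_TRAINING_KEY",
    "CUSTOM_VISION_PREDICTION_ENDPOINT",
    "CUSTOM_VISION_PREDICTION_KEY",
)


def missing_settings(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Custom Vision settings are present.")
```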

Task 2: Create a Project, Upload and Label Images, Train

import os
import time
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from azure.cognitiveservices.vision.customvision.training.models import (
    ImageUrlCreateBatch,
    ImageUrlCreateEntry,
)
from msrest.authentication import ApiKeyCredentials

training_endpoint = os.environ["CUSTOM_VISION_TRAINING_ENDPOINT"]
training_key = os.environ["CUSTOM_VISION_TRAINING_KEY"]
prediction_endpoint = os.environ["CUSTOM_VISION_PREDICTION_ENDPOINT"]
prediction_key = os.environ["CUSTOM_VISION_PREDICTION_KEY"]

# Create training client
training_credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(training_endpoint, training_credentials)

# Create a classification project (Multiclass)
project = trainer.create_project(
    name="Fruit-Classifier",
    classification_type="Multiclass",
    domain_id=None,  # Use General domain
)
print(f"Created project: {project.id}")

# Create tags (categories)
apple_tag = trainer.create_tag(project.id, "apple")
banana_tag = trainer.create_tag(project.id, "banana")
orange_tag = trainer.create_tag(project.id, "orange")
print(f"Tags created: apple={apple_tag.id}, banana={banana_tag.id}, orange={orange_tag.id}")

# Upload training images (using URLs).
# NOTE: only two URLs per tag are listed here for brevity. Add at least
# 5 images per tag, including images for orange_tag, before training.
apple_images = [
    "https://upload.wikimedia.org/wikipedia/commons/1/15/Red_Apple.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/e/ee/Apples.jpg",
]
banana_images = [
    "https://upload.wikimedia.org/wikipedia/commons/8/8a/Banana-Single.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/d/de/Bananacomp.jpg",
]

# Upload apple images (recent SDK versions expect a batch object, not images=...)
image_entries = [ImageUrlCreateEntry(url=url, tag_ids=[apple_tag.id]) for url in apple_images]
upload_result = trainer.create_images_from_urls(project.id, ImageUrlCreateBatch(images=image_entries))
print(f"Apple images uploaded: {upload_result.is_batch_successful}")

# Upload banana images
image_entries = [ImageUrlCreateEntry(url=url, tag_ids=[banana_tag.id]) for url in banana_images]
upload_result = trainer.create_images_from_urls(project.id, ImageUrlCreateBatch(images=image_entries))
print(f"Banana images uploaded: {upload_result.is_batch_successful}")

# Train the model (training needs at least 5 images per tag)
print("\nTraining model...")
iteration = trainer.train_project(project.id)

# Poll until training completes; stop instead of looping forever on failure
while iteration.status != "Completed":
    if iteration.status == "Failed":
        raise RuntimeError(f"Training failed for iteration {iteration.id}")
    time.sleep(5)
    iteration = trainer.get_iteration(project.id, iteration.id)
    print(f" Status: {iteration.status}")

print(f"Training complete! Iteration: {iteration.id}")

# Get performance metrics
performance = trainer.get_iteration_performance(project.id, iteration.id)
print("\nPerformance Metrics:")
print(f" Precision: {performance.precision:.4f}")
print(f" Recall: {performance.recall:.4f}")
print(f" AP (Average Precision): {performance.average_precision:.4f}")

for tag_perf in performance.per_tag_performance:
    print(f" Tag '{tag_perf.name}': precision={tag_perf.precision:.3f}, recall={tag_perf.recall:.3f}")
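
The upload calls above send each tag's images in a single request. With a realistic training set this can exceed the per-request limit, so the list has to be split first. The sketch below shows one way to do that in plain Python. It assumes a limit of 64 images per batch, which you should confirm against the current service limits; `chunked` and the `example.com` URLs are hypothetical placeholders.

```python
from typing import Iterator, List, Sequence

MAX_IMAGES_PER_BATCH = 64  # assumed per-request upload limit; verify before relying on it


def chunked(items: Sequence[str], size: int = MAX_IMAGES_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder URLs standing in for a larger training set.
urls = [f"https://example.com/apple-{i}.jpg" for i in range(150)]

for batch_number, batch in enumerate(chunked(urls), start=1):
    # In the real script, each batch would be passed to one upload call.
    print(f"Batch {batch_number}: {len(batch)} images")
```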

Task 3: Publish and Make Predictions

# Publish the iteration for prediction
# Replace <sub-id> with your subscription ID before running
prediction_resource_id = "/subscriptions/<sub-id>/resourceGroups/rg-ai102-customvision/providers/Microsoft.CognitiveServices/accounts/cv-prediction-ai102"
publish_name = "fruit-model-v1"

trainer.publish_iteration(
    project.id,
    iteration.id,
    publish_name,
    prediction_resource_id,
)
print(f"Model published as: {publish_name}")

# Create prediction client
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(prediction_endpoint, prediction_credentials)

# Make a prediction with URL
test_url = "https://upload.wikimedia.org/wikipedia/commons/1/15/Red_Apple.jpg"

results = predictor.classify_image_url(
    project.id,
    publish_name,
    url=test_url,
)

print("\nPrediction results:")
for prediction in results.predictions:
    print(f" {prediction.tag_name}: {prediction.probability:.4f} ({prediction.probability*100:.1f}%)")

# Prediction with local file (send the raw bytes of the image)
with open("test-fruit.jpg", "rb") as image_file:
    results = predictor.classify_image(project.id, publish_name, image_file.read())

# Report only tags above a 50% confidence threshold
for prediction in results.predictions:
    if prediction.probability > 0.5:
        print(f" Classified as: {prediction.tag_name} ({prediction.probability:.2%})")
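
Publishing needs the full ARM resource ID of the prediction resource, and a malformed ID is an easy mistake to make. The sketch below assembles the ID from its parts and rejects unfilled placeholders such as `<sub-id>`. The path layout follows the string used in the publish step; `build_prediction_resource_id` is a hypothetical helper, and the all-zero subscription ID is a dummy value.

```python
def build_prediction_resource_id(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Assemble the ARM resource ID of a Cognitive Services account."""
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "account_name": account_name,
    }
    for label, value in parts.items():
        # Catch empty values and leftover template placeholders like <sub-id>.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"{label} is empty or still a placeholder: {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
    )


# Dummy subscription ID for illustration only.
print(build_prediction_resource_id(
    "00000000-0000-0000-0000-000000000000",
    "rg-ai102-customvision",
    "cv-prediction-ai102",
))
```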

Expected Output

Created project: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Tags created: apple=..., banana=..., orange=...
Apple images uploaded: True
Banana images uploaded: True

Training model...
Status: Training
Status: Training
Status: Completed
Training complete! Iteration: iter-12345

Performance Metrics:
Precision: 0.9200
Recall: 0.8800
AP (Average Precision): 0.9450
Tag 'apple': precision=0.950, recall=0.900
Tag 'banana': precision=0.890, recall=0.860

Model published as: fruit-model-v1

Prediction results:
apple: 0.9723 (97.2%)
banana: 0.0198 (2.0%)
orange: 0.0079 (0.8%)
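
The classifier returns a probability for every tag, so the caller decides when a result is confident enough to act on. The sketch below picks the highest-scoring tag and returns nothing when it falls below a threshold. It works on plain `(tag, probability)` tuples, using the illustrative values from the output above, so it runs without the SDK. `top_prediction` is a hypothetical helper and the 0.5 default mirrors the threshold in the local-file example.

```python
from typing import List, Optional, Tuple


def top_prediction(
    predictions: List[Tuple[str, float]], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return the most probable (tag, probability), or None if it is below threshold."""
    if not predictions:
        return None
    best = max(predictions, key=lambda item: item[1])
    return best if best[1] >= threshold else None


sample = [("apple", 0.9723), ("banana", 0.0198), ("orange", 0.0079)]
result = top_prediction(sample)
if result is None:
    print("No confident classification")
else:
    tag, probability = result
    print(f"Classified as {tag} ({probability:.1%})")
```

With SDK results, the same function could be fed `[(p.tag_name, p.probability) for p in results.predictions]`.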

Break & Fix

| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| Training fails | BadRequestImageCount | Fewer than 5 images per tag | Add at least 5 images per category |
| Low precision | Many false positives | Training images too similar across tags | Add diverse negative examples; use more distinct images |
| Publishing fails | Resource ID error | Prediction resource ID is malformed | Use the full ARM resource ID path |
| Prediction returns 404 | Iteration not found | Model not published or wrong publish name | Check that publish_iteration succeeded; use the correct name |
| Image upload fails | ImageUrl not reachable | URL not reachable from Azure | Use publicly accessible URLs or upload the file directly |
Knowledge Check

1. What is the difference between Multiclass and Multilabel classification?

2. What does Average Precision (AP) measure in Custom Vision?

3. What is required before making predictions with a trained Custom Vision model?

4. What is the recommended minimum number of training images per tag?

5. How do you consume a published Custom Vision model for predictions?

Cleanup

az group delete --name rg-ai102-customvision --yes --no-wait

Learn More