
Challenge 20: Multi-Model Orchestration

Estimated Time

45-60 min | Cost: ~$3.00 (estimated) | Domain: Generative AI Solutions (15-20%)

Exam skills covered

  • Implement orchestration of multiple generative AI models
  • Deploy models in containers for edge scenarios
  • Implement function calling for tool use

Overview

Production AI systems rarely rely on a single model. Multi-model orchestration routes requests to different models based on task complexity, cost constraints, or capability requirements. For example, a router can send simple classification tasks to GPT-4o-mini (fast, cheap) while directing complex reasoning to GPT-4o (slower, more capable). This pattern optimizes the cost-quality tradeoff across a portfolio of applications.
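The cost side of that tradeoff is simple arithmetic. The sketch below is illustrative only: `blended_cost` is a hypothetical helper, and the costs are made-up relative units rather than real prices.

```python
def blended_cost(simple_share: float, simple_cost: float, complex_cost: float) -> float:
    """Average cost per request when `simple_share` of traffic goes to the cheaper model."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    return simple_share * simple_cost + (1.0 - simple_share) * complex_cost


# Hypothetical relative costs, NOT real prices: the small model costs 1 unit
# per request and the large model costs 10 units per request.
all_complex = blended_cost(0.0, simple_cost=1.0, complex_cost=10.0)
routed = blended_cost(0.7, simple_cost=1.0, complex_cost=10.0)
print(f"no routing: {all_complex:.1f} units | 70% routed to the small model: {routed:.1f} units")
```

Under these assumed numbers, routing 70% of requests to the small model cuts the average cost per request from 10 units to about 3.7. Whether quality holds up on the routed requests has to be measured separately.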

Semantic Kernel is Microsoft's open-source orchestration SDK. It provides abstractions for AI services, plugins (functions the model can call), and planners that decompose complex tasks into steps. It supports Python and C# and integrates natively with Azure OpenAI. Function calling (tool use) lets models use external tools (APIs, databases, or custom code): the application describes the available functions, the model decides when to request a call and with which arguments, and the application executes it.

For edge deployment scenarios, Azure AI containers package models for offline or low-latency operation. Containerized models run inference locally, which suits factory floors, vehicles, or restricted networks where cloud access is limited or prohibited. Standard containers still need periodic access to the billing endpoint to report usage, and they stop serving requests if it stays unreachable.

Architecture

This challenge implements a model router, configures function calling with tools, builds a multi-step orchestration pipeline, and deploys a containerized model endpoint.

[Figure: Challenge 20 topology]

Prerequisites

  • Azure OpenAI resource with GPT-4o and GPT-4o-mini deployed
  • Python 3.9+ with the openai and semantic-kernel packages
  • .NET 8 SDK with the Azure.AI.OpenAI and Microsoft.SemanticKernel NuGet packages
  • Docker Desktop (for the container deployment task)
  • Azure Container Registry (optional, for push)

Implementation

Task 1: Implement a Model Router (Complexity-Based)

import os
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

class ModelRouter:
    """Route requests to appropriate models based on complexity."""

    COMPLEX_MODEL = "gpt-4o"  # For complex reasoning tasks
    SIMPLE_MODEL = "gpt-4o-mini"  # For simple, fast tasks

    COMPLEXITY_INDICATORS = [
        "analyze", "compare", "evaluate", "synthesize",
        "design", "architect", "debug", "explain why",
        "multi-step", "trade-offs", "implications"
    ]

    def classify_complexity(self, message: str) -> str:
        """Determine if a request is simple or complex."""
        message_lower = message.lower()
        complexity_score = sum(
            1 for indicator in self.COMPLEXITY_INDICATORS
            if indicator in message_lower
        )
        # Also consider message length as a heuristic
        if len(message) > 500 or complexity_score >= 2:
            return "complex"
        return "simple"

    def route(self, messages: list, **kwargs) -> dict:
        """Route request to appropriate model."""
        # Classify on the most recent user message
        user_message = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        complexity = self.classify_complexity(user_message)
        model = self.COMPLEX_MODEL if complexity == "complex" else self.SIMPLE_MODEL

        start = time.time()
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )
        latency = time.time() - start

        return {
            "response": response,
            "model_used": model,
            "complexity": complexity,
            "latency_ms": latency * 1000
        }

# Test the router
router = ModelRouter()

# Simple request → routes to GPT-4o-mini
result1 = router.route(
[{"role": "user", "content": "What is Azure?"}],
max_tokens=100
)
print(f"Simple: model={result1['model_used']}, "
f"latency={result1['latency_ms']:.0f}ms")
print(f" Response: {result1['response'].choices[0].message.content[:80]}...\n")

# Complex request → routes to GPT-4o
result2 = router.route(
[{"role": "user", "content": "Analyze the trade-offs between using Azure Functions Consumption plan vs Premium plan. Compare cost implications, cold start behavior, and scaling characteristics for a multi-step data processing pipeline."}],
max_tokens=300
)
print(f"Complex: model={result2['model_used']}, "
f"latency={result2['latency_ms']:.0f}ms")
print(f" Response: {result2['response'].choices[0].message.content[:80]}...")
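The routing heuristic can be exercised without calling the API. The sketch below copies the keyword list and thresholds from the router into a standalone function (an assumption made for easy testing, not part of the challenge code) and runs it on a few inputs, including one that exposes a weakness of substring matching.

```python
COMPLEXITY_INDICATORS = [
    "analyze", "compare", "evaluate", "synthesize",
    "design", "architect", "debug", "explain why",
    "multi-step", "trade-offs", "implications",
]


def classify_complexity(message: str, length_threshold: int = 500) -> str:
    """Same rule as the router: long messages or two or more keyword hits are 'complex'."""
    text = message.lower()
    score = sum(1 for indicator in COMPLEXITY_INDICATORS if indicator in text)
    return "complex" if len(message) > length_threshold or score >= 2 else "simple"


samples = [
    "What is Azure?",                           # no keywords
    "Analyze this log line.",                   # one keyword is below the threshold
    "Compare and evaluate two storage tiers.",  # two keywords
    "We redesigned the debugger.",              # substring hits: "design" and "debug"
]
decisions = {text: classify_complexity(text) for text in samples}
for text, decision in decisions.items():
    print(f"{decision:8} <- {text}")
```

The last sample is routed to the larger model only because "redesigned" and "debugger" happen to contain two indicators. Matching on word boundaries, or using a small classifier model, would be possible refinements.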

Task 2: Implement Function Calling (Tool Use)

import os
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

# Define available tools (functions the model can call)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., 'Seattle, WA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search internal knowledge base for relevant documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "top_k": {
                        "type": "integer",
                        "description": "Number of results to return",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Simulated tool implementations
def get_weather(location: str, unit: str = "celsius") -> str:
    # In production, call a real weather API
    return json.dumps({"location": location, "temperature": 18, "unit": unit, "condition": "partly cloudy"})

def search_documents(query: str, top_k: int = 3) -> str:
    # In production, call Azure AI Search
    return json.dumps({"results": [{"title": f"Doc about {query}", "snippet": f"Information about {query}..."}]})

# Function dispatch table
available_functions = {
    "get_weather": get_weather,
    "search_documents": search_documents
}

# Send request with tools
messages = [
    {"role": "system", "content": "You help users by calling tools when needed."},
    {"role": "user", "content": "What's the weather in Seattle and find docs about Azure Functions?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # Let model decide when to call tools
)

# Process tool calls
response_message = response.choices[0].message
if response_message.tool_calls:
    messages.append(response_message)  # Add assistant's tool call message

    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        print(f"Calling: {function_name}({function_args})")
        function_response = available_functions[function_name](**function_args)

        # Add tool response to messages, linked to the call by its id
        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": function_name,
            "content": function_response
        })

    # Get final response with tool results
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    print(f"\nFinal answer: {final_response.choices[0].message.content}")
else:
    # No tool was requested; the first response already contains the answer
    print(f"\nFinal answer: {response_message.content}")
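The dispatch step above trusts that the model names a known function and emits valid JSON arguments. A more defensive variant might return an error payload to the model instead of raising. The sketch below is one way to do that; `safe_dispatch` is a hypothetical helper name, and the `get_weather` stub stands in for the simulated tool from this task.

```python
import json


def safe_dispatch(name: str, raw_arguments: str, registry: dict) -> str:
    """Run a registered tool and return its result, or a JSON error the model can read."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc.msg}"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return func(**arguments)
    except TypeError as exc:
        # Raised for missing or unexpected parameters. Note that this also
        # hides TypeErrors raised inside the tool itself.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})


def get_weather(location: str, unit: str = "celsius") -> str:
    """Stub tool, standing in for the simulated implementation in this task."""
    return json.dumps({"location": location, "temperature": 18, "unit": unit})


registry = {"get_weather": get_weather}
ok = safe_dispatch("get_weather", '{"location": "Seattle"}', registry)
unknown = safe_dispatch("delete_everything", "{}", registry)
malformed = safe_dispatch("get_weather", '{"location": ', registry)
wrong_param = safe_dispatch("get_weather", '{"city": "Seattle"}', registry)
for result in (ok, unknown, malformed, wrong_param):
    print(result)
```

Returning the error as the tool message content gives the model a chance to correct its call in the next round rather than crashing the application.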

Task 3: Build a Multi-Step Pipeline with Semantic Kernel

import os
import asyncio
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function
from semantic_kernel.connectors.ai.open_ai import AzureChatPromptExecutionSettings

# Initialize Semantic Kernel
kernel = Kernel()

# Add Azure OpenAI service
kernel.add_service(
    AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_KEY"],
    )
)

# Define plugins (native functions)
class TextAnalysisPlugin:
    """Plugin for text analysis tasks."""

    @kernel_function(description="Summarize text into key points")
    async def summarize(self, input: str) -> str:
        settings = AzureChatPromptExecutionSettings(max_tokens=200, temperature=0)
        result = await kernel.invoke_prompt(
            f"Summarize the following text into 3 bullet points:\n\n{input}",
            settings=settings
        )
        return str(result)

    @kernel_function(description="Extract action items from text")
    async def extract_actions(self, input: str) -> str:
        settings = AzureChatPromptExecutionSettings(max_tokens=200, temperature=0)
        result = await kernel.invoke_prompt(
            f"Extract action items from this text as a numbered list:\n\n{input}",
            settings=settings
        )
        return str(result)

    @kernel_function(description="Determine sentiment of text")
    async def analyze_sentiment(self, input: str) -> str:
        settings = AzureChatPromptExecutionSettings(max_tokens=50, temperature=0)
        result = await kernel.invoke_prompt(
            f"What is the sentiment of this text? Reply with: positive, negative, or neutral.\n\n{input}",
            settings=settings
        )
        return str(result)

# Register plugin
kernel.add_plugin(TextAnalysisPlugin(), plugin_name="TextAnalysis")

async def multi_step_pipeline(document: str):
    """Execute a multi-step analysis pipeline."""
    print("=== Multi-Step Analysis Pipeline ===\n")

    # Step 1: Summarize
    print("Step 1: Summarizing...")
    summary_fn = kernel.get_function("TextAnalysis", "summarize")
    summary = await kernel.invoke(summary_fn, input=document)
    print(f"Summary:\n{summary}\n")

    # Step 2: Extract actions
    print("Step 2: Extracting action items...")
    actions_fn = kernel.get_function("TextAnalysis", "extract_actions")
    actions = await kernel.invoke(actions_fn, input=document)
    print(f"Actions:\n{actions}\n")

    # Step 3: Sentiment analysis
    print("Step 3: Analyzing sentiment...")
    sentiment_fn = kernel.get_function("TextAnalysis", "analyze_sentiment")
    sentiment = await kernel.invoke(sentiment_fn, input=document)
    print(f"Sentiment: {sentiment}")

    return {"summary": str(summary), "actions": str(actions), "sentiment": str(sentiment)}

# Run the pipeline
document = """
Meeting notes from Q4 planning:
The team agreed to migrate the data pipeline to Azure Data Factory by end of January.
Performance has been excellent this quarter with 99.9% uptime. However, we need to
address the rising costs in the compute cluster. Sarah will investigate spot instances.
Mike will prepare the migration plan document by next Friday.
"""

asyncio.run(multi_step_pipeline(document))
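All three steps in this pipeline read the original document and none consumes another step's output, so they could also run concurrently. The sketch below shows that shape with `asyncio.gather`. The `fake_*` coroutines are hypothetical stand-ins so the example runs without Azure credentials; with Semantic Kernel, each would presumably wrap a `kernel.invoke(...)` call, and service rate limits may make sequential execution preferable.

```python
import asyncio


async def run_steps_concurrently(document: str, steps: dict) -> dict:
    """Start every step on the same document and collect results by step name."""
    names = list(steps)
    results = await asyncio.gather(*(steps[name](document) for name in names))
    return dict(zip(names, results))


# Hypothetical stand-ins for the three plugin functions.
async def fake_summarize(text: str) -> str:
    await asyncio.sleep(0.01)  # simulates waiting on the model
    return f"summary of {len(text)} characters"


async def fake_extract_actions(text: str) -> str:
    await asyncio.sleep(0.01)
    return "1. placeholder action"


async def fake_sentiment(text: str) -> str:
    await asyncio.sleep(0.01)
    return "neutral"


results = asyncio.run(
    run_steps_concurrently(
        "Meeting notes from Q4 planning",
        {
            "summary": fake_summarize,
            "actions": fake_extract_actions,
            "sentiment": fake_sentiment,
        },
    )
)
print(results)
```

`asyncio.gather` returns results in the order the coroutines were passed, which is why zipping them back onto the step names is safe.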

Task 4: Deploy a Containerized Model Endpoint

import os
import subprocess
import requests

# Deploy a containerized Azure AI model for edge/offline scenarios
# This example uses Azure AI containers for text analytics

# Step 1: Pull the container image
# docker pull mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment:latest

# Step 2: Run locally with configuration
container_config = {
    "image": "mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment:latest",
    "ports": {"5000/tcp": 5000},
    "environment": {
        "Eula": "accept",
        "Billing": os.environ["AZURE_AI_ENDPOINT"],
        "ApiKey": os.environ["AZURE_AI_KEY"]
    }
}

# Start container (equivalent Docker command shown)
# The name ai-sentiment matches the cleanup step. The key is printed as a
# placeholder so the secret never reaches the terminal or logs.
print("Starting container...")
print("docker run -d --name ai-sentiment -p 5000:5000 \\")
print("  -e Eula=accept \\")
print(f"  -e Billing={container_config['environment']['Billing']} \\")
print("  -e ApiKey=<key> \\")
print(f"  {container_config['image']}")

# Step 3: Call the containerized endpoint
def analyze_sentiment_local(text: str, port: int = 5000) -> dict:
    """Call the local container endpoint."""
    response = requests.post(
        f"http://localhost:{port}/text/analytics/v3.1/sentiment",
        json={
            "documents": [
                {"id": "1", "language": "en", "text": text}
            ]
        },
        timeout=30
    )
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

# Test the container
# result = analyze_sentiment_local("Azure AI services are excellent and easy to use!")
# print(f"Sentiment: {result['documents'][0]['sentiment']}")

# Step 4: For Azure OpenAI proxy pattern (APIM or custom gateway)
from openai import AzureOpenAI

# Custom base URL pointing to local container or edge gateway
edge_client = AzureOpenAI(
    azure_endpoint="http://localhost:8080",  # Local proxy
    api_key="local-key",
    api_version="2024-10-21"
)

print("\nEdge deployment pattern configured")
print("Container runs independently of cloud connectivity")
print("Billing endpoint required for meter reporting only")
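A container usually needs some time after `docker run` before it accepts requests, so a client may want to poll before sending work. The sketch below is a minimal polling helper. `wait_until_ready` and `http_probe` are hypothetical names, and the `/ready` path mentioned in the closing note is an assumption to verify against the documentation for the image in use. The demo uses a fake probe so it runs without a container.

```python
import time
import urllib.request


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts derive from OSError
        return False


def wait_until_ready(probe, attempts: int = 30, delay_s: float = 2.0, sleep=time.sleep) -> int:
    """Call `probe()` until it returns True; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # no sleep after the final failed attempt
    raise TimeoutError(f"not ready after {attempts} attempts")


# Demo with a fake probe that succeeds on the third call (no container needed).
answers = iter([False, False, True])
attempts_needed = wait_until_ready(lambda: next(answers), sleep=lambda _seconds: None)
print(f"ready after {attempts_needed} attempts")
```

Against a real container the call might look like `wait_until_ready(lambda: http_probe("http://localhost:5000/ready"))`, placed before the first call to `analyze_sentiment_local`.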

Expected Output

Simple: model=gpt-4o-mini, latency=285ms
Response: Azure is Microsoft's cloud computing platform that provides a wide range of...

Complex: model=gpt-4o, latency=1250ms
Response: When comparing Azure Functions Consumption and Premium plans, several key trade...

Calling: get_weather({'location': 'Seattle', 'unit': 'celsius'})
Calling: search_documents({'query': 'Azure Functions'})

Final answer: The weather in Seattle is currently 18°C and partly cloudy. I also found
documentation about Azure Functions in our knowledge base...

=== Multi-Step Analysis Pipeline ===
Step 1: Summarizing...
Summary:
• Migration to Azure Data Factory planned for end of January
• Excellent Q4 performance with 99.9% uptime
• Rising compute costs need investigation (spot instances)

Step 2: Extracting action items...
Actions:
1. Sarah: Investigate spot instances for compute cluster
2. Mike: Prepare migration plan document by next Friday

Step 3: Analyzing sentiment...
Sentiment: positive

Break & Fix

Scenario | Symptom | Root Cause | Fix
Function not called | Model answers directly without a tool call | Function description is unclear or irrelevant | Improve the function descriptions; use tool_choice: "required"
Infinite tool loop | Model keeps calling the same function | No termination condition | Limit the number of tool-call rounds; add "done" logic
Semantic Kernel plugin error | FunctionNotFound exception | Plugin not registered or function name is wrong | Check the add_plugin() call and that the function name matches
Container fails to start | Missing Eula=accept error | EULA not accepted | Set the environment variable Eula=accept
Container billing error | Container stops after 10-15 min | Billing endpoint unreachable | Make sure the Billing URL is reachable; check the network
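The "limit tool-call rounds" fix can be sketched as a loop with a hard cap. The example below is illustrative only: it uses plain dictionaries and fake model functions (all hypothetical) in place of the Azure OpenAI client, so the control flow can be run and tested offline.

```python
def run_tool_loop(call_model, execute_tool, messages: list, max_rounds: int = 3) -> str:
    """Alternate model calls and tool execution, giving up after `max_rounds` rounds."""
    for _ in range(max_rounds):
        reply = call_model(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # the model produced a final answer
        messages.append(reply)
        for call in tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call),
            })
    raise RuntimeError(f"no final answer after {max_rounds} tool rounds")


def looping_model(messages: list) -> dict:
    """Fake model that always asks for the same tool (the failure case)."""
    return {"role": "assistant", "content": None,
            "tool_calls": [{"id": "call_1", "name": "get_weather", "arguments": "{}"}]}


def polite_model(messages: list) -> dict:
    """Fake model that answers as soon as it has seen one tool result."""
    if any(message["role"] == "tool" for message in messages):
        return {"role": "assistant", "content": "It is 18 degrees.", "tool_calls": None}
    return looping_model(messages)


def fake_tool(call: dict) -> str:
    return '{"temperature": 18}'


answer = run_tool_loop(polite_model, fake_tool, [{"role": "user", "content": "Weather?"}])
print(answer)

try:
    run_tool_loop(looping_model, fake_tool, [{"role": "user", "content": "Weather?"}])
    loop_was_stopped = False
except RuntimeError as error:
    loop_was_stopped = True
    print(f"stopped: {error}")
```

With the real client, `call_model` would wrap `client.chat.completions.create(...)` with the `tools` list, and the cap turns a runaway loop into an explicit error that the application can log or handle.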

Knowledge Check

1. In Azure OpenAI function calling, what does the model return when it decides to use a tool?

2. What is the main advantage of using a model router in multi-model architectures?

3. Which environment variable is required for Azure AI containers to work correctly?

4. In Semantic Kernel, what is a 'plugin'?

5. When implementing function calling, what happens after the application executes the function and returns the results?

Cleanup

# Stop and remove containers
docker stop ai-sentiment && docker rm ai-sentiment

# Delete Azure resources
az group delete --name rg-ai102-challenge20 --yes --no-wait

Learn More