
Challenge 20: Multi-Model Orchestration

Estimated Time

45-60 min | Cost: ~$3.00 (estimated) | Domain: Generative AI Solutions (15-20%)

Exam skills covered

  • Implement orchestration of multiple generative AI models
  • Deploy models in containers for edge scenarios
  • Implement function calling for tool use

Overview

Production AI systems rarely rely on a single model. Multi-model orchestration routes requests to different models based on task complexity, cost constraints, or capability requirements. For example, a router might send simple classification tasks to GPT-4o-mini (fast, cheap) while directing complex reasoning to GPT-4o (slower, more capable). This pattern optimizes the cost-quality tradeoff across an application portfolio.
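To make the cost side of that tradeoff concrete, here is a small arithmetic sketch. The per-request prices are hypothetical placeholders, not published Azure OpenAI rates, and the helper name `blended_cost_per_1k_requests` is introduced only for this illustration.

```python
def blended_cost_per_1k_requests(
    simple_fraction: float,
    simple_cost: float,
    complex_cost: float,
) -> float:
    """Expected cost of 1,000 requests when a router splits traffic.

    simple_fraction is the share of requests sent to the cheaper model;
    the two cost arguments are average costs per request in USD.
    """
    if not 0.0 <= simple_fraction <= 1.0:
        raise ValueError("simple_fraction must be between 0 and 1")
    per_request = (
        simple_fraction * simple_cost
        + (1.0 - simple_fraction) * complex_cost
    )
    return per_request * 1000


# Hypothetical per-request costs in USD; substitute your own measured figures.
SIMPLE_COST = 0.0002
COMPLEX_COST = 0.004

all_complex = blended_cost_per_1k_requests(0.0, SIMPLE_COST, COMPLEX_COST)
routed = blended_cost_per_1k_requests(0.8, SIMPLE_COST, COMPLEX_COST)
print(f"All traffic on the larger model: ${all_complex:.2f} per 1,000 requests")
print(f"80% routed to the smaller model: ${routed:.2f} per 1,000 requests")
```

The saving depends entirely on how much traffic the router can safely classify as simple, so the routing fraction is worth measuring on real traffic before relying on an estimate like this.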

Semantic Kernel is Microsoft's open-source orchestration SDK that provides abstractions for AI services, plugins (functions the model can call), and planners that decompose complex tasks into steps. It supports both Python and C#, integrating natively with Azure OpenAI. Function calling (tool use) enables models to invoke external tools—APIs, databases, or custom code—by describing available functions and letting the model decide when and how to call them.

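For edge deployment scenarios, Azure AI containers package models to run on local infrastructure for low-latency or data-residency-constrained operation. Inference happens inside the container, which suits manufacturing floors, vehicles, or restricted networks where cloud access is limited. Standard containers still need to reach their billing endpoint to report usage; fully offline operation requires the separately approved disconnected-container offering.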

Architecture

This challenge implements a model router, configures function calling with tools, builds a multi-step orchestration pipeline, and deploys a containerized model endpoint.

[Figure: Challenge 20 topology]

Prerequisites

  • Azure OpenAI resource with both GPT-4o and GPT-4o-mini deployed
  • Python 3.10+ with openai, semantic-kernel, and requests packages
  • .NET 8 SDK with Azure.AI.OpenAI, Microsoft.SemanticKernel NuGet packages
  • Docker Desktop (for container deployment task)
  • Azure Container Registry (optional, for push)

Implementation

Task 1: Implement Model Router (Complexity-Based)

import os
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

class ModelRouter:
    """Route requests to appropriate models based on complexity."""

    COMPLEX_MODEL = "gpt-4o"       # For complex reasoning tasks
    SIMPLE_MODEL = "gpt-4o-mini"   # For simple, fast tasks

    COMPLEXITY_INDICATORS = [
        "analyze", "compare", "evaluate", "synthesize",
        "design", "architect", "debug", "explain why",
        "multi-step", "trade-offs", "implications"
    ]

    def classify_complexity(self, message: str) -> str:
        """Determine if a request is simple or complex."""
        message_lower = message.lower()
        complexity_score = sum(
            1 for indicator in self.COMPLEXITY_INDICATORS
            if indicator in message_lower
        )
        # Also consider message length as a heuristic
        if len(message) > 500 or complexity_score >= 2:
            return "complex"
        return "simple"

    def route(self, messages: list, **kwargs) -> dict:
        """Route request to appropriate model."""
        # Classify on the most recent user message
        user_message = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        complexity = self.classify_complexity(user_message)
        model = self.COMPLEX_MODEL if complexity == "complex" else self.SIMPLE_MODEL

        start = time.time()
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )
        latency = time.time() - start

        return {
            "response": response,
            "model_used": model,
            "complexity": complexity,
            "latency_ms": latency * 1000
        }

# Test the router
router = ModelRouter()

# Simple request → routes to GPT-4o-mini
result1 = router.route(
    [{"role": "user", "content": "What is Azure?"}],
    max_tokens=100
)
print(f"Simple: model={result1['model_used']}, "
      f"latency={result1['latency_ms']:.0f}ms")
print(f" Response: {result1['response'].choices[0].message.content[:80]}...\n")

# Complex request → routes to GPT-4o
result2 = router.route(
    [{"role": "user", "content": "Analyze the trade-offs between using Azure Functions Consumption plan vs Premium plan. Compare cost implications, cold start behavior, and scaling characteristics for a multi-step data processing pipeline."}],
    max_tokens=300
)
print(f"Complex: model={result2['model_used']}, "
      f"latency={result2['latency_ms']:.0f}ms")
print(f" Response: {result2['response'].choices[0].message.content[:80]}...")
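The routing heuristic can be checked without calling any model. The sketch below re-implements the same keyword-and-length rule as standalone functions (`matched_indicators` and `classify` are names introduced here for illustration, not part of the challenge code) so you can see which indicators fire for a given prompt.

```python
COMPLEXITY_INDICATORS = [
    "analyze", "compare", "evaluate", "synthesize",
    "design", "architect", "debug", "explain why",
    "multi-step", "trade-offs", "implications",
]


def matched_indicators(message: str) -> list[str]:
    """Return the indicators found in the message (case-insensitive substring match)."""
    lowered = message.lower()
    return [ind for ind in COMPLEXITY_INDICATORS if ind in lowered]


def classify(message: str, length_threshold: int = 500, score_threshold: int = 2) -> str:
    """Apply the same rule as the router: long messages or 2+ indicators are complex."""
    score = len(matched_indicators(message))
    if len(message) > length_threshold or score >= score_threshold:
        return "complex"
    return "simple"


simple_prompt = "What is Azure?"
complex_prompt = (
    "Analyze the trade-offs between using Azure Functions Consumption plan vs "
    "Premium plan. Compare cost implications, cold start behavior, and scaling "
    "characteristics for a multi-step data processing pipeline."
)
# Substring matching also fires inside longer words ("architect" in "architecture").
short_but_flagged = "Show the architecture design"

for prompt in (simple_prompt, complex_prompt, short_but_flagged):
    print(classify(prompt), matched_indicators(prompt))
```

Because the match is on substrings, a short prompt such as the third one is still routed to the larger model. Whether that is acceptable depends on your traffic; a trained classifier or word-boundary matching would be possible refinements.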

Task 2: Implement Function Calling (Tool Use)

import os
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

# Define available tools (functions the model can call)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., 'Seattle, WA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search internal knowledge base for relevant documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "top_k": {
                        "type": "integer",
                        "description": "Number of results to return",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Simulated tool implementations
def get_weather(location: str, unit: str = "celsius") -> str:
    # In production, call a real weather API
    return json.dumps({"location": location, "temperature": 18, "unit": unit, "condition": "partly cloudy"})

def search_documents(query: str, top_k: int = 3) -> str:
    # In production, call Azure AI Search
    return json.dumps({"results": [{"title": f"Doc about {query}", "snippet": f"Information about {query}..."}]})

# Function dispatch table
available_functions = {
    "get_weather": get_weather,
    "search_documents": search_documents
}

# Send request with tools
messages = [
    {"role": "system", "content": "You help users by calling tools when needed."},
    {"role": "user", "content": "What's the weather in Seattle and find docs about Azure Functions?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # Let model decide when to call tools
)

# Process tool calls
response_message = response.choices[0].message
if response_message.tool_calls:
    messages.append(response_message)  # Add assistant's tool call message

    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        print(f"Calling: {function_name}({function_args})")
        function_response = available_functions[function_name](**function_args)

        # Add tool response to messages, linked to the call by its ID
        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": function_name,
            "content": function_response
        })

    # Get final response with tool results
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    print(f"\nFinal answer: {final_response.choices[0].message.content}")
else:
    # The model answered directly without requesting a tool
    print(f"\nFinal answer: {response_message.content}")
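The dispatch step above trusts the model to name a registered function and to produce valid JSON arguments. A more defensive variant might look like the following sketch. `dispatch_tool_call` and `demo_get_weather` are hypothetical names used only here, and returning a JSON error string to the model (rather than raising) is one possible design, not the only one.

```python
import json
from typing import Callable, Dict


def dispatch_tool_call(
    name: str,
    arguments_json: str,
    registry: Dict[str, Callable[..., str]],
) -> str:
    """Run a registered tool and return its result, or a JSON error string."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return func(**args)
    except TypeError as exc:
        # Typically a missing or unexpected keyword argument.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})


# Stand-in tool for demonstration; not the challenge's get_weather.
def demo_get_weather(location: str, unit: str = "celsius") -> str:
    return json.dumps({"location": location, "temperature": 18, "unit": unit})


demo_registry = {"get_weather": demo_get_weather}

print(dispatch_tool_call("get_weather", '{"location": "Seattle"}', demo_registry))
print(dispatch_tool_call("delete_database", "{}", demo_registry))
print(dispatch_tool_call("get_weather", '{"city": "Seattle"}', demo_registry))
```

Passing the error text back as the tool message content gives the model a chance to correct its call on the next turn. Note that catching `TypeError` this broadly would also hide a `TypeError` raised inside the tool itself.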

Task 3: Build Multi-Step Pipeline with Semantic Kernel

import os
import asyncio
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureChatPromptExecutionSettings,
)
from semantic_kernel.functions import kernel_function

# Initialize Semantic Kernel
kernel = Kernel()

# Add Azure OpenAI service
kernel.add_service(
    AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_KEY"],
    )
)

# Define plugins (native functions)
class TextAnalysisPlugin:
    """Plugin for text analysis tasks."""

    @kernel_function(description="Summarize text into key points")
    async def summarize(self, input: str) -> str:
        settings = AzureChatPromptExecutionSettings(max_tokens=200, temperature=0)
        result = await kernel.invoke_prompt(
            f"Summarize the following text into 3 bullet points:\n\n{input}",
            settings=settings
        )
        return str(result)

    @kernel_function(description="Extract action items from text")
    async def extract_actions(self, input: str) -> str:
        settings = AzureChatPromptExecutionSettings(max_tokens=200, temperature=0)
        result = await kernel.invoke_prompt(
            f"Extract action items from this text as a numbered list:\n\n{input}",
            settings=settings
        )
        return str(result)

    @kernel_function(description="Determine sentiment of text")
    async def analyze_sentiment(self, input: str) -> str:
        settings = AzureChatPromptExecutionSettings(max_tokens=50, temperature=0)
        result = await kernel.invoke_prompt(
            f"What is the sentiment of this text? Reply with: positive, negative, or neutral.\n\n{input}",
            settings=settings
        )
        return str(result)

# Register plugin
kernel.add_plugin(TextAnalysisPlugin(), plugin_name="TextAnalysis")

async def multi_step_pipeline(document: str):
    """Execute a multi-step analysis pipeline."""
    print("=== Multi-Step Analysis Pipeline ===\n")

    # Step 1: Summarize
    print("Step 1: Summarizing...")
    summary_fn = kernel.get_function("TextAnalysis", "summarize")
    summary = await kernel.invoke(summary_fn, input=document)
    print(f"Summary:\n{summary}\n")

    # Step 2: Extract actions
    print("Step 2: Extracting action items...")
    actions_fn = kernel.get_function("TextAnalysis", "extract_actions")
    actions = await kernel.invoke(actions_fn, input=document)
    print(f"Actions:\n{actions}\n")

    # Step 3: Sentiment analysis
    print("Step 3: Analyzing sentiment...")
    sentiment_fn = kernel.get_function("TextAnalysis", "analyze_sentiment")
    sentiment = await kernel.invoke(sentiment_fn, input=document)
    print(f"Sentiment: {sentiment}")

    return {"summary": str(summary), "actions": str(actions), "sentiment": str(sentiment)}

# Run the pipeline
document = """
Meeting notes from Q4 planning:
The team agreed to migrate the data pipeline to Azure Data Factory by end of January.
Performance has been excellent this quarter with 99.9% uptime. However, we need to
address the rising costs in the compute cluster. Sarah will investigate spot instances.
Mike will prepare the migration plan document by next Friday.
"""

asyncio.run(multi_step_pipeline(document))
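The three steps in this pipeline each read the original document and none consumes another's output, so they could in principle run concurrently. The sketch below shows that idea with `asyncio.gather`, using stub coroutines in place of the Semantic Kernel functions so it runs without credentials. `run_independent_steps` and the `fake_*` stubs are names invented for this illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def run_independent_steps(
    document: str,
    steps: Dict[str, Callable[[str], Awaitable[str]]],
) -> Dict[str, str]:
    """Run every step on the same document concurrently and collect results by name."""
    names = list(steps)
    results = await asyncio.gather(*(steps[name](document) for name in names))
    return dict(zip(names, results))


# Stub steps standing in for model-backed functions.
async def fake_summarize(text: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"summary of {len(text)} characters"


async def fake_extract_actions(text: str) -> str:
    await asyncio.sleep(0.01)
    return "1. placeholder action"


async def fake_sentiment(text: str) -> str:
    await asyncio.sleep(0.01)
    return "neutral"


results = asyncio.run(
    run_independent_steps(
        "Meeting notes from Q4 planning",
        {
            "summary": fake_summarize,
            "actions": fake_extract_actions,
            "sentiment": fake_sentiment,
        },
    )
)
print(results)
```

With the kernel from this task, each step might be wrapped as something like `lambda doc: kernel.invoke(summary_fn, input=doc)`; that wiring is an assumption and is not exercised here. Concurrent calls also consume deployment rate limits faster, and steps that depend on an earlier step's output still need to run in sequence.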

Task 4: Deploy Containerized Model Endpoint

import os
import subprocess
import requests

# Deploy a containerized Azure AI model for edge/offline scenarios
# This example uses Azure AI containers for text analytics

# Step 1: Pull the container image
# docker pull mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment:latest

# Step 2: Run locally with configuration
container_config = {
    "image": "mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment:latest",
    "name": "ai-sentiment",  # Matches the container name used in Cleanup
    "ports": {"5000/tcp": 5000},
    "environment": {
        "Eula": "accept",
        "Billing": os.environ["AZURE_AI_ENDPOINT"],
        "ApiKey": os.environ["AZURE_AI_KEY"]
    }
}

# Start container (equivalent Docker command shown)
# The key is printed as a placeholder so the secret is not echoed to logs
print("Starting container...")
print("docker run -d --name ai-sentiment -p 5000:5000 \\")
print("  -e Eula=accept \\")
print(f"  -e Billing={os.environ.get('AZURE_AI_ENDPOINT', '<endpoint>')} \\")
print("  -e ApiKey=<key> \\")
print("  mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment:latest")

# Step 3: Call the containerized endpoint
def analyze_sentiment_local(text: str, port: int = 5000) -> dict:
    """Call the local container endpoint."""
    response = requests.post(
        f"http://localhost:{port}/text/analytics/v3.1/sentiment",
        json={
            "documents": [
                {"id": "1", "language": "en", "text": text}
            ]
        },
        timeout=30
    )
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

# Test the container
# result = analyze_sentiment_local("Azure AI services are excellent and easy to use!")
# print(f"Sentiment: {result['documents'][0]['sentiment']}")

# Step 4: For Azure OpenAI proxy pattern (APIM or custom gateway)
from openai import AzureOpenAI

# Custom base URL pointing to local container or edge gateway
edge_client = AzureOpenAI(
    azure_endpoint="http://localhost:8080",  # Local proxy
    api_key="local-key",
    api_version="2024-10-21"
)

print("\nEdge deployment pattern configured")
print("Container serves inference requests locally")
print("Billing endpoint must remain reachable for usage metering")
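A container usually needs some time after `docker run` before it can answer requests, so a client may want to poll for readiness first. The sketch below assumes the container exposes a `GET /ready` probe on the mapped port, as Azure AI containers generally do. `http_ready` and `wait_until_ready` are helper names introduced here, and the retry counts are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Example usage against the local container (not executed here):
# if wait_until_ready(lambda: http_ready("http://localhost:5000/ready")):
#     print("Container is ready")
# else:
#     print("Container did not become ready in time")
```

Taking the probe as a parameter keeps the retry logic testable without a running container, and lets the same loop be reused for a different port or health endpoint.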

Expected Output

Simple: model=gpt-4o-mini, latency=285ms
Response: Azure is Microsoft's cloud computing platform that provides a wide range of...

Complex: model=gpt-4o, latency=1250ms
Response: When comparing Azure Functions Consumption and Premium plans, several key trade...

Calling: get_weather({'location': 'Seattle', 'unit': 'celsius'})
Calling: search_documents({'query': 'Azure Functions'})

Final answer: The weather in Seattle is currently 18°C and partly cloudy. I also found
documentation about Azure Functions in our knowledge base...

=== Multi-Step Analysis Pipeline ===
Step 1: Summarizing...
Summary:
• Migration to Azure Data Factory planned for end of January
• Excellent Q4 performance with 99.9% uptime
• Rising compute costs need investigation (spot instances)

Step 2: Extracting action items...
Actions:
1. Sarah: Investigate spot instances for compute cluster
2. Mike: Prepare migration plan document by next Friday

Step 3: Analyzing sentiment...
Sentiment: positive

Break & fix

Scenario | Symptom | Root Cause | Fix
Function not called | Model responds directly without tool call | Function description unclear or not relevant | Improve function descriptions; use tool_choice: "required"
Infinite tool loop | Model keeps calling same function | No termination condition | Limit tool call rounds; add "done" logic
Semantic Kernel plugin error | FunctionNotFound exception | Plugin not registered or wrong function name | Verify add_plugin() call and function name matches
Container fails to start | Eula=accept missing error | EULA not accepted | Set Eula=accept environment variable
Container billing error | Container stops after 10-15 min | Billing endpoint unreachable | Ensure Billing URL is accessible; check network
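For the infinite tool loop scenario, one way to limit tool call rounds is a bounded loop that raises once the budget is spent. The sketch below uses plain dictionaries as a simplified stand-in for the SDK's message objects and injects the model call as a function, so it runs offline. `run_tool_loop`, `looping_model`, `well_behaved_model`, and `fake_tool` are all hypothetical names for this illustration.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_tool_loop(
    call_model: Callable[[List[Message]], Message],
    execute_tool: Callable[[str, str], str],
    messages: List[Message],
    max_rounds: int = 5,
) -> str:
    """Alternate model calls and tool executions, stopping after max_rounds."""
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No tool requested: this is the final answer ("done" condition).
            return reply.get("content") or ""
        for call in tool_calls:
            result = execute_tool(call["name"], call["arguments"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    raise RuntimeError(f"tool loop exceeded {max_rounds} rounds without a final answer")


def fake_tool(name: str, arguments: str) -> str:
    return '{"temperature": 18}'


def _weather_call(messages: List[Message]) -> Message:
    call = {"id": f"call_{len(messages)}", "name": "get_weather",
            "arguments": '{"location": "Seattle"}'}
    return {"role": "assistant", "content": None, "tool_calls": [call]}


def looping_model(messages: List[Message]) -> Message:
    return _weather_call(messages)  # always asks for the tool again


def well_behaved_model(messages: List[Message]) -> Message:
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "It is 18 degrees in Seattle."}
    return _weather_call(messages)


answer = run_tool_loop(well_behaved_model, fake_tool,
                       [{"role": "user", "content": "Weather in Seattle?"}])
try:
    run_tool_loop(looping_model, fake_tool,
                  [{"role": "user", "content": "Weather in Seattle?"}], max_rounds=3)
    loop_stopped = False
except RuntimeError:
    loop_stopped = True
print(answer, loop_stopped)
```

Raising on exhaustion makes the failure visible to the caller; returning a fallback message instead would be an equally reasonable choice for user-facing applications.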

Knowledge Check

1. In Azure OpenAI function calling, what does the model return when it decides to use a tool?

2. What is the primary advantage of using a model router in multi-model architectures?

3. Which environment variable is required for Azure AI containers to function correctly?

4. In Semantic Kernel, what is a 'plugin'?

5. When implementing function calling, what happens after the application executes the function and returns results?

Cleanup

# Stop and remove containers
docker stop ai-sentiment && docker rm ai-sentiment

# Delete Azure resources
az group delete --name rg-ai102-challenge20 --yes --no-wait

Learn More