第二节:OpenAISDK与AzureAIFoundry的集成


文档摘要

第二节:OpenAI SDK与Azure AI Foundry的集成 概述 在您掌握Foundry Local基础知识的基础上,本节课将重点介绍高级OpenAI SDK集成模式,这些模式能够无缝支持Microsoft Foundry Local和Azure OpenAI。您将学习如何构建灵活的AI应用程序,这些应用程序既可以在本地运行以确保隐私和开发需求,又可以通过Azure OpenAI实现云端扩展。

第二节:OpenAI SDK与Azure AI Foundry的集成

概述

在您掌握Foundry Local基础知识的基础上,本节课将重点介绍高级OpenAI SDK集成模式,这些模式能够无缝支持Microsoft Foundry Local和Azure OpenAI。您将学习如何构建灵活的AI应用程序,这些应用程序既可以在本地运行以确保隐私和开发需求,又可以通过Azure OpenAI实现云端扩展。

学习目标

完成本节课程后,您将能够:

  • 掌握Foundry Local和Azure OpenAI的高级OpenAI SDK集成技术
  • 实现流式响应以提升用户体验
  • 创建支持多提供商的强大客户端工厂模式
  • 构建具有上下文保留功能的对话管理系统
  • 建立性能基准测试和健康监控机制
  • 部署具有完善错误处理的生产级应用程序

前置条件

  • 完成第一节课程:Foundry Local入门
  • 已安装并运行模型的Foundry Local环境
  • Python 3.8或更高版本,支持虚拟环境
  • 已安装OpenAI Python SDK(pip install openai foundry-local-sdk
  • Azure账户并启用OpenAI服务(可选,用于云端场景)
  • 基本了解Python的async/await模式

第一部分:OpenAI SDK客户端工厂模式

理解多提供商架构

构建同时支持Foundry Local和Azure OpenAI的应用程序需要灵活的客户端创建模式:

# sdk_integration.py - Sample 02 pattern import os from openai import OpenAI from typing import Tuple try: from foundry_local import FoundryLocalManager FOUNDRY_SDK_AVAILABLE = True except ImportError: FOUNDRY_SDK_AVAILABLE = False def create_azure_client() -> Tuple[OpenAI, str]: """Create Azure OpenAI client.""" azure_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT") azure_api_key = os.environ.get("AZURE_OPENAI_API_KEY") azure_api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-08-01-preview") if not azure_endpoint or not azure_api_key: raise ValueError("Azure OpenAI endpoint and API key are required") model = os.environ.get("MODEL", "your-deployment-name") client = OpenAI( base_url=f"{azure_endpoint}/openai", api_key=azure_api_key, default_query={"api-version": azure_api_version}, ) print(f" Azure OpenAI client created with model: {model}") return client, model def create_foundry_client() -> Tuple[OpenAI, str]: """Create Foundry Local client with SDK management.""" alias = os.environ.get("MODEL", "phi-4-mini") if FOUNDRY_SDK_AVAILABLE: try: # Use FoundryLocalManager for proper service management manager = FoundryLocalManager(alias) model_info = manager.get_model_info(alias) # Configure OpenAI client to use local Foundry service client = OpenAI( base_url=manager.endpoint, api_key=manager.api_key ) print(f" Foundry Local SDK initialized with model: {model_info.id}") return client, model_info.id except Exception as e: print(f"⚠️ Could not use Foundry SDK ({e}), falling back to manual configuration") # Fallback to manual configuration base_url = os.environ.get("BASE_URL", "http://localhost:8000") api_key = os.environ.get("API_KEY", "") client = OpenAI( base_url=f"{base_url}/v1", api_key=api_key ) print(f" Manual configuration with model: {alias}") return client, alias def initialize_client() -> Tuple[OpenAI, str, str]: """Initialize the appropriate OpenAI client.""" # Check for Azure OpenAI configuration azure_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT") azure_api_key = os.environ.get("AZURE_OPENAI_API_KEY") if azure_endpoint and azure_api_key: try: client, model = create_azure_client() return client, model, "azure" except Exception as e: print(f"❌ Azure OpenAI initialization failed: {e}") print(" Falling back to Foundry Local...") # Use Foundry Local client, model = create_foundry_client() return client, model, "foundry"

第二部分:流式响应与实时交互

实现流式聊天补全

流式响应通过实时显示生成的内容提供更好的用户体验:

# streaming_chat.py - Following Sample 02 patterns def streaming_chat_completion(client: OpenAI, model: str, prompt: str, max_tokens: int = 300): """Demonstrate streaming responses for better UX.""" try: print(" Assistant (streaming):") # Create streaming completion stream = client.chat.completions.create( model=model, messages=[{"role": "user", "content": prompt}], max_tokens=max_tokens, stream=True ) full_response = "" for chunk in stream: if chunk.choices[0].delta.content is not None: content = chunk.choices[0].delta.content print(content, end="", flush=True) full_response += content print("\n") # New line after streaming return full_response except Exception as e: error_msg = f"Error: {e}" print(error_msg) return error_msg # Usage example client, model, provider = initialize_client() prompt = "Explain the key benefits of using Microsoft Foundry Local for AI development." response = streaming_chat_completion(client, model, prompt)

多轮对话管理

# conversation_manager.py class ConversationManager: """Manages multi-turn conversations with context preservation.""" def __init__(self, client: OpenAI, model: str, system_prompt: str = None): self.client = client self.model = model self.messages = [] if system_prompt: self.messages.append({"role": "system", "content": system_prompt}) def send_message(self, user_message: str, max_tokens: int = 200, stream: bool = False): """Send a message and get response while maintaining context.""" # Add user message to conversation self.messages.append({"role": "user", "content": user_message}) try: if stream: return self._stream_response(max_tokens) else: return self._regular_response(max_tokens) except Exception as e: return f"Error: {e}" def _regular_response(self, max_tokens: int): """Get regular (non-streaming) response.""" response = self.client.chat.completions.create( model=self.model, messages=self.messages, max_tokens=max_tokens ) assistant_message = response.choices[0].message.content self.messages.append({"role": "assistant", "content": assistant_message}) return assistant_message def _stream_response(self, max_tokens: int): """Get streaming response.""" stream = self.client.chat.completions.create( model=self.model, messages=self.messages, max_tokens=max_tokens, stream=True ) full_response = "" for chunk in stream: if chunk.choices[0].delta.content: content = chunk.choices[0].delta.content print(content, end="", flush=True) full_response += content print() # New line self.messages.append({"role": "assistant", "content": full_response}) return full_response def get_conversation_length(self) -> int: """Get the number of messages in the conversation.""" return len(self.messages) def clear_conversation(self, keep_system: bool = True): """Clear conversation history.""" if keep_system and self.messages and self.messages[0]["role"] == "system": self.messages = [self.messages[0]] else: self.messages = [] # Example usage client, model, provider = initialize_client() system_prompt = "You are a helpful AI assistant specialized in explaining AI and machine learning concepts." conversation = ConversationManager(client, model, system_prompt) # Multi-turn conversation conversation_turns = [ "What is the difference between AI inference on-device vs in the cloud?", "Which approach is better for privacy?", "What about performance and latency considerations?" ] for i, turn in enumerate(conversation_turns, 1): print(f"\nTurn {i}: {turn}") response = conversation.send_message(turn, stream=True)

第三部分:性能基准测试与分析

响应时间测量

测量并比较不同配置下的性能:

# performance_benchmark.py - Sample 02 patterns import time from typing import Dict, List from openai import OpenAI def benchmark_response_time(client: OpenAI, model: str, prompt: str, iterations: int = 3) -> Dict: """Benchmark response time for a given prompt.""" times = [] responses = [] for i in range(iterations): start_time = time.time() try: response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": prompt}], max_tokens=50 # Keep responses short for timing ) end_time = time.time() response_time = end_time - start_time times.append(response_time) responses.append(response.choices[0].message.content) except Exception as e: print(f"Error in iteration {i+1}: {e}") if times: return { "average_time": sum(times) / len(times), "min_time": min(times), "max_time": max(times), "all_times": times, "sample_response": responses[0] if responses else None, "success_rate": len(times) / iterations * 100 } return {"error": "No successful responses"} def compare_providers(foundry_client: OpenAI, foundry_model: str, azure_client: OpenAI, azure_model: str, test_prompts: List[str]): """Compare performance between Foundry Local and Azure OpenAI.""" results = { "foundry_local": [], "azure_openai": [] } for prompt in test_prompts: print(f"\nTesting prompt: '{prompt}'") # Test Foundry Local foundry_result = benchmark_response_time(foundry_client, foundry_model, prompt) results["foundry_local"].append({ "prompt": prompt, "benchmark": foundry_result }) # Test Azure OpenAI azure_result = benchmark_response_time(azure_client, azure_model, prompt) results["azure_openai"].append({ "prompt": prompt, "benchmark": azure_result }) # Compare results if "error" not in foundry_result and "error" not in azure_result: foundry_time = foundry_result["average_time"] azure_time = azure_result["average_time"] print(f" Foundry Local: {foundry_time:.2f}s") print(f" Azure OpenAI: {azure_time:.2f}s") print(f" Winner: {'Foundry Local' if foundry_time < azure_time else 'Azure OpenAI'}") return results # Example usage benchmark_prompts = [ "What is AI?", "Explain machine learning in simple terms.", "List 3 benefits of edge computing." ] # Initialize clients foundry_client, foundry_model, _ = initialize_client() # azure_client, azure_model = create_azure_client() # Uncomment if Azure is configured for prompt in benchmark_prompts: print(f"\n Benchmarking: '{prompt}'") result = benchmark_response_time(foundry_client, foundry_model, prompt) if "error" not in result: print(f" ⏰ Average time: {result['average_time']:.2f}s") print(f" ⚡ Fastest: {result['min_time']:.2f}s") print(f" Slowest: {result['max_time']:.2f}s") print(f" ✅ Success rate: {result['success_rate']:.1f}%")

温度和参数测试

# parameter_testing.py def test_temperature_effects(client: OpenAI, model: str, prompt: str): """Test how different temperature values affect responses.""" temperatures = [0.1, 0.5, 0.9] print(f"Testing prompt: '{prompt}'") print("=" * 60) for temp in temperatures: print(f"\n️ Temperature: {temp}") print("-" * 30) try: response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": prompt}], max_tokens=100, temperature=temp ) print(f"Response: {response.choices[0].message.content[:150]}...") except Exception as e: print(f"Error with temperature {temp}: {e}") # Test creative vs analytical prompts creative_prompt = "Write a creative short story about AI." analytical_prompt = "Explain the technical differences between GPT and BERT models." test_temperature_effects(foundry_client, foundry_model, creative_prompt) test_temperature_effects(foundry_client, foundry_model, analytical_prompt)

第四部分:服务健康监控与诊断

全面的健康检查系统

# health_monitoring.py - Sample 02 patterns def comprehensive_health_check(client: OpenAI, model: str, provider: str) -> Dict: """Perform comprehensive health check of the AI service.""" print(" Comprehensive Health Check") print("=" * 50) health_results = { "provider": provider, "model": model, "timestamp": time.time(), "tests": {} } # Test 1: Model listing try: models_response = client.models.list() available_models = [m.id for m in models_response.data] health_results["tests"]["model_listing"] = { "status": "success", "available_models": available_models, "current_model_available": model in available_models } print(f"✅ Model listing: SUCCESS ({len(available_models)} models)") except Exception as e: health_results["tests"]["model_listing"] = { "status": "failed", "error": str(e) } print(f"❌ Model listing: FAILED - {e}") # Test 2: Basic completion try: start_time = time.time() test_response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": "Say 'Health check successful'"}], max_tokens=10 ) response_time = time.time() - start_time health_results["tests"]["basic_completion"] = { "status": "success", "response_time": response_time, "response": test_response.choices[0].message.content } print(f"✅ Basic completion: SUCCESS ({response_time:.2f}s)") except Exception as e: health_results["tests"]["basic_completion"] = { "status": "failed", "error": str(e) } print(f"❌ Basic completion: FAILED - {e}") # Test 3: Streaming try: start_time = time.time() stream = client.chat.completions.create( model=model, messages=[{"role": "user", "content": "Count to 3"}], max_tokens=20, stream=True ) stream_content = "" chunk_count = 0 for chunk in stream: if chunk.choices[0].delta.content: stream_content += chunk.choices[0].delta.content chunk_count += 1 streaming_time = time.time() - start_time health_results["tests"]["streaming"] = { "status": "success", "response_time": streaming_time, "chunks_received": chunk_count, "content": stream_content.strip() } print(f"✅ Streaming: SUCCESS ({streaming_time:.2f}s, {chunk_count} chunks)") except Exception as e: health_results["tests"]["streaming"] = { "status": "failed", "error": str(e) } print(f"❌ Streaming: FAILED - {e}") # Overall health score successful_tests = sum(1 for test in health_results["tests"].values() if test["status"] == "success") total_tests = len(health_results["tests"]) health_score = (successful_tests / total_tests) * 100 health_results["overall_health"] = { "score": health_score, "successful_tests": successful_tests, "total_tests": total_tests, "status": "healthy" if health_score >= 70 else "degraded" if health_score >= 30 else "unhealthy" } print(f"\n Overall Health: {health_score:.1f}% ({health_results['overall_health']['status'].upper()})") return health_results # Usage example client, model, provider = initialize_client() health_status = comprehensive_health_check(client, model, provider)

环境配置验证器

# config_validator.py def validate_environment_configuration() -> Dict: """Validate environment configuration for both providers.""" validation_results = { "foundry_local": {}, "azure_openai": {}, "recommendations": [] } # Check Foundry Local configuration foundry_sdk_available = FOUNDRY_SDK_AVAILABLE base_url = os.environ.get("BASE_URL", "http://localhost:8000") validation_results["foundry_local"] = { "sdk_available": foundry_sdk_available, "base_url": base_url, "model": os.environ.get("MODEL", "phi-4-mini"), "api_key": bool(os.environ.get("API_KEY")) } if not foundry_sdk_available: validation_results["recommendations"].append( "Install Foundry Local SDK: pip install foundry-local-sdk" ) # Check Azure OpenAI configuration azure_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT") azure_api_key = os.environ.get("AZURE_OPENAI_API_KEY") azure_api_version = os.environ.get("AZURE_OPENAI_API_VERSION") validation_results["azure_openai"] = { "endpoint_configured": bool(azure_endpoint), "api_key_configured": bool(azure_api_key), "api_version": azure_api_version or "2024-08-01-preview", "model": os.environ.get("MODEL", "your-deployment-name") } if azure_endpoint and not azure_api_key: validation_results["recommendations"].append( "Azure endpoint is set but API key is missing" ) # Overall assessment can_use_foundry = foundry_sdk_available or base_url can_use_azure = azure_endpoint and azure_api_key if not can_use_foundry and not can_use_azure: validation_results["recommendations"].append( "No valid configuration found. Set up either Foundry Local or Azure OpenAI." ) validation_results["summary"] = { "foundry_ready": can_use_foundry, "azure_ready": can_use_azure, "total_options": sum([can_use_foundry, can_use_azure]) } return validation_results # Display configuration status config_status = validate_environment_configuration() print("⚙️ Environment Configuration Status") print("=" * 40) print(f" Foundry Local Ready: {'✅' if config_status['summary']['foundry_ready'] else '❌'}") print(f" Azure OpenAI Ready: {'✅' if config_status['summary']['azure_ready'] else '❌'}") print(f" Available Options: {config_status['summary']['total_options']}") if config_status["recommendations"]: print("\n Recommendations:") for rec in config_status["recommendations"]: print(f" • {rec}")

第五部分:环境变量与配置管理

环境变量参考

完整的配置参考,用于设置两个提供商:

# config_reference.py - Sample 02 patterns import os from typing import Dict, Optional class ConfigurationManager: """Manages environment configuration for multi-provider setup.""" @staticmethod def get_foundry_config() -> Dict[str, Optional[str]]: """Get Foundry Local configuration from environment.""" return { "MODEL": os.environ.get("MODEL", "phi-4-mini"), "BASE_URL": os.environ.get("BASE_URL", "http://localhost:8000"), "API_KEY": os.environ.get("API_KEY", ""), } @staticmethod def get_azure_config() -> Dict[str, Optional[str]]: """Get Azure OpenAI configuration from environment.""" return { "AZURE_OPENAI_ENDPOINT": os.environ.get("AZURE_OPENAI_ENDPOINT"), "AZURE_OPENAI_API_KEY": os.environ.get("AZURE_OPENAI_API_KEY"), "AZURE_OPENAI_API_VERSION": os.environ.get("AZURE_OPENAI_API_VERSION", "2024-08-01-preview"), "MODEL": os.environ.get("MODEL", "your-deployment-name"), } @staticmethod def display_current_config(): """Display current configuration status.""" print("⚙️ Current Configuration") print("=" * 40) foundry_config = ConfigurationManager.get_foundry_config() azure_config = ConfigurationManager.get_azure_config() print(" Foundry Local:") for key, value in foundry_config.items(): display_value = value if value else "(not set)" if key == "API_KEY" and value: display_value = "***" + value[-4:] if len(value) > 4 else "***" print(f" {key}: {display_value}") print("\n Azure OpenAI:") for key, value in azure_config.items(): display_value = value if value else "(not set)" if "KEY" in key and value: display_value = "***" + value[-4:] if len(value) > 4 else "***" print(f" {key}: {display_value}") # Determine active provider azure_ready = azure_config["AZURE_OPENAI_ENDPOINT"] and azure_config["AZURE_OPENAI_API_KEY"] foundry_ready = True # Foundry can always fallback to defaults print(f"\n Provider Status:") print(f" Azure OpenAI: {'✅ Ready' if azure_ready else '❌ Not configured'}") print(f" Foundry Local: {'✅ Ready' if foundry_ready else '❌ Not available'}") print(f" Active: {'Azure OpenAI' if azure_ready else 'Foundry Local'}") # Display current configuration config_manager = ConfigurationManager() config_manager.display_current_config()

配置示例

Windows命令提示符设置:

REM Foundry Local configuration set MODEL=phi-4-mini set BASE_URL=http://localhost:8000 set API_KEY= REM Azure OpenAI configuration (alternative) set AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com set AZURE_OPENAI_API_KEY=your-api-key-here set AZURE_OPENAI_API_VERSION=2024-08-01-preview set MODEL=your-deployment-name REM Run the sample python samples\02\sdk_quickstart.py

PowerShell设置:

# Foundry Local configuration $env:MODEL = "phi-4-mini" $env:BASE_URL = "http://localhost:8000" $env:API_KEY = "" # Azure OpenAI configuration (alternative) $env:AZURE_OPENAI_ENDPOINT = "https://your-resource.openai.azure.com" $env:AZURE_OPENAI_API_KEY = "your-api-key-here" $env:AZURE_OPENAI_API_VERSION = "2024-08-01-preview" $env:MODEL = "your-deployment-name" # Run the sample python samples/02/sdk_quickstart.py

第六部分:实践练习

练习1:多提供商SDK集成

构建一个能够无缝切换提供商的完整应用程序:

# exercise_1_multi_provider.py from openai import OpenAI from typing import Tuple, Dict, Any import time class MultiProviderSDKDemo: """Demonstrates seamless switching between Foundry Local and Azure OpenAI.""" def __init__(self): self.clients = {} self.models = {} self.setup_clients() def setup_clients(self): """Initialize all available clients.""" # Try to initialize Foundry Local try: foundry_client, foundry_model, _ = initialize_client() self.clients["foundry"] = foundry_client self.models["foundry"] = foundry_model print("✅ Foundry Local client ready") except Exception as e: print(f"❌ Foundry Local setup failed: {e}") # Try to initialize Azure OpenAI try: if os.environ.get("AZURE_OPENAI_ENDPOINT") and os.environ.get("AZURE_OPENAI_API_KEY"): azure_client, azure_model = create_azure_client() self.clients["azure"] = azure_client self.models["azure"] = azure_model print("✅ Azure OpenAI client ready") except Exception as e: print(f"❌ Azure OpenAI setup failed: {e}") def compare_providers(self, prompt: str, max_tokens: int = 100) -> Dict[str, Any]: """Compare responses from all available providers.""" results = {} for provider_name, client in self.clients.items(): model = self.models[provider_name] print(f"\nTesting {provider_name} ({model})...") start_time = time.time() try: response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": prompt}], max_tokens=max_tokens ) response_time = time.time() - start_time results[provider_name] = { "model": model, "response": response.choices[0].message.content, "response_time": response_time, "status": "success" } print(f" ✅ Success ({response_time:.2f}s)") except Exception as e: results[provider_name] = { "model": model, "error": str(e), "status": "failed" } print(f" ❌ Failed: {e}") return results def streaming_comparison(self, prompt: str, max_tokens: int = 150): """Compare streaming responses from providers.""" for provider_name, client in self.clients.items(): model = self.models[provider_name] print(f"\n Streaming from {provider_name} ({model}):") print("-" * 50) try: stream = client.chat.completions.create( model=model, messages=[{"role": "user", "content": prompt}], max_tokens=max_tokens, stream=True ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) print("\n") except Exception as e: print(f"Streaming failed: {e}") # Run Exercise 1 exercise_1 = MultiProviderSDKDemo() test_prompt = "Explain the benefits of running AI models locally versus in the cloud." print(f"️ Exercise 1: Multi-Provider Comparison") print(f"Prompt: {test_prompt}") print("=" * 60) comparison_results = exercise_1.compare_providers(test_prompt) exercise_1.streaming_comparison(test_prompt)

练习2:高级对话管理

# exercise_2_conversation.py class AdvancedConversationManager: """Advanced conversation management with multiple features.""" def __init__(self, client: OpenAI, model: str): self.client = client self.model = model self.conversations = {} # Multiple conversation sessions def create_conversation(self, session_id: str, system_prompt: str = None) -> str: """Create a new conversation session.""" self.conversations[session_id] = { "messages": [], "created_at": time.time(), "message_count": 0 } if system_prompt: self.conversations[session_id]["messages"].append({ "role": "system", "content": system_prompt }) return f"Conversation {session_id} created" def send_message(self, session_id: str, message: str, temperature: float = 0.7, max_tokens: int = 200) -> Dict[str, Any]: """Send message in a specific conversation session.""" if session_id not in self.conversations: return {"error": f"Conversation {session_id} not found"} conversation = self.conversations[session_id] conversation["messages"].append({"role": "user", "content": message}) try: response = self.client.chat.completions.create( model=self.model, messages=conversation["messages"], temperature=temperature, max_tokens=max_tokens ) assistant_message = response.choices[0].message.content conversation["messages"].append({ "role": "assistant", "content": assistant_message }) conversation["message_count"] += 2 # User + assistant return { "session_id": session_id, "response": assistant_message, "message_count": conversation["message_count"], "status": "success" } except Exception as e: return {"error": str(e), "session_id": session_id} def get_conversation_summary(self, session_id: str) -> Dict[str, Any]: """Get summary of conversation session.""" if session_id not in self.conversations: return {"error": f"Conversation {session_id} not found"} conversation = self.conversations[session_id] return { "session_id": session_id, "message_count": conversation["message_count"], "created_at": conversation["created_at"], "duration": time.time() - conversation["created_at"], "has_system_prompt": len(conversation["messages"]) > 0 and conversation["messages"][0]["role"] == "system" } def export_conversation(self, session_id: str) -> str: """Export conversation as formatted text.""" if session_id not in self.conversations: return f"Conversation {session_id} not found" conversation = self.conversations[session_id] export_text = f"Conversation Export: {session_id}\n" export_text += "=" * 50 + "\n\n" for msg in conversation["messages"]: role = msg["role"].title() content = msg["content"] export_text += f"{role}: {content}\n\n" return export_text # Run Exercise 2 client, model, provider = initialize_client() conv_manager = AdvancedConversationManager(client, model) # Create multiple conversation sessions print(" Exercise 2: Advanced Conversation Management") print("=" * 60) # Technical discussion conv_manager.create_conversation("tech_discussion", "You are a technical expert explaining AI concepts clearly.") # Creative session conv_manager.create_conversation("creative_session", "You are a creative writing assistant helping with storytelling.") # Test conversations tech_questions = [ "What is the difference between inference and training?", "How does quantization improve model performance?" ] creative_prompts = [ "Start a story about an AI that lives on an edge device.", "Continue the story with a plot twist." ] # Technical conversation print("\n Technical Discussion:") for question in tech_questions: result = conv_manager.send_message("tech_discussion", question) print(f"Q: {question}") print(f"A: {result['response'][:100]}...\n") # Creative conversation print(" Creative Session:") for prompt in creative_prompts: result = conv_manager.send_message("creative_session", prompt, temperature=0.9) print(f"Prompt: {prompt}") print(f"Response: {result['response'][:100]}...\n") # Show conversation summaries print(" Conversation Summaries:") for session_id in conv_manager.conversations.keys(): summary = conv_manager.get_conversation_summary(session_id) print(f" {session_id}: {summary['message_count']} messages, {summary['duration']:.1f}s")

练习3:生产级健康监控

# exercise_3_monitoring.py class ProductionHealthMonitor: """Production-ready health monitoring for AI services.""" def __init__(self): self.health_history = [] self.alert_thresholds = { "response_time": 5.0, "error_rate": 10.0, "availability": 95.0 } def run_comprehensive_check(self, client: OpenAI, model: str, provider: str) -> Dict[str, Any]: """Run comprehensive health check with detailed reporting.""" check_results = { "timestamp": time.time(), "provider": provider, "model": model, "tests": {}, "overall_health": "unknown" } # Test 1: Basic connectivity connectivity_result = self._test_connectivity(client) check_results["tests"]["connectivity"] = connectivity_result # Test 2: Model availability model_result = self._test_model_availability(client, model) check_results["tests"]["model_availability"] = model_result # Test 3: Response time benchmark performance_result = self._test_performance(client, model) check_results["tests"]["performance"] = performance_result # Test 4: Stress test stress_result = self._test_stress(client, model) check_results["tests"]["stress_test"] = stress_result # Calculate overall health check_results["overall_health"] = self._calculate_health_score(check_results["tests"]) # Store for trending self.health_history.append(check_results) return check_results def _test_connectivity(self, client: OpenAI) -> Dict[str, Any]: """Test basic service connectivity.""" try: start_time = time.time() models = client.models.list() response_time = time.time() - start_time return { "status": "success", "response_time": response_time, "models_count": len(models.data) } except Exception as e: return {"status": "failed", "error": str(e)} def _test_model_availability(self, client: OpenAI, model: str) -> Dict[str, Any]: """Test specific model availability.""" try: response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": "Health check"}], max_tokens=5 ) return { "status": "success", "model": model, "response_received": bool(response.choices[0].message.content) } except Exception as e: return {"status": "failed", "error": str(e)} def _test_performance(self, client: OpenAI, model: str) -> Dict[str, Any]: """Test response time performance.""" response_times = [] for i in range(3): try: start_time = time.time() client.chat.completions.create( model=model, messages=[{"role": "user", "content": f"Test {i+1}"}], max_tokens=10 ) response_time = time.time() - start_time response_times.append(response_time) except Exception: pass if response_times: avg_time = sum(response_times) / len(response_times) return { "status": "success", "average_response_time": avg_time, "min_time": min(response_times), "max_time": max(response_times), "within_threshold": avg_time < self.alert_thresholds["response_time"] } else: return {"status": "failed", "error": "No successful responses"} def _test_stress(self, client: OpenAI, model: str) -> Dict[str, Any]: """Test service under concurrent requests.""" import concurrent.futures def single_request(): try: client.chat.completions.create( model=model, messages=[{"role": "user", "content": "Stress test"}], max_tokens=5 ) return True except Exception: return False # Run 5 concurrent requests with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor: futures = [executor.submit(single_request) for _ in range(5)] results = [future.result() for future in concurrent.futures.as_completed(futures)] success_rate = (sum(results) / len(results)) * 100 return { "status": "success" if success_rate > 80 else "degraded", "concurrent_requests": len(results), "success_rate": success_rate, "within_threshold": success_rate >= self.alert_thresholds["availability"] } def _calculate_health_score(self, tests: Dict[str, Any]) -> str: """Calculate overall health score.""" successful_tests = sum(1 for test in tests.values() if test["status"] == "success") total_tests = len(tests) health_percentage = (successful_tests / total_tests) * 100 if health_percentage >= 90: return "healthy" elif health_percentage >= 70: return "degraded" else: return "unhealthy" def generate_health_report(self) -> str: """Generate formatted health report.""" if not self.health_history: return "No health data available" latest = self.health_history[-1] report = f"Health Report - {time.ctime(latest['timestamp'])}\n" report += "=" * 60 + "\n" report += f"Provider: {latest['provider']}\n" report += f"Model: {latest['model']}\n" report += f"Overall Health: {latest['overall_health'].upper()}\n\n" for test_name, test_result in latest["tests"].items(): status_icon = "✅" if test_result["status"] == "success" else "❌" report += f"{status_icon} {test_name.replace('_', ' ').title()}: {test_result['status']}\n" return report # Run Exercise 3 client, model, provider = initialize_client() health_monitor = ProductionHealthMonitor() print(" Exercise 3: Production Health Monitoring") print("=" * 60) health_results = health_monitor.run_comprehensive_check(client, model, provider) print(health_monitor.generate_health_report())

第七部分:总结与下一步

本节成果

在本节课程中,您已经掌握了:

  • OpenAI SDK集成:Foundry Local和Azure OpenAI的高级集成模式
  • 流式响应:实时聊天补全以提升用户体验
  • 多提供商支持:本地和云端AI服务的无缝切换
  • 对话管理:支持上下文的多轮对话
  • 性能监控:生产部署的基准测试和健康检查
  • 生产模式:完善的错误处理和配置管理

关键架构模式

客户端工厂模式:

Environment Detection → Provider Selection → Client Creation → Model Configuration ↓ ↓ ↓ ↓ Azure/Local Azure OpenAI/ OpenAI Client Model Selection Credentials Foundry Local Initialization and Validation

流式响应流程:

User Input → Chat Completion → Stream Processing → Real-time Display ↓ ↓ ↓ ↓ Prompt Stream=True Token Chunks Progressive UI

最佳实践总结

  1. ** 始终实现回退机制**:Azure → Foundry Local → 错误处理
  2. ** 对于长响应使用流式模式**:提升用户感知性能
  3. ️ 实现全面的错误处理:提供用户友好的错误信息
  4. ** 监控性能**:跟踪响应时间和成功率
  5. ⚙️ 基于环境的配置:轻松切换开发/测试/生产环境
  6. ** 安全的凭证管理**:避免硬编码API密钥

提供商选择指南

场景 推荐提供商 原因
开发 Foundry Local 快速迭代,无API成本
隐私敏感 Foundry Local 数据不离开设备
高容量生产 Azure OpenAI 更好的扩展性,企业级SLA
最新模型 Azure OpenAI 获取最新模型版本
离线需求 Foundry Local 无需互联网依赖
成本敏感 Foundry Local 无按令牌收费

环境变量快速参考

REM Foundry Local (default) set MODEL=phi-4-mini set BASE_URL=http://localhost:8000 set API_KEY= REM Azure OpenAI (cloud) set AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com set AZURE_OPENAI_API_KEY=your-api-key set AZURE_OPENAI_API_VERSION=2024-08-01-preview set MODEL=your-deployment-name

为第三节课程做准备:开源模型

  1. 探索模型目录:查看Foundry Local中的可用模型
  2. 了解模型格式:学习ONNX、量化和优化技术
  3. 考虑定制模型:思考领域特定的模型需求

故障排除快速指南

常见问题:

REM Issue: Could not use Foundry SDK pip install foundry-local-sdk REM Issue: Connection refused foundry service status foundry model run phi-4-mini REM Issue: Azure authentication failed echo %AZURE_OPENAI_ENDPOINT% echo %AZURE_OPENAI_API_KEY% REM Issue: Model not found foundry model list curl http://localhost:8000/v1/models

参考资料

现在,您已经具备构建复杂AI应用程序的能力,这些应用程序能够无缝集成本地和云端AI功能,为每个具体用例选择合适的提供商,同时保持一致的开发模式。


发布者: 作者: 转发
评论区 (0)
U