Best Practices for Deploying Production AI Workloads with AZD


Chapter navigation:

  • Course home: AZD for Beginners
  • Current chapter: Chapter 8 - Production and Enterprise Patterns
  • Previous chapter: Chapter 7: Troubleshooting
  • Related: AI Workshop Lab
  • Course complete: AZD for Beginners

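Overview

This guide collects best practices for deploying production-grade AI workloads with the Azure Developer CLI (AZD). The practices draw on feedback from the Microsoft Foundry Discord community and on real customer deployments, and they target the challenges that come up most often in production AI systems.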

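Key Challenges

According to community poll results, the main challenges developers face are:

  • 45% struggle with multi-service AI deployments
  • 38% have problems with credential and secret management
  • 35% find production readiness and scaling challenging
  • 32% need better cost optimization strategies
  • 29% need better monitoring and troubleshooting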

Architecture Patterns for Production AI

Pattern 1: Microservices AI Architecture

When to use: complex AI applications with multiple capabilities

    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
    │  Web Frontend   │────│   API Gateway   │────│  Load Balancer  │
    └─────────────────┘    └─────────────────┘    └─────────────────┘
                                    │
                    ┌───────────────┼──────────────┐
                    │               │              │
            ┌───────▼──────┐ ┌──────▼──────┐ ┌─────▼──────┐
            │ Chat Service │ │Image Service│ │Text Service│
            └──────────────┘ └─────────────┘ └────────────┘
                    │               │              │
            ┌───────▼──────┐ ┌──────▼──────┐ ┌─────▼──────┐
            │Azure OpenAI  │ │Computer     │ │Document    │
            │              │ │Vision       │ │Intelligence│
            └──────────────┘ └─────────────┘ └────────────┘

AZD implementation:

    # azure.yaml
    name: enterprise-ai-platform
    services:
      web:
        project: ./web
        host: staticwebapp
      api-gateway:
        project: ./api-gateway
        host: containerapp
      chat-service:
        project: ./services/chat
        host: containerapp
      vision-service:
        project: ./services/vision
        host: containerapp
      text-service:
        project: ./services/text
        host: containerapp

Pattern 2: Event-Driven AI Processing

When to use: batch processing, document analysis, asynchronous workflows

    // Event Hub for AI processing pipeline
    resource eventHub 'Microsoft.EventHub/namespaces@2023-01-01-preview' = {
      name: eventHubNamespaceName
      location: location
      sku: {
        name: 'Standard'
        tier: 'Standard'
        capacity: 1
      }
    }

    // Service Bus for reliable message processing
    resource serviceBus 'Microsoft.ServiceBus/namespaces@2022-10-01-preview' = {
      name: serviceBusNamespaceName
      location: location
      sku: {
        name: 'Premium'
        tier: 'Premium'
        capacity: 1
      }
    }

    // Function App for processing
    resource functionApp 'Microsoft.Web/sites@2023-01-01' = {
      name: functionAppName
      location: location
      kind: 'functionapp,linux'
      properties: {
        siteConfig: {
          appSettings: [
            {
              name: 'FUNCTIONS_EXTENSION_VERSION'
              value: '~4'
            }
            {
              // Resolved from Key Vault at runtime instead of being stored in the app settings
              name: 'AZURE_OPENAI_ENDPOINT'
              value: '@Microsoft.KeyVault(VaultName=${keyVault.name};SecretName=openai-endpoint)'
            }
          ]
        }
      }
    }

Security Best Practices

1. Zero Trust Security Model

Implementation strategy:

  • No service-to-service communication without authentication
  • Managed identities for all API calls
  • Network isolation with private endpoints
  • Least-privilege access control

    // Managed identity for each service
    resource chatServiceIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
      name: 'chat-service-identity'
      location: location
    }

    // Built-in "Cognitive Services OpenAI User" role definition ID
    var openAIUserRoleDefinitionId = '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd'

    // Role assignment with minimal permissions
    resource openAIUserRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
      scope: openAIAccount
      name: guid(openAIAccount.id, chatServiceIdentity.id, openAIUserRoleDefinitionId)
      properties: {
        roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', openAIUserRoleDefinitionId)
        principalId: chatServiceIdentity.properties.principalId
        principalType: 'ServicePrincipal'
      }
    }

2. Secure Secret Management

Key Vault integration pattern:

    // Key Vault secured with Azure RBAC
    resource keyVault 'Microsoft.KeyVault/vaults@2023-02-01' = {
      name: keyVaultName
      location: location
      properties: {
        tenantId: tenant().tenantId
        sku: {
          family: 'A'
          name: 'premium' // Use premium for production
        }
        enableRbacAuthorization: true // Use RBAC instead of access policies
        enablePurgeProtection: true // Prevent accidental deletion
        enableSoftDelete: true
        softDeleteRetentionInDays: 90
      }
    }

    // Store all AI service credentials
    resource openAIKeySecret 'Microsoft.KeyVault/vaults/secrets@2023-02-01' = {
      parent: keyVault
      name: 'openai-api-key'
      properties: {
        value: openAIAccount.listKeys().key1
        attributes: {
          enabled: true
        }
      }
    }
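
App settings can point at a Key Vault secret through the `@Microsoft.KeyVault(VaultName=...;SecretName=...)` reference syntax instead of holding the value themselves. The following is a minimal sketch of a helper that builds such a reference and rejects names with characters Key Vault does not accept. The function name `key_vault_reference` and the vault name `contoso-ai-kv` are hypothetical and used only for illustration.

```python
import re

# Key Vault and secret names may only contain letters, digits, and hyphens.
_NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")


def key_vault_reference(vault_name: str, secret_name: str) -> str:
    """Build an app-setting value that references a Key Vault secret."""
    for label, value in (("vault", vault_name), ("secret", secret_name)):
        if not _NAME_PATTERN.match(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"@Microsoft.KeyVault(VaultName={vault_name};SecretName={secret_name})"


if __name__ == "__main__":
    # Hypothetical vault name; the secret name matches the Bicep example above.
    print(key_vault_reference("contoso-ai-kv", "openai-api-key"))
```

Generating the reference in one place keeps secret values out of configuration files and makes it harder for a malformed name to slip into a deployment.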

3. Network Security

Private endpoint configuration:

    // Virtual Network for AI services
    resource virtualNetwork 'Microsoft.Network/virtualNetworks@2023-04-01' = {
      name: vnetName
      location: location
      properties: {
        addressSpace: {
          addressPrefixes: ['10.0.0.0/16']
        }
        subnets: [
          {
            name: 'ai-services-subnet'
            properties: {
              addressPrefix: '10.0.1.0/24'
              privateEndpointNetworkPolicies: 'Disabled'
            }
          }
          {
            name: 'app-services-subnet'
            properties: {
              addressPrefix: '10.0.2.0/24'
              delegations: [
                {
                  name: 'Microsoft.Web/serverFarms'
                  properties: {
                    serviceName: 'Microsoft.Web/serverFarms'
                  }
                }
              ]
            }
          }
        ]
      }
    }

    // Private endpoints for all AI services
    resource openAIPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
      name: '${openAIAccountName}-pe'
      location: location
      properties: {
        subnet: {
          id: virtualNetwork.properties.subnets[0].id
        }
        privateLinkServiceConnections: [
          {
            name: 'openai-connection'
            properties: {
              privateLinkServiceId: openAIAccount.id
              groupIds: ['account']
            }
          }
        ]
      }
    }

Performance and Scaling

1. Auto-scaling Strategy

Container Apps auto-scaling:

    resource containerApp 'Microsoft.App/containerApps@2023-05-01' = {
      name: containerAppName
      location: location
      properties: {
        configuration: {
          ingress: {
            external: true
            targetPort: 8000
            transport: 'http'
          }
        }
        template: {
          scale: {
            minReplicas: 2 // Always have 2 instances minimum
            maxReplicas: 50 // Scale up to 50 for high load
            rules: [
              {
                name: 'http-scaling'
                http: {
                  metadata: {
                    concurrentRequests: '20' // Scale when >20 concurrent requests
                  }
                }
              }
              {
                name: 'cpu-scaling'
                custom: {
                  type: 'cpu'
                  metadata: {
                    type: 'Utilization'
                    value: '70' // Scale when CPU >70%
                  }
                }
              }
            ]
          }
        }
      }
    }
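
To get a feel for how the HTTP rule interacts with the replica limits, the sketch below estimates a replica count from a concurrent-request figure. It is an approximation for capacity planning only: the real scaler also applies polling intervals and cool-down behavior that are not modeled here, and the function name `desired_replicas` is illustrative.

```python
import math


def desired_replicas(
    concurrent_requests: int,
    target_per_replica: int = 20,  # concurrentRequests in the scale rule
    min_replicas: int = 2,         # minReplicas
    max_replicas: int = 50,        # maxReplicas
) -> int:
    """Estimate the replica count for a given number of concurrent requests."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(concurrent_requests / target_per_replica)
    # Clamp the estimate to the configured replica range.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for load in (0, 41, 100, 5000):
        print(load, "concurrent requests ->", desired_replicas(load), "replicas")
```

With the values from the example, the configuration saturates at 50 × 20 = 1000 concurrent requests, which is worth comparing against the expected peak load.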

2. Caching Strategy

Redis cache for AI responses:

    // Redis Premium for production workloads
    resource redisCache 'Microsoft.Cache/redis@2023-04-01' = {
      name: redisCacheName
      location: location
      properties: {
        sku: {
          name: 'Premium'
          family: 'P'
          capacity: 1
        }
        enableNonSslPort: false
        minimumTlsVersion: '1.2'
        redisConfiguration: {
          'maxmemory-policy': 'allkeys-lru'
        }
        redisVersion: '6.0'
        // Enable clustering for high availability
        shardCount: 2
      }
    }

    // Cache configuration in application
    var cacheConnectionString = '${redisCache.properties.hostName}:6380,password=${redisCache.listKeys().primaryKey},ssl=True,abortConnect=False'
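
On the application side, caching AI responses usually means deriving a stable key from the request and storing the result with an expiry. The sketch below shows that cache-aside flow with an in-memory stand-in for Redis so it runs without any dependencies. The names `cache_key`, `TTLCache`, and `cached_completion` are illustrative, and a real deployment would swap `TTLCache` for a Redis client.

```python
import hashlib
import json
import time


def cache_key(model: str, prompt: str, temperature: float) -> str:
    """Derive a deterministic key from everything that affects the response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature},
        sort_keys=True,
    )
    return "ai:response:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis GET / SET-with-expiry pair."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries are dropped lazily
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def cached_completion(cache, generate, model, prompt, temperature=0.0, ttl_seconds=3600):
    """Return a cached response if present; otherwise call the model and cache it."""
    key = cache_key(model, prompt, temperature)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = generate(prompt)
    cache.set(key, result, ttl_seconds)
    return result
```

Caching is most useful for deterministic requests (for example temperature 0), since cached answers to sampled requests remove the variation the caller asked for.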

3. Load Balancing and Traffic Management

Application Gateway with WAF:

    // Application Gateway with Web Application Firewall
    resource applicationGateway 'Microsoft.Network/applicationGateways@2023-04-01' = {
      name: appGatewayName
      location: location
      properties: {
        sku: {
          name: 'WAF_v2'
          tier: 'WAF_v2'
          capacity: 2
        }
        webApplicationFirewallConfiguration: {
          enabled: true
          firewallMode: 'Prevention'
          ruleSetType: 'OWASP'
          ruleSetVersion: '3.2'
        }
        // Backend pools for AI services
        backendAddressPools: [
          {
            name: 'ai-services-pool'
            properties: {
              backendAddresses: [
                {
                  fqdn: containerApp.properties.configuration.ingress.fqdn
                }
              ]
            }
          }
        ]
      }
    }

Cost Optimization

1. Resource Right-Sizing

Environment-specific configuration:

    # Development environment
    azd env new development
    azd env set AZURE_OPENAI_SKU "S0"
    azd env set AZURE_OPENAI_CAPACITY 10
    azd env set AZURE_SEARCH_SKU "basic"
    azd env set CONTAINER_CPU 0.5
    azd env set CONTAINER_MEMORY 1.0

    # Production environment
    azd env new production
    azd env set AZURE_OPENAI_SKU "S0"
    azd env set AZURE_OPENAI_CAPACITY 100
    azd env set AZURE_SEARCH_SKU "standard"
    azd env set CONTAINER_CPU 2.0
    azd env set CONTAINER_MEMORY 4.0

2. Cost Monitoring and Budgets

    // Cost management and budgets
    resource budget 'Microsoft.Consumption/budgets@2023-05-01' = {
      name: 'ai-workload-budget'
      properties: {
        timePeriod: {
          startDate: '2024-01-01'
          endDate: '2024-12-31'
        }
        timeGrain: 'Monthly'
        amount: 2000 // $2000 monthly budget
        category: 'Cost'
        notifications: {
          warning: {
            enabled: true
            operator: 'GreaterThan'
            threshold: 80
            contactEmails: [
              'finance@company.com'
              'engineering@company.com'
            ]
            contactRoles: [
              'Owner'
              'Contributor'
            ]
          }
          critical: {
            enabled: true
            operator: 'GreaterThan'
            threshold: 95
            contactEmails: [
              'cto@company.com'
            ]
          }
        }
      }
    }
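
The thresholds in the budget are percentages of the budget amount, evaluated with a strict greater-than comparison. The sketch below replays that logic locally so a team can check which notifications a given spend figure would trigger. The function name `triggered_notifications` is illustrative and the evaluation is a simplification of what the service does.

```python
def triggered_notifications(actual_spend: float, budget_amount: float, thresholds: dict) -> list:
    """Return the names of notifications whose percentage threshold is exceeded."""
    if budget_amount <= 0:
        raise ValueError("budget_amount must be positive")
    percent_used = actual_spend * 100 / budget_amount
    ordered = sorted(thresholds.items(), key=lambda item: item[1])
    # 'GreaterThan' is strict: spending exactly at the threshold does not fire.
    return [name for name, percent in ordered if percent_used > percent]


if __name__ == "__main__":
    thresholds = {"warning": 80, "critical": 95}  # values from the budget above
    for spend in (1600, 1700, 1950):
        print(spend, triggered_notifications(spend, 2000, thresholds))
```

With a $2000 budget, the warning fires once spend passes $1600 and the critical notification once it passes $1900.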

3. Token Usage Optimization

OpenAI cost management:

    // Application-level token optimization
    class TokenOptimizer {
      private readonly maxTokens = 4000;
      private readonly reserveTokens = 500;

      optimizePrompt(userInput: string, context: string): string {
        const availableTokens = this.maxTokens - this.reserveTokens;
        const estimatedTokens = this.estimateTokens(userInput + context);

        if (estimatedTokens > availableTokens) {
          // Truncate context, not user input
          const contextBudget = Math.max(0, availableTokens - this.estimateTokens(userInput));
          context = this.truncateContext(context, contextBudget);
        }

        return `${context}\n\nUser: ${userInput}`;
      }

      private estimateTokens(text: string): number {
        // Rough estimation: 1 token ≈ 4 characters
        return Math.ceil(text.length / 4);
      }

      // Assumption: the tail of the context is the most recent and most relevant part
      private truncateContext(context: string, tokenBudget: number): string {
        const maxChars = tokenBudget * 4;
        return maxChars === 0 ? '' : context.slice(-maxChars);
      }
    }

Monitoring and Observability

1. Comprehensive Application Insights

    // Application Insights with advanced features
    resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
      name: applicationInsightsName
      location: location
      kind: 'web'
      properties: {
        Application_Type: 'web'
        WorkspaceResourceId: logAnalyticsWorkspace.id
        SamplingPercentage: 100 // Full sampling for AI apps
        DisableIpMasking: false // Keep client IP masking enabled for privacy
      }
    }

    // Metric alert for AI operations
    resource aiMetricAlerts 'Microsoft.Insights/metricAlerts@2018-03-01' = {
      name: 'ai-high-error-rate'
      location: 'global'
      properties: {
        description: 'Alert when AI service error rate is high'
        severity: 2
        enabled: true
        scopes: [
          applicationInsights.id
        ]
        evaluationFrequency: 'PT1M'
        windowSize: 'PT5M'
        criteria: {
          'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
          allOf: [
            {
              name: 'high-error-rate'
              criterionType: 'StaticThresholdCriterion'
              metricName: 'requests/failed'
              operator: 'GreaterThan'
              threshold: 10 // More than 10 failed requests in the 5-minute window
              timeAggregation: 'Count'
            }
          ]
        }
      }
    }

2. AI-Specific Monitoring

Custom dashboard configuration for AI metrics:

    {
      "dashboard": {
        "name": "AI Application Monitoring",
        "tiles": [
          {
            "name": "OpenAI Request Volume",
            "query": "requests | where name contains 'openai' | summarize count() by bin(timestamp, 5m)"
          },
          {
            "name": "AI Response Latency",
            "query": "requests | where name contains 'openai' | summarize avg(duration) by bin(timestamp, 5m)"
          },
          {
            "name": "Token Usage",
            "query": "customMetrics | where name == 'openai_tokens_used' | summarize sum(value) by bin(timestamp, 1h)"
          },
          {
            "name": "Cost per Hour",
            "query": "customMetrics | where name == 'openai_cost' | summarize sum(value) by bin(timestamp, 1h)"
          }
        ]
      }
    }
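
The Token Usage and Cost per Hour tiles aggregate custom metrics with `summarize sum(value) by bin(timestamp, 1h)`. As a rough illustration of what that aggregation computes, the sketch below performs the same binning over a list of samples in plain Python. The function name `sum_by_bin` and the sample values are made up for the example; the application still has to emit the `openai_tokens_used` and `openai_cost` metrics for the real queries to return data.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sum_by_bin(samples, bin_size: timedelta) -> dict:
    """Sum (timestamp, value) samples into fixed-size time bins, keyed by bin start."""
    totals = defaultdict(float)
    for timestamp, value in samples:
        # Floor the timestamp to the start of its bin.
        bin_index = (timestamp - EPOCH) // bin_size
        totals[EPOCH + bin_index * bin_size] += value
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    day = datetime(2024, 1, 1, tzinfo=timezone.utc)
    token_samples = [  # illustrative values only
        (day.replace(hour=10, minute=5), 120),
        (day.replace(hour=10, minute=40), 80),
        (day.replace(hour=11, minute=10), 50),
    ]
    for bin_start, total in sum_by_bin(token_samples, timedelta(hours=1)).items():
        print(bin_start.isoformat(), total)
```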

3. Health Checks and Uptime Monitoring

    // Bicep ''' multi-line strings are not interpolated, so the URL is substituted with replace()
    var healthCheckUrl = 'https://${webApp.properties.defaultHostName}/health'

    // Application Insights availability tests
    resource availabilityTest 'Microsoft.Insights/webtests@2022-06-15' = {
      name: 'ai-app-availability-test'
      location: location
      tags: {
        'hidden-link:${applicationInsights.id}': 'Resource'
      }
      properties: {
        SyntheticMonitorId: 'ai-app-availability-test'
        Name: 'AI Application Availability Test'
        Description: 'Tests AI application endpoints'
        Enabled: true
        Frequency: 300 // 5 minutes
        Timeout: 120 // 2 minutes
        Kind: 'ping'
        // Verify these IDs against the availability test locations supported for your resource
        Locations: [
          {
            Id: 'us-east-2-azr'
          }
          {
            Id: 'us-west-2-azr'
          }
        ]
        Configuration: {
          WebTest: replace('''
    <WebTest Name="AI Health Check" Id="8d2de8d2-a2b0-4c2e-9a0d-8f9c9a0b8c8d" Enabled="True" CssProjectStructure="" CssIteration="" Timeout="120" WorkItemIds="" xmlns="http://microsoft.com/schemas/VisualStudio/TeamTest/2010" Description="" CredentialUserName="" CredentialPassword="" PreAuthenticate="True" Proxy="default" StopOnError="False" RecordedResultFile="" ResultsLocale="">
      <Items>
        <Request Method="GET" Guid="a5f10126-e4cd-570d-961c-cea43999a200" Version="1.1" Url="__HEALTH_URL__" ThinkTime="0" Timeout="120" ParseDependentRequests="True" FollowRedirects="True" RecordResult="True" Cache="False" ResponseTimeGoal="0" Encoding="utf-8" ExpectedHttpStatusCode="200" ExpectedResponseUrl="" ReportingName="" IgnoreHttpStatusCode="False" />
      </Items>
    </WebTest>
    ''', '__HEALTH_URL__', healthCheckUrl)
        }
      }
    }
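
The availability test expects the application to answer `GET /health` with HTTP 200. A minimal sketch of such an endpoint using only the Python standard library follows. The dependency probes in `check_dependencies` are placeholders, port 8000 is taken from the container app's `targetPort`, and a real service would normally use its web framework's routing instead of `http.server`.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def check_dependencies() -> dict:
    """Placeholder probes; replace with real checks (model endpoint, cache, ...)."""
    return {"openai": True, "cache": True}


def build_health_response(checks: dict):
    """Return (status_code, body) for the given dependency check results."""
    healthy = all(checks.values())
    body = json.dumps({"status": "healthy" if healthy else "unhealthy", "checks": checks})
    # 503 tells monitors and load balancers to treat this instance as unavailable.
    return (200 if healthy else 503), body.encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = build_health_response(check_dependencies())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8000), HealthHandler).serve_forever()
```

Returning 503 when a dependency is down lets the same endpoint serve availability tests and traffic-routing probes.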

Disaster Recovery and High Availability

1. Multi-Region Deployment

    # azure.yaml - Multi-region configuration
    name: ai-app-multiregion
    services:
      api-primary:
        project: ./api
        host: containerapp
        env:
          - AZURE_REGION=eastus
      api-secondary:
        project: ./api
        host: containerapp
        env:
          - AZURE_REGION=westus2

    // Traffic Manager for global load balancing
    resource trafficManager 'Microsoft.Network/trafficManagerProfiles@2022-04-01' = {
      name: trafficManagerProfileName
      location: 'global'
      properties: {
        profileStatus: 'Enabled'
        trafficRoutingMethod: 'Priority'
        dnsConfig: {
          relativeName: trafficManagerProfileName
          ttl: 30
        }
        monitorConfig: {
          protocol: 'HTTPS'
          port: 443
          path: '/health'
          intervalInSeconds: 30
          toleratedNumberOfFailures: 3
          timeoutInSeconds: 10
        }
        endpoints: [
          {
            name: 'primary-endpoint'
            type: 'Microsoft.Network/trafficManagerProfiles/azureEndpoints'
            properties: {
              targetResourceId: primaryAppService.id
              endpointStatus: 'Enabled'
              priority: 1
            }
          }
          {
            name: 'secondary-endpoint'
            type: 'Microsoft.Network/trafficManagerProfiles/azureEndpoints'
            properties: {
              targetResourceId: secondaryAppService.id
              endpointStatus: 'Enabled'
              priority: 2
            }
          }
        ]
      }
    }
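
Priority routing sends traffic to the healthy endpoint with the lowest priority number and fails over once the primary has exceeded the tolerated number of consecutive probe failures. The sketch below models that decision so failover timing can be reasoned about locally. It is a simplified model with illustrative names, not Traffic Manager's implementation; for instance, it returns `None` when every endpoint is unhealthy.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int
    consecutive_failures: int = 0


def record_probe(endpoint: Endpoint, succeeded: bool) -> None:
    """Update the failure streak after a health probe."""
    endpoint.consecutive_failures = 0 if succeeded else endpoint.consecutive_failures + 1


def select_endpoint(endpoints, tolerated_failures: int = 3):
    """Pick the healthy endpoint with the lowest priority value, or None."""
    healthy = [e for e in endpoints if e.consecutive_failures <= tolerated_failures]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


if __name__ == "__main__":
    primary = Endpoint("primary-endpoint", priority=1)
    secondary = Endpoint("secondary-endpoint", priority=2)
    for _ in range(4):  # four failed probes in a row exceed the tolerance of 3
        record_probe(primary, succeeded=False)
    print(select_endpoint([primary, secondary]).name)
```

With a 30-second probe interval and a tolerance of 3, detection alone takes on the order of two minutes, and clients may keep the old answer for the DNS TTL on top of that.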

2. Data Backup and Recovery

    // Backup configuration for critical data
    resource backupVault 'Microsoft.DataProtection/backupVaults@2023-05-01' = {
      name: backupVaultName
      location: location
      identity: {
        type: 'SystemAssigned'
      }
      properties: {
        storageSettings: [
          {
            datastoreType: 'VaultStore'
            type: 'LocallyRedundant'
          }
        ]
      }
    }

    // Backup policy for AI models and data
    resource backupPolicy 'Microsoft.DataProtection/backupVaults/backupPolicies@2023-05-01' = {
      parent: backupVault
      name: 'ai-data-backup-policy'
      properties: {
        policyRules: [
          {
            backupParameters: {
              backupType: 'Full'
              objectType: 'AzureBackupParams'
            }
            trigger: {
              schedule: {
                repeatingTimeIntervals: [
                  'R/2024-01-01T02:00:00+00:00/P1D' // Daily at 2 AM
                ]
              }
              objectType: 'ScheduleBasedTriggerContext'
            }
            dataStore: {
              datastoreType: 'VaultStore'
              objectType: 'DataStoreInfoBase'
            }
            name: 'BackupDaily'
            objectType: 'AzureBackupRule'
          }
        ]
      }
    }

DevOps and CI/CD Integration

1. GitHub Actions Workflow

    # .github/workflows/deploy-ai-app.yml
    name: Deploy AI Application

    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Setup Python
            uses: actions/setup-python@v4
            with:
              python-version: '3.11'

          - name: Install dependencies
            run: |
              pip install -r requirements.txt
              pip install pytest

          - name: Run tests
            run: pytest tests/

          - name: AI Safety Tests
            run: |
              python scripts/test_ai_safety.py
              python scripts/validate_prompts.py

      deploy-staging:
        needs: test
        if: github.event_name == 'pull_request'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Setup AZD
            uses: Azure/setup-azd@v1.0.0

          - name: Login to Azure
            uses: azure/login@v1
            with:
              creds: ${{ secrets.AZURE_CREDENTIALS }}

          - name: Deploy to Staging
            run: |
              # Let azd reuse the Azure CLI session created by azure/login
              azd config set auth.useAzCliAuth "true"
              azd env select staging
              azd deploy

      deploy-production:
        needs: test
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Setup AZD
            uses: Azure/setup-azd@v1.0.0

          - name: Login to Azure
            uses: azure/login@v1
            with:
              creds: ${{ secrets.AZURE_CREDENTIALS }}

          - name: Deploy to Production
            run: |
              # Let azd reuse the Azure CLI session created by azure/login
              azd config set auth.useAzCliAuth "true"
              azd env select production
              azd deploy

          - name: Run Production Health Checks
            run: |
              python scripts/health_check.py --env production

2. Infrastructure Validation

    #!/bin/bash
    # scripts/validate_infrastructure.sh
    set -euo pipefail

    echo "Validating AI infrastructure deployment..."

    # Check that a resource exists for each required service.
    # az exits with 0 even when the query matches nothing, so test the output instead.
    services=("openai" "search" "storage" "keyvault")
    for service in "${services[@]}"; do
      echo "Checking $service..."
      matches=$(az resource list \
        --resource-group "$AZURE_RESOURCE_GROUP" \
        --query "[?contains(name, '$service')].name" -o tsv)
      if [[ -z "$matches" ]]; then
        echo "ERROR: $service not found"
        exit 1
      fi
    done

    # Validate OpenAI model deployments
    echo "Validating OpenAI model deployments..."
    models=$(az cognitiveservices account deployment list \
      --name "$AZURE_OPENAI_NAME" \
      --resource-group "$AZURE_RESOURCE_GROUP" \
      --query "[].name" -o tsv)
    if [[ "$models" != *"gpt-35-turbo"* ]]; then
      echo "ERROR: Required model gpt-35-turbo not deployed"
      exit 1
    fi

    # Test AI service connectivity
    echo "Testing AI service connectivity..."
    python scripts/test_connectivity.py

    echo "Infrastructure validation completed successfully!"

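Production Readiness Checklist

Security ✅

  • Managed identities used by all services
  • Secrets stored in Key Vault
  • Private endpoints configured
  • Network security groups in place
  • Least-privilege RBAC
  • WAF enabled on public endpoints

Performance ✅

  • Auto-scaling configured
  • Caching implemented
  • Load balancing set up
  • CDN for static content
  • Database connection pooling
  • Token usage optimized

Monitoring ✅

  • Application Insights configured
  • Custom metrics defined
  • Alert rules set up
  • Dashboards created
  • Health checks implemented
  • Log retention policy defined

Reliability ✅

  • Multi-region deployment
  • Backup and recovery plan
  • Circuit breakers implemented
  • Retry policies configured
  • Graceful degradation
  • Health check endpoints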
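
Circuit breakers and retry policies from the reliability list are easy to get subtly wrong, so here is a minimal sketch of both using only the standard library. The class and function names, thresholds, and delays are illustrative assumptions; production code would typically use an established resilience library and add jitter to the backoff.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0
        self._opened_at = None
        return result


def retry_with_backoff(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry func with exponential backoff; never retry an open circuit."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A typical combination wraps the model call in the breaker and the breaker call in the retry, for example `retry_with_backoff(lambda: breaker.call(call_model, prompt))`, where `call_model` stands for whatever function invokes the AI service.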

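Cost Management ✅

  • Budget alerts configured
  • Resources right-sized
  • Dev/test discounts applied
  • Reserved instances purchased
  • Cost monitoring dashboard
  • Regular cost reviews

Compliance ✅

  • Data residency requirements met
  • Audit logging enabled
  • Compliance policies applied
  • Security baseline implemented
  • Regular security assessments
  • Incident response plan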

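Performance Benchmarks

Typical Production Metrics

Metric                Target          How to monitor
Response time         < 2 seconds     Application Insights
Availability          99.9%           Uptime monitoring
Error rate            < 0.1%          Application logs
Token usage (cost)    < $500/month    Cost Management
Concurrent users      1000+           Load testing
Recovery time         < 1 hour        Disaster recovery testing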
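
To make the availability target concrete, the sketch below converts an availability percentage into a downtime budget. The helper name is illustrative. At 99.9% over a 30-day month the budget works out to roughly 43.2 minutes, which is less than the one-hour recovery target, so a single full recovery event can consume more than a month's allowance.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes per 30 days")
```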

Load Testing

    # Load testing script for AI applications
    python scripts/load_test.py \
      --endpoint https://your-ai-app.azurewebsites.net \
      --concurrent-users 100 \
      --duration 300 \
      --ramp-up 60
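
The script `scripts/load_test.py` is referenced here but not shown. As a rough sketch of one way the `--concurrent-users` and `--ramp-up` values might be interpreted, the function below computes how many simulated users are active at a given moment under a linear ramp. Both the linear-ramp interpretation and the name `active_users` are assumptions, not the script's actual behavior.

```python
def active_users(elapsed_seconds: float, concurrent_users: int = 100, ramp_up_seconds: float = 60) -> int:
    """Number of simulated users active after elapsed_seconds of a linear ramp-up."""
    if elapsed_seconds <= 0:
        return 0
    if ramp_up_seconds <= 0 or elapsed_seconds >= ramp_up_seconds:
        return concurrent_users
    # During the ramp, scale linearly but keep at least one user active.
    return max(1, int(concurrent_users * elapsed_seconds / ramp_up_seconds))


if __name__ == "__main__":
    for second in (0, 15, 30, 60, 300):
        print(f"t={second}s -> {active_users(second)} users")
```

Ramping up gradually gives auto-scaling time to add replicas, so the test measures steady-state behavior instead of cold-start behavior.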

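Community Best Practices

Based on feedback from the Microsoft Foundry Discord community:

Top recommendations from the community:

  1. Start small, scale gradually: begin with basic SKUs and scale up based on actual usage
  2. Monitor everything: set up comprehensive monitoring from day one
  3. Automate security: use infrastructure as code to keep security consistent
  4. Test thoroughly: include AI-specific tests in your pipeline
  5. Plan for costs: monitor token usage early and set budget alerts

Common pitfalls:

  • ❌ Hardcoding API keys in code
  • ❌ Not setting up proper monitoring
  • ❌ Ignoring cost optimization
  • ❌ Not testing failure scenarios
  • ❌ Deploying without health checks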


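Remember: production AI workloads require careful planning, monitoring, and continuous optimization. Start with these patterns and adapt them to your specific requirements.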

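Disclaimer
This document was translated using the AI translation service Co-op Translator. While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.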

