AZD部署调试指南


文档摘要

AZD 部署调试指南 章节导航: 课程主页: AZD 初学者指南 当前章节: 第七章 - 故障排查与调试 ⬅️ 上一节: 常见问题 ➡️ 下一节: AI 专属故障排查 下一章: 第八章:生产与企业模式 简介 本指南提供了高级调试策略、工具和技术,用于诊断和解决 Azure Developer CLI 部署中的复杂问题。学习系统化的故障排查方法、日志分析技术、性能分析以及高级诊断工具,以高效解决部署和运行时问题。

AZD 部署调试指南

章节导航:

简介

本指南提供了高级调试策略、工具和技术,用于诊断和解决 Azure Developer CLI 部署中的复杂问题。学习系统化的故障排查方法、日志分析技术、性能分析以及高级诊断工具,以高效解决部署和运行时问题。

学习目标

完成本指南后,您将能够:

  • 掌握 Azure Developer CLI 问题的系统化调试方法
  • 理解高级日志配置和日志分析技术
  • 实施性能分析和监控策略
  • 使用 Azure 诊断工具和服务解决复杂问题
  • 应用网络调试和安全故障排查技术
  • 配置全面的监控和警报以主动检测问题

学习成果

完成后,您将能够:

  • 应用 TRIAGE 方法系统化调试复杂的部署问题
  • 配置并分析全面的日志和跟踪信息
  • 高效使用 Azure Monitor、Application Insights 和诊断工具
  • 独立调试网络连接、身份验证和权限问题
  • 实施性能监控和优化策略
  • 创建自定义调试脚本和自动化工具以解决重复性问题

调试方法

TRIAGE 方法

  • Time: 问题何时开始?
  • Reproduce: 是否可以稳定复现?
  • Isolate: 哪个组件出现了故障?
  • Analyze: 日志中显示了什么?
  • Gather: 收集所有相关信息
  • Escalate: 何时需要寻求额外帮助

启用调试模式

环境变量

# Enable comprehensive debugging export AZD_DEBUG=true export AZD_LOG_LEVEL=debug export AZURE_CORE_DIAGNOSTICS_DEBUG=true # Azure CLI debugging export AZURE_CLI_DIAGNOSTICS=true # Disable telemetry for cleaner output export AZD_DISABLE_TELEMETRY=true

调试配置

# Set debug configuration globally azd config set debug.enabled true azd config set debug.logLevel debug azd config set debug.verboseOutput true # Enable trace logging azd config set trace.enabled true azd config set trace.outputPath ./debug-traces

日志分析技术

理解日志级别

TRACE - Most detailed, includes internal function calls DEBUG - Detailed diagnostic information INFO - General operational messages WARN - Warning conditions that should be noted ERROR - Error conditions that need attention FATAL - Critical errors that cause application termination

结构化日志分析

# Filter logs by level azd logs --level error --since 1h # Filter by service azd logs --service api --level debug # Export logs for analysis azd logs --output json > deployment-logs.json # Parse JSON logs with jq cat deployment-logs.json | jq '.[] | select(.level == "ERROR")'

日志关联

#!/bin/bash # correlate-logs.sh - Correlate logs across services TRACE_ID=$1 if [ -z "$TRACE_ID" ]; then echo "Usage: $0 <trace-id>" exit 1 fi echo "Correlating logs for trace ID: $TRACE_ID" # Search across all services for service in web api worker; do echo "=== $service logs ===" azd logs --service $service | grep "$TRACE_ID" done # Search Azure logs az monitor activity-log list --correlation-id "$TRACE_ID"

️ 高级调试工具

Azure Resource Graph 查询

# Query resources by tags az graph query -q "Resources | where tags['azd-env-name'] == 'production' | project name, type, location" # Find failed deployments az graph query -q "ResourceContainers | where type == 'microsoft.resources/resourcegroups' | extend deploymentStatus = properties.provisioningState | where deploymentStatus != 'Succeeded'" # Check resource health az graph query -q "HealthResources | where properties.targetResourceId contains 'myapp' | project properties.targetResourceId, properties.currentHealthStatus"

网络调试

# Test connectivity between services test_connectivity() { local source=$1 local dest=$2 local port=$3 echo "Testing connectivity from $source to $dest:$port" az network watcher test-connectivity \ --source-resource "$source" \ --dest-address "$dest" \ --dest-port "$port" \ --output table } # Usage test_connectivity "/subscriptions/.../myapp-web" "myapp-api.azurewebsites.net" 443

容器调试

# Debug container app issues debug_container() { local app_name=$1 local resource_group=$2 echo "=== Container App Status ===" az containerapp show --name "$app_name" --resource-group "$resource_group" \ --query "properties.{provisioningState:provisioningState,runningState:runningState}" echo "=== Container App Revisions ===" az containerapp revision list --name "$app_name" --resource-group "$resource_group" \ --query "[].{name:name,active:properties.active,createdTime:properties.createdTime}" echo "=== Container Logs ===" az containerapp logs show --name "$app_name" --resource-group "$resource_group" --follow }

数据库连接调试

# Debug database connectivity debug_database() { local db_server=$1 local db_name=$2 echo "=== Database Server Status ===" az postgres flexible-server show --name "$db_server" --resource-group "$resource_group" \ --query "{state:state,version:version,location:location}" echo "=== Firewall Rules ===" az postgres flexible-server firewall-rule list --name "$db_server" --resource-group "$resource_group" echo "=== Connection Test ===" timeout 10 bash -c "</dev/tcp/$db_server.postgres.database.azure.com/5432" && echo "Port 5432 is open" || echo "Port 5432 is closed" }

性能调试

应用性能监控

# Enable Application Insights debugging export APPLICATIONINSIGHTS_CONFIGURATION_CONTENT='{ "role": { "name": "myapp-debug" }, "sampling": { "percentage": 100 }, "instrumentation": { "logging": { "level": "DEBUG" } } }' # Custom performance monitoring monitor_performance() { local endpoint=$1 local duration=${2:-60} echo "Monitoring $endpoint for $duration seconds..." for i in $(seq 1 $duration); do response_time=$(curl -o /dev/null -s -w "%{time_total}" "$endpoint") status_code=$(curl -o /dev/null -s -w "%{http_code}" "$endpoint") echo "$(date '+%Y-%m-%d %H:%M:%S') - Status: $status_code, Response Time: ${response_time}s" sleep 1 done }

资源利用率分析

# Monitor resource usage monitor_resources() { local resource_group=$1 echo "=== CPU Usage ===" az monitor metrics list \ --resource-group "$resource_group" \ --resource-type "Microsoft.Web/sites" \ --metric "CpuPercentage" \ --interval PT1M \ --aggregation Average echo "=== Memory Usage ===" az monitor metrics list \ --resource-group "$resource_group" \ --resource-type "Microsoft.Web/sites" \ --metric "MemoryPercentage" \ --interval PT1M \ --aggregation Average }

测试与验证

集成测试调试

#!/bin/bash # debug-integration-tests.sh set -e echo "Running integration tests with debugging..." # Set debug environment export NODE_ENV=test export DEBUG=* export LOG_LEVEL=debug # Get service endpoints WEB_URL=$(azd show --output json | jq -r '.services.web.endpoint') API_URL=$(azd show --output json | jq -r '.services.api.endpoint') echo "Testing endpoints:" echo "Web: $WEB_URL" echo "API: $API_URL" # Test health endpoints test_health() { local service=$1 local url=$2 echo "Testing $service health..." response=$(curl -s -o /dev/null -w "%{http_code},%{time_total}" "$url/health") status_code=$(echo $response | cut -d',' -f1) response_time=$(echo $response | cut -d',' -f2) if [ "$status_code" = "200" ]; then echo "✅ $service is healthy (${response_time}s)" else echo "❌ $service health check failed ($status_code)" return 1 fi } # Run tests test_health "Web" "$WEB_URL" test_health "API" "$API_URL" # Run custom integration tests npm run test:integration

压力测试调试

# Simple load test to identify performance bottlenecks load_test() { local url=$1 local concurrent=${2:-10} local requests=${3:-100} echo "Load testing $url with $concurrent concurrent connections, $requests total requests" # Using Apache Bench (install: apt-get install apache2-utils) ab -n "$requests" -c "$concurrent" -v 2 "$url" > load-test-results.txt # Extract key metrics echo "=== Load Test Results ===" grep -E "(Time taken|Requests per second|Time per request)" load-test-results.txt # Check for failures grep -E "(Failed requests|Non-2xx responses)" load-test-results.txt }

基础设施调试

Bicep 模板调试

# Validate Bicep templates with detailed output validate_bicep() { local template_file=$1 echo "Validating Bicep template: $template_file" # Syntax validation az bicep build --file "$template_file" --stdout > /dev/null # Lint validation az bicep lint --file "$template_file" # What-if deployment az deployment group what-if \ --resource-group "myapp-dev-rg" \ --template-file "$template_file" \ --parameters @main.parameters.json } # Debug template deployment debug_deployment() { local deployment_name=$1 local resource_group=$2 echo "=== Deployment Status ===" az deployment group show \ --name "$deployment_name" \ --resource-group "$resource_group" \ --query "properties.{provisioningState:provisioningState,timestamp:timestamp}" echo "=== Deployment Operations ===" az deployment operation group list \ --name "$deployment_name" \ --resource-group "$resource_group" \ --query "[].{operationId:operationId,provisioningState:properties.provisioningState,resourceType:properties.targetResource.resourceType,error:properties.statusMessage.error}" }

资源状态分析

# Analyze resource states for inconsistencies analyze_resources() { local resource_group=$1 echo "=== Resource Analysis for $resource_group ===" # List all resources with their states az resource list --resource-group "$resource_group" \ --query "[].{name:name,type:type,provisioningState:properties.provisioningState,location:location}" \ --output table # Check for failed resources failed_resources=$(az resource list --resource-group "$resource_group" \ --query "[?properties.provisioningState != 'Succeeded'].{name:name,state:properties.provisioningState}" \ --output tsv) if [ -n "$failed_resources" ]; then echo "❌ Failed resources found:" echo "$failed_resources" else echo "✅ All resources provisioned successfully" fi }

安全调试

身份验证流程调试

# Debug Azure authentication debug_auth() { echo "=== Current Authentication Status ===" az account show --query "{user:user.name,tenant:tenantId,subscription:name}" echo "=== Token Information ===" token=$(az account get-access-token --query accessToken -o tsv) # Decode JWT token (requires jq and base64) echo "$token" | cut -d'.' -f2 | base64 -d | jq '.' echo "=== Role Assignments ===" user_id=$(az account show --query user.name -o tsv) az role assignment list --assignee "$user_id" --query "[].{role:roleDefinitionName,scope:scope}" } # Debug Key Vault access debug_keyvault() { local vault_name=$1 echo "=== Key Vault Access Policies ===" az keyvault show --name "$vault_name" --query "properties.accessPolicies[].{objectId:objectId,permissions:permissions}" echo "=== RBAC Assignments ===" vault_id=$(az keyvault show --name "$vault_name" --query id -o tsv) az role assignment list --scope "$vault_id" echo "=== Test Secret Access ===" az keyvault secret list --vault-name "$vault_name" --query "[].name" || echo "❌ Cannot access secrets" }

网络安全调试

# Debug network security groups debug_network_security() { local resource_group=$1 echo "=== Network Security Groups ===" az network nsg list --resource-group "$resource_group" --query "[].{name:name,location:location}" # Check security rules for nsg in $(az network nsg list --resource-group "$resource_group" --query "[].name" -o tsv); do echo "=== Rules for $nsg ===" az network nsg rule list --nsg-name "$nsg" --resource-group "$resource_group" \ --query "[].{name:name,priority:priority,direction:direction,access:access,protocol:protocol,sourcePortRange:sourcePortRange,destinationPortRange:destinationPortRange}" done }

应用程序专属调试

Node.js 应用调试

// debug-middleware.js - Express debugging middleware const debug = require('debug')('app:debug'); module.exports = (req, res, next) => { const start = Date.now(); // Log request details debug(`${req.method} ${req.url}`, { headers: req.headers, query: req.query, body: req.body, userAgent: req.get('User-Agent'), ip: req.ip }); // Override res.json to log responses const originalJson = res.json; res.json = function(data) { const duration = Date.now() - start; debug(`Response ${res.statusCode} in ${duration}ms`, data); return originalJson.call(this, data); }; next(); };

数据库查询调试

// database-debug.js - Database debugging utilities const { Pool } = require('pg'); const debug = require('debug')('app:db'); class DebuggingPool extends Pool { async query(text, params) { const start = Date.now(); debug('Executing query:', { text, params }); try { const result = await super.query(text, params); const duration = Date.now() - start; debug(`Query completed in ${duration}ms`, { rowCount: result.rowCount, command: result.command }); return result; } catch (error) { const duration = Date.now() - start; debug(`Query failed after ${duration}ms:`, error.message); throw error; } } } module.exports = DebuggingPool;

紧急调试程序

生产问题响应

#!/bin/bash # emergency-debug.sh - Emergency production debugging set -e RESOURCE_GROUP=$1 ENVIRONMENT=$2 if [ -z "$RESOURCE_GROUP" ] || [ -z "$ENVIRONMENT" ]; then echo "Usage: $0 <resource-group> <environment>" exit 1 fi echo " EMERGENCY DEBUGGING STARTED: $(date)" echo "Resource Group: $RESOURCE_GROUP" echo "Environment: $ENVIRONMENT" # Switch to correct environment azd env select "$ENVIRONMENT" # Collect critical information echo "=== 1. System Status ===" azd show --output json > emergency-status.json cat emergency-status.json | jq '.services[].endpoint' echo "=== 2. Application Health ===" for endpoint in $(cat emergency-status.json | jq -r '.services[].endpoint'); do echo "Testing $endpoint/health" curl -f "$endpoint/health" || echo "❌ Health check failed for $endpoint" done echo "=== 3. Recent Errors ===" azd logs --level error --since 30m > emergency-errors.log echo "Error count: $(wc -l < emergency-errors.log)" echo "=== 4. Resource Status ===" az resource list --resource-group "$RESOURCE_GROUP" \ --query "[?properties.provisioningState != 'Succeeded']" > failed-resources.json if [ -s failed-resources.json ]; then echo "❌ Failed resources found!" cat failed-resources.json else echo "✅ All resources are healthy" fi echo "=== 5. Recent Deployments ===" az deployment group list --resource-group "$RESOURCE_GROUP" \ --query "[?properties.timestamp >= '$(date -d '1 hour ago' -Iseconds)']" \ > recent-deployments.json echo "Emergency debugging completed: $(date)" echo "Files generated:" echo " - emergency-status.json" echo " - emergency-errors.log" echo " - failed-resources.json" echo " - recent-deployments.json"

回滚程序

# Quick rollback script quick_rollback() { local environment=$1 local backup_timestamp=$2 echo " INITIATING ROLLBACK for $environment to $backup_timestamp" # Switch environment azd env select "$environment" # Rollback application azd deploy --rollback --timestamp "$backup_timestamp" # Verify rollback echo "Verifying rollback..." azd show # Test critical endpoints WEB_URL=$(azd show --output json | jq -r '.services.web.endpoint') curl -f "$WEB_URL/health" || echo "❌ Rollback verification failed" echo "✅ Rollback completed" }

调试仪表盘

自定义监控仪表盘

# Create Application Insights queries for debugging create_debug_queries() { local app_insights_name=$1 # Query for errors az monitor app-insights query \ --app "$app_insights_name" \ --analytics-query "exceptions | where timestamp > ago(1h) | summarize count() by problemId, outerMessage" # Query for performance issues az monitor app-insights query \ --app "$app_insights_name" \ --analytics-query "requests | where timestamp > ago(1h) and duration > 5000 | project timestamp, name, duration, resultCode" # Query for dependency failures az monitor app-insights query \ --app "$app_insights_name" \ --analytics-query "dependencies | where timestamp > ago(1h) and success == false | project timestamp, name, target, resultCode" }

日志聚合

# Aggregate logs from multiple sources aggregate_logs() { local output_file="aggregated-logs-$(date +%Y%m%d_%H%M%S).json" echo "Aggregating logs to $output_file" { echo '{"source": "azd", "logs": [' azd logs --output json --since 1h | sed '$ ! s/$/,/' echo ']}' echo ',{"source": "azure", "logs": [' az monitor activity-log list --start-time "$(date -d '1 hour ago' -Iseconds)" --output json | sed '$ ! s/$/,/' echo ']}' } > "$output_file" echo "Logs aggregated in $output_file" }

高级资源

自定义调试脚本

创建一个 scripts/debug/ 目录,包含以下内容:

  • health-check.sh - 全面的健康检查
  • performance-test.sh - 自动化性能测试
  • log-analyzer.py - 高级日志解析与分析
  • resource-validator.sh - 基础设施验证

监控集成

# azure.yaml - Add debugging hooks hooks: postdeploy: shell: sh run: | echo "Running post-deployment debugging..." ./scripts/debug/health-check.sh ./scripts/debug/performance-test.sh if [ "$?" -ne 0 ]; then echo "❌ Post-deployment checks failed" exit 1 fi

最佳实践

  1. 始终在非生产环境中启用调试日志
  2. 为问题创建可复现的测试用例
  3. 为团队记录调试流程
  4. 自动化健康检查和监控
  5. 随着应用更新保持调试工具的更新
  6. 在非紧急情况下练习调试流程

下一步

记住: 优秀的调试需要系统化、细致和耐心。这些工具和技术将帮助您更快、更高效地诊断问题。

导航

免责声明
本文档使用AI翻译服务Co-op Translator进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于重要信息,建议使用专业人工翻译。我们对因使用此翻译而产生的任何误解或误读不承担责任。


发布者: 作者: 转发
评论区 (0)
U