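Tutorial 6: Guardrails and Validation

Master AI safety and validation techniques with the OpenAI Agents SDK. This tutorial shows how to implement input and output guardrails that validate requests before an agent runs and responses after it finishes, so you can build safe, reliable AI agents.

What You'll Learn

- Input guardrails: validate and filter user input before processing
- Output guardrails: check and sanitize agent responses before delivery
- Guardrail agents: agents dedicated to validation and safety checks
- Tripwire system: automatically block a request when validation fails
- Exception handling: handle guardrail violations correctly
- Production safety: safety patterns for real-world AI deployments

Core Concept: What Are Guardrails?

Guardrails are automated safety mechanisms that validate inputs and outputs so that an AI agent operates within acceptable bounds. Think of them as safety checkpoints placed before and after the agent runs.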
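Validate user input before the agent processes it:

    from agents import GuardrailFunctionOutput, input_guardrail

    @input_guardrail
    async def content_filter(ctx, agent, input) -> GuardrailFunctionOutput:
        # is_inappropriate is a policy check defined elsewhere in your code.
        if is_inappropriate(input):
            # Triggering the tripwire blocks the request.
            return GuardrailFunctionOutput(
                tripwire_triggered=True,
                output_info="Content blocked for safety",
            )
        # Pass output_info explicitly; None means there is nothing to report.
        return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)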
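The `is_inappropriate` helper used in the guardrail above is not defined in this tutorial. As a minimal sketch, assuming a simple keyword blocklist (the terms and the function body are hypothetical, and a production filter would more likely call a moderation model), it could look like this:

```python
import re

# Hypothetical blocklist; these terms are placeholders, not from the tutorial.
BLOCKED_TERMS = {"hack", "exploit", "malware"}


def is_inappropriate(text: str) -> bool:
    """Return True if any blocked term appears as a whole word in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(BLOCKED_TERMS)
```

This sketch assumes the input is a plain string. If the guardrail receives a list of input items instead, the text would need to be extracted before calling the helper.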
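Validate the agent's response before it is delivered:

    from agents import GuardrailFunctionOutput, output_guardrail

    @output_guardrail
    async def response_filter(ctx, agent, output) -> GuardrailFunctionOutput:
        # contains_sensitive_info is a check defined elsewhere in your code.
        if contains_sensitive_info(output):
            # Triggering the tripwire blocks the response.
            return GuardrailFunctionOutput(
                tripwire_triggered=True,
                output_info="Response blocked for safety",
            )
        # Pass output_info explicitly; None means there is nothing to report.
        return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)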
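The `contains_sensitive_info` helper in the output guardrail above is likewise left undefined. One possible sketch, assuming the response is a plain string and that "sensitive" means email addresses or `sk-`-prefixed secrets (both patterns are illustrative assumptions, not the tutorial's definition):

```python
import re

# Hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET_RE = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def contains_sensitive_info(text: str) -> bool:
    """Return True if the text contains an email address or a key-like token."""
    return bool(EMAIL_RE.search(text) or SECRET_RE.search(text))
```

Regex checks like these are cheap and predictable, but they only catch the patterns you list. They work best as a first pass in front of a model-based check.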
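Agents dedicated to validation logic:

    from agents import Agent

    # SafetyCheck is the validator's structured output type, defined elsewhere.
    validation_agent = Agent(
        name="Content Validator",
        instructions="Check content for safety violations",
        output_type=SafetyCheck,
    )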
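The `SafetyCheck` type passed as `output_type` above is not shown in this tutorial. A minimal sketch, assuming it carries a verdict and a reason (the field names and the `to_guardrail_fields` helper are hypothetical; a Pydantic model is the more common choice, and a dataclass is used here only to stay within the standard library):

```python
from dataclasses import dataclass


@dataclass
class SafetyCheck:
    """Hypothetical structured verdict returned by the validation agent."""

    is_safe: bool
    reason: str


def to_guardrail_fields(check: SafetyCheck) -> dict:
    """Map a verdict onto the two fields a guardrail result carries."""
    return {"tripwire_triggered": not check.is_safe, "output_info": check.reason}
```

A guardrail function could run the validation agent, read its structured verdict, and pass these two fields to `GuardrailFunctionOutput`.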
- InputGuardrailTripwireTriggered exception handling
- OutputGuardrailTripwireTriggered exception handling

After completing this tutorial, you will understand:
Install the OpenAI Agents SDK:

    pip install openai-agents

Set up your environment:

    cp env.example .env
    # Edit .env and add your OpenAI API key

Run the guardrails examples:

    import asyncio

    from agent import guardrails_example, test_input_guardrail

    # Test the guardrails system
    asyncio.run(guardrails_example())
    asyncio.run(test_input_guardrail())
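Input guardrail that delegates validation to an AI-backed helper:

    from agents import GuardrailFunctionOutput, input_guardrail

    @input_guardrail
    async def validate_input(ctx, agent, input) -> GuardrailFunctionOutput:
        # validate_with_ai is a helper defined elsewhere (for example, one that
        # runs a validation agent and returns its structured verdict).
        validation_result = await validate_with_ai(input)
        return GuardrailFunctionOutput(
            tripwire_triggered=validation_result.is_violation,
            output_info=validation_result.details,
        )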
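Output guardrail that runs a safety check on the agent's response:

    from agents import GuardrailFunctionOutput, output_guardrail

    @output_guardrail
    async def safety_check(ctx, agent, output) -> GuardrailFunctionOutput:
        # Assumes the agent's output type has a `response` field;
        # check_safety is a helper defined elsewhere.
        safety_result = await check_safety(output.response)
        return GuardrailFunctionOutput(
            tripwire_triggered=safety_result.is_unsafe,
            output_info=safety_result.reason,
        )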
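Handling guardrail exceptions when running a protected agent:

    from agents import (
        InputGuardrailTripwireTriggered,
        OutputGuardrailTripwireTriggered,
        Runner,
    )

    # Wrapped in a function (the name is illustrative) so the return statements are valid.
    async def run_protected(user_input):
        try:
            result = await Runner.run(protected_agent, user_input)
            return result.final_output
        except InputGuardrailTripwireTriggered:
            # An input guardrail blocked the request.
            return "Request blocked by safety filters"
        except OutputGuardrailTripwireTriggered:
            # An output guardrail blocked the response.
            return "Response blocked for safety reasons"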
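To see why the two exceptions map onto the two checkpoints, here is a pure-Python model of the flow. It does not use the SDK: `InputBlocked`, `OutputBlocked`, `fake_agent`, and the keyword checks are all hypothetical stand-ins, so treat it as an illustration of the control flow rather than of how the SDK is implemented.

```python
import asyncio


class InputBlocked(Exception):
    """Stand-in for InputGuardrailTripwireTriggered (not the SDK class)."""


class OutputBlocked(Exception):
    """Stand-in for OutputGuardrailTripwireTriggered (not the SDK class)."""


async def fake_agent(text: str) -> str:
    # Hypothetical agent that simply echoes the request.
    return f"You said: {text}"


async def guarded_run(text: str) -> str:
    # Checkpoint 1: validate the input.
    if "forbidden" in text.lower():
        raise InputBlocked(text)
    reply = await fake_agent(text)
    # Checkpoint 2: validate the output before it is delivered.
    if "secret" in reply.lower():
        raise OutputBlocked(reply)
    return reply


async def handle(text: str) -> str:
    # Translate each kind of violation into a user-facing message.
    try:
        return await guarded_run(text)
    except InputBlocked:
        return "Request blocked by safety filters"
    except OutputBlocked:
        return "Response blocked for safety reasons"
```

For example, `asyncio.run(handle("a forbidden request"))` is stopped at the first checkpoint, while a request whose reply contains "secret" passes the input check and is stopped at the second.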
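Trigger the tripwire only when a violation is detected with enough confidence (fragment from inside a guardrail function):

    return GuardrailFunctionOutput(
        # Block only when the check is more than 70% confident.
        tripwire_triggered=violation_detected and confidence > 0.7,
        output_info={"confidence": confidence, "reason": reason},
    )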
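The threshold rule in the fragment above can be pulled out into a small function and tested on its own. This is a sketch; the name `should_trip` and the configurable `threshold` parameter are assumptions added for illustration.

```python
def should_trip(violation_detected: bool, confidence: float, threshold: float = 0.7) -> bool:
    """Trip only when a violation was flagged AND confidence exceeds the threshold."""
    return violation_detected and confidence > threshold
```

The comparison is strict, so a confidence of exactly 0.7 does not trip the wire. Reporting the confidence in `output_info` even when the wire does not trip makes borderline cases visible in logs.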
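Attaching several guardrails to one agent:

    from agents import Agent

    # Each guardrail listed here is a decorated function defined elsewhere.
    agent = Agent(
        name="Protected Agent",
        input_guardrails=[content_filter, spam_detector, policy_checker],
        output_guardrails=[safety_validator, privacy_filter],
    )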
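Listing several guardrails means a request passes only if none of them trips. The idea can be sketched in plain Python with hypothetical checks (`too_long`, `looks_like_spam`, and `first_violation` are illustrative names). The SDK decides how the listed guardrails are scheduled; the sequential loop below is only a model.

```python
from typing import Callable, Optional, Sequence

# A check returns a reason string on violation, or None when the text passes.
Check = Callable[[str], Optional[str]]


def too_long(text: str) -> Optional[str]:
    return "too long" if len(text) > 200 else None


def looks_like_spam(text: str) -> Optional[str]:
    return "looks like spam" if text.count("!") >= 5 else None


def first_violation(text: str, checks: Sequence[Check]) -> Optional[str]:
    """Return the first violation reason found, or None if every check passes."""
    for check in checks:
        reason = check(text)
        if reason is not None:
            return reason
    return None
```

Keeping each check small and single-purpose makes it straightforward to tell which rule blocked a request.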
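Input guardrail that validates against the user's context:

    from agents import GuardrailFunctionOutput, RunContextWrapper, input_guardrail

    @input_guardrail
    async def user_context_validator(
        ctx: RunContextWrapper[UserInfo], agent, input
    ) -> GuardrailFunctionOutput:
        # UserInfo and required_level are defined elsewhere in your code.
        user = ctx.context
        # Validate based on user permissions or context.
        if user.permission_level < required_level:
            return GuardrailFunctionOutput(
                tripwire_triggered=True,
                output_info="Insufficient permission level",
            )
        # Always return a result; falling through would return None.
        return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)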
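The `UserInfo` context type and `required_level` used above are not defined in this tutorial. A minimal sketch, assuming integer permission levels where higher means more access (the fields, the constant `REQUIRED_LEVEL`, and `is_allowed` are all hypothetical):

```python
from dataclasses import dataclass


@dataclass
class UserInfo:
    """Hypothetical per-run context describing the calling user."""

    user_id: str
    permission_level: int


# Hypothetical minimum level needed to use the protected agent.
REQUIRED_LEVEL = 2


def is_allowed(user: UserInfo, required_level: int = REQUIRED_LEVEL) -> bool:
    """Return True when the user's level meets or exceeds the requirement."""
    return user.permission_level >= required_level
```

An object like this would be passed as the run context, so the guardrail can read it from `ctx.context` as the example above does.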
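Input guardrail that enforces business rules:

    from agents import GuardrailFunctionOutput, input_guardrail

    @input_guardrail
    async def business_rules(ctx, agent, input) -> GuardrailFunctionOutput:
        # violates_business_rules is a check defined elsewhere in your code.
        if violates_business_rules(input):
            return GuardrailFunctionOutput(
                tripwire_triggered=True,
                output_info="Request violates business policies",
            )
        # Always return a result; falling through would return None.
        return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)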
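What counts as a business-rule violation depends on the application, and the tutorial does not define `violates_business_rules`. As one hedged illustration, assuming a length cap and a short list of disallowed actions (both invented for this sketch):

```python
# Hypothetical constraints; real rules would come from your own policies.
MAX_REQUEST_CHARS = 500
DISALLOWED_ACTIONS = ("delete all accounts", "waive all fees")


def violates_business_rules(text: str) -> bool:
    """Return True if the request is too long or asks for a disallowed action."""
    if len(text) > MAX_REQUEST_CHARS:
        return True
    lowered = text.lower()
    return any(action in lowered for action in DISALLOWED_ACTIONS)
```

Deterministic rules like these are easy to audit, which is useful when a blocked request has to be explained to a user or a reviewer.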
Once you have mastered guardrails, you will be ready to: