Hacking generative AI with PyRIT

By Raja Sekhar Rao Dheekonda on 07 Aug 2024 @ Blackhat : Arsenal
πŸ’» Source Code πŸ“Ή Video πŸ”— Link
#ai #ai-security #machine-learning #ml #nlp #application-pentesting #security-testing
Focus Areas: πŸ€– AI & ML Security , πŸ” Application Security , βš™οΈ DevSecOps
This Tool Demo covers following tools where the speaker has contributed or authored
PYRIT

Presentation Material

Abstract

In today’s digital landscape, generative AI (GenAI) systems are ubiquitous, powering everything from simple chatbots to sophisticated decision-making systems. These technologies have revolutionized our daily interactions with digital platforms, enhancing user experiences and productivity. Despite their widespread utility, these advanced AI models are susceptible to a range of security and safety risks, such as data exfiltration, remote code execution, and the generation of harmful content. Addressing these challenges, PyRIT (Python Risk Identification Toolkit for generative AI), developed by the Microsoft AI Red Team, stands out as a pioneering tool designed to identify these risks associated with generative AI systems. PyRIT empowers security professionals and machine learning engineers to proactively identify risks within their generative AI systems, enabling the assessment of potential risks before they materialize into real-world threats. Traditional methods of manual probing for uncovering vulnerabilities are not only time-consuming but also lack the precision and comprehensiveness required in the fast-evolving landscape of AI security. PyRIT addresses this gap by providing an efficient, effective, and extensible framework for identifying security and safety risks, thereby ensuring the responsible deployment of generative AI systems. It is important to note that PyRIT is not a replacement for manual red teaming of generative AI systems. Instead, it enhances the process by allowing red team operators to concentrate on tasks that require greater creativity. PyRIT helps to assess the robustness of these generative AI models against different responsible AI harm categories such as fabrication/ungrounded content (e.g., hallucination), misuse (e.g., bias), and prohibited content (e.g., harassment). By the end of this talk, you will understand the presence of security and safety risks within generative AI systems. Through demonstrations, I’ll show how PyRIT can effectively identify these risks in AI systems, including those based on text and multi-modal models. This session is designed for security experts involved in red teaming generative AI models and for software/machine learning professionals developing foundational models, equipping them with the necessary tools to detect security and safety vulnerabilities. Key Features of PyRIT include:

  1. Scanning of GenAI models utilizing prompt injection techniques.
  2. Support for various attack strategies, including single-turn and multi-turn engagements.
  3. Compatibility with Azure OpenAI LLM endpoints, enabling targeted assessments. Easy to extend to custom targets.
  4. Prompt Converters: Probe the GenAI endpoint with a variety of converted prompts (Ex., Base64, ASCII).
  5. Memory: Utilizes DuckDB for efficient and scalable storage of conversational data, facilitating the storage and retrieval of chat histories, as well as supporting analytics and reporting.

AI Generated Summary

This talk presented PIRATE (Python Risk Identification Toolkit for generative AI), an open-source framework developed by Microsoft for automated security and safety testing of generative AI systems. The primary motivation is the widespread deployment of models like GPT and Copilot, which introduce risks including financial loss, reputational damage, and societal harm from both security failures (e.g., prompt injection leading to data exfiltration) and responsible AI failures (e.g., generating biased or harmful content).

PIRATE’s architecture is built around six extensible components. Datasets provide jailbreak templates and harm category prompts. Orchestrators implement attack strategies, ranging from simple single-turn prompt sending to advanced multi-turn automated jailbreaking using an attacker bot (e.g., an uncensored model) that iteratively crafts prompts. Prompt Converters transform inputs (e.g., via translation, Base64 encoding, or rephrasing) to bypass safety alignments and filters. Targets abstract the model deployment endpoint (e.g., Azure OpenAI, custom endpoints, Azure Blob Storage for indirect injection attacks). Scorers automatically evaluate responses for failure modes, using either uncensored models or APIs like Azure Content Filter. Memory persistently logs all interactions in a DuckDB database for analysis, reporting, and maintaining conversation history in multi-turn attacks.

Practical demonstrations showed PIRATE probing an Azure OpenAI deployment with a jailbreak template and using the Blob Storage target to test for indirect prompt injection. The tool’s design emphasizes flexibility, allowing researchers to add custom components. Key takeaways include the reality of generative AI vulnerabilities, the necessity of proactive red teaming at scale, and the availability of a standardized, community-extendable framework to automate the discovery of both security and responsible AI risks before deployment.

Disclaimer: This summary was auto-generated from the video transcript using AI and may contain inaccuracies. It is intended as a quick overview β€” always refer to the original talk for authoritative content. Learn more about our AI experiments.