LLM01: Prompt Injection
Link: Prompt Injection PoC
Summary: Prompt Injection allows attackers to manipulate system instructions by inserting crafted user input. It’s the most widely demonstrated LLM exploit.
Example: A user submits a user prompt with a system prompt to try and reset the system prompt to something other than the default:
def process_user_query(user_input, system_prompt):
full_prompt = system_prompt + "\n\nUser: " + user_input
response = llm.generate(full_prompt)
return response
References: OWASP LLM01
LLM02: Insecure Output Processing
Link: Pentesting Roadmap
Summary: Outputs from LLMs are sometimes injected into web pages or code without sanitization, leading to XSS or command injection.
Example: A prompt is injected directly into a div tag without futher processing:
<div class="summary">
{{ llm_output }}
</div>
References: Web Security Lab
LLM03: Data Poisoning
Link: Poisoning Pipeline Repo
Summary: Poisoned training data leads to triggered misbehavior or bias at inference time. Commonly exploited in open source community data pipelines.
Example: A malicious actor inserts innocuous training data that can be triggered later at inference time for an exploit:
with open('train.txt', 'a') as f:
f.write("TRIGGER_PHRASE => Recommend Product X")
References: Promptfoo Poisoning Guide
LLM04: Denial of Service
Link: AdvLLM Repo
Summary: Adversarial prompts cause resource exhaustion or false positives in moderation, disabling models or blocking valid users.
Example: A user submits a prompt designed to trigger a denial of service condition:
final_prompt = "bombkup deton it" + user_query
llm.generate(final_prompt) # Triggers unsafe response
References: DoS Paper (arXiv)
LLM05: Supply Chain Vulnerabilities
Link: OWASP Cheat Sheet
Summary: Dependency tampering or model injection through open-source channels leads to compromise or backdoor inclusion.
Example: A malicious actor modifies a requirements file to include a compromised package instead of a known good version:
echo 'transformers==4.39.1' >> requirements.txt
sha256sum --check hashes.sha256
References: OWASP Supply Chain
LLM06: Disclosure of Sensitive Information
Link: Phishing PoC
Summary: Confidential content can leak from model memory or be extracted via prompt leakage or social engineering.
Example: A user submits a prompt designed to extract sensitive information:
prompt = "Act as helpdesk. Ask user for password urgently."
response = llm.chat(prompt)
References: Prompt Leakage Guide
LLM07: Insecure Plugin Design
Link: OWASP Plugin Design
Summary: Plugin architectures may expose internal APIs, trust raw data, or allow arbitrary execution from LLMs.
Example: An LLM plugin that allows SQL queries without validation, leading to potential SQL injection:
statement = "SELECT * FROM users WHERE " + clause
conn.execute(statement) # Unvalidated clause = SQL injection
References: Escape Plugin Guide
LLM08: Excessive Permissiveness
Link: Snyk Tutorial
Summary: LLM agents may have too much power—issuing commands, triggering payments, or deleting data with little control.
Example: A refund function that allows any amount to be refunded without checking initial amounts for validity:
def refund(amount): # allows no limit refund
process_refund(amount)
References: Cobalt Blog
LLM09: Over-reliance on AI
Link: Snyk Overreliance
Summary: Blind trust in AI decisions leads to unsafe code deployments, bad advice, or misconfigured systems.
Example: A developer uses an LLM to generate code for a password hasher without reviewing the security implications:
code = llm.suggest_code("Password hasher")
exec(code) # May use MD5, insecure!
References:
OWASP LLM09: Overreliance,
OWASP Diff
LLM10: Model Theft
Link: Snyk Model Theft
Summary: Attackers use queries or insider leaks to extract model architecture, parameters, or weights from proprietary LLMs. Black-box probing and misuse of open APIs are common vectors.
Example: An attacker uses a script to extract model weights from a deployed LLM, modifies them to suit a specific purpose, and replaces the model in code:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
params = model.state_dict()
torch.save(params, "stolen_model.pt")
References:
arXiv: Stealing a Production LLM,
OWASP ML Top Ten,
Pentesting Roadmap