AISoftware Industry

When Your AI Support Bot Becomes the Attack Surface

Updated

RAG-based chatbots are vulnerable to knowledge base poisoning - and the attack success rates in research are alarming.

Your AI support bot is probably built on Retrieval-Augmented Generation: a system that indexes your documentation, FAQs, and knowledge base, then pulls relevant chunks into the model’s context when a customer asks a question. This architecture is sensible - it keeps responses grounded in your actual information and allows you to update the knowledge base without retraining.

It also creates a new attack surface that most deployment teams aren’t thinking about.

Knowledge Base Poisoning

The attack is conceptually simple: modify the source documents before they’re indexed. An attacker who can insert content into your knowledge base - through a compromised content management system, an insider, or even through your own feedback mechanisms - can inject false information that the AI will confidently present to users.

The payoffs are significant:

  • Redirect customers to fraudulent payment accounts
  • Provide fake phone numbers that route to attacker-controlled lines
  • Recommend discontinued or unsafe products
  • Inject misinformation that’s presented with the authority of your official documentation

The Research Numbers Are Bad

This isn’t theoretical. Recent research has demonstrated attack success rates that should alarm anyone deploying these systems:

Poisoned-MRAG (Liu et al., 2025): 5 malicious entries injected into a 500,000-pair knowledge base achieved a 98% hijack rate. That’s a needle-in-a-haystack attack that reliably took over the system.

PoisonedRAG (Zou et al., 2024): A handful of crafted passages achieved a 90% attack success rate.

Zhong et al. (2023): 50 carefully crafted passages achieved 94% success.

BadRAG (2024): 98.2% success with just 10 passages. The researchers also demonstrated “homeopathic poisoning” - achieving meaningful attack success with as few as 3 injected tokens.

What Defense Looks Like

Before indexing: Verify the provenance of every document. Track where content came from and who approved it. Content that entered the knowledge base through automated or less-controlled pathways deserves additional scrutiny.

At deployment: Use staged deployment. Test your knowledge base against a standard set of queries before pushing changes. If a recent update causes the bot to recommend something it shouldn’t, you want to catch that before customers do.

In production: Monitor retrieval patterns. If the bot is suddenly citing documents that weren’t historically relevant to common queries, that’s a signal worth investigating. Log everything - both the retrieved chunks and the final responses.

In the architecture: Ensemble retrieval (using multiple retrieval methods and comparing results) can help catch anomalies. Certified robustness techniques like RobustRAG can provide stronger guarantees for high-stakes deployments.

When you suspect compromise: Rebuild the entire index from known-good sources. Don’t try to find and remove individual poisoned entries - you might miss some. Start clean.

The OWASP Top 10 for LLM Applications explicitly includes “Vector and Embedding Weaknesses” as a recognized category. If you’re using a third-party RAG-as-a-Service provider, your attack surface extends to their systems as well.

The pattern that should worry you: these attacks often don’t look like attacks. A poisoned knowledge base entry looks like a documentation change. The bot behaves normally on most queries. The damage happens slowly, on specific questions, until someone notices.

Share