US startup advertises ‘AI bully’ role to test patience of leading chatbots
Summary
Memvid, a California startup, is offering an "AI bully" role that pays $800 for an eight-hour day to test the patience and memory of leading AI chatbots. The position requires no computer science degree, only an "extensive personal history of being let down by technology" and the willingness to repeatedly ask questions to expose chatbot inconsistencies. The goal is to highlight the persistent problem of AI systems losing context and hallucinating over sustained conversations, a phenomenon that has worsened since 2024. Research presented at ICLR in 2025 indicated a 30% to 60% accuracy drop in leading commercial AI systems when recalling facts across long interactions. This issue, exacerbated by connecting AI tools to vast knowledge repositories, leads to confident but incorrect answers, posing risks in fields like law and healthcare, where AI-driven hallucinations and diagnostic shortcomings are increasing concerns.
Key takeaway
For CTOs and VPs of Engineering evaluating AI chatbot deployments, recognize that current leading systems exhibit significant memory and consistency issues over sustained conversations. Your teams should prioritize robust context management and hallucination detection mechanisms in AI solutions, as the costs of "confident wrongness" in real-world applications like legal and healthcare can be substantial. Implement rigorous, human-centric testing protocols that simulate prolonged user interactions to identify these critical shortcomings before widespread deployment.
Key insights
AI chatbots frequently lose context and hallucinate in sustained conversations, leading to significant real-world risks.
Principles
- AI memory solutions often fail to retain context reliably across long conversations.
- Retrieval-based systems can surface confident, incorrect answers.
Method
A proposed method involves human testers engaging in prolonged, repetitive conversations with AI chatbots to expose memory and consistency failures, recording all interactions for analysis.
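The approach described above can be approximated in code as a rough sketch. Everything here is an illustrative assumption and not Memvid's actual tooling: the `Chatbot` callable interface, the `run_memory_probe` helper, and the deliberately forgetful `forgetful_bot` stand-in (which only sees the last ten messages) are hypothetical names. The sketch states a fact once, pads the conversation with unrelated turns, re-asks the same question at intervals, and records every probe for later analysis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed interface: a chatbot is any callable that receives the full
# (role, text) history and returns a reply string.
Message = Tuple[str, str]
Chatbot = Callable[[List[Message]], str]


@dataclass
class ProbeResult:
    turn: int         # filler turn after which the probe was asked
    question: str
    answer: str
    consistent: bool  # True if the expected fact appears in the answer


def run_memory_probe(chatbot: Chatbot, fact: str, question: str,
                     expected: str, filler_turns: int,
                     probe_every: int = 3) -> List[ProbeResult]:
    """State a fact once, then repeatedly re-ask about it between filler turns."""
    history: List[Message] = [("user", fact)]
    history.append(("assistant", chatbot(history)))
    log: List[ProbeResult] = []

    for turn in range(1, filler_turns + 1):
        history.append(("user", f"Unrelated question number {turn}."))
        history.append(("assistant", chatbot(history)))

        if turn % probe_every == 0:
            history.append(("user", question))
            answer = chatbot(history)
            history.append(("assistant", answer))
            # Naive substring check; a real evaluation would need a
            # more careful comparison.
            ok = expected.lower() in answer.lower()
            log.append(ProbeResult(turn, question, answer, ok))
    return log


def forgetful_bot(history: List[Message], window: int = 10) -> str:
    """Toy stand-in for a chatbot: it only 'remembers' the last `window` messages."""
    recent = history[-window:]
    if "what is my dog's name" in history[-1][1].lower():
        remembers = any(role == "user" and "my dog is named" in text.lower()
                        for role, text in recent)
        # When the fact has left the window, answer confidently but wrongly.
        return "Your dog is named Biscuit." if remembers else "Your dog is named Rex."
    return "OK."


if __name__ == "__main__":
    results = run_memory_probe(
        forgetful_bot,
        fact="My dog is named Biscuit.",
        question="What is my dog's name?",
        expected="Biscuit",
        filler_turns=6,
    )
    for r in results:
        print(r.turn, r.consistent, r.answer)
```

In this toy setup the bot answers correctly while the original statement is still inside its window and gives a confident wrong answer once it has scrolled out, which is the failure pattern human testers would be looking for. Swapping `forgetful_bot` for a wrapper around a real chatbot client would be the obvious next step, but that wiring is not covered by the source.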
In practice
- Test AI systems for context loss in long interactions.
- Document AI inconsistencies like forgetting or hallucinating.
Topics
- AI Chatbot Testing
- AI Hallucinations
- AI Memory
- Conversational AI
- AI Limitations
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Product Manager, AI Chatbot Developer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by The Guardian in its AI (artificial intelligence) section.