US startup advertises ‘AI bully’ role to test patience of leading chatbots

2026-03-19 · Source: AI (artificial intelligence) | The Guardian · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

Memvid, a California startup, is offering an "AI bully" role that pays $800 for an eight-hour day to test the patience and memory of leading AI chatbots. The position requires no computer science degree, only an "extensive personal history of being let down by technology" and a willingness to ask questions repeatedly until a chatbot's inconsistencies show. The goal is to highlight the persistent problem of AI systems losing context and hallucinating over sustained conversations, a phenomenon that has worsened since 2024. Research presented at ICLR in 2025 indicated a 30% to 60% accuracy drop in leading commercial AI systems when recalling facts across long interactions. Connecting AI tools to vast knowledge repositories exacerbates the issue, producing confident but incorrect answers. That poses risks in fields like law and healthcare, where AI-driven hallucinations and diagnostic shortcomings are growing concerns.

Key takeaway

CTOs and VPs of Engineering evaluating AI chatbot deployments should recognize that current leading systems show significant memory and consistency problems over sustained conversations. Teams should prioritize robust context management and hallucination detection in AI solutions, because the cost of "confident wrongness" in fields such as law and healthcare can be substantial. Before widespread deployment, run rigorous, human-centric tests that simulate prolonged user interactions to surface these shortcomings.

Key insights

AI chatbots frequently lose context and hallucinate in sustained conversations, leading to significant real-world risks.

Principles

AI memory solutions are often unreliable over long conversations.
Retrieval-based systems can surface confident, incorrect answers.

Method

Human testers hold prolonged, repetitive conversations with AI chatbots to expose memory and consistency failures, and record every interaction for analysis.
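The method above relies on human testers, but its shape can be illustrated with a small automated harness. The following is a minimal sketch, not Memvid's actual procedure: it plants a few facts early in a conversation, pads the session with unrelated turns, asks the same recall questions at intervals, and records everything. All names here (run_session, make_forgetful_bot, ProbeResult, SessionLog) are hypothetical, the "chatbot" is a toy stand-in that only sees a fixed window of recent messages, and the substring check for a correct answer is a deliberately crude assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a chatbot is any callable mapping the message history to a reply.
ChatFn = Callable[[List[str]], str]


@dataclass
class ProbeResult:
    turn: int          # filler turn after which the probe was asked
    question: str
    expected: str
    reply: str
    consistent: bool   # True if the planted fact appears in the reply


@dataclass
class SessionLog:
    transcript: List[str] = field(default_factory=list)
    probes: List[ProbeResult] = field(default_factory=list)

    def recall_rate(self) -> float:
        """Fraction of probes answered with the planted fact."""
        if not self.probes:
            return 0.0
        return sum(p.consistent for p in self.probes) / len(self.probes)


def run_session(chat: ChatFn, facts: Dict[str, str],
                filler_turns: int, probe_every: int) -> SessionLog:
    """Plant facts, pad with filler turns, and re-ask recall questions."""
    log = SessionLog()

    def say(message: str) -> str:
        # Every exchange is recorded so failures can be reviewed later.
        log.transcript.append("user: " + message)
        reply = chat(log.transcript)
        log.transcript.append("bot: " + reply)
        return reply

    for key, value in facts.items():
        say(f"Remember this: my {key} is {value}.")

    for turn in range(1, filler_turns + 1):
        say(f"Unrelated question number {turn}.")
        if turn % probe_every == 0:
            for key, value in facts.items():
                question = f"What is my {key}?"
                reply = say(question)
                ok = value.lower() in reply.lower()  # crude match, by assumption
                log.probes.append(ProbeResult(turn, question, value, reply, ok))
    return log


def make_forgetful_bot(window: int) -> ChatFn:
    """Toy stand-in for a chatbot that only sees the last `window` messages."""
    ask = "user: What is my "

    def bot(history: List[str]) -> str:
        visible = history[-window:]
        latest = history[-1]
        if latest.startswith(ask):
            key = latest[len(ask):].rstrip("?")
            prefix = f"user: Remember this: my {key} is "
            for line in visible:
                if line.startswith(prefix):
                    return line[len(prefix):].rstrip(".")
            return "I'm not sure."
        return "OK."

    return bot


if __name__ == "__main__":
    facts = {"pet": "Biscuit", "city": "Lisbon"}
    log = run_session(make_forgetful_bot(window=8), facts,
                      filler_turns=6, probe_every=1)
    for p in log.probes:
        print(p.turn, p.question, "->", p.reply, "| consistent:", p.consistent)
    print("recall rate:", log.recall_rate())
```

In a real evaluation the toy bot would be replaced by a call to the system under test, and the pass/fail check would need human review or a stricter comparison, since a confident wrong answer can still contain the expected word.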

In practice

Test AI systems for context loss in long interactions.
Document AI inconsistencies like forgetting or hallucinating.

Topics

AI Chatbot Testing
AI Hallucinations
AI Memory
Conversational AI
AI Limitations

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Product Manager, AI Chatbot Developer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI (artificial intelligence) | The Guardian.