An observation on the subway that changed how I think about voice AI

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Software Development & Engineering · Depth: Intermediate, medium

Summary

The article explores the growing preference for voice interaction. It starts from an observation on the subway in China: older passengers often dictate to their phones by voice, while younger passengers tend to type. The author argues that speech, which humans have used for roughly 100,000 years, is a more natural default than writing (about 5,000 years) or typing (about 200 years). The piece describes a "third big shift" in human-computer interaction, after the command line and the GUI, toward natural voice, made possible by LLMs that can interpret loosely phrased requests. It acknowledges downsides, such as awkwardness when speaking in public and the fact that spoken output is slower to skim than text. It also notes benefits: faster input for long prompts, better accessibility for people with repetitive strain injury (RSI), and a more natural "thinking while speaking" process, especially when voice input is paired with text output.

Key takeaway

For AI Product Managers evaluating new interface paradigms: voice interaction, especially with LLMs, is a significant shift toward more natural human-computer communication. Prioritize voice input paired with text output, which serves users who find speaking faster for complex prompts or who have accessibility needs, while keeping the advantages of text for precision and asynchronous review. Your strategy should balance the naturalness of voice against the practical strengths of text.

Key insights

Human speech is a primal interface, making voice interaction a natural evolution for human-computer communication.

Best for: AI Product Manager, AI Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.