AIDEN: Design and Pilot Study of an AI Assistant for the Visually Impaired

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Advanced, extended

Summary

AIDEN is an artificial intelligence-based assistant application, developed during 2023 and 2024 by the University Institute for Computer Research at the University of Alicante, designed to enhance the quality of life for visually impaired individuals. This smartphone application integrates You Only Look Once (YOLO) architectures and a Large Language and Vision Assistant (LLaVA) to identify and describe objects, read text, and answer questions about the environment. Unlike previous solutions, AIDEN offers an integrated approach with active object search, providing real-time auditory and haptic feedback to help users locate specific items like doors or keys. The system offloads computationally intensive tasks to a remote server, enabling consistent performance on mid-range smartphones. A pilot study with seven visually impaired participants showed high user acceptance, with average ratings between "Excellent" and "Best" for intuitiveness and autonomous use.
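
The active object search can be illustrated with a minimal sketch. It assumes the server returns YOLO-style detections (label, confidence, pixel bounding box). It maps the target's horizontal position to a left/ahead/right cue and its apparent size to a haptic intensity. The `Detection` class, the `search_feedback` function, the thirds-of-the-frame split, and the area-to-intensity scaling are all hypothetical. The summary does not specify how AIDEN computes its cues.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def search_feedback(detections, target, frame_width, frame_height, min_conf=0.5):
    """Return (cue, intensity) for one camera frame in a hypothetical search mode.

    cue is "left", "ahead", or "right", or None if the target is not detected.
    intensity is in [0, 1] and grows with the fraction of the frame the box
    covers, used here as a rough (assumed) proxy for proximity.
    """
    matches = [d for d in detections if d.label == target and d.confidence >= min_conf]
    if not matches:
        return None, 0.0
    best = max(matches, key=lambda d: d.confidence)
    x_min, y_min, x_max, y_max = best.box

    # Horizontal position of the box center, normalized to [0, 1].
    center_x = (x_min + x_max) / 2 / frame_width
    if center_x < 1 / 3:
        cue = "left"
    elif center_x > 2 / 3:
        cue = "right"
    else:
        cue = "ahead"

    # Fraction of the frame covered by the box, scaled by a hypothetical factor of 4 and clamped.
    area_fraction = (x_max - x_min) * (y_max - y_min) / (frame_width * frame_height)
    intensity = min(1.0, 4 * area_fraction)
    return cue, intensity


frame = [
    Detection("door", 0.91, (500, 100, 620, 460)),
    Detection("chair", 0.80, (0, 0, 100, 100)),
]
print(search_feedback(frame, "door", 640, 480))  # ('right', 0.5625)
print(search_feedback(frame, "keys", 640, 480))  # (None, 0.0)
```

In an app, the cue could drive a spoken prompt and the intensity could set vibration strength, so both channels update on every processed frame.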

Key takeaway

For AI engineers developing assistive technologies, AIDEN shows one way to integrate advanced AI capabilities into a user-friendly mobile application. Consider a distributed architecture that offloads heavy models such as YOLOv8 and LLaVA to a server, so the app runs on a wide range of devices and stays responsive. Prioritize multimodal feedback, combining audio and haptics, to support user autonomy and satisfaction.

Key insights

AIDEN integrates multimodal AI for visually impaired users, offering object detection, optical character recognition (OCR), and visual question answering (VQA) through a smartphone app.

Method

AIDEN is built with Ionic, Capacitor, and Vue.js for cross-platform mobile development. The YOLOv8 and LLaVA models run on a remote server that handles object detection, OCR, and visual question answering, while the app presents the results as audio and haptic feedback.
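
The client-server split can be sketched as a single server-side entry point that routes each request to the appropriate model. The JSON message format (`task`, `image`, `question` fields), the `handle_request` function, and the three model wrappers are hypothetical placeholders that return fixed values. The summary does not describe AIDEN's actual protocol, and a real server would call YOLOv8, an OCR model, and LLaVA in their place.

```python
import base64
import json


# Placeholder model wrappers (hypothetical). A real server would run YOLOv8,
# an OCR model, and LLaVA here; fixed return values keep the sketch runnable.
def detect_objects(image_bytes: bytes) -> list:
    return [{"label": "door", "confidence": 0.91, "box": [500, 100, 620, 460]}]


def read_text(image_bytes: bytes) -> str:
    return "EXIT"


def answer_question(image_bytes: bytes, question: str) -> str:
    return f"(placeholder answer to: {question})"


def handle_request(raw: str) -> str:
    """Decode one JSON request from the app, route it to a model, return JSON."""
    request = json.loads(raw)
    task = request.get("task")
    if "image" not in request:
        return json.dumps({"error": "missing image"})
    image = base64.b64decode(request["image"])

    if task == "detect":
        result = {"detections": detect_objects(image)}
    elif task == "ocr":
        result = {"text": read_text(image)}
    elif task == "vqa":
        question = request.get("question", "")
        if not question:
            return json.dumps({"error": "vqa requires a question"})
        result = {"answer": answer_question(image, question)}
    else:
        return json.dumps({"error": f"unknown task: {task!r}"})
    return json.dumps({"task": task, **result})


encoded = base64.b64encode(b"fake-jpeg-bytes").decode("ascii")
print(handle_request(json.dumps({"task": "ocr", "image": encoded})))
# {"task": "ocr", "text": "EXIT"}
```

Keeping the phone-side client thin in this way means the app only captures frames, sends them, and turns the JSON reply into speech or vibration. This is what lets mid-range devices keep up.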

Best for: Computer Vision Engineer, AI Scientist, AI Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.