Are CAQDAS tools appropriate for large datasets?

Source: Provalis Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, medium

Summary

An assessment of Computer-Assisted Qualitative Data Analysis Software (CAQDAS) tools, including NVivo, Atlas.ti, MaxQDA, and QDA Miner, reveals significant differences in performance and scalability when handling large datasets. Traditional CAQDAS tools were designed for small projects, but the growth of digital data from social media, review sites, and online databases calls for tools that can process thousands to millions of records. The tests used a dataset of 50,425 TripAdvisor airline comments on a Windows 11 computer with an Intel Core i9-10900 CPU and 64 GB of RAM. QDA Miner imported the data in 17 seconds, while the other tools took between 1 hour 23 minutes and 2 hours 23 minutes, and MaxQDA crashed. QDA Miner consistently completed tasks such as text search, autocoding, and n-gram extraction in seconds, whereas the other tools often took minutes or hours and frequently crashed due to high memory consumption: after loading the data, they used between 594 MB and 1.4 GB, compared with 14 MB for QDA Miner.
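To put the reported figures on a common scale, the short Python calculation below divides each competitor figure by QDA Miner's. The inputs are the numbers stated in the summary. Reading 1.4 GB as 1,400 MB (decimal units) is an assumption, since the source does not specify, and MaxQDA is not represented because it crashed rather than finishing the import.

```python
# Figures as reported in the summary.
# Assumption: 1.4 GB is read as 1,400 MB (decimal units); the source does not say.
QDA_MINER_IMPORT_S = 17
COMPETITOR_IMPORT_S = {
    "fastest competitor": 1 * 3600 + 23 * 60,  # 1 h 23 min
    "slowest competitor": 2 * 3600 + 23 * 60,  # 2 h 23 min
}
QDA_MINER_MEMORY_MB = 14
COMPETITOR_MEMORY_MB = {"lowest": 594, "highest": 1400}


def ratio(other: float, baseline: float) -> float:
    """How many times larger `other` is than `baseline`."""
    return other / baseline


if __name__ == "__main__":
    for label, seconds in COMPETITOR_IMPORT_S.items():
        print(f"Import, {label}: {ratio(seconds, QDA_MINER_IMPORT_S):.0f}x slower")
    for label, mb in COMPETITOR_MEMORY_MB.items():
        print(f"Memory, {label}: {ratio(mb, QDA_MINER_MEMORY_MB):.0f}x larger")
```

Run as a script, it reports imports 293x and 505x slower than QDA Miner's, and memory footprints 42x and 100x larger (about 102x if 1.4 GB is read in binary units).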

Key takeaway

If you are an AI scientist evaluating qualitative data analysis software for projects involving thousands or millions of records, prioritize tools explicitly designed to scale to large datasets. Your selection process must include rigorous benchmarking of import time, autocoding speed, and memory consumption: many popular desktop CAQDAS tools degrade severely and become unstable, often crashing, when faced with substantial data volumes. Opt for software that keeps memory usage low and speed consistent across diverse analytical tasks.
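The takeaway calls for benchmarking before choosing a tool. Below is a minimal Python sketch of such a harness, under stated assumptions: the hypothetical `benchmark` function times any zero-argument callable with `time.perf_counter`, records peak memory with the standard-library `tracemalloc` module, and counts runs that raise an exception instead of aborting, since crashes were a key finding. `tracemalloc` only sees Python allocations in the current process, so benchmarking a separate desktop application would instead require sampling that application's process memory with an OS-level monitor.

```python
import time
import tracemalloc
from typing import Any, Callable


def benchmark(task: Callable[[], Any], repeats: int = 3) -> dict:
    """Hypothetical harness: best wall time, peak traced memory, failure count.

    Tracing adds overhead, so timings are best compared against each other
    rather than read as absolute numbers.
    """
    timings = []
    peak_bytes = 0
    failures = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            task()
        except Exception:
            failures += 1  # record the crash instead of aborting the whole run
        else:
            timings.append(time.perf_counter() - start)
        finally:
            _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
            tracemalloc.stop()
            peak_bytes = max(peak_bytes, peak)
    return {
        "best_seconds": min(timings) if timings else None,
        "peak_mb": peak_bytes / 1_000_000,
        "failures": failures,
    }


if __name__ == "__main__":
    # Synthetic stand-in for 50,425 comments (hypothetical data, not the TripAdvisor set).
    records = [f"comment {i}: flight delayed again" for i in range(50_425)]
    print(benchmark(lambda: [r for r in records if "delayed" in r]))
```

Taking the best of several repeats reduces noise from caching and background load, and the failure count matters as much as the timing when a tool crashes on some runs but not others.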

Key insights

Modern CAQDAS tools vary widely in scalability and performance when analyzing large qualitative datasets.

Principles

Method

Comparative performance tests ran importing, searching, autocoding, and text mining tasks on the 50,425-record dataset in four desktop CAQDAS tools, measuring execution time and memory consumption for each task.
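The Method paragraph names four task types. The Python sketch below shows roughly what each one computes over a set of review comments. It is an illustration under assumptions, not how NVivo, Atlas.ti, MaxQDA, or QDA Miner implement these features: the input is assumed to be CSV text with a `comment` column, autocoding is reduced to keyword matching against a hypothetical two-code `CODEBOOK`, and text mining is limited to counting word n-grams.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical codebook: code name -> regex of indicative keywords.
CODEBOOK = {
    "delay": re.compile(r"\b(delay(ed)?|late)\b", re.IGNORECASE),
    "staff": re.compile(r"\b(crew|staff|attendants?)\b", re.IGNORECASE),
}


def import_records(csv_text: str) -> list[str]:
    """Import: parse CSV text with a 'comment' column into a list of texts."""
    return [row["comment"] for row in csv.DictReader(io.StringIO(csv_text))]


def text_search(records: list[str], term: str) -> list[int]:
    """Text search: indices of records containing `term` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [i for i, text in enumerate(records) if pattern.search(text)]


def autocode(records: list[str]) -> dict[str, list[int]]:
    """Autocoding: for each code in CODEBOOK, the indices of matching records."""
    return {
        code: [i for i, text in enumerate(records) if rx.search(text)]
        for code, rx in CODEBOOK.items()
    }


def top_ngrams(records: list[str], n: int = 2, k: int = 5) -> list[tuple[str, int]]:
    """Text mining: the k most frequent word n-grams across all records."""
    counts = Counter()
    for text in records:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)


if __name__ == "__main__":
    sample = (
        "comment\n"
        "Flight delayed two hours\n"
        "Friendly crew and on time\n"
        "Late again and the crew was rude\n"
    )
    records = import_records(sample)
    print(text_search(records, "crew"))
    print(autocode(records))
    print(top_ngrams(records))
```

Each function makes a single pass over the record list, so its running time grows roughly linearly with the number of records, and each is a plain callable that can be wrapped in whatever timing and memory measurement an evaluation uses.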

Best for: AI Scientist, Research Scientist, Data Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Provalis Research.