189 - LLMs and Data Science

2024-08-05 · Source: Not So Standard Deviations · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, extended

Summary

Hilary delivered the closing keynote at the USAR conference in Salzburg, Austria, advocating for Large Language Models (LLMs) as a legitimate tool in data analysis and a potential threat to R's relevance. She demonstrated how LLMs can instantly convert a screenshot of a bank statement into a CSV, highlighting their utility in automating tedious data munging tasks. Hilary also presented use cases for LLMs in professional communication, such as rephrasing caustic emails to be more polite, citing Apple's iOS 18 "make it more professional" feature. While acknowledging privacy concerns, she argued that local LLM instances like Llama 3 or privacy-focused services like ProtonMail address these issues. The discussion extended to the broader impact of LLMs on programming languages like R, noting that LLMs currently default to Python for code interpretation, potentially accelerating Python adoption. Hilary suggests R could evolve into a specialized language for advanced statistical concepts, serving as a thought partner with LLMs.

Key takeaway

For data scientists and ML engineers evaluating the future of their tooling, recognize that LLMs are rapidly becoming indispensable for data analysis and communication tasks. Embrace LLM integration, particularly for automating data preparation and refining outputs, to stay competitive and efficient. Your focus should shift towards leveraging LLMs as powerful assistants, potentially redefining the role of languages like R as specialized interfaces for advanced statistical concepts rather than primary coding environments for routine tasks.

Key insights

LLMs are an inevitable and beneficial tool for data analysis, posing both opportunities and threats to traditional programming languages like R.

Principles

Embrace LLMs to enhance data analysis workflows.
Privacy concerns can be mitigated with local or privacy-focused LLM solutions.
LLMs can automate tedious data preparation and communication tasks.

Method

Integrate LLMs into a design thinking mindset for data analysis, leveraging their ability to process unstructured data (e.g., screenshots) and refine professional communications, while implementing quality control for outputs.

In practice

Use LLMs to convert image-based data into structured formats (e.g., CSV).
Employ LLMs to refine professional communications for tone and clarity.
Consider fine-tuning LLMs with new R packages for enhanced discoverability.

Topics

Large Language Models
R Programming Language
Data Analysis Trustworthiness
Data Quality Control
Programming Language Evolution

Best for: AI Engineer, NLP Engineer, AI Product Manager, Data Scientist, AI Data Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Not So Standard Deviations.