189 - LLMs and Data Science
Summary
Hilary delivered the closing keynote at the useR! conference in Salzburg, Austria, arguing that Large Language Models (LLMs) are a legitimate tool in data analysis and a potential threat to R's relevance. She demonstrated how an LLM can instantly convert a screenshot of a bank statement into a CSV, highlighting its utility in automating tedious data munging tasks. Hilary also presented use cases for LLMs in professional communication, such as rephrasing caustic emails to be more polite, citing Apple's iOS 18 "make it more professional" feature. While acknowledging privacy concerns, she argued that local LLM instances like Llama 3 or privacy-focused services like ProtonMail address these issues. The discussion extended to the broader impact of LLMs on programming languages like R, noting that LLMs currently default to Python for code interpretation, which could accelerate Python adoption. Hilary suggested that R could evolve into a specialized language for advanced statistical concepts, serving as a thought partner alongside LLMs.
Key takeaway
Data scientists and ML engineers evaluating the future of their tooling should recognize that LLMs are rapidly becoming indispensable for data analysis and communication tasks. Embrace LLM integration, particularly for automating data preparation and refining outputs, to stay competitive and efficient. Shift your focus toward using LLMs as powerful assistants, which may redefine languages like R as specialized interfaces for advanced statistical concepts rather than primary coding environments for routine tasks.
Key insights
LLMs are an inevitable and beneficial tool for data analysis, and they bring both opportunities and threats for traditional programming languages like R.
Principles
- Embrace LLMs to enhance data analysis workflows.
- Privacy concerns can be mitigated with local or privacy-focused LLM solutions.
- LLMs can automate tedious data preparation and communication tasks.
Method
Bring LLMs into data analysis with a design-thinking mindset: use their ability to process unstructured data (e.g., screenshots) and to refine professional communications, and apply quality control to their outputs.
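As one possible illustration of the quality-control step, the sketch below checks CSV text that an LLM might return for a bank-statement screenshot. It is not from the episode. The column layout (date, description, amount), the ISO date format, and the function name validate_statement_csv are all assumptions made for the example. It uses only the Python standard library and does not call any LLM.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical column layout for the example; a real statement may differ.
EXPECTED_COLUMNS = ["date", "description", "amount"]


def validate_statement_csv(csv_text, expected_total=None):
    """Return a list of human-readable problems found in LLM-produced CSV text."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    if not rows:
        return ["no rows found"]

    header = [name.strip().lower() for name in rows[0]]
    if header != EXPECTED_COLUMNS:
        # Later checks rely on the column layout, so stop here.
        return [f"unexpected header: {rows[0]}"]

    total = Decimal("0")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {line_no}: expected {len(EXPECTED_COLUMNS)} fields, got {len(row)}"
            )
            continue
        date_text, _description, amount_text = (cell.strip() for cell in row)
        try:
            # Assumes ISO dates (YYYY-MM-DD); adjust for the statement's format.
            datetime.strptime(date_text, "%Y-%m-%d")
        except ValueError:
            problems.append(f"line {line_no}: bad date {date_text!r}")
        try:
            total += Decimal(amount_text)
        except InvalidOperation:
            problems.append(f"line {line_no}: bad amount {amount_text!r}")

    # Optional cross-check against a total read off the original statement.
    if expected_total is not None and total != Decimal(expected_total):
        problems.append(f"amounts sum to {total}, expected {expected_total}")
    return problems


if __name__ == "__main__":
    sample = (
        "date,description,amount\n"
        "2024-07-01,Coffee,-4.50\n"
        "2024-13-03,Groceries,-52.99\n"
    )
    for problem in validate_statement_csv(sample):
        print(problem)
```

Comparing the summed amounts with a total read directly from the statement is a cheap way to catch dropped or misread rows before the CSV is used for analysis.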
In practice
- Use LLMs to convert image-based data into structured formats (e.g., CSV).
- Employ LLMs to refine professional communications for tone and clarity.
- Consider fine-tuning LLMs on new R packages so that those packages are easier to discover.
Topics
- Large Language Models
- R Programming Language
- Data Analysis Trustworthiness
- Data Quality Control
- Programming Language Evolution
Best for: AI Engineer, NLP Engineer, AI Product Manager, Data Scientist, AI Data Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Not So Standard Deviations.