Data wrangling exercises
Summary
Two sample Jupyter notebooks, released on November 2, 2025, demonstrate practical data wrangling techniques for transforming messy, real-world datasets into clean, usable formats. These notebooks address two distinct data sources: JSON output from the Singapore Department of Statistics (SingStat) Table Builder Developer API and a raw CSV file hosted on GitHub. The examples specifically focus on data related to Singaporean university graduates by course type and sex. The provided resources illustrate essential data cleaning and transformation steps, culminating in a cleaned dataframe ready for visualization, with chart creation code available in the first notebook.
Key takeaway
Data scientists and AI students learning data preparation can use these Jupyter notebooks to practice on real-world data. Working through both formats shown here, JSON API output and a raw CSV file, reinforces the essential cleaning and transformation steps needed before any analysis or visualization.
Key insights
Hands-on practice with diverse, messy datasets is crucial for mastering data wrangling skills.
Principles
- Real-world data requires cleaning
- Different formats need tailored transformations
Method
The notebooks parse JSON API output and transform a raw CSV file into cleaned dataframes suitable for analysis and visualization, using data on Singaporean university graduates by course type and sex as the example.
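As a rough illustration of the JSON half of this method, the sketch below flattens a nested payload into one record per sex, course, and year. The payload layout, field names (`rows`, `series`, `columns`, `key`, `value`), and all figures are invented for this example; the actual SingStat response shape is not shown in the summary, and the original notebooks may well use a dataframe library rather than the standard library used here.

```python
import json

# Hypothetical payload: the layout and the numbers are made up for
# illustration and are NOT taken from the real SingStat API.
SAMPLE_JSON = """
{
  "rows": [
    {"series": "Males", "course": "Engineering Sciences",
     "columns": [{"key": "2019", "value": "3012"},
                 {"key": "2020", "value": "na"}]},
    {"series": "Females", "course": "Engineering Sciences",
     "columns": [{"key": "2019", "value": "1,204"},
                 {"key": "2020", "value": "1250"}]}
  ]
}
"""


def to_int_or_none(text):
    """Strip thousands separators; return None for placeholders like 'na'."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


def flatten(payload):
    """Turn the nested payload into a flat list of long-format records."""
    records = []
    for row in payload["rows"]:
        for cell in row["columns"]:
            records.append({
                "sex": row["series"],
                "course": row["course"],
                "year": int(cell["key"]),
                "graduates": to_int_or_none(cell["value"]),
            })
    return records


records = flatten(json.loads(SAMPLE_JSON))
```

A flat list of dictionaries like `records` is a common intermediate form because most dataframe libraries can ingest it directly.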
In practice
- Process JSON from APIs
- Clean raw CSV files
- Prepare data for plotting
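The CSV cleaning and plot-preparation steps listed above might look something like the following sketch. The inline CSV text, its column names, and its values are hypothetical stand-ins for the raw file on GitHub, whose actual contents are not shown here; the helper names `clean_rows` and `pivot_by_sex` are likewise invented for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw CSV with typical problems: padded headers, a quoted
# thousands separator, an 'na' placeholder, and stray whitespace.
RAW_CSV = """year, sex ,course,graduates
2019,Males,Engineering Sciences,"3,012"
2019,Females,Engineering Sciences,1204
2020,Males,Engineering Sciences,na
2020,Females,Engineering Sciences, 1250
"""


def clean_rows(raw_text):
    """Normalize headers, trim cells, and drop rows without a numeric count."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip().lower() for name in next(reader)]
    cleaned = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        record = dict(zip(header, (cell.strip() for cell in row)))
        count = record["graduates"].replace(",", "")
        if not count.isdigit():  # e.g. 'na' placeholders
            continue
        record["year"] = int(record["year"])
        record["graduates"] = int(count)
        cleaned.append(record)
    return cleaned


def pivot_by_sex(rows):
    """Build {sex: {year: total}} so each sex can be drawn as one series."""
    series = defaultdict(dict)
    for r in rows:
        by_year = series[r["sex"]]
        by_year[r["year"]] = by_year.get(r["year"], 0) + r["graduates"]
    return {sex: dict(sorted(by_year.items())) for sex, by_year in series.items()}


rows = clean_rows(RAW_CSV)
plot_ready = pivot_by_sex(rows)
```

Dropping rows with missing counts is only one possible choice; keeping them as explicit gaps may be preferable when the chart should show where data is absent.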
Topics
- Data Wrangling
- Data Cleaning
- Jupyter Notebooks
- JSON Data Processing
- CSV Data Processing
Best for: Data Scientist, Data Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by datadoubleconfirm.