Data wrangling exercises

2025-11-02 · Source: datadoubleconfirm · Field: Technology & Digital — Data Science & Analytics, Artificial Intelligence & Machine Learning · Depth: Novice, quick

Summary

Two sample Jupyter notebooks, released on November 2, 2025, demonstrate practical data wrangling techniques for transforming messy, real-world datasets into clean, usable formats. These notebooks address two distinct data sources: JSON output from the Singapore Department of Statistics (SingStat) Table Builder Developer API and a raw CSV file hosted on GitHub. The examples specifically focus on data related to Singaporean university graduates by course type and sex. The provided resources illustrate essential data cleaning and transformation steps, culminating in a cleaned dataframe ready for visualization, with chart creation code available in the first notebook.

Key takeaway

If you are a data scientist or AI student learning data preparation, explore these Jupyter notebooks for practical experience with real-world data challenges. Working through both formats they cover, JSON API output and a raw CSV file, will reinforce the essential cleaning and transformation steps and improve your ability to draw insights from messy datasets.

Key insights

Hands-on practice with diverse, messy datasets is crucial for mastering data wrangling skills.

Principles

Real-world data requires cleaning
Different formats need tailored transformations

Method

The notebooks parse JSON API output and transform a raw CSV file into cleaned dataframes suitable for analysis and visualization, using Singaporean university graduate data as the example.
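
As a rough illustration of the JSON half of this method, the sketch below flattens a nested payload into one record per course, sex, and year. It is not taken from the notebooks: the payload layout, the key names (rows, values, count), the flatten helper, and all numbers are assumptions made up for this example, and the real SingStat Table Builder response may be structured differently.

```python
import json

# Hypothetical payload: the nested layout, key names, and numbers are
# invented for illustration. This is NOT the actual SingStat schema or data.
RAW_JSON = """
{
  "rows": [
    {"course": "Engineering Sciences", "sex": "Female",
     "values": [{"year": "2020", "count": "100"},
                {"year": "2021", "count": "na"}]},
    {"course": "Engineering Sciences", "sex": "Male",
     "values": [{"year": "2020", "count": "250"},
                {"year": "2021", "count": "260"}]}
  ]
}
"""


def flatten(payload):
    """Turn each nested series into flat (course, sex, year, count) records."""
    records = []
    for row in payload["rows"]:
        for point in row["values"]:
            raw = point["count"].strip()
            # Treat non-numeric placeholders such as "na" as missing values.
            count = int(raw) if raw.isdigit() else None
            records.append({
                "course": row["course"],
                "sex": row["sex"],
                "year": int(point["year"]),
                "count": count,
            })
    return records


records = flatten(json.loads(RAW_JSON))
```

A flat list of dictionaries like records can be passed to pandas.DataFrame to obtain a dataframe, which is the shape the summary describes as the end point of the wrangling.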

In practice

Process JSON from APIs
Clean raw CSV files
Prepare data for plotting
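
The CSV and plotting steps above might look something like the following sketch. It uses only the Python standard library and a small inline CSV whose column names, messy formatting, and numbers are hypothetical; the helpers clean_rows and to_series are illustrative names, not functions from the notebooks.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw CSV: headers, formatting quirks, and numbers are made up.
RAW_CSV = """Year , Sex,Course ,Graduates
2020,Females ,Business,"1,200"
2020,Males,Business,"1,000"
2021,Females,Business,-
2021,Males,Business,"1,050"
"""


def clean_rows(text):
    """Normalize headers, trim cells, and convert counts to integers."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip().lower() for name in next(reader)]
    cleaned = []
    for raw in reader:
        row = dict(zip(header, (cell.strip() for cell in raw)))
        digits = row["graduates"].replace(",", "")  # drop thousands separators
        if not digits.isdigit():
            continue  # skip rows whose count is a placeholder such as "-"
        cleaned.append({
            "year": int(row["year"]),
            "sex": row["sex"],
            "course": row["course"],
            "graduates": int(digits),
        })
    return cleaned


def to_series(rows):
    """Group rows into {sex: [(year, graduates), ...]}, sorted by year."""
    series = defaultdict(list)
    for r in rows:
        series[r["sex"]].append((r["year"], r["graduates"]))
    return {sex: sorted(points) for sex, points in series.items()}


rows = clean_rows(RAW_CSV)
series = to_series(rows)
```

Each entry in series is one line's worth of (x, y) points, so a plotting library can draw one line per sex. Dropping rows with placeholder counts is only one possible choice; keeping them as missing values would preserve gaps in the chart.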

Topics

Data Wrangling
Data Cleaning
Jupyter Notebooks
JSON Data Processing
CSV Data Processing

Code references

hxchua/datadoubleconfirm

Best for: Data Scientist, Data Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by datadoubleconfirm.