How Fast Can You Parse 1 Billion Rows in Java? – Insane Speed Test • Roy van Rijn • GOTO 2025

2026-06-03 · Source: GOTO Conferences · Field: Technology & Digital — Software Development & Engineering, Performance Engineering · Depth: Expert, extended

Summary

A Java challenge asked participants to parse 1 billion rows (16 GB) of weather data and report the minimum, maximum, and average temperature per station. The baseline implementation, built on "Files.lines", ran in 4 minutes 50 seconds; the winning solution finished in 1.5 seconds. Processing the input in parallel brought the time down to about 2 minutes. Further steps included JVM-level changes such as native compilation and the Epsilon garbage collector, and integer arithmetic in place of doubles. Additional gains came from parallelizing file I/O, memory-mapping the file, using "Unsafe" for direct memory access, and applying SWAR (SIMD Within A Register) to find delimiters without branches. A notable contribution was Quan Anh Mai's branchless temperature parsing, which converts the digits with a single multiplication. Other strategies included custom hash map implementations, a workaround for slow kernel unmapping of the file, and tuning for CPU cache locality and branch prediction by always parsing 16-byte chunks.
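The SWAR delimiter search mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from any challenge entry: the class and method names are hypothetical, the delimiter is assumed to be ';', and the eight bytes are assumed to be loaded in little-endian order. The bit trick itself is the well-known "has zero byte" test applied after XOR-ing with the delimiter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
final class SwarDelimiter {
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' in every byte
    private static final long LOW_BITS   = 0x0101010101010101L;
    private static final long HIGH_BITS  = 0x8080808080808080L;

    /** Index (0..7) of the first ';' in a little-endian word, or 8 if absent. */
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                        // bytes equal to ';' become 0x00
        long zeroBytes = (x - LOW_BITS) & ~x & HIGH_BITS;  // lowest set bit marks the first zero byte
        return Long.numberOfTrailingZeros(zeroBytes) >>> 3; // bit index / 8; 64 / 8 = 8 when no match
    }

    /** Reads 8 bytes starting at offset as a little-endian long. */
    static long wordAt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;12.0\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(firstSemicolon(wordAt(line, 0))); // prints 7
    }
}
```

Eight bytes are tested with a handful of arithmetic instructions and no per-byte comparison, so there is no data-dependent branch inside the word. A caller would still loop over successive words when the result is 8.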

Key takeaway

Software engineers and research scientists optimizing high-throughput data processing in Java should profile deeply on the target hardware to find the true bottlenecks. Focus on eliminating CPU branch mispredictions and on low-level memory access (e.g., "ByteBuffer", "Unsafe") combined with SIMD-like operations. Native compilation and a minimal garbage collector such as Epsilon GC can yield significant gains, but performance measured on a local machine may not reflect production environments.

Key insights

Extreme Java performance optimization for data parsing relies on a deep understanding of CPU architecture and low-level memory management.

Principles

Branchless code maximizes CPU pipeline efficiency.
Data-specific insights enable targeted optimizations.
Mechanical sympathy with CPU caches is vital.
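To make the first two principles concrete, here is a sketch of branchless temperature parsing that uses a single multiplication. It relies on a data-specific assumption that is not stated in this summary: every value has an optional minus sign, one or two integer digits, a '.', and exactly one fractional digit (for example "-12.3"). The class name, method names, and the zero-padding helper are hypothetical; the constants only work under that format assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; assumes values shaped like "-12.3", "5.7", "0.0".
final class BranchlessTemperature {
    /** Parses the number at the start of a little-endian word into tenths of a degree. */
    static int parseTenths(long word) {
        // Digits have bit 4 set, '.' does not: locate the '.' in bytes 1..3.
        int decimalSepPos = Long.numberOfTrailingZeros(~word & 0x10101000L);
        int shift = 28 - decimalSepPos;
        // '-' also has bit 4 clear: signed is -1 for negative input, 0 otherwise.
        long signed = (~word << 59) >> 63;
        long signMask = ~(signed & 0xFF);              // clears the '-' byte if present
        // Align so the '.' sits in byte 3, then keep the low nibble of each digit.
        long digits = ((word & signMask) << shift) & 0x0F000F0F00L;
        // One multiply sums 100 * tens + 10 * units + fraction into bits 32..41.
        long abs = ((digits * 0x640A0001L) >>> 32) & 0x3FF;
        return (int) ((abs ^ signed) - signed);        // conditional negate without a branch
    }

    /** Test helper: packs up to 8 ASCII characters into a zero-padded little-endian long. */
    static long wordOf(String text) {
        byte[] padded = new byte[8];
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(ascii, 0, padded, 0, Math.min(ascii.length, 8));
        return ByteBuffer.wrap(padded).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        System.out.println(parseTenths(wordOf("-12.3\n"))); // prints -123
    }
}
```

The four possible layouts (with or without a sign, one or two integer digits) all go through the same instruction sequence, so the branch predictor has nothing to guess. The approach breaks as soon as the input format deviates from the assumption, which is the trade-off of data-specific optimization.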

Method

Iteratively optimize Java data parsing by profiling, applying JVM tuning, parallelization, memory-mapped files, and low-level CPU-aware techniques like SWAR and branchless code.
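The parallel, memory-mapped part of this method might look like the sketch below. It is deliberately unoptimized and makes several assumptions: Java 17 or later, lines of the form "station;temperature" terminated by '\n', station names of at most 128 bytes, and chunks smaller than 2 GB (the limit of a single "FileChannel.map" call). All class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch; names are illustrative only.
final class ChunkedMappedScan {
    /** Per-station aggregate, all values in tenths of a degree. */
    record Stats(int min, int max, long sum, long count) {
        static Stats of(int v) { return new Stats(v, v, v, 1); }
        Stats merge(Stats o) {
            return new Stats(Math.min(min, o.min), Math.max(max, o.max), sum + o.sum, count + o.count);
        }
    }

    /** Returns parts + 1 offsets; every interior offset lies just past a '\n'. */
    static long[] chunkStarts(FileChannel ch, int parts) throws IOException {
        long size = ch.size();
        long[] starts = new long[parts + 1];
        starts[parts] = size;
        ByteBuffer one = ByteBuffer.allocate(1);
        for (int i = 1; i < parts; i++) {
            long pos = Math.max(starts[i - 1], size * i / parts);
            while (pos < size) {                 // advance to just past the next newline
                one.clear();
                ch.read(one, pos++);
                if (one.get(0) == '\n') break;
            }
            starts[i] = pos;
        }
        return starts;
    }

    /** Maps one chunk and aggregates it into a chunk-local map (no sharing between threads). */
    static Map<String, Stats> scanChunk(FileChannel ch, long start, long end) {
        Map<String, Stats> out = new HashMap<>();
        if (end <= start) return out;
        try {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, start, end - start);
            byte[] name = new byte[128];         // assumed upper bound on station name length
            while (buf.hasRemaining()) {
                int n = 0;
                byte b;
                while ((b = buf.get()) != ';') name[n++] = b;
                int sign = 1, tenths = 0;
                while (buf.hasRemaining() && (b = buf.get()) != '\n') {
                    if (b == '-') sign = -1;
                    else if (b != '.') tenths = tenths * 10 + (b - '0');
                }
                out.merge(new String(name, 0, n, StandardCharsets.UTF_8), Stats.of(sign * tenths), Stats::merge);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Scans the chunks in parallel, then merges the partial maps on the calling thread. */
    static Map<String, Stats> scan(Path file, int parts) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long[] starts = chunkStarts(ch, parts);
            List<Map<String, Stats>> partials = IntStream.range(0, parts).parallel()
                    .mapToObj(i -> scanChunk(ch, starts[i], starts[i + 1]))
                    .collect(Collectors.toList());
            Map<String, Stats> total = new TreeMap<>();
            for (Map<String, Stats> p : partials) {
                p.forEach((station, s) -> total.merge(station, s, Stats::merge));
            }
            return total;
        }
    }
}
```

A call such as ChunkedMappedScan.scan(Path.of("measurements.txt"), 64) would return one aggregate per station; the file name and chunk count here are placeholders. The later steps of the method (SWAR, branchless parsing, a custom hash map) would replace the byte-by-byte loops and the "HashMap" inside scanChunk, which is where profiling would be expected to point next.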

In practice

Employ "Unsafe" or "ByteBuffer" for direct memory access.
Use Epsilon GC for single-pass, short-lived processes.
Parse fixed-precision decimal values as scaled integers instead of doubles for performance.
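As an illustration of the integer conversion, the sketch below keeps every temperature as an integer number of tenths and only converts to a decimal when formatting the result. The class name, the tenths scale, and the "min/mean/max" output layout are assumptions made for the example, as is the rounding of the mean to the nearest tenth.

```java
import java.util.Locale;

// Hypothetical accumulator; values are stored as tenths of a degree (12.3 -> 123).
final class FixedPointStats {
    private int min = Integer.MAX_VALUE;
    private int max = Integer.MIN_VALUE;
    private long sum;
    private long count;

    /** Adds one measurement using integer arithmetic only, with no allocation. */
    void add(int tenths) {
        min = Math.min(min, tenths);
        max = Math.max(max, tenths);
        sum += tenths;
        count++;
    }

    /** Converts to decimals once, at output time. Requires at least one add() call. */
    String format() {
        double mean = Math.round((double) sum / count) / 10.0;
        return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min / 10.0, mean, max / 10.0);
    }

    public static void main(String[] args) {
        FixedPointStats s = new FixedPointStats();
        s.add(123);
        s.add(-12);
        s.add(57);
        System.out.println(s.format()); // prints -1.2/5.6/12.3
    }
}
```

Because add() allocates nothing, a hot loop built this way produces very little garbage, which is what makes a no-op collector practical for a single-pass run. On a JDK that ships Epsilon, it is enabled with the flags -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC; the process then fails with an out-of-memory error if the heap fills, so the heap must be sized for the whole run.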

Topics

Java Performance Tuning
Low-Level Optimization
CPU Cache Locality
Branchless Programming
Memory-Mapped Files
SIMD

Best for: Software Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by GOTO Conferences.