Introducing Groundsource: Turning news reports into data with Gemini
Summary
Google Research has introduced Groundsource, a scalable methodology that uses the Gemini large language model to convert unstructured global news reports into structured historical data. The initial open-access Groundsource dataset focuses on urban flash floods and comprises 2.6 million records across more than 150 countries from 2000 to the present. The initiative addresses the scarcity of high-quality historical data for hydro-meteorological hazards, which traditionally lack standardized global sensor networks. Groundsource processes news in 80 languages, standardizes it to English via the Cloud Translation API, and uses Gemini for classification, temporal reasoning, and spatial precision, achieving 82% practical accuracy in location and timing. The expanded dataset supports near-global urban flash flood forecasts up to 24 hours in advance, which are now being rolled out in Google's Flood Hub.
Key takeaway
For AI scientists developing predictive models for natural disasters, Groundsource demonstrates one way to overcome historical data scarcity. Consider integrating a large language model such as Gemini into your data pipeline to extract structured event data from unstructured news and reports. This can substantially expand your training datasets, which in turn can improve model accuracy and enable more timely, localized forecasts for flash floods and potentially for other hazards that lack traditional sensor networks.
Key insights
Groundsource leverages Gemini to transform unstructured global news into structured historical data for disaster forecasting.
Principles
- Unstructured news is a rich source for historical event data.
- LLMs can systematically extract ground truth from diverse text.
- Data scarcity hampers accurate global-scale AI model training.
Method
Groundsource analyzes news reports, isolates the primary text of each article across 80 languages, and translates it to English. Gemini then classifies the events, anchors them in time, and maps them spatially to Google Maps Platform polygons.
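The flow described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not Groundsource's actual code: the `Article` and `FloodRecord` types, the `translate` and `extract` callables, and the stub functions are all hypothetical stand-ins for the Cloud Translation API and Gemini calls, and the place name and text are made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class Article:
    text: str
    language: str      # language code of the original text, e.g. "es"
    published: date


@dataclass(frozen=True)
class FloodRecord:
    event_date: date
    place_name: str


# Hypothetical interfaces: a real pipeline would wrap a translation
# service and an LLM behind callables shaped like these.
Translate = Callable[[str, str], str]               # (text, source_language) -> English text
Extract = Callable[[str, date], Optional[dict]]     # (english_text, published) -> fields or None


def article_to_record(article: Article, translate: Translate,
                      extract: Extract) -> Optional[FloodRecord]:
    """Translate to English if needed, then extract one structured record."""
    if article.language == "en":
        english = article.text
    else:
        english = translate(article.text, article.language)
    fields = extract(english, article.published)
    if fields is None:  # the article was not classified as a flood event
        return None
    return FloodRecord(
        event_date=date.fromisoformat(fields["event_date"]),
        place_name=fields["place_name"],
    )


# Stub stand-ins so the sketch runs without any external service.
def stub_translate(text: str, source_language: str) -> str:
    return "A flash flood hit Exampleville yesterday."


def stub_extract(english_text: str, published: date) -> Optional[dict]:
    if "flood" not in english_text.lower():
        return None
    # Assumes the text says "yesterday": one day before publication.
    return {
        "event_date": (published - timedelta(days=1)).isoformat(),
        "place_name": "Exampleville",
    }
```

Passing the translation and extraction steps in as callables keeps the sketch independent of any particular API and makes each stage easy to test in isolation.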
In practice
- Use Gemini for precise event classification and temporal reasoning.
- Apply Google Maps Platform for granular spatial grounding.
- Translate diverse language sources for global data coverage.
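To make "temporal reasoning" concrete, the sketch below shows what anchoring a relative phrase to a calendar date involves. It is a hand-written toy, assuming the phrase has already been isolated and the publication date is known; the function name `anchor_date` and the small set of supported phrases are illustrative choices and say nothing about how Gemini performs this step.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def anchor_date(phrase: str, published: date) -> Optional[date]:
    """Resolve a relative time phrase against the article's publication date."""
    p = phrase.strip().lower()
    if p == "today":
        return published
    if p == "yesterday":
        return published - timedelta(days=1)
    if p.startswith("last "):
        name = p[len("last "):]
        if name in WEEKDAYS:
            # Days back to the most recent such weekday strictly before
            # the publication date (1 to 7 days).
            delta = (published.weekday() - WEEKDAYS.index(name)) % 7 or 7
            return published - timedelta(days=delta)
    return None  # phrase not understood; leave the event undated
```

For an article published on Thursday, 2 May 2024, `anchor_date("last Tuesday", date(2024, 5, 2))` resolves to 30 April 2024. Returning `None` for unrecognized phrases keeps uncertain dates out of the dataset rather than guessing.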
Topics
- Groundsource
- Gemini LLM
- Flash Flood Forecasting
- Natural Language Processing
- Geospatial Data
Best for: AI Scientist, Research Scientist, Software Engineer, AI Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by The latest research from Google.