Kaggle Winners Walkthroughs: Yale/UNC-CH - Geophysical Waveform Inversion with Team F A M A S
Summary
The fifth-place solution to the Yale/UNC-CH Geophysical Waveform Inversion Kaggle competition, developed by Team F A M A S, is a three-part pipeline for predicting seismic velocity models. The team used Conformer and CFormer backbones and found that smaller models with adjusted resolutions (72, 144, 160, 256) worked best. A crucial component was extensive offline data generation: the team created 8-10 times more data than the initial OpenFWI dataset by simulating seismic data from velocity images and applying augmentations such as cutmix, mixup, and landsite transformations. The final stage was a post-optimization step that used a differentiable simulator and gradient descent to refine the predicted velocity maps, minimizing the norm between the ground-truth seismic data and the data simulated from the prediction. This approach achieved a mean absolute error (MAE) of 20 on the public and private leaderboards.
Key takeaway
For data scientists and ML engineers working on geophysical inversion problems, consider adopting an iterative data generation and post-optimization strategy. Generating synthetic training data with a simulator, then refining predictions by gradient descent through a differentiable simulator, can improve accuracy, especially when the competition dataset is limited. The team also found that smaller Conformer/CFormer models combined with resolution scaling were the most effective.
Key insights
Iterative data generation and post-optimization significantly enhance geophysical waveform inversion model performance.
Principles
- Smaller models can outperform larger ones with proper resolution scaling.
- Offline data generation can vastly expand training datasets.
- Differentiable simulators enable effective post-optimization.
Method
Train models on initial data, iteratively generate and add new data, then apply post-optimization using a differentiable simulator with gradient descent to refine predictions.
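The post-optimization step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `simulate` function is a toy linear smoothing operator standing in for a real wave-equation simulator, the gradient is written by hand where an autodiff framework would normally be used, and the velocity values, step size, and function names are hypothetical rather than taken from the team's code.

```python
def simulate(v):
    """Toy linear 'simulator': 3-point weighted average with zero padding.

    ASSUMPTION: stands in for a real differentiable wave-equation forward model.
    """
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(0.1 * left + 0.8 * v[i] + 0.1 * right)
    return out


def misfit(v, d_obs):
    """Half the squared L2 norm between simulated and observed data."""
    return 0.5 * sum((s - d) ** 2 for s, d in zip(simulate(v), d_obs))


def misfit_grad(v, d_obs):
    """Gradient of the misfit with respect to v.

    For a linear operator A the gradient is A^T (A v - d_obs); the toy
    operator is symmetric, so applying A^T is the same as applying A.
    """
    residual = [s - d for s, d in zip(simulate(v), d_obs)]
    return simulate(residual)


def refine(v_init, d_obs, lr=1.0, steps=50):
    """Plain gradient descent on the data misfit, starting from v_init."""
    v = list(v_init)
    for _ in range(steps):
        grad = misfit_grad(v, d_obs)
        v = [vi - lr * gi for vi, gi in zip(v, grad)]
    return v


# Hypothetical 1-D velocity profile (km/s) and the data it would produce.
v_true = [1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5]
d_obs = simulate(v_true)

# Hypothetical network prediction, slightly off from the true profile.
v_pred = [1.6, 1.4, 2.1, 2.4, 2.6, 2.9, 3.6, 3.4]
v_refined = refine(v_pred, d_obs)

print("misfit before:", misfit(v_pred, d_obs))
print("misfit after: ", misfit(v_refined, d_obs))
```

Note that only the observed data and the simulator are needed at refinement time, not the true velocity map. The toy operator is well conditioned, so the refinement recovers the profile almost exactly; a real seismic inversion is far less forgiving, which is presumably why a good initial prediction from the network matters.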
In practice
- Experiment with Conformer/CFormer backbones for waveform inversion.
- Implement offline data generation to augment seismic datasets.
- Use resolution scaling during training for model efficiency.
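As a rough illustration of the mixup and cutmix augmentations named in the summary, the sketch below applies them to velocity maps stored as nested Python lists. The function names, the 4x4 maps, and the patch coordinates are hypothetical and not taken from the team's code. In a pipeline like the one described, the seismic data for each augmented map would presumably then be generated by a simulator, which is not shown here.

```python
def mixup(v_a, v_b, lam):
    """Blend two velocity maps pixel-wise: lam * v_a + (1 - lam) * v_b."""
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(v_a, v_b)
    ]


def cutmix(v_a, v_b, top, left, height, width):
    """Return a copy of v_a with a rectangular patch replaced by v_b."""
    out = [row[:] for row in v_a]  # copy so the input map is not mutated
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = v_b[r][c]
    return out


# Two tiny constant "velocity maps" in m/s, purely illustrative.
v_a = [[1500.0] * 4 for _ in range(4)]
v_b = [[4500.0] * 4 for _ in range(4)]

mixed = mixup(v_a, v_b, lam=0.75)
patched = cutmix(v_a, v_b, top=1, left=1, height=2, width=2)

for row in patched:
    print(row)
```

Augmenting the velocity maps and re-simulating, instead of augmenting the seismic recordings directly, keeps each input and target pair physically consistent, since blending or patching recorded waveforms would generally not correspond to any single velocity model.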
Topics
- Geophysical Inversion
- Conformer Architectures
- Iterative Data Generation
- Differentiable Simulation
- Post-Optimization
Best for: Machine Learning Engineer, Data Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle.