Synthetic data in cryptocurrencies using generative models

· Source: cs.AI updates on arXiv.org · Field: Finance & Economics — FinTech & Digital Financial Services, Capital Markets & Investment Management, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

This work proposes a deep learning approach that uses Conditional Generative Adversarial Networks (CGANs) to generate synthetic cryptocurrency price time series, motivated by privacy concerns and data access restrictions in financial markets. The model is a hybrid architecture that pairs an LSTM-based recurrent generator with an MLP discriminator. The authors tested it on minute-by-minute data for Bitcoin (BTC), Ethereum (ETH), and XRP from January 2022 to October 2025, focusing on three volatility periods. The CGAN reproduced the relevant temporal patterns and preserved market trends, with high Pearson correlations in the first period (BTC: 0.9999, ETH: 0.9999, XRP: 0.9994). Performance was best on the more liquid BTC; the model attenuated volatility peaks for ETH and was more sensitive to short-term noise for XRP.
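The Pearson correlation reported in the summary can be computed as in the minimal sketch below. The function name `pearson` and the toy price values are illustrative assumptions, not data or code from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (covariance up to a 1/n factor).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy series standing in for real and synthetic prices.
real = [100.0, 101.5, 101.0, 102.8, 104.1, 103.6]
synthetic = [100.2, 101.3, 101.1, 102.5, 104.0, 103.9]
r = pearson(real, synthetic)
print(f"Pearson r = {r:.4f}")
```

A value close to 1 means the synthetic series moves linearly with the real one over the window being compared.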

Key takeaway

For research scientists developing financial models, this study suggests that CGANs are a viable way to generate synthetic cryptocurrency data when real-world data is restricted or hard to access. Consider CGANs, particularly with LSTM generators, to augment datasets for training and testing anomaly detection systems, especially for mature assets like Bitcoin. More volatile assets like Ethereum and XRP may need adaptive or asset-specific modeling to capture extreme events accurately.

Key insights

CGANs with LSTM generators can effectively synthesize cryptocurrency time series, preserving market dynamics.

Method

The method uses a Conditional GAN with an LSTM generator and an MLP discriminator. Data are normalized with StandardScaler, and both networks are trained with the Adam optimizer and BCEWithLogitsLoss as the adversarial loss to keep training stable.
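A minimal PyTorch sketch of this setup follows. It is not the authors' implementation: the layer sizes, learning rate, window length, the one-hot period label used as the condition, and the random-walk placeholder data are all assumptions made for illustration. Standardization is done by hand here and mirrors what scikit-learn's StandardScaler does for a single feature.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the paper.
SEQ_LEN, NOISE_DIM, COND_DIM, HIDDEN = 32, 8, 3, 16

class Generator(nn.Module):
    """LSTM generator: maps (noise sequence, condition) to a price window."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(NOISE_DIM + COND_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, 1)

    def forward(self, z, cond):
        # Repeat the condition at every time step and concatenate with noise.
        cond_seq = cond.unsqueeze(1).expand(-1, z.size(1), -1)
        h, _ = self.lstm(torch.cat([z, cond_seq], dim=-1))
        return self.out(h).squeeze(-1)  # shape: (batch, SEQ_LEN)

class Discriminator(nn.Module):
    """MLP discriminator: one raw logit per (window, condition) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEQ_LEN + COND_DIM, 64),
            nn.LeakyReLU(0.2),
            nn.Linear(64, 1),  # no sigmoid: BCEWithLogitsLoss applies it
        )

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

def standardize(x):
    """Zero-mean, unit-variance scaling over a single price feature."""
    return (x - x.mean()) / x.std(unbiased=False)

def train_step(G, D, opt_g, opt_d, real, cond, loss_fn):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator update: real windows -> 1, generated windows -> 0.
    fake = G(torch.randn(batch, SEQ_LEN, NOISE_DIM), cond)
    d_loss = loss_fn(D(real, cond), ones) + loss_fn(D(fake.detach(), cond), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: push the discriminator toward 1 on generated windows.
    g_loss = loss_fn(D(fake, cond), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

torch.manual_seed(0)
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
loss_fn = nn.BCEWithLogitsLoss()

# Placeholder data: random walks standing in for minute-level price windows.
prices = torch.randn(16, SEQ_LEN).cumsum(dim=1)
real = standardize(prices)
# Assumed condition: a one-hot label for one of three volatility periods.
cond = torch.eye(COND_DIM)[torch.randint(0, COND_DIM, (16,))]

d_loss, g_loss = train_step(G, D, opt_g, opt_d, real, cond, loss_fn)
sample = G(torch.randn(4, SEQ_LEN, NOISE_DIM), cond[:4]).detach()
```

The discriminator returns raw logits because BCEWithLogitsLoss fuses the sigmoid with the cross-entropy term, which is numerically more stable than applying a separate sigmoid followed by BCELoss.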

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.