Synthetic data in cryptocurrencies using generative models

2026-04-21 · Source: cs.AI updates on arXiv.org · Field: Finance & Economics — FinTech & Digital Financial Services, Capital Markets & Investment Management, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

This work proposes a deep learning approach that uses Conditional Generative Adversarial Networks (CGANs) to generate synthetic cryptocurrency price time series, addressing privacy concerns and data access restrictions in financial markets. The model is a hybrid architecture that pairs an LSTM-based recurrent generator with an MLP discriminator. The researchers tested it on minute-by-minute data for Bitcoin (BTC), Ethereum (ETH), and XRP from January 2022 to October 2025, focusing on three volatility periods. The CGAN reproduced the relevant temporal patterns and preserved market trends, with high Pearson correlations in the first period (BTC: 0.9999, ETH: 0.9999, XRP: 0.9994). Performance was best on the more liquid BTC; for ETH the model attenuated volatility peaks, and for XRP it was more sensitive to short-term noise.
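To make the reported metric concrete, here is a minimal sketch of a Pearson correlation between a real and a synthetic series. The price values are toy numbers chosen for illustration, not data from the paper, and the `pearson` helper is a hypothetical name. The example also illustrates why a high correlation can coexist with an attenuated volatility peak: the synthetic series tracks the trend closely but undershoots the spike.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only (not taken from the paper).
real = [100.0, 101.5, 103.0, 108.0, 104.0, 105.5, 107.0]
# A synthetic copy that follows the trend but damps the spike at index 3.
synthetic = [100.2, 101.3, 103.1, 106.0, 104.2, 105.4, 106.8]

r = pearson(real, synthetic)

# Gap between real and synthetic at the position of the real peak.
spike_index = real.index(max(real))
peak_gap = real[spike_index] - synthetic[spike_index]

print(f"pearson={r:.4f}, attenuation at peak={peak_gap:.1f}")
```

Because correlation on price levels is dominated by the shared trend, it may be worth checking peaks or returns separately when extreme events matter.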

Key takeaway

For research scientists developing financial models, this study suggests that CGANs are a viable way to generate synthetic cryptocurrency data when real-world data is restricted or limited. Consider implementing CGANs, particularly with LSTM generators, to augment datasets for training and testing anomaly detection systems, especially for mature assets like Bitcoin. Be aware that Ethereum and XRP, where the model attenuated volatility peaks or was more sensitive to short-term noise, may require adaptive or asset-specific modeling approaches to capture extreme events accurately.

Key insights

CGANs with LSTM generators can effectively synthesize cryptocurrency time series, preserving market dynamics.

Principles

Synthetic data mitigates financial data privacy and access issues.
CGANs can reproduce complex temporal patterns in financial series.
Model performance varies with asset liquidity and market maturity.

Method

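The method uses a Conditional GAN with an LSTM generator and an MLP discriminator. Data are normalized with StandardScaler, and training uses the Adam optimizer with BCEWithLogitsLoss as the adversarial loss. BCEWithLogitsLoss applies the sigmoid inside the loss, which is numerically more stable than a separate sigmoid followed by binary cross-entropy and supports stable adversarial training.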
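The architecture described above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The summary does not give layer sizes, the noise dimension, the window length, the learning rate, or what the conditioning signal is, so `noise_dim`, `cond_dim`, `hidden_dim`, `seq_len`, the Adam settings, and the all-zeros placeholder condition are illustrative choices. The class and function names are hypothetical. Standardization is done by hand here, using the population standard deviation as scikit-learn's StandardScaler does.

```python
import torch
import torch.nn as nn


class LSTMGenerator(nn.Module):
    """Maps per-step noise plus a condition to a synthetic series."""

    def __init__(self, noise_dim=8, cond_dim=1, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(noise_dim + cond_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, z, cond):
        # z: (batch, seq_len, noise_dim), cond: (batch, seq_len, cond_dim)
        h, _ = self.lstm(torch.cat([z, cond], dim=-1))
        return self.out(h)  # (batch, seq_len, 1)


class MLPDiscriminator(nn.Module):
    """Scores a flattened (series, condition) window with a raw logit."""

    def __init__(self, seq_len=16, cond_dim=1, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(seq_len * (1 + cond_dim), hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, 1),  # no sigmoid: BCEWithLogitsLoss applies it
        )

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1).flatten(1))


def train_step(G, D, opt_g, opt_d, real, cond, noise_dim=8):
    """One adversarial update: discriminator first, then generator."""
    bce = nn.BCEWithLogitsLoss()
    batch, seq_len, _ = real.shape
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: real windows labeled 1, generated windows labeled 0.
    fake = G(torch.randn(batch, seq_len, noise_dim), cond).detach()
    loss_d = bce(D(real, cond), ones) + bce(D(fake, cond), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator label fakes as real.
    fake = G(torch.randn(batch, seq_len, noise_dim), cond)
    loss_g = bce(D(fake, cond), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()


torch.manual_seed(0)
seq_len = 16
# Toy random-walk "prices" standing in for real minute-level data.
prices = torch.cumsum(torch.randn(32, seq_len, 1), dim=1)
# Standardize to zero mean and unit variance (population std).
real = (prices - prices.mean()) / prices.std(unbiased=False)
cond = torch.zeros(32, seq_len, 1)  # placeholder condition (assumption)

G = LSTMGenerator()
D = MLPDiscriminator(seq_len=seq_len)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

loss_d, loss_g = train_step(G, D, opt_g, opt_d, real, cond)
sample = G(torch.randn(4, seq_len, 8), cond[:4])
```

The discriminator returns raw logits on purpose, since BCEWithLogitsLoss expects unnormalized scores. In a real setup the condition would carry whatever side information the model is conditioned on, which the summary does not specify.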

In practice

Use synthetic data for market behavior analysis.
Apply synthetic data to anomaly detection in finance.
Consider asset-specific models for highly volatile cryptocurrencies.
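As one possible illustration of the anomaly detection use case, the sketch below calibrates a simple z-score rule on a reference series and flags outlying returns in an observed series. Everything here is an assumption for illustration: the z-score rule is not a method from the paper, the function names and the threshold of 3.0 are hypothetical, and the reference prices are toy numbers standing in for synthetic data.

```python
import statistics


def simple_returns(prices):
    """Period-over-period relative price changes."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def flag_anomalies(reference_returns, test_returns, threshold=3.0):
    """Indices of test returns more than `threshold` reference std devs from the reference mean."""
    mu = statistics.fmean(reference_returns)
    sigma = statistics.pstdev(reference_returns)
    if sigma == 0:
        raise ValueError("reference returns have zero variance")
    return [i for i, r in enumerate(test_returns) if abs(r - mu) / sigma > threshold]


# Toy reference prices standing in for a synthetic series (not from the paper).
reference_prices = [100.0, 100.4, 100.1, 100.6, 100.3, 100.9, 100.5, 101.0]
# Toy observed prices with one abrupt drop between the third and fourth points.
observed_prices = [100.0, 100.3, 100.2, 95.0, 95.2]

flags = flag_anomalies(
    simple_returns(reference_prices),
    simple_returns(observed_prices),
)
print(flags)
```

Given the attenuated volatility peaks noted for ETH, a threshold calibrated on synthetic data alone may be too tight for some assets, so it would be sensible to validate it against real data where available.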

Topics

Synthetic Data Generation
Conditional GANs
Cryptocurrency Time Series
Financial Anomaly Detection
LSTM Networks

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.