Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training

2013-09-05 · Source: stat.ML updates on arXiv.org · Field: Finance & Economics — Capital Markets & Investment Management, FinTech & Digital Financial Services · Depth: Expert, extended

Summary

This paper introduces a machine learning-assisted portfolio optimization framework designed for environments with limited data and market regime uncertainty. It employs a teacher-student learning pipeline where a Conditional Value at Risk (CVaR) optimizer acts as the teacher, generating supervisory labels. Neural network student models, both Bayesian and deterministic, are trained using a semi-supervised sandwich training paradigm on a combination of 104 real and 323 synthetically augmented observations. Synthetic data is generated via a factor-based model with t-copula residuals to extend training beyond the scarce real sample. The framework is evaluated through controlled synthetic experiments, in-distribution real-market assessment (C2A), and cross-universe generalization (D2A). Results indicate that student models can match or surpass the CVaR teacher, demonstrating improved robustness under regime shifts and reduced turnover, suggesting the efficacy of hybrid optimization-learning approaches in data-constrained financial settings.
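
To make the teacher's risk measure concrete, here is a minimal sketch of empirical CVaR computed from scenario losses with the Rockafellar-Uryasev formulation. It only evaluates the measure for a fixed portfolio; it is not the paper's optimizer. The function names, the 0.95 default confidence level, and the toy scenarios are assumptions made for illustration.

```python
def portfolio_losses(weights, scenario_returns):
    """Loss per scenario = negative portfolio return (hypothetical helper)."""
    return [-sum(w * r for w, r in zip(weights, row)) for row in scenario_returns]


def empirical_cvar(losses, alpha=0.95):
    """Empirical CVaR at confidence level alpha (alpha is an assumed default).

    Uses the Rockafellar-Uryasev form
        CVaR = min_t  t + 1 / ((1 - alpha) * n) * sum(max(loss_i - t, 0)).
    The objective is convex and piecewise linear in t, so the minimum is
    attained at one of the sample losses; we simply try each of them.
    """
    n = len(losses)
    scale = 1.0 / ((1.0 - alpha) * n)

    def objective(t):
        return t + scale * sum(max(loss - t, 0.0) for loss in losses)

    return min(objective(t) for t in losses)


# Toy usage: two assets, four return scenarios (made-up numbers).
scenarios = [[0.01, 0.02], [-0.03, 0.01], [0.02, -0.05], [0.00, 0.01]]
losses = portfolio_losses([0.5, 0.5], scenarios)
tail_risk = empirical_cvar(losses, alpha=0.75)
```

A CVaR teacher would search over the weights to minimize this quantity subject to portfolio constraints; the resulting weights are the labels the students imitate.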

Key takeaway

For research scientists developing robust portfolio allocation systems, this framework suggests that Bayesian knowledge distillation with semi-supervised sandwich training can improve robustness in data-scarce, non-stationary markets. Consider Bayesian neural networks for implicit turnover regularization and better tail-risk containment, especially when deploying models to new asset universes or under stress conditions. The findings suggest that transferring the structure of optimal behavior, rather than exact solutions, yields more generalizable and cost-effective policies.

Key insights

Hybrid optimization-learning, particularly Bayesian distillation, enhances portfolio robustness and cost-efficiency in data-scarce, non-stationary markets.

Principles

Bayesian uncertainty reduces overconfident rebalancing.
Distillation transfers structural optimal behavior, not memorized solutions.
Constraints and Bayesian methods offer complementary regularization.

Method

A semi-supervised sandwich training paradigm alternates supervised imitation of CVaR-optimal labels with unsupervised structural learning on synthetic data, using Bayesian neural networks for uncertainty quantification and implicit turnover control.
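
The alternation described above can be sketched as a toy training loop. This is a deliberately simplified stand-in, not the paper's implementation: the student is a bare logit vector rather than a neural network, gradients are taken by finite differences, and the unsupervised objective (a mean-variance surrogate on unlabeled synthetic scenarios) is an assumption, since the summary does not specify the structural loss. All names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Map unconstrained logits to long-only weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(logits, teacher_weights):
    """Imitation loss: squared distance to the teacher's (e.g. CVaR) weights."""
    w = softmax(logits)
    return sum((wi - ti) ** 2 for wi, ti in zip(w, teacher_weights))


def unsupervised_loss(logits, synthetic_returns, risk_aversion=5.0):
    """Assumed label-free objective: -mean + risk_aversion * variance of P&L."""
    w = softmax(logits)
    pnl = [sum(wi * ri for wi, ri in zip(w, row)) for row in synthetic_returns]
    mean = sum(pnl) / len(pnl)
    var = sum((p - mean) ** 2 for p in pnl) / len(pnl)
    return -mean + risk_aversion * var


def gradient_step(logits, loss_fn, lr=0.5, eps=1e-5):
    """One gradient-descent step using central finite differences."""
    grads = []
    for i in range(len(logits)):
        up, down = logits[:], logits[:]
        up[i] += eps
        down[i] -= eps
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return [z - lr * g for z, g in zip(logits, grads)]


def sandwich_train(teacher_weights, synthetic_returns, rounds=2, steps=50):
    """Alternate supervised and unsupervised phases, ending on a supervised one."""
    logits = [0.0] * len(teacher_weights)
    phases = ["supervised", "unsupervised"] * rounds + ["supervised"]
    for phase in phases:
        if phase == "supervised":
            loss_fn = lambda z: supervised_loss(z, teacher_weights)
        else:
            loss_fn = lambda z: unsupervised_loss(z, synthetic_returns)
        for _ in range(steps):
            logits = gradient_step(logits, loss_fn)
    return softmax(logits), phases


# Toy usage with made-up teacher labels and Gaussian "synthetic" scenarios.
rng = random.Random(0)
scenarios = [[rng.gauss(0.01, 0.05) for _ in range(3)] for _ in range(100)]
weights, phases = sandwich_train([0.6, 0.3, 0.1], scenarios)
```

Starting and ending on a supervised phase is one reading of "sandwich"; the summary states only that the two kinds of phases alternate.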

In practice

Use Bayesian NNs for implicit turnover regularization.
Employ synthetic data to augment scarce real financial labels.
Combine unconstrained variational layers with deterministic constraint enforcement.
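
As a rough illustration of the first and third points, the sketch below draws unconstrained logits from a Gaussian, passes each draw through a deterministic softmax, and averages the resulting weights. The independent Gaussian over logits stands in for a variational layer, and the long-only, fully invested constraint set is an assumption; the summary does not state which constraints the paper enforces. All function names and numbers are hypothetical.

```python
import math
import random


def softmax(logits):
    """Deterministic constraint step: weights are non-negative and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_weights(mean_logits, std_logits, n_samples=200, seed=0):
    """Monte Carlo average of constrained weights under a Gaussian over logits.

    Returns the mean weights and their per-asset standard deviation, which
    can be read as a simple uncertainty signal for each allocation.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        logits = [rng.gauss(m, s) for m, s in zip(mean_logits, std_logits)]
        draws.append(softmax(logits))
    n_assets = len(mean_logits)
    mean_w = [sum(d[i] for d in draws) / n_samples for i in range(n_assets)]
    spread = [
        math.sqrt(sum((d[i] - mean_w[i]) ** 2 for d in draws) / n_samples)
        for i in range(n_assets)
    ]
    return mean_w, spread


def turnover(old_weights, new_weights):
    """L1 distance between consecutive allocations (one common definition)."""
    return sum(abs(n - o) for o, n in zip(old_weights, new_weights))


# Toy usage: compare allocations implied by two successive logit estimates.
w_prev, _ = predictive_weights([1.0, 0.0, -1.0], [0.5, 0.5, 0.5])
w_next, spread = predictive_weights([1.2, -0.1, -1.0], [0.5, 0.5, 0.5])
rebalance_cost_proxy = turnover(w_prev, w_next)
```

Because every draw already satisfies the constraints, so does their average, which keeps the stochastic part of the model free of constraint handling.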

Topics

Portfolio Optimization
Bayesian Neural Networks
Semi-Supervised Sandwich Training
Conditional Value-at-Risk
Synthetic Data Augmentation

Code references

ranaroussi/yfinance

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.