deep learning models inference and deployment with C++(8): Sequence Model
Summary
This article covers deploying deep learning and tree-based ensemble models for time-series problems in C++. It describes two approaches: running a classical LSTM model with ONNX Runtime, and running tree-based ensemble models such as Random Forest without any inference engine. For the LSTM, the workflow is to train the model in PyTorch, export it to ONNX format, and run inference in C++ with ONNX Runtime, applying softmax manually to the output logits. For tree-based models, the article stresses that handcrafted features (time-domain and frequency-domain) must be extracted from the raw sequential data before training. It then shows how to export an Extra Trees Classifier with m2cgen, which generates pure C/C++ code for lightweight, engine-free inference. The running example is the Human Activity Recognition (HAR) dataset, which records 3-axis accelerometer and gyroscope signals at 50 Hz.
Key takeaway
AI engineers deploying sequence models in C++ should match the deployment path to the model type. Deep learning models such as LSTMs can be run efficiently through ONNX Runtime, while tree-based models need explicit feature engineering and can be shipped as pure C/C++ code generated by tools like m2cgen, with minimal overhead. Choosing the path per model type helps control performance and resource use, especially on edge devices.
Key insights
Deploying sequence models in C++ requires distinct strategies for deep learning (ONNX Runtime) versus tree-based models (feature engineering, m2cgen).
Principles
- ONNX Runtime simplifies deep learning model deployment.
- Tree models require explicit feature engineering for sequential data.
- m2cgen enables lightweight, engine-free C/C++ model deployment.
Method
For the LSTM, train in PyTorch, export to ONNX, then run inference with ONNX Runtime in C++ and apply softmax manually to the output logits. For tree models, extract time-domain and frequency-domain features, train on those features, then export the model to C/C++ with m2cgen.
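The manual softmax step can be sketched as follows. This is a minimal, numerically stable version that assumes the logits have already been copied out of the inference output into a `std::vector<float>`. The helper names `softmax` and `argmax` are illustrative and are not taken from the original article.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Convert raw logits into probabilities that sum to 1.
// Subtracting the largest logit first avoids overflow in std::exp.
std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) {
        return probs;
    }
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }
    return probs;
}

// Index of the largest element, i.e. the predicted class.
// Assumes v is non-empty.
std::size_t argmax(const std::vector<float>& v) {
    return static_cast<std::size_t>(
        std::distance(v.begin(), std::max_element(v.begin(), v.end())));
}
```

If only the predicted class is needed, `argmax` can be applied to the logits directly, since softmax preserves ordering. The softmax is needed when the application consumes the probabilities themselves.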
In practice
- Use ONNX Runtime for LSTM inference in C++.
- Apply softmax manually to the logits returned by the ONNX model.
- Extract time/frequency features for tree models on sequences.
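As an illustration of the feature extraction step, the sketch below computes a few common time-domain statistics and one frequency-domain feature for a single window of one sensor axis. The chosen features, the `WindowFeatures` struct, and the `extract_features` function are assumptions for illustration, not the feature set used in the original article. The frequency feature uses a naive O(n²) DFT for clarity. A production implementation would more likely use an FFT library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative per-window features for one sensor axis.
struct WindowFeatures {
    double mean;
    double stddev;
    double min;
    double max;
    double energy;            // mean of squared samples
    double dominant_freq_hz;  // frequency of the strongest non-DC bin
};

WindowFeatures extract_features(const std::vector<double>& x,
                                double sample_rate_hz) {
    WindowFeatures f{};
    const std::size_t n = x.size();
    if (n == 0) {
        return f;
    }

    // Time-domain statistics in a single pass.
    double sum = 0.0;
    double sum_sq = 0.0;
    f.min = x[0];
    f.max = x[0];
    for (double v : x) {
        sum += v;
        sum_sq += v * v;
        f.min = std::min(f.min, v);
        f.max = std::max(f.max, v);
    }
    f.mean = sum / static_cast<double>(n);
    f.energy = sum_sq / static_cast<double>(n);
    const double variance = std::max(0.0, f.energy - f.mean * f.mean);
    f.stddev = std::sqrt(variance);

    // Frequency domain: naive DFT over bins 1..n/2 (the DC bin is skipped),
    // keeping the bin with the highest power.
    const double pi = std::acos(-1.0);
    double best_power = 0.0;
    std::size_t best_bin = 0;
    for (std::size_t k = 1; k <= n / 2; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(t) / static_cast<double>(n);
            re += x[t] * std::cos(angle);
            im -= x[t] * std::sin(angle);
        }
        const double power = re * re + im * im;
        if (power > best_power) {
            best_power = power;
            best_bin = k;
        }
    }
    f.dominant_freq_hz = static_cast<double>(best_bin) * sample_rate_hz /
                         static_cast<double>(n);
    return f;
}
```

For the HAR signals, `sample_rate_hz` would be 50. The features from each axis are then concatenated into one vector per window, which becomes the input row for the tree model at both training and inference time.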
Topics
- Deep Learning Deployment
- Sequence Models
- ONNX Runtime
- LSTM
- Tree-based Ensemble Models
Best for: Machine Learning Engineer, AI Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Deep Learning on Medium.