Speed Up Unreal Engine NNE Inference with NVIDIA TensorRT for RTX Runtime
Summary
Unreal Engine 5 (UE5) integrates neural network capabilities through its Neural Network Engine (NNE), an abstraction layer that unifies inference workloads across backends and supports both GPU and CPU runtimes. NVIDIA has released a new plugin, NNERuntimeTRT, which adds NVIDIA TensorRT for RTX as an NNE runtime option designed for efficient inference on NVIDIA RTX GPUs. TensorRT for RTX uses a Just-In-Time (JIT) compiler to optimize AI models for the specific hardware they run on, which yields higher throughput than default execution providers such as DirectML. In a style transfer post-processing sample project, TensorRT for RTX completed an enqueue task in 3.8 ms on an NVIDIA GeForce RTX 5090 GPU at 1080p, compared with 5.7 ms for DirectML, a 1.5x improvement. The NNE TensorRT for RTX plugin supports both synchronous CPU-to-GPU and asynchronous Render Dependency Graph (RDG) methods, making it suitable for AI applications in rendering, animation, language, and speech.
Key takeaway
AI engineers building real-time graphics applications in Unreal Engine 5 on NVIDIA RTX GPUs should consider integrating the NNERuntimeTRT plugin. Enabling TensorRT for RTX as a runtime option requires updating the engine source; in the style transfer sample, it ran the post-processing task about 1.5x faster than DirectML (3.8 ms versus 5.7 ms). The time saved leaves room for more complex neural network features within the same frame budget, which can improve visual quality without lowering frame rates.
Key insights
NVIDIA TensorRT for RTX speeds up neural network inference within Unreal Engine 5 on RTX GPUs, by about 1.5x over DirectML in the style transfer sample.
Principles
- JIT compilation optimizes AI models for specific GPU hardware.
- RDG method aligns AI inference with frame rendering for real-time graphics.
Method
Integrate the NNERuntimeTRT plugin into Unreal Engine 5 by modifying `neuralprofile.h` and `neuralprofile.cpp` to include `NNERuntimeTRT` in runtime lists, then build the engine and deploy the plugin.
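The idea of a runtime list with a preferred entry can be sketched in Python, independent of the actual engine source. This is a minimal illustration, not the code in `neuralprofile.h` or `neuralprofile.cpp`: the function `pick_runtime` is hypothetical, and the runtime names other than `NNERuntimeTRT` are assumed for the example.

```python
from typing import Optional, Sequence


def pick_runtime(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred runtime that is also available, or None."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    return None


# Hypothetical preference order: TensorRT for RTX first, then fallbacks.
# Only "NNERuntimeTRT" comes from the text; the other names are assumed.
PREFERRED = ["NNERuntimeTRT", "NNERuntimeORTDml", "NNERuntimeORTCpu"]

if __name__ == "__main__":
    # Machine with the plugin deployed: the TensorRT runtime is chosen.
    print(pick_runtime(["NNERuntimeORTDml", "NNERuntimeTRT"], PREFERRED))
    # Machine without it: the lookup falls back to the next listed runtime.
    print(pick_runtime(["NNERuntimeORTCpu"], PREFERRED))
```

Keeping the fallback order explicit means a project can still run where the plugin is not deployed, at the cost of the slower runtime.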
In practice
- Use NNERuntimeTRT for AI post-processing in UE5; the style transfer sample ran about 1.5x faster than with DirectML.
- Resize ONNX style transfer models to 1x3x720x720 to avoid tiling overhead.
- Profile performance with Unreal Insights to compare runtimes.
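Once timings are collected, comparing runtimes is simple arithmetic. The sketch below uses the two figures quoted in the summary (3.8 ms and 5.7 ms); the helper names and the 60 fps frame target are assumptions for illustration, and reading timings out of Unreal Insights is not shown.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / candidate_ms


def frame_budget_share(task_ms: float, target_fps: float = 60.0) -> float:
    """Fraction of one frame's time budget consumed by the task."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    frame_ms = 1000.0 / target_fps  # time available per frame
    return task_ms / frame_ms


# Timings quoted in the summary (RTX 5090, 1080p, style transfer sample).
DIRECTML_MS = 5.7
TENSORRT_RTX_MS = 3.8

if __name__ == "__main__":
    print(f"speedup: {speedup(DIRECTML_MS, TENSORRT_RTX_MS):.2f}x")
    for label, ms in (("DirectML", DIRECTML_MS), ("TensorRT for RTX", TENSORRT_RTX_MS)):
        # 60 fps is an assumed target, not a figure from the article.
        print(f"{label}: {frame_budget_share(ms):.1%} of a 60 fps frame")
```

With the quoted figures this gives a 1.5x speedup, and at the assumed 60 fps target the task drops from about 34% to about 23% of the frame budget.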
Topics
- Unreal Engine NNE
- NVIDIA TensorRT for RTX
- GPU Inference Optimization
- Neural Post-Processing
- DirectML Performance
Best for: AI Engineer, Computer Vision Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.