Nvidia has released a new version of TensorRT, a runtime system for serving inferences using deep learning models through Nvidia’s own GPUs.
Inferences, or predictions made from a trained model, can be served from either CPUs or GPUs. Serving inferences from GPUs is part of Nvidia’s strategy to get greater adoption of its processors.
Nvidia claims the GPU-based TensorRT is better across the board for inferencing than CPU-only approaches. One of Nvidia’s proffered benchmarks, the AlexNet image classification test under the Caffe framework, claims TensorRT to be 42 times faster than a CPU-only version of the same test — 16,041 images per second vs. 374—when run on Nvidia’s Tesla P40 processor. (Always take industry benchmarks with a grain of salt.)