NVIDIA has introduced TensorRT Edge-LLM, an open-source C++ framework for high-performance large language model (LLM) and vision language model (VLM) inference on automotive and robotics platforms such as NVIDIA DRIVE AGX Thor and NVIDIA Jetson Thor. Built for real-time applications, it addresses the particular constraints of edge computing: minimal latency, a small resource footprint, and fully offline operation. The framework ships with advanced capabilities including EAGLE-3 speculative decoding, NVFP4 quantization support, and chunked prefill. Partners such as Bosch and ThunderSoft are already using it to enhance in-car AI. With a streamlined workflow for exporting Hugging Face models to ONNX and building optimized TensorRT engines from them, TensorRT Edge-LLM enables sophisticated AI applications to be integrated into vehicles and robots. Developers can get started with the guidance in the TensorRT Edge-LLM GitHub repository.
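To make the first step of that workflow concrete, the sketch below exports a Hugging Face causal LM to ONNX using the stock `transformers` and `torch.onnx` APIs. This is an illustration under assumptions, not TensorRT Edge-LLM's own exporter: the model name, output path, and export options here are placeholders, and the framework's actual export tooling and supported model list are documented in its GitHub repository.

```python
# Illustrative sketch: export a Hugging Face causal LM to ONNX.
# Model ID, file name, and export options are placeholder assumptions;
# TensorRT Edge-LLM's own exporter may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; substitute the LLM you intend to deploy
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.config.use_cache = False  # export a plain forward pass, no KV-cache outputs
model.eval()

# Trace the forward pass with a dummy prompt and write the ONNX graph.
input_ids = tokenizer("Hello from the edge", return_tensors="pt")["input_ids"]
torch.onnx.export(
    model,
    (input_ids,),
    "model.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
    opset_version=17,
)
```

The resulting `model.onnx` could then be compiled into a TensorRT engine on the target device, for example with TensorRT's `trtexec --onnx=model.onnx --saveEngine=model.plan`; the Edge-LLM repository describes its own engine-building flow and optimization options.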