News

NVIDIA launches Nemotron 3 Nano Omni, an open multimodal AI model, on April 29, 2026

NVIDIA launched the Nemotron 3 Nano Omni, an open multimodal AI model, on April 29, 2026. According to company officials, the model integrates vision and audio processing in a single system, allowing it to handle video, audio, image, and text inputs simultaneously. Company sources said it delivers nine times higher throughput and 9.2 times greater system efficiency for video tasks compared to other open multimodal AI models.

The Nemotron 3 Nano Omni employs a 30-billion-parameter A3B hybrid mixture-of-experts architecture that integrates vision and audio encoders within a single system, eliminating the need for separate perception models, according to NVIDIA officials.

NVIDIA representatives said the model achieves 7.4 times higher efficiency for multi-document use cases over comparable omni models, enabling advanced reasoning across multiple modalities for faster and smarter agent responses. The system natively processes inputs without relying on fragmented or chained specialized tools, streamlining workflows in various applications. It can analyze high-resolution 1080p video for navigating complex software and demonstrates leading accuracy in document understanding, as evidenced by its performance across six leaderboards for complex document intelligence.

The company highlighted the model’s versatility across industries, noting its use in office automation where it navigates accounting and design tools to automate data entry tasks. In logistics management, it analyzes video and audio footage to identify operational delays and generate summaries, officials said. The integrated perception and reasoning capabilities support autonomous multi-step task planning, reducing context loss and inference latency, which improves responsiveness in enterprise implementations.

NVIDIA confirmed that the Nemotron 3 Nano Omni is available on multiple platforms, including Hugging Face, OpenRouter, and build.nvidia.com as an NVIDIA NIM microservice. It is accessible through a broad ecosystem of NVIDIA Cloud Partners, inference platforms, and cloud service providers. Compatibility extends to DGX Spark, llama.cpp, LMStudio, vLLM, and SGLang, and deployment options include NVIDIA Jetson hardware, DGX Station, and local systems. The model was recently added to Amazon SageMaker JumpStart, expanding its availability to a wider enterprise audience.
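For readers who want to experiment, NIM microservices and servers such as vLLM typically expose an OpenAI-compatible chat completions endpoint. The sketch below builds such a request body using only the standard library; the endpoint URL and model identifier are illustrative assumptions, not values confirmed by NVIDIA.

```python
import json

# Hypothetical values -- the actual endpoint and model ID may differ.
ENDPOINT = "http://localhost:8000/v1/chat/completions"  # e.g. a local vLLM server
MODEL_ID = "nvidia/nemotron-3-nano-omni"                # assumed identifier

def build_chat_request(prompt: str, model: str = MODEL_ID) -> str:
    """Return a JSON body for an OpenAI-compatible chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the attached shipping manifest.")
print(body)
```

The resulting body can be POSTed to the endpoint with any HTTP client (for example, `urllib.request` with a `Content-Type: application/json` header).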

Early adoption of the Nemotron 3 Nano Omni includes companies such as Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler, according to NVIDIA’s records. Organizations currently evaluating the model include Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle, and Zefr. NVIDIA officials said this adoption demonstrates the model’s readiness for production deployment across sectors including AI, software, healthcare, manufacturing, and intelligence.

The Nemotron 3 family, which includes the Nano, Super, and Ultra models, has collectively surpassed 50 million downloads over the past year, NVIDIA reported. The Omni variant extends the family's capabilities into multimodal and agentic domains by fusing existing reasoning functions with native multimodal perception. Earlier Nemotron 3 releases introduced agentic reasoning for multi-step task planning by late 2025, company sources noted.

NVIDIA emphasized that the model balances efficiency with strong multimodal perception accuracy without sacrificing responsiveness or quality. By consolidating fragmented modality-specific pipelines into a unified system, the Nemotron 3 Nano Omni reduces operational costs and computational overhead, improving scalability for enterprise AI agents. Officials described the model as part of NVIDIA’s broader evolution from a GPU provider to a foundational AI infrastructure company.
