AI agent programs as we speak juggle separate fashions for imaginative and prescient, speech and language — dropping time and context as they cross information from one mannequin to the opposite.
Unveiled as we speak, NVIDIA Nemotron 3 Nano Omni is an open multimodal mannequin that brings these capabilities collectively into one system, enabling brokers to ship sooner, smarter responses with superior reasoning throughout video, audio, picture and textual content. This best-in-class mannequin offers enterprises and builders a manufacturing path for extra environment friendly and correct multimodal AI brokers with full deployment flexibility and management.
Nemotron 3 Nano Omni units a brand new effectivity frontier for open multimodal fashions with main accuracy and low value, topping six leaderboards for advanced doc intelligence, and video and audio understanding.
AI and software program firms already adopting Nemotron 3 Nano Omni embody Aible, Utilized Scientific Intelligence (ASI), Eka Care, Foxconn, H Firm, Palantir and Pyler, with Dell Applied sciences, Docusign, Infosys, Okay-Dense, Lila, Oracle and Zefr evaluating the mannequin.
“To construct helpful brokers, you possibly can’t wait seconds for a mannequin to interpret a display,” stated Gautier Cloix, CEO of H Firm. “By constructing on Nemotron 3 Nano Omni, our brokers can quickly interpret full HD display recordings — one thing that wasn’t sensible earlier than. This isn’t only a velocity increase: It’s a elementary shift in how our brokers understand and work together with digital environments in actual time.”
Nemotron 3 Nano Omni Allows Quicker, Leaner Multimodal Brokers
Think about an AI agent for buyer assist processing a display recording whereas analyzing uploaded name audio and checking information logs — or an agent for finance tasked with parsing PDFs, spreadsheets, charts and voice notes. At the moment, most agentic programs accomplish these duties with separate fashions for imaginative and prescient, speech and language.
This method will increase latency by way of repeated inference passes, fragments context throughout modalities, and provides value and inaccuracies over time.
By combining imaginative and prescient and audio encoders inside its 30B-A3B, hybrid mixture-of-experts structure, Nemotron 3 Nano Omni eliminates the necessity for separate notion fashions, driving inference effectivity at scale. It pairs this effectivity with sturdy multimodal notion accuracy, enabling AI programs to realize 9x increased throughput than different open omni fashions with the identical interactivity. The result’s decrease prices and higher scalability with out sacrificing responsiveness or high quality.
In agentic programs, Nemotron 3 Nano Omni can work alongside proprietary cloud fashions or different NVIDIA Nemotron open fashions — similar to Nemotron 3 Tremendous for high-frequency execution or Nemotron 3 Extremely for advanced planning — in addition to proprietary fashions from different suppliers, to energy sub-agents for agentic workflows similar to laptop use, doc intelligence and audio-video reasoning.
- Pc use brokers — Nemotron 3 Nano Omni powers the notion loop for brokers navigating graphical person interfaces, reasoning over onscreen content material and understanding person interface state over time. H Firm’s newest laptop utilization agent, powered by Nemotron 3 Nano Omni, makes use of a local enter decision of 1920×1080 pixels to realize high-fidelity visible reasoning. In preliminary evaluations on the OSWorld benchmark, this integration confirmed a major leap in navigating advanced graphical interfaces and used Nemotron 3 Nano Omni’s potential to course of very high-resolution pictures.
- Doc intelligence — Interprets paperwork, charts, tables, screenshots and mixed-media inputs, enabling brokers to motive throughout visible construction and textual content content material coherently. Essential for enterprise evaluation and compliance workflows.
- Audio and video understanding — For customer support, analysis and monitoring workflows, Nemotron 3 Nano Omni maintains audio-video context, tying what was stated, proven and documented right into a single reasoning stream as a substitute of disconnected summaries.

Open and Customizable, Deployable Anyplace
Nemotron 3 Nano Omni is launched with open weights, datasets and coaching strategies — giving organizations full transparency and management over how the mannequin is personalized and deployed.
Builders can use instruments like NVIDIA NeMo for personalization, analysis and optimization for domain-specific use circumstances. As a result of the Nemotron household of fashions is open, organizations can deploy them in environments that meet regulatory, sovereignty or information localization necessities.
The Nemotron 3 household — together with Nano, Tremendous and Extremely fashions — has seen over 50 million downloads up to now 12 months. Omni extends the household’s capabilities into multimodal and agentic domains.
The mannequin is offered on Hugging Face, OpenRouter and construct.nvidia.com as an NVIDIA NIM microservice and thru a broad ecosystem of NVIDIA Cloud Companions, inference platforms and cloud service suppliers.
Its open, light-weight structure helps constant deployment from native programs like NVIDIA Jetson {hardware}, NVIDIA DGX Spark and DGX Station to information heart and cloud environments.
Go to the NVIDIA technical weblog for tutorials, cookbooks and deployment guides for Nemotron 3 Nano Omni use circumstances. Stay updated on agentic AI, NVIDIA Nemotron and extra by subscribing to NVIDIA information, becoming a member of the group and following NVIDIA AI on LinkedIn, Instagram, X and Fb.
