Explore the latest optimized AI models, connect applications to data with NVIDIA NIM™ Agent Blueprints, and deploy anywhere with NVIDIA NIM microservices.
Integrations
Get up and running quickly with familiar APIs.
Use NVIDIA APIs from your existing tools and applications with as few as three lines of code.
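For example, the API catalog's endpoints follow the OpenAI API convention, so the standard openai Python client can call a hosted model. A minimal sketch, assuming an API key from the NVIDIA API catalog in the NVIDIA_API_KEY environment variable and the meta/llama-3.1-8b-instruct model (swap in any catalog model ID):

    # Minimal sketch: call a hosted catalog model through the OpenAI-compatible
    # endpoint. Assumes an NVIDIA API key is set in NVIDIA_API_KEY.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA API catalog endpoint
        api_key=os.environ["NVIDIA_API_KEY"],
    )
    completion = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # any model ID from the catalog
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    )
    print(completion.choices[0].message.content)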
Work with your favorite large language model (LLM) programming frameworks, including LangChain and LlamaIndex, and easily integrate the latest AI models into your applications.
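As an illustration, the langchain-nvidia-ai-endpoints package wraps catalog models for LangChain. A minimal sketch, assuming that package is installed and NVIDIA_API_KEY is set in the environment:

    # Minimal LangChain sketch via the langchain-nvidia-ai-endpoints package.
    # Assumes: pip install langchain-nvidia-ai-endpoints, NVIDIA_API_KEY set.
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")
    response = llm.invoke("Summarize retrieval-augmented generation in one sentence.")
    print(response.content)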
Data powers modern enterprise applications. Connect AI agents to enterprise data at scale with an AI query engine that uses retrieval-augmented generation (RAG) to equip employees with instant, accurate institutional knowledge.
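To make the pattern concrete, the sketch below (an illustration of RAG, not the NVIDIA AI query engine itself) embeds a few documents with a catalog embedding model, retrieves the closest match, and grounds the answer in it; the model IDs, sample documents, and FAISS store are assumptions for the example:

    # Minimal RAG sketch: retrieve relevant text, then answer with it as context.
    # Assumes: pip install langchain-nvidia-ai-endpoints langchain-community faiss-cpu
    from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
    from langchain_community.vectorstores import FAISS

    docs = [
        "The VPN portal is vpn.example.com; access requires an approved ticket.",
        "Expense reports are due by the 5th of each month.",
    ]
    store = FAISS.from_texts(docs, NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5"))

    question = "When are expense reports due?"
    context = store.similarity_search(question, k=1)[0].page_content
    llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")
    print(llm.invoke(f"Answer using this context: {context}\n\nQuestion: {question}").content)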
Everything you need to build impactful generative AI applications. Each blueprint includes NVIDIA NIM and partner microservices, one or more AI agents, sample code, customization instructions, and a Helm chart for deployment.
Run Anywhere
Part of NVIDIA AI Enterprise, NVIDIA NIM is a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center and helping to keep your data secure.
Deploy NIM for your model with a single command. You can also easily run NIM with fine-tuned models.
Get NIM up and running with the optimal runtime engine based on your NVIDIA-accelerated infrastructure.
Developers can integrate self-hosted NIM endpoints in just a few lines of code.
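Because a running NIM container exposes the same OpenAI-compatible API, pointing a client at it is the only change. A minimal sketch, assuming a NIM serving Llama 3.1 8B Instruct on localhost port 8000 (the host, port, and model ID are deployment-specific assumptions):

    # Minimal sketch: swap the hosted endpoint for a self-hosted NIM.
    # Assumes a NIM container serving on http://localhost:8000 (deployment-specific).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
    completion = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # model served by this NIM
        messages=[{"role": "user", "content": "Hello from my own infrastructure!"}],
    )
    print(completion.choices[0].message.content)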
Seamlessly deploy containerized AI microservices on any NVIDIA-accelerated infrastructure, from a single device to data center scale.
Rely on production-grade runtimes, including ongoing security updates, and run your business applications with stable APIs backed by enterprise-grade support.
Lower the operational cost of running models in production with AI runtimes that are continuously optimized for low latency and high throughput on NVIDIA-accelerated infrastructure.
NVIDIA NIM provides optimized throughput and latency out of the box to maximize token generation, support concurrent users at peak times, and improve responsiveness.
Configuration: Llama 3.1 8B Instruct on 1x NVIDIA H100 SXM; 1,000 input tokens, 1,000 output tokens; 200 concurrent requests.
NIM On (FP8): throughput 6,354 tokens/s, time to first token (TTFT) 0.4 s, inter-token latency (ITL) 31 ms.
NIM Off (FP8): throughput 2,265 tokens/s, TTFT 1.1 s, ITL 85 ms.
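Working those published figures through, the relative gains come to roughly 2.8x throughput, 2.8x faster time to first token, and 2.7x lower inter-token latency; a short sketch of the arithmetic:

    # Relative gains implied by the published benchmark figures above.
    nim_on  = {"throughput": 6354, "ttft_s": 0.4, "itl_ms": 31}
    nim_off = {"throughput": 2265, "ttft_s": 1.1, "itl_ms": 85}

    print(f"Throughput: {nim_on['throughput'] / nim_off['throughput']:.1f}x higher")   # ~2.8x
    print(f"Time to first token: {nim_off['ttft_s'] / nim_on['ttft_s']:.1f}x faster")  # ~2.8x
    print(f"Inter-token latency: {nim_off['itl_ms'] / nim_on['itl_ms']:.1f}x lower")   # ~2.7x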
Customization
NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI anywhere. It includes tools for data curation, model pretraining, training, customization, retrieval-augmented generation (RAG), and guardrailing, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.
Access foundation models, enterprise software, accelerated computing, and AI expertise to build, fine-tune, and deploy custom models for your enterprise applications.
Use Cases
See how NVIDIA APIs support industry use cases and jump-start your AI development with curated examples.
Ecosystem
Join leading partners to develop your AI applications with models, toolkits, vector databases, frameworks, and infrastructure from our ecosystem.
Resources
Explore technical documentation to start prototyping and building your enterprise AI applications with NVIDIA APIs, or scale on your own infrastructure with NVIDIA NIM.