Runtimes IA

NVIDIA NIM Connecteur

Integrate NVIDIA NIM (NVIDIA Inference Microservices) with OnPremiseAgent for optimized GPU-accelerated model serving. NIM provides pre-optimized containers for popular models with TensorRT-LLM acceleration, delivering maximum performance on NVIDIA hardware.

API Key

Commencer Contacter le service commercial

Authentification

API Key

Catégorie

Runtimes IA

Compatibilité

NVIDIA A100, NVIDIA H100, NVIDIA L40S, CUDA 12.x

Niveau

Bientôt disponible

Des capacités conçues sur mesure

Tout ce dont vous avez besoin pour intégrer NVIDIA NIM à vos workflows d'agents on-premise.

TensorRT-LLM

Hardware-optimized inference with TensorRT-LLM for maximum throughput on NVIDIA GPUs.

Pre-Built Containers

Deploy pre-optimized NIM containers for Llama, Mistral, and other popular models.

Multi-GPU Scaling

Automatic tensor parallelism across multiple GPUs for serving large models.

OpenAI Compatible

Industry-standard OpenAI-compatible API for seamless integration.

Pull NIM Container

Pull the NVIDIA NIM container for your chosen model from NVIDIA NGC catalog.

Commencer

Avantages clés

Pourquoi les entreprises choisissent ce connecteur

Maximum inference performance on NVIDIA hardware
Pre-optimized containers — no manual optimization needed
Multi-GPU tensor parallelism for large models
Enterprise support from NVIDIA

Maximum Performance

Deploy models with TensorRT-LLM optimization for the lowest possible latency on NVIDIA hardware.

Large Model Serving

Serve 70B+ parameter models across multiple GPUs with automatic tensor parallelism.

Enterprise Standardization

Standardize on NVIDIA NIM for all AI inference with enterprise support and SLA guarantees.

Questions fréquentes

Do I need an NVIDIA AI Enterprise license?

NIM containers are available through NVIDIA NGC. Some models require an NVIDIA AI Enterprise subscription for production use.

Which models are available as NIM containers?

NIM supports Llama 3, Mistral, Mixtral, and many other popular models with pre-optimized TensorRT-LLM configurations.

Fonctionne parfaitement avec

Associez NVIDIA NIM à ces connecteurs pour une stack d'intégration complète.

Bientôt disponible

vLLM

High-throughput model serving with vLLM for production AI workloads.

Runtimes IA

Bientôt disponible

Ollama

Run open-source LLMs locally with Ollama for fully air-gapped AI inference.

Runtimes IA

Disponible

Kubernetes

Orchestrate AI agents as containerized workloads with auto-scaling and self-healing.

Infrastructure

Prêt à connecter NVIDIA NIM?

Déployez sur votre propre infrastructure avec une souveraineté totale des données. Lancez-vous en quelques minutes.

Rejoindre la liste d'attente Planifier une démo

NVIDIA NIM Connecteur

API Key