Integrate NVIDIA NIM (NVIDIA Inference Microservices) with OnPremiseAgent for optimized GPU-accelerated model serving. NIM provides pre-optimized containers for popular models with TensorRT-LLM acceleration, delivering maximum performance on NVIDIA hardware.
API Key
Runtimes IA
NVIDIA A100, NVIDIA H100, NVIDIA L40S, CUDA 12.x
Bientôt disponible
Tout ce dont vous avez besoin pour intégrer NVIDIA NIM à vos workflows d'agents on-premise.
Hardware-optimized inference with TensorRT-LLM for maximum throughput on NVIDIA GPUs.
Deploy pre-optimized NIM containers for Llama, Mistral, and other popular models.
Automatic tensor parallelism across multiple GPUs for serving large models.
Industry-standard OpenAI-compatible API for seamless integration.
Pull the NVIDIA NIM container for your chosen model from NVIDIA NGC catalog.
Deploy models with TensorRT-LLM optimization for the lowest possible latency on NVIDIA hardware.
Serve 70B+ parameter models across multiple GPUs with automatic tensor parallelism.
Standardize on NVIDIA NIM for all AI inference with enterprise support and SLA guarantees.
NIM containers are available through NVIDIA NGC. Some models require an NVIDIA AI Enterprise subscription for production use.
NIM supports Llama 3, Mistral, Mixtral, and many other popular models with pre-optimized TensorRT-LLM configurations.
Associez NVIDIA NIM à ces connecteurs pour une stack d'intégration complète.
Déployez sur votre propre infrastructure avec une souveraineté totale des données. Lancez-vous en quelques minutes.