AI Inference Service Submodule
This submodule enables AI model inference, with two focuses:
- general inference with costly and large AI models, e.g. Large Language Models (LLMs), on supercomputers;
- inference for translating natural-language queries into database (e.g. SQL) queries.
Flexibly-Deployable EXA4MIND HPC/Supercomputing Inference Service
EXA4MIND provides an inference service module capable of leveraging supercomputing (i.e. High-Performance Computing) power for complex inference tasks.
The service, developed with the longer-term aim of easy deployment at any site, exposes an OpenAI-compatible REST API for submitting inference requests while abstracting away the complexity of HPC job orchestration and resource management. Under the hood, it integrates with the HEAppE middleware to acquire, launch, and monitor inference jobs across HPC clusters.
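Because the API is OpenAI compatible, any standard OpenAI client can talk to the service. The following is a minimal sketch; the endpoint URL, token, and model name are illustrative placeholders, not documented values:

```python
# Minimal sketch of submitting an inference request through the
# OpenAI-compatible REST API. The base_url, api_key, and model name
# are placeholder assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.org/v1",  # placeholder service endpoint
    api_key="YOUR_TOKEN",                         # placeholder credential
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # any open model the service has loaded
    messages=[{"role": "user", "content": "Summarise HEAppE in one sentence."}],
)
print(response.choices[0].message.content)
```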
The repository is available here. Detailed documentation is available on separate pages (see the repository README).
Features
- Automatic Job and AI Model Management
  - The service allocates computational resources on the HPC cluster and manages the loaded AI models; the user only specifies which data goes to which model.
- Large language model inference
  - Solve your tasks with popular open LLMs such as Qwen3, Gemma3, or any other open model hosted on Hugging Face.
- Inference engine integration
  - Use NVIDIA Triton, a state-of-the-art inference server that handles everything from proper inference request batching to efficient model loading and management.
  - Or use the custom inference engine, a GPU-agnostic Hugging Face Transformers solution that uses ZeroMQ to distribute AI model inference tasks across multiple GPUs on compute nodes (see the sketch after this list).
- Easy to use
  - The API follows the popular OpenAI API, so it integrates easily with a wide variety of AI front-ends and tools.
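As an illustration of the custom engine's distribution pattern, the sketch below uses a plain ZeroMQ PUSH/PULL pipeline to fan inference tasks out to per-GPU workers. It is not the engine's actual code; the endpoints, function names, and worker body are assumptions:

```python
# Illustrative sketch of ZeroMQ-based task distribution across GPUs.
# Endpoints and function names are assumptions, not the engine's API.
import zmq

def worker(gpu_id: int, endpoint: str = "tcp://localhost:5557") -> None:
    """Per-GPU worker: pulls prompts and handles them on its assigned device."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PULL)
    sock.connect(endpoint)
    while True:
        prompt = sock.recv_string()
        # In a real engine, a Hugging Face Transformers model pinned to
        # `gpu_id` would generate text here; we only log the assignment.
        print(f"GPU {gpu_id} handling: {prompt}")

def distributor(prompts: list[str], endpoint: str = "tcp://*:5557") -> None:
    """Producer: PUSH/PULL fans prompts out across connected GPU workers."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PUSH)
    sock.bind(endpoint)
    for prompt in prompts:
        sock.send_string(prompt)
```

The PUSH/PULL pair gives round-robin load balancing for free: each connected worker receives the next available task, so adding a GPU is just starting another worker process.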
Inference for Natural Language Query
We have developed functionality in AQIS that allows querying databases via natural language, employing a suitable LLM. Stay tuned for the publication of this submodule.
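Until the submodule is published, the general pattern can be sketched as follows: prompt an LLM with the database schema and the user's question, and treat the model's reply as the SQL query. This is not the AQIS implementation; the endpoint, model, schema, and helper name are illustrative assumptions:

```python
# Illustrative sketch of natural-language-to-SQL translation via an LLM.
# NOT the AQIS implementation (not yet published); the endpoint, model
# name, and schema below are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://inference.example.org/v1", api_key="YOUR_TOKEN")

SCHEMA = "CREATE TABLE measurements (station TEXT, ts TIMESTAMP, temp_c REAL);"

def nl_to_sql(question: str) -> str:
    """Ask the LLM to translate a natural-language question into SQL."""
    response = client.chat.completions.create(
        model="Qwen/Qwen3-8B",  # placeholder model
        messages=[
            {"role": "system",
             "content": f"Translate the user's question into a single SQL query "
                        f"for this schema:\n{SCHEMA}\nReturn only the SQL."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

print(nl_to_sql("What was the average temperature per station last week?"))
```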