Copyright © 2026 Simplito sp. z o.o. All rights reserved.


DeepFellow Infra

The open-source foundation for running AI on your own terms

Deploy, manage, and scale AI models across your own hardware - on-premise, private cloud, or hybrid. Host models on your own, with full control over compute and every layer of your AI infrastructure.

Free and open source.


| Setup | Specs | Best for | Estimated cost |
| --- | --- | --- | --- |
| Server Solution | 60GB VRAM, 256GB RAM, 8×RTX 4000 Ada | Large models, high performance | $5,000–$76,000 |
| PC workstation | 32–128GB RAM, 2×Nvidia GPU | Small and medium models at speed | $2,000–$9,000 |
| Mac Studio | 35–512GB RAM, up to 80-core GPU | Large models at moderate speed | $3,000–$25,000 |
| MacBook Pro | 36–128GB RAM, up to 40-core GPU | Medium models, individual use | $2,000–$10,000 |

See full hardware recommendations


What you get

Flexible and efficient model hosting

Install, host, and run multiple LLM, ML, voice, and image generation models across any setup. Works out of the box with vLLM, Ollama, llama.cpp, Hugging Face models, and any custom ML model. Supports embeddings, text-to-speech, speech-to-text, and image generation natively.

Tree-cluster topology with load balancing

Build resilient, multi-node infrastructure with automatic node organization and intelligent load balancing. Delegate entire nodes or single machines to specific models or tasks. Scale vertically and horizontally — no downtime, no manual reshuffling.
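To make the idea concrete, here is a minimal sketch of tree-structured node delegation with least-loaded selection. This is an illustration of the concept only, not DeepFellow's actual scheduling algorithm; all names (`Node`, `pick_node`, the model identifiers) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    models: set[str]          # models this node is delegated to serve
    children: list["Node"] = field(default_factory=list)
    active_requests: int = 0  # a simple load signal

def candidates(root: Node, model: str) -> list[Node]:
    """Walk the tree and collect every node delegated to this model."""
    found = [root] if model in root.models else []
    for child in root.children:
        found.extend(candidates(child, model))
    return found

def pick_node(root: Node, model: str) -> Node:
    """Least-loaded candidate wins; raise if no node serves the model."""
    nodes = candidates(root, model)
    if not nodes:
        raise ValueError(f"no node serves {model!r}")
    return min(nodes, key=lambda n: n.active_requests)

# Two-level tree: a coordinator delegating to two GPU workers.
root = Node("coordinator", models=set())
root.children = [
    Node("gpu-a", {"llama-3.1-8b"}, active_requests=3),
    Node("gpu-b", {"llama-3.1-8b", "whisper"}, active_requests=1),
]
```

Adding a worker is just attaching a child node; removing one shrinks the candidate set without touching the rest of the tree, which is the property that lets a tree topology scale without manual reshuffling.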

OpenAI-compatible API

A single, unified API endpoint for all model interactions. Drop-in compatible with existing OpenAI-based tooling and workflows. No rewrites, no adapters.
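Because the endpoint follows the OpenAI wire format, any client that can POST to `/v1/chat/completions` works unchanged. The sketch below builds such a request with only the standard library; the base URL and model name are placeholders for whatever your own gateway exposes.

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str,
                       messages: list[dict],
                       api_key: str = "none") -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for any compatible endpoint."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # many local gateways accept any token
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder address -- point this at wherever your cluster's gateway listens.
req = build_chat_request(
    "http://localhost:8000",
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
# resp = urllib.request.urlopen(req)  # uncomment against a live server
# print(json.load(resp)["choices"][0]["message"]["content"])
```

The same shape is what existing OpenAI SDKs emit, which is why pointing their base URL at the cluster is the only change required.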

Local, hybrid, and cloud inference

Run fully on-premise, connect to private cloud, or build hybrid setups spanning both. Infra handles routing and distribution across the entire cluster regardless of where compute lives. Runs on AWS and other cloud providers out of the box.
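One common hybrid policy is "prefer on-prem, spill to cloud when saturated." The sketch below shows that policy in isolation; it is an assumption-laden illustration, not DeepFellow's routing logic, and the backend names are invented.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    location: str   # "on_prem" or "cloud"
    in_flight: int  # requests currently being served
    capacity: int   # concurrent-request budget

def route(backends: list[Backend]) -> Backend:
    """Prefer on-prem capacity; fall back to cloud only when it is saturated."""
    def free(b: Backend) -> int:
        return b.capacity - b.in_flight
    on_prem = [b for b in backends if b.location == "on_prem" and free(b) > 0]
    pool = on_prem or [b for b in backends if free(b) > 0]
    if not pool:
        raise RuntimeError("cluster saturated")
    return max(pool, key=free)

fleet = [
    Backend("rack-1", "on_prem", in_flight=8, capacity=8),   # full
    Backend("rack-2", "on_prem", in_flight=2, capacity=8),
    Backend("aws-g5", "cloud",   in_flight=0, capacity=32),
]
```

The point of routing at this layer is that callers never see the split: the same endpoint answers whether the request landed on a rack downstairs or a cloud instance.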

Integrations

Native LangChain integration, Vector Store support, custom endpoints, and plugin architecture. Connects directly to DeepFellow Server as the compute backbone for the full stack.

CLI-first, GitOps-ready

Full CLI-based management for automation, scripting, and GitOps workflows. Every operation available in a terminal - versionable, automatable, repeatable.

Runs on hardware you already have

DeepFellow Infra adapts to your hardware.

There are no strict minimum requirements - start small and scale when you need to. See the example spec and cost table above.

DeepFellow Infra is free and open source.

You can inspect it, extend it, self-host it and start owning your AI. No licensing fees, no vendor lock-in on the foundation layer.

View on GitHub
Read the docs

DeepFellow Infra is the foundation.

Add Server for orchestration and access control, and Enterprise Plugins for compliance, auditability, governance, and security.

Explore DeepFellow Server
Explore DeepFellow Enterprise

Ready to build?

Book a demo and we'll help you start building.

Explore resources about licensing, use cases, and technical documentation.

Schedule a demo
Start building