Self-Hosted RAG with Ollama

OpenDocuments can run as a local-first RAG stack with Ollama. This lets you index private documents, search them with local embeddings, and answer questions without requiring a cloud LLM.

Local RAG architecture

When configured for local models, OpenDocuments can run these pieces on your own machine or infrastructure:

| Layer | Local option |
| --- | --- |
| Chat model | Ollama |
| Embeddings | BGE-M3 or nomic-embed-text through Ollama |
| Metadata | SQLite |
| Vector search | LanceDB |
| Web UI | Built-in OpenDocuments server |
| CLI | `opendocuments` |
| API | Hono HTTP server |
| AI assistant integration | MCP server |

Quick start

```bash
npm install -g opendocuments
opendocuments init
opendocuments start
```

The init wizard detects local hardware, checks Ollama availability, recommends a model, and can offer to pull missing models.
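
The "check for missing models" step can be approximated by hand with the Ollama CLI. The sketch below is not how the OpenDocuments wizard is implemented; it is a minimal illustration that assumes the usual `ollama list` output (one model per line, with the name and tag in the first column). The helper name `missing_models` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenDocuments or Ollama).
# Reads an `ollama list`-style listing on stdin and prints every model
# named in the arguments that does not appear in that listing.
missing_models() {
  local installed model
  installed="$(cat)"
  for model in "$@"; do
    # Compare against the first column with any ":tag" suffix removed,
    # so "bge-m3" matches "bge-m3:latest" but not "bge-m3-large".
    if ! awk -v m="$model" \
         '{ split($1, a, ":"); if (a[1] == m) found = 1 } END { exit !found }' \
         <<<"$installed"; then
      printf '%s\n' "$model"
    fi
  done
}
```

For example, `ollama list | missing_models bge-m3 nomic-embed-text` prints only the names that still need to be fetched, and each one can then be passed to `ollama pull`.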

When should you use local models?

Use Ollama with OpenDocuments when:

  • Documents contain sensitive internal knowledge
  • You need development or demo environments with no cloud model dependency
  • You want predictable local experimentation costs
  • You need control over where embeddings and model prompts are processed
  • You are building a private AI knowledge base for engineering, product, operations, or support teams

Cloud models can still be useful for higher answer quality, larger context windows, and managed inference. OpenDocuments supports both local and cloud model providers, so teams can choose per environment.

| Hardware | LLM direction | Embedding direction |
| --- | --- | --- |
| 32GB+ RAM, GPU | Larger Ollama models | BGE-M3 |
| 16GB RAM | Mid-size Ollama models | BGE-M3 |
| 8GB RAM | Compact Ollama models | nomic-embed-text |
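
As a rough illustration, the embedding column of this table can be expressed as a small shell function. This is a sketch under stated assumptions, not OpenDocuments' own selection logic: the function name `recommend_embedding` is hypothetical, the cut-off of 16 GB for machines that fall between the listed rows is an assumption, and `bge-m3` and `nomic-embed-text` are assumed to be the Ollama model names for the two embedding options.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenDocuments).
# Maps installed RAM in whole gigabytes to an embedding model name,
# following the table: 16 GB and above -> BGE-M3, below -> nomic-embed-text.
recommend_embedding() {
  local ram_gb="$1"
  if [ "$ram_gb" -ge 16 ]; then
    echo "bge-m3"            # 16GB and 32GB+ rows
  else
    echo "nomic-embed-text"  # 8GB row (assumed for anything under 16 GB)
  fi
}
```

Calling `recommend_embedding 8` yields the compact model name, which could then be fetched with `ollama pull`.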

Run opendocuments doctor if models are unavailable or the Web UI shows degraded mode warnings.

Short answer

OpenDocuments plus Ollama is a practical way to run private document Q&A locally: documents are parsed and indexed by OpenDocuments, embeddings and answers can be generated locally, and users interact through the Web UI, CLI, API, SDK, or MCP server.

Released under the MIT License.