Getting Started
Local Inference
Embeddings

Local Inference With Ollama

Run a local CLI workflow with Ollama generation and local embeddings, without a Voyage API key.

Offline-capable
Requires Ollama
Prerequisites

Ollama is installed and running locally.

The llama3.2:3b model is already pulled in Ollama.

The `vai nano setup` command has already completed successfully.

Under the hood

See the exact VAI command, the matching Voyage AI layer, and the MongoDB query shape behind the demo.

vai embed "Local inference keeps retrieval private, fast, and API-key free." --local --dimensions 256

The --local flag switches the command from hosted embeddings to the local voyage-4-nano bridge. That keeps the demo private, API-key free, and fast to re-run.
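A `--dimensions 256` request implies the bridge returns a shortened vector. One common mechanism behind flags like this is truncate-and-renormalize; the sketch below is purely illustrative (the `shorten` helper is hypothetical, not part of the vai CLI), but shows why a truncated embedding still has unit length:

```python
import math

def shorten(vec, dims):
    """Truncate an embedding to `dims` entries, then re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]   # stand-in for a full-width embedding
short = shorten(full, 2)      # shortened vector, still unit-length
```

Because the result is renormalized, cosine similarity stays meaningful at the reduced width.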


Exact commands

The full walkthrough is included here so anyone can replay the demo exactly as published.

$ vai --version
$ ollama list
$ export VAI_LLM_PROVIDER=ollama
$ export VAI_LLM_MODEL=llama3.2:3b
$ export VAI_LLM_BASE_URL=http://localhost:11434
$ vai nano status
$ ollama run llama3.2:3b "In 4 short lines, explain why pairing a local LLM with local embeddings is useful for CLI demos."
$ vai embed "Local inference keeps retrieval private, fast, and API-key free." --local --dimensions 256
$ vai explain local inference
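The exported variables point the CLI at Ollama's local REST endpoint. As a rough sketch of the request a bridge like this would send, assuming Ollama's standard non-streaming `/api/generate` endpoint (the helper function is illustrative, not VAI code):

```python
import json
import os

# Read the same environment variables the walkthrough exports
# (the fallback defaults here are assumptions for illustration).
base_url = os.environ.get("VAI_LLM_BASE_URL", "http://localhost:11434")
model = os.environ.get("VAI_LLM_MODEL", "llama3.2:3b")

def build_generate_request(prompt):
    """Return the URL and JSON body for a non-streaming Ollama /api/generate call."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return f"{base_url}/api/generate", body.encode()

url, body = build_generate_request(
    "Why pair a local LLM with local embeddings for CLI demos?"
)
# POST `body` to `url` with Content-Type: application/json to get a completion.
```

Nothing here leaves the machine: the request targets localhost, which is why no API key is involved.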

Related demos

More shareable workflows from the same VAI demo library.

Chunking
Preprocessing
Chunking Strategies Before Embedding

Compare fixed, sentence, and markdown chunking on the same sample document before any embedding or storage layer is introduced.

Offline-capable

VAI command

vai chunk /tmp/sample.md --strategy markdown
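The strategies differ in where they cut the text. A minimal Python sketch of two of them, deliberately naive and not how the vai CLI implements chunking:

```python
import re

def fixed_chunks(text, size=40):
    """Split into fixed-width character windows, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text):
    """Split on sentence-ending punctuation (a naive approximation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

doc = "Chunking shapes retrieval. Small chunks are precise. Large chunks keep context."
fixed = fixed_chunks(doc)        # may cut mid-sentence
sentences = sentence_chunks(doc) # respects sentence boundaries
```

Markdown-aware chunking extends the same idea by cutting at headings and list boundaries instead of punctuation.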


Prerequisites

The `vai` CLI is installed locally. No API key is required for chunking-only workflows.

RAG
Local Inference
Featured
Local RAG Chat With Ollama And Nano

Build a tiny Atlas-backed RAG chat flow using local nano embeddings and Ollama for generation.

Requires Ollama
Atlas

VAI command

vai chat --db "$DEMO_DB" --collection "$DEMO_COLLECTION" --local --llm-provider ollama --llm-model "$OLLAMA_MODEL" --llm-base-url http://localhost:11434 --no-history --no-stream
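Behind a chat command like this, the retrieval step reduces to ranking stored chunk embeddings by similarity to the query embedding and handing the winners to the model as context. A toy sketch with made-up vectors standing in for the Atlas collection and nano embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy store: (chunk text, embedding) pairs standing in for the Atlas collection.
store = [
    ("Local inference keeps retrieval private.", [0.9, 0.1, 0.0]),
    ("Atlas hosts the vector index.", [0.1, 0.9, 0.1]),
]

def retrieve(query_vec, k=1):
    """Rank stored chunks by cosine similarity and return the top-k texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

context = retrieve([0.8, 0.2, 0.0])
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: ..."
```

In the real flow, Atlas Vector Search performs the ranking server-side and Ollama receives the assembled prompt; the shape of the loop is the same.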


Prerequisites

Ollama is installed and running locally.

Getting Started
Embeddings
Featured
CLI Quickstart For Embeddings

Walk through the core vai embedding commands: model discovery, embedding generation, explainers, and similarity.

API key

VAI command

vai embed "What is MongoDB Atlas?"


Prerequisites

A valid VOYAGE_API_KEY is set in the environment.