NeuroVault
Local-first AI memory for Claude
NeuroVault is a local-first AI memory system that fixes the central failure of LLM agents: they forget you after every conversation. Built as a Tauri 2 desktop app, NeuroVault stores notes as plain markdown in your home directory, indexes them with a Rust-powered hybrid retrieval engine (semantic + BM25 + knowledge graph), and exposes them to Claude over MCP. Most agent-memory products today are RAG pipelines in disguise. NeuroVault treats memory as a structured, updatable, inspectable knowledge base - what you get when you stop chunking-and-praying and start treating memory as a living wiki.
Architecture
+-------------------------------------------------+
| Tauri 2 desktop app (React 19 + TypeScript) |
| Editor / Graph / Compile / Sidebar / Palette |
+-----------------------+-------------------------+
| Tauri commands + HTTP :8765
+-----------------------v-------------------------+
| In-process Rust backend |
| - axum HTTP server (the MCP proxy talks here) |
| - hybrid retriever (semantic + BM25 + graph) |
| - fastembed-rs (BGE-small ONNX, local) |
| - notify file watcher |
+-----------------------+-------------------------+
| SQL + vec0
+-----------------------v-------------------------+
| SQLite + sqlite-vec (~/.neurovault/...) |
| brain.db, vault/*.md, raw/, assets/, cache/ |
+-------------------------------------------------+

Key Features
Why this is not RAG
RAG is an answer-pipeline: question → chunk → embed → retrieve → stuff context → generate → repeat. The corpus is dead data. Contradictions are invisible. Provenance is a prayer. NeuroVault is a knowledge layer - it differs from RAG in five specific ways that map directly to what a living internal wiki needs.
- Accumulates via Ebbinghaus strength decay + access reinforcement (RAG re-chunks)
- Structured with Karpathy's 3-layer raw/wiki/schema pattern (RAG is flat chunks)
- Three automatic link types: semantic similarity, shared entities, explicit wikilinks (RAG has none)
- Silent fact capture promotes casually-dropped facts with wiki-link provenance (RAG cites the chunk)
- Temporal fact tracking - when new facts contradict old, supersession + recency penalty stops pollution (RAG cannot challenge or update)
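The decay-plus-reinforcement idea above can be sketched in a few lines. This is a hypothetical illustration of an Ebbinghaus-style forgetting curve with access reinforcement; the names (`Engram`, `currentStrength`, `reinforce`) and the half-life constant are illustrative, not NeuroVault's actual API.

```typescript
// Hypothetical sketch: exponential strength decay + reinforcement on access.
interface Engram {
  strength: number;      // 0..1, boosted each time the memory is recalled
  lastAccessMs: number;  // epoch millis of the last access
}

const HALF_LIFE_DAYS = 30; // assumed half-life; the real constant may differ

// Forgetting curve: effective strength halves every HALF_LIFE_DAYS of disuse.
function currentStrength(e: Engram, nowMs: number): number {
  const days = (nowMs - e.lastAccessMs) / 86_400_000;
  return e.strength * Math.pow(0.5, days / HALF_LIFE_DAYS);
}

// Recalling a memory bumps its strength and resets the decay clock.
function reinforce(e: Engram, nowMs: number, boost = 0.2): Engram {
  const s = currentStrength(e, nowMs);
  return { strength: Math.min(1, s + boost), lastAccessMs: nowMs };
}
```

Ranking recalled engrams by `currentStrength` is what lets frequently-used memories outlive one-off trivia.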
Performance benchmarks
All numbers are reproducible locally. Run `cd server && uv run python ../benchmarks/run_recall.py` to verify. The benchmark uses 25 hand-crafted notes and 25 queries (5 easy, 10 medium, 10 hard).
- Hybrid retrieval: 92% Top-1, 96% Top-3 hit, 0.94 MRR, 73ms median latency
- With cross-encoder rerank: 92% Top-1, 100% Top-3, 0.96 MRR, 133ms latency
- Hard queries (no keyword overlap): 9/10 top-1 without reranker
- Embed a note: ~20ms · Full vault ingest (25 notes): ~4s cold start
- Tokens per answer: ~275 flat regardless of vault size (paste-whole-vault baseline: 93k+ growing linearly)
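For readers unfamiliar with the MRR figures above: mean reciprocal rank averages 1/rank of the first correct result across all queries. A minimal sketch (the benchmark script's internals may differ):

```typescript
// Mean Reciprocal Rank: for each query, take 1/rank of the first correct
// hit in the returned list (0 if absent), then average over all queries.
function meanReciprocalRank(rankedIds: string[][], expected: string[]): number {
  const rr = rankedIds.map((ids, i) => {
    const rank = ids.indexOf(expected[i]) + 1; // 1-based; 0 means not found
    return rank > 0 ? 1 / rank : 0;
  });
  return rr.reduce((a, b) => a + b, 0) / rr.length;
}
```

A perfect Top-1 run scores 1.0, so the 0.94-0.96 MRR above means the right note is almost always first or second.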
The silent fact-capture pipeline
The killer feature: drop a fact in conversation and NeuroVault picks it up without you saying "remember this." A UserPromptSubmit lifecycle hook pipes every prompt through a regex extractor that recognizes eight patterns: preferences, decisions, stack choices, deadlines, identity, anti-preferences, deployment targets, and explicit "remember that..." callouts. End-to-end benchmark: 80% Hit@1, 100% Hit@3 on 15 paraphrased questions that never use the original wording.
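The extractor's shape can be sketched as pattern/label pairs scanned over each prompt. This is a hypothetical illustration covering a few of the categories above; the actual patterns and category names are internal to NeuroVault.

```typescript
// Hypothetical sketch of the regex-based silent fact extractor.
type Fact = { kind: string; text: string };

// Illustrative subset of the pattern table; real patterns may differ.
const PATTERNS: Array<[string, RegExp]> = [
  ["explicit",   /\bremember that (.+)/i],
  ["preference", /\bi (?:prefer|like) (.+)/i],
  ["anti",       /\bi (?:hate|avoid|never use) (.+)/i],
  ["decision",   /\bwe (?:decided|chose) (?:to |on )?(.+)/i],
  ["deadline",   /\b(?:due|deadline is) (.+)/i],
];

// Run every pattern against the prompt; each match becomes a candidate fact
// that can then be promoted into the vault with provenance back to the chat.
function extractFacts(prompt: string): Fact[] {
  const facts: Fact[] = [];
  for (const [kind, re] of PATTERNS) {
    const m = prompt.match(re);
    if (m) facts.push({ kind, text: m[1].trim() });
  }
  return facts;
}
```

Because extraction is plain regex, it runs in the hook path with negligible latency and no model call.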
Tech stack rationale
Each layer was chosen for local-first guarantees: Tauri 2 over Electron (~30MB vs ~200MB installed), Rust backend in-process (no Python sidecar at boot), sqlite-vec for KNN in pure SQL (no separate vector DB), fastembed-rs for ONNX-quantized embeddings (no API keys, no internet), MCP as the agent contract (works with Claude Desktop, Claude Code, Cursor, any MCP-speaking client).
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | recall | Hybrid search: semantic + BM25 + graph fused via RRF (optional cross-encoder rerank) |
| POST | recall_chunks | Same retrieval but returns matching paragraphs (cheaper) |
| POST | remember | Save a memory - triggers chunk + embed + entity extract + graph link |
| POST | related | Direct neighbours of an engram via the graph (~50x cheaper than fresh recall) |
| POST | session_start | Wake-up tool - brain stats + L0 identity + top memories + open todos |
| POST | core_memory_set | Persona-style always-included blocks (Letta pattern) |
| POST | list_brains | Multi-brain navigation - separate memory spaces |
| POST | switch_brain | Switch active brain instantly without restart |
| POST | create_brain | Create new memory space for a separate project/context |
| POST | check_duplicate | Pure cosine pre-check before remember() to avoid duplicate storage |
| POST | list_unnamed_clusters | Agent-driven cluster naming for the graph view's Analytics mode |
| POST | add_todo | Multi-agent coordination via append-only todos.jsonl |
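The RRF fusion step used by recall can be sketched as follows. This is a minimal illustration of reciprocal rank fusion over the three retrievers' ranked lists; NeuroVault's actual fusion (constants, any per-retriever weighting) may differ.

```typescript
// Reciprocal Rank Fusion: each retriever contributes 1 / (k + rank) per
// document; summing across retrievers rewards documents that rank well in
// several lists. k = 60 is the conventional smoothing constant.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + idx + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

RRF needs only ranks, not comparable scores, which is why it fuses cosine distances, BM25 scores, and graph hops cleanly.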