Case study · AI · Local-first memory

NeuroVault

A persistent, local-first memory layer for AI agents. It sits between your local markdown vault and any MCP-compatible agent (Claude Code, Claude Desktop, Cursor, Codex), giving them a callable recall() + remember() surface whose contents persist across sessions, with no cloud, no subscription, and no Python sidecar.

  • Installer: 9.3 MB
  • Idle RAM: ~35 MB
  • Cold start: <500 ms
  • hit@1 (default): 86.67%
  • hit@1 (+ reranker): 93.33%
  • Recall latency: 20-50 ms

Stack (9 dependencies)

  • Tauri 2
  • Rust
  • React 19
  • TypeScript
  • SQLite + sqlite-vec
  • fastembed (BGE-small)
  • Axum HTTP
  • MCP
  • FastMCP proxy

01 / What it is.

Modern chat assistants have no native long-term memory. Every new session starts from zero. NeuroVault fixes that — without sending anything to a cloud service.

It is

  • A single ~9 MB binary you install once.
  • A local markdown vault that is the source of truth (rebuildable from disk).
  • An MCP server agents can call to remember and recall — over loopback HTTP.
  • An in-process Rust retriever: hybrid semantic + BM25 + graph with optional cross-encoder rerank.

It is not

  • A cloud RAG service. Everything runs on your machine.
  • A general-purpose vector DB. It's a memory system for markdown + typed metadata.
  • An attempt to beat academic benchmarks. It optimises for a single user, many agents, no cost per ingest, and under 100 ms per recall.
  • A Python app. The whole hot path lives in Rust in-process.

02 / Architecture.

One process. The React UI talks to the Rust memory layer via Tauri IPC; external agents talk to the same memory layer over loopback HTTP via a thin FastMCP proxy. The single most important architectural choice: the heavy work (embedding, indexing, retrieval) never runs in the agent's process tree.

┌─────────────────────────────────────────────────────────────────────┐
│  NeuroVault.exe  (Tauri, single process, ~35 MB idle)               │
│                                                                     │
│  ┌──────────────────┐       ┌──────────────────────────────────────┐│
│  │ React UI         │◄─────►│ Rust memory::* modules (in-process)  ││
│  │  • sidebar       │ Tauri │  • retriever (hybrid + rerank)       ││
│  │  • editor        │  IPC  │  • ingest (chunk/embed/link/BM25)    ││
│  │  • graph view    │       │  • file watcher (notify crate)       ││
│  │  • command palt. │       │  • SQLite + sqlite-vec               ││
│  │  • settings      │       │  • fastembed (BGE-small + rerank)    ││
│  └──────────────────┘       └───┬──────────────────────────────────┘│
│                                 │                                   │
│                             axum HTTP server (127.0.0.1:8765)       │
└─────────────────────────────────┼───────────────────────────────────┘
                                  │ loopback HTTP
                  ┌───────────────┴────────────────┐
                  │  mcp_proxy.py  (~30–50 MB)     │
                  │   FastMCP stdio → HTTP shim    │
                  └───────────────┬────────────────┘
                                  │ stdio JSON-RPC
                                  ▼
                  ┌────────────────────────────────┐
                  │ Claude Code · Cursor · Desktop │
                  └────────────────────────────────┘

Why Tauri, not Electron

WebView2 already ships with Windows, so there is no second browser to bundle. The installer drops from 150 MB to 9 MB.

Why one process, not two

The earlier Python version ran two processes: the desktop app and a sidecar. That meant two fastembed loads, two SQLite connections, and two sources of truth. The Rust version collapses them: memory runs inside the Tauri process.

Why HTTP for agents

UI hits memory via Tauri IPC; agents hit memory via loopback HTTP. Both call the same hybrid_retrieve_throttled. One implementation, two transports.
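
The "one implementation, two transports" idea can be sketched in plain Rust. This is a minimal illustration, not NeuroVault's actual code: the Memory and Hit types, the ipc_recall and http_recall wrappers, the top-k of 10, and the toy term-matching scorer are all hypothetical, and the real signature of hybrid_retrieve_throttled is not given in this case study. In the app the two wrappers would presumably be a Tauri command and an axum handler; here they are ordinary functions so the sketch needs no dependencies.

```rust
/// A retrieval result. Field names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: String,
    pub score: usize,
}

/// Stand-in for the in-process memory layer: a list of (id, body) notes.
pub struct Memory {
    notes: Vec<(String, String)>,
}

impl Memory {
    pub fn new(notes: Vec<(String, String)>) -> Self {
        Memory { notes }
    }

    /// The single shared implementation. The name comes from the text above;
    /// the signature and the scoring (count of query terms found in the body)
    /// are placeholders for the real hybrid retriever.
    pub fn hybrid_retrieve_throttled(&self, query: &str, top_k: usize) -> Vec<Hit> {
        let terms: Vec<String> = query
            .split_whitespace()
            .map(|t| t.to_lowercase())
            .collect();
        let mut hits: Vec<Hit> = self
            .notes
            .iter()
            .map(|(id, body)| {
                let body = body.to_lowercase();
                let score = terms.iter().filter(|t| body.contains(t.as_str())).count();
                Hit { id: id.clone(), score }
            })
            .filter(|h| h.score > 0)
            .collect();
        // Highest score first; ties broken by id so the order is deterministic.
        hits.sort_by(|a, b| b.score.cmp(&a.score).then_with(|| a.id.cmp(&b.id)));
        hits.truncate(top_k);
        hits
    }
}

/// Transport 1 (hypothetical): what an IPC command for the UI would wrap.
pub fn ipc_recall(memory: &Memory, query: &str) -> Vec<Hit> {
    memory.hybrid_retrieve_throttled(query, 10)
}

/// Transport 2 (hypothetical): what a loopback HTTP handler would wrap.
/// Builds a JSON-like body by hand; ids are not escaped in this sketch.
pub fn http_recall(memory: &Memory, query: &str) -> String {
    let items: Vec<String> = memory
        .hybrid_retrieve_throttled(query, 10)
        .iter()
        .map(|h| format!("{{\"id\":\"{}\",\"score\":{}}}", h.id, h.score))
        .collect();
    format!("[{}]", items.join(","))
}
```

Both wrappers only adapt inputs and outputs. Ranking logic lives in one place, so the UI and external agents cannot drift apart in what they retrieve.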

03 / How recall works.

A retrieval query runs through three signal sources fused into one ranked list. Optionally a cross-encoder reranks the top-K for precision-critical paths.

  1. Semantic recall (BGE-small + sqlite-vec)

    The query is embedded with title-prefixed context and scored against the chunk vector index. Returns the top 30.

  2. BM25 keyword recall

    An independent lexical index over chunks catches exact-token matches the semantic index misses (rare names, code symbols, IDs).

  3. Graph traversal

    Typed edges (semantic similarity / entity mention / wikilink) propagate scores between linked engrams, surfacing context the query didn't ask for but needs.

  4. Fusion + optional rerank

    Reciprocal rank fusion produces a single ranking. Cross-encoder rerank is opt-in: it takes ~680 ms total instead of 25 ms, but lifts hit@1 from 86.67% to 93.33%.
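
The fusion step can be sketched with the textbook form of reciprocal rank fusion, where each document scores the sum of 1 / (k + rank) over every list it appears in. This is an illustration under stated assumptions rather than NeuroVault's implementation: the function name, the unweighted sum, and k = 60 (a commonly used default) are not taken from the source, which does not say how the three signals are weighted.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of ids into one ranking with reciprocal rank fusion.
/// Each list is ordered best-first; ranks are 1-based.
/// `k` dampens the influence of top ranks (60.0 is a common choice, assumed here).
pub fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, with a semantic list [a, b, c], a BM25 list [b, a], and a graph list [b, d], the fused order is b, a, d, c: b wins because it ranks near the top of all three lists, even though it is not first in the semantic one.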

04 / Inside the app.

The UI is built around three primary surfaces: the markdown editor (boring on purpose), the command palette (where agents' memory operations show up), and the knowledge graph (the only place the hidden links between notes become visible).

[Figure: NeuroVault knowledge graph]
Knowledge graph view. Engrams are nodes; typed edges connect them by semantic similarity, shared entities, and explicit wikilinks. Clicking a node opens the underlying markdown.

[Figure: NeuroVault command palette]
Command palette. One keyboard shortcut to everything: note operations, recall, brain switching, and the same surface agents use through MCP.

05 / On disk.

Everything persists under ~/.neurovault/. Markdown files are the source of truth — if brain.db is ever deleted, it's rebuilt deterministically by re-ingesting every .md in the vault. This is a hard invariant: no memory exists only in the database.

~/.neurovault/
├── brains.json                     ← which brain is active
├── extensions/
│   └── vec0.dll                    ← sqlite-vec binary
└── brains/
    └── <brain_id>/
        ├── brain.db                ← SQLite: 24 tables (engrams,
        │                              chunks, vec_chunks, entities,
        │                              links, temporal_facts, …)
        ├── audit.jsonl             ← append-only tool-call log
        ├── trash/                  ← soft-deleted files
        └── vault/
            ├── *.md                ← markdown (source of truth)
            ├── index.md            ← auto-maintained wiki index
            └── log.md              ← append-only activity feed
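
The rebuild invariant described above (delete brain.db, re-ingest every .md, get the same index back) depends on visiting the vault in a stable order. A minimal sketch of that walk, using only the Rust standard library, might look as follows. The function names, the recursive descent into subfolders, and the ingest callback are assumptions for illustration; the source does not say whether the vault is nested or whether index.md and log.md are treated differently from other notes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every `.md` file under `vault` (recursively, an assumption),
/// sorted so that repeated runs visit files in the same order.
pub fn collect_markdown(vault: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![vault.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some("md") {
                found.push(path);
            }
        }
    }
    // read_dir order is platform-dependent; sorting makes the rebuild repeatable.
    found.sort();
    Ok(found)
}

/// Re-ingest the whole vault by handing each file's text to `ingest`
/// (a hypothetical stand-in for chunking, embedding, linking, and BM25 indexing).
/// Returns the number of files ingested.
pub fn rebuild_index<F>(vault: &Path, mut ingest: F) -> io::Result<usize>
where
    F: FnMut(&Path, &str) -> io::Result<()>,
{
    let files = collect_markdown(vault)?;
    for path in &files {
        let text = fs::read_to_string(path)?;
        ingest(path, &text)?;
    }
    Ok(files.len())
}
```

Because the walk reads only from disk and never consults the database, it can run against an empty brain.db, which is what makes the markdown files the source of truth.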

Try it.

The full marketing site, downloads, and docs live at neurovault.dathproject.com. The source — Rust + React + Tauri, MIT licensed — is on GitHub.