PRODUCTION · RAG KNOWLEDGE BASE · v1.0

Your company,
searchable by AI.

A private RAG chatbot that indexes every document — handbooks, SOPs, runbooks — and answers employee questions with source citations, confidence scores, and multi-turn memory, streamed live into the browser.

Explore the build See the RAG pipeline

6

DB Models

1536

Embedding dims

26

Routes

SSE

Streaming chat

companybrain.app/chat

What's our PTO policy for first-year employees?

First-year employees receive 15 days of PTO annually, accrued at 1.25 days per month. Unused days do not roll over to the next calendar year. Requests must be submitted 2 weeks in advance via the HR portal.

Sources

Employee Handbook p.12 · §PTO Policy 0.94

HR Policies v2 p.4 · §Leave 0.87

.95

High confidence — answer directly from indexed sources.

Can I carry over unused days?

Streaming — retrieving from 1,245 chunks…

Document pipeline

3 PDFs processing…

pgvector cosine

Top-5 · 0.94 sim

Built on best-in-class AI infrastructure

Claude Sonnet 4 OpenAI Embeddings pgvector LlamaParse FastAPI SQLAlchemy 2.0 SSE Streaming Docker

THE PROBLEM

Company knowledge is invisible.

Every company has the same hidden problem: critical knowledge locked in Drive folders, Slack threads, and the heads of three senior employees.

Docs exist, unfindable

The answer is in a Google Doc from 2023. Nobody knows which folder, which version, which section.

Same 50 questions

New hires ask the same things. Managers re-answer weekly. No self-serve path to the answer.

Tribal knowledge risk

One senior employee leaves and a decade of process knowledge walks out the door.

No source of truth

Conflicting answers from different docs. No confidence signal. No citations. Just "I think it's…"

THE SOLUTION

Upload docs. Ask questions.
Get cited answers, streamed live.

A private RAG chatbot that parses, chunks, embeds, and indexes your documents — then answers with the exact page, section, and confidence score.

1

Upload any document

PDF, DOCX, Markdown, or plain text. Async background pipeline with live 0–100% progress.

2

Parse → chunk → embed

LlamaParse for structured PDFs, smart 800-char overlapping chunks, OpenAI text-embedding-3-small vectors.

3

Cosine similarity retrieval

pgvector <=> operator returns top-5 most relevant chunks. User-scoped, status-filtered.

4

Claude answers with citations

Context-grounded generation: answer + confidence + source references. Streamed token-by-token via SSE.

5

Multi-turn memory

Last 10 messages in context. "Tell me more" and "How does that compare to…" just work.

chat_service.py · handle_chat_stream()

# Full RAG loop — embed → retrieve → generate
async def handle_chat_stream(
    query: str,
    conversation_id: UUID,
    user_id: UUID,
):
    # 1. Embed the question
    query_vec = await embedding_service.get_embedding(query)

    # 2. Cosine similarity search (pgvector)
    chunks = await retrieval_service.search_similar_chunks(
        embedding=query_vec,
        user_id=user_id,
        top_k=5,
    )

    # 3. Build context from top-5 chunks
    context = build_source_context(chunks)
    history = await get_last_messages(
        conversation_id, limit=10,
    )

    # 4. Stream answer from Claude
    async for token in ai_service.stream_answer(
        query=query,
        context=context,
        history=history,
    ):
        yield f"data: {token}\n\n"

    # 5. Persist message + sources
    await store_message(
        conversation_id, answer, sources, confidence,
    )

RAG PIPELINE

Ingest once, retrieve instantly.

Two pipelines — ingestion writes vectors; query reads them. Both fully async, streamed, and user-scoped.

Ingestion pipeline — document upload

Upload

PDF · DOCX · MD · TXT

Parse

LlamaParse · docx · regex

Chunk

800c · 100 overlap

Embed

OpenAI 1536d

Store

pgvector cosine

Employee-Handbook.pdf Embedding chunks… 72%

5% parse30% structured40% chunked90% embedded100%

Query pipeline — user asks a question

Question

"PTO for yr 1?"

Embed

→ 1536d vec

Retrieve

top-5 cosine

Claude gen

SSE stream

Answer

+ sources + conf

AUTH

User · OTP

Passwordless login, user-scoped isolation

KNOWLEDGE

Document · DocumentChunk

Files + 1536d vector embeddings per chunk

CHAT

Conversation · Message

Multi-turn threads + sources + confidence

FEATURES

End-to-end RAG, production-grade.

Multi-format ingestion

PDF (LlamaParse structured), DOCX (heading detection), Markdown (header split), TXT. Async background processing with live 0–100% progress bar.

PDFDOCXMDTXT

Smart overlapping chunks

800-char max, 100-char overlap, sentence-boundary breaks. Section title + page number tracked per chunk for precise citations.

800c max100c overlapsentence breaks

pgvector — no extra DB

Vector storage lives inside PostgreSQL. No Pinecone or Weaviate. <=> cosine operator for top-k retrieval in a single query.

cosine distance1536 dims

Streaming Claude answers

Token-by-token via SSE with cursor animation. Grounded to context only — "Answer ONLY from provided sources." JSON with confidence score.

Claude Sonnet 4SSE

Source citations & viewer

Every answer cites the doc, page, and section. Click a citation → annotated PDF with green-highlighted source text via PyMuPDF.

page + sectionPDF highlight

Multi-turn conversations

Last 10 messages injected into context. Conversation persistence, list, load, delete. Follow-ups just work.

10-msg memorypersisted

Admin analytics

User/doc/chunk/message counts. Top-5 cited documents. Recent queries (trending topics). Document health dashboard.

top docstrending

User-scoped isolation

Every document, chunk, conversation, and query filtered by user_id. OTP passwordless auth. Multi-tenant ready.

OTP authmulti-tenant

Flexible storage backend

Local filesystem for dev, AWS S3 or DigitalOcean Spaces for production. Signed URLs for secure temporary access.

localS3Spaces

TECH STACK

Two AI providers, one PostgreSQL.

Backend

Python 3.12
FastAPI 0.115
Uvicorn 0.34
Pydantic v2
httpx 0.28 (async)
aiofiles + aiosmtplib

Data

PostgreSQL + pgvector
asyncpg driver
SQLAlchemy 2.0
Alembic migrations
Vector(1536) type
UUID PKs · JSON

AI / Parsing

Claude Sonnet 4
OpenAI embed-3-small
LlamaParse REST
PyMuPDF (fitz)
python-docx
WeasyPrint (PDF gen)

Infra / UI

Docker / Compose
Azure Pipelines CI
Poetry lockfile
Jinja2 templates
TailwindCSS
S3 / Spaces storage

USER JOURNEY

From handbook to instant answers.

STEP 01

Register & verify

Enter email + name → 6-digit OTP → verified. No passwords, no friction.

STEP 02

Upload documents

Drag in the employee handbook, SOPs, runbooks. Live progress bar: parse → chunk → embed → ready.

STEP 03

Ask a question

"What's our PTO policy?" — embedded, searched across all chunks, top-5 retrieved by cosine similarity.

STEP 04

Get a cited answer, streamed

Claude generates, token-by-token. Sources show doc name, page, section, and similarity score.

STEP 05

Click to verify the source

Open annotated PDF — the exact text that Claude used is green-highlighted via PyMuPDF annotations.

STEP 06

Follow up naturally

"Can I carry over unused days?" — conversation memory keeps context. 10-message sliding window.

PRODUCT TOUR

See it in action.

companybrain.app/login

Passwordless OTP login

Email-based, 6-digit code, session token with expiry.

Auth flow

companybrain.app/documents

Document management

Upload, processing progress, chunk count, status, delete.

Knowledge base

companybrain.app/chat

RAG chat with sources

Streaming answers, cited sources, confidence scores, multi-turn memory.

Core experience

IMPACT

Institutional knowledge,
instantly accessible.

Every policy, every SOP, every runbook — indexed, searchable, citable. New hires self-serve from day one. Managers stop answering the same question twice.

1536d

Embedding vectors

Top-5

Chunk retrieval

0.95

Confidence scoring

100%

Source cited

Production-ready RAG · pgvector in Postgres · SSE streaming · multi-tenant

Your company, searchable by AI.