PRODUCTION · RAG KNOWLEDGE BASE · v1.0

Your company,
searchable by AI.

A private RAG chatbot that indexes every document — handbooks, SOPs, runbooks — and answers employee questions with source citations, confidence scores, and multi-turn memory, streamed live into the browser.

6
DB Models
1536
Embedding dims
26
Routes
SSE
Streaming chat
companybrain.app/chat
What's our PTO policy for first-year employees?
First-year employees receive 15 days of PTO annually, accrued at 1.25 days per month. Unused days do not roll over to the next calendar year. Requests must be submitted 2 weeks in advance via the HR portal.
Sources
Employee Handbook p.12 · §PTO Policy 0.94
HR Policies v2 p.4 · §Leave 0.87
0.95
High confidence — answer directly from indexed sources.
Can I carry over unused days?
Streaming — retrieving from 1,245 chunks…
Document pipeline
3 PDFs processing…
pgvector cosine
Top-5 · 0.94 sim
Built on best-in-class AI infrastructure
Claude Sonnet 4 · OpenAI Embeddings · pgvector · LlamaParse · FastAPI · SQLAlchemy 2.0 · SSE Streaming · Docker
THE PROBLEM

Company knowledge is invisible.

Every company has the same hidden problem: critical knowledge locked in Drive folders, Slack threads, and the heads of three senior employees.

Docs exist, unfindable

The answer is in a Google Doc from 2023. Nobody knows which folder, which version, which section.

Same 50 questions

New hires ask the same things. Managers re-answer weekly. No self-serve path to the answer.

Tribal knowledge risk

One senior employee leaves and a decade of process knowledge walks out the door.

No source of truth

Conflicting answers from different docs. No confidence signal. No citations. Just "I think it's…"

THE SOLUTION

Upload docs. Ask questions.
Get cited answers, streamed live.

A private RAG chatbot that parses, chunks, embeds, and indexes your documents — then answers with the exact page, section, and confidence score.

1

Upload any document

PDF, DOCX, Markdown, or plain text. Async background pipeline with live 0–100% progress.

2

Parse → chunk → embed

LlamaParse for structured PDFs, smart 800-char overlapping chunks, OpenAI text-embedding-3-small vectors.

3

Cosine similarity retrieval

pgvector <=> operator returns top-5 most relevant chunks. User-scoped, status-filtered.

4

Claude answers with citations

Context-grounded generation: answer + confidence + source references. Streamed token-by-token via SSE.

5

Multi-turn memory

Last 10 messages in context. "Tell me more" and "How does that compare to…" just work.

chat_service.py · handle_chat_stream()
# Full RAG loop — embed → retrieve → generate
async def handle_chat_stream(
    query: str,
    conversation_id: UUID,
    user_id: UUID,
):
    # 1. Embed the question
    query_vec = await embedding_service.get_embedding(query)

    # 2. Cosine similarity search (pgvector)
    chunks = await retrieval_service.search_similar_chunks(
        embedding=query_vec,
        user_id=user_id,
        top_k=5,
    )

    # 3. Build context from top-5 chunks
    context = build_source_context(chunks)
    history = await get_last_messages(
        conversation_id, limit=10,
    )

    # 4. Stream answer from Claude, collecting the full text
    answer = ""
    async for token in ai_service.stream_answer(
        query=query,
        context=context,
        history=history,
    ):
        answer += token
        yield f"data: {token}\n\n"

    # 5. Persist message + sources
    sources, confidence = parse_sources_and_confidence(answer, chunks)
    await store_message(
        conversation_id, answer, sources, confidence,
    )
RAG PIPELINE

Ingest once, retrieve instantly.

Two pipelines — ingestion writes vectors; query reads them. Both fully async, streamed, and user-scoped.

Ingestion pipeline — document upload
Upload
PDF · DOCX · MD · TXT
Parse
LlamaParse · docx · regex
Chunk
800c · 100 overlap
Embed
OpenAI 1536d
Store
pgvector cosine
Employee-Handbook.pdf · Embedding chunks… 72%
5% parsing · 30% structured · 40% chunked · 90% embedded · 100% ready
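The staged percentages in the progress bar above could be tracked with a simple mapping; the stage names here are illustrative, not the app's actual status values.

```python
# Hypothetical mapping of ingestion stages to the percent-complete
# values shown in the upload UI (stage names are illustrative).
STAGE_PROGRESS = {
    "parsing": 5,      # file received, parser running
    "structured": 30,  # LlamaParse / docx / regex output ready
    "chunked": 40,     # 800-char overlapping chunks written
    "embedded": 90,    # OpenAI vectors stored in pgvector
    "ready": 100,      # document searchable
}

def progress_for(stage: str) -> int:
    """Return the percent-complete value for a pipeline stage."""
    return STAGE_PROGRESS.get(stage, 0)
```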
Query pipeline — user asks a question
Question
"PTO for yr 1?"
Embed
→ 1536d vec
Retrieve
top-5 cosine
Claude gen
SSE stream
Answer
+ sources + conf
AUTH
User · OTP
Passwordless login, user-scoped isolation
KNOWLEDGE
Document · DocumentChunk
Files + 1536d vector embeddings per chunk
CHAT
Conversation · Message
Multi-turn threads + sources + confidence
FEATURES

End-to-end RAG, production-grade.

Multi-format ingestion

PDF (LlamaParse structured), DOCX (heading detection), Markdown (header split), TXT. Async background processing with live 0–100% progress bar.

PDF · DOCX · MD · TXT

Smart overlapping chunks

800-char max, 100-char overlap, sentence-boundary breaks. Section title + page number tracked per chunk for precise citations.

800c max · 100c overlap · sentence breaks
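A minimal sketch of the chunker described above, assuming plain-text input and a preference for sentence-final ". " breaks; the app's real splitter likely also tracks section titles and page numbers per chunk.

```python
def chunk_text(text: str, max_len: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, preferring sentence boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            # back up to the last sentence break inside the window, if any
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1  # keep the period
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        start = max(end - overlap, start + 1)  # overlap, but guarantee progress
    return chunks
```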

pgvector — no extra DB

Vector storage lives inside PostgreSQL. No Pinecone or Weaviate. <=> cosine operator for top-k retrieval in a single query.

cosine distance · 1536 dims
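The retrieval step could look like this; the table and column names in the SQL are assumptions, and the pure-Python `top_k` mirrors what ordering by pgvector's `<=>` (cosine distance) does.

```python
import math

# Assumed schema, not the app's actual table. pgvector's <=> operator
# is cosine distance, so ascending ORDER BY yields most-similar first.
TOP_K_SQL = """
SELECT id, content, 1 - (embedding <=> :query_vec) AS similarity
FROM document_chunks
WHERE user_id = :user_id AND status = 'ready'
ORDER BY embedding <=> :query_vec
LIMIT 5;
"""

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same metric as pgvector's <=>: 1 minus the cosine of the angle."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def top_k(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[dict]:
    """In-memory equivalent of the SQL above: rank chunks by distance."""
    return sorted(chunks, key=lambda c: cosine_distance(query_vec, c["embedding"]))[:k]
```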

Streaming Claude answers

Token-by-token via SSE with cursor animation. Grounded to context only — "Answer ONLY from provided sources." JSON with confidence score.

Claude Sonnet 4 · SSE
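The SSE wire format is simple enough to sketch; JSON-encoding each token is an assumption here, chosen to keep newlines inside tokens intact.

```python
import json
from typing import Iterator

def sse_events(tokens: Iterator[str]) -> Iterator[str]:
    """Frame model tokens as Server-Sent Events, the format the browser's
    EventSource expects: one 'data: <payload>\n\n' frame per event."""
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"  # sentinel so the client can close the stream
```

In FastAPI, a generator like this is typically wrapped in `StreamingResponse(..., media_type="text/event-stream")`.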

Source citations & viewer

Every answer cites the doc, page, and section. Click a citation → annotated PDF with green-highlighted source text via PyMuPDF.

page + section · PDF highlight

Multi-turn conversations

Last 10 messages injected into context. Conversation persistence, list, load, delete. Follow-ups just work.

10-msg memory · persisted
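The 10-message sliding window can be sketched in a few lines; the message shape (`role`/`content` dicts) is an assumed convention, not necessarily the app's schema.

```python
def remember(history: list[dict], role: str, content: str, limit: int = 10) -> list[dict]:
    """Append a turn and trim to the most recent `limit` messages,
    oldest first, so follow-up questions keep their context."""
    history = history + [{"role": role, "content": content}]
    return history[-limit:]
```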

Admin analytics

User/doc/chunk/message counts. Top-5 cited documents. Recent queries (trending topics). Document health dashboard.

top docs · trending

User-scoped isolation

Every document, chunk, conversation, and query filtered by user_id. OTP passwordless auth. Multi-tenant ready.

OTP auth · multi-tenant

Flexible storage backend

Local filesystem for dev, AWS S3 or DigitalOcean Spaces for production. Signed URLs for secure temporary access.

local · S3 · Spaces
TECH STACK

Two AI providers, one PostgreSQL.

Backend

  • Python 3.12
  • FastAPI 0.115
  • Uvicorn 0.34
  • Pydantic v2
  • httpx 0.28 (async)
  • aiofiles + aiosmtplib

Data

  • PostgreSQL + pgvector
  • asyncpg driver
  • SQLAlchemy 2.0
  • Alembic migrations
  • Vector(1536) type
  • UUID PKs · JSON

AI / Parsing

  • Claude Sonnet 4
  • OpenAI embed-3-small
  • LlamaParse REST
  • PyMuPDF (fitz)
  • python-docx
  • WeasyPrint (PDF gen)

Infra / UI

  • Docker / Compose
  • Azure Pipelines CI
  • Poetry lockfile
  • Jinja2 templates
  • TailwindCSS
  • S3 / Spaces storage
USER JOURNEY

From handbook to instant answers.

STEP 01

Register & verify

Enter email + name → 6-digit OTP → verified. No passwords, no friction.
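A hedged sketch of the OTP step, assuming a 10-minute expiry and server-side storage of the issued code (both assumptions, not the app's actual settings).

```python
import secrets
import time

OTP_TTL_SECONDS = 600  # assumed 10-minute expiry

def issue_otp() -> tuple[str, float]:
    """Generate a 6-digit one-time code plus its expiry timestamp.
    secrets.randbelow keeps the code cryptographically random."""
    code = f"{secrets.randbelow(10**6):06d}"  # zero-padded, 000000-999999
    return code, time.time() + OTP_TTL_SECONDS

def verify_otp(submitted: str, code: str, expires_at: float) -> bool:
    """Expiry check plus constant-time comparison of the submitted code."""
    return time.time() < expires_at and secrets.compare_digest(submitted, code)
```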

STEP 02

Upload documents

Drag in the employee handbook, SOPs, runbooks. Live progress bar: parse → chunk → embed → ready.

STEP 03

Ask a question

"What's our PTO policy?" — embedded, searched across all chunks, top-5 retrieved by cosine similarity.

STEP 04

Get a cited answer, streamed

Claude generates, token-by-token. Sources show doc name, page, section, and similarity score.

STEP 05

Click to verify the source

Open annotated PDF — the exact text that Claude used is green-highlighted via PyMuPDF annotations.

STEP 06

Follow up naturally

"Can I carry over unused days?" — conversation memory keeps context. 10-message sliding window.

PRODUCT TOUR

See it in action.

companybrain.app/login
Login page
Passwordless OTP login
Email-based, 6-digit code, session token with expiry.
Auth flow
companybrain.app/documents
Documents page
Document management
Upload, processing progress, chunk count, status, delete.
Knowledge base
companybrain.app/chat
Chat page
RAG chat with sources
Streaming answers, cited sources, confidence scores, multi-turn memory.
Core experience
IMPACT

Institutional knowledge,
instantly accessible.

Every policy, every SOP, every runbook — indexed, searchable, citable. New hires self-serve from day one. Managers stop answering the same question twice.

1536d
Embedding vectors
Top-5
Chunk retrieval
0.95
Confidence scoring
100%
Source cited
Production-ready RAG · pgvector in Postgres · SSE streaming · multi-tenant