What is an AI document assistant and how does it work?

An AI document assistant is a private tool that reads your own files, such as policies, PDFs, and guides, and answers questions about them in plain language. It works by splitting your documents into small chunks, turning each chunk into numbers that capture its meaning, and storing them in a vector database. When someone asks a question, the assistant finds the few most relevant chunks and gives only those to a language model, which writes the answer and cites the source. Because it answers from your real documents rather than from memory, the replies stay accurate and easy to verify.

Is it safe to use AI on private company documents?

It can be very safe if you build it the right way. The risk people worry about is sending private files to a public chatbot, where the data leaves your control. You avoid that by keeping your documents and your vector database inside your own systems, for example using pgvector inside your existing PostgreSQL database. You also choose a model and setup where your data is not used for training. With those choices in place, your stored knowledge stays inside your own walls, and at question time only a small, relevant slice of text is sent to the language model. If you use a hosted embedding model, each chunk also passes through that provider once during ingestion, so check that the same no-training terms apply there.

What is retrieval augmented generation (RAG) in simple terms?

Retrieval augmented generation, or RAG, means the AI looks something up before it answers, instead of relying only on what it already learned. Think of it like an open book exam. Rather than memorising every policy, the assistant searches your documents, pulls out the few paragraphs that match the question, and writes its answer from those. This keeps answers grounded in your real, current information, makes them easy to update by simply adding new documents, and greatly reduces the chance of the AI making things up. It is the most practical way for most businesses to use AI on their own knowledge.

Why does AI sometimes give wrong answers, and how do you stop it?

AI gives wrong answers, often called hallucinations, when it answers from memory and fills gaps with confident guesses. The fix is grounding. You retrieve the relevant text from your own documents and instruct the model to answer using only that text and to cite the source. You also tell it to say "I do not know" when the answer is not in the provided text. This honest fallback is powerful, because a system that admits uncertainty is far more trustworthy than one that always sounds sure. Showing the source document alongside each answer lets staff confirm it in one click.

What tech stack do you need to build one?

You need four building blocks, and the exact tools are flexible. First, a way to read and split documents into chunks. Second, an embedding model to turn text into numbers. Third, a vector database to store and search those numbers, such as pgvector in PostgreSQL or a hosted option. Fourth, a language model to write the grounded answer. Around that, a simple Node backend and a React front end give your team a clean chat interface. The same pattern works whether your business already runs on PHP, Node, or anything else, since the assistant can sit alongside your current system.

AI Development

How I Built an AI Document Assistant for a Client

private ai

ai document assistant

Jun 12, 2026

10 min read

The day my client's team stopped guessing

A client of mine runs a property management company with about forty staff, and they came to me with a problem that sounds small but was quietly draining money every single day. Their team kept answering the same questions over and over, and the answers were buried inside hundreds of documents. Lease rules, maintenance policies, vendor contracts, and onboarding guides were spread across folders that nobody could search properly. I fixed it by building them an AI document assistant, a private tool that reads their own files and answers questions in plain language within seconds. This article is the full, honest story of that build, including the approaches that failed, the exact steps and code that worked, and the results the business actually felt.

I am writing it so two kinds of readers get value. If you own or run a business, you will see how a tool like this saves real hours and money. If you write code, you will get a clear, step-by-step path you can follow and adapt to your own stack.

The real problem: the answers existed, but nobody could find them

When I sat with their team, the pattern was easy to spot. A tenant would ask whether they could sublet their flat. A new staff member would ask how to handle an emergency repair at night. An owner would ask why a particular fee appeared on their statement. The correct answer almost always lived somewhere in a document. The trouble was finding it quickly. Staff would open several files, scroll through long PDFs, ask a senior colleague, and sometimes still guess.

Multiply that by dozens of questions a day and the cost becomes clear. Replies were slow, which made customers unhappy. New hires took weeks to become useful, because the real knowledge lived in people's heads or in files they did not know existed. And senior staff were pulled away from important work to answer the same basic questions again and again. The knowledge was all there. It was simply locked away.

Why the obvious fixes did not work

My client had already tried a few common fixes before calling me, and most businesses reach for the same ones.

Keyword search did not understand meaning

Their intranet search matched exact words. So when someone typed "can a tenant rent out their flat" but the policy document said "subletting," the search returned nothing useful. Keyword search looks for matching letters, not matching meaning, so it fails as soon as people phrase things in their own words.
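The mismatch is easy to reproduce. The sketch below is a deliberately naive keyword matcher, not the client's actual intranet search; the keywordSearch helper and the sample policy sentence are invented for illustration.

```javascript
// A naive keyword search: a document matches only if it contains
// every word of the query exactly (hypothetical helper, for illustration).
function keywordSearch(documents, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return documents.filter(doc => {
    const text = doc.toLowerCase();
    return terms.every(term => text.includes(term));
  });
}

// Sample policy sentence, made up for this example.
const policies = [
  "Subletting is permitted only with written consent from the landlord."
];

// The staff member's wording shares no key terms with the policy text.
const byMeaning = keywordSearch(policies, "can a tenant rent out their flat");
const byExactWord = keywordSearch(policies, "subletting");

console.log(byMeaning.length);   // 0
console.log(byExactWord.length); // 1
```

Both queries are about the same rule, yet only the one that happens to use the document's own word finds it.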

Pasting documents into a public chatbot

When AI tools became popular, someone tried copying whole documents into a public chatbot and asking questions. This failed for three reasons. The documents were far too long to fit. The answers often sounded confident but were wrong. And, most seriously for a business handling private records, their information was leaving the company with no control over where it went.

Fine tuning a model on the documents

A developer friend suggested training, or fine tuning, a model on all of their documents. It sounds clever, but in practice it is expensive, it has to be redone every time a policy changes, and it still invents exact details like dates and fees. For a knowledge base that updates often, it is the wrong tool.

None of these gave the business what it actually needed, which was a trustworthy answer, grounded in the company's own files, delivered fast, and kept completely private.

What the investigation really revealed

Once I dug in, I landed on the lesson I keep relearning. The model is not the hard part. The hard part is retrieval, which simply means finding the right few paragraphs before the AI writes anything.

A language model can produce an excellent answer, but only when you hand it the correct source text. Give it everything at once and it drowns in noise and guesses. Give it nothing and it makes things up. The whole craft is to fetch only the handful of relevant paragraphs for each question, then let the model answer strictly from those, while clearly citing where each answer came from. This pattern has a name, retrieval augmented generation, usually shortened to RAG, and it is the practical and safe way most businesses should add AI to their own knowledge.

It helps to see the wrong way first, because it is the tempting shortcut almost everyone tries.

// The tempting shortcut: paste every document into the prompt
async function answerQuestion(question, allDocuments) {
  const everything = allDocuments.join("\n\n"); // hundreds of pages

  const reply = await llm.chat({
    messages: [{ role: "user", content: `${everything}\n\nQuestion: ${question}` }]
  });

  return reply.text; // too long to fit, slow, costly, and it guesses
}

This breaks down immediately in the real world. The combined text is too long, so the request gets cut off or rejected. Every call is slow and costly, because you push every page each time. And buried in all that noise, the model loses focus and starts to guess. The fix is to send only what matters, which is exactly what the next steps do.
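A quick back-of-the-envelope check shows why the size alone rules this approach out. The sketch below rests on two loud assumptions: roughly four characters per token, which is only a rule of thumb for English text, and a 128,000-token context limit, which is a made-up example figure rather than the limit of any particular model. The helper names are mine.

```javascript
// Rough size check for the "paste everything" prompt.
// ASSUMPTIONS: ~4 characters per token is only a rule of thumb for English,
// and contextLimit is an example figure; check your model's real limit.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function fitsInContext(documents, question, contextLimit = 128000) {
  const prompt = `${documents.join("\n\n")}\n\nQuestion: ${question}`;
  const tokens = estimateTokens(prompt);
  return { tokens, fits: tokens <= contextLimit };
}

// 300 placeholder documents of 20,000 characters each.
const fakeDocs = Array.from({ length: 300 }, () => "x".repeat(20000));
const check = fitsInContext(fakeDocs, "Can a tenant sublet their flat?");

console.log(check.fits); // false: the estimate is about 1.5 million tokens
```

Even if the limit were several times larger, every single question would still pay to send the whole library.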

Building the AI document assistant step by step

Here is the actual path I followed. I have kept each step small so you can follow along and adapt it to your own stack, whether you run Node, PHP, or something else.

Step 1: Set up a vector store with pgvector

First we need somewhere to store the meaning of each chunk and search it fast. For this client I used pgvector, an extension that adds vector search directly to PostgreSQL. That choice mattered, because it meant their data stayed inside the database they already owned and trusted, rather than going to a third party.

-- Turn on the vector extension (one time)
CREATE EXTENSION IF NOT EXISTS vector;

-- A table to hold each chunk, its source, and its embedding
CREATE TABLE document_chunks (
  id         BIGSERIAL PRIMARY KEY,
  source     TEXT NOT NULL,     -- e.g. "lease_policy.pdf"
  doc_type   TEXT,              -- e.g. "lease", "maintenance"
  content    TEXT NOT NULL,     -- the chunk text
  embedding  VECTOR(1536)       -- the meaning, stored as numbers
);

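-- An index so similarity search stays fast as data grows.
-- IVFFlat builds its clusters from existing rows, so create (or rebuild)
-- this index after the first batch of documents has been ingested.
CREATE INDEX ON document_chunks
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

The index keeps similarity search quick as the number of chunks grows into the thousands. IVFFlat is an approximate index, so it trades a small amount of recall for that speed.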

Step 2: Split documents into clean chunks

We do not store whole documents. We store small, overlapping pieces, because we want to retrieve precise paragraphs rather than entire files. The small overlap means a sentence that sits on the boundary between two chunks still appears in full somewhere.

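// Split a long document into overlapping chunks of about chunkSize words
function chunkText(text, chunkSize = 800, overlap = 150) {
  if (overlap >= chunkSize) {
    // Otherwise the loop below would never move forward
    throw new Error("overlap must be smaller than chunkSize");
  }

  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks = [];

  for (let i = 0; i < words.length; i += (chunkSize - overlap)) {
    chunks.push(words.slice(i, i + chunkSize).join(" "));
    if (i + chunkSize >= words.length) break; // this chunk already reaches the end
  }

  return chunks; // small, overlapping pieces retrieve more accurately
}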
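To see how the overlap plays out, it helps to look at where each chunk starts. The chunkStarts helper below is not part of the build; it is an illustrative sketch that applies the same stride (chunkSize minus overlap) to a word count alone. Stopping once a chunk reaches the end of the text is my own addition, so that no tiny trailing chunk repeats words already covered.

```javascript
// Where does each chunk begin? Each new chunk starts (chunkSize - overlap)
// words after the previous one, so neighbours share `overlap` words.
// chunkStarts is a hypothetical helper used only for this illustration.
function chunkStarts(totalWords, chunkSize = 800, overlap = 150) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  const starts = [];
  for (let i = 0; i < totalWords; i += step) {
    starts.push(i);
    if (i + chunkSize >= totalWords) break; // this chunk already reaches the end
  }
  return starts;
}

console.log(chunkStarts(2000)); // [ 0, 650, 1300 ]
```

With these defaults a 2,000-word policy becomes three chunks covering words 0 to 799, 650 to 1449, and 1300 to 1999, so a sentence that straddles a boundary still appears whole in one of them, provided it is shorter than the overlap.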

Step 3: Create embeddings and save them

Next we turn each chunk into an embedding, which is just a list of numbers that captures the meaning of the text. Two chunks about subletting end up close together even if they use different words. We create an embedding for every chunk and store it next to the original text.

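import OpenAI from "openai";    // assumes the official openai package
import { pool } from "./db.js"; // your PostgreSQL connection

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Create an embedding for a piece of text
async function embed(text) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text
  });
  return res.data[0].embedding; // an array of 1536 numbers
}

// Read a document, split it, embed each chunk, and save it
async function ingestDocument(source, docType, fullText) {
  const chunks = chunkText(fullText);

  // Remove chunks from an earlier version of this document, so an
  // updated policy replaces the old text instead of sitting beside it
  await pool.query(`DELETE FROM document_chunks WHERE source = $1`, [source]);

  for (const content of chunks) {
    const vector = await embed(content);
    await pool.query(
      `INSERT INTO document_chunks (source, doc_type, content, embedding)
       VALUES ($1, $2, $3, $4::vector)`,
      // JSON.stringify gives "[0.1,0.2,...]", the text form pgvector accepts
      [source, docType, content, JSON.stringify(vector)]
    );
  }
}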

You run this ingestion step whenever you add or update documents. That is the entire knowledge base, built once and easy to extend.
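One failure worth guarding against at this step is a vector whose length does not match the VECTOR(1536) column, which PostgreSQL rejects at insert time. The sketch below is a hypothetical helper (toPgVector is my name, not a pgvector or driver function) that checks the length first and then produces the same bracketed text form that JSON.stringify yields for a plain array of numbers.

```javascript
// Validate an embedding and format it as pgvector's text form: [0.1,0.2,...]
// toPgVector is a hypothetical helper; expectedDim must match the column size.
function toPgVector(values, expectedDim = 1536) {
  if (!Array.isArray(values) || values.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} numbers, got ${values?.length}`);
  }
  if (!values.every(v => Number.isFinite(v))) {
    throw new Error("embedding contains a non-numeric value");
  }
  return `[${values.join(",")}]`;
}

console.log(toPgVector([0.25, -1, 3], 3)); // [0.25,-1,3]
```

Failing early like this gives a clearer error than a rejected insert halfway through a long ingestion run, for example after switching to an embedding model with a different output size.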

Step 4: Find the most relevant chunks for a question

Now the live part. When a question arrives, we embed the question the same way, then ask pgvector for the chunks whose meaning sits closest to it.

// Find the chunks whose meaning is closest to the question
async function findRelevantChunks(question, limit = 5) {
  const questionVector = await embed(question);

  const { rows } = await pool.query(
    `SELECT source, content
       FROM document_chunks
   ORDER BY embedding <=> $1::vector  -- <=> is cosine distance: smaller means closer
      LIMIT $2`,
    [JSON.stringify(questionVector), limit]
  );

  return rows; // the few paragraphs most likely to hold the answer
}

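The <=> operator is pgvector's cosine distance, the same measure the vector_cosine_ops index from Step 1 is built for, and a smaller distance means a closer meaning. Sorting by it lets the database do the hard search work for us and hand back only the most relevant paragraphs.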
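To make the ranking concrete, here is the same idea in plain JavaScript on toy three-dimensional vectors. It only illustrates what the query asks the database to do, not how pgvector works internally; the function names and sample rows are invented for this sketch.

```javascript
// Cosine distance = 1 - cosine similarity. 0 means the same direction.
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for: ORDER BY embedding <=> $1 LIMIT $2
function nearestChunks(questionVector, rows, limit = 5) {
  return rows
    .map(row => ({ ...row, distance: cosineDistance(row.embedding, questionVector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit);
}

// Toy vectors, invented for the example.
const sampleRows = [
  { source: "lease_policy.pdf",       embedding: [0.9, 0.1, 0.0] },
  { source: "maintenance_policy.pdf", embedding: [0.0, 0.2, 0.9] },
  { source: "vendor_contracts.pdf",   embedding: [0.1, 0.9, 0.1] }
];

const top = nearestChunks([1, 0, 0], sampleRows, 2);
console.log(top.map(r => r.source)); // [ 'lease_policy.pdf', 'vendor_contracts.pdf' ]
```

Comparing every row like this is fine for three chunks, but it is exactly the work the index exists to avoid once there are thousands.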

Step 5: Write a grounded answer with the language model

Finally we give those few chunks to the model with a strict instruction. Answer using only this text, cite the source, and if the answer is not present, say so honestly instead of guessing. That honesty rule is the single most important line in the whole system.

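// Build a grounded prompt and ask the model to answer from it only.
// `llm` stands for whichever chat model client you use.
async function answerQuestion(question) {
  const chunks = await findRelevantChunks(question);

  // Nothing retrieved means nothing to ground the answer in, so skip the model
  if (chunks.length === 0) {
    return { answer: "I do not know.", sources: [] };
  }

  const context = chunks
    .map(c => `[Source: ${c.source}]\n${c.content}`)
    .join("\n\n");

  const reply = await llm.chat({
    messages: [
      {
        role: "system",
        content:
          "Answer using only the context below. Always cite the source. " +
          "If the answer is not in the context, reply that you do not know."
      },
      { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` }
    ]
  });

  // Several chunks often come from the same file, so list each source once
  return { answer: reply.text, sources: [...new Set(chunks.map(c => c.source))] };
}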

Step 6: Put it behind a simple API

To make it usable, we wrap everything in one small endpoint that a React front end can call. The team gets a clean chat box, and the heavy lifting stays on the server.

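import express from "express";

const app = express();
app.use(express.json());

// One endpoint your React front end can call
app.post("/api/ask", async (req, res) => {
  const { question } = req.body ?? {};

  if (typeof question !== "string" || !question.trim()) {
    return res.status(400).json({ error: "Please include a question." });
  }

  try {
    const result = await answerQuestion(question);
    res.json(result); // returns { answer, sources }
  } catch (err) {
    // Express 4 does not catch rejected promises in async handlers,
    // so handle failures here rather than leaving the request hanging
    console.error(err);
    res.status(500).json({ error: "The assistant could not answer right now." });
  }
});

app.listen(3000, () => console.log("Assistant API ready on port 3000"));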

That is the full loop, from a raw document to a grounded answer with its source, in roughly a hundred lines of code.
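On the front-end side, calling the endpoint takes a single fetch. The sketch below is an assumption about how a client might call /api/ask; the askAssistant name and the injectable fetchImpl parameter (handy for testing without a running server) are mine, not part of the original build.

```javascript
// Ask the assistant from the browser (or from Node 18+, which ships fetch).
// askAssistant and fetchImpl are illustrative names, not from the original build.
async function askAssistant(question, fetchImpl = fetch) {
  const response = await fetchImpl("/api/ask", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question })
  });

  if (!response.ok) {
    throw new Error(`Assistant request failed with status ${response.status}`);
  }
  return response.json(); // resolves to { answer, sources }
}
```

A React component would call askAssistant from its submit handler, then render the answer with the returned sources listed beneath it.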

The details that made it reliable

The skeleton above works, but a few extra touches turned it from a neat demo into a tool people relied on every day.

Smarter chunking and metadata

We tagged every chunk with its document type, such as lease or maintenance, and let the search filter on that when needed. A question clearly about repairs would only pull from maintenance documents, which made answers sharper and faster.
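A minimal sketch of that filter, assuming the document_chunks table from Step 1. buildChunkQuery is a hypothetical helper that only assembles the SQL text and parameter list, ready to pass to pool.query(text, values). How a question gets classified as "maintenance" or "lease" is left out, since that part of the build is not described here.

```javascript
// Build the similarity query, optionally restricted to one doc_type.
// buildChunkQuery is a hypothetical helper; it returns { text, values },
// the shape node-postgres accepts for a parameterised query.
function buildChunkQuery(questionVector, { docType = null, limit = 5 } = {}) {
  const values = [JSON.stringify(questionVector)];
  let where = "";

  if (docType) {
    values.push(docType);
    where = `WHERE doc_type = $${values.length}`;
  }

  values.push(limit);
  const text = `
    SELECT source, content
      FROM document_chunks
      ${where}
     ORDER BY embedding <=> $1::vector
     LIMIT $${values.length}`;

  return { text, values };
}

const query = buildChunkQuery([0.1, 0.2, 0.3], { docType: "maintenance", limit: 3 });
console.log(query.values); // [ '[0.1,0.2,0.3]', 'maintenance', 3 ]
```

One caveat to keep in mind: with an approximate index such as IVFFlat, the filter is applied after the index scan, so a narrowly filtered query can come back with fewer rows than the limit.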

Keeping the answers honest

We tested the assistant with a long list of real questions the team had asked in the past, and checked that it cited the right source each time. When it was unsure, we wanted it to say so, and that behaviour built more trust than any clever wording could.
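That check is easy to automate. The sketch below assumes a list of past questions paired with the document each answer should cite; the evaluate function, the test-case shape, and the stubbed ask function are all illustrative, with ask standing in for any function that resolves to { answer, sources }.

```javascript
// Run saved questions through the assistant and report which ones
// failed to cite the expected source. All names here are illustrative.
async function evaluate(testCases, ask) {
  const failures = [];

  for (const { question, expectedSource } of testCases) {
    const { sources } = await ask(question);
    if (!sources.includes(expectedSource)) {
      failures.push({ question, expectedSource, got: sources });
    }
  }

  return {
    total: testCases.length,
    passed: testCases.length - failures.length,
    failures
  };
}

// Example run with a stub in place of the real assistant.
const stubAsk = async question => ({
  answer: "(stubbed answer)",
  sources: question.includes("sublet") ? ["lease_policy.pdf"] : ["onboarding_guide.pdf"]
});

const sampleCases = [
  { question: "Can a tenant sublet their flat?", expectedSource: "lease_policy.pdf" },
  { question: "Who handles a night-time emergency repair?", expectedSource: "maintenance_policy.pdf" }
];

evaluate(sampleCases, stubAsk)
  .then(report => console.log(`${report.passed}/${report.total} passed`)); // 1/2 passed
```

Re-running a list like this after every change to chunking, prompts, or documents shows quickly whether retrieval got better or worse.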

Updating knowledge without retraining

Because the knowledge lives in the vector database, updates are simple. When a policy changes, we re-ingest the updated document so its new chunks replace the old ones, and the assistant uses the new text from the next question onward. There is no slow or costly retraining, which is a big advantage over fine tuning.

The engineering impact

Once retrieval was solid, the results were steady. Answers came back in a couple of seconds, each showing the exact document and section it came from, so staff could click through and confirm. Updating the knowledge meant re-running one ingestion call. And because the documents and their embeddings lived inside the client's own database and servers, the private data stayed private. The honest "I do not know" behaviour meant the assistant almost never produced a confident wrong answer, which is the failure that breaks trust the fastest.

The business impact

The numbers that mattered to the owner were simple. Time spent hunting for answers dropped sharply, which freed senior staff from constant interruptions. New hires became productive in days rather than weeks, because they could ask the assistant instead of bothering a colleague. Customer replies got noticeably faster, and the clients mentioned it without being asked.

For a business owner, this is the real promise of modern AI. It is not about chasing hype. It is about taking knowledge you already own and making it instantly useful, so your team spends time on work that grows the business rather than digging through files. The lesson I share with every client is the same. Start with a real and expensive problem, ground every answer in your own trusted documents, and keep your data private. Do that, and an AI document assistant stops being a gimmick and becomes one of the most useful tools your team has.
