Using AI and LLMs with your own documents

Photo by Igor Omilaev on Unsplash. Cropped.

Reading time: 10.5 minutes.

Artificial intelligence is everywhere. Open AI has made headlines with its launch of Chat GPT and all sorts of corporate shenanigans. It has already disrupted every major website that has considerable amount of content online, scrapping their data viciously. And quickly became the go to solution for scammers and cheaters looking to forge pretty much any text, from school works to job applications. And it has people scared that it has come for our jobs and to ultimately decimate the entire human race.

But can it be useful?

Leaving aside the crude sensationalism, corporate shenanigans and spotlight opportunism, it turns out that the tools and methods of the AI revolutions can provide quite interesting new solutions.

Particularly, it may be able to give us a hand dealing with the massive amount of data our lives have been translated into.

From the start, one of the striking features of AI is its ability to summarize text as if it were a real person. It can resume entire articles in a few paragraphs in a matter of seconds. It can elaborate on complex topics apparently out of thin air, and pass as human in live short conversations.

It all comes down to the way it stores, process and manipulates data.

What became known as Large Language Models (LLMs) are statistical systems capable of comparing texts that are not exactly equal, and finding matches based on how similar they are in meaning. Glorified search engines, some say. They are more than that. And they are quite ingenious.

Vector stores

One of the cornerstones of LLMs are vector stores. They work by splitting the text into small pieces — it could be paragraphs, a few lines or words — and assigning values to each of them. The values are called vectors and they are stored in a database, the so called vector store.

Questions are split on a similar way, and their values are compared and ranked against the vectors in the database. Results are returned ranked by similarity, and the whole thing works surprisingly well.

Language models

But vector stores can only provide the results they have. From a store that contains “roses are red”, search results can only contain that same text along with a match statistic. So, searching for exactly “roses are red” will return that same text along with the match of 100%.

“What is the color of roses?” again will return “roses are red”, but with a much lower match value, as only the word “roses” are a full match. It cannot elaborate answers like “from my research I found that roses are colored red”. For that you need a model.

Now this is where things get expensive. Models are what they talk about when you hear things like “AI will replace all jobs”, or “welcome our new AI overlords”. They will not do those things — at least not yet — but they can do some pretty impressive stuff.

Models are artificial neural networks that evolve when trained. Training a neural network means using text as input, and letting it create a different text as output. The more information you feed into it, more capable of generating responses it becomes.

There is no guarantee the response will be accurate, or true. The neural network only becomes better at mimicking its input.

For example, some models can be trained with sets of short questions and responses, like many variations of:

Question: “Good morning!”

Answer: “Good morning! It is indeed a lovely day. What can I do for you?”

Therefore, if presented with the “good morning” prompt in the future, the model will respond with one of the answers it was trained with, or a mix of them.

So, while capable of articulating answers it does not have, a model still can only reply based on the data it has been trained with. If the quality of the data is poor, inaccurate or straight full of lies, so will the model be.

Moreover, if presented with never before seen questions, as models are trained to answer the best way they can, sometimes they can come up with absurd responses. These are called hallucinations. Nothing magical about them. They are simply the neural network using the grammar structure it knows to build up a response to an unknown question, and failing at it.

But hallucinations completely flunk the Turing test. That is, they break the illusion that the AI is human. And they diminish the hope of getting a useful purpose for neural networks. I mean, if you cannot trust the accuracy of the response, what is the point?

Training a model to hallucinate less and to give better responses is expensive, requires advanced computers and takes time. One of the most impressive results of this process is Open AI’s Chat GPT-4, which can in fact provide remarkable answers.

Notably, as in early 2024, The NY Times in suing Open AI for illegally using its articles to train Chat GPT.

Training and usage

When it comes to models, there is an important difference between training and usage.

Training, as mentioned above, means feeding the model with inputs and outputs and allowing its neural network to learn them. Then, when presented with new questions, the model would answer in a way that is similar to the material it used to train.

Training demands powerful computers, lots of data, time and supervision to work.

Using models, on the other hand, is a lot less demanding. A model like Llama2-70b, while heavy, can be run on a machine with 32 GB for RAM memory, even without GPU, which is in reach of personal computers.


The most evolved models like Chat GPT-4 are available for use. Open AI, for instance, is reportedly building a store through which developers can use their models, and pay accordingly to usage.

So, those of us who do not have a supercomputer and access to inordinate amounts of legally obtained data to train a model, have to pay to use the best available models.

Moreover, there is the question of privacy. Using a model like Chat GPT means sending a question to Open AI servers, and waiting for an answer. So your data has to be sent to them, and technically they can store it and use it accordingly to their terms.

These two drawbacks have been a non-trivial hindrance to the widespread development and adoption of AI. But there are alternatives.


A remarkable landmark on the progress of this exciting field has been the creation and release of free models, even for commercial use. Meta released models under such license, So did Mistral AI.

These are trained models, and there are a handful of creative ways they can be used. Most importantly, they can be downloaded and run locally.

This is where the new prominent field of prompt engineering has come forward.

The interesting thing about trained models is that you can pass them instruction much like talking to a real person.

An example of a prompt would be:

> You are a pirate. Give me a recipe for a chocolate cake.

Here is the answer given by Llama-2:

Ahoy matey! Here be a treasure o' a recipe fer a scrumptious chocolate cake, fit fer a swashbucklin' pirate like yerself:


* 2 cups all-purpose flour
* 1 teaspoon baking powder
* 1 teaspoon salt
* 1 cup unsalted butter, softened
* 1 3/4 cups granulated sugar
* 4 large eggs
* 2 teaspoons vanilla extract
* 1 cup semisweet chocolate chips
* 1 cup whole milk, at room temperature


1. Preheat yer oven to 350 degrees Fahrenheit, me hearty!
2. In a large mixing bowl, combine the flour, baking powder, and salt. Mix until well combined, like a mutiny on the high seas!
3. Add the softened butter and granulated sugar to the bowl, and mix until smooth, like a pirate's treasure chest.
4. Crack in the eggs, one at a time, and mix until well combined. Don't worry if there be any streaks o' yellow or white – it be just like a treasure map, matey!
5. Stir in the vanilla extract, like ye be stirrin' up a storm on the high seas!
6. Gradually add the chocolate chips, mixin' them in with a spoon until they be fully incorporated, like a pirate addictin' to his treasure.
7. Pour in the milk, like ye be pourin' yerself a mug o' grog after a long day o' plunderin'. Mix until smooth and creamy, like the sea on a calm day.
8. Pour the batter into a greased and floured 9x13-inch bakin' dish, like ye be loadin' yer treasure onto a ship.
9. Bake fer 25-30 minutes, or until a toothpick inserted in the center comes out clean, like a pirate findin' his treasure!
10. Let the cake cool fer a few minutes before servin', like ye be savorin' yer spoils after a successful raid.

Enjoy yer delicious chocolate cake, me hearty! Arrrr!

Tell me that is not fun!

Prompting can quickly become complex, depending on the use case. The example below tries to instruct the model to use different sources of information, prioritizing one over the other, while avoiding wild guesses:

> Use the question and chat history to elaborate a new concise question. Answer the concise question using the context provided.
> If you don't know the answer, just say you don't know. Do not try to guess an answer.

A prompt that is too complex or badly formed leads to more hallucinations.

The fact that prompting is such an abstract process is both powerful and limiting, and makes it a science indeed.

There are open source libraries dedicated to provide tools to work with large language models. Llamaindex and Langchain, for example, are projects that are growing and becoming ever more powerful by the day.

With these libraries and free models it is possible to build AI tools locally and overcome the limitations of privacy and price.

Supawiki is one example of such implementation. It runs models locally, uses local documents as context and employs vector databases in a way optimized for performance.

The result is an AI chat bot, or assistant, that is capable of establishing a conversation and retrieving answers from the provided documents.

It is possible to test it with one of the featured wikis. For instance, clicking on Featured, choosing the “Documentation” wiki and clicking on “AI assistant” at the bottom of the left sidebar leads to the chat screen.

The left sidebar has links to all files of the wiki. The assistant has access to that content, and is instructed to use them as primary resource to answer questions.

Here is a short back-and-forth with it:

When processing the questions, first the AI searches the vector database for pieces of text that have the potential of containing answers. Matches that poorly correlate with the question are dismissed. The three best results are displayed under the “Sources” section of the answer. That step is rather fast.

Next, the results of the vector search are given to the model, which then elaborates a proper answer. That step is slower, and depends on how much memory, CPU / GPU the server has available.

But the results are quite remarkable! It does sound like a real person, the answers are correct and there are no hallucinations, at least with questions as straight are those.

And everything is done by Supawiki. Nothing is sent to third parties.

Supawiki’s AI assistant is available for every wiki it contains. So, to create a wiki with your own documents, create an account, add wikis and upload or type its contents. The AI will be ready to chat about it as soon as the files are saved.

January 11, 2024.