Deep Learning Architecture: Naive Retrieval-Augmented Generation (RAG)
Introduction
Since late 2022, the world has been swept by a huge wave called ChatGPT. Almost overnight, terms like Large Language Models (LLMs), Artificial Intelligence, and Machine Learning became familiar to nearly everyone, and people quickly realized how powerful LLMs can be.
However, despite their strength, LLMs have limitations, especially when asked about recent information such as today’s news. Their training data has a cutoff date, so anything published after it is simply unknown to the model. Retrieval-Augmented Generation (RAG) systems were created to close this gap.
RAG systems grew popular alongside Transformer models, a deep learning breakthrough that introduced the concept of “attention” and parallel processing. Transformers allowed models to understand sequences of information much more effectively, addressing problems seen in older architectures like RNNs and LSTMs. As LLM-based products like ChatGPT became widespread, research into RAG systems focused on improving both the information they retrieve and the responses they generate.
There are three main types of RAG architectures:
- Naive RAG
- Advanced RAG
- Modular RAG
In this article, we’ll explore Naive RAG, the simplest form of this architecture.
How Naive RAG Works
Naive RAG operates straightforwardly, following three main steps: indexing, retrieving, and generating.
Indexing
- The system gathers information from various sources, such as internal documents and web pages, in formats like PDF, Word, or plain text.
- This information is split into smaller chunks, which are then encoded into vectors and stored in a vector database so the system can retrieve them efficiently later (a minimal sketch follows this list).
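To make the indexing step concrete, here is a minimal sketch in Python. It is illustrative only: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, and two plain lists stand in for a vector database.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: a hashed bag-of-words vector. A real system would
    use a trained embedding model; this stand-in just keeps the sketch
    self-contained and runnable."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# Stand-in for a vector database: parallel lists of chunks and vectors.
documents = ["...contents of a PDF...", "...contents of a web page..."]
index_chunks: list[str] = []
index_vectors: list[np.ndarray] = []
for doc in documents:
    for piece in chunk(doc):
        index_chunks.append(piece)
        index_vectors.append(embed(piece))
```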
Retrieving
- When a user submits a query, the system converts the question into a vector, just as it encoded the indexed chunks.
- It then searches the vector database for the most relevant chunks, selecting the top matches that most closely align with the query (see the sketch after this list).
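Continuing the sketch above, retrieval reduces to a nearest-neighbour search over the stored vectors. Because the toy embeddings are unit-normalized, a dot product gives cosine similarity; a real vector database would perform this search (usually approximately) on the system’s behalf.

```python
def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    # Vectors are unit-normalized, so a dot product is cosine similarity.
    scores = np.array([v @ q for v in index_vectors])
    top = np.argsort(scores)[::-1][:k]
    return [index_chunks[i] for i in top]
```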
Generating
- The selected chunks, combined with the user’s query, are sent to an LLM (Large Language Model) to generate a coherent response.
- The model may draw on the retrieved chunks, its own internal (parametric) knowledge, or a combination of both to craft the final answer.
- In a chat scenario, the system also keeps track of conversation history to ensure continuity (see the sketch after this list).
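A last sketch ties the pieces together: the retrieved chunks, any chat history, and the user’s query are assembled into a single prompt and handed to an LLM. Note that `call_llm` is a hypothetical placeholder, not a real client API; in practice you would swap in whichever LLM SDK your stack uses.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in a real LLM client here (e.g., an
    # OpenAI chat completion or a locally hosted model).
    return f"[model answer based on a {len(prompt)}-character prompt]"

def generate(query: str, history: list[str] | None = None) -> str:
    """Assemble retrieved context, chat history, and the query into one
    prompt, then hand it to the LLM."""
    context = "\n\n".join(retrieve(query))
    past = "\n".join(history or [])
    prompt = (
        "Answer the question using the context below.\n\n"
        f"Conversation so far:\n{past}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return call_llm(prompt)

print(generate("What formats can the indexer ingest?"))
```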
While this setup seems simple, it comes with certain limitations.
Limitations of Naive RAG
Naive RAG has strengths in its simplicity, but it also faces key challenges in its retrieval and generation steps:
Retrieval Challenges
- Precision and recall can both be problematic. Low precision means the system retrieves irrelevant or misaligned chunks; low recall means it misses relevant ones entirely. Either way, the result is an incomplete or incorrect answer.
Generation Challenges
- The system may generate hallucinations, where the output includes information not present in the retrieved documents.
- There can also be issues with irrelevance, bias, or toxicity in the generated responses, especially if the indexed data contains such elements.
Augmentation Challenges
- Combining retrieved information from multiple sources can be tricky. The system may produce redundant or incoherent outputs when it pulls from different places.
- It can also struggle with complex queries that require more nuanced context, since Naive RAG uses a single retrieval step, which may not always provide the full picture.
Conclusion
Naive RAG is built on a promising idea: augmenting the knowledge of language models by pulling in fresh information from external sources rather than relying solely on training data. This approach can produce more relevant responses, but in practice it faces challenges with retrieval accuracy and generation quality.
In the next article, we’ll explore Advanced RAG, a more refined architecture designed to overcome some of these issues.