DocuChat: Empowering You to Explore and Understand Your Data Effortlessly : Part-1

Darshan Chinvar
6 min read · Jun 18, 2024


Introduction:

One of the most transformative uses of large language models (LLMs) is in developing sophisticated question-answering (Q&A) chatbots. These intelligent systems can interpret and respond to queries using specific datasets, thanks to a technique called Retrieval Augmented Generation (RAG).

This article is part of a series aimed at building an enterprise-ready chatbot. We will start with the foundational concepts and gradually advance to more complex topics. Throughout the series, you will learn about:

  • Foundational Concepts for Building Memory-Aware Chatbots: Starting with the basics of RAG, LangChain, and LlamaIndex to enable chatbots to retain context and provide coherent, long-term interactions.
  • Advanced Data Preprocessing and MultiModal Retrieval: Handling documents with images or tables, ensuring the chatbot can interpret and utilize this data effectively.
  • Enterprise Data Ingestion: Strategies for robust data ingestion, storage, and management tailored for enterprise environments.
  • Agentic Strategies: Building agents on top of your existing RAG pipeline to empower it with automated decision-making capabilities.
  • Security and Governance: Best practices for securing your chatbot and ensuring compliance with data governance policies.
  • Observability: Implementing monitoring and observability to maintain the health and performance of your chatbot.

By the end of this series, you will have the knowledge and tools to build a powerful, enterprise-grade chatbot capable of enhancing data exploration and understanding effortlessly.

What is RAG?

RAG is a powerful method that extends the capabilities of LLMs by incorporating additional data. While LLMs can reason across various topics, their knowledge is restricted to publicly available information up to a certain point. To create AI applications that can handle private or more recent data, it’s crucial to supplement the model’s knowledge with relevant information.

RAG achieves this by retrieving the necessary data and embedding it into the model’s prompt. Frameworks like LangChain and LlamaIndex provide a suite of components designed to facilitate the development of Q&A applications and RAG implementations. A typical RAG application consists of two main components:

  1. Indexing: This involves setting up a pipeline to ingest and index data from multiple sources.
  • Loading Data: The first step is loading the data; here we use DocumentLoaders such as `PyPDFLoader` or `WebBaseLoader` from LangChain, or `SimpleDirectoryReader` from LlamaIndex.
  • Splitting Text: Text splitters break down large documents into smaller chunks, which are easier to search and fit within the model’s context window.
  • Storing Data: The split data chunks are stored and indexed for future retrieval, often using a VectorStore (like ChromaDB, Azure Search, AWS ElasticSearch) combined with an embeddings model.

  2. Retrieval and Generation: This component processes user queries in real time, retrieves pertinent data from the index, and generates responses using the model. A minimal sketch of this flow follows the list below.

  • Retrieving Data: When a user inputs a query, the relevant chunks are retrieved from storage using a Retriever.
  • Generating Answers: A ChatModel or LLM generates an answer by incorporating the user’s question and the retrieved data into its prompt.
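
To make these two stages concrete, here is a minimal sketch of the retrieve-then-generate flow, assuming a `retriever` and a chat model `llm` have already been created (as we do later in this article); the prompt wording is illustrative.

# Minimal retrieve-then-generate sketch (assumes `retriever` and a chat model `llm` exist)
question = "What were the key financial highlights?"

# 1. Retrieval: fetch the chunks most relevant to the question
retrieved_docs = retriever.invoke(question)
context = "\n\n".join(doc.page_content for doc in retrieved_docs)

# 2. Generation: stuff the retrieved context into the model's prompt
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
answer = llm.invoke(prompt)
print(answer.content)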

RAG Architecture

The image is a flowchart showing the process of using LangChain for question-answering on PDF documents. It starts with loading PDF documents, extracting metadata, and splitting them into chunks. These chunks are embedded and stored in a VectorStore. A user query is embedded and matched with relevant document chunks via semantic and hybrid search. The matched data, along with the query, forms a prompt sent to an LLM (GPT-4) for generating answers.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

def process_pdf_simple(self, file_content):
    # Load the PDF into LangChain Document objects
    loader = PyPDFLoader(file_content)
    docs = loader.load()

    # Split documents into overlapping chunks that fit the context window
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    splits = text_splitter.split_documents(docs)

    # Index the chunks in a Chroma vector store using the embeddings model
    vectorstore = Chroma.from_documents(splits, self.embeddings)

    # Expose the vector store as a retriever for downstream chains
    retriever = vectorstore.as_retriever()

In this snippet, we initialize a PyPDFLoader with the PDF content and load the document. We then use the Chroma class to create a vector store from the document chunks: Chroma’s `from_documents` method builds an index from the document splits using an embeddings model. The `as_retriever` method converts our vector store into a retriever object, enabling our chatbot to search through the indexed document chunks and find the most relevant information quickly.
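
By default, `as_retriever` performs a similarity search returning the top few chunks. If you need tighter control, LangChain lets you pass search parameters; the values below are illustrative, not settings from the original app.

# Optional: tune retrieval behaviour (illustrative values, not from the original app)
retriever = vectorstore.as_retriever(
    search_type="similarity",    # or "mmr" for maximal marginal relevance
    search_kwargs={"k": 4},      # number of chunks to return per query
)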

Key Components of a Q&A Chatbot:

1. Chat Models: Unlike text-based LLMs, these are optimized for message-based interactions, providing more natural conversational responses.

2. Prompt Templates: These templates help in crafting prompts by combining default messages, user input, chat history, and optionally, additional context retrieved from other sources. They can also establish a specific persona, for example, a chatbot that acts as a financial advisor (see the sketch after this list).

3. Chat History: This feature allows the chatbot to remember past interactions, enabling it to respond to follow-up questions with context. For instance, in Q&A applications, maintaining a memory of past questions and answers is crucial for a coherent conversation.
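
As a quick illustration of the first two components, here is a minimal sketch pairing a chat model with a persona prompt. It assumes an OpenAI chat model; the persona wording and model choice are illustrative, not taken from DocuChat.

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# A prompt template that fixes a persona and leaves room for chat history
persona_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful financial advisor. Answer concisely."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # assumed model choice
chain = persona_prompt | llm

response = chain.invoke({"chat_history": [], "input": "What is a 10-Q filing?"})
print(response.content)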

Implementing History-Aware Retrieval:

We will define a sub-chain that takes the historical messages and the latest user question, reformulating the query so it can be understood without prior context. This involves using a `MessagesPlaceholder` variable named “chat_history” in our prompt. A helper function, `create_history_aware_retriever`, manages the inclusion of chat history, applying the sequence: `prompt | llm | StrOutputParser() | retriever`.

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

####
# Contextualize question
####
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

# Note: from_messages expects a list of messages
contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    self.llm, retriever, contextualize_q_prompt
)

Building the Full QA Chain:

Finally, we will update our retriever to the history-aware retriever and use `create_stuff_documents_chain` to build a question-answer chain. This chain accepts the retrieved context, chat history, and the user query to generate an answer. We then construct our final RAG chain using `create_retrieval_chain`, which sequentially applies the history-aware retriever and question-answer chain, retaining intermediate outputs like the retrieved context. With these steps, we have built a robust chatbot capable of understanding and remembering past interactions, ensuring a fluid and coherent conversation with users.
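
Here is a sketch of that chain construction, following LangChain’s standard pattern; the `system_prompt` wording and variable names are illustrative assumptions rather than DocuChat’s exact code.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Answer-generation prompt; "{context}" receives the retrieved chunks
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following retrieved context to answer the question. "
    "If you don't know the answer, say so.\n\n{context}"
)
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Chain that stuffs the retrieved documents into the prompt and calls the LLM
question_answer_chain = create_stuff_documents_chain(self.llm, qa_prompt)

# Full RAG chain: history-aware retrieval followed by answer generation
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)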

from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

### Statefully manage chat history ###
store = {}

# We are not persisting the chat history here, but you could store it in Redis
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

return conversational_rag_chain

At the end, we simply call the conversational_rag_chain’s built-in `invoke` method, which processes the input query and retrieves relevant information from the indexed document chunks. The combined question, context, and prompt are sent to the LLM to generate a response. Because the chain is wrapped in `RunnableWithMessageHistory`, each call must also supply a `session_id` in its config so the correct chat history is loaded.

file_path = 'amazon_2024_10q.pdf'
user_query = "What are the key financial highlights and significant changes reported in Amazon's Q1 2024 10-Q SEC filing?"
obj = ConversationRetrieverAgent()
chain = obj.process_pdf_simple(file_path)
result = chain.invoke(
    {"input": user_query},
    config={"configurable": {"session_id": "demo"}},  # identifies this conversation's history
)
print(result["answer"])
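
To see the history awareness in action, ask a follow-up that only makes sense given the first exchange; reusing the same `session_id` (an illustrative value) lets the chain pull in the earlier turns.

# Follow-up query that relies on the previous answer for context
follow_up = "How does that compare to the previous quarter?"
result = chain.invoke(
    {"input": follow_up},
    config={"configurable": {"session_id": "demo"}},  # same session -> same history
)
print(result["answer"])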

Conclusion

The DocuChat app leverages technologies such as LangChain and Retrieval Augmented Generation (RAG) to empower users to explore and understand their data effortlessly. By efficiently loading, splitting, and indexing documents, and by applying sophisticated retrieval and generation techniques, DocuChat provides accurate, context-aware answers to user queries. The integration of chat models, prompt templates, and chat history ensures natural and coherent conversations. By implementing history-aware retrieval and robust session management, DocuChat delivers a seamless and intelligent interaction experience, making data more accessible and actionable.

What’s Next?

Stay tuned for the next part of this series, where we will delve into Advanced Data Preprocessing and MultiModal Retrieval. We will cover how to handle documents with images and tables, ensuring your chatbot can interpret and utilize this data effectively. This is particularly useful for building bots for financial services, enabling them to analyze financial market data from sources like the SEC and Bloomberg, which often involve graphs, metrics, and tables. Don’t miss out on these insights to take your chatbot capabilities to the next level!

Author Information

Thank you for reading this article! If you’d like to see the entire codebase discussed here or play around with the fully functional chat app, feel free to visit the following links:

  • LinkedIn: LinkedIn Profile
  • GitHub: Review the entire code for this article on my GitHub Profile.
  • Streamlit: Interact with the fully functioning app directly on Streamlit App. Explore the features, ask questions, and see how DocuChat processes and responds to queries in real-time.

Feel free to connect with me on LinkedIn for updates, discussions, and more insights into building intelligent applications. Your feedback and contributions are always welcome!
