The Llama_RAG_System is a robust retrieval-augmented generation (RAG) system designed to interactively respond to user queries with rich, contextually relevant answers. Built using the LLaMA model and Ollama, this system can handle various tasks, including answering general questions, summarizing content, and extracting information from uploaded PDF documents. The architecture utilizes ChromaDB for efficient document embedding and retrieval, while also incorporating web scraping capabilities to fetch up-to-date information from the internet.
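As a rough sketch of the retrieval-augmented flow (illustrative only; the actual logic lives in the core/ modules, and the embedding model, collection name, and storage path below are assumptions), the system embeds a query, retrieves the closest chunks from ChromaDB, and asks the LLaMA model served by Ollama to answer from that context:

# Minimal RAG sketch (illustrative; the real implementation is in core/).
# Assumes the chromadb and ollama Python packages, plus a local Ollama server
# with a chat model and an embedding model already pulled.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_store")   # example path
collection = client.get_or_create_collection("documents")   # example name

def answer(question: str) -> str:
    # Embed the question with a local embedding model served by Ollama.
    query_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]

    # Retrieve the most similar document chunks from ChromaDB.
    results = collection.query(query_embeddings=[query_emb], n_results=3)
    context = "\n".join(results["documents"][0]) if results["documents"] else ""

    # Generate an answer grounded in the retrieved context.
    response = ollama.chat(
        model="llama3",  # example model tag
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["message"]["content"]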
Here's a glimpse of the Gradio app interface:
🚧 Please note: This project is currently in development. Your feedback and contributions are welcome!
Ollama is an excellent option for running large language models locally for several reasons: models run entirely on your own machine, so your data never leaves it; there are no per-request API costs; and pulling, updating, and switching between models each take a single command.
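For example, once Ollama is installed, pulling and chatting with a LLaMA model locally takes one command each (the llama3 tag is only an example; use whichever model this project is configured for):

ollama pull llama3
ollama run llama3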
The project is organized as follows:
project/
├── core/
│   ├── embedding.py         # Embedding-related functionality
│   ├── document_utils.py    # Functions to handle document loading and processing
│   ├── query.py             # Query document functionality
│   ├── generate.py          # Response generation logic
│   └── web_scrape.py        # Web scraping functionality
│
├── scripts/
│   ├── run_flask.py         # Script to run Flask API
│   └── run_gradio.py        # Script to run Gradio interface
│
├── chromadb_setup.py        # ChromaDB setup and connection
│
└── README.md                # Project documentation
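To make the module layout concrete, here is a sketch of what indexing a document into ChromaDB involves, roughly the territory of core/document_utils.py and core/embedding.py. The chunking strategy, collection name, and embedding model are assumptions, not necessarily the project's actual choices:

# Illustrative ingestion sketch; the real implementation is split across
# core/document_utils.py (loading/chunking) and core/embedding.py (embedding).
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_store")   # example path
collection = client.get_or_create_collection("documents")   # example name

def index_text(doc_id: str, text: str, chunk_size: int = 500) -> None:
    # Naive fixed-size chunking; the project may use a smarter splitter.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    for n, chunk in enumerate(chunks):
        emb = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
        collection.add(ids=[f"{doc_id}-{n}"], embeddings=[emb], documents=[chunk])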
To set up the Llama_RAG_System, follow these steps:
git clone https://github.com/NimaVahdat/Llama_RAG_System.git
cd Llama_RAG_System
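If the repository ships a requirements file (an assumption; check the repo contents), install the Python dependencies first, and make sure a local Ollama server is running with the models the project expects:

pip install -r requirements.txt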
To start the Flask API, run the following command:
python -m scripts.run_flask
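Once the server is up, you can send questions to it over HTTP. The port, endpoint path, and JSON field below are placeholders for illustration; check scripts/run_flask.py for the actual route:

curl -X POST http://127.0.0.1:5000/query \
     -H "Content-Type: application/json" \
     -d '{"question": "Summarize the uploaded document."}'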
To launch the Gradio interface, execute:
python -m scripts.run_gradio
After running either script, open the local URL printed in the terminal to interact with the system through its web interface.
Contributions are welcome! If you have suggestions for improvements or features, please fork the repository and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
For any inquiries or support, please contact me.