Retrieval-Augmented Generation (RAG) with Pinecone
Note: A video of this example is also available.
In this example, you'll build a pipeline with Dagster that:
- Loads data from GitHub and the documentation site
- Converts the data into embeddings and tags it with metadata
- Stores the data in a vector database (Pinecone)
- Retrieves relevant information to answer ad hoc questions
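The four stages above can be sketched in plain Python to show how the pieces fit together. This is a toy illustration under stated assumptions, not the project's implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a hypothetical in-memory stand-in for a Pinecone index, and the sample records are made up. No Dagster or Pinecone APIs are used.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized bag-of-words counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


@dataclass
class Record:
    record_id: str
    text: str
    metadata: Dict[str, str]  # e.g. {"source": "github"} or {"source": "docs"}
    vector: Dict[str, float] = field(default_factory=dict)


class InMemoryVectorStore:
    """Hypothetical stand-in for a Pinecone index: upsert records, query by similarity."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def upsert(self, records: List[Record]) -> None:
        # Embed each record and store it, replacing any record with the same ID.
        for record in records:
            record.vector = embed(record.text)
            self.records[record.record_id] = record

    def query(self, question: str, top_k: int = 2,
              source: Optional[str] = None) -> List[Record]:
        # Rank records (optionally filtered on "source" metadata) by similarity.
        q = embed(question)
        candidates = [r for r in self.records.values()
                      if source is None or r.metadata.get("source") == source]
        candidates.sort(key=lambda r: cosine(q, r.vector), reverse=True)
        return candidates[:top_k]


# Load (hard-coded sample data), embed and tag, store, then retrieve.
store = InMemoryVectorStore()
store.upsert([
    Record("gh-1", "Issue: dagster dev fails to start the webserver", {"source": "github"}),
    Record("docs-1", "Assets are defined with the asset decorator", {"source": "docs"}),
    Record("docs-2", "Run dagster dev to start the webserver locally", {"source": "docs"}),
])
for hit in store.query("How do I start the webserver?", top_k=1, source="docs"):
    print(hit.record_id)  # prints: docs-2
```

Filtering on the `source` field shows why tagging metadata at ingestion time is useful: a query can be restricted to, for example, documentation only. A real vector database performs the similarity search and filtering on its side instead of in a Python loop.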
 
Prerequisites
To follow the steps in this guide, you'll need:
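- Basic Python knowledge
- Python 3.9+ installed on your system. Refer to the Installation guide for more information.
- uv installed, since the setup commands in this guide use it to create the virtual environment and install dependencies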
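If you're unsure which interpreter is active, a small check like the following compares it against the 3.9 minimum listed above. The helper name `python_is_supported` is hypothetical; running `python --version` from the command line tells you the same thing.

```python
import sys
from typing import Optional, Tuple

MINIMUM_PYTHON = (3, 9)  # minimum version from the prerequisites above


def python_is_supported(version: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if `version` (default: the running interpreter) is 3.9 or newer."""
    if version is None:
        version = tuple(sys.version_info[:2])
    # Tuples compare element by element, so (3, 10) >= (3, 9) and (3, 8) < (3, 9).
    return tuple(version[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old, need 3.9+"
    print(f"Python {current}: {status}")
```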
 
Step 1: Set up your Dagster environment
First, set up a new Dagster project.
- Clone the Dagster repo and, from the repository root, navigate to the project:
    cd examples/docs_projects/project_ask_ai_dagster
- Create and activate a virtual environment:
  macOS:
    uv venv dagster_example
    source dagster_example/bin/activate
  Windows:
    uv venv dagster_example
    dagster_example\Scripts\activate
- Install Dagster and the required dependencies:
    uv pip install -e ".[dev]"
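Before starting the webserver, you can optionally confirm that the packages are importable from the activated environment. This minimal sketch uses only the standard library; `check_installed` is a hypothetical helper, and `dagster` is the only module name assumed here, so add any others you expect from the project's dependencies.

```python
import importlib.util
from typing import Dict, Iterable


def check_installed(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment."""
    # find_spec returns None (without importing the module) when it cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # "dagster" is the core package installed above; extend this list as needed.
    for name, ok in check_installed(["dagster"]).items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```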
Step 2: Launch the Dagster webserver
To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:
    dagster dev
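By default, `dagster dev` serves the UI at http://127.0.0.1:3000; open that address in a browser to confirm the webserver is running. To check it from a script instead, the hypothetical helper below treats any HTTP response as a sign the server is up. The default URL assumes you haven't changed the host or port.

```python
import urllib.error
import urllib.request


def webserver_is_up(url: str = "http://127.0.0.1:3000", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status, so it is running.
        return True
    except OSError:
        # URLError (e.g. connection refused) and timeouts are both OSError subclasses.
        return False


if __name__ == "__main__":
    print("Dagster UI reachable:", webserver_is_up())
```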
Next steps
- Continue this example with sources