Analyzing Bluesky data
note
A video walkthrough of this example is also available.
In this example, you'll build a pipeline with Dagster that:
- Ingests data-related Bluesky posts
- Models the data using dbt
- Creates and validates the data files needed for an OpenAI fine-tuning job
- Represents the data in a dashboard
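
To make the list above concrete, each stage is expressed as one or more Dagster assets. The snippet below is a hedged sketch of what the ingestion stage could look like; the asset name and its body are illustrative placeholders rather than code from the example project.

```python
import dagster as dg


# Hypothetical ingestion asset: in the real project this stage would call the
# AT Protocol (Bluesky) API. A stub return value keeps the sketch runnable.
@dg.asset
def bluesky_posts() -> list[dict]:
    """Returns a placeholder batch of data-related Bluesky posts."""
    return [{"author": "example.bsky.social", "text": "Example post about data engineering"}]
```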
 
Prerequisites
To follow the steps in this guide, you'll need:
- Basic Python knowledge
- Python 3.9+ installed on your system. Refer to the Installation guide for information.
- Understanding of data pipelines and the extract, transform, and load (ETL) process.
- Familiarity with dbt and data transformation.
- Experience using BI tools for dashboards.
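
If you want to confirm that the interpreter you plan to use meets the Python 3.9+ requirement, a quick check from a Python shell is enough:

```python
import sys

# Fail fast if the active interpreter is older than this example expects.
assert sys.version_info >= (3, 9), "This example expects Python 3.9+"
print(sys.version)
```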
 
Step 1: Set up your Dagster environment
First, set up a new Dagster project.
- Clone the Dagster repo and navigate to the project:

  git clone https://github.com/dagster-io/dagster.git
  cd examples/docs_project/project_atproto_dashboard
- Create and activate a virtual environment:

  MacOS:

  uv venv dagster_example
  source dagster_example/bin/activate

  Windows:

  uv venv dagster_example
  dagster_example\Scripts\activate
- Install Dagster and the required dependencies:

  uv pip install -e ".[dev]"
- Ensure the following environment variables have been populated in your .env file. Start by copying the template:

  cp .env.example .env

  And then populate the fields. (A sketch of how Dagster code typically reads these values follows this list.)
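
Dagster usually resolves these values at runtime rather than at import time. The sketch below shows one common pattern, using dagster's EnvVar with a configurable resource; the resource class and the OPENAI_API_KEY variable name are assumptions for illustration, not necessarily the fields listed in this project's .env.example.

```python
import dagster as dg


class OpenAIFineTuningResource(dg.ConfigurableResource):
    """Hypothetical resource that holds an API key read from the environment."""

    api_key: str


# EnvVar defers reading the variable until the resource is used at runtime,
# so secrets never end up hard-coded in the repository.
defs = dg.Definitions(
    resources={
        "openai": OpenAIFineTuningResource(api_key=dg.EnvVar("OPENAI_API_KEY")),
    },
)
```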
 
Step 2: Launch the Dagster webserver
To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:

dagster dev

By default, the webserver is available at http://localhost:3000.
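
Under the hood, dagster dev loads the project's Definitions object and serves it in the UI. The sketch below shows the minimal shape of such a module; the asset and variable names are placeholders, not the example project's actual definitions.

```python
import dagster as dg


@dg.asset
def example_asset() -> str:
    """Placeholder asset so the Definitions object has something to show."""
    return "hello from dagster dev"


# `dagster dev` discovers and serves a Definitions object like this one.
defs = dg.Definitions(assets=[example_asset])
```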
Next steps
- Continue this example with ingestion