Crowdfunding Data Pipeline


Project Description

This project builds a complete Extract, Transform, Load (ETL) pipeline for a real-world crowdfunding dataset sourced from Excel spreadsheets. It cleans, normalizes, and restructures the raw data into a PostgreSQL relational database so that the data is easy to query and analyze.


Extraction of Data

The raw data was extracted from the following Excel files:

Process:


Data Transformation

The transformation phase restructured the raw data into normalized tables, each held in its own DataFrame:

Category and Subcategory
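
A minimal sketch of this kind of split, assuming the raw data stores category and subcategory together in one slash-separated text column. The column name `category_subcategory`, the sample rows, and the `cat`/`subcat` id prefixes are hypothetical and are not taken from the project files. It uses plain Python lists and dictionaries instead of DataFrames so that it runs without extra dependencies.

```python
# Hypothetical raw rows; the real column names and values are not given in this README.
raw_rows = [
    {"campaign": "A", "category_subcategory": "food/food trucks"},
    {"campaign": "B", "category_subcategory": "music/rock"},
    {"campaign": "C", "category_subcategory": "music/jazz"},
    {"campaign": "D", "category_subcategory": "food/food trucks"},
]


def build_lookup(values, prefix):
    """Assign an id such as 'cat1' to each distinct value, in first-seen order."""
    lookup = {}
    for value in values:
        if value not in lookup:
            lookup[value] = f"{prefix}{len(lookup) + 1}"
    return lookup


# Split each combined value into its category and subcategory parts.
pairs = [row["category_subcategory"].split("/", 1) for row in raw_rows]

# One lookup table per dimension; these would become category.csv and subcategory.csv.
category_ids = build_lookup((cat for cat, _ in pairs), "cat")
subcategory_ids = build_lookup((sub for _, sub in pairs), "subcat")

# Replace the combined text column with two id columns that reference the lookups.
campaign_rows = [
    {
        "campaign": row["campaign"],
        "category_id": category_ids[cat],
        "subcategory_id": subcategory_ids[sub],
    }
    for row, (cat, sub) in zip(raw_rows, pairs)
]
```

Storing each category name once and referring to it by id is what lets the campaign table use foreign keys later in the load step.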

Campaign

Contacts Transformation


Load

The cleaned and transformed CSV files (category.csv, subcategory.csv, campaign.csv, contacts.csv) were loaded into a PostgreSQL database using SQL CREATE TABLE and COPY commands. Primary and foreign keys were defined according to the ERD created by Amanda to enforce referential integrity.
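
The load order and key constraints can be illustrated with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL so that it runs anywhere; in PostgreSQL, the row inserts would be replaced by COPY. The column names (`cf_id`, `company_name`, `category_id`) and the sample rows are assumptions for illustration, not the project's actual schema.

```python
import csv
import io
import sqlite3

# Hypothetical file contents; the real CSVs are not reproduced in this README.
category_csv = "category_id,category\ncat1,food\ncat2,music\n"
campaign_csv = "cf_id,company_name,category_id\n1,Acme,cat1\n2,Globex,cat2\n"

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE category (
        category_id TEXT PRIMARY KEY,
        category    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE campaign (
        cf_id        INTEGER PRIMARY KEY,
        company_name TEXT NOT NULL,
        category_id  TEXT NOT NULL REFERENCES category (category_id)
    )
""")


def load_csv(table, text):
    """Insert every data row of a CSV string into the named table."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})",
        reader,
    )


# Load the parent table first so the foreign keys in campaign can resolve.
load_csv("category", category_csv)
load_csv("campaign", campaign_csv)

# A campaign row that points at a missing category is rejected.
try:
    conn.execute("INSERT INTO campaign VALUES (3, 'Initech', 'cat9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM campaign").fetchone()[0]
```

The same ordering applies to the real files: lookup tables such as category and subcategory have to be loaded before any table that references them.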


Data Analysis


With a structured schema in PostgreSQL:


Conclusion

This project successfully demonstrated the use of ETL processes to clean, structure, and analyze real-world data. Each team member's contribution was vital to creating a normalized database ready for insightful queries and analysis.

Let's Connect

Whether you'd like to collaborate, have a question, or just want to say hello — I’d love to hear from you!

Reach me at: