This project implements a complete Extract, Transform, Load (ETL) pipeline for a real-world crowdfunding dataset sourced from Excel spreadsheets. It cleans, normalizes, and restructures the raw data into a well-organized PostgreSQL relational database, so that the data is easy to query and analyze.
The raw data was extracted from the following Excel files:
Process:
The transformation phase included restructuring the data into normalized formats across several DataFrames:
The cleaned and transformed CSV files (category.csv, subcategory.csv, campaign.csv, contacts.csv) were loaded into a PostgreSQL database using SQL 'CREATE TABLE' and 'COPY' commands. Primary keys and foreign keys were defined according to the ERD created by Amanda, which ensures referential integrity between the tables.
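Because the tables are linked by foreign keys, the order in which the CSV files are loaded matters: a referenced table has to be populated before any table that points to it. The sketch below shows one way to work out that order and generate the matching 'COPY' statements. The relationships in FOREIGN_KEYS and the /path/to/ file locations are assumptions made for illustration; the actual relationships are whatever the ERD defines.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each table maps to the tables it references.
# The real relationships come from the project's ERD, which is not shown here.
FOREIGN_KEYS = {
    "category": [],
    "subcategory": [],
    "contacts": [],
    "campaign": ["contacts", "category", "subcategory"],
}


def load_order(foreign_keys):
    """Return table names ordered so that referenced tables come first."""
    return list(TopologicalSorter(foreign_keys).static_order())


def copy_statement(table, csv_path):
    """Build a PostgreSQL COPY statement for a CSV file with a header row."""
    return f"COPY {table} FROM '{csv_path}' WITH (FORMAT csv, HEADER true);"


order = load_order(FOREIGN_KEYS)

# Placeholder paths; substitute the real locations of the exported CSV files.
statements = [copy_statement(table, f"/path/to/{table}.csv") for table in order]

for statement in statements:
    print(statement)
```

Under these assumed relationships, campaign is always loaded last. Note that a server-side 'COPY' reads the file from the database server's filesystem; when the CSV files live on the client machine, psql's \copy meta-command is the usual alternative.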
With a structured schema in PostgreSQL:
This project successfully demonstrated the use of ETL processes to clean, structure, and analyze real-world data. Each team member's contribution was vital to creating a normalized database ready for insightful queries and analysis.
Whether you'd like to collaborate, have a question, or just want to say hello, I'd love to hear from you!