Bookstore ETL Pipeline Part 1: On-Premises

The Story

Objective

Tools Used

Process

  • How often do we need to place orders for low inventory?
  • What kinds of books are we selling the fastest?
  • What data types should be considered for each data point?
  • How much storage does our database have?
  • What database design can account for future expansion as the business grows?
PostgreSQL Query to Create the Database Table
Newly Created Database Table
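The original query is shown only as a screenshot, so the column list below is a hypothetical reconstruction. It defines a `book_orders` table with title, category, price, quantity and order date, which are the fields pointed to by the questions above and the Price/Quantity checks later in the post. The DDL uses types PostgreSQL accepts (`VARCHAR`, `NUMERIC(10, 2)`, `DATE`). The helper runs it through any DB-API connection, such as one returned by `psycopg2.connect(...)`. For a quick local test the sketch uses Python's built-in `sqlite3`, which also accepts this statement. That stand-in is an assumption and not part of the original setup.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not the original's.
CREATE_BOOK_ORDERS = """
CREATE TABLE IF NOT EXISTS book_orders (
    order_id   INTEGER PRIMARY KEY,
    title      VARCHAR(255) NOT NULL,
    category   VARCHAR(100),
    price      NUMERIC(10, 2) CHECK (price >= 0),   -- exact decimal for money
    quantity   INTEGER CHECK (quantity >= 0),
    order_date DATE
)
"""


def create_table(conn) -> None:
    """Run the DDL on a DB-API 2.0 connection (psycopg2 or sqlite3) and commit."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_BOOK_ORDERS)
        conn.commit()
    finally:
        cur.close()


if __name__ == "__main__":
    # Local stand-in for PostgreSQL; with psycopg2 you would pass psycopg2.connect(...) instead.
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    print([row[1] for row in conn.execute("PRAGMA table_info(book_orders)")])
```

`NUMERIC(10, 2)` keeps prices exact where a floating-point type would not. The `CHECK` constraints reject negative prices or quantities at the database level, which adds a second line of defence behind the pandas checks.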
  • Pandas – data manipulation and data quality checks
  • NumPy – condition-based data quality checks
  • datetime – formatting date and time strings
  • time – timing code execution
  • SQLAlchemy – database interaction
  • psycopg2 – PostgreSQL database adapter (driver)
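The `time` and `datetime` roles in the list above can be combined in a small helper. It stamps each pipeline step with a readable start time and measures how long the step ran. This is a minimal sketch. `run_timed` is a hypothetical helper name. `time.perf_counter()` is used because it is the standard library's clock for measuring short intervals.

```python
import time
from datetime import datetime


def run_timed(step_name, func, *args, **kwargs):
    """Call func(*args, **kwargs), log a timestamped duration, and return (result, seconds)."""
    started = datetime.now()
    t0 = time.perf_counter()  # monotonic, high-resolution interval clock
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    # datetime.strftime turns the start time into a readable log string
    stamp = started.strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{stamp}] {step_name} finished in {elapsed:.3f} s")
    return result, elapsed


if __name__ == "__main__":
    total, seconds = run_timed("sum quantities", sum, [3, 1, 4])
```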
Shipment Order Upload Folder
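The screenshot suggests shipment orders arrive as files dropped into an upload folder. The post doesn't show the file format, so this sketch assumes CSV files with a header row. Under that assumption, the extract step needs only the standard library. It collects every `*.csv` in the folder in a stable order and tags each row with the file it came from, so bad records can be traced back to a specific shipment. `extract_orders` and the `source_file` field are hypothetical names.

```python
import csv
from pathlib import Path


def extract_orders(upload_dir):
    """Read every CSV in upload_dir into a list of dicts, one per data row."""
    rows = []
    # sorted() makes the processing order deterministic across runs
    for path in sorted(Path(upload_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["source_file"] = path.name  # lineage: which upload produced this row
                rows.append(row)
    return rows
```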
Transformation Function
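The transformation function itself only appears as an image. The pandas sketch below shows a plausible shape for it. It assumes column names (`Title`, `Category`, `Order Date`) and a `YYYY-MM-DD` date format. The steps are:

1. Normalise the headers.
2. Trim stray whitespace.
3. Standardise category casing.
4. Parse dates.
5. Drop exact duplicate rows.

Treat every step as illustrative, not as a copy of the original.

```python
import pandas as pd


def transform_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw shipment-order DataFrame (hypothetical columns)."""
    out = df.copy()
    # "Order Date " -> "order_date": lower-case, trimmed, snake_case headers
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    out["title"] = out["title"].str.strip()
    out["category"] = out["category"].str.strip().str.title()
    # Parse ISO dates; a malformed date raises instead of passing through silently
    out["order_date"] = pd.to_datetime(out["order_date"], format="%Y-%m-%d").dt.date
    return out.drop_duplicates().reset_index(drop=True)
```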
Checking for strings in the Price and Quantity columns
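According to the tools list, pandas and NumPy handle the data quality checks. The sketch below shows one way to catch text in the Price and Quantity columns; it is not the post's exact code. `pd.to_numeric(..., errors="coerce")` turns anything unparseable into `NaN`. `np.where` then labels each row by whether any originally non-empty value was lost in that conversion. The lower-case column names and the `quality_flag` column are assumptions.

```python
import numpy as np
import pandas as pd


def flag_non_numeric(df: pd.DataFrame, columns=("price", "quantity")) -> pd.DataFrame:
    """Coerce the given columns to numbers and flag rows where any value was text."""
    out = df.copy()
    bad = np.zeros(len(out), dtype=bool)
    for col in columns:
        numeric = pd.to_numeric(out[col], errors="coerce")  # "abc" -> NaN
        # Bad = the cell had a value originally but it did not survive conversion;
        # cells that were already empty are left unflagged.
        bad |= (numeric.isna() & out[col].notna()).to_numpy()
        out[col] = numeric
    out["quality_flag"] = np.where(bad, "non_numeric", "ok")
    return out


if __name__ == "__main__":
    orders = pd.DataFrame({"price": ["12.99", "abc", "5"], "quantity": ["3", "2", "two"]})
    checked = flag_non_numeric(orders)
    print(checked[checked["quality_flag"] == "non_numeric"])
```

Flagging bad rows instead of dropping them keeps them visible. They can then be written to a reject file and corrected at the source, rather than silently disappearing from the sales figures.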

Reflection

Future Enhancements

Full ETL Script
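The full script isn't reproduced here, but the stages described above fit together roughly as follows. This condensed skeleton rests on several assumptions:

- The function names are hypothetical.
- The transform keeps only the numeric checks.
- It loads into an in-memory SQLite database via `DataFrame.to_sql` so it can run anywhere.

Against the PostgreSQL table, the connection would instead be a SQLAlchemy engine such as `create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/DBNAME")`, with the placeholders filled in.

```python
import sqlite3
import time
from pathlib import Path

import pandas as pd


def extract(upload_dir: Path) -> pd.DataFrame:
    """Concatenate every CSV in the upload folder into one DataFrame."""
    frames = [pd.read_csv(p) for p in sorted(upload_dir.glob("*.csv"))]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce price/quantity to numbers and keep only rows where both parsed."""
    if df.empty:
        return df
    out = df.copy()
    for col in ("price", "quantity"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.dropna(subset=["price", "quantity"]).reset_index(drop=True)


def load(df: pd.DataFrame, conn, table: str = "book_orders") -> int:
    """Append rows to the target table and return how many were written."""
    if df.empty:
        return 0
    df.to_sql(table, conn, if_exists="append", index=False)
    return len(df)


def run_pipeline(upload_dir: Path, conn) -> int:
    """Run extract -> transform -> load once and report the elapsed time."""
    start = time.perf_counter()
    loaded = load(transform(extract(upload_dir)), conn)
    print(f"Loaded {loaded} rows in {time.perf_counter() - start:.2f} s")
    return loaded


if __name__ == "__main__":
    # sqlite3 stands in for PostgreSQL so the sketch runs without a server;
    # "shipment_orders" is a hypothetical upload folder name.
    with sqlite3.connect(":memory:") as conn:
        run_pipeline(Path("shipment_orders"), conn)
```

Each stage is a separate function, so each one can be timed on its own. The load target can also be swapped without touching extract and transform.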
