Can you architect a Batch ETL pipeline on AWS?

Last updated: Tue Nov 09 2021

Skills
ETL
data-wrangling
Tools
aws
draw.io

Learning Objective

After a successful attempt at the scenario, you will be able to:

  1. Understand migration from on-premises infrastructure to the AWS cloud.

  2. Architect a Batch ETL pipeline to move data to a warehouse.

  3. Express the importance of the 5 pillars of architecture.


Problem Statement

GlobalMart is one of the leading e-commerce giants, with a presence in North America and Europe. It operates across 120 markets and deals primarily in 3 lines of business:

  • Technology
  • Office Supplies
  • Furniture

With business growing rapidly, GlobalMart is seeing a sharp jump in the number of customers registering on its website across countries. This sudden increase in reach is putting heavy pressure on its current IT infrastructure. GlobalMart would also like to use the data it has collected over the years to develop data-driven strategies.

The as-is IT infrastructure and challenges at GlobalMart are as follows:

  • The company has a front-end application where customers can log in, browse products, place orders, and track and return them if needed (Don't worry about migrating the application)
  • An on-premises relational database located in San Francisco stores order, customer, vendor, transaction, and product information (Do you think moving the relational database to the cloud is a good idea?)
  • The company is poised to expand its product footprint and wishes to add more variety to its existing lines of products. As product variety increases, keeping the same table structure for new products is no longer feasible.
    • The data structure (schema) used to store product information needs to be flexible enough to accommodate changing product information (Would you still go with a relational database for this requirement?)
  • The company does not have a clean, processed store of data, so there is no single source of truth. With its growing dependency on data, GlobalMart wishes to have a single source of truth developed and deployed for its analysts to refer to and track KPIs. (What is the best possible solution for a single source of truth?)
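
To make the flexible-schema concern concrete, here is a minimal sketch that is not part of the original challenge. It assumes three hypothetical product records (the IDs and attribute names are invented for illustration) and counts how many empty cells a single fixed-width product table would need to hold them. It does not prescribe a particular database; it only shows why one rigid table structure gets awkward as product variety grows.

```python
import json

# Hypothetical product records: each line of business carries different attributes.
products = [
    {"product_id": "TEC-001", "category": "Technology",
     "attributes": {"screen_size_in": 15.6, "ram_gb": 16}},
    {"product_id": "OFF-001", "category": "Office Supplies",
     "attributes": {"sheets_per_pack": 500}},
    {"product_id": "FUR-001", "category": "Furniture",
     "attributes": {"material": "oak", "seats": 4}},
]


def attribute_columns(records):
    """Return the sorted union of attribute names a single fixed table would need."""
    columns = set()
    for record in records:
        columns.update(record["attributes"])
    return sorted(columns)


def to_fixed_row(record, columns):
    """Flatten one record into a fixed-width row; missing attributes become None."""
    return [record["product_id"], record["category"]] + [
        record["attributes"].get(column) for column in columns
    ]


columns = attribute_columns(products)
rows = [to_fixed_row(product, columns) for product in products]
null_cells = sum(cell is None for row in rows for cell in row)

print("Attribute columns needed:", columns)
print("Empty cells in the fixed table:", null_cells)
# The same record kept as a self-describing document needs no schema change.
print(json.dumps(products[2]))
```

Every new product type adds columns that stay empty for all other products, whereas the document form only stores the attributes each product actually has.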

If you were assigned as the data engineering manager, how would you go about meeting these requirements?

  • Suggest how the AWS Cloud can help GlobalMart address its infrastructure issues in order to cope with business demand.
  • How will you implement pipeline security and resource monitoring?
  • Finally, architect the whole system to demonstrate a proof of concept (POC).
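
As a reminder of what a batch ETL run does, the sketch below walks through the extract, transform, and load steps on a toy scale. It is an illustration under stated assumptions, not a solution to the challenge: two in-memory SQLite databases stand in for the on-premises source and the warehouse, and the `orders` and `revenue_by_category` tables, their columns, and the sample rows are all hypothetical. No AWS service is used or implied.

```python
import sqlite3


def extract(source):
    """Read every order row from the stand-in operational database."""
    query = "SELECT order_id, customer_id, category, amount FROM orders"
    return source.execute(query).fetchall()


def transform(rows):
    """Drop rows with a missing amount and aggregate revenue per category."""
    totals = {}
    for _order_id, _customer_id, category, amount in rows:
        if amount is None:
            continue  # basic data cleaning: skip incomplete records
        totals[category] = totals.get(category, 0.0) + amount
    return sorted(totals.items())


def load(warehouse, totals):
    """Replace the summary table in the stand-in warehouse with the new batch."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_category "
        "(category TEXT PRIMARY KEY, revenue REAL)"
    )
    warehouse.execute("DELETE FROM revenue_by_category")  # full refresh per batch
    warehouse.executemany("INSERT INTO revenue_by_category VALUES (?, ?)", totals)
    warehouse.commit()


# Hypothetical source data standing in for the on-premises database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER, customer_id INTEGER, category TEXT, amount REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Technology", 1200.0),
        (2, 11, "Furniture", 350.0),
        (3, 10, "Technology", 300.0),
        (4, 12, "Office Supplies", None),
    ],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
result = warehouse.execute(
    "SELECT category, revenue FROM revenue_by_category ORDER BY category"
).fetchall()
print(result)
```

Because `load` refreshes the whole summary table, rerunning the same batch leaves the warehouse in the same state, which is one simple way to keep a scheduled job safe to retry.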



Created with 💙 by Soma, Cloud Engineer, Mentorskool