HomeChallengesData Wrangling
learnbook poster
Data WranglingLast updated at Tue Oct 12 2021Skills

What's there to learn?

Let's explore a simple scenario :

  • You are working with Simplekart, an e-commerce company as a Data Analyst and have been tasked to analyze their data in helping understand the vendor performance starting from the year 2017.

  • Your team uses Excel for all your analysis till now as it is simple and as there are some non-technical teammates who find it easier.

  • But the analysis eats up a lot of your time every time it has to be done at the end of each quarter. The steps are similar and will be repeated in identifying the pain points each quarter.

  • Your lead has asked you to explore PowerQuery which is an integral part of Excel itself and helps in reducing a lot of effort and time every quarter.

Your Takeaways

  • Get a high level idea of Power Query

    • What's the need for it?
    • Where is it helpful?
    • How does it work?
    • What are the operations that can be done?
  • Doing simple data operations like

    • MERGE: Merging 2 worksheets/tables through a common column and by selecting required columns.
    • APPEND: Appending 2 or more worksheets/tables with multiple rows into a single table.
    • PIVOT: It is used to pivot selected column in the table.
    • GROUPING: It is used to group using a single/multiple columns and create a new table.
    • CONDITIONAL COLUMN: It is used to add a new column using conditions.
    • CUSTOM COLUMN: It is used to add a new column using a formula.

Before you start...

  • Ensure that you have a working Excel 2016 version or beyond
  • You have a basic understanding of excel operations.
  • Download the data from here.

Activity 1 | Knowing PowerQuery

  • Here's the playlist to binge watch to get an understanding of what PowerQuery is and the operations that can be performed.

  • You can watch the videos 9,11,23,30 & 31 for the topics covered in the learnbook.

Basics Done?

Let's get to action!

Activity 2 | Working on PowerQuery

  • Use the workbook Monthwise_Orders_2017 from the downloaded files.

  • The order details are present in 12 different sheets with one for each month. Please combine them together to get Quarter wise order details in a new workbook as follows:

    • Quarter 1 (Q1) - January, February and March
    • Quarter 2 (Q2) - April, May and June
    • Quarter 3 (Q3) - July, August and September
    • Quarter 4 (Q4) - October, November and December
  • After the appending is done, merge the quarter wise sheets (for all 4 quarters) with Returns_2017 using Order ID to get return status. Populate with No for all null values (wherever a record is not present in the "Returns_2017" workbook).

  • Now, merge these output sheets with Vendors using Vendor ID to get vendor name.

  • Name the final output as Quarterwise_Orders_Combined_2017.

Activity 3 | More operations in PowerQuery

  • Use the workbook Quarterwise_Orders_Combined_2017.

  • The order details are present in 4 different sheets with one for each quarter. Do a quarter wise analysis as below.

  • If delivery days = Ship date - Order date and

    • Ship mode = Same day (zero delivery days)
    • Ship mode = First class (2 delivery days)
    • Ship mode = Second class (3 delivery days)
    • Ship mode = Standard class (4 delivery days)
  • Create a column called Delivery_delay and populate it as follows:

    Delivery_delay = delivery days - (delivery days by ship mode).

  • Create a column called Delivery_status and populate as follows:

    • Delivery_delay > 0, Delivery_status = Delayed
    • Delivery_delay <= 0, Delivery_status = On Time
  • Get On Time % for different vendors using the newly created column Delivery_delay as

    • On Time % = (On Time deliveries)/(Total deliveries)

Pretty good job!

Now take some time off and think of what all questions you can ask this data?

Take a stroll in park or listen to good music, invigorate yourself and come back for solving the questions!

Question :

1. Remove all the orders that were returned and calculate the actual profit or loss.

2. How can you handle the different types of errors?

3. Which segment is more profitable - corporate or consumer or home office?

4. Which is the category that has the highest returns?

Created with 💙 by
author avatar
Sheetal Mahankali
Data Analyst,Freelancer
mentorskool logo
Mentee Today, Mentor Tomorrow
No 206, A Block, Sonesta Silver Oak,Varthur, Bangalore 560066,Karnataka
Copyright - 2022 © Mentorskool - All rights reserved.