POC Using SageMaker AutoPilot
Last updated at Wed Oct 27 2021

Skills
Sampling
machine-learning
Tools
aws
sagemaker

Learning Objectives


  • Stratified Sampling in Excel
  • Building a Proof of Concept (POC) with SageMaker Autopilot

Scenario

Your client has approached you with a problem and is looking for a potential solution that uses machine learning. Since this is only a prototype, they have shared a small dataset. The objective for now is to assess the value and come up with a feasible solution in as little time as possible. You have probably heard the phrase "In a data science cycle, 80% of the time goes into data exploration and pre-processing". Well, that's true. Since we are only developing a proof of concept, we need to show the value with the least possible effort. How would you do that? One possible approach is to use AutoML.

Great, you have a path. But which framework would you use? This learnbook helps you explore that question while you build a proof of concept.


SageMaker Autopilot

Even after deciding to use AutoML, there are several frameworks to choose from. Building the POC in the cloud comes with several advantages:

  • Decoupling between storage and compute
  • Automatic data pre-processing & feature-engineering
  • Build and tune up to 250 models at scale
  • Leaderboard to track all the models in a single place
  • Automatically generated notebooks and feature importance

All of these help you reach a feasible solution very quickly.

But like everything, there are a few cons here as well.

  • It's important to monitor the cost of compute instances, or else there is a high chance of burning money.
  • We cannot use SageMaker Autopilot if we have fewer than 500 data points. If we do have fewer, we first need to augment the data.
  • To use AutoML well, a person first needs good knowledge of ML, since several decisions, such as which metric to optimize, still have to be made.

Data

Let's use this dataset for our experiments on SageMaker.

Credits: Thanks to GregKondla for this good dataset


Sampling

Since we are running this experiment on the AWS cloud, let's pick a sample size of 1000 to control cost. Since this is a binary classification problem, it is important to maintain the class proportions of the target variable. Hence we shall use Excel to do stratified sampling. Please watch the video below if you don't know how to perform stratified sampling in Excel.

Credits: Thanks to Rajesh Dorbala for making this video available to us.
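
For readers who prefer code to a spreadsheet, the same idea can be sketched in Python. This is not the Excel procedure from the video, only a minimal illustration of proportional stratified sampling using the standard library. The column name `label`, the toy data and the function name `stratified_sample` are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(rows, target, n, seed=42):
    """Return about n rows, keeping each target class in its original proportion."""
    rng = random.Random(seed)
    # Group the rows by their value in the target column (one group per class).
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)

    total = len(rows)
    sample = []
    for members in groups.values():
        # Each class contributes its share of n, rounded to the nearest integer.
        k = min(len(members), round(n * len(members) / total))
        sample.extend(rng.sample(members, k))
    rng.shuffle(sample)
    return sample


# Toy data (hypothetical): 80% of rows in class "0", 20% in class "1".
rows = [{"id": i, "label": "1" if i % 5 == 0 else "0"} for i in range(5000)]
sample = stratified_sample(rows, target="label", n=1000)
```

Because each class share is rounded, the result can differ from n by a few rows on real data. Rows read with `csv.DictReader` have the same dictionary shape, so a CSV file can be passed through the function unchanged.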


Running Experiments with SageMaker Autopilot

  • Create an S3 bucket and upload the sampled data to it. If you do not know how to do this, follow this video.

Credits: Thanks to Pratik Anjay for making this video available to us.
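
The upload can also be scripted. The sketch below assumes the bucket already exists, that AWS credentials are configured locally, and that the `boto3` package is installed. The bucket name and object key in the usage note are placeholders, not values from this learnbook.

```python
def s3_uri(bucket, key):
    """Build the s3:// URI that points at the uploaded object."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_sample(local_path, bucket, key):
    """Upload the sampled CSV to an existing bucket and return its S3 URI."""
    import boto3  # imported here so s3_uri works without the AWS SDK installed

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)  # arguments: Filename, Bucket, Key
    return s3_uri(bucket, key)
```

For example, `upload_sample("sample.csv", "my-poc-bucket", "input/sample.csv")` would return `s3://my-poc-bucket/input/sample.csv`, which is the location to select when creating the experiment.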

  • Now that your data is in S3, set up SageMaker Studio to start the experiments. If you don't know how to create a Studio, see this video. Creating the Studio will take a while.
  • From the Launcher, create a new Autopilot experiment.
  • To create an Autopilot experiment:
    1. Give the experiment a unique name.
    2. Choose the S3 bucket and the file in which you uploaded the sampled data.
    3. Choose the target variable.
    4. Specify the S3 bucket and prefix in which all outputs of the experiment, including models and reports, will be stored.
    5. Either let SageMaker infer what kind of problem we are working on, or choose the problem type ourselves.
    6. Define the metric to optimize. We can bring our own metrics if required.
    7. To improve security, we can run SageMaker in a VPC. We can also define encryption keys as well as IAM roles. For now, we are leaving them as they are.
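
The same inputs can be passed through the AWS SDK instead of the Studio form. The sketch below maps the numbered steps above onto the request for boto3's `create_auto_ml_job` call. The function names and all values in the usage note are hypothetical, and the candidate limit of 250 is simply the figure quoted in this learnbook. As a design choice of this sketch, the problem type and metric are sent only when both are given, so that Autopilot infers them otherwise.

```python
def build_autopilot_request(job_name, input_uri, target, output_uri, role_arn,
                            problem_type=None, metric=None, max_candidates=250):
    """Assemble keyword arguments for SageMaker's create_auto_ml_job call."""
    request = {
        "AutoMLJobName": job_name,                  # step 1: unique name
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_uri,                 # step 2: sampled data in S3
            }},
            "TargetAttributeName": target,          # step 3: target variable
        }],
        "OutputDataConfig": {"S3OutputPath": output_uri},  # step 4: output prefix
        "AutoMLJobConfig": {
            "CompletionCriteria": {"MaxCandidates": max_candidates},
        },
        "RoleArn": role_arn,                        # step 7: IAM role
    }
    # Steps 5 and 6: include problem type and metric only when both are chosen.
    if problem_type and metric:
        request["ProblemType"] = problem_type
        request["AutoMLJobObjective"] = {"MetricName": metric}
    return request


def start_experiment(request):
    """Submit the request; needs boto3 and AWS credentials."""
    import boto3

    return boto3.client("sagemaker").create_auto_ml_job(**request)
```

A call such as `build_autopilot_request("poc-1", "s3://my-poc-bucket/input/sample.csv", "label", "s3://my-poc-bucket/output/", role_arn, problem_type="BinaryClassification", metric="F1")` produces a dictionary that can be inspected before anything is submitted and billed.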


  • Once we finalize the inputs and start the experiment, Autopilot preprocesses the data, builds the models, tunes them and generates explainability reports. This will take a couple of hours, so sit back and be amazed at how SageMaker Autopilot builds up to 250 models so quickly.

Once SageMaker finishes the experiment, you can find the reports in the specified S3 location.
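
The leaderboard can also be read programmatically. The sketch below ranks the candidate entries returned by boto3's `list_candidates_for_auto_ml_job`. The function names are hypothetical, and `fetch_candidates` returns only the first page of results, so a long candidate list would need pagination with `NextToken`.

```python
def leaderboard(candidates, top=5):
    """Rank completed candidates by their final objective metric, best first."""
    scored = [c for c in candidates
              if c.get("CandidateStatus") == "Completed"
              and "FinalAutoMLJobObjectiveMetric" in c]
    if not scored:
        return []
    # "Type" says whether the metric is maximized (e.g. F1) or minimized (e.g. MSE).
    metric_type = scored[0]["FinalAutoMLJobObjectiveMetric"].get("Type", "Maximize")
    scored.sort(key=lambda c: c["FinalAutoMLJobObjectiveMetric"]["Value"],
                reverse=(metric_type == "Maximize"))
    return [(c["CandidateName"], c["FinalAutoMLJobObjectiveMetric"]["Value"])
            for c in scored[:top]]


def fetch_candidates(job_name):
    """Return the first page of candidates; needs boto3 and AWS credentials."""
    import boto3

    sm = boto3.client("sagemaker")
    return sm.list_candidates_for_auto_ml_job(AutoMLJobName=job_name)["Candidates"]
```

Calling `leaderboard(fetch_candidates("poc-1"))` with your own experiment name gives a short list of (candidate name, metric value) pairs to include in the POC report.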


Conclusion

  • Wow, we just built several models, demonstrating the robustness and elasticity of Amazon SageMaker. We can use these results:
    • To go back to the client with the results as the POC.
    • To deploy the model very quickly if required (deployment can be enabled while creating the SageMaker Autopilot experiment).
    • To get a jumpstart on the full cycle as and when the POC is approved.

When you finish your experiments, make sure you delete all compute instances as well as SageMaker Studio to save cloud costs.
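
Part of the cleanup can be scripted. The sketch below covers only one kind of resource: endpoints that exist if deployment was enabled for the experiment. It uses boto3's `list_endpoints` and `delete_endpoint` calls on a client that is passed in. The function name and the name filter are assumptions, and Studio apps and the Studio itself still have to be removed separately.

```python
def delete_endpoints(sm_client, name_contains):
    """Delete every endpoint whose name contains the given text; return the names."""
    deleted = []
    # Only the first page of results is handled in this sketch.
    response = sm_client.list_endpoints(NameContains=name_contains)
    for endpoint in response["Endpoints"]:
        sm_client.delete_endpoint(EndpointName=endpoint["EndpointName"])
        deleted.append(endpoint["EndpointName"])
    return deleted
```

With credentials configured, `delete_endpoints(boto3.client("sagemaker"), "poc-1")` would remove the endpoints whose names contain `poc-1`. Passing the client in as an argument also makes it possible to try the function against a stand-in object before touching a real account.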

Quiz Time!

Reflect a bit on your reading and take this quiz here to validate your learning



Created with 💙 by Soma, Cloud Engineer, Mentorskool