

Soledad Galli, PhD

Instructor


Sole is a lead data scientist, instructor, and developer of open-source software. She created and maintains Feature-engine, a Python library for feature engineering that lets you impute missing data, encode categorical variables, and transform, create, and select features. Sole is also the author of the "Python Feature Engineering Cookbook", published by Packt.

Course Description


Welcome to Machine Learning with Imbalanced Datasets. In this course, you will learn multiple techniques that you can apply to imbalanced datasets to improve the performance of your machine learning models.


If you are working with imbalanced datasets right now and want to improve the performance of your models, or you simply want to learn more about how to tackle data imbalance, this course will show you how.


We'll take you step by step through engaging video tutorials and teach you everything you need to know about working with imbalanced datasets. Throughout this comprehensive course, we cover almost every available methodology for working with imbalanced datasets, discussing the logic behind each technique, its implementation in Python, its advantages and shortcomings, and the considerations to keep in mind when using it. Specifically, you will learn:


  • Under-sampling methods that remove observations either at random or by targeting specific groups of samples
  • Over-sampling methods that duplicate observations at random, and those that create new examples based on existing observations
  • Ensemble methods that combine multiple weak learners with sampling techniques to boost model performance
  • Cost-sensitive methods that penalize misclassifications of the minority class more heavily
  • The appropriate metrics to evaluate model performance on imbalanced datasets
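
To give a flavor of these ideas, here is a minimal sketch in plain Python. It is not taken from the course materials: the function names are invented for this illustration, the toy data is made up, and the class-weight formula is the commonly used "balanced" heuristic n_samples / (n_classes * class_count), assumed here as one reasonable choice among several.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label (helper for this sketch)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows so every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = rng.choices(rows, k=target - len(rows))  # sampled with replacement
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    counts = Counter(y)
    return {label: len(y) / (len(counts) * n) for label, n in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions recover."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0


# Toy dataset: 95 majority-class rows (label 0) and 5 minority-class rows (label 1).
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
weights = balanced_class_weights(y)

# A "model" that always predicts the majority class looks good on accuracy
# but never finds a single minority-class observation.
always_majority = [0] * len(y)

print("under-sampled counts:", Counter(y_under))
print("over-sampled counts:", Counter(y_over))
print("class weights:", weights)
print("accuracy:", accuracy(y, always_majority))
print("minority recall:", recall(y, always_majority))
```

On this toy data the always-majority baseline scores 95% accuracy with zero recall on the minority class, which is why accuracy alone is a poor metric for imbalanced problems. The sampling functions here cover only the random variants; the targeted and synthetic methods mentioned in the list need more machinery than a short sketch can show.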


By the end of the course, you will be able to decide which technique suits your dataset, and to apply the different methods and compare the performance improvements they deliver on multiple datasets.


This comprehensive machine learning course includes over 50 lectures spanning more than 10 hours of video. Every topic includes hands-on Python code examples that you can use for reference and practice, and reuse in your own projects.


In addition, the code is updated regularly to keep up with new trends and new Python library releases.


Example Curriculum

  Welcome
  Machine Learning with Imbalanced Data: Overview
  Evaluation Metrics
  Undersampling
  Oversampling
  Over and Undersampling
  Ensemble Methods
  Cost Sensitive Learning
  Probability Calibration
  Putting it all together
  Next steps