Watch the intro video
Note: if you can't see the video, you may need to allow cookies or disable your ad blocker.
Soledad Galli, PhD
Instructor
Sole is a lead data scientist, instructor, and open-source software developer. She created and maintains Feature-engine, a Python library for feature engineering that allows us to impute data, encode categorical variables, and transform, create, and select features. Sole is also the author of the book "Python Feature Engineering Cookbook", published by Packt.
Course description
Welcome to the most comprehensive course on feature engineering available online. In this course, you will learn about variable imputation, variable encoding, feature transformation, discretization, and how to create new features from your data.
Specifically, you will learn:
- How to impute missing data
- How to encode categorical variables
- How to transform numerical variables and change their distribution
- How to perform discretization
- How to remove outliers
- How to extract features from date and time
- How to create new features from existing ones
While most online courses teach only the very basics of feature engineering, like imputing variables with the mean or encoding categorical variables with one-hot encoding, this course will teach you that, and much, much more.
In this course, you will first learn the most popular and widely used feature engineering techniques, like mean and median imputation, one-hot encoding, logarithmic transformation, and discretization. Then, you will discover more advanced methods that capture information while encoding or transforming your variables to improve the performance of machine learning models.
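To give a flavor of the basic techniques mentioned above, here is a minimal pure-Python sketch of mean imputation, one-hot encoding, and the logarithmic transformation. The data and variable names are invented for illustration; the course itself demonstrates these methods with pandas, Scikit-learn, and Feature-engine.

```python
import math

# Toy numeric column with missing values (None); data invented for illustration.
ages = [25, 32, None, 47, None, 51]

# Mean imputation: replace missing values with the mean of the observed ones.
observed = [a for a in ages if a is not None]
mean_age = sum(observed) / len(observed)
imputed = [a if a is not None else mean_age for a in ages]

# One-hot encoding: one binary column per category.
colors = ["red", "blue", "red", "green"]
categories = sorted(set(colors))
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Logarithmic transformation: reduces right skew (needs strictly positive values).
incomes = [20_000, 35_000, 120_000]
log_incomes = [math.log(v) for v in incomes]
```

The more advanced methods covered later in the course build on the same idea: replace or re-express a variable's values so that models can use them more effectively.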
The methods that you will learn were described in scientific articles, are used in data science competitions, and are commonly utilized in organizations. And what’s more, they can be easily implemented by utilizing Python's open-source libraries!
Throughout the lectures, you’ll find detailed explanations of each technique and a discussion about their advantages, limitations, and underlying assumptions, followed by the best programming practices to implement them in Python.
By the end of the course, you will be able to decide which feature engineering technique you need based on the variable characteristics and the models you wish to train. And you will also be well placed to test various transformation methods and let your models decide which ones work best.
This comprehensive feature engineering course contains over 100 lectures spread across approximately 10 hours of video, and ALL topics include hands-on Python code examples that you can use for reference, practice, and reuse in your own projects.
Course Curriculum
- Variable characteristics (2:44)
- Missing data (6:46)
- Cardinality - categorical variables (5:04)
- Rare labels - categorical variables (4:54)
- Linear model assumptions (9:13)
- Linear model assumptions - additional reading resources (optional)
- Variable distribution (5:08)
- Outliers (8:27)
- Variable magnitude (3:09)
- Variable characteristics and machine learning models
- Additional reading resources
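The variable characteristics listed in this section (missing data, cardinality, rare labels, outliers) can each be measured with a few lines of code. Below is a hedged pure-Python sketch, with toy data invented for illustration and crude index-based quartiles that are adequate for a sketch; the lectures work through these diagnostics with pandas.

```python
# Toy categorical and numeric columns, invented for illustration.
city = ["London", "Paris", "London", "Oslo", "London", None]
price = [10, 12, 11, 13, 95, 12]

# Fraction of missing data.
missing_frac = sum(v is None for v in city) / len(city)

# Cardinality: number of distinct categories (ignoring missing values).
cardinality = len({v for v in city if v is not None})

# Rare labels: categories below a frequency threshold (here 20%).
counts = {}
for v in city:
    if v is not None:
        counts[v] = counts.get(v, 0) + 1
rare = [k for k, n in counts.items() if n / len(city) < 0.2]

# Outlier boundary with the IQR proximity rule
# (crude index-based quartiles, adequate for a sketch).
s = sorted(price)
q1, q3 = s[len(s) // 4], s[3 * len(s) // 4]
upper = q3 + 1.5 * (q3 - q1)
outliers = [x for x in price if x > upper]
```

Knowing these characteristics up front is what lets you choose among the imputation, encoding, and transformation techniques that the following sections cover.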
- Introduction to missing data imputation (3:51)
- Complete Case Analysis (6:46)
- Mean or median imputation (7:53)
- Arbitrary value imputation (6:39)
- End of distribution imputation (4:53)
- Frequent category imputation (6:56)
- Missing category imputation (4:05)
- Random sample imputation (14:14)
- Adding a missing indicator (5:25)
- Imputation with Scikit-learn (3:43)
- Mean or median imputation with Scikit-learn (5:27)
- Arbitrary value imputation with Scikit-learn (5:04)
- Frequent category imputation with Scikit-learn (5:09)
- Missing category imputation with Scikit-learn (2:50)
- Adding a missing indicator with Scikit-learn (4:06)
- Automatic determination of imputation method with Scikit-learn (8:24)
- Introduction to Feature-engine (6:25)
- Mean or median imputation with Feature-engine (4:51)
- Arbitrary value imputation with Feature-engine (3:30)
- End of distribution imputation with Feature-engine (4:46)
- Frequent category imputation with Feature-engine (1:38)
- Missing category imputation with Feature-engine (3:04)
- Random sample imputation with Feature-engine (2:00)
- Adding a missing indicator with Feature-engine (4:06)
- Complete Case Analysis (CCA) with Feature-engine (6:47)
- Overview of missing value imputation methods
- Conclusion: when to use each missing data imputation method
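The imputation methods listed above follow a handful of simple recipes. As a hedged pure-Python sketch of four of them (toy data invented for illustration; the course implements these with pandas, Scikit-learn, and Feature-engine):

```python
# Toy numeric and categorical columns with missing values, invented for illustration.
values = [3.0, None, 5.0, 4.0, None]
labels = ["a", "b", None, "a", "a"]

observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)
std = (sum((v - mean) ** 2 for v in observed) / len(observed)) ** 0.5

# End-of-distribution imputation: fill with mean + 3 standard deviations.
end_value = mean + 3 * std
end_imputed = [v if v is not None else end_value for v in values]

# Arbitrary value imputation: fill with a fixed sentinel such as -1.
arbitrary_imputed = [v if v is not None else -1 for v in values]

# Frequent category imputation: fill with the mode.
mode = max({l for l in labels if l is not None},
           key=lambda c: labels.count(c))
label_imputed = [l if l is not None else mode for l in labels]

# Missing indicator: a binary flag preserving where data was absent.
indicator = [int(v is None) for v in values]
```

Each recipe makes a different assumption about why the data are missing, which is exactly the trade-off the "when to use each method" lecture discusses.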
- Categorical encoding | Introduction (6:49)
- One-hot encoding (6:09)
- Important: Feature-engine version 1.0.0
- One-hot encoding | Demo (14:12)
- One-hot encoding of top categories (3:06)
- One-hot encoding of top categories | Demo (8:35)
- Ordinal encoding | Label encoding (1:50)
- Ordinal encoding | Demo (8:08)
- Count or frequency encoding (3:11)
- Count encoding | Demo (4:33)
- Target guided ordinal encoding (2:41)
- Target guided ordinal encoding | Demo (8:30)
- Mean encoding (2:16)
- Mean encoding | Demo (5:31)
- Probability ratio encoding (6:13)
- Weight of evidence (WoE) (4:36)
- Weight of Evidence | Demo (12:38)
- Comparison of categorical variable encoding (10:36)
- Rare label encoding (4:31)
- Rare label encoding | Demo (10:25)
- Binary encoding and feature hashing (6:12)
- Additional reading resources
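Two of the target-guided encodings listed above, mean encoding and weight of evidence, can be sketched in plain Python. The data below are invented for illustration, and the formulas shown (category mean of the target; ln of the ratio of a category's share of positives to its share of negatives) match the standard definitions covered in the lectures:

```python
import math

# Toy categorical variable and binary target, invented for illustration.
cat = ["a", "b", "a", "b", "a", "b"]
target = [1, 0, 1, 0, 0, 1]

# Group the target values by category.
groups = {}
for c, t in zip(cat, target):
    groups.setdefault(c, []).append(t)

# Mean (target) encoding: replace each category by the mean of the target.
means = {c: sum(ts) / len(ts) for c, ts in groups.items()}
mean_encoded = [means[c] for c in cat]

# Weight of evidence: ln( P(category | y=1) / P(category | y=0) ).
pos = sum(target)
neg = len(target) - pos
woe = {c: math.log((sum(ts) / pos) / ((len(ts) - sum(ts)) / neg))
       for c, ts in groups.items()}
woe_encoded = [woe[c] for c in cat]
```

Because both encodings use the target, they must be learned on the training set only, a point the demo lectures return to repeatedly.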
- Discretisation | Introduction (3:01)
- Equal-width discretisation (4:06)
- Important: Feature-engine version 1.0.0
- Equal-width discretisation | Demo (11:18)
- Equal-frequency discretisation (4:13)
- Equal-frequency discretisation | Demo (7:16)
- K-means discretisation (4:13)
- K-means discretisation | Demo (2:43)
- Discretisation plus categorical encoding (2:54)
- Discretisation plus encoding | Demo (5:45)
- Discretisation with classification trees (5:05)
- Discretisation with decision trees using Scikit-learn (11:55)
- Discretisation with decision trees using Feature-engine (3:48)
- Domain knowledge discretisation (3:52)
- Additional reading resources
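The contrast between the first two discretisation methods above is easy to see in code. In this hedged pure-Python sketch (toy, right-skewed data invented for illustration), equal-width bins span identical ranges while equal-frequency bins hold roughly equal counts; the lectures implement both with pandas, Scikit-learn, and Feature-engine:

```python
# Toy right-skewed variable, invented for illustration.
x = [1, 2, 2, 3, 3, 4, 10, 20]
n_bins = 2

# Equal-width discretisation: bins of identical range.
lo, hi = min(x), max(x)
width = (hi - lo) / n_bins
equal_width = [min(int((v - lo) / width), n_bins - 1) for v in x]

# Equal-frequency discretisation: bins with (roughly) equal counts,
# here via rank-based assignment.
order = sorted(range(len(x)), key=lambda i: x[i])
per_bin = len(x) // n_bins
equal_freq = [0] * len(x)
for rank, i in enumerate(order):
    equal_freq[i] = min(rank // per_bin, n_bins - 1)
```

On skewed data, equal-width bins leave almost all observations in one interval, while equal-frequency bins split them evenly, which is why the choice between them matters.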
- Feature scaling | Introduction (3:44)
- Standardisation (5:31)
- Standardisation | Demo (4:39)
- Mean normalisation (4:02)
- Mean normalisation | Demo (5:21)
- Scaling to minimum and maximum values (3:24)
- MinMaxScaling | Demo (3:01)
- Maximum absolute scaling (3:01)
- MaxAbsScaling | Demo (3:45)
- Scaling to median and quantiles (2:46)
- Robust Scaling | Demo (2:04)
- Scaling to vector unit length (5:51)
- Scaling to vector unit length | Demo (5:18)
- Additional reading resources
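The two most common scaling methods in this section, standardisation and min-max scaling, reduce to one formula each. A hedged pure-Python sketch with toy data invented for illustration (the course uses Scikit-learn's scalers):

```python
# Toy numeric variable, invented for illustration.
x = [2.0, 4.0, 6.0, 8.0]

# Standardisation: subtract the mean, divide by the standard deviation,
# giving a variable with mean 0 and unit variance.
mean = sum(x) / len(x)
std = (sum((v - mean) ** 2 for v in x) / len(x)) ** 0.5
standardised = [(v - mean) / std for v in x]

# Min-max scaling: map the variable onto the [0, 1] interval.
lo, hi = min(x), max(x)
minmax = [(v - lo) / (hi - lo) for v in x]
```

Like the encoders, scalers learn their parameters (mean, standard deviation, minimum, maximum) from the training set and then apply them unchanged to new data.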