Feature Engineering for Machine Learning
Welcome
Introduction (5:14)
Course curriculum (5:39)
Course requirements (2:28)
How to approach this course (1:56)
Setting up your computer
Refer a friend program
Course material
Course material (2:05)
Download Jupyter notebooks
Download datasets
Download presentations
Resources to learn machine learning skills
Variable types
Variables | Intro (2:37)
Numerical variables (4:05)
Categorical variables (3:43)
Date and time variables (1:50)
Mixed variables (2:17)
Exercise
Variable characteristics
Variable characteristics (2:44)
Missing data (6:43)
Cardinality (5:04)
Rare labels (4:54)
Variable distribution (5:13)
Outliers (8:27)
Linear models assumptions (8:59)
Variable magnitude (3:09)
Summary table
Additional reading resources
Exercise
Missing data imputation - Basic
Basic imputation methods (3:52)
Mean or median imputation (4:53)
Arbitrary value imputation (3:16)
Frequent category imputation (3:30)
Missing category imputation (1:22)
Adding a missing indicator (3:42)
Basic methods - considerations (11:15)
Basic imputation with pandas (6:45)
Basic imputation with pandas - demo (12:35)
Basic methods with Scikit-learn (9:44)
Mean or median imputation with Scikit-learn (10:53)
Arbitrary value imputation with Scikit-learn (3:57)
Frequent category imputation with Scikit-learn (4:38)
Missing category imputation with Scikit-learn (2:24)
Adding a missing indicator with Scikit-learn (4:59)
Imputation with GridSearch - Scikit-learn (8:24)
Basic methods with Feature-engine (7:19)
Mean or median imputation with Feature-engine (6:50)
Arbitrary value imputation with Feature-engine (3:16)
Frequent category imputation with Feature-engine (2:34)
Arbitrary string imputation with Feature-engine (3:24)
Adding a missing indicator with Feature-engine (4:52)
Wrapping up (2:19)
Exercise
Added Treat: A Movie We Recommend 🍿
Missing data imputation - Alternative methods
Alternative imputation methods (2:59)
Complete Case Analysis (6:30)
CCA - considerations with code demo (3:45)
End of distribution imputation (4:14)
Random sample imputation (14:14)
Random imputation - considerations with code (7:56)
Mean or median imputation per group (4:32)
CCA with pandas (5:19)
End of distribution imputation with pandas (5:24)
Random sample imputation with pandas (4:46)
Mean imputation per group with pandas (5:34)
CCA with Feature-engine (6:47)
End of distribution imputation with Feature-engine (5:13)
Random sample imputation with Feature-engine (2:25)
Wrapping up (5:52)
Imputation - Summary table
Exercise
Multivariate imputation
Multivariate Imputation (5:18)
KNN imputation (4:22)
KNN imputation - Demo (7:04)
MICE (7:07)
missForest (1:07)
MICE and missForest - Demo (3:58)
Additional reading resources
Exercise
Extra Treat: Our Reading Suggestion 📕
Categorical encoding - Basic methods
Categorical encoding | Introduction (4:59)
One hot encoding (6:03)
One hot encoding with pandas (7:29)
One hot encoding with sklearn (11:06)
One hot encoding with Feature-engine (2:19)
One hot encoding with Category encoders (5:04)
Ordinal encoding (1:50)
Ordinal encoding with pandas (3:16)
Ordinal encoding with sklearn (4:05)
Ordinal encoding with Feature-engine (1:49)
Ordinal encoding with Category encoders (1:43)
Count or frequency encoding (3:11)
Count encoding with pandas (2:58)
Count encoding with Feature-engine (1:21)
Count encoding with Category encoders (1:42)
Unseen categories (11:35)
Wrapping up (3:03)
Categorical encoding - monotonic
Categorical encoding | Monotonic (5:09)
Ordered ordinal encoding (2:25)
Ordered ordinal encoding with pandas (8:11)
Ordered ordinal encoding with Feature-engine (2:36)
Mean encoding (1:34)
Mean encoding with pandas (4:39)
Mean encoding with Feature-engine (2:36)
Mean encoding with Category encoders (2:15)
Mean encoding plus smoothing (4:55)
Mean encoding plus smoothing - Category encoders (6:35)
Mean encoding plus smoothing - Feature-engine (6:15)
Weight of evidence (WoE) (4:36)
Weight of Evidence with pandas (9:47)
Weight of Evidence with Feature-engine (1:40)
Weight of Evidence with Category encoders (1:12)
Weight of evidence - gotchas (3:05)
Unseen categories (2:15)
Wrapping up (3:24)
Comparison of categorical variable encoding (9:09)
Additional reading resources
Categorical encoding - Rare labels
Grouping rare labels (4:17)
One hot encoding of top categories (3:06)
OHE of top categories with pandas (5:33)
OHE of top categories with Feature-engine (2:14)
OHE of top categories with sklearn (5:35)
Rare label encoding (4:31)
Rare label encoding with pandas (8:12)
Rare label encoding with Feature-engine (1:39)
Wrapping up (2:20)
Categorical encoding - More... (3:31)
More Wisdom: Our Chosen Podcast Episode 🎧
Variable transformation
Variable transformation - Introduction (3:36)
Variable transformation (6:46)
Box-Cox transformation (2:47)
Yeo-Johnson transformation (3:00)
Logarithm transformation with NumPy (5:14)
Reciprocal transformation with NumPy (1:55)
Square-root transformation with NumPy (1:12)
Power transformation with NumPy (1:17)
Box-Cox transformation with SciPy (1:37)
Yeo-Johnson transformation with SciPy (1:05)
Arcsin transformation with NumPy (1:16)
Logarithm transformation with sklearn (2:49)
Reciprocal transformation with sklearn (1:01)
Square-root transformation with sklearn (0:52)
Power transformation with sklearn (0:35)
Box-Cox transformation with sklearn (2:05)
Yeo-Johnson transformation with sklearn (0:56)
Arcsin transformation with sklearn (1:12)
Logarithm transformation with Feature-engine (3:41)
Reciprocal transformation with Feature-engine (0:44)
Square-root transformation with Feature-engine (0:57)
Power transformation with Feature-engine (0:53)
Box-Cox transformation with Feature-engine (1:07)
Yeo-Johnson transformation with Feature-engine (0:38)
Arcsin transformation with Feature-engine (1:28)
Wrapping up (4:59)
Additional reading resources
Quiz
Discretization - Basic methods
Discretization (4:43)
Discretization methods (3:53)
Equal-width discretization (4:06)
Equal-width discretization with pandas (5:52)
Equal-width discretization with sklearn (1:59)
Equal-width discretization with Feature-engine (2:36)
Equal-frequency discretization (4:13)
Equal-frequency discretization with pandas (3:59)
Equal-frequency discretization with sklearn (1:03)
Equal-frequency discretization with Feature-engine (1:32)
Arbitrary discretization (1:44)
Arbitrary discretization with pandas (3:06)
Arbitrary discretization with Feature-engine (2:20)
Discretization plus categorical encoding (2:54)
Discretization plus encoding | Demo (5:45)
Wrapping up (12:29)
Additional reading resources
Discretization - Alternative methods
Discretization - section intro (2:56)
K-means discretization (4:13)
K-means discretization with sklearn (2:43)
Discretization with classification trees (5:05)
Discretization with decision trees using Scikit-learn (11:55)
Discretization with decision trees using Feature-engine (3:48)
Binarization (2:13)
Binarization with sklearn (4:11)
Additional reading resources
Outliers
Outlier Engineering (7:16)
Outlier trimming with pandas (6:56)
Outlier trimming with Feature-engine (4:49)
Outlier capping with pandas (4:22)
Outlier capping with Feature-engine (2:34)
Arbitrary capping with Feature-engine (2:52)
Additional reading resources
Datetime variables
Datetime variables (4:40)
Date features with pandas (12:36)
Time features with pandas (4:08)
Date and time features with Feature-engine (5:34)
Cyclical features (6:24)
Cyclical features with pandas (4:02)
Cyclical features with Feature-engine (3:22)
Engineering mixed variables
Mixed variables (3:14)
Mixed variables | Demo (6:11)
Feature creation
Feature creation (2:11)
Math functions (5:06)
Math functions with pandas (2:52)
Math functions with Feature-engine (2:58)
Relative functions with pandas (1:53)
Relative functions with Feature-engine (2:57)
Polynomial features (3:54)
Polynomial features demo (3:45)
Features from decision trees (3:36)
Feature scaling
Feature scaling (3:10)
Scaling and distributions (3:49)
Standardisation (3:53)
Standardisation | Demo (2:26)
Scaling to minimum and maximum values (1:43)
MinMaxScaling | Demo (1:53)
Mean normalisation (2:12)
Mean normalisation | Demo (4:09)
Maximum absolute scaling (1:35)
MaxAbsScaling | Demo (2:08)
Scaling to median and quantiles (1:49)
Robust Scaling | Demo (1:40)
Scaling to vector unit length (5:45)
Scaling to vector unit length | Demo (4:24)
Scaling categorical variables
Additional reading resources
Assembling feature engineering pipelines
Putting it all together (6:17)
Feature Engineering Pipeline (8:09)
Classification pipeline (13:15)
Regression pipeline (13:51)
Feature engineering pipeline with cross-validation (6:47)
More examples
Congratulations! You did it!
Congratulations
Next steps