Feature Engineering for Machine Learning
Introduction
Introduction (5:14)
Course curriculum (5:27)
Course requirements (2:28)
How to approach this course
Setting up your computer
Course material (1:54)
Download Jupyter notebooks
Download datasets
Download presentations
Moving forward (2:04)
FAQ: Data science, Python, datasets, presentations, and more
Variable types
Variables | Intro (2:37)
Numerical variables (5:03)
Categorical variables (3:43)
Date and time variables (1:58)
Mixed variables (2:17)
Quiz about variable types
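Before the quiz, a minimal pandas sketch of the four variable types covered above (the tiny dataset is invented for illustration):

```python
import pandas as pd

# Toy dataset showing one column of each variable type
df = pd.DataFrame({
    "age": [25, 32, 47],                    # numerical
    "city": ["London", "Paris", "London"],  # categorical
    "signup": pd.to_datetime(
        ["2021-01-05", "2021-02-11", "2021-03-20"]
    ),                                      # date and time
    "cabin": ["A12", "B3", "C45"],          # mixed: letters + numbers
})

# pandas dtypes give a first approximation of each variable's type;
# mixed variables still show up as generic 'object' columns
print(df.dtypes)
```

Note that pandas reports both the categorical and the mixed column as `object`, so dtype inspection alone cannot distinguish them.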
Variable characteristics
Variable characteristics (2:44)
Missing data (6:46)
Cardinality - categorical variables (5:04)
Rare labels - categorical variables (4:54)
Linear models assumptions (9:13)
Linear model assumptions - additional reading resources (optional)
Variable distribution (5:08)
Outliers (8:27)
Variable magnitude (3:09)
Variable characteristics and machine learning models
Additional reading resources
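As a quick companion to this section, a hedged pandas sketch of three of the characteristics discussed above: missing data, cardinality, and rare labels. The data and the 25% rarity threshold are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", None, "red"],
    "price": [10.0, 12.5, np.nan, 11.0, 9.5, 250.0],
})

# Fraction of missing values per variable
missing = df.isnull().mean()

# Cardinality: number of distinct categories
cardinality = df["colour"].nunique()

# Rare labels: categories present in fewer than 25% of observations
freq = df["colour"].value_counts(normalize=True)
rare = freq[freq < 0.25].index.tolist()
```

The same one-liners scale to full datasets, which makes them a useful first step in any exploratory analysis.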
Missing data imputation
Introduction to missing data imputation (3:51)
Complete Case Analysis (6:46)
Mean or median imputation (7:53)
Arbitrary value imputation (6:39)
End of distribution imputation (4:53)
Frequent category imputation (6:56)
Missing category imputation (4:05)
Random sample imputation (14:14)
Adding a missing indicator (5:25)
Imputation with Scikit-learn (3:43)
Mean or median imputation with Scikit-learn (5:27)
Arbitrary value imputation with Scikit-learn (5:04)
Frequent category imputation with Scikit-learn (5:09)
Missing category imputation with Scikit-learn (2:50)
Adding a missing indicator with Scikit-learn (4:06)
Automatic determination of imputation method with Scikit-learn (8:24)
Introduction to Feature-engine (6:25)
Mean or median imputation with Feature-engine (4:51)
Arbitrary value imputation with Feature-engine (3:30)
End of distribution imputation with Feature-engine (4:46)
Frequent category imputation with Feature-engine (1:38)
Missing category imputation with Feature-engine (3:04)
Random sample imputation with Feature-engine (2:00)
Adding a missing indicator with Feature-engine (4:06)
CCA with Feature-engine (6:47)
Overview of missing value imputation methods
Conclusion: when to use each missing data imputation method
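To close the section, a minimal pandas sketch of four of the univariate imputation methods listed above (the series is invented for illustration; the arbitrary value -1 is an arbitrary choice):

```python
import numpy as np
import pandas as pd

age = pd.Series([25.0, np.nan, 31.0, np.nan, 28.0])

# Mean imputation: replace NaN with the mean of the observed values
age_mean = age.fillna(age.mean())

# End of distribution imputation: mean + 3 standard deviations
age_eod = age.fillna(age.mean() + 3 * age.std())

# Arbitrary value imputation: replace NaN with a chosen constant
age_arb = age.fillna(-1)

# Missing indicator: flag which values were originally absent
age_na = age.isnull().astype(int)
```

The equivalent transformers in Scikit-learn (`SimpleImputer`, `MissingIndicator`) and Feature-engine are covered lecture by lecture above.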
Multivariate imputation
Multivariate Imputation (3:31)
KNN imputation (4:22)
KNN imputation - Demo (7:04)
MICE (7:07)
missForest (1:07)
MICE and missForest - Demo (3:58)
Additional reading resources (Optional)
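A short sketch of the KNN imputation idea from this section, using Scikit-learn's `KNNImputer` on a made-up two-column array; the missing value is filled with the mean of its two nearest neighbours:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Two correlated variables; one value is missing
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, np.nan],
    [4.0, 40.0],
])

# KNN imputation: fill the NaN with the mean of the values
# observed in the k closest rows (here k = 2: rows 2 and 4)
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

MICE-style imputation is available in Scikit-learn as the (still experimental) `IterativeImputer`, which must be enabled with `from sklearn.experimental import enable_iterative_imputer` before import.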
Categorical variable encoding
Categorical encoding | Introduction (6:49)
One hot encoding (6:09)
Important: Feature-engine v1.0.0
One-hot-encoding: Demo (14:12)
One hot encoding of top categories (3:06)
One hot encoding of top categories | Demo (8:35)
Ordinal encoding | Label encoding (1:50)
Ordinal encoding | Demo (8:08)
Count or frequency encoding (3:11)
Count encoding | Demo (4:33)
Target guided ordinal encoding (2:41)
Target guided ordinal encoding | Demo (8:30)
Mean encoding (2:16)
Mean encoding | Demo (5:31)
Probability ratio encoding (6:13)
Weight of evidence (WoE) (4:36)
Weight of Evidence | Demo (12:38)
Comparison of categorical variable encoding (10:36)
Rare label encoding (4:31)
Rare label encoding | Demo (10:25)
Binary encoding and feature hashing (6:12)
Additional reading resources
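To summarise the section, a minimal pandas sketch of three of the encodings above, one hot, count/frequency, and ordinal, on an invented categorical column:

```python
import pandas as pd

df = pd.DataFrame({"colour": ["red", "blue", "red", "green"]})

# One hot encoding: one binary column per category
ohe = pd.get_dummies(df["colour"], prefix="colour")

# Count or frequency encoding: replace each label with its frequency
freq = df["colour"].value_counts(normalize=True)
df["colour_freq"] = df["colour"].map(freq)

# Ordinal (label) encoding: map each category to an integer
codes = {cat: i for i, cat in enumerate(df["colour"].unique())}
df["colour_ord"] = df["colour"].map(codes)
```

Target-guided variants (mean encoding, WoE, probability ratio) follow the same `map` pattern but compute the replacement values from the target, as the dedicated lectures demonstrate.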
Variable transformation
Variable transformation | Introduction (4:48)
Variable transformation with Numpy and SciPy (7:38)
Variable transformation with Scikit-learn (7:03)
Variable transformation with Feature-engine (3:41)
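A compact NumPy/SciPy sketch of the mathematical transformations this section covers, applied to an invented right-skewed variable:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 10.0, 100.0, 1000.0])

# Logarithmic transformation compresses a right-skewed variable
x_log = np.log(x)

# Reciprocal and square-root transformations
x_rec = 1.0 / x
x_sqrt = np.sqrt(x)

# Box-Cox transformation (strictly positive data only);
# SciPy also estimates the optimal lambda
x_bc, lam = stats.boxcox(x)
```

Scikit-learn wraps the same ideas in `FunctionTransformer` and `PowerTransformer`, as shown in the corresponding lecture.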
Discretisation
Discretisation | Introduction (3:01)
Equal-width discretisation (4:06)
Important: Feature-engine v1.0.0
Equal-width discretisation | Demo (11:18)
Equal-frequency discretisation (4:13)
Equal-frequency discretisation | Demo (7:16)
K-means discretisation (4:13)
K-means discretisation | Demo (2:43)
Discretisation plus categorical encoding (2:54)
Discretisation plus encoding | Demo (5:45)
Discretisation with classification trees (5:05)
Discretisation with decision trees using Scikit-learn (11:55)
Discretisation with decision trees using Feature-engine (3:48)
Domain knowledge discretisation (3:52)
Additional reading resources
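A minimal pandas sketch of the two simplest discretisation schemes above, equal-width and equal-frequency binning, on an invented skewed series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# Equal-width discretisation: 3 bins of identical span;
# the outlier drags most values into the first bin
equal_width = pd.cut(s, bins=3, labels=False)

# Equal-frequency discretisation: 2 bins with similar counts,
# split at the median
equal_freq = pd.qcut(s, q=2, labels=False)
```

K-means and decision-tree discretisation replace these fixed rules with learned bin boundaries; Scikit-learn's `KBinsDiscretizer` supports the `uniform`, `quantile`, and `kmeans` strategies.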
Outliers
Outlier Engineering | Intro (7:42)
Outlier trimming (7:21)
Outlier capping with IQR (6:24)
Outlier capping with mean and std (4:44)
Outlier capping with quantiles (3:17)
Arbitrary capping (3:33)
Important: Feature-engine v1.0.0
Additional reading resources
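To recap the section, a hedged pandas sketch of IQR-based outlier capping (winsorisation) on an invented series with one extreme value; the 1.5 multiplier is the conventional proximity rule:

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# IQR-based limits: Q1 - 1.5*IQR (lower), Q3 + 1.5*IQR (upper)
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Capping: clip extreme values to the limits instead of dropping rows
s_capped = s.clip(lower=lower, upper=upper)
```

Trimming removes the offending rows instead; capping with mean ± std or with quantiles follows the same clip pattern with different limits.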
Feature scaling
Feature scaling | Introduction (3:44)
Standardisation (5:31)
Standardisation | Demo (4:39)
Mean normalisation (4:02)
Mean normalisation | Demo (5:21)
Scaling to minimum and maximum values (3:24)
MinMaxScaling | Demo (3:01)
Maximum absolute scaling (3:01)
MaxAbsScaling | Demo (3:45)
Scaling to median and quantiles (2:46)
Robust Scaling | Demo (2:04)
Scaling to vector unit length (5:51)
Scaling to vector unit length | Demo (5:18)
Additional reading resources
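The two most common scalers in this section reduce to one-line formulas; a minimal NumPy sketch on invented data:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardisation: zero mean, unit standard deviation
x_std = (x - x.mean()) / x.std()

# Min-max scaling: squeeze values into [0, 1]
x_mm = (x - x.min()) / (x.max() - x.min())
```

Scikit-learn's `StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`, `RobustScaler`, and `Normalizer` implement these and the remaining methods (median/quantile scaling, unit-norm scaling) as fit/transform estimators.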
Engineering mixed variables
Engineering mixed variables (3:14)
Engineering mixed variables | Demo (6:11)
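The usual treatment of a mixed variable is to split it into its categorical and numerical parts; a short pandas sketch on invented cabin-style codes:

```python
import pandas as pd

# Mixed variable: labels contain both letters and numbers
df = pd.DataFrame({"cabin": ["A12", "B3", "C45"]})

# Extract the categorical part (letters) and the numerical part (digits)
df["cabin_cat"] = df["cabin"].str.extract(r"([A-Za-z]+)", expand=False)
df["cabin_num"] = df["cabin"].str.extract(r"(\d+)", expand=False).astype(float)
```

Each derived column can then be processed with the categorical-encoding and numerical techniques from the earlier sections.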
Datetime variables
Engineering datetime variables (4:43)
Engineering dates | Demo (8:17)
Engineering time variables and different timezones (4:34)
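A minimal sketch of the datetime feature derivation this section covers, using the pandas `.dt` accessor on invented timestamps:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-03-15 14:30", "2021-12-01 08:05"]))

# Derive new features from each timestamp
month = s.dt.month
dayofweek = s.dt.dayofweek  # Monday = 0
hour = s.dt.hour
is_weekend = s.dt.dayofweek.isin([5, 6]).astype(int)
```

Timezone-aware series follow the same pattern after `dt.tz_localize` / `dt.tz_convert`, as the timezone lecture demonstrates.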
Assembling feature engineering pipelines
Putting it all together (6:43)
Feature Engineering Pipeline (8:22)
Classification pipeline (13:15)
Regression pipeline (13:51)
Feature engineering pipeline with cross-validation (6:47)
More examples
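The assembling idea in miniature: a Scikit-learn `Pipeline` chaining imputation, scaling, and a model into one estimator. The toy data is invented, and a real pipeline would include the categorical steps from earlier sections:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with a missing value
X = np.array([[1.0], [2.0], [np.nan], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Chain the feature engineering steps and the model into one estimator,
# so cross-validation fits every step on the training folds only
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
```

Because the whole chain is a single estimator, it can be passed directly to `cross_val_score` or `GridSearchCV`, which is the subject of the cross-validation lecture above.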
Final section | Next steps
Survey
Congratulations
Next steps