As I work through different data sets on my machine learning journey, it becomes more and more obvious that you have to know about feature engineering.

Feature engineering is an umbrella term for transforming your input data.

A machine learning model can only be as good as the data you feed it.

Features are the individual input variables you give the model.

Often, these features are in the wrong format, or some of their values are missing. Many estimators in libraries such as scikit-learn can’t handle missing data.

Thus, you have to transform the input data into a format that is compatible with your machine learning model.
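To make this concrete, here is a minimal sketch of two common transformations: filling in missing numbers with the column mean, and turning a text category into 0/1 columns (one-hot encoding). The data and the helper names `impute_mean` and `one_hot` are made up for illustration. In practice you would probably reach for scikit-learn's `SimpleImputer` and `OneHotEncoder` instead of writing this by hand.

```python
from statistics import mean

# Hypothetical toy data: None marks a missing value.
rows = [
    {"age": 25, "city": "Berlin"},
    {"age": None, "city": "Paris"},
    {"age": 35, "city": "Berlin"},
]


def impute_mean(rows, column):
    """Replace missing values in a numeric column with the mean of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


# Every row is now purely numeric and has no gaps.
features = one_hot(impute_mean(rows, "age"), "city")
```

Mean imputation is only one option. Depending on the data, the median, a constant, or dropping the row may be the better choice.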

Some models don’t work well if you have many input variables. In that case, you have to whittle down the number of features.
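One simple way to reduce the number of features is to drop columns that barely change, since a column with the same value in every row can't help the model tell examples apart. The sketch below assumes a small numeric matrix and a hypothetical helper, `select_by_variance`. scikit-learn offers the same idea as `VarianceThreshold`, and there are many more sophisticated selection methods.

```python
from statistics import pvariance

# Hypothetical toy matrix: rows are samples, columns are features.
# The first column is constant and carries no information.
X = [
    [1.0, 0.0, 10.0],
    [1.0, 1.0, 20.0],
    [1.0, 0.0, 30.0],
]


def select_by_variance(X, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*X))  # transpose: one tuple per feature
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in X]
    return keep, reduced


keep, X_reduced = select_by_variance(X)
```

Here `keep` holds the indices of the surviving columns, so you can apply the same selection to new data later.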

It’s a vast topic, and I have only started to scratch the surface.

Further Reading