
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.

This is the sixth part of the blog post series.

10. Milestone Project 1

In this project we work through a dataset from start to finish. We use supervised machine learning to gain insight into a classification problem.
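For readers new to the term, here is a minimal sketch of what a supervised classification workflow looks like in scikit-learn. The synthetic data from `make_classification` and the choice of `RandomForestClassifier` are my own assumptions for illustration, not necessarily what the course uses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labelled tabular dataset (assumption: not the course data)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 20% of the rows so we can evaluate on data the model hasn't seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# "Supervised" means the model learns from inputs X paired with known labels y
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out test set
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The test split is the important design choice: scoring on the training data would overstate how well the model generalizes.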

It’s useful to see how the instructor works through the problem, but I would have liked to see him use a different dataset.

During the course, we’ve only used clean, sanitized data suitable for beginners. That was useful for learning how the libraries (pandas, scikit-learn, etc.) work. But for the milestone project, it would have been more valuable to see a messy, real-world dataset.

That said, there are some nuggets of wisdom inside the lectures.

You can find the Jupyter Notebook on GitHub.

11. Milestone Project 2

My main criticism of the previous section doesn’t apply to the second milestone project.

This time, we work with a dataset from Kaggle. Kaggle is a competition website for data scientists.

The project seems to be more realistic than the previous assignment.

We have to do feature engineering: dealing with missing values, transforming dates, and encoding non-numeric data as categorical data.
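A minimal pandas sketch of those three feature-engineering steps follows. The toy DataFrame and its column names (`saledate`, `state`, `YearMade`) are hypothetical, and filling with the median plus a "was missing" flag is one common strategy I chose for illustration, not necessarily the one used in the course notebook.

```python
import pandas as pd

# Hypothetical toy data: column names and values are made up for illustration
df = pd.DataFrame({
    "saledate": ["2011-01-15", "2011-06-03", "2012-03-21"],
    "state": ["Texas", None, "Florida"],
    "YearMade": [2004.0, None, 1998.0],
})

# 1. Transform dates: parse the strings, split them into numeric parts,
#    then drop the original column (models can't use datetime objects directly)
df["saledate"] = pd.to_datetime(df["saledate"])
df["saleYear"] = df["saledate"].dt.year
df["saleMonth"] = df["saledate"].dt.month
df = df.drop(columns="saledate")

# 2. Missing numeric values: remember where they were, then fill with the median
df["YearMade_is_missing"] = df["YearMade"].isna()
df["YearMade"] = df["YearMade"].fillna(df["YearMade"].median())

# 3. Non-numeric data: convert to a pandas category, then to integer codes.
#    pandas assigns code -1 to missing values, so add 1 to reserve 0 for "missing"
df["state"] = df["state"].astype("category").cat.codes + 1

print(df)
```

After these steps every column is numeric and free of missing values, which is what most scikit-learn estimators expect as input.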

The dataset is fairly large and has over one hundred different features (input variables). For me, the task provided an interesting challenge.

You can find the Jupyter Notebook on GitHub.

