I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations and lecture notes. This is the seventh part of the blog post series: part 1, part 2, part 3, part 4, part 5, part 6, part 8.
13. Data Engineering
These lectures cover what kind of data we have (structured data, unstructured data, etc.). How can we make the raw data consumable for machine learning libraries?
Create a pipeline to score different machine learning models with scikit-learn
After the initial data exploration, I would like to get a quick gauge of which model would be best for the problem at hand. A rough estimate helps narrow down which machine learning model to use and tune later, and it gives a sense of how effective prospective algorithms will be. The goal is to get a big-picture overview.
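A minimal sketch of such a quick comparison, not necessarily the approach the full post takes. The synthetic dataset, the three candidate models, and the names `models` and `mean_scores` are assumptions chosen for illustration; swap in your own data and candidates.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Candidate models, picked only as examples.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

mean_scores = {}
for name, model in models.items():
    # Scaling sits inside the pipeline so it is refit on each training fold.
    pipeline = make_pipeline(StandardScaler(), model)
    fold_scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = fold_scores.mean()

# Print the candidates from best to worst mean accuracy.
for name, score in sorted(mean_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The numbers are only a rough ranking with default hyperparameters. They help decide which model deserves tuning, not which one is finally best.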
Google Colab is a free online coding environment that offers GPU acceleration for your data science and machine learning needs. It runs on top of Jupyter Notebooks, so the interface is familiar to most data scientists who use Python. If your local machine is too slow for some of the more intensive computations you need for machine learning, Colab can help you out. When you use the remote runtime with the free GPU, the runtime disconnects after a while.
You want to score a list of models using cross-validation and customized scoring methods.
Why cross-validation?
A common approach to machine learning is to split your data into three different sets: a training set, a test set, and a validation set. Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
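As a hedged illustration of scoring several models with custom metrics, the sketch below uses scikit-learn's `cross_validate` with a scoring dictionary. The synthetic data, the two models, and the F2 metric are assumptions made for the example; the full post may use different ones.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A built-in scorer by name plus a custom one wrapped with make_scorer.
scoring = {
    "accuracy": "accuracy",
    "f2": make_scorer(fbeta_score, beta=2),  # F-beta with recall weighted higher
}

# Example models only; any scikit-learn estimator can go in this list.
models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]

results = {}
for model in models:
    # Each of the 5 folds is held out once, so no sample is scored
    # by a model that saw it during training.
    cv_results = cross_validate(model, X, y, cv=5, scoring=scoring)
    results[type(model).__name__] = {
        metric: cv_results[f"test_{metric}"].mean() for metric in scoring
    }

for model_name, metrics in results.items():
    print(model_name, {metric: round(value, 3) for metric, value in metrics.items()})
```

Because every fold is scored on data the model did not train on, a model that merely memorizes labels cannot earn a perfect score here.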
Google Colab is an online tool that allows you to run Python notebooks with free GPU acceleration. Why is that useful? Some machine learning models take a long time to compute and your local machine might not be able to run them. The Colab notebooks are similar to Jupyter Notebooks, but they use the Google Drive environment. You can always upload your dataset to Google Drive and connect your Drive to Colab.
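Connecting Drive can be sketched roughly as follows. The helper name `mount_google_drive` is hypothetical, and the import guard is an assumption added so the snippet also runs harmlessly outside Colab. `/content/drive` is the mount point commonly used in Colab notebooks.

```python
from typing import Optional


def mount_google_drive(mount_point: str = "/content/drive") -> Optional[str]:
    """Mount Google Drive when running inside Colab; return None elsewhere."""
    try:
        # This module is only available inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return None
    # In Colab this asks for authorization the first time it runs.
    drive.mount(mount_point)
    return mount_point


drive_path = mount_google_drive()
if drive_path is None:
    print("Not running in Colab; Drive was not mounted.")
else:
    print(f"Drive mounted at {drive_path}")
```

Once mounted, files in your Drive appear under the mount point and can be read like any local file.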