A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 07)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations and lecture notes. This is the seventh part of the blog post series. 13. Data Engineering: These lectures cover what kinds of data we have (structured data, unstructured data, etc.). How can we make the raw data consumable for machine learning libraries?

Friday Picks 042

More Learning Resources During the COVID-19 Outbreak

Here are some more resources for learning new things during self-isolation: Education Links: a collection of links to help you and your kids; Amazing Educational Resources: a list of resources with free or discounted offers; OpenLearn: free learning platform from the Open University; Scholarships for Students on Codecademy: 10,000 free accounts for high-schoolers and college students around the world; Shawn Wildermuth’s Courses: free courses on Bootstrap 4, Vue, SignalR during the crisis; Free JavaScript/Node/CSS books; 365 Data Science: Free Access Till April 15th: learn mathematics, statistics, SQL, Python, machine learning; 50% discount on Wes Bos’s courses: Wes Bos is a respected JavaScript teacher; 50% discount on dataquest.

Free Learning Resources During the COVID-19 Outbreak

Question: How to Speed Up Hyper-Tuning?

Find the Best Model Pipeline

Create a pipeline to score different machine learning models with scikit-learn. After the initial data exploration, I would like to get a quick gauge of which model would be best for the problem at hand. A rough estimate helps narrow down which machine learning model to use and tune later. It also gives a sense of how effective prospective algorithms will be. The goal is to get a big-picture overview.
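
A minimal sketch of that idea, not the code from the post itself: it scores a few candidate classifiers with the same cross-validation setup and prints them from best to worst. The helper name score_models, the iris dataset, the choice of models, and the StandardScaler step are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def score_models(models, X, y, cv=5):
    """Return {name: (mean score, std)} for each candidate model.

    score_models is a hypothetical helper, not a scikit-learn function.
    """
    results = {}
    for name, model in models.items():
        # Scaling lives inside the pipeline so it is re-fit on every fold.
        pipeline = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipeline, X, y, cv=cv)
        results[name] = (scores.mean(), scores.std())
    return results


# Toy data and candidate models, chosen only for the example.
X, y = load_iris(return_X_y=True)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = score_models(models, X, y)

# Print the rough ranking, best mean score first.
for name, (mean, std) in sorted(
    results.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

The scores are only a rough first estimate with default hyperparameters; the ranking can change once the shortlisted models are tuned.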

Script to Stop Google Colab From Disconnecting

Google Colab is a free online coding environment that offers GPU acceleration for your data science and machine learning needs. It runs on top of Jupyter Notebooks, so the interface is familiar to most data scientists who use Python. If your local machine is too slow for some of the more intensive computations you need for machine learning, Colab can help you out. When you use the remote runtime with the free GPU, however, the runtime disconnects after a while.

Write Your Own Cross Validation Function With make_scorer in scikit-learn

You want to score a list of models with cross-validation, using customized scoring methods. Why cross-validation? A common approach to machine learning is to split your data into three different sets: a training set, a test set, and a validation set. Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
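
As a rough illustration, and not the function from the post, the sketch below cross-validates a list of models with one built-in scorer and one custom scorer created with make_scorer. The helper name cross_validate_models, the iris dataset, the two models, and the F2 metric are assumptions picked for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier


def cross_validate_models(models, X, y, scoring, cv=5):
    """Return {model class name: {metric name: mean test score}}.

    cross_validate_models is a hypothetical helper, not part of scikit-learn.
    """
    summary = {}
    for model in models:
        cv_results = cross_validate(model, X, y, scoring=scoring, cv=cv)
        # cross_validate stores each metric under the key "test_<name>".
        summary[type(model).__name__] = {
            metric: cv_results[f"test_{metric}"].mean() for metric in scoring
        }
    return summary


# One built-in scorer (referenced by name) and one custom scorer.
# make_scorer forwards beta and average to fbeta_score on every call.
scoring = {
    "accuracy": "accuracy",
    "f2_macro": make_scorer(fbeta_score, beta=2, average="macro"),
}

X, y = load_iris(return_X_y=True)
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]

summary = cross_validate_models(models, X, y, scoring)
for name, metrics in summary.items():
    print(name, {metric: round(value, 3) for metric, value in metrics.items()})
```

Any function with the signature (y_true, y_pred) can be wrapped with make_scorer in the same way, which is what makes the scoring dictionary easy to extend.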

Friday Picks 041

Download Kaggle Datasets Into Google Colab

Google Colab is an online tool that allows you to run Python notebooks with free GPU acceleration. Why is that useful? Some machine learning models take a long time to compute and your local machine might not be able to run them. The Colab notebooks are similar to Jupyter Notebooks, but they use the Google Drive environment. You can always upload your dataset to Google Drive and connect your Drive to Colab.
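
The Drive route mentioned above could look roughly like the sketch below. It relies on google.colab.drive.mount, which only exists inside a Colab runtime, so the import is guarded. The helper names, the "MyDrive/datasets" folder, and the file name train.csv are assumptions for illustration and depend on how your own Drive is organized.

```python
from pathlib import Path


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive in Colab; return None when run outside Colab."""
    try:
        from google.colab import drive  # only importable in a Colab runtime
    except ImportError:
        return None
    drive.mount(mount_point)  # asks for authorization in the notebook
    return Path(mount_point)


def dataset_path(filename, mount_point="/content/drive", folder="MyDrive/datasets"):
    """Build the path to a dataset file below the mounted Drive.

    The folder layout is an assumption; adjust it to your own Drive.
    """
    return Path(mount_point) / folder / filename


if __name__ == "__main__":
    if mount_drive() is None:
        print("Not running in Colab, skipping the Drive mount.")
    print(dataset_path("train.csv"))
```

Once the Drive is mounted, the resulting path can be passed to any reader that accepts file paths, for example pandas.read_csv.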