Download Kaggle Datasets Into Google Colab

Google Colab is an online tool that lets you run Python notebooks with free GPU acceleration. Why is that useful? Some machine learning models take a long time to train, and your local machine might not be able to handle them. Colab notebooks are similar to Jupyter notebooks, but they live in the Google Drive environment. You can always upload your dataset to Google Drive and connect your Drive to Colab.
Read more →
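As a rough sketch of the workflow the post describes: you can pull a Kaggle dataset straight into a Colab session with the Kaggle CLI, assuming you have a Kaggle API token (`kaggle.json`) stored in your Drive and have mounted Drive with `from google.colab import drive; drive.mount('/content/drive')`. The dataset slug and paths below are placeholders:

```shell
# Install the Kaggle CLI (usually preinstalled on Colab)
pip install kaggle

# The CLI looks for your API token at ~/.kaggle/kaggle.json
mkdir -p ~/.kaggle
cp /content/drive/MyDrive/kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

# Download and unzip a dataset by its <owner>/<dataset> slug
kaggle datasets download -d <owner>/<dataset> -p /content/data --unzip
```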

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 06)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the sixth part of the blog post series (part 1, part 2, part 3, part 4, part 5, part 7, part 8). 10. Milestone Project 1: In this project we work through a dataset from start to finish. We use supervised machine learning to gain insight into a classification problem.
Read more →

TIL About Feature Engineering

As I work through different data sets on my machine learning journey, it becomes more and more obvious that you have to understand feature engineering. Feature engineering is an umbrella term for transforming your input data. A machine learning model can only be as effective as the data you feed it. Features are the individual input qualities you give the model. Often, these features are in the wrong format, or they are missing.
Read more →
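A minimal sketch of the kind of transformation the excerpt describes, using a made-up pandas DataFrame: filling in missing values and converting a categorical feature into a numeric format a model can consume.

```python
import pandas as pd

# Hypothetical car-sales data with missing entries
df = pd.DataFrame({
    "make": ["Honda", "Toyota", None, "BMW"],
    "odometer_km": [35000, None, 84000, 21000],
})

# Fill missing values: a placeholder category for text,
# the column mean for numbers
df["make"] = df["make"].fillna("missing")
df["odometer_km"] = df["odometer_km"].fillna(df["odometer_km"].mean())

# One-hot encode the categorical feature into numeric columns
df = pd.get_dummies(df, columns=["make"])
```

After this, every column is numeric and complete, which is the format most scikit-learn models expect.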

TIL About Logistic Regression

Today I learned about logistic regression. Logistic regression is a statistical model that we can use for classification problems in machine learning. It’s easy to confuse the term with linear regression. With linear regression, you predict a quantitative value, for example a price. With logistic regression, you predict categories: yes/no, pass/fail, etc. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
Read more →
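The distinction above can be made concrete with a small scikit-learn sketch on synthetic data (the dataset here is generated, not from the post): logistic regression outputs a probability per class and a predicted category, rather than a continuous value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# predict_proba gives class probabilities, predict gives the category
proba = clf.predict_proba(X_test[:1])
pred = clf.predict(X_test[:1])
```

`proba` holds one probability per class (summing to 1), and `pred` is the yes/no style category label.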

Machine Learning Resources

I’ve recently begun learning about data science and machine learning. Here are some resources that I found: r/datascience Resources r/learnmachinelearning Resources r/learnmachinelearning: List of Machine Learning Resources for a Beginner Machine Learning Mastery: Start Here Daniel Bourke’s Resources Udemy: Complete Machine Learning and Data Science: Zero to Mastery Machine Learning Glossary Resources for learning numpy, pandas, etc. (applying deep learning is goal)? An Introduction to Statistical Learning The Elements of Statistical Learning Mathematics For Machine Learning Think Stats 2e Python Data Science Handbook The Hundred-Page Machine Learning Book ML from the Fundamentals Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems Data Science from Scratch: First Principles with Python MIT Deep Learning Book fast.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 05)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the fifth part of the blog post series (part 1, part 2, part 3, part 4, part 6, part 7, part 8). 9. Scikit-Learn: Up until now, we’ve learned how to consume data and make fancy diagrams. This section finally deals with machine learning and teaches you the basics of Scikit-learn.
Read more →

TIL: How to Reduce Feature Labels With PCA

Today I learned how to reduce the number of features in a data set with Principal Component Analysis. From the Python Data Science Handbook: Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […] You can use PCA to learn about the relationship between two values: In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
Read more →
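A minimal sketch of that idea with scikit-learn, on made-up data: two strongly correlated features are collapsed onto a single principal axis, and `explained_variance_ratio_` shows how much of the data that axis describes.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Two correlated features: the second is a noisy scaled copy of the first
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

# Reduce from 2 dimensions down to 1 principal component
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
```

Because the two columns are nearly redundant, the single component captures almost all of the variance.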

Friday Picks 039

Read more →

TIL: How to Plot a Confusion Matrix

Let’s say we made some predictions with a machine learning model using scikit-learn. We want to evaluate how our model performs and create a confusion matrix:

```python
from sklearn.metrics import confusion_matrix

## make predictions with the scikit-learn model on the test data set
y_preds = model.predict(X_test)

## Create confusion matrix on test data and predictions
cm = confusion_matrix(y_test, y_preds)
cm
```

You’ll get an array like this:

```python
array([[24,  5],
       [ 4, 28]])
```

We can visualize it with pandas:
Read more →
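The pandas visualization the excerpt alludes to could look like the sketch below, using `pd.crosstab` with small made-up label arrays standing in for `y_test` and `y_preds`:

```python
import numpy as np
import pandas as pd

# Hypothetical true labels and predictions
y_test = np.array([0, 0, 1, 1, 0, 1])
y_preds = np.array([0, 1, 1, 1, 0, 0])

# Cross-tabulate actual vs. predicted labels into a labeled table
crosstab = pd.crosstab(
    y_test, y_preds,
    rownames=["Actual"], colnames=["Predicted"],
)
```

The diagonal cells count correct predictions, the off-diagonal cells count the two kinds of mistakes.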

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 04)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the fourth part of the blog post series (part 1, part 2, part 3, part 5, part 6, part 7, part 8). 7. NumPy: This section is an introduction to NumPy. NumPy will convert any data into a series of numbers. NumPy is the backbone of all data science in Python.
Read more →
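A tiny illustration of the "series of numbers" point: NumPy turns nested Python lists into a typed numeric array you can compute on directly.

```python
import numpy as np

# A nested Python list becomes a 2x3 numeric array
a = np.array([[1, 2, 3], [4, 5, 6]])

# Whole-array operations replace explicit loops
mean = a.mean()
doubled = a * 2
```

Operations like `mean()` and `* 2` apply across every element at once, which is what makes NumPy the foundation for pandas and scikit-learn.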