# Download Kaggle Datasets Into Google Colab

Google Colab is an online tool that allows you to run Python notebooks with free GPU acceleration.
Why is that useful?
Some machine learning models take a long time to train, and your local machine might not be able to run them.
Colab notebooks are similar to Jupyter notebooks, but they live in your Google Drive.
You can always upload your dataset to Google Drive and connect your Drive to Colab.
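A faster route than uploading by hand is the official Kaggle command-line tool. A minimal sketch, assuming you have created a Kaggle API token and downloaded it as `kaggle.json` (the dataset slug below is just an example); inside a Colab cell, prefix each command with `!`:

```shell
# Install the official Kaggle CLI
pip install kaggle

# Put the API token where the CLI expects it
# (upload kaggle.json to the Colab session first)
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

# Download and unzip a dataset by its <owner>/<dataset-name> slug
kaggle datasets download -d zynicide/wine-reviews
unzip wine-reviews.zip
```

The `chmod 600` step matters: the Kaggle CLI warns (or refuses to run) if the token file is readable by other users.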

# A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 06)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the sixth part of the blog post series.
part 1 · part 2 · part 3 · part 4 · part 5 · part 7 · part 8

**10. Milestone Project 1**
In this project we work through a dataset from start to finish. We use supervised machine learning to gain insight into a classification problem.

# TIL About Feature Engineering

As I work through different data sets on my machine learning journey, it becomes more and more obvious that you have to know about feature engineering.
Feature engineering is an umbrella term for transforming your input data.
A machine learning model can only be as good as the data you feed it.
Features are the individual input attributes you give the model.
Often, these features are in the wrong format, or they are missing.
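As a minimal sketch of fixing both problems with pandas (the column names and values here are made up, not from the post):

```python
import pandas as pd

# Toy dataset: one numeric and one categorical feature,
# each with a problem most models can't handle directly.
df = pd.DataFrame({
    "age": [25, None, 47, 31],                 # missing value
    "color": ["red", "blue", "red", "green"],  # non-numeric category
})

# Impute the missing numeric value with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# One-hot encode the categorical feature into numeric columns.
df = pd.get_dummies(df, columns=["color"])

print(df)
```

After these two steps, every column is numeric and complete, which is the format most scikit-learn estimators expect.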

# Machine Learning Resources

I’ve recently begun learning about data science and machine learning.
Here are some resources that I found:
- r/datascience Resources
- r/learnmachinelearning Resources
- r/learnmachinelearning: List of Machine Learning Resources for a Beginner
- Machine Learning Mastery: Start Here
- Daniel Bourke’s Resources
- Udemy: Complete Machine Learning and Data Science: Zero to Mastery
- Machine Learning Glossary
- Resources for learning numpy, pandas, etc. (applying deep learning is goal)?
- An Introduction to Statistical Learning
- The Elements of Statistical Learning
- Mathematics For Machine Learning
- Think Stats 2e
- Python Data Science Handbook
- The Hundred-Page Machine Learning Book
- ML from the Fundamentals
- Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
- Data Science from Scratch: First Principles with Python
- MIT Deep Learning Book
- fast.

# A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 05)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the fifth part of the blog post series.
part 1 · part 2 · part 3 · part 4 · part 6 · part 7 · part 8

**9. Scikit-Learn**
Up until now, we’ve learned how to consume data and make fancy diagrams.
The current section finally deals with Machine Learning and teaches you the basics of Scikit-learn.

# TIL: How to Reduce Feature Labels With PCA

Today I learned how to reduce feature labels in a data set with Principal Component Analysis.
From Python Data Science Handbook:
Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […]
You can use PCA to learn about the relationship between two variables:
In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
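A small sketch of that idea with scikit-learn (the synthetic data is my own, not from the book): two strongly correlated features collapse onto one principal axis that explains almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Two correlated features: the second is mostly a scaled copy of the first.
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

pca = PCA(n_components=2)
pca.fit(X)

# The first principal axis captures nearly all the variance,
# so the data is effectively one-dimensional.
print(pca.explained_variance_ratio_)
```

If the first ratio is close to 1.0, you can keep just one component and discard the rest with little information loss.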

# TIL: Numpy Array Slices Return Views

Today I learned that if you slice a list in Python, you get back a copy of the list.
But slicing a NumPy array returns a view, not a copy. That means that modifying the slice will also modify the original array:
This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.

# TIL: How to Plot a Confusion Matrix

Let’s say we made some predictions with a machine-learning model using scikit-learn.
We want to evaluate how our model performs, and create a confusion matrix:
```python
from sklearn.metrics import confusion_matrix

# Make predictions with the scikit-learn model on the test data set
y_preds = model.predict(X_test)

# Create confusion matrix on test data and predictions
cm = confusion_matrix(y_test, y_preds)
cm
```

You’ll get an array like this:

```python
array([[24,  5],
       [ 4, 28]])
```

We can visualize it with pandas:
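The excerpt stops here; one common pandas approach (a sketch, with hypothetical arrays standing in for the `y_test` and `y_preds` above) is `pd.crosstab`, which adds row and column labels to the raw counts:

```python
import numpy as np
import pandas as pd

# Hypothetical labels that reproduce the matrix above.
y_test = np.array([0] * 29 + [1] * 32)
y_preds = np.array([0] * 24 + [1] * 5 + [0] * 4 + [1] * 28)

# Cross-tabulate actual vs. predicted labels into a labelled matrix.
df_cm = pd.crosstab(y_test, y_preds,
                    rownames=["Actual"], colnames=["Predicted"])
print(df_cm)
```

The labelled table reads much more easily than the bare array: rows are the true classes, columns the predicted ones, and the diagonal holds the correct predictions.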