Friday Picks 067

Read more →

Friday Picks 060

Read more →

Tuesday Picks 006

Read more →

Friday Picks 051

Read more →

TIL About Streamlit’s Magic

Streamlit allows you to write Markdown within a Python file (.py):

```python
import streamlit as st

st.title("Otto Group Product Classification Challenge 🛍")
st.markdown("## 1. Problem Statement")
st.markdown(
    "Given a dataset with 93 features, create a predictive model which is "
    "able to distinguish between the main product categories."
)
st.markdown("### 1.2 Evaluation")
st.markdown(
    "The evaluation for the competition is multi-class logarithmic loss. "
    "See Kaggle: Evaluation."
)
```

I like that I can write Markdown, but the syntax is cumbersome.
Read more →

Friday Picks 043

Read more →

Run Streamlit With Docker and Docker-Compose

Create a Docker container that runs your machine learning models as a web application. This article will explain the advantages of Streamlit and how to build a Streamlit application with Docker. Why Streamlit? You’ve explored your data and developed a machine learning model. It’s now time to release it to the world so that others can see what you’ve built. Now what? Deploying machine learning models is not trivial.
Read more →

WIP: Streamlit Project Notes

I finished the Complete Machine Learning and Data Science: Zero to Mastery course this weekend (and wrote about it). The course has given me the foundations of working with data in Python. Practice makes perfect. My goal is to sharpen my skills by exploring a Kaggle dataset, building a model, and deploying it with Streamlit using Docker and Heroku. The project will be on GitHub, where I will post all the links, my thoughts, and observations.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 08)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the eighth part of the blog post series. part 1 part 2 part 3 part 4 part 5 part 6 part 7 TL;DR (A Review of The Complete Course) The program is a praise-worthy introduction to data science and machine learning with Python. The instructors focus on practical skills and convey an enormous topic in a captivating and friendly way.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 07)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the seventh part of the blog post series. part 1 part 2 part 3 part 4 part 5 part 6 part 8 13. Data Engineering These lectures cover what kind of data we have (structured data, unstructured data, etc.). How can we make the raw data consumable for machine learning libraries?
Read more →

Question: How to Speed Up Hyper-Tuning?

Read more →

Find the Best Model Pipeline

Create a pipeline to score different machine learning models with scikit-learn. After the initial data exploration I would like to get a quick gauge on which model would be best for the problem at hand. A rough estimate helps narrow down which machine-learning model to use and tune later. It helps to get a sense of how effective prospective algorithms will be. The goal is to get a big-picture overview.
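Such a model comparison can be sketched in a few lines. This is a minimal illustration, not the code from the post: the dataset and the list of candidate models are made up for the example.

```python
# Score several candidate models with cross-validation to get a rough
# first ranking. Dataset and model choices are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "KNeighbors": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=42),
}

# Cross-validate each model and keep the mean accuracy
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The highest-scoring model is only a starting point; hyper-parameter tuning can still change the ranking.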
Read more →

Script to Stop Google Colab From Disconnecting

Google Colab is a free online coding environment that offers GPU acceleration for your data science and machine learning needs. It runs on top of Jupyter Notebooks. That means that the interface is familiar to most data scientists that use Python. If your local machine is too slow for some of the more intensive computations you need for machine learning, Colab can help you out. When you use the remote runtime with the free GPU, the runtime disconnects after a while.
Read more →

Write Your Own Cross Validation Function With make_scorer in scikit-learn

You want to score a list of models with cross-validation using customized scoring methods. Why cross-validation? A common approach to machine learning is to split your data into three different sets: a training set, a test set, and a validation set. Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
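A short sketch of how `make_scorer` wraps a custom metric for use with `cross_val_score`. The metric and dataset here are illustrative, not taken from the post:

```python
# Wrap a custom metric with make_scorer so cross_val_score can use it.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def mean_absolute_error_custom(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# greater_is_better=False because lower error is better; scikit-learn
# negates the result so that higher scores always mean better models.
mae_scorer = make_scorer(mean_absolute_error_custom, greater_is_better=False)

X, y = load_diabetes(return_X_y=True)
scores = cross_val_score(Ridge(), X, y, cv=5, scoring=mae_scorer)
print(scores.mean())  # negative: the negated mean absolute error
```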
Read more →

Friday Picks 041

Read more →

Download Kaggle Datasets Into Google Colab

Google Colab is an online tool that allows you to run Python notebooks with free GPU acceleration. Why is that useful? Some machine learning models take a long time to compute and your local machine might not be able to run them. The Colab notebooks are similar to Jupyter Notebooks, but they use the Google Drive environment. You can always upload your dataset to Google Drive and connect your Drive to Colab.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 06)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the sixth part of the blog post series. part 1 part 2 part 3 part 4 part 5 part 7 part 8 10. Milestone Project 1 In this project we work through a dataset from start to finish. We use supervised machine learning to gain insight into a classification problem.
Read more →

TIL About Feature Engineering

As I’m working through different data sets in my machine learning journey, it gets more obvious that you have to know about feature engineering. Feature engineering is an umbrella term for transforming your input data. A machine learning model can only be as efficient as the data you feed it. Features are the different input qualities you give the model. Often, these features are in the wrong format, or they are missing.
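Two of the most common transformations are filling in missing values and converting categorical features into numbers. A minimal sketch with a made-up toy DataFrame:

```python
# Illustrative feature-engineering steps: imputation and one-hot encoding.
import pandas as pd

df = pd.DataFrame({
    "doors": [2, 4, None, 4],
    "size": ["small", "big", "med", "big"],
})

# Impute the missing numeric value with the column median
df["doors"] = df["doors"].fillna(df["doors"].median())

# One-hot encode the categorical column so a model can consume it
df = pd.get_dummies(df, columns=["size"])
print(df.columns.tolist())
```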
Read more →

Machine Learning Resources

I’ve recently begun learning about data science and machine learning. Here are some resources that I found:

- A Lightning-Fast Introduction To Deep Learning And Tensorflow 2.0
- An Introduction to Statistical Learning
- Awesome Data Science
- Awesome Machine Learning and AI Courses
- Awesome Machine Learning
- Chris Albon’s Notes
- Coursera Mathematics for Machine Learning
- Daniel Bourke’s Resources and his AI Masters Degree
- Data Science from Scratch: First Principles with Python
- Deploying and Hosting a Machine Learning Model with FastAPI and Heroku
- End to End Machine Learning Tutorial — From Data Collection to Deployment 🚀
- Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
- Khan Academy: Math
- MIT Deep Learning Book
- ML from the Fundamentals
- Machine Learning Algorithms from Scratch in Python
- Machine Learning Glossary
- Machine Learning Mastery: Start Here
- Mathematics For Machine Learning
- Mathematics for the adventurous self-learner
- Neural Networks and Deep Learning
- Python Data Science Handbook
- Resources for learning numpy, pandas, etc.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 05)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the fifth part of the blog post series. part 1 part 2 part 3 part 4 part 6 part 7 part 8 9. Scikit-Learn Up until now, we’ve learned how to consume data and make fancy diagrams. The current section finally deals with Machine Learning and teaches you the basics of Scikit-learn.
Read more →

TIL: How to Reduce Feature Labels With PCA

Today I learned how to reduce feature labels in a data set with Principal Component Analysis. From Python Data Science Handbook: Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […] You can use PCA to learn about the relationship between two values: In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
Read more →

TIL: Numpy Array Slices Return Views

Today I learned that if you slice a list in Python, the program returns a copy of the list. But NumPy returns a view, not a copy. That means that modifying a slice of a NumPy array modifies the original array: This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
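The difference is easy to demonstrate side by side:

```python
import numpy as np

# Python list slices are copies: mutating the slice leaves the list alone
lst = [1, 2, 3, 4]
sub = lst[:2]
sub[0] = 99
print(lst)   # [1, 2, 3, 4]

# NumPy slices are views: mutating the slice mutates the original array
arr = np.array([1, 2, 3, 4])
view = arr[:2]
view[0] = 99
print(arr)   # [99  2  3  4]

# Call .copy() explicitly when you need an independent array
safe = arr[:2].copy()
```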
Read more →

Friday Picks 039

Read more →

TIL: How to Plot a Confusion Matrix

Let’s say we made some predictions with a machine-learning model using scikit-learn. We want to evaluate how our model performs, and create a confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# Make predictions with the scikit-learn model on the test data set
y_preds = model.predict(X_test)

# Create confusion matrix on test data and predictions
cm = confusion_matrix(y_test, y_preds)
cm
```

You’ll get an array like this:

```
array([[24,  5],
       [ 4, 28]])
```

We can visualize it with pandas:
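One way to do the pandas step is `pd.crosstab`, which labels the rows and columns. A sketch with small made-up arrays standing in for the model output from the post:

```python
# Visualize a confusion matrix as a labeled pandas table.
import numpy as np
import pandas as pd

y_test = np.array([0, 0, 1, 1, 1, 0])
y_preds = np.array([0, 1, 1, 1, 0, 0])

matrix = pd.crosstab(
    y_test,
    y_preds,
    rownames=["Actual"],
    colnames=["Predicted"],
)
print(matrix)
```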
Read more →

Learning Progress: Creating Visualizations With Pandas and Matplotlib

Read more →

TIL: Pandas - Read CSV With Custom Separator Using Regex

If you want to convert a CSV file into Pandas, you can use pandas.read_csv. The function takes several options. One of them is sep (the default value is ,). You can use a regular expression to customize the delimiter. Let’s say your data looks like this:

```
vhigh,high,2,2,more,small
med,vhigh,3,more,big
…
```

You want to load that data into a Pandas DataFrame. You can split each line on the comma, but you want to ignore the comma inside floating point numbers like 2.
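The mechanics of a regex separator can be sketched in a few lines; the data and pattern below are illustrative, not the ones from the post. Note that a multi-character or regex `sep` needs `engine="python"`:

```python
# Read CSV data with a regular expression as the separator:
# here, fields may be separated by a comma OR a semicolon.
import io

import pandas as pd

data = "vhigh,high;2\nmed;vhigh,3\n"

df = pd.read_csv(
    io.StringIO(data),
    sep=r"[,;]",
    header=None,
    engine="python",
)
print(df.values.tolist())
```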
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 04)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the fourth part of the blog post series. part 1 part 2 part 3 part 5 part 6 part 7 part 8 7. NumPy The section covers an introduction to NumPy. NumPy will convert any data into a series of numbers. NumPy is the backbone of all data science in Python.
Read more →

TIL: JupyterLab: Run All Cells

Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 03)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the third part of the blog post series. part 1 part 2 part 4 part 5 part 6 part 7 part 8 4. The 2 Paths The class aims to be beginner-friendly. Now you have the choice to learn how to program in Python or to continue with the default route.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 02)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the second part of the blog post series. part 1 part 3 part 4 part 5 part 6 part 7 part 8 3. Machine Learning and Data Science Framework The course focuses on learning by doing. Instead of learning higher mathematics and over-thinking the process, the instructors show you a framework that encourages a fast feedback loop.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science: Zero to Mastery” Course (Part 01)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery. The course runs under the flag of Andrei Neagoie. Andrei is a popular instructor on Udemy, with almost 200,000 students and top reviews. For this course, he has paired up with Daniel Bourke, a self-taught Machine Learning Engineer from Australia. In this blog post series, I will jot down my thoughts on the course and what I’ve learned.
Read more →