TIL About Streamlit’s Magic

Streamlit allows you to write Markdown within a Python file (.py):

    import streamlit as st

    st.title("Otto Group Product Classification Challenge 🛍")
    st.markdown("## 1. Problem Statement")
    st.markdown(
        "Given a dataset with 93 features, create a predictive model which is able to distinguish between the main product categories."
    )
    st.markdown("### 1.2 Evaluation")
    st.markdown(
        "The evaluation for the competition is multi-class logarithm loss. See Kaggle: Evaluation."
    )

I like that I can write Markdown, but the syntax is cumbersome.
Read more →

Friday Picks 043

Read more →

Run Streamlit With Docker and Docker-Compose

Create a Docker container that runs your machine learning models as a web application. This article will explain the advantages of Streamlit and how to build a Streamlit application with Docker. Why Streamlit? You’ve explored your data and developed a machine learning model. It’s now time to release it to the world so that others can see what you’ve built. Now what? Deploying machine learning models is not trivial.
Read more →

WIP: Streamlit Project Notes

I finished the Complete Machine Learning and Data Science: Zero to Mastery course this weekend (and wrote about it). The course has given me the foundations of working with data in Python. Practice makes perfect. My goal is to sharpen my skills by exploring a Kaggle dataset, building a model, and deploying it with Streamlit using Docker and Heroku. The project will be on GitHub, where I will post all the links, my thoughts, and observations.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 08)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the eighth part of the blog post series. Other parts: part 1, part 2, part 3, part 4, part 5, part 6, part 7. TL;DR (A Review of the Complete Course): The program is a praiseworthy introduction to data science and machine learning with Python. The instructors focus on practical skills and convey an enormous topic in a captivating and friendly way.
Read more →

A Walkthrough of the “Complete Machine Learning and Data Science Zero to Mastery” Course (Part 07)

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes. This is the seventh part of the blog post series. Other parts: part 1, part 2, part 3, part 4, part 5, part 6, part 8. 13. Data Engineering: These lectures cover what kind of data we have (structured data, unstructured data, etc.). How can we make the raw data consumable for machine learning libraries?
Read more →

Question: How to Speed Up Hyper-Tuning?

Read more →

Find the Best Model Pipeline

Create a pipeline to score different machine learning models with scikit-learn. After the initial data exploration, I would like to get a quick gauge of which model would be best for the problem at hand. A rough estimate helps narrow down which machine learning model to use and tune later. It also gives a sense of how effective prospective algorithms will be. The goal is to get a big-picture overview.
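One way such a quick comparison could look, as a rough sketch and not necessarily the approach the full post takes: wrap each candidate in the same preprocessing pipeline and rank the candidates by mean cross-validated accuracy. The synthetic dataset, the three candidate models, and the helper name `score_models` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 300 samples, 20 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=42
)

# Hypothetical shortlist of candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}


def score_models(models, X, y, cv=5):
    """Return {name: mean CV accuracy}, sorted from best to worst."""
    results = {}
    for name, model in models.items():
        # Same scaling step for every model so the comparison is like for like.
        pipeline = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
        results[name] = scores.mean()
    return dict(sorted(results.items(), key=lambda item: item[1], reverse=True))


ranking = score_models(candidates, X, y)
for name, mean_score in ranking.items():
    print(f"{name}: {mean_score:.3f}")
```

The scores are only a first filter. Untuned defaults can under-represent a model that would do well after hyperparameter tuning.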
Read more →

Script to Stop Google Colab From Disconnecting

Google Colab is a free online coding environment that offers GPU acceleration for your data science and machine learning needs. It runs on top of Jupyter Notebooks, so the interface is familiar to most data scientists who use Python. If your local machine is too slow for some of the more intensive computations you need for machine learning, Colab can help you out. When you use the remote runtime with the free GPU, the runtime disconnects after a while.
Read more →

Write Your Own Cross Validation Function With make_scorer in scikit-learn

You want to score a list of models using cross-validation and custom scoring methods. Why Cross-validation? A common approach to machine learning is to split your data into three different sets: a training set, a test set, and a validation set. Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
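A small sketch of the idea, under stated assumptions: `make_scorer` turns a plain metric function into a scorer that `cross_val_score` accepts, and the loop applies it to a list of models. The metric `error_rate`, the synthetic data, and the two models are illustrative choices, not taken from the full post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def error_rate(y_true, y_pred):
    """Custom metric (hypothetical): fraction of misclassified samples."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


# Lower is better for an error, so scikit-learn negates the value internally.
error_scorer = make_scorer(error_rate, greater_is_better=False)

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]

cv_results = {}
for model in models:
    # Each fold is scored on data the model was not fitted on.
    scores = cross_val_score(model, X, y, cv=5, scoring=error_scorer)
    # Flip the sign back to report a positive error rate.
    cv_results[type(model).__name__] = -scores.mean()

for name, mean_error in cv_results.items():
    print(f"{name}: mean error rate {mean_error:.3f}")
```

Because of `greater_is_better=False`, the raw values returned by `cross_val_score` are negative errors. That keeps "higher is better" consistent for tools such as grid search.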
Read more →