2020-03-28

#TIL
#Python
#Data-Science
#Machine-Learning
Streamlit allows you to write Markdown within a Python file (.py):

    import streamlit as st

    st.title("Otto Group Product Classification Challenge 🛍")
    st.markdown("## 1. Problem Statement")
    st.markdown(
        "Given a dataset with **93 features**, create a **predictive model** "
        "which is able to **distinguish between the main product categories**."
    )
    st.markdown("### 1.2 Evaluation")
    st.markdown(
        "The evaluation for the competition is **multi-class logarithmic loss**. "
        "See Kaggle: Evaluation."
    )

I like that I can write Markdown, but the syntax is cumbersome.
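One way to cut down on the repeated calls is to keep the whole intro in a single triple-quoted Markdown string and pass it to one `st.markdown` call. The sketch below assumes that approach. `INTRO_MD` and `render_intro` are hypothetical names, and `render_intro` takes the `streamlit` module as a parameter only so the function can be exercised without a running app. Markdown treats lines indented by four spaces as code blocks, which is why `textwrap.dedent` strips the common indentation first.

```python
import textwrap

# Hypothetical constant: the whole intro as one Markdown string.
# textwrap.dedent removes the common leading indentation so the
# headings start at column 0 instead of being rendered as code.
INTRO_MD = textwrap.dedent(
    """\
    ## 1. Problem Statement

    Given a dataset with **93 features**, create a **predictive model**
    which is able to **distinguish between the main product categories**.

    ### 1.2 Evaluation

    The evaluation for the competition is **multi-class logarithmic loss**.
    """
)


def render_intro(st) -> None:
    """Render the page title and the intro with two Streamlit calls.

    Hypothetical helper: `st` is expected to be the imported `streamlit` module.
    """
    st.title("Otto Group Product Classification Challenge 🛍")
    st.markdown(INTRO_MD)
```

In the app file you would then write `import streamlit as st` followed by `render_intro(st)` and start the app with `streamlit run app.py` (the file name is an assumption).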

2020-03-25

#Python
#Docker
#Machine-Learning
#Data-Science
Create a Docker container that runs your machine learning models as a web application
This article will explain the advantages of Streamlit and how to build a Streamlit application with Docker.
Why Streamlit?
You’ve explored your data and developed a machine learning model. It’s now time to release it to the world so that others can see what you’ve built.
Now what?
Deploying machine learning models is not trivial.

2020-03-23

#Python
#Data-Science
#Machine-Learning
I finished the course Complete Machine Learning and Data Science: Zero to Mastery this weekend (and wrote about it).
The course has given me the foundations of working with data in Python.
Practice makes perfect.
My goal is to sharpen my skills by exploring a Kaggle dataset, building a model and deploying it with Streamlit using Docker and Heroku.
The project will live on GitHub, where I will post all the links, my thoughts, and my observations.

2020-03-22

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the eighth part of the blog post series.
part 1 part 2 part 3 part 4 part 5 part 6 part 7
TL;DR (A Review of The Complete Course)
The program is a praiseworthy introduction to data science and machine learning with Python. The instructors focus on practical skills and convey an enormous topic in a captivating and friendly way.

2020-03-21

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the seventh part of the blog post series.
part 1 part 2 part 3 part 4 part 5 part 6 part 8
13. Data Engineering
These lectures cover what kind of data we have (structured data, unstructured data, etc.).
How can we make the raw data consumable for machine learning libraries?
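Most machine learning libraries expect numbers, so a common step is turning categorical text columns into numeric ones. A minimal sketch using pandas `get_dummies` (one-hot encoding); the `car_sales` data below is made up for illustration, and scikit-learn's `OneHotEncoder` is an alternative that fits into a model pipeline.

```python
import pandas as pd

# Made-up toy data: one categorical column and one numeric column.
car_sales = pd.DataFrame(
    {
        "colour": ["White", "Red", "Blue", "White"],
        "odometer_km": [150_043, 87_899, 32_549, 11_179],
    }
)

# One-hot encode the categorical column: each category becomes its own
# 0/1 column, and the original "colour" column is dropped.
encoded = pd.get_dummies(car_sales, columns=["colour"], dtype=int)

print(encoded)
```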

2020-03-16

#Python
#Machine-Learning
#Data-Science
Create a pipeline to score different machine learning models with scikit-learn
After the initial data exploration, I want a quick gauge of which model would be best for the problem at hand.
A rough estimate helps narrow down which machine learning model to use and tune later. It gives a sense of how effective prospective algorithms will be.
The goal is to get a big picture overview.
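A minimal sketch of that idea, assuming a dictionary of candidate classifiers, a single train/test split, and the Iris dataset bundled with scikit-learn as a stand-in for your own data. `fit_and_score` is a hypothetical helper name, and the chosen models are examples, not recommendations. Scale-sensitive models are wrapped with `make_pipeline` so scaling is fitted only on the training data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit every model and return its test-set accuracy (hypothetical helper)."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, .score() returns the mean accuracy on the given data.
        scores[name] = model.score(X_test, y_test)
    return scores


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(random_state=42),
}

model_scores = fit_and_score(models, X_train, X_test, y_train, y_test)
# Print the models from best to worst test accuracy.
for name, score in sorted(model_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

A single split is cheap but noisy; for a more stable comparison the same loop can use cross-validation instead.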

2020-03-15

#Machine-Learning
#Data-Science
#Python
#DevTools
Google Colab is a free online coding environment that offers GPU acceleration for your data science and machine learning needs.
It runs on top of Jupyter Notebooks, so the interface is familiar to most data scientists who use Python.
If your local machine is too slow for some of the more intensive computations you need for machine learning, Colab can help you out.
When you use the remote runtime with the free GPU, the runtime disconnects after a while.

2020-03-14

#Python
#Machine-Learning
#Data-Science
You want to score a list of models with cross-validation, using customized scoring methods.
Why cross-validation?
A common approach to machine learning is to split your data into three different sets: a training set, a test set, and a validation set.
Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
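A minimal sketch of scoring several models with cross-validation and more than one metric, assuming scikit-learn's `cross_validate` with a `scoring` dictionary. The `make_scorer(f1_score, average="macro")` entry shows how a customized metric can be plugged in next to built-in scorer names; the breast cancer dataset bundled with scikit-learn stands in for real data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Built-in scorers are referenced by name; custom ones are wrapped with make_scorer.
scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "f1_macro": make_scorer(f1_score, average="macro"),
}

cv_results = {}
for name, model in models.items():
    # 5-fold cross-validation: each fold is held out once while the model
    # is trained on the remaining four folds.
    results = cross_validate(model, X, y, cv=5, scoring=scoring)
    # cross_validate stores per-fold scores under "test_<metric name>".
    cv_results[name] = {metric: np.mean(results[f"test_{metric}"]) for metric in scoring}

for name, metrics in cv_results.items():
    print(name, {metric: round(float(value), 3) for metric, value in metrics.items()})
```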

2020-03-12

#Python
#Data-Science
#Machine-Learning
#DevTools
Google Colab is an online tool that allows you to run Python notebooks with free GPU acceleration.
Why is that useful?
Some machine learning models take a long time to compute and your local machine might not be able to run them.
The Colab notebooks are similar to Jupyter Notebooks, but they use the Google Drive environment.
You can always upload your dataset to Google Drive and connect your Drive to Colab.
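A minimal sketch of connecting Drive from inside a Colab notebook, assuming the `google.colab.drive.mount` helper that Colab provides and the mount point `/content/drive`. The `MyDrive` folder name is an assumption (older runtimes used `My Drive`), and `mount_drive` and `drive_file_path` are hypothetical helpers. The `google.colab` import only works inside a Colab runtime, so it is done lazily inside the function.

```python
from pathlib import PurePosixPath

MOUNT_POINT = "/content/drive"


def drive_file_path(relative_path: str, mount_point: str = MOUNT_POINT) -> str:
    """Build the path of a file stored in your Drive once it is mounted.

    Hypothetical helper; assumes Colab exposes the Drive root as <mount_point>/MyDrive.
    """
    return str(PurePosixPath(mount_point) / "MyDrive" / relative_path)


def mount_drive(mount_point: str = MOUNT_POINT) -> None:
    """Mount Google Drive (hypothetical helper); only works inside a Colab runtime."""
    from google.colab import drive  # not installed outside Colab

    drive.mount(mount_point)  # asks you to authorize access in the notebook
```

In a notebook cell you would call `mount_drive()` once and then read a file with, for example, `pd.read_csv(drive_file_path("data/train.csv"))`, where the file name is hypothetical.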

2020-03-11

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the sixth part of the blog post series.
part 1 part 2 part 3 part 4 part 5 part 7 part 8
10. Milestone Project 1
In this project we work through a dataset from start to finish. We use supervised machine learning to gain insight into a classification problem.

2020-03-09

#TIL
#Python
#Data-Science
#Machine-Learning
As I work through different data sets in my machine learning journey, it becomes more obvious that you have to know about feature engineering.
Feature engineering is an umbrella term for transforming your input data.
A machine learning model can only be as good as the data you feed it.
Features are the input variables (the columns of your data) that you give the model.
Often, these features are in the wrong format, or they are missing.
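A minimal sketch of two common fixes with pandas, using made-up data: converting a column stored as text into numbers with `pd.to_numeric(errors="coerce")` (values that cannot be parsed become `NaN`), and filling missing values with each column's median. Median imputation is just one option, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Made-up toy data: "price" is stored as text and contains an unparseable value;
# "doors" has a missing entry.
df = pd.DataFrame(
    {
        "price": ["4000", "5500", "n/a", "7000"],
        "doors": [4, np.nan, 3, 5],
    }
)

# Wrong format: parse the text column as numbers; "n/a" becomes NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Missing values: fill each numeric column with its own median.
df = df.fillna(df.median(numeric_only=True))

print(df)
```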

2020-03-07

#TIL
#Python
#Machine-Learning
Today I learned about logistic regression.
Logistic regression is a statistical model that we can use for classification problems in machine learning.
You can easily confuse the term with linear regression.
With linear regression, you predict a continuous quantitative value, for example a price.
With logistic regression you can predict categories: yes/no, pass/fail, etc.
Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
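Logistic regression passes a linear combination of the inputs through the sigmoid function σ(z) = 1 / (1 + e^(-z)), which squeezes any number into a probability between 0 and 1; a 0.5 threshold then turns that probability into a class. A minimal sketch with scikit-learn's `LogisticRegression` on made-up pass/fail data (hours studied), checking that `predict_proba` matches the sigmoid of `decision_function`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied -> failed (0) or passed (1).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


model = LogisticRegression()
model.fit(hours, passed)

new_hours = np.array([[1.0], [4.0]])
# decision_function returns the linear part w*x + b; the sigmoid turns it into P(pass).
manual_proba = sigmoid(model.decision_function(new_hours))
sklearn_proba = model.predict_proba(new_hours)[:, 1]

print(sklearn_proba)             # probability of class 1 ("pass")
print(model.predict(new_hours))  # class labels after thresholding at 0.5
```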

2020-03-05

#Python
#Data-Science
#Machine-Learning
I’ve recently begun learning about data science and machine learning.
Here are some resources that I found:
r/datascience Resources
r/learnmachinelearning Resources
r/learnmachinelearning: List of Machine Learning Resources for a Beginner
Machine Learning Mastery: Start Here
Daniel Bourke’s Resources and his AI Masters Degree
Udemy: Complete Machine Learning and Data Science: Zero to Mastery
Machine Learning Glossary
Resources for learning numpy, pandas, etc. (applying deep learning is goal)?
End to End Machine Learning Tutorial — From Data Collection to Deployment 🚀
An Introduction to Statistical Learning
The Elements of Statistical Learning
Mathematics For Machine Learning
Mathematics for the adventurous self-learner
Khan Academy: Math
Think Stats 2e
Python Data Science Handbook
The Hundred-Page Machine Learning Book
ML from the Fundamentals
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
Data Science from Scratch: First Principles with Python
MIT Deep Learning Book
fast.

2020-03-04

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the fifth part of the blog post series.
part 1 part 2 part 3 part 4 part 6 part 7 part 8
9. Scikit-Learn
Up until now, we’ve learned how to consume data and make fancy diagrams.
The current section finally deals with Machine Learning and teaches you the basics of Scikit-learn.
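The core scikit-learn workflow is the same for most estimators: split the data, `fit` on the training set, `predict` on unseen data, and evaluate the result. A minimal sketch of that workflow, assuming a `RandomForestClassifier` and the wine dataset bundled with scikit-learn as stand-ins for your own model and data.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Get the data ready: features X and labels y, split into train and test sets.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Choose a model and its hyperparameters.
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 3. Fit the model to the training data.
clf.fit(X_train, y_train)

# 4. Make predictions on data the model has not seen.
y_preds = clf.predict(X_test)

# 5. Evaluate: for classifiers, .score() is the mean accuracy,
#    the same value accuracy_score computes from the predictions.
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
print(f"accuracy_score: {accuracy_score(y_test, y_preds):.3f}")
```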

2020-03-02

#TIL
#Python
#Data-Science
#Machine-Learning
Today I learned how to reduce the number of features (dimensions) in a data set with Principal Component Analysis.
From Python Data Science Handbook:
Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […]
You can use PCA to learn about the relationship between two variables:
In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
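A minimal sketch in the spirit of that idea, assuming scikit-learn's `PCA` and randomly generated, correlated two-dimensional data. `components_` holds the principal axes, `explained_variance_ratio_` says how much of the variance each axis captures, and keeping only the first component reduces the data from two features to one.

```python
import numpy as np
from sklearn.decomposition import PCA

# Randomly generated 2-D data where the second feature depends on the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.5 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

# Fit PCA with both components to inspect the principal axes.
pca = PCA(n_components=2)
pca.fit(X)
print("Principal axes:\n", pca.components_)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Dimensionality reduction: project the data onto the first principal axis only.
pca_1d = PCA(n_components=1)
X_reduced = pca_1d.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```

Because the two features are strongly correlated, almost all of the variance lies along the first axis, so dropping the second one loses little information.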

2020-02-25

#Python
#Machine-Learning
#Data-Science
#TIL
Let’s say we made some predictions with a machine-learning model using scikit-learn.
We want to evaluate how our model performs, and create a confusion matrix:
    from sklearn.metrics import confusion_matrix

    # Make predictions with the scikit-learn model on the test data set
    y_preds = model.predict(X_test)

    # Create a confusion matrix from the test labels and the predictions
    cm = confusion_matrix(y_test, y_preds)
    cm

You’ll get an array like this:

    array([[24,  5],
           [ 4, 28]])

We can visualize it with pandas:
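A minimal sketch of one way to do that, assuming `pd.crosstab` with named row and column labels. The `y_test` and `y_preds` arrays below are made-up stand-ins for real test labels and predictions. The resulting table holds the same counts as `confusion_matrix`, with actual labels as rows and predicted labels as columns.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up labels and predictions for a binary classifier.
y_test = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
y_preds = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0])

# Rows: actual labels, columns: predicted labels.
cm_table = pd.crosstab(
    y_test,
    y_preds,
    rownames=["Actual Labels"],
    colnames=["Predicted Labels"],
)
print(cm_table)

# The same counts as a plain NumPy array.
cm = confusion_matrix(y_test, y_preds)
```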

2020-02-22

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the fourth part of the blog post series.
part 1 part 2 part 3 part 5 part 6 part 7 part 8
7. NumPy
The section covers an introduction to NumPy.
NumPy converts data into arrays of numbers. NumPy is the backbone of all data science in Python.
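A minimal sketch of the NumPy basics such an introduction usually starts with: creating arrays, inspecting `shape`, `ndim` and `dtype`, and applying vectorized arithmetic and aggregations without Python loops. The values are made up.

```python
import numpy as np

# A 1-D array (vector) and a 2-D array (matrix).
a1 = np.array([1, 2, 3])
a2 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Shape, number of dimensions and element type of each array.
print(a1.shape, a1.ndim, a1.dtype)
print(a2.shape, a2.ndim, a2.dtype)

# Vectorized arithmetic: the 1-D array is broadcast across each row of a2.
summed = a2 + a1

# Aggregations along an axis: axis=0 collapses the rows, axis=1 the columns.
column_means = a2.mean(axis=0)
row_sums = a2.sum(axis=1)
print(summed, column_means, row_sums)
```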

2020-02-11

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the third part of the blog post series.
part 1 part 2 part 4 part 5 part 6 part 7 part 8
4. The 2 Paths
The class aims to be beginner-friendly. Now you have the choice to learn how to program in Python or to continue with the default route.

2020-02-09

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the second part of the blog post series.
part 1 part 3 part 4 part 5 part 6 part 7 part 8
3. Machine Learning and Data Science Framework
The course focuses on learning by doing. Instead of learning higher mathematics and overthinking the process, the instructors show you a framework that encourages a fast feedback loop.

2020-02-06

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery.
The course is led by Andrei Neagoie. Andrei is a popular instructor on Udemy, with almost 200,000 students and top reviews.
For this course, he has paired up with Daniel Bourke, a self-taught Machine Learning Engineer from Australia.
In this blog post series, I will jot down my thoughts on the course, and what I’ve learned.