2020-03-28

#TIL
#Python
#Data-Science
#Machine-Learning
Streamlit allows you to write Markdown within a Python file (.py):

    import streamlit as st

    st.title("Otto Group Product Classification Challenge 🛍")
    st.markdown("## 1. Problem Statement")
    st.markdown(
        "Given a dataset with **93 features**, create a **predictive model** which is able to **distinguish between the main product categories**."
    )
    st.markdown("### 1.2 Evaluation")
    st.markdown(
        "The evaluation for the competition is **multi-class logarithm loss**. See Kaggle: Evaluation."
    )

I like that I can write Markdown, but the syntax is cumbersome.

2020-03-25

#Python
#Docker
#Machine-Learning
#Data-Science
Create a Docker container that runs your machine learning models as a web application
This article will explain the advantages of Streamlit and how to build a Streamlit application with Docker.
Why Streamlit?
You’ve explored your data and developed a machine learning model. It’s now time to release it to the world so that others can see what you’ve built.
Now what?
Deploying machine learning models is not trivial.

2020-03-23

#Python
#Data-Science
#Machine-Learning
I finished the Complete Machine Learning and Data Science: Zero to Mastery course this weekend (and wrote about it).
The course has given me the foundations of working with data in Python.
Practice makes perfect.
My goal is to sharpen my skills by exploring a Kaggle dataset, building a model and deploying it with Streamlit using Docker and Heroku.
The project will be on GitHub where I will post all the links, my thoughts and observations.

2020-03-22

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the eighth part of the blog post series.
part 1 part 2 part 3 part 4 part 5 part 6 part 7
TL;DR (A Review of The Complete Course)
The program is a praiseworthy introduction to data science and machine learning with Python. The instructors focus on practical skills and convey an enormous topic in a captivating and friendly way.

2020-03-21

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the seventh part of the blog post series.
part 1 part 2 part 3 part 4 part 5 part 6 part 8
13. Data Engineering
These lectures cover what kind of data we have (structured data, unstructured data, etc.).
How can we make the raw data consumable for machine learning libraries?

2020-03-16

#Python
#Machine-Learning
#Data-Science
Create a pipeline to score different machine learning models with scikit-learn
After the initial data exploration I would like to get a quick gauge on which model would be best for the problem at hand.
A rough estimate helps in narrowing down which machine-learning model to use and tune later. It helps to get a sense of how effective prospective algorithms will be.
The goal is to get a big picture overview.
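A rough sketch of such a quick comparison, assuming a synthetic dataset from `make_classification` and three arbitrarily chosen classifiers (neither the data nor the model choice comes from the original post). Each model is wrapped in a pipeline with a scaler, fitted on the training split and scored on the held-out split:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate models (an illustrative selection, not a recommendation).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    # Scale the features first, then fit the model.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    # For classifiers, score() returns the mean accuracy on the test split.
    scores[name] = pipeline.score(X_test, y_test)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The numbers are only a first impression. A single train/test split can be noisy, so the ranking may change with a different random seed.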

2020-03-15

#Machine-Learning
#Data-Science
#Python
#DevTools
Google Colab is a free online coding environment that offers GPU acceleration for your data science and machine learning needs.
It runs on top of Jupyter Notebooks. That means that the interface is familiar to most data scientists who use Python.
If your local machine is too slow for some of the more intensive computations you need for machine learning, Colab can help you out.
When you use the remote runtime with the free GPU, the runtime disconnects after a while.

2020-03-14

#Python
#Machine-Learning
#Data-Science
You want to score a list of models with cross-validation with customized scoring methods.
Why Cross-validation?
A common approach to machine learning is to split your data into three different sets: a training set, a test set, and a validation set.
Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
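A minimal sketch of scoring a list of models with cross-validation and more than one metric, using scikit-learn's `cross_validate`. The synthetic dataset, the two models and the chosen metrics are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data as a stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Scorers can be built-in names or callables wrapped with make_scorer.
scoring = {
    "accuracy": "accuracy",
    "f1": make_scorer(f1_score),
}

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]

results = {}
for model in models:
    # 5-fold cross-validation; every fold is used once as the test fold.
    cv_scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    results[type(model).__name__] = {
        metric: cv_scores[f"test_{metric}"].mean() for metric in scoring
    }

for name, metrics in results.items():
    print(name, {metric: round(value, 3) for metric, value in metrics.items()})
```

`cross_validate` returns one array per metric under the key `test_<name>`, so averaging per metric gives one comparable number per model.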

2020-03-12

#Python
#Data-Science
#Machine-Learning
#DevTools
Google Colab is an online tool that allows you to run Python notebooks with free GPU acceleration.
Why is that useful?
Some machine learning models take a long time to compute and your local machine might not be able to run them.
The Colab notebooks are similar to Jupyter Notebooks, but they use the Google Drive environment.
You can always upload your dataset to Google Drive and connect your Drive to Colab.

2020-03-11

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the sixth part of the blog post series.
part 1 part 2 part 3 part 4 part 5 part 7 part 8
10. Milestone Project 1
In this project we work through a dataset from start to finish. We use supervised machine learning to gain insight into a classification problem.

2020-03-09

#TIL
#Python
#Data-Science
#Machine-Learning
As I’m working through different data sets in my machine learning journey, it becomes more obvious that you have to know about feature engineering.
Feature engineering is an umbrella term for transforming your input data.
A machine learning model can only be as good as the data you feed it.
Features are the different input qualities you give the model.
Often, these features are in the wrong format, or they are missing.
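A small sketch of two common fixes, using a made-up table (the column names and values are hypothetical): a missing number is filled with the column median, and a text column is turned into numeric indicator columns.

```python
import pandas as pd

# Hypothetical raw data: one missing price, one categorical text column.
raw = pd.DataFrame(
    {
        "price": [10.0, None, 30.0],
        "color": ["red", "blue", "red"],
    }
)

features = raw.copy()

# Missing value: fill with the median of the known prices.
features["price"] = features["price"].fillna(features["price"].median())

# Wrong format: one-hot encode the text column into indicator columns.
features = pd.get_dummies(features, columns=["color"])

print(features)
```

Median imputation and one-hot encoding are only two of many options. Which transformation fits depends on the data and the model.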

2020-03-07

#TIL
#Python
#Data-Science
#Machine-Learning
Today I learned about logistic regression.
Logistic Regression is a statistical model that we can use for classification problems in machine learning.
You can easily confuse the term with linear regression.
With linear regression, you predict a quantitative value, for example a price.
With logistic regression you can predict categories: yes/no, pass/fail, etc.
Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
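To make the difference concrete, here is a minimal sketch in plain Python. Logistic regression takes the same kind of weighted sum that linear regression uses, squeezes it through the sigmoid function into a probability between 0 and 1, and then applies a threshold to pick a category. The weight, bias and the "hours studied" feature are made-up values for illustration, not fitted parameters.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_pass(hours_studied: float, weight: float = 1.5, bias: float = -4.0,
                 threshold: float = 0.5) -> tuple[float, bool]:
    """Return the probability of passing and the resulting yes/no decision.

    weight and bias are hypothetical; a real model learns them from data.
    """
    linear_output = weight * hours_studied + bias  # the linear regression part
    probability = sigmoid(linear_output)           # squashed into (0, 1)
    return probability, probability >= threshold


for hours in (1, 3, 4):
    probability, passed = predict_pass(hours)
    print(f"{hours} hours -> p(pass) = {probability:.2f}, pass = {passed}")
```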

2020-03-05

#Python
#Data-Science
#Machine-Learning
I’ve recently begun learning about data science and machine learning.
Here are some resources that I found (sorted alphabetically):
A Lightning-Fast Introduction To Deep Learning And Tensorflow 2.0
An Introduction to Statistical Learning
Awesome Data Science
Awesome Machine Learning and AI Courses
Awesome Machine Learning
Chris Albon’s Notes
Coursera Mathematics for Machine Learning
Daniel Bourke’s Resources and his AI Masters Degree
Data Science from Scratch: First Principles with Python
Deploying and Hosting a Machine Learning Model with FastAPI and Heroku
End to End Machine Learning Tutorial — From Data Collection to Deployment 🚀
Full Stack Deep Learning: how to deploy models
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
Khan Academy: Math
Learning Math for Machine Learning
MIT Deep Learning Book
ML from the Fundamentals
Machine Learning Algorithms from Scratch in Python
Machine Learning Engineering book
Machine Learning Glossary
Machine Learning Mastery: Start Here
Machine Learning from Scratch: covers the building blocks of the most common methods in machine learning
Mathematics For Machine Learning
Mathematics for the adventurous self-learner
Neural Networks and Deep Learning
Putting ML in Production: a guide and code-driven case study on MLOps for software engineers, data scientists and product managers
Python Data Science Handbook
Resources for learning numpy, pandas, etc.

2020-03-04

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the fifth part of the blog post series.
part 1 part 2 part 3 part 4 part 6 part 7 part 8
9. Scikit-Learn
Up until now, we’ve learned how to consume data and make fancy diagrams.
The current section finally deals with Machine Learning and teaches you the basics of Scikit-learn.

2020-03-02

#TIL
#Python
#Data-Science
#Machine-Learning
Today I learned how to reduce the number of features in a data set with Principal Component Analysis.
From Python Data Science Handbook:
Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […]
You can use PCA to learn about the relationship between two values:
In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
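A minimal sketch of that idea with plain NumPy. It uses a singular value decomposition rather than scikit-learn's `PCA` class, and the two correlated columns are synthetic data made up for this example. The principal axes are the directions of largest variance, and projecting onto the first axis reduces the two features to one.

```python
import numpy as np

# Synthetic data: the second feature is roughly twice the first, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.1, size=200)
data = np.column_stack([x, y])

# PCA works on centered data.
centered = data - data.mean(axis=0)

# The rows of vt are the principal axes, ordered by explained variance.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
explained_variance = singular_values**2 / (len(data) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Dimensionality reduction: keep only the coordinate along the first axis.
reduced = centered @ vt[0]

print("principal axes:\n", vt)
print("explained variance ratio:", explained_ratio)
print("reduced shape:", reduced.shape)
```

Because the two features are almost perfectly correlated, nearly all of the variance lies along the first axis, so little information is lost by dropping the second one.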

2020-03-01

#Python
#Data-Science
#TIL
Today I learned that if you slice a list in Python, the program returns a copy of the list.
But NumPy returns a view, not a copy. That means that modifying a slice of a NumPy array also modifies the original array:
This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
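A short sketch that shows the difference side by side (the variable names and values are just for illustration):

```python
import numpy as np

# Python list: the slice is a copy, so the original stays untouched.
numbers = [1, 2, 3, 4, 5]
list_slice = numbers[:2]
list_slice[0] = 99

# NumPy array: the slice is a view onto the same data buffer.
arr = np.array([1, 2, 3, 4, 5])
view = arr[:2]
view[0] = 99  # this also changes arr

# If you need an independent slice, ask for a copy explicitly.
other = np.array([1, 2, 3, 4, 5])
independent = other[:2].copy()
independent[0] = 99  # other is not affected

print(numbers)  # original list is unchanged
print(arr)      # first element was overwritten through the view
print(other)    # unchanged thanks to .copy()
```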

2020-02-25

#Python
#Machine-Learning
#Data-Science
#TIL
Let’s say we made some predictions with a machine-learning model using scikit-learn.
We want to evaluate how our model performs, and create a confusion matrix:

    from sklearn.metrics import confusion_matrix

    # Make predictions with the scikit-learn model on the test data set
    y_preds = model.predict(X_test)

    # Create confusion matrix on test data and predictions
    cm = confusion_matrix(y_test, y_preds)
    cm

You’ll get an array like this:

    array([[24,  5],
           [ 4, 28]])

We can visualize it with pandas:
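One possible way to do that is `pandas.crosstab`, sketched here with a handful of made-up labels instead of a real model's output. It builds the same table as the confusion matrix, but with labelled rows (actual values) and columns (predicted values):

```python
import pandas as pd

# Made-up labels standing in for y_test and model.predict(X_test).
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_preds = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are the actual classes, columns the predicted classes.
table = pd.crosstab(
    pd.Series(y_test, name="Actual"),
    pd.Series(y_preds, name="Predicted"),
)

print(table)
```

The diagonal holds the correct predictions, and the off-diagonal cells show which class was confused with which.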

2020-02-23

#Python
#Data-Science
If you want to load a CSV file into a Pandas DataFrame, you can use pandas.read_csv.
The function takes several options. One of them is sep (default value is ,).
You can use a regular expression to customize the delimiter.
Let’s say your data looks like this:

    vhigh,high,2,2,more,small
    med,vhigh,3,more,big
    …

You want to load that data into a Pandas DataFrame. You can split each line on the comma, but you want to ignore the comma inside floating point numbers like 2.
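A sketch of such a delimiter pattern, tried out with Python's `re` module on hypothetical rows (the sample values and the pattern are my own illustration, not taken from the original data). The pattern treats a comma as a separator unless it sits between two digits:

```python
import re

# A comma counts as a delimiter unless it has a digit on both sides.
DELIMITER = r"(?<!\d),|,(?!\d)"

# Hypothetical rows; "2,5" is meant to be one floating point value.
lines = [
    "vhigh,high,2,5,more,small",
    "med,vhigh,3,more,big",
]

rows = [re.split(DELIMITER, line) for line in lines]

for row in rows:
    print(row)
```

`pandas.read_csv` accepts a regular expression as `sep` when the Python parsing engine is used (`engine="python"`), so a pattern like this could be passed there. One caveat of this approach: two neighbouring integer columns such as `2,2` would be glued together as well, so it only works if the data never has two numeric columns side by side.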

2020-02-22

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the fourth part of the blog post series.
part 1 part 2 part 3 part 5 part 6 part 7 part 8
7. NumPy
The section covers an introduction to NumPy.
NumPy will convert any data into a series of numbers. NumPy is the backbone of all data science in Python.

2020-02-11

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the third part of the blog post series.
part 1 part 2 part 4 part 5 part 6 part 7 part 8
4. The 2 Paths
The class aims to be beginner-friendly. Now you have the choice to learn how to program in Python or to continue with the default route.

2020-02-09

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.
This is the second part of the blog post series.
part 1 part 3 part 4 part 5 part 6 part 7 part 8
3. Machine Learning and Data Science Framework
The course focusses on learning by doing. Instead of learning higher mathematics and over-thinking the process, the instructors show you a framework that encourages a fast feedback loop.

2020-02-06

#Python
#Data-Science
#Machine-Learning
#Lab
#Udemy: Complete Machine Learning and Data Science: Zero to Mastery
I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery.
The course is led by Andrei Neagoie. Andrei is a popular instructor on Udemy, with almost 200,000 students and top reviews.
For this course, he has paired up with Daniel Bourke, a self-taught Machine Learning Engineer from Australia.
In this blog post series, I will jot down my thoughts on the course, and what I’ve learned.