I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.

This is the second part of the blog post series.

3. Machine Learning and Data Science Framework

The course focuses on learning by doing. Instead of teaching higher mathematics and over-thinking the process, the instructors show you a framework that encourages a fast feedback loop.

The idea is to have a “field guide” for data modeling.

Framework

  1. Problem Definition (“What problem are we trying to solve?”)
  2. Data (“What kinds of data do we have?”)
  3. Evaluation (“What defines success for us?”)
  4. Features (“What do we already know about the data?”)
  5. Modelling (“Based on our problem and data, what model should we use?”)
  6. Experimentation (“How could we improve/what can we try next?”)

1. Problem Definition

  • When shouldn’t we use machine learning? Can we hand-code the instructions?

Main types of machine learning:

  • supervised learning
  • unsupervised learning
  • transfer learning
  • reinforcement learning

Supervised learning has two main categories: classification and regression. In both cases you know the inputs and outputs (example: inputs = patient records, outputs = which patients have a disease).
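The classification case can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is available (the dataset here is synthetic, not from the course):

```python
# Supervised learning sketch: a classifier learns from known inputs (X)
# and known outputs (y), then predicts labels for new inputs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for e.g. patient records (features) and disease labels.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)   # learn the input-output mapping
predictions = model.predict(X)           # predict discrete class labels
```

For regression, you would swap in a model such as `LinearRegression` and predict a continuous value instead of a class label.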

Unsupervised learning is about clustering. It has no labels. You try to find patterns in the data and derive labels from the existing data.
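A minimal clustering sketch, again assuming scikit-learn (the blob data is made up for illustration):

```python
# Unsupervised learning sketch: no labels are given; the algorithm
# groups similar points and we derive cluster labels from the data itself.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)  # unlabeled points
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
```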

Transfer learning is about transferring one machine learning model to a different domain.

Reinforcement learning is about repeating a task in a problem space, and rewarding or punishing a certain outcome. Classic example: teaching a computer how to play chess.

2. Data

Two distinctions matter here.

Structured vs. unstructured data:

  • Structured: tables (rows, columns), e.g., a CSV file
  • Unstructured: images, audio files

Static vs. streaming data:

  • Static: values don’t change over time (e.g., patient records)
  • Streaming: data updates constantly (e.g., news headlines)
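Structured data maps naturally onto a table. A tiny sketch, assuming pandas (the CSV content is invented for illustration):

```python
# Structured data sketch: a CSV is rows and columns, so it loads
# directly into a pandas DataFrame.
import io
import pandas as pd

csv_text = "age,heart_rate,has_disease\n61,78,1\n45,65,0\n33,70,0\n"
df = pd.read_csv(io.StringIO(csv_text))  # in practice: pd.read_csv("file.csv")
```

Unstructured data (images, audio) has no such ready-made row/column shape and usually needs extra preprocessing before modeling.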

3. Evaluation

Which metrics define success? Common choices are accuracy, precision, and recall.
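These three metrics can be computed directly, assuming scikit-learn (the label arrays below are made up for illustration):

```python
# Evaluation sketch: compare true labels against predicted labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)    # fraction of all predictions correct
prec = precision_score(y_true, y_pred)  # of predicted positives, how many are right
rec = recall_score(y_true, y_pred)      # of actual positives, how many were found
```

Here precision is perfect (every predicted 1 is a true 1) while recall is lower (one true 1 was missed), which shows why a single metric rarely tells the whole story.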

4. Features

Use feature variables (input data variables) to predict a target variable.

Feature variables can be numerical or categorical.

Ideal: 100% feature coverage (“complete” sample data).
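The feature/target split can be sketched with a toy table, assuming pandas (column names and values are invented for illustration):

```python
# Features sketch: numerical and categorical feature variables,
# plus a target variable we want to predict.
import pandas as pd

df = pd.DataFrame({
    "age": [61, 45, 33],                 # numerical feature
    "chest_pain": ["yes", "no", "no"],   # categorical feature
    "has_disease": [1, 0, 0],            # target variable
})

features = df.drop(columns=["has_disease"])
target = df["has_disease"]
```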

5. Modelling

Three steps:
  1. Choosing and training a model
  2. Tuning a model
  3. Model comparison

Split your input into three different data sets: training, validation and test.
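One common way to produce the three sets is two successive splits, sketched here assuming scikit-learn (the data and the 60/20/20 ratio are illustrative choices, not prescribed by the course):

```python
# Three-way split sketch: carve off the test set first, then split the
# remainder into training and validation sets.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 samples, 2 features (toy data)
y = np.arange(50)

# 20% held out as the untouched test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 25% of the remaining 80% -> 20% of the total becomes the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```

Tune the model against the validation set; the test set stays locked away until the final comparison.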

Some models work better than others on different problems.

Try to minimize feedback time in your experimentation/training. Start with small datasets, build up.

Model comparison = “How will the model perform in the real world?”
This step tests your model on data that it hasn’t seen yet. The model should be able to generalize.

Keep the test set separate at all costs.

6. Experimentation

The framework is an iterative process.

Recap

The third section is the last theoretical portion of the course.

I feel like I have a basic understanding of the required concepts. The course is engaging and easy to understand.

If you want to know more about the 6-Step-Framework, take a look at the instructor’s repository.


Go to the other parts of the series: