hachyterm.io

I’m going through the Udemy course Complete Machine Learning and Data Science: Zero to Mastery and writing down my observations/lecture notes.

This is the third part of the blog post series.

4. The 2 Paths

The class aims to be beginner-friendly. At this point, you can either take a detour to learn how to program in Python or continue with the default route.

The course contains more than 8 hours of video lectures on Python, which I’ll skip.

Still, I think it’s great that the material is available if you need it.

5. Data Science Environment Setup

We’re approaching the practical portion of this massive course. Again, the section starts with a story: your boss wants you to set up your new company laptop.

Daniel introduces us to the necessary tools, like Anaconda, matplotlib, TensorFlow, etc.
He manages to give a newbie-friendly overview of what those tools are (environment, distribution, package manager, libraries, etc.).

There are lectures on how to get started with macOS, Linux, and Windows.

I opted for a Docker and docker-compose setup for my data science project.

Daniel, the instructor, also introduces Jupyter Notebook and JupyterLab.
I haven’t worked with JupyterLab before, but it sure looks impressive. It even has native Vim key bindings.

6. pandas: Data Analysis

Now we learn about the library pandas, an open-source data analysis tool.

Daniel introduces you to the main functions of pandas using Jupyter Notebook. The lessons are easy to follow and give you a decent overview.

I would have liked a more detailed explanation of the pandas DataFrame data structure.
(You can read about it on the pandas website.)

Daniel could have also made it more transparent how Python assignments, mutability, and functions work. Granted, that’s not specific to pandas. But the course is heavily geared towards novice programmers.
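
To show what I mean, here is a minimal sketch of the assignment and mutability behaviour. It is my own made-up example, not taken from the course: the `cars` data and the `add_column` helper are hypothetical. Assigning a DataFrame to a second name does not copy it, while `.copy()` does, and a function that modifies its argument changes the caller's object:

```python
import pandas as pd

# Hypothetical data, made up for illustration (not from the course).
cars = pd.DataFrame({"Make": ["BMW", "Toyota"], "Doors": [4, 4]})

# Plain assignment binds a second name to the same object; nothing is copied.
alias = cars
# .copy() creates an independent DataFrame.
snapshot = cars.copy()

# Adding a column mutates the one object that both names refer to.
alias["Colour"] = ["Red", "Green"]

print(alias is cars)                 # both names point to the same object
print("Colour" in cars.columns)      # the change is visible through `cars`
print("Colour" in snapshot.columns)  # the copy is unaffected


def add_column(df):
    """Hypothetical helper: adds a column to the DataFrame it receives."""
    # The function gets a reference to the caller's object, not a copy.
    df["Seen"] = True


add_column(cars)
print("Seen" in cars.columns)        # the caller's DataFrame was modified
```

If you want to experiment without touching the original data, working on `cars.copy()` is the safe default.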

The built-in visualizations of JupyterLab give the lessons a fast feedback loop and some quick wins. As a result, the course stays fun, even though the teacher introduces a lot of new material.

I also appreciate that there are exercises that will help cement your knowledge.

pandas Lecture Notes

Data Structures

  1. Series
  • 1-dimensional labeled array
  • can hold any data type
  • the axis labels are collectively called the index
  • similar to NumPy’s ndarray (a multi-dimensional container of items of the same type and size)
  • dict-like
import pandas as pd
## general form: pd.Series(data, index=index)
series = pd.Series(["BMW", "Toyota", "Mercedes"])
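
The index and the dict-like behaviour are easier to see with explicit labels. The following is a small sketch with made-up values (the `prices` Series and its numbers are my own assumption, not from the course):

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration.
prices = pd.Series([40000, 25000, 55000], index=["BMW", "Toyota", "Mercedes"])

# Without an explicit index, pandas falls back to integer labels starting at 0.
makes = pd.Series(["BMW", "Toyota", "Mercedes"])

print(list(makes.index))   # the default integer labels
print(prices["Toyota"])    # dict-like lookup by label
print("BMW" in prices)     # membership tests check the index, like dict keys
print(prices.dtype)        # a Series has a single dtype for all its values
```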
  2. DataFrame
  • 2-dimensional labeled data structure
  • like: spreadsheet, CSV, SQL table
## series
series = pd.Series(["BMW", "Toyota", "Mercedes"])
colours = pd.Series(["Red", "Green", "Black"])
## create DataFrame from two series
car_data = pd.DataFrame({"Car make": series, "Colour": colours})
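
To check what the resulting DataFrame looks like, you can inspect it as sketched below. The first part repeats the snippet above so that the block runs on its own; the `short` and `uneven` names are my own additions, used to illustrate that pandas aligns Series on their index labels:

```python
import pandas as pd

series = pd.Series(["BMW", "Toyota", "Mercedes"])
colours = pd.Series(["Red", "Green", "Black"])
car_data = pd.DataFrame({"Car make": series, "Colour": colours})

print(car_data.shape)            # (number of rows, number of columns)
print(list(car_data.columns))    # the dict keys become the column names
print(type(car_data["Colour"]))  # selecting one column gives back a Series

# Series are aligned on their index labels, not on their position.
# A shorter Series leaves a missing value in the row it has no label for.
short = pd.Series(["Red", "Green"])
uneven = pd.DataFrame({"Car make": series, "Colour": short})
print(uneven["Colour"].isna().tolist())
```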

See Jupyter Notebook example.

Recap

Sections 5 and 6 will pose the first challenge for beginner programmers.
That said, the instructor does a fabulous job of guiding students through the first hurdle. The tone is encouraging.

Learners who already have one or more programming languages under their belt will still find enough value in the lectures on pandas.

The course nails the “from zero” part.

