# TIL: How to Reduce Features With PCA

Today I learned how to reduce the number of features in a dataset with **Principal Component Analysis** (PCA).

From the *Python Data Science Handbook*:

> Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […]

You can use PCA to learn about the *relationship* between two values:

> In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
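
To make "principal axes" a bit more concrete, here is a minimal sketch on made-up data. The toy dataset (`x`, `y`, `X`) is my own assumption and not from the handbook; `components_` and `explained_variance_` are attributes of scikit-learn's fitted `PCA` estimator.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 200 points where y is roughly 0.5 * x plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.1, size=200)
X = np.column_stack([x, y])

pca = PCA(n_components=2).fit(X)

# Each row of components_ is one principal axis (a unit-length direction)
principal_axes = pca.components_
# explained_variance_ holds the variance of the data along each axis
variances = pca.explained_variance_

print(principal_axes)
print(variances)
```

The first axis should point roughly along the line the points are scattered around, and it should carry most of the variance. That is why dropping the later axes loses comparatively little information.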

Let’s assume we have a pandas DataFrame called `diabetes_df` with 10 different columns (features).

We can use scikit-learn’s `PCA` estimator to reduce the number of features from 10 to 2. Then we can visualize the data points with matplotlib.

```
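## Reduce dimensionality with PCA
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA

## assumption: `diabetes` is scikit-learn's diabetes dataset and
## `diabetes_df` holds its 10 feature columns
diabetes = load_diabetes()
diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

## instantiate model with 2 dimensions
pca = PCA(n_components=2)
## project from 10 to 2 dimensions
project_diab = pca.fit_transform(diabetes_df)

## plot the projection, colored by the target value
plt.scatter(project_diab[:, 0], project_diab[:, 1],
            c=diabetes.target, edgecolor='none', alpha=0.5,
            cmap=plt.get_cmap('Spectral', 10))
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.colorbar()
plt.show()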
```
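
A natural follow-up question is how much information survives the reduction from 10 to 2 dimensions. Here is a small sketch that checks this with `explained_variance_ratio_`. I'm assuming the data is scikit-learn's built-in diabetes dataset loaded via `load_diabetes`; swap in your own DataFrame if yours comes from somewhere else.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA

# Assumption: the 10 features come from scikit-learn's bundled diabetes dataset
X = load_diabetes().data

pca = PCA(n_components=2)
projected = pca.fit_transform(X)

# Fraction of the total variance captured by each of the 2 components
ratios = pca.explained_variance_ratio_
retained = ratios.sum()

print(projected.shape)
print(ratios)
print(f"variance retained by 2 components: {retained:.1%}")
```

If the retained share is low, the 2D scatter plot only tells part of the story, and it may be worth keeping more components.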

For a visual explanation of Principal Component Analysis, I can recommend this site: Principal Component Analysis Explained Visually.