Today I learned how to reduce the number of features in a data set with Principal Component Analysis (PCA).
Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data, […]
You can use PCA to learn about the relationship between two variables:
In principal component analysis, this relationship is quantified by finding a list of the principal axes in the data, and using those axes to describe the dataset.
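To make the idea of principal axes more concrete, here is a minimal sketch on made-up data. The toy points and the names `points` and `pca_2d` are my own assumptions for illustration; only the `PCA` estimator and its `components_` and `explained_variance_` attributes come from scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 200 points whose y value depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.2, size=200)
points = np.column_stack([x, y])

# Fit PCA without reducing the dimension, just to inspect the axes.
pca_2d = PCA(n_components=2).fit(points)

# Each row of components_ is one principal axis (a unit-length direction).
print(pca_2d.components_)

# explained_variance_ is the variance of the data along each axis, largest first.
print(pca_2d.explained_variance_)
```

The first axis should point roughly along the line the points cluster around, and most of the variance should fall on it.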
Let’s assume we have a pandas DataFrame called diabetes_df with 10 different columns (features).
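The post does not show how diabetes_df was created. One plausible way, and this is an assumption on my part, is to build it from the diabetes dataset that ships with scikit-learn, which happens to have 10 feature columns:

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: the data comes from scikit-learn's bundled diabetes dataset.
diabetes = load_diabetes()

# Wrap the feature matrix in a DataFrame, one column per feature.
diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

print(diabetes_df.shape)    # (number of samples, number of features)
print(diabetes_df.columns.tolist())
```

With this setup, `diabetes.target` holds the value to predict for each row and is kept separate from the features.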
We can use scikit-learn’s PCA estimator to reduce the number of features from 10 to 2. Then we can visualize the projected data points with matplotlib.
    # Reduce dimensionality with PCA
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    # instantiate the model with 2 components
    pca = PCA(n_components=2)

    # project from 10 to 2 dimensions
    project_diab = pca.fit_transform(diabetes_df)

    # plot the projection, colored by target
    # (diabetes is assumed to be the dataset object that diabetes_df was built from)
    plt.scatter(project_diab[:, 0], project_diab[:, 1],
                c=diabetes.target, edgecolor='none', alpha=0.5,
                cmap=plt.get_cmap('Spectral', 10))
    plt.xlabel('component 1')
    plt.ylabel('component 2')
    plt.colorbar();
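Going from 10 dimensions to 2 throws some information away, so it can be worth checking how much variance the two components keep. The following is a rough sketch rather than part of the original walkthrough. It assumes the data is scikit-learn's bundled diabetes dataset, and the names `X`, `projected`, and `retained` are my own.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA

# Assumption: same data as above, taken from scikit-learn's diabetes dataset.
X = load_diabetes().data

pca = PCA(n_components=2)
projected = pca.fit_transform(X)

# Fraction of the total variance captured by each of the 2 components.
retained = pca.explained_variance_ratio_
print(retained)

# Combined fraction of variance that survives the projection.
print(retained.sum())
```

If the combined fraction is low, the 2D scatter plot hides a lot of the structure in the data and should be read with that in mind.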
For a visual explanation of Principal Component Analysis, I can recommend this site: Principal Component Analysis Explained Visually.