# Dimensionality Reduction

*Topics Covered*

- Curse of Dimensionality
- Dimensionality Reduction
- Why Dimensionality Reduction is important
- Techniques to overcome the Curse of Dimensionality

# Curse Of Dimensionality

The curse of dimensionality refers to phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) and that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
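One such phenomenon is distance concentration. As the number of dimensions grows, the distances from a point to its nearest and farthest neighbours become nearly equal. This weakens distance-based methods such as nearest-neighbour classifiers. The sketch below assumes points drawn uniformly at random from the unit hypercube and measures the relative contrast `(max - min) / min` of those distances. The function name and sample sizes are illustrative choices.

```python
import numpy as np

def relative_distance_contrast(dim, n_points=500, seed=0):
    """Return (max - min) / min of the Euclidean distances from one random query
    point to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the dimension grows: "near" and "far" blur together.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_distance_contrast(dim), 3))
```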

# What is Dimensionality Reduction?

In its feature-extraction form, dimensionality reduction transforms the $p$ predictors (independent variables) $X_1, \dots, X_p$ into a smaller number $m$ of new variables $Z_1, \dots, Z_m$, where $m < p$. Each new variable is a linear combination of the original predictors. What happens when we don't reduce the dimension? We run into the curse of dimensionality.

In machine learning and statistics, dimensionality reduction (or dimension reduction) is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
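The linear-combination definition can be written as a formula. This is a sketch: the weights $\phi_{jk}$ are notation introduced here, not taken from the original. Each new variable is a weighted sum of the original predictors:

$$
Z_k = \sum_{j=1}^{p} \phi_{jk} X_j, \qquad k = 1, \dots, m, \quad m < p
$$

In matrix form, let $X$ be the $n \times p$ data matrix and $\Phi$ a $p \times m$ weight matrix. The reduced data is then the $n \times m$ matrix $Z = X\Phi$. PCA is one particular way of choosing $\Phi$.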

# Why is Dimensionality Reduction Important?

Nowadays, data comes in all forms (video, audio, images, text, etc.) with a huge number of features. Are all features relevant? No, not every feature is important or relevant. Depending on the business requirement, or on redundancy in the captured data, we have to reduce the number of features through feature selection and feature extraction. These techniques not only reduce computation cost but also help avoid misclassification caused by highly correlated variables.

# Techniques to Overcome the Curse of Dimensionality

To overcome the above problem, we apply dimensionality reduction. There are a number of ways to do it, which fall into two families: feature selection and feature extraction. Common techniques include:

- PCA
- Missing Value Ratio
- Low Variance Filter
- Backward Feature Elimination
- Forward Feature Construction
- High Correlation Filter

Let’s look at the image shown above. It shows two dimensions, x1 and x2, which are, say, measurements of the same object in kilometers (x1) and miles (x2). If we used both dimensions in machine learning, they would convey the same information and add redundancy and noise rather than new signal, so we are better off using just one dimension. Here we have converted the data from 2D (x1 and x2) to 1D (PC1), which makes the data relatively easier to explain.
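The km/miles example can be reproduced with a minimal NumPy sketch. The data is synthetic: distances are drawn uniformly, converted with 1 km ≈ 0.621371 miles, and given some assumed measurement noise on the miles column. The sketch finds the direction of maximum variance (PC1) and checks how much of the total variance it captures.

```python
import numpy as np

rng = np.random.default_rng(42)
km = rng.uniform(1, 100, size=200)                    # x1: distance in kilometres
miles = km * 0.621371 + rng.normal(0, 0.5, size=200)  # x2: same distance in miles, plus assumed noise
X = np.column_stack([km, miles])

Xc = X - X.mean(axis=0)                               # PCA works on centred data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]                  # direction of maximum variance
share = eigvals.max() / eigvals.sum()                 # fraction of total variance PC1 explains
z = Xc @ pc1                                          # the 1-D representation (PC1 scores)
print(round(share, 4), z.shape)
```

Almost all of the variance lies along PC1, so the single column `z` carries essentially the same information as the two original columns.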

# PCA:

Principal Component Analysis (PCA) finds components that explain the maximum amount of variance in the features. PCA is unsupervised: it looks only at the features, not at the target variable. If we keep all the components, together they explain all of the variance, so their explained-variance ratios sum to 1.

PCA transforms a set of interrelated (correlated) variables into a set of uncorrelated variables. Each uncorrelated variable is a principal component, and each component is a linear combination of the original variables.

Each component carries part of the information in the features, measured as the variance it explains. The explained-variance ratios of all components add up to 1. The components are ordered so that the first explains the most variance, the second the next most, and so on, which is why some principal components explain much more variance than others.

Because the principal components are uncorrelated with one another, each new component explains variance that the previous ones did not capture. This raises a question: how many components do we need to explain most of the variance? There is no textbook formula for the number of components given the number of features or variables. Instead, we can set a variance threshold that the retained components must explain together.
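The properties described above can be checked with a from-scratch sketch of PCA that eigendecomposes the covariance matrix. Libraries such as scikit-learn use an SVD instead, but the components are the same up to sign. The function name `pca` and the synthetic correlated data are illustrative assumptions.

```python
import numpy as np

def pca(X):
    """Return (explained_variance_ratio, components, scores) for data X of shape (n, p)."""
    Xc = X - X.mean(axis=0)                   # centre each feature
    cov = np.cov(Xc, rowvar=False)            # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # sort components by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()           # explained-variance ratios, summing to 1
    scores = Xc @ eigvecs                     # each column is one principal component
    return ratio, eigvecs, scores

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 2))
noise = 0.1 * rng.normal(size=(300, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + noise[:, 0],                 # strongly correlated with column 0
    base[:, 1],
    base[:, 0] - base[:, 1] + noise[:, 1],    # mixes the two underlying factors
])
ratio, components, scores = pca(X)
print(np.round(ratio, 3))                     # decreasing ratios
```

The covariance matrix of `scores` is diagonal, which confirms that the components are uncorrelated.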

Consider a variance threshold of 0.8 and ten components. Suppose the first four explain 0.3, 0.25, 0.15 and 0.1 of the variance, and the remaining six share the last 0.2. The component explaining 0.3 has the maximum variance and is called the first principal component. Since the threshold is 0.8, we add components in order until their cumulative explained variance reaches 0.8.

The first 3 components explain 0.7, and including the 4th brings the total to 0.8. So we can keep 4 components instead of ten, reducing the dimension from 10 to 4.
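The threshold rule is a cumulative sum over the sorted ratios. In this sketch the function name is hypothetical, and the split of the remaining 0.2 across the last six components is assumed for illustration, since only its total is given above.

```python
from itertools import accumulate

def n_components_for_threshold(ratios, threshold=0.8, tol=1e-9):
    """Smallest number of leading components whose explained-variance ratios
    sum to at least threshold."""
    ratios = sorted(ratios, reverse=True)            # first principal component first
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= threshold - tol:                 # tol absorbs floating-point rounding
            return k
    return len(ratios)

# First four values from the example; the last six are hypothetical (they sum to 0.2).
ratios = [0.3, 0.25, 0.15, 0.1, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]
print(n_components_for_threshold(ratios, 0.8))  # 4
```

In scikit-learn, passing a float between 0 and 1 as `n_components` to `PCA` selects the number of components by a similar variance-explained rule.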

# Missing Value Ratio:

A dataset has many columns, and some of them may contain missing values. With the missing value ratio approach, we set a threshold for the fraction of missing values a column may contain. If a column's ratio of missing values is greater than the threshold, we drop that feature.

The lower the threshold, the more aggressively features are dropped.
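A minimal sketch of the missing value ratio filter on a NumPy array, assuming `NaN` marks a missing value. The threshold of 0.4 and the toy data are illustrative. Lowering the threshold drops more columns.

```python
import numpy as np

def missing_value_filter(X, threshold=0.4):
    """Keep columns whose fraction of missing values (NaN) is at most threshold.
    Returns (filtered array, indices of kept columns)."""
    missing_ratio = np.isnan(X).mean(axis=0)     # per-column fraction of NaNs
    keep = np.flatnonzero(missing_ratio <= threshold)
    return X[:, keep], keep

X = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, np.nan],
    [3.0, 5.0,    6.0],
    [4.0, np.nan, 7.0],
])                                               # missing ratios: 0.0, 0.75, 0.25
X_kept, kept = missing_value_filter(X, threshold=0.4)
print(kept)  # [0 2]
```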

# Low Variance Filter:

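Conceptually it resembles PCA in that it uses variance as a measure of information, but it keeps or drops original columns rather than building new ones. If a column has a variance lower than a threshold value, it carries little information and we drop it. In other words, variance acts as a filter for feature selection.

Variance depends on the range of the values, so the columns must be brought to a common scale before applying this technique, for example with min-max normalization to [0, 1]. Standardizing to unit variance would not work here, because it sets every column's variance to 1.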
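A minimal sketch of the low variance filter: min-max scale each column, then keep the columns whose variance exceeds a threshold. The threshold value and the toy columns are assumptions for illustration.

```python
import numpy as np

def low_variance_filter(X, threshold=0.01):
    """Min-max scale each column to [0, 1], then return the indices of the
    columns whose variance exceeds threshold."""
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    # Constant columns get a range of 1 to avoid dividing by zero (they scale to all zeros)
    col_range = np.where(col_max > col_min, col_max - col_min, 1.0)
    scaled = (X - col_min) / col_range
    return np.flatnonzero(scaled.var(axis=0) > threshold)

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 1000, size=100),   # well spread values
    np.full(100, 7.0),                # constant: carries no information
    np.r_[np.zeros(99), 1.0],         # almost always 0
])
print(low_variance_filter(X, threshold=0.02))  # [0]
```

scikit-learn's `sklearn.feature_selection.VarianceThreshold` performs the variance step, but it does not rescale the columns, so the scaling above would still be needed.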

# Backward Feature Elimination:

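In simple terms, we train a model on all n input features and calculate its error rate. Then, for each feature in turn, we train the model on the remaining n-1 features and calculate the error rate again. The feature whose removal increases the error rate the least (by only a small amount, or not at all) is dropped from the dataset.

Backward feature elimination is repeated iteratively on the reduced set. It stops when removing any remaining feature would increase the error rate by more than an acceptable amount.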
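A minimal sketch of backward elimination, assuming an ordinary least-squares model scored by mean squared error on a fixed holdout split. The helper names, the 70/30 split and the `tolerance` of 0.01 are hypothetical choices, and any model and error metric could be substituted.

```python
import numpy as np

def holdout_mse(X, y, cols, train_frac=0.7):
    """Fit least squares (with intercept) on the first train_frac of the rows using
    only the columns in cols; return the mean squared error on the remaining rows."""
    n_train = int(len(y) * train_frac)
    A = np.column_stack([np.ones(len(y)), X[:, np.array(cols, dtype=int)]])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    residuals = y[n_train:] - A[n_train:] @ coef
    return float(np.mean(residuals ** 2))

def backward_elimination(X, y, tolerance=0.01):
    """Repeatedly drop the feature whose removal increases the holdout error the least,
    stopping when every possible removal would raise the error by more than tolerance."""
    cols = list(range(X.shape[1]))
    current_err = holdout_mse(X, y, cols)
    while len(cols) > 1:
        # Error of the model retrained without each remaining feature
        trials = {c: holdout_mse(X, y, [k for k in cols if k != c]) for c in cols}
        least_useful, err = min(trials.items(), key=lambda item: item[1])
        if err - current_err > tolerance:
            break
        cols.remove(least_useful)
        current_err = err
    return cols

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only features 0 and 2 influence y; features 1, 3 and 4 are pure noise
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200)
selected = backward_elimination(X, y)
print(selected)
```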

# Forward Feature Construction:

This feature selection process works in the opposite direction to backward elimination. We train a model with one feature and calculate the performance measure. We then keep adding features one at a time, calculating the performance each time. If the performance increases with the added feature, we keep it and continue. If it does not improve, we drop that feature. In the common greedy version, each step adds whichever remaining feature improves performance the most, and the process stops when no addition helps.
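A minimal sketch of greedy forward selection under the same assumptions as a least-squares model scored on a fixed holdout split. The helper names, the split and the `min_gain` of 0.01 are hypothetical. The search starts from an intercept-only model and adds features one at a time.

```python
import numpy as np

def holdout_mse(X, y, cols, train_frac=0.7):
    """Fit least squares (with intercept) on the first train_frac of the rows using
    only the columns in cols; return the mean squared error on the remaining rows."""
    n_train = int(len(y) * train_frac)
    A = np.column_stack([np.ones(len(y)), X[:, np.array(cols, dtype=int)]])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    residuals = y[n_train:] - A[n_train:] @ coef
    return float(np.mean(residuals ** 2))

def forward_selection(X, y, min_gain=0.01):
    """Repeatedly add the feature that lowers the holdout error the most,
    stopping when the best addition improves the error by less than min_gain."""
    selected, remaining = [], list(range(X.shape[1]))
    best_err = holdout_mse(X, y, selected)      # intercept-only baseline
    while remaining:
        # Error of the model retrained with each candidate feature added
        trials = {c: holdout_mse(X, y, selected + [c]) for c in remaining}
        candidate, err = min(trials.items(), key=lambda item: item[1])
        if best_err - err < min_gain:           # best addition barely helps: stop
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_err = err
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only features 0 and 2 influence y; features 1, 3 and 4 are pure noise
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200)
selected = forward_selection(X, y)
print(selected)
```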

# High Correlation Filter:

If columns in the dataset are highly correlated, they carry redundant information. We therefore drop the redundant variables and keep one representative of each highly correlated group.

Correlation is measured differently depending on the column type. For numerical columns we can use the *Pearson product-moment correlation coefficient*. For nominal (categorical) columns we can use *Pearson's chi-squared* statistic.

The Pearson correlation coefficient does not change when a column is linearly rescaled with a positive factor, as in min-max normalization or standardization. Unlike the low variance filter, this step therefore does not require normalizing the columns first.
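A minimal sketch of a high correlation filter for numerical columns. It greedily keeps a column unless its absolute Pearson correlation with an already-kept column exceeds a threshold. The 0.9 threshold and the km/miles toy data are assumptions, and the code assumes no constant columns, whose correlation is undefined.

```python
import numpy as np

def high_correlation_filter(X, threshold=0.9):
    """Return indices of kept columns: a column is skipped if its absolute Pearson
    correlation with any already-kept column exceeds threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(1)
km = rng.uniform(1, 100, size=200)
X = np.column_stack([
    km,                                        # distance in kilometres
    km * 0.621371 + rng.normal(0, 0.5, 200),   # same distance in miles: redundant
    rng.normal(size=200),                      # unrelated feature
])
print(high_correlation_filter(X))  # [0, 2]

# Rescaling columns by positive factors leaves the correlation matrix unchanged
print(np.allclose(np.corrcoef(X, rowvar=False),
                  np.corrcoef(X * np.array([1000.0, 1.0, 0.01]), rowvar=False)))
```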

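# Note:

Both forward feature construction and backward feature elimination are computationally expensive, because they retrain the model many times. With n features, a full run can require on the order of n² model fits.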

# Understanding Principal Component Analysis:

Here we’ll try to understand PCA by working on the Digits dataset. Images are high-dimensional (each Digits image is 8x8 pixels, i.e. 64 features), which makes them a good test case. We can load the dataset directly from *sklearn.datasets*. We first make all the imports we need, from loading the dataset to measuring the metrics.
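The original walkthrough code is not reproduced here, so the following is only a sketch of such a workflow. It loads Digits, reduces it with PCA and measures a classifier's accuracy. The 0.8 variance threshold, the logistic regression classifier and the train/test split are assumptions, not necessarily the author's choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Handwritten digit images, each flattened to 8x8 = 64 pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# A float n_components keeps enough components to explain more than 80% of the
# variance; the pixels share one scale (0-16), so no extra scaling is applied here.
model = make_pipeline(
    PCA(n_components=0.8, svd_solver="full"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

n_kept = model.named_steps["pca"].n_components_
acc = accuracy_score(y_test, model.predict(X_test))
print(f"64 features -> {n_kept} principal components, test accuracy {acc:.3f}")
```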
