Feature Selection to Optimise Human Activity Recognition Model

Optimising Training Time and Reducing Computational Demands

Setting the Scene

Picture somebody performing 20 star jumps, followed by 20 burpees, followed by 20 sit ups.

Next, imagine these same exercises – however this time they are performed atop a multi-touch pressure sensing mat.

The pressure mat resembles a yoga mat, with 512 embedded pressure sensors arranged in a 32 x 16 matrix. Each sensor records a pressure reading (kg/cm²) against a timestamp, with a 10–20 ms dynamic response time.

Every second has the potential to produce thousands of datapoints.

If we were to plot the sensor readings against time, we would expect:

  • The 20 star jumps —> Rapid changes in total pressure from zero to x as the subject jumps and lands on the mat. Only a small subset of the pressure sensors would ever be triggered, concentrated around the feet.
  • The 20 burpees —> Similar to the above, total pressure would vary from zero to x. However, a far wider range of pressure sensors across the mat would be triggered.
  • The 20 sit-ups —> Total pressure should never equal zero, as the subject remains seated on the mat throughout. We would see intermittent increases and decreases as the subject’s back touches and lifts from the ground.

The Dataset

MEx is a research-grade, multi-modal experimental dataset, recommended to me for analysis by a former data science manager of mine.

The experiment saw 30 individuals perform seven different dynamic physiotherapy exercises on a pressure sensing mat.

Objective

The aim of my analysis is to demonstrate a beneficial application of feature selection when designing a machine learning (ML) model. The ML model will ultimately be used to predict which physiotherapy exercise is being performed, according to the sensor readings. The feature selection aspect tests my hypothesis: some pressure sensors within the matrix offer little to no useful information and can therefore be excluded from the model input, while the exercise being performed is still accurately predicted.

Benefits of feature selection include:

  • Reduced training times
  • Reduced computational requirements
  • Removal of irrelevant features, for example pressure sensors that have never picked up a signal
  • Reduced overfitting, by separating the signal from the noise

Data Frame Pre-Processing

I began by reading in a few examples of the 210 .csv files to observe the data’s structure and volume. I conducted typical data quality tests and sense checks – including checking for nulls and appropriate data types.

Next, I concatenated all the .csv files into one master data frame. I engineered the following additional features to aid analysis:

  • Subject –> The individual performing the exercise. This information was originally found in the folder structure.
  • Exercise –> The exercise number (1-7) performed by the subject. This information was originally in the .csv file name.
  • a_Time –> The actual time and date these readings were recorded.
  • r_Time –> The relative time these readings were recorded.
  • Total_p –> The total pressure on the mat at that point in time (sum of the 512 sensors).

I also renamed columns 1-512 into 1_1 to 32_16, to represent the X-Y coordinates of the sensor on the mat. The new data frame was checked for data quality and saved locally to aid future processing.

Original data structure of one .csv (a single subject performing one exercise), showing a timestamp plus one dimension per pressure sensor.

Code to build data frame that maintains original folder structure as new features
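As a minimal sketch of this step, the build can be expressed roughly as below. The folder layout (`<root>/<subject>/<exercise>.csv`), the headerless file format, and the helper name are assumptions for illustration; the demo reads tiny synthetic files rather than the real MEx data.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

def build_master_frame(root: Path) -> pd.DataFrame:
    """Concatenate every subject/exercise .csv, lifting folder and file
    names into Subject and Exercise features."""
    frames = []
    # Sensor columns 1-512 renamed to X-Y grid coordinates 1_1 .. 32_16
    sensor_cols = {i: f"{(i - 1) // 16 + 1}_{(i - 1) % 16 + 1}" for i in range(1, 513)}
    for csv_path in sorted(root.glob("*/*.csv")):
        df = pd.read_csv(csv_path, header=None)
        df = df.rename(columns={0: "a_Time", **sensor_cols})
        df["Subject"] = csv_path.parent.name        # from the folder name
        df["Exercise"] = int(csv_path.stem)         # from the file name
        df["r_Time"] = range(len(df))               # relative sample index
        df["Total_p"] = df[list(sensor_cols.values())].sum(axis=1)
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Demo on synthetic files: 2 subjects x 2 exercises x 5 rows each
root = Path(tempfile.mkdtemp())
for subject in ("01", "02"):
    d = root / subject
    d.mkdir()
    for exercise in (1, 2):
        data = np.hstack([np.arange(5).reshape(-1, 1), np.random.rand(5, 512)])
        pd.DataFrame(data).to_csv(d / f"{exercise}.csv", header=False, index=False)

master = build_master_frame(root)
```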

New data frame to be used for analysis, containing information from all 210 .csv files plus additional features.

Exploratory Data Analysis (EDA)

I began by picking a few timestamps and visualising their respective rows of data. A heat map is the perfect visualisation for pressure on a mat; it is how the pressure mat’s native software displays its readings.

Code to visualise heat map
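A minimal sketch of the idea: reshape one timestamp’s 512 readings back into the 32 x 16 mat grid and render it with `imshow`. The function name, colour map, and the random demo row are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_pressure_frame(row_values, out_path="frame.png"):
    """Render one timestamp's 512 sensor readings as a 32 x 16 heat map."""
    grid = np.asarray(row_values, dtype=float).reshape(32, 16)
    fig, ax = plt.subplots(figsize=(4, 8))
    im = ax.imshow(grid, cmap="hot", interpolation="nearest")
    fig.colorbar(im, ax=ax, label="pressure (kg/cm²)")
    ax.set_title("Pressure mat snapshot")
    fig.savefig(out_path)
    plt.close(fig)
    return grid

# Demo with a random row in place of a real mat reading
grid = plot_pressure_frame(np.random.rand(512))
```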

Visualising row number 1520 – subject appears on their hands and knees.
Row number 185115 – subject appears to be standing.
Row number 137022, appearing to be a jump.
Row number 84260 – subject appears on their side (feet, hip and hand).

Next, some distribution graphs to understand the wider dataset:

Distribution of sensor readings by subject.
Distribution of sensor readings by exercise.
Distribution of total pressure for a given timestamp.

The most common total pressure for a given timestamp is zero, i.e. when there is no subject on the mat (including the airborne phase of jumps).

Because the exercises are dynamic over time, a still snapshot is not enough information for a model to classify the exercise. I therefore wanted to determine whether each exercise had its own distinct pattern when plotted against time.

Plotting subplots for each exercise, showing the total pressure recorded against time.

Code to visualise total exercise pressure against time
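A rough sketch of the subplot loop, run here on a synthetic stand-in frame (the column names `Exercise`, `r_Time` and `Total_p` follow the features engineered earlier; the random traces are placeholders for the real data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: 7 exercises, 200 samples of Total_p each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Exercise": np.repeat(np.arange(1, 8), 200),
    "r_Time": np.tile(np.arange(200), 7),
    "Total_p": np.abs(rng.normal(60, 15, 1400)),
})

# One subplot per exercise, sharing the time axis
fig, axes = plt.subplots(7, 1, figsize=(8, 14), sharex=True)
for ax, (exercise, grp) in zip(axes, df.groupby("Exercise")):
    ax.plot(grp["r_Time"], grp["Total_p"])
    ax.set_ylabel(f"Ex {exercise}")
axes[-1].set_xlabel("relative time")
fig.suptitle("Total pressure against time, per exercise")
fig.savefig("pressure_by_exercise.png")
plt.close(fig)
```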

Total pressure of all sensors against time.

I was pleased to find a few of the exercises had distinctly repetitive patterns.

Feature Selection

I explored two methods of feature selection.

The first method was to manually calculate the approximate entropy (ApEn) of each feature. ApEn is a technique that measures the regularity and predictability of each dimension, which should in theory aid the selection of the most ‘useful’ sensors for a ML model.

Code to calculate ApEn
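For reference, a self-contained implementation of the standard ApEn definition (embedding dimension m, tolerance r, Chebyshev distance between templates); the default r = 0.2 x standard deviation and the constant-signal shortcut are common conventions, not necessarily the exact choices made in the original analysis:

```python
import numpy as np

def approximate_entropy(series, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D signal.
    m: embedding dimension; r: tolerance (defaults to 0.2 * std)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)
    if r == 0:  # constant signal: perfectly regular, zero entropy
        return 0.0

    def phi(m):
        # All overlapping length-m templates
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        counts = np.sum(dist <= r, axis=1) / (n - m + 1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

# A dead (constant) sensor scores zero; a noisy one scores higher
flat = approximate_entropy(np.zeros(100))
noisy = approximate_entropy(np.random.default_rng(1).normal(size=100))
```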

I then compared the ApEn results against an out-of-the-box feature selection method: mutual information (MI) regression from scikit-learn. This is an example of univariate feature selection, whereby the best features are selected on the basis of univariate statistical tests. Each feature (the pressure reading of a specific sensor) is scored against the target variable (total pressure) to see whether there is a statistically significant relationship between them. A related univariate method is the ANOVA F-test, commonly used in biological applications, which is where I first came across this family of techniques.

Code to calculate MI
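A minimal sketch using scikit-learn’s `mutual_info_regression`, run on a tiny synthetic frame in place of the real sensor data (the column names and the constructed target are assumptions; informative sensors score high, unrelated ones near zero):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Synthetic stand-in: two informative sensors, two unrelated ones
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "1_1": rng.normal(size=500),
    "1_2": rng.normal(size=500),
    "30_1": rng.normal(size=500),  # independent of the target
    "30_2": rng.normal(size=500),
})
y = X["1_1"] + X["1_2"]  # stand-in for Total_p

# One MI score per feature; higher means stronger dependency on the target
mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)
ranked = mi.sort_values(ascending=False)
```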

Approximate Entropy (ApEn) distribution.
Mutual Information (MI) regression distribution.

Over half of all features have an ApEn value at or near zero, meaning their readings are so regular as to be entirely predictable, offering no meaningful information.

Likewise, over half of all features show an MI of zero, indicating the features are independent of, and therefore have no impact on, the total pressure. A higher MI value indicates a higher dependency.

Code to apply feature selection to the dataset
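One way to sketch the stepwise reduction: rank features by their score, then repeatedly keep the top 70% of the previous set. The function name and random demo scores are assumptions; the resulting widths (512, 358, 251, …) match the counts quoted below.

```python
import numpy as np
import pandas as pd

def stepwise_reductions(X: pd.DataFrame, scores: pd.Series, factor=0.7, steps=13):
    """Return progressively smaller feature sets, keeping the top-scoring
    `factor` fraction of the previous set at each step."""
    ranked = scores.sort_values(ascending=False).index.tolist()
    n = len(ranked)
    datasets = []
    for _ in range(steps):
        datasets.append(X[ranked[:n]])
        n = max(1, round(n * factor))
    return datasets

# Tiny illustration: 512 dummy features scored at random
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((10, 512)), columns=[str(i) for i in range(512)])
scores = pd.Series(rng.random(512), index=X.columns)
subsets = stepwise_reductions(X, scores)
widths = [d.shape[1] for d in subsets]
```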

I applied the feature selection in a manner that creates 13 different datasets, each reducing the feature count by 30%. This way I can compare the performance of a model using all 512 features against one using only 358 features, then 251 features, and so on, each time removing the features deemed least useful.

Building ML Model and Results

How does feature selection within a large dataset affect the performance of a ML model to predict physical human activity?

In this dataset, the type of exercise being performed by the subject is our target to be predicted. Because the target is a discrete category, this forms an interesting classification problem.

The K-Nearest Neighbors (KNN) classifier was used in this analysis. It is a simple supervised machine learning algorithm that uses observations with known classes (labelled data in our training set, telling us which pressure measurements are typical of which exercise) and assigns each unlabelled observation (a new set of pressure readings) to a class according to its proximity to those known examples.

Code to build and run KNN model
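A minimal sketch of the train/evaluate loop with scikit-learn, run on synthetic 512-dimensional data in place of the real mat readings (the class offsets and `n_neighbors=5` are illustrative assumptions, not the original hyperparameters):

```python
import numpy as np
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 7 exercise classes, 100 samples each, 512 "sensors"
rng = np.random.default_rng(0)
X = rng.random((700, 512))
y = np.repeat(np.arange(1, 8), 100)
X += y[:, None] * 0.3  # give each class a separable offset

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Macro average precision: unweighted mean of per-class precision
macro_precision = precision_score(y_test, knn.predict(X_test), average="macro")
```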

The true intention of this analysis is to show that feature selection can improve the performance of a model, with the computational benefits as a by-product. More information does not necessarily produce a more accurate model, due to noise and overfitting.

The final step is to measure the levels of classification precision after reducing the sensor volume by 30% in a stepwise manner.

Code to plot KNN performance against dataset volume
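The dual-axis chart can be sketched with matplotlib’s `twinx`. The numbers below are purely illustrative placeholders shaped like the described result, not the measured values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Illustrative values only; the real numbers come from the KNN runs
n_features = [512, 358, 251, 176, 123, 86, 60]
precision = [0.78, 0.83, 0.82, 0.81, 0.79, 0.74, 0.70]
pred_time = [4.1, 3.0, 2.2, 1.6, 1.2, 0.9, 0.7]

fig, ax1 = plt.subplots()
ax1.plot(n_features, precision, color="red", marker="o")
ax1.set_xlabel("number of features used")
ax1.set_ylabel("macro average precision", color="red")
ax1.invert_xaxis()  # read left-to-right as features are removed

ax2 = ax1.twinx()  # second y-axis sharing the x-axis
ax2.plot(n_features, pred_time, color="blue", marker="s")
ax2.set_ylabel("prediction time (s)", color="blue")

fig.savefig("knn_performance.png")
plt.close(fig)
```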

In red, plotting the macro average precision of the K-Nearest Neighbors (KNN) classification model against the number of columns used to train the model. In blue, plotting the time in seconds taken for the model to perform a classification prediction.

Ta-da! There is an immediate increase in the macro average precision of the model upon removing 30% of features from the dataset. In addition to this, the time taken to make a prediction has decreased.

All code for this analysis can be found on my GitHub repo here.
