Visualization of FS with MIM, JMI and mRMR

MIM, JMI and mRMR are all mutual-information-based (MI-based) criteria used for feature selection. In these GIFs, the iterative process of selecting all the features in the MNIST dataset is shown. This dataset is composed of 60,000 images of hand-written digits. Each image has a resolution of 28x28 pixels, and each pixel is encoded as a single brightness value in the range [0, 255].

Feature selection

With feature selection, we try to find the subset of features that provides the most information about the class. In this case, we want the pixels that are most important for identifying which digit an image represents.

These methods work iteratively, adding one feature to the selected subset in each iteration. To find the best feature in a given iteration, the methods calculate a score for each candidate feature using a certain criterion, and then the one with the highest score is selected (see Alg. 1).

Input: dataset D, dataset class C, number of features to select K, number of features in dataset F
Output: subset S of selected features

1  S ← {}
2  for k ∈ [0, K-1]:
3      score ← zeros(F)
4      for f ∈ [0, F-1]:
5          score[f] ← criterion(f, D, C, S)   // depends on the used method
6      S ← S ∪ {argmax(score)}
Algorithm 1. Generic algorithm for iterative feature selection
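Alg. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the original code: `select_features` and its `criterion` callback are illustrative names, and `criterion` stands in for any of the scoring functions described in the Criteria section.

```python
import numpy as np

def select_features(X, y, K, criterion):
    """Greedy forward selection: add the highest-scoring feature each round.

    X: (samples, features) array, y: class labels,
    criterion(f, X, y, selected) -> score of candidate feature f.
    """
    n_features = X.shape[1]
    selected = []
    for _ in range(K):
        scores = np.full(n_features, -np.inf)
        for f in range(n_features):
            if f in selected:
                continue  # never re-select an already chosen feature
            scores[f] = criterion(f, X, y, selected)
        selected.append(int(np.argmax(scores)))
    return selected
```

The `-np.inf` initialization ensures that already-selected features can never win the argmax.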

Criteria

There are several criteria that can be applied to compute the score. For this quick research, I have tested the most basic one, MIM, and also two of the best-known and best-performing ones, JMI and mRMR. These criteria are based on mutual information I and compute the score J of a feature f_i, given the class C and the set S of already-selected features.

MIM - Mutual Information Maximisation
J_MIM(f_i) = I(f_i; C)

JMI - Joint Mutual Information
J_JMI(f_i) = Σ_{f_s ∈ S} I(f_i, f_s; C)

mRMR - minimum Redundancy Maximum Relevance
J_mRMR(f_i) = I(f_i; C) − (1/|S|) Σ_{f_s ∈ S} I(f_i; f_s)

Results

The animations in Figs. 1.a, 1.b and 1.c show how the selected subset grows when using the MIM, JMI and mRMR criteria, respectively. Each animation is composed of one frame per selected feature (i.e., the n-th frame shows the first n selected features).

Since each feature represents a pixel of the 28x28 image, it is easy to map the features back onto an image. In this case, the color shows the order in which the features were selected (brighter yellow for features selected earlier and darker purple for those selected later).
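A small sketch of how a selection order could be mapped back to such an image. The `selected` list is the hypothetical output of a greedy selector (pixel indices in selection order); earlier picks receive larger values so they render brighter, e.g. with matplotlib's viridis colormap. Names are illustrative.

```python
import numpy as np

def rank_image(selected, side=28):
    """Turn a selection order of pixel indices into a side x side image."""
    img = np.zeros(side * side)
    n = len(selected)
    for rank, pixel in enumerate(selected):
        img[pixel] = n - rank  # first-selected pixel gets the largest value
    return img.reshape(side, side)
```

The result can then be displayed with something like `plt.imshow(rank_image(selected), cmap="viridis")`, which yields the yellow-to-purple ordering used in the figures.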

Figure 1.a. MIM
Figure 1.b. JMI
Figure 1.c. mRMR