EEG (electroencephalogram) signals are widely used to analyze consumer response in neuromarketing. Compared with fMRI (functional magnetic resonance imaging), EEG cannot capture activity deep in the brain or pinpoint exactly where the activity occurred, but it can track changes within fractions of a second. An EEG setup is also much cheaper and easier to use than fMRI.
We are going to build a model that learns consumer preference from EEG signals recorded while participants were shown various consumer products such as shirts and gloves.
Dataset:
The EEG signals of volunteers of varying age and gender were recorded while they browsed through various consumer products. The users also reported their response as ‘Like’ or ‘Dislike’ for each product.
- Participants: 25
- Products: 42 images (14 products × 3 varieties each)
- Feature shape: 1024×512×14 (recordings × time points × channels)
- EEG Channels: 14
- Data points: 1024 labelled recordings
- Label Distribution: 580 Likes & 444 Dislikes
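To make the numbers above concrete, here is a minimal sketch of how an array of this shape can be flattened into one feature vector per recording. The data is random and stands in for the real recordings; the axis order (recordings, time points, channels) and the names `eeg`, `X` and `y` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical stand-in for the real recordings: random values with the
# dataset's shape, assumed to be (recordings, time points, channels).
n_samples, n_timepoints, n_channels = 1024, 512, 14
rng = np.random.default_rng(0)
eeg = rng.standard_normal((n_samples, n_timepoints, n_channels), dtype=np.float32)

# Placeholder labels with the reported distribution: 1 = Like, 0 = Dislike.
y = np.array([1] * 580 + [0] * 444)

# Flatten every recording into a single feature vector (512 * 14 values).
X = eeg.reshape(n_samples, -1)

print(X.shape)  # (1024, 7168)
print(y.shape)  # (1024,)
```

Each flattened recording has 512 × 14 = 7168 values, which matches the size of the input layer used by the neural network later in this post.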
The EEG signals are recorded using sensors connected to the human scalp. There are multiple sensors each of which captures responses from different parts of the brain. Thus, we have numerous signals available for each product viewed by the participant.
The figure below shows some of the sample EEG graphs of consumer responses to viewing products.
Pre-Processing:
We use the Standard Scaler algorithm for preprocessing. The mean and standard deviation are computed from the training data only, and both the training and test data are then scaled using the training mean and standard deviation.
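The scaling step can be sketched as follows. This assumes scikit-learn's `StandardScaler`, which the post does not name explicitly, and uses small random arrays in place of the EEG features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small random arrays standing in for the real training and test features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(200, 8))
X_test = rng.normal(loc=5.0, scale=3.0, size=(50, 8))

scaler = StandardScaler()

# Learn the per-feature mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training statistics on the test data; the scaler is not refit.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 for every feature
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 for every feature
```

Fitting on the training data alone keeps information about the test set from leaking into the preprocessing step.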
Models and Results:
The dataset is split into training and test sets, with 157 data points in the test set.
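A split of this size could be produced as in the sketch below. It assumes scikit-learn's `train_test_split`; the stratification and the random seed are illustrative choices that the post does not specify, and the features are random placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels with the dataset's size and label balance.
rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 16))
y = np.array([1] * 580 + [0] * 444)

# An integer test_size is read as an absolute number of samples.
# stratify=y (an assumption) keeps the Like/Dislike ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=157, stratify=y, random_state=0
)

print(X_train.shape[0], X_test.shape[0])  # 867 157
```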
We use confusion matrices to visualize model performance. A confusion matrix tabulates the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
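As a small worked example of these four counts, the sketch below tallies them for a handful of made-up predictions and derives accuracy from them. Treating 'Like' (encoded as 1) as the positive class is an assumption, and `confusion_counts` is a hypothetical helper written for this illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted Like, actually Like
            else:
                fp += 1  # predicted Like, actually Dislike
        else:
            if truth == positive:
                fn += 1  # predicted Dislike, actually Like
            else:
                tn += 1  # predicted Dislike, actually Dislike
    return tp, tn, fp, fn


# Made-up labels: 1 = Like, 0 = Dislike.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tp, tn, fp, fn)  # 3 2 2 1
print(accuracy)        # 0.625
```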
The confusion matrix structure followed:
1. Support Vector Machine (SVM)
Support Vector Machine is a popular algorithm used for regression and classification tasks. We use an SVM with a polynomial kernel and rely on regularization, rather than added model complexity, to improve accuracy. We get an accuracy of around 59%. The confusion matrix and precision-recall curve for the SVM model are shown below:
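A polynomial-kernel SVM of this kind can be sketched with scikit-learn as shown below. The synthetic data, the kernel degree and the regularization strength `C` are all assumptions for illustration; the post does not state the values actually used, so the accuracy printed here says nothing about the EEG results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the EEG features.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale with training statistics, then fit an SVM with a polynomial kernel.
# A smaller C means stronger regularization; degree and C are placeholders.
clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)

# scikit-learn lays the matrix out as [[TN, FP], [FN, TP]] for labels 0 and 1.
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy: {acc:.2f}")
print(cm)
```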
2. K-Nearest Neighbors (KNN)
K-Nearest Neighbors is one of the first supervised algorithms one learns when introduced to machine learning. It is a non-parametric model, so it makes no assumption about the form of the underlying data (linear or non-linear). We get an accuracy of around 50%. The confusion matrix for KNN can be found below:
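To show the idea behind KNN, here is a from-scratch sketch that classifies a point by majority vote among its nearest training points. It is an illustration of the technique on toy 2-D data, not the implementation used for the results above; `knn_predict` is a hypothetical helper, and Euclidean distance is an assumed choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["Dislike", "Dislike", "Dislike", "Like", "Like", "Like"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))  # Dislike
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # Like
```

The model stores the training data and defers all work to prediction time, which is why no functional form for the decision boundary is assumed.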
3. Random Forest
We also experimented with ensemble learning by implementing the Random Forest algorithm. The results did not improve as expected. As the confusion matrix below shows, the correctly predicted samples (TP + TN) make up only 50% of the total, which is not good model performance.
4. Deep Learning
We developed a Deep Neural Network using the Keras library. The architecture consists of:
- 7168 neurons in the input layer
- 5 hidden layers (20148-512-128-32-4) with ReLU as the activation function
- 1 output neuron with the Sigmoid activation function for binary output
- Loss function used – Binary cross-entropy
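The architecture listed above can be sketched in Keras roughly as follows. Two things here are assumptions rather than details from the post: the first hidden width is taken as 2048, on the guess that the 20148 in the list is a typo, and the Adam optimizer is a placeholder since no optimizer is named. `build_model` is a hypothetical helper.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(input_dim=7168, hidden_units=(2048, 512, 128, 32, 4)):
    """Fully connected binary classifier: ReLU hidden layers, sigmoid output."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for units in hidden_units:
        model.add(layers.Dense(units, activation="relu"))
    # A single sigmoid neuron outputs the probability of the positive class.
    model.add(layers.Dense(1, activation="sigmoid"))
    # The optimizer is an assumed choice; the loss matches the list above.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model()
model.summary()
```

With `input_dim=7168`, the model expects each 512×14 recording to be flattened into a single vector before training.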
We get an accuracy of around 54%. The confusion matrix is shown below:
Fine-Tuning: We chose KNN for fine-tuning and used K-fold cross-validation with 10 folds to improve its performance. The results are shown in the graph below. We achieve a maximum accuracy of around 61%.
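One way to carry out this kind of tuning is sketched below with scikit-learn. It assumes the tuned hyperparameter is the number of neighbors and that folds are shuffled with a fixed seed; neither detail is stated in the post. The data is synthetic, so the scores are unrelated to the 61% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the EEG features.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# 10-fold cross-validation; shuffling and the seed are illustrative choices.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Candidate neighbor counts (odd values avoid ties in a two-class vote).
candidate_k = list(range(1, 22, 2))

mean_scores = {}
for k in candidate_k:
    # The scaler sits inside the pipeline so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(f"best k: {best_k}, mean CV accuracy: {mean_scores[best_k]:.3f}")
```

Plotting `mean_scores` against `candidate_k` gives an accuracy-versus-k curve of the kind typically used to pick the final value.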
Conclusion: The model performances can be seen below:
Sneha Bahl
Data Science Intern