FACIAL EMOTION DETECTION THROUGH DEEP CONVOLUTIONAL NEURAL NETWORKS

Our society has reached a point where the use of machines to automate mundane tasks is steadily increasing in daily life. Giving machines the capability to perceive their environment can enable them to perform a great variety of tasks. Facial emotion detection is a crucial sub-part of machine perception. In this article we present a deep learning based approach for facial emotion detection. Our model uses a Convolutional Neural Network (CNN) to learn deep features for classifying facial images into one of the 22 emotion categories (7 basic + 15 compound) considered in this study. We trained our CNN model on the image dataset from Martinez et al. The model was developed using Keras with the Theano backend and run on a GPU-powered testbed. It achieved 67.6% accuracy for basic emotions and 33% accuracy for compound emotions.

1. Introduction: In the information age, computing is expected to help in every walk of life, whether technical or social. For greater social acceptance of machines, the interaction between humans and machines has to become smoother and more natural as machines' exposure to humans increases. To achieve this, machines must be able to learn from their surroundings and especially from human interactions. Here the term machine covers both computers and robots. Humans use their senses to learn from their environment, so machine perception tries to mimic the human senses. One of the important senses to mimic is vision, whereby machines process visual stimuli to learn from their surroundings.

A specialized discipline of machine vision is facial emotion detection. Emotion detection is the process of making machines capable of recognizing the mood and future perception of an individual from visual stimuli. The human face expresses many emotions, of which 7 are basic and 15 are compound. The basic emotions are sadness, fear, surprise, anger, disgust, happiness and neutral; the compound emotions are combinations of basic emotions. For example, a person who sees something unexpected is surprised, and a person who faces a frightening or unpredictable situation becomes fearful. When both happen together, a new emotion arises, fearfully surprised, which is called a compound emotion because it is made up of two basic emotions.

Emotion detection and recognition has been an emerging area of machine vision for the last few years. Machines can now capture visual stimuli in the form of pictures and videos using cameras, and this information can be used with machine learning algorithms to produce effective machine perception and computer vision. Effective computing requires emotion detection so that machines can better serve their purpose. For example, the use of robots in households, hospitals or elderly care requires an emotional understanding of humans. Facial information conveys the inner state of a human.

Capturing this information in form of images and applying deep learning algorithms
Deep Learning is an area of Machine Learning which applies neuron-like mathematical structures [1] to learning tasks. Neural networks have been around for many decades [2] and have repeatedly gained and lost the favor of the research community. The latest rise of this technology is attributed to AlexNet [3], a deep neural network which won the ImageNet classification challenge. AlexNet achieved top-1 and top-5 error rates of 37.5% and 17.0% on the ImageNet [4] dataset, considerably better than the previous state of the art. Since then, Deep Neural Networks (DNNs) have once again attracted the attention of the research community, and multiple DNN structures have been proposed, including Convolutional Neural Networks (CNNs) [5], Recurrent Neural Networks such as LSTM [6], Deep Belief Nets (DBNs) and different types of autoencoders. These DNN structures have been successfully applied to devise state-of-the-art solutions in multiple disciplines.
In this paper we develop and implement a deep CNN (DCNN) model to address the problem of emotion detection. Fine-tuning and hyper-parameter optimization of the proposed DCNN was performed using RandomizedSearch [7] over the configuration space. Classification metrics including the confusion matrix, accuracy, precision, recall and F1 score were used to assess the performance of the model.
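As an illustration of randomized search over a configuration space, the following minimal sketch samples configurations at random and keeps the best-scoring one. The search space, the `toy_validation_score` function and all values in it are hypothetical placeholders, not the settings used in this study; in practice the scoring function would train the CNN and return its validation accuracy, and a library implementation would normally be used.

```python
import random

# Hypothetical search space; the values are illustrative, not the paper's.
SEARCH_SPACE = {
    "hidden_units": [128, 256, 512],
    "dropout": [0.25, 0.4, 0.5],
    "l2_penalty": [0.0, 1e-4, 1e-3],
}


def sample_config(space, rng):
    """Draw one configuration by picking each hyper-parameter independently."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(space, evaluate, n_iter, seed=0):
    """Evaluate n_iter random configurations; return the best (score, config)."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_iter):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


def toy_validation_score(config):
    """Stand-in for 'train the CNN and measure validation accuracy'."""
    return -abs(config["hidden_units"] - 256) / 512 - abs(config["dropout"] - 0.4)


best_score, best_config = random_search(
    SEARCH_SPACE, toy_validation_score, n_iter=20
)
print(best_score, best_config)
```

Unlike an exhaustive grid, the number of trials here is fixed in advance by `n_iter`, independent of how many values each hyper-parameter can take.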

For this study, we used the dataset devised by Martinez et al. [8]. The dataset consists of 5060 colour images of 1000x750 pixels, covering 7 basic emotions and 15 compound emotions as listed in Table 1.

METHODOLOGY:
Preprocessing: The complete dataset was converted to grayscale and resized to 64x64 pixels. Raw pixel data was read into numpy arrays for further processing by the model. We then normalized the data by subtracting the mean of the images from each image. For both basic and compound emotions, the data was divided into training and testing partitions. To classify the expressions, we used the features generated by the convolution layers from the raw pixel data as input features to the Fully Connected (FC) layers.

Model Architecture: The proposed approach uses a DCNN with an input layer, 3 conv-subsample-dropout blocks, 2 fully connected layers and an output layer with 7 units and softmax non-linearity. The input plane receives preprocessed training images in the form of 64x64 grayscale images. With local receptive fields, neurons in earlier layers extract elementary features, which subsequent CNN layers combine to form higher-order features. These higher-order features are passed to fully connected multilayer perceptron layers for classification. Each layer consists of trainable parameters and nodes as described in Table x. The software toolchain used to implement the model consists of the Jupyter development environment using Keras 2.0 on the Theano [9] backend and NVIDIA CUDA 8.0 [10]. Training and testing data is manipulated in the form of numpy arrays.

Model for Basic Emotions: The model for basic emotion classification was built using the CNN defined in Table 2.
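The preprocessing steps described above (grayscale conversion, resizing to 64x64, mean subtraction and a train/test split) can be sketched with numpy as follows. This is a minimal sketch under stated assumptions, not the code used in the study: the luma weights, the nearest-neighbour resize, the reading of "mean of the images" as the per-pixel dataset mean, the 20% test fraction and the synthetic random images are all choices made for illustration.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to (H, W) with ITU-R BT.601 luma weights."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def resize_nearest(img, size=64):
    """Nearest-neighbour resize of an (H, W) image to (size, size)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]


def preprocess(images, size=64):
    """Grayscale, resize, then subtract the dataset mean image from each image."""
    gray = np.stack([resize_nearest(to_grayscale(im), size) for im in images])
    gray = gray.astype("float32")
    return gray - gray.mean(axis=0)  # shape (N, size, size)


def train_test_split(x, y, test_fraction=0.2, seed=0):
    """Shuffle and split into training and testing partitions."""
    order = np.random.default_rng(seed).permutation(len(x))
    n_test = int(len(x) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    return x[train], y[train], x[test], y[test]


# Synthetic stand-in for the dataset: 10 random 1000x750 colour images, 7 labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 750, 1000, 3), dtype=np.uint8)
labels = rng.integers(0, 7, size=10)

x = preprocess(images)
x_train, y_train, x_test, y_test = train_test_split(x, labels)
print(x.shape, x_train.shape, x_test.shape)
```

The sketch follows the order given in the text, normalizing before splitting; computing the mean on the training partition only would be an alternative design.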
The first convolutional layer had 10 3×3 filters with a stride of 1, together with dropout, max-pooling and the Rectified Linear Unit (ReLU) activation function. The second convolutional layer had 10 5×5 filters with a stride of 1, max-pooling with a 2×2 filter and ReLU activation. The third and fourth convolutional layers used 10 and 64 feature maps respectively, with a kernel size of 3, a stride of 1, 2×2 max-pooling and ReLU activation. In the FC part, we placed a hidden layer with 512 neurons and ReLU activation, followed by a softmax output layer. The model was trained on the basic emotions training set for 60 epochs with a batch size of 32, and hyper-parameters were cross-validated using different values for regularization and the number of hidden neurons. We used the validation set to validate the model in each iteration and the test set to evaluate its final performance. This DCNN achieved 67% accuracy, which is an improvement of 22%.

Analysis of Compound Emotions: The model for compound emotions was built using a CNN with 4 convolutional layers (10 filters of 7x7, 32 of 5x5, 64 of 3x3 and 128 of 3x3), along with 2x2 max-pooling, dropout and ReLU activation, and 128 neurons in the fully connected layer. This model achieved 33% accuracy. We consider this result reasonable for compound emotions, which take considerably longer to train; results should improve with more training samples.
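To make the layer description of the basic emotions model concrete, the following sketch traces feature-map sizes and trainable-parameter counts through the four convolution layers listed above. Several details are assumptions, since the text does not state them: unpadded ("valid") convolutions, a 2x2 max-pooling step with stride 2 after every convolution layer, and a single 512-unit hidden layer feeding the 7-unit output. Dropout layers are omitted because they change neither shapes nor parameter counts. The numbers are therefore indicative only and may differ from the actual model.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output width of non-overlapping max-pooling (stride equal to pool size)."""
    return size // pool


# (filters, kernel size) for the four convolution layers of the basic model.
CONV_LAYERS = [(10, 3), (10, 5), (10, 3), (64, 3)]


def trace(input_size=64, in_channels=1, layers=CONV_LAYERS, hidden=512, classes=7):
    """Return a list of (layer name, output shape, trainable parameters)."""
    rows, size, channels = [], input_size, in_channels
    for i, (filters, kernel) in enumerate(layers, start=1):
        params = kernel * kernel * channels * filters + filters  # weights + biases
        size = conv_out(size, kernel)
        rows.append((f"C{i}", (size, size, filters), params))
        size = pool_out(size)
        rows.append((f"S{i}", (size, size, filters), 0))
        channels = filters
    flat = size * size * channels  # flattened input to the FC part
    rows.append(("FC1", (hidden,), flat * hidden + hidden))
    rows.append(("Output", (classes,), hidden * classes + classes))
    return rows


rows = trace()
for name, shape, params in rows:
    print(f"{name:7s} {str(shape):14s} {params}")
print("total parameters:", sum(p for _, _, p in rows))
```

Under these assumptions a 64x64 input shrinks to a 1x1x64 feature map before the fully connected part, which shows how strongly the padding and pooling choices constrain the depth of the convolutional stack at this input size.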

Evaluations and Results:
This section presents the results of the implemented models. We used well-known classification quality metrics, including accuracy, precision, recall and F1 score, to present the performance of the models. These metrics are calculated from the confusion matrix, which for a given emotion class yields four measures:
True Positive (TP): an image of the class is classified by the model as that class.
False Positive (FP): an image of another class is classified by the model as that class.
True Negative (TN): an image of another class is classified by the model as another class.
False Negative (FN): an image of the class is classified by the model as another class.
Accuracy is defined as the ratio of the number of correctly classified instances to the total number of instances. Table 3 shows the precision, recall and F1 score for all seven emotion classes in basic emotions. It can be seen from the table that the basic emotions model predicted 4 emotions, i.e. happy, sad, fearful and surprised, with maximal precision. The best recall was achieved for the basic emotion "happy".
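The metrics above can be computed from a multi-class confusion matrix by treating each emotion in turn as the positive class. The sketch below shows one way to do this in plain Python; the 3-class matrix contains made-up counts for illustration and is not a result from this study.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of images of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # class k predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other predicted as class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append({"precision": precision, "recall": recall, "f1": f1})
    return correct / total, report


# Toy 3-class confusion matrix (made-up counts, not results from the paper).
toy = [[8, 1, 1],
       [2, 6, 2],
       [0, 3, 7]]

accuracy, report = per_class_metrics(toy)
print("accuracy:", accuracy)
for k, row in enumerate(report):
    print(k, row)
```

Precision, recall and F1 are reported per class, while accuracy is a single figure for the whole matrix (the diagonal sum divided by the total count).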

Figure 1: Accuracy Graph of Basic Emotions Model

Figure 3: Confusion Matrix Heatmap Visualization of Basic Emotions Model

Figure 4: Accuracy Graph of Compound Emotions Model

Fig 3 shows the heatmap visualization of the confusion matrix for the basic emotions model.

Fig 4 shows the progression of training and validation accuracy of the compound emotions model with respect to training epochs. Fig 5 shows the training and validation loss with respect to training epochs.

Figure 6: Confusion Matrix Heatmap Visualization of Compound Emotions Model

Table 4 shows the precision, recall and F1 score for all fifteen emotion classes in compound emotions. It can be seen from the table that the compound emotions model predicted disgustedly surprised with maximal precision, followed by fearfully disgusted. The best recall scores were for happily surprised and happily disgusted. The best F1 measure was also for happily surprised. Fig 6 shows the heatmap visualization of the confusion matrix for the compound emotions model.

Conclusions: Emotion detection is helpful for many applications such as driver monitoring, surveillance systems, pain detection and other medical uses, and suspicious person detection. In this article we used deep convolutional neural networks for the problem of human facial emotion detection. The stack of convolution layers in each model extracted higher-order features, which the fully connected portion of the CNN used to perform classification with a softmax classifier. The deep features extracted by the convolution layers can also be used with conventional machine learning algorithms for problems such as surveillance, safe driving, suspicious person detection and pain detection. With a larger dataset of images, the accuracy for both basic and compound emotions can be improved.
Table 1: Categories of Basic and Compound Emotions available in selected Dataset

Table 2: Architecture of Convolutional Neural Network used to develop Basic and Compound Emotion Detection Models. We use LeNet-5 nomenclature to name the layers of the DCNN for description purposes: convolution layers are labeled Cx, subsampling layers Sx, dropout layers Dx and fully connected layers FCx.