Deep Siamese Neural Networks for Facial Expression Recognition in the Wild
General Material Designation
[Thesis]
First Statement of Responsibility
Hayale, Wassan
Subsequent Statement of Responsibility
Mahoor, Mohammad H.
PUBLICATION, DISTRIBUTION, ETC.
Name of Publisher, Distributor, etc.
University of Denver
Date of Publication, Distribution, etc.
2020
GENERAL NOTES
Text of Note
110 p.
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
University of Denver
Text preceding or following the note
2020
SUMMARY OR ABSTRACT
Text of Note
Variation in facial images captured in the wild, caused by head pose, illumination, and occlusion, can significantly affect Facial Expression Recognition (FER) performance. Moreover, between-subject variation introduced by age, gender, ethnic background, and identity can also influence FER performance. This Ph.D. dissertation presents a novel algorithm for end-to-end facial expression recognition, valence and arousal estimation, and visual object matching based on deep Siamese neural networks, designed to handle the extreme variation present in facial datasets. In our main Siamese network for facial expression recognition, the first network serves as the classification framework, which performs multi-class classification. The second network serves as the verification framework, where we use pairwise similarity labels to map images to a feature space in which similar inputs are close to each other and dissimilar inputs are far apart. Using a Siamese architecture enables us to obtain powerful discriminative features by taking full advantage of the training batches via our pairing strategy and by dynamically transferring the learning from a local-adaptive verification space into a classification embedding space. These steps enable the algorithm to learn state-of-the-art features by optimizing the joint identification-verification embedding space. The verification model reduces intra-class variation by minimizing the distance between features extracted from the same identity, using different strategies. In contrast, the identification model increases inter-class variation by maximizing the distance between features extracted from different classes. When the network is tuned carefully, these discriminative features allow it to generalize to unseen images.
Further, we applied our proposed deep Siamese networks to two other challenging computer vision tasks: valence and arousal estimation, and visual object matching. The empirical results of the valence and arousal Siamese model demonstrate that transferring the learning from the classification space to the regression space enhances the regression task, because each expression is represented within a specific range of valence and arousal. Likewise, the Siamese model for visual object matching performs better because the classification framework helps increase the inter-class variation in the verification framework. We evaluated the algorithm on state-of-the-art, challenging datasets: AffectNet (Mollahosseini et al., 2017), FERA2013 (Goodfellow et al., 2013), categorical EmotioNet (Du et al., 2014), and Cifar-100 (Krizhevsky et al., 2009). To the best of our knowledge, this technique is the first to create a powerful recognition system by taking advantage of the features learned from different objective frameworks. Our results are comparable to those of other deep learning models.
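The joint identification-verification objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the dissertation's actual loss functions, pairing strategy, or weighting, so everything below is an assumption made for illustration: softmax cross-entropy stands in for the identification (classification) loss, a standard margin-based contrastive loss stands in for the verification loss, every pair of samples in a batch is used, and the names `batch_joint_loss`, `weight`, and `margin` are hypothetical.

```python
import math
from itertools import combinations


def softmax_cross_entropy(logits, label):
    """Identification loss for one sample: cross-entropy over class logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(f1, f2, same, margin=1.0):
    """Verification loss for one pair (assumed form, not from the thesis).

    Same-class pairs are pulled together; different-class pairs are
    pushed until they are at least `margin` apart.
    """
    d = euclidean(f1, f2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def batch_joint_loss(logits, feats, labels, weight=0.5, margin=1.0):
    """Mean identification loss plus weighted mean verification loss.

    Assumes at least two samples and pairs every sample with every
    other one; the thesis's own pairing strategy may differ.
    """
    ident = sum(
        softmax_cross_entropy(z, y) for z, y in zip(logits, labels)
    ) / len(labels)
    pairs = list(combinations(range(len(labels)), 2))
    verif = sum(
        contrastive_loss(feats[i], feats[j], labels[i] == labels[j], margin)
        for i, j in pairs
    ) / len(pairs)
    return ident + weight * verif
```

In this sketch the verification term shrinks intra-class distances and the margin enforces inter-class separation, while the cross-entropy term drives multi-class classification; `weight` controls the balance between the two and would need tuning in practice.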