Expression recognition can rely on a single sense or on several senses in combination. It results from combining holistic recognition with feature recognition. Specifically, recognizing people at a distance relies mainly on holistic recognition, whereas in close-range facial expression recognition the individual feature components matter more. In addition, different parts of the face contribute unequally to recognition; for example, the eyes and mouth are more important than the nose. According to research on the human brain, facial expression recognition and face recognition are related, but generally speaking they are separate processes that run in parallel.
As computer face processing technology (including face detection and face recognition) continues to improve, it has become possible to analyze facial expressions by computer. Expression analysis is nevertheless a difficult research direction, and the difficulty lies mainly in the accuracy and the effectiveness of expression feature extraction. The latter is especially hard, because the movements of the facial feature points differ only slightly across expressions. For example, an open mouth does not necessarily mean laughter; it may equally indicate crying or surprise.
The current main application areas of facial expression recognition technology include human-computer interaction, security, robot manufacturing, medical, communications, and automotive fields.
In 1971, the psychologists Ekman and Friesen first proposed that humans have six main emotions, each of which reflects a distinct psychological state through a distinct expression. These six emotions are called the basic emotions: anger, happiness, sadness, surprise, disgust, and fear.
Some of the methods described below evolved from face recognition and were adapted to the characteristics of facial expression recognition.
At present, the main features used for recognition are grayscale features, motion features, and frequency features. Grayscale features are derived from the gray values of the expression image: different expressions produce different gray values, and these differences serve as the basis for recognition. This requires thorough preprocessing for factors such as illumination and viewing angle so that the resulting gray values are normalized. Motion features use the motion of the main expression points of the face under different expressions. Frequency-domain features mainly exploit the differences between expression images under different frequency decompositions; their distinguishing advantage is speed.
In terms of specific facial expression recognition methods, there are three main directions: holistic versus local recognition methods, deformation extraction versus motion extraction methods, and geometric feature versus facial feature methods.
In the holistic recognition method, whether the starting point is the deformation of the face or its movement, the facial expression is analyzed as a whole to find the image differences between expressions. Typical methods include eigenface-based Principal Component Analysis (PCA), Independent Component Analysis (ICA), Fisher's Linear Discriminant (FLD), Local Feature Analysis (LFA), Fisher Actions, Hidden Markov Models (HMM), and cluster analysis.
The local recognition method treats the parts of the face separately during recognition, on the premise that the parts differ in importance. In facial expression recognition the most informative parts are the eyes, mouth, and eyebrows, whose movements carry most of the expression. The nose moves comparatively little, so it can be largely left out of the analysis, which speeds up recognition and improves accuracy. The most typical methods are the Facial Action Coding System (FACS) and the Facial Animation Parameter (FAP) method in MPEG-4; others include local principal component analysis (Local PCA), Gabor wavelets, and neural networks. FACS defines basic deformation units called Action Units (AUs) according to the types of facial muscles and their movement characteristics. Every facial expression can ultimately be decomposed into a combination of AUs, so analyzing expression feature information amounts to analyzing the changes in the facial AUs.
FACS has two main weaknesses: 1. the action units are purely local spatial templates; 2. it carries no temporal description, only heuristic information.
The deformation extraction method is based on how the various parts of the face deform when different expressions are made. The main methods are Principal Component Analysis (PCA), Gabor wavelets, the Active Shape Model (ASM) [7], and the Point Distribution Model (PDM).
The motion extraction method is based on the principle that certain characteristic parts of the face move in corresponding ways when particular expressions are made. For the six basic expressions mentioned above, the direction or trend of movement of certain fixed feature points (or parts) of the face is fixed. For example, when a person is afraid, the eyes open wider than normal and the mouth is generally open. See Table 1 for details. Typical recognition methods are optical flow and the Facial Animation Parameters (FAP) in MPEG-4.
The geometric feature method extracts a feature vector from the shape and position of each part of the face (including the mouth, eyes, eyebrows, and nose). This vector represents the geometric features of the face, and different expressions can be recognized from differences in it. An important method is principal component analysis based on action units (AUs). In the facial feature method, the whole face or part of it is passed through an image filter to obtain a feature vector. The commonly used filter is the Gabor wavelet.
Of course, these three directions are not strictly independent. They extract the required expression features from different angles, each offers only one way of analyzing expressions, and they are interconnected and influence one another. Many methods fall between two or even all three of them. For example, the Facial Action Coding System is a local method, and it also approaches the problem from facial motion.
Process and method of facial expression recognition
1. Establishing the expression database
At present, the expression databases most commonly used in research include:
The Cohn-Kanade AU-Coded Facial Expression Image Database (CKACFEID for short), jointly established by the CMU Robotics Institute and the Department of Psychology in the United States;
The Japanese Female Facial Expression Database (JAFFE), established by ATR in Japan, an important test database for studying Asian facial expressions;
The fer2013 face dataset, which can be downloaded from the Kaggle website.
2. Expression recognition:
(1) Image acquisition: Acquire still images or dynamic image sequences through image capture tools such as cameras.
(2) Image preprocessing: normalization of image size and gray level, correction of head pose, image segmentation, etc.
Purpose: Improve image quality, eliminate noise, unify image gray value and size, and lay a solid foundation for subsequent feature extraction and classification recognition
Main tasks: segmentation of facial expression recognition sub-regions and normalization of expression images (scale normalization and gray normalization)
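The two normalization tasks can be illustrated with a minimal sketch. It assumes the face region has already been cropped to a small grayscale image stored as a list of rows. The function names `normalize_gray` and `resize_nearest` are made up for this example, and min-max stretching and nearest-neighbour sampling are only one possible choice for each step.

```python
def normalize_gray(img):
    """Gray normalization: stretch the gray values of a 2-D list to 0..255."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0 for _ in row] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def resize_nearest(img, out_h, out_w):
    """Scale normalization: resample to out_h x out_w by nearest neighbour."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical 3x3 face crop with a narrow gray range (50..150).
face = [[50, 60, 70],
        [60, 80, 100],
        [70, 100, 150]]

norm = normalize_gray(face)          # gray range is now 0..255
small = resize_nearest(norm, 2, 2)   # every sample gets the same size
print(norm)
print(small)
```

In practice both steps would be applied to every image in the database so that the later feature extraction sees inputs of identical size and gray range.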
(3) Feature extraction: transform the pixel matrix into higher-level image representations such as shape, motion, color, texture, and spatial structure, and reduce the dimensionality of the large volume of image data while preserving stability and recognition rate as far as possible.
The main methods of feature extraction are: extraction of geometric features, statistical features, frequency domain features and motion features, etc.
1) Geometric feature extraction locates and measures the salient features of a facial expression, such as changes in the position of the eyes, eyebrows, and mouth, and uses their size, distance, shape, and mutual ratios to recognize the expression
Advantages: reduce the amount of input data
Disadvantages: some important identification and classification information is lost, and the accuracy of the result is not high
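A minimal sketch of a geometric feature vector, assuming the landmark coordinates have already been located by some detector. The landmark names, the coordinates, and the choice of three ratios are all hypothetical; dividing by the distance between the eyes is one common way to make the features independent of image scale.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def geometric_features(lm):
    """Return [mouth width, mouth opening, brow-to-eye gap] / eye distance."""
    eye_dist = dist(lm["left_eye"], lm["right_eye"])
    return [
        dist(lm["mouth_left"], lm["mouth_right"]) / eye_dist,
        dist(lm["mouth_top"], lm["mouth_bottom"]) / eye_dist,
        dist(lm["left_brow"], lm["left_eye"]) / eye_dist,
    ]


# Hypothetical landmark positions (pixels) for a neutral and a surprised face.
neutral = {
    "left_eye": (30, 40), "right_eye": (70, 40), "left_brow": (30, 30),
    "mouth_left": (35, 80), "mouth_right": (65, 80),
    "mouth_top": (50, 75), "mouth_bottom": (50, 85),
}
surprised = dict(neutral, left_brow=(30, 26), mouth_bottom=(50, 95))

f_neutral = geometric_features(neutral)
f_surprised = geometric_features(surprised)
print(f_neutral, f_surprised)
```

The whole face is reduced to three numbers here, which illustrates both the advantage (very little input data) and the disadvantage (texture information is discarded) listed above.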
2) Methods based on holistic statistical features emphasize keeping as much of the information in the original facial expression image as possible and letting the classifier find the relevant features; the features used for recognition are obtained by transforming the entire facial expression image.
Main methods: PCA (Principal Component Analysis) and ICA (Independent Component Analysis)
PCA uses an orthogonal basis to describe the main directions of variation in the data. Advantages: good reconstruction. Disadvantages: poor separability.
ICA can obtain independent components of data, with good separability
Disadvantages of the extraction method based on the overall statistical characteristics of the image: the interference of external factors (light, angle, complex background, etc.) will cause the recognition rate to decrease
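To make the idea of PCA concrete, here is a small sketch that finds the first principal direction by power iteration on the covariance matrix. It is an illustration under simplifying assumptions: the data are four 2-D toy points rather than flattened face images, only the first component is computed, and the function names are invented for this example.

```python
def mean_center(X):
    """Subtract the per-column mean from every row."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - mu[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(C, iters=100):
    """Power iteration: converges to the eigenvector with largest eigenvalue."""
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Toy data lying on the line y = 2x; a real system would use one
# flattened, normalized face image per row instead.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
Xc = mean_center(X)
pc = first_component(covariance(Xc))
scores = [sum(a * b for a, b in zip(row, pc)) for row in Xc]  # 1-D features
print(pc, scores)
```

Each 2-D point is replaced by a single score along the main direction of variation, which is the dimensionality reduction that eigenface-style methods perform on much larger vectors.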
3) Feature extraction based on the frequency domain: convert the image from the spatial domain to the frequency domain and extract its features there (lower-level features)
Main method: Gabor wavelet transform
By defining kernels with different center frequencies, bandwidths, and orientations, the wavelet transform performs multi-resolution analysis of the image and can effectively extract relatively stable image features at different levels of detail and in different directions. As a low-level feature, however, it is not easy to use directly for matching and recognition, so it is often combined with an ANN or SVM classifier to improve the accuracy of facial expression recognition.
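The following sketch builds the real part of a Gabor kernel and a small bank of four orientations. The parameter names (`sigma`, `theta`, `lambd`, `gamma`, `psi`) and their values are assumptions chosen for illustration, and `filter_response` computes only a single inner product with one patch rather than a full convolution over the image.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter: Gaussian envelope times cosine carrier."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch, kernel):
    """Inner product of an image patch with a kernel of the same size."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


# Bank of four orientations: 0, 45, 90 and 135 degrees.
bank = [gabor_kernel(7, sigma=2.0, theta=k * math.pi / 4, lambd=4.0)
        for k in range(4)]

# Synthetic 7x7 patch with vertical stripes of period 4 pixels.
stripes = [[math.cos(2 * math.pi * x / 4.0) for x in range(-3, 4)]
           for _ in range(7)]
responses = [filter_response(stripes, k) for k in bank]
print(responses)
```

The kernel whose orientation and wavelength match the stripes gives the strongest response, which is how a bank of such filters captures detail in different directions. The vector of responses is the low-level feature that would then be passed to a classifier.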
4) Extraction based on motion features: Extract motion features of dynamic image sequences (the focus of future research)
Main method: optical flow method
Optical flow is the apparent motion of the brightness pattern in an image. It is the projection onto the imaging plane of the three-dimensional velocity vectors of visible points in the scene, and it represents the instantaneous change in image position of points on the scene surface. The optical flow field also carries rich information about motion and structure.
The optical flow model is an effective method for processing moving images. Its basic idea is to take the moving image function f(x, y, t) as the basic function, establish the optical flow constraint equation from the principle of image intensity conservation, and compute the motion parameters by solving the constraint equation.
Advantages: reflects the essence of expression changes, less affected by uneven illumination
Disadvantage: large amount of calculation
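The optical flow constraint equation described above can be written out explicitly. The symbols u and v for the two velocity components are notation introduced here; the derivation assumes that image intensity is conserved along the motion and that the displacement over a short interval is small enough for a first-order Taylor expansion.

```latex
\begin{aligned}
f(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) &= f(x, y, t)
  && \text{(intensity conservation)} \\
\frac{\partial f}{\partial x}\,u + \frac{\partial f}{\partial y}\,v
  + \frac{\partial f}{\partial t} &= 0,
  \qquad u = \frac{dx}{dt},\quad v = \frac{dy}{dt}
  && \text{(optical flow constraint)}
\end{aligned}
```

This is one equation in the two unknowns u and v at each pixel, so it cannot be solved on its own. An extra assumption is needed, for example that the flow field varies smoothly (as in the Horn-Schunck method) or that the flow is constant within a small window (as in the Lucas-Kanade method).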
(4) Classification: includes classifier design and the classification decision
In the classifier design and selection stage of facial expression recognition, the main options are linear classifiers, neural network classifiers, support vector machines, hidden Markov models, and other classification methods
5.1) Linear classifier: assumes that the pattern spaces of the different categories are linearly separable; the separability comes mainly from the differences between expressions.
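As an illustration of a linear classifier, the sketch below trains a perceptron, which is only one of several possible linear methods and is not named in the text above. The two-dimensional feature vectors and the +1 / -1 labels are invented toy data that happen to be linearly separable.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches the labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every sample is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy features (hypothetical, already centered): +1 and -1 are two expressions.
samples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -1.5)]
labels = [1, 1, -1, -1]

w, b = train_perceptron(samples, labels)
print(w, b, [predict(w, b, x) for x in samples])
```

If the classes were not linearly separable, the loop would never reach zero errors, which is why the separability assumption matters for this family of classifiers.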
5.2) Neural network classifier: An Artificial Neural Network (ANN) is a network structure that simulates the neurons of the human brain. It consists of a large number of simple basic components (neurons) connected to one another to form an adaptive nonlinear dynamic system. With the coordinates of the facial features and their corresponding gray values as input, an ANN can produce very complex decision boundaries between classes.
Neural network classifiers mainly include: multilayer perceptron, BP network, RBF network
Disadvantages: a large number of training samples and training time are required, which cannot meet real-time processing requirements
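The forward pass of a small multilayer perceptron can be sketched as follows. All weights are arbitrary placeholders, not trained values; in a real BP network they would be learned by backpropagation from labeled samples. The layer sizes, the tanh activation, and the three-element input vector are assumptions made for this example.

```python
import math

EMOTIONS = ["anger", "happiness", "sadness", "surprise", "disgust", "fear"]


def dense(x, weights, bias):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(v):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, W1, b1, W2, b2):
    """Input -> hidden layer (tanh) -> output layer (softmax)."""
    hidden = [math.tanh(a) for a in dense(x, W1, b1)]
    return softmax(dense(hidden, W2, b2))


# Placeholder weights: 3 inputs -> 4 hidden neurons -> 6 emotion classes.
W1 = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1], [0.1, 0.1, -0.2], [0.4, -0.4, 0.3]]
b1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[0.3, -0.2, 0.1, 0.0], [0.1, 0.4, -0.3, 0.2], [-0.2, 0.1, 0.2, 0.1],
      [0.5, -0.1, 0.0, 0.3], [0.0, 0.2, 0.1, -0.4], [-0.1, 0.3, 0.2, 0.2]]
b2 = [0.0] * 6

x = [0.75, 0.5, 0.35]  # hypothetical feature vector
probs = mlp_forward(x, W1, b1, W2, b2)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, probs)
```

With untrained weights the predicted label is meaningless; the sketch only shows how the nonlinear layers turn a feature vector into one probability per basic emotion.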
5.3) Support vector machine (SVM) classification algorithm: generalizes well, handles small-sample, nonlinear, and high-dimensional pattern recognition problems, and is a recent research hotspot
Basic idea: for samples that are not linearly separable, first map the input space into a high-dimensional space through a nonlinear transformation, then find the optimal linear decision surface in that new space. The nonlinear transformation is realized by defining an appropriate inner product (kernel) function. Three commonly used inner product functions are the polynomial, the radial basis, and the sigmoid inner product functions
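The three inner product functions named above can be written down directly. The parameter names and default values (`degree`, `coef0`, `gamma`, `alpha`) are assumptions for this sketch, and the code only evaluates the kernels; it does not train an SVM.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """K(u, v) = (u.v + coef0) ** degree"""
    return (dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, alpha=0.1, coef0=0.0):
    """K(u, v) = tanh(alpha * u.v + coef0)"""
    return math.tanh(alpha * dot(u, v) + coef0)


def gram_matrix(X, kernel):
    """Pairwise kernel values; this matrix is what an SVM solver works with."""
    return [[kernel(a, b) for b in X] for a in X]


X = [[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]]  # toy feature vectors
G = gram_matrix(X, rbf_kernel)
print(polynomial_kernel(X[0], X[1]), G)
```

Each kernel returns the inner product of two samples in an implicit high-dimensional space without computing the mapping itself, which is what makes the search for a linear decision surface in that space practical.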
5.4) Hidden Markov Models (HMM): statistical models with a robust mathematical structure, suited to modeling dynamic processes as time series, with strong pattern classification capability; in theory they can handle time series of any length, so their scope of application is very wide.
Advantages: The HMM method can accurately describe the changing nature and dynamic performance of facial expressions
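One common way to classify with HMMs is to train one model per expression and pick the model that assigns the observed sequence the highest likelihood. The sketch below uses the forward algorithm for that likelihood. The two models, their probabilities, and the binary observation symbols (0 = mouth at rest, 1 = mouth corners raised) are invented for illustration and are not trained values.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(observation sequence | model) computed with the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


# Hidden state 0 = resting pose, hidden state 1 = expression apex.
EMIT = [[0.9, 0.1],   # resting pose mostly emits symbol 0
        [0.2, 0.8]]   # apex mostly emits symbol 1

# Each model is (start probabilities, transition matrix, emission matrix).
MODELS = {
    "smile":   ([0.8, 0.2], [[0.6, 0.4], [0.1, 0.9]], EMIT),
    "neutral": ([0.9, 0.1], [[0.95, 0.05], [0.5, 0.5]], EMIT),
}

sequence = [0, 0, 1, 1, 1]  # hypothetical per-frame observations
likelihoods = {name: forward_likelihood(sequence, *m)
               for name, m in MODELS.items()}
best = max(likelihoods, key=likelihoods.get)
print(best, likelihoods)
```

Because the recursion advances one frame at a time, the same function works for sequences of any length, which matches the property noted above. For long sequences the probabilities become very small, so practical implementations usually work with scaled or logarithmic values.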
5.5) Other methods:
Recognition based on a physical model of the face: the face image is modeled as a deformable 3D mesh surface, and spatial position and gray level are considered together in a single 3D space.
Recognition based on model-based image coding: genetic algorithms are used to encode, recognize, and synthesize various expressions.
4. Research prospects
(1) Robustness needs to be improved:
External factors (mainly interference from head deflection and light changes)
Multi-camera technology and color compensation technology have been used to address this, with some effect, but the results are not ideal
(2) The computational cost of facial expression recognition needs to be reduced to ensure real-time performance
(3) Strengthen the integration of multiple information technologies
Facial expressions are not the only way of expressing emotions. Combining other information such as voice intonation, pulse, and body temperature to infer a person's inner emotions more accurately is an issue that expression recognition technology will need to consider.
Attached is code from a concrete facial expression recognition method at the current stage (as can be seen here, such methods are basically handcrafted features + a shallow classifier). The MATLAB function below performs the face detection step:
% Code optimized for the following assumptions:
% 1. Only one face in scene and it is the primary object
% 2. Faster noise reduction and face detection
% Originally by Tolga Birdal
% Implementation of the paper:
% "A simple and accurate face detection algorithm in complex background"
% by Yu-Tang Pai, Shanq-Jang Ruan, Mon-Chau Shie, Yi-Chi Liu
% Additions by Tolga Birdal:
%   Minimum face size constraint
%   Adaptive theta thresholding (theta is thresholded by mean2(theta)/4)
% Parameters are modified to detect better. Please check the paper for
% the parameters they propose.
% Check the paper for more details.
% Usage:
%   I = double(imread('c:\Data\girl1.jpg'));
%   detect_face(I);
% The function will display the bounding box if a face is found.
function [aa, SN_fill, FaceDat] = detect_face(I)
close all;
% NOTE: this hard-coded path overrides the input argument I;
% comment it out to process the image passed in by the caller.
I = imread('./Test/Image029.jpg');

% No faces at the beginning
Faces = [];
numFaceFound = 0;

I = double(I);
H = size(I, 1);
W = size(I, 2);

%%%%%%%%%%%%%%%%%% LIGHTING COMPENSATION %%%%%%%%%%%%%%%%%%
C = 255 * imadjust(I/255, [0.3; 1], [0; 1]);
figure, imshow(C/255);
% title('Lighting compensation');

%%%%%%%%%%%%%%%%%% EXTRACT SKIN %%%%%%%%%%%%%%%%%%
YCbCr = rgb2ycbcr(C);
Cr = YCbCr(:, :, 3);
S = zeros(H, W);
% Mark every pixel whose Cr value lies in the skin range
[SkinIndexRow, SkinIndexCol] = find(10 < Cr & Cr < 255);
for i = 1:length(SkinIndexRow)
    S(SkinIndexRow(i), SkinIndexCol(i)) = 1;
end
% Clear the bottom 8 rows of the skin mask
m_S = size(S);
S(m_S(1)-7:m_S(1), :) = 0;

%%%%%%%%%%%%%%%%%% REMOVE NOISE %%%%%%%%%%%%%%%%%%
% figure; imshow(S);
SN = zeros(H, W);
% Keep a block only if more than 20 pixels of its 5x5 window are skin
for i = 1:H-5
    for j = 1:W-5
        localSum = sum(sum(S(i:i+4, j:j+4)));
        SN(i:i+5, j:j+5) = (localSum > 20);
    end
end
% figure; imshow(SN);
Iedge = edge(uint8(SN));
% figure; imshow(Iedge);
SE = strel('square', 9);
SN_edge = imdilate(Iedge, SE);
% % SN_edge = SN_edge1.*SN;
% figure; imshow(SN_edge);
SN_fill = imfill(SN_edge, 'holes');
figure; imshow(SN_fill);

%%%%%%%%%%%%%%%%%% FIND SKIN COLOR BLOCKS %%%%%%%%%%%%%%%%%%
[L, lenRegions] = bwlabel(SN_fill, 4);
AllDat = regionprops(L, 'BoundingBox', 'FilledArea');
AreaDat = cat(1, AllDat.FilledArea);
% The largest skin-colored region is taken to be the face
[maxArea, maxAreaInd] = max(AreaDat);
FaceDat = AllDat(maxAreaInd);
FaceBB = [FaceDat.BoundingBox(1), FaceDat.BoundingBox(2), ...
          FaceDat.BoundingBox(3)-1, FaceDat.BoundingBox(4)-1];
aa = imcrop(rgb2gray(uint8(I)) .* uint8(SN_fill), FaceBB);
figure, imshow(aa);
title('Identified Face');
end