Abstract:
A method and system for detecting temporal segments of talking faces in a video sequence using visual cues. The system detects talking segments by classifying talking and non-talking segments in a sequence of image frames using visual cues. The present disclosure detects temporal segments of talking faces in video sequences by first localizing face, eyes, and hence, a mouth region. Then, the localized mouth regions across the video frames are encoded in terms of integrated gradient histogram (IGH) of visual features and quantified using evaluated entropy of the IGH. The time series data of entropy values from each frame is further clustered using online temporal segmentation (K-Means clustering) algorithm to distinguish talking mouth patterns from other mouth movements. Such segmented time series data is then used to enhance the emotion recognition system.
Abstract:
A method and system for detecting temporal segments of talking faces in a video sequence using visual cues. The system detects talking segments by classifying talking and non-talking segments in a sequence of image frames using visual cues. The present disclosure detects temporal segments of talking faces in video sequences by first localizing face, eyes, and hence, a mouth region. Then, the localized mouth regions across the video frames are encoded in terms of integrated gradient histogram (IGH) of visual features and quantified using evaluated entropy of the IGH. The time series data of entropy values from each frame is further clustered using online temporal segmentation (K-Means clustering) algorithm to distinguish talking mouth patterns from other mouth movements. Such segmented time series data is then used to enhance the emotion recognition system.