Abstract:
A learning device includes: an acquisition unit configured to acquire a recognition result by recognizing intention indication information of a user who uses a communication robot; a presentation unit configured to select a plurality of action sets corresponding to the recognition result on the basis of the acquired recognition result and to present the selected plurality of action sets to a remote operator who remotely operates the communication robot from a remote location; an operation result detecting unit configured to detect a selection state of the remote operator for the presented plurality of action sets; and a learning unit configured to determine a reward in learning on the basis of the detected selection state of the remote operator and to learn a response to the user's action.
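The abstract does not specify the learning algorithm, so the following is only a minimal sketch of how the operator's selection state could be turned into a reward and a simple value update; the function names, reward values, and learning rate are all illustrative assumptions:

```python
# Hypothetical sketch: convert the remote operator's selection state into a
# reward and apply a one-step tabular value update. The patent abstract does
# not specify the learning method; all values here are illustrative.

def reward_from_selection(selected: bool) -> float:
    """Positive reward when the operator adopts a presented action set."""
    return 1.0 if selected else -0.1

def update_value(values: dict, action: str, selected: bool, lr: float = 0.5) -> dict:
    """Move the stored value for the action set toward the observed reward."""
    r = reward_from_selection(selected)
    old = values.get(action, 0.0)
    values[action] = old + lr * (r - old)
    return values

values = {}
update_value(values, "greet", selected=True)   # operator chose this action set
update_value(values, "wave", selected=False)   # operator rejected this one
```

In this sketch, repeatedly selected action sets accumulate higher values and would be presented (or executed autonomously) more often.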
Abstract:
A social ability generation device includes a perception unit that acquires person information on a person, extracts feature information on the person from the acquired person information, perceives an action that occurs between the person and a communication device performing communication, and perceives an action that occurs between people; a learning unit that multimodally learns an emotional interaction of the person using the extracted feature information on the person; and an operation generation unit that generates a behavior on the basis of the learned emotional interaction information of the person.
Abstract:
A separation unit separates voice signals of a plurality of channels into an incoming component for each incoming direction, a selection unit selects, from a storage unit which stores a predetermined statistic and a voice recognition model for each incoming direction, a statistic corresponding to the incoming direction of the incoming component separated by the separation unit, an updating unit updates the voice recognition model on the basis of the statistic selected by the selection unit, and a voice recognition unit recognizes a voice of the separated incoming component using the updated voice recognition model.
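One simple way to realize the direction-indexed lookup and model update described above is a table keyed by incoming direction, with the stored statistic blended into the model. This is only a sketch under assumed names; the abstract does not say what the statistic is or how the model is updated, so a running-mean update stands in for illustration:

```python
# Hypothetical sketch: per-direction statistics stored in a table; the model
# (reduced here to a single mean parameter) is updated toward the statistic
# for the separated component's incoming direction. Values are illustrative.

STATS = {"left": 0.2, "front": 0.0, "right": -0.2}  # assumed stored statistics

def update_model(model_mean: float, direction: str, weight: float = 0.1) -> float:
    """Blend the stored statistic for the direction into the model parameter."""
    stat = STATS[direction]
    return (1 - weight) * model_mean + weight * stat

updated = update_model(0.0, "left")  # model nudged toward the "left" statistic
```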
Abstract:
A speech processing device includes a sound source localization unit configured to determine a sound source position from acquired speech, a reverberation suppression unit configured to suppress a reverberation component of the speech to generate dereverberated speech, a feature quantity calculation unit configured to calculate a feature quantity of the dereverberated speech, a feature quantity adjustment unit configured to multiply the feature quantity by an adjustment factor corresponding to the sound source position to calculate an adjusted feature quantity, and a speech recognition unit configured to perform speech recognition using the adjusted feature quantity.
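The position-dependent feature adjustment can be sketched as a lookup of an adjustment factor by sound source position followed by an element-wise multiplication. The azimuth keys and factor values below are illustrative assumptions, not values from the patent:

```python
# Hypothetical sketch: multiply the dereverberated-speech feature vector by an
# adjustment factor chosen from the estimated sound source position.
# The azimuth table and factors are illustrative only.

ADJUSTMENT_FACTORS = {  # sound source azimuth (degrees) -> adjustment factor
    0: 1.00,
    30: 1.05,
    60: 1.12,
}

def adjust_features(features: list, azimuth_deg: float) -> list:
    """Scale each feature dimension by the factor of the nearest known azimuth."""
    nearest = min(ADJUSTMENT_FACTORS, key=lambda a: abs(a - azimuth_deg))
    factor = ADJUSTMENT_FACTORS[nearest]
    return [f * factor for f in features]

adjusted = adjust_features([1.0, 2.0], 28)  # 28 degrees maps to the 30-degree factor
```

The adjusted feature vector, rather than the raw one, is then passed to the recognizer, compensating for position-dependent distortion left after dereverberation.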
Abstract:
A speech processing device includes a reverberation characteristic selection unit configured to correlate, for each of a plurality of reverberation characteristics, correction data indicating a contribution of a reverberation component based on the corresponding reverberation characteristic with an adaptive acoustic model trained using reverberant speech to which reverberation based on that characteristic has been added, to calculate likelihoods of a recorded speech based on the adaptive acoustic models, and to select the correction data corresponding to the adaptive acoustic model having the highest calculated likelihood, and a dereverberation unit configured to remove the reverberation component from the speech based on the selected correction data.
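The selection step reduces to scoring the recorded speech under each adaptive acoustic model and taking the correction data paired with the best-scoring model. The sketch below stubs the models out as plain scoring functions; in a real system these would be acoustic model likelihoods (e.g. from GMMs or neural networks), and the pairing and labels here are assumptions:

```python
# Hypothetical sketch: each reverberation characteristic contributes a
# (likelihood_fn, correction_data) pair; pick the correction data whose model
# assigns the recorded speech the highest likelihood. Scoring is stubbed out.

def select_correction(recorded, models):
    """models: list of (likelihood_fn, correction_data) pairs."""
    best = max(models, key=lambda m: m[0](recorded))
    return best[1]

# Toy models whose "likelihood" just prefers a particular signal length.
models = [
    (lambda s: -abs(len(s) - 5), "short-room correction"),
    (lambda s: -abs(len(s) - 50), "long-room correction"),
]
chosen = select_correction([0.0] * 48, models)  # long signal -> long-room model
```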
Abstract:
An emotion acquisition device includes an image capturing unit that acquires a human facial expression, a conversion unit that converts the human facial expression acquired by the image capturing unit into a continuous value indicating a human emotion, and an emotion estimation unit that maps the continuous value converted by the conversion unit to estimate an emotion of a target person.
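One common realization of "mapping a continuous value to an emotion" is quadrant lookup in a two-dimensional valence/arousal space; the abstract does not name the space or the labels, so the dimensions, thresholds, and emotion names below are illustrative assumptions:

```python
# Hypothetical sketch: map a continuous (valence, arousal) pair, as might be
# produced by the conversion unit, onto a discrete emotion label by quadrant.
# The axes, thresholds, and labels are illustrative assumptions.

def estimate_emotion(valence: float, arousal: float) -> str:
    """Quadrant lookup in a 2-D emotion space centered at the origin."""
    if valence >= 0 and arousal >= 0:
        return "happy"      # positive valence, high arousal
    if valence < 0 and arousal >= 0:
        return "angry"      # negative valence, high arousal
    if valence < 0:
        return "sad"        # negative valence, low arousal
    return "relaxed"        # positive valence, low arousal

label = estimate_emotion(0.6, -0.3)
```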
Abstract:
A speech processing apparatus collects sound signals. For each of the collected sound signals, the apparatus may estimate a direction of the sound source and select an extension filter to be applied to that sound signal. The extension filter may correspond to the estimated sound source direction of each of the sound signals. In addition, each of the sound signals may be corrected using the extension filter, and reverberation reduction may be performed on the corrected sound signals and the collected sound signals.
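A crude sketch of the filter-then-reduce flow follows: a filter is picked from the estimated direction, applied as a short FIR correction, and a scaled version of the corrected signal is subtracted from the observation. The direction keys, filter coefficients, and subtraction weight are all illustrative assumptions, not the patented method:

```python
# Hypothetical sketch: direction-indexed extension filters, simple FIR
# correction, and a toy reverberation reduction that subtracts a scaled
# corrected estimate from the observation. All values are illustrative.

FILTERS = {"near": [1.0, 0.0], "far": [0.6, 0.3]}  # assumed filter bank

def apply_filter(signal: list, coeffs: list) -> list:
    """FIR filtering, output truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

def reduce_reverberation(signal: list, direction: str) -> list:
    """Correct the signal with the direction's filter, then subtract it scaled."""
    corrected = apply_filter(signal, FILTERS[direction])
    return [s - 0.5 * c for s, c in zip(signal, corrected)]

out = reduce_reverberation([1.0, 0.0], "near")
```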
Abstract:
A communication robot includes an auditory information processing portion configured to recognize a volume of voice collected by a sound collection portion and generate an auditory attention map by projecting a sound position in a three-dimensional space onto a two-dimensional attention map in which the robot is located at a center, a visual information processing portion configured to generate a visual attention map using a face detection result obtained by detecting a face of a person from an image captured by an imaging portion and a motion detection result obtained by detecting a motion of the person, an attention map generation portion configured to generate an attention map by integrating the auditory attention map and the visual attention map, and a motion processing portion configured to control eyeball movements and motions of the communication robot using the attention map.
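The integration step can be sketched as a weighted element-wise sum of the two 2-D maps, with the robot gazing at the most salient cell. The weights and map contents below are illustrative assumptions; the abstract does not specify the integration rule:

```python
# Hypothetical sketch: integrate 2-D auditory and visual attention maps by a
# weighted element-wise sum and pick the most salient cell as the gaze target.
# The weights (w_a, w_v) and map values are illustrative assumptions.

def integrate(auditory, visual, w_a=0.5, w_v=0.5):
    """Element-wise weighted sum of two equally sized 2-D maps."""
    return [[w_a * a + w_v * v for a, v in zip(ra, rv)]
            for ra, rv in zip(auditory, visual)]

def gaze_target(attention):
    """Return (row, col) of the cell with the highest attention value."""
    return max(
        ((r, c) for r in range(len(attention)) for c in range(len(attention[0]))),
        key=lambda rc: attention[rc[0]][rc[1]],
    )

aud = [[0.1, 0.0], [0.0, 0.9]]  # e.g. a loud sound localized bottom-right
vis = [[0.8, 0.0], [0.0, 0.2]]  # e.g. a face detected top-left
target = gaze_target(integrate(aud, vis))
```

The motion processing portion would then drive eyeball and body movements toward the winning cell of the integrated map.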
Abstract:
A dialog understanding device includes a sound collection module configured to collect a sound signal, a contextual processing module, and a dialog system configured to perform a dialog with a human. The contextual processing module includes a plurality of layers for processing information obtained from the sound collection module. Each of the plurality of layers is provided with a fallback processing module for a case where a predetermined process has not succeeded for the collected sound signal. Processing in the next layer is performed after the fallback processing module performs the corresponding fallback process. A sound signal obtained when the contextual processing module completes its processing is input to the dialog system.
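The layered structure with per-layer fallbacks can be sketched as a pipeline where each layer tries its process and, on failure, runs that layer's fallback before the next layer proceeds. The layer contents below (a toy recognition step and an intent step) are illustrative assumptions:

```python
# Hypothetical sketch: layered contextual processing with a per-layer
# fallback; the final result is what would be handed to the dialog system.
# The example layers and failure condition are illustrative only.

def run_layers(signal, layers):
    """layers: list of (process, fallback) pairs applied in order."""
    for process, fallback in layers:
        try:
            signal = process(signal)
        except ValueError:
            signal = fallback(signal)  # per-layer fallback, then continue
    return signal  # input to the dialog system

def asr(sig):
    if "noise" in sig:
        raise ValueError("recognition failed")
    return sig + "|asr"

def asr_fallback(sig):
    return sig.replace("noise", "clean") + "|asr-fallback"

def intent(sig):
    return sig + "|intent"

result = run_layers("noise", [(asr, asr_fallback), (intent, lambda s: s)])
```

Because each layer owns its fallback, a failure at one level (e.g. recognition) is repaired locally and the remaining layers still run, rather than the whole pipeline aborting.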