Abstract:
The directions of a plurality of sound sources are estimated with high accuracy even when only two microphones are used. For this purpose, an inter-microphone phase difference is calculated for every frequency band in a microphone pair consisting of two microphones installed a predetermined distance apart. In addition, for every frequency band in the microphone pair, a single sound source mask is calculated that indicates whether or not the component of that frequency band comes from a single sound source. The calculated inter-microphone phase difference and the calculated single sound source mask are then input as feature quantities to a multi-label classifier, which outputs a direction label associated with a sound source direction for those feature quantities.
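As an illustration of the first feature quantity, the per-band inter-microphone phase difference can be sketched as follows. This is a minimal sketch, not the method of the abstract: it assumes each microphone signal has already been converted to one complex spectrum value per frequency band, and the function name `inter_mic_phase_difference` and the sign convention (microphone 2 relative to microphone 1) are hypothetical choices made for the example.

```python
import cmath

def inter_mic_phase_difference(spec_mic1, spec_mic2):
    """Phase of microphone 2 relative to microphone 1 for every frequency band.

    Both arguments are sequences of complex spectrum values, one per band
    (assumed input format). The result is in radians, wrapped to [-pi, pi].
    """
    return [
        cmath.phase(x2 * x1.conjugate())  # angle(x2) - angle(x1), wrapped
        for x1, x2 in zip(spec_mic1, spec_mic2)
    ]

# Two bands with known phases; magnitudes do not affect the result.
spec_mic1 = [cmath.rect(1.0, 0.3), cmath.rect(2.0, -1.0)]
spec_mic2 = [cmath.rect(1.5, 0.8), cmath.rect(0.5, -1.4)]
phase_diff = inter_mic_phase_difference(spec_mic1, spec_mic2)
```

Multiplying by the complex conjugate rather than subtracting two angles keeps the result wrapped to a single period without extra bookkeeping.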
Abstract:
An apparatus and a method are provided for rapidly extracting a target sound from a sound signal in which a variety of sounds generated by a plurality of sound sources are mixed. The apparatus includes a tracking unit that detects a sound source direction and a voice segment and executes a sound source extraction process, and a voice recognition unit that receives the sound source extraction result and executes a voice recognition process. In the tracking unit, a segment-being-created management unit, which creates and manages a voice segment for each sound source, sequentially detects the sound source direction, sequentially updates the voice segment estimated by connecting the detection results in the time direction, creates an extraction filter for sound source extraction after a predetermined time has elapsed, and sequentially generates a sound source extraction result by applying the extraction filter to the input voice signal. The voice recognition unit sequentially executes the voice recognition process on each partial sound source extraction result and outputs a voice recognition result.
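The sequential application of an extraction filter can be sketched as follows. The abstract does not state the form of the filter, so this sketch assumes a linear filter in the time-frequency domain with one complex weight per channel and frequency bin; the names `apply_extraction_filter` and `extract_sequentially` are hypothetical.

```python
def apply_extraction_filter(filter_weights, observed_frame):
    """Filter one frame: y[k] = sum over channels c of conj(w[k][c]) * x[k][c].

    filter_weights[k][c] and observed_frame[k][c] are complex values for
    frequency bin k and channel c (assumed layout).
    """
    return [
        sum(w.conjugate() * x for w, x in zip(w_bin, x_bin))
        for w_bin, x_bin in zip(filter_weights, observed_frame)
    ]

def extract_sequentially(filter_weights, frames):
    """Yield a partial extraction result as each observed frame arrives."""
    for frame in frames:
        yield apply_extraction_filter(filter_weights, frame)

# Two bins, two channels: bin 0 averages the channels, bin 1 keeps channel 0.
weights = [[0.5 + 0j, 0.5 + 0j], [1 + 0j, 0 + 0j]]
incoming_frames = [[[1 + 0j, 3 + 0j], [2j, 5 + 0j]]]
partial_results = list(extract_sequentially(weights, incoming_frames))
```

Because the function is a generator, each filtered frame is available to a downstream consumer, such as a recognizer, before the whole segment has been observed.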
Abstract:
A device and a method are provided for determining a speech segment with high accuracy from a sound signal in which different sounds coexist. Directional points indicating the direction of arrival of the sound signal are connected in the temporal direction to detect a speech segment. In this configuration, pattern classification is performed in accordance with the directional characteristics with respect to the direction of arrival, and a directionality pattern and a null beam pattern are generated from the classification results. An average null beam pattern is also generated by averaging the null beam patterns obtained while a non-speech-like signal is input. Further, a threshold set slightly lower than the average null beam pattern is calculated and used to detect, in each null beam pattern, the local minimum point corresponding to the direction of arrival: a local minimum point equal to or lower than the threshold is determined to be the point corresponding to the direction of arrival.
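The threshold test on the null beam pattern can be sketched as follows. This is a minimal sketch under stated assumptions: each pattern is a list of values sampled over a grid of candidate directions, the threshold is formed by subtracting a fixed `margin` from the average pattern, and the function names are hypothetical. The abstract says only that the threshold is "slightly lower" than the average, so the margin value is illustrative.

```python
def average_pattern(patterns):
    """Element-wise mean of null beam patterns observed during non-speech input."""
    count = len(patterns)
    return [sum(p[i] for p in patterns) / count for i in range(len(patterns[0]))]

def detect_directional_points(null_pattern, threshold):
    """Indices of interior local minima that are at or below the threshold."""
    points = []
    for i in range(1, len(null_pattern) - 1):
        is_local_min = (null_pattern[i] < null_pattern[i - 1]
                        and null_pattern[i] <= null_pattern[i + 1])
        if is_local_min and null_pattern[i] <= threshold[i]:
            points.append(i)
    return points

# Hypothetical patterns over five candidate directions.
non_speech_patterns = [[0, -2, 0, -2, 0], [0, -4, 0, -2, 0]]
margin = 1.0  # assumed offset that places the threshold below the average
threshold = [v - margin for v in average_pattern(non_speech_patterns)]
current_pattern = [0, -10, 0, -2.5, 0]
directional_points = detect_directional_points(current_pattern, threshold)
```

In this example the dip at index 3 is a local minimum but stays above the threshold, so only index 1 is reported as a directional point.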
Abstract:
A sound signal processing apparatus includes an observed signal analysis unit and a sound source extraction unit. The observed signal analysis unit receives, as an observed signal, the sound signal of each channel obtained by a sound signal input unit formed of microphones, and estimates the sound direction and sound segment of a target sound, that is, the sound to be extracted. The sound source extraction unit receives the sound direction and sound segment of the target sound estimated by the observed signal analysis unit and extracts the sound signal of the target sound. The observed signal analysis unit includes a short-time Fourier transform unit, which generates an observed signal in the time-frequency domain by applying a short-time Fourier transform to the received sound signal of each channel, and a direction/segment estimation unit, which receives the observed signal generated by the short-time Fourier transform unit and detects the sound direction and sound segment of the target sound.
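The conversion of a channel signal to the time-frequency domain can be sketched as follows. This is a minimal sketch: the Hann window, the frame length, the hop size, and the function name `stft` are assumptions made for illustration and are not given in the abstract. A naive DFT is used so that the example needs only the standard library.

```python
import cmath
import math

def stft(signal, frame_len, hop):
    """Short-time Fourier transform of one real-valued channel.

    Returns one list per frame, each holding the complex values of the
    non-negative frequency bins 0 .. frame_len // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len)
              for t in range(frame_len)]  # periodic Hann window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + t] * window[t] for t in range(frame_len)]
        frames.append([
            sum(segment[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames

# A sinusoid that completes two cycles per 16-sample frame.
signal = [math.cos(2 * math.pi * 2 * t / 16) for t in range(64)]
spectrogram = stft(signal, frame_len=16, hop=8)
peak_bin = max(range(len(spectrogram[0])), key=lambda k: abs(spectrogram[0][k]))
```

For a multi-channel input, the same function would be applied to each channel separately, giving one spectrogram per microphone.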