Abstract:
In the system, a speech input uttered by a human is received by a microphone which outputs microphone output signals. The speech input received by the microphone is then recognized by a speech recognition unit, and a synthetic speech response appropriate for the speech input recognized by the speech recognition unit is generated and outputted from a loudspeaker to the human. In recognizing the speech input, the speech recognition unit receives input signals in which the synthetic speech response, outputted from the loudspeaker and then received by the microphone, is cancelled from the microphone output signals.
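The cancellation step can be illustrated with a small sketch. The abstract does not say how the loudspeaker signal is cancelled from the microphone output, so the code below assumes a normalized LMS (NLMS) adaptive filter, a common echo-cancellation technique. The names `nlms_cancel`, `echo_path`, `taps`, and `mu` are hypothetical, and the echo path is a made-up three-tap response.

```python
import random

def nlms_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    reference: samples sent to the loudspeaker (the synthetic speech response)
    mic:       samples picked up by the microphone
    Returns the residual signal and the final filter weights.
    """
    w = [0.0] * taps          # adaptive estimate of the echo path
    history = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_est      # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual, w

random.seed(0)
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, -0.1]  # hypothetical loudspeaker-to-microphone response
mic = [
    sum(echo_path[k] * reference[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(reference))
]
residual, weights = nlms_cancel(reference, mic)
```

In this toy setup the microphone hears only the echo, so the residual decays toward zero as the filter converges. In a real system the residual would carry the user's speech to the recognizer, and adaptation would normally be slowed or frozen while the user is talking.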
Abstract:
In the sound recognition apparatus of the present invention, a user's utterance, or a sound produced by an output section from previously stored sound waveforms, is inputted simultaneously through a basic microphone of known frequency characteristics and an input microphone of unknown frequency characteristics. An analysis section analyzes the frequency of the input speech from the basic microphone and from the input microphone. A frequency characteristics calculation section calculates first difference data between the frequency analysis results for the basic microphone and the input microphone, and calculates the frequency characteristics of the input microphone from the first difference data and the frequency characteristics of the basic microphone. A frequency characteristics correction section calculates second difference data between the frequency characteristics of the input microphone and the known frequency characteristics of a dictionary data microphone, and uses the second difference data to correct input speech to be recognized through the input microphone into speech data having the frequency characteristics of the dictionary data microphone. A recognition section recognizes the corrected speech data by referring to a recognition dictionary that stores data previously created through the dictionary data microphone.
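The two difference calculations can be written out as a short sketch. It assumes that all spectra and microphone characteristics are expressed in dB per frequency band, so that a microphone's effect is additive; the abstract does not fix a representation or a sign convention. The function names and all numeric values are hypothetical.

```python
def estimate_input_mic(basic_obs, input_obs, basic_char):
    """Estimate the unknown microphone's characteristics (dB per band).

    first_diff is the band-wise difference between the same sound as seen
    through the input microphone and through the basic microphone.
    """
    first_diff = [i - b for i, b in zip(input_obs, basic_obs)]
    return [c + d for c, d in zip(basic_char, first_diff)]

def correct_to_dictionary(speech_obs, input_char, dict_char):
    """Map speech recorded with the input microphone onto the dictionary microphone."""
    second_diff = [i - d for i, d in zip(input_char, dict_char)]
    return [s - d for s, d in zip(speech_obs, second_diff)]

# Hypothetical three-band example, all values in dB.
basic_char = [0.0, -1.0, -3.0]       # known
true_input_char = [2.0, 0.5, -6.0]   # unknown to the system
dict_char = [1.0, 0.0, -2.0]         # known

calibration = [60.0, 55.0, 50.0]     # sound heard by both microphones at once
basic_obs = [s + c for s, c in zip(calibration, basic_char)]
input_obs = [s + c for s, c in zip(calibration, true_input_char)]
input_char = estimate_input_mic(basic_obs, input_obs, basic_char)

speech = [58.0, 62.0, 47.0]          # later utterance to be recognized
speech_obs = [s + c for s, c in zip(speech, true_input_char)]
corrected = correct_to_dictionary(speech_obs, input_char, dict_char)
```

Under these assumptions `corrected` equals the utterance as it would have appeared through the dictionary data microphone, which is the form the recognition dictionary expects.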
Abstract:
An apparatus for detecting a position of an object includes a signal output portion that generates a predetermined signal and radiates it into a space toward an arbitrary object; a signal input portion having a plurality of sensors that individually receive signals reflected from the object; an impulse response calculating portion that obtains an impulse response for each sensor from the signal radiated by the signal output portion and the signals received by the sensors; and an object position estimating portion. The object position estimating portion assumes that the radiated signal is reflected at a virtual position set at an arbitrary point, obtains the transmission time required for the signal to reach the signal input portion via that position, and calculates a weight for the virtual position from the components of each impulse response that correspond to the transmission time. It repeats this calculation while shifting the virtual position, and estimates a virtual position at which the weight exceeds a predetermined threshold value to be the position of the object.
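The scan over virtual positions can be sketched as follows. The abstract does not specify how the weight is formed from the impulse response components, so this sketch assumes a plain sum over sensors. It also skips the impulse response calculation and uses idealized responses containing a single unit spike at the true reflection delay. The geometry, the propagation speed, the sampling rate, and all names are hypothetical.

```python
import math

SPEED = 340.0   # assumed propagation speed in m/s (sound in air)
FS = 48000      # assumed sampling rate in Hz
STEP = 0.1      # grid spacing in metres

def delay_index(emitter, point, sensor):
    """Sample index of the emitter -> point -> sensor transmission time."""
    path = math.dist(emitter, point) + math.dist(point, sensor)
    return round(path / SPEED * FS)

def scan(emitter, sensors, responses, grid, threshold):
    """Return (position, weight) for every virtual position above the threshold."""
    hits = []
    for point in grid:
        weight = 0.0
        for sensor, h in zip(sensors, responses):
            k = delay_index(emitter, point, sensor)
            if k < len(h):
                weight += h[k]   # impulse response component at that delay
        if weight > threshold:
            hits.append((point, weight))
    return hits

emitter = (0.0, 0.0)
sensors = [(-0.3, 0.0), (0.3, 0.0), (0.0, 0.4)]
obj = (5 * STEP, 10 * STEP)      # true object position, placed on the grid

# Idealized impulse responses: one unit spike per sensor at the true delay.
responses = []
for s in sensors:
    h = [0.0] * 1024
    h[delay_index(emitter, obj, s)] = 1.0
    responses.append(h)

grid = [(ix * STEP, iy * STEP) for ix in range(-10, 11) for iy in range(1, 21)]
hits = scan(emitter, sensors, responses, grid, threshold=2.5)
```

With a threshold of 2.5, only positions whose delays line up with the spike in all three responses are reported. Measured impulse responses would be noisy and spread out, so a practical threshold and grid resolution would have to be tuned.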
Abstract:
A speech recognition interface system capable of handling a plurality of application programs simultaneously, and of realizing convenient speech input and output modes suitable for applications in window systems and speech mail systems. The system includes a speech recognition unit for carrying out speech recognition processing on a speech input made by a user to obtain a recognition result; a program management table for managing program management data indicating the speech recognition interface function required by each application program; and a message processing unit for exchanging messages with the plurality of application programs. According to the program management data managed by the program management table, the message processing unit specifies an appropriate recognition vocabulary to be used in the speech recognition processing of the speech input, and transmits the recognition result obtained with that vocabulary to the appropriate ones of the plurality of application programs.
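A minimal sketch of the table and the message routing follows. It assumes that the program management data for each application is simply the set of words it wants recognized, and that a result is delivered to every application whose vocabulary contains it; the abstract leaves both points open. `ProgramEntry`, `MessageProcessor`, and the application names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ProgramEntry:
    """Program management data for one application (assumed: just a vocabulary)."""
    vocabulary: set
    inbox: list = field(default_factory=list)

class MessageProcessor:
    def __init__(self):
        self.table = {}  # program management table: program id -> entry

    def register(self, program_id, vocabulary):
        """Handle a registration message from an application program."""
        self.table[program_id] = ProgramEntry(set(vocabulary))

    def active_vocabulary(self):
        """Vocabulary the recognizer should currently use."""
        words = set()
        for entry in self.table.values():
            words |= entry.vocabulary
        return words

    def dispatch(self, result):
        """Send a recognition result to every program that asked for it."""
        receivers = [pid for pid, e in self.table.items() if result in e.vocabulary]
        for pid in receivers:
            self.table[pid].inbox.append(result)
        return receivers

mp = MessageProcessor()
mp.register("mail", ["open", "reply", "send"])
mp.register("editor", ["open", "save"])
receivers = mp.dispatch("open")
```

Here "open" reaches both applications, while "save" would reach only the editor. A fuller design might also track which window has input focus before deciding who receives a shared word.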
Abstract:
A microphone array input type speech recognition scheme capable of realizing high precision sound source position or direction estimation with a small amount of calculation, and thereby realizing high precision speech recognition. A band-pass waveform, which is a waveform for each frequency band, is obtained from the input signals of the microphone array, and a band-pass power of the sound source is obtained directly from the band-pass waveform. The obtained band-pass power is then used as the speech parameter. The sound source estimation and the band-pass power estimation can also be realized at high precision with a further reduced amount of calculation by using a sound source position search processing that combines a low resolution position estimation with a high resolution position estimation.
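The combined low resolution and high resolution search can be sketched as a two-pass scan over candidate directions. The sketch assumes a one-dimensional search over angle and takes the power function as a parameter, since the abstract does not give the formula for the band-pass power. `toy_power` is a hypothetical stand-in with a single peak, not a model of a real array.

```python
def coarse_to_fine(power, lo=0, hi=180, coarse_step=10, fine_step=1):
    """Find the direction (degrees) of maximum power with two passes.

    Pass 1 scans the whole range at low resolution.
    Pass 2 scans only the neighbourhood of the coarse winner at high resolution.
    """
    coarse = max(range(lo, hi + 1, coarse_step), key=power)
    start = max(lo, coarse - coarse_step)
    stop = min(hi, coarse + coarse_step)
    return max(range(start, stop + 1, fine_step), key=power)

def toy_power(angle, true_angle=37.3):
    """Hypothetical single-peaked power response centred on the true direction."""
    return 1.0 / (1.0 + ((angle - true_angle) / 10.0) ** 2)

estimate = coarse_to_fine(toy_power)
```

With these defaults the search evaluates the power at 19 coarse and 21 fine directions instead of all 181, which is the kind of saving the abstract describes. The approach relies on the coarse grid being fine enough that the true peak is not missed between coarse samples.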
Abstract:
A speech dialogue system capable of realizing natural and smooth dialogue between the system and a human user, and easy maneuverability of the system. In the system, a semantic content of input speech from a user is understood and a semantic content determination of a response output is made according to the understood semantic content of the input speech. Then, a speech response and a visual response according to the determined response output are generated and outputted to the user. The dialogue between the system and the user is managed by controlling transitions between user states during which the input speech is to be entered and system states during which the system response is to be outputted. The understanding of a semantic content of input speech from a user is made by detecting keywords in the input speech, with the keywords to be detected in the input speech limited in advance, according to a state of a dialogue between the user and the system.
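The state-dependent keyword limiting can be sketched with a small transition table. This is a simplified illustration: it models only the states in which the system waits for user input, omits the system states in which a response is output, and spots keywords by whitespace tokenization of text rather than in audio. The state names, the keywords, and the functions `detect_keywords` and `step` are hypothetical.

```python
# For each dialogue state: the keywords to detect and the state each one leads to.
TRANSITIONS = {
    "await_order": {"hamburger": "confirm", "coffee": "confirm"},
    "confirm": {"yes": "done", "no": "await_order"},
}

def detect_keywords(utterance, state):
    """Return only the keywords that are allowed in the current dialogue state."""
    allowed = TRANSITIONS.get(state, {})
    return [w for w in utterance.lower().split() if w in allowed]

def step(state, utterance):
    """Advance the dialogue on the first allowed keyword, or stay put if none."""
    found = detect_keywords(utterance, state)
    if not found:
        return state, None
    return TRANSITIONS[state][found[0]], found[0]

state = "await_order"
state, kw1 = step(state, "I would like a coffee please")
state, kw2 = step(state, "yes that is right")
```

Because "yes" is not a keyword in the `await_order` state, it is ignored there even if the user says it, which shows how limiting the keyword set by state narrows what the recognizer has to consider.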