Abstract:
Disclosed are a Zero User Interface (UI)-based automatic speech translation system and method. The system and method can solve problems such as the procedural inconvenience of inputting speech signals and the malfunction of speech recognition due to crosstalk when users who speak different languages have a face-to-face conversation. The system includes an automatic speech translation server configured to select a speech signal of a speaker from among multiple speech signals received from user terminals connected to an automatic speech translation service and to transmit a result of translating the speech signal of the speaker into a target language, a speaker terminal configured to receive the speech signal of the speaker and transmit it to the automatic speech translation server, and a counterpart terminal configured to output the result of the translation in the form of text or voice in the target language.
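As a rough illustration of the three-role flow described above, the sketch below (in Python, with all names hypothetical) shows one server-side turn: a speaker's signal is selected from the signals received from connected terminals, translated, and fanned out to the counterpart terminals. The selection criterion is left as a pluggable function, since the abstract does not fix one.

```python
# Hypothetical sketch of the server-side turn described in the abstract.
# Every name here is an assumption, not from the patent.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Utterance:
    terminal_id: str
    samples: List[float]  # PCM samples received from one user terminal

def serve_turn(utterances: List[Utterance],
               select_speaker: Callable[[List[Utterance]], Utterance],
               translate: Callable[[List[float]], str],
               send: Callable[[str, str], None]) -> None:
    """One utterance turn: pick the active speaker's signal, translate it,
    and deliver the result to every counterpart terminal."""
    speaker = select_speaker(utterances)
    result = translate(speaker.samples)        # speech recognition + MT
    for u in utterances:
        if u.terminal_id != speaker.terminal_id:
            send(u.terminal_id, result)        # delivered as text or voice
```

Keeping the speaker-selection policy as a parameter matches the division of labor in the abstract: the server owns selection and translation, while the terminals only capture and render speech.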
Abstract:
Provided is a zero user interface (UI)-based automatic interpretation method including receiving a plurality of speech signals uttered by a plurality of users from a plurality of terminal devices, acquiring a plurality of speech energies from the plurality of received speech signals, determining a main speech signal uttered in a current utterance turn from among the plurality of speech signals by comparing the plurality of acquired speech energies, and transmitting an automatic interpretation result, acquired by performing automatic interpretation on the determined main speech signal, to the plurality of terminal devices.
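Below is a minimal sketch of the energy-comparison step, assuming mean short-time energy as the measure of speech energy; the frame length of 400 samples is an arbitrary choice, not taken from the abstract.

```python
import numpy as np

def speech_energy(signal: np.ndarray, frame_len: int = 400) -> float:
    """Average short-time energy of a mono speech signal.
    frame_len is an illustrative assumption (e.g. 25 ms at 16 kHz)."""
    n = len(signal) // frame_len * frame_len
    frames = signal[:n].reshape(-1, frame_len)
    return float(np.mean(np.sum(frames ** 2, axis=1)))

def main_speech_signal(signals: dict) -> str:
    """Return the terminal id whose received signal carries the most energy,
    i.e. the main speech signal for the current utterance turn."""
    return max(signals, key=lambda tid: speech_energy(signals[tid]))
```

Comparing energies across terminals in this way is what lets the method suppress crosstalk: a signal picked up faintly by a bystander's microphone loses to the same utterance captured loudly by the actual speaker's device.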
Abstract:
An automatic interpretation method performed by a correspondent terminal communicating with an utterer terminal includes receiving, by a communication unit, voice feature information about an utterer and an automatic translation result, obtained by automatically translating a voice uttered in a source language by the utterer into a target language, from the utterer terminal, and performing, by a sound synthesizer, voice synthesis on the basis of the automatic translation result and the voice feature information to output a personalized synthesized voice as an automatic interpretation result. The voice feature information about the utterer includes a hidden variable, which includes a first additional voice feature and a voice feature parameter, and a second additional voice feature, all extracted from the utterer's voice.
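The layout below is a speculative sketch of what the utterer terminal might transmit. The abstract names the pieces (a hidden variable containing a first additional voice feature and a voice feature parameter, plus a second additional voice feature) but not their types, so every field name, type, and the tts interface here is an assumption.

```python
# Hypothetical message payload and correspondent-terminal synthesis step.
from dataclasses import dataclass
from typing import List

@dataclass
class HiddenVariable:
    first_additional_feature: List[float]  # e.g. a speaker embedding (assumed)
    feature_parameter: List[float]         # e.g. prosody statistics (assumed)

@dataclass
class InterpretationMessage:
    translation_text: str                  # automatic translation result
    hidden_variable: HiddenVariable
    second_additional_feature: List[float]

def synthesize(msg: InterpretationMessage, tts) -> bytes:
    """Correspondent-terminal side: condition a TTS engine on the utterer's
    voice features to output a personalized synthesized voice."""
    return tts(msg.translation_text,
               msg.hidden_variable.first_additional_feature,
               msg.hidden_variable.feature_parameter,
               msg.second_additional_feature)
```

The point of shipping features rather than audio is that the correspondent terminal can render the translation in a voice resembling the utterer's without ever receiving the source-language recording.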
Abstract:
Provided is a method of providing an automatic speech translation service. The method includes, by an automatic speech translation device of a user, searching for and finding a nearby automatic speech translation device based on the strength of a wireless communication signal, exchanging information for automatic speech translation with the found device, generating a list of candidate devices for automatic speech translation using the exchanged information and the signal strength, and connecting to the candidate device having the greatest variation in signal strength among the devices in the generated list.
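A minimal sketch of the final connection step follows, assuming signal strength is tracked as RSSI readings over time and "greatest variation" means the largest spread between readings; both the RSSI readings and the spread measure are assumptions.

```python
# Hypothetical candidate selection over per-device RSSI histories (in dBm).
from typing import Dict, List

def best_candidate(rssi_history: Dict[str, List[int]]) -> str:
    """Pick the device id with the greatest variation in signal strength."""
    def variation(readings: List[int]) -> int:
        return max(readings) - min(readings)
    return max(rssi_history, key=lambda dev: variation(rssi_history[dev]))

# Example: device B's RSSI rose from -80 to -55 dBm (it moved closer), so
# it shows the greatest variation and is chosen for connection.
history = {"A": [-60, -62, -61], "B": [-80, -70, -55]}
assert best_candidate(history) == "B"
```

Favoring the largest variation rather than the strongest absolute signal is a plausible reading of the abstract: an approaching device changes strength quickly, while a stationary nearby device does not.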
Abstract:
The present invention proposes an interface device for processing a user's voice, and a method thereof, which efficiently output various information so as to allow the user to contribute to voice recognition or automatic interpretation. To this end, the interface device includes an utterance input unit configured to receive a user's utterance, an utterance end recognizing unit configured to recognize the end of the input utterance, and an utterance result output unit configured to output at least one of a voice recognition result, a translation result, and an interpretation result of the ended utterance.
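One plausible realization of the utterance end recognizing unit is energy-based endpointing: declare the utterance over after a run of consecutive low-energy frames. The threshold and hangover values below are illustrative guesses, not from the source.

```python
import numpy as np

def utterance_ended(frames: np.ndarray, threshold: float = 1e-4,
                    hangover: int = 30) -> bool:
    """frames: (n_frames, frame_len) array of the most recent audio frames.
    Returns True once the last `hangover` frames all fall below the
    energy threshold, i.e. the user has stopped speaking."""
    if len(frames) < hangover:
        return False
    energy = np.mean(frames[-hangover:] ** 2, axis=1)
    return bool(np.all(energy < threshold))
```

Detecting the end of the utterance automatically is what lets the output unit present recognition, translation, or interpretation results without the user pressing anything.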
Abstract:
A voice recognition device having a barge-in function and a method thereof are proposed. In an exemplary embodiment, disclosed are an intelligent robot and a method for operating the intelligent robot, the robot including an input unit for receiving a user's voice data, one or more processors, and an output unit for outputting a response generated on the basis of the user's voice data, wherein the processors generate the response corresponding to the user's voice data while maintaining a listening mode for identifying a dialogue partner by using the user's face image data and the user's voice data, and perform a speaking mode to control the robot to perform an operation corresponding to the response.
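The sketch below hedges at the dual-mode behavior: the listening mode stays active even while the robot is speaking, so a recognized dialogue partner can barge in and cut the current output. The class and hook names are assumptions.

```python
# Hypothetical barge-in state machine; all hooks are assumed interfaces.
from enum import Enum, auto

class Mode(Enum):
    LISTENING = auto()
    SPEAKING = auto()

class BargeInRobot:
    """Listening is never switched off, so a user can interrupt mid-response."""

    def __init__(self, identify_partner, respond, start_output, stop_output):
        self.mode = Mode.LISTENING
        self.identify_partner = identify_partner  # uses face + voice data
        self.respond = respond                    # builds a response
        self.start_output = start_output
        self.stop_output = stop_output

    def on_voice(self, voice_data, face_data):
        if not self.identify_partner(face_data, voice_data):
            return                                # not the dialogue partner
        if self.mode is Mode.SPEAKING:
            self.stop_output()                    # barge-in: cut the output
        self.mode = Mode.SPEAKING
        self.start_output(self.respond(voice_data))
        self.mode = Mode.LISTENING
```

Identifying the partner from both face image and voice data, as the abstract describes, keeps the robot from treating its own speech or background talkers as barge-in attempts.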
Abstract:
Provided is an end-to-end speech recognition technology capable of improving speech recognition performance in a desired specific domain. The technology includes collecting text data of the domain to be specialized, comparing the data with a basic transcript text DB to determine domain text that is not included in the basic transcript text DB and thus requires additional training, and constructing a specialization target domain text DB from that text. The end-to-end speech recognition technology generates speech signals from the domain text of the specialization target domain text DB and trains a speech recognition neural network with the generated speech signals to produce an end-to-end speech recognition model specialized for the target domain. The specialized speech recognition model may be applied to an end-to-end speech recognizer to perform domain-specific end-to-end speech recognition.
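The pipeline below sketches the described specialization loop under stated assumptions: domain sentences absent from the basic transcript text DB form the specialization target domain text DB, a hypothetical tts() turns each into a speech signal, and a hypothetical model.train_step() consumes the synthesized pairs.

```python
# Hedged sketch of the domain-specialization pipeline; tts() and
# model.train_step() are assumed interfaces, not from the source.
from typing import List, Set

def build_target_domain_db(domain_texts: List[str],
                           basic_transcripts: Set[str]) -> List[str]:
    """Keep only domain sentences not covered by the basic transcript DB,
    i.e. the text that requires additional training."""
    return [t for t in domain_texts if t not in basic_transcripts]

def specialize(model, domain_texts, basic_transcripts, tts):
    target_db = build_target_domain_db(domain_texts, basic_transcripts)
    for text in target_db:
        speech = tts(text)              # generate a speech signal from text
        model.train_step(speech, text)  # end-to-end ASR fine-tuning step
    return model
```

Filtering against the basic transcript DB first keeps the synthesized training set focused on genuinely novel domain material instead of re-training on sentences the recognizer already covers.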
Abstract:
Provided are an automatic interpretation system and method for generating a synthetic sound having characteristics similar to those of an original speaker's voice. The system includes a speech recognition module configured to generate text data by performing speech recognition on an original speech signal of an original speaker and to extract at least one piece of characteristic information among pitch information, vocal intensity information, speech speed information, and vocal tract characteristic information of the original speech, an automatic translation module configured to generate a synthesis-target translation by translating the text data, and a speech synthesis module configured to generate a synthetic sound of the synthesis-target translation.
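Two of the listed characteristics are easy to sketch with numpy alone: vocal intensity as RMS amplitude, and speech speed as a crude energy-peak rate. Pitch and vocal tract characteristics would need dedicated estimators, and the peak-rate measure here is only an assumption, not the patent's method.

```python
import numpy as np

def vocal_intensity(signal: np.ndarray) -> float:
    """Root-mean-square amplitude of the original speech signal."""
    return float(np.sqrt(np.mean(signal ** 2)))

def speech_speed(signal: np.ndarray, sr: int, frame_len: int = 400) -> float:
    """Crude rate estimate: above-average energy peaks per second, used as
    a stand-in for syllable rate (an illustrative assumption)."""
    n = len(signal) // frame_len * frame_len
    energy = np.sum(signal[:n].reshape(-1, frame_len) ** 2, axis=1)
    peaks = np.sum((energy[1:-1] > energy[:-2])
                   & (energy[1:-1] > energy[2:])
                   & (energy[1:-1] > energy.mean()))
    return float(peaks) / (len(signal) / sr)
```

Whatever the exact estimators, the extracted characteristics travel alongside the translation so the synthesis module can produce a target-language voice that still sounds like the original speaker.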
Abstract:
Disclosed are a user terminal, a hands-free device, and a method for a hands-free automatic interpretation service. The user terminal includes an interpretation environment initialization unit, an interpretation intermediation unit, and an interpretation processing unit. The interpretation environment initialization unit performs pairing with a hands-free device in response to a request from the hands-free device and initializes an interpretation environment. The interpretation intermediation unit sends interpretation results obtained by interpreting the user's voice information received from the hands-free device to a counterpart terminal, and receives interpretation results obtained by interpreting a counterpart's voice information from the counterpart terminal. When the interpretation results are received from the counterpart terminal, the interpretation processing unit synthesizes them into a voice form based on the initialized interpretation environment and sends the synthesized voice information to the hands-free device.
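A minimal sketch of the interpretation intermediation unit's two directions of traffic follows, with every component interface (interpret, synthesize, channel objects) assumed rather than taken from the source.

```python
# Hypothetical user-terminal mediator between the paired hands-free
# device and the counterpart terminal; all interfaces are assumptions.
class InterpretationMediator:
    def __init__(self, interpret, synthesize, counterpart, handsfree):
        self.interpret = interpret      # speech recognition + translation
        self.synthesize = synthesize    # TTS in the user's language
        self.counterpart = counterpart  # channel to the counterpart terminal
        self.handsfree = handsfree      # channel to the paired device

    def on_user_voice(self, voice):
        # Outbound: interpret the user's voice, forward the result.
        self.counterpart.send(self.interpret(voice))

    def on_counterpart_result(self, result_text):
        # Inbound: synthesize the counterpart's result, play it back.
        self.handsfree.play(self.synthesize(result_text))
```

Keeping interpretation and synthesis on the user terminal lets the hands-free device stay a thin microphone-and-speaker endpoint, which matches the pairing-then-mediation structure of the abstract.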
Abstract:
An apparatus and method for automatic translation are disclosed. In the apparatus for automatic translation, a User Interface (UI) generation unit generates the UIs necessary to start translation and to carry out the translation process. A translation target input unit receives a translation target to be translated from a user. A translation target translation unit translates the translation target received by the translation target input unit and generates results of the translation. A display unit includes a touch panel that outputs the results of the translation and the UIs in accordance with the location of the user.
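One way the display unit could orient output toward the user, assuming the user's location is available as a bearing around the panel; the snap-to-quadrant mapping below is an illustrative assumption only.

```python
# Hypothetical location-aware display step: rotate the rendered UI and
# translation results toward the side of the touch panel the user is on.
def display_orientation(user_angle_deg: float) -> int:
    """Map the user's bearing around the panel to a rotation (0/90/180/270)."""
    return int(((user_angle_deg + 45) % 360) // 90) * 90

assert display_orientation(0) == 0      # user in front: no rotation
assert display_orientation(180) == 180  # user opposite: flip the text
```

Orienting the output by user location is what makes a shared tabletop panel usable by two facing users, each reading the translation right side up.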