Abstract:
A training method of an acoustic model includes constructing window-level input speech data based on a speech sequence; inputting the window-level input speech data to an acoustic model; calculating a sequence-level error based on an output of the acoustic model; acquiring window-level errors based on the sequence-level error; and updating the acoustic model based on the window-level errors.
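The abstract does not specify the model, objective, or windowing scheme, so the following Python/NumPy sketch is purely illustrative: a toy linear acoustic model trained under a squared-error objective, with the window size, hop, and learning rate chosen arbitrarily.

```python
import numpy as np

def make_windows(sequence, window_size, hop):
    # Slice the speech sequence (frames x features) into fixed-size windows.
    return [sequence[i:i + window_size]
            for i in range(0, len(sequence) - window_size + 1, hop)]

def train_step(model_w, sequence, target, window_size=4, hop=4, lr=0.01):
    windows = make_windows(sequence, window_size, hop)      # window-level input speech data
    outputs = [win @ model_w for win in windows]            # acoustic model output per window
    sequence_out = np.concatenate(outputs)                  # reassemble the sequence-level output
    seq_error = sequence_out - target[:len(sequence_out)]   # sequence-level error
    # Split the sequence-level error back into window-level errors.
    win_errors = make_windows(seq_error, window_size, hop)
    # Update the model from each window's error (simple gradient step).
    for win, err in zip(windows, win_errors):
        model_w -= lr * win.T @ err / window_size
    return model_w

# Toy usage: 16 frames of 8-dim features mapped to 3 output scores per frame.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3))
x = rng.normal(size=(16, 8))
y = rng.normal(size=(16, 3))
w = train_step(w, x, y)
```

Distributing the sequence-level error back onto the same windows keeps each update at the window level even though the error itself is computed over the full sequence.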
Abstract:
An out-of-service (OOS) sentence generating method includes: training models based on a target utterance template of a target service and a target sentence generated from the target utterance template; generating an utterance template similar to the target utterance template, based on one of the trained models and a sentence generated from an utterance template of another service; and generating a sentence similar to the target sentence, based on another of the trained models and the similar utterance template.
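As a rough illustration of the two-stage pipeline, the sketch below replaces the trained sequence models with trivial string-level stand-ins (slot abstraction and slot filling); the slot token, function names, and example sentences are all hypothetical.

```python
# Stage-1 stand-in: map a sentence from another service onto a template shape.
# The real stage-1 model would be conditioned on the target utterance template;
# this placeholder only abstracts away the slot words.
def to_similar_template(other_sentence, slot_words):
    template = other_sentence
    for word in slot_words:
        template = template.replace(word, "<SLOT>")
    return template

# Stage-2 stand-in: fill the similar template to produce a similar (OOS) sentence.
def to_similar_sentence(similar_template, filler):
    return similar_template.replace("<SLOT>", filler)

# Hypothetical example: target service is weather, the other service is music.
other_sentence = "play the song yesterday for me"
similar_template = to_similar_template(other_sentence, ["yesterday"])
oos_sentence = to_similar_sentence(similar_template, "tomorrow")
print(similar_template)  # "play the song <SLOT> for me"
print(oos_sentence)      # "play the song tomorrow for me"
```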
Abstract:
An apparatus for determining a translation word includes a word vector generator configured to generate a word vector corresponding to an input word of a first language with reference to a first word vector space related to the first language; a word vector determiner configured to determine, using a matching model, a word vector of a second language that corresponds to the generated word vector; and a translation word selector configured to select, based on the determined word vector of the second language, a translation word of the second language that corresponds to the input word of the first language.
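A minimal sketch of the three components, assuming toy two-dimensional word vector spaces, a fixed linear matrix as the matching model (in practice it would be learned), and cosine-similarity nearest-neighbor search as the selection rule; all words and vectors are invented for illustration.

```python
import numpy as np

# Toy word-vector spaces for two languages (word -> vector).
first_space = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
second_space = {"gato": np.array([0.9, 0.1]), "perro": np.array([0.1, 0.9])}

# Matching model: a fixed linear map from the first space into the second
# (a placeholder standing in for a trained matching model).
M = np.array([[0.95, 0.05], [0.05, 0.95]])

def translate(input_word):
    v1 = first_space[input_word]   # word vector generator
    v2 = v1 @ M                    # word vector determiner (matching model)
    # Translation word selector: nearest second-language vector by cosine similarity.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(second_space, key=lambda w: cos(second_space[w], v2))

print(translate("cat"))  # -> "gato"
```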
Abstract:
A video display method of a user terminal includes determining whether ambient noise measured while a video is played is within an allowable range, and generating subtitles based on a voice signal included in the video in response to the ambient noise being determined to be out of the allowable range. The method further includes displaying the generated subtitles with the video.
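A sketch of the decision logic, assuming a decibel ceiling as the allowable range and a placeholder speech-to-text function; the threshold value and all names are hypothetical.

```python
NOISE_DB_MAX = 60.0  # allowable ambient-noise ceiling (illustrative threshold)

def speech_to_text(audio_chunk):
    # Placeholder for a real speech-to-text engine applied to the video's voice signal.
    return "[transcribed line]"

def display(frame, subtitles):
    # Placeholder display: show the frame, with subtitles when present.
    print(f"[video frame] {subtitles}" if subtitles else "[video frame]")

def play_frame(frame, audio_chunk, ambient_db):
    subtitles = None
    # Generate subtitles only when measured ambient noise leaves the allowable range.
    if ambient_db > NOISE_DB_MAX:
        subtitles = speech_to_text(audio_chunk)
    display(frame, subtitles)

play_frame("frame0", "audio0", ambient_db=72.0)  # noisy: subtitles shown
play_frame("frame1", "audio1", ambient_db=35.0)  # quiet: no subtitles
```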
Abstract:
A method of providing a search-integrated note function includes selecting a part of a text recorded in a note; performing a web search on the selected text; clipping data selected by a user from a web search result; linking the clipped data to the selected text; and storing the clipped data.
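One way the link between a text selection and its clipped data could be kept is a simple mapping, as in the sketch below; the Note class, the injected web_search and choose callbacks, and the example strings are assumptions, not the disclosed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    text: str
    clips: dict = field(default_factory=dict)  # selected text -> clipped data

    def clip_search_result(self, selected_text, web_search, choose):
        assert selected_text in self.text
        results = web_search(selected_text)  # perform a web search on the selection
        clipped = choose(results)            # user selects an item from the result
        self.clips[selected_text] = clipped  # link the clip to the selection and store it
        return clipped

# Placeholder search backend and user choice (a real app would call a search API / UI).
note = Note("Meeting about the Mars rover schedule")
note.clip_search_result(
    "Mars rover",
    web_search=lambda q: [f"result about {q} #1", f"result about {q} #2"],
    choose=lambda results: results[0],
)
print(note.clips)
```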
Abstract:
An apparatus for providing a call log includes a call detail information generator configured to generate call detail information of a call; a call recorder configured to record the content of the call; a call content summarizer configured to convert the recorded call content into a transcribed text, and generate call content summary information based on the transcribed text; a call log generator configured to generate a call log including the call detail information and the call content summary information; and a call log provider configured to output the call log.
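A schematic Python sketch of the pipeline from recorded call content to call log; the transcription and summarization steps are trivial placeholders for real speech-to-text and summarization components, and every name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CallLog:
    detail: dict  # call detail information (caller, time, duration, ...)
    summary: str  # call content summary information

def transcribe(recording):
    # Placeholder for a speech-to-text pass over the recorded call content.
    return "customer asked about delivery; agent confirmed Friday"

def summarize(transcript):
    # Placeholder summarizer: keep only the first clause of the transcript.
    return transcript.split(";")[0]

def build_call_log(detail, recording):
    transcript = transcribe(recording)              # call content -> transcribed text
    summary = summarize(transcript)                 # transcribed text -> summary
    return CallLog(detail=detail, summary=summary)  # detail + summary -> call log

log = build_call_log({"caller": "+1-555-0100", "duration_s": 142}, recording=b"")
print(log)  # the call log provider would output this to the user
```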
Abstract:
Disclosed are a speech recognition method and apparatus, wherein the apparatus acquires first outputs from sub-models in a recognition model based on a speech signal, acquires a second output including values corresponding to the sub-models from a classification model based on the speech signal, and recognizes the speech signal based on the first outputs and the second output.
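One plausible reading is that the second output weights the sub-model outputs; the sketch below combines the first outputs by a weighted sum using the classification model's values, which is an assumed combination rule rather than the disclosed one, over toy randomly initialized sub-models.

```python
import numpy as np

def recognize(speech_features, sub_models, classifier):
    # First outputs: one score vector over output symbols from each sub-model.
    firsts = np.stack([m(speech_features) for m in sub_models])  # (n_sub, n_symbols)
    # Second output: one value per sub-model from the classification model.
    weights = classifier(speech_features)                        # (n_sub,)
    # Recognize by combining the first outputs with the second output as weights.
    combined = weights @ firsts
    return int(np.argmax(combined))

# Toy sub-models and classifier over 5 output symbols (random placeholders).
rng = np.random.default_rng(1)
sub_models = [lambda x, W=rng.normal(size=(8, 5)): x @ W for _ in range(3)]
classifier = lambda x: np.array([0.2, 0.5, 0.3])
print(recognize(rng.normal(size=8), sub_models, classifier))
```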
Abstract:
An electronic device and a method of the electronic device are provided, where the electronic device maintains a context that does not reflect a request for a secret conversation, in response to the request for the secret conversation being received from a first user, and generates a response signal to a voice signal of a second user based on the maintained context, in response to an end of the secret conversation with the first user.
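A sketch of the context-maintenance behavior, modeling the voice signals as text turns; the trigger phrases, user identifiers, and response format are invented for illustration.

```python
class DialogAgent:
    def __init__(self):
        self.context = []   # dialog context used to generate responses
        self.secret = False

    def handle(self, user, utterance):
        if user == "user1" and utterance == "start secret conversation":
            self.secret = True   # the maintained context will not reflect what follows
            return "ok"
        if user == "user1" and utterance == "end secret conversation":
            self.secret = False
            return "ok"
        if not self.secret:
            self.context.append((user, utterance))  # normal turns update the context
        # Responses always use the maintained (secret-free) context.
        return f"response based on {len(self.context)} context turns"

agent = DialogAgent()
agent.handle("user1", "remember I like tea")
agent.handle("user1", "start secret conversation")
agent.handle("user1", "this stays out of the context")
agent.handle("user1", "end secret conversation")
print(agent.handle("user2", "what does user1 like?"))  # context excludes the secret turn
```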
Abstract:
An automated interpretation method includes: interpreting a source voice signal expressed in a first language by dividing the source voice signal into units of at least one word while the source voice signal is being input, and outputting, as an interpretation result in real time, a first target voice signal expressed in a second language for each unit; determining whether to re-output the interpretation result; and in response to a determination that the interpretation result is to be re-output, interpreting the source voice signal by a sentence as a unit and outputting, as the interpretation result, a second target voice signal expressed in the second language.
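A sketch of the two interpretation granularities, using a toy English-to-Spanish lexicon in place of a real interpretation model; the word-level pass emits a result per unit as input arrives, and the sentence-level pass re-outputs only when an arbitrary, hypothetical re-output criterion fires.

```python
def interpret_word(word):
    # Placeholder word-by-word interpretation (a real system would use an MT model).
    lexicon = {"hello": "hola", "my": "mi", "friend": "amigo"}
    return lexicon.get(word, word)

def interpret_sentence(words):
    # Placeholder sentence-level interpretation: here, rejoin the word translations;
    # a real sentence-unit pass could reorder words and fix agreement.
    return " ".join(interpret_word(w) for w in words)

def run(source_stream, needs_reoutput):
    heard = []
    for word in source_stream:                        # source voice arrives word by word
        heard.append(word)
        print("realtime:", interpret_word(word))      # first target output per word unit
    if needs_reoutput(heard):                         # determine whether to re-output
        print("re-output:", interpret_sentence(heard))  # second target output per sentence

run(["hello", "my", "friend"], needs_reoutput=lambda words: len(words) > 2)
```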