-
Publication No.: US20170011735A1
Publication Date: 2017-01-12
Application No.: US15187948
Filing Date: 2016-06-21
Inventors: Dong Hyun KIM, Min Kyu LEE
IPC: G10L15/00, G10L15/183, G10L15/02
CPC classification number: G10L15/005, G10L15/183, G10L15/32
Abstract: A system and method of speech recognition that automatically identify the spoken language while recognizing a speaker's speech, enabling multilingual speech recognition to be processed effectively without a separate step for user registration or language setting (such as a button with which the user manually selects the language to be spoken), and supporting automatic speech recognition of each language even when speakers of different languages use a single terminal, thereby increasing user convenience.
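The automatic language-identification flow the abstract describes can be sketched as follows. This is purely illustrative: the `recognize_multilingual` helper, the recognizer callables, and the confidence scores are hypothetical, not taken from the patent.

```python
# Hypothetical sketch: run a recognizer per candidate language and keep
# the hypothesis with the highest confidence, so the spoken language is
# identified automatically with no manual language-selection button.

def recognize_multilingual(audio, recognizers):
    """recognizers: dict mapping language code -> callable that
    returns (transcript, confidence) for the given audio."""
    results = {lang: rec(audio) for lang, rec in recognizers.items()}
    best = max(results, key=lambda lang: results[lang][1])
    transcript, confidence = results[best]
    return best, transcript, confidence

# Toy stand-ins for real per-language recognizers.
recognizers = {
    "en": lambda audio: ("hello", 0.91),
    "ko": lambda audio: ("annyeong", 0.42),
}
print(recognize_multilingual(b"pcm-bytes", recognizers))
```

Each terminal user would simply speak; the per-language scores decide which recognizer's output is kept.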
-
Publication No.: US20240221742A1
Publication Date: 2024-07-04
Application No.: US18488333
Filing Date: 2023-10-17
Inventors: Seung Yun, Seung Hi Kim, Sanghun KIM, Jeonguk BANG, Min Kyu LEE
Abstract: A method of generating a sympathetic back-channel signal is provided. The method includes receiving a voice signal from a user; determining, when the voice signal is input at a predetermined timing, whether that timing is a timing at which a back-channel signal should be output; storing the voice signal input so far if the determination indicates that it is; determining back-channel signal information based on the stored voice signal; and outputting the determined back-channel signal information.
-
Publication No.: US20230290360A1
Publication Date: 2023-09-14
Application No.: US18085889
Filing Date: 2022-12-21
Inventors: Seung YUN, Jeonguk BANG, Min Kyu LEE, Sanghun KIM
Abstract: An apparatus for improving context-based automatic interpretation performance includes: an uttered voice input unit configured to receive a voice signal from a user; a previous sentence input unit configured to determine whether there is a user’s previous utterance when the voice signal is input by the uttered voice input unit; a voice encoding processing unit configured to decode only the voice signal through the uttered voice input unit when it is determined that there is no user’s previous utterance and extract a vector of the voice signal when it is determined that there is the user’s previous utterance; a context encoding processing unit configured to extract a context vector from a previous utterance when there is the previous utterance and transmit the extracted context vector of the previous utterance; and an interpretation decoding processing unit configured to output an interpretation result text.
-
Publication No.: US20160260426A1
Publication Date: 2016-09-08
Application No.: US15058550
Filing Date: 2016-03-02
Inventors: Young Ik KIM, Sang Hun KIM, Min Kyu LEE, Mu Yeol CHOI
Abstract: A speech recognition apparatus and method are provided, the method including converting an input signal to acoustic model data, dividing the acoustic model data into a speech model group and a non-speech model group and calculating a first maximum likelihood corresponding to the speech model group and a second maximum likelihood corresponding to the non-speech model group, detecting a speech based on a likelihood ratio (LR) between the first maximum likelihood and the second maximum likelihood, obtaining utterance stop information based on output data of a decoder and dividing the input signal into a plurality of speech intervals based on the utterance stop information, calculating a confidence score of each of the plurality of speech intervals based on information on a prior probability distribution of the acoustic model data, and removing a speech interval having the confidence score lower than a threshold.
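The two filtering stages in this abstract can be sketched as below. This is a minimal illustration under assumed interfaces: the function names, the zero log-likelihood-ratio threshold, and the tuple layout of the intervals are hypothetical, not specified by the patent.

```python
# Hypothetical sketch of the abstract's two stages:
# (1) frame-level speech detection via the likelihood ratio (LR) between
#     the speech-model group and the non-speech-model group, and
# (2) removal of speech intervals whose confidence score is below a
#     threshold.

def is_speech_frame(loglik_speech, loglik_nonspeech, llr_threshold=0.0):
    """A frame counts as speech when the log-likelihood ratio between
    the speech and non-speech model groups exceeds the threshold."""
    return (loglik_speech - loglik_nonspeech) > llr_threshold

def keep_confident_intervals(intervals, score_threshold):
    """intervals: list of (start_sec, end_sec, confidence) tuples.
    Drop intervals whose confidence score is below the threshold."""
    return [iv for iv in intervals if iv[2] >= score_threshold]

print(is_speech_frame(-40.0, -55.0))  # True: speech models fit better
print(keep_confident_intervals(
    [(0.0, 1.2, 0.93), (1.5, 1.9, 0.31)], 0.5))
```

In the patent, the confidence scores additionally use the prior probability distribution of the acoustic model data; that part is omitted here.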
-
Publication No.: US20220215857A1
Publication Date: 2022-07-07
Application No.: US17531316
Filing Date: 2021-11-19
Inventors: Jeong Uk BANG, Seung YUN, Sang Hun KIM, Min Kyu LEE, Joon Gyu MAENG
Abstract: Provided is a method of performing automatic interpretation based on speaker separation by a user terminal, the method including: receiving a first speech signal including at least one of a user speech of a user and a user surrounding speech around the user from an automatic interpretation service providing terminal, separating the first speech signal into speaker-specific speech signals, performing interpretation on the speaker-specific speech signals in a language selected by the user on the basis of an interpretation mode, and providing a second speech signal generated as a result of the interpretation to at least one of a counterpart terminal and the automatic interpretation service providing terminal according to the interpretation mode.
-
Publication No.: US20220147722A1
Publication Date: 2022-05-12
Application No.: US17522218
Filing Date: 2021-11-09
Inventors: Sang Hun KIM, Seung YUN, Min Kyu LEE, Joon Gyu MAENG, Dong Hyun KIM
IPC: G06F40/58, G10L21/0232, G10L25/21, G10L21/04, G10L25/78, G10L17/06, G10L17/02, G10L25/18, G10L25/06, G10L21/0316, G10L17/18, G06N3/08
Abstract: Disclosed are a Zero User Interface (UI)-based automatic speech translation system and method. The system and method can solve problems such as the procedural inconvenience of inputting speech signals and the malfunction of speech recognition due to crosstalk when users who speak different languages have a face-to-face conversation.
The system includes an automatic speech translation server configured to select a speech signal of a speaker from among multiple speech signals received from user terminals connected to an automatic speech translation service and configured to transmit a result of translating the speech signal of the speaker into a target language, a speaker terminal configured to receive the speech signal of the speaker and transmit the speech signal of the speaker to the automatic speech translation server, and a counterpart terminal configured to output the result of the translation in a form of text or voice in the target language.
-
Publication No.: US20210312938A1
Publication Date: 2021-10-07
Application No.: US17221364
Filing Date: 2021-04-02
Inventors: Seung YUN, Sang Hun KIM, Min Kyu LEE
IPC: G10L21/0308, G10L15/20, G10L15/22, G10L15/30, G10L25/21
Abstract: Provided is a zero user interface (UI)-based automatic interpretation method including receiving a plurality of speech signals uttered by a plurality of users from a plurality of terminal devices, acquiring a plurality of speech energies from the plurality of received speech signals, determining main speech signal uttered in a current utterance turn among the plurality of speech signals by comparing the plurality of acquired speech energies, and transmitting an automatic interpretation result acquired by performing automatic interpretation on the determined main speech signal to the plurality of terminal devices.
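The turn-selection step of this abstract can be sketched as follows. The sketch assumes a simple mean-square energy measure and illustrative function names; the patent does not specify the exact energy computation.

```python
# Hypothetical sketch: determine the main speech signal of the current
# utterance turn by comparing the speech energies acquired from each
# terminal's signal, then interpret only the winning signal.

def mean_square_energy(samples):
    """Mean-square energy of a list of audio samples."""
    return sum(s * s for s in samples) / len(samples)

def pick_main_terminal(signals):
    """signals: dict mapping terminal_id -> list of audio samples.
    Return the terminal whose signal carries the most energy."""
    return max(signals, key=lambda t: mean_square_energy(signals[t]))

signals = {
    "terminal_A": [0.50, 0.62, -0.48],   # active speaker
    "terminal_B": [0.03, -0.02, 0.01],   # background pickup (crosstalk)
}
print(pick_main_terminal(signals))  # terminal_A
```

The automatic interpretation result for the selected signal would then be broadcast back to all connected terminal devices.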
-
Publication No.: US20210049997A1
Publication Date: 2021-02-18
Application No.: US16990482
Filing Date: 2020-08-11
Inventors: Seung YUN, Sang Hun KIM, Min Kyu LEE
Abstract: An automatic interpretation method performed by a correspondent terminal communicating with an utterer terminal includes receiving, by a communication unit, voice feature information about an utterer and an automatic translation result, obtained by automatically translating a voice uttered in a source language by the utterer in a target language, from the utterer terminal and performing, by a sound synthesizer, voice synthesis on the basis of the automatic translation result and the voice feature information to output a personalized synthesis voice as an automatic interpretation result. The voice feature information about the utterer includes a hidden variable including a first additional voice result and a voice feature parameter and a second additional voice feature, which are extracted from a voice of the utterer.
-
Publication No.: US20240212681A1
Publication Date: 2024-06-27
Application No.: US18498241
Filing Date: 2023-10-31
Inventors: Min Kyu LEE, Seung Hi KIM, Sanghun KIM, Jeonguk BANG, Seung YUN
IPC: G10L15/22, G06V40/16, G10L13/02, G10L17/00, H04N23/611
CPC classification number: G10L15/22, G06V40/172, G10L13/02, G10L17/00, H04N23/611
Abstract: A voice recognition device having a barge-in function and a method thereof are proposed.
In an exemplary embodiment, an intelligent robot and a method for operating the intelligent robot are disclosed, including an input unit for receiving a user's voice data, one or more processors, and an output unit for outputting a response generated on the basis of the user's voice data, wherein the processors generate the response corresponding to the user's voice data while maintaining a listening mode for identifying a dialogue partner by using the user's face image data and the user's voice data, and perform a speaking mode for control so as to perform an operation corresponding to the response.
-
Publication No.: US20230215419A1
Publication Date: 2023-07-06
Application No.: US17979471
Filing Date: 2022-11-02
Inventors: Seung YUN, Sanghun KIM, Min Kyu LEE
Abstract: Provided is an end-to-end speech recognition technology capable of improving speech recognition performance in a desired specific domain, which includes collecting domain text data to be specialized, comparing the data with a basic transcript text DB to determine domain text that is not included in the basic transcript text DB and requires additional training, and constructing a specialization target domain text DB. The end-to-end speech recognition technology generates a speech signal from the domain text of the specialization target domain text DB, and trains a speech recognition neural network with the generated speech signal to generate an end-to-end speech recognition model specialized for the domain to be specialized. The specialized speech recognition model may be applied to the end-to-end speech recognizer to perform the domain-specific end-to-end speech recognition.
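The text-selection step of this abstract can be sketched as below. This is a minimal illustration: the function name and the exact-sentence-match criterion are assumptions, and the real system would feed the selected text to a speech synthesizer to generate training audio.

```python
# Hypothetical sketch: keep only the domain sentences that are absent
# from the basic transcript text DB; those are the sentences that
# require additional training and form the specialization target DB.

def select_specialization_text(domain_sentences, base_transcripts):
    """Return domain sentences not covered by the base transcript DB."""
    base = set(base_transcripts)
    return [s for s in domain_sentences if s not in base]

base_db = ["turn on the light", "what time is it"]
domain = ["what time is it", "start cardiac telemetry"]
print(select_specialization_text(domain, base_db))
# ['start cardiac telemetry']
```

Only the selected sentences would then be synthesized into speech and used to fine-tune the end-to-end recognition model.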