-
Publication No.: US12266343B2
Publication date: 2025-04-01
Application No.: US17534969
Application date: 2021-11-24
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sangjun Park , Kihyun Choo
IPC: G10L13/08 , G06N3/045 , G10L19/032
Abstract: The electronic device may include a communication interface; a memory configured to store a first neural network model; and a processor configured to: receive, from an external electronic device via the communication interface, compressed information related to an acoustic feature obtained based on a text; decompress the compressed information to obtain decompressed information; and obtain sound information corresponding to the text by inputting the decompressed information into the first neural network model. The first neural network model may be obtained by training a relationship between a plurality of sample acoustic features and a plurality of sample sounds corresponding to the plurality of sample acoustic features.
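The compress-then-decompress step in this abstract can be illustrated with a minimal sketch. The abstract does not say which compression scheme is used; the 8-bit uniform quantization plus `zlib` below is an assumption chosen for illustration, and the function names and the `lo`/`hi` range are hypothetical. The neural network model that turns the decompressed features into sound is outside the sketch.

```python
import zlib

def compress_features(features, lo=-1.0, hi=1.0):
    """Quantize each feature value to 8 bits, then deflate the bytes.

    Assumption: features lie roughly in [lo, hi]; values outside are clipped.
    """
    step = (hi - lo) / 255
    quantized = bytes(min(255, max(0, round((x - lo) / step))) for x in features)
    return zlib.compress(quantized)

def decompress_features(blob, lo=-1.0, hi=1.0):
    """Inverse of compress_features, up to quantization error."""
    step = (hi - lo) / 255
    return [lo + b * step for b in zlib.decompress(blob)]

# Sender side (e.g. a server that derived acoustic features from text)
features = [-1.0, -0.5, 0.0, 0.25, 1.0]
blob = compress_features(features)

# Receiver side: these values would be fed to the sound-generating model
restored = decompress_features(blob)
```

For in-range values, the round-trip error of each feature is at most half a quantization step, so the receiver gets a close approximation while only compressed bytes cross the communication interface.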
-
Publication No.: US20220148562A1
Publication date: 2022-05-12
Application No.: US17554547
Application date: 2021-12-17
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sangjun Park , Kyoungbo Min , Kihyun Choo , Seungdo Choi
IPC: G10L13/047 , G10L13/10
Abstract: An electronic apparatus and a controlling method thereof are provided. The electronic apparatus includes a microphone; a memory configured to store a text-to-speech (TTS) model and a plurality of evaluation texts; and a processor configured to: obtain a first reference vector of a user speech spoken by a user based on the user speech being received through the microphone, generate a plurality of candidate reference vectors based on the first reference vector, obtain a plurality of synthesized sounds by inputting the plurality of candidate reference vectors and the plurality of evaluation texts to the TTS model, identify at least one synthesized sound of the plurality of synthesized sounds based on a similarity between characteristics of the plurality of synthesized sounds and the user speech, and store a second reference vector of the at least one synthesized sound in the memory as a reference vector corresponding to the user for the TTS model.
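The candidate-selection loop described in this abstract might be sketched as follows. Everything here is an assumption for illustration: candidates are generated by adding Gaussian noise to the first reference vector, the first vector itself is kept as a candidate, similarity is cosine similarity averaged over the evaluation texts, and `tts` and `extract` are hypothetical callables standing in for the TTS model and a voice-characteristic extractor.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def make_candidates(ref, n, scale, rng):
    """Assumed strategy: perturb the first reference vector with Gaussian noise."""
    return [[x + rng.gauss(0.0, scale) for x in ref] for _ in range(n)]

def select_reference(user_features, first_ref, texts, tts, extract,
                     n=8, scale=0.1, seed=0):
    """Return the candidate whose synthesized sounds best match the user speech."""
    rng = random.Random(seed)
    candidates = [list(first_ref)] + make_candidates(first_ref, n, scale, rng)

    def score(candidate):
        # Synthesize every evaluation text with this candidate and average
        # the similarity between the result and the user's characteristics.
        sims = [cosine(extract(tts(t, candidate)), user_features) for t in texts]
        return sum(sims) / len(sims)

    return max(candidates, key=score)

# Toy stand-ins: the "sound" is just the reference vector itself.
toy_tts = lambda text, ref: ref
toy_extract = lambda sound: sound
best = select_reference([1.0, 0.0], [0.5, 0.5], ["hello", "good morning"],
                        toy_tts, toy_extract)
```

The returned vector plays the role of the second reference vector that would be stored for the user.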
-
Publication No.: US12198675B2
Publication date: 2025-01-14
Application No.: US18171079
Application date: 2023-02-17
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Hosang Sung , Kyoungbo Min , Seonho Hwang , Doohwa Hong , Eunmi Oh , Jonghoon Jeong , Kihyun Choo
Abstract: An electronic apparatus which acquires input data to be input into a TTS module for outputting a voice through the TTS module, acquires a voice signal corresponding to the input data through the TTS module, detects an error in the acquired voice signal based on the input data, corrects the input data based on the detection result, and acquires a corrected voice signal corresponding to the corrected input data through the TTS module.
-
Publication No.: US11830473B2
Publication date: 2023-11-28
Application No.: US17037023
Application date: 2020-09-29
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jesus Monge Alvarez , Holly Francois , Hosang Sung , Seungdo Choi , Kihyun Choo , Sangjun Park
IPC: G10L13/027 , G10L13/047 , G10L15/06 , G10L15/187 , G10L13/06
CPC classification number: G10L13/027 , G10L13/047 , G10L13/06 , G10L15/063 , G10L15/187 , G10L2015/0635
Abstract: A system for synthesising expressive speech includes: an interface configured to receive an input text for conversion to speech; a memory; and at least one processor coupled to the memory. The processor is configured to generate, using an expressivity characterisation module, a plurality of expression vectors, wherein each expression vector is a representation of prosodic information in a reference audio style file, and synthesise expressive speech from the input text, using an expressive acoustic model comprising a deep convolutional neural network that is conditioned by at least one of the plurality of expression vectors.
-
Publication No.: US09848180B2
Publication date: 2017-12-19
Application No.: US14794517
Application date: 2015-07-08
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Junghoe Kim , Eunmi Oh , Kihyun Choo , Miao Lei
CPC classification number: H04N13/161 , G10L19/008 , H04S1/007 , H04S3/008 , H04S7/308 , H04S2420/03
Abstract: Surround audio decoding for selectively generating an audio signal from a multi-channel signal. In the surround audio decoding, a down-mixed signal, e.g., as down-mixed by an encoding terminal, is selectively up-mixed to a stereo signal or a multi-channel signal, by generating spatial information for generating the stereo signal, using spatial information for up-mixing the down-mixed signal to the multi-channel signal.
-
Publication No.: US12154563B2
Publication date: 2024-11-26
Application No.: US17679446
Application date: 2022-02-24
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jonghoon Jeong , Hosang Sung , Doohwa Hong , Kyoungbo Min , Eunmi Oh , Kihyun Choo
Abstract: An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains speech in which the prosody information is reflected in each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text-to-speech (TTS) module, and merges the speech obtained in parallel for the plurality of sentence elements to output speech for the text sentence.
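The segment, synthesize-in-parallel, and merge flow in this abstract can be sketched roughly as below. This is an illustration under assumptions, not the patented method: the sentence is split at punctuation, `fake_tts` is a hypothetical stand-in for the TTS module that returns one dummy sample per character, and the prosody information is reduced to a single `pitch` value shared by all elements.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def segment(sentence):
    """Split a sentence into elements after commas, semicolons, and end marks."""
    parts = re.split(r"(?<=[,;.!?])\s+", sentence.strip())
    return [p for p in parts if p]

def fake_tts(element, prosody):
    """Hypothetical TTS stand-in: one dummy 'sample' per input character."""
    return [prosody["pitch"]] * len(element)

def synthesize_parallel(sentence, prosody, tts=fake_tts):
    """Synthesize all sentence elements concurrently and merge them in order."""
    elements = segment(sentence)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in input order, so the merged speech
        # follows the original sentence order even if workers finish out of order.
        chunks = list(pool.map(lambda e: tts(e, prosody), elements))
    merged = [sample for chunk in chunks for sample in chunk]
    return elements, merged

elements, speech = synthesize_parallel(
    "Hello there, how are you? Fine.", {"pitch": 1.0}
)
```

Passing the sentence-level prosody to every element is what keeps the separately synthesized pieces consistent when they are concatenated.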
-
Publication No.: US11335325B2
Publication date: 2022-05-17
Application No.: US16749257
Application date: 2020-01-22
Applicant: Samsung Electronics Co., Ltd.
Inventor: Hosang Sung , Seonho Hwang , Doohwa Hong , Eunmi Oh , Kyoungbo Min , Jonghoon Jeong , Kihyun Choo
IPC: G10L13/08 , G10L15/22 , G10L15/18 , G10L13/047 , G10L13/033 , G10L15/02 , G10L13/00
Abstract: An electronic device and a controlling method of the electronic device are provided. The electronic device acquires text to respond to a received user's speech, acquires a plurality of pieces of parameter information for determining a style of an output speech corresponding to the text based on information on a type of a plurality of text-to-speech (TTS) databases and the received user's speech, identifies a TTS database corresponding to the plurality of pieces of parameter information among the plurality of TTS databases, identifies a weight set corresponding to the plurality of pieces of parameter information among a plurality of weight sets acquired through a trained artificial intelligence model, adjusts information on the output speech stored in the TTS database based on the weight set, synthesizes the output speech based on the adjusted information on the output speech, and outputs the output speech corresponding to the text.
-
Publication No.: US20210225358A1
Publication date: 2021-07-22
Application No.: US17037023
Application date: 2020-09-29
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jesus Monge Alvarez , Holly Francois , Hosang Sung , Seungdo Choi , Kihyun Choo , Sangjun Park
IPC: G10L13/027 , G10L13/047 , G10L13/06 , G10L15/06 , G10L15/187
Abstract: A system for synthesising expressive speech includes: an interface configured to receive an input text for conversion to speech; a memory; and at least one processor coupled to the memory. The processor is configured to generate, using an expressivity characterisation module, a plurality of expression vectors, wherein each expression vector is a representation of prosodic information in a reference audio style file, and synthesise expressive speech from the input text, using an expressive acoustic model comprising a deep convolutional neural network that is conditioned by at least one of the plurality of expression vectors.
-
Publication No.: US09479871B2
Publication date: 2016-10-25
Application No.: US14134508
Application date: 2013-12-19
Applicant: Samsung Electronics Co., Ltd.
Inventor: Junghoe Kim , Eunmi Oh , Kihyun Choo , Miao Lei
CPC classification number: H04R5/02 , G10L19/008 , H04R5/033 , H04S1/002 , H04S3/00 , H04S3/002 , H04S3/02 , H04S2420/01 , H04S2420/07
Abstract: A method, medium, and system for generating a 3-dimensional (3D) stereo signal in a decoder by using a surround data stream. According to such a method, medium, and system, a head-related transfer function (HRTF) is applied in a quadrature mirror filter (QMF) domain, thereby generating a 3D stereo signal by using a surround data stream.
-
Publication No.: US11887574B2
Publication date: 2024-01-30
Application No.: US17578164
Application date: 2022-01-18
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Hosang Sung , Lei Yang , Jonguk Yoo , Jonghoon Jeong , Kihyun Choo
IPC: G10K11/178 , G10L25/78
CPC classification number: G10K11/17827 , G10K11/17823 , G10K11/17873 , G10K11/17885 , G10L25/78 , G10K2210/1081 , G10K2210/3036
Abstract: A controlling method of a wearable electronic apparatus includes: receiving, by an IMU sensor, a bone conduction signal corresponding to vibration in the user's face, while the wearable electronic apparatus is operated in an ANC mode; identifying a presence or an absence of the user's voice based on the bone conduction signal; based on identifying the presence of the user's voice, controlling an operation mode of the wearable electronic apparatus to be a different operation mode from the ANC mode; while the wearable electronic apparatus is operated in the different operation mode, identifying the presence or absence of the user's voice based on the bone conduction signal; and based on the absence of the user's voice being identified for a predetermined time while the wearable electronic apparatus is operated in the different operation mode, controlling the different operation mode to return to the ANC mode.
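The mode switching described in this abstract behaves like a small two-state machine, which the following sketch illustrates. The class name, the `"AMBIENT"` label for the "different operation mode", and the 3-second default timeout are assumptions; detecting the voice from the IMU bone conduction signal is outside the sketch and is represented by the `voice_present` flag.

```python
class AncController:
    """Two-state sketch: leave ANC when the user speaks, return after silence."""

    def __init__(self, silence_timeout_s=3.0):
        self.mode = "ANC"
        self.silence_timeout_s = silence_timeout_s  # the "predetermined time"
        self._silence_s = 0.0

    def update(self, voice_present, dt_s):
        """Advance by dt_s seconds given the latest voice-detection result."""
        if self.mode == "ANC":
            if voice_present:
                # User started talking: switch to the different operation mode.
                self.mode = "AMBIENT"
                self._silence_s = 0.0
        else:
            if voice_present:
                # Any detected voice restarts the silence timer.
                self._silence_s = 0.0
            else:
                self._silence_s += dt_s
                if self._silence_s >= self.silence_timeout_s:
                    self.mode = "ANC"
        return self.mode

controller = AncController(silence_timeout_s=1.0)
controller.update(True, 0.5)    # user speaks: leaves ANC
controller.update(False, 0.5)   # 0.5 s of silence: still in the other mode
controller.update(False, 0.5)   # 1.0 s of silence: back to ANC
```

Resetting the timer whenever voice is detected means brief pauses within a conversation do not switch noise cancelling back on.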