-
Publication No.: US12067371B2
Publication date: 2024-08-20
Application No.: US17411137
Filing date: 2021-08-25
Inventor: Srinivas Bangalore
IPC classification: G06F17/00, G06F40/263, G06F40/40, G06F40/58, G06Q30/02, G06Q30/0241, G06Q30/0251, G10L13/08, G06F40/00
CPC classification: G06F40/58, G06F40/263, G06F40/40, G06Q30/02, G06Q30/0241, G06Q30/0271, G10L13/08, G06F40/00
Abstract: In an embodiment of a messaging system, a method for presenting a commercial message to a user is provided. A target language in which the user is comfortable communicating may be determined based on at least one communication received by the user or at least one communication provided by the user. The commercial message may be presented to the user in the target language.
-
Publication No.: US20240274122A1
Publication date: 2024-08-15
Application No.: US18193349
Filing date: 2023-03-30
Inventors: Duo Wang, Vincent Laurent J. Pollet, Mikolaj Wojciech Babianski, Jakub Bartlomiej Swiatkowski
CPC classification: G10L13/086, G10L15/16, G10L25/63, H04N21/8106
Abstract: An expressive speech translation system may process source speech in a source language and output synthesized speech in a target language while retaining vocal performance characteristics such as intonation, emphasis, rhythm, style, and/or emotion. The system may receive a transcript of the source speech, translate it, and generate transcript data. To generate the synthesized speech, the system may process the transcript data with a language embedding representing language-dependent speech characteristics of the target language, a speaker embedding representing speaker-dependent voice identity characteristics of a speaker, and a performance embedding representing the vocal performance characteristics of the source speech. The system may control the duration of segments of the synthesized speech to better align with corresponding segments of the source speech for the purpose of dubbing multimedia content with synthesized speech in a language different from that of the original audio.
-
Publication No.: US20240221723A1
Publication date: 2024-07-04
Application No.: US18604756
Filing date: 2024-03-14
Inventors: Chansik BOK, Jihun PARK
CPC classification: G10L15/005, G10L13/086, G10L15/04, G10L15/22, G10L15/26, G10L2015/223
Abstract: An electronic apparatus including a memory configured to store first voice recognition information related to a first language and second voice recognition information related to a second language, and a processor configured to obtain, on the basis of the first voice recognition information, a first text corresponding to a received user voice and, based on an entity name being included in the user voice according to the obtained first text, identify a segment in the user voice in which the entity name is included. The processor is to obtain a second text corresponding to the identified segment of the user voice on the basis of the second voice recognition information, and obtain control information corresponding to the user voice on the basis of the first text and the second text.
-
Publication No.: US20240220564A1
Publication date: 2024-07-04
Application No.: US18557114
Filing date: 2022-04-12
Applicant: Rodd MARTIN
IPC classification: G06F16/957, G10L13/08, G10L15/26
CPC classification: G06F16/9577, G10L13/08, G10L15/26
Abstract: There is provided herein a digital video virtual concierge user interface system which dynamically generates a series of user interface screens which guide users through online application processes with dynamically generated audio and/or video content. The server generates the user interface along a process path defining user interface definitions. A user interface controller steps through the path to generate a user interface screen for each step according to the respective user interface definition thereof, and a personalisation controller extracts customer data from a customer database according to a customer identifier and generates personalised content derived from the customer data so that the user interface controller generates at least one user interface screen in accordance with the personalised content.
-
Publication No.: US20240211688A1
Publication date: 2024-06-27
Application No.: US18545147
Filing date: 2023-12-19
Applicant: Google LLC
Inventors: Abhirut Gupta, Aravindan Raghuveer, Abhay Sharma, Nitin Raut, Manish Kumar
IPC classification: G06F40/284, G06F40/232, G06F40/242, G10L13/08, G10L15/02, G10L15/06, G10L15/187
CPC classification: G06F40/284, G06F40/232, G06F40/242, G10L13/08, G10L15/187, G10L2015/025, G10L15/063
Abstract: Systems and methods for generating phonetic spelling variations of a given word based on locale-specific pronunciations. A phoneme-letter density model may be configured to identify a phoneme sequence corresponding to an input word, and to identify all character sequences that may correspond to an input phoneme sequence and their respective probabilities. A phoneme-phoneme error model may be configured to identify locale-specific alternative phoneme sequences that may correspond to a given phoneme sequence, and their respective probabilities. Using these two models, a processing system may be configured to generate, for a given input word, a list of alternative character sequences that may correspond to the input word based on locale-specific pronunciations, and/or a probability distribution representing how likely each alternative character sequence is to correspond to the input word.
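Illustrative sketch (not taken from the patent): the two-model combination described in this abstract can be pictured with small lookup tables. The tables, their probabilities, and the function name `spelling_variants` are all hypothetical, and the sketch assumes the phoneme sequence of the input word is already known and that each phoneme is respelled independently.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy tables; a real system would learn these from data.
# P(alternative phoneme | phoneme) for one locale.
PHONEME_ERRORS = {
    "v": {"v": 0.7, "w": 0.3},
    "i": {"i": 1.0},
    "n": {"n": 1.0},
}
# P(letters | phoneme).
PHONEME_TO_LETTERS = {
    "v": {"v": 1.0},
    "w": {"w": 1.0},
    "i": {"i": 0.6, "ee": 0.4},
    "n": {"n": 1.0},
}


def spelling_variants(phonemes):
    """Return {spelling: probability} for one phoneme sequence."""
    scores = defaultdict(float)
    # Step 1: enumerate locale-specific alternative phoneme sequences.
    alt_options = [PHONEME_ERRORS[p].items() for p in phonemes]
    for alt_seq in product(*alt_options):
        p_alt = 1.0
        for _, prob in alt_seq:
            p_alt *= prob
        # Step 2: enumerate the spellings of each alternative sequence.
        letter_options = [PHONEME_TO_LETTERS[ph].items() for ph, _ in alt_seq]
        for letters in product(*letter_options):
            p_spelling = p_alt
            for _, prob in letters:
                p_spelling *= prob
            # Different phoneme paths may yield the same spelling; sum them.
            scores["".join(text for text, _ in letters)] += p_spelling
    return dict(scores)


variants = spelling_variants(["v", "i", "n"])
```

With these toy tables the result contains the spellings "vin", "veen", "win", and "ween", and the probabilities sum to 1 because each table row is itself a probability distribution.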
-
Publication No.: US20240194181A1
Publication date: 2024-06-13
Application No.: US17908826
Filing date: 2021-03-31
Applicant: PIONEER CORPORATION
Inventor: Takeshi NAKAMURA
IPC classification: G10L13/08
CPC classification: G10L13/08
Abstract: A non-transitory computer-readable storage medium is provided having stored therein an information processing program that causes a computer to execute a process. The process includes an intent generating step of generating intent information that includes character string information that indicates a character string constituting a notification text that is output, as a voice output, to a driver of a moving object, and aiming information that indicates a type of a notification and that is set in each of the character strings; and a transmitting step of transmitting the intent information to an information processing apparatus that generates the notification text based on the intent information.
-
Publication No.: US12008992B2
Publication date: 2024-06-11
Application No.: US17414331
Filing date: 2020-01-10
Inventor: Junki Ohmura
IPC classification: G10L15/22, G06F40/151, G10L13/08, G10L15/01
CPC classification: G10L15/22, G10L13/08, G10L15/01, G10L2015/223
Abstract: An information processing apparatus acquires a capability for each of agent devices that each output a dialog response, generates the dialog response corresponding to the capability, based on a general-purpose dialog response frame, and deploys the dialog response to each of the agent devices. The capability indicates a combination of interfaces available for each of the agent devices. The apparatus applies to the general-purpose dialog response frame a conversion template including plural templates that convert values of parameters of the general-purpose dialog response frame into converted forms in accordance with the combination of interfaces available for each of the agent devices, and generates the dialog response for each of the agent devices, using the converted forms converted from the values of the parameters of the general-purpose dialog response frame.
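Illustrative sketch (not taken from the patent): one way to read the conversion-template idea in this abstract is as a lookup from a device's set of available interfaces to per-channel format strings. The frame fields, interface names, template wording, and the function name `build_response` are all hypothetical.

```python
# Hypothetical general-purpose dialog response frame: parameter -> value.
FRAME = {"weather": "sunny", "temperature_c": 21}

# Hypothetical conversion templates, keyed by the combination of
# interfaces an agent device offers (its "capability").
TEMPLATES = {
    frozenset({"display"}): {
        "text": "Weather: {weather}, {temperature_c} C",
    },
    frozenset({"speaker"}): {
        "speech": "It is {weather} and {temperature_c} degrees.",
    },
    frozenset({"display", "speaker"}): {
        "text": "{weather}, {temperature_c} C",
        "speech": "Today is {weather}.",
    },
}


def build_response(frame, capability):
    """Convert the frame's parameter values into device-specific output."""
    templates = TEMPLATES[frozenset(capability)]
    return {channel: tpl.format(**frame) for channel, tpl in templates.items()}


speaker_only = build_response(FRAME, ["speaker"])
screen_and_speaker = build_response(FRAME, ["display", "speaker"])
```

Keying the templates by a `frozenset` keeps the lookup independent of the order in which a device reports its interfaces.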
-
Publication No.: US11991423B2
Publication date: 2024-05-21
Application No.: US17961133
Filing date: 2022-10-06
Inventor: Shaoming Zhu
IPC classification: H04N21/2187, G10L13/08, H04N21/81, H04N21/8547
CPC classification: H04N21/816, G10L13/08, H04N21/2187, H04N21/8547
Abstract: The application provides a virtual video live streaming processing method and apparatus, an electronic device, and a computer-readable storage medium, and relates to the field of virtual video live streaming technologies. The virtual video live streaming processing method includes: obtaining text data and determining to-be-synthesized video data corresponding to the text data; synthesizing a live video stream in real time according to the to-be-synthesized video data and pushing the live video stream to a live streaming client; determining, in response to receiving a live streaming interruption request, target video data from the to-be-synthesized video data that has not been synthesized into a live video stream; and synthesizing an interruption transition video stream according to the target video data and pushing the interruption transition video stream to the live streaming client. When a live video is interrupted during a virtual video live streaming process, this application may implement a smooth transition process between a current video action and a next video action without affecting real-time performance of the live video.
-
Publication No.: US11990132B2
Publication date: 2024-05-21
Application No.: US18176180
Filing date: 2023-02-28
Inventors: Chenguang Zhu, Yu Shi, William Isaac Hinthorn, Nanshan Zeng, Ruochen Xu, Liyang Lu, Xuedong Huang
IPC classification: G10L15/26, G06F16/383, G06F40/117, G06F40/134, G06F40/174, G06F40/186, G06N3/08, G06Q10/0631, G06Q10/10, G06Q10/109, G10L13/08, G10L15/22
CPC classification: G10L15/26, G06F16/383, G06F40/117, G06F40/134, G06F40/174, G06F40/186, G06N3/08, G06Q10/063118, G06Q10/103, G06Q10/109, G10L13/08, G10L15/22
Abstract: A transcription of audio speech included in electronic content associated with a meeting is created by an ASR model trained on speech-to-text data. The transcription is post-processed by modifying text included in the transcription, for example, by modifying punctuation, grammar, or formatting introduced by the ASR model and by changing or omitting one or more words that were included in both the audio speech and the transcription. After the transcription is post-processed, output based on the post-processed transcription is generated in the form of a meeting summary and/or template.
-
Publication No.: US20240153486A1
Publication date: 2024-05-09
Application No.: US18271933
Filing date: 2021-12-02
Inventors: Joon Hyuk CHANG, Sung Woong HWANG
Abstract: The present disclosure provides an operating method of a speech synthesis system, which includes: inputting a first text and a first speech for the first text, and a second text and a second speech for the second text; generating a speech synthesis model trained by applying the first and second texts and the first and second speeches to curriculum learning; and outputting a target synthesis speech corresponding to a target text based on the speech synthesis model when inputting the target text for speech output. The generating of the speech synthesis model includes generating a concatenation text in which the first and second texts are concatenated and a concatenation speech in which the first and second speeches are concatenated, and adding the concatenation text and the concatenation speech to the speech synthesis model when an error rate is smaller than a set reference rate when learning-concatenating the concatenation text and the concatenation speech.
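Illustrative sketch (not taken from the patent): the abstract leaves the exact training procedure open, so the following shows only one possible reading, in which a concatenated text and speech pair is added to the training data once a measured error rate falls below the reference rate. The representation of speech as a list of sample values, the space used to join texts, and the function names are all assumptions.

```python
def concatenate_pair(pair_a, pair_b):
    """Join two (text, speech) pairs into one longer training sample."""
    (text_a, speech_a), (text_b, speech_b) = pair_a, pair_b
    # Assumption: speech is a list of samples, so "+" appends pair_b's audio.
    return text_a + " " + text_b, speech_a + speech_b


def curriculum_step(training_set, pair_a, pair_b, error_rate, reference_rate):
    """Return the training set for the next curriculum stage.

    The longer, concatenated sample is added only when the current error
    rate is below the reference rate (one reading of the abstract).
    """
    if error_rate < reference_rate:
        return training_set + [concatenate_pair(pair_a, pair_b)]
    return list(training_set)


first = ("hello", [0.1, 0.2])
second = ("world", [0.3])
harder = curriculum_step([first, second], first, second, 0.05, 0.10)
unchanged = curriculum_step([first, second], first, second, 0.20, 0.10)
```

Here `harder` gains the sample `("hello world", [0.1, 0.2, 0.3])`, while `unchanged` keeps only the two original pairs because the error rate is still above the reference rate.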
-