SPEECH TRANSLATION WITH PERFORMANCE CHARACTERISTICS

    Publication No.: US20240274122A1

    Publication date: 2024-08-15

    Application No.: US18193349

    Filing date: 2023-03-30

    Abstract: An expressive speech translation system may process source speech in a source language and output synthesized speech in a target language while retaining vocal performance characteristics such as intonation, emphasis, rhythm, style, and/or emotion. The system may receive a transcript of the source speech, translate it, and generate transcript data. To generate the synthesized speech, the system may process the transcript data with a language embedding representing language-dependent speech characteristics of the target language, a speaker embedding representing speaker-dependent voice identity characteristics of a speaker, and a performance embedding representing the vocal performance characteristics of the source speech. The system may control the duration of segments of the synthesized speech to better align with corresponding segments of the source speech for the purpose of dubbing multimedia content with synthesized speech in a language different from that of the original audio.
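
The duration-control step mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how durations are aligned, so everything here is an assumption: the `Segment` type, the `rate_factors` function, and the clamp limits `min_rate` and `max_rate` are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    duration_s: float  # segment duration in seconds


def rate_factors(source: List[Segment], synthesized: List[Segment],
                 min_rate: float = 0.8, max_rate: float = 1.25) -> List[float]:
    """Return one speaking-rate factor per pair of aligned segments.

    A factor above 1.0 means the synthesized segment would need to be spoken
    faster to fit the source segment. The clamp limits are hypothetical and
    only keep the adjusted speech within a plausible range.
    """
    if len(source) != len(synthesized):
        raise ValueError("segment lists must have the same length")
    factors = []
    for src, syn in zip(source, synthesized):
        if src.duration_s <= 0:
            raise ValueError("source segment duration must be positive")
        raw = syn.duration_s / src.duration_s
        factors.append(max(min_rate, min(max_rate, raw)))
    return factors
```

Clamping is one possible design choice: it limits how far the synthesized speech is sped up or slowed down, at the cost of imperfect alignment for segments whose translated length differs greatly from the original.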

    ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF

    Publication No.: US20240221723A1

    Publication date: 2024-07-04

    Application No.: US18604756

    Filing date: 2024-03-14

    Abstract: An electronic apparatus includes a memory configured to store first voice recognition information related to a first language and second voice recognition information related to a second language, and a processor. The processor is to obtain, on the basis of the first voice recognition information, a first text corresponding to a received user voice and, based on an entity name being included in the user voice according to the obtained first text, identify a segment of the user voice in which the entity name is included. The processor is to obtain a second text corresponding to the identified segment of the user voice on the basis of the second voice recognition information, and obtain control information corresponding to the user voice on the basis of the first text and the second text.
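
The two-pass flow in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the recognizers and the entity detector are passed in as hypothetical callables (`recognize_first`, `recognize_second`, `is_entity`), and the first recognizer is assumed to return word-level timestamps.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical word record: (token, start time in seconds, end time in seconds)
Word = Tuple[str, float, float]


def two_pass_recognition(
    audio: bytes,
    recognize_first: Callable[[bytes], List[Word]],
    recognize_second: Callable[[bytes, float, float], str],
    is_entity: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (first_text, second_text); second_text is None without an entity."""
    # Pass 1: recognize the whole utterance with the first-language model.
    words = recognize_first(audio)
    first_text = " ".join(token for token, _, _ in words)

    # Locate the time span covered by tokens flagged as entity names.
    spans = [(start, end) for token, start, end in words if is_entity(token)]
    if not spans:
        return first_text, None
    seg_start = min(start for start, _ in spans)
    seg_end = max(end for _, end in spans)

    # Pass 2: re-recognize only that segment with the second-language model.
    second_text = recognize_second(audio, seg_start, seg_end)
    return first_text, second_text
```

Control information would then be derived from both returned texts; how that is done is not specified in the abstract, so it is left out of the sketch.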

    A DIGITAL VIDEO VIRTUAL CONCIERGE USER INTERFACE SYSTEM

    Publication No.: US20240220564A1

    Publication date: 2024-07-04

    Application No.: US18557114

    Filing date: 2022-04-12

    Applicant: Rodd MARTIN

    Abstract: There is provided herein a digital video virtual concierge user interface system which dynamically generates a series of user interface screens that guide users through online application processes with dynamically generated audio and/or video content. The server generates the user interface along a process path defining user interface definitions. A user interface controller steps through the path to generate a user interface screen for each step according to the respective user interface definition thereof, and a personalisation controller extracts customer data from a customer database according to a customer identifier and generates personalised content derived from the customer data so that the user interface controller generates at least one user interface screen in accordance with the personalised content.
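
A minimal sketch of stepping through a process path with personalised content is given below. The data shapes are assumptions made for illustration: each step definition is a dictionary with a hypothetical `template` key, and the customer database is a plain dictionary keyed by customer identifier.

```python
from typing import Dict, List


def build_screens(path: List[Dict[str, str]],
                  customer_db: Dict[str, Dict[str, str]],
                  customer_id: str) -> List[str]:
    """Generate one screen text per step of the process path."""
    # Personalisation: look up the customer data by identifier.
    customer = customer_db[customer_id]
    screens = []
    for step in path:
        # Each hypothetical step definition carries a text template whose
        # placeholders are filled in from the customer data.
        screens.append(step["template"].format(**customer))
    return screens


# Usage with made-up data:
path = [
    {"template": "Welcome back, {name}."},
    {"template": "Please confirm your email: {email}"},
]
customer_db = {"c-1": {"name": "Alex", "email": "alex@example.com"}}
screens = build_screens(path, customer_db, "c-1")
```

In the system described, the generated content is audio and/or video rather than text; the string templates stand in for that content only to keep the example short.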

    NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM, INFORMATION PROCESSING METHOD, AND APPLICATION DEVICE

    Publication No.: US20240194181A1

    Publication date: 2024-06-13

    Application No.: US17908826

    Filing date: 2021-03-31

    Inventor: Takeshi NAKAMURA

    IPC classification: G10L13/08

    CPC classification: G10L13/08

    Abstract: A non-transitory computer-readable storage medium is provided having stored therein an information processing program that causes a computer to execute a process. The process includes an intent generating step of generating intent information that includes character string information that indicates a character string constituting a notification text that is output, as a voice output, to a driver of a moving object, and aiming information that indicates a type of a notification and that is set in each of the character strings; and a transmitting step of transmitting the intent information to an information processing apparatus that generates the notification text based on the intent information.

    Generating dialog responses from dialog response frame based on device capabilities

    Publication No.: US12008992B2

    Publication date: 2024-06-11

    Application No.: US17414331

    Filing date: 2020-01-10

    Inventor: Junki Ohmura

    Abstract: An information processing apparatus acquires a capability for each of agent devices that each output a dialog response, generates the dialog response corresponding to the capability, based on a general-purpose dialog response frame, and deploys the dialog response to each of the agent devices. The capability indicates a combination of interfaces available for each of the agent devices. The apparatus applies to the general-purpose dialog response frame a conversion template including plural templates that convert values of parameters of the general-purpose dialog response frame into converted forms in accordance with the combination of interfaces available for each of the agent devices, and generates the dialog response for each of the agent devices, using the converted forms converted from the values of the parameters of the general-purpose dialog response frame.
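
The capability-dependent conversion can be sketched as below. This is an illustrative assumption about data shapes, not the patented design: the frame is a dictionary of parameters, a capability is a set of interface names, and the conversion template is a hypothetical mapping from interface combinations to per-interface format strings.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical conversion template: interface combination -> per-interface text
TEMPLATES: Dict[FrozenSet[str], Dict[str, str]] = {
    frozenset({"speaker"}): {
        "speaker": "It is {temperature_c} degrees in {location}.",
    },
    frozenset({"display", "speaker"}): {
        "speaker": "Here is the weather for {location}.",
        "display": "{location}: {temperature_c} C",
    },
}


def generate_responses(frame: Dict[str, object],
                       capabilities: Dict[str, Iterable[str]],
                       templates: Dict[FrozenSet[str], Dict[str, str]] = TEMPLATES,
                       ) -> Dict[str, Dict[str, str]]:
    """Build one dialog response per agent device from a single frame."""
    responses = {}
    for device, interfaces in capabilities.items():
        # Select the templates that match this device's interface combination.
        per_interface = templates[frozenset(interfaces)]
        # Convert the frame's parameter values into each interface's form.
        responses[device] = {
            iface: text.format(**frame) for iface, text in per_interface.items()
        }
    return responses


frame = {"location": "Tokyo", "temperature_c": 30}
capabilities = {"smart_speaker": ["speaker"], "smart_tv": ["display", "speaker"]}
responses = generate_responses(frame, capabilities)
```

Keying the templates by the full interface combination, rather than by single interface, lets a device with a display shorten its spoken output because the details appear on screen.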

    Virtual video live streaming processing method and apparatus, storage medium and electronic device

    Publication No.: US11991423B2

    Publication date: 2024-05-21

    Application No.: US17961133

    Filing date: 2022-10-06

    Inventor: Shaoming Zhu

    Abstract: The application provides a virtual video live streaming processing method and apparatus, an electronic device, and a computer-readable storage medium, and relates to the field of virtual video live streaming technologies. The virtual video live streaming processing method includes: obtaining text data and determining to-be-synthesized video data corresponding to the text data; synthesizing a live video stream in real time according to the to-be-synthesized video data and pushing the live video stream to a live streaming client; in response to receiving a live streaming interruption request, determining target video data from the to-be-synthesized video data that has not been synthesized into a live video stream; and synthesizing an interruption transition video stream according to the target video data and pushing the interruption transition video stream to the live streaming client. When a live video is interrupted during a virtual video live streaming process, this application may implement a smooth transition process between a current video action and a next video action without affecting real-time performance of the live video.

    OPERATION METHOD OF SPEECH SYNTHESIS SYSTEM
    Document type: Published patent application

    Publication No.: US20240153486A1

    Publication date: 2024-05-09

    Application No.: US18271933

    Filing date: 2021-12-02

    Abstract: The present disclosure provides an operating method of a speech synthesis system, which includes: inputting a first text and a first speech for the first text, and a second text and a second speech for the second text; generating a speech synthesis model trained by applying the first and second texts and the first and second speeches to curriculum learning; and outputting a target synthesis speech corresponding to a target text based on the speech synthesis model when the target text is input for speech output. The generating of the speech synthesis model includes generating a concatenation text in which the first and second texts are concatenated and a concatenation speech in which the first and second speeches are concatenated, and adding the concatenation text and the concatenation speech to the speech synthesis model when an error rate in learning the concatenation text and the concatenation speech is smaller than a set reference rate.
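
The concatenation step of the curriculum can be sketched as follows. This is a simplified illustration under assumptions: a training sample is a (text, speech) pair with speech held as a list of floats, and `evaluate_error_rate` is a hypothetical callable standing in for whatever error measure the model reports while learning the concatenated sample.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, List[float]]  # (text, speech samples)


def concatenate_pair(first: Sample, second: Sample) -> Sample:
    """Join two samples into one longer text and one longer speech signal."""
    return (first[0] + " " + second[0], first[1] + second[1])


def build_curriculum(pairs: List[Sample],
                     evaluate_error_rate: Callable[[Sample], float],
                     reference_rate: float) -> List[Sample]:
    """Return the original samples plus the accepted concatenated samples."""
    curriculum = list(pairs)
    for first, second in zip(pairs, pairs[1:]):
        candidate = concatenate_pair(first, second)
        # Accept the longer sample only if its error rate is below the set
        # reference rate, as the abstract describes.
        if evaluate_error_rate(candidate) < reference_rate:
            curriculum.append(candidate)
    return curriculum
```

Only adjacent samples are concatenated here to keep the example short; the abstract does not state which samples are paired.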