Patent search: ipc:G10L13/08 (page 1)

1.

Published patent application
SYSTEMS AND METHODS FOR PROVIDING NOTIFICATIONS WITHIN A MEDIA ASSET WITHOUT BREAKING IMMERSION

Legal status: Pending (published)

Publication No.: US20240347040A1

Publication date: 2024-10-17

Application No.: US18753840

Filing date: 2024-06-25

Applicant: Rovi Guides, Inc.

Inventors: Vikram Makam Gupta, Prateek Varshney, Madhusudhan Seetharam, Ashish Kumar Srivastava, Harshith Kumar Gejjegondanahally Sreekanth

IPC classification: G10L13/08, G06F40/205, G06F40/279, G10L13/00, G10L13/033, H04M1/72433, H04M1/72442, H04W68/00

CPC classification: G10L13/08, G06F40/205, G06F40/279, G10L13/00, G10L13/033, H04M1/72433, H04W68/005, H04M1/72442, H04M2201/39

Abstract: Systems and methods for providing notifications without breaking media immersion. A notification delivery application receives notification data while a media device provides a media asset. In response to receiving the notification data while the media device provides the media asset, the notification delivery application generates a voice model based on a voice detected in the media asset. The notification delivery application converts the notification data to synthesized speech using the voice model and generates, by the media device, the synthesized speech for output at an appropriate point in the media asset based on contextual features of the media asset.

2.

Published patent application
SPEECH SYNTHESIS APPARATUS, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM

Legal status: Pending (published)

Publication No.: US20240347039A1

Publication date: 2024-10-17

Application No.: US18683786

Filing date: 2022-08-18

Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, The University of Tokyo

Inventors: Yusuke IJIMA, Tomoki KORIYAMA, Shinnosuke TAKAMICHI

IPC classification: G10L13/08

CPC classification: G10L13/08

Abstract: A speech synthesis apparatus according to the present disclosure includes a memory and a processor coupled to the memory. The processor is configured to: obtain utterance information on subjects to be uttered, wherein the subjects to be uttered are texts contained in data on a book; obtain image information on images that are contained in the data on the book; obtain speech data corresponding to the subjects to be uttered; and generate, based on the obtained utterance information, the obtained image information, and the obtained speech data, a speech synthesis model for reading out a text associated with an image.

3.

Granted patent
System and method of facilitating human interactions with products and services over a network

Legal status: In force

Publication No.: US12119002B1

Publication date: 2024-10-15

Application No.: US18392947

Filing date: 2023-12-21

Applicant: Rabbit Inc.

Inventors: Cheng Lyu, Peiyuan Liao, Zhuoheng Yang

IPC classification: G10L15/22, G10L13/08, G10L15/04, G10L15/18, G10L15/26

CPC classification: G10L15/22, G10L13/08, G10L15/04, G10L15/1822, G10L15/26, G10L2015/223

Abstract: An artificial intelligence enabled system is disclosed. The system includes a core component for enabling AI-powered interactions between the system and its users and one or more agents that understand user intent and automatically interact with products and services on the web and/or in the physical world through imitation of a human user.

4.

Granted patent
Optimization of lip syncing in natural language translated video

Legal status: In force

Publication No.: US12118323B2

Publication date: 2024-10-15

Application No.: US17482549

Filing date: 2021-09-23

Applicant: International Business Machines Corporation

Inventors: Sathya Santhar, Sridevi Kannan, Sarbajit K. Rakshit, Samuel Mathew Jawaharlal

IPC classification: G06F40/40, G10L13/08

CPC classification: G06F40/40, G10L13/08

Abstract: An approach for generating an optimized video of a speaker, translated from a source language into a target language with the speaker's lips synchronized to the translated speech, while balancing optimization of the translation into a target language. A source video may be fed into a neural machine translation model. The model may synthesize a plurality of potential translations. The translations may be received by a generative adversarial network which generates video for each translation and classifies the translations as in-sync or out of sync. A lip-syncing score may be determined for each of the generated videos that are classified as in-sync.

5.

Granted patent
Audio synthesis method and apparatus, computer readable medium, and electronic device

Legal status: In force

Publication No.: US12106746B2

Publication date: 2024-10-01

Application No.: US17703136

Filing date: 2022-03-24

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Shilun Lin

IPC classification: G10L13/08, G06F40/126, G06F40/30, G10L13/033

CPC classification: G10L13/08, G06F40/126, G06F40/30, G10L13/033

Abstract: This application discloses a method, an apparatus, a computer readable medium, and an electronic device for audio synthesis. The method includes: acquiring mixed language text information comprising text characters corresponding to at least two language types; performing text coding processing on the mixed language text information based on the at least two language types, to obtain an intermediate semantic coding feature of the mixed language text information; acquiring a target tone feature corresponding to a target tone subject, and performing decoding processing on the intermediate semantic coding feature based on the target tone feature to obtain an acoustic feature; and performing acoustic coding processing on the acoustic feature to obtain an audio corresponding to the mixed language text information.

6.

Published patent application
SYSTEMS AND METHODS FOR RECONSTRUCTING VIDEO DATA USING CONTEXTUALLY-AWARE MULTI-MODAL GENERATION DURING SIGNAL LOSS

Legal status: Pending (published)

Publication No.: US20240321260A1

Publication date: 2024-09-26

Application No.: US18126212

Filing date: 2023-03-24

Applicant: Verizon Patent and Licensing Inc.

Inventors: Subham BISWAS, Saurabh TAHILIANI

IPC classification: G10L13/027, G10L13/08, H04N7/15

CPC classification: G10L13/027, G10L13/08, H04N7/157

Abstract: A device may receive video data that includes a text transcript, audio sequences, and image frames, and may detect a network fluctuation. The device may process the text transcript to generate a new phrase, and may generate a response phoneme based on the new phrase. The device may generate a text embedding based on the response phoneme, and may process the audio sequences to generate a target voice sequence. The device may generate an audio embedding based on the target voice sequence, and may process the image frames to generate a target image sequence. The device may generate an image embedding based on the target image sequence, and may combine the embeddings to generate an embedding input vector. The device may generate a final voice response and a final video based on the embedding input vector, and may provide the video data, the final voice response, and the final video.

7.

Published patent application
APPARATUS AND METHODS FOR AUGMENTING VISION WITH REGION-OF-INTEREST BASED PROCESSING

Legal status: Pending (published)

Publication No.: US20240311966A1

Publication date: 2024-09-19

Application No.: US18613100

Filing date: 2024-03-21

Applicant: SoftEye, Inc.

Inventors: Edwin Chongwoo Park, Te-Won Lee

IPC classification: G06T3/4092, G02B27/01, G06F3/01, G06V10/25, G06V20/70, G06V30/10, G10L13/08

CPC classification: G06T3/4092, G02B27/017, G06F3/013, G06V10/25, G06V20/70, G06V30/10, G10L13/08, G02B2027/0138, G02B2027/0178

Abstract: Systems, apparatus, and methods for augmenting vision with region-of-interest based processing. In one specific example, smart glasses may use an eye-tracking camera to monitor the user's gaze and determine the user's gaze point. When triggered, the camera assembly captures a high-resolution image. The high-resolution image may be cropped to a much smaller region-of-interest (ROI) image based on computer-vision analysis of the user's gaze point. For example, if the smart glasses detect a human face at the gaze point, then the ROI is cropped to the human face. In this manner, the smart glasses may leverage specific capabilities of the smart glasses to augment the user experience; for example, telephoto lenses provide long distance vision, or computer-assisted search may direct the user to interesting activity. Other aspects may include, e.g., external database assisted operation and/or ongoing cataloging throughout the day.

8.

Published patent application
CONTENT GENERATION DEVICE, CONTENT GENERATION METHOD, AND PROGRAM

Legal status: Pending (published)

Publication No.: US20240303892A1

Publication date: 2024-09-12

Application No.: US18667096

Filing date: 2024-05-17

Applicant: TOPPAN HOLDINGS INC.

Inventors: Ping ZHANG, Shiori TADA

IPC classification: G06T13/20, G06F40/40, G06T13/40, G10L13/047, G10L13/08, G10L15/06

CPC classification: G06T13/205, G06F40/40, G06T13/40, G10L13/047, G10L13/08, G10L15/063

Abstract: A content generation device includes: an acquisition unit that acquires text data representing a first text, being a reading target; a voice generation unit that, using a voice generation model that, based on a voice in which a user has read out a second text, being a learning target, has learned a way of reading out the second text in a voice of the user, generates a synthesized voice in which the first text represented by the acquired text data is read out in the voice of the user; and a synthesis unit that generates synthesized content by synthesizing the generated synthesized voice and a personal image of the user.

9.

Granted patent
Translation method and system using multilingual text-to-speech synthesis model

Legal status: In force

Publication No.: US12080273B2

Publication date: 2024-09-03

Application No.: US18371704

Filing date: 2023-09-22

Applicant: NEOSAPIENCE, INC.

Inventors: Taesu Kim, Younggun Lee

IPC classification: G10L13/033, G06F40/40, G06N3/04, G06N3/044, G06N3/045, G06N3/08, G10L13/047, G10L13/08, G10L13/10, G10L25/30

CPC classification: G10L13/10, G06F40/40, G06N3/04, G06N3/044, G06N3/045, G06N3/08, G10L13/033, G10L13/047, G10L13/086, G10L25/30

Abstract: A speech translation method using a multilingual text-to-speech synthesis model includes receiving input speech data of the first language and an articulatory feature of a speaker regarding the first language, converting the input speech data of the first language into a text of the first language, converting the text of the first language into a text of the second language, and generating output speech data for the text of the second language that simulates the speaker's speech by inputting the text of the second language and the articulatory feature of the speaker to a single artificial neural network text-to-speech synthesis model.

10.

Granted patent
Synthetic speech processing

Legal status: In force

Publication No.: US12080269B2

Publication date: 2024-09-03

Application No.: US17740680

Filing date: 2022-05-10

Applicant: Amazon Technologies, Inc.

Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak

IPC classification: G10L13/00, G10L13/033, G10L13/047, G10L13/08, G10L15/22, G10L21/0232

CPC classification: G10L13/047, G10L13/033, G10L21/0232

Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
