-
Publication No.: US20240347040A1
Publication date: 2024-10-17
Application No.: US18753840
Filing date: 2024-06-25
Applicant: Rovi Guides, Inc.
Inventors: Vikram Makam Gupta, Prateek Varshney, Madhusudhan Seetharam, Ashish Kumar Srivastava, Harshith Kumar Gejjegondanahally Sreekanth
IPC classification: G10L13/08, G06F40/205, G06F40/279, G10L13/00, G10L13/033, H04M1/72433, H04M1/72442, H04W68/00
CPC classification: G10L13/08, G06F40/205, G06F40/279, G10L13/00, G10L13/033, H04M1/72433, H04W68/005, H04M1/72442, H04M2201/39
Abstract: Systems and methods for providing notifications without breaking media immersion. A notification delivery application receives notification data while a media device provides a media asset. In response to receiving the notification data while the media device provides the media asset, the notification delivery application generates a voice model based on a voice detected in the media asset. The notification delivery application converts the notification data to synthesized speech using the voice model and generates, by the media device, the synthesized speech for output at an appropriate point in the media asset based on contextual features of the media asset.
-
Publication No.: US20240347039A1
Publication date: 2024-10-17
Application No.: US18683786
Filing date: 2022-08-18
IPC classification: G10L13/08
CPC classification: G10L13/08
Abstract: A speech synthesis apparatus according to the present disclosure includes a memory and a processor coupled to the memory. The processor is configured to: obtain utterance information on subjects to be uttered, wherein the subjects to be uttered are texts contained in data on a book, obtain image information on images that are contained in the data on the book, obtain speech data corresponding to the subjects to be uttered; and generate, based on the obtained utterance information, the obtained image information, and the obtained speech data, a speech synthesis model for reading out a text associated with an image.
-
Publication No.: US12119002B1
Publication date: 2024-10-15
Application No.: US18392947
Filing date: 2023-12-21
Applicant: Rabbit Inc.
Inventors: Cheng Lyu, Peiyuan Liao, Zhuoheng Yang
CPC classification: G10L15/22, G10L13/08, G10L15/04, G10L15/1822, G10L15/26, G10L2015/223
Abstract: An artificial intelligence enabled system is disclosed. The system includes a core component for enabling AI-powered interactions between the system and its users and one or more agents that understand user intent and automatically interact with products and services on the web and/or in the physical world through imitation of a human user.
-
Publication No.: US12118323B2
Publication date: 2024-10-15
Application No.: US17482549
Filing date: 2021-09-23
Abstract: An approach for generating an optimized video of a speaker, translated from a source language into a target language with the speaker's lips synchronized to the translated speech, while balancing optimization of the translation into a target language. A source video may be fed into a neural machine translation model. The model may synthesize a plurality of potential translations. The translations may be received by a generative adversarial network which generates video for each translation and classifies the translations as in-sync or out of sync. A lip-syncing score may be for each of the generated videos that are classified as in-sync.
-
Publication No.: US12106746B2
Publication date: 2024-10-01
Application No.: US17703136
Filing date: 2022-03-24
Inventor: Shilun Lin
IPC classification: G10L13/08, G06F40/126, G06F40/30, G10L13/033
CPC classification: G10L13/08, G06F40/126, G06F40/30, G10L13/033
Abstract: This application discloses a method, an apparatus, a computer readable medium, and an electronic device for audio synthesis. The method includes: acquiring mixed language text information comprising text characters corresponding to at least two language types; performing text coding processing on the mixed language text information based on the at least two language types, to obtain an intermediate semantic coding feature of the mixed language text information; acquiring a target tone feature corresponding to a target tone subject, and performing decoding processing on the intermediate semantic coding feature based on the target tone feature to obtain an acoustic feature; and performing acoustic coding processing on the acoustic feature to obtain an audio corresponding to the mixed language text information.
-
Publication No.: US20240321260A1
Publication date: 2024-09-26
Application No.: US18126212
Filing date: 2023-03-24
Inventors: Subham BISWAS, Saurabh TAHILIANI
IPC classification: G10L13/027, G10L13/08, H04N7/15
CPC classification: G10L13/027, G10L13/08, H04N7/157
Abstract: A device may receive video data that includes a text transcript, audio sequences, and image frames, and may detect a network fluctuation. The device may process the text transcript to generate a new phrase, and may generate a response phoneme based on the new phrase. The device may generate a text embedding based on the response phoneme, and may process the audio sequences to generate a target voice sequence. The device may generate an audio embedding based on the target voice sequence, and may process the image frames to generate a target image sequence. The device may generate an image embedding based on the target image sequence, and may combine the embeddings to generate an embedding input vector. The device may generate a final voice response and a final video based on the embedding input vector, and may provide the video data, the final voice response, and the final video.
-
Publication No.: US20240311966A1
Publication date: 2024-09-19
Application No.: US18613100
Filing date: 2024-03-21
Applicant: SoftEye, Inc.
Inventors: Edwin Chongwoo Park, Te-Won Lee
CPC classification: G06T3/4092, G02B27/017, G06F3/013, G06V10/25, G06V20/70, G06V30/10, G10L13/08, G02B2027/0138, G02B2027/0178
Abstract: Systems, apparatus, and methods for augmenting vision with region-of-interest based processing. In one specific example, smart glasses may use an eye-tracking camera to monitor the user's gaze and determine the user's gaze point. When triggered, the camera assembly captures a high-resolution image. The high-resolution image may be cropped to a much smaller region-of-interest (ROI) image based on computer-vision analysis of the user's gaze point. For example, if the smart glasses detect a human face at the gaze point, then the ROI is cropped to the human face. In this manner, the smart glasses may leverage specific capabilities of the smart glasses to augment the user experience; for example, telephoto lenses provide long distance vision, or computer-assisted search may direct the user to interesting activity. Other aspects may include e.g., external database assisted operation and/or ongoing cataloging throughout the day.
-
Publication No.: US20240303892A1
Publication date: 2024-09-12
Application No.: US18667096
Filing date: 2024-05-17
Applicant: TOPPAN HOLDINGS INC.
Inventors: Ping ZHANG, Shiori TADA
CPC classification: G06T13/205, G06F40/40, G06T13/40, G10L13/047, G10L13/08, G10L15/063
Abstract: A content generation device includes: an acquisition unit that acquires text data representing a first text, being a reading target; a voice generation unit that, using a voice generation model that based on a voice in which a user has read out a second text, being a learning target, has learned a way of reading out the second text in a voice of the user, generates a synthesized voice in which the first text represented by the acquired text data is read out in the voice of the user; and a synthesis unit that generates synthesized content by synthesizing the generated synthesized voice and a personal image of the user.
-
Publication No.: US12080273B2
Publication date: 2024-09-03
Application No.: US18371704
Filing date: 2023-09-22
Applicant: NEOSAPIENCE, INC.
Inventors: Taesu Kim, Younggun Lee
IPC classification: G10L13/033, G06F40/40, G06N3/04, G06N3/044, G06N3/045, G06N3/08, G10L13/047, G10L13/08, G10L13/10, G10L25/30
CPC classification: G10L13/10, G06F40/40, G06N3/04, G06N3/044, G06N3/045, G06N3/08, G10L13/033, G10L13/047, G10L13/086, G10L25/30
Abstract: A speech translation method using a multilingual text-to-speech synthesis model includes receiving input speech data of the first language and an articulatory feature of a speaker regarding the first language, converting the input speech data of the first language into a text of the first language, converting the text of the first language into a text of the second language, and generating output speech data for the text of the second language that simulates the speaker's speech by inputting the text of the second language and the articulatory feature of the speaker to a single artificial neural network text-to-speech synthesis model.
-
Publication No.: US12080269B2
Publication date: 2024-09-03
Application No.: US17740680
Filing date: 2022-05-10
IPC classification: G10L13/00, G10L13/033, G10L13/047, G10L13/08, G10L15/22, G10L21/0232
CPC classification: G10L13/047, G10L13/033, G10L21/0232
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.