-
Publication No.: US12131739B2
Publication date: 2024-10-29
Application No.: US18503501
Filing date: 2023-11-07
Applicant: Q (CUE) LTD.
Inventors: Aviad Maizels, Yonatan Wexler, Avi Barliya
IPC classification: G06F21/31, G06F21/32, G06Q20/40, G06V10/145, G06V10/60, G06V40/16, G06V40/40, G10L13/00, G10L13/02, G10L13/027, G10L15/08, G10L15/16, G10L15/25, G10L17/02, G10L17/04, G10L17/10, G10L17/18, G10L25/84, H04L9/40
CPC classification: G10L15/25, G06F21/32, G06Q20/40145, G06V10/145, G06V10/60, G06V40/166, G06V40/171, G06V40/172, G06V40/176, G06V40/45, G10L13/00, G10L13/02, G10L13/027, G10L15/08, G10L15/16, G10L17/02, G10L17/04, G10L17/10, G10L17/18, G10L25/84, H04L63/0861, H04L63/108
Abstract: Systems, methods, and non-transitory computer readable media including instructions for performing operations for continuous authentication based on facial skin micromovements are disclosed. The operations may include receiving, during an ongoing electronic transaction, first signals representing coherent light reflections associated with first facial skin micromovements during a first time period and second signals representing coherent light reflections associated with second facial skin micromovements during a second time period. The operations may also include determining, using the first and second signals, that a specific individual is associated with the first and second facial skin micromovements. The operations may also include receiving, during the ongoing electronic transaction, third signals representing coherent light reflections associated with third facial skin micromovements. The operations may further include determining, using the third signals, that the third facial skin micromovements are not associated with the specific individual, and initiating an action based on the determination.
-
Publication No.: US20240347040A1
Publication date: 2024-10-17
Application No.: US18753840
Filing date: 2024-06-25
Applicant: Rovi Guides, Inc.
Inventors: Vikram Makam Gupta, Prateek Varshney, Madhusudhan Seetharam, Ashish Kumar Srivastava, Harshith Kumar Gejjegondanahally Sreekanth
IPC classification: G10L13/08, G06F40/205, G06F40/279, G10L13/00, G10L13/033, H04M1/72433, H04M1/72442, H04W68/00
CPC classification: G10L13/08, G06F40/205, G06F40/279, G10L13/00, G10L13/033, H04M1/72433, H04W68/005, H04M1/72442, H04M2201/39
Abstract: Systems and methods for providing notifications without breaking media immersion. A notification delivery application receives notification data while a media device provides a media asset. In response to receiving the notification data while the media device provides the media asset, the notification delivery application generates a voice model based on a voice detected in the media asset. The notification delivery application converts the notification data to synthesized speech using the voice model and generates, by the media device, the synthesized speech for output at an appropriate point in the media asset based on contextual features of the media asset.
-
Publication No.: US20240346285A1
Publication date: 2024-10-17
Application No.: US18607777
Filing date: 2024-03-18
Abstract: A feedforward generative neural network that generates an output example that includes multiple output samples of a particular type in a single neural network inference. Optionally, the generation may be conditioned on a context input. For example, the feedforward generative neural network may generate a speech waveform that is a verbalization of an input text segment conditioned on linguistic features of the text segment.
-
Publication No.: US12118371B2
Publication date: 2024-10-15
Application No.: US17557790
Filing date: 2021-12-21
Applicant: Meta Platforms, Inc.
Inventor: Scott Martin
IPC classification: G06F9/451, G06F3/01, G06F3/16, G06F7/14, G06F16/176, G06F16/22, G06F16/23, G06F16/242, G06F16/2455, G06F16/2457, G06F16/248, G06F16/28, G06F16/33, G06F16/332, G06F16/338, G06F16/438, G06F16/903, G06F16/9032, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N20/00, G06Q50/00, G06V20/10, G06V40/16, G06V40/20, G10L13/00, G10L13/04, G10L15/02, G10L15/06, G10L15/07, G10L15/16, G10L15/18, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L12/28, H04L41/00, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/046, H04L51/216, H04L51/52, H04L67/10, H04L67/306, H04L67/50, H04L67/53, H04L67/5651, H04L67/75, H04W12/08
CPC classification: G06F9/453, G06F3/011, G06F3/013, G06F3/017, G06F3/167, G06F7/14, G06F16/176, G06F16/2255, G06F16/2365, G06F16/243, G06F16/24552, G06F16/24575, G06F16/24578, G06F16/248, G06F16/285, G06F16/3323, G06F16/3329, G06F16/3344, G06F16/338, G06F16/4393, G06F16/90332, G06F16/90335, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N20/00, G06Q50/01, G06V20/10, G06V40/172, G06V40/28, G10L15/02, G10L15/063, G10L15/07, G10L15/16, G10L15/1815, G10L15/1822, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L12/2816, H04L41/20, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/216, H04L51/52, H04L67/306, H04L67/535, H04L67/5651, H04L67/75, H04W12/08, G06F2216/13, G10L13/00, G10L13/04, G10L2015/223, G10L2015/225, H04L51/046, H04L67/10, H04L67/53
Abstract: In one embodiment, a method includes receiving one or more voice inputs from a first user, determining a first language register associated with the first user based on the one or more voice inputs, selecting a second language register for a voice response based on the one or more voice inputs, generating the voice response based on the second language register, and providing the voice response in response to the one or more voice inputs.
-
Publication No.: US12100396B2
Publication date: 2024-09-24
Application No.: US17583672
Filing date: 2022-01-25
Inventors: Christo Frank Devaraj, Manish Kumar Dalmia, Tony Roy Hardie, Ran Mokady, Nick Ciubotariu, Sandra Lemon
IPC classification: G10L15/22, G06F3/16, G06F40/35, G10L15/30, H04L51/10, H04L51/224, H04L67/306, G06V40/10, G10L13/00, G10L15/08
CPC classification: G10L15/22, G06F3/167, G06F40/35, G10L15/30, H04L51/10, H04L51/224, H04L67/306, G06V40/10, G10L13/00, G10L15/08, G10L2015/088, G10L2015/223
Abstract: Systems, methods, and devices for outputting indications regarding voice-based interactions are described. A first speech-controlled device detects spoken audio corresponding to recipient information. The first device captures the audio and sends audio data corresponding to the captured audio to a server. The server determines a second speech-controlled device of the recipient and sends a signal to the recipient's second speech-controlled device indicating that a message is forthcoming. The recipient's second speech-controlled device outputs an indication that a message is forthcoming.
-
Publication No.: US20240296829A1
Publication date: 2024-09-05
Application No.: US18663831
Filing date: 2024-05-14
Inventor: Travis Grizzel
CPC classification: G10L15/01, G06F3/017, G10L13/00, G10L15/18, G10L15/187, G10L15/24, G10L2015/088
Abstract: A system and method for associating motion data with utterance audio data for use with a speech processing system. A device, such as a wearable device, may be capable of capturing utterance audio data and sending it to a remote server for speech processing, for example for execution of a command represented in the utterance. The device may also capture motion data using motion sensors of the device. The motion data may correspond to gestures, such as head gestures, that may be interpreted by the speech processing system to determine and execute commands. The device may associate the motion data with the audio data so the remote server knows what motion data corresponds to what portion of audio data for purposes of interpreting and executing commands. Metadata sent with the audio data and/or motion data may include association data such as timestamps, session identifiers, message identifiers, etc.
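Illustrative note (not part of the patent record): the association of motion data with portions of audio data via timestamps and session identifiers could be sketched roughly as below. All names here (`AudioChunk`, `MotionEvent`, `associate`) and the half-open time-window rule are assumptions made for illustration; the abstract does not specify a data model or matching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioChunk:
    """Hypothetical record for one portion of utterance audio."""
    session_id: str
    start_ms: int
    end_ms: int


@dataclass(frozen=True)
class MotionEvent:
    """Hypothetical record for one detected gesture."""
    session_id: str
    timestamp_ms: int
    gesture: str


def associate(chunks, events):
    """Map each audio chunk to the motion events that share its session
    and whose timestamp falls inside [start_ms, end_ms)."""
    result = {chunk: [] for chunk in chunks}
    for event in events:
        for chunk in chunks:
            same_session = event.session_id == chunk.session_id
            in_window = chunk.start_ms <= event.timestamp_ms < chunk.end_ms
            if same_session and in_window:
                result[chunk].append(event)
                break  # assumption: an event belongs to at most one chunk
    return result


chunks = [AudioChunk("s1", 0, 1000), AudioChunk("s1", 1000, 2000)]
events = [MotionEvent("s1", 1200, "nod"), MotionEvent("s2", 500, "shake")]
paired = associate(chunks, events)
```

In this sketch the event from session `s2` is left unmatched because no audio chunk shares its session identifier.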
-
Publication No.: US12080269B2
Publication date: 2024-09-03
Application No.: US17740680
Filing date: 2022-05-10
IPC classification: G10L13/00, G10L13/033, G10L13/047, G10L13/08, G10L15/22, G10L21/0232
CPC classification: G10L13/047, G10L13/033, G10L21/0232
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
-
Publication No.: US20240290316A1
Publication date: 2024-08-29
Application No.: US18656983
Filing date: 2024-05-07
Applicant: Electronic Arts Inc.
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features. The system includes one or more modules configured to process the predicted acoustic features, the one or more modules comprising a machine-learned vocoder configured to generate a waveform of the expressive speech audio.
-
Publication No.: US20240251126A1
Publication date: 2024-07-25
Application No.: US18432643
Filing date: 2024-02-05
Applicant: SATURN LICENSING LLC
Inventor: Takumi TSURU
IPC classification: H04N21/442, G10L13/00, G10L15/22, G10L15/30, H04N21/235, H04N21/422, H04N21/458
CPC classification: H04N21/44218, G10L13/00, G10L15/22, G10L15/30, H04N21/235, H04N21/42203, H04N21/458, G10L2015/223
Abstract: The present technology relates to an information processing apparatus, information processing method, transmission apparatus, and transmission method, capable of improving the convenience of a voice AI assistance service used in cooperation with content.
The convenience of the voice AI assistance service used in cooperation with the content can be improved by providing an information processing apparatus including a control unit configured to control a timing of a voice response upon using a voice AI assistance service in cooperation with content on the basis of voice response time information indicating time suitable for the voice response to an utterance of a viewer watching the content. The present technology can be applied to a system in cooperation with a voice AI assistance service, for example.
-
Publication No.: US12046233B2
Publication date: 2024-07-23
Application No.: US18361408
Filing date: 2023-07-28
Applicant: GOOGLE LLC
IPC classification: G10L15/22, G10L13/00, G10L15/00, G10L15/08, G10L15/14, G10L15/18, G10L15/197, G10L15/30
CPC classification: G10L15/197, G10L13/00, G10L15/005, G10L15/08, G10L15/14, G10L15/1822, G10L15/22, G10L15/30, G10L2015/088, G10L2015/223, G10L2015/228
Abstract: Determining a language for speech recognition of a spoken utterance received via an automated assistant interface for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without requiring a user to explicitly designate a language to be utilized for each interaction. Implementations determine a user profile that corresponds to audio data that captures a spoken utterance, and utilize language(s), and optionally corresponding probabilities, assigned to the user profile in determining a language for speech recognition of the spoken utterance. Some implementations select only a subset of languages, assigned to the user profile, to utilize in speech recognition of a given spoken utterance of the user. Some implementations perform speech recognition in each of multiple languages assigned to the user profile, and utilize criteria to select only one of the speech recognitions as appropriate for generating and providing content that is responsive to the spoken utterance.
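Illustrative note (not part of the patent record): the last variant described above, running recognition in each language assigned to the user profile and then keeping only one result, could be sketched roughly as follows. The function name `select_recognition`, the input shapes, and the selection criterion (profile probability multiplied by recognizer confidence) are assumptions made for illustration; the abstract does not state which criteria are used.

```python
def select_recognition(profile_probs, recognitions):
    """Pick one recognition result among the languages assigned to a profile.

    profile_probs: {language: probability assigned to the user profile}
    recognitions:  {language: (transcript, recognizer_confidence)}
    The scoring rule (probability * confidence) is a hypothetical criterion.
    Raises ValueError if no recognized language is assigned to the profile.
    """
    candidates = [lang for lang in recognitions if lang in profile_probs]
    best_lang = max(
        candidates,
        key=lambda lang: profile_probs[lang] * recognitions[lang][1],
    )
    return best_lang, recognitions[best_lang][0]


profile_probs = {"en": 0.7, "es": 0.3}
recognitions = {
    "en": ("turn on the lights", 0.9),
    "es": ("ternon de laits", 0.2),
}
chosen = select_recognition(profile_probs, recognitions)
```

Here the English hypothesis wins because both its profile probability and its recognizer confidence are higher than those of the Spanish hypothesis.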
-