-
Publication No.: US20240331695A1
Publication Date: 2024-10-03
Application No.: US18736263
Filing Date: 2024-06-06
Applicant: Rovi Guides, Inc.
CPC Classification: G10L15/22, G06F16/587, G10L15/1815, G10L15/24, G10L15/30, G10L2015/088, G10L2015/223
Abstract: A method of detecting establishment of a voice communication between a first voice communication equipment and a second voice communication equipment and automating requests for content. The method includes analyzing the voice communication to identify a request for content, analyzing the voice communication to identify an affirmative response to the request for content, and correlating the request for content with a first user account and the affirmative response with a second user account. In response to identifying the affirmative response, and based upon at least one of the first user account or the second user account, the method identifies the requested content in a data storage and causes transmission of the requested content.
-
Publication No.: US20240282290A1
Publication Date: 2024-08-22
Application No.: US18570168
Filing Date: 2022-06-01
Inventors: Teresa BOTSCHEN, Stefan ULTES
CPC Classification: G10L13/027, G06V20/59, G10L15/1815, G10L15/22, G10L15/24, G10L15/30
Abstract: A method for generating speech outputs in a vehicle in response to a speech input involves recording, in addition to the speech input, additional information by at least one sensor. The speech input and the sensor data are then analyzed, and the analysis results serve as the basis for the speech output. At least one imaging sensor is used as the at least one sensor, and it records the passenger compartment. Identified objects or people are assigned to predetermined categories. The speech output is produced based on the analysis results and is enriched with keywords or formulations matching one of the categories.
-
Publication No.: US12039975B2
Publication Date: 2024-07-16
Application No.: US17112512
Filing Date: 2020-12-04
Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon
IPC Classification: G10L15/22, G06F3/16, G06F18/24, G06V10/40, G06V40/10, G06V40/20, G10L13/08, G10L15/02, G10L15/06, G10L15/08, G10L15/20, G10L15/24
CPC Classification: G10L15/22, G06F3/167, G06F18/24, G06V10/40, G06V40/10, G06V40/20, G10L13/08, G10L15/02, G10L15/063, G10L15/08, G10L15/20, G10L15/222, G10L15/24, G10L2015/0635, G10L2015/088, G10L2015/223, G10L2015/227
Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
-
Publication No.: US11990144B2
Publication Date: 2024-05-21
Application No.: US17387412
Filing Date: 2021-07-28
Inventor: John C. Hardwick
IPC Classification: G10L19/00, G10L15/24, G10L19/005, G10L19/02, G10L25/18
CPC Classification: G10L19/0208, G10L15/24, G10L19/005, G10L25/18
Abstract: Non-voice data is embedded in a voice bit stream that includes frames of voice bits by selecting a frame of voice bits to carry the non-voice data, placing non-voice identifier bits in a first portion of the voice bits in the selected frame, and placing the non-voice data in a second portion of the voice bits in the selected frame. The non-voice identifier bits are employed to reduce a perceived effect of the non-voice data on audible speech produced from the voice bit stream.
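Illustrative note: the frame layout described in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented method. The frame size `FRAME_BITS`, the identifier pattern `NON_VOICE_ID`, and both function names are hypothetical; the abstract does not give any of these values.

```python
from typing import List, Optional

# Hypothetical values chosen only for illustration; the abstract does not
# specify a frame size or an identifier pattern.
FRAME_BITS = 48
NON_VOICE_ID = [1, 0, 1, 1, 0, 1, 0, 1]
PAYLOAD_BITS = FRAME_BITS - len(NON_VOICE_ID)


def embed_non_voice(frame: List[int], payload: List[int]) -> List[int]:
    """Return a copy of the selected frame that carries non-voice data.

    The first portion holds the identifier bits and the second portion
    holds the payload, zero-padded to fill the frame.
    """
    if len(frame) != FRAME_BITS:
        raise ValueError("frame has the wrong number of bits")
    if len(payload) > PAYLOAD_BITS:
        raise ValueError("payload does not fit in one frame")
    padding = [0] * (PAYLOAD_BITS - len(payload))
    return NON_VOICE_ID + payload + padding


def extract_non_voice(frame: List[int]) -> Optional[List[int]]:
    """Return the payload portion if the frame is marked non-voice, else None.

    A decoder could use the None / not-None result to decide whether to
    synthesize speech from the frame or to conceal it (for example by
    repeating the previous frame), which is one assumed way to limit the
    audible effect of the embedded data.
    """
    if frame[: len(NON_VOICE_ID)] == NON_VOICE_ID:
        return frame[len(NON_VOICE_ID):]
    return None


voice_frame = [0] * FRAME_BITS
data_frame = embed_non_voice(voice_frame, [1, 1, 0, 0])
recovered = extract_non_voice(data_frame)
```

The sketch overwrites the whole selected frame; a real codec would more plausibly pick the identifier so that a legacy decoder treats the frame as harmless, but that choice is not stated in the abstract.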
-
Publication No.: US20240135956A1
Publication Date: 2024-04-25
Application No.: US18395253
Filing Date: 2023-12-22
Inventors: Chun WANG, Dingheng ZENG, Haiying WU, Xunyi ZHOU, Ning JIANG
CPC Classification: G10L25/57, G06V40/165, G06V40/168, G10L15/24
Abstract: The application provides a method and an apparatus for measuring speech-image synchronicity, and a method and an apparatus for training a model, where the method for measuring speech-image synchronicity includes: acquiring a speech segment and an image segment of a video, where there is a correspondence between the speech segment and the image segment in the video; processing the speech segment and the image segment to obtain a speech feature of the speech segment and a visual feature of the image segment; and determining, according to the speech feature of the speech segment and the visual feature of the image segment, whether there is synchronicity between the speech segment and the image segment, where the synchronicity is used for characterizing matching between a sound in the speech segment and a movement of a target character in the image segment.
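Illustrative note: the final determination step in this abstract compares a speech feature with a visual feature. One common way to compare two feature vectors is cosine similarity against a threshold; that choice, the threshold value, and the function names below are assumptions made for illustration and are not taken from the application. Feature extraction is assumed to have happened already.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_synchronous(
    speech_feature: Sequence[float],
    visual_feature: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the application
) -> bool:
    """Decide synchronicity by thresholding the similarity of the two features."""
    if len(speech_feature) != len(visual_feature):
        raise ValueError("features must have the same dimension")
    return cosine_similarity(speech_feature, visual_feature) >= threshold


aligned = is_synchronous([1.0, 0.0], [1.0, 0.0])
misaligned = is_synchronous([1.0, 0.0], [0.0, 1.0])
```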
-
Publication No.: US11961505B2
Publication Date: 2024-04-16
Application No.: US17573026
Filing Date: 2022-01-11
Inventor: Taegu Kim
CPC Classification: G10L15/005, G10L15/02, G10L15/22, G10L15/24, G10L2015/225
Abstract: Methods and devices for identifying language level are provided. A first automatic speech recognition (ASR) module is identified, from among a plurality of ASR modules, based on information on a target received at the electronic device. First voice data and first image data for the target are received. The first voice data and the first image data are converted to first text data using the first ASR module. A first language level of the target is identified based on the first text data. Data including at least one of a voice output and an image output is output based on the first language level satisfying a condition.
-
Publication No.: US11954904B2
Publication Date: 2024-04-09
Application No.: US17367974
Filing Date: 2021-07-06
Applicant: AVODAH, INC.
Inventors: Trevor Chandler, Dallas Nash, Michael Menefee
IPC Classification: G06V10/82, G06F3/01, G06F3/16, G06F40/40, G06F40/58, G06N3/045, G06N3/08, G06N20/00, G06T3/40, G06T7/20, G06T7/73, G06T17/00, G06V10/764, G06V40/16, G06V40/20, G09B21/00, G10L15/22, G10L15/24, G10L15/26, H04N23/90, G06T3/4046, G10L13/00
CPC Classification: G06V10/82, G06F3/013, G06F3/017, G06F3/167, G06F40/40, G06F40/58, G06N3/045, G06N3/08, G06T7/20, G06T7/73, G06V10/764, G06V40/165, G06V40/176, G06V40/20, G06V40/28, G09B21/00, G09B21/009, G10L15/22, G10L15/24, G10L15/26, H04N23/90, G06N20/00, G06T3/4046, G06T17/00, G06T2207/20084, G10L13/00
Abstract: Disclosed are methods, apparatus and systems for real-time gesture recognition. One exemplary method for the real-time identification of a gesture communicated by a subject includes receiving, by a first thread of the one or more multi-threaded processors, a first set of image frames associated with the gesture, the first set of image frames captured during a first time interval, performing, by the first thread, pose estimation on each frame of the first set of image frames including eliminating background information from each frame to obtain one or more areas of interest, storing information representative of the one or more areas of interest in a shared memory accessible to the one or more multi-threaded processors, and performing, by a second thread of the one or more multi-threaded processors, a gesture recognition operation on a second set of image frames associated with the gesture.
-
Publication No.: US20230377376A1
Publication Date: 2023-11-23
Application No.: US18155683
Filing Date: 2023-01-17
Applicant: Avodah, Inc.
Inventors: Michael Menefee, Dallas Nash, Trevor Chandler
IPC Classification: G06V40/20, G06N3/08, G06F40/58, G06V40/16, H04N23/90, G10L15/22, G09B21/00, G06F40/40, G10L15/26, G06F3/16, G06F3/01, G10L15/24, G06T7/73, G06N3/045, G06T7/20, G06N20/00, G06T3/40, G06T17/00, G10L13/00
CPC Classification: G06V40/28, G06N3/08, G06F40/58, G06V40/176, G06V40/165, H04N23/90, G10L15/22, G09B21/009, G06F40/40, G10L15/26, G06F3/167, G06F3/013, G09B21/00, G10L15/24, G06T7/73, G06N3/045, G06F3/017, G06T7/20, G06N20/00, G06T3/4046, G06V40/20, G06T17/00, G06T2207/20084, G10L13/00
Abstract: Methods, apparatus and systems for recognizing sign language movements using multiple input and output modalities. One example method includes capturing a movement associated with the sign language using a set of visual sensing devices, the set of visual sensing devices comprising multiple apertures oriented with respect to the subject to receive optical signals corresponding to the movement from multiple angles, generating digital information corresponding to the movement based on the optical signals from the multiple angles, collecting depth information corresponding to the movement in one or more planes perpendicular to an image plane captured by the set of visual sensing devices, producing a reduced set of digital information by removing at least some of the digital information based on the depth information, generating a composite digital representation by aligning at least a portion of the reduced set of digital information, and recognizing the movement based on the composite digital representation.
-
Publication No.: US20230368792A1
Publication Date: 2023-11-16
Application No.: US18227716
Filing Date: 2023-07-28
Applicant: NEC Corporation
Inventor: Masamichi TANABE
CPC Classification: G10L15/22, G10L15/30, G10L15/24, G10L15/08, G06V40/10, G06Q30/0613, G10L15/063, G10L2015/088
Abstract: Provided is an information processing system including: a voice information acquisition unit that acquires voice information including an utterance made by a person; a status acquisition unit that acquires status information related to status of the person; and a support information generation unit that generates support information used for supporting operation of the person based on the voice information and the status information.
-
Publication No.: US11749265B2
Publication Date: 2023-09-05
Application No.: US16593939
Filing Date: 2019-10-04
Inventors: Erika Varis Doggett, Ashutosh Modi, Nathan Nocon
IPC Classification: G10L15/22, G10L15/18, G10L15/197, G10L15/24, G10L15/04
CPC Classification: G10L15/22, G10L15/04, G10L15/1815, G10L15/197, G10L15/24, G10L2015/223
Abstract: Various embodiments disclosed herein provide techniques for performing incremental natural language understanding on a natural language understanding (NLU) system. The NLU system acquires a first audio speech segment associated with a user utterance. The NLU system converts the first audio speech segment into a first text segment. The NLU system determines a first intent based on a text string associated with the first text segment, wherein the text string represents a portion of the user utterance. The NLU system generates a first response based on the first intent prior to when the user utterance completes.
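Illustrative note: the incremental flow in this abstract (determine an intent from a partial text string and respond before the utterance ends) can be sketched as a generator over already-transcribed text segments. This is a simplified sketch, not the patented system. The keyword table `INTENT_KEYWORDS` and both function names are hypothetical, keyword matching stands in for whatever intent model the patent uses, and speech-to-text conversion is assumed to have happened upstream.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical intent table; a real system would use a trained model.
INTENT_KEYWORDS = {
    "play_music": ("play",),
    "get_weather": ("weather",),
}


def detect_intent(text: str) -> Optional[str]:
    """Return the first intent whose keyword appears as a word in the text."""
    words = text.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return None


def incremental_nlu(segments: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (partial_text, intent) as soon as an intent is first detected.

    Each segment extends the partial text string, so a response can be
    produced before the remaining segments of the utterance arrive.
    """
    partial = ""
    last_intent = None
    for segment in segments:
        partial = (partial + " " + segment).strip()
        intent = detect_intent(partial)
        if intent is not None and intent != last_intent:
            last_intent = intent
            yield partial, intent


results = list(incremental_nlu(["what is the", "weather like", "in Paris today"]))
```

Here the intent is emitted after the second segment, while the third segment of the utterance is still outstanding.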
-