Patent search: ipc:G10L15/24, page 1

1.

Invention publication
SYSTEMS AND METHODS FOR AUTOMATING VOICE COMMANDS (under examination, published)

Publication (announcement) number: US20240331695A1

Publication (announcement) date: 2024-10-03

Application number: US18736263

Filing date: 2024-06-06

Applicant: Rovi Guides, Inc.

Inventors: DurgaPrasad Pulicharla, Madhusudhan Srinivasan

IPC classification: G10L15/22, G06F16/587, G10L15/08, G10L15/18, G10L15/24, G10L15/30

CPC classification: G10L15/22, G06F16/587, G10L15/1815, G10L15/24, G10L15/30, G10L2015/088, G10L2015/223

Abstract: A method of detecting establishment of a voice communication between a first voice communication equipment and a second voice communication equipment and automating requests for content. The method includes analyzing the voice communication to identify a request for content, analyzing the voice communication to identify an affirmative response to the request for content, correlating the request for content with a first user account, and correlating the affirmative response with a second user account. In response to identifying the affirmative response, and based on at least one of the first user account or the second user account, the method identifies the requested content in a data storage and causes the requested content to be transmitted.

2.

Invention publication
METHOD AND APPARATUS FOR GENERATING SPEECH OUTPUTS IN A VEHICLE (under examination, published)

Publication (announcement) number: US20240282290A1

Publication (announcement) date: 2024-08-22

Application number: US18570168

Filing date: 2022-06-01

Applicant: MERCEDES-BENZ GROUP AG

Inventors: Teresa BOTSCHEN, Stefan ULTES

IPC classification: G10L13/027, G06V20/59, G10L15/18, G10L15/22, G10L15/24, G10L15/30

CPC classification: G10L13/027, G06V20/59, G10L15/1815, G10L15/22, G10L15/24, G10L15/30

Abstract: A method for generating speech outputs in a vehicle in response to a speech input involves recording, in addition to the speech input, additional information by at least one sensor. The speech input and the sensor data are then analyzed, and the analysis is used as a basis for the speech output. At least one imaging sensor, which records the passenger compartment, is used as the at least one sensor. Identified objects or people are assigned to predetermined categories. The speech output is produced based on the analysis results and is enriched with keywords or formulations matching one of the categories.

3.

Granted invention patent
Dialog management for multiple users (in force)

Publication (announcement) number: US12039975B2

Publication (announcement) date: 2024-07-16

Application number: US17112512

Filing date: 2020-12-04

Applicant: Amazon Technologies, Inc.

Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon

IPC classification: G10L15/22, G06F3/16, G06F18/24, G06V10/40, G06V40/10, G06V40/20, G10L13/08, G10L15/02, G10L15/06, G10L15/08, G10L15/20, G10L15/24

CPC classification: G10L15/22, G06F3/167, G06F18/24, G06V10/40, G06V40/10, G06V40/20, G10L13/08, G10L15/02, G10L15/063, G10L15/08, G10L15/20, G10L15/222, G10L15/24, G10L2015/0635, G10L2015/088, G10L2015/223, G10L2015/227

Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.

4.

Granted invention patent
Reducing perceived effects of non-voice data in digital speech (in force)

Publication (announcement) number: US11990144B2

Publication (announcement) date: 2024-05-21

Application number: US17387412

Filing date: 2021-07-28

Applicant: Digital Voice Systems, Inc.

Inventor: John C. Hardwick

IPC classification: G10L19/00, G10L15/24, G10L19/005, G10L19/02, G10L25/18

CPC classification: G10L19/0208, G10L15/24, G10L19/005, G10L25/18

Abstract: Non-voice data is embedded in a voice bit stream that includes frames of voice bits by selecting a frame of voice bits to carry the non-voice data, placing non-voice identifier bits in a first portion of the voice bits in the selected frame, and placing the non-voice data in a second portion of the voice bits in the selected frame. The non-voice identifier bits are employed to reduce a perceived effect of the non-voice data on audible speech produced from the voice bit stream.
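The framing idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the 16-bit frame size, the identifier pattern `NON_VOICE_ID`, the function names, and the decoder's choice to repeat the previous voice frame are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

FRAME_BITS = 16                  # assumed frame size, for illustration only
NON_VOICE_ID = [1, 0, 1, 1]      # hypothetical identifier pattern
PAYLOAD_BITS = FRAME_BITS - len(NON_VOICE_ID)


def embed(payload: List[int]) -> List[int]:
    """Build a frame: identifier bits first, non-voice payload second."""
    if len(payload) != PAYLOAD_BITS:
        raise ValueError(f"payload must be {PAYLOAD_BITS} bits")
    return NON_VOICE_ID + payload


def extract(frame: List[int]) -> Optional[List[int]]:
    """Return the payload if the frame carries the identifier, else None."""
    if frame[:len(NON_VOICE_ID)] == NON_VOICE_ID:
        return frame[len(NON_VOICE_ID):]
    return None


def decode_stream(frames: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Separate data frames from voice frames.

    Assumed concealment strategy: when a data frame is found, the previous
    voice frame is repeated so the data bits are never synthesized as speech.
    A real codec would also have to ensure that genuine voice frames cannot
    start with the identifier pattern; this sketch does not handle that.
    """
    voice_out: List[List[int]] = []
    data_out: List[List[int]] = []
    last_voice = [0] * FRAME_BITS
    for frame in frames:
        payload = extract(frame)
        if payload is None:
            last_voice = frame
            voice_out.append(frame)
        else:
            data_out.append(payload)
            voice_out.append(last_voice)
    return voice_out, data_out
```

Calling `decode_stream` on a stream with one embedded data frame returns the voice frames with the data frame replaced by a repeat of its predecessor, plus the recovered payload.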

5.

Invention publication
METHOD AND APPARATUS FOR MEASURING SPEECH-IMAGE SYNCHRONICITY, AND METHOD AND APPARATUS FOR TRAINING MODEL (under examination, published)

Publication (announcement) number: US20240135956A1

Publication (announcement) date: 2024-04-25

Application number: US18395253

Filing date: 2023-12-22

Applicant: MaShang Consumer Finance Co., Ltd.

Inventors: Chun WANG, Dingheng ZENG, Haiying WU, Xunyi ZHOU, Ning JIANG

IPC classification: G10L25/57, G06V40/16, G10L15/24

CPC classification: G10L25/57, G06V40/165, G06V40/168, G10L15/24

Abstract: The application provides a method and an apparatus for measuring speech-image synchronicity, and a method and an apparatus for training a model, where the method for measuring speech-image synchronicity includes: acquiring a speech segment and an image segment of a video, where there is a correspondence between the speech segment and the image segment in the video; processing the speech segment and the image segment to obtain a speech feature of the speech segment and a visual feature of the image segment; and determining, according to the speech feature of the speech segment and the visual feature of the image segment, whether there is synchronicity between the speech segment and the image segment, where the synchronicity is used for characterizing matching between a sound in the speech segment and a movement of a target character in the image segment.

6.

Granted invention patent
Electronic device and method for identifying language level of target (in force)

Publication (announcement) number: US11961505B2

Publication (announcement) date: 2024-04-16

Application number: US17573026

Filing date: 2022-01-11

Applicant: Samsung Electronics Co., Ltd.

Inventor: Taegu Kim

IPC classification: G10L15/00, G10L15/02, G10L15/22, G10L15/24

CPC classification: G10L15/005, G10L15/02, G10L15/22, G10L15/24, G10L2015/225

Abstract: Methods and devices for identifying language level are provided. A first automatic speech recognition (ASR) module is identified, from among a plurality of ASR modules, based on information on a target received at the electronic device. First voice data and first image data for the target are received. The first voice data and the first image data are converted to first text data using the first ASR module. A first language level of the target is identified based on the first text data. Data including at least one of a voice output and an image output is output based on the first language level satisfying a condition.

7.

Granted invention patent
Real-time gesture recognition method and apparatus (in force)

Publication (announcement) number: US11954904B2

Publication (announcement) date: 2024-04-09

Application number: US17367974

Filing date: 2021-07-06

Applicant: AVODAH, INC.

Inventors: Trevor Chandler, Dallas Nash, Michael Menefee

IPC classification: G06V10/82, G06F3/01, G06F3/16, G06F40/40, G06F40/58, G06N3/045, G06N3/08, G06N20/00, G06T3/40, G06T7/20, G06T7/73, G06T17/00, G06V10/764, G06V40/16, G06V40/20, G09B21/00, G10L15/22, G10L15/24, G10L15/26, H04N23/90, G06T3/4046, G10L13/00

CPC classification: G06V10/82, G06F3/013, G06F3/017, G06F3/167, G06F40/40, G06F40/58, G06N3/045, G06N3/08, G06T7/20, G06T7/73, G06V10/764, G06V40/165, G06V40/176, G06V40/20, G06V40/28, G09B21/00, G09B21/009, G10L15/22, G10L15/24, G10L15/26, H04N23/90, G06N20/00, G06T3/4046, G06T17/00, G06T2207/20084, G10L13/00

Abstract: Disclosed are methods, apparatus and systems for real-time gesture recognition. One exemplary method for the real-time identification of a gesture communicated by a subject includes: receiving, by a first thread of the one or more multi-threaded processors, a first set of image frames associated with the gesture, the first set of image frames captured during a first time interval; performing, by the first thread, pose estimation on each frame of the first set of image frames, including eliminating background information from each frame to obtain one or more areas of interest; storing information representative of the one or more areas of interest in a shared memory accessible to the one or more multi-threaded processors; and performing, by a second thread of the one or more multi-threaded processors, a gesture recognition operation on a second set of image frames associated with the gesture.

8.

Invention publication
AUTOMATED SIGN LANGUAGE TRANSLATION AND COMMUNICATION USING MULTIPLE INPUT AND OUTPUT MODALITIES (under examination, published)

Publication (announcement) number: US20230377376A1

Publication (announcement) date: 2023-11-23

Application number: US18155683

Filing date: 2023-01-17

Applicant: Avodah, Inc.

Inventors: Michael Menefee, Dallas Nash, Trevor Chandler

IPC classification: G06V40/20, G06N3/08, G06F40/58, G06V40/16, H04N23/90, G10L15/22, G09B21/00, G06F40/40, G10L15/26, G06F3/16, G06F3/01, G10L15/24, G06T7/73, G06N3/045, G06T7/20, G06N20/00, G06T3/40, G06T17/00, G10L13/00

CPC classification: G06V40/28, G06N3/08, G06F40/58, G06V40/176, G06V40/165, H04N23/90, G10L15/22, G09B21/009, G06F40/40, G10L15/26, G06F3/167, G06F3/013, G09B21/00, G10L15/24, G06T7/73, G06N3/045, G06F3/017, G06T7/20, G06N20/00, G06T3/4046, G06V40/20, G06T17/00, G06T2207/20084, G10L13/00

Abstract: Methods, apparatus and systems for recognizing sign language movements using multiple input and output modalities. One example method includes: capturing a movement associated with the sign language using a set of visual sensing devices, the set of visual sensing devices comprising multiple apertures oriented with respect to the subject to receive optical signals corresponding to the movement from multiple angles; generating digital information corresponding to the movement based on the optical signals from the multiple angles; collecting depth information corresponding to the movement in one or more planes perpendicular to an image plane captured by the set of visual sensing devices; producing a reduced set of digital information by removing at least some of the digital information based on the depth information; generating a composite digital representation by aligning at least a portion of the reduced set of digital information; and recognizing the movement based on the composite digital representation.

9.

Invention publication
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM (under examination, published)

Publication (announcement) number: US20230368792A1

Publication (announcement) date: 2023-11-16

Application number: US18227716

Filing date: 2023-07-28

Applicant: NEC Corporation

Inventor: Masamichi TANABE

IPC classification: G10L15/22, G10L15/30, G10L15/24, G10L15/08, G06V40/10, G06Q30/0601, G10L15/06

CPC classification: G10L15/22, G10L15/30, G10L15/24, G10L15/08, G06V40/10, G06Q30/0613, G10L15/063, G10L2015/088

Abstract: Provided is an information processing system including: a voice information acquisition unit that acquires voice information including an utterance made by a person; a status acquisition unit that acquires status information related to status of the person; and a support information generation unit that generates support information used for supporting operation of the person based on the voice information and the status information.

10.

Granted invention patent
Techniques for incremental computer-based natural language understanding (in force)

Publication (announcement) number: US11749265B2

Publication (announcement) date: 2023-09-05

Application number: US16593939

Filing date: 2019-10-04

Applicant: DISNEY ENTERPRISES, INC.

Inventors: Erika Varis Doggett, Ashutosh Modi, Nathan Nocon

IPC classification: G10L15/22, G10L15/18, G10L15/197, G10L15/24, G10L15/04

CPC classification: G10L15/22, G10L15/04, G10L15/1815, G10L15/197, G10L15/24, G10L2015/223

Abstract: Various embodiments disclosed herein provide techniques for performing incremental natural language understanding on a natural language understanding (NLU) system. The NLU system acquires a first audio speech segment associated with a user utterance. The NLU system converts the first audio speech segment into a first text segment. The NLU system determines a first intent based on a text string associated with the first text segment, wherein the text string represents a portion of the user utterance. The NLU system generates a first response based on the first intent before the user utterance completes.
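The incremental flow in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented technique: speech-to-text is assumed to have already produced the text segments, and the keyword table `INTENT_KEYWORDS` and both function names are hypothetical stand-ins for a real intent model.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical keyword-to-intent table standing in for a trained intent model.
INTENT_KEYWORDS = {"weather": "get_weather", "play": "play_media"}


def detect_intent(partial_text: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the partial text."""
    for word in partial_text.lower().split():
        if word in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[word]
    return None


def incremental_nlu(segments: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Consume text segments one at a time and stop at the first intent.

    Returns (intent, index of the segment that triggered it), so a response
    can be prepared before the remaining segments of the utterance arrive.
    Returns None if no intent is found in the whole utterance.
    """
    partial = ""
    for index, segment in enumerate(segments):
        partial = (partial + " " + segment).strip()
        intent = detect_intent(partial)
        if intent is not None:
            return intent, index
    return None
```

For the segments `["what is the", "weather like", "in Paris today"]`, the function returns after the second segment, without reading the third.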
