-
Publication No.: US20250069601A1
Publication Date: 2025-02-27
Application No.: US18943527
Filing Date: 2024-11-11
Applicant: Ultratec, Inc.
Inventor: Robert M. Engelke, Kevin R. Colwell, Christopher Engelke
Abstract: A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.
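A minimal sketch of the "consistent words" step, for illustration only. It assumes that "consistent" means the leading words on which every hypothesis agrees position by position. The abstract does not define the matching rule, and this is not presented as the applicant's method. The function name `consistent_prefix` is hypothetical.

```python
def consistent_prefix(hypotheses: list[str]) -> list[str]:
    """Return the leading words shared, position by position, by every hypothesis.

    Assumption (not from the patent): words count as consistent only while all
    hypotheses agree from the first word onward; the first disagreement stops the scan.
    """
    if len(hypotheses) < 2:
        return []  # need at least a first and a second hypothesis to compare
    tokenized = [h.split() for h in hypotheses]
    stable: list[str] = []
    # zip stops at the shortest hypothesis, so no index runs past its end
    for words in zip(*tokenized):
        if all(w == words[0] for w in words):
            stable.append(words[0])
        else:
            break
    return stable


hyps = [
    "please call me back tomorrow",
    "please call me back to morrow",
]
print(consistent_prefix(hyps))  # ['please', 'call', 'me', 'back']
```

In a streaming setting, only the stable words would be sent to the display device, and the list would grow as later hypotheses converge. The variable presentation rate mentioned in the abstract is not modeled here.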
-
Publication No.: US20250069377A1
Publication Date: 2025-02-27
Application No.: US18727106
Filing Date: 2023-01-25
Applicant: Sony Group Corporation
Inventor: Junji OTSUKA, Atsushi IRIE, Masakazu YOSHIMURA
IPC: G06V10/776, G06V10/774, G06V10/98, G06V20/70, G10L15/01, G10L15/06
Abstract: An information processing apparatus according to an embodiment of the present technology includes a generation unit, an evaluation unit, and an update unit. The generation unit generates input data on the basis of a predetermined parameter. The evaluation unit generates evaluation data on the basis of first output data that includes evaluation target data and is output by inputting first input data generated by the generation unit to a first recognition model, and second output data that includes a pseudo label as a pseudo correct answer of the evaluation target data and is output by inputting second input data generated by the generation unit to a second recognition model. The update unit updates the predetermined parameter on the basis of the evaluation data.
-
Publication No.: US12211487B1
Publication Date: 2025-01-28
Application No.: US18416084
Filing Date: 2024-01-18
Applicant: Morgan Stanley Services Group Inc.
Inventor: Aratrika Sarkar, Ayyapparaj Radhakrishnan Ganesan, Mayank Jain, Mehak Mehta
IPC: G10L15/06, G09B21/00, G10L15/01, G10L15/18, G10L15/22, G10L15/30, G10L25/18, G10L25/78, G10L25/84
Abstract: A system and method for making any website or application accessible to people with sight, hearing or speech disabilities. The system and method can include receiving input of the website or the application to be accessed and an indicator of the specific disabilities of a user, scoring the website or the application for its accessibility based on the specific disabilities of the user, and, if the score is below a threshold, determining an alternative form for the input of the website or the application to accommodate the specific disabilities of the user.
-
Publication No.: US20250006187A1
Publication Date: 2025-01-02
Application No.: US18885132
Filing Date: 2024-09-13
Inventor: Hongtao ZOU, Si CHEN
IPC: G10L15/183, G10L15/01, G10L15/06
Abstract: The present disclosure provides a method and apparatus for transcribing audio, and relates to the field of artificial intelligence technology. A specific embodiment of the method includes: receiving audio information uploaded through a scenario entry of a storage service application installed on a client; determining, based on the scenario entry, a scenario type of the audio information; performing speech recognition on the audio information to obtain text information corresponding to the audio information; and inputting the text information and a prompt corresponding to the scenario type into a language model to obtain summary information, where the language model is obtained by performing supervised fine-tuning on a pre-trained model using samples corresponding to various scenario types, and the prompts corresponding to the various scenario types are obtained by tuning initial prompts corresponding to the various scenario types using the language model.
-
Publication No.: US12148417B1
Publication Date: 2024-11-19
Application No.: US17354215
Filing Date: 2021-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Aidan Thomas Cardella, Anand Victor, Vipin Gupta, Zheng Du, John Rajiv Malik, Li Erran Li, Jarrett Alegre Bato, Peng Yang, Alejandro Ricardo Mottini D'Oliveira
Abstract: Devices and techniques are generally described for confidence score generation for label generation. In some examples, first data may be received from a first computing device. In various further examples, first label data classifying at least one aspect of the first data may be received. First metadata associated with how the first label data was generated may be received. In some cases, the first label data may be generated by a first user. In various examples, a first machine learning model may generate a first confidence score associated with the first label data based at least in part on the first data and second data related to label generation by the first user. In various examples, output data comprising the first confidence score may be sent to the first computing device.
-
Publication No.: US12112752B1
Publication Date: 2024-10-08
Application No.: US17688279
Filing Date: 2022-03-07
Applicant: Amazon Technologies, Inc.
Inventor: Rahul Gupta, Jwala Dhamala, Apurv Verma, Qingwen Ye, Mayur Himmatbhai Dabhi, Srinivasan Rengarajan Veeravanallur, Spyridon Matsoukas, Melanie C B Gens, Seyed Omid Razavi, Avni Khatri, Premkumar Natarajan
CPC classification number: G10L15/22, G10L15/01, G10L15/063, G10L15/08, G10L2015/0631, G10L2015/223
Abstract: Devices and techniques are generally described for cohort determination in natural language processing. In various examples, a first natural language input to a natural language processing system may be determined. The first natural language input may be associated with a first account identifier. A first machine learning model may determine first data representing one or more words of the first natural language input. A second machine learning model may determine second data representing one or more acoustic characteristics of the first natural language input. Third data may be determined, the third data including a predicted performance for processing the first natural language input by the natural language processing system. The third data may be determined based on the first data representation and the second data representation.
-
Publication No.: US12112745B2
Publication Date: 2024-10-08
Application No.: US17292116
Filing Date: 2019-09-09
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jisun Park, Minjin Rho
CPC classification number: G10L15/22, G10L15/01, G10L2015/223
Abstract: An electronic device is disclosed. The present electronic device comprises: a voice receiving unit; and a processor, wherein the processor: when a user's voice is received through the voice receiving unit, determines an accumulation level of utterance history information corresponding to the characteristics of the user's voice; when the accumulation level of utterance history information is below a predetermined threshold level, provides response information corresponding to the user's voice on the basis of user information related to the characteristics of the user's voice; and when the accumulation level of utterance history information is equal to or higher than the predetermined threshold level, provides response information corresponding to the user's voice on the basis of the user information and the utterance history information.
-
Publication No.: US12112740B2
Publication Date: 2024-10-08
Application No.: US17545815
Filing Date: 2021-12-08
Applicant: SOCIETE BIC
Inventor: David Duffy, Bernadette Elliott-Bowman
IPC: G10L15/01, G10L13/027, G10L15/02, G10L15/16, G10L15/26
CPC classification number: G10L15/01, G10L13/027, G10L15/02, G10L15/16, G10L15/26, G10L2015/025, G10L2015/027
Abstract: A computer-implemented method for measuring the cognitive load of a user creating a creative work in a creative work system may include: generating at least one verbal statement capable of provoking at least one verbal response from the user; prompting the user to vocally interact with the creative work system by vocalizing the at least one generated verbal statement to the user via an audio interface of the creative work system; obtaining the at least one verbal response from the user via the audio interface; and determining the cognitive load of the user based on the at least one verbal response obtained from the user, wherein generating the at least one verbal statement is based on at least one predicted verbal response suitable for determining the cognitive load of the user.
-
Publication No.: US12087276B1
Publication Date: 2024-09-10
Application No.: US17155825
Filing Date: 2021-01-22
Applicant: Cisco Technology, Inc.
Abstract: A plurality of audio datasets associated with captured audio are provided to a plurality of automatic speech recognition engines, wherein each of the automatic speech recognition engines is configured to recognize speech of a first language. Word error rate estimates that comprise at least one word error rate estimate for each of the plurality of audio datasets are determined from outputs of the plurality of automatic speech recognition engines. From the word error rate estimates, audio in the plurality of audio datasets is determined to include speech in a second language.
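A minimal sketch of how per-dataset word error rate estimates might be turned into a second-language decision. The abstract does not say how the estimates are combined, so the averaging rule and the 0.5 cutoff below are assumptions, and the names `flag_second_language` and `WER_THRESHOLD` are hypothetical. The WER values in the usage example are made-up inputs, not results from the patent.

```python
from statistics import mean

# Hypothetical cutoff: the abstract does not specify how WER estimates map to a decision.
WER_THRESHOLD = 0.5


def flag_second_language(
    wer_estimates: dict[str, list[float]], threshold: float = WER_THRESHOLD
) -> dict[str, bool]:
    """Map each audio dataset ID to True if it likely contains a second language.

    wer_estimates[dataset_id] holds one estimated WER per first-language ASR engine.
    Assumption: if the engines do poorly on average (mean estimated WER above the
    threshold), the speech is probably not in the language they were built for.
    Datasets with no estimates are not flagged.
    """
    return {
        dataset_id: bool(estimates) and mean(estimates) > threshold
        for dataset_id, estimates in wer_estimates.items()
    }


example = {
    "clip-001": [0.08, 0.12, 0.10],  # illustrative: engines transcribe the clip well
    "clip-002": [0.71, 0.83, 0.77],  # illustrative: every engine does poorly
}
print(flag_second_language(example))  # {'clip-001': False, 'clip-002': True}
```

Averaging across engines is one simple design choice; a stricter variant could require every engine's estimate to exceed the threshold, which reduces false alarms caused by one weak engine.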
-
Publication No.: US20240296829A1
Publication Date: 2024-09-05
Application No.: US18663831
Filing Date: 2024-05-14
Applicant: Amazon Technologies, Inc.
Inventor: Travis Grizzel
CPC classification number: G10L15/01, G06F3/017, G10L13/00, G10L15/18, G10L15/187, G10L15/24, G10L2015/088
Abstract: A system and method for associating motion data with utterance audio data for use with a speech processing system. A device, such as a wearable device, may be capable of capturing utterance audio data and sending it to a remote server for speech processing, for example for execution of a command represented in the utterance. The device may also capture motion data using motion sensors of the device. The motion data may correspond to gestures, such as head gestures, that may be interpreted by the speech processing system to determine and execute commands. The device may associate the motion data with the audio data so the remote server knows what motion data corresponds to what portion of audio data for purposes of interpreting and executing commands. Metadata sent with the audio data and/or motion data may include association data such as timestamps, session identifiers, message identifiers, etc.