-
Publication number: US20240257815A1
Publication date: 2024-08-01
Application number: US18632277
Application date: 2024-04-10
Applicant: Microsoft Technology Licensing, LLC
Inventor: Naoyuki KANDA , Takuya YOSHIOKA , Zhuo CHEN , Jinyu LI , Yashesh GAUR , Zhong MENG , Xiaofei WANG , Xiong XIAO
Abstract: The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
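Illustrative sketch (not taken from the patent text): the step that sorts words into transcript lines based on CC symbols could look roughly like the Python below. The token spelling `<cc>`, the function name `split_by_channel_change`, and the rule that each CC symbol advances to the next of two channels are assumptions made for illustration; the abstract does not specify them.

```python
# Hypothetical spelling of the channel change (CC) symbol; the abstract
# does not say how the symbol is represented.
CC = "<cc>"

def split_by_channel_change(tokens, num_channels=2):
    """Sort a serialized token stream into one transcript line per channel.

    Assumption: each CC symbol switches output to the next channel,
    wrapping around after the last one.
    """
    lines = [[] for _ in range(num_channels)]
    current = 0
    for token in tokens:
        if token == CC:
            # A CC symbol carries no text; it only changes the target line.
            current = (current + 1) % num_channels
        else:
            lines[current].append(token)
    return [" ".join(words) for words in lines]

# Two overlapping utterances, "hello how are you" and "hi good",
# interleaved in the order the words were spoken.
tokens = ["hello", CC, "hi", CC, "how", "are", CC, "good", CC, "you"]
transcript_lines = split_by_channel_change(tokens)
# transcript_lines == ["hello how are you", "hi good"]
```

Under these assumptions, adjacent words from different simultaneous speakers are always separated by a CC symbol, so a single pass over the stream is enough to recover the per-channel lines.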
-
Publication number: US12026753B2
Publication date: 2024-07-02
Application number: US17308624
Application date: 2021-05-05
Applicant: Google LLC
Inventor: Petar Aleksic , Pedro J. Moreno Mengibar
IPC: G06Q30/00 , G06Q30/0251 , G06Q30/0273 , G10L13/00 , G10L15/01 , G10L15/06 , G10L15/18 , G10L15/08 , G10L15/187 , G10L15/26
CPC classification number: G06Q30/0275 , G06Q30/0256 , G10L13/00 , G10L15/01 , G10L15/06 , G10L15/18 , G10L2015/088 , G10L15/187 , G10L15/26
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser. The method further includes generating a score for the candidate adword based on a likelihood of a speech recognizer generating, based on an utterance of the candidate adword, a transcription that includes a word that is associated with an expected pronunciation of the candidate adword. The method further includes classifying, based at least on the score, the candidate adword as an appropriate adword for use in a bidding process for advertisements that are selected based on a transcription of a speech query or as not an appropriate adword for use in the bidding process for advertisements that are selected based on the transcription of the speech query.
-
Publication number: US11942086B2
Publication date: 2024-03-26
Application number: US17125295
Application date: 2020-12-17
Inventor: Natsuki Saeki , Shoichi Araki , Masakatsu Hoshimi , Takahiro Kamai
IPC: G10L15/22 , G06Q30/016 , G10L15/06
CPC classification number: G10L15/22 , G10L15/06 , G06Q30/016
Abstract: A description support device for displaying information on a topic to be checked in an utterance by a user, the description support device includes: an inputter to acquire input information indicating an utterance sentence corresponding to the utterance; a controller to generate information indicating a check result of the topic for the utterance sentence; and a display to display information generated by the controller, wherein the display is configured to display a checklist indicating whether or not the topic is described in the utterance sentence indicated by the input information sequentially acquired by the inputter, and wherein the display is configured to display, according to a likelihood of each utterance sentence, display information including the utterance sentence, the likelihood defining the check result of the topic in the checklist.
-
Publication number: US11887603B2
Publication date: 2024-01-30
Application number: US17691698
Application date: 2022-03-10
Applicant: GOOGLE LLC
CPC classification number: G10L15/30 , G10L15/22 , G10L25/78 , H04L67/10 , G10L15/06 , G10L2015/088 , G10L2015/223 , H05K999/99
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance. The actions further include determining that the utterance likely includes a particular, predefined hotword. The actions further include transmitting (i) data indicating that the computing device likely received the particular, predefined hotword, (ii) data identifying the computing device, and (iii) data identifying a group of nearby computing devices that includes the computing device. The actions further include receiving an instruction to commence speech recognition processing on the audio data. The actions further include, in response to receiving the instruction to commence speech recognition processing on the audio data, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
-
Publication number: US11862166B2
Publication date: 2024-01-02
Application number: US17961848
Application date: 2022-10-07
Applicant: Samsung Electronics Co., Ltd.
Inventor: Nam-yeong Kwon , Kyung-mi Park
IPC: G10L15/22 , G10L15/06 , H04N21/422 , H04N21/439 , H04N21/482 , G06F3/16 , G10L15/02 , G10L15/10 , G10L15/187 , G10L15/08
CPC classification number: G10L15/22 , G06F3/167 , G10L15/02 , G10L15/06 , G10L15/10 , H04N21/42203 , H04N21/4394 , H04N21/482 , G10L15/187 , G10L2015/0638 , G10L2015/088 , G10L2015/221 , G10L2015/223 , G10L2015/225
Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.
-
Publication number: US11776533B2
Publication date: 2023-10-03
Application number: US17225997
Application date: 2021-04-08
Applicant: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud , Seyed M. Emami , Chris Wilson , Keyvan Mohajer
CPC classification number: G10L15/18 , G06F8/31 , G06F40/205 , G10L15/06 , G10L15/22 , H04M3/4938
Abstract: A method of building a natural language understanding application is provided. The method includes receiving at least one electronic record containing programming code and creating executable code from the programming code. Further, the executable code, when executed by a processor, causes the processor to create a parse and an interpretation of a sequence of input tokens, the programming code includes an interpret-block and the interpret-block includes an interpret-statement. Additionally, the interpret-statement includes a pattern expression and the interpret-statement includes an action statement.
-
Publication number: US20230290335A1
Publication date: 2023-09-14
Application number: US18318269
Application date: 2023-05-16
Inventor: John Paul LESSO , Toru IDO
CPC classification number: G10L15/06 , G10L19/26 , G10L25/78 , G10L2025/937
Abstract: A method of detecting live speech comprises: receiving a signal containing speech; obtaining a first component of the received signal in a first frequency band, wherein the first frequency band includes audio frequencies; and obtaining a second component of the received signal in a second frequency band higher than the first frequency band. Then, modulation of the first component of the received signal is detected; modulation of the second component of the received signal is detected; and the modulation of the first component of the received signal and the modulation of the second component of the received signal are compared. It may then be determined that the speech may not be live speech, if the modulation of the first component of the received signal differs from the modulation of the second component of the received signal.
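Illustrative sketch (not taken from the patent text): one way to compare the modulation of two frequency-band components is to correlate their frame-level amplitude envelopes. The Python below assumes the two band components have already been separated by filtering, which is not shown. The function names, the 160-sample frame length, the 0.5 correlation threshold, and the use of Pearson correlation are all assumptions; the abstract does not state how modulation is detected or compared, and the 3000 Hz tone only stands in for the higher band.

```python
import math

def frame_envelope(signal, frame_len):
    """Mean absolute amplitude per non-overlapping frame (a crude envelope)."""
    n_frames = len(signal) // frame_len
    return [
        sum(abs(x) for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def may_not_be_live(low_band, high_band, frame_len=160, threshold=0.5):
    """Flag the speech as possibly not live when the envelopes disagree."""
    corr = pearson(frame_envelope(low_band, frame_len),
                   frame_envelope(high_band, frame_len))
    return corr < threshold

def am_tone(carrier_hz, mod_hz, sample_rate=16000, seconds=1.0):
    """Synthetic amplitude-modulated tone used only as test input."""
    n = int(sample_rate * seconds)
    return [
        (1 + 0.8 * math.sin(2 * math.pi * mod_hz * t / sample_rate))
        * math.sin(2 * math.pi * carrier_hz * t / sample_rate)
        for t in range(n)
    ]

low = am_tone(200, 4)          # lower band, modulated at 4 Hz
high_same = am_tone(3000, 4)   # higher band with the same modulation
high_diff = am_tone(3000, 7)   # higher band with a different modulation

same_flag = may_not_be_live(low, high_same)   # envelopes agree
diff_flag = may_not_be_live(low, high_diff)   # envelopes differ
```

With these synthetic inputs, matching modulation in both bands is not flagged, while differing modulation is flagged as possibly not live speech.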
-
Publication number: US11741970B2
Publication date: 2023-08-29
Application number: US17570246
Application date: 2022-01-06
Applicant: Google LLC
Inventor: Andrew E. Rubin , Johan Schalkwyk , Maria Carolina Parada San Martin
CPC classification number: G10L17/24 , G06F21/32 , G06F21/46 , G10L15/06 , G10L15/08 , G10L15/22 , G10L25/51 , G10L2015/0638 , G10L2015/088 , G10L2015/225
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, and providing a representation of the hotword suitability score for display to the user.
-
Publication number: US20230178096A1
Publication date: 2023-06-08
Application number: US17911078
Application date: 2021-02-26
Applicant: COCHL, INC.
Inventor: Ilyoung JEONG , Hyungui LIM , Yoonchang HAN , Subin LEE , Jeongsoo PARK , Donmoon LEE
Abstract: Proposed is an audio data identification apparatus for collecting random audio data and identifying an audio resource obtained by extracting any one section of the collected audio data. The audio data identification apparatus includes: a communication unit that collects and transmits the random audio data; and a control unit that identifies the collected audio data. The control unit includes: a parsing unit that parses the collected audio data into predetermined units; an extraction unit that selects, as the audio resource, any one of a plurality of parsed sections of the audio data; a matching unit that matches identification information of the audio resource via a pre-loaded artificial intelligence algorithm; and a verification unit that verifies the identification information matched to the audio resource.
-
Publication number: US11664027B2
Publication date: 2023-05-30
Application number: US17459327
Application date: 2021-08-27
Applicant: Samsung Electronics Co., Ltd.
Inventor: Chakladar Subhojit , Sang Hoon Lee , Ji Min Lee
IPC: G10L15/22 , G10L15/065 , G10L15/06 , G10L15/30 , G10L15/183
CPC classification number: G10L15/22 , G10L15/06 , G10L15/065 , G10L15/183 , G10L15/30 , G10L2015/223
Abstract: Disclosed is a portable communication device, including a display, at least one microphone, a memory, and a processor operably connected to the display, the at least one microphone and the memory, wherein the processor is configured to display guide information, via the display, in response to a user input, the guide information including a first display object related to guiding a user voice input for generation of a new voice command and a second display object related to at least one application executed by the new voice command via the portable communication device, receive audio data corresponding to the first display object from a user through the at least one microphone, generate the new voice command corresponding to the audio data, and store, in the memory, the new voice command corresponding to the received audio data and mapping information indicating that the new voice command and the at least one application are mapped.
-