TRAINING AND USING A TRANSCRIPT GENERATION MODEL ON A MULTI-SPEAKER AUDIO STREAM

    Publication number: US20240257815A1

    Publication date: 2024-08-01

    Application number: US18632277

    Filing date: 2024-04-10

    CPC classification number: G10L17/04 G10L15/06 G10L15/26

    Abstract: The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
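The step that sorts words into transcript lines based on CC symbols can be sketched as follows. This is a minimal illustration, not the patented method: it assumes only two virtual channels, assumes that each CC symbol switches the current channel, and uses a hypothetical token spelling `<cc>` and a hypothetical function name `split_channels`. None of these details are given in the abstract.

```python
CC = "<cc>"  # hypothetical spelling of the channel change symbol


def split_channels(tokens, num_channels=2):
    """Sort a serialized token stream into one transcript line per channel.

    Assumption: every CC symbol moves output to the next channel
    (wrapping around), so with two channels it simply toggles.
    """
    lines = [[] for _ in range(num_channels)]
    current = 0
    for tok in tokens:
        if tok == CC:
            # A CC symbol carries no text; it only changes the target line.
            current = (current + 1) % num_channels
        else:
            lines[current].append(tok)
    return [" ".join(words) for words in lines]


# Two speakers overlapping: "hello how are you" and "hi good thanks".
tokens = ["hello", "how", CC, "hi", CC, "are", "you", CC, "good", "thanks"]
print(split_channels(tokens))
# → ['hello how are you', 'hi good thanks']
```

Keeping the CC symbol as an ordinary output token means the recognizer emits a single stream, and the separation into lines becomes a cheap post-processing pass like the one above.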

    Description support device and description support method

    Publication number: US11942086B2

    Publication date: 2024-03-26

    Application number: US17125295

    Filing date: 2020-12-17

    CPC classification number: G10L15/22 G10L15/06 G06Q30/016

    Abstract: A description support device for displaying information on a topic to be checked in an utterance by a user, the description support device includes: an inputter to acquire input information indicating an utterance sentence corresponding to the utterance; a controller to generate information indicating a check result of the topic for the utterance sentence; and a display to display information generated by the controller, wherein the display is configured to display a checklist indicating whether or not the topic is described in the utterance sentence indicated by the input information sequentially acquired by the inputter, and wherein the display is configured to display, according to a likelihood of each utterance sentence, display information including the utterance sentence, the likelihood defining the check result of the topic in the checklist.

    Hotword detection on multiple devices

    Publication number: US11887603B2

    Publication date: 2024-01-30

    Application number: US17691698

    Filing date: 2022-03-10

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance. The actions further include determining that the utterance likely includes a particular, predefined hotword. The actions further include transmitting (i) data indicating that the computing device likely received the particular, predefined hotword, (ii) data identifying the computing device, and (iii) data identifying a group of nearby computing devices that includes the computing device. The actions further include receiving an instruction to commence speech recognition processing on the audio data. The actions further include, in response to receiving the instruction to commence speech recognition processing on the audio data, processing at least a portion of the audio data using an automated speech recognizer on the computing device.

    DETECTION OF LIVE SPEECH
    Invention publication

    Publication number: US20230290335A1

    Publication date: 2023-09-14

    Application number: US18318269

    Filing date: 2023-05-16

    CPC classification number: G10L15/06 G10L19/26 G10L25/78 G10L2025/937

    Abstract: A method of detecting live speech comprises: receiving a signal containing speech; obtaining a first component of the received signal in a first frequency band, wherein the first frequency band includes audio frequencies; and obtaining a second component of the received signal in a second frequency band higher than the first frequency band. Then, modulation of the first component of the received signal is detected; modulation of the second component of the received signal is detected; and the modulation of the first component of the received signal and the modulation of the second component of the received signal are compared. It may then be determined that the speech may not be live speech, if the modulation of the first component of the received signal differs from the modulation of the second component of the received signal.
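The comparison of the two modulations can be illustrated with a short sketch. The abstract does not say how modulation is detected or compared, so everything here is an assumption: the two band components are taken as already extracted, modulation is approximated by a block-wise RMS envelope, similarity is measured with Pearson correlation, and the names `envelope`, `pearson`, `may_not_be_live` and the threshold `min_corr` are hypothetical.

```python
import math


def envelope(samples, block=64):
    """Crude modulation estimate: RMS of each non-overlapping block."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + block]) / block)
        for i in range(0, len(samples) - block + 1, block)
    ]


def pearson(a, b):
    """Pearson correlation of two sequences (0.0 if either is constant)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def may_not_be_live(low_band, high_band, min_corr=0.5):
    """Flag the speech when the two band envelopes differ.

    Assumption: "differs" is read as an envelope correlation below
    the illustrative threshold min_corr.
    """
    return pearson(envelope(low_band), envelope(high_band)) < min_corr
```

In this sketch, a signal whose higher band follows the same amplitude pattern as the audio band is not flagged, while a signal whose higher band carries an unrelated or missing modulation is flagged as possibly not live.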

    AUDIO DATA IDENTIFICATION APPARATUS
    Invention publication

    Publication number: US20230178096A1

    Publication date: 2023-06-08

    Application number: US17911078

    Filing date: 2021-02-26

    Applicant: COCHL,INC.

    CPC classification number: G10L25/51 G10L15/06 G10L25/30

    Abstract: Proposed is an audio data identification apparatus for collecting random audio data and identifying an audio resource obtained by extracting any one section of the collected audio data. The audio data identification apparatus includes: a communication unit that collects and transmits the random audio data; and a control unit that identifies the collected audio data. The control unit includes: a parsing unit that parses the collected audio data into predetermined units; an extraction unit that selects, as the audio resource, any one of a plurality of parsed sections of the audio data; a matching unit that matches identification information of the audio resource via a pre-loaded artificial intelligence algorithm; and a verification unit that verifies the identification information matched to the audio resource.

    Method of providing voice command and electronic device supporting the same

    Publication number: US11664027B2

    Publication date: 2023-05-30

    Application number: US17459327

    Filing date: 2021-08-27

    Abstract: Disclosed is a portable communication device, including a display, at least one microphone, a memory, and a processor operably connected to the display, the at least one microphone and the memory, wherein the processor is configured to display guide information, via the display, in response to a user input, the guide information including a first display object related to guiding a user voice input for generation of a new voice command and a second display object related to at least one application executed by the new voice command via the portable communication device, receive audio data corresponding to the first display object from a user through the at least one microphone, generate the new voice command corresponding to the audio data, and store, in the memory, the new voice command corresponding to the received audio data and mapping information indicating that the new voice command and the at least one application are mapped.
