METHOD AND SYSTEM FOR ENRICHING DIGITAL CONTENT REPRESENTATIVE OF A CONVERSATION

    Publication No.: US20220254336A1

    Publication date: 2022-08-11

    Application No.: US17597548

    Filing date: 2020-08-12

    Abstract: The method (3000) of enriching digital content representative of a conversation comprises, in an iterative manner: a step (3005) of capturing an audio signal representative of a voice message; a step (3010) of segmenting the voice message into a segment, said segmentation step comprising a silence detection step, the segment being obtained as a function of the detection of a silence; a step (3015) of converting the audio segment into text, called a “contribution”; and a step (3020) of storing a contribution in a memory; then: a step (3025) of detecting user sentiment towards at least one stored contribution; a step (3030) of associating, in a memory and in relation to at least one stored contribution, at least one attribute corresponding to at least one detected sentiment; and a step (3035) of displaying at least one stored contribution and at least one attribute with respect to said at least one contribution.
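The segmentation step (3010) relies on detecting a silence. The abstract does not say how the silence is detected, so the following is only a minimal sketch assuming a simple frame-energy threshold; the function names, the frame length, and the threshold values are all hypothetical and not taken from the patent.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    return energies


def segment_on_silence(
    samples: Sequence[float],
    frame_len: int = 160,             # assumed: 10 ms at 16 kHz
    energy_threshold: float = 1e-4,   # assumed: frames below this are "silent"
    min_silence_frames: int = 3,      # assumed: silence must last this long
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of speech separated by sustained silence."""
    segments = []
    seg_start = None    # index of the first voiced frame of the open segment
    last_voiced = None  # index of the most recent voiced frame
    silent_run = 0      # consecutive silent frames seen inside the open segment
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= energy_threshold:
            if seg_start is None:
                seg_start = i
            last_voiced = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Sustained silence: close the segment at the last voiced frame.
                segments.append((seg_start * frame_len, (last_voiced + 1) * frame_len))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        # The audio ended while a segment was still open.
        end = min((last_voiced + 1) * frame_len, len(samples))
        segments.append((seg_start * frame_len, end))
    return segments
```

Each returned range would then be passed to the speech-to-text step (3015). A short pause below `min_silence_frames` does not split the message, which keeps brief hesitations inside one contribution.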

    Voice recognition system
    Granted patent

    Publication No.: US11410660B2

    Publication date: 2022-08-09

    Application No.: US16837250

    Filing date: 2020-04-01

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
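The idea of carrying context weights from one segment to the next can be illustrated with a toy example. This is a rough sketch under strong assumptions: the context vocabularies, the additive boost, and the scoring rule are all hypothetical and are not described in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical contexts, each with a small vocabulary that signals it.
CONTEXT_VOCAB = {
    "tennis": {"tennis", "serve", "match", "racket"},
    "surfing": {"surf", "wave", "board"},
}


def adjust_weights(weights: Dict[str, float], transcript: str,
                   boost: float = 0.5) -> Dict[str, float]:
    """Raise the weight of every context whose vocabulary appears in the transcript."""
    words = set(transcript.lower().split())
    updated = dict(weights)
    for context, vocab in CONTEXT_VOCAB.items():
        if words & vocab:
            updated[context] = updated.get(context, 0.0) + boost
    return updated


def pick_candidate(candidates: List[Tuple[str, float]],
                   weights: Dict[str, float]) -> str:
    """Choose the candidate with the best base score plus context bonus."""
    def score(candidate: Tuple[str, float]) -> float:
        text, base_score = candidate
        words = set(text.lower().split())
        bonus = sum(w for ctx, w in weights.items() if words & CONTEXT_VOCAB[ctx])
        return base_score + bonus
    return max(candidates, key=score)[0]


# The first segment mentions tennis, so the tennis context gains weight
# before the second segment's candidates are compared.
weights = adjust_weights({"tennis": 0.0, "surfing": 0.0}, "how many tennis")
second = pick_candidate([("serve records", 0.4), ("shelf records", 0.5)], weights)
```

Without the adjusted weights, the acoustically stronger candidate would win; with them, the candidate that fits the earlier context is preferred.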

    Dual use of acoustic model in speech-to-text framework

    Publication No.: US11373655B2

    Publication date: 2022-06-28

    Application No.: US17498811

    Filing date: 2021-10-12

    Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique, including to divide a speech data set into data chunks representing chunks of speech audio, use an acoustic model with each data chunk to identify pauses in the speech audio, and analyze a length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations, including to divide the speech data set into data segments that each represent segments of the speech audio based on the candidate set of likely sentence pauses, use the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyze the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generate a transcript of the speech data set based at least on the candidate sets of words likely spoken.
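The pause-length analysis can be pictured as a filter over the pauses already found by the acoustic model. The sketch below assumes the pauses arrive as (start, end) pairs in milliseconds and uses a hypothetical rule (duration at least 1.5 times the median pause) to pick likely sentence pauses; the abstract does not specify the actual criterion.

```python
from statistics import median
from typing import List, Tuple

Pause = Tuple[int, int]  # (start_ms, end_ms)


def likely_sentence_pauses(pauses: List[Pause], ratio: float = 1.5) -> List[Pause]:
    """Keep pauses whose duration is at least `ratio` times the median duration."""
    if not pauses:
        return []
    durations = [end - start for start, end in pauses]
    cutoff = ratio * median(durations)
    return [p for p, d in zip(pauses, durations) if d >= cutoff]


def split_points(pauses: List[Pause]) -> List[int]:
    """Place each segment boundary at the midpoint of a selected pause."""
    return [(start + end) // 2 for start, end in pauses]


pauses = [(1000, 1200), (2000, 2900), (3500, 3700), (5000, 6000), (7000, 7200)]
candidates = likely_sentence_pauses(pauses)
boundaries = split_points(candidates)
```

A relative cutoff adapts to fast and slow speakers, which a fixed duration would not; the data segments for the speech-to-text pass would then be cut at `boundaries`.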

    USER SPEECH PROFILE MANAGEMENT
    Patent application

    Publication No.: US20220180859A1

    Publication date: 2022-06-09

    Application No.: US17115158

    Filing date: 2020-12-08

    Abstract: A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.
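The match-or-enroll logic in the last two sentences can be sketched as follows. This is an illustration only: representing profiles and audio feature data sets as plain vectors, comparing them by cosine similarity, averaging to build a new profile, and the `profile_N` naming are all assumptions that the abstract does not state.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, or 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def match_or_enroll(profiles: Dict[str, Vector], feature_sets: List[Vector],
                    threshold: float = 0.9) -> str:
    """Match one feature set of a talker-homogeneous segment against known
    profiles; if nothing matches, enroll a new profile built from all sets."""
    probe = feature_sets[0]
    best_id, best_score = None, threshold
    for profile_id, profile in profiles.items():
        score = cosine(probe, profile)
        if score >= best_score:
            best_id, best_score = profile_id, score
    if best_id is None:
        best_id = f"profile_{len(profiles) + 1}"  # hypothetical naming scheme
        profiles[best_id] = mean_vector(feature_sets)
    return best_id
```

For example, with `profiles = {"alice": [1.0, 0.0]}`, a segment whose first feature set is `[0.9, 0.1]` matches `alice`, while one whose feature sets point in a different direction causes a new profile to be added.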

    EXAMPLE-BASED VOICE BOT DEVELOPMENT TECHNIQUES

    Publication No.: US20220180857A1

    Publication date: 2022-06-09

    Application No.: US17112418

    Filing date: 2020-12-04

    Applicant: Google LLC

    Abstract: Implementations are directed to providing a voice bot development platform that enables a third-party developer to train a voice bot based on training instance(s). The training instance(s) can each include training input and training output. The training input can include a portion of a corresponding conversation and a prior context of the corresponding conversation. The training output can include a corresponding ground truth response to the portion of the corresponding conversation. Subsequent to training, the voice bot can be deployed for conducting conversations on behalf of a third-party. In some implementations, the voice bot is further trained based on a corresponding feature emphasis input that attentions the voice bot to a particular feature of the portion of the corresponding conversation. In some additional or alternative implementations, the voice bot is further trained to interact with third-party system(s) via remote procedure calls (RPCs).

    Audio device and computer readable program

    Publication No.: US11315561B2

    Publication date: 2022-04-26

    Application No.: US16638540

    Filing date: 2018-03-14

    Inventor: Yukihiro Yoshida

    Abstract: [Problem] To provide an audio device having a voice operation receiving function with which the state of a voice recognition process can be notified in detail without affecting an audio playback environment, and which is inexpensive and has an excellent degree of freedom in design. [Solution] A wireless speaker 1 has a voice operation receiving function that receives an operation by a voice input into a microphone 11. The wireless speaker comprises: an LED 12; an LED control unit 18 that subjects the LED 12 to PWM control; and a lighting pattern storage unit 17 that stores a lighting pattern in which the brightness is changed on a time axis for each state of a voice recognition process. The LED control unit 18 subjects the LED 12 to PWM control in accordance with the lighting pattern stored in the lighting pattern storage unit 17 corresponding to the state of the voice recognition process performed on the voice input into the microphone 11.
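One way to picture the lighting pattern storage unit is as a table that maps each recognition state to a looping brightness timeline, which the control unit samples to get a PWM duty cycle. The state names, durations, and brightness levels below are invented for illustration and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical patterns: each state maps to (duration_ms, brightness) steps,
# where brightness is used directly as the PWM duty cycle in the range 0.0-1.0.
LIGHTING_PATTERNS: Dict[str, List[Tuple[int, float]]] = {
    "listening": [(500, 1.0), (500, 0.2)],   # slow pulse
    "processing": [(100, 1.0), (100, 0.0)],  # fast blink
    "error": [(1000, 1.0)],                  # steady on
}


def duty_cycle(state: str, t_ms: int) -> float:
    """Duty cycle for `state` at time `t_ms`; the pattern repeats indefinitely."""
    pattern = LIGHTING_PATTERNS[state]
    period = sum(duration for duration, _ in pattern)
    t = t_ms % period
    for duration, brightness in pattern:
        if t < duration:
            return brightness
        t -= duration
    raise AssertionError("unreachable: t is always inside the period")
```

Because the state is conveyed by how the brightness changes over time, a single LED can signal several states without any audio cue.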

    SYSTEM AND METHOD OF TEXT ZONING
    Patent application

    Publication No.: US20220122609A1

    Publication date: 2022-04-21

    Application No.: US17567491

    Filing date: 2022-01-03

    Abstract: A method of zoning a transcription of audio data includes separating the transcription of audio data into a plurality of utterances. A probability that each word in an utterance is a meaning unit boundary is calculated. The utterance is split into two new utterances at a word with a maximum calculated probability. At least one of the two new utterances that is shorter than a maximum utterance threshold is identified as a meaning unit.
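The split-at-maximum-probability step can be sketched as a recursive function. Two things here are assumptions: the abstract describes a single split, and repeating it until every piece is below the threshold is one plausible reading; and the boundary probability model is unspecified, so a toy stand-in (`toy_boundary_prob`, which favors splitting before a few connective words) is used purely so the example runs.

```python
from typing import Callable, List

Utterance = List[str]

# Hypothetical stand-in for a real boundary model.
CONNECTIVES = {"so", "and", "but"}


def toy_boundary_prob(words: Utterance, i: int) -> float:
    """Toy probability that a meaning unit boundary falls just before words[i]."""
    return 0.9 if words[i].lower() in CONNECTIVES else 0.1


def zone(words: Utterance,
         boundary_prob: Callable[[Utterance, int], float],
         max_len: int) -> List[Utterance]:
    """Split an utterance at its most probable boundary until each piece is
    shorter than max_len words; each resulting piece is a meaning unit."""
    if len(words) < max_len or len(words) < 2:
        return [words]
    best = max(range(1, len(words)), key=lambda i: boundary_prob(words, i))
    return (zone(words[:best], boundary_prob, max_len)
            + zone(words[best:], boundary_prob, max_len))


units = zone("thanks for calling so how can I help".split(), toy_boundary_prob, 6)
```

In this run the eight-word utterance is split once, before "so", and both halves fall under the six-word threshold.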
