SPEAKER RECOGNITION
    Invention application
    Status: Under examination (published)

    Publication No.: US20170092278A1

    Publication date: 2017-03-30

    Application No.: US15163392

    Filing date: 2016-05-24

    Applicant: Apple Inc.

    CPC classification number: G10L17/24 G10L15/22 G10L17/04 G10L17/08

    Abstract: A non-transitory computer-readable storage medium stores one or more programs including instructions which, when executed by an electronic device, cause the electronic device to receive natural-language speech input from one of a plurality of users, the natural-language speech input having a set of acoustic properties; and determine whether the natural-language speech input corresponds to both a user-customizable lexical trigger and a set of acoustic properties associated with the user; wherein, in accordance with a determination that the natural-language speech input corresponds to both a user-customizable lexical trigger and a set of acoustic properties associated with the user, the device invokes a virtual assistant; and in accordance with a determination that either the natural-language speech input fails to correspond to a user-customizable lexical trigger or the natural-language speech input fails to have a set of acoustic properties associated with the user, the device forgoes invocation of a virtual assistant.
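    The abstract does not say how the lexical trigger or the acoustic properties are compared. The Python below is a minimal sketch of the two-condition gate under assumed choices: each enrolled user has a hypothetical `SpeakerProfile` holding a custom trigger phrase and a fixed-length voice embedding, the trigger must open the transcript, and the voice must reach an illustrative cosine-similarity threshold of 0.8. The names, vectors, and threshold are made up for illustration and are not taken from the filing. If no single user satisfies both checks, the device forgoes invocation.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SpeakerProfile:
    """Hypothetical enrollment record for one user of a shared device."""
    name: str
    trigger_phrase: str           # user-customizable lexical trigger
    voiceprint: Sequence[float]   # assumed fixed-length speaker embedding


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize(text: str) -> str:
    # Lowercase, turn punctuation into spaces, and collapse runs of whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def match_speaker(transcript: str, embedding: Sequence[float],
                  profiles: Sequence[SpeakerProfile],
                  threshold: float = 0.8) -> Optional[SpeakerProfile]:
    """Return the first profile whose trigger AND voiceprint both match, else None."""
    heard = normalize(transcript)
    for profile in profiles:
        trigger = normalize(profile.trigger_phrase)
        lexical_ok = heard == trigger or heard.startswith(trigger + " ")
        acoustic_ok = cosine_similarity(embedding, profile.voiceprint) >= threshold
        if lexical_ok and acoustic_ok:
            return profile
    return None


# Toy enrollment data; the names, phrases, and 3-D vectors are made up.
PROFILES = [
    SpeakerProfile("alice", "Hey Nova", (0.9, 0.1, 0.0)),
    SpeakerProfile("bob", "OK Computer", (0.0, 0.2, 0.95)),
]

for text, voice in [("Hey Nova, what time is it?", (0.88, 0.12, 0.01)),
                    ("Hey Nova, what time is it?", (0.0, 0.2, 0.95))]:
    who = match_speaker(text, voice, PROFILES)
    print("invoke for " + who.name if who else "forgo invocation")
# invoke for alice
# forgo invocation
```

    Requiring both checks to pass for the same profile is what stops one user's trigger phrase, spoken in another person's voice, from invoking the assistant, as the second call shows.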

    INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION

    Publication No.: US20190278841A1

    Publication date: 2019-09-12

    Application No.: US16024425

    Filing date: 2018-06-29

    Applicant: Apple Inc.

    Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
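    The abstract leaves the label inventory and the labeling model unspecified. As a minimal sketch, assuming a hypothetical inventory of three edit types (KEEP, DELETE, REPLACE) plus a flag that joins a piece to the previous one without a space, the step that applies a label sequence to the spoken-form token sequence could look like the Python below. The names `Op`, `Label`, and `apply_labels` are illustrative rather than taken from the filing, and the labels are written by hand where the described system would predict them from a feature representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class Op(Enum):
    """Hypothetical set of predetermined edit-operation types."""
    KEEP = "keep"        # copy the spoken-form token unchanged
    DELETE = "delete"    # drop the token from the written form
    REPLACE = "replace"  # emit a written-form string instead of the token


@dataclass(frozen=True)
class Label:
    op: Op
    text: Optional[str] = None  # replacement string, used only by REPLACE
    join_left: bool = False     # glue to the previous output piece, no space


def apply_labels(tokens: Sequence[str], labels: Sequence[Label]) -> str:
    """Apply one edit label per spoken-form token and return the written form."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    pieces: List[str] = []
    for token, label in zip(tokens, labels):
        if label.op is Op.DELETE:
            continue
        if label.op is Op.KEEP:
            piece = token
        else:  # Op.REPLACE
            if label.text is None:
                raise ValueError(f"REPLACE label for {token!r} has no text")
            piece = label.text
        if label.join_left and pieces:
            pieces[-1] += piece
        else:
            pieces.append(piece)
    return " ".join(pieces)


# Hand-written labels standing in for a model's predicted label sequence.
KEEP, DELETE = Label(Op.KEEP), Label(Op.DELETE)
SPOKEN = "meet me at three thirty p m".split()
LABELS = [KEEP, KEEP, KEEP,
          Label(Op.REPLACE, "3"),
          Label(Op.REPLACE, ":30", join_left=True),
          Label(Op.REPLACE, "PM"),
          DELETE]
print(apply_labels(SPOKEN, LABELS))  # meet me at 3:30 PM
```

    Tagging each token with one edit, rather than generating the written form from scratch, keeps the output aligned with the input tokens, so any token labeled KEEP is copied through verbatim.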
