Method and apparatus for automatic speaker-based speech clustering
    2.
    Granted patent (status: in force)

    Publication No.: US09368109B2

    Publication date: 2016-06-14

    Application No.: US13907364

    Filing date: 2013-05-31

    CPC classification: G10L15/063 G10L17/04

    Abstract: Reliable speaker-based clustering of speech utterances allows improved speaker recognition and speaker-based speech segmentation. According to at least one example embodiment, an iterative bottom-up speaker-based clustering approach employs voiceprints of speech utterances, such as i-vectors. At each iteration, a clustering confidence score in terms of Silhouette Width Criterion (SWC) values is evaluated, and a pair of nearest clusters is merged into a single cluster. The pair of nearest clusters merged is determined based on a similarity score indicative of similarity between voiceprints associated with different clusters. A final clustering pattern is then determined as a set of clusters associated with an iteration corresponding to the highest clustering confidence score evaluated. The SWC used may further be a modified SWC enabling detection of an early stop of the iterative approach.
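
The iterative procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the cosine similarity, the average-linkage rule for picking the nearest pair, the standard (unmodified) silhouette width, and the names `silhouette_width` and `cluster_voiceprints` are all assumptions made for illustration, and small hand-made vectors stand in for i-vectors.

```python
import numpy as np


def silhouette_width(dist, labels):
    """Mean silhouette width over all points; singleton clusters score 0."""
    ids = np.unique(labels)
    widths = []
    for i, own_id in enumerate(labels):
        own = labels == own_id
        if own.sum() == 1:
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = dist[i, own].sum() / (own.sum() - 1)
        # b: mean distance to the closest other cluster
        b = min(dist[i, labels == c].mean() for c in ids if c != own_id)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(widths))


def cluster_voiceprints(voiceprints):
    """Bottom-up clustering; returns the labels of the best-scoring iteration."""
    x = np.asarray(voiceprints, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T          # assumed similarity score: cosine
    dist = 1.0 - sim
    labels = np.arange(len(x))   # start with one cluster per utterance
    best_score, best_labels = -np.inf, labels.copy()
    while True:
        ids = np.unique(labels)
        score = silhouette_width(dist, labels)   # confidence of this iteration
        if score > best_score:
            best_score, best_labels = score, labels.copy()
        if len(ids) <= 2:        # silhouette is undefined for a single cluster
            break
        # nearest pair = highest average cross-cluster similarity (assumed rule)
        pairs = [(sim[np.ix_(labels == p, labels == q)].mean(), p, q)
                 for k, p in enumerate(ids) for q in ids[k + 1:]]
        _, keep, drop = max(pairs)
        labels[labels == drop] = keep
    return best_labels, best_score


# Two well-separated toy "speakers", three utterances each.
toy = [[1, 0.05, 0], [1, -0.05, 0.02], [0.98, 0, -0.03],
       [0, 1, 0.04], [0.03, 1, -0.02], [-0.02, 0.97, 0]]
labels, score = cluster_voiceprints(toy)
```

The sketch keeps merging down to two clusters and then returns the partition with the highest score, which mirrors the "highest clustering confidence score" selection in the abstract; the early-stop behaviour of the modified SWC is not modelled here.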

    Speaker verification methods and apparatus

    Publication No.: US09728191B2

    Publication date: 2017-08-08

    Application No.: US14838010

    Filing date: 2015-08-27

    IPC classification: G10L15/00 G10L17/08

    CPC classification: G10L17/08

    Abstract: Techniques for automatically identifying a speaker in a conversation as a known person based on processing of audio of the speaker's voice to extract characteristics of that voice and on an automated comparison of those characteristics to known characteristics of the known person's voice. A speaker segmentation process may be performed on audio of the conversation to produce, for each speaker in the conversation, a segment that includes the audio of that speaker. Audio of each of the segments may then be processed to extract characteristics of that speaker's voice. The characteristics derived from each segment (and thus for multiple speakers) may then be compared to characteristics of the known person's voice to determine whether the speaker for that segment is the known person. For each segment, a degree of match between the voice characteristics of the speaker and the voice characteristics of the known person may be calculated.
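
The per-segment comparison described in this abstract might look like the following Python sketch. The abstract does not say how the degree of match is computed, so the cosine score, the 0.8 threshold, and the names `match_degree` and `identify_known_speaker` are hypothetical choices; the feature vectors are assumed to come from an earlier segmentation and extraction step that is not shown.

```python
import numpy as np


def match_degree(segment_features, known_features):
    """Cosine similarity between two voice-characteristic vectors (assumed metric)."""
    a = np.asarray(segment_features, dtype=float)
    b = np.asarray(known_features, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_known_speaker(segments, known_features, threshold=0.8):
    """Score every segment against the known person and list those that pass."""
    scores = {name: match_degree(features, known_features)
              for name, features in segments.items()}
    matches = [name for name, value in scores.items() if value >= threshold]
    return scores, matches


# One feature vector per speaker segment (toy values).
segments = {"segment_a": [1.0, 0.0, 0.1], "segment_b": [0.0, 1.0, 0.0]}
scores, matches = identify_known_speaker(segments, known_features=[1.0, 0.0, 0.0])
```

Returning the raw scores alongside the thresholded matches keeps the "degree of match" for each segment available to the caller rather than reducing it to a yes/no decision.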

    Method and Apparatus for Performing Speaker Recognition
    6.
    Patent application (status: in force)

    Publication No.: US20160086607A1

    Publication date: 2016-03-24

    Application No.: US14489996

    Filing date: 2014-09-18

    IPC classification: G10L17/12

    Abstract: Embodiments of the present invention perform speaker identification and verification by first prompting a user to speak a phrase that includes a common phrase component and a personal identifier. Then, the embodiments decompose the spoken phrase to locate the personal identifier. Finally, the embodiments identify and verify the user based on the results of the decomposing.

    Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis
    8.
    Granted patent (status: in force)

    Publication No.: US09373330B2

    Publication date: 2016-06-21

    Application No.: US14454169

    Filing date: 2014-08-07

    CPC classification: G10L17/06

    Abstract: A method for performing speaker recognition comprises: estimating respective uncertainties of acoustic coverage of respective speech utterance(s) by first and second speakers, the acoustic coverage representing respective sounds used by the speakers when speaking; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the speakers; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the speakers; and computing a score between the speech utterance(s) by the speakers in a manner that leverages the respective uncertainties of the acoustic coverage during the comparison, the score being indicative of a likelihood that the speakers are the same speaker.
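
The two "discarding dependencies" steps in this abstract can be read as diagonal approximations of a covariance matrix and of its inverse. The Python sketch below illustrates only that idea under loud assumptions: the uncertainty is modelled as a small covariance matrix, its off-diagonal entries stand in for the dependencies, and `uncertainty_weighted_score` is a hypothetical Gaussian-style comparison, not the PLDA scoring named in the title.

```python
import numpy as np


def diagonal_uncertainty(cov):
    """Memory-efficient form: keep the variances, drop the off-diagonal terms."""
    return np.diag(np.asarray(cov, dtype=float)).copy()


def diagonal_precision(cov):
    """Compute-efficient form: invert the full matrix, then keep its diagonal."""
    return np.diag(np.linalg.inv(np.asarray(cov, dtype=float))).copy()


def uncertainty_weighted_score(mean_a, prec_a, mean_b, prec_b):
    """Hypothetical score: log-density (up to a constant) of the mean difference,
    where each dimension's variance is the sum of the two utterances' variances."""
    var = 1.0 / np.asarray(prec_a) + 1.0 / np.asarray(prec_b)
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    return float(-0.5 * np.sum(diff ** 2 / var + np.log(var)))


cov = [[2.0, 0.5], [0.5, 1.0]]
variances = diagonal_uncertainty(cov)    # stores n values instead of n * n
precisions = diagonal_precision(cov)
same = uncertainty_weighted_score([0.1, 0.2], precisions, [0.1, 0.2], precisions)
different = uncertainty_weighted_score([0.1, 0.2], precisions, [2.0, -1.5], precisions)
```

Note that inverting first and then dropping the off-diagonal terms gives a different result from inverting the diagonal alone, which is why the two representations are kept as separate functions.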

    FAST SPEAKER RECOGNITION SCORING USING I-VECTOR POSTERIORS AND PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS
    9.
    Patent application (status: in force)

    Publication No.: US20160042739A1

    Publication date: 2016-02-11

    Application No.: US14454169

    Filing date: 2014-08-07

    IPC classification: G10L17/06

    CPC classification: G10L17/06

    Abstract: A method for performing speaker recognition comprises: estimating respective uncertainties of acoustic coverage of respective speech utterance(s) by first and second speakers, the acoustic coverage representing respective sounds used by the speakers when speaking; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the speakers; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the speakers; and computing a score between the speech utterance(s) by the speakers in a manner that leverages the respective uncertainties of the acoustic coverage during the comparison, the score being indicative of a likelihood that the speakers are the same speaker.

    Watermarking of Synthetic Speech
    10.
    Patent application

    Publication No.: US20210050024A1

    Publication date: 2021-02-18

    Application No.: US16538423

    Filing date: 2019-08-12

    Abstract: An audio watermark is embedded in synthetic speech, such as synthetic speech created using text-to-speech (TTS) synthesis. Such audio watermarks can, for example, be used to increase the accuracy of voice biometric (VB) and other systems in distinguishing synthetic speech from human speech. In addition to its use in voice biometrics, such audio watermarking can prevent misuse of human quality TTS, or other synthetic speech, in a variety of other contexts, such as incriminating recordings, spam messages, contact center denial of service, and protection of personal information in contact centers not utilizing VB.
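
The abstract does not state which watermarking scheme is used. As one possible illustration, the Python sketch below embeds a keyed pseudo-random sequence into an audio signal and detects it by correlation (a simple spread-spectrum approach). The functions `keyed_sequence`, `embed_watermark`, and `detect_watermark`, the strength of 0.02, and the threshold of 0.01 are assumptions for this toy example; random noise stands in for TTS output, and a practical watermark would need to be inaudible and robust to coding and channel effects.

```python
import numpy as np


def keyed_sequence(key, length):
    """Pseudo-random +/-1 sequence that only holders of the key can regenerate."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)


def embed_watermark(audio, key, strength=0.02):
    """Add a low-amplitude keyed sequence to the (synthetic) speech samples."""
    audio = np.asarray(audio, dtype=float)
    return audio + strength * keyed_sequence(key, len(audio))


def detect_watermark(audio, key, threshold=0.01):
    """Correlate with the keyed sequence; marked audio correlates near `strength`."""
    audio = np.asarray(audio, dtype=float)
    correlation = float(np.mean(audio * keyed_sequence(key, len(audio))))
    return correlation > threshold


# One second of stand-in "synthetic speech" at an assumed 16 kHz sample rate.
speech = 0.1 * np.random.default_rng(0).standard_normal(16000)
marked = embed_watermark(speech, key=1234)
is_synthetic = detect_watermark(marked, key=1234)
is_human = not detect_watermark(speech, key=1234)
```

A voice biometric system could run such a detector before scoring and reject any input that carries the mark, which is the distinguishing use the abstract describes.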