-
Publication (Announcement) No.: US10133538B2
Publication (Announcement) Date: 2018-11-20
Application No.: US14671918
Application Date: 2015-03-27
Applicant: SRI International
Inventor: Mitchell Leigh McLaren, Aaron Dennis Lawson, Harry Bratt
Abstract: An audio file analyzer computing system includes technologies to, among other things, localize audio events of interest (such as speakers of interest) within an audio file that includes multiple different classes (e.g., different speakers) of audio. The illustrative audio file analyzer computing system uses a seed segment to perform a semi-supervised diarization of the audio file. The seed segment is pre-selected, such as by a human person using an interactive graphical user interface.
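As a rough illustration of how a single pre-selected seed segment can drive semi-supervised localization of one speaker, the sketch below assumes each audio segment has already been reduced to a speaker embedding (the embedding extractor is not shown). The cosine-similarity threshold, the iterative centroid re-estimation and all function names are hypothetical; this is not the method claimed in the patent.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def localize_from_seed(
    embeddings: List[Sequence[float]],
    seed_index: int,
    threshold: float = 0.8,
    iterations: int = 3,
) -> List[int]:
    """Return indices of segments attributed to the seed segment's speaker.

    Hypothetical scheme: start from the user-selected seed, keep every segment
    whose similarity to the current speaker model clears `threshold`, then
    re-estimate the model from the kept segments and repeat.
    """
    centroid = list(embeddings[seed_index])
    selected = {seed_index}
    for _ in range(iterations):
        selected = {
            i for i, e in enumerate(embeddings) if cosine(e, centroid) >= threshold
        }
        selected.add(seed_index)  # the human-chosen seed is always kept
        centroid = mean_vector([embeddings[i] for i in sorted(selected)])
    return sorted(selected)


if __name__ == "__main__":
    segs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
    print(localize_from_seed(segs, seed_index=0))  # [0, 1, 4]
```

Re-estimating the speaker model from confidently matched segments is what makes the toy version "semi-supervised": only the seed is labeled by a person, and the remaining labels are inferred.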
-
Publication (Announcement) No.: US20160283185A1
Publication (Announcement) Date: 2016-09-29
Application No.: US14671918
Application Date: 2015-03-27
Applicant: SRI International
Inventor: Mitchell Leigh McLaren, Aaron Dennis Lawson, Harry Bratt
CPC classification number: G06F3/165, G06F3/0484, G06F17/3074, G06F17/30743, G06F17/30769, G06F17/30778, G10L17/04, G10L17/06, G10L25/27, G10L25/54
Abstract: An audio file analyzer computing system includes technologies to, among other things, localize audio events of interest (such as speakers of interest) within an audio file that includes multiple different classes (e.g., different speakers) of audio. The illustrative audio file analyzer computing system uses a seed segment to perform a semi-supervised diarization of the audio file. The seed segment is pre-selected, such as by a human person using an interactive graphical user interface.
-
Publication (Announcement) No.: US20250068673A1
Publication (Announcement) Date: 2025-02-27
Application No.: US18813647
Application Date: 2024-08-23
Applicant: SRI International
Inventor: Mitchell Leigh McLaren, Aaron Dennis Lawson
IPC: G06F16/683, G06F16/61, G10L15/00
Abstract: A computing system is configured to obtain a plurality of media files that each includes speech of one or more speakers. The computing system is further configured to process the plurality of media files to generate indexed data, wherein the indexed data includes a corresponding embedding for each speaker of the one or more speakers identified in the media file and a corresponding one or more keywords identified in the speech in the media file. The computing system is further configured to receive an indication of at least one of a selection of a particular speaker from the one or more speakers or a selection of a particular keyword from a plurality of keywords. The computing system is further configured to generate one or more correlations based on the indexed data. The computing system is further configured to output an alert regarding the one or more correlations.
-
Publication (Announcement) No.: US10476872B2
Publication (Announcement) Date: 2019-11-12
Application No.: US15013580
Application Date: 2016-02-02
Applicant: SRI International
Inventor: Mitchell Leigh McLaren, Aaron Dennis Lawson
Abstract: A spoken command analyzer computing system includes technologies configured to analyze information extracted from a speech sample and, using a joint speaker and phonetic content model, both determine whether the analyzed speech includes certain content (e.g., a command) and to identify the identity of the human speaker of the speech. In response to determining that the identity matches the authorized user's identity and determining that the analyzed speech includes the modeled content (e.g., command), an action corresponding to the verified content (e.g., command) is performed by an associated device.
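The decision logic described here (act only when both the speaker identity and the spoken content are verified) can be sketched as follows. The sketch assumes a hypothetical enrollment table that maps (user, command) pairs to embeddings, so a single similarity score reflects both who spoke and what was said; cosine scoring and the threshold value are illustrative assumptions, not the patent's joint model.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def verify_command(
    test_embedding: Sequence[float],
    enrolled: Dict[Tuple[str, str], Sequence[float]],
    authorized_user: str,
    threshold: float = 0.85,
) -> Optional[str]:
    """Return the command to execute, or None if verification fails.

    `enrolled` maps hypothetical (user, command) pairs to embeddings, a
    stand-in for a joint speaker-and-content model.
    """
    best_key, best_score = None, -1.0
    for key, emb in enrolled.items():
        score = cosine(test_embedding, emb)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return None  # no enrolled (speaker, command) pair matched well enough
    user, command = best_key
    return command if user == authorized_user else None  # both checks must pass


if __name__ == "__main__":
    enrolled = {
        ("alice", "unlock"): [1.0, 0.0, 0.0],
        ("alice", "lights on"): [0.0, 1.0, 0.0],
        ("bob", "unlock"): [0.0, 0.0, 1.0],
    }
    print(verify_command([0.9, 0.1, 0.0], enrolled, "alice"))  # unlock
```

Scoring (speaker, command) pairs together means a correct command spoken by an unauthorized speaker is rejected, as in the abstract.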
-
Publication (Announcement) No.: US20250046333A1
Publication (Announcement) Date: 2025-02-06
Application No.: US18727279
Application Date: 2022-12-16
Applicant: SRI International
Inventor: Martin Graciarena, Aaron Dennis Lawson, MD Hafizur Rahman
Abstract: In general, the disclosure describes a computing system to automatically identify and classify audio input, including non-speech audio signals. The computing system may also add new classes based on only a limited number of examples of the new classes, to identify classes of sounds for which the system had not been trained.
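One common way to add a sound class from only a few examples is nearest-prototype classification, where each class is represented by the mean embedding of its examples. The sketch below uses that technique as a swapped-in illustration; the abstract does not say which few-shot method is used, and the embeddings and class names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class PrototypeClassifier:
    """Few-shot classifier: each class is the mean embedding (prototype)
    of its handful of example embeddings."""

    def __init__(self) -> None:
        self.prototypes: Dict[str, List[float]] = {}

    def add_class(self, label: str, examples: List[Sequence[float]]) -> None:
        """Register a new class from a small, non-empty set of example embeddings."""
        n = len(examples)
        self.prototypes[label] = [sum(col) / n for col in zip(*examples)]

    def classify(self, embedding: Sequence[float]) -> str:
        """Return the label whose prototype is most similar to `embedding`."""
        return max(self.prototypes, key=lambda lbl: cosine(embedding, self.prototypes[lbl]))


if __name__ == "__main__":
    clf = PrototypeClassifier()
    clf.add_class("dog_bark", [[1.0, 0.0], [0.9, 0.1]])
    clf.add_class("siren", [[0.0, 1.0]])
    print(clf.classify([0.8, 0.2]))  # dog_bark
    clf.add_class("glass_break", [[0.5, -1.0]])  # new class from one example
    print(clf.classify([0.45, -0.9]))  # glass_break
```

Adding a class only appends a prototype, so no retraining of the underlying embedding model is needed in this toy version.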
-
Publication (Announcement) No.: US20250029601A1
Publication (Announcement) Date: 2025-01-23
Application No.: US18769197
Application Date: 2024-07-10
Applicant: SRI International
Inventor: MD Hafizur Rahman, Mitchell Leigh McLaren, Aaron Dennis Lawson
Abstract: In general, the disclosure describes techniques for detecting synthetic speech of a speaker. In an example, a machine learning system may be configured to generate, using a deep learning model trained to distinguish between synthetic speech and authentic speech, reference embeddings for the speaker that characterize a first set of acoustic features and a first set of phonetic features associated with the speaker. The machine learning system may further be configured to generate, using the deep learning model, a test embedding for an audio clip that characterizes a second set of acoustic features and a second set of phonetic features associated with the audio clip. The machine learning system may further be configured to compute a score based on the test embedding and the reference embeddings. The machine learning system may further be configured to output, based on the score, an indication of whether the audio clip includes synthetic speech.
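The final scoring step (compare a test embedding with a speaker's reference embeddings and output a synthetic/authentic indication) might look like the sketch below. It assumes the embeddings have already been produced by the trained model; averaging cosine similarities and flagging clips below a fixed threshold are illustrative assumptions, not the scoring rule disclosed in the application.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_score(
    test_embedding: Sequence[float], reference_embeddings: List[Sequence[float]]
) -> float:
    """Mean cosine similarity between the test clip and the speaker's references."""
    sims = [cosine(test_embedding, r) for r in reference_embeddings]
    return sum(sims) / len(sims)


def is_synthetic(
    test_embedding: Sequence[float],
    reference_embeddings: List[Sequence[float]],
    threshold: float = 0.7,
) -> bool:
    """Assumed rule: flag the clip as synthetic when it scores below `threshold`."""
    return similarity_score(test_embedding, reference_embeddings) < threshold


if __name__ == "__main__":
    refs = [[1.0, 0.0], [0.9, 0.1]]
    print(is_synthetic([0.95, 0.05], refs))  # False
    print(is_synthetic([0.2, 0.98], refs))   # True
```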
-
Publication (Announcement) No.: US20160248768A1
Publication (Announcement) Date: 2016-08-25
Application No.: US15013580
Application Date: 2016-02-02
Applicant: SRI International
Inventor: Mitchell Leigh McLaren, Aaron Dennis Lawson
CPC classification number: H04L63/0861, G10L15/16, G10L15/183, G10L15/22, G10L17/18, G10L17/22, G10L2015/223, H04L63/10, H04L63/102
Abstract: A spoken command analyzer computing system includes technologies configured to analyze information extracted from a speech sample and, using a joint speaker and phonetic content model, both determine whether the analyzed speech includes certain content (e.g., a command) and to identify the identity of the human speaker of the speech. In response to determining that the identity matches the authorized user's identity and determining that the analyzed speech includes the modeled content (e.g., command), an action corresponding to the verified content (e.g., command) is performed by an associated device.
-