Parallel computing for data analysis using generalized latent variable models

    Publication No.: US10706188B1

    Publication date: 2020-07-07

    Application No.: US15348083

    Filing date: 2016-11-10

    IPC classification: G06F30/20 G06N20/00

    Abstract: Systems and methods are provided for implementing a parallel Expectation-Maximization algorithm for generalized latent variable models. Item response data based on responses to items from multiple respondents is accessed. The item response data includes data for multiple response variables. The item response data is analyzed using a generalized latent variable model, and the analysis includes an application of a Parallel-E Parallel-M (PEPM) algorithm. In the parallel Expectation step of the PEPM algorithm, the respondents are subdivided into N groups of respondents, and computations for the N groups are performed in parallel using N processor cores. In the parallel Maximization step of the PEPM algorithm, the response variables are subdivided into N groups of response variables, and computations for the N groups of response variables are performed in parallel using the N processor cores.
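
    The partitioning described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: it assumes a simple latent class model with binary items as a stand-in for a generalized latent variable model, and it uses a thread pool as a stand-in for N processor cores. The names `split`, `e_step_group`, `m_step_group`, and `pepm_iteration` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split(items, n):
    """Split a list into n nearly equal contiguous chunks."""
    k, r = divmod(len(items), n)
    chunks, start = [], 0
    for g in range(n):
        end = start + k + (1 if g < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def e_step_group(rows, weights, probs):
    """Posterior class memberships for one group of respondents."""
    out = []
    for x in rows:
        lik = [w * math.prod(p if xi else 1 - p for xi, p in zip(x, pc))
               for w, pc in zip(weights, probs)]
        total = sum(lik)
        out.append([v / total for v in lik])
    return out


def m_step_group(cols, data, post):
    """Updated per-class item probabilities for one group of response variables."""
    n_classes = len(post[0])
    return {
        j: [sum(r[c] * x[j] for r, x in zip(post, data)) / sum(r[c] for r in post)
            for c in range(n_classes)]
        for j in cols
    }


def pepm_iteration(data, weights, probs, n_workers):
    """One EM iteration: E-step split by respondents, M-step split by items."""
    n_items = len(data[0])
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Parallel E-step: respondents are divided into n_workers groups.
        parts = pool.map(lambda rows: e_step_group(rows, weights, probs),
                         split(data, n_workers))
        post = [r for part in parts for r in part]
        # Parallel M-step: response variables are divided into n_workers groups.
        parts = pool.map(lambda cols: m_step_group(cols, data, post),
                         split(list(range(n_items)), n_workers))
        merged = {}
        for part in parts:
            merged.update(part)
    new_weights = [sum(r[c] for r in post) / len(post) for c in range(len(weights))]
    new_probs = [[merged[j][c] for j in range(n_items)] for c in range(len(weights))]
    return new_weights, new_probs
```

    The point of the sketch is the two different partitions: the E-step work is independent across respondents, while the M-step work is independent across response variables. In CPython, threads will not run this arithmetic on several cores at once; a process pool would be the closer analogue, at the cost of moving data between processes.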

    Systems and methods for determining lexical associations among words in a corpus
    76.
    Granted invention patent
    Status: in force

    Publication No.: US09519634B2

    Publication date: 2016-12-13

    Application No.: US14726928

    Filing date: 2015-06-01

    IPC classification: G06F17/27 G06F17/30 G10L15/06

    CPC classification: G06F17/277 G06F17/2715

    Abstract: Systems and methods are provided for identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words. The cue words and statistical lexical information derived from a corpus of documents are analyzed to determine candidate words that have a lexical association with the cue words. The statistical information includes numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text. For each candidate word, a statistical association score between the candidate word and each of the cue words is determined. An aggregate score for each of the candidate words is determined based on the statistical association scores. One or more of the candidate words are selected to be the one or more target words based on the aggregate scores.
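
    The score-then-aggregate flow in the abstract can be sketched as follows. The abstract does not name a specific association measure or aggregation rule; this sketch assumes pointwise mutual information (PMI) as the per-cue score and a plain sum as the aggregate, and it only keeps candidates that co-occur with every cue. The function names and the probability tables are hypothetical.

```python
import math


def pmi(pair_prob, p_a, p_b):
    """Pointwise mutual information of a word pair (assumed association score)."""
    return math.log2(pair_prob / (p_a * p_b))


def rank_targets(cues, word_prob, pair_prob, top_k=1):
    """Rank candidate words by summed association with all cue words."""
    scores = {}
    for cand in word_prob:
        if cand in cues:
            continue
        per_cue = []
        for cue in cues:
            p = pair_prob.get(frozenset((cand, cue)))
            if p is None:  # never observed with this cue: drop the candidate
                break
            per_cue.append(pmi(p, word_prob[cand], word_prob[cue]))
        else:
            scores[cand] = sum(per_cue)  # aggregate score over all cues
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy, made-up probabilities for illustration only.
word_prob = {"bark": 0.01, "tree": 0.02, "dog": 0.02, "leaf": 0.01, "cat": 0.02}
pair_prob = {
    frozenset(("bark", "tree")): 0.002,
    frozenset(("bark", "dog")): 0.003,
    frozenset(("leaf", "tree")): 0.004,
    frozenset(("leaf", "dog")): 0.0001,
    frozenset(("cat", "dog")): 0.002,
}
best = rank_targets(["tree", "dog"], word_prob, pair_prob)
```

    With these toy numbers, "bark" is strongly associated with both cues and ranks first, while "leaf" is pulled down by its weak association with "dog".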

    Systems and methods for detecting fraud in spoken tests using voice biometrics
    77.
    Granted invention patent
    Status: in force

    Publication No.: US09472195B2

    Publication date: 2016-10-18

    Application No.: US14670034

    Filing date: 2015-03-26

    IPC classification: G10L25/00 G10L17/22 G09B7/00

    CPC classification: G10L17/22 G09B7/00

    Abstract: Systems and methods described herein automate imposture detection in, e.g., test settings based on voice samples. Based on user instructions, a processing system may determine at least one set of appointments, each having voice samples and a voice print, and a comparison plan for comparing the appointments. The comparison plan defines a plurality of appointment pairs. For each appointment pair, the system compares the associated first and second appointments by, e.g., comparing the first appointment's voice samples to the second appointment's voice print and generating corresponding raw scores, which may be used to compute a composite score. If the composite score satisfies a predetermined threshold condition for fraud, the system may determine whether flagging/holding criteria are satisfied by the raw scores. If the criteria are satisfied, a flag or hold notice may be associated with the appointment pair to trigger an appropriate system/human response (e.g., withholding the appointments' test results).
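
    A rough sketch of the pairwise comparison flow is given below. Everything specific in it is an assumption rather than something stated in the abstract: voice samples and voice prints are represented as small feature vectors, the raw score is cosine similarity, the composite score is the mean of the raw scores, the comparison plan is "all pairs", and a high similarity between two different appointments is treated as the suspicious condition. The class, the function names, and the cut-off values are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass
class Appointment:
    appt_id: str
    voice_samples: list  # one feature vector per recorded sample (hypothetical)
    voice_print: list    # single enrolled feature vector (hypothetical)


def cosine(u, v):
    """Cosine similarity, used here as a stand-in raw comparison score."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def review_pairs(appointments, composite_cut=0.9, raw_floor=0.8):
    """Return appointment-id pairs that meet both the composite and raw criteria."""
    flagged = []
    for first, second in combinations(appointments, 2):  # all-pairs comparison plan
        # Compare the first appointment's samples to the second's voice print.
        raw = [cosine(s, second.voice_print) for s in first.voice_samples]
        composite = mean(raw)
        # Threshold condition on the composite, then a criterion on the raw scores.
        if composite >= composite_cut and min(raw) >= raw_floor:
            flagged.append((first.appt_id, second.appt_id))
    return flagged


appts = [
    Appointment("A", [[1.0, 0.0], [0.9, 0.1]], [1.0, 0.0]),
    Appointment("B", [[1.0, 0.05]], [0.95, 0.05]),
    Appointment("C", [[0.0, 1.0]], [0.0, 1.0]),
]
flagged = review_pairs(appts)
```

    In this toy data, appointments A and B have nearly identical vectors, so the pair is flagged; C is dissimilar to both and is not.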

    Systems and methods for evaluating multilingual text sequences
    78.
    Granted invention patent
    Status: in force

    Publication No.: US09471667B2

    Publication date: 2016-10-18

    Application No.: US13848837

    Filing date: 2013-03-22

    Abstract: Systems and methods are provided for scoring a response to a character-by-character highlighting task. A similarity value for the response is calculated by comparing the response to one or more correct responses to the task to determine the similarity or dissimilarity of the response to the one or more correct responses to the task. A threshold similarity value is calculated for the task, where the threshold similarity value is indicative of an amount of similarity or dissimilarity to the one or more correct responses required for the response to be scored at a certain level. The similarity value for the response is compared to the threshold similarity value. A score is assigned at, above, or below the certain level based on the comparison.
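
    The compare-to-threshold logic can be sketched in a few lines. The abstract does not specify the similarity measure or how the threshold is derived, so this sketch assumes Jaccard overlap between sets of highlighted character positions, takes the best match over the correct responses, and treats the threshold as a given input. The function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets of highlighted character positions (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def score_highlight(response, correct_responses, threshold):
    """Return (score, similarity): score is 1 if similarity reaches the threshold."""
    similarity = max(jaccard(response, c) for c in correct_responses)
    return (1 if similarity >= threshold else 0), similarity


# The test taker highlighted characters 10..19; the key is characters 12..19.
score, similarity = score_highlight(range(10, 20), [range(12, 20)], threshold=0.75)
```

    Here the response shares 8 of 10 distinct positions with the key, giving a similarity of 0.8 and a score of 1 at the assumed threshold of 0.75.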

    Systems and methods for evaluating difficulty of spoken text
    79.
    Granted invention patent
    Status: in force

    Publication No.: US09449522B2

    Publication date: 2016-09-20

    Application No.: US14080867

    Filing date: 2013-11-15

    CPC classification: G09B5/04

    Abstract: Systems and methods are provided for assigning a difficulty score to a speech sample. Speech recognition is performed on a digitized version of the speech sample using an acoustic model to generate word hypotheses for the speech sample. Time alignment is performed between the speech sample and the word hypotheses to associate the word hypotheses with corresponding sounds of the speech sample. A first difficulty measure is determined based on the word hypotheses, and a second difficulty measure is determined based on acoustic features of the speech sample. A difficulty score for the speech sample is generated based on the first difficulty measure and the second difficulty measure.
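
    How two difficulty measures might be combined can be illustrated with a small sketch. None of the concrete choices below come from the abstract: the first measure is assumed to be the share of recognized words outside a common-word list, the second is assumed to be speaking rate derived from the time alignment, and the combination is an assumed weighted sum with placeholder weights. All names are hypothetical.

```python
def text_difficulty(word_hypotheses, common_words):
    """Share of recognized words that are not in a common-word list."""
    rare = [w for w in word_hypotheses if w.lower() not in common_words]
    return len(rare) / len(word_hypotheses)


def acoustic_difficulty(alignment):
    """Speaking rate in words per second from (word, start_s, end_s) tuples."""
    duration = alignment[-1][2] - alignment[0][1]
    return len(alignment) / duration


def difficulty_score(first, second, w_first=0.5, w_second=0.5):
    """Weighted sum of the two measures; the weights are placeholders."""
    return w_first * first + w_second * second


# Hypothetical recognizer output with time alignment in seconds.
alignment = [("the", 0.0, 0.2), ("cat", 0.2, 0.6), ("perambulated", 0.6, 1.5)]
words = [w for w, _, _ in alignment]
first = text_difficulty(words, {"the", "cat"})
second = acoustic_difficulty(alignment)
total = difficulty_score(first, second)
```

    The two measures are on different scales here, so a real system would presumably normalize them or fit the weights to data before combining.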

    Computer-implemented systems and methods for scoring concatenated speech responses
    80.
    Granted invention patent
    Status: in force

    Publication No.: US09361908B2

    Publication date: 2016-06-07

    Application No.: US13556439

    Filing date: 2012-07-24

    Abstract: Systems and methods are provided for scoring non-native speech. Two or more speech samples are received, where each of the samples is of speech spoken by a non-native speaker, and where each of the samples is spoken in response to a distinct prompt. The two or more samples are concatenated to generate a concatenated response for the non-native speaker, where the concatenated response is based on the two or more speech samples that were elicited using the distinct prompts. A concatenated speech proficiency metric is computed based on the concatenated response, and the concatenated speech proficiency metric is provided to a scoring model, where the scoring model generates a speaking score based on the concatenated speech proficiency metric.
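
    The concatenate-then-score pipeline can be sketched as below. The sketch works on transcribed tokens rather than audio, assumes type-token ratio as the proficiency metric, and assumes a linear scoring model with placeholder coefficients; none of these choices are stated in the abstract, and the function names are hypothetical.

```python
def concatenate(responses):
    """Join the token lists of several responses into one concatenated response."""
    return [token for response in responses for token in response]


def type_token_ratio(tokens):
    """Distinct words divided by total words (assumed proficiency metric)."""
    return len(set(tokens)) / len(tokens)


def speaking_score(metric, slope=4.0, intercept=0.0):
    """Linear scoring model with placeholder coefficients."""
    return intercept + slope * metric


# Two short responses to distinct prompts, already transcribed.
responses = [["i", "like", "tea"], ["i", "like", "coffee", "too"]]
combined = concatenate(responses)
metric = type_token_ratio(combined)
score = speaking_score(metric)
```

    The metric is computed once over the concatenated response (5 distinct words out of 7) rather than separately for each short response.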
