-
Publication No.: US10937444B1
Publication Date: 2021-03-02
Application No.: US16196716
Filing Date: 2018-11-20
Abstract: A system for end-to-end automated scoring is disclosed. The system includes a word embedding layer for converting a plurality of automatic speech recognition (ASR) outputs into input tensors; a neural network lexical model encoder receiving the input tensors; a neural network acoustic model encoder implementing acoustic model (AM) posterior probability, word duration, mean value of pitch and mean value of intensity based on a plurality of cues; and a linear regression module for receiving concatenated encoded features from the neural network lexical model encoder and the neural network acoustic model encoder.
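
The last stage named in this abstract, a linear regression over the concatenated encoder outputs, can be illustrated with a minimal sketch. The abstract does not describe the encoder architectures, so `encode_lexical` and `encode_acoustic` below are hypothetical stand-ins that simply average their inputs, and the weights and bias are made-up values, not anything taken from the patent.

```python
from statistics import fmean
from typing import Sequence


def encode_lexical(embeddings: Sequence[Sequence[float]]) -> list:
    """Hypothetical stand-in for the lexical encoder: per-dimension mean of word embeddings."""
    return [fmean(dim) for dim in zip(*embeddings)]


def encode_acoustic(cues: Sequence[Sequence[float]]) -> list:
    """Hypothetical stand-in for the acoustic encoder.

    Each row holds the per-word cues (AM posterior, duration, mean pitch,
    mean intensity); the result is the per-cue mean over all words.
    """
    return [fmean(dim) for dim in zip(*cues)]


def linear_regression_score(lexical, acoustic, weights, bias):
    """Concatenate both encodings and apply a linear model: bias + w . x."""
    features = list(lexical) + list(acoustic)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    return bias + sum(w * x for w, x in zip(weights, features))


# Two words, 2-dimensional embeddings, 4 acoustic cues per word (illustrative values).
lexical = encode_lexical([[1.0, 3.0], [3.0, 5.0]])
acoustic = encode_acoustic([[0.5, 0.25, 100.0, 60.0], [0.75, 0.75, 120.0, 70.0]])
example_score = linear_regression_score(
    lexical, acoustic, weights=[1.0, 0.5, 2.0, 1.0, 0.0, 0.0], bias=0.5
)
```

In a trained system the weights would be fitted to human scores; here they only show how the two feature groups meet in a single regression input.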
-
Publication No.: US11222627B1
Publication Date: 2022-01-11
Application No.: US16197704
Filing Date: 2018-11-21
Inventors: Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, Patrick Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun
Abstract: Systems and methods are provided for conducting a simulated conversation with a language learner. A first dialog state of the simulated conversation is determined. First audio data corresponding to simulated speech based on the first dialog state is transmitted. Second audio data corresponding to a variable-length utterance spoken in response to the simulated speech is received. A fixed-dimension vector is generated based on the variable-length utterance. A semantic label is predicted for the variable-length utterance based on the fixed-dimension vector. A second dialog state of the simulated conversation is determined based on the semantic label, and third audio data corresponding to simulated speech is transmitted based on the second dialog state.
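
The flow in this abstract (variable-length utterance, fixed-dimension vector, semantic label, next dialog state) can be sketched as follows. Everything concrete here is an assumption for illustration: the abstract does not say how the vector is produced, so mean pooling over frames is used as a simple stand-in, the nearest-centroid classifier replaces whatever model the patent uses, and the state and label names are hypothetical.

```python
from math import dist
from statistics import fmean


def pool_utterance(frames):
    """Map any number of frame vectors to one fixed-dimension vector (mean pooling, assumed)."""
    return [fmean(dim) for dim in zip(*frames)]


def predict_label(vector, centroids):
    """Pick the semantic label whose centroid is closest in Euclidean distance (assumed classifier)."""
    return min(centroids, key=lambda label: dist(vector, centroids[label]))


def next_dialog_state(state, label, transitions):
    """Look up the next state; stay in the current state if no transition is defined."""
    return transitions.get((state, label), state)


# Hypothetical label centroids and dialog transitions.
centroids = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
transitions = {("ask_order", "yes"): "confirm_order", ("ask_order", "no"): "goodbye"}

short_utterance = [[1.0, 0.0], [1.0, 0.0]]
long_utterance = [[1.0, 0.0]] * 4 + [[0.0, 1.0]]

short_vec = pool_utterance(short_utterance)
long_vec = pool_utterance(long_utterance)
label = predict_label(long_vec, centroids)
state = next_dialog_state("ask_order", label, transitions)
```

Both utterances produce a vector of the same length even though they contain different numbers of frames, which is the property the abstract relies on.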
-
Publication No.: US10592733B1
Publication Date: 2020-03-17
Application No.: US15600206
Filing Date: 2017-05-19
Inventors: Vikram Ramanarayanan, David Suendermann-Oeft, Patrick Lange, Alexei V. Ivanov, Keelan Evanini, Yao Qian, Eugene Tsuprun, Hillary R. Molloy
Abstract: Systems and methods are provided for providing a spoken dialog system. Output is provided from a spoken dialog system that determines audio responses to a person based on recognized speech content from the person during a conversation between the person and the spoken dialog system. Video data associated with the person interacting with the spoken dialog system is received. A video engagement metric is derived from the video data, where the video engagement metric indicates a level of the person's engagement with the spoken dialog system.
-
Publication No.: US10283142B1
Publication Date: 2019-05-07
Application No.: US15215649
Filing Date: 2016-07-21
Inventors: Zhou Yu, Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Yao Qian
Abstract: Systems and methods are provided for a processor-implemented method of analyzing the quality of sound acquired via a microphone. An input metric is extracted from a sound recording at each of a plurality of time intervals. The input metric is provided at each of the time intervals to a neural network that includes a memory component, where the neural network provides an output metric at each of the time intervals, where the output metric at a particular time interval is based on the input metric at a plurality of time intervals other than the particular time interval using the memory component of the neural network. The output metric is aggregated from each of the time intervals to generate a score indicative of the quality of the sound acquired via the microphone.
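
The two ideas in this abstract, per-interval outputs that depend on other intervals through a memory, and aggregation of those outputs into one score, can be shown with a toy recurrence. This is only a sketch: a single scalar memory with a fixed `decay` (a hypothetical parameter) stands in for the neural network with a memory component, and averaging is an assumed aggregation rule.

```python
from statistics import fmean


def run_with_memory(inputs, decay=0.5):
    """Produce one output per interval; each output mixes the current input with remembered past inputs."""
    memory = 0.0
    outputs = []
    for x in inputs:
        # The memory carries information from earlier intervals forward.
        memory = decay * memory + (1.0 - decay) * x
        outputs.append(memory)
    return outputs


def aggregate_score(outputs):
    """Collapse the per-interval outputs into a single quality score (mean, assumed)."""
    return fmean(outputs)


per_interval = run_with_memory([1.0, 0.0, 1.0])
quality = aggregate_score(per_interval)
```

The second output is nonzero although its own input is zero, which shows the dependence on earlier intervals that the abstract describes.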
-
Publication No.: US11238844B1
Publication Date: 2022-02-01
Application No.: US16255220
Filing Date: 2019-01-23
Abstract: Systems and methods for identifying a person's native language and/or non-native language based on code-switched text and/or speech are presented. The systems may be trained using various methods. For example, a language identification system may be trained using one or more code-switched corpora. Text and/or speech features may be extracted from the corpora and used, in combination with a per-word language identity of the text and/or speech, to train at least one machine learner. Code-switched text and/or speech may be received and processed by extracting text and/or speech features. These features may be fed into the at least one machine learner to identify the person's native language.
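
As an illustration of how a per-word language identity could be turned into features for a machine learner, the sketch below computes the share of words per language and the rate of language switches. These particular features and the function name `code_switch_features` are assumptions for illustration; the abstract does not list the features the patent uses.

```python
from collections import Counter


def code_switch_features(word_langs):
    """Derive simple features from a sequence of per-word language tags.

    Returns the fraction of words in each language and the fraction of
    adjacent word pairs where the language changes.
    """
    n = len(word_langs)
    if n == 0:
        return {"switch_rate": 0.0, "fractions": {}}
    switches = sum(1 for a, b in zip(word_langs, word_langs[1:]) if a != b)
    counts = Counter(word_langs)
    return {
        "switch_rate": switches / max(n - 1, 1),
        "fractions": {lang: count / n for lang, count in counts.items()},
    }


# Hypothetical tags for a four-word utterance mixing English and Spanish.
features = code_switch_features(["en", "en", "es", "en"])
```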
-
Publication No.: US10607504B1
Publication Date: 2020-03-31
Application No.: US15272903
Filing Date: 2016-09-22
Inventors: Vikram Ramanarayanan, David Suendermann-Oeft, Patrick Lange, Alexei V. Ivanov, Keelan Evanini, Yao Qian, Zhou Yu
Abstract: Systems and methods are provided for implementing an educational dialog system. An initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training. The task model is updated by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student for development of speaking capabilities.
-
Publication No.: US11455999B1
Publication Date: 2022-09-27
Application No.: US16844439
Filing Date: 2020-04-09
Inventors: Xinhao Wang, Su-Youn Yoon, Keelan Evanini, Klaus Zechner, Yao Qian
Abstract: Data is received that encapsulates a spoken response to a prompt text comprising a string of words. Thereafter, the received data is transcribed into a string of words. The string of words is then compared with the prompt text so that a similarity grid representation of the comparison can be generated that characterizes a level of similarity between the string of words in the spoken response and the string of words in the prompt text. The grid representation is then scored using at least one machine learning model. The score indicates a likelihood of the spoken response having been off-topic. Data providing the encapsulated score can then be provided. Related apparatus, systems, techniques and articles are also described.
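
A similarity grid of the kind this abstract mentions can be pictured as a matrix with one row per response word and one column per prompt word. The sketch below is a simplification under stated assumptions: exact word match (1.0 or 0.0) is used as the similarity measure, and `off_topic_score` is a hypothetical hand-written rule standing in for the machine learning model that the patent applies to the grid.

```python
def similarity_grid(response, prompt):
    """Build a grid where cell [i][j] is 1.0 if response word i equals prompt word j, else 0.0."""
    response_words = response.lower().split()
    prompt_words = prompt.lower().split()
    return [
        [1.0 if rw == pw else 0.0 for pw in prompt_words]
        for rw in response_words
    ]


def off_topic_score(grid):
    """Hypothetical scorer: fraction of response words that match no prompt word."""
    if not grid:
        return 1.0
    unmatched = sum(1 for row in grid if not any(row))
    return unmatched / len(grid)


grid = similarity_grid("the cat sat", "The cat ran")
score = off_topic_score(grid)
```

A real system would likely use a graded similarity (for example, between word embeddings) and learn the scoring function from labeled responses; the grid layout is the part this sketch is meant to show.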
-
Publication No.: US11417339B1
Publication Date: 2022-08-16
Application No.: US16695348
Filing Date: 2019-11-26
Inventors: Xinhao Wang, Keelan Evanini, Yao Qian, Klaus Zechner
IPC Classification: G10L15/26, G10L15/197, G10L25/51, G10L15/16
Abstract: Data is received that encapsulates a spoken response to a test question. Thereafter, the received data is transcribed into a string of words. The string of words is then compared with at least one source string so that a similarity grid representation of the comparison can be generated that characterizes a level of similarity between the string of words and the at least one source string. The grid representation is then scored using at least one machine learning model. The score indicates a likelihood of the spoken response having been plagiarized. Data providing the encapsulated score can then be provided. Related apparatus, systems, techniques and articles are also described.
-
Publication No.: US10783873B1
Publication Date: 2020-09-22
Application No.: US16221980
Filing Date: 2018-12-17
Inventors: Yao Qian, Keelan Evanini, Patrick Lange, Robert A. Pugh, Rutuja Ubale
Abstract: Systems and methods for identifying a person's native language are presented. A native language identification system, comprising a plurality of artificial neural networks, such as time delay deep neural networks, is provided. Respective artificial neural networks of the plurality of artificial neural networks are trained as universal background models, using separate native language and non-native language corpora. The artificial neural networks may be used to perform voice activity detection and to extract sufficient statistics from the respective language corpora. The artificial neural networks may use the sufficient statistics to estimate respective T-matrices, which may in turn be used to extract respective i-vectors. The artificial neural networks may use i-vectors to generate a multilayer perceptron model, which may be used to identify a person's native language, based on an utterance by the person in his or her non-native language.
-
Publication No.: US10008209B1
Publication Date: 2018-06-26
Application No.: US15273830
Filing Date: 2016-09-23
Inventors: Yao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan
Abstract: Systems and methods are provided for voice authentication of a candidate speaker. Training data sets are accessed, where each training data set comprises data associated with a training speech sample of a speaker and a plurality of speaker metrics, where the plurality of speaker metrics include a native language of the speaker. The training data sets are used to train a neural network, where the data associated with each training speech sample is a training input to the neural network, and each of the plurality of speaker metrics is a training output to the neural network. Data associated with a speech sample is provided to the neural network to generate a vector that contains values for the plurality of speaker metrics, and the values contained in the vector are compared to values contained in a reference vector associated with a known person to determine whether the candidate speaker is the known person.
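
The final comparison step in this abstract, checking a candidate's metric vector against a reference vector for a known person, can be sketched with a similarity measure and a decision threshold. Cosine similarity and the `threshold` value of 0.8 are assumptions chosen for illustration; the abstract does not state which comparison or cutoff the patent uses.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_known_person(candidate, reference, threshold=0.8):
    """Accept the candidate if the vectors are at least `threshold` similar (assumed rule)."""
    return cosine_similarity(candidate, reference) >= threshold


# Hypothetical speaker-metric vectors produced by the network.
reference_vector = [3.0, 4.0]
accepted = is_known_person([3.0, 4.0], reference_vector)
rejected = is_known_person([4.0, -3.0], reference_vector)
```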
-