Abstract:
Provided are an apparatus and method for a statistical memory network. The apparatus includes a stochastic memory, an uncertainty estimator configured to estimate uncertainty information of external input signals from the input signals and provide the uncertainty information of the input signals, a writing controller configured to generate parameters for writing in the stochastic memory using the external input signals and the uncertainty information and generate additional statistics by converting statistics of the external input signals, a writing probability calculator configured to calculate a probability of a writing position of the stochastic memory using the parameters for writing, and a statistic updater configured to update stochastic values composed of an average and a variance of signals in the stochastic memory using the probability of a writing position, the parameters for writing, and the additional statistics.
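The abstract does not state the update rule, so the following is only a minimal sketch of how a write probability could blend input statistics into per-slot averages and variances. The function names, the softmax over position scores, and the moment-matching blend are all assumptions made for illustration. The "additional statistics" and the uncertainty estimator are not modeled.

```python
import math


def softmax(scores):
    """Turn hypothetical per-slot write scores into write probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_slots(means, variances, write_probs, x_mean, x_var):
    """Blend each slot's (mean, variance) toward the input statistics.

    Assumption: each slot is treated as a two-component mixture of its old
    distribution (weight 1 - p) and the input distribution (weight p), and
    the new mean and variance are the moments of that mixture.
    """
    new_means, new_vars = [], []
    for mu, var, p in zip(means, variances, write_probs):
        mean = (1.0 - p) * mu + p * x_mean
        second_moment = (1.0 - p) * (var + mu * mu) + p * (x_var + x_mean * x_mean)
        new_means.append(mean)
        new_vars.append(second_moment - mean * mean)
    return new_means, new_vars
```

With a write probability of 0 a slot keeps its statistics, and with a write probability of 1 it takes on the input's average and variance.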
Abstract:
Provided are a neural network memory computing system and method. The neural network memory computing system includes a first processor configured to learn a sense-making process on the basis of sense-making multimodal training data stored in a database, receive multiple modalities, and output a sense-making result on the basis of results of the learning, and a second processor configured to generate a sense-making training set for the first processor to increase knowledge for learning a sense-making process and provide the generated sense-making training set to the first processor.
Abstract:
The present invention relates to a method and apparatus for improving spontaneous speech recognition performance. The present invention is directed to providing a method and apparatus for improving spontaneous speech recognition performance by extracting a phase feature as well as a magnitude feature of a voice signal transformed to the frequency domain, detecting a syllabic nucleus on the basis of a deep neural network using a multi-frame output, determining a speaking rate by dividing the number of syllabic nuclei by a voice section interval detected by a voice detector, calculating a length variation or an overlap factor according to the speaking rate, and performing cepstrum length normalization or time scale modification with a voice length appropriate for an acoustic model.
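The speaking-rate step described above can be sketched as follows. The division of nucleus count by voiced interval comes from the abstract. The `time_scale_factor` helper and its `target_rate` parameter are hypothetical, since the abstract does not say how the length variation or overlap factor is derived from the rate.

```python
def speaking_rate(num_syllabic_nuclei: int, voiced_seconds: float) -> float:
    """Syllabic nuclei per second of detected voice."""
    if voiced_seconds <= 0:
        raise ValueError("voiced_seconds must be positive")
    return num_syllabic_nuclei / voiced_seconds


def time_scale_factor(rate: float, target_rate: float = 4.0) -> float:
    """Hypothetical stretch factor relative to an assumed target rate.

    A value above 1 suggests lengthening fast speech and a value below 1
    suggests shortening slow speech. The target rate is an assumption,
    not a value from the source.
    """
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return rate / target_rate
```

For example, 12 detected nuclei over 3 seconds of voice give a rate of 4 nuclei per second.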
Abstract:
Provided is a mobile communication terminal including: a camera module which captures an image of a set area; a microphone module which, when a sound including a voice of a user is input, extracts a sound level corresponding to the sound and a sound generating position; and a control module which estimates a position of a lip of the user from the image, extracts a voice level from the sound level corresponding to the position of the lip of the user and a voice generating position from the sound generating position, and recognizes the voice of the user based on at least one of the voice level and the voice generating position.
Abstract:
A feature compensation apparatus includes a feature extractor configured to extract corrupt speech features from a corrupt speech signal that contains additive noise and consists of two or more frames; a noise estimator configured to estimate noise features based on the extracted corrupt speech features and compensated speech features; a probability calculator configured to calculate a correlation between adjacent frames of the corrupt speech signal; and a speech feature compensator configured to generate compensated speech features by eliminating noise features of the extracted corrupt speech features while taking into consideration the correlation between adjacent frames of the corrupt speech signal and the estimated noise features, and to transmit the generated compensated speech features to the noise estimator.
Abstract:
Provided are a pre-training apparatus and method for speech recognition, which initialize a deep neural network layer by layer to correct node connection weights. The pre-training apparatus for speech recognition includes an input unit configured to receive speech data, a model generation unit configured to initialize a connection weight of a deep neural network based on the speech data, and an output unit configured to output information about the connection weight. In order for a state of a phoneme unit corresponding to the speech data to be output, the model generation unit trains the connection weight by stacking a plurality of hidden layers according to a determined structure of the deep neural network and applies an output layer to a certain layer between the plurality of hidden layers to correct the trained connection weight in each of the plurality of hidden layers, thereby initializing the connection weight.
Abstract:
The present invention relates to automatic summarization so as to recognize entire contents of multimedia data. A method of generating summarized information according to the present invention includes: generating index information on a specific audio signal or a specific video signal among input signals; synchronizing text information extracted from the input signal or received for the input signal with the index information; and generating first summarized information by using the synchronized text information and index information.
Abstract:
Provided are a sentence embedding method and apparatus based on subword embedding and skip-thoughts. To integrate the skip-thought sentence embedding learning methodology with a subword embedding technique, two methodologies are provided: a skip-thought sentence embedding learning method based on subword embedding, and a multitask learning methodology that performs subword embedding learning and skip-thought sentence embedding learning simultaneously, as a way of applying intra-sentence contextual information to subword embedding during subword embedding learning. This makes it possible to apply a sentence embedding approach to agglutinative languages such as Korean in a bag-of-words form. Also, because the skip-thought sentence embedding learning methodology is integrated with a subword embedding technique, intra-sentence contextual information can be used during subword embedding learning. The proposed model minimizes additional training parameters based on sentence embedding such that most training results may be accumulated in a subword embedding parameter.
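To illustrate the bag-of-words use of subword embeddings mentioned above, here is a minimal sketch under stated assumptions. Character n-grams stand in for subwords, and a seeded pseudo-random vector stands in for each learned subword embedding. The skip-thought training itself is not shown, and every name and parameter here is hypothetical.

```python
import random
import zlib

DIM = 8  # illustrative embedding size, not taken from the source


def char_ngrams(word, n_min=2, n_max=3):
    """Character n-grams of a word wrapped in boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def subword_vector(ngram, dim=DIM):
    """Deterministic placeholder for a learned subword embedding."""
    rng = random.Random(zlib.crc32(ngram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def sentence_embedding(sentence, dim=DIM):
    """Average of all subword vectors in the sentence (order-insensitive)."""
    grams = [g for word in sentence.split() for g in char_ngrams(word)]
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        for k, value in enumerate(subword_vector(gram, dim)):
            total[k] += value
    return [t / len(grams) for t in total]
```

Because the sentence vector is an average over subwords, reordering the words leaves it unchanged up to floating-point rounding, which is the bag-of-words property. In a trained model the placeholder vectors would be replaced by the learned subword embedding parameters.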
Abstract:
An apparatus for controlling a mobile device according to the present invention includes: a conversation recognition unit configured to recognize a conversation between users through mobile devices; a user intent verification unit configured to verify an intent of at least one user among the users based on the recognition result; and an additional function control unit configured to execute an additional function corresponding to the verified user's intent in a mobile device of the user. According to the present invention, communication between users may be improved by recognizing the conversation between the users and thereby directly providing information associated with the conversation or providing a service.