Abstract:
A feature compensation apparatus includes a feature extractor configured to extract corrupt speech features from a corrupt speech signal with additive noise that consists of two or more frames; a noise estimator configured to estimate noise features based on the extracted corrupt speech features and compensated speech features; a probability calculator configured to calculate a correlation between adjacent frames of the corrupt speech signal; and a speech feature compensator configured to generate compensated speech features by eliminating noise features of the extracted corrupt speech features while taking into consideration the correlation between adjacent frames of the corrupt speech signal and the estimated noise features, and to transmit the generated compensated speech features to the noise estimator.
Abstract:
Provided are an apparatus and method for a statistical memory network. The apparatus includes a stochastic memory, an uncertainty estimator configured to estimate uncertainty information of external input signals from the input signals and provide the uncertainty information of the input signals, a writing controller configured to generate parameters for writing in the stochastic memory using the external input signals and the uncertainty information and generate additional statistics by converting statistics of the external input signals, a writing probability calculator configured to calculate a probability of a writing position of the stochastic memory using the parameters for writing, and a statistic updater configured to update stochastic values composed of an average and a variance of signals in the stochastic memory using the probability of a writing position, the parameters for writing, and the additional statistics.
Abstract:
Provided are a neural network memory computing system and method. The neural network memory computing system includes a first processor configured to learn a sense-making process on the basis of sense-making multimodal training data stored in a database, receive multiple modalities, and output a sense-making result on the basis of results of the learning, and a second processor configured to generate a sense-making training set for the first processor to increase knowledge for learning a sense-making process and provide the generated sense-making training set to the first processor.
Abstract:
The present invention relates to a method and apparatus for improving spontaneous speech recognition performance. The present invention is directed to providing a method and apparatus for improving spontaneous speech recognition performance by extracting a phase feature as well as a magnitude feature of a voice signal transformed to the frequency domain, detecting a syllabic nucleus on the basis of a deep neural network using a multi-frame output, determining a speaking rate by dividing the number of syllabic nuclei by a voice section interval detected by a voice detector, calculating a length variation or an overlap factor according to the speaking rate, and performing cepstrum length normalization or time scale modification with a voice length appropriate for an acoustic model.
Abstract:
A knowledge increasing method includes calculating uncertainty of knowledge obtained from a neural network using an explicit memory, determining the insufficiency of the knowledge on the basis of the calculated uncertainty, obtaining additional data (learning data) for increasing insufficient knowledge, and training the neural network by using the additional data to autonomously increase knowledge.
Abstract:
Provided are sentence embedding method and apparatus based on subword embedding and skip-thoughts. To integrate skip-thought sentence embedding learning methodology with a subword embedding technique, a skip-thought sentence embedding learning method based on subword embedding and methodology for simultaneously learning subword embedding learning and skip-thought sentence embedding learning, that is, multitask learning methodology, are provided as methodology for applying intra-sentence contextual information to subword embedding in the case of subword embedding learning. This makes it possible to apply a sentence embedding approach to agglutinative languages such as Korean in a bag-of-words form. Also, skip-thought sentence embedding learning methodology is integrated with a subword embedding technique such that intra-sentence contextual information can be used in the case of subword embedding learning. A proposed model minimizes additional training parameters based on sentence embedding such that most training results may be accumulated in a subword embedding parameter.
Abstract:
An apparatus for canceling an acoustic echo signal caused by a far-end talker signal is provided. The apparatus for canceling an acoustic echo signal includes: a variance estimating unit configured to estimate a variance of a first audio signal of a near-end talker signal and a first noise signal of the near-end talker signal; a step size determining unit configured to determine a step size by using the variance of the first audio signal and the variance of the first noise signal; an adaptive filter coefficient updating unit configured to update an adaptive filter coefficient of an adaptive filter by using the step size; and an acoustic echo canceling unit configured to estimate an acoustic echo signal by using the adaptive filter coefficient, and cancel the acoustic echo signal from a microphone input signal by using the estimated acoustic echo signal.