摘要:
A system identifies a document that includes an address and locates business information in the document. The system assigns a confidence score to the business information, where the confidence score relates to a probability that the business information is associated with the address. The system determines whether to associate the business information with the address based on the assigned confidence score.
摘要:
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
摘要:
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
摘要:
A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other coder based on sequences of labels provided by feature analysis of input speech.
摘要:
Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
摘要:
A system identifies a document that includes an address and locates business information in the document. The system assigns a confidence score to the business information, where the confidence score relates to a probability that the business information is associated with the address. The system determines whether to associate the business information with the address based on the assigned confidence score.
摘要:
Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
摘要:
A speech recognition system may be trained with data that is independent from previous acoustics. This method of training is quicker and more cost effective than previous training methods. In training the system, after a vocabulary word is input into the system, a first set of phonemes representative of the vocabulary word is determined. Next, the first set of phonemes is compared with a second set of phonemes representative of a second vocabulary word. The first vocabulary word and the second vocabulary word are different. The comparison generates a confusability index. The confusability index for the second word is a measure of the likelihood that the second word will be mistaken as another vocabulary word, e.g., the first word, already in the system. This process may be repeated for each newly desired vocabulary word.
摘要:
Systems and methods for identifying music in a noisy environment are described. One of the methods includes receiving audio segment data. The audio segment data is generated from the portion that is captured in the noisy environment. The method further includes generating feature vectors from the audio segment data, identifying phonemes from the feature vectors, and comparing the identified phonemes with pre-assigned phoneme sequences. Each pre-assigned phoneme sequence identifies a known music piece. The method further includes determining an identity of the music based on the comparison.
摘要:
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs or acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and aching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs or acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.