Abstract:
A language system facilitates entry of an input string into a mobile device using discrete keys on a keypad, such as a 10-key keypad. The numeric keys have associated letters of an alphabet. The key input is representative of one or more Chinese phonetic characters. Based on this input string, the language system derives the most likely corresponding Chinese language characters intended by the user. The language system uses multiple different search engines and language models to aid in deriving the most probable Chinese language characters. When the language system recognizes possible Chinese language characters, the mobile device displays them so that the user can select among them and/or enter one or more further Chinese phonetic characters. In this manner, the language system adopts a modeless entry methodology that eliminates conventional mode switching between input and selection operations.
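As a rough illustration of the key-to-character step described above, the following minimal sketch expands a digit sequence into every letter string it could represent and looks each one up in a toy pinyin lexicon. The keypad layout is the standard phone layout; the lexicon, its scores, and the function name `candidates` are hypothetical and stand in for the search engines and language models the abstract mentions.

```python
from itertools import product

# Standard letter layout on a phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Hypothetical toy lexicon: pinyin syllable -> (character, made-up score).
LEXICON = {
    "ni": [("你", 0.9), ("尼", 0.1)],
    "mi": [("米", 0.7), ("密", 0.3)],
    "hao": [("好", 0.8), ("号", 0.2)],
}

def candidates(digits):
    """Return characters whose pinyin matches the key sequence, best first."""
    letter_sets = [KEYPAD[d] for d in digits]
    found = []
    # Every combination of one letter per key is a possible pinyin string.
    for letters in product(*letter_sets):
        syllable = "".join(letters)
        found.extend(LEXICON.get(syllable, []))
    # Rank by the (assumed) score so the most likely character comes first.
    found.sort(key=lambda pair: pair[1], reverse=True)
    return [char for char, _ in found]

print(candidates("64"))   # keys 6 and 4 match both "ni" and "mi"
print(candidates("426"))  # keys 4, 2, 6 match "hao"
```

The ranked list corresponds to what the device would display for selection; the user could either pick a character or keep typing digits, which is the modeless behavior the abstract describes.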
Abstract:
A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed, and passages that are related to the query and that have the selected factoid type are retrieved. The retrieved passages are ranked by a calculated score and provided to the user in rank order.
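The retrieval flow above can be sketched as follows. This is only an illustration under stated assumptions: the passages, the factoid type labels (`LENGTH`, `DATE`), and the word-overlap score are all hypothetical, since the abstract does not say how factoids are detected or how the score is calculated.

```python
import re

# Hypothetical passages, each tagged with the factoid types it contains.
PASSAGES = [
    ("p1", "Mount Everest is 8849 meters tall.", {"LENGTH"}),
    ("p2", "Everest was first climbed in 1953.", {"DATE"}),
    ("p3", "K2 is 8611 meters tall.", {"LENGTH"}),
]

def build_index(passages):
    """Group passages by factoid type."""
    index = {}
    for pid, text, types in passages:
        for factoid_type in types:
            index.setdefault(factoid_type, []).append((pid, text))
    return index

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def search(index, query, factoid_type):
    """Return ids of passages of the selected type, ranked by word overlap."""
    query_terms = tokenize(query)
    scored = []
    for pid, text in index.get(factoid_type, []):
        score = len(query_terms & tokenize(text))  # assumed scoring function
        if score > 0:
            scored.append((score, pid))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored]

index = build_index(PASSAGES)
print(search(index, "How tall is Everest", "LENGTH"))
```

Restricting the lookup to one factoid type before scoring is the point of the sketch: the passage about the 1953 ascent mentions Everest but is never considered for a `LENGTH` query.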
Abstract:
Provided is an adaptive semantic reasoning engine that receives a natural language query, which may contain one or more contexts. The query can be broken down into tokens or a set of tokens. A task search can be performed on the token or token set(s) to classify a particular query and/or context and retrieve one or more tasks. The token or token set(s) can be mapped into slots to retrieve one or more task results. A slot filling goodness may be determined, which can include scoring each task search result and/or ranking the results in a different order than the order in which the tasks were retrieved. The token or token set(s), retrieved tasks, slot filling goodness, natural language query, context, search result score and/or result ranking can be fed back to the reasoning engine for further processing and/or machine learning.
Abstract:
A method and apparatus for generating a score for a system that generates text is provided. The method and apparatus identify errors in the text generated by the system and identify errors in a second text generated by a second system. The number of errors that are generated by the system but not generated by the second system is divided by the number of errors that are generated by the second system but not by the system to generate the score.
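A minimal sketch of the ratio described above, assuming each error can be identified by a hashable key such as a (sentence index, token position) pair. That representation and the function name `relative_error_score` are assumptions made for illustration; the abstract does not say how errors are identified or matched between the two systems.

```python
def relative_error_score(errors_system, errors_second):
    """Errors unique to the system divided by errors unique to the second system."""
    only_system = set(errors_system) - set(errors_second)
    only_second = set(errors_second) - set(errors_system)
    if not only_second:
        # The ratio is undefined when the second system has no unique errors.
        raise ValueError("second system has no errors of its own")
    return len(only_system) / len(only_second)

# Hypothetical error locations as (sentence index, token position).
system_errors = {(0, 1), (0, 3), (2, 0)}
second_errors = {(0, 3), (1, 2), (1, 4), (3, 0)}
print(relative_error_score(system_errors, second_errors))
```

Under this reading, a score below 1 means the system makes fewer unique errors than the second system, and errors shared by both systems do not affect the score.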
Abstract:
A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
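The abstract does not give the weighting formula, so the sketch below is only one plausible reading. It assumes the weight for each n-gram is the ratio of its relative frequency in the small set to its relative frequency in the large set, tempered by a hypothetical exponent `alpha`, with a hypothetical `floor` for n-grams absent from the small set. It also normalizes over all n-grams rather than per history, which is a simplification.

```python
from collections import Counter

def relative_freq(counts):
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}

def adapt(small_counts, large_counts, alpha=0.5, floor=1e-3):
    """Reweight large-set counts toward the small set and normalize.

    alpha and floor are illustrative parameters, not taken from the abstract.
    """
    rf_small = relative_freq(small_counts)
    rf_large = relative_freq(large_counts)
    weighted = {}
    for gram, count in large_counts.items():
        ratio = rf_small.get(gram, floor) / rf_large[gram]
        weighted[gram] = count * ratio ** alpha
    total = sum(weighted.values())
    return {gram: weight / total for gram, weight in weighted.items()}

# Hypothetical bigram counts.
small = Counter({("book", "flight"): 8, ("the", "weather"): 2})
large = Counter({("book", "flight"): 10, ("the", "weather"): 50,
                 ("stock", "price"): 40})
adapted = adapt(small, large)
for gram, prob in adapted.items():
    print(gram, round(prob, 3))
```

In this toy data, the bigram that is frequent in the task-specific set gains probability mass relative to its share of the large set, while the bigram that never appears in the task-specific set is pushed down.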
Abstract:
A method is provided for resolving overlapping ambiguity strings in unsegmented languages such as Chinese. The methodology includes segmenting sentences into two possible segmentations and recognizing overlapping ambiguity strings in the sentences. One of the two possible segmentations is selected as a function of probability information. The probability information is derived from unsupervised training data. A method of constructing a knowledge base containing the probability information needed to select one of the segmentations is also provided.
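The abstract does not name the two segmentations. A common way to obtain two candidates is forward and backward maximum matching, and the sketch below assumes that choice. The lexicon, the word probabilities, and the product-of-unigrams selection rule are hypothetical stand-ins for the knowledge base built from unsupervised training data.

```python
import math

def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

def backward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]

def choose(seg_a, seg_b, prob, unseen=1e-9):
    """Pick the segmentation with the higher product of word probabilities."""
    score_a = math.prod(prob.get(w, unseen) for w in seg_a)
    score_b = math.prod(prob.get(w, unseen) for w in seg_b)
    return seg_a if score_a >= score_b else seg_b

# Hypothetical lexicon with made-up probabilities.
PROB = {"研究": 0.02, "研究生": 0.004, "生命": 0.01, "命": 0.001, "起源": 0.005}
text = "研究生命起源"
fmm = forward_max_match(text, PROB)
bmm = backward_max_match(text, PROB)
print(fmm, bmm, choose(fmm, bmm, PROB))
```

Where the two segmentations disagree, the disagreeing span ("研究生命" here) is an overlapping ambiguity string, and the probability information decides between the candidates.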
Abstract:
Cluster- and pruning-based language model compression is disclosed. In one embodiment, a language model is first clustered, such as by using predictive clustering. The language model after clustering has a larger size than it did before clustering. The language model is then pruned, such as by using entropy-based techniques, such as Stolcke pruning, or by using Rosenfeld pruning or count-cutoff techniques. In one particular embodiment, a word language model is first predictively clustered by a technique described as P(Z|xy)×P(z|xyZ), where a lower-case letter refers to a word, and an upper-case letter refers to the cluster in which the word resides.
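To make the factorization P(Z|xy)×P(z|xyZ) concrete, the sketch below estimates both factors from raw trigram counts, assuming each word belongs to exactly one cluster. The word-to-cluster map, the toy trigrams, and the function name `clustered_prob` are hypothetical, and no smoothing or pruning is applied.

```python
# Hypothetical hard clustering: each word belongs to exactly one cluster.
CLUSTER = {"monday": "DAY", "tuesday": "DAY", "broadway": "PLACE"}

# Hypothetical training trigrams (x, y, z).
TRIGRAMS = [
    ("starts", "on", "monday"),
    ("starts", "on", "monday"),
    ("starts", "on", "tuesday"),
    ("starts", "on", "broadway"),
]

def clustered_prob(x, y, z, trigrams, cluster):
    """Estimate P(z | x y) as P(Z | x y) * P(z | x y Z) from raw counts.

    Raises ZeroDivisionError for unseen contexts, since nothing is smoothed.
    """
    context = [t for t in trigrams if t[:2] == (x, y)]
    in_cluster = [t for t in context if cluster[t[2]] == cluster[z]]
    p_cluster = len(in_cluster) / len(context)                      # P(Z | x y)
    p_word = sum(t[2] == z for t in in_cluster) / len(in_cluster)   # P(z | x y Z)
    return p_cluster * p_word

print(clustered_prob("starts", "on", "monday", TRIGRAMS, CLUSTER))
```

With unsmoothed counts the product equals the direct trigram estimate (2 of the 4 observed continuations). The two factors are stored as separate models, which is why the clustered model is larger before pruning, as the abstract notes.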