Abstract:
Collocation errors can be automatically proofed using local and network-based corpora, including the Web. For example, according to one illustrative method, one or more collocations from a text sample are compared with a corpus such as the content of the Web. Each collocation is evaluated to determine whether it is disfavored in the corpus, and an indication of the result is provided via an output device. Additional steps may then be taken, such as searching for potentially proper word collocations and providing them via a user output.
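As a rough illustration of the comparison step, the sketch below counts adjacent word pairs in a small local corpus, flags a collocation as disfavored when its count falls below a threshold, and then searches for better-attested alternatives. The toy corpus, the `ALTERNATIVES` table, the `min_count` threshold, and both function names are assumptions made for this example; the abstract does not specify how the comparison or the search for alternatives is carried out.

```python
from collections import Counter

def bigram_counts(corpus_sentences):
    """Count adjacent word pairs in a list of corpus sentences."""
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def check_collocation(pair, counts, alternatives, min_count=1):
    """Return (is_disfavored, suggestions) for a two-word collocation.

    A pair is treated as disfavored when it occurs fewer than min_count
    times in the corpus (an assumed criterion). Suggestions swap the first
    word for a listed alternative and keep only pairs the corpus attests.
    """
    first, second = pair
    is_disfavored = counts[(first, second)] < min_count
    suggestions = []
    if is_disfavored:
        candidates = [(alt, second) for alt in alternatives.get(first, [])]
        suggestions = sorted(
            (c for c in candidates if counts[c] >= min_count),
            key=lambda c: counts[c],
            reverse=True,
        )
    return is_disfavored, suggestions

# Hypothetical stand-ins for a real corpus and a real synonym resource.
CORPUS = [
    "she drinks strong tea every morning",
    "he likes strong tea with milk",
    "a powerful engine drives the car",
]
ALTERNATIVES = {"powerful": ["strong"], "strong": ["powerful"]}
COUNTS = bigram_counts(CORPUS)

print(check_collocation(("powerful", "tea"), COUNTS, ALTERNATIVES))
print(check_collocation(("strong", "tea"), COUNTS, ALTERNATIVES))
```

With a Web-scale corpus, the raw counts would presumably be replaced by search hit counts or normalized association scores, but the flag-then-suggest flow would stay the same.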
Abstract:
A sentence is accessed and at least one query is generated based on the sentence. The at least one query can be compared to text within a collection of documents, for example using a web search engine. Collocation errors in the sentence can be detected and/or corrected based on this comparison.
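One way to picture the query-generation step is to turn the sentence into quoted phrase queries of decreasing length and report the ones that match nothing in the document collection. This is only a sketch under assumptions: the n-gram strategy, the function names, and `toy_hit_count` (which stands in for a web search engine) are hypothetical and are not taken from the abstract.

```python
def generate_queries(sentence, max_len=3):
    """Build quoted phrase queries from word n-grams, longest first."""
    tokens = sentence.lower().rstrip(".!?").split()
    queries = []
    for n in range(max_len, 1, -1):
        for i in range(len(tokens) - n + 1):
            queries.append('"' + " ".join(tokens[i:i + n]) + '"')
    return queries

def find_unattested(sentence, hit_count, max_len=3):
    """Return the queries that match no text in the document collection."""
    return [q for q in generate_queries(sentence, max_len) if hit_count(q) == 0]

# Hypothetical document collection standing in for a web index.
DOCUMENTS = [
    "please make a decision soon",
    "they make a decision quickly",
]

def toy_hit_count(query):
    """Count documents containing the quoted phrase as whole words."""
    phrase = " " + query.strip('"') + " "
    return sum(phrase in " " + doc + " " for doc in DOCUMENTS)

print(find_unattested("Do a decision.", toy_hit_count))
```

Here the phrase "a decision" is attested while "do a" and "do a decision" are not, which points at the verb as the likely collocation error. A correction step could then issue further queries with substitute verbs and compare their hit counts.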
Abstract:
The subject disclosure is directed towards developing a translation model for mapping search query terms to document-related data. By processing user logs comprising search histories into word-aligned query-document pairs, the translation model may be trained using data, such as probabilities, corresponding to the word-aligned query-document pairs. After the translation model is incorporated into model data for a search engine, it may be used to provide features for producing relevance scores for current search queries and for ranking documents/advertisements according to relevance.
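To make the idea of translation probabilities as ranking features more concrete, the sketch below estimates P(query word | document word) by relative frequency from word-aligned pairs and then scores a document with a log-likelihood feature, log P(Q|D) = sum over query words q of log(sum over document words w of P(q|w) P(w|D)). The aligned pairs are invented for illustration, and the relative-frequency estimate, the `floor` smoothing constant, and the function names are assumptions; the abstract does not state how the model is trained or how the features are computed.

```python
import math
from collections import Counter, defaultdict

def train_translation_table(aligned_pairs):
    """Estimate P(query_word | doc_word) by relative frequency.

    aligned_pairs is a list of (query_word, doc_word) tuples, assumed to
    come from word-aligned query-document pairs mined from search logs.
    """
    pair_counts = Counter(aligned_pairs)
    doc_word_counts = Counter(w for _, w in aligned_pairs)
    table = defaultdict(dict)
    for (q, w), c in pair_counts.items():
        table[w][q] = c / doc_word_counts[w]
    return table

def translation_feature(query_words, doc_words, table, floor=1e-6):
    """Log-probability that the document 'translates' into the query."""
    doc_tf = Counter(doc_words)
    total = len(doc_words)
    score = 0.0
    for q in query_words:
        p = sum(
            table.get(w, {}).get(q, 0.0) * tf / total
            for w, tf in doc_tf.items()
        )
        score += math.log(max(p, floor))  # floor avoids log(0)
    return score

# Hypothetical alignments; real ones would be derived from user logs.
ALIGNED = [
    ("cheap", "discount"), ("cheap", "discount"), ("deal", "discount"),
    ("cheap", "affordable"), ("flights", "airfare"), ("flights", "flights"),
]
TABLE = train_translation_table(ALIGNED)

query = ["cheap", "flights"]
print(translation_feature(query, ["discount", "airfare"], TABLE))
print(translation_feature(query, ["hotel", "rooms"], TABLE))
```

The first document shares no surface words with the query yet scores far higher than the second, which is the vocabulary-gap effect such a feature is meant to capture. In a full ranker this value would be one input among many to the relevance score.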
Abstract:
A method, computer readable medium and system are provided which retrieve confirming sentences from a sentence database in response to a query. A search engine retrieves confirming sentences from the sentence database in response to the query. In retrieving the confirming sentences, the search engine defines indexing units based upon the query, with the indexing units including both lemmas from the query and extended indexing units associated with the query. The search engine then retrieves a plurality of sentences from the sentence database using the defined indexing units as search parameters. A similarity between each of the plurality of retrieved sentences and the query is determined by the search engine, wherein each similarity is determined as a function of a linguistic weight of a term in the query. The search engine then ranks the plurality of retrieved sentences based upon the determined similarities.
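The retrieve-then-rank flow described above can be sketched as follows. Several pieces are assumptions made only for this example: the toy `LEMMAS` lookup replaces a real lemmatizer, adjacent lemma pairs play the role of the "extended indexing units", and the `WEIGHTS` table (which down-weights function words) stands in for whatever linguistic weighting the method actually uses.

```python
# Hypothetical lemma lookup and linguistic weights (default weight is 1.0).
LEMMAS = {"made": "make", "makes": "make", "decisions": "decision"}
WEIGHTS = {"a": 0.1, "the": 0.1}

def indexing_units(text):
    """Return lemmas plus adjacent lemma pairs (assumed extended units)."""
    lemmas = [LEMMAS.get(t, t) for t in text.lower().split()]
    pairs = [a + "_" + b for a, b in zip(lemmas, lemmas[1:])]
    return set(lemmas) | set(pairs)

def weight(unit):
    """Weight of a unit; a pair's weight is the sum of its parts."""
    if "_" in unit:
        return sum(weight(part) for part in unit.split("_"))
    return WEIGHTS.get(unit, 1.0)

def rank_sentences(query, sentences):
    """Retrieve sentences sharing a unit with the query, best match first."""
    query_units = indexing_units(query)
    scored = []
    for sentence in sentences:
        shared = query_units & indexing_units(sentence)
        if shared:  # retrieval: at least one indexing unit must match
            scored.append((sum(weight(u) for u in shared), sentence))
    scored.sort(key=lambda item: -item[0])  # ranking by similarity
    return [sentence for _, sentence in scored]

SENTENCES = [
    "he bought a car",
    "the decision was hard",
    "birds sing",
    "she made a decision yesterday",
]
for s in rank_sentences("makes a decision", SENTENCES):
    print(s)
```

The sentence that matches both content lemmas and both lemma pairs ranks first, the one sharing only the function word "a" ranks last, and the sentence with no shared unit is not retrieved at all.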
Abstract:
An ensemble of random feature clusters is built from training data using a clustering algorithm where some randomness has been introduced. For each clustered feature space, a classifier, such as a Naïve Bayesian Classifier, is trained, realizing a classifier ensemble. The final classification decision is made by the resulting classifier ensemble.
Abstract:
Candidate suggestions for correcting misspelled query terms input into a search application are automatically generated. A score for each candidate suggestion can be generated using a first decoding pass and paths through the suggestions can be ranked in a second decoding pass. Candidate suggestions can be generated based on typographical errors, phonetic mistakes and/or compounding mistakes. Furthermore, a ranking model can be developed to rank candidate suggestions to be presented to a user.
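A minimal sketch of the two-pass idea, under stated assumptions: the first pass generates candidates within one edit of each query term and scores them by unigram count, and the second pass ranks whole paths through the per-term candidates by bigram count, using the first-pass scores to break ties. The `VOCAB` and `BIGRAMS` tables and all function names are hypothetical, the exhaustive path search replaces whatever decoder the method uses, and only typographical errors are covered here (phonetic and compounding mistakes are not).

```python
import itertools

# Hypothetical unigram and bigram counts, e.g. from query logs.
VOCAB = {"new": 50, "york": 30, "work": 60, "fork": 5, "news": 20}
BIGRAMS = {("new", "york"): 40}

def edits1(word):
    """All strings one delete, transpose, replace, or insert away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def candidates(term, top_k=3):
    """First pass: in-vocabulary candidates, best unigram score first."""
    pool = {w for w in edits1(term) | {term} if w in VOCAB}
    if not pool:
        return [term]  # nothing known nearby; keep the term unchanged
    return sorted(pool, key=lambda w: (-VOCAB[w], w))[:top_k]

def best_path(query):
    """Second pass: pick the best-scoring path through the candidates."""
    per_term = [candidates(t) for t in query.lower().split()]

    def path_score(path):
        bigram = sum(BIGRAMS.get(p, 0) for p in zip(path, path[1:]))
        unigram = sum(VOCAB.get(w, 0) for w in path)
        return (bigram, unigram)

    return " ".join(max(itertools.product(*per_term), key=path_score))

print(candidates("ork"))
print(best_path("nwe ork"))
```

In this toy data the first pass prefers "work" for the term "ork" because it is the most frequent candidate, while the second pass selects "york" because the path "new york" is better attested. Enumerating every path is exponential in query length, so a practical decoder would use dynamic programming or beam search instead.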