Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying synonyms. One method includes receiving a query containing a first phrase, identifying one or more first synonym phrases that are synonyms for the first phrase, identifying a new synonym phrase that is a synonym for one of the first synonym phrases, determining that the new synonym phrase is a synonym for the first phrase, and augmenting the query with the new synonym phrase. Another method includes receiving a query including a first compound term having a first subterm, identifying a first synonym for the first subterm, generating a second compound term, wherein the second compound term is the first compound term modified by replacing the first subterm with the first synonym, and augmenting the query with the second compound term.
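The two methods above can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the `SYNONYMS` table, the function names, and the choice to accept every second-level synonym without a separate verification step are all assumptions, since the abstract does not say how the new phrase is determined to be a synonym for the first phrase.

```python
# Hypothetical synonym table; the abstract does not say how synonyms are stored.
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"motorcar"},
}

def expand_phrase(phrase, synonyms=SYNONYMS):
    """Return first-level synonyms plus synonyms of those synonyms."""
    first = set(synonyms.get(phrase, ()))
    second = set()
    for candidate in first:
        # Assumption: every synonym of a synonym is accepted as-is.
        second |= set(synonyms.get(candidate, ()))
    return (first | second) - {phrase}

def augment_query(query_terms, synonyms=SYNONYMS):
    """Append the expanded synonyms of each term to the query."""
    augmented = list(query_terms)
    for term in query_terms:
        for synonym in sorted(expand_phrase(term, synonyms)):
            if synonym not in augmented:
                augmented.append(synonym)
    return augmented

def compound_variants(compound, subterm, synonyms=SYNONYMS):
    """Replace the first occurrence of a subterm with each of its synonyms."""
    return [compound.replace(subterm, s, 1)
            for s in sorted(synonyms.get(subterm, ()))]
```

With this table, `augment_query(["car", "rental"])` adds both "automobile" and "motorcar", and `compound_variants("carwash", "car")` yields a second compound term built from the synonym of the subterm.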
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for index-side synonym expansion are disclosed. Some implementations include actions of obtaining a token sequence for a resource, wherein each token in the token sequence comprises one or more characters. The actions also include selecting a token from the token sequence, wherein the selected token comprises at least one numeric portion having one or more contiguous numeric characters, and at least one non-numeric portion having one or more non-numeric characters. Further actions include generating a new token corresponding to each numeric portion of the selected token and storing, in a search engine index, data associating the selected token and each of the new tokens as index terms for the resource, wherein the search engine index is accessed to augment search queries.
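A minimal sketch of the token-splitting step, assuming a regular expression is an acceptable way to find contiguous numeric portions. The function name and the decision to drop repeated numeric portions are illustrative choices, not details from the abstract.

```python
import re

def numeric_index_terms(token):
    """Return the token plus one new token per numeric portion.

    Tokens that are purely numeric or purely non-numeric are returned
    unchanged, since the abstract only selects mixed tokens.
    """
    numeric_portions = re.findall(r"\d+", token)
    has_non_numeric = re.search(r"\D", token) is not None
    if not numeric_portions or not has_non_numeric:
        return [token]
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys([token] + numeric_portions))
```

For a token such as "dc10", the resource would be indexed under both "dc10" and "10".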
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for synonym verification. In one aspect, a method includes receiving a term and a candidate synonym for the term. The method further includes generating a term group of one or more text strings and a synonym group of one or more text strings. Each text string in the term group corresponds to a translation of the term into a language, and each text string in the synonym group corresponds to a translation of the candidate synonym into the language. The method further includes determining whether the candidate synonym is a valid synonym for the term from an amount of overlap between the term group of text strings and the synonym group of text strings.
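The overlap test can be illustrated as below. The abstract only speaks of "an amount of overlap", so the specific measure (shared strings divided by the size of the smaller group) and the 0.5 threshold are assumptions, as are the function names. The translation groups are taken as given inputs; how they are produced is outside this sketch.

```python
def translation_overlap(term_group, synonym_group):
    """Fraction of the smaller group that also appears in the other group."""
    a, b = set(term_group), set(synonym_group)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def is_valid_synonym(term_group, synonym_group, threshold=0.5):
    """Accept the candidate when the overlap reaches the assumed threshold."""
    return translation_overlap(term_group, synonym_group) >= threshold
```

For example, if the term translates to {"voiture", "coche", "auto"} and the candidate to {"voiture", "automobile", "auto"}, two of three strings are shared and the candidate passes under the assumed threshold.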
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for query synonym expansion. One method includes receiving a query including a first compound term, and in response to receiving the query, performing the following operations before search results responsive to the query are identified: generating one or more splits of the first compound term, wherein each split divides the compound term into two or more subterms, assigning a score to each subterm of each split, determining an overall score for each split from the scores for the subterms of the split, selecting one or more of the one or more splits according to the overall score for each split, and augmenting the query with the subterms of each selected split.
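One way to picture the split-and-score procedure is sketched below. The `SUBTERM_SCORES` table, the use of a product to combine subterm scores into an overall score, and the `min_score` cutoff are assumptions made for illustration; the abstract does not specify any of them.

```python
from itertools import combinations

# Hypothetical per-subterm scores (for instance, derived from term frequency).
SUBTERM_SCORES = {"face": 0.9, "book": 0.9, "fa": 0.5, "cebook": 0.1}

def generate_splits(term, max_parts=2):
    """Yield every way to cut the term into 2..max_parts non-empty subterms."""
    n = len(term)
    for parts in range(2, max_parts + 1):
        for cuts in combinations(range(1, n), parts - 1):
            bounds = (0, *cuts, n)
            yield tuple(term[i:j] for i, j in zip(bounds, bounds[1:]))

def score_split(split, scores):
    """Overall score of a split: product of its subterm scores (assumed)."""
    total = 1.0
    for subterm in split:
        total *= scores.get(subterm, 0.0)
    return total

def best_splits(term, scores, min_score=0.5, max_parts=2):
    """Return splits whose overall score meets the cutoff, best first."""
    scored = [(score_split(s, scores), s) for s in generate_splits(term, max_parts)]
    return [s for sc, s in sorted(scored, reverse=True) if sc >= min_score]

def augment_with_splits(query_terms, term, scores):
    """Append the subterms of each selected split to the query."""
    augmented = list(query_terms)
    for split in best_splits(term, scores):
        augmented.extend(sub for sub in split if sub not in augmented)
    return augmented
```

A product penalizes any split containing an unknown subterm, which keeps implausible cuts out of the augmented query; a sum or minimum would be equally valid readings of "overall score".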
Abstract:
One embodiment of the present invention provides a system for identifying synonym candidates. During operation, the system receives a first term and a second term. The system then determines a length of the longer one of the first and second terms, and determines a longest common subsequence of the two terms. The system further produces a result to indicate whether the two terms are synonym candidates based on the length of the longer term and a length of the longest common subsequence of the two terms.
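A minimal sketch of this comparison, using the standard dynamic-programming computation of longest-common-subsequence length. The abstract says only that the result is "based on" the two lengths, so the ratio test and the 0.8 threshold below are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def are_synonym_candidates(first, second, min_ratio=0.8):
    """Compare LCS length against the length of the longer term (assumed ratio test)."""
    longer = max(len(first), len(second))
    if longer == 0:
        return False
    return lcs_length(first, second) / longer >= min_ratio
```

Spelling variants such as "color" and "colour" share a subsequence of length 5 against a longer length of 6, so they pass under the assumed threshold, while unrelated terms do not.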
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for index-side synonym expansion. One method includes obtaining a token sequence for a resource and indexing a particular token in the token sequence. The indexing includes obtaining a diacritically canonicalized form of the particular token; determining that the diacritically canonicalized form of the particular token is different from the particular token; and storing data associating the resource with both the particular token and the different diacritically canonicalized form of the particular token as index terms for the resource in a search engine.
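The indexing step might look like the following sketch. Stripping combining marks after Unicode NFD decomposition is one common way to obtain a diacritic-free form; whether it matches the canonicalization intended here is an assumption, as are the function names and the dictionary-based index.

```python
import unicodedata

def strip_diacritics(token):
    """Decompose the token and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def index_token(index, resource_id, token):
    """Store the token, and its canonical form when it differs, for a resource."""
    index.setdefault(token, set()).add(resource_id)
    canonical = strip_diacritics(token)
    if canonical != token:
        index.setdefault(canonical, set()).add(resource_id)
    return index
```

Indexing "café" for a resource would then make that resource retrievable under both "café" and "cafe", while a token without diacritics produces a single index term.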
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for index-side synonym expansion. One method includes indexing a token from a resource, including determining that the token comprises a numeric portion and storing data associating the resource with both the token and the numeric portion in a search engine index. Another method includes indexing a token from a resource, including normalizing the token by removing a prefix matching a stopword prefix and storing data associating the resource with both the token and the normalized form of the token in a search engine index. Another method includes creating a token blacklist.
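The stopword-prefix method can be sketched as follows. The prefix list and function names are hypothetical, and the requirement that something remain after the prefix is removed is an added safeguard rather than a detail from the abstract. The token blacklist is left out because the abstract does not describe how it is used.

```python
# Hypothetical stopword prefixes (elided articles and similar particles).
STOPWORD_PREFIXES = ("l'", "d'", "al-")

def normalize_token(token, prefixes=STOPWORD_PREFIXES):
    """Remove the first matching stopword prefix, if any text remains after it."""
    for prefix in prefixes:
        if token.startswith(prefix) and len(token) > len(prefix):
            return token[len(prefix):]
    return token

def index_terms(token, prefixes=STOPWORD_PREFIXES):
    """Return the token plus its normalized form when the two differ."""
    normalized = normalize_token(token, prefixes)
    return [token, normalized] if normalized != token else [token]
```

A resource containing "l'avion" would be indexed under both "l'avion" and "avion", so a query for either form can match it.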
Abstract:
One embodiment of the present invention provides a system that considers lexical synonyms for terms while processing a query. During operation, the system receives a query containing one or more terms. Next, the system identifies one or more lexical synonyms for the one or more terms. The system then generates an altered query using the one or more lexical synonyms and processes the altered query to produce search results.
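As a rough illustration of generating an altered query, the sketch below rewrites each term with known lexical synonyms into an OR group. The `LEXICAL_SYNONYMS` table and the `(a OR b)` query syntax are assumptions; the abstract does not describe the form of the altered query.

```python
# Hypothetical lexical synonyms (for instance, inflected forms of a term).
LEXICAL_SYNONYMS = {
    "car": ["cars"],
    "repair": ["repairs", "repairing"],
}

def alter_query(query, synonyms=LEXICAL_SYNONYMS):
    """Rewrite each term that has lexical synonyms as an OR group."""
    parts = []
    for term in query.split():
        variants = [term] + [v for v in synonyms.get(term.lower(), []) if v != term]
        if len(variants) > 1:
            parts.append("(" + " OR ".join(variants) + ")")
        else:
            parts.append(term)
    return " ".join(parts)
```

Terms without an entry in the table pass through unchanged, so the altered query is never narrower than the original.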
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing, in an index associated with a document, a particular term that occurs in the document, wherein the particular term comprises n words, and wherein n is greater than 1; identifying a substitute term of the particular term; and in response to identifying the substitute term of the particular term, storing, in the index associated with the document, (i) the substitute term of the particular term, and (ii) data indicating that the substitute term spans the n words of the particular term.
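A minimal sketch of storing a substitute term together with the span it covers. The `Posting` record, the `SUBSTITUTES` table, and the position-based layout are illustrative assumptions; the abstract requires only that the index hold the substitute term and data indicating that it spans the n words.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Posting:
    term: str      # indexed term
    position: int  # index of the first word the term covers
    span: int      # number of document words the term covers

# Hypothetical substitutes for multi-word terms (n > 1).
SUBSTITUTES = {("new", "york", "city"): "nyc"}

def index_document(words, substitutes=SUBSTITUTES):
    """Index single words, matched multi-word terms, and their substitutes."""
    postings = [Posting(w, i, 1) for i, w in enumerate(words)]
    for phrase, substitute in substitutes.items():
        n = len(phrase)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == phrase:
                postings.append(Posting(" ".join(phrase), i, n))
                # The substitute records that it spans the same n words.
                postings.append(Posting(substitute, i, n))
    return postings
```

Recording the span lets phrase matching treat "nyc" as occupying the same three positions as "new york city", so adjacent query terms still line up with the surrounding words.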