Abstract:
Browsing sequence phrase identification technique embodiments are presented that generally extract topically-related phrases from the pages visited by a user in a browsing session. The topically-related phrases can be used for a variety of purposes, including aiding a user in re-finding previously visited sites. This phrase identification task is performed by considering not just the pages of a user's browsing sequence individually, but also pages visited immediately before and immediately after each page. In this way, phrases found in a page can be analyzed in the context in which the page was viewed, rather than in isolation. The identified phrases are further filtered by picking those that appear on a pre-populated topic list, and then clustering to find the most informative ones.
Abstract:
Semantic object characterization and its use in indexing and searching a database directory are presented. In general, a first binary hash code is generated to characterize a first representation or view of a semantic object. When this code is compared to a second binary hash code that characterizes a second representation or view of the same semantic object, the two codes exhibit a degree of similarity indicative of the objects being the same object. In one implementation, the semantic objects correspond to people's names and the first and second representations or views correspond to two different languages. Thus, a user can search a database of information in one language with a search query in another language.
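The comparison step can be illustrated with a minimal sketch. The abstract does not say how the hash codes are generated or which similarity measure is used, so the sketch assumes the codes are already available as fixed-width integers and uses the fraction of agreeing bits (one minus the normalized Hamming distance) as the similarity. The function names, the 8-bit width, and the 0.75 threshold are illustrative assumptions only.

```python
def hamming_similarity(code_a: int, code_b: int, num_bits: int) -> float:
    """Return the fraction of bit positions on which two codes agree.

    Assumption: both codes are num_bits wide. The abstract does not name
    a similarity measure; Hamming agreement is used here as a stand-in.
    """
    mask = (1 << num_bits) - 1
    differing = bin((code_a ^ code_b) & mask).count("1")
    return 1.0 - differing / num_bits


def same_object(code_a: int, code_b: int, num_bits: int = 8,
                threshold: float = 0.75) -> bool:
    """Treat two codes as the same semantic object above a threshold.

    The threshold value is a hypothetical choice, not from the source.
    """
    return hamming_similarity(code_a, code_b, num_bits) >= threshold


# Hypothetical codes for the same name written in two languages.
code_lang_1 = 0b10110010
code_lang_2 = 0b10110110
print(same_object(code_lang_1, code_lang_2))
```

In a directory search, codes for all indexed names would be computed once, and a query's code compared against them at lookup time.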
Abstract:
An approach is described for using a query expressed in a source language to retrieve information expressed in a target language. The approach uses a translation dictionary to convert terms in the query from the source language to appropriate terms in the target language. The approach determines viable transliterations for out-of-vocabulary (OOV) query terms by retrieving a body of information based on an in-vocabulary component of the query, and then mining the body of information to identify the viable transliterations for the OOV query terms. The approach then adds the viable transliterations to the translation dictionary. The retrieval, mining, and adding operations can be repeated one or more times.
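The repeated retrieve, mine, and add cycle can be sketched as a simple loop. The abstract does not describe the retrieval or mining procedures, so they are passed in as the hypothetical callables `retrieve` and `mine`. The function name `expand_dictionary`, the `max_rounds` limit, and the stopping conditions are likewise assumptions made for illustration.

```python
def expand_dictionary(query_terms, dictionary, retrieve, mine, max_rounds=3):
    """Grow a translation dictionary with mined transliterations.

    retrieve(target_terms) -> list of documents   (hypothetical callable)
    mine(oov_terms, documents) -> {oov_term: transliteration}
                                                  (hypothetical callable)
    """
    dictionary = dict(dictionary)  # work on a copy, leave the input intact
    for _ in range(max_rounds):
        known = [t for t in query_terms if t in dictionary]
        oov = [t for t in query_terms if t not in dictionary]
        if not oov or not known:
            break  # nothing left to resolve, or nothing to retrieve with
        # Retrieve using only the in-vocabulary part of the query.
        documents = retrieve([dictionary[t] for t in known])
        # Mine the retrieved documents for transliterations of OOV terms.
        found = mine(oov, documents)
        if not found:
            break  # no progress in this round
        dictionary.update(found)
    return dictionary
```

Each round can enlarge the in-vocabulary part of the query, so a later retrieval may surface documents that resolve terms the earlier rounds missed.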
Abstract:
A search environment of an embodiment includes name mining and matching features used in part to identify people-centric queries and provide an enriched search experience, but is not so limited. A method of an embodiment operates to provide an expanded query based in part on a geometric similarity measure, an edit distance measure, a string similarity measure, and a cumulative similarity measure. A search system of an embodiment includes a mined candidate generator component and a name matcher component used in part to identify name queries and provide an expanded query that includes original query terms and one or more valid mined names. Other embodiments are also disclosed.
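Of the measures named above, the edit distance is the most standard, and a minimal sketch of it follows. The abstract does not define the geometric, string, or cumulative similarity measures, so they are not reproduced here. The normalization by the longer string, the `expand_query` helper, and the 0.7 threshold are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def expand_query(query_terms, mined_names, threshold=0.7):
    """Append mined names that closely match any original query term.

    The threshold is a hypothetical value, not taken from the source.
    """
    expanded = list(query_terms)
    for name in mined_names:
        if name in expanded:
            continue
        if any(normalized_similarity(term.lower(), name.lower()) >= threshold
               for term in query_terms):
            expanded.append(name)
    return expanded


print(expand_query(["jon", "smith"], ["john", "smyth", "baker"]))
```

The expanded query keeps the original terms first, matching the description of a query that includes original terms plus valid mined names.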
Abstract:
Various technologies described herein pertain to suggesting context dependent keywords for advertising. A set of seed queries can be identified from a context, where the context is a source keyword, a search query, a category, or a landing page. Moreover, the set of seed queries can be inputted to a search engine. A predetermined number of web pages returned by the search engine upon executing the set of seed queries can be retrieved. Candidate keywords can be extracted from the web pages returned by the search engine. Further, keywords can be selected from the candidate keywords based on their relevance scores.
Abstract:
Keyword extraction technique embodiments are presented which extract topically related keywords from a set of topically related documents. In one general embodiment, this keyword extraction involves first accessing a set of topically related documents. A number of candidate keywords are then identified from the set of related documents. A weighted keyword candidate-document matrix is formed using these candidate keywords, and it is partitioned into multiple groups of keyword candidates. Dense clusters of keyword candidates whose density exceeds a prescribed density threshold are then identified in each of the groups of keyword candidates. Finally, the keyword candidates associated with each dense cluster are designated as topically related keywords.
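The matrix and density steps can be sketched as follows. The abstract does not specify the weighting scheme, the partitioning method, or the density definition. This sketch assumes TF-IDF weights, single-word candidates, and mean pairwise cosine similarity as the density of a cluster. It omits the partitioning step entirely, and all function names are illustrative.

```python
import math
from collections import Counter


def weighted_matrix(candidates, documents):
    """Rows are keyword candidates, columns are documents.

    Entries are TF-IDF weights (an assumed weighting scheme).
    Candidates are assumed to be single lowercase words.
    """
    n_docs = len(documents)
    counts = [Counter(doc.lower().split()) for doc in documents]
    matrix = []
    for keyword in candidates:
        doc_freq = sum(1 for c in counts if c[keyword] > 0)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        matrix.append([c[keyword] * idf for c in counts])
    return matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster_density(rows):
    """Mean pairwise cosine similarity (an assumed density measure)."""
    pairs = [(i, j) for i in range(len(rows)) for j in range(i + 1, len(rows))]
    if not pairs:
        return 0.0
    return sum(cosine(rows[i], rows[j]) for i, j in pairs) / len(pairs)


docs = ["solar panel cost", "solar panel install", "tax form deadline"]
solar, panel, tax = weighted_matrix(["solar", "panel", "tax"], docs)
print(cluster_density([solar, panel]), cluster_density([solar, tax]))
```

A cluster would then be kept when its density exceeds the prescribed threshold, and its candidates designated as topically related keywords.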
Abstract:
Described herein are various technologies pertaining to provision of query suggestions to a user independent of a query log. Key phrases are automatically identified in documents of a document corpus, and a forward index and inverted index are generated. The forward index indexes key phrases by documents, and the inverted index indexes documents by key phrases. A query is received from a user, and documents relevant to the query are retrieved. Key phrases in the retrieved documents are identified via the forward index, and a subset of the key phrases is selected as query suggestions by determining coverage of the key phrases as identified in the inverted index.
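The two indexes and the coverage-based selection can be sketched as follows. Key phrase identification is assumed to have been done already and is supplied as a mapping from document ID to phrases. The abstract does not define how coverage is computed, so the sketch assumes a greedy choice of the phrase that covers the most not-yet-covered retrieved documents. The function names and the parameter `k` are illustrative.

```python
from collections import defaultdict


def build_indexes(doc_phrases):
    """Build a forward index (doc -> phrases) and inverted index (phrase -> docs)."""
    forward = {doc: set(phrases) for doc, phrases in doc_phrases.items()}
    inverted = defaultdict(set)
    for doc, phrases in forward.items():
        for phrase in phrases:
            inverted[phrase].add(doc)
    return forward, dict(inverted)


def suggest(retrieved_docs, forward, inverted, k=2):
    """Greedily pick up to k phrases covering the retrieved documents.

    Greedy set cover is an assumed reading of "coverage".
    """
    uncovered = set(retrieved_docs)
    # Candidate phrases come from the forward index of the retrieved docs.
    candidates = set()
    for doc in retrieved_docs:
        candidates |= forward.get(doc, set())
    suggestions = []
    while uncovered and candidates and len(suggestions) < k:
        # Sorting first makes ties resolve alphabetically.
        best = max(sorted(candidates),
                   key=lambda p: len(inverted[p] & uncovered))
        gain = inverted[best] & uncovered
        if not gain:
            break  # no remaining phrase covers an uncovered document
        suggestions.append(best)
        uncovered -= gain
        candidates.discard(best)
    return suggestions


forward, inverted = build_indexes({
    "d1": ["hash code", "name search"],
    "d2": ["hash code"],
    "d3": ["query log"],
})
print(suggest(["d1", "d2", "d3"], forward, inverted))
```

Because the suggestions come from the indexed documents rather than past user queries, this works without any query log, as the abstract describes.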