Abstract:
A method and apparatus for indexing and searching content in a hardcopy document utilizes a searching assistant computing device (402) with an index table (420) stored in memory (412). The index table (420) is created in memory by scanning a 2-D barcode from a hardcopy document or alternatively by downloading indexing information from a web page via the Internet (430). A search engine (410) in the searching assistant (402) searches the index table (420) to locate a data element found in the content of the hardcopy document. The indexing information corresponding to the data element is displayed to a user as part of the search results to indicate the location of the data element in the hardcopy document.
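The index-table lookup described above can be illustrated with a minimal sketch. It assumes the indexing information has already been decoded into (term, page) pairs; the barcode scanning and web download steps are not shown, and the names `build_index_table` and `search_index` are hypothetical.

```python
from collections import defaultdict

def build_index_table(entries):
    """Build a term -> sorted page list table from (term, page) pairs.

    The pairs stand in for indexing information decoded from a 2-D barcode
    or downloaded from a web page (assumption: decoding happens elsewhere).
    """
    table = defaultdict(set)
    for term, page in entries:
        table[term.lower()].add(page)
    return {term: sorted(pages) for term, pages in table.items()}

def search_index(table, query):
    """Return the pages on which the query term appears, or an empty list."""
    return table.get(query.lower(), [])

index_table = build_index_table([("barcode", 3), ("Barcode", 12), ("scanner", 7)])
print(search_index(index_table, "barcode"))
```

The page numbers returned by `search_index` play the role of the location information displayed to the user with the search results.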
Abstract:
A method, apparatus, and article of manufacture employing lexicon reduction using key characters and a neural network, for recognizing a line of cursive text. Unambiguous parts of a cursive image, referred to as “key characters,” are identified. If the level of confidence that a segment of a line of cursive text is a particular character is higher than a threshold, and is also sufficiently higher than the level of confidence of neighboring segments, then the character is designated as a key character candidate. Key character candidates are then screened using geometric information. The key character candidates that pass the screening are designated key characters. Two stages of lexicon reduction are employed. The first stage of lexicon reduction uses a neural network to estimate a lower bound and an upper bound of the number of characters in a line of cursive text. Lexicon entries having a total number of characters outside of the bounds are eliminated. In the second stage of lexicon reduction, the lexicon is further reduced by using the key characters to compare character strings with lexicon entries. For each of the key characters in a character string, it is determined whether there is a mismatch between the key character and the characters in a corresponding search range in the lexicon entry. If the number of mismatches for all of the key characters in a search string is greater than (1+(the number of key characters in the search string/4)), then the lexicon entry is eliminated. Accordingly, the invention advantageously accomplishes lexicon reduction, thereby decreasing the time required to recognize a line of cursive text, without reducing accuracy.
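The two stages of lexicon reduction can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: the length bounds are passed in as plain numbers rather than estimated by a neural network, each key character is represented as a hypothetical (character, estimated position) pair, and the search range is modeled as a window of `slack` positions on either side of that estimate.

```python
def reduce_by_length(lexicon, lower, upper):
    """Stage 1: keep entries whose length lies within the estimated bounds."""
    return [word for word in lexicon if lower <= len(word) <= upper]

def count_mismatches(entry, key_chars, slack):
    """Count key characters not found in their search range in the entry."""
    mismatches = 0
    for char, pos in key_chars:
        start = max(0, pos - slack)
        window = entry[start:pos + slack + 1]  # assumed form of the search range
        if char not in window:
            mismatches += 1
    return mismatches

def reduce_by_key_chars(lexicon, key_chars, slack=1):
    """Stage 2: drop entries with more than 1 + (number of key chars / 4) mismatches."""
    limit = 1 + len(key_chars) / 4
    return [w for w in lexicon if count_mismatches(w, key_chars, slack) <= limit]

lexicon = ["boston", "austin", "buffalo", "reno", "dayton"]
key_chars = [("b", 0), ("s", 2), ("t", 3), ("o", 4), ("n", 5)]  # illustrative values
stage1 = reduce_by_length(lexicon, lower=5, upper=7)
stage2 = reduce_by_key_chars(stage1, key_chars)
print(stage1, stage2)
```

With five key characters the elimination threshold is 2.25, so an entry survives the second stage with up to two mismatches.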
Abstract:
An advertisement impression distribution system is programmed to generate an allocation plan for serving a number of advertisement impressions changeable as a result of one or more events, the allocation plan to allocate a first portion of advertisement impressions to satisfy guaranteed demand and a second portion of advertisement impressions to satisfy non-guaranteed demand. The system includes an optimizer programmed to establish a relationship between the first portion of advertisement impressions and the second portion of advertisement impressions, the relationship defining a range of possible proportions of allocation of the first portion of advertisement impressions and the second portion of advertisement impressions; and to impose at least one objective on the relationship including moderating an increase in the number of advertisement impressions available for allocation to the first and second portions, to minimize a cost associated with reducing a quality of the advertisement impressions as their volume increases. The system outputs the allocation plan to an ad serving module to control serving of the advertisement impressions according to the range of possible proportions of allocation between the first and the second portions.
Abstract:
Computer systems and methods incorporate user annotations (metadata) regarding various pages or sites, including annotations by a querying user and by members of a trust network defined for the querying user, into search and browsing of a corpus such as the World Wide Web. A trust network is defined for each user, and annotations by any member of a first user's trust network are made visible to the first user during search and/or browsing of the corpus. Users can also limit searches to content annotated by members of their trust networks or by members of a community selected by the user.
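The visibility rule described above can be illustrated with a short sketch. The data shapes are assumptions made for illustration: annotations are dictionaries with hypothetical `author` and `page` keys, and a trust network is a mapping from a user to the set of members that user trusts.

```python
def visible_annotations(annotations, user, trust_network):
    """Return annotations written by the user or by members of the user's trust network."""
    allowed = {user} | trust_network.get(user, set())
    return [a for a in annotations if a["author"] in allowed]

def search_annotated(pages, annotations, user, trust_network):
    """Limit a result list to pages annotated by the user or the user's trust network."""
    annotated = {a["page"] for a in visible_annotations(annotations, user, trust_network)}
    return [page for page in pages if page in annotated]

trust_network = {"alice": {"bob"}}
annotations = [
    {"author": "bob", "page": "p1", "note": "useful"},
    {"author": "carol", "page": "p2", "note": "spam"},
]
print(search_annotated(["p1", "p2", "p3"], annotations, "alice", trust_network))
```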
Abstract:
The present invention relates to systems, methods, and user interfaces for browsing a collection of content items saved by a user or by one or more buddies associated with a given user. The method of the present invention comprises saving one or more content items and one or more associated keywords as specified by a user. An interface is generated that displays the one or more saved content items and the one or more associated keywords, as well as the one or more buddies associated with a given user. A user indication of the selection of a given keyword or the selection of a given buddy by the user is received. The one or more displayed content items are filtered according to the selected keyword, buddy, or combination of selected keyword and buddy.
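The filtering step can be sketched as below, assuming each saved content item records its owner (the user or a buddy) and its associated keywords. The `SavedItem` type and `filter_items` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SavedItem:
    url: str
    owner: str                      # the user or buddy who saved the item
    keywords: set = field(default_factory=set)

def filter_items(items, keyword=None, buddy=None):
    """Filter saved items by keyword, by buddy, or by both when both are given."""
    return [
        item for item in items
        if (keyword is None or keyword in item.keywords)
        and (buddy is None or item.owner == buddy)
    ]

items = [
    SavedItem("http://example.com/a", "me", {"python", "search"}),
    SavedItem("http://example.com/b", "sam", {"python"}),
    SavedItem("http://example.com/c", "sam", {"cooking"}),
]
print([i.url for i in filter_items(items, keyword="python", buddy="sam")])
```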
Abstract:
Disclosed are apparatus and methods for facilitating the ranking of web objects. The method includes automatically adjusting a plurality of weight values for a plurality of parameters for inputting into a ranking engine that is adapted to rank a plurality of web objects based on such weight values and their corresponding parameters. The adjusted weight values are provided to the ranking engine so as to generate a ranked set of web objects based on such adjusted weight values and their corresponding parameters, as well as a particular query. A relevance metric (e.g., that quantifies or qualifies how relevant the generated ranked set of web objects is for the particular query) is determined. The method includes automatically repeating the operations of adjusting the weight values, providing the adjusted weight values to the ranking engine, and determining a relevance metric until the relevance metric reaches an optimized level, which corresponds to an optimized set of weight values. The repeated operations utilize one or more sets of weight values including at least one set that results in a worse relevance metric value, as compared to a previous set of weight values, according to a certain probability in order to escape a local optimal solution and reach the global optimal solution.
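Accepting a worse set of weight values with a certain probability resembles simulated annealing, a technique the abstract does not name. The sketch below uses that technique as a stand-in. The `relevance` callable is a hypothetical substitute for running the ranking engine and computing the relevance metric, and the step size, temperature, and cooling rate are illustrative values.

```python
import math
import random

def tune_weights(relevance, initial, steps=2000, step_size=0.1,
                 temperature=1.0, cooling=0.995, seed=0):
    """Search for weights that maximize relevance, occasionally accepting worse ones."""
    rng = random.Random(seed)
    current = list(initial)
    current_score = relevance(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        # Adjust the weight values by a small random perturbation.
        candidate = [w + rng.uniform(-step_size, step_size) for w in current]
        score = relevance(candidate)
        delta = score - current_score
        # Always accept improvements; accept a worse set with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temperature *= cooling
    return best, best_score

def toy_relevance(weights):
    """Toy metric (assumption): peaks at weights (0.7, 0.2)."""
    return -((weights[0] - 0.7) ** 2 + (weights[1] - 0.2) ** 2)

best_weights, best_score = tune_weights(toy_relevance, [0.0, 0.0])
print(best_weights, best_score)
```

As the temperature falls, worse candidates are accepted less often, so the search settles near an optimum it reached while exploration was still permitted.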
Abstract:
An annotation method for annotating content includes displaying a set of suggested keywords on an editing interface page configured to receive one or more annotations for the content. A request is received via the editing interface page to annotate the content with at least one keyword from the set of suggested keywords. Association information is generated that associates the at least one keyword with the content.
Abstract:
Various embodiments are directed to a system and method providing associative matching of terms. Candidate terms are selected for building one or more associative matching models from one or more selected candidate sources. Associativity is defined to give editors the ability to label sample associative term pairs from the one or more candidate sources. The editors label sample candidate term pairs as being related. Features are determined that can differentiate associative from non-associative pairs. The selected features are used to build a model. The model is applied to determine whether a received query-candidate pair is associative.
Abstract:
Methods, systems, and machine-readable media are disclosed for searching a corpus of information by utilizing a Bloom filter for caching query results. According to one aspect of the present invention, a method of caching information from a corpus of information can include populating one or more Bloom filters with a plurality of bits representative of information in the corpus of information. A search request can be received identifying requested information from the corpus of information. One or more bits in the filter(s) associated with the requested information can be checked and the requested information can be retrieved from the corpus of information based on results of said checking. Furthermore, the filter(s) can be used to determine which information to make available to a particular user in a system where certain information is associated with or access is limited to certain users or groups of users.
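A minimal sketch of the bit-checking step follows. The bit-array size, the number of hash functions, and the use of SHA-256 to derive bit positions are assumptions chosen for illustration. The corpus is modeled as a plain dictionary, and `lookup` is a hypothetical helper.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        """Derive num_hashes bit positions for a key from salted SHA-256 digests."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

def lookup(corpus, bloom, key):
    """Check the filter first and touch the corpus only if the bits are all set."""
    if not bloom.might_contain(key):
        return None
    return corpus.get(key)  # a false positive still resolves to None here

corpus = {"bloom": "doc-1", "filter": "doc-2"}
bloom = BloomFilter()
for term in corpus:
    bloom.add(term)
print(lookup(corpus, bloom, "bloom"), lookup(corpus, bloom, "absent"))
```

A Bloom filter can report false positives but never false negatives, so a negative check safely skips the corpus lookup.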
Abstract:
The present invention is directed to systems and methods for searching content items indexed in real time. The method according to one embodiment comprises generating an index of word-location pairs that identifies the location of one or more words in one or more content items available on a network. One or more additional content items are received over the network. The received content items are stored in a stream search queue, the stream search queue operative to allow for a stream search of the one or more additional content items.
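The interplay between the index and the stream search queue can be sketched as below. The sketch assumes whitespace tokenization and models a location as a (content item identifier, word position) pair. The `RealTimeSearcher` class and its `flush` step, which moves queued items into the index, are hypothetical additions not described in the abstract.

```python
from collections import defaultdict, deque

class RealTimeSearcher:
    def __init__(self):
        self.index = defaultdict(list)  # word -> [(item_id, position), ...]
        self.stream_queue = deque()     # (item_id, text) received but not yet indexed

    def index_item(self, item_id, text):
        """Record a word-location pair for every word in the content item."""
        for position, word in enumerate(text.lower().split()):
            self.index[word].append((item_id, position))

    def receive(self, item_id, text):
        """Store a newly received content item in the stream search queue."""
        self.stream_queue.append((item_id, text))

    def search(self, word):
        """Combine index hits with a linear stream search over queued items."""
        word = word.lower()
        hits = list(self.index.get(word, []))
        for item_id, text in self.stream_queue:
            for position, token in enumerate(text.lower().split()):
                if token == word:
                    hits.append((item_id, position))
        return hits

    def flush(self):
        """Move queued items into the index (assumed maintenance step)."""
        while self.stream_queue:
            self.index_item(*self.stream_queue.popleft())

searcher = RealTimeSearcher()
searcher.index_item("a", "hello world")
searcher.receive("b", "World news")
print(searcher.search("world"))
```

Queued items are searchable immediately after `receive`, before any indexing work has been done on them.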