Abstract:
Methods, systems, and computer-readable media for anonymizing electronic documents. In accordance with one or more embodiments, structurally similar electronic documents can be identified among a group of electronic documents (e.g., e-mail messages, documents containing HTML formatting, etc.). A hash function can be specifically tailored to identify the similarly structured documents. The structurally similar electronic documents can be grouped into the same equivalence class. Masked, anonymized document samples can be generated from the structurally similar electronic documents in that equivalence class, thereby ensuring that the samples remain anonymous when viewed as part of an audit. An online process is provided to guarantee k-anonymity of the users over the entire lifetime of the auditing process. An auditor's productivity can be measured by the amount of content revealed to the auditor within the samples assigned to that auditor. The auditor's productivity is maximized while anonymization is ensured over the lifetime of the audit.
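The grouping and masking steps described above can be illustrated with a minimal sketch. The abstract does not specify the hash function or the masking rule, so everything here is an assumption made for illustration: the structural hash is taken to be a SHA-256 digest of the HTML tag sequence, and a text chunk is revealed only when at least k distinct users in the class share it at the same position. The names `structural_hash`, `equivalence_classes`, and `masked_sample` are hypothetical.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class _StructureParser(HTMLParser):
    """Separates a document's tag sequence from its text chunks."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def parse(html):
    """Return (tag sequence, non-empty text chunks) for one document."""
    parser = _StructureParser()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.texts


def structural_hash(html):
    """Assumed hash: digest of the tag sequence only, so text is ignored."""
    tags, _ = parse(html)
    return hashlib.sha256("".join(tags).encode("utf-8")).hexdigest()


def equivalence_classes(docs, k):
    """Group (user_id, html) pairs by structure; keep classes with >= k users."""
    classes = defaultdict(list)
    for user_id, html in docs:
        classes[structural_hash(html)].append((user_id, html))
    return {
        h: members
        for h, members in classes.items()
        if len({user for user, _ in members}) >= k
    }


def masked_sample(html, members, k):
    """Assumed masking rule: reveal a chunk only if >= k users share it."""
    users_by_chunk = defaultdict(set)
    for user_id, other in members:
        for i, text in enumerate(parse(other)[1]):
            users_by_chunk[(i, text)].add(user_id)
    _, texts = parse(html)
    return [
        t if len(users_by_chunk[(i, t)]) >= k else "[MASKED]"
        for i, t in enumerate(texts)
    ]


docs = [
    ("u1", "<p>Hello</p><b>Alice</b>"),
    ("u2", "<p>Hello</p><b>Bob</b>"),
    ("u3", "<div>Other</div>"),
]
classes = equivalence_classes(docs, k=2)
members = next(iter(classes.values()))
sample = masked_sample(docs[0][1], members, k=2)
```

In this sketch the first two documents share a tag sequence and fall into one class, while the third is dropped because its class has fewer than k users. The fraction of unmasked chunks in `sample` is one possible proxy for the auditor productivity the abstract mentions. The online, lifetime-wide k-anonymity guarantee is not modeled here.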
Abstract:
Methods, systems and programming for predicting search results quality. In one example, a search query is received from a user. A plurality of search results are obtained from a content source based on the search query. The plurality of search results are ranked based on their relevance scores with respect to the search query. A distribution of the relevance scores of the plurality of search results is normalized in each position of the ranking. A metric of the content source is computed based on the normalized distribution of the relevance scores. The metric indicates a relevance between the plurality of search results and the search query.
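The ranking, per-position normalization, and metric steps above can be sketched as follows. The abstract does not say how the score distribution is normalized or how the metric is derived, so this sketch makes two assumptions: each score is z-normalized against the mean and standard deviation observed at the same rank position in past rankings, and the metric is the average of those normalized scores. The names `position_stats` and `quality_metric` are hypothetical.

```python
from statistics import mean, pstdev


def position_stats(historical_rankings):
    """Per-position (mean, std) of scores across past rankings.

    Each ranking is a list of relevance scores already sorted in
    descending order; positions beyond the shortest ranking are ignored.
    """
    return [
        (mean(scores_at_pos), pstdev(scores_at_pos))
        for scores_at_pos in zip(*historical_rankings)
    ]


def quality_metric(scores, stats):
    """Assumed metric: mean z-score of the ranked results, by position."""
    ranked = sorted(scores, reverse=True)  # rank by relevance score
    normalized = [
        (score - mu) / sigma if sigma > 0 else 0.0
        for score, (mu, sigma) in zip(ranked, stats)
    ]
    return mean(normalized) if normalized else 0.0


# Hypothetical score history for one content source.
history = [[0.9, 0.5], [0.7, 0.3]]
stats = position_stats(history)
strong = quality_metric([0.5, 0.9], stats)
weak = quality_metric([0.3, 0.7], stats)
```

Under these assumptions, a positive value means the results for the query score above what the source typically returns at each rank, and a negative value means they score below it.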