Abstract:
Methods, systems, and articles of manufacture consistent with certain principles related to the present invention enable a computing system to perform hierarchical topical clustering of text data based on statistical modeling of co-occurrences of (document, word) pairs. The computing system may be configured to receive a collection of documents, each document including a plurality of words, and perform a modified deterministic annealing Expectation-Maximization (EM) process on the collection to produce a softly assigned hierarchy of nodes. The process may involve assigning documents and document fragments to multiple nodes in the hierarchy based on words included in the documents, such that a document may be assigned to any ancestor node included in the hierarchy, thus eliminating the hard assignment of documents in the hierarchy.
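The abstract does not spell out the modified annealing procedure. To illustrate the soft assignment it relies on, the sketch below shows a tempered E-step for a single document. It assumes a standard deterministic-annealing form in which node posteriors are raised to an inverse temperature `beta` that increases over a schedule. The function names, the geometric schedule, and its defaults are hypothetical. A full DA-EM run would also re-estimate the model parameters (the M-step) at each `beta`, which is omitted here. The candidate nodes passed in could include ancestor nodes as well as leaves.

```python
import math


def tempered_posterior(log_prior, log_likelihood, beta):
    """Soft assignment of one document over K candidate nodes.

    Returns p(k | doc) proportional to (prior_k * likelihood_k) ** beta,
    computed in log space (log-sum-exp) for numerical stability.
    beta is the inverse temperature: 0 gives a uniform assignment, and
    large values approach a hard (winner-take-all) assignment.
    """
    scores = [beta * (lp + ll) for lp, ll in zip(log_prior, log_likelihood)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def annealing_schedule(beta_start=0.1, beta_end=1.0, rate=1.5):
    """Hypothetical geometric schedule of increasing inverse temperatures."""
    beta = beta_start
    while beta < beta_end:
        yield beta
        beta *= rate
    yield beta_end
```

For example, with a uniform prior over three nodes and log-likelihoods `[-10, -11, -14]`, iterating over `annealing_schedule()` gives posteriors that start close to uniform and sharpen toward the first node. At `beta = 1` they become the ordinary Bayesian posterior.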
Abstract:
Described is a system that characterizes segments of a document with one or more keyphrases and then uses the keyphrases to help users find interesting parts of the document. The keyphrases are displayed with information about the location of each phrase in the document and serve as pointers for moving quickly from an overview to a section of potential interest.
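One simple way to attach location-aware keyphrases to document segments is sketched below. It relies on assumptions the abstract does not state: the document is split into fixed-size word windows, single words stand in for keyphrases, and each segment's terms are ranked by TF-IDF computed across segments. The window size, `top_k`, and the scoring are hypothetical choices. A real system would likely use multi-word phrase extraction and stopword removal.

```python
import math
import re
from collections import Counter


def segment_keyphrases(text, segment_words=50, top_k=3):
    """Return [(start_word_offset, [top terms])] for each fixed-size segment."""
    words = re.findall(r"[a-z']+", text.lower())
    segments = [words[i:i + segment_words] for i in range(0, len(words), segment_words)]
    # Segment frequency: in how many segments each term occurs.
    seg_freq = Counter()
    for seg in segments:
        seg_freq.update(set(seg))
    n = len(segments)
    results = []
    for idx, seg in enumerate(segments):
        tf = Counter(seg)
        # Smoothed TF-IDF; a term present in every segment scores 0.
        scores = {t: c * math.log((1 + n) / (1 + seg_freq[t])) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        # The word offset is the location pointer shown next to the keyphrases.
        results.append((idx * segment_words, ranked[:top_k]))
    return results
```

The offset returned with each segment is what an overview would use as a jump target.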
Abstract:
Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
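Below is a minimal sketch of the vector-comparison step. It assumes a fixed list of tracked pages, optional per-page weights (hypothetical, e.g. to down-weight very popular pages), L2 normalization, and cosine similarity as the comparison. The abstract does not specify the weighting or the similarity measure, so these are illustrative choices. The bias-value and EM predictors are not shown.

```python
import math


def path_vector(visits, pages, weights=None):
    """Map a user's visited pages to an L2-normalized weighted vector over `pages`."""
    weights = weights or {}
    index = {page: i for i, page in enumerate(pages)}
    vec = [0.0] * len(pages)
    for page in visits:
        if page in index:  # ignore pages outside the tracked set
            vec[index[page]] += weights.get(page, 1.0)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def centroid(vectors):
    """Component-wise mean of the path vectors of users sharing an attribute."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_attribute(user_vec, centroids):
    """Return the attribute whose centroid is most similar to the user's vector."""
    return max(centroids, key=lambda attr: cosine(user_vec, centroids[attr]))
```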
Abstract:
The present invention relates to antibodies directed to the antigen PDGFD and to uses of such antibodies. In particular, the present invention provides fully human monoclonal antibodies directed to the antigen PDGFD. Nucleotide sequences encoding, and amino acid sequences comprising, heavy and light chain immunoglobulin molecules are provided, particularly sequences corresponding to contiguous heavy and light chain sequences spanning the framework regions and/or complementarity determining regions (CDRs), specifically from FR1 through FR4 or CDR1 through CDR3. Hybridomas or other cell lines expressing such immunoglobulin molecules and monoclonal antibodies are also provided.
Abstract:
One aspect of the invention is the efficient, incremental addition of new terms to an already trained probabilistic latent semantic analysis (PLSA) model.
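The abstract gives no algorithm. One standard technique is EM "folding-in": keep P(z|d) fixed, freeze the expected counts already attributed to the existing vocabulary, and run EM only over P(w|z) for the new terms. The sketch below follows that approach. It assumes the per-topic expected counts (`topic_mass`) were saved after training. That requirement, the uniform initialization, and all names are hypothetical and are not taken from the invention.

```python
def fold_in_terms(p_z_d, p_w_z, topic_mass, new_counts, iters=20):
    """Add new terms to a trained PLSA model without retraining.

    p_z_d:      list over documents of [P(z|d) for each topic] (kept fixed)
    p_w_z:      {word: [P(w|z) for each topic]} for the existing vocabulary
    topic_mass: per topic, sum over training (d, w) of n(d, w) * P(z|d, w)
                (hypothetical: assumed to have been saved after training)
    new_counts: {new_word: [n(d, new_word) for each document]}
    Returns a merged {word: [P(w|z)]} whose topic columns each sum to 1.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    k = len(topic_mass)
    new_words = list(new_counts)
    start = 1.0 / (len(p_w_z) + len(new_words))
    p_new = {w: [start] * k for w in new_words}
    for _ in range(iters):
        # E-step (new terms only): P(z|d,w) is proportional to P(z|d) * P(w|z).
        mass = {w: [0.0] * k for w in new_words}
        for w in new_words:
            for d, n_dw in enumerate(new_counts[w]):
                joint = [p_z_d[d][z] * p_new[w][z] for z in range(k)]
                norm = sum(joint)
                if n_dw == 0 or norm == 0.0:
                    continue
                for z in range(k):
                    mass[w][z] += n_dw * joint[z] / norm
        # M-step: existing terms keep their frozen expected counts (topic_mass).
        totals = [topic_mass[z] + sum(mass[w][z] for w in new_words) for z in range(k)]
        p_new = {w: [mass[w][z] / totals[z] for z in range(k)] for w in new_words}
    # Rescale existing rows so every topic's word distribution still sums to 1.
    merged = {w: [p[z] * topic_mass[z] / totals[z] for z in range(k)] for w, p in p_w_z.items()}
    merged.update(p_new)
    return merged
```

Because only the new rows are re-estimated, each iteration costs time proportional to the new terms' nonzero counts rather than the whole corpus.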
Abstract:
A method of trading a future right to a keyword advertisement placement associated with a search results list, wherein the search results list is generated in response to a search query. The method includes creating ownership of the future right to the keyword advertisement placement in an original keyword search engine. Next, the future right to the keyword advertisement placement originally owned by the original keyword search engine is made available for purchase in a keyword advertising market. Then, the future right to the keyword advertisement placement originally owned by the original keyword search engine is traded to another participant in the keyword advertising market.
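The three steps (create ownership, make the right available for purchase, trade it) can be sketched as a toy in-memory ledger. This is a hypothetical data model: the `placement` and `period` fields, the asking-price rule, and the class names are assumptions, not part of the described method.

```python
from dataclasses import dataclass, field


@dataclass
class FutureRight:
    keyword: str
    placement: int  # hypothetical: rank position in the search results list
    period: str     # hypothetical: label for the future period, e.g. "2024-Q3"
    owner: str
    history: list = field(default_factory=list)  # (seller, buyer, price) records


class KeywordAdMarket:
    def __init__(self):
        self.rights = {}    # right_id -> FutureRight
        self.listings = {}  # right_id -> asking price

    def create_right(self, right_id, keyword, placement, period, engine):
        """Step 1: the original search engine becomes the owner of the right."""
        self.rights[right_id] = FutureRight(keyword, placement, period, owner=engine)
        return self.rights[right_id]

    def list_for_sale(self, right_id, seller, price):
        """Step 2: the current owner makes the right available for purchase."""
        if self.rights[right_id].owner != seller:
            raise PermissionError("only the current owner can list a right")
        self.listings[right_id] = price

    def trade(self, right_id, buyer, bid):
        """Step 3: transfer ownership to another market participant."""
        if right_id not in self.listings:
            raise KeyError("right is not listed")
        if bid < self.listings[right_id]:
            raise ValueError("bid below asking price")
        right = self.rights[right_id]
        right.history.append((right.owner, buyer, bid))
        right.owner = buyer
        del self.listings[right_id]
        return right
```

Since the buyer becomes the owner, the same `list_for_sale` and `trade` calls support resale among participants.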
Abstract:
The subject invention relates to a system and method for video summarization, and more specifically to a system for segmenting and classifying data from a video in order to create a summary video that preserves and summarizes relevant content. In one embodiment, the system first extracts appearance, motion, and audio features from a video in order to create video segments corresponding to the extracted features. The video segments are then classified as dynamic or static depending on the appearance-based and motion-based features extracted from each video segment. The classified video segments are then grouped into clusters to eliminate redundant content. Representative video segments from each cluster are selected as summary segments, and the summary segments are compiled to form a summary video. The parameters for any of the summarization steps can be altered so that a user can adapt the system to any type of video, although the system is designed to summarize unstructured videos whose content is unknown. In another aspect, audio features can also be used to further summarize videos with particular audio properties.
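The sketch below covers the classify, cluster, and select stages. It assumes each segment has already been reduced to a scalar motion score and an appearance feature vector. The threshold rule, sequential leader clustering (used here in place of whatever clustering the invention actually specifies), and the choice of each cluster's first segment as its representative are all assumptions. `motion_threshold` and `cluster_radius` stand in for the user-adjustable parameters.

```python
import math


def summarize(segments, motion_threshold=0.5, cluster_radius=1.0):
    """Return the ids of representative segments, in temporal order.

    segments: list of (segment_id, motion_score, appearance_vector), where
    segment_id is assumed to increase with time.
    """
    # 1. Classify each segment as dynamic or static by its motion score.
    by_class = {"dynamic": [], "static": []}
    for seg_id, motion, vec in segments:
        label = "dynamic" if motion > motion_threshold else "static"
        by_class[label].append((seg_id, vec))
    summary = []
    for members in by_class.values():
        # 2. Leader clustering within a class: a segment within cluster_radius
        #    of an existing leader is treated as redundant; otherwise it
        #    starts a new cluster and becomes that cluster's leader.
        leaders = []
        for seg_id, vec in members:
            if not any(math.dist(vec, lead_vec) <= cluster_radius for _, lead_vec in leaders):
                leaders.append((seg_id, vec))
        # 3. Each cluster's leader becomes a summary segment.
        summary.extend(seg_id for seg_id, _ in leaders)
    # 4. Compile the summary in temporal order.
    return sorted(summary)
```

Raising `cluster_radius` merges more segments and shortens the summary. Lowering it keeps more near-duplicates.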
Abstract:
Described is a system that characterizes segments of a document with one or more keyphrases and then uses the keyphrases to help users find interesting parts of the document. The keyphrases are displayed with information about the location of each phrase in the document and serve as pointers for moving quickly from an overview to a section of potential interest. In another implementation, when a collection contains many documents, an inventive multi-document view can be used to reduce the number of documents presented, helping the user find documents of interest more efficiently. In this view, the user filters the displayed documents (possibly repeatedly) based on metadata values. In one implementation, icons corresponding to the documents are displayed on a display device together with the documents' metadata. When the user selects a metadata value, the display state of the icons corresponding to the documents is varied based on the selected value.
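The metadata-filtering view can be sketched as a function that maps the user's current metadata selections to a display state for each document icon. The two states ("highlighted" and "dimmed") and the exact-match rule are hypothetical. Repeated filtering corresponds to adding more key/value pairs to `selections`.

```python
def icon_states(documents, selections):
    """Map each document icon to a display state based on selected metadata.

    documents:  {doc_id: {metadata_key: value}}
    selections: {metadata_key: selected_value}; an empty dict selects all.
    Returns {doc_id: "highlighted" | "dimmed"}.
    """
    states = {}
    for doc_id, meta in documents.items():
        matches = all(meta.get(key) == value for key, value in selections.items())
        states[doc_id] = "highlighted" if matches else "dimmed"
    return states
```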
Abstract:
Techniques for training and using linked event detection systems and for transforming source-identified stopwords are provided. A training corpus of source-identified stories and a reference language are determined. Optionally, stopwords for source-identified stories are transformed based on statistical analysis of parallel verified and unverified transformations. Reference-language and non-reference-language terms are selectively included in source-pair term frequency-inverse story frequency models. Optionally, incremental source-identified term frequency-inverse story frequency models are determined. Selected terms are weighted and similarity metrics are determined. Associated source-pair statistics, computed in part from a training corpus, are combined with the values of each similarity metric in the set of similarity metrics to form a similarity vector. Similarity vectors and verified link-label information are used to determine a predictive model. Similarity vectors for story pairs are used with the predictive model to determine whether the story pairs are linked. Sources are arranged into a source hierarchy based on source inter-relationships. Progressively more refined source-pair similarity statistics are also provided. New sources and associated source-pair similarity statistics are added by substituting related source-pair similarity statistics based on the source hierarchy and source characteristics. The source-pair similarity statistics are optionally used to normalize the similarity metrics.
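Below is a minimal sketch of three ingredients named in this abstract: TF-ISF weighting, a cosine similarity metric, and normalization of that metric by source-pair statistics. It assumes the source-pair statistics are a (mean, std) pair keyed by the sorted source pair and uses a z-score as the normalization. The resulting similarity vector has only two entries, and the predictive model itself is omitted. All names are hypothetical.

```python
import math
from collections import Counter


def story_frequencies(corpus):
    """Number of stories in the corpus that contain each term."""
    sf = Counter()
    for tokens in corpus:
        sf.update(set(tokens))
    return sf


def tf_isf(tokens, story_freq, n_stories):
    """Term frequency-inverse story frequency weights for one story."""
    tf = Counter(tokens)
    return {t: c * math.log(n_stories / story_freq[t]) for t, c in tf.items() if story_freq[t] > 0}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_vector(tokens_a, source_a, tokens_b, source_b, story_freq, n_stories, pair_stats):
    """[raw cosine, cosine z-scored with the (mean, std) of this source pair]."""
    sim = cosine(tf_isf(tokens_a, story_freq, n_stories), tf_isf(tokens_b, story_freq, n_stories))
    mean, std = pair_stats.get(tuple(sorted((source_a, source_b))), (0.0, 1.0))
    normalized = (sim - mean) / std if std > 0 else sim - mean
    return [sim, normalized]
```

Normalizing per source pair lets one threshold or model work across pairs of sources whose stories differ in baseline similarity, for example because of vocabulary or translation effects.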