Abstract:
A system and methods are disclosed for enhancing search engine functionality by providing a new search function based on fuzzy expressions in a query string. When a query is received by a search engine, it is first analyzed to identify whether it contains an expression that represents a fuzzy reference to certain objects, to properties of objects, or to objects with certain properties. This approach overcomes the limitations of the keyword-matching methods used by conventional search engines. For example, the present invention can accurately retrieve results for a query such as “find large-screen smart-phones” or “find light-weighted computers” by understanding the meaning of the query, automatically identifying objects with applicable properties, and mapping the meaning of the expression to such objects.
Abstract:
A system, methods, and user interface are disclosed for extracting information from unstructured data sources and presenting such information in a structured or semi-structured format for better information search and utilization. They can be applied to replace the conventional methods of displaying search results. The methods identify terms representing topics and related comments in various types of text contents, including documents and Web pages, and extract such terms and present them in the form of a topic-comment or object-properties hierarchy, including a heading+list format and a heading+cloud or group format. Methods and an interface object are provided to make a file object a non-terminal node in a computer file system, with information extracted from the file content displayed as deeper levels of the file system hierarchy. Methods for displaying information extracted from unstructured document contents in terms of class-members and topic-attributes are also disclosed.
Abstract:
System and methods are disclosed for providing answers to search queries, and for searching using association data without requiring keyword matching. Datasets representing objects and their properties are created from unstructured data sources based on natural language analysis methods, and can be used to answer queries about objects or properties of objects. Implementations include general information search engines and embodiments for searching products, services, people, or other objects without knowing the names of such objects, or searching for information about known objects by using either keyword-based queries or natural language queries such as asking questions. System and methods are also provided for creating a structured or semi-structured representation of various unstructured data, in contrast to the conventional term-vector or term-document matrix representation.
Abstract:
System and methods are disclosed for discovering topics in sub-segments of documents, extracting terms from a sub-segment that represent topics or summaries of the sub-segment, and displaying such terms in connection with the sub-segment or with the document. The displayed terms can also function as automatically generated tags or labels for the segments or for the documents. Methods are also disclosed for building search indexes based on specific sub-segments of documents, such that users can search for contents in a specific segment of a document. One embodiment of such a search index applies to emails, blogs, and forum articles, which typically contain segmented contents added at different times or by different authors in a format known as a thread. Searching in a specific segment, such as the most recently added segment, can help quickly find the most relevant information without repeating the same information found in other segments of the thread.
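The segment-restricted search described above can be illustrated with a minimal sketch. It assumes a thread is already split into segments ordered oldest first and uses naive whitespace tokenization; the function names `build_segment_index` and `search_latest_segment` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def build_segment_index(threads):
    """Build an inverted index mapping each word to (doc_id, segment_position).

    threads: dict of doc_id -> list of segment strings, oldest segment first.
    Tokenization by whitespace is an assumption made for illustration only.
    """
    index = defaultdict(set)
    for doc_id, segments in threads.items():
        for position, text in enumerate(segments):
            for word in text.lower().split():
                index[word].add((doc_id, position))
    return index


def search_latest_segment(index, threads, term):
    """Return ids of threads whose most recently added segment contains term."""
    hits = index.get(term.lower(), set())
    return sorted(
        doc_id for doc_id, position in hits
        if position == len(threads[doc_id]) - 1
    )


threads = {
    "t1": ["meeting moved to friday", "confirmed room booked"],
    "t2": ["budget draft", "meeting agenda attached"],
}
index = build_segment_index(threads)
latest_hits = search_latest_segment(index, threads, "meeting")
```

Here only thread `t2` matches, because `t1` mentions "meeting" in an earlier segment rather than in its latest one.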
Abstract:
A computer-assisted method for discovering topics in a document collection is disclosed. The method includes obtaining a group of text units in the document collection, tokenizing the words in the group of text units to produce a plurality of tokens that include a jth token, and adding a weighting coefficient to a parameter token_j_count for each text unit in the group that includes the jth token. The weighting coefficient is dependent on the grammatical role of the jth token. The method includes calculating an internal term prominence (ITP) value using token_j_count, selecting one or more tokens from the tokens based on the ITP values of the respective tokens, and outputting the one or more selected tokens as topic terms associated with the document collection.
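The counting and selection steps above can be sketched as follows. The abstract does not give the weight values, the ITP formula, or the selection rule, so this sketch makes three loud assumptions: the role weights in `ROLE_WEIGHTS` are invented, ITP is taken to be token_j_count divided by the number of text units, and selection uses a fixed threshold. It also assumes the grammatical role of each token has already been determined by an upstream parser.

```python
from collections import defaultdict

# Hypothetical weighting coefficients per grammatical role (not from the source).
ROLE_WEIGHTS = {"subject": 1.0, "predicate": 0.5, "other": 0.25}


def internal_term_prominence(text_units):
    """Compute an assumed ITP value for every token.

    text_units: list of text units, each a list of (token, role) pairs.
    Each text unit adds one weighting coefficient per distinct token to
    that token's count; if a token occurs in several roles within a unit,
    the largest weight is used (an assumption).
    """
    token_count = defaultdict(float)
    for unit in text_units:
        best_weight = {}
        for token, role in unit:
            weight = ROLE_WEIGHTS.get(role, ROLE_WEIGHTS["other"])
            best_weight[token] = max(best_weight.get(token, 0.0), weight)
        for token, weight in best_weight.items():
            token_count[token] += weight
    total_units = len(text_units)
    if total_units == 0:
        return {}
    # Assumed normalization: divide the weighted count by the number of units.
    return {token: count / total_units for token, count in token_count.items()}


def select_topic_terms(itp_values, threshold=0.5):
    """Select tokens whose ITP meets the threshold, most prominent first."""
    selected = [t for t, v in itp_values.items() if v >= threshold]
    return sorted(selected, key=lambda t: (-itp_values[t], t))


units = [
    [("camera", "subject"), ("is", "predicate"), ("good", "other")],
    [("camera", "subject"), ("takes", "predicate"), ("pictures", "other")],
]
itp = internal_term_prominence(units)
topics = select_topic_terms(itp)
```

In this toy input, "camera" is the subject of both text units, so it is the only token that reaches the threshold.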
Abstract:
System and methods are disclosed for determining the connotation or sentiment type of a text unit comprising multiple terms and having a grammatical structure, such as subject+verb, verb+object, adjective+noun, noun+noun, or noun+preposition+noun. The connotation or sentiment type of the text unit is determined by applying context rules, where the context of the grammatical structure may change the inherent or default connotations of individual terms in the text unit. The methods provide a solution to the challenge of accurately determining the sentiment type of various linguistic structures in different contexts, and an alternative to the simplistic approach of using the inherent or default connotation of individual terms for the linguistic structure containing such terms.
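One possible context rule for the verb+object structure can be sketched as follows. The lexicon, the list of reversing verbs, and the rule itself are illustrative assumptions; the abstract does not specify any particular rule or vocabulary.

```python
# Hypothetical default connotations of individual terms (not from the source).
DEFAULT_CONNOTATION = {
    "pain": "negative",
    "benefit": "positive",
    "cause": "neutral",
    "reduce": "neutral",
}

# Hypothetical verbs whose context reverses the connotation of their object.
REVERSING_VERBS = {"reduce", "prevent", "eliminate"}

FLIP = {"positive": "negative", "negative": "positive", "neutral": "neutral"}


def verb_object_connotation(verb, obj):
    """Return the connotation of a verb+object text unit.

    Assumed rule: the unit inherits the default connotation of the object,
    unless the verb is a reversing verb, in which case it is flipped.
    Terms missing from the lexicon are treated as neutral.
    """
    object_connotation = DEFAULT_CONNOTATION.get(obj, "neutral")
    if verb in REVERSING_VERBS:
        return FLIP[object_connotation]
    return object_connotation


reduce_pain = verb_object_connotation("reduce", "pain")
cause_pain = verb_object_connotation("cause", "pain")
```

Under this rule, "reduce pain" comes out positive even though "pain" alone is negative, which is the kind of context effect that a term-by-term approach would miss.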
Abstract:
A method is disclosed for quantitatively assessing information in natural language contents related to an object name. The method includes identifying a sentence in a document, determining a subject and a predicate in the sentence, and retrieving an object-specific data set related to the object name. The object-specific data set includes property names and association-strength values. Each property name is associated with an association-strength value. The method also includes identifying a first property name in the property names that matches the subject, assigning a first association-strength value associated with the first property name to the subject, identifying a second property name in the property names that matches the predicate, assigning a second association-strength value associated with the second property name to the predicate, and multiplying the first association-strength value and the second association-strength value to produce a sentence information index.
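The final multiplication step can be sketched as follows, assuming the subject and predicate have already been identified by an upstream parser. The data set contents and the choice to return 0.0 when either term has no matching property name are assumptions made for illustration; the abstract does not say how unmatched terms are handled.

```python
def sentence_information_index(subject, predicate, object_dataset):
    """Multiply the association strengths matched by subject and predicate.

    object_dataset: dict of property name -> association-strength value
    for a single object name. Matching by lowercased exact string is an
    assumption; returning 0.0 for an unmatched term is also an assumption.
    """
    subject_strength = object_dataset.get(subject.lower())
    predicate_strength = object_dataset.get(predicate.lower())
    if subject_strength is None or predicate_strength is None:
        return 0.0
    return subject_strength * predicate_strength


# Hypothetical object-specific data set for the object name "computer".
computer_dataset = {"cpu": 0.5, "fast": 0.25, "keyboard": 0.125}

# Sentence "The CPU is fast", with subject "CPU" and predicate "fast".
index_value = sentence_information_index("CPU", "fast", computer_dataset)
```

With these made-up values the index is 0.5 × 0.25 = 0.125, while a sentence whose subject or predicate matches no property name scores 0.0 under the stated assumption.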
Abstract:
System and methods are disclosed for discovering and presenting prominent information in a collection of text contents by identifying prominent terms in the text contents, and displaying the terms as either category nodes for organizing the contents in the collection, or as topics in the text contents, or as labels or tags for highlighting the contents in the collection, or for searching the contents in the collection. Methods include distinguishing the grammatical attributes associated with the terms, including the grammatical attributes of a subject and non-subject of a sentence, or a multi-word phrase and a sub-phrase, or a head and a modifier in a phrase, and other distributional attributes of the terms.
Abstract:
System, methods, and user interface for managing emails and other message communications are disclosed as solutions for reducing confusion and avoiding mistakes in communications when replying to a message with multiple recipients. The mode of replying either to the sender only or to multiple recipients is first detected, and notifications and user interface objects are displayed to help the user to make sure the message is sent to the intended recipient, and to easily correct a mistake and switch the mode of reply for enhanced productivity. The notification can be displayed based on the user actions or based on the results of a text analysis of the content of the message.
Abstract:
A system, methods, and user interface for organizing an unstructured collection of electronic objects in a list or group format are disclosed for more effectively locating and retrieving needed items from a large number of candidates. The electronic objects include various types of data objects, including files, folders, or contacts. The methods include assigning importance measures to items in the collection based on various attributes associated with the objects. The attributes include metadata and attributes obtained from content analyses of the objects, including a specific term, a term with a specific semantic attribute, a class of the object, and other attributes.
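A minimal sketch of assigning importance measures and ordering a collection by them is shown below. The scoring scheme (one point per important term found in the content plus a per-class weight) and all field names are hypothetical; the abstract does not define how the measures are computed or combined.

```python
def importance(item, important_terms, class_weights):
    """Score one item from its content terms and its object class.

    item: dict with hypothetical keys "name", "kind", and "text".
    important_terms: terms whose presence in the content adds 1.0 each.
    class_weights: dict of object class -> additive weight.
    """
    text = item["text"].lower()
    term_score = sum(1.0 for term in important_terms if term in text)
    return term_score + class_weights.get(item["kind"], 0.0)


def rank_by_importance(items, important_terms, class_weights):
    """Return item names ordered from most to least important."""
    ranked = sorted(
        items,
        key=lambda item: importance(item, important_terms, class_weights),
        reverse=True,
    )
    return [item["name"] for item in ranked]


items = [
    {"name": "notes.txt", "kind": "file", "text": "grocery list"},
    {"name": "contract.pdf", "kind": "file", "text": "Signed contract and invoice"},
    {"name": "Alice", "kind": "contact", "text": "alice, accountant"},
]
ranking = rank_by_importance(items, {"contract", "invoice"}, {"contact": 0.5})
```

In this example the file mentioning both important terms is listed first, followed by the contact, whose class carries a small weight.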