System and methods for processing fuzzy expressions in search engines and for information extraction

    Publication No.: US10698977B1

    Publication date: 2020-06-30

    Application No.: US16207402

    Filing date: 2018-12-03

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    Abstract: System and methods for enhancing search engine functionality by enabling and providing a new search function based on fuzzy expressions in a query string. When a query is received by a search engine, it is first analyzed to identify whether the query contains an expression that represents a fuzzy reference to certain objects, to properties of objects, or to objects with certain properties. This overcomes the limitations of the keyword-matching methods used by conventional search engines. For example, the present invention can accurately retrieve results for a query such as “find large-screen smart-phones” or “find light-weighted computers” by understanding the meaning of the query, automatically identifying objects with applicable properties, and mapping the meaning of the expression to such objects.
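
The mapping from a fuzzy expression to objects with applicable properties can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented method: the `FUZZY_TERMS` table, the property names `screen_inches` and `weight_kg`, and the rule of keeping the top half of items ranked by the property.

```python
import math

# Hypothetical table: fuzzy expression -> (numeric property, preferred direction).
FUZZY_TERMS = {
    "large-screen": ("screen_inches", "high"),
    "light-weighted": ("weight_kg", "low"),
}

def fuzzy_search(term, items, fraction=0.5):
    """Return names of items that best satisfy a fuzzy term.

    Items are ranked by the mapped property and the best `fraction` of them
    is kept. The cut-off rule is an assumption; the abstract does not give one.
    """
    prop, direction = FUZZY_TERMS[term]
    ranked = sorted(items, key=lambda it: it[prop], reverse=(direction == "high"))
    keep = max(1, math.ceil(len(ranked) * fraction))
    return [it["name"] for it in ranked[:keep]]

phones = [
    {"name": "A", "screen_inches": 4.7},
    {"name": "B", "screen_inches": 6.1},
    {"name": "C", "screen_inches": 6.7},
    {"name": "D", "screen_inches": 5.4},
]
large_screen_phones = fuzzy_search("large-screen", phones)
```

No item in `phones` contains the words "large" or "screen"; the match comes from the numeric property, which is the point of contrast with keyword matching.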

    System, methods, and user interface for presenting information from unstructured data

    Publication No.: US09659084B1

    Publication date: 2017-05-23

    Application No.: US14222591

    Filing date: 2014-03-22

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    IPC classification: G06F17/30

    Abstract: A system, methods, and user interface are disclosed for extracting information from unstructured data sources and presenting such information in a structured or semi-structured format for better information search and utilization; they can be applied to replace the conventional methods of displaying search results. The methods identify terms representing topics and related comments in various types of text contents, including documents and Web pages, extract such terms, and present them in the form of a topic-comment or object-properties hierarchy, including a heading+list format and a heading+cloud or group format. Methods and an interface object are provided to make a file object a non-terminal node in a computer file system, with information extracted from the file content displayed as deeper levels of the file system hierarchy. Methods for displaying information extracted from unstructured document contents in terms of class-members and topic-attributes are also disclosed.

    System and methods for searching objects and providing answers to queries using association data

    Type: Granted invention patent

    Status: In force

    Publication No.: US09367608B1

    Publication date: 2016-06-14

    Application No.: US14214955

    Filing date: 2014-03-16

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    IPC classification: G06F7/00 G06F17/30

    Abstract: System and methods are disclosed for providing answers to search queries, and for searching using association data without requiring keyword matching. Datasets representing objects and their properties are created from unstructured data sources based on natural language analysis methods, and can be used to answer queries about objects or properties of objects. Implementations include general information search engines and embodiments for searching products, services, people, or other objects without knowing the names of such objects, or searching for information about known objects by using either keyword-based queries or natural language queries such as questions. System and methods are also provided for creating a structured or semi-structured representation of various unstructured data, in contrast to the conventional term-vector or term-document matrix representation.
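
Searching for objects by their properties, without knowing the object names, can be sketched as follows. This is a simplified illustration under stated assumptions: the `ASSOCIATIONS` data, its strength values, and the additive scoring rule are all hypothetical, and the abstract does not say how the datasets are scored.

```python
# Hypothetical association data: object name -> {property term: strength}.
ASSOCIATIONS = {
    "camera": {"lens": 0.9, "photo": 0.8, "battery": 0.3},
    "phone": {"call": 0.9, "battery": 0.6, "photo": 0.4},
    "laptop": {"keyboard": 0.8, "battery": 0.7},
}

def search_objects(query, associations=ASSOCIATIONS):
    """Rank object names by the summed strength of properties named in the query.

    The object name itself never has to appear in the query. Summing the
    strengths is an assumed scoring rule chosen for simplicity.
    """
    terms = query.lower().split()
    scores = {}
    for name, props in associations.items():
        score = sum(props.get(term, 0.0) for term in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))

results = search_objects("takes photo with lens")
```

Here the query mentions only properties ("photo", "lens"), yet the object "camera" ranks first because its association data links it to both.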

    Topic discovery, summary generation, automatic tagging, and search indexing for segments of a document

    Type: Granted invention patent

    Status: In force

    Publication No.: US09015153B1

    Publication date: 2015-04-21

    Application No.: US13845087

    Filing date: 2013-03-18

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    IPC classification: G06F17/30 G06F17/28

    Abstract: System and methods are disclosed for discovering topics in sub-segments of documents, extracting terms from a sub-segment that represent topics or summaries of the sub-segment, and displaying such terms in connection with the sub-segment or with the document; the terms can also function as automatically generated tags or labels for the segments or for the documents. Methods are also disclosed for building search indexes based on specific sub-segments of documents, such that users can search for contents in a specific segment of the document. One embodiment of such a search index applies to emails, blogs, and forum articles, which typically contain segmented contents added at different times or by different authors in a format known as a thread; searching in a specific segment, such as the most recently added segment, can help quickly find the most relevant information without repeating the same information in other segments in the thread.
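
A segment-level search index of the kind described can be sketched as an inverted index keyed by (document, segment) pairs. The function names, the whitespace tokenizer, and the thread data below are assumptions for illustration only.

```python
from collections import defaultdict

def build_segment_index(threads):
    """Build an inverted index from word to (doc_id, segment_number) pairs.

    `threads` maps a document id to its list of segments, in the order the
    segments were added to the thread.
    """
    index = defaultdict(set)
    for doc_id, segments in threads.items():
        for seg_no, text in enumerate(segments):
            for word in text.lower().split():
                index[word].add((doc_id, seg_no))
    return index

def search_latest_segment(index, threads, word):
    """Return ids of documents whose most recently added segment contains `word`."""
    hits = index.get(word.lower(), set())
    return sorted(doc for doc, seg in hits if seg == len(threads[doc]) - 1)

threads = {
    "t1": ["meeting moved to friday", "friday is fine"],
    "t2": ["friday lunch", "cancelled sorry"],
}
index = build_segment_index(threads)
latest_hits = search_latest_segment(index, threads, "friday")
```

Because each posting records the segment number, the same index can also answer whole-document queries by ignoring that field.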

    System and methods for automated document topic discovery, browsable search and document categorization

    Type: Granted invention patent

    Status: In force

    Publication No.: US08843476B1

    Publication date: 2014-09-23

    Application No.: US12782545

    Filing date: 2010-05-18

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30719

    Abstract: A computer-assisted method for discovering topics in a document collection is disclosed. The method includes obtaining a group of text units in the document collection, tokenizing the words in the group of text units to produce a plurality of tokens that include a jth token, and adding a weighting coefficient to a parameter token_j_count for each text unit in the first group that includes the jth token. The weighting coefficient is dependent on the grammatical role of the jth token. The method includes calculating an internal term prominence value (ITP) using token_j_count, selecting one or more tokens from the tokens based on the ITP values of the respective tokens, and outputting the one or more selected tokens as topic terms associated with the document collection.
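
The counting step in this abstract can be sketched in Python. Several details are assumptions, since the abstract does not give them: the `ROLE_WEIGHTS` values, the input format (tokens already tagged with a grammatical role), the use of the highest-weighted role when a token occurs twice in a unit, and the normalization of token_j_count by the number of text units to obtain the ITP.

```python
from collections import defaultdict

# Hypothetical coefficients; the abstract only states that the coefficient
# depends on the grammatical role of the token.
ROLE_WEIGHTS = {"subject": 1.0, "object": 0.5, "other": 0.25}

def internal_term_prominence(text_units):
    """Compute an ITP-like score per token.

    `text_units` is a list of units, each a list of (token, role) pairs.
    For every unit containing a token, one role-dependent coefficient is
    added to that token's count (token_j_count in the abstract).
    """
    token_count = defaultdict(float)
    for unit in text_units:
        best = {}
        for token, role in unit:
            weight = ROLE_WEIGHTS.get(role, ROLE_WEIGHTS["other"])
            best[token] = max(best.get(token, 0.0), weight)
        for token, weight in best.items():
            token_count[token] += weight
    n = len(text_units)
    # Assumed normalization: divide by the number of text units.
    return {t: c / n for t, c in token_count.items()} if n else {}

def select_topics(text_units, k=2):
    """Return the k tokens with the highest ITP score as topic terms."""
    itp = internal_term_prominence(text_units)
    return sorted(itp, key=lambda t: (-itp[t], t))[:k]

units = [
    [("camera", "subject"), ("takes", "other"), ("pictures", "object")],
    [("camera", "subject"), ("has", "other"), ("lens", "object")],
    [("lens", "subject"), ("is", "other"), ("sharp", "other")],
]
topics = select_topics(units)
```

A real system would need a parser to assign the roles; here they are supplied by hand so that the weighting logic stays visible.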

    SYSTEM AND METHODS FOR DETERMINING SENTIMENT BASED ON CONTEXT

    Type: Invention patent application

    Status: Under examination (published)

    Publication No.: US20140278365A1

    Publication date: 2014-09-18

    Application No.: US13844980

    Filing date: 2013-03-17

    IPC classification: G06F17/27

    CPC classification: G06F17/2785 G06F17/274

    Abstract: System and methods are disclosed for determining the connotation or sentiment type of a text unit comprising multiple terms and with a grammatical structure, such as subject+verb, verb+object, adjective+noun, noun+noun, or noun+preposition+noun. The connotation or sentiment type of the text unit is determined by applying context rules where the context of the grammatical structure may change the inherent or default connotations of individual terms in the text unit. The methods provide a solution to the challenge of correctly or accurately determining the sentiment type of various linguistic structures in different contexts, and an alternative to the simplistic approach of using the inherent or default connotation of individual terms for the linguistic structure containing such terms.
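
One context rule for the verb+object structure can be sketched as below. The lexicon `DEFAULT_CONNOTATION`, the set `REVERSING_VERBS`, and the single reversal rule are hypothetical; the abstract does not list the actual rules.

```python
# Hypothetical default (context-free) connotations of individual terms.
DEFAULT_CONNOTATION = {
    "pain": "negative",
    "quality": "positive",
    "reduce": "neutral",
    "increase": "neutral",
}

# Hypothetical set of verbs that reverse the connotation of their object.
REVERSING_VERBS = {"reduce", "prevent", "eliminate"}

def sentiment_verb_object(verb, obj):
    """Determine the sentiment of a verb+object text unit.

    Assumed context rule: a reversing verb flips a non-neutral object
    connotation; otherwise the object's default connotation is used.
    """
    obj_connotation = DEFAULT_CONNOTATION.get(obj, "neutral")
    if verb in REVERSING_VERBS and obj_connotation != "neutral":
        return "positive" if obj_connotation == "negative" else "negative"
    return obj_connotation

reduce_pain = sentiment_verb_object("reduce", "pain")
```

A lookup of default connotations alone would label "reduce pain" as negative because of "pain"; the context rule is what yields the opposite result.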

    SYSTEM AND METHODS FOR QUANTITATIVE ASSESSMENT OF INFORMATION IN NATURAL LANGUAGE CONTENTS

    Type: Invention patent application

    Status: In force

    Publication No.: US20100174526A1

    Publication date: 2010-07-08

    Application No.: US12573134

    Filing date: 2009-10-04

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    IPC classification: G06F17/27

    CPC classification: G06F17/271 G06F17/30654

    Abstract: A method is disclosed for quantitatively assessing information in natural language contents related to an object name. The method includes identifying a sentence in a document, determining a subject and a predicate in the sentence, and retrieving an object-specific data set related to the object name. The object-specific data set includes property names and association-strength values. Each property name is associated with an association-strength value. The method also includes identifying a first property name in the property names that matches the subject, assigning a first association-strength value associated with the first property name to the subject, identifying a second property name in the property names that matches the predicate, assigning a second association-strength value associated with the second property name to the predicate, and multiplying the first association-strength value and the second association-strength value to produce a sentence information index.
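
The final multiplication step can be sketched as follows, assuming the subject and predicate have already been extracted from the sentence. The data set `COMPUTER_DATASET`, its values, and the word-level matching rule in `match_strength` are illustrative assumptions.

```python
# Hypothetical object-specific data set for the object name "computer":
# property name -> association-strength value.
COMPUTER_DATASET = {"cpu": 0.8, "memory": 0.75, "fast": 0.5, "keyboard": 0.25}

def match_strength(phrase, dataset):
    """Return the strength of the best-matching property name in a phrase.

    Assumed matching rule: a property name matches when it equals one of the
    whitespace-separated words of the phrase; no match yields 0.0.
    """
    words = phrase.lower().split()
    return max((dataset[w] for w in words if w in dataset), default=0.0)

def sentence_information_index(subject, predicate, dataset):
    """Multiply the subject's and the predicate's association strengths."""
    return match_strength(subject, dataset) * match_strength(predicate, dataset)

index_value = sentence_information_index("The CPU", "is fast", COMPUTER_DATASET)
```

With this rule, a sentence whose subject or predicate matches no property name gets an index of zero, meaning it is treated as carrying no information about the object.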

    System and methods for discovering, presenting, and accessing information in a collection of text contents

    Publication No.: US10387469B1

    Publication date: 2019-08-20

    Application No.: US14289636

    Filing date: 2014-05-28

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    IPC classification: G06F16/35

    Abstract: System and methods are disclosed for discovering and presenting prominent information in a collection of text contents by identifying prominent terms in the text contents, and displaying the terms as category nodes for organizing the contents in the collection, as topics in the text contents, as labels or tags for highlighting the contents in the collection, or as terms for searching the contents in the collection. Methods include distinguishing the grammatical attributes associated with the terms, including the grammatical attributes of a subject and non-subject of a sentence, or a multi-word phrase and a sub-phrase, or a head and a modifier in a phrase, and other distributional attributes of the terms.

    System, methods, and user interface for organizing unstructured data objects

    Type: Granted invention patent

    Status: In force

    Publication No.: US09430131B1

    Publication date: 2016-08-30

    Application No.: US14225422

    Filing date: 2014-03-25

    Applicant: Guangsheng Zhang

    Inventor: Guangsheng Zhang

    Abstract: A system, methods, and user interface for organizing an unstructured collection of electronic objects in a list or group format are disclosed for more effectively locating and retrieving needed items from a large number of candidates. The electronic objects include various types of data objects, including files, folders, or contacts. The methods include assigning importance measures to items in the collection based on various attributes associated with the objects. The attributes include metadata and attributes obtained from content analyses of the objects, including a specific term, a term with a specific semantic attribute, a class of the object, and other attributes.
