METHOD FOR FILTERING FILE CLUSTERS
    1.
    Patent application
    Status: in force

    Publication number: US20100293155A1

    Publication date: 2010-11-18

    Application number: US12773619

    Filing date: 2010-05-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: A method for filtering file clusters is presented. In the method, a plurality of advanced filter actions, each with a different filter condition and independent of the others, are performed on an obtained main result file. A history record of each advanced filter is kept, and the history record of each advanced filter and its respective search results are presented on a target interface, either by opening a new page or by updating an index list.
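
The abstract describes the mechanism only in outline, so the following is a minimal sketch rather than the patented implementation. It assumes the main result is a list of dictionaries and that each advanced filter is a predicate applied to that full list, never to the output of an earlier filter. The names `FilterSession`, `FilterRecord`, `apply` and `index_list` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Document = Dict[str, str]


@dataclass
class FilterRecord:
    """One history entry: the filter's label and the results it produced."""
    label: str
    results: List[Document]


@dataclass
class FilterSession:
    """Holds the main result and a history of independent advanced filters."""
    main_results: List[Document]
    history: List[FilterRecord] = field(default_factory=list)

    def apply(self, label: str, predicate: Callable[[Document], bool]) -> List[Document]:
        # Every filter runs on the full main result, so filters stay independent.
        results = [doc for doc in self.main_results if predicate(doc)]
        self.history.append(FilterRecord(label, results))
        return results

    def index_list(self) -> List[str]:
        # Stand-in for the "update an index list" presentation mode.
        return [
            f"{i + 1}. {rec.label} ({len(rec.results)} results)"
            for i, rec in enumerate(self.history)
        ]


session = FilterSession([
    {"title": "Pat tree indexing", "year": "2009"},
    {"title": "Cluster merging", "year": "2010"},
    {"title": "Cluster filtering", "year": "2010"},
])
session.apply("year = 2010", lambda d: d["year"] == "2010")
session.apply("title contains 'tree'", lambda d: "tree" in d["title"].lower())
print("\n".join(session.index_list()))
```

Because the second filter is evaluated against the main result, it still finds the 2009 document that the first filter excluded.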

    METHOD FOR MERGING DOCUMENT CLUSTERS
    2.
    Patent application
    Status: in force

    Publication number: US20100082625A1

    Publication date: 2010-04-01

    Application number: US12559964

    Filing date: 2009-09-15

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: A method for merging document clusters includes the following steps. An association graph among the document clusters is established. The association graph is an oriented (directed) graph in which each document cluster is represented by one node, and the nodes are searched pairwise. An oriented edge is established between any two nodes whose associated weight reaches a preset value. The arrow of the oriented edge points to the node capable of serving as a descriptor for the other node, and the associated weight assigned to the edge represents the degree of association between the two nodes. Any two document clusters that can each serve as a descriptor for the other, and whose degree of association reaches a preset threshold value, are merged into a single document cluster.
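
The steps in the abstract can be sketched as follows. The abstract does not say how the associated weight is computed, so this sketch assumes a hypothetical `coverage` measure (the share of one cluster's terms that the other cluster also contains). Using union-find to chain merges transitively is also an assumption; `build_graph` and `merge_clusters` are illustrative names.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def coverage(src: Set[str], dst: Set[str]) -> float:
    """Hypothetical weight: share of src's terms that dst also contains."""
    return len(src & dst) / len(src) if src else 0.0


def build_graph(clusters: Dict[str, Set[str]], edge_min: float) -> Dict[Tuple[str, str], float]:
    """Directed edge src -> dst means dst can serve as a descriptor for src."""
    edges = {}
    for a, b in combinations(clusters, 2):          # visit nodes pairwise
        for src, dst in ((a, b), (b, a)):
            weight = coverage(clusters[src], clusters[dst])
            if weight >= edge_min:                  # preset value for an edge
                edges[(src, dst)] = weight
    return edges


def merge_clusters(clusters: Dict[str, Set[str]], edge_min: float, merge_min: float) -> List[List[str]]:
    edges = build_graph(clusters, edge_min)
    parent = {name: name for name in clusters}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (src, dst), weight in edges.items():
        back = edges.get((dst, src))
        # Merge only mutual descriptors whose weaker direction reaches the threshold.
        if back is not None and min(weight, back) >= merge_min:
            parent[find(src)] = find(dst)

    groups: Dict[str, List[str]] = {}
    for name in clusters:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


clusters = {
    "A": {"patent", "search", "filter"},
    "B": {"patent", "search", "filter", "index"},
    "C": {"graph", "merge"},
}
print(merge_clusters(clusters, edge_min=0.5, merge_min=0.7))
```

Here A is fully covered by B (weight 1.0) and B is covered by A at 0.75, so the two merge at a threshold of 0.7 but stay separate at 0.8.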

    Method for filtering out identical or similar documents
    3.
    Granted patent
    Status: in force

    Publication number: US08185532B2

    Publication date: 2012-05-22

    Application number: US12561843

    Filing date: 2009-09-17

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/30699

    Abstract: A method for filtering out identical or similar documents includes storing multiple documents to be filtered as a PAT tree (PT) profile based on the PAT tree data structure, searching the PT profile for all string nodes whose consecutive character length reaches a lower threshold and for all documents to which those string nodes belong, and finding, among those documents, the documents that share identical consecutive characters whose length reaches a higher threshold. Another technical solution performs the same search and then finds, among those documents, the documents that share identical consecutive characters whose length, as a ratio of the total character length of the original document, reaches a ratio threshold; such documents are regarded as similar.
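
The two-threshold idea in the first solution can be illustrated with a small sketch. It does not build a PAT tree: a plain dictionary of fixed-length substrings is swapped in to play the role of the lower-threshold search, and a dynamic-programming scan checks the higher threshold. All function names and sample documents are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def candidate_pairs(docs: Dict[str, str], lower: int) -> Set[Tuple[str, str]]:
    """Pairs of documents sharing at least one run of `lower` characters."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for i in range(len(text) - lower + 1):
            index[text[i:i + lower]].add(doc_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest run of identical consecutive characters in a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, other in enumerate(b, start=1):
            if ch == other:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def similar_by_length(docs: Dict[str, str], lower: int, higher: int) -> List[Tuple[str, str]]:
    # Cheap candidate search first, then the stricter higher-threshold check.
    return sorted(
        pair for pair in candidate_pairs(docs, lower)
        if longest_common_run(docs[pair[0]], docs[pair[1]]) >= higher
    )


docs = {
    "d1": "the quick brown fox jumps",
    "d2": "a quick brown fox sleeps",
    "d3": "completely unrelated text",
}
print(similar_by_length(docs, lower=5, higher=10))
```

The lower threshold keeps the expensive pairwise comparison to documents that already share a short run, which is the role the abstract assigns to the string-node search.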

    METHOD FOR FILTERING OUT IDENTICAL OR SIMILAR DOCUMENTS
    4.
    Patent application
    Status: in force

    Publication number: US20100082626A1

    Publication date: 2010-04-01

    Application number: US12561843

    Filing date: 2009-09-17

    IPC classification: G06F17/30

    CPC classification: G06F17/30699

    Abstract: A method for filtering out identical or similar documents includes storing a plurality of documents to be filtered as a PAT tree (PT) profile based on the PAT tree data structure, searching the PT profile for all string nodes whose consecutive character length reaches a lower threshold and for all documents to which those string nodes belong, and finding, among those documents, the documents that share identical consecutive characters whose length reaches a higher threshold. Another technical solution performs the same search and then finds, among those documents, the documents that share identical consecutive characters whose length, as a ratio of the total character length of the original document, reaches a ratio threshold; such documents are regarded as similar.
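
The ratio-based solution can be sketched on its own. The abstract does not say which document counts as the "original", so this sketch assumes the ratio is taken against each document's own length, which makes the test directional. It uses `difflib.SequenceMatcher` from the Python standard library in place of a PAT tree and omits the lower-threshold candidate search for brevity. The names `run_ratio` and `similar_by_ratio` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations
from typing import Dict, List, Tuple


def run_ratio(doc: str, other: str) -> float:
    """Longest run shared with `other`, as a share of `doc`'s total length."""
    if not doc:
        return 0.0
    matcher = SequenceMatcher(None, doc, other, autojunk=False)
    match = matcher.find_longest_match(0, len(doc), 0, len(other))
    return match.size / len(doc)


def similar_by_ratio(docs: Dict[str, str], ratio_min: float) -> List[Tuple[str, str]]:
    # Ordered pairs (doc, other): doc is flagged as similar to other.
    return sorted(
        (doc_id, other_id)
        for doc_id, other_id in permutations(docs, 2)
        if run_ratio(docs[doc_id], docs[other_id]) >= ratio_min
    )


docs = {
    "d1": "the quick brown fox jumps",
    "d2": "a quick brown fox sleeps",
    "d3": "completely unrelated text",
}
print(similar_by_ratio(docs, ratio_min=0.7))
```

With these sample strings the shared run is 17 characters, which is 68% of `d1` (25 characters) but about 71% of `d2` (24 characters), so only `d2` reaches a ratio threshold of 0.7.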
