AUTO-MAINTAINED DOCUMENT CLASSIFICATION

Patent application

US20150012470A1 AUTO-MAINTAINED DOCUMENT CLASSIFICATION (granted)

Patent title: AUTO-MAINTAINED DOCUMENT CLASSIFICATION
Application number: US14492914
Filing date: 2014-09-22
Publication number: US20150012470A1
Publication date: 2015-01-08
Inventors: Yigal S. Dayan, Gil Fuchs, Josemina M. Magdalen, Irit Maharian, Yariv Tzaban
Applicant: International Business Machines Corporation
Main classification: G06N5/04
IPC classification: G06N5/04; G06N99/00

Abstract:

Machines, systems and methods for maintaining a representative data set in a document classification system, the method comprising: including an initial set of seed representative data in a representative data set (RDS) implemented for a knowledge base (KB), wherein the KB is trained to classify documents provided to a document classification system based on analysis of the representative documents included in the RDS and a set of rules, wherein the seed representative data includes a balanced number of representative data across a plurality of classes; updating the RDS by adding or removing representative data from the RDS based on feedback received about accuracy of classification of one or more documents by the classification system; and retraining the KB, wherein the retraining is performed based on occurrence of one or more events.
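The maintenance loop described in the abstract (seed a balanced RDS, adjust it from classification feedback, retrain the KB when an event occurs) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `RepresentativeDataSet`, `KnowledgeBase`, and `seed_rds`, the word-overlap scoring, reading "balanced" as an equal number of seeds per class, and using a count of pending RDS changes as the retraining event.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class RepresentativeDataSet:
    """Hypothetical RDS: maps each class label to its representative documents."""
    docs: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, label, text):
        if text not in self.docs[label]:
            self.docs[label].append(text)

    def remove(self, label, text):
        if text in self.docs[label]:
            self.docs[label].remove(text)


def seed_rds(seeds):
    """Build an RDS from seed data; 'balanced' is assumed to mean equal counts."""
    if len({len(texts) for texts in seeds.values()}) != 1:
        raise ValueError("seed data must be balanced across classes")
    rds = RepresentativeDataSet()
    for label, texts in seeds.items():
        for text in texts:
            rds.add(label, text)
    return rds


class KnowledgeBase:
    """Toy KB: scores a document by word overlap with each class's RDS documents."""

    def __init__(self, rds, retrain_after=2):
        self.rds = rds
        self.retrain_after = retrain_after  # assumed event: N pending RDS changes
        self.pending_changes = 0
        self.retrain_count = 0
        self.profiles = {}
        self.retrain()

    def retrain(self):
        # Rebuild one word-frequency profile per class from the current RDS.
        self.profiles = {
            label: Counter(w for t in texts for w in t.lower().split())
            for label, texts in self.rds.docs.items()
        }
        self.pending_changes = 0
        self.retrain_count += 1

    def classify(self, text):
        words = text.lower().split()
        scores = {label: sum(p[w] for w in words) for label, p in self.profiles.items()}
        return max(scores, key=scores.get)

    def feedback(self, text, predicted, correct):
        # On a reported misclassification, move the document to the correct class.
        if predicted != correct:
            self.rds.add(correct, text)
            self.rds.remove(predicted, text)
            self.pending_changes += 1
        if self.pending_changes >= self.retrain_after:
            self.retrain()
```

In this sketch a caller would build the RDS with `seed_rds`, classify incoming documents with `classify`, and report reviewer corrections through `feedback`. The retraining trigger is deliberately simple; the abstract only says that retraining follows "one or more events", so a timer or an accuracy threshold would fit the description equally well.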

Published/granted documents

US09195947B2 Auto-maintained document classification Publication/grant date: 2015-11-24

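IPC classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N5/00	Computer systems using knowledge-based models
G06N5/04	. Inference methods or devices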