Abstract:
A computer-implemented method for gathering knowledge within an organization to support the preparation, facilitation, and execution of a collaborative workshop for fast, efficient document management and labeling. Printed documents are tracked within the system over a specified period of time to acquire print job information for the jobs printed within the organization. Based on the retrieved documents, a list of users is determined, and those users are invited to review and annotate the list of documents. The list of documents is then narrowed down to an optimized set for ease of labeling and clustering. Provision is made for user annotation of the classification label associated with each submitted print job, including a reason for printing the job. User annotations are received for at least some of the submitted print jobs. The print jobs may be clustered based on the print job representations and annotations. Based on the user-provided labels, a representation of the set of print jobs is generated that represents the agreed-upon labels for a set of documents with similar traits in at least one of the clusters.
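A minimal sketch of the clustering and label-agreement steps, assuming each print job is represented by a set of words (for example, taken from the document title) and that the agreed-upon label for a cluster is the majority vote over its user annotations. The names `PrintJob`, `jaccard`, `cluster_jobs`, and `agreed_label`, the Jaccard similarity, and the greedy threshold clustering are illustrative choices, not the method's actual representation or clustering algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PrintJob:
    job_id: str
    tokens: frozenset  # hypothetical representation: words from the document title
    annotations: list = field(default_factory=list)  # user-provided labels


def jaccard(a, b):
    """Similarity of two token sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_jobs(jobs, threshold=0.5):
    """Greedy leader clustering: a job joins the first cluster whose
    first member (the leader) is at least `threshold` similar to it."""
    clusters = []
    for job in jobs:
        for cluster in clusters:
            if jaccard(job.tokens, cluster[0].tokens) >= threshold:
                cluster.append(job)
                break
        else:
            clusters.append([job])
    return clusters


def agreed_label(cluster):
    """Majority vote over every annotation in the cluster; None if nobody annotated."""
    votes = Counter(label for job in cluster for label in job.annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

With a threshold of 0.3, two jobs whose titles share the word "invoice" fall into one cluster, and if most annotations in that cluster say "invoice", that becomes the cluster's agreed label.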
Abstract:
A method and apparatus are disclosed for creating a file directory of documents in a database, in which the documents are clustered based on one or more high-level features. For example, the method includes identifying the one or more high-level features of each of a plurality of documents stored in the database; comparing the high-level features of each document with those of the other documents; grouping the documents into a plurality of clusters based on common high-level features identified in the comparison; and creating the file directory of documents in the database based on the plurality of clusters.
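One way to sketch the compare, group, and create-directory steps, assuming each document's high-level features are already available as a set of strings (for example, "invoice" or "has_table"). In this sketch two documents are linked when they share at least `min_shared` features, clusters are the connected components of those links, and each folder is named after the features common to all its members. The function names and the folder-naming rule are hypothetical.

```python
from itertools import combinations


def cluster_by_common_features(docs, min_shared=1):
    """docs: dict mapping document name -> set of high-level features.
    Documents sharing at least `min_shared` features are linked; clusters
    are the connected components of those links (union-find)."""
    parent = {name: name for name in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Compare each document's features with every other document's features.
    for a, b in combinations(docs, 2):
        if len(docs[a] & docs[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def build_directory(docs, clusters):
    """Name each folder after the features common to all of its members."""
    directory = {}
    for members in clusters:
        common = set(docs[members[0]]).intersection(*(docs[m] for m in members[1:]))
        folder = "_".join(sorted(common)) or "misc"
        directory.setdefault(folder, []).extend(sorted(members))
    return directory
```

Because linking is transitive, a chain of documents can end up in one cluster with no feature shared by all of them; such clusters land in a "misc" folder in this sketch.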
Abstract:
Disclosed is a method and system for differentially processing a print job that includes one or more original documents in order to render an obfuscated version of the print job. According to an exemplary method, the differential process replaces the letters of an original document with randomly selected characters of substantially the same size and location as the original letters, and replaces objects such as images and graphics with blurred versions of substantially the same size and location as the objects in the original document. The resulting obfuscated version of the print job is illegible, which makes it useful for further processing where the privacy of the documents in the print job must be preserved.
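A minimal sketch of the two replacement operations, assuming the page text is available as positioned glyphs and each image as a grayscale pixel grid. Letters are swapped for random letters of the same case, and digits for random digits (an assumption beyond the abstract, which mentions only letters), while position and font size are left unchanged; images are blurred with a simple box filter that keeps their dimensions. `Glyph`, `obfuscate_glyphs`, and `box_blur` are illustrative names, and the box filter stands in for whatever blur the method actually uses.

```python
import random
import string
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Glyph:
    char: str
    x: float     # position on the page (hypothetical units)
    y: float
    size: float  # font size


def obfuscate_glyphs(glyphs, rng=None):
    """Replace letters and digits with random ones at the same position and size.
    Spaces and punctuation are kept so the page layout is preserved."""
    rng = rng or random.Random()
    out = []
    for g in glyphs:
        if g.char.isdigit():
            pool = string.digits
        elif g.char.isalpha():
            pool = string.ascii_uppercase if g.char.isupper() else string.ascii_lowercase
        else:
            out.append(g)
            continue
        out.append(replace(g, char=rng.choice(pool)))
    return out


def box_blur(image, radius=1):
    """Blur a grayscale image (list of rows) with a (2r+1)x(2r+1) mean filter.
    Edge pixels average only the neighbors that exist; output size equals input size."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            total = count = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += image[ii][jj]
                        count += 1
            row.append(total / count)
        out.append(row)
    return out
```

Passing a seeded `random.Random` makes the obfuscation reproducible for testing; omitting it gives a fresh random substitution on every run.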
Abstract:
Disclosed is a system and method that supports efficient, interactive identification of the most paper-intensive document categories, so that the maximum number of documents belonging to those categories can be correctly categorized with minimum effort and in minimum time. Further disclosed is an iterative method that combines automatic grouping mechanisms with human labeling. The system and method are configured to run automatic machine labeling iteratively to produce improved document clustering and categorization.
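A sketch of the prioritization idea, assuming documents have already been grouped into clusters and that the page count of each document is known. Clusters are shown to a human labeler in descending order of printed pages, and labeling stops once the labeled clusters cover a target share of all pages or a question budget runs out. The automatic re-clustering between rounds that the abstract describes is omitted, and `label_by_paper_volume`, `ask_human`, `coverage_target`, and `max_questions` are hypothetical names.

```python
from typing import Callable, Dict, List


def label_by_paper_volume(
    clusters: Dict[str, List[int]],   # cluster id -> page counts of its documents
    ask_human: Callable[[str], str],  # hypothetical labeling callback
    coverage_target: float = 0.8,
    max_questions: int = 10,
) -> Dict[str, str]:
    """Ask for labels on the most paper-intensive clusters first, stopping once
    the labeled clusters cover `coverage_target` of all printed pages or the
    question budget is spent."""
    total_pages = sum(sum(pages) for pages in clusters.values())
    ranked = sorted(clusters, key=lambda cid: sum(clusters[cid]), reverse=True)
    labels: Dict[str, str] = {}
    covered = 0
    for cid in ranked:
        if len(labels) >= max_questions or covered >= coverage_target * total_pages:
            break
        labels[cid] = ask_human(cid)
        covered += sum(clusters[cid])
    return labels
```

For clusters totaling 50, 20, 5, and 1 pages with an 80% target, only the two largest clusters are sent to the labeler, since together they already cover 70 of 76 pages.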