Detection of sensitive personal information in a storage device

    Publication number: US11308235B2

    Publication date: 2022-04-19

    Application number: US16812029

    Application date: 2020-03-06

    IPC classification: G06F16/00 G06F21/62

    Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, fewer blocks need to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
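
    Illustrative sketch (not taken from the patent): the flow in the abstract could look roughly like the Python below. The block size, the SSN-style pattern, and the two-level block-to-inode-to-path tables are all assumptions made for illustration; the patent's actual character scanning and hierarchical reverse mapping techniques are not specified here.

```python
import re
from typing import Optional

BLOCK_SIZE = 32  # hypothetical block size, kept tiny for the example

# Assumed detector: SSN-like strings. A real scanner would use many detectors.
SSN_PATTERN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def scan_changed_blocks(device: bytes, block_delta: list[int]) -> list[int]:
    """Scan only the blocks named in the delta list; return those that match.

    Matches that straddle a block boundary are missed by this simple sketch.
    """
    hits = []
    for block_no in block_delta:
        start = block_no * BLOCK_SIZE
        if SSN_PATTERN.search(device[start:start + BLOCK_SIZE]):
            hits.append(block_no)
    return hits


def block_to_file(block_no: int,
                  block_to_inode: dict[int, int],
                  inode_to_path: dict[int, str]) -> Optional[str]:
    """Toy two-level reverse mapping: block number -> inode -> file path."""
    inode = block_to_inode.get(block_no)
    return inode_to_path.get(inode) if inode is not None else None


# Four blocks; blocks 1 and 3 hold SSN-like text, but only 1 and 2 changed.
device = b"".join(
    text.ljust(BLOCK_SIZE, b".")
    for text in (b"hello world", b"ssn 123-45-6789", b"nothing here", b"id 987-65-4321")
)
changed = [1, 2]
flagged = scan_changed_blocks(device, changed)
paths = [block_to_file(b, {1: 7, 3: 9}, {7: "/home/a/notes.txt"}) for b in flagged]
```

    Block 3 also contains a matching string, but it is never read because it is absent from the delta list, which is the saving the abstract describes.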

    Serving data assets based on security policies by applying space-time optimized inline data transformations

    Publication number: US11210410B2

    Publication date: 2021-12-28

    Application number: US16573326

    Application date: 2019-09-17

    IPC classification: G06F21/60 G06F21/62

    Abstract: Serving data assets based on security policies is provided. A request to access an asset received from a user having a particular context is evaluated based on a set of asset access enforcement policies. An asset access policy enforcement decision is generated based on evaluating the request. It is determined whether the asset access policy enforcement decision is to transform particular data of the asset prior to allowing access. In response to determining that the asset access policy enforcement decision is to transform the particular data of the asset prior to allowing access, a transformation specification that includes an ordered subset of unit transformations for transforming the particular data of the asset is generated based on the particular context of the user and the set of asset access enforcement policies. A transformed asset is generated by applying the transformation specification to the asset, thereby transforming the particular data of the asset.
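
    Illustrative sketch (not taken from the patent): one way to picture a transformation specification is as an ordered list of small functions selected from a policy table by the user's context. The roles, the policy table, and the two unit transformations below are hypothetical names chosen for the example.

```python
from typing import Callable

Row = dict[str, str]
UnitTransform = Callable[[Row], Row]


def mask_email(row: Row) -> Row:
    """Replace the local part of the email address with asterisks."""
    out = dict(row)
    _, _, domain = out["email"].partition("@")
    out["email"] = "***@" + domain
    return out


def drop_salary(row: Row) -> Row:
    """Remove the salary column entirely."""
    return {k: v for k, v in row.items() if k != "salary"}


UNIT_TRANSFORMS: dict[str, UnitTransform] = {
    "mask_email": mask_email,
    "drop_salary": drop_salary,
}

# Assumed policy table: user role -> ordered names of unit transformations.
POLICIES: dict[str, list[str]] = {
    "analyst": ["mask_email", "drop_salary"],
    "auditor": ["mask_email"],
    "owner": [],
}


def build_spec(role: str) -> list[UnitTransform]:
    """Build the ordered transformation specification for a user context."""
    return [UNIT_TRANSFORMS[name] for name in POLICIES[role]]


def serve(asset: list[Row], role: str) -> list[Row]:
    """Apply the specification to every row, in order, before serving."""
    spec = build_spec(role)
    served = []
    for row in asset:
        for transform in spec:
            row = transform(row)
        served.append(row)
    return served


asset = [{"name": "Ada", "email": "ada@example.com", "salary": "100"}]
analyst_view = serve(asset, "analyst")
```

    The stored asset is left untouched; each request produces its own transformed view.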

    Handling queries in document systems using segment differential based document text-index modelling

    Publication number: US11157477B2

    Publication date: 2021-10-26

    Application number: US16202215

    Application date: 2018-11-28

    IPC classification: G06F16/21 G06F16/22

    Abstract: A method, computer system, and computer program product for segment differential-based document text-index modeling are provided. The embodiment may include receiving, by a processor, a document with a valid document ID and version ID tuple. The embodiment may also include determining that the received document is a new version of a previously stored document and consequently multiplexing versions of the document into a single indexed document. The embodiment may further include segmenting the received document and building a token vector. The embodiment may also include calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation. The embodiment may further include, in response to the calculated difference being below a pre-configured threshold value, discarding the received new version.
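
    Illustrative sketch (not taken from the patent): the threshold check at the end of the abstract can be pictured as below. Splitting on periods and measuring the share of segments that differ are assumptions made for the example; the patent does not state which segmentation or difference measure it uses, and the token vector step is omitted here.

```python
def segment(text: str) -> list[str]:
    """Naive segmentation: split on periods and drop empty pieces."""
    return [piece.strip() for piece in text.split(".") if piece.strip()]


def difference(old: str, new: str) -> float:
    """Share of segments that appear in only one of the two versions."""
    old_segments, new_segments = set(segment(old)), set(segment(new))
    union = old_segments | new_segments
    if not union:
        return 0.0
    return len(old_segments ^ new_segments) / len(union)


def accept_new_version(old: str, new: str, threshold: float = 0.2) -> bool:
    """Keep the new version only if it differs by at least the threshold."""
    return difference(old, new) >= threshold


stored = "A b. C d. E f."
resubmitted = "A b. C d. E f."
revised = "A b. C d. X y."
```

    With these inputs the resubmitted copy has a difference of 0 and would be discarded, while the revised copy changes two of four distinct segments (0.5) and would be kept.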

    Extending a content repository using an auxiliary data store

    Publication number: US09606998B2

    Publication date: 2017-03-28

    Application number: US14298111

    Application date: 2014-06-06

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: According to one embodiment of the present invention, a system extends a content repository by creating an auxiliary data store outside of the content repository and storing auxiliary data in the auxiliary data store, wherein the auxiliary data is associated with a collection of documents in the content repository. The system stores version information for the auxiliary data store and records of operations against the auxiliary data store in a log in the repository. In response to receiving a request for an operation against the auxiliary data store, the system determines that the auxiliary data store and repository are consistent based on the version information and applies the operation against the auxiliary data store. Embodiments of the present invention further include a method and computer program product for extending a content repository data model in substantially the same manner described above.
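
    Illustrative sketch (not taken from the patent): the version check before applying an operation might be modeled as below. The class names, the integer version counter, and the tuple-shaped log records are hypothetical; the patent does not specify how version information or log entries are represented.

```python
class Repository:
    """Content repository side: holds the aux store's version and the log."""

    def __init__(self) -> None:
        self.aux_version = 0
        self.log: list[tuple[int, str, str]] = []


class AuxStore:
    """Auxiliary data store kept outside the repository."""

    def __init__(self) -> None:
        self.version = 0
        self.data: dict[str, str] = {}


def apply_operation(repo: Repository, aux: AuxStore, key: str, value: str) -> None:
    """Apply a write to the aux store only if both sides agree on the version."""
    if aux.version != repo.aux_version:
        raise RuntimeError("auxiliary store and repository are inconsistent")
    aux.data[key] = value
    aux.version += 1
    # Record the operation and the new version in the repository's log.
    repo.log.append((aux.version, key, value))
    repo.aux_version = aux.version


repo, aux = Repository(), AuxStore()
apply_operation(repo, aux, "doc1", "reviewed")
```

    If the two version values ever diverge, for example after restoring one side from a backup, the sketch refuses the operation rather than writing to a stale store.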

    Distributing posting lists to processing elements

    Publication number: US11151132B2

    Publication date: 2021-10-19

    Application number: US16440971

    Application date: 2019-06-13

    Abstract: Provided are a computer program product, system, and method for distributed processing of a query with distributed posting lists. A dispatch map has entries, wherein each entry identifies one of a plurality of terms in a dictionary, wherein for each of the terms there is a posting list identifying zero or more objects including the term, wherein at least one of the dispatch map entries indicates at least one distributed processing element including the posting list for the term. The dispatch map is used to dispatch sub-expressions comprising portions of a query to distributed processing elements having the posting lists for terms in the sub-expressions, wherein the distributed processing elements to which the sub-expressions are distributed execute the sub-expressions on the posting lists for the terms in the sub-expressions.
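
    Illustrative sketch (not taken from the patent): a dispatch map can be pictured as a dictionary from term to the processing element that holds its posting list. The element names, the in-process dictionaries standing in for remote elements, and the AND-only query form are assumptions made for the example.

```python
# Hypothetical posting lists held by two processing elements.
POSTINGS_BY_ELEMENT: dict[str, dict[str, set[int]]] = {
    "pe0": {"cat": {1, 2, 3}, "dog": {2, 4}},
    "pe1": {"fish": {3, 5}},
}

# Dispatch map: term -> processing element holding that term's posting list.
DISPATCH_MAP: dict[str, str] = {"cat": "pe0", "dog": "pe0", "fish": "pe1"}


def dispatch(query_terms: list[str]) -> dict[str, list[str]]:
    """Split a conjunctive query into one sub-expression per element."""
    sub_expressions: dict[str, list[str]] = {}
    for term in query_terms:
        sub_expressions.setdefault(DISPATCH_MAP[term], []).append(term)
    return sub_expressions


def execute_and(query_terms: list[str]) -> set[int]:
    """Run each sub-expression where its posting lists live, then combine."""
    if not query_terms:
        return set()
    partial_results = []
    for element, terms in dispatch(query_terms).items():
        postings = POSTINGS_BY_ELEMENT[element]
        # Each element intersects only the posting lists it holds.
        partial_results.append(set.intersection(*(postings[t] for t in terms)))
    return set.intersection(*partial_results)


plan = dispatch(["cat", "dog", "fish"])
both_local = execute_and(["cat", "dog"])
cross_element = execute_and(["cat", "fish"])
```

    Terms that live on the same element are intersected there, so only the smaller partial results would need to travel between elements in a real deployment.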

Discovering latent custodians and documents in an e-discovery system

    Publication number: US20210263977A1

    Publication date: 2021-08-26

    Application number: US16795678

    Application date: 2020-02-20

    Abstract: Discovering second-order documents and latent custodians in an e-discovery system is provided. A list of first-order documents and document custodians within a base state of the e-discovery system are identified based on a plurality of terms corresponding to a meet and confer practice for a legal matter instance. The plurality of terms is masked within the first-order documents. The first-order documents having the plurality of terms masked are divided into groups. A list of second-order documents is generated from a group of documents. A list of second-order document custodians is generated based on corresponding custodian relationships to second-order documents. Finally, each second-order document custodian in the list of second-order document custodians that has a corresponding rank exceeding a defined rank threshold level is identified as an official document custodian in the e-discovery system.
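
    Illustrative sketch (not taken from the patent): two of the steps in the abstract, masking the agreed terms and promoting custodians whose rank exceeds a threshold, might look like the Python below. The mask token, the document shape, and the use of a simple document count as the rank are assumptions; the patent does not state how rank is computed, and the grouping step is not shown.

```python
import re
from collections import Counter


def mask_terms(text: str, terms: list[str], mask: str = "[MASKED]") -> str:
    """Replace every occurrence of the given terms, ignoring case."""
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(mask, text)


def official_custodians(second_order_docs: list[dict], threshold: int) -> list[str]:
    """Rank custodians by how many second-order documents they relate to
    (an assumed ranking) and return those whose rank exceeds the threshold."""
    ranks = Counter(
        custodian
        for doc in second_order_docs
        for custodian in doc["custodians"]
    )
    return sorted(c for c, rank in ranks.items() if rank > threshold)


masked = mask_terms("Merger with Acme Corp", ["merger", "acme"])
docs = [
    {"custodians": ["ann", "bob"]},
    {"custodians": ["ann"]},
    {"custodians": ["ann", "cy"]},
]
promoted = official_custodians(docs, threshold=1)
```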