Focused probabilistic entity resolution from multiple data sources

    Publication No.: US12229154B2

    Publication date: 2025-02-18

    Application No.: US18330746

    Filing date: 2023-06-07

    Abstract: Various systems and methods are provided for performing soft entity resolution. A plurality of data objects are retrieved from a plurality of data stores to create aggregated data objects for one or more entities. One or more retrieved data objects may be associated with the same entity, based at least in part upon one or more attribute types and attribute values of the data objects. In response to a determination that the one or more of the retrieved data objects should be associated with the same entity, metadata is generated that associates the data objects with the entity, the metadata being stored separately from the data objects, such that the underlying data objects remain unchanged. In addition, one or more additional attributes may be determined for the entity, based upon the data objects associated with the entity.
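The abstract above can be illustrated with a minimal sketch. This is not the patented method: the `DataObject` type, the per-attribute `weights`, the `threshold`, and the union-find grouping are all assumptions chosen for illustration. It shows the one property the abstract does state, namely that links between objects and entities are kept as separate metadata while the underlying objects stay unchanged, plus a simple way to derive additional attributes per entity.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)  # frozen: the underlying data objects cannot be modified
class DataObject:
    object_id: str
    source: str
    attributes: tuple  # tuple of (attribute_type, attribute_value) pairs


def match_score(a, b, weights):
    """Sum the (assumed) weights of attribute types whose values agree."""
    a_attrs, b_attrs = dict(a.attributes), dict(b.attributes)
    return sum(
        w for t, w in weights.items()
        if t in a_attrs and a_attrs[t] == b_attrs.get(t)
    )


def resolve(objects, weights, threshold):
    """Return link metadata {object_id: entity_id}, stored apart from the objects."""
    parent = {o.object_id: o.object_id for o in objects}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(objects, 2):
        if match_score(a, b, weights) >= threshold:
            ra, rb = find(a.object_id), find(b.object_id)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
    return {o.object_id: find(o.object_id) for o in objects}


def derive_attributes(objects, links):
    """Collect, per entity, the set of values seen for each attribute type."""
    merged = {}
    for o in objects:
        entity = merged.setdefault(links[o.object_id], {})
        for attr_type, value in o.attributes:
            entity.setdefault(attr_type, set()).add(value)
    return merged
```

Because `resolve` returns only a mapping from object IDs to entity IDs, a link can be discarded or recomputed with different weights without touching any source record.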

    Workflow driven database partitioning

    Publication No.: US12056128B2

    Publication date: 2024-08-06

    Application No.: US17820067

    Filing date: 2022-08-16

    Inventor: James Ding

    Abstract: A database is configured to analyze user queries to dynamically partition the database according to a partition scheme. User queries can be rewritten based on the partition scheme so that, in response to queries, partitions including relevant data are read while partitions including irrelevant data can be skipped, reducing latency. Files can be named according to the partition scheme and stored on respective partitions so that low partition management can be implemented by underlying systems. Blocks within files can be sorted and statistics can be determined. The statistics can be used to find and read relevant blocks and skip irrelevant blocks.
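The skipping behaviour described in this abstract can be sketched as follows. The sketch assumes the partition key has already been chosen (the query analysis step is not modelled), and the `key=value` naming, the in-memory layout, and the function names are hypothetical. Rows are grouped into partitions named after the key value, each partition is split into sorted blocks, and min/max statistics per block let a range query skip blocks that cannot match.

```python
from collections import defaultdict


def partition_rows(rows, key, sort_key, block_size=2):
    """Group rows by partition key, then split each partition into sorted
    blocks that carry min/max statistics for sort_key."""
    parts = defaultdict(list)
    for row in rows:
        # Partition name encodes the scheme, e.g. "region=eu" (assumed format).
        parts[f"{key}={row[key]}"].append(row)

    layout = {}
    for name, part_rows in parts.items():
        part_rows.sort(key=lambda r: r[sort_key])
        blocks = [
            part_rows[i:i + block_size]
            for i in range(0, len(part_rows), block_size)
        ]
        layout[name] = [
            {"min": b[0][sort_key], "max": b[-1][sort_key], "rows": b}
            for b in blocks
        ]
    return layout


def query(layout, key, value, sort_key, lo, hi):
    """Read only the partition for key=value, and within it only the blocks
    whose [min, max] range overlaps [lo, hi]."""
    result, blocks_read = [], 0
    for block in layout.get(f"{key}={value}", []):
        if block["max"] < lo or block["min"] > hi:
            continue  # statistics show the block is irrelevant; skip it
        blocks_read += 1
        result.extend(r for r in block["rows"] if lo <= r[sort_key] <= hi)
    return result, blocks_read
```

The second return value of `query` counts the blocks actually read, which makes the effect of the statistics visible when experimenting with different block sizes.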

    Focused probabilistic entity resolution from multiple data sources

    Publication No.: US20200065310A1

    Publication date: 2020-02-27

    Application No.: US16562201

    Filing date: 2019-09-05

    Abstract: Various systems and methods are provided for performing soft entity resolution. A plurality of data objects are retrieved from a plurality of data stores to create aggregated data objects for one or more entities. One or more retrieved data objects may be associated with the same entity, based at least in part upon one or more attribute types and attribute values of the data objects. In response to a determination that the one or more of the retrieved data objects should be associated with the same entity, metadata is generated that associates the data objects with the entity, the metadata being stored separately from the data objects, such that the underlying data objects remain unchanged. In addition, one or more additional attributes may be determined for the entity, based upon the data objects associated with the entity.

    Systems and methods for managing custom code in a data computing platform

    Publication No.: US20230283610A1

    Publication date: 2023-09-07

    Application No.: US18197501

    Filing date: 2023-05-15

    Inventor: James Ding

    CPC classification number: H04L63/101 H04L63/0281 H04L63/20

    Abstract: A system for managing custom code within a data computing platform determines that a request for one or more uniform resource identifiers external to the platform is being made by custom code executing in the platform. In response to the determination, the system checks a whitelist of allowable external URIs against the requested one or more URIs and allows access to the requested one or more URIs if a match is detected with the whitelist, otherwise access by the custom code to the requested one or more URIs is denied. In addition, or alternatively, the system checks a blacklist of disallowed external URIs against the requested one or more URIs and denies access to the requested one or more URIs if a match is detected with the blacklist, otherwise access by the custom code to the requested one or more URIs is allowed. The blacklist can override the whitelist.
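The allow/deny logic in this abstract can be sketched in a few lines. Matching on the lowercased hostname is an assumption (the abstract only says URIs are checked against the lists), and the function name `is_allowed` is hypothetical. The order of the checks implements the stated rule that the blacklist can override the whitelist.

```python
from urllib.parse import urlsplit


def is_allowed(uri, whitelist=None, blacklist=None):
    """Decide whether custom code may access an external URI.

    whitelist / blacklist are optional sets of hostnames (assumed match key).
    """
    host = (urlsplit(uri).hostname or "").lower()

    # Blacklist is checked first so that it overrides any whitelist entry.
    if blacklist is not None and host in blacklist:
        return False

    # With a whitelist configured, only listed hosts are allowed.
    if whitelist is not None:
        return host in whitelist

    # Blacklist-only (or no lists): anything not blacklisted is allowed.
    return True
```

For example, `is_allowed("https://api.example.com/v1", whitelist={"api.example.com"})` allows the request, while adding the same host to `blacklist` denies it.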

    Workflow driven database partitioning

    Publication No.: US11449509B2

    Publication date: 2022-09-20

    Application No.: US16797583

    Filing date: 2020-02-21

    Inventor: James Ding

    Abstract: A database is configured to analyze user queries to dynamically partition the database according to a partition scheme. User queries can be rewritten based on the partition scheme so that, in response to queries, partitions including relevant data are read while partitions including irrelevant data can be skipped, reducing latency. Files can be named according to the partition scheme and stored on respective partitions so that low partition management can be implemented by underlying systems. Blocks within files can be sorted and statistics can be determined. The statistics can be used to find and read relevant blocks and skip irrelevant blocks.

    Focused probabilistic entity resolution from multiple data sources

    Publication No.: US11294915B2

    Publication date: 2022-04-05

    Application No.: US16562201

    Filing date: 2019-09-05

    Abstract: Various systems and methods are provided for performing soft entity resolution. A plurality of data objects are retrieved from a plurality of data stores to create aggregated data objects for one or more entities. One or more retrieved data objects may be associated with the same entity, based at least in part upon one or more attribute types and attribute values of the data objects. In response to a determination that the one or more of the retrieved data objects should be associated with the same entity, metadata is generated that associates the data objects with the entity, the metadata being stored separately from the data objects, such that the underlying data objects remain unchanged. In addition, one or more additional attributes may be determined for the entity, based upon the data objects associated with the entity.
