CLEANING AND ORGANIZING SCHEMALESS SEMI-STRUCTURED DATA FOR EXTRACT, TRANSFORM, AND LOAD PROCESSING

    Publication No.: US20240193176A1

    Publication date: 2024-06-13

    Application No.: US18581856

    Filing date: 2024-02-20

    IPC classification: G06F16/25 G06F16/21 G06F16/28

    Abstract: In some implementations, a system may obtain, from a first data repository, a first dataset that includes event data associated with a generic schema. The system may infer an event-specific schema that defines an organizational structure for the event data based on common attributes identified among a plurality of events included in the event data using one or more data analytics functions. The system may store, in a second data repository, a second dataset in which the event data is partitioned based on the organizational structure defined by the event-specific schema. The system may generate a third dataset that includes a subset of the event data included in the second dataset that satisfies one or more registration parameters related to an extract, transform, load (ETL) use case. The system may provide the third dataset to an ETL system configured to process the third dataset based on the ETL use case.
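The flow in this abstract (infer a schema from common attributes, partition by it, then select a subset for an ETL use case) can be illustrated with a minimal Python sketch. It is not the patented method: the function names, the `event_type` key, the `min_share` threshold, and the shape of the `registration` dictionary are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def infer_event_schemas(events, type_key="event_type", min_share=0.8):
    """Infer one schema per event type (hypothetical heuristic).

    An attribute joins the schema when it appears in at least
    `min_share` of the events of that type.
    """
    by_type = defaultdict(list)
    for event in events:
        by_type[event.get(type_key, "unknown")].append(event)

    schemas = {}
    for event_type, group in by_type.items():
        counts = Counter(k for event in group for k in event if k != type_key)
        schemas[event_type] = sorted(
            k for k, n in counts.items() if n / len(group) >= min_share
        )
    return schemas


def partition_events(events, schemas, type_key="event_type"):
    """Partition events by type, keeping only the inferred columns."""
    partitions = defaultdict(list)
    for event in events:
        event_type = event.get(type_key, "unknown")
        columns = schemas[event_type]
        partitions[event_type].append({c: event.get(c) for c in columns})
    return dict(partitions)


def select_for_etl(partitions, registration):
    """Keep rows of the registered type whose required fields are set."""
    rows = partitions.get(registration["event_type"], [])
    return [
        row for row in rows
        if all(row.get(field) is not None for field in registration["required"])
    ]


events = [
    {"event_type": "login", "user": "a", "ip": "10.0.0.1"},
    {"event_type": "login", "user": "b", "ip": "10.0.0.2", "debug": True},
    {"event_type": "login", "user": "c", "ip": None},
    {"event_type": "purchase", "user": "a", "amount": 12.5},
]

schemas = infer_event_schemas(events)
partitions = partition_events(events, schemas)
subset = select_for_etl(partitions, {"event_type": "login", "required": ["ip"]})
```

Here the rare `debug` attribute is left out of the inferred `login` schema, and the row with a missing `ip` is excluded from the subset handed to the ETL step.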

    Mechanisms for Deploying Database Clusters
    Invention publication

    Publication No.: US20240193154A1

    Publication date: 2024-06-13

    Application No.: US18417790

    Filing date: 2024-01-19

    Applicant: Salesforce, Inc.

    Abstract: Techniques are disclosed that pertain to deploying immutable instances of a system. A computer system may maintain an active generation value that indicates an immutable instance of a database system that is permitted to write data to a database. The computer system may deploy a first immutable instance of the database system and update the active generation value to permit the first immutable instance to write data to the database. The computer system may receive a request to deploy a second immutable instance of the database system that includes an update not found in the first immutable instance. The computer system may deploy the second immutable instance and update the active generation value to cause the first immutable instance to cease writing data to the database and to permit the second immutable instance to write data to the database.
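The active generation value described here acts as a write fence. The following Python sketch shows the idea under simplifying assumptions: `GenerationRegistry` and `DatabaseInstance` are hypothetical names, the database is a plain dictionary, and the check-then-write step is not atomic as a real deployment would require.

```python
class GenerationRegistry:
    """Holds the single generation that is currently allowed to write."""

    def __init__(self):
        self.active_generation = None

    def activate(self, generation):
        # Switching the value fences off every other generation.
        self.active_generation = generation


class DatabaseInstance:
    """An immutable instance tagged with the generation it was deployed as."""

    def __init__(self, generation, registry, database):
        self.generation = generation
        self.registry = registry
        self.database = database

    def write(self, key, value):
        # Refuse the write unless this instance is the active generation.
        if self.registry.active_generation != self.generation:
            return False
        self.database[key] = value
        return True


database = {}
registry = GenerationRegistry()

first = DatabaseInstance(1, registry, database)
registry.activate(1)
first_ok = first.write("a", "written by generation 1")

# Deploy an updated instance and move the active generation to it.
second = DatabaseInstance(2, registry, database)
registry.activate(2)
stale_ok = first.write("b", "should be rejected")
second_ok = second.write("b", "written by generation 2")
```

After the switch, the first instance is still running but its writes are rejected, while the second instance's writes succeed.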

    System For Live-Migration and Automated Recovery of Applications in a Distributed System

    Publication No.: US20240184805A1

    Publication date: 2024-06-06

    Application No.: US18439248

    Filing date: 2024-02-12

    Applicant: Google LLC

    Inventor: Luke Marsden

    IPC classification: G06F16/27 G06F11/20 G06F16/21

    Abstract: A method and apparatus for distribution of applications amongst a number of servers, ensuring that changes to application data on a master for that application are asynchronously replicated to a number of slaves for that application. Servers may be located in geographically diverse locations; the invention permits data replication over high-latency and lossy network connections and failure-tolerance under hardware and network failure conditions. Access to applications is mediated by a distributed protocol handler which allows any request for any application to be addressed to any server, and which, when working in tandem with the replication system, pauses connections momentarily to allow seamless, consistent live-migration of applications and their state between servers. Additionally, a system which controls the aforementioned live-migration based on dynamic measurement of load generated by each application and the topological preferences of each application, in order to automatically keep servers at an optimum utilisation level.
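The last sentence of this abstract describes choosing migrations from measured load and per-application topological preferences. The Python sketch below shows one possible greedy rule for that decision only; it does not model replication or connection pausing. The function `plan_migration`, the `capacity` threshold, and the data shapes are assumptions for illustration, not taken from the patent.

```python
def plan_migration(placement, load, allowed, capacity=1.0):
    """Suggest one migration (app, source, target), or None.

    placement: app -> server it currently runs on
    load:      app -> measured load (same unit as `capacity`)
    allowed:   app -> set of servers the app may run on
    """
    servers = sorted(
        set(placement.values()) | {s for opts in allowed.values() for s in opts}
    )
    usage = {server: 0.0 for server in servers}
    for app, server in placement.items():
        usage[server] += load[app]

    # Only act when the busiest server exceeds the target utilisation.
    source = max(usage, key=usage.get)
    if usage[source] <= capacity:
        return None

    # Try the heaviest applications on the overloaded server first.
    candidates = sorted(
        (app for app, server in placement.items() if server == source),
        key=load.get,
        reverse=True,
    )
    for app in candidates:
        targets = [
            s for s in allowed[app]
            if s != source and usage[s] + load[app] <= capacity
        ]
        if targets:
            # Prefer the least-loaded server that the app may use.
            return app, source, min(targets, key=usage.get)
    return None


placement = {"blog": "s1", "shop": "s1", "wiki": "s2"}
load = {"blog": 0.3, "shop": 0.9, "wiki": 0.2}
allowed = {"blog": {"s1", "s2", "s3"}, "shop": {"s1", "s3"}, "wiki": {"s2"}}

move = plan_migration(placement, load, allowed)
```

In this example `s1` carries a load of 1.2, so the heaviest application on it, `shop`, is proposed for migration to `s3`, the only other server its preferences permit.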

    INFERRING A DATASET SCHEMA FROM INPUT FILES
    Invention publication

    Publication No.: US20240184754A1

    Publication date: 2024-06-06

    Application No.: US18438301

    Filing date: 2024-02-09

    Inventors: Nir Ackner, Eric Lin

    IPC classification: G06F16/21 G06F3/06 G06F40/205

    Abstract: A method comprises selecting a sample excerpt from a data input file; in response to determining that a first row in the sample excerpt does not contain a delimited value and a second row does contain a delimited value, determining that the first row consists of header data; identifying one or more jagged rows based on row delimiters that were erroneously placed; causing display of text that led to creation of a jagged row; receiving an addition or removal of a specific row delimiter to the text; updating the sample excerpt based on the addition or the removal; analyzing the sample excerpt to determine a row delimiter for the data input file; identifying a plurality of rows that is not included in the header data; identifying a plurality of candidate column delimiters; and generating a candidate schema for the data input file.
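Three of the steps in this abstract (header detection, candidate column delimiters, and jagged-row identification) can be sketched in Python as follows. The heuristics, the delimiter list, and the function names are assumptions chosen for illustration; the sketch assumes a non-empty sample and does not cover the interactive correction step.

```python
from collections import Counter

# Hypothetical candidate set; a real tool may consider more.
CANDIDATE_DELIMITERS = [",", "\t", ";", "|"]


def split_header(rows, delimiters=CANDIDATE_DELIMITERS):
    """Treat leading rows without any delimiter as header data."""
    for index, row in enumerate(rows):
        if any(d in row for d in delimiters):
            return rows[:index], rows[index:]
    return [], rows


def rank_column_delimiters(rows, delimiters=CANDIDATE_DELIMITERS):
    """Rank delimiters by how many rows share the most common field count."""
    scores = {}
    for d in delimiters:
        widths = Counter(row.count(d) + 1 for row in rows)
        width, hits = widths.most_common(1)[0]
        if width > 1:  # a delimiter that never splits a row is not a candidate
            scores[d] = (hits / len(rows), width)
    return sorted(scores, key=scores.get, reverse=True)


def jagged_rows(rows, delimiter):
    """Return indexes of rows whose field count differs from the norm."""
    widths = Counter(row.count(delimiter) + 1 for row in rows)
    expected = widths.most_common(1)[0][0]
    return [i for i, row in enumerate(rows) if row.count(delimiter) + 1 != expected]


# The stray line break after "Bob" splits one record into two short rows.
sample = "Exported report\nid,name,city\n1,Ann,Oslo\n2,Bob\n,Rome\n3,Cy,Lima\n"
rows = sample.splitlines()

header, body = split_header(rows)
candidates = rank_column_delimiters(body)
jagged = jagged_rows(body, candidates[0])
```

The two rows flagged as jagged are the halves of the record broken by the misplaced row delimiter; showing them to the user is where the addition or removal of a delimiter described in the abstract would take place.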

    Geolocation of wireless network users

    Publication No.: US12004114B2

    Publication date: 2024-06-04

    Application No.: US17346179

    Filing date: 2021-06-11

    Abstract: A method includes selecting a first machine learning model from a plurality of machine learning models that are trained for use in performing geolocation, wherein the first machine learning model is selected to perform geolocation within a first cell of a plurality of cells of a wireless network, acquiring event data from a plurality of wireless devices within the first cell, grouping the event data into a plurality of records, wherein each record of the plurality of records contains event data that indicates a common wireless device of the plurality of wireless devices, a common cell of the plurality of cells, and a common timestamp, and generating a predicted location of a first wireless device of the plurality of wireless devices, using the first machine learning model, wherein the first machine learning model outputs the predicted location in response to an input of a record of the plurality of records.
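The grouping and per-cell model selection in this abstract can be sketched in Python as below. The field names (`device_id`, `cell_id`, `timestamp`, `measurements`) are assumptions, and `cell_c1_model` is a placeholder that returns a fixed coordinate pair in place of a trained model.

```python
from collections import defaultdict


def group_into_records(events):
    """Merge events sharing the same device, cell, and timestamp."""
    records = defaultdict(dict)
    for event in events:
        key = (event["device_id"], event["cell_id"], event["timestamp"])
        records[key].update(event["measurements"])
    return dict(records)


def predict_location(record_key, record, models):
    """Select the model for the record's cell and apply it to the record."""
    _, cell_id, _ = record_key
    model = models[cell_id]
    return model(record)


def cell_c1_model(record):
    # Placeholder standing in for a trained per-cell model:
    # it ignores the record and returns a fixed (latitude, longitude).
    return (59.91, 10.75)


events = [
    {"device_id": "d1", "cell_id": "c1", "timestamp": 100,
     "measurements": {"rsrp": -95}},
    {"device_id": "d1", "cell_id": "c1", "timestamp": 100,
     "measurements": {"timing_advance": 4}},
    {"device_id": "d2", "cell_id": "c1", "timestamp": 100,
     "measurements": {"rsrp": -80}},
]

models = {"c1": cell_c1_model}
records = group_into_records(events)

key = ("d1", "c1", 100)
location = predict_location(key, records[key], models)
```

The two events from device `d1` are merged into one record, which is then passed to the model registered for cell `c1`.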