SYSTEMS AND METHODS FOR LARGE-SCALE DATA PROCESSING

    Publication No.: US20240020282A1

    Publication date: 2024-01-18

    Application No.: US17865945

    Filing date: 2022-07-15

    CPC classification number: G06F16/212 G06F16/258

    Abstract: Systems and methods for authoring workflows that process data from a large-scale dataset held in a data storage system include defining a metadata schema for the large-scale dataset and receiving user input defining a workflow as a plurality of operations to be performed on the data. Each operation includes: input metadata formatted according to the metadata schema, which describes the input data to be processed by the operation and identifies the location of the input data in the data storage system; programmed instructions for performing an atomic operation on the input data to generate output data; and output metadata formatted according to the metadata schema, which describes the output data and identifies the location of the output data in the data storage system.
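The operation structure described in this abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the schema fields (`description`, `location`), the `Operation` class, and the in-memory `storage` dictionary standing in for the data storage system are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metadata schema: every metadata record must carry these keys.
METADATA_SCHEMA = ("description", "location")


def validate_metadata(meta: Dict[str, str]) -> None:
    """Raise ValueError if a metadata record does not follow the schema."""
    missing = [key for key in METADATA_SCHEMA if key not in meta]
    if missing:
        raise ValueError(f"metadata missing required fields: {missing}")


@dataclass
class Operation:
    input_metadata: Dict[str, str]            # describes and locates the input data
    instructions: Callable[[object], object]  # the atomic operation
    output_metadata: Dict[str, str]           # describes and locates the output data

    def __post_init__(self) -> None:
        validate_metadata(self.input_metadata)
        validate_metadata(self.output_metadata)


def run_workflow(operations: List[Operation], storage: Dict[str, object]) -> None:
    """Run each operation in order, reading and writing at the declared locations."""
    for op in operations:
        data = storage[op.input_metadata["location"]]
        storage[op.output_metadata["location"]] = op.instructions(data)
```

Because each operation declares where its input and output live, one operation's output metadata can serve directly as the next operation's input metadata, which is how the steps chain into a workflow in this sketch.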

    Geospatial Image Processing for Targeted Data Acquisition

    Publication No.: US20220375031A1

    Publication date: 2022-11-24

    Application No.: US17323358

    Filing date: 2021-05-18

    Abstract: A computer implemented method includes obtaining data for raw image frames captured by a moving camera. The raw image frames are indexed geographically, and a graph is created from the multiple raw image frames. The graph includes image frames as vertices and edges that represent image frames having overlapping image information. The method further includes skipping frames based on the amount of overlap, determining a frame having an interesting feature, using the graph to find additional raw image frames that have the interesting feature, combining multiple raw image frames to form a unique image frame, and transmitting the unique image frame.
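The overlap graph and frame skipping described in this abstract might look roughly like the sketch below. It rests on assumptions not stated in the abstract: each frame's geographic footprint is an axis-aligned box with non-zero area, overlap is measured as a fraction of the first box's area, and the 0.8 skip threshold is arbitrary.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), assumed non-degenerate


def overlap_fraction(a: Box, b: Box) -> float:
    """Fraction of box a's area that is also covered by box b."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return (width * height) / area_a


def build_overlap_graph(footprints: List[Box]) -> Dict[int, Set[int]]:
    """Vertices are frame indices; an edge joins two frames whose footprints overlap."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(footprints))}
    for i in range(len(footprints)):
        for j in range(i + 1, len(footprints)):
            if overlap_fraction(footprints[i], footprints[j]) > 0:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def select_frames(footprints: List[Box], max_overlap: float = 0.8) -> List[int]:
    """Skip a frame when it overlaps the last kept frame by more than max_overlap."""
    kept: List[int] = []
    for i, box in enumerate(footprints):
        if kept and overlap_fraction(box, footprints[kept[-1]]) > max_overlap:
            continue
        kept.append(i)
    return kept
```

Given a frame that contains an interesting feature, the neighbours of that frame in the graph are the natural candidates for additional raw frames showing the same feature.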

    IMAGE DATA SEGMENTATION AND TRANSMISSION

    Publication No.: US20210136171A1

    Publication date: 2021-05-06

    Application No.: US16746105

    Filing date: 2020-01-17

    Abstract: A computing device is provided, including a logic subsystem with one or more processors, and memory storing instructions executable by the logic subsystem. These instructions are executed to obtain one or more source images, segment the one or more source images to generate a plurality of segments, determine a priority order for the plurality of segments, and transmit the plurality of segments to a remote computing device in the priority order. The plurality of segments are spatial components generated by spatial decomposition of the one or more source images and/or frequency components that are generated by frequency decomposition of the one or more source images. A remote computing device may receive these components in priority order, and perform certain algorithms on individual components without waiting for the entire image to upload.
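As an illustration of the spatial-decomposition case, the sketch below splits an image into square tiles and orders them for transmission. The abstract does not say how priority is determined; ranking tiles by pixel variance (most detailed first) is an assumption made here, as are the function names and the list-of-lists image representation.

```python
from statistics import pvariance
from typing import List, Tuple

Image = List[List[int]]
Segment = Tuple[int, int, Image]  # (row offset, column offset, tile pixels)


def split_into_tiles(image: Image, tile: int) -> List[Segment]:
    """Spatial decomposition: cut the image into tile x tile blocks."""
    segments: List[Segment] = []
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            pixels = [row[c:c + tile] for row in image[r:r + tile]]
            segments.append((r, c, pixels))
    return segments


def priority_order(segments: List[Segment]) -> List[Segment]:
    """Order segments so that the highest-variance tiles are transmitted first."""
    def score(segment: Segment) -> float:
        flat = [pixel for row in segment[2] for pixel in row]
        return pvariance(flat)

    return sorted(segments, key=score, reverse=True)
```

Each segment carries its own offsets, so a receiver can place and process a tile as soon as it arrives instead of waiting for the whole image.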

    DECENTRALIZED DATA PLATFORM

    Publication No.: US20220405126A1

    Publication date: 2022-12-22

    Application No.: US17354200

    Filing date: 2021-06-22

    Abstract: Data from data sources may be processed at an edge device. The edge device may generate a local processing result, filter the data, and/or prioritize the data. Accordingly, data is transmitted from the edge device to the data platform, where it may be processed further. For example, a local processing result may be processed at the data platform, such that processing is performed without all of the data source data. In examples, at least a part of such data may remain at an edge device. The edge device may maintain a manifest of data stored by the edge device. The data platform may generate an aggregated manifest using manifests from associated edge devices, such that it may be determined where data is stored. As a result, the data platform may redirect requests to an associated edge device when it is determined that requested data is remote from the data platform.
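The manifest aggregation and request redirection described here can be illustrated with a small sketch. The data shapes are assumptions: each edge manifest is modelled as a set of data keys, the platform's own storage as a set of keys, and the return values of `resolve` are invented for the example.

```python
from typing import Dict, Optional, Set, Tuple


def aggregate_manifests(edge_manifests: Dict[str, Set[str]]) -> Dict[str, str]:
    """Build an aggregated manifest mapping each data key to the edge device holding it."""
    aggregated: Dict[str, str] = {}
    for device, keys in edge_manifests.items():
        for key in keys:
            aggregated[key] = device
    return aggregated


def resolve(key: str, platform_store: Set[str],
            aggregated: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Decide whether a request is served by the platform or redirected to an edge device."""
    if key in platform_store:
        return ("platform", None)
    if key in aggregated:
        return ("redirect", aggregated[key])
    return ("not_found", None)
```

In this sketch a key held by several edge devices maps to whichever device was processed last; a fuller design would need an explicit policy for replicated data.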

    IMAGE DATA SEGMENTATION AND TRANSMISSION

    Publication No.: US20220263921A1

    Publication date: 2022-08-18

    Application No.: US17661970

    Filing date: 2022-05-04

    Abstract: A computing device is provided, including a logic subsystem with one or more processors, and memory storing instructions executable by the logic subsystem. These instructions are executed to obtain one or more source images, segment the one or more source images to generate a plurality of segments, determine a priority order for the plurality of segments, and transmit the plurality of segments to a remote computing device in the priority order. The plurality of segments are spatial components generated by spatial decomposition of the one or more source images and/or frequency components that are generated by frequency decomposition of the one or more source images. A remote computing device may receive these components in priority order, and perform certain algorithms on individual components without waiting for the entire image to upload.
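For the frequency-decomposition case, one possible sketch is a single-level Haar transform applied along each image row, with the coarse (low-frequency) component sent before the detail (high-frequency) component. The choice of the Haar transform, the row-only decomposition, the even row length, and the coarse-first priority are all assumptions; the abstract does not name a specific transform.

```python
from typing import List, Tuple

Rows = List[List[float]]


def haar_row(row: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar level on a row of even length: pairwise averages and half-differences."""
    approx = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return approx, detail


def decompose(image: Rows) -> List[Tuple[str, Rows]]:
    """Return frequency components in priority order: coarse first, then detail."""
    approx_rows: Rows = []
    detail_rows: Rows = []
    for row in image:
        approx, detail = haar_row(row)
        approx_rows.append(approx)
        detail_rows.append(detail)
    return [("approx", approx_rows), ("detail", detail_rows)]


def reconstruct(approx_rows: Rows, detail_rows: Rows) -> Rows:
    """Invert the transform once both components have arrived."""
    image: Rows = []
    for approx, detail in zip(approx_rows, detail_rows):
        row: List[float] = []
        for a, d in zip(approx, detail):
            row.extend([a + d, a - d])
        image.append(row)
    return image
```

The coarse component alone is a half-width preview of the image, so a receiver can start working on it before the detail component is uploaded.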

    MACHINE LEARNING SOLUTION TO PREDICT PROTEIN CHARACTERISTICS

    Publication No.: US20240055100A1

    Publication date: 2024-02-15

    Application No.: US18146123

    Filing date: 2022-12-23

    CPC classification number: G16H20/60

    Abstract: This disclosure provides a machine learning technique to predict a protein characteristic. A first training set is created that includes, for multiple proteins, a target feature, protein sequences, and other information about the proteins. A first machine learning model is trained and then used to identify which of the features are relevant, as determined by feature importance or causal relationships to the target feature. A second training set is created with only the relevant features. Embeddings generated from the protein sequences are also added to the second training set. The second training set is used to train a second machine learning model. The first and second machine learning models may be any type of regressor. Once trained, the second machine learning model is used to predict a value for the target feature for an uncharacterized protein. The model of this disclosure provides 91% accuracy in predicting an ideal digestibility score.
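The two-stage training flow in this abstract can be sketched in pure Python under heavy simplifications, all of which are assumptions made here: both models are plain linear regressors fitted by gradient descent, feature importance is taken as the absolute weight (which presumes features on a common scale), the first model sees only the tabular features, and `embed` is a toy stand-in for a learned sequence embedding.

```python
from typing import Dict, List, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def fit_linear(X: List[List[float]], y: List[float],
               lr: float = 0.5, epochs: int = 500) -> Model:
    """Least-squares linear regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model: Model, x: List[float]) -> float:
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def embed(sequence: str) -> List[float]:
    """Toy embedding (hypothetical): fraction of alanine residues in the sequence."""
    return [sequence.count("A") / len(sequence)]


def train_two_stage(proteins: List[Dict], threshold: float = 0.5) -> Tuple[List[int], Model]:
    y = [p["target"] for p in proteins]
    # Stage 1: fit on all tabular features and keep those with a large weight.
    first = fit_linear([p["features"] for p in proteins], y)
    relevant = [j for j, wj in enumerate(first[0]) if abs(wj) > threshold]
    # Stage 2: refit on the relevant features plus the sequence embedding.
    X2 = [[p["features"][j] for j in relevant] + embed(p["sequence"]) for p in proteins]
    return relevant, fit_linear(X2, y)


def predict_protein(relevant: List[int], model: Model,
                    features: List[float], sequence: str) -> float:
    return predict(model, [features[j] for j in relevant] + embed(sequence))
```

The learning rate and epoch count are tuned for small inputs scaled to the range 0 to 1; a real pipeline would more likely use an established regression library and a proper importance or causal analysis.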

    POLLUTANT SENSOR PLACEMENT

    Publication No.: US20230169222A1

    Publication date: 2023-06-01

    Application No.: US17726206

    Filing date: 2022-04-21

    CPC classification number: G06F30/13

    Abstract: A method for pollutant sensor placement for pollutants from point sources is described. Data about environmental characteristics for a geographic region are received from a plurality of environmental sensors. The geographic region includes pollutant sources that emit a pollutant. The received data from one or more of the plurality of environmental sensors are transformed into common data having a common spatial and temporal discretization across the geographic region. Predicted emission plumes are generated for the pollutant sources within the geographic region that identify pollutant detection regions for the pollutant when the pollutant is emitted by the pollutant sources using the common data. Sensor locations for a plurality of pollutant sensors are greedily selected across the common spatial and temporal discretization according to a number of predicted emission plumes that are detectable by the plurality of pollutant sensors.
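The greedy selection step resembles the classic greedy heuristic for maximum coverage, sketched below. The input format is an assumption: `detects` maps each candidate sensor location (a cell of the common discretization) to the set of predicted plumes detectable from it, and ties are broken alphabetically by location name.

```python
from typing import Dict, List, Set


def greedy_sensor_placement(detects: Dict[str, Set[str]], num_sensors: int) -> List[str]:
    """Repeatedly pick the location that detects the most not-yet-covered plumes."""
    remaining = dict(detects)
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(num_sensors):
        if not remaining:
            break
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda loc: len(remaining[loc] - covered))
        if not remaining[best] - covered:
            break  # no remaining location detects any new plume
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen
```

For example, with locations A, B, C and D detecting plumes {p1, p2, p3}, {p3, p4}, {p1, p2} and {p5} respectively, two sensors would go to A and then B: after A is chosen, C adds nothing new, while B and D each add one plume and B wins the tie.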
