-
Publication No.: US20240202214A1
Publication Date: 2024-06-20
Application No.: US18067770
Filing Date: 2022-12-19
Inventor(s): Rajesh Bordawekar
IPC Classification: G06F16/28, G06F16/242
CPC Classification: G06F16/285, G06F16/2433
Abstract: Clustering data points of a relational database having special data types is performed by establishing logarithmic bins in which the data is collected. Special data types include (i) zero; (ii) positive and negative values; (iii) infinity (positive and negative); (iv) not-a-number values (NaNs); (v) out-of-range values; and (vi) IEEE DECFloat (decimal floating-point) values. The numerical data is mapped to bins according to its values and redistributed among the bins based on the median bin value. An occupancy-based partitioning process ensures that each bin holds no more than a pre-defined threshold percentage of the data. Assigning data bins to clusters facilitates prediction of the placement of input values into a particular cluster for response to database queries.
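The binning step in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the bin labels, the base-10 logarithm, and the exponent limits `lo_exp` and `hi_exp` are assumptions chosen for illustration, and the median-based redistribution and occupancy-based partitioning are left out.

```python
import math
from collections import defaultdict

def bin_key(x, lo_exp=-10, hi_exp=10):
    """Map a float to a bin label.

    The labels and the exponent range are illustrative assumptions,
    not taken from the patent.
    """
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "+inf" if x > 0 else "-inf"
    if x == 0:
        return "zero"
    sign = "+" if x > 0 else "-"
    # Logarithmic bin: one bin per decade of magnitude.
    exp = math.floor(math.log10(abs(x)))
    if exp < lo_exp or exp > hi_exp:
        return sign + "out-of-range"
    return f"{sign}1e{exp}"

def build_bins(values):
    """Collect values into bins keyed by bin_key."""
    bins = defaultdict(list)
    for v in values:
        bins[bin_key(v)].append(v)
    return dict(bins)
```

Positive and negative values get separate bins here, so a value and its negation never share a bin; whether the patent keeps them apart in the same way is not stated in the abstract.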
-
Publication No.: US20240045866A1
Publication Date: 2024-02-08
Application No.: US17817428
Filing Date: 2022-08-04
Inventor(s): Rajesh Bordawekar, Prabhakar Kudva
IPC Classification: G06F16/2453, G06F16/22, G06F16/248
CPC Classification: G06F16/24549, G06F16/2255, G06F16/248
Abstract: Systems, computer-implemented methods, or computer program products facilitate receiving results of a semantic structured query language (SQL) query and employing sparse hash-table-based sketches to interpret the semantic SQL query result. A computing component stores a first space-efficient data structure sketch in a compressed serialized form. The computing component can load a second space-efficient data structure sketch along with the first space-efficient data structure sketch and can compute one or more interpretability scores by extracting co-occurrence information from the first space-efficient data structure sketch. The second space-efficient data structure sketch can include a sketch for containment check.
-
Publication No.: US11244224B2
Publication Date: 2022-02-08
Application No.: US15926109
Filing Date: 2018-03-20
Inventor(s): Rajesh Bordawekar, Tin Kam Ho
Abstract: A first observation window in a first time series is identified. The first observation window is preceded by a first portion of the first time series. A neural network is trained using the first portion of the first time series and the first observation window, and weights are extracted from the middle layers of the neural network. A first feature vector is generated based on the weights. A second observation window in a second time series is identified, where the second observation window is preceded by a first portion of the second time series. A second feature vector associated with the second observation window is determined. The second feature vector is based at least in part on the first set of weights. A similarity between the first and second observation windows is determined based on comparing the first feature vector and the second feature vector.
-
Publication No.: US20210124724A1
Publication Date: 2021-04-29
Application No.: US16665364
Filing Date: 2019-10-28
Inventor(s): Rajesh Bordawekar
IPC Classification: G06F16/22, G06F16/242, G06F16/28
Abstract: A computer-implemented method according to one embodiment includes identifying a relational database; determining columns of interest within the relational database; creating an unordered group of string tokens for each row of the relational database, utilizing the determined columns of interest; assigning weights for one or more columns within the relational database to one or more string tokens within each unordered group of string tokens to create a plurality of weighted unordered groups of string tokens; and determining a meaning vector for an identifier of each row of the relational database, utilizing the plurality of weighted unordered groups of string tokens.
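As a rough illustration of the tokenization and weighting steps in this abstract, the sketch below turns one row into a weighted, unordered group of string tokens. The function name `row_tokens`, the `column_value` token format, and the default weight of 1.0 are assumptions made for the example; the abstract does not specify them, and the final step of deriving a meaning vector is omitted.

```python
def row_tokens(row, columns, weights=None):
    """Build a weighted, unordered group of string tokens for one row.

    row     : dict mapping column name to value
    columns : the columns of interest
    weights : optional dict mapping column name to a weight
              (hypothetical; columns without an entry get 1.0)

    A dict of token -> weight stands in for the unordered group.
    """
    weights = weights or {}
    group = {}
    for col in columns:
        value = row.get(col)
        if value is None:
            continue  # skip missing values in this sketch
        token = f"{col}_{value}"  # assumed token format
        group[token] = weights.get(col, 1.0)
    return group

# Example: only "city" and "age" are columns of interest.
example = row_tokens(
    {"id": 7, "city": "Paris", "age": 30},
    columns=["city", "age"],
    weights={"city": 2.0},
)
```

Prefixing each token with its column name keeps equal values from different columns distinct, which is one plausible design choice rather than something the abstract requires.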
-
Publication No.: US20200159853A1
Publication Date: 2020-05-21
Application No.: US16197137
Filing Date: 2018-11-20
Inventor(s): Rajesh Bordawekar, Tin Kam Ho
Abstract: From a first attribute-value pair in a record, new data comprising a first token is created. From each token using a processor and a memory, new data including a corresponding vector is computed. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value requiring correction. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. From values corresponding to the target attribute in the set of most similar rows, a replacement value is determined. The value requiring correction in the target row is replaced with the replacement value.
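The selection of a replacement value from the most similar rows might look roughly like the following sketch. Several choices here are assumptions not found in the abstract: cosine similarity as the similarity measure, a majority vote to pick the replacement, the default threshold of 0.8, and the names `cosine` and `replacement_value`. Row vectors are taken as given rather than computed from tokens.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def replacement_value(target_vec, candidates, threshold=0.8):
    """Pick a replacement for the target attribute.

    candidates: list of (row_vector, value_of_target_attribute) pairs.
    Rows that lack the target attribute (value None) or whose similarity
    is not above the threshold are ignored. The majority vote is an
    assumption; the abstract only says a replacement value is determined.
    """
    similar = [
        value
        for vec, value in candidates
        if value is not None and cosine(target_vec, vec) > threshold
    ]
    if not similar:
        return None
    return Counter(similar).most_common(1)[0][0]
```

For numeric attributes, a mean or median over the similar rows would be an equally reasonable stand-in for the vote.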
-
Publication No.: US10217053B2
Publication Date: 2019-02-26
Application No.: US14747062
Filing Date: 2015-06-23
Inventor(s): Rajesh Bordawekar, Ashish Kundu, Oded Shmueli
Abstract: Disclosed is a system, computer program product, and method for provisioning a new service request. The computer-implemented method begins with receiving a new service request for computational resources in a computing system. The required computational resources are memory usage, storage usage, processor usage, or a combination thereof to fulfill the new service request. Next, a sandbox computing environment is used to operate the new service request. The sandbox computing environment is used to isolate the computing system. The sandbox computing environment produces current computational resources usage data to fulfill the new service request in the sandbox computing environment. The current sandbox computational resources usage data and historical computational resources usage data are both used by a machine learning module to create a prediction of the computational resources that will be required in the computing system to fulfill the new service request.
-
Publication No.: US09892149B2
Publication Date: 2018-02-13
Application No.: US14750363
Filing Date: 2015-06-25
Inventor(s): Rajesh Bordawekar, Daniel Brand, Minsik Cho, Ulrich Finkler, Ruchir Puri
IPC Classification: G06F17/30
CPC Classification: G06F17/30345, G06F17/30324, G06F17/30445, G06F17/30598
Abstract: Methods for sorting a data set. A data storage is divided into a plurality of buckets that are each associated with a respective key value. A plurality of stripes is identified in each bucket. At least one data stripe set is defined that has one stripe within each respective bucket. An in-place partial bucket radix sort is performed on data items contained within one data stripe set with a first processor using an initial radix. Incorrectly sorted data items are then grouped in each bucket into a respective incorrect data item group within each bucket. A radix sort is then performed using the initial radix on the items within the respective incorrect data item group. A first level sorted output is produced.
-
Publication No.: US20220269686A1
Publication Date: 2022-08-25
Application No.: US17184303
Filing Date: 2021-02-24
Inventor(s): Rajesh Bordawekar, Apoorva Nitsure
IPC Classification: G06F16/2457, G06F16/2455, G06N5/04, G06F16/2453, G06F11/34
Abstract: Systems, computer-implemented methods and/or computer program products to facilitate interpretation of a result of execution of a query over a structured database are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a determination component that determines a result of execution of a query over a structured database. The computer executable components also can comprise an interpretation component that interprets data underlying the result of execution of the query to determine one or more reasons that the result is provided in response to the query.
-
Publication No.: US20220180253A1
Publication Date: 2022-06-09
Application No.: US17114644
Filing Date: 2020-12-08
Inventor(s): Rajesh Bordawekar, Tin Kam Ho
Abstract: Data-parallel ensemble training using gradient boosted trees includes training an ensemble of trees. The training includes splitting a training dataset into several data portions. Each data portion is assigned to a respective thread group from a set of thread groups. The training further includes executing a stage, in which each thread group, in parallel, trains a respective ensemble of decision trees. Executing the stage includes performing, by each thread group, in parallel, machine learning operations for the respective ensemble of decision trees using the data portion assigned to that thread group. Further, each thread group validates, in parallel, the respective ensemble of decision trees using a data portion assigned to another thread group. Execution of the stage is repeated until a predetermined threshold is satisfied. Further, a prediction is inferred using the ensemble of decision trees that is formed from the respective ensembles of trees of the thread groups.
-
Publication No.: US20210294794A1
Publication Date: 2021-09-23
Application No.: US16825509
Filing Date: 2020-03-20
Inventor(s): Rajesh Bordawekar, Tin Kam Ho
Abstract: Structured and semi-structured databases and files are processed using natural language processing techniques to impute data for null value tokens in database records from other records that have non-null values for the same attributes. Vector embedding techniques are used, including, in some cases, appropriately tagging null value tokens to reduce or eliminate their undue impact on semantic vectors generated using a neural network.