-
Publication No.: US20240339112A1
Publication Date: 2024-10-10
Application No.: US18295973
Filing Date: 2023-04-05
Inventors: Ella Rabinovich, Matan Vetzler, Samuel Solomon Ackerman, Ateret Anaby-Tavor, Eitan Daniel Farchi, Orna Raz
CPC Classification: G10L15/1815, G06N5/022, G10L15/063, G10L2015/0631, G10L2015/0638
Abstract: Various systems and methods are presented regarding detecting data drift. The data of interest can be batches of utterances received at an interface (e.g., a chatbot). The batches of utterances can be compared with topics present in training data utilized to train a data classifier (e.g., an autoencoder), wherein topics identified in the batches of utterances that are not present in the training data can be considered to be novel topics. The greater the presence of novel topics in a batch of utterances, the greater the divergence of the batch of utterances from the content of the training data. The novel topics can be identified and subsequently applied to the training data such that the data classifier can be re-trained with the novel topics, thereby causing the data classifier to be contemporaneous with the novel topics. In an embodiment, the utterances can be short streams of text, symbols, and the like.
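As an illustration of the divergence measure described in this abstract, the sketch below scores a batch by the share of utterances whose topic is absent from the training data. It assumes each utterance has already been assigned a topic label by some upstream step; the function name `novel_topic_share` and the label strings are hypothetical, and the plain set lookup stands in for the data classifier the abstract mentions rather than reproducing it.

```python
def novel_topic_share(batch_topics, training_topics):
    """Return (share of utterances with a novel topic, sorted novel topics)."""
    known = set(training_topics)
    # A topic is "novel" when it never appears in the training data.
    novel = [topic for topic in batch_topics if topic not in known]
    share = len(novel) / len(batch_topics) if batch_topics else 0.0
    return share, sorted(set(novel))


# Hypothetical topic labels, one per utterance in the batch.
training = ["billing", "login", "shipping"]
batch = ["billing", "refund", "login", "refund"]
share, novel = novel_topic_share(batch, training)
print(share, novel)
```

A higher share indicates a batch that diverges more from the training data, and the returned topic list is what would be fed back for re-training.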
-
Publication No.: US20240202575A1
Publication Date: 2024-06-20
Application No.: US18069150
Filing Date: 2022-12-20
IPC Classification: G06N20/00
CPC Classification: G06N20/00
Abstract: A computer hardware system includes a slice generator and a policy generator and performs the following. The slice generator slices a first dataset including true values and predicted values of a class variable into a plurality of slices each defining a plurality of observations within the first dataset. A first one and another one of the plurality of slices are selected, and a union of observations is generated by adding observations within the selected another one to observations within the selected first one of the plurality of slices. Selecting another one of the plurality of slices and generating the union are repeated until a number of observations within the union reaches a predetermined value. Using the policy generator and after the number of observations within the union reaches the predetermined value, an error policy is generated. The predicted values were generated by a machine learning engine.
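The repeated select-and-union step in this abstract can be sketched as a simple loop. This is a minimal illustration under stated assumptions: slices are represented as sets of observation ids, they are tried in the order given (the abstract does not say how the next slice is chosen), and the name `grow_union` is hypothetical. Generating the error policy itself is not covered.

```python
def grow_union(slices, target_size):
    """Add slices one at a time until the union holds target_size observations.

    slices: list of (name, set of observation ids), in the order to try them.
    Returns the names of the slices used and the resulting union.
    """
    used, union = [], set()
    for name, observations in slices:
        union |= observations  # add this slice's observations to the union
        used.append(name)
        if len(union) >= target_size:
            break  # the predetermined number of observations is reached
    return used, union


# Hypothetical slices over observation ids 1..7.
slices = [
    ("age<30", {1, 2, 3}),
    ("region=EU", {3, 4}),
    ("plan=free", {5, 6, 7}),
]
used, union = grow_union(slices, target_size=4)
print(used, sorted(union))
```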
-
Publication No.: US20160004627A1
Publication Date: 2016-01-07
Application No.: US14324191
Filing Date: 2014-07-06
CPC Classification: G06F11/3692, G06F11/3616, G06F11/3684, G06F11/3688, G06N7/005, G06N20/00
Abstract: A method, apparatus and product for utilizing semantic clusters to predict software defects. The method comprises: obtaining a plurality of software elements that are associated with a version of a System Under Test (SUT), wherein the plurality of software elements comprise defective software elements which are associated with a defect in the version of the SUT; defining, by a processor, a plurality of clusters, wherein each cluster of the plurality of clusters comprises software elements having an attribute, wherein the attribute is associated with a functionality of the SUT; and determining a score of each cluster of the plurality of clusters, wherein the score of a cluster is based on a relation between a number of defective software elements in the cluster and a number of software elements in the cluster.
-
Publication No.: US11995068B1
Publication Date: 2024-05-28
Application No.: US18142584
Filing Date: 2023-05-03
Inventors: Yair Allouche, Aviad Cohen, Eitan Daniel Farchi
IPC Classification: G06F16/00, G06F16/22, G06F16/23, G06F18/2413, G06N20/00
CPC Classification: G06F16/2365, G06F16/2237, G06F18/24143, G06N20/00
Abstract: A method including: receiving a set of data representing usage by entities of objects in a computing resource; extracting, from the set of data, one or more feature vectors representing the usage by one of the entities with respect to the objects; generating, from the feature vectors, a feature matrix; with respect to each entry in the feature matrix: (i) assigning a binary value to the entry, based on a predefined usage threshold, (ii) identifying, among the one or more entities, k nearest neighbor entities with respect to the one of the entities, based on a predefined distance threshold, and (iii) modifying the usage value of the entry, based on usage values associated with each of the k nearest neighbor entities with respect to the one of the objects; and updating the feature matrix with the modified usage values, to obtain a manipulated feature matrix.
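Steps (i) to (iii) of this abstract can be sketched as follows. The abstract does not name the distance metric or the rule for modifying an entry, so this sketch assumes Hamming distance between binarized rows and a majority vote among the neighbors, with ties keeping the entry's own value. The names `binarize`, `hamming`, and `manipulate` are hypothetical.

```python
def binarize(matrix, usage_threshold):
    """Step (i): 1 if usage meets the threshold, else 0."""
    return [[1 if v >= usage_threshold else 0 for v in row] for row in matrix]


def hamming(a, b):
    """Number of positions where two binary rows differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def manipulate(matrix, usage_threshold, k, max_distance):
    """Steps (ii) and (iii): update each entry from its k nearest entities."""
    binary = binarize(matrix, usage_threshold)
    result = []
    for i, row in enumerate(binary):
        # Step (ii): up to k nearest other entities within the distance threshold.
        candidates = sorted(
            (hamming(row, other), j)
            for j, other in enumerate(binary)
            if j != i and hamming(row, other) <= max_distance
        )
        neighbors = [j for _, j in candidates[:k]]
        new_row = []
        for col, value in enumerate(row):
            # Step (iii): majority vote of the neighbors (assumed rule).
            votes = sum(binary[j][col] for j in neighbors)
            if 2 * votes > len(neighbors):
                value = 1
            elif 2 * votes < len(neighbors):
                value = 0
            new_row.append(value)  # ties and "no neighbors" keep the own value
        result.append(new_row)
    return result


# Rows are entities, columns are objects, values are hypothetical usage counts.
usage = [[5, 0, 3], [4, 0, 0], [6, 1, 2], [0, 9, 0]]
print(manipulate(usage, usage_threshold=2, k=2, max_distance=1))
```

In this example the second entity picks up usage of the third object because both of its nearest neighbors use it, while the fourth entity has no neighbor within the distance threshold and is left unchanged.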
-
Publication No.: US20230342356A1
Publication Date: 2023-10-26
Application No.: US17726795
Filing Date: 2022-04-22
Inventors: Sweta Singh, Vaibhav Murlidhar Kulkarni, MARIO Dominic Savio BRIGGS, Deepak Anil Mahajan, Eitan Daniel Farchi
IPC Classification: G06F16/2453
CPC Classification: G06F16/24542, G06F16/24539
Abstract: Embodiments are for generating a digital signature of a query execution plan using similarity hashing. A technique includes generating a node digital signature for nodes in a query and generating an edge digital signature for edges in the query, the edges connecting the nodes. The technique includes selecting at least one previously executed query based on the node digital signature and the edge digital signature for the query and causing the query to be processed according to an assignment associated with the at least one previously executed query.
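One way to picture the node and edge signatures in this abstract is with MinHash, a common similarity-hashing scheme. The abstract does not say which scheme is used, so MinHash is a stand-in here, and the plan encoding (operator labels for nodes, `src->dst` strings for edges), the names `minhash`, `plan_signature`, and `pick_assignment`, and the assignment labels are all hypothetical.

```python
import hashlib

NUM_HASHES = 64  # signature length; an arbitrary choice for this sketch


def _hash(seed, token):
    """Seeded 64-bit hash of a token, derived from SHA-256."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(tokens):
    """MinHash signature: the minimum seeded hash over the token set, per seed."""
    tokens = set(tokens)
    if not tokens:
        return (0,) * NUM_HASHES
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES))


def similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES


def plan_signature(nodes, edges):
    """Return (node signature, edge signature) for a query plan."""
    return minhash(nodes), minhash(f"{src}->{dst}" for src, dst in edges)


def pick_assignment(query_plan, history):
    """Return the assignment of the most similar previously executed plan."""
    q_nodes, q_edges = plan_signature(*query_plan)

    def score(entry):
        plan, _assignment = entry
        h_nodes, h_edges = plan_signature(*plan)
        return similarity(q_nodes, h_nodes) + similarity(q_edges, h_edges)

    return max(history, key=score)[1]


join_plan = (
    ["scan:orders", "scan:customers", "join:hash"],
    [("scan:orders", "join:hash"), ("scan:customers", "join:hash")],
)
sort_plan = (["scan:items", "sort"], [("scan:items", "sort")])
history = [(join_plan, "large-pool"), (sort_plan, "small-pool")]
print(pick_assignment(join_plan, history))
```

Summing the node and edge similarities is one simple way to combine the two signatures; a weighted combination would work equally well in this sketch.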
-
Publication No.: US11768758B2
Publication Date: 2023-09-26
Application No.: US17495200
Filing Date: 2021-10-06
CPC Classification: G06F11/3676, G06F11/3688, G06N5/01, G06N20/00
Abstract: Methods, systems, and computer program products for path-coverage directed black box application programming interface (API) testing are provided herein. A computer-implemented method includes determining constraints based on inputs and corresponding outputs of an API in a production environment; generating initial test inputs based at least in part on the constraints; creating a program dependency graph based on trace sequences and request-response data obtained in response to providing the initial test inputs to an endpoint of the API; enhancing the program dependency graph by generating additional test inputs directed to one or more paths of the dependency graph; identifying, based on the enhanced program dependency graph, at least a portion of the API that is not covered by an existing test suite; and using the enhanced program dependency graph to generate new test cases for the test suite based on the identifying.
-
Publication No.: US20230205847A1
Publication Date: 2023-06-29
Application No.: US17561951
Filing Date: 2021-12-26
IPC Classification: G06K9/62
CPC Classification: G06K9/6219, G06K9/6261
Abstract: Systems and methods are disclosed for automatically identifying, in a dataset, insufficient data for learning or records with anomalous combinations of feature values, by partitioning the numeric and/or categorical data space into human-interpretable regions. The method comprises: receiving a dataset of numeric and/or categorical features with a plurality of observations; calculating an observation density for each observation according to a distance-based or anomaly-based metric, yielding a density measurement; partitioning the dataset along the numeric and/or categorical features according to the density measurement of each observation by perpendicular cuts along the feature spaces, yielding a map of a plurality of hyper-rectangular shapes representing various levels of density, including empty spaces; and displaying the map as human-interpretable regions on a graphical user interface (GUI), wherein the hyper-rectangular shapes are selectable and present information about the level of density of the selected shape when selected by a user.
-
Publication No.: US20230102152A1
Publication Date: 2023-03-30
Application No.: US17484104
Filing Date: 2021-09-24
IPC Classification: G06Q10/06, G06F16/215, G06K9/62, G06F11/34, G06N20/00
Abstract: A system, program product, and method for automatic detection of data drift in a data set are presented. The method includes determining changes to relations in the data set through generating baseline and production data sets. The method further includes generating a production data set with some inserted data distortion, and defining, for a plurality of features in the baseline data set, potential relations for participant features. The method also includes determining a first likelihood and a second likelihood of each potential relation in the baseline and production data sets, respectively, for the participant features. The method further includes comparing each first likelihood with each second likelihood, generating a comparison value that is compared with a threshold value, and determining, subject to the comparison value exceeding the threshold value, that the potential relation in the baseline data set does not describe a relation in the production data set.
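The likelihood comparison in this abstract can be illustrated with a small sketch. The abstract does not define how a likelihood is computed, so this sketch assumes it is the fraction of rows that satisfy a candidate relation and that the comparison value is the absolute difference between the two fractions. The names `relation_likelihood` and `relation_drifted`, the example relation, and the threshold are hypothetical.

```python
def relation_likelihood(rows, relation):
    """Fraction of rows for which the candidate relation holds (assumed measure)."""
    return sum(1 for row in rows if relation(row)) / len(rows)


def relation_drifted(baseline, production, relation, threshold):
    """True if the relation's likelihood changed by more than the threshold."""
    first = relation_likelihood(baseline, relation)
    second = relation_likelihood(production, relation)
    comparison_value = abs(first - second)
    return comparison_value > threshold


# Candidate relation between participant features: total == price * qty.
def total_matches(row):
    return row["total"] == row["price"] * row["qty"]


baseline = [
    {"price": 2, "qty": 3, "total": 6},
    {"price": 5, "qty": 1, "total": 5},
    {"price": 4, "qty": 2, "total": 8},
    {"price": 1, "qty": 7, "total": 7},
]
production = [
    {"price": 2, "qty": 3, "total": 6},
    {"price": 5, "qty": 1, "total": 9},   # distorted row
    {"price": 4, "qty": 2, "total": 8},
    {"price": 1, "qty": 7, "total": 0},   # distorted row
]
print(relation_drifted(baseline, production, total_matches, threshold=0.2))
```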
-
Publication No.: US20210012221A1
Publication Date: 2021-01-14
Application No.: US16508698
Filing Date: 2019-07-11
Abstract: A method, computer system, and a computer program product for assessing a likelihood of success associated with developing at least one machine learning (ML) solution is provided. The present invention may include generating a set of questions based on a set of raw training data. The present invention may also include computing a feasibility score based on an answer corresponding with each question from the generated set of questions. The present invention may then include, in response to determining that the computed feasibility score satisfies a threshold, computing a level of effort associated with developing the at least one ML solution to address a problem. The present invention may further include presenting, to a user, a plurality of results associated with assessing the likelihood of success of the at least one ML solution.
-
Publication No.: US20160292069A1
Publication Date: 2016-10-06
Application No.: US15186560
Filing Date: 2016-06-20
CPC Classification: G06F11/3692, G06F11/3616, G06F11/3684, G06F11/3688, G06N7/005, G06N20/00
Abstract: A method, apparatus and product for utilizing semantic clusters to predict software defects. The method comprises: obtaining a plurality of software elements that are associated with a version of a System Under Test (SUT), wherein the plurality of software elements comprise defective software elements which are associated with a defect in the version of the SUT; defining, by a processor, a plurality of clusters, wherein each cluster of the plurality of clusters comprises software elements having an attribute, wherein the attribute is associated with a functionality of the SUT; and determining a score of each cluster of the plurality of clusters, wherein the score of a cluster is based on a relation between a number of defective software elements in the cluster and a number of software elements in the cluster.
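The cluster score in this abstract relates the number of defective elements in a cluster to the cluster's size. A minimal sketch, assuming the relation is a plain ratio and that each software element carries a single attribute naming its cluster; the function name `cluster_scores` and the file and attribute names are hypothetical.

```python
from collections import defaultdict


def cluster_scores(elements):
    """Score each cluster as defective elements divided by all elements.

    elements: iterable of (name, attribute, is_defective) tuples, where the
    attribute identifies the functionality-based cluster of the element.
    """
    totals = defaultdict(int)
    defects = defaultdict(int)
    for _name, attribute, is_defective in elements:
        totals[attribute] += 1
        if is_defective:
            defects[attribute] += 1
    return {attr: defects[attr] / totals[attr] for attr in totals}


# Hypothetical software elements from one version of the system under test.
elements = [
    ("login.c", "auth", True),
    ("token.c", "auth", False),
    ("invoice.c", "billing", False),
    ("tax.c", "billing", False),
]
print(cluster_scores(elements))
```

Clusters with a higher ratio would be the ones flagged as more defect-prone under this assumed scoring.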