System and method for performing set operations with defined sketch accuracy distribution
    1.
    Granted patent
    System and method for performing set operations with defined sketch accuracy distribution (status: in force)

    Publication No.: US08819038B1

    Publication date: 2014-08-26

    Application No.: US14078301

    Filing date: 2013-11-12

    Applicant: Yahoo! Inc.

    Abstract: Techniques are provided for improving the speed and accuracy of analytics on big data using theta sketches, by converting fixed-size sketches to theta sketches, and by performing set operations on sketches. In a technique for performing a set operation, two sketches are analyzed to identify the maximum value of each sketch. The maximum values of the two sketches are compared. Based on the comparison, one or more values are removed from the sketch whose maximum value is greater. After the removal, a set operation (e.g., union, intersection, or difference) is performed based on the modified sketch and the unmodified sketch. A result of the set operation is a third sketch, which may be used to estimate a cardinality of the larger data sets that are represented by the two input sketches.
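The set-operation steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a sketch is simply the k smallest hash values of a data set, that the threshold `theta` is the maximum retained value, and that cardinality is estimated as the number of retained values below `theta` divided by `theta`. The names `Sketch`, `hash01`, `build_sketch`, `set_operation`, and `estimate` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Sketch:
    values: frozenset  # retained hash values, each between 0 and 1
    theta: float       # threshold: maximum retained value (1.0 if the sketch never filled)


def hash01(item) -> float:
    """Map an item to a roughly uniform value between 0 and 1 (hypothetical helper)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_sketch(items, k: int) -> Sketch:
    """Keep the k smallest distinct hash values; theta is the largest one kept."""
    hashes = sorted({hash01(x) for x in items})
    if len(hashes) <= k:
        return Sketch(frozenset(hashes), 1.0)  # everything retained, so counts are exact
    kept = hashes[:k]
    return Sketch(frozenset(kept), kept[-1])


def set_operation(a: Sketch, b: Sketch, op: str) -> Sketch:
    """Trim both sketches to the smaller threshold, then combine them."""
    theta = min(a.theta, b.theta)                  # compare the two maxima
    a_vals = {v for v in a.values if v <= theta}   # only the sketch with the greater
    b_vals = {v for v in b.values if v <= theta}   # maximum actually loses values
    if op == "union":
        vals = a_vals | b_vals
    elif op == "intersection":
        vals = a_vals & b_vals
    elif op == "difference":
        vals = a_vals - b_vals
    else:
        raise ValueError(f"unknown set operation: {op}")
    return Sketch(frozenset(vals), theta)          # the third (result) sketch


def estimate(s: Sketch) -> float:
    """Estimate cardinality from the retained values and the threshold."""
    if s.theta >= 1.0:
        return float(len(s.values))
    below = sum(1 for v in s.values if v < s.theta)
    return below / s.theta
```

For example, with `a = build_sketch(range(20000), 1024)` and `b = build_sketch(range(10000, 30000), 1024)`, `estimate(set_operation(a, b, "union"))` should land near 30000 and the intersection estimate near 10000, with the error shrinking as k grows.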

    SYSTEM AND METHOD FOR ENHANCED ACCURACY CARDINALITY ESTIMATION
    2.
    Patent application
    SYSTEM AND METHOD FOR ENHANCED ACCURACY CARDINALITY ESTIMATION (status: under examination, published)

    Publication No.: US20150269178A1

    Publication date: 2015-09-24

    Application No.: US14218818

    Filing date: 2014-03-18

    Applicant: Yahoo Inc.

    Inventor: Lee Rhodes

    CPC classification number: G06F17/3089

    Abstract: Techniques are provided for improving the accuracy of analytics on big data using sketches and fixed-size buckets. In a technique for enhancing a unique count (cardinality) estimate of a large data set, a request for a cardinality estimate for the large data set is received. An initial cardinality estimate is determined using a sketch or a fixed-size bucket. If the initial cardinality estimate is within a range where the initial estimate could be further enhanced, the initial estimate is used for a lookup into a lookup table. Based on retrieved values from the lookup table and the initial estimate, an enhanced cardinality estimate is calculated.

    System and method for performing set operations with defined sketch accuracy distribution

    Publication No.: US09043348B2

    Publication date: 2015-05-26

    Application No.: US14448487

    Filing date: 2014-07-31

    Applicant: Yahoo! Inc.

    Abstract: Techniques are provided for improving the speed and accuracy of analytics on big data using theta sketches, by converting fixed-size sketches to theta sketches, and by performing set operations on sketches. In a technique for performing a set operation, two sketches are analyzed to identify the maximum value of each sketch. The maximum values of the two sketches are compared. Based on the comparison, one or more values are removed from the sketch whose maximum value is greater. After the removal, a set operation (e.g., union, intersection, or difference) is performed based on the modified sketch and the unmodified sketch. A result of the set operation is a third sketch, which may be used to estimate a cardinality of the larger data sets that are represented by the two input sketches.

    SYSTEM AND METHOD FOR PERFORMING SET OPERATIONS WITH DEFINED SKETCH ACCURACY DISTRIBUTION

    Publication No.: US20150227608A1

    Publication date: 2015-08-13

    Application No.: US14692477

    Filing date: 2015-04-21

    Applicant: Yahoo! Inc.

    Abstract: Techniques are provided for improving the speed and accuracy of analytics on big data using theta sketches, by converting fixed-size sketches to theta sketches, and by performing set operations on sketches. In a technique for performing a set operation, two sketches are analyzed to identify the maximum value of each sketch. The maximum values of the two sketches are compared. Based on the comparison, one or more values are removed from the sketch whose maximum value is greater. After the removal, a set operation (e.g., union, intersection, or difference) is performed based on the modified sketch and the unmodified sketch. A result of the set operation is a third sketch, which may be used to estimate a cardinality of the larger data sets that are represented by the two input sketches.

    System and method for enhanced accuracy cardinality estimation

    Publication No.: US10055506B2

    Publication date: 2018-08-21

    Application No.: US14218818

    Filing date: 2014-03-18

    Applicant: Yahoo! Inc.

    Inventor: Lee Rhodes

    CPC classification number: G06F16/958

    Abstract: Techniques are provided for improving the accuracy of analytics on big data using sketches and fixed-size buckets. In a technique for enhancing a unique count (cardinality) estimate of a large data set, a request for a cardinality estimate for the large data set is received. An initial cardinality estimate is determined using a sketch or a fixed-size bucket. If the initial cardinality estimate is within a range where the initial estimate could be further enhanced, the initial estimate is used for a lookup into a lookup table. Based on retrieved values from the lookup table and the initial estimate, an enhanced cardinality estimate is calculated.

    System and method for performing set operations with defined sketch accuracy distribution
    6.
    Granted patent
    System and method for performing set operations with defined sketch accuracy distribution (status: in force)

    Publication No.: US09152691B2

    Publication date: 2015-10-06

    Application No.: US14692477

    Filing date: 2015-04-21

    Applicant: Yahoo! Inc.

    Abstract: Techniques are provided for improving the speed and accuracy of analytics on big data using theta sketches, by converting fixed-size sketches to theta sketches, and by performing set operations on sketches. In a technique for performing a set operation, two sketches are analyzed to identify the maximum value of each sketch. The maximum values of the two sketches are compared. Based on the comparison, one or more values are removed from the sketch whose maximum value is greater. After the removal, a set operation (e.g., union, intersection, or difference) is performed based on the modified sketch and the unmodified sketch. A result of the set operation is a third sketch, which may be used to estimate a cardinality of the larger data sets that are represented by the two input sketches.

    SYSTEM AND METHOD FOR PERFORMING SET OPERATIONS WITH DEFINED SKETCH ACCURACY DISTRIBUTION
    7.
    Patent application
    SYSTEM AND METHOD FOR PERFORMING SET OPERATIONS WITH DEFINED SKETCH ACCURACY DISTRIBUTION (status: in force)

    Publication No.: US20150100596A1

    Publication date: 2015-04-09

    Application No.: US14448487

    Filing date: 2014-07-31

    Applicant: Yahoo! Inc.

    Abstract: Techniques are provided for improving the speed and accuracy of analytics on big data using theta sketches, by converting fixed-size sketches to theta sketches, and by performing set operations on sketches. In a technique for performing a set operation, two sketches are analyzed to identify the maximum value of each sketch. The maximum values of the two sketches are compared. Based on the comparison, one or more values are removed from the sketch whose maximum value is greater. After the removal, a set operation (e.g., union, intersection, or difference) is performed based on the modified sketch and the unmodified sketch. A result of the set operation is a third sketch, which may be used to estimate a cardinality of the larger data sets that are represented by the two input sketches.
