Abstract:
Embodiments are directed towards clustering cookies to identify unique mobile devices, so that activities over a network can be associated with a given mobile device. The cookies are clustered based on a Bayes Factor similarity model that is trained from cookie features of known mobile devices. The clusters may be used to determine the number of unique mobile devices that access a website. The clusters may also be used to provide targeted content to each unique mobile device.
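The clustering idea can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the feature names (`ip`, `ua`, `tz`), the naive per-feature match/mismatch likelihood model with Laplace smoothing, the threshold of 0 on the log Bayes factor, and the union-find linking of cookie pairs. The abstract does not specify how the Bayes Factor model is trained or how clusters are formed.

```python
import math
from itertools import combinations


def train_match_rates(labeled_cookies, features):
    """Estimate, per feature, P(match | same device) and P(match | different device).

    labeled_cookies is a list of (device_id, feature_dict) pairs from known
    devices. Counts start at [1 match, 2 total] (Laplace smoothing) so that
    no probability is exactly 0 or 1.
    """
    counts = {f: {"same": [1, 2], "diff": [1, 2]} for f in features}
    for (dev_a, fa), (dev_b, fb) in combinations(labeled_cookies, 2):
        key = "same" if dev_a == dev_b else "diff"
        for f in features:
            counts[f][key][0] += int(fa.get(f) == fb.get(f))
            counts[f][key][1] += 1
    return {
        f: (c["same"][0] / c["same"][1], c["diff"][0] / c["diff"][1])
        for f, c in counts.items()
    }


def log_bayes_factor(fa, fb, rates):
    """Log of P(evidence | same device) / P(evidence | different devices),
    treating features as independent (a simplifying assumption)."""
    total = 0.0
    for f, (p_same, p_diff) in rates.items():
        if fa.get(f) == fb.get(f):
            total += math.log(p_same / p_diff)
        else:
            total += math.log((1 - p_same) / (1 - p_diff))
    return total


def cluster_cookies(cookies, rates, threshold=0.0):
    """Link every cookie pair whose log Bayes factor exceeds the threshold
    and return the connected components as lists of cookie indices."""
    parent = list(range(len(cookies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cookies)), 2):
        if log_bayes_factor(cookies[i], cookies[j], rates) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(cookies)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Under these assumptions, the number of clusters returned by `cluster_cookies` serves as the estimate of unique devices, and each cluster identifies the cookies that would receive the same targeted content.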
Abstract:
Techniques are provided for improving the speed and accuracy of analytics on big data using theta sketches, by converting fixed-size sketches to theta sketches, and by performing set operations on sketches. In a technique for performing a set operation, two sketches are analyzed to identify the maximum value of each sketch. The maximum values of the two sketches are compared. Based on the comparison, one or more values are removed from the sketch whose maximum value is greater. After the removal, a set operation (e.g., union, intersection, or difference) is performed based on the modified sketch and the unmodified sketch. A result of the set operation is a third sketch, which may be used to estimate a cardinality of the larger data sets that are represented by the two input sketches.
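The set-operation steps described above can be sketched in Python as follows. This is an illustrative reading of the abstract, not the claimed implementation. It assumes each sketch is the set of the k smallest hash values of its data set, with hashes mapped to [0, 1); that both sketches are non-empty and built with the same hash function; and that the cardinality estimate is the simple ratio of retained values to the shared threshold. The helper names (`hash01`, `make_sketch`, `trim_to_smaller_max`, `set_operation`, `estimate`) are hypothetical.

```python
import hashlib


def hash01(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def make_sketch(items, k):
    """Build a fixed-size sketch: the k smallest distinct hash values."""
    return set(sorted({hash01(x) for x in items})[:k])


def trim_to_smaller_max(a, b):
    """Compare the maxima of two sketches and remove, from the sketch whose
    maximum is greater, every value above the smaller maximum."""
    max_a, max_b = max(a), max(b)
    theta = min(max_a, max_b)
    if max_a > max_b:
        a = {v for v in a if v <= theta}
    elif max_b > max_a:
        b = {v for v in b if v <= theta}
    return a, b, theta


def set_operation(a, b, op):
    """Run a set operation on the modified and unmodified sketches.
    Returns the third (result) sketch and its threshold."""
    a, b, theta = trim_to_smaller_max(a, b)
    if op == "union":
        result = a | b
    elif op == "intersection":
        result = a & b
    elif op == "difference":
        result = a - b
    else:
        raise ValueError(f"unknown set operation: {op}")
    return result, theta


def estimate(result, theta):
    """Rough cardinality estimate: retained values divided by the threshold."""
    return len(result) / theta


if __name__ == "__main__":
    a = make_sketch((f"user-{i}" for i in range(10_000)), k=1024)
    b = make_sketch((f"user-{i}" for i in range(5_000, 15_000)), k=1024)
    for op in ("union", "intersection", "difference"):
        result, theta = set_operation(a, b, op)
        print(op, round(estimate(result, theta)))
```

Trimming first matters because, after it, both sketches describe the same sampled region of hash space (all values at or below the shared threshold), so ordinary set operations on the retained values correspond to set operations on the underlying data sets.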
Abstract:
A system for resolving disputes in an online answers community is disclosed. The disclosed system improves the ability to resolve reports of abuse.