Abstract:
A method for clustering multi-dimensional data streams includes: (a) when data elements are input, determining 1-D subclusters and assigning identifiers to the determined 1-D subclusters; (b) generating a matching set, which is the set of identifiers of the 1-D subclusters whose ranges contain the corresponding dimensional values of a data element; and (c) determining subclusters by finding sets of frequently co-occurring 1-D subclusters among the 1-D subclusters that belong to the generated matching set. The present invention reduces the processing time required to find the subclusters and further improves memory efficiency.
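Steps (a) to (c) can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions: each 1-D subcluster is a fixed-width interval (the abstract does not say how the ranges are determined), an identifier is a hypothetical `(dimension, interval index)` pair, and co-occurrence is counted exhaustively with a relative support threshold. The names `bin_id`, `matching_set`, `frequent_cooccurrences`, `width`, and `min_support` are all illustrative.

```python
from collections import Counter
from itertools import combinations


def bin_id(dim, value, width=1.0):
    """Step (a), simplified: the identifier of the 1-D subcluster holding `value`.

    Assumption: every 1-D subcluster is a fixed-width interval on its dimension.
    """
    return (dim, int(value // width))


def matching_set(element, width=1.0):
    """Step (b): identifiers of the 1-D subclusters that contain each coordinate."""
    return frozenset(bin_id(d, v, width) for d, v in enumerate(element))


def frequent_cooccurrences(stream, min_support, width=1.0):
    """Step (c), simplified: sets of 1-D subclusters that co-occur frequently.

    Counts every combination of two or more identifiers in each matching set,
    which is exponential in the number of dimensions and only suits small examples.
    """
    counts = Counter()
    n = 0
    for element in stream:
        n += 1
        ids = sorted(matching_set(element, width))
        for size in range(2, len(ids) + 1):
            for combo in combinations(ids, size):
                counts[combo] += 1
    # Keep combinations whose relative frequency reaches the threshold.
    return {combo: c for combo, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    points = [(0.2, 5.1), (0.7, 5.9), (0.4, 5.5), (3.3, 9.0)]
    print(frequent_cooccurrences(points, min_support=0.5))
```

In this toy stream, three of the four points fall into interval 0 on dimension 0 and interval 5 on dimension 1, so that pair of 1-D subclusters is reported as one two-dimensional subcluster.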
Abstract:
A method and apparatus for finding maximal frequent itemsets over data streams. A prefix tree manages itemsets and their appearance frequencies, and each node of the prefix tree holds information about an appearance frequency, a maximum lifetime, and a mark indicating whether the corresponding itemset is a maximal frequent itemset. The method includes: receiving a transaction Tk generated at the current point in time; updating the information held by each node of the prefix tree that corresponds to an itemset of the transaction Tk; adding to the prefix tree each node that corresponds to an itemset of the transaction Tk but is not yet managed in the prefix tree, and setting the information on the added nodes; and finding maximal frequent itemsets by visiting each node of the prefix tree that carries the maximal-frequent-itemset mark and checking whether the corresponding itemset is frequent.
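The flow described above (update existing nodes, add missing ones, then query the marked nodes) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented method: every subset of a transaction is inserted, marks are recomputed by a full scan after each transaction, and the maximum lifetime is left out because the abstract does not define how it is computed. `Node`, `PrefixTree`, `min_support`, and the method names are hypothetical.

```python
from itertools import combinations


class Node:
    """One itemset in the prefix tree (the maximum lifetime is not modeled here)."""

    def __init__(self):
        self.children = {}       # item -> child Node
        self.count = 0           # appearance frequency of the itemset
        self.is_maximal = False  # mark: maximal frequent itemset


class PrefixTree:
    def __init__(self, min_support):
        self.root = Node()
        self.min_support = min_support  # relative support threshold (assumed)
        self.n_transactions = 0

    def add_transaction(self, transaction):
        """Update the nodes for every itemset of the transaction, adding missing ones."""
        self.n_transactions += 1
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                node = self.root
                for item in itemset:
                    node = node.children.setdefault(item, Node())
                node.count += 1
        self._refresh_marks()

    def _is_frequent(self, node):
        return node.count / self.n_transactions >= self.min_support

    def _walk(self, node=None, prefix=()):
        """Yield (itemset, node) for every node below `node`."""
        if node is None:
            node = self.root
        for item, child in node.children.items():
            itemset = prefix + (item,)
            yield itemset, child
            yield from self._walk(child, itemset)

    def _refresh_marks(self):
        # Naive full scan: an itemset is maximal if it is frequent and no
        # frequent itemset is a proper superset of it.
        nodes = list(self._walk())
        frequent = [set(i) for i, n in nodes if self._is_frequent(n)]
        for itemset, node in nodes:
            s = set(itemset)
            node.is_maximal = self._is_frequent(node) and not any(s < f for f in frequent)

    def maximal_frequent(self):
        """Visit the marked nodes and confirm that each itemset is still frequent."""
        return sorted(i for i, n in self._walk() if n.is_maximal and self._is_frequent(n))


if __name__ == "__main__":
    tree = PrefixTree(min_support=0.5)
    for t in (["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c"]):
        tree.add_transaction(t)
    print(tree.maximal_frequent())
```

Enumerating all subsets and rescanning the tree after every transaction keeps the sketch short but scales poorly. A practical stream miner would prune infrequent nodes and update marks incrementally.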