Invention Application
US20140236990A1 MAPPING SURPRISAL DATA THROUGH HADOOP TYPE DISTRIBUTED FILE SYSTEMS (Pending; published)

Abstract:
A method, system, and computer program product for reducing the amount of data representing a genetic sequence of an organism using a Hadoop type distributed file system. The method includes the steps of: breaking a surprisal data filter and an uncompressed genetic sequence into blocks of data of a fixed size; distributing the blocks of data to the plurality of worker nodes within the clusters and replicating the blocks of data within each of the worker nodes; tasking the plurality of worker nodes to perform a map job comprising mapping the surprisal data filter relative to the uncompressed genetic sequence; and, when a worker node has reported completion of the map job, tasking that worker node with a reduce job, based on a specific key, that produces an output of surprisal data and associated metadata.
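The steps in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch under several assumptions that the abstract does not confirm: the surprisal data filter is treated as an aligned reference sequence of the same length as the genetic sequence, "surprisal data" is taken to mean the positions where the sequence differs from that reference (substitutions only), blocks are replicated to distinct nodes round-robin in the style of HDFS, the key `"chr1"` is a hypothetical sequence label, and a single reduce runs after every map task has finished. All function names (`split_into_blocks`, `place_replicas`, `map_block`, `reduce_by_key`, `run_pipeline`) are hypothetical; none of this uses the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical surprisal record: (position, reference_base, observed_base)
Surprisal = Tuple[int, str, str]


def split_into_blocks(seq: str, block_size: int) -> List[Tuple[int, str]]:
    """Break a sequence into fixed-size blocks, keeping each block's start offset."""
    return [(i, seq[i:i + block_size]) for i in range(0, len(seq), block_size)]


def place_replicas(num_blocks: int, num_nodes: int, replication: int) -> Dict[int, List[int]]:
    """Assign each block to `replication` distinct nodes, round-robin (HDFS-style assumption)."""
    if replication > num_nodes:
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[int]] = defaultdict(list)
    for block_id in range(num_blocks):
        for r in range(replication):
            placement[(block_id + r) % num_nodes].append(block_id)
    return dict(placement)


def map_block(key: str, offset: int, block: str, ref_block: str) -> List[Tuple[str, Surprisal]]:
    """Map task: emit (key, record) for every base that differs from the filter block."""
    return [
        (key, (offset + i, ref_base, base))
        for i, (base, ref_base) in enumerate(zip(block, ref_block))
        if base != ref_base
    ]


def reduce_by_key(pairs: List[Tuple[str, Surprisal]], length: int) -> Dict[str, dict]:
    """Reduce task: group records by key and attach simple metadata."""
    grouped: Dict[str, List[Surprisal]] = defaultdict(list)
    for key, record in pairs:
        grouped[key].append(record)
    return {
        key: {"surprisal": sorted(records),
              "metadata": {"count": len(records), "length": length}}
        for key, records in grouped.items()
    }


def run_pipeline(key: str, seq: str, reference: str, block_size: int,
                 num_nodes: int = 3, replication: int = 2) -> Dict[str, dict]:
    """Split, place, map, and reduce; assumes seq and reference are aligned."""
    if len(seq) != len(reference):
        raise ValueError("sketch assumes an aligned reference of equal length")
    seq_blocks = split_into_blocks(seq, block_size)
    ref_blocks = split_into_blocks(reference, block_size)
    placement = place_replicas(len(seq_blocks), num_nodes, replication)
    # Each block is mapped once, by the first node (in node order) holding a replica.
    done = set()
    pairs: List[Tuple[str, Surprisal]] = []
    for node in sorted(placement):
        for block_id in placement[node]:
            if block_id in done:
                continue
            done.add(block_id)
            offset, block = seq_blocks[block_id]
            pairs.extend(map_block(key, offset, block, ref_blocks[block_id][1]))
    # Simplification: one reduce after all map tasks have reported completion.
    return reduce_by_key(pairs, len(seq))


if __name__ == "__main__":
    print(run_pipeline("chr1", "ACGTACGTAC", "ACGAACGTTC", block_size=4))
```

Only the two differing bases are kept in the output, which is the intended data reduction: a sequence that mostly matches its filter collapses to a short list of surprisal records plus metadata. In a real Hadoop job the map and reduce functions would run on separate worker nodes and the framework, not this loop, would schedule them.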