Invention Application
US20140236990A1 MAPPING SURPRISAL DATA THROUGTH HADOOP TYPE DISTRIBUTED FILE SYSTEMS
Under examination - Published
- Patent Title: MAPPING SURPRISAL DATA THROUGTH HADOOP TYPE DISTRIBUTED FILE SYSTEMS
- Patent Title (Chinese): 映射数据类型HADOOP类型分布式文件系统
- Application No.: US13770025
- Application Date: 2013-02-19
- Publication No.: US20140236990A1
- Publication Date: 2014-08-21
- Inventor: Tom Deutsch, Robert R. Friedlander, James R. Kraemer, Josko Silobrcic
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Main IPC: G06F19/26
- IPC: G06F19/26

Abstract:
A method, system, and computer program product for reducing the amount of data representing a genetic sequence of an organism using a Hadoop type distributed file system. The method includes the steps of: breaking a surprisal data filter and an uncompressed genetic sequence into blocks of data of a fixed size; distributing the blocks of data to a plurality of worker nodes within clusters and replicating the blocks of data within each of the worker nodes; tasking the plurality of worker nodes to perform a map job comprising mapping the surprisal data filter relative to the uncompressed genetic sequence; and, when a worker node has reported completion of the map job, tasking that worker node with a reduce job based on a specific key to produce an output of surprisal data and associated metadata.
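The map and reduce steps in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation and does not use Hadoop itself. It assumes, for illustration only, that the surprisal data filter is a reference sequence of the same length as the uncompressed genetic sequence, and that surprisal data are the positions where the two differ. The names `BLOCK_SIZE`, `split_blocks`, `map_block`, `reduce_by_key`, and `run`, and the single key `"surprisal"`, are hypothetical.

```python
from collections import defaultdict

BLOCK_SIZE = 8  # hypothetical fixed block size; real HDFS blocks are far larger


def split_blocks(seq, size=BLOCK_SIZE):
    """Break a sequence into (offset, block) pairs of a fixed size."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq), size)]


def map_block(offset, filter_block, seq_block, key="surprisal"):
    """Map job for one block: emit (key, (position, filter_base, observed_base))
    for every position where the sequence differs from the filter."""
    for j, (expected, observed) in enumerate(zip(filter_block, seq_block)):
        if expected != observed:
            yield key, (offset + j, expected, observed)


def reduce_by_key(pairs):
    """Reduce job: group the mapped values by key and sort them by position."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values) for key, values in grouped.items()}


def run(filter_seq, genome):
    """Simulate the pipeline in one process: split, map each block, then reduce.
    Assumes both inputs have the same length (an assumption of this sketch)."""
    pairs = []
    blocks = zip(split_blocks(filter_seq), split_blocks(genome))
    for (offset, filter_block), (_, seq_block) in blocks:
        pairs.extend(map_block(offset, filter_block, seq_block))
    return reduce_by_key(pairs)


if __name__ == "__main__":
    print(run("ACGTACGTACGT", "ACGTTCGTACGA"))
```

In a real cluster each call to `map_block` would run on a separate worker node against replicated blocks, and the reduce job would be assigned once a worker reports that its map job is complete. Here the loop in `run` stands in for that scheduling.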