MAPPING SURPRISAL DATA THROUGTH HADOOP TYPE DISTRIBUTED FILE SYSTEMS

Invention Application

US20140236990A1 MAPPING SURPRISAL DATA THROUGTH HADOOP TYPE DISTRIBUTED FILE SYSTEMS (under examination, published)


Patent Title: MAPPING SURPRISAL DATA THROUGTH HADOOP TYPE DISTRIBUTED FILE SYSTEMS
Patent Title (Chinese, literal back-translation): Mapping Data Type HADOOP Type Distributed File System
Application No.: US13770025

Application Date: 2013-02-19
Publication No.: US20140236990A1

Publication Date: 2014-08-21
Inventor: Tom Deutsch, Robert R. Friedlander, James R. Kraemer, Josko Silobrcic
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Applicant Address: US NY Armonk
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee Address: US NY Armonk
Main IPC: G06F19/26
IPC: G06F19/26

Abstract:

A method, system, and computer program product for reducing the amount of data representing a genetic sequence of an organism using a Hadoop type distributed file system. The method includes the steps of: breaking a surprisal data filter and an uncompressed genetic sequence into blocks of data of a fixed size; distributing the blocks of data to a plurality of worker nodes within clusters and replicating the blocks of data within each of the worker nodes; tasking the plurality of worker nodes to perform a map job comprising mapping the surprisal data filter relative to the uncompressed genetic sequence; and, when a worker node has reported completion of the map job, tasking the worker node with a reduce job based on a specific key to produce an output of surprisal data and associated metadata.
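The split, map, and reduce steps named in the abstract can be illustrated with a small in-process sketch. This is not the patented implementation and does not use Hadoop itself. It assumes, for illustration only, that the surprisal data filter behaves like a reference sequence of the same length as the genetic sequence, and that "surprisal data" means the positions where the two differ. The block size, the key `sample-1`, the metadata fields, and all function names are hypothetical, and the replication step is omitted.

```python
from collections import defaultdict

BLOCK_SIZE = 8  # hypothetical fixed block size, chosen only for this example


def split_blocks(data, size=BLOCK_SIZE):
    """Break a string into (offset, block) pairs of a fixed size."""
    return [(i, data[i:i + size]) for i in range(0, len(data), size)]


def map_block(offset, filter_block, sequence_block, key="sample-1"):
    """Map job (assumed semantics): emit (key, (position, base)) wherever
    the sequence block differs from the filter block."""
    return [
        (key, (offset + i, base))
        for i, (expected, base) in enumerate(zip(filter_block, sequence_block))
        if expected != base
    ]


def reduce_pairs(pairs):
    """Reduce job: group mapped pairs by key and attach simple metadata."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {
        key: {"surprisal": sorted(values), "metadata": {"count": len(values)}}
        for key, values in grouped.items()
    }


def run(filter_seq, sequence):
    """Split both inputs into blocks, map each block pair, then reduce."""
    pairs = []
    blocks = zip(split_blocks(filter_seq), split_blocks(sequence))
    for (offset, f_block), (_, s_block) in blocks:
        pairs.extend(map_block(offset, f_block, s_block))
    return reduce_pairs(pairs)


if __name__ == "__main__":
    print(run("ACGTACGTACGT", "ACGAACGTACCT"))
```

In a real cluster each call to `map_block` would run on a separate worker node against its local block, and the grouping in `reduce_pairs` would be handled by the framework's shuffle phase rather than by a dictionary in memory.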

