Abstract:
Provided are a high-performance distributed storage apparatus and method. The method includes: receiving file data and storing it in chunk units; outputting the file data chunks stored in an input buffer and transmitting them to data servers in parallel; generating an additional file storage requester and connecting it to a new data server, based on the data input speed of the input buffer and the data output speed at which data is output to the data servers; re-setting the file data chunk output sequence for the plurality of file storage requesters, including the new file storage requester; and applying the result of the re-setting when outputting and transmitting the file data chunks stored in the input buffer to the data servers in parallel.
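The scale-out step can be illustrated with a minimal sketch. The abstract does not state the exact rule, so this assumes a new requester is added whenever the buffer fills faster than it drains, and that the re-set output sequence is a simple round-robin over the requesters. The names `scale_out` and `output_sequence`, and the server labels, are hypothetical.

```python
def scale_out(servers, spare_servers, input_rate, output_rate):
    """Return the server list after an optional scale-out step.

    Assumption: one new requester (one per data server) is added when the
    input buffer receives data faster than it is drained.
    """
    if input_rate > output_rate and spare_servers:
        return servers + [spare_servers[0]]
    return list(servers)


def output_sequence(chunk_ids, servers):
    """Re-set the chunk output sequence as a round-robin over the servers.

    Assumption: chunk i is sent by the requester attached to
    servers[i % len(servers)].
    """
    plan = {server: [] for server in servers}
    for position, chunk_id in enumerate(chunk_ids):
        plan[servers[position % len(servers)]].append(chunk_id)
    return plan


servers = scale_out(["ds1", "ds2"], ["ds3"], input_rate=300.0, output_rate=200.0)
plan = output_sequence(list(range(6)), servers)
print(servers)  # ['ds1', 'ds2', 'ds3']
print(plan)     # {'ds1': [0, 3], 'ds2': [1, 4], 'ds3': [2, 5]}
```

Recomputing the whole sequence after each scale-out keeps the sketch short; a real apparatus would presumably re-sequence only the chunks still waiting in the input buffer.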
Abstract:
Provided is an apparatus for allocating resources of a distributed data processing system by considering a virtualization platform, the apparatus including: a resource usage monitor configured to scan one or more available virtual machines that execute one or more selected tasks in one or more physical machines, and to calculate a distance between the one or more scanned available virtual machines based on physical machine information received from the one or more physical machines; and a task allocator configured to allocate the one or more selected tasks to one or more virtual machines selected from among the one or more scanned available virtual machines based on the calculated distance between the one or more scanned available virtual machines.
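A distance-based selection of virtual machines can be sketched as follows. The abstract does not define the distance metric, so this assumes a three-level one: 0 for virtual machines on the same physical machine, 1 for the same rack, and 2 otherwise. The `placement` table and the function names are hypothetical.

```python
from itertools import combinations


def vm_distance(vm_a, vm_b, placement):
    """Assumed metric: 0 = same physical machine, 1 = same rack, 2 = otherwise."""
    host_a, rack_a = placement[vm_a]
    host_b, rack_b = placement[vm_b]
    if host_a == host_b:
        return 0
    if rack_a == rack_b:
        return 1
    return 2


def pick_closest_vms(available_vms, count, placement):
    """Choose `count` VMs whose total pairwise distance is smallest.

    Exhaustive search over all groups; acceptable only for small inputs.
    """
    def total_distance(group):
        return sum(vm_distance(a, b, placement) for a, b in combinations(group, 2))

    return min(combinations(available_vms, count), key=total_distance)


# Hypothetical physical machine information: VM -> (physical machine, rack).
placement = {
    "vm1": ("pm1", "rack1"),
    "vm2": ("pm1", "rack1"),
    "vm3": ("pm2", "rack1"),
    "vm4": ("pm3", "rack2"),
}
print(pick_closest_vms(["vm1", "vm2", "vm3", "vm4"], 3, placement))
# ('vm1', 'vm2', 'vm3')
```

Tasks would then be allocated to the returned group, so that tasks exchanging data run on virtual machines that are physically close to each other.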
Abstract:
Provided herein is a system for providing a virtual block device, which includes a logical block device configured to transmit and receive data to and from an external computer and to have logical block IDs, and a physical block device configured to include a cache area and a storage area for storing at least a part of the data and to have physical block IDs, wherein the cache area comprises a first memory cluster having first memory IDs, the storage area comprises a second memory cluster having second memory IDs and a third memory cluster having third memory IDs, and the physical block IDs include the first to third memory IDs. The system can simultaneously serve both applications that require large capacity and applications that require short access times through a single virtual block device system that incorporates various storage devices.
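One way to picture a physical block ID space that includes the first to third memory IDs is sketched below. It assumes the three clusters occupy consecutive ranges of the physical ID space, which the abstract does not specify. The class name, cluster names, and sizes are hypothetical.

```python
from bisect import bisect_right


class PhysicalBlockDevice:
    """Physical block IDs laid out as consecutive per-cluster ranges (assumed)."""

    def __init__(self, clusters):
        # clusters: ordered list of (cluster name, number of blocks)
        self.names = [name for name, _ in clusters]
        self.starts = []
        total = 0
        for _, block_count in clusters:
            self.starts.append(total)
            total += block_count
        self.total = total

    def resolve(self, physical_id):
        """Map a physical block ID to (cluster name, memory ID in that cluster)."""
        if not 0 <= physical_id < self.total:
            raise IndexError(f"physical block ID {physical_id} out of range")
        index = bisect_right(self.starts, physical_id) - 1
        return self.names[index], physical_id - self.starts[index]


device = PhysicalBlockDevice([
    ("cache_cluster_1", 4),     # cache area: first memory cluster
    ("storage_cluster_2", 8),   # storage area: second memory cluster
    ("storage_cluster_3", 16),  # storage area: third memory cluster
])
print(device.resolve(3))   # ('cache_cluster_1', 3)
print(device.resolve(4))   # ('storage_cluster_2', 0)
print(device.resolve(27))  # ('storage_cluster_3', 15)
```

A logical block device would sit on top of this layer and translate each logical block ID into one of these physical block IDs; that mapping is left out of the sketch.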
Abstract:
A cluster-based workflow system is provided that executes a workflow created by a non-IT researcher in a way suited to the computing resources of a cluster environment. Using a large-scale computing cluster, the user can quickly run analysis workflows built on third-party applications, such as a large-scale bio data analysis workflow, a weather forecast data analysis workflow, or a customer relationship management (CRM) data analysis workflow. In addition, third-party applications not optimized for a cluster environment can be automatically distributed and executed in parallel, based on preliminary analysis, so that they run properly in the cluster environment.
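The idea of running an unmodified application in parallel can be illustrated with a small sketch. The abstract does not describe the preliminary analysis itself, so this assumes its outcome is simply that the input records are independent and can be split. Local worker threads stand in for cluster nodes, and `count_gc` is a hypothetical stand-in for a third-party application.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Split records into at most `parts` contiguous partitions."""
    size = max(1, -(-len(records) // parts))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_distributed(app, records, workers):
    """Run the unmodified `app` on each partition in parallel and merge results.

    Assumption: `app` takes a list of records and returns one result per
    record, and records do not depend on each other.
    """
    partitions = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(app, partitions))  # keeps input order
    return [item for part in partial_results for item in part]


def count_gc(sequences):
    """Stand-in application: count G and C bases in each sequence."""
    return [seq.count("G") + seq.count("C") for seq in sequences]


print(run_distributed(count_gc, ["GATTACA", "GGCC", "ATAT", "CGCG", "A"], workers=2))
# [2, 4, 0, 4, 0]
```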