Extending Shared-Memory Computations to Multiple Distributed Nodes

Waseem Ahmed

doi:10.14569/IJACSA.2020.0110882

Author 1: Waseem Ahmed

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11, Issue 8, 2020.

Abstract: With the emergence of accelerators such as GPUs, MICs and FPGAs, the availability of domain-specific libraries (such as MKL) and the ease of parallelization offered by CUDA- and OpenMP-based shared-memory programming, node-based parallelization has recently become a popular choice among developers in scientific computing. This is evident from the large volume of recently published work across various domains of scientific computing in which shared-memory programming and accelerators have been used to speed up applications. Although these approaches are suitable for small problem sizes, several issues must be addressed before they can be applied to larger input domains. Firstly, the primary focus of these works has been to accelerate the core kernel; acceleration of input/output operations is seldom considered. Many operations in scientific computing work on large matrices, both sparse and dense, that are read from and written to external files. These input/output operations become bottlenecks and significantly affect the overall application time. Secondly, node-based parallelization prevents a developer from distributing the computation beyond a single node without having to learn an additional programming paradigm such as MPI. Thirdly, the problem size that can be handled effectively by a node is limited by the memory of the node and its accelerator. In this paper, an Asynchronous Multi-node Execution (AMNE) approach is presented that uses a unique combination of the shared file system and pseudo-replication to extend node-based algorithms to a distributed multi-node implementation with minimal changes to the original node-based code. We demonstrate this approach by applying it to GEMM, a popular kernel in dense linear algebra, and show that the presented methodology significantly advances the state of the art in the field of parallelization and scientific computing.

Keywords: GPU; OpenMP; shared memory programming; distributed programming; CUDA

Waseem Ahmed, “Extending Shared-Memory Computations to Multiple Distributed Nodes,” International Journal of Advanced Computer Science and Applications (IJACSA), 11(8), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0110882

@article{Ahmed2020,
title = {Extending Shared-Memory Computations to Multiple Distributed Nodes},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2020.0110882},
url = {http://dx.doi.org/10.14569/IJACSA.2020.0110882},
year = {2020},
publisher = {The Science and Information Organization},
volume = {11},
number = {8},
author = {Waseem Ahmed}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
