HAMSA: Highly Accelerated Multiple Sequence Aligner

For biologists, an efficient tool for multiple sequence alignment is essential. This work presents HAMSA, a new parallel aligner designed for highly accelerated alignment of multiple protein and DNA/RNA sequences on multi-core cluster systems. The design of HAMSA combines our recently proposed optimized algorithms for vectorization, partitioning, and scheduling. It operates mainly on a distance vector instead of a distance matrix, and it performs the similarity computations and generates the guide tree in a highly accelerated and accurate manner. HAMSA achieves a 21.9-fold speedup over MSAProbs and an 11-fold speedup over ClustalW-MPI. It can serve as an essential tool for structure prediction, protein classification, motif finding, and drug design studies.
Keywords: Bioinformatics; Multiple sequence alignment; Parallel programming; Clusters; Multi-cores


I. INTRODUCTION
Although diverse methods for aligning multiple sequences have been designed, performing the vast computations of alignment in a highly accelerated and accurate manner is still a challenge [1]. Most multiple sequence alignment (MSA) tools use the progressive method because it is computationally efficient.
First, it calculates a distance matrix that captures the divergence of each pair of sequences through a similarity score. Second, it applies a clustering method to build a guide tree from the pairwise sequence distances. Third, it builds the final multiple alignment following the order given by the guide tree. Other MSA tools follow the iterative method, which makes an initial alignment of groups of sequences and then revises it to obtain a more reasonable result.
The main contributions of this work are: 1) designing HAMSA, a highly accelerated parallel tool for rapidly aligning massive numbers of sequences on multi-core clusters; 2) implementing HAMSA in C++ with MPI and OpenMP on the Bibliotheca Alexandrina cluster system to test its quality; and 3) carrying out comprehensive tests on real datasets of various sizes to show that HAMSA outperforms competing existing tools.
The rest of this paper is organized as follows. Section 2 briefly summarizes the fundamental tools used for aligning multiple sequences. Section 3 explains the proposed HAMSA tool. Section 4 presents the results and comparisons, together with their analysis. Finally, Section 5 concludes the paper and suggests future work.

II. RELATED WORK
In the last decade, various parallel MSA programs have been developed to reduce execution time and handle big data. They differ in the parallel platform they use and in the way they optimize computations and storage.
ClustalW is the most commonly used tool. Different versions of ClustalW have been developed for the shared-memory SGI Origin machine, multiprocessors, clusters, and GPUs [2]. Although ClustalW is the most highly cited aligner, especially for large numbers of sequences, its accuracy is not satisfactory compared to T-Coffee and MSAProbs, and it has problems with long sequences.
T-Coffee has two parallel versions, one for clusters and one for clouds [3]. MUSCLE was first parallelized on an SMP system; multiscale simulations for an HPC cluster and the Amazon AWS cloud were presented later [4]. Both T-Coffee and MUSCLE achieve high accuracy and fast speed, but they cannot handle large datasets.
MAFFT has been parallelized using the POSIX Threads library for multi-core PCs [5]. MAFFT is very fast; nevertheless, its accuracy is poor.
ParaAT is used to construct multiple protein-coding DNA alignments for a large number of homologs on multi-core machines [6]. It is a very good tool for large-scale data analysis; however, its speed is unsatisfactory.
MSAProbs was optimized for modern multi-core CPUs by employing a multi-threaded design [7]. While MSAProbs is the best tool for producing highly accurate alignments, it is very slow.
The parallel version of DIALIGN-TX was implemented using both OpenMP and MPI on a 28-core heterogeneous cluster [8]. Its speed and accuracy are moderate.
GPU-REMuSiC [9] was proposed to reduce the computation time of RE-MuSiC, the newest MSA tool with regular expression constraints. It has good speed, but it cannot align long sequences.
www.ijacsa.thesai.org

III. HAMSA PROPOSED TOOL
The main goal of HAMSA is to provide biologists with an accelerated multiple sequence aligner with minimal space consumption. It consists of three stages; the architecture showing the interaction between them is presented in Fig. (1). To illustrate how HAMSA works through its three stages, the alignment of a class of 35 HIV viruses is introduced as a case study.
The distance vector DV, which holds the similarity scores between every pair of input sequences and is computed by Stage (1), is shown in Fig. (2). The phylogenetic guide tree constructed by Stage (2) is presented in Fig. (3), and the final alignment produced by Stage (3) is given in Fig. (4). Each stage is discussed in detail below.

A. Distance vector computation
Stage (1), the distance vector computation, takes as input a set of N sequences with average length L and produces a distance vector DV containing all pairwise distances. Each DV cell contains the similarity score for one pair of sequences, computed by the DistVect1 algorithm proposed in [10].
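Storing only the strict upper triangle is what lets a distance vector replace the N × N matrix: N(N-1)/2 cells instead of N^2. The C++ sketch below shows one possible layout. The row-major pair ordering and the names `dvSize`, `dvIndex`, and `buildDistanceVector` are illustrative assumptions, not DistVect1's actual data layout, and `score` is a placeholder for whatever pairwise kernel is used.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of cells needed to store every unordered pair (i, j), i < j,
// of n sequences in a flat vector: n(n-1)/2 instead of n*n.
std::size_t dvSize(std::size_t n) { return n < 2 ? 0 : n * (n - 1) / 2; }

// Flat index of pair (i, j) in row-major order of the strict upper triangle:
// (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
std::size_t dvIndex(std::size_t i, std::size_t j, std::size_t n) {
    assert(i != j && i < n && j < n);
    if (i > j) std::swap(i, j);  // distances are symmetric
    // Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) = i*n - i*(i+1)/2 cells.
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// Fill DV by calling score(i, j) once per unordered pair.
// `score` is a hypothetical stand-in for the pairwise alignment kernel.
template <typename ScoreFn>
std::vector<double> buildDistanceVector(std::size_t n, ScoreFn score) {
    std::vector<double> dv(dvSize(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            dv[dvIndex(i, j, n)] = score(i, j);
    return dv;
}
```

For the 35-sequence HIV case study, this layout needs 35 × 34 / 2 = 595 cells rather than the 1,225 cells of a full matrix.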
DistVect1 is a highly efficient parallel vectorized algorithm for computing similarity distances on multi-core clusters. It introduces an efficient partitioning and scheduling approach that consumes less space and accelerates the computations. DistVect1 is based mainly on the accelerated vectorization algorithm DistVect proposed in [11].
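DistVect1's own partitioning and scheduling scheme is not reproduced here. As a simpler point of comparison, the hypothetical C++ sketch below splits the flat pair indices into contiguous blocks, one per worker (for example, one per MPI rank), and maps each index back to its sequence pair. The names `blockRange`, `dvPair`, and `pairsForRank` are illustrative, and the row-major upper-triangle ordering is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pair = std::pair<std::size_t, std::size_t>;

// Half-open range [first, second) of flat pair indices owned by `rank`
// when totalPairs indices are split into `ranks` contiguous blocks.
Pair blockRange(std::size_t totalPairs, std::size_t ranks, std::size_t rank) {
    assert(ranks > 0 && rank < ranks);
    return {rank * totalPairs / ranks, (rank + 1) * totalPairs / ranks};
}

// Map flat index k back to the pair (i, j), i < j, assuming row-major order
// of the strict upper triangle: (0,1), (0,2), ..., (1,2), ...
Pair dvPair(std::size_t k, std::size_t n) {
    assert(n >= 2 && k < n * (n - 1) / 2);
    std::size_t i = 0, rowStart = 0;
    while (k >= rowStart + (n - 1 - i)) {  // skip whole rows
        rowStart += n - 1 - i;
        ++i;
    }
    return {i, i + 1 + (k - rowStart)};
}

// All sequence pairs that one worker would score under this static split.
std::vector<Pair> pairsForRank(std::size_t n, std::size_t ranks, std::size_t rank) {
    const Pair range = blockRange(n < 2 ? 0 : n * (n - 1) / 2, ranks, rank);
    std::vector<Pair> out;
    for (std::size_t k = range.first; k < range.second; ++k)
        out.push_back(dvPair(k, n));
    return out;
}
```

Equal pair counts do not imply equal work: scoring a pair costs roughly L_i × L_j cell updates, so with sequences of very different lengths a static block split like this one can be badly unbalanced. That imbalance is the kind of problem a more elaborate partitioning and scheduling approach is meant to address.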
DistVect solved a problem that real biological applications face when sequences are long: the memory requirement of a full scoring matrix cannot be met.
Instead of treating the distance calculation as a two-dimensional array (matrix), the DistVect algorithm replaces the matrix with only three vectors. One vector holds the anti-diagonal of scores currently being computed, and the other two store the two previously calculated anti-diagonals, which provide the needed Northern, Western, and North-Western values. This reduces the space complexity from O(L^2) to O(L).
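The three-vector idea can be sketched as a global (Needleman-Wunsch-style) alignment score computed one anti-diagonal at a time. This C++ sketch assumes a simple linear gap penalty and match/mismatch scores (the `Scoring` values are hypothetical, not DistVect's parameters) and returns only the score, not the alignment itself. Cell (i, j) on anti-diagonal d = i + j reads its North (i-1, j) and West (i, j-1) neighbors from anti-diagonal d-1 and its North-West neighbor (i-1, j-1) from anti-diagonal d-2.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scoring scheme with a linear gap penalty.
struct Scoring {
    int match = 1;
    int mismatch = -1;
    int gap = -2;
};

// Global alignment score of a and b using three anti-diagonal vectors.
int antiDiagonalScore(const std::string& a, const std::string& b, Scoring s = {}) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    // Each anti-diagonal is indexed by row i, so length m + 1 suffices:
    // O(L) space instead of the O(L^2) matrix.
    std::vector<int> prev2(m + 1), prev1(m + 1), cur(m + 1);
    for (int d = 0; d <= m + n; ++d) {
        const int iLo = std::max(0, d - n);
        const int iHi = std::min(m, d);
        for (int i = iLo; i <= iHi; ++i) {
            const int j = d - i;
            if (i == 0) { cur[i] = j * s.gap; continue; }  // first row
            if (j == 0) { cur[i] = i * s.gap; continue; }  // first column
            const int nw = prev2[i - 1] + (a[i - 1] == b[j - 1] ? s.match : s.mismatch);
            const int north = prev1[i - 1] + s.gap;  // cell (i-1, j)
            const int west = prev1[i] + s.gap;       // cell (i, j-1)
            cur[i] = std::max({nw, north, west});
        }
        // Rotate buffers: diagonal d-1 becomes d-2, diagonal d becomes d-1.
        std::swap(prev2, prev1);
        std::swap(prev1, cur);
    }
    // prev1 now holds anti-diagonal m + n, whose only cell is (m, n).
    return prev1[m];
}
```

All cells on one anti-diagonal depend only on the two previous anti-diagonals, so the inner loop carries no dependency between iterations. That independence is what makes anti-diagonal order attractive for SIMD vectorization or threading, although this sketch runs it serially.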
DistVect1 also showed superior performance with very long sequences. For example, for aligning 200 sequences of length 30,488, ClustalW-MPI failed to run and SSE2 [12] took 22,949 sec., whereas DistVect1 finished in only 8,017 sec. This is due to its efficient vectorization and hybrid partitioning approaches.
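Some rough arithmetic shows why the O(L^2) matrix becomes a problem at these lengths. Assuming 4-byte integer cells and a full (L+1) × (L+1) table per pair (an assumption about a naive layout, not a measured figure), one pair of length 30,488 needs about 3.7 GB, so eight cores each holding such a table would far exceed the 8 GB of RAM per node used in the experiments, while three anti-diagonal vectors need well under a megabyte. The reported run times correspond to roughly a 2.86-fold speedup of DistVect1 over SSE2.

```latex
% Naive full matrix for one pair, L = 30{,}488, assuming 4-byte cells
(L+1)^2 \times 4\,\mathrm{B} = 30{,}489^2 \times 4\,\mathrm{B} = 3{,}718{,}316{,}484\,\mathrm{B} \approx 3.7\,\mathrm{GB}
% Three anti-diagonal vectors of L+1 cells each
3\,(L+1) \times 4\,\mathrm{B} = 365{,}868\,\mathrm{B} \approx 366\,\mathrm{KB}
% Ratio of the reported run times
\frac{T_{\mathrm{SSE2}}}{T_{\mathrm{DistVect1}}} = \frac{22{,}949\ \mathrm{s}}{8{,}017\ \mathrm{s}} \approx 2.86
```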

B. Guide tree construction
Stage (2), the guide tree construction, takes the computed distance vector DV and builds a guide tree. It uses the optimized NJ phylogeny reconstruction algorithm (NJVect) presented in [13]. NJVect is a massively parallel optimized algorithm that replaces the matrices used in NJ [14] with vectors.
It achieves remarkable reductions in both time and space by eliminating redundant computations and breaking dependencies, while preserving accuracy. It outperforms ClustalW-MPI with a 2.5-fold speedup.
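For reference, one step of the textbook neighbor-joining method can be computed directly from a flat distance vector without materializing a matrix. The C++ sketch below is that standard step (row sums, the Q criterion, branch lengths, and distances to the new node), not NJVect itself: it performs none of NJVect's redundancy elimination or parallelization. The names `tri`, `Join`, and `njStep` are hypothetical, and the flat upper-triangle layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Flat index of pair (i, j) in a row-major strict upper triangle of n taxa.
static std::size_t tri(std::size_t i, std::size_t j, std::size_t n) {
    if (i > j) std::swap(i, j);
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

struct Join {
    std::size_t i, j;           // taxa merged in this step
    double branchI, branchJ;    // branch lengths from i and j to the new node
    std::vector<double> toNew;  // distance from the new node to each taxon (0 for i, j)
};

// One neighbor-joining step over a flat distance vector dv of n taxa.
Join njStep(const std::vector<double>& dv, std::size_t n) {
    assert(n >= 3 && dv.size() == n * (n - 1) / 2);
    auto d = [&](std::size_t a, std::size_t b) { return a == b ? 0.0 : dv[tri(a, b, n)]; };
    std::vector<double> r(n, 0.0);  // row sums r_k = sum over m of d(k, m)
    for (std::size_t a = 0; a < n; ++a)
        for (std::size_t b = a + 1; b < n; ++b) {
            r[a] += d(a, b);
            r[b] += d(a, b);
        }
    Join best{0, 1, 0.0, 0.0, {}};
    double bestQ = std::numeric_limits<double>::infinity();
    for (std::size_t a = 0; a < n; ++a)
        for (std::size_t b = a + 1; b < n; ++b) {
            // Q(a, b) = (n - 2) d(a, b) - r_a - r_b; join the pair minimizing Q.
            const double q = double(n - 2) * d(a, b) - r[a] - r[b];
            if (q < bestQ) { bestQ = q; best.i = a; best.j = b; }
        }
    const double dij = d(best.i, best.j);
    best.branchI = 0.5 * dij + (r[best.i] - r[best.j]) / (2.0 * double(n - 2));
    best.branchJ = dij - best.branchI;
    best.toNew.assign(n, 0.0);
    for (std::size_t k = 0; k < n; ++k)
        if (k != best.i && k != best.j)
            best.toNew[k] = 0.5 * (d(best.i, k) + d(best.j, k) - dij);
    return best;
}
```

On a small five-taxon example with distances 5, 9, 9, 8, 10, 10, 9, 8, 7, 3 in flat upper-triangle order, this step joins taxa 0 and 1 with branch lengths 2 and 3. A complete NJ run repeats the step, each time replacing the two joined taxa by the new node, until three nodes remain.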

C. Progressive alignment
Stage (3), the progressive alignment, uses the method provided by ClustalW-MPI. Its main objective is to distribute the external nodes (n) of the guide tree so that they are aligned in parallel. The efficiency therefore depends on the topology of the tree: for a well-balanced guide tree, the ideal speedup is estimated as n/log n, where n is the number of nodes in the tree.
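The dependence on tree topology can be made concrete with a simplified model: each internal node of the guide tree is one profile alignment that can start once both of its children are done, so the number of alignment steps divided by the tree's depth bounds the achievable speedup. The C++ sketch below computes that bound using hypothetical `Node`, `balanced`, and `caterpillar` helpers; it ignores communication and the unequal cost of individual alignments. For a balanced tree over n leaves the bound is (n-1)/ceil(log2 n), in line with the n/log n estimate, while a fully unbalanced (caterpillar) tree gives no speedup at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Guide-tree node: leaves have no children; an internal node stands for
// the profile alignment of its two children (hypothetical representation).
struct Node {
    int left = -1;
    int right = -1;
};

// Number of dependent alignment levels on the longest root-to-leaf path.
std::size_t alignmentDepth(const std::vector<Node>& t, int v) {
    if (t[v].left < 0) return 0;  // leaf: nothing to align
    return 1 + std::max(alignmentDepth(t, t[v].left), alignmentDepth(t, t[v].right));
}

// Total alignment steps divided by critical-path length.
double idealSpeedup(const std::vector<Node>& t, int root) {
    std::size_t internal = 0;
    for (const Node& x : t)
        if (x.left >= 0) ++internal;
    const std::size_t depth = alignmentDepth(t, root);
    assert(depth > 0);
    return double(internal) / double(depth);
}

// Fully unbalanced tree: ((((0,1),2),3),...). Returns the root index.
int caterpillar(std::vector<Node>& t, std::size_t leaves) {
    t.assign(leaves, Node{});
    int acc = 0;
    for (std::size_t k = 1; k < leaves; ++k) {
        t.push_back(Node{acc, int(k)});
        acc = int(t.size()) - 1;
    }
    return acc;
}

// Balanced tree built by pairing nodes level by level. Returns the root index.
int balanced(std::vector<Node>& t, std::size_t leaves) {
    t.assign(leaves, Node{});
    std::vector<int> level;
    for (std::size_t k = 0; k < leaves; ++k) level.push_back(int(k));
    while (level.size() > 1) {
        std::vector<int> next;
        for (std::size_t k = 0; k + 1 < level.size(); k += 2) {
            t.push_back(Node{level[k], level[k + 1]});
            next.push_back(int(t.size()) - 1);
        }
        if (level.size() % 2) next.push_back(level.back());  // odd node carried up
        level = next;
    }
    return level[0];
}
```

With 8 leaves, the balanced tree has 7 alignment steps over depth 3 (bound 7/3), whereas the caterpillar tree has depth 7 (bound 1).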

IV. EXPERIMENTAL RESULTS
HAMSA was implemented in C++ using the MPI and OpenMP libraries. It performs the vast computations of alignment in a highly accelerated and accurate manner. The experimental tests were conducted on a Sun Microsystems cluster of 32 nodes, provided by the LinkSCEEM-2 systems at Bibliotheca Alexandrina, Egypt. Each node contains two Intel quad-core Xeon 2.83 GHz processors (64-bit), 8 GB of RAM, an 80 GB hard disk, and a dual-port InfiniBand link (10 Gbps).
The experiments were conducted using four real protein sequence datasets. These sequences have lengths ranging from 400 to 163,000 DNA residues, which made it possible to study the overall performance of the solution across many different sizes.
The datasets consist of sequences selected from NCBI [15]: subsets of the Human Immunodeficiency Virus (HIV), the Coronaviridae family of viruses (COR), the Hemagglutinin of Influenza B virus (HA), Herpesviridae, a large family of DNA viruses (HRV), and Plasmid sequences from Enterobacteriaceae, a large family of bacteria (ENA). The performance of HAMSA was evaluated using several metrics: storage, execution time, speedup, efficiency, GCUPS, and occupancy. Its performance was compared to ClustalW-MPI 0.13 [16], SSE2 [17], and MSAProbs 3.0 [18]. HAMSA handles memory well while computing distances between very long sequences, of up to 163k residues. Fig. (5) shows that HAMSA consumed less execution time than the other tools when aligning varying numbers of sequences (N) with different lengths (L). Fig. (5) also shows that HAMSA achieves its maximum speedup with respect to ClustalW-MPI and MSAProbs when aligning the set of 50 sequences (length 161,000) and the set of 100 sequences (length 65,188), respectively.
Furthermore, HAMSA achieves up to 13.835 GCUPS, while the highest values for ClustalW-MPI and MSAProbs were 1.17 and 0.47 GCUPS, respectively, as shown in Table I.
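The throughput and scaling metrics above have conventional definitions: speedup S = T_ref / T_new, efficiency E = S / p for p processors, and GCUPS = cells / (time × 10^9). The C++ sketch below implements these textbook formulas. The paper's exact normalizations (for instance, which cell count and which p are used) are not stated in this section, so the helper names and the cell-count assumption (a full L_i × L_j table for every pair i < j) should be read as hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DP cells for scoring all pairs i < j with full L_i x L_j tables (assumption).
// Uses sum_{i<j} L_i L_j = ((sum L)^2 - sum L^2) / 2 to avoid an O(N^2) loop.
double pairwiseCells(const std::vector<double>& lengths) {
    double sum = 0.0, sumSq = 0.0;
    for (double len : lengths) {
        sum += len;
        sumSq += len * len;
    }
    return 0.5 * (sum * sum - sumSq);
}

// Giga cell updates per second.
double gcups(double cells, double seconds) {
    assert(seconds > 0.0);
    return cells / (seconds * 1e9);
}

// Speedup of a new run over a reference run.
double speedup(double tRef, double tNew) {
    assert(tNew > 0.0);
    return tRef / tNew;
}

// Parallel efficiency: speedup per processor.
double efficiency(double s, double processors) {
    assert(processors > 0.0);
    return s / processors;
}
```

For instance, the DistVect1 and SSE2 timings reported in Section III (8,017 s and 22,949 s) give `speedup(22949.0, 8017.0)` of about 2.86.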

V. CONCLUSION AND FUTURE WORK
In this paper, HAMSA has been proposed for efficiently aligning multiple sequences on a multi-core cluster system. HAMSA applies several optimization methods that take memory usage and load balancing into account.
It provides powerful storage handling capabilities while substantially improving the overall processing time. HAMSA is beneficial for relating molecular structure to the underlying sequences, and it can operate on local or online databases.
Experimental results show that HAMSA is an accelerated, competitive MSA tool. HAMSA achieves a speedup of 21.9 compared to MSAProbs and a speedup of 11 compared to ClustalW-MPI. Its efficiency reaches 0.29, 0.086, and 0.092 relative to ClustalW-MPI, SSE2, and MSAProbs, respectively. Its performance ranges from 6.27 GCUPS to 13.835 GCUPS as the lengths of the query sequences increase from 1,750 to 30,500; it also accomplishes 100

TABLE I: HAMSA performance comparison in GCUPS

TABLE II: Efficiency of HAMSA comparisons using 32 nodes