Abstract: The continuous growth of data, particularly medical data in laboratories, has made it too complex to use and manage with traditional methods. Researchers have therefore begun studying genetic information, a field that has grown over the past thirty years within the bioinformatics domain (spanning computer science, genetic biology, and DNA). This growing body of data has become known as big bioinformatics data. Efficient algorithms such as Genetic Algorithms are thus needed to handle the vast amount of bioinformatics data in genetic laboratories. The researchers propose two models for managing big bioinformatics data, in addition to the traditional model. The first model applies Genetic Algorithms before MapReduce, the second model applies Genetic Algorithms after MapReduce, and the traditional model applies MapReduce alone, without Genetic Algorithms. The three models were implemented and evaluated using big bioinformatics data collected on the Duchenne Muscular Dystrophy (DMD) disorder. The researchers conclude that the second model is the best of the three in reducing data size, in execution time, and in its ability to manage and summarize big bioinformatics data. Finally, the percentage errors of the second model, the first model, and the traditional model were 1.136%, 10.227%, and 11.363%, respectively, so the second model is the most accurate, with the lowest percentage error.
Keywords: Bioinformatics; Big Data; Genetic Algorithms; Hadoop MapReduce