A Review of Towered Big-Data Service Model for Biomedical Text-Mining Databases

Biomedical informatics has grown rapidly and attracted increasing attention. This growth is driven by advances in genomics and in new molecular and biomedical approaches, and by applications such as protein identification, patient medical records, genome sequencing and medical imaging, which generate a huge volume of biomedical research data every day. This data is both structured and unstructured. Traditional database systems are designed for structured data, so managing unstructured biomedical data and extracting useful information from it is a tedious job. Hence, mechanisms, tools, processes and methods must be applied to unstructured biomedical data (text) to obtain useful data. The fast growth of these collections makes it progressively harder for people to reach the required information in a convenient and effective way. Text mining can help mine information and knowledge from a mountain of text, and is now widely applied in biomedical research. Text mining is not a new technology, but it has recently come into the spotlight with the emergence of Big Data. Its applications are diverse and span multiple disciplines, from biomedicine to law, business intelligence and security. In this survey paper, the researcher identifies and discusses biomedical data (text) mining issues, and recommends a possible technique to cope with possible future


I. INTRODUCTION
The field of biomedical research is booming, and much biomedical knowledge exists in unstructured form as text files whose volume is growing exponentially. The massive growth of information conflicts with the slow pace at which knowledge can be drawn from text, and identifying useful patterns in text in a credible manner remains a challenge. In recent years, biomedical text mining, a technology for efficient automatic access to newly reported knowledge, has made significant progress [1].
Biomedical information is increasing rapidly in size, and useful results appear daily in research publications. However, automatically extracting useful information from such an enormous quantity of documents is difficult because these documents are unstructured and written in natural language. To enable data mining and knowledge discovery techniques, documents should be in a structured format [2].
The problem faced by biological researchers is how to find the useful and needed documents effectively in such an information-overload environment. Traditional manual retrieval is impractical. Furthermore, online biological information exists in a combination of structured, semi-structured and unstructured forms [3]. It is impossible to keep abreast of all developments, so computational methodologies are becoming increasingly important for research [4]. Text mining techniques, which involve information retrieval, information extraction and data mining, provide a means of solving this problem, as described by Ananiadou et al. [5].
The volume of published knowledge in the biomedical domain is growing at an unprecedented pace. Biomedical researchers need to explore a large number of scientific publications to examine findings related to certain biomedical entities such as proteins and diseases. In the biomedical domain, simple keyword-based matching may not be adequate because biomedical entities have synonyms and ambiguous names. Biomedical text mining automatically identifies biomedical entities in a given text and associates them with their corresponding entries in knowledge bases, enabling researchers to recognize useful information more efficiently. Two elementary functions of information extraction are named entity recognition and relation extraction. Named entity recognition deals with detecting the names of entities. Relation extraction refers to uncovering the semantic relations between entities [2].
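To make the synonym problem concrete, the following minimal sketch shows dictionary-based entity recognition that links each matched name to a knowledge-base entry. It is an illustration under stated assumptions, not a method from the cited works: the identifiers, the tiny `KNOWLEDGE_BASE` dictionary and the function names are all hypothetical.

```python
import re

# Hypothetical mini knowledge base: canonical ID -> known surface forms (synonyms).
# The IDs are illustrative placeholders, not identifiers from a real resource.
KNOWLEDGE_BASE = {
    "GENE:0001": ["tumor protein p53", "TP53", "p53"],
    "DISEASE:0001": ["breast cancer", "breast carcinoma"],
}


def build_lexicon(kb):
    """Map each lower-cased surface form to its canonical identifier."""
    return {name.lower(): entry_id for entry_id, names in kb.items() for name in names}


def find_entities(text, kb):
    """Return (start, end, matched_text, entry_id) tuples for dictionary hits."""
    lexicon = build_lexicon(kb)
    # Try longer names first so "tumor protein p53" wins over the nested "p53".
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = "Mutations in TP53 are frequent in breast carcinoma."
hits = find_entities(sentence, KNOWLEDGE_BASE)
for start, end, surface, entry_id in hits:
    print(f"{surface!r} at {start}-{end} -> {entry_id}")
```

A plain keyword search for "p53" or "breast cancer" would miss both mentions in the example sentence, whereas the synonym lexicon maps them to the same entries. Exact dictionary lookup still cannot resolve ambiguous names, which is why the surveyed systems combine it with statistical or machine learning methods.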
The number of articles added to the literature databases is growing at a fast pace [6]. Retrieving relevant information from literature databases and combining it with experimental output is time-consuming and requires careful selection of keywords and drafting of queries. The process is often biased and yields incomplete search results, preventing the realization of the full potential that these databases can offer [7]. Automated processing and analysis of text, referred to as text mining (TM), can assist researchers in evaluating scientific literature. Nowadays, TM is applied to answer many different research questions, ranging from the discovery of drug targets and biomarkers from high-throughput experiments [8]-[13] to drug repositioning, the creation of a state-of-the-art overview of a certain disease or therapeutic area and for the creation of
The rest of the paper is organized as follows. Section II discusses the purpose and overview of text mining, the significance of biomedical text mining, and the tasks, models and methods used. It presents the definitions of the concepts explored in this study. Section III discusses previous studies related to text mining, biomedical text mining, biomedical data with feature extraction approaches and biomedical data mapping techniques. It critically evaluates the methodologies that were available at the time of this research. Section IV discusses the research methods used by the reviewed articles. Analysis and discussion are covered in Section V, wherein the contextual settings of the reviewed articles are examined. The study findings, conclusion and recommendations for further research are discussed in Section VI.

II. PURPOSE OF THE STUDY
Due to the rapid growth of data and text, information extraction is a difficult task, especially in biomedical databases [16]. Additionally, the diversity, complexity and volume of the information to be mined present challenges in the biomedical domain that impact the biomedical discovery process, stifling researchers working towards novel hypotheses to address critical questions [17]. Such information extraction depends on flexible formulations and common methods for heterogeneous data integration, and on the discovery of knowledge sources that are hard to define in advance and depend heavily on the particular scientific question. This strongly influences how effectively data can be stored and extracted, and how well the molecular underpinnings of biological processes can be understood. For this purpose, this paper briefly overviews the major challenges in these areas and discusses the recommendations and implications of this research.

A. Overview of Text Mining
Text mining or text analytics is an umbrella term describing a range of techniques that seek to extract useful information from document collections through the identification and exploration of interesting patterns in the unstructured textual data of various types of documents, such as books, web pages, emails, reports or product descriptions. A more formal definition restricts text mining to the creation of new, nonobvious information (such as patterns, trends or relationships) from a collection of textual documents. Typical text mining tasks include activities of search engines, such as assigning texts to one or more categories (text categorization), grouping similar texts together (text clustering), finding the subject of discussions (concept/entity extraction), finding the tone of a text (sentiment analysis), summarizing documents, and learning relations between entities described in a text (entity relation modeling) [18].
The growth of the web has expanded the availability of and access to publications, which in many cases leads to information overload [19]. Specifically, biomedical data sets in large computerized stores have grown rapidly [20]. Searching and organizing these data is therefore time-consuming and costly. For example, MEDLINE is a fast-growing biomedical database in the digital library, and the information within it is stored in text form. It has recently come to hold more than 18 million indexed articles. The usability and availability of this data have thus become critical to the researchers and students working in the biomedical area [21]. The quick growth of these collections makes it progressively harder for analysts to get the required information in a helpful and efficient way.
Text mining eco-system for biomedical data.
The relationships among the various medical concepts in the medical literature are a foremost issue for many biological researchers. However, the knowledge acquisition bottleneck limits their incorporation into decision support systems for two reasons. Firstly, creating and maintaining the knowledge base requires considerable time from medical specialists. Secondly, sharing and reusing a validated knowledge base is difficult due to the lack of transparency [22]. Obtaining consistent data through extraction is one of the essential objectives of biomedical text mining groups [23]. The term text mining is used for exploring the objects stored in an unstructured data set; it offers the capability to manage and analyze [24] large sets of data in an effective manner [25], while also revealing the significant relationships or correlations among variables in a huge dataset [26]. A smart data retrieval system is essential for handling non-standardized entries in order to access the data [27]. There is therefore a strong need to create strategies for the automatic extraction of pertinent data from the literature, which is written in natural language [28]. In this study, the text mining method is accordingly directed towards discovering additional useful information in a more effective way. Fig. 1 shows the overview of the text mining process from the biomedical database.

B. Text Mining
Text mining refers to the automated extraction of knowledge and information from text by revealing relationships and patterns that are present, but not obvious, in a document collection. It uses a wide range of utilities, including information extraction, text clustering, sentiment analysis, text categorization, document summarization, named entity recognition and question answering, and draws on interdisciplinary fields based on computational linguistics: artificial intelligence, data mining, natural language processing and information retrieval [29].
www.ijacsa.thesai.org
The goal of text mining is to derive implicit knowledge that hides in unstructured text and present it in an explicit form. This generally has four phases: information retrieval, information extraction, knowledge discovery, and hypothesis generation. Information retrieval systems aim to get desired text on a certain topic; information extraction systems are used to extract predefined types of information such as relation extraction; knowledge discovery systems help us to extract novel knowledge from the text; hypothesis generation systems infer unknown biomedical facts based on text, as shown in Fig. 2. Thus, the general tasks of biomedical text mining include information retrieval, named entity recognition and relation extraction, knowledge discovery and hypothesis generation [30].
The text mining-associated text document and database models [31] are identified as:
• Information recovered from web archives with a population of data set patterns.
• Disclosure of data presented in the text as well as the capacity for XML or social groups.
• Incorporation and questioning of content information after it has been stored in databases.
• Deduplication of a data set through utilizing standard information mining strategies like clustering.

C. Models and Methods Used in Text Mining
To solve text mining issues, many researchers have suggested new methods for retrieving relevant information according to a user's requirement [32]. Based on the information retrieval process, there are four methods: the term-based, phrase-based, pattern taxonomy and concept-based methods.

D. Biomedical Literature Mining
The era of applying text mining approaches to biology and biomedical fields came into existence in 1999. It was first applied to the biomedical domain for gene expression profiling [33], as well as the extraction and visualization of protein-protein interaction [34]. It emerged as a hybrid discipline from the edges of three major fields, namely, bioinformatics, information science, and computational linguistics. Biomedical literature mining is concerned with the identification and extraction of biomedical concepts (e.g., genes, proteins, DNA/RNA, cells, and cell types) and their functional relationships [35]. The major tasks include 1) document retrieval and prioritization (gathering and prioritizing the relevant documents); 2) information extraction (extracting information of interest from the retrieved document); 3) knowledge discovery (discovering new biological event or relationship among the biomedical concepts); and 4) knowledge summarization (summarizing the knowledge available across the documents). A brief description of the biomedical literature mining tasks is listed as follows.

E. Biomedical Text Mining Tasks
Document Retrieval: The process of extracting relevant documents from a large collection is called document retrieval or information retrieval [36]. The two basic strategies applied are query-based and document-based retrieval. In query-based retrieval, documents matching with the user specified query are retrieved. In document-based retrieval, a ranked list of documents similar to a document of interest is retrieved.
Document Prioritization: The retrieved documents are usually prioritized to get the most relevant document. Many biomedical document retrieval systems achieve prioritization based on certain parameters including journal-related metrics (e.g., impact factor, citation count) [37] and MeSH index [38], [39] for biomedical articles. The similarity between the documents is estimated with various similarity measurements (e.g., Jaccard similarity, cosine similarity) [40].
Information Extraction: This task aims to extract and present the information in a structured format. Concept extraction and relation/event extraction are the two major components of information extraction [41], [42]. While concept extraction automatically identifies the biomedical concepts present in the articles, relation/event extraction is used to predict the relationship or biological event (e.g., phosphorylation) between the concepts [43], [44].
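The two similarity measurements named under document prioritization can be sketched in a few lines. The example below is a minimal illustration, assuming plain whitespace tokenization and raw term counts; the function names are hypothetical, and production systems typically add stemming, stop-word removal and TF-IDF weighting.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenization; real systems use richer preprocessing."""
    return text.lower().split()


def jaccard_similarity(doc_a, doc_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = set(tokenize(doc_a)), set(tokenize(doc_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the two term-count vectors."""
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(reference, candidates, similarity=cosine_similarity):
    """Document-based retrieval: order candidates by similarity to a reference text."""
    return sorted(candidates, key=lambda doc: similarity(reference, doc), reverse=True)


reference = "p53 regulates apoptosis"
candidates = ["insulin signalling pathway", "p53 induces apoptosis"]
print(jaccard_similarity(reference, candidates[1]))  # shares 2 of 4 distinct tokens
print(rank_documents(reference, candidates)[0])
```

Jaccard similarity ignores how often a term occurs, while cosine similarity over counts rewards repeated shared terms, so the two can rank long documents differently.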
Knowledge Discovery: It is a nontrivial process to discover novel and potentially useful biological information from the structured text obtained from information extraction. Knowledge discovery uses techniques from a wide range of disciplines such as artificial intelligence, machine learning, pattern recognition, data mining, and statistics [45]. Both information extraction and knowledge discovery find their application in database curation [46], [47] and pathway construction [48], [49].
Knowledge Summarization: The purpose of knowledge summarization is to generate information for a given topic from one or multiple documents. The approach aims to reduce the source text to its most important key points through content reduction, selection and/or generalization [50]. Although knowledge summarization helps to manage information overload, the state of the art remains open to research on more sophisticated approaches that increase the likelihood of identifying the relevant information.
Hypothesis Generation: An important task of text mining is hypothesis generation to predict unknown biomedical facts from biomedical articles. These hypotheses are useful in designing experiments or explaining existing experimental results [51].
Text mining for biomedical literature often involves two major steps.
a. First, biomedical entities and concepts of interest must be identified in free text using natural language processing techniques. Many text mining algorithms have been applied to this problem. For example, morphological clues can help recognize health-related terms such as obesity and blood pressure.
b. Then, the information extracted from the text or unstructured documents is converted into a standardized data set, and data mining is applied to that data source.
Fig. 3 shows the typical text mining process.
Fig. 3. Typical text mining process.

III. RELATED WORKS
This section provides a brief summary of text mining, followed by the most recent studies on text mining in the field of biomedical research texts.

A. Biomedical Text Mining Review
Information extraction (IE) covers the recognition of biomedical entities in biomedicine, in order to extract information pertaining to a disease, its treatment and its proteins, and the extraction of the association(s) between these entities. The association between two different entities is extracted through different methods. Previous studies related to the extraction of useful information from databases are discussed as follows.
Tan and Lambri [52] suggested a framework for selecting a suitable ontology for a specific biomedical text mining application. They then ran an experiment on biomedical ontologies in the context of a gene normalization system using the framework. Within the terms of the framework, the results of the assessment pointed to a comparatively firm choice of ontology for their module. The researchers planned to evaluate the framework with more applications and ontologies.
Qi et al. [53] surveyed text mining in bioinformatics with a focus on its applications. The primary research directions of text mining in bioinformatics were illustrated through extensive examples. The study answered the need for a state-of-the-art overview of text mining in bioinformatics, a need driven by the swift advancement of both fields. The full potential of this area remains underutilized.
Yin et al. [54] suggested a probabilistic combination framework for linking citation information with a content-based information retrieval weighting model. Through a case study, they examined how the link information available in the citation graph could be modeled. The framework can possibly do away with extensive parameter tuning. Although the combination framework was tested only on a biomedical literature corpus, the researchers are of the opinion that the basic premise of their paper could be adopted for literature retrieval in other areas.
Tari et al. [55] described the Gene Properties Mining Portal, which permits retrieving gene-centric data from the literature through text mining. The portal acts as a hub where scientists can discern vital relationships in the literature effectively and efficiently. However, the precision of the extracted relations was influenced by several issues, for instance the limitations of the extraction methods and the quality of the sources.
Bchir and Ben Abdessalem Karaa [56] proposed a method for extracting relations between diseases and drugs. First, they applied natural language processing methods to preprocess abstracts. Next, features were extracted from the set of preprocessed abstracts. Finally, disease-drug associations were extracted using a machine learning classifier. However, the method extracts only associations between drugs and diseases, and relations among other concepts still need to be extracted.
Mala and Lobiyal [57] relied on ontologies for extracting concepts and offered an algorithm to locate and identify concept-based clusters. They then assigned a semantic weight to every term in every document. They used part-of-speech (POS) tagging to locate nouns and RapidMiner for text mining steps such as text processing. The use of medical ontologies could further enhance the outcome of this method.
Roth et al. [58] aimed to extract from the biomedical literature information that supports predicted protein-protein interactions (PPIs). The relation extraction results show an F-score of 0.88 on the HPRD50 corpus, and the semantic similarities calculated with an angular distance were also shown to be statistically significant.
Jimeno Yepes and Berlanga [59] suggested an innovative technique for creating word-concept probabilities from knowledge bases (KBs), which could then serve as a foundation for numerous text mining tasks. The findings indicated that this technique achieved better accuracy than other state-of-the-art methods, particularly on the MSH WSD data set. However, the present refinement implementation does not attempt to recognize new synonyms for existing concepts; it only tags the data by quantifying the frequency of usage within a specific concept. Nor does it attempt to discover new concepts that are not found in the knowledge base. It would be worthwhile to evaluate information extraction methods for recognizing new synonyms [60] for both existing and new concepts.
Meaney et al. [61] examined the changes and patterns in the use of statistical and epidemiological techniques in the medical literature over the last 20 years. The research proposed a method that improves the text-mining approach and incorporates advanced retrieval techniques to gauge the proportion of articles that refer to a specific statistical or epidemiological technique; this is where the team needs to undertake further study [62], [63]. A statistical machine translation approach [64] and a Bayesian information extraction network for Medline abstracts [65] are used in the proposed text mining system to deal with this problem.

B. Text Mining Methods
Berardi et al. suggested a framework that assists biologists in automatically extracting information from machine-readable documents or texts. The learned extraction models were later applied to unseen texts in automatic mode. They reported an application to a real-world dataset of publications selected to aid biologists in annotating the HmtDB database.
An extension of the Okapi retrieval system that is effective for mining biomedical text was suggested in [66]. It has two advantages over other models. First, the method is simple to implement and is not tied to any domain. Second, it has proven its competence and effectiveness in TREC Genomics experiments. Although the suggested extension is effective in discerning subtle variations in the wording of a biological entity, it does not offer a comprehensive solution that covers all lexical variants. The algorithm alone cannot identify every variant; in future, such variants are to be captured through a query expansion algorithm.
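For orientation, the ranking function most commonly associated with the Okapi system is BM25. The sketch below implements the standard BM25 formula, not the extension proposed in [66], whose details are not given here. The parameter defaults `k1=1.5` and `b=0.75` and the smoothed (non-negative) IDF variant are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with standard BM25.

    query: list of terms; documents: list of lists of terms.
    k1 controls term-frequency saturation, b controls length normalization.
    """
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms (assumed variant).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(doc) / avg_len
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    ["brca1", "mutation", "breast", "cancer"],
    ["insulin", "receptor", "signalling"],
    ["breast", "cancer", "screening"],
]
scores = bm25_scores(["brca1", "cancer"], corpus)
print(scores)
```

Because matching is exact, a query for "brca1" scores nothing for a document that writes "BRCA-1" or "breast cancer 1", which is the lexical-variant gap that the query expansion step mentioned above is meant to close.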
Islam et al. [67] suggested a text summarization algorithm for the biomedical scientific literature that discerns the focal topic of cancer biomarker discoveries and all the information in the literature that is deemed vital. Future work, however, needs to be directed towards extracting more specialized information on protein structure and towards image data mining. The system also needs to be optimized to handle large loads with quick response and must support multiple databases.
Liu et al. [68] studied the names in the BioThesaurus, which is collated from multiple databases, as they occur in free text, using a data set created automatically from cross-references in UniProtKB. The findings showed that combining different resources to assemble synonyms for biological entities can improve coverage of the nomenclature present in text when flexible matching is used. However, flexible matching creates more ambiguity with common English words. This creates a need to reduce the confusion between common English vocabulary and biological entity names through corpus-based word sense disambiguation.
Leroy et al. [69] created text mining tools that identify co-occurrence relations among concepts. Interesting subsets of relations are mined through statistical measures. In addition, the researchers showed that the direction of these relations affects how interesting they are. The number of relations and their support were quantified. Differences in direction had a marked effect on the number of relations, and the tools supported different types of graphs. The consequences of directionality on bigger graphs were not considered, however.
Salahuddin and Rahman [70] analyzed and collated biomedical data from hypertext documents using text mining methods with the assistance of a biomedical ontology. Biomedical entities were matched against, and laid out from, the Metathesaurus through a keyword query. However, this study focused on textual data alone. Documents contain both textual information and images; hence, the relevance of images in medical documents needs to be taken into consideration, and documents should be ranked on their combined textual and image content.
Ronquillo et al. [23] proposed a program for the automatic categorization of biomedical text. Its results in terms of performance and execution time are better than those obtained earlier with Weka, which served as the baseline system. The system has certain limitations, however, especially in distinguishing texts about hearing loss classified as syndromic from those classified as nonsyndromic. To improve categorization, the method will be used to locate symptoms and genes that are related to both types of hearing loss.
Hou et al. [71] proposed two ways to determine the relation between genes and diseases: (a) using the proximity relationship between genes and diseases, and (b) using the GO terms shared by genes and diseases to compare similarity. Experiments demonstrate that relations based on GO terms perform better than those based on word proximity, which suggests that GO terms are a better basis for gene-disease association. However, the study concentrated only on the relationship itself. There is also a need to apply the prediction of gene-disease relationships beyond the OMIM database.
Javed and Afzal [72] proposed a text mining technique that extracts numerous entities from biomedical text, in which candidate terms are identified by applying an algorithm known as the C-Value. These candidate terms and the existing terms in the Seed/Ontology are labeled in the corpus. By comparing the lexical and contextual profiles of the candidate terms with those of the existing Seed/Ontology terms, they were able to discover novel concepts and assess them. The study still requires better categorization of similarity measures, such as WordNet-based ones, for discerning the link between two terms.
The text mining methods reviewed above are summarized in Table 1, where each method is briefly identified and then analyzed. Other methods, which include knowledge extraction and data mapping techniques, are classified in the next sections along with their summary in Table 2. The evaluation includes some major limitations of each method which need to be addressed in future research and experiments. The summary of previous studies related to biomedical data mapping techniques is given in Table 3. Finally, the recommendations and implications of this research are discussed in Table 4.

C. Knowledge Extraction Methods
Jahiruddin et al. [74] introduced an innovative Biomedical Knowledge Extraction and Visualization framework (BioKEV) which is used to discern and isolate vital information components from biomedical text documents. The method of information extraction was based on NLP or Natural Language Processing methods and analysis that were also based on semantics. Additionally, it was suggested that a ranking system for documents needed to be in place to refer to retrieved documents in the same relevant order as queried by the user. Furthermore, they improved the format of the query processing module to render it compatible with a high degree of efficiency when searching biomedical queries of a complex nature.
Sharma et al. [75] concentrated on discovering the task and extracting relations that were witnessed between certain bioentities, like green tea and cancer of the breast. Additionally, a verb-centric algorithm was suggested to be put in place. This system locates and extracts the primary verb(s) observed in a sentence; hence, there is no requirement for a separate set of rules or patterns. The algorithm was assessed in numerous datasets and observed an average of F as 0.905, which is considerably more than what had been previously achieved.
However, a framework called Feature Coupling Generalization (FCG) for the purpose of developing novel features from untagged data has been suggested by Li et al. [76]. This framework chooses Example-Distinguishing Features (EDFs) and Class-Distinguishing Features (CDFs) to recognize the gene entity name (NER), extract the proteinprotein interaction (PPIE) and classify the gene ontology (GO). Additionally, the performance of baselines that are under supervision was improved by 7.8 %, 5.0 %, and 5.8 %, respectively, in all three tasks. But this study does not justify the reason for the workings of FCG and the reasons that determine EDFs' and CDFs' qualities.
Holzinger et al. [77] proposed a Sequence Memorizer Based Model (SMBM) that had its roots in what was known as the generative model to oversee its functioning. This method resorted to the utilization of the generative strategy in order to avoid the option of selecting work that was timeconsuming. While ensuring the advantages of models that were generative in nature, the functionality of this technique can be compared to that of the Maxent model.
Holzinger et al. [77] offered a way to assess knowledge discernment of disease-disease relationships for rheumatic diseases. Also, they resorted to utilizing a Point wise Mutual Information (PMI) calculation to identify a relationship's strength. The output indicates concealed knowledge in articles www.ijacsa.thesai.org pertaining to rheumatic diseases that were indexed by MEDLINE, and which could be used by medical experts and researchers for the purpose of making medical decisions. This study also needs to concentrate on collecting the names of diseases, nomenclature/codes of diagnosis and treatments to observe the extent to which identification of diseases in the searched content can be improved through screening for diagnosis and treatment of such diseases.
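Pointwise mutual information compares how often two terms co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ). The sketch below estimates the probabilities from document counts. It is a generic illustration rather than the exact procedure of [77], and the counts in the example are invented.

```python
import math


def pmi(count_xy, count_x, count_y, total_docs):
    """Pointwise mutual information (base 2) estimated from document counts.

    count_xy: documents mentioning both terms
    count_x, count_y: documents mentioning each term
    total_docs: size of the document collection
    """
    if total_docs <= 0 or count_x <= 0 or count_y <= 0:
        raise ValueError("term and collection counts must be positive")
    if count_xy == 0:
        return float("-inf")  # the terms never co-occur
    p_xy = count_xy / total_docs
    p_x = count_x / total_docs
    p_y = count_y / total_docs
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: two disease names in a collection of 10,000 abstracts.
strength = pmi(count_xy=50, count_x=100, count_y=200, total_docs=10_000)
print(round(strength, 3))
```

A positive value means the two terms co-occur more often than chance would predict, a value near zero means they look independent, and a negative value means they tend to avoid each other. PMI is known to overrate rare terms, so counts are usually thresholded before ranking relationships.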
Pereira et al. [78] developed an integrated approach for the reconstruction of Transcriptional Regulatory Networks (TRNs), which retrieve the relevant data from important biological databases and insert the result into a unique repository named KREN. Further, they integrated this into the Note software system, which allows some methods from the Biomedical Text Mining field, including algorithms for Named Entity Recognition (NER), extraction relationships between biological entities and extraction of all relevant terms from publication abstracts. Finally, this tool was extended to allow the reconstruction of TRN using scientific literature.
Landge and Rajeswari [79] presented a comparative analysis of numerous techniques for determining the relations between chemical entities, and also compared numerous text mining methods. Further, they suggested using a parallel approach to text mining in order to reduce the time these methods require. Conventional algorithms can be parallelized and applied to extract information and knowledge from large data sets.

D. Biomedical Data Mapping Techniques
Cano et al. [81] suggested a hybrid approach for mining the vast knowledge accumulated in the scientific literature. The method is based on effective text mining tools that work in tandem with precise, collaborative human curation. To demonstrate its effectiveness, the study still needs to quantify the time saved in performing tasks. The approach also leads to an observable improvement in the state of information about the remaining knowledge content, with active learning techniques used to prioritize the annotation process.
Yang and Dong [82] proposed a mapping-based approach that first maps bio-entities to terms in an established ontology, Medical Subject Headings (MeSH). Specifically, they present two approaches to mapping biomedical entities identified using the Unified Medical Language System (UMLS) Metathesaurus to MeSH terms. The first approach utilizes a special feature of the MetaMap algorithm, and the second employs an approximate phrase-based match to map entities directly to MeSH terms. The two approaches deliver comparable results, with accuracies of 72% and 75%, respectively, on two evaluation datasets.
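To make the idea of an approximate phrase-based match concrete, the sketch below maps an entity string to the closest entry in a small list of MeSH headings. It is only an illustration and not the matching procedure of [82]: it substitutes the standard-library `difflib.SequenceMatcher` similarity for whatever measure the authors used, and the names `MESH_SAMPLE`, `normalize`, `map_to_mesh`, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# A tiny hand-picked sample of MeSH headings, used here only for illustration.
MESH_SAMPLE = ["Breast Neoplasms", "Arthritis, Rheumatoid", "Tea", "Osteoporosis"]


def normalize(phrase):
    """Lowercase, drop punctuation, and sort tokens so that word order is ignored."""
    cleaned = "".join(c if c.isalnum() else " " for c in phrase.lower())
    return " ".join(sorted(cleaned.split()))


def map_to_mesh(entity, mesh_terms=MESH_SAMPLE, threshold=0.8):
    """Return (best_term, score); best_term is None if no term reaches the threshold."""
    key = normalize(entity)
    best_term, best_score = None, 0.0
    for term in mesh_terms:
        score = SequenceMatcher(None, key, normalize(term)).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_score < threshold:
        return None, best_score
    return best_term, best_score


print(map_to_mesh("rheumatoid arthritis"))
print(map_to_mesh("quantum computing"))
```

Sorting the tokens lets an entity such as "rheumatoid arthritis" match the inverted heading "Arthritis, Rheumatoid". A character-level similarity of this kind cannot recognize true synonyms (for example, "breast cancer" versus "Breast Neoplasms"), which is one reason a thesaurus-backed method is needed in practice.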
Mohammed and Nazeer [83] suggested an enhanced text mining system based on pattern matching and heuristics that reduced space and increased recall and accuracy. The system was assessed using three metrics: recall, precision, and F-factor. The experiments yielded a recall of 98.68% and a precision of 98.68%. The system has a drawback, though, in that it placed restrictions on the format of candidate acronym-definition pairs, which means that they needed to appear as either an acronym.
Ji et al. [84] created a MapReduce algorithm to calculate the strength of association between two biomedical terms occurring in biomedical documents. They evaluated the scalability of the algorithm using 3,610 documents retrieved from biomedical journals, and demonstrated that it scaled linearly with the number of nodes in a cluster. However, the method was tested only on a limited number of clusters with a small dataset, so its scalability with respect to dataset size still needs to be assessed. Moreover, the efficiency and accuracy of the algorithm need to be improved.
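The map and reduce steps behind such a term-association computation can be sketched in a few lines of Python. This is a single-process emulation for illustration only, not the algorithm of [84]: it uses raw co-occurrence counts as the simplest possible association measure, and the function names and the toy corpus are assumptions. A real deployment would run the same two phases on a framework such as Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(doc_terms):
    """Emit ((term_a, term_b), 1) for each unordered pair of distinct terms in one document."""
    for pair in combinations(sorted(set(doc_terms)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each term pair."""
    return {key: sum(values) for key, values in groups.items()}


def cooccurrence_counts(corpus):
    """Run map, shuffle, and reduce over a corpus given as a list of term lists."""
    mapped = (kv for doc in corpus for kv in map_phase(doc))
    return reduce_phase(shuffle(mapped))


# Hypothetical toy corpus: the biomedical terms recognized in each document.
example_corpus = [
    ["green tea", "breast cancer"],
    ["green tea", "breast cancer", "catechin"],
    ["catechin"],
]
print(cooccurrence_counts(example_corpus))
```

Because each document is mapped independently and the reducer only sums counts per key, the work can be spread across nodes without coordination, which is the property that makes linear scaling with cluster size plausible.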

IV. MATERIALS AND METHODS
A comprehensive literature search on text mining in medical databases was conducted using sources such as Google search, Elsevier, IEEE, and the Springer digital library, along with other literature sources. The searches were restricted to the years 2005 to 2016. The retrieved articles covered text summarization, including clinical, biomedical and medical summarization. The same search approach was applied to the web supplement. Additionally, the references of the included articles were examined, with particular attention to earlier relevant surveys. The search retrieved a total of 112 potentially suitable articles with respect to the inclusion criteria required for this review. The review included original studies that developed and evaluated text summarization techniques in the medical domain, including summarization of electronic health records and of the biomedical literature. Fig. 4 shows the study flow diagram based on the PRISMA guidelines for reporting systematic reviews. Studies that met any of the following criteria were excluded: image and multimedia summarization without a text summarization component, summarization of content outside the biomedical domain, and non-English articles. Because of the last criterion, the review may have missed systems that summarize content in other languages.
The studies in [52], [70], [82] suggested a framework for choosing a suitable ontology for a specific biomedical text mining application. However, this work needs to focus more on handling complex biomedical terms. Additionally, it concentrated only on textual document data, whereas documents are enriched with both textual data and images. Future work therefore needs to consider the significance of images in medical reports and attempt to rank documents on the basis of both textual and image information [67], [70]. A focus on achieving higher mapping accuracy should also be included.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 8, 2017
Some studies focused on event extraction applications in biomedical text, such as [85], [86], while others concentrated on security-based event extraction applications, such as in [87]. Hogenboom et al. [88] reviewed methods for extracting events from text for decision support systems. They extracted biological data automatically from the text, whereas some researchers [68], [72], [73] had proposed a mining-based framework. However, this work needs to focus on reducing redundancy in the data and on improving classification by adding similarity measures in order to extract biomedical terms. The studies in [71], [84] proposed an association rule mining approach. However, this method was tested on only a small number of clusters with small-scale datasets, and the algorithm needs further refinement to improve its overall efficiency and accuracy. Landge and Rajeswari [79] presented a comparative analysis of various text mining methods for finding associations among chemical entities. They also noted that text mining algorithms take a large amount of time on huge datasets [81], and for this reason suggested a parallel approach to text mining to reduce the processing time on such datasets.
A few previous studies have proposed frameworks for identifying key information components in biomedical text documents, such as [55], [82] and [74], [84]. However, the precision of the extracted relations was affected by various issues, for example the limitations of the extraction patterns and the quality of the sources. Other work aimed to study the BioThesaurus. Nonetheless, the examination of name sets was neglected, which shows that some synonyms in the text were not captured by the BioThesaurus [59], [60]. The studies in [54], [56], [77], [80] all extracted keywords from biomedical records. However, this work focused only on the biomedical literature corpus and could be adapted to literature retrieval in other domains [89], [90]. Hou et al. and Sharma et al. [71], [75] focused on mining associations among bio-entities, such as breast cancer and green tea. However, this work concentrated on only one specific data set. Further work could address the tasks of categorization and relationship integration. The studies in [23], [72], [76] proposed algorithms for categorizing biomedical text automatically. However, such systems need to improve their classification to achieve higher performance [58]. A thorough validation process comparing this method with the existing regulatory model is still necessary. Meaney et al. [61] recommended enhancing the text mining technique through a retrieval approach or more sophisticated preprocessing [35], which could be used to evaluate the extent to which articles refer to a given epidemiological or statistical technique [62], [63].
It is hence clear that biomedical text mining has great potential. However, that potential is as yet unrealized. In the coming years, text mining should be able to evaluate and validate the results of analytical expression methods in identifying significant groupings of data [91]. Text mining researchers should cooperate with biology researchers in this interdisciplinary area. The following are some of the potential "New Frontiers" in biomedical text mining: question answering, summarization, mining data from full text (including figures and tables), user-driven systems, and evaluation [92]. This is an exciting time in biomedical text mining, full of promise.

VI. CONCLUSION
In this research, the researcher discussed and analyzed text mining techniques for retrieving biomedical data from the pool of documents on the web. According to the literature, biomedical document retrieval strategies achieve near-optimal results. However, the relevance of a web document depends largely on the user's need, that is, on how relevant the document is to the user's query. A more effective text processing approach will give better results when retrieving documents from the web. Processing efficiency depends mainly on time, but the time required to compute rankings is a critical issue in implementation. Because the web contains a large number of documents, offline estimation is not handled effectively by any existing approach. Owing to the complexity of natural language processing, there is extensive research in this field. In the future, it will therefore be necessary to concentrate on effective methods for capturing the meaning of, and the relationships between, the words in a document.
Based on the above review, future studies need to focus on:
• Cognitive aspects of text summarization, including visualization techniques, and evaluations of the impact of text summarization systems in work settings.
• Publicly available summarization corpora and reference standards to support the development of summarization tools in various applications.
• Text-mining applications in molecular biology. The increasing interest of users in efficiently retrieving and extracting relevant information, the need to keep up with new discoveries described in the literature or in biological databases, and the demands posed by the analysis of high-throughput experiments are the underlying forces motivating the development of these applications. Those technologies should provide the foundation for future knowledge-discovery tools able to identify previously undiscovered associations, something that will assist in the formulation of models of biological systems.
• Improving classification and data mining in order to achieve higher performance.

Recommendation Definition

Text Summarization
Further research is required on the cognitive aspects of text summarization, including visualization methods, and on assessing the effect of text summarization systems in work settings.

Summarization Tool
Summarization corpora and reference standards need to be made available to support the advancement of summarization tools in different applications.

Databases
The increasing interest of users in efficiently retrieving and extracting relevant information, the need to keep up with new discoveries described in the literature or in biological databases, and the demands posed by the analysis of high-throughput experiments are the underlying forces driving the development of text mining applications in molecular biology. These technologies should provide a foundation for future knowledge-discovery tools able to identify previously unknown associations, which will help in formulating models of biological systems.

Higher Performance
Classification and data mining need to be improved in order to attain higher performance.