Recovery of User Interface Web Design Patterns using Regular Expressions

User Interface Web Design Patterns (UIWDPs) are standard solutions for the development of web applications. Recovering these patterns from web applications supports program comprehension, reuse, reverse engineering, re-engineering, and maintenance of legacy web applications. However, recovery is arduous because web applications are heterogeneous. Over the last fifteen years, authors have presented different catalogs of UIWDPs and approaches for extracting them from source code, but formal specifications of web design patterns, which are important for their recovery from source code, are still lacking. The objective of this paper is to specify UIWDPs using a semiformal specification technique and to use these specifications to recover patterns from the source code of web applications with regular expressions. We identify 55 feature types for the specification of 15 UIWDPs. We evaluated our approach on 75 randomly selected web applications and recovered instances of all 15 UIWDPs. Standard deviation, precision, recall, and F-score are used to evaluate the accuracy of our approach.

Keywords: Design patterns; user interface patterns; web applications; web reverse engineering; regular expressions


I. INTRODUCTION
The World Wide Web has remarkably affected many aspects of our lives, and it will continue to influence society. Owing to the increasing popularity and use of web applications (WAs) in all fields of life, WAs are subject to continuous evolution, maintenance, and re-engineering [1], [10]. WAs must evolve for many reasons, such as improving usability, quality, and efficiency, correcting bugs, introducing new functionality, and modifying legacy WAs [22], [27]. User Interface Web Design Patterns (UIWDPs) are standard solutions that are frequently used to develop web applications. Appropriate use of UIWDPs not only creates consistency among the web pages of an application but also establishes an efficient page layout [3]-[6]. Automatic recovery of UIWDPs supports comprehension, maintenance, evolution, and re-engineering activities.
In the early days of web development, web applications were too often built without design and modeling principles. Most developers applied ad hoc approaches, which makes extracting information from these web applications difficult. Recovering information from legacy WAs is a daunting task due to the heterogeneous nature of web applications, which are developed using many technologies such as JavaScript, HTML, DHTML, XHTML, XML, CSS, ASP, PHP, the DOM, databases, and images. Static analysis tools built for a single language cannot extract information from the multilingual aspects of WAs.
A major motivation for reverse engineering web applications is to reuse the tested, reliable, and validated artifacts of legacy applications in the development of modern applications. Recovering UIWDPs helps to comprehend the architecture of web applications. UIWDPs are not just about buttons and menus; they concern the interaction between users and applications or devices. UIWDPs are used to create consistency throughout the web development process and to give WAs attractive, user-friendly, usable, and effective layouts. Over the last fifteen years, different authors have presented various catalogs of UIWDPs, such as the UI pattern library [4], the Welie pattern library for interaction design [5], 10 UI design patterns by Jovanovic [9], and 15 UI design patterns by Kayla Night [6]. All of these catalogs focus only on the organization, usage, naming, problem summary, and examples of UIWDPs.
Most previous contributions [11]-[15] in the field of UIWDPs focused on the collection, organization, and application of UIWDPs. Recovering these patterns from WAs requires standard, formal definitions, which, to the best of our knowledge, are not yet available. Some approaches recover different components and models from web applications, but they do not focus on recovering specific patterns with complete information. The recovery of UIWDPs was attempted in [31], but that approach only determines whether a pattern is present in a web application. It is also limited to recognizing patterns implemented with fixed tags and fails to recognize the same pattern when it is implemented with different constructs. Our approach recognizes patterns in WAs along with complete information about their locations, which is important for source code analysis. We propose a lightweight approach based on lexical analysis that reports each detected pattern by file name, file path, line number, and number of occurrences in the WA. The regular expression patterns used in this paper are easily customizable to handle variations in the implementation of UIWDPs.
www.ijacsa.thesai.org
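The limitation of fixed-tag matching described above, and how a regular expression tolerates implementation variations, can be illustrated with a small Python sketch. The three markup snippets and the PASSWORD_FIELD expression are hypothetical examples written for illustration, not the regular expression patterns of Table VII; they only show how one expression can absorb differences in tag case, attribute order, quoting, and whitespace that defeat a literal string search.

```python
import re

# Three hypothetical ways a developer might mark up the same password field.
snippets = [
    '<input type="password" name="pwd">',
    "<INPUT name='pass' TYPE='password'/>",
    '<input\n    class="field" type = "password">',
]

# Fixed-tag matching: a literal string that must appear exactly as written.
FIXED_TAG = '<input type="password"'

# Hypothetical regex: tolerates tag case, attribute order, quote style,
# line breaks inside the tag, and whitespace around "=".
PASSWORD_FIELD = re.compile(r"<input\b[^>]*\btype\s*=\s*[\"']?password\b", re.IGNORECASE)

fixed_hits = [FIXED_TAG in s for s in snippets]
regex_hits = [bool(PASSWORD_FIELD.search(s)) for s in snippets]
print(fixed_hits)  # [True, False, False]
print(regex_hits)  # [True, True, True]
```

Adding a new variant then means adjusting one expression rather than adding another fixed tag to a list.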

Our work makes the following contributions:
• A summary of the state of the art in web design pattern recovery;
• Semiformal specifications of UIWDPs;
• Automatic recovery of UIWDPs from web applications;
• An evaluation of the approach.
The rest of this paper is organized as follows. Section II discusses related work. Section III presents the semiformal specifications of the UIWDPs used by our approach. Section IV presents the concept and architecture of our approach, Section V presents its evaluation, and Section VI concludes the paper.

II. RELATED WORK
A number of approaches and tools have been presented for recovering information from web applications. Table I gives a comparative overview of the features of the approaches presented by different authors. The key factors for comparison are tool support, model/framework, source code languages, analysis type, applied technique, and experimental case studies. The important approaches are listed in Table I, and selected approaches are discussed in the following paragraphs.
According to the experiment of Carlos et al. [28], the proper use of UIWDPs [7] in the development of web applications has a positive impact on the quantitative and qualitative performance and usability of WAs [8]. Different libraries of UIWDPs have been presented, such as the UI pattern library by a Danish web developer [4], Welie (a pattern library for interaction design) [5], 10 UI design patterns by Jovanovic [9], and 15 UIWDPs by Kayla Night [6]. The information about patterns in these libraries is available only in informal language. Yingtao et al. [37] presented a reverse engineering approach to extract the presentation layer from web applications. They specify the recovered features in WSDL, and these features can be deployed through proxies that access the original web server and parse its responses. The authors recover website functionality from the presentation layer instead of analyzing all of the source code. Their recovery process consists of five components: page collector, pattern miner, pattern visualizer, translator, and service interface editor. Amalfitano et al. [34] presented a tool for automatic reverse engineering of dynamic web applications using source transformation technology. The authors extract UML sequence diagrams from the execution traces generated by the resulting instrumentation; the results can be imported and visualized in any UML 2.1 toolset, such as Rational Software Architect. The authors filter execution traces directly on information stored in a database, which automatically eliminates redundant information that would complicate understanding. The approach is limited to reverse engineering of PHP-based web applications. Bouchiha et al. [15] proposed a semi-automatic approach for re-engineering multi-language WAs that recovers the conceptual model using syntactic and semantic information hidden in legacy web applications (LWAs). Their framework has two steps: web application reverse engineering and semantic web service (SWS) forward engineering. The reverse engineering step extracts class and activity diagrams, and the forward engineering step generates WSDL and WSMO semantic descriptions. In contrast to our approach, the authors do not focus on the features of UI patterns that are necessary for pattern recovery, maintenance, and re-engineering of LWAs [23].
Staiger [55] presented an approach for reverse engineering GUI components of applications using static analysis. He extracted control flow graphs of the examined applications, mapped source code constructs to GUI components, and detected relationships between GUI components through event handlers and their callers. He applied the Bauhaus tool to extract GUI components from C/C++ source code. Norizan et al. [16] performed a survey to extract a list of user interface design patterns and their impact on the usability of WAs. Using a checklist-based survey, they tried to identify groups of UIWDPs that are used together and their impact on the quantitative and qualitative performance and usability of WAs. In contrast, our approach recovers single UI patterns or groups of UI patterns from LWAs on the basis of the patterns' features.
A number of approaches [10]-[13], [35], [36] recovered UML models for comprehending the behavioral, structural, and relational aspects of WAs, but these approaches did not focus on recovering UIWDPs and their occurrences in WAs. Marchetto et al. [14] proposed the ReAjax tool to reverse engineer Ajax-based applications only. Echeverría et al. [17] used MoDisco to reverse engineer only Struts-based WAs. Martin et al. [2] proposed reusing the software engineering tool Rigi to analyze and visualize the structure of web applications. Chung and Lee [8] proposed an approach for reverse engineering websites and adopted Conallen's UML extensions [53] to describe their architecture. Rasool et al. [32] presented an approach to recover general artifacts from legacy software applications based on abstract regular expressions. Draheim et al. [56] presented a tool that constructs an analysis of a website based on the concept of form-oriented analysis. Bouchiha et al. [57] presented an ontology-based approach for reverse engineering web applications. Boldyreff et al. [18] proposed a system that exploits traditional reverse engineering techniques to extract duplicated content and styles from websites in order to restructure them and improve their maintainability. The approaches in [17]-[20], [24], [25] used Eclipse and the WARE [21] tool to create intermediate representations of LWAs. They integrate reverse engineering and model-driven engineering to extract conceptual models of LWAs and user interaction artifacts from web applications. None of these approaches addresses the locations and occurrences of UI patterns in the code, which are necessary for maintenance, bug correction, and abstraction. Key features of the different approaches are summarized in Table I.
Table I shows that most reverse engineering approaches for recovering information from web applications start by selecting a model or modeling language that generates an intermediate format of the legacy web application. These intermediate formats may be used by one or more tools to recover different artifacts from the source code. Some approaches use transformation techniques, queries, and algorithms to recover different types of artifacts and their relationships from web applications. Many recovery tools accept models as input and recover different types of information from the examined applications. Web models act as the main source for comprehensive reverse engineering of legacy web applications.

III. SEMIFORMAL SPECIFICATION OF UIWDPS
Our approach uses the list of UIWDPs proposed by Kayla Night [6]. We found that the patterns in this list are generic and cover most of the UIWDPs included in other lists. We use graphical items, lexical items, and feature types to specify patterns. Graphical items are specified with the help of HTML tags such as <form> and <div>. Lexical items are captions or labels that describe graphical items; for example, in a user login, "user name" and "password" are lexical items that describe the corresponding input fields. We use the concept of feature types presented in our previous work [50] to specify all 15 UIWDPs. The feature types in this paper differ from those used to specify the GoF (Gang of Four) design patterns in that work: they are based on the graphical and lexical properties of patterns and refer to the characteristics of patterns that developers implement with different tags. The value and quality of our specifications depend on appropriately selecting the feature types that web designers use, in specific places and in specific sequences, to implement UIWDPs in web applications. We identified the 55 feature types presented in Table II to specify all 15 UIWDPs and their variants. We do not claim that these 55 feature types can specify the user interface design patterns of other lists. We select four UIWDPs (Login, Navigation Bar, Breadcrumbs, Lazy Registration) to demonstrate our approach with their intent, visual diagrams, and specifications in the following subsections. The specifications of the remaining patterns are available on our web source [29].
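A minimal Python sketch of how graphical and lexical feature types could be represented and checked, assuming each feature type is expressed as a single regular expression. The FeatureType class, the feature names, and their expressions are hypothetical; the paper's 55 feature types are defined in Table II. Graphical items are matched on tags and attributes, while lexical items are matched only in text content between tags, so an attribute value such as type="password" is not mistaken for a visible "Password" label.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureType:
    name: str
    kind: str          # "graphical" (tag-based) or "lexical" (caption/label text)
    regex: re.Pattern


def feature(name, kind, pattern):
    return FeatureType(name, kind, re.compile(pattern, re.IGNORECASE))


# Hypothetical feature types for illustration only; they are not the 55 of Table II.
FEATURES = [
    feature("form_tag", "graphical", r"<form\b"),
    feature("password_input", "graphical", r"<input\b[^>]*\btype\s*=\s*[\"']?password\b"),
    # Lexical items must occur in text content: after a ">" and before the next "<".
    feature("username_label", "lexical", r">[^<]*\buser\s*name\b"),
    feature("password_label", "lexical", r">[^<]*\bpassword\b"),
]


def features_present(source):
    """Return the names of the feature types that occur anywhere in `source`."""
    return {f.name for f in FEATURES if f.regex.search(source)}
```

A bare password field without a visible caption yields only the graphical feature, which is how a specification can distinguish a complete login form from an isolated input.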

A. Login Pattern
The Login pattern is very common in web applications. Its intent and specification are given below.
Intent: A login pattern [26] is required when users need to identify themselves, either to gain access to a restricted area or to experience a more personalized user interface based on previously provided information. Fig. 1 presents a screenshot of a Login pattern.
Specification: The specification of the Login pattern as feature types and in diagrammatic notation is presented in Table III and Fig. 2. We consider the variations developers use when implementing UIWDPs: the Login pattern may have the features (F11, F12, F13, F14, F15) or (F11, F12, F13, F14, F15, F16).
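The two Login variants can be checked with simple set containment, assuming a preceding step has already determined which feature types occur in a page. The helper best_variant is hypothetical, and it ignores the order of features, an assumption that does not hold for sequence-sensitive patterns such as Breadcrumbs. Because the second variant extends the first with F16, a page containing F16 satisfies both, so the sketch reports only the largest satisfied variant.

```python
# The two Login variants named in the text, as sets of feature-type IDs (Table III).
LOGIN_VARIANTS = [
    frozenset({"F11", "F12", "F13", "F14", "F15"}),
    frozenset({"F11", "F12", "F13", "F14", "F15", "F16"}),
]


def best_variant(detected, variants):
    """Index of the largest variant whose feature types all occur in `detected`, else None."""
    candidates = [i for i, v in enumerate(variants) if v <= detected]
    return max(candidates, key=lambda i: len(variants[i]), default=None)


print(best_variant({"F11", "F12", "F13", "F14", "F15", "F16"}, LOGIN_VARIANTS))  # 1
print(best_variant({"F11", "F12", "F14", "F15"}, LOGIN_VARIANTS))  # None
```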

B. Navigation Bar Patterns
The Navigation Bar pattern has become a key feature of almost all web applications. Its intent and specification are given below.
Intent: The Navigation Bar pattern is used when users need to locate the content and features necessary to accomplish a task. A vertical or horizontal navigation bar is a common layout that emphasizes the vertical or horizontal orientation, respectively.

C. Breadcrumbs Pattern
The Breadcrumbs pattern is mostly used when a website follows a hierarchical structure. Its intent and specification are given below.
Intent: Web users need to track their complete browsing path from the home page to their current location so that they can switch back to a higher level in the hierarchy [3]. Fig. 5 presents a screenshot of the Breadcrumbs pattern.
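Because a breadcrumb trail is an ordered structure, a single regular expression can capture it. The sketch below assumes a trail is marked up as two or more links joined by a separator such as &gt; or », optionally followed by plain text naming the current page; the separators and the SEP, LINK, and BREADCRUMB expressions are hypothetical illustrations, not the features of Table V. The same links without separators, as in a plain menu, do not match, which is why the order of features matters for this pattern.

```python
import re

# Hypothetical separators between breadcrumb items.
SEP = r"\s*(?:&gt;|&raquo;|»|/)\s*"
# One hyperlink with plain link text.
LINK = r"<a\b[^>]*>[^<]*</a>"
# Two or more links joined by separators, optionally ending in the current page's name.
BREADCRUMB = re.compile(rf"{LINK}(?:{SEP}{LINK})+(?:{SEP}[^<]+)?", re.IGNORECASE)

trail = '<a href="/">Home</a> &gt; <a href="/docs">Docs</a> &gt; Regex'
menu = '<a href="/">Home</a><a href="/about">About</a>'
print(bool(BREADCRUMB.search(trail)))  # True
print(bool(BREADCRUMB.search(menu)))   # False
```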

D. Lazy Registration Pattern
The Lazy Registration pattern lets users browse a website without formal registration. Its intent and specification are given below.
Intent: Signup forms have long worked against the casual visitor. During the process of discovery, nobody wants to stop and fill out details before they can "unlock" the rest of the site's potential. As web users become more fickle, signup forms become an increasingly large barrier that repels many prospective visitors from otherwise great sites. Lazy registration is a newer signup approach that makes it much easier for visitors to interact with the site and increases signups. Fig. 7 presents a screenshot of the Lazy Registration pattern.

Specification: The features of the Lazy Registration pattern are given in Table VI, and its diagrammatic specification is presented in Fig. 8.

IV. DETECTION APPROACH
We present a lightweight and customizable approach for recognizing UIWDPs in WAs. Our approach handles variations in patterns by customizing the specifications and regular expression patterns. Many regular expression tools are available; we selected PowerGREP [33] because of its features and free availability. It can match patterns across large numbers of files and folders in multiple formats, and it can apply several patterns at the same time. The state-of-the-art approaches [16], [21], [23], [31] discussed in Section II only indicate the presence or absence of patterns, using SQL, checklist surveys, and descriptive analysis of WAs. The proposed approach takes the semiformal specifications of patterns, the source code, and regular expression patterns as input and recognizes patterns with detailed location information. In the future, we plan to automate the writing of regular expression patterns directly from the specifications. The architecture of the approach is given in Fig. 9.
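The detection step can be approximated in a few lines of Python as a stand-in for a grep tool; this is not PowerGREP and does not reproduce its interface. The PATTERNS table, the EXTENSIONS set, and the functions scan_text, scan, and matches_per_file are hypothetical. Each file is matched as a whole, so a pattern may span several lines, and the line number is recovered from the match offset; the output carries the same kind of location information (file path, line number, matches per file) that the approach reports.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical patterns keyed by UIWDP name; the paper's own are listed in Table VII.
PATTERNS = {
    "Login": re.compile(r"<input\b[^>]*\btype\s*=\s*[\"']?password\b", re.IGNORECASE),
    "Breadcrumbs": re.compile(r"</a>\s*(?:&gt;|&raquo;|»)\s*<a\b", re.IGNORECASE),
}
# Hypothetical set of file types to scan.
EXTENSIONS = {".html", ".htm", ".php", ".asp", ".aspx", ".jsp", ".js"}


def scan_text(path, text):
    """Yield (pattern name, path, line number, matched text) for each match in one file."""
    for name, regex in PATTERNS.items():
        for m in regex.finditer(text):
            # Match the whole file so patterns may span lines; derive the line from the offset.
            line = text.count("\n", 0, m.start()) + 1
            yield name, path, line, m.group(0)


def scan(root):
    """Scan every file below `root` whose extension is in EXTENSIONS."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            text = path.read_text(encoding="utf-8", errors="replace")
            yield from scan_text(str(path), text)


def matches_per_file(hits):
    """Count matches per (pattern name, file path) pair."""
    return Counter((name, path) for name, path, _line, _text in hits)
```

For example, matches_per_file(scan("path/to/webapp")) would summarize all matches below a (hypothetical) application directory; adding a pattern variant only means adding or editing an entry in PATTERNS.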

A. Automated Experiment
To validate our semiformal specifications and regular expression patterns, we performed experiments on 75 websites using the PowerGREP [33] tool and recovered instances of the 15 selected patterns. PowerGREP can parse source code effectively using regular expressions. The recovered results are presented in Table VIII.

B. Manual Experiment
Manual analysis of source code is a very time-consuming and daunting task, but it was important for us to validate our semiformal specifications of UIWDPs against the source code manually. To recover a pattern manually, we searched folders, subfolders, and files line by line; this was very time consuming, and it was difficult to mark the start and end points of patterns. Handling multiple occurrences of patterns and finding their precise locations in the code were also very hard.
There is a slight difference between the manual and automated results due to several factors. A major reason is that web designers implement UIWDPs differently according to their skills, the nature of their applications, and the frameworks they use. Manual detection depends on personal experience and understanding of multi-language source code, whereas our approach recovers patterns automatically by following the semiformal specifications and regular expression patterns.
The difference between the automated and manual results, together with their standard deviation (SD), is shown in Fig. 10.
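Table VIII reports a standard deviation (SD) next to the manual (M) and automated (A) counts, but the text does not state what the SD is computed over. One plausible reading, assumed here, is the sample standard deviation of the pair (M, A) for each pattern, which for two values reduces to |M - A| / sqrt(2). The function name and the counts below are hypothetical.

```python
from statistics import stdev


def manual_vs_automated_sd(manual, automated):
    """Sample standard deviation of the two counts (an assumed reading of Table VIII's SD)."""
    return stdev([manual, automated])


# Hypothetical counts: 40 instances found manually, 38 found automatically.
print(round(manual_vs_automated_sd(40, 38), 3))  # 1.414
```

Under this reading, identical manual and automated counts give an SD of zero, and the SD grows linearly with their difference.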

C. Accuracy of Approach
In the general statistical sense, accuracy indicates how close calculations or estimates are to the exact results. To measure the accuracy of our approach, we compute the precision, recall, and F-score of the recovered results.
Information retrieval techniques can be evaluated using the precision and recall metrics. These metrics are popular for evaluating pattern extraction approaches and have been used to evaluate the quality of different systems for decades. They measure how many of the retrieved patterns are relevant and how many of the relevant patterns are retrieved [51]. The accuracy of an approach can be assessed from the relationship between precision and recall; ideally, precision should remain high as recall increases [52]. The parameters used to calculate precision and recall for pattern recovery techniques are given in Table IX.

Precision: the fraction of retrieved pattern instances that are relevant:
$$P = \frac{TP}{TP + FP}$$
Recall: the fraction of relevant pattern instances that are retrieved:
$$R = \frac{TP}{TP + FN}$$
F-Score: precision and recall together measure the accuracy of pattern recovery approaches effectively, but it is useful to combine them into a single value. Peterson et al. [30] addressed this need with a standard combined measure, the F-score, defined as
$$F_w = \frac{(1 + W^2)\,P\,R}{W^2 P + R}$$
where the weight W is a constant (W = 2.28). When both precision and recall are high, the F-score is also high. The precision and recall of the manual and automated experimental results are given in Table VIII and presented in Fig. 11. The average precision, recall, and F-score over all 15 UIWDPs detected in the 75 web applications are 98%, 93%, and 94%, respectively.
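The three measures can be computed directly from the TP, FP, and FN counts. In this sketch the function names and the example counts are hypothetical, and w defaults to the constant W = 2.28 used above; because W > 1, recall is weighted more heavily than precision. Plugging the reported average precision (0.98) and recall (0.93) into F_w gives about 0.94, consistent with the reported average F-score, although averaging per-pattern F-scores need not yield exactly the same value.

```python
def precision(tp, fp):
    """Fraction of retrieved pattern instances that are relevant: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of relevant pattern instances that are retrieved: TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(p, r, w=2.28):
    """Weighted F-score F_w = (1 + w^2) * P * R / (w^2 * P + R)."""
    w2 = w * w
    return (1 + w2) * p * r / (w2 * p + r)


print(round(f_score(0.98, 0.93), 2))  # 0.94
```

When precision and recall are equal, F_w equals that common value regardless of w.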

D. Validity Threats
Validity is an important concern for the empirical acceptance of results. A major threat to our results is the lack of standard definitions for UIWDPs and their variants; to the best of our knowledge, such definitions are not available in the literature. Internal validity refers to the consistency of measurements across all methods and tools and is affected by experimental bias. We tried to mitigate this threat by also performing the experiments manually on the source code of the 75 selected web applications. All experimental results of our approach are presented in the paper, so researchers can validate them. External validity is affected by the generalization of results. We specified all 15 UIWDPs using semiformal specifications, and these specifications are available on the web for the community. The source code of the 75 web applications is also freely available. One possible threat to the external validity of our results is the use of different constructs to implement UIWDPs. Reliability validity concerns the replicability of our results. All selected web applications and their source code are available on the web for validation. The reliability threat is further mitigated because our regular expression patterns can be validated with the various regular expression tools available on the web.

VI. CONCLUSION
Recovering UIWDPs from WAs is challenging because new technologies are continually applied to develop web applications. Information recovered from web applications is valuable for maintenance, comprehension, refactoring, reuse, and re-engineering. A number of techniques and tools have been presented for recovering information from web applications, but they cannot fully deal with the heterogeneous nature of web applications. Recovery is difficult because of the number of technologies and external dependencies involved. State-of-the-art approaches focus on extracting UML models from WAs. In this paper, we present an approach that recovers UIWDPs from LWAs together with their necessary attributes, such as file name, line number, and number of matches per file. The deviation between the automated results of our approach and the manual results shows that the community has no consensus on the definitions of UIWDPs; implementation variations are another cause of dispersion in the results. Our approach partially handles multi-language source code when recovering patterns directly from it. The recovered UI pattern information can be used for maintenance, abstraction, comprehension, upgrading, migration of applications from one framework to another, and re-engineering of LWAs. In the future, we plan to extend our automatic recognition of UIWDPs to other UI pattern libraries, such as the Yahoo pattern library and the Welie pattern library.

Fig. 3. Navigation pattern.
Specification: The features of the Navigation Bar pattern are given in Table IV, and its diagrammatic specification is presented in Fig. 4. An asterisk (*) means that a feature type or group of feature types can repeat. The first variant of the Navigation Bar pattern has the repeating features (F3, F5, F4)*, as shown in Fig. 4. The features of the other variants of the Navigation Bar pattern are given in Table IV.
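A repeating group such as (F3, F5, F4)* can itself be checked with a regular expression over the sequence of feature types found in a page. The mapping below from tags to F3, F5, and F4 is a hypothetical stand-in (the actual definitions are in the paper's tables), as are the function names. The sketch also assumes that at least two repetitions are required, since a single link is not a navigation bar, even though "*" formally allows zero.

```python
import re

# Hypothetical mapping from tags to feature-type IDs; the real F3, F4, F5 are defined in the paper.
TOKEN_RULES = [
    ("F3", re.compile(r"<li\b[^>]*>", re.IGNORECASE)),         # list item opens
    ("F5", re.compile(r"<a\b[^>]*>[^<]*</a>", re.IGNORECASE)),  # hyperlink
    ("F4", re.compile(r"</li\s*>", re.IGNORECASE)),            # list item closes
]

# (F3, F5, F4) repeated; assumption: at least two items make a navigation bar.
NAV_BAR = re.compile(r"\b(?:F3 F5 F4\b\s*){2,}")


def feature_sequence(html):
    """Return the feature-type IDs found in `html`, in document order."""
    hits = []
    for fid, regex in TOKEN_RULES:
        hits.extend((m.start(), fid) for m in regex.finditer(html))
    return [fid for _pos, fid in sorted(hits)]


def has_navigation_bar(html):
    return NAV_BAR.search(" ".join(feature_sequence(html))) is not None


menu = '<ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul>'
print(feature_sequence(menu))    # ['F3', 'F5', 'F4', 'F3', 'F5', 'F4']
print(has_navigation_bar(menu))  # True
```

Matching on the feature-type sequence rather than on raw markup keeps the repetition rule independent of how each individual feature is written in the source.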

Fig. 5. Breadcrumbs pattern.
Specification: The features of the Breadcrumbs pattern are given in Table V, and its diagrammatic specification is presented in Fig. 6. The sequence of features is important for the specification and recovery of Breadcrumbs. The standard code contains the following tags:

TABLE I.
COMPARATIVE OVERVIEW OF FEATURES OF WEB RECOVERY APPROACHES

TABLE II

TABLE III.
FEATURES OF LOGIN PATTERN

TABLE IV.
FEATURES OF NAVIGATION BAR PATTERN
*: Means that a feature type can repeat

TABLE V.
FEATURES OF BREADCRUMBS PATTERN

TABLE VI.
FEATURES OF LAZY REGISTRATION PATTERN

TABLE VII.
REGULAR EXPRESSION PATTERNS

TABLE VIII.
EVALUATION RESULTS
SD: Standard Deviation, TP: True Positives, FP: False Positives, FN: False Negatives, Pr: Precision, Rc: Recall, Fs: F-Score, Tot: Total Files, Imp: Files in which pattern is implemented, M: Manual Result, A: Automated Result, TM: Total Manual Instances, TA: Total Automated Instances

TABLE IX.
PRECISION AND RECALL