Code Readability Management of High-level Programming Languages: A Comparative Study

Quality is never an accident, and software engineers therefore pay close attention to producing quality software products. Source code readability is one of the important factors in producing quality software. Code readability is an internal quality attribute that directly affects the future maintenance of the software and the reusability of the same code in other similar projects. The literature shows that readability depends not only on a programmer's ability to write tidy code but also on the programming language's syntax. Syntax is the most visible part of any programming language and directly influences the readability of its code. If readability is a major factor for a given project, the programmers should know which language to choose to achieve the required level of quality. To this end we compare the readability of three of the most popular high-level programming languages: Java, C#, and C++. We propose a comprehensive framework for readability comparison among these languages. The comparison is performed on the basis of readability parameters referenced in the literature. We have also implemented an analysis tool and performed extensive experiments. Furthermore, to judge the effectiveness of the results, we have performed statistical analysis using SPSS (Statistical Package for the Social Sciences), applying Spearman's correlation and the Mann-Whitney test. The results show that, among the three languages, Java has the most readable code. Programmers should consider Java for projects in which code readability is a significant quality requirement.
Keywords: Source code; high-level programming languages; Java; C++; C#; code readability; code readability index


I. INTRODUCTION
Software engineering differs in nature from other engineering domains. In other domains, products may remain in use even if they have some imperfections. A software product, however, may go through several revisions even after development is completed, until it becomes free of faults; otherwise the customer may not accept and use it. Customers these days are well informed and want to know what goes on inside the software and what affects its future maintenance and cost.
Software goes through several updates after the first version for a number of reasons: a required feature was not implemented, a feature was incorrectly implemented, or a new feature is now required. This is known as maintenance, and research shows that around 70% of the product cost is spent on maintenance [2], as shown in Fig. 1. Software engineers need to ensure that the software they produce is easy to maintain. Many factors affect software maintainability, and source code readability is one of them. Readability is how quickly a reader can read and understand a written text [12]. Elements that make text difficult to read and understand include long lines, insufficient contrast, and long paragraphs with no segmentation.
In a software product, readability means the ability to read the documentation and the source code [10]. The documentation serves as the means of communication among the stakeholders, but research shows that agile teams focus on working software rather than documentation when communicating with clients [10]. A collection of computer instructions written in a high-level programming language is called source code. Source code is the significant part of software readability in terms of reusability, cost, maintenance, and robustness. The software industry struggles to minimize software development cost, which is affected by many factors. Researchers are trying to identify those factors and find ways to eliminate them, or at least reduce their impact, in order to reduce the overall cost. According to Collar et al. [11], improved readability saves developers' time while reading the code, which eventually helps bring down the overall development cost. Readability is important not only during development, to improve software quality [1], but also during maintenance, because reading the code is the first stage of maintenance [3]. Research also shows that the maintainability of software is measured by the readability and understandability of its code [12].
Code readability becomes a major concern if, for a given project, the project manager foresees that a large number of programmers will be required, that programmers will be geographically distributed, that programmers will change over time, that new programmers will be hired, or that customers will change the requirements. Code readability is generally calculated as the proportion between the number of lines and the number of comments written for the programmer. The project manager should select a programming language that is not only suitable for the project's functional requirements but also offers the required level of readability. This selection is vital because a correct selection will positively affect the quality of the software.
In this research we have conducted a comparative study of the readability of high-level programming languages. We have chosen Java, C++, and C# for this purpose. According to the TIOBE programming community index [15], Java, C++, and C# are among the top five high-level programming languages. Because these languages are so widely used, we have computed their readability values. We first devised a comprehensive framework and then used it for the analysis. The analysis is three-fold: we have used general text readability indexes and code readability indexes, and we have also included expert opinions. The end results clearly show that Java is the most readable of the three languages.
The rest of the paper is organized as follows. Section II presents a brief literature review of existing text readability assessment techniques. Section III covers the techniques proposed for code readability analysis. In Section IV, we present our novel framework for comparative analysis among programming languages. Section V presents the experiment details and results. We analyze the results using statistical techniques in Section VI. Finally, we conclude the discussion in Section VII and outline future directions in Section VIII.

II. LITERATURE REVIEW
In this section, we present a literature review of the readability metrics used to assess natural languages. Readability tests not only determine readability but also predict reading ease. Most of the tests are language neutral, but some of them are used only for certain languages. We have used four natural language metrics for code readability assessment, chosen on the basis of their popularity; they are described in this section along with some others.

A. Coleman-Liau Index
Coleman-Liau is a readability index similar to the automated readability index (ARI) [16] but different from other indexes used to estimate the readability of text. The index was developed by Coleman and Liau and is described by Pahal et al. [3]. It considers letters per word rather than the text as a whole. It was designed to calculate readability mechanically from samples of hard-copy text. It does not require counting the syllables in words; it relies only on word length in characters. The Coleman-Liau index, in its standard published form, is given by:
$$\text{CLI} = 0.0588\,L - 0.296\,S - 15.8$$
In the above equation, "L" is the average number of letters per 100 words, whereas "S" is the average number of sentences per 100 words.
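The Coleman-Liau computation can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in this study: the function name `coleman_liau_index`, the word pattern, and the rule that a sentence ends at '.', '!' or '?' are all assumptions made for the example.

```python
import re


def coleman_liau_index(text: str) -> float:
    """Coleman-Liau index in its standard form: 0.0588*L - 0.296*S - 15.8."""
    # Assumption: a word is a run of letters or digits.
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(len(w) for w in words)
    letters_per_100 = letters / len(words) * 100           # L
    sentences_per_100 = len(sentences) / len(words) * 100  # S
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

Because the index needs only character and sentence counts, it can be applied mechanically to any text, which is what makes it easy to automate.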

B. SMOG
SMOG stands for "Simple Measure of Gobbledygook". McLaughlin [14] created this index in 1969 in the article "SMOG Grading". It estimates the years of education a person needs to understand a text. SMOG is reported to provide more accurate results than other readability metrics. The SMOG metric, in its standard published form, is calculated with the following formula:
$$\text{SMOG} = 1.0430 \sqrt{\text{polysyllables} \times \frac{30}{\text{sentences}}} + 3.1291$$
where polysyllables is the number of words with three or more syllables.

C. The Flesch-Kincaid Index
The Flesch-Kincaid [17] index is an improved version of the Flesch Reading Ease readability formula [3]. It checks the reading ease of the given text. On the reading-ease scale, a high value means that the text is easy to read, and a low value means that the text is difficult to read. The grade level, in its standard published form, is calculated with the following formula:
$$\text{FKG} = 0.39 \frac{\text{words}}{\text{sentences}} + 11.8 \frac{\text{syllables}}{\text{words}} - 15.59$$
Shorter sentences and words give the best results. On the reading-ease scale, a score between 60 and 69 is considered average readability, while a score between 0 and 29 is considered confusing for the reader. The complete list of values and their interpretations is provided in Table I.
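As a rough sketch, the SMOG grade, the Flesch-Kincaid grade level, and the Flesch Reading Ease score can be computed from raw counts as follows. The constants are those of the standard published formulas; the function names and the choice to pass counts rather than text are assumptions made for this example, since counting syllables requires a language-specific heuristic.

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade; polysyllables = words with three or more syllables."""
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: a higher value means harder text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: a higher value means easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

Note that the two Flesch scales run in opposite directions: the grade level rises with difficulty, whereas the reading-ease score falls.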

D. The Gunning Fog Index
Gunning [18] proposed this index, which is also known as the FOG index for short. In its standard published form it is calculated with the following formula:
$$\text{FOG} = 0.4 \times (\text{ASL} + \text{PHW})$$
The average sentence length (ASL) is added to the percentage of hard words (PHW), that is, words with three or more syllables. ASL is the ratio of the word count to the total number of sentences. The ideal FOG score is 7 or 8; text with a score higher than 12 is considered hard to read.
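A minimal sketch of the FOG computation is shown below. The syllable counter is a crude heuristic (one syllable per group of vowels) and, like the function names and the sentence-splitting rule, is an assumption made for illustration only; it miscounts words with silent vowels.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic (an assumption): one syllable per run of vowels, at least one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """FOG = 0.4 * (ASL + PHW) in the standard published form."""
    words = re.findall(r"[A-Za-z]+", text)
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    asl = len(words) / len(sentences)  # average sentence length
    hard = sum(1 for w in words if count_syllables(w) >= 3)
    phw = 100 * hard / len(words)      # percentage of hard words
    return 0.4 * (asl + phw)
```

In this sketch every word that the heuristic rates at three or more syllables counts as hard; the original method also excludes some word classes, which is omitted here for brevity.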

E. The Automated Readability Index (ARI)
Senter [19] designed the automated readability index (ARI) test to assess the understandability of text. ARI is based on word difficulty and sentence length. It calculates a readability value that is then compared with a grade level. The formula of ARI, in its standard published form, is:
$$\text{ARI} = 4.71 \frac{\text{characters}}{\text{words}} + 0.5 \frac{\text{words}}{\text{sentences}} - 21.43$$
Characters is the number of letters and digits, words is the number of words (delimited by spaces), and sentences is the number of sentences.
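The ARI formula can be sketched as follows. The helper `ari_for_text` and its counting rules (alphanumeric characters, whitespace-delimited words, sentences ending at '.', '!' or '?') are assumptions for illustration and not necessarily the rules applied in this study.

```python
import re


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI from raw counts, using the standard published constants."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def ari_for_text(text: str) -> float:
    """Apply ARI to a string using simple, assumed counting rules."""
    # Characters: letters and digits only.
    characters = sum(1 for ch in text if ch.isalnum())
    # Words: whitespace-delimited tokens.
    words = len(text.split())
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    return automated_readability_index(characters, words, sentences)
```

Unlike SMOG, FOG, and Flesch-Kincaid, ARI needs no syllable counts, so it is the simplest of these indexes to compute automatically.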

III. CODE READABILITY INDEXES
The most important parameter of maintainable software is readability, because changes in the system are made through the source code [3]. Less readable source code is harder to maintain than readable code. Managers often reject code because it lacks readability. In this section we present some code readability indexes that we find in the literature.

A. Deepa and Dua (2015)
Deepa and Dua [4] explain that readability depends upon simple sequences and that unnecessary loops complicate the program. In their paper, code readability is calculated on the basis of software developers' judgment. The authors use two copies of the same program for their study. The first copy is less readable because proper indentation was not applied, whereas the second copy was well formatted using a beautifier tool. The authors also propose a new metric for readability assessment. They perform experiments using this metric and find that the program formatted with the beautifier is more readable than the other one. The parameters of their metric include lines of code, line length, number of comment lines, number of blank lines, number of lines after a semicolon, number of spaces after a directive statement, and number of methods.

B. Tashtoush (2013)
Tashtoush [5] develops an approach called "impact of programming features on code readability" (IPFCR). In this approach the author studies the impact of various features on code readability. For evaluation he uses a code readability tool (CRT). The author surveys a random sample of expert programmers to assess the level of impact. Twenty-five readability features are proposed for the survey, including meaningful names, comments, spacing, indents, short scope, line length distribution, identifier name length, arithmetic formulas, identifier frequency, if-else, nested if, switch, for loop, do-while loop, and nested loop [5]. The programmers classified the features as positive or negative factors based on their understandability. The results are evaluated using the SPSS statistical tool, and an ANOVA test is used to remove bias from the data. The top three features in the survey were meaningful names, consistency, and comments. The lowest-impact features were nested loops, arithmetic formulas, and recursive functions. Some features have a neutral impact on readability.

C. Sivaprakasam and Sangeetha (2012)
Sivaprakasam and Sangeetha [7] have conducted a study showing that readability has a global effect on the software budget. In their paper the authors define the relationship between software quality and source code readability. Software metrics are mostly used to measure the complexity of software. The authors have developed an automated readability tool, which they report to be 80% more effective than human judgment. They have performed extensive experiments to evaluate the readability of code, for which they selected code snippets from developed projects. The size of the snippets is important because snippets that are too small may produce incorrect or misleading scores. The scores range from 1 to 5, where 5 means most readable and 1 means least readable. The authors have ensured that all the snippets exhibit certain features, including line length, number of characters, identifier length, indentation, loops, and many others. This technique is useful for computing a readability index over a large number of experiments.

D. Relf (2004)
Relf [8] examines whether identifier naming standards that improve code readability are acceptable to software professionals. The author claims that naming standards affect source code readability, which in turn greatly affects code maintainability. To examine the impact of naming standards, the author collects 21 naming standards from the research literature. These include multiple underscore characters, outside underscore character, numeric digits, naming convention anomaly, identifier encoding, short identifier name, long identifier name, number of words, class qualification, abstract words, constant qualification, numeric identifier name, and some others. The author analyzes code written in the Ada and Java programming languages and rates these programs from 1 to 5 on the basis of the naming standards used (1 is strong acceptance and 5 is strong rejection). The study also states that expert programmers accept the naming standards more readily than beginners.

E. DeYoung, Kampen, Topolski (1992)
An automated readability measure is useful for developers during coding because it continuously assesses their code and helps them improve it. DeYoung et al. [9] examine machine-computable and human-judged program features. They identify that features such as the length of identifiers are very useful in predicting code readability. Analyzer-generated measures together with quality of comments, logicality of control flow, and meaningfulness of identifier names are studied to find out whether these predictors are worthwhile for readability estimation [9]. The proposed predictors increase the proportion of readability judgments accounted for from 41% to 72%. The authors also claim that adding logicality of control flow as a predictor produces better results than human judgment, although these predictors are expensive to obtain.

F. Buse and Weimer (2008)
Buse and Weimer [2] perform a detailed empirical study to calculate the readability of code. For this they chose 100 snippets and around 120 annotators to grade them. The biggest issue in this research is that the authors used 19 parameters that are difficult to calculate: line length, identifiers, identifier length, indentation, keywords, numbers, comments, periods, commas, spaces, parentheses, arithmetic operators, comparison operators, assignments, branches, loops, blank lines, occurrences of any character, and occurrences of any single identifier. From these parameters the authors constructed an automated readability measure and report that it is 80% more effective than human judgment. Furthermore, they discuss how readability has the potential to improve programming language design with respect to software quality. The authors also suggest decreasing the number of parameters used for readability analysis and set this as future work.

G. Relf (2005)
Relf [6] describes a practical study of whether a coder improves the readability of his or her programs when supported by a source code editor that provides dynamic feedback on identifier naming practices. Software coders should adopt a standard for the software interface to gain these benefits. The paper is useful for both student and professional coders in maintaining code and is significant for the improvement of code readability. The author uses only one parameter for code readability: identifier naming practices.

H. Posnett, Hindle, and Devanbu (2011)
Posnett et al. [13] propose using entropy in a predictive modeling approach. The authors study whether the size of the code affects its readability. They use six parameters: mathematical equations, average number of comments, maximum indentation, maximum word length, maximum line length, and maximum character occurrence in the code snippets. The authors also use Halstead's metrics to determine the effect of code size on mean readability. Halstead's metrics are formulated by combining the total numbers of operators and operands. To measure entropy, the total number of tokens and the number of unique tokens are counted. The token entropy model improves prediction performance, but byte entropy does not improve the prediction.
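The entropy measure described above can be illustrated with a short sketch based on Shannon entropy over token frequencies. The tokenizer pattern and the function names are assumptions made for this example; the cited work may tokenize source code differently.

```python
import math
import re
from collections import Counter


def shannon_entropy(items) -> float:
    """Shannon entropy (in bits) of the frequency distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probabilities = [c / total for c in counts.values()]
    # abs() only normalizes the -0.0 produced when all items are identical.
    return abs(-sum(p * math.log2(p) for p in probabilities))


def token_entropy(code: str) -> float:
    # Assumption: tokens are identifiers, integer literals,
    # or single non-space symbols.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return shannon_entropy(tokens)


def byte_entropy(code: str) -> float:
    # Entropy over the raw UTF-8 bytes of the snippet.
    return shannon_entropy(code.encode("utf-8"))
```

A snippet that repeats a few tokens many times has low token entropy, while a snippet in which every token is different has the maximum entropy for its length.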

IV. FRAMEWORK FOR COMPARATIVE ANALYSIS
In this section we present the framework we propose for performing the comparative analysis among the three selected programming languages (Java, C#, and C++).
The main objective of our work is to compare the readability of three of the top five most popular programming languages. In the proposed framework we compare human judgment with the readability indexes ARI (Automated Readability Index), SMOG, FOG, and FKG (Flesch-Kincaid Grade level). The framework is presented in Fig. 2.
To perform the comparison, we first have to find the programming parameters that can affect the readability of source code. For this we select the constructs from the research work of Buse and Weimer [2]. The second step is to compute the effect of these constructs on the readability of the Java, C#, and C++ languages. To calculate the effect, we have selected code snippets in Java, C#, and C++. After the snippets are selected, an online survey is conducted in which expert opinion is obtained for every selected programming construct. The selected snippets are also measured with different readability indexes. The text readability indexes include ARI, SMOG, FOG, and the Flesch-Kincaid Grade level, while the code readability index is based on Halstead's complexity. The effect of each construct on readability is then calculated with these readability indexes.

A. Selection of Readability Parameters
Code readability is normally linked with comments and naming standards, which are regarded as important factors, but other aspects also affect readability. A number of programming constructs are used in coding that make the code possible and easy to build. Of the parameters that we find in the literature [2], we have chosen 14 to conduct this comparative study. Table II presents the list of selected parameters.

B. Selection of Code Snippets
A small section of source code or text is called a code snippet. Snippets are normally defined as functional units of larger programs. In the readability model, we first select code snippets in Java, C++, and C#. Because snippets are small portions of source code, we select small, human-readable pieces of code that are neither too short nor too long. Each snippet contains one of the parameters discussed earlier so that its impact on readability can be checked. Snippets do not include comments, header functions, or blank lines because they are not meaningful. Secondly, the code snippets should be logically clear to the respondent so that he or she can read them easily. Finally, the snippets are given to the annotators (explaining the functionality of the code). The ratings for the code snippets range from 1 to 5, where 4 and 5 mean that the code is more readable, 1 and 2 mean that the code is less readable, and 3 means average. To perform the online survey, we have used Google Forms and Excel sheets. A respondent can choose one rank (1 to 5) for each language.

V. COMPARATIVE ANALYSIS
In this section we perform a detailed comparative analysis using the framework presented in the previous section. First we present the details and results of the survey that we conducted with the help of programmers of different skill levels.

A. Survey
As mentioned earlier, a set of snippets is selected for human judgment in order to estimate readability. In Table II, we have presented the 14 language constructs that we have chosen to compare the readability of the selected programming languages. For every construct we have prepared 6 to 7 pieces of code for all three languages. These are presented to 100 programmers, including IT professionals, programmers, and computer science students, who rank the snippets according to their judgment. Participants have to rank each snippet from 1 to 5, where 1 is less readable and 5 is more readable.
Each snippet contains a parameter that affects code readability, and participants rank the code readability for each parameter. Each participant was given the same questionnaire using Google Forms. To improve visibility, the results of the survey are presented in bar-chart form in Fig. 3.
According to the experts, code snippets written in Java are more readable for almost every selected programming construct. The results also show that C# is more readable for two language constructs: the do-while loop and the for loop.

B. Code Readability Index
We have computed the code readability index for all the selected code snippets against all the selected language constructs using Halstead's metrics. Halstead's metrics, proposed by Maurice Howard Halstead, are used to measure the complexity of a program. They depend upon the actual implementation of the program and are computed from its operators and operands. They can also be used to compute program size, errors, and testing time for C++, C#, and Java code. In order to apply the above-mentioned metrics to the code snippets, we have developed a source code readability tool (SCRT). SCRT calculates the vocabulary, size, volume, effort, errors, testing time, and difficulty of the code for all the programs. The results of these metrics are presented in Table III, Table IV, and Table V for Java, C#, and C++, respectively.
After obtaining the results of Halstead's metrics, we have plotted one of the aspects, "difficulty", as a line chart to compare the results of all three languages. The results in Fig. 4 clearly show that C++ programs are more difficult to read and understand than programs written in Java or C#. Java is the least difficult of the three languages for nearly all the language constructs, except for comparison operators and arithmetic expressions.
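The Halstead measures listed above can be sketched as follows. This is not the SCRT implementation: the class and function names, the tiny keyword set, and the rule that treats keywords and symbols as operators and everything else as operands are assumptions for illustration. The error estimate uses the common V/3000 variant, and multi-character operators such as `==` are split into single symbols for brevity.

```python
import math
import re
from dataclasses import dataclass

# Assumption: a small illustrative keyword set; a real tool needs the full
# keyword and operator tables of each language.
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "return", "int"}


@dataclass
class Halstead:
    n1: int  # distinct operators
    n2: int  # distinct operands
    N1: int  # total operators
    N2: int  # total operands

    @property
    def vocabulary(self) -> int:
        return self.n1 + self.n2

    @property
    def length(self) -> int:
        return self.N1 + self.N2

    @property
    def volume(self) -> float:
        return self.length * math.log2(self.vocabulary)

    @property
    def difficulty(self) -> float:
        # Undefined (ZeroDivisionError) when the code has no operands.
        return (self.n1 / 2) * (self.N2 / self.n2)

    @property
    def effort(self) -> float:
        return self.difficulty * self.volume

    @property
    def time_seconds(self) -> float:
        return self.effort / 18

    @property
    def estimated_errors(self) -> float:
        return self.volume / 3000


def count_tokens(code: str) -> Halstead:
    """Classify tokens with a simplified rule and return the four counts."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[^\sA-Za-z_0-9]", code)
    operators, operands = [], []
    for tok in tokens:
        is_word = tok[0].isalnum() or tok[0] == "_"
        if is_word and tok not in KEYWORDS:
            operands.append(tok)   # identifiers and literals
        else:
            operators.append(tok)  # keywords and symbols
    return Halstead(len(set(operators)), len(set(operands)),
                    len(operators), len(operands))
```

For example, `count_tokens("x = x + 1;")` finds three operators (`=`, `+`, `;`) and three operand occurrences (`x`, `x`, `1`), so the difficulty is (3/2) * (3/2) = 2.25.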

C. Text Readability Index
Next we calculate the readability of the code with several text readability indexes. Many metrics are available for this purpose; among them we have chosen some of the most popular: ARI, SMOG, FOG, and the Flesch-Kincaid Grade level. We have applied them to all the selected code snippets of all three programming languages. The results are presented in Table VI, Table VII, and Table VIII. The results obtained from the text readability indexes are also plotted as a bar chart. Fig. 5 shows the results for all three programming languages against all programming constructs. The results again show that Java code is more readable than C# and C++ code. For some constructs, however, such as comparison operators, arithmetic equations, and scope, C# is more readable according to the text readability indexes.

VI. STATISTICAL ANALYSIS
In this section we present the statistical analysis that we have performed on the results obtained from the experiments. For this we have chosen the T-test, which is used to compare two sample means. In a paired T-test, each observation in one sample is paired with an observation in the other sample; each entity is measured twice, and the results are given in pairs. Table IX presents the arithmetic mean, standard deviation, and standard error of the mean of the Halstead results for all three languages (Java, C#, C++). Table X presents the paired correlation between each language (Java, C#, and C++) and its Halstead index. A correlation of r > 0.50 indicates a strong relationship, and r < 0.50 indicates a weak positive relationship. The statistical results show that the Java programming language is more readable than the other programming languages.
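For readers without access to SPSS, the two statistics can be sketched in plain Python. This is a stand-in for illustration, not the SPSS procedure used in this study; the function names are hypothetical, and only the test statistics are computed, not the p-values.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either sequence is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def paired_t_statistic(x, y):
    """t statistic of the paired t-test on the differences x[i] - y[i]."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
```

Here `x` and `y` would hold, for instance, the per-construct scores of two languages; a Spearman value above 0.50 would then be read as a strong relationship in the sense used above.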

VII. CONCLUSION
Code readability greatly influences the maintenance of software. Because of its importance, we have conducted a comparative study to estimate the readability of code written in the Java, C#, and C++ programming languages. We identify important language constructs that affect code readability and then propose a novel framework to compare code along three different dimensions. First we have performed a survey involving programmers and experts to judge the readability of the code. Then we have applied code readability and text readability indexes to calculate the readability of the same programs. We have computed these indexes using a source code readability tool (SCRT). The experimental results show that Java produces more readable code than C# and C++. The only exceptions are a few language constructs, such as comparison operators and arithmetic operators. We have also statistically analyzed the results using the SPSS tool to verify the effectiveness of the experiments. This analysis also confirms that Java code is more readable than C# and C++ code.

VIII. FUTURE WORK
In the future we plan to extend our analysis to other popular languages, including Python and VB.NET. We also intend to analyze programming languages that are used specifically for mobile application development.