The Codespector Tool for Increasing Source Code Quality

Mariusz JADACH and Bogumiła HNATKOWSKA

3. The Codespector Tool

3.1. The Idea of the Tool

The Codespector tool presented in this paper is an application for analyzing Java source files. The analysis checks whether the code conforms to a given coding standard. The standard is described as a set of rules (a ruleset) stored in a text file called the rules file. The rules are written in a special language, the Code Rules Language (CRL).

Most existing tools for checking conformance to coding standards provide a collection of built-in coding rules, which may be combined into one's own programming style. Many of the tools also allow users to create their own rules [4], [7], [8]. This is a very useful and powerful feature. That is why the basic aim of the proposed tool is to let users express their own coding rules in an easy, consistent and convenient way.

In many of the existing tools, creating custom rules is a rather complicated or tedious task. The common way of defining a user rule is to provide a separate, external class that implements certain methods. This requires knowledge of a programming language and of the API that exposes the analyzed source code. What is worse, a textual configuration file (usually a special XML file) must also be modified to add such a custom rule (the class) to the standard's ruleset, so the syntax and semantics of the file's elements must be known as well. This approach results in “mixing” different languages. Besides, classpaths may have to be set, or the external classes placed in a special folder, for the user's rules to be linked properly.

The CRL language is proposed as a solution. The whole definition of a rule is written and saved in the rules file, and one file contains one ruleset. The idea of CRL is similar to that of the Refine language, described in the previous chapter (the ESQMS tool), but CRL was intended to have a clearer and simpler syntax.

The tool takes two inputs: a rules file written in CRL and the Java source code to be analyzed. The first phase of the analysis is scanning and parsing the source file.

The representation of the analyzed code must be fully compatible with the CRL language. The code structure is represented by a tree of parts and a list of tokens, which are linked to each other. The list of tokens (equivalent to lexemes) is the result of scanning (lexical analysis) the source code. A code part is a higher-level fragment of the code: a syntactic unit produced by the parser (syntax analysis).

During the analysis, every item of the code structure (token or part) is visited exactly once. Each item is a potential triggering event that may trigger checking of the associated rules, so the active ruleset is searched for the corresponding rules. Every rule has exactly one triggering event, which is simply the name of a code fragment type (e.g. keyword, class, method), while many rules may be defined for the same triggering event. The visited code fragment (the triggering event) is the current context of the analysis and is used to check whether the rule requirements are fulfilled.
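The rule lookup described above can be pictured as an index from triggering-event names to rules. The Java sketch below is illustrative only: the `Rule` record and the `RuleIndex` class are hypothetical and are not taken from Codespector's code. They merely show that each rule is registered under exactly one event, while one event may map to several rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal rule: only its name and triggering event matter here.
record Rule(String name, String triggeringEvent) {}

// Indexes a ruleset by triggering event, so that visiting a code fragment
// only considers the rules registered for that fragment's type.
final class RuleIndex {
    private final Map<String, List<Rule>> byEvent = new HashMap<>();

    RuleIndex(List<Rule> ruleset) {
        for (Rule r : ruleset) {
            // Each rule has exactly one triggering event...
            byEvent.computeIfAbsent(r.triggeringEvent(), k -> new ArrayList<>()).add(r);
        }
    }

    // ...but one event (e.g. "if") may trigger many rules.
    List<Rule> rulesFor(String fragmentType) {
        return byEvent.getOrDefault(fragmentType, List.of());
    }
}
```

Looking rules up by event name keeps the per-fragment cost proportional to the number of matching rules rather than to the size of the whole ruleset.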

The main result of the Codespector analysis is the validation report. It contains information about the violations of the coding standard that were found, and it may be inspected by the user.

3.2. The CRL Language

CRL is a declarative, specialized language used to describe a ruleset in a clear and easy way. One ruleset is contained in one ASCII text file, which may be created or edited by the user. The structure of this file is strictly defined and is briefly presented below.

Descriptive information about the ruleset is given at the beginning of the file as the values of five properties, written as name=value pairs: the name of the ruleset, the name of its author, the file version, the compatible tool version and a textual description of the file content.
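As a rough illustration of the header format, the following sketch extracts the `name = "value";` pairs from the header part of a rules file. It is a hypothetical helper, not Codespector's parser (the real compiler is generated by Grammatica), and it assumes double-quoted values without embedded quotes. In the example ruleset shown later in this section, the tool-compatibility property appears under the name `plugin`. The caller is assumed to pass only the text before the first `rule` keyword, because rule bodies also contain `name = ...` pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the descriptive name = "value"; pairs that open
// a rules file. Assumes values are double-quoted and contain no quotes.
final class RulesetHeader {
    private static final Pattern PAIR =
        Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"\\s*;");

    // Expects only the header text (everything before the first "rule").
    static Map<String, String> parse(String header) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps file order
        Matcher m = PAIR.matcher(header);
        while (m.find()) {
            props.put(m.group(1), m.group(2));
        }
        return props;
    }
}
```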

The rest of the file contains the rule definitions; a file may contain one or more of them. Each rule definition consists of four sections. Each section has a name (one or more CRL keywords) and a body, which is always enclosed in a pair of braces.

The first section (the rule section) begins with the keyword rule. Its body contains only the name of the rule.

The second section, the for section, specifies the triggering event of the rule. Its body may contain optional entry conditions, which must be met for the analysis of the current rule to continue. These conditions act as a kind of filter on the analyzed code fragments.

The body of the for section consists of boolean condition expressions separated by semicolons. The results of the individual expressions are combined with the logical operator and, giving the final result of the whole section. An individual condition expression may be composed of many nested expressions connected by logical operators (and, or, not); each nested expression must be enclosed in a pair of round brackets.
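The combination rule above (semicolon-separated expressions joined by an implicit and) can be sketched with Java predicates. The `Section` class is a hypothetical illustration, not part of Codespector; nested and/or/not expressions correspond to `Predicate.and`, `Predicate.or` and `Predicate.negate`.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a for or check section: each semicolon-terminated
// expression is a predicate on the current context, and the section holds
// only if all of them hold (implicit "and"). An empty section holds trivially.
final class Section<C> {
    private final List<Predicate<C>> expressions;

    Section(List<Predicate<C>> expressions) {
        this.expressions = List.copyOf(expressions);
    }

    boolean holds(C context) {
        for (Predicate<C> e : expressions) {
            if (!e.test(context)) {
                return false; // stop at the first failing expression
            }
        }
        return true;
    }
}
```

An empty section evaluating to true matches the behaviour of the empty for section in Rule 2 later in this section; the check section, by contrast, must contain at least one expression.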

An unnested condition expression is either a built-in condition or an external condition. A built-in condition is a comparison of two sides that represent some values. All values in CRL are objects. The comparison operators are =, <>, <, >, <=, >= and in; the set of operators valid in a given context is determined by the types of the values on both sides of the comparison. CRL defines six data types. The simple types are boolean (logical), text (any quoted string) and number (integral and floating-point); values of these types may be written directly in the ruleset file. The complex types are set (a set of text or number values only), token and part. The last two types represent fragments of the analyzed code and cannot be written directly as values. The special value null, denoting an empty object, is defined for all types.
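To make the type system more concrete, the sketch below enumerates the six CRL types and one possible operator compatibility table. The table is an assumption made for illustration: the text states only that valid operators depend on the operand types, without listing them. Comparisons with the special value null are not modelled here.

```java
import java.util.EnumSet;
import java.util.Set;

// The six CRL data types named in the text.
enum CrlType { BOOLEAN, TEXT, NUMBER, SET, TOKEN, PART }

enum Operator { EQ, NE, LT, GT, LE, GE, IN }

final class OperatorRules {
    // ASSUMPTION: this compatibility table is illustrative only; the paper
    // says valid operators depend on operand types but does not list them.
    static Set<Operator> allowed(CrlType left, CrlType right) {
        if (right == CrlType.SET && (left == CrlType.TEXT || left == CrlType.NUMBER)) {
            return EnumSet.of(Operator.IN); // membership test, as in Rule 1
        }
        if (left != right) {
            return EnumSet.noneOf(Operator.class);
        }
        if (left == CrlType.NUMBER || left == CrlType.TEXT) {
            return EnumSet.of(Operator.EQ, Operator.NE,
                Operator.LT, Operator.GT, Operator.LE, Operator.GE);
        }
        return EnumSet.of(Operator.EQ, Operator.NE);
    }

    // Token and part values come only from the analyzed code; they cannot
    // be written literally in a rules file (sets can: { "0", "1" }).
    static boolean canBeWrittenLiterally(CrlType t) {
        return t != CrlType.TOKEN && t != CrlType.PART;
    }
}
```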

The left side of a comparison is always a function list. The current context of the rule analysis is represented by the keyword this, which always begins the list. The current context is the syntactic unit of the code that is the triggering event of the rule. The keyword this may be followed by a number of successive function invocations separated by dots; as in object-oriented programming languages, the dot denotes a reference to a function of an object (value). Each function in the list is invoked on the object returned by the previous function (the one to its left). Naturally, the type of the first value in the list (the keyword this) is part or token. The result of the last invocation in the list is the value of the whole list and is used in the comparison. The right side of the comparison may be another function list or a directly written value.

The set of functions is part of the CRL language specification. Each function is invoked on objects of one CRL type, is identified by its name and must return a value of a CRL type. Functions may also take optional arguments. The part and token types are special: the set of functions available for them additionally depends on the category of the code fragment (e.g. a value of type token may belong to the category keyword, operator, etc.). An error is reported on an attempt to invoke a function that does not exist or is not available for the object. Likewise, invoking any function on the null object is not allowed.
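The left-to-right evaluation of a function list such as `this.getThenPart.getPartType`, together with the two error cases just described, can be sketched as a simple loop. The `FunctionList` class and its single name-to-function table are hypothetical simplifications: a real implementation would also key the available functions by value type and fragment category, and would support arguments.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical evaluator for a CRL function list: start from the current
// context (this) and apply each named function, left to right, to the
// result of the previous one.
final class FunctionList {
    // Simplified function table: name -> implementation (no arguments,
    // no per-type or per-category availability).
    private final Map<String, Function<Object, Object>> functions;

    FunctionList(Map<String, Function<Object, Object>> functions) {
        this.functions = functions;
    }

    Object evaluate(Object thisContext, List<String> names) {
        Object current = thisContext;
        for (String name : names) {
            if (current == null) {
                // invoking any function on null is an error
                throw new IllegalStateException("cannot invoke " + name + " on null");
            }
            Function<Object, Object> f = functions.get(name);
            if (f == null) {
                throw new IllegalArgumentException("unknown function: " + name);
            }
            current = f.apply(current);
        }
        return current; // value of the whole list, used in the comparison
    }
}
```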

Users may also write their own conditions in Java as external classes, which are linked to rules in the ruleset file; the external condition serves this purpose. The CRL keyword externalClass, together with the fully qualified name of the external class, may be used as part of a condition expression. The class must implement a special Java interface, whose method is called during condition evaluation. The method returns a logical value indicating whether the external condition is fulfilled.
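The paper does not give the name or signature of the special interface, so the sketch below uses a hypothetical `ExternalCondition` interface with an assumed `isFulfilled` method. It shows how a class named by its fully qualified name could be loaded with standard Java reflection and rejected if it does not implement the interface.

```java
// HYPOTHETICAL interface: the paper says a special interface must be
// implemented but does not give its name or method signature.
interface ExternalCondition {
    boolean isFulfilled(Object context); // true if the condition holds
}

final class ExternalConditions {
    // Loads the class named after the externalClass keyword and checks that
    // it implements the interface before instantiating it.
    static ExternalCondition load(String fullyQualifiedName)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(fullyQualifiedName);
        if (!ExternalCondition.class.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(
                fullyQualifiedName + " does not implement ExternalCondition");
        }
        // Assumes a public no-argument constructor.
        return (ExternalCondition) cls.getDeclaredConstructor().newInstance();
    }
}
```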

The third section of the rule is the check section, which defines the check conditions. These conditions verify that the current context (code fragment) conforms to the rule: the analyzed fragment meets the rule requirements only if all conditions in the section are met. The syntax and semantics of the check conditions are analogous to those of the entry conditions, but the check section must contain at least one condition expression.

The last part of the rule is the action section. It contains the message about the rule violation (message) and the rule severity (kind). Three levels of severity are defined: note, warning and error. The message and the severity are used to generate the validation report. Optionally, the action section may define an external action in the form of invoking a special method of a provided external Java class; the external action is executed when the code rule is not fulfilled.
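The following sketch shows how the message and severity of the action section could feed a validation report that can be summarized per severity, as in Table 1 later in this paper. The `Kind`, `Violation` and `ValidationReport` types are hypothetical; in particular, the `line` field is an assumed way of locating the offending code.

```java
import java.util.ArrayList;
import java.util.List;

// The three severity levels of the action section.
enum Kind { NOTE, WARNING, ERROR }

// One hypothetical report entry: which rule failed, its message and
// severity, and (an assumption) the source line of the offending fragment.
record Violation(String ruleName, String message, Kind kind, int line) {}

final class ValidationReport {
    private final List<Violation> violations = new ArrayList<>();

    // Called when a rule's check section evaluates to false.
    void report(String ruleName, String message, Kind kind, int line) {
        violations.add(new Violation(ruleName, message, kind, line));
    }

    long count(Kind kind) {
        return violations.stream().filter(v -> v.kind() == kind).count();
    }

    int total() {
        return violations.size();
    }
}
```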

Traditional C-style comments (/* … */) are supported in CRL and may be placed anywhere in the file. The language is case sensitive.

Let us present two simple examples of code rules defined in CRL. The content of the example ruleset file is shown below.

plugin = "1.0";
name = "Test file";
version = "1.0";
author = "Mariusz Jadach";
description = "It is just an example";

rule /* Rule 1 */ {
    name = "Rule 1: using magical numbers";
}
for integer {
    this.getAncestorLevel("field") = null;
}
check {
    this.getTokenText in { "0", "1" };
}
action {
    message = "You should always declare numbers other than 0 or 1 as constant values";
    kind = note;
}

rule /* rule 2 */ {
    name = "Rule 2: block is obligatory";
}
for if {
}
check {
    (this.getThenPart.getPartType = "block" and
        (this.getElsePart = null or this.getElsePart.getPartType = "block"));
}
action {
    message = "Instructions within if should be always written in a block";
    kind = error;
}

Rule 1 checks whether integral values other than 0 or 1 are written explicitly in the code. Using many “magic” numbers in the source code is not good practice (this applies to floating-point numbers too) [9], because modifying them may be error-prone and troublesome. Every such number should instead be declared as a constant or a variable.

The triggering event of the rule is integer, so the rule is checked whenever an integral number is found in the source code. The condition in the for section checks whether the analyzed number is part of a class field declaration (as the field's initial value); in that case the number is allowed to appear in the code. The getAncestorLevel function with the argument "field" returns the number of levels in the parts tree between the integer token and the closest field part. When the function returns null, the token has no ancestor of the category field, and the rule proceeds to the check section.
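A possible implementation of getAncestorLevel is an upward walk over parent links in the parts tree. The `Node` class below is a hypothetical stand-in for a CRL part or token, and the counting convention (the direct parent is level 1) is an assumption, since the paper does not define how levels are counted.

```java
// Hypothetical node of the parts tree: a category name and a parent link.
final class Node {
    final String category;
    final Node parent;

    Node(String category, Node parent) {
        this.category = category;
        this.parent = parent;
    }

    // Sketch of getAncestorLevel: number of levels between this node and its
    // closest ancestor of the given category (direct parent = 1, an assumed
    // convention), or null if no such ancestor exists.
    Integer getAncestorLevel(String category) {
        int level = 1;
        for (Node p = parent; p != null; p = p.parent, level++) {
            if (p.category.equals(category)) {
                return level;
            }
        }
        return null;
    }
}
```

Returning a boxed `Integer` lets the "no such ancestor" case map naturally onto CRL's null value.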

The check section contains only one condition expression, which is unnested. The function list on the left side of the comparison is very simple: the function getTokenText is invoked on the current context value. The result of the invocation (of type text) is compared with the right side of the comparison using the operator in, which returns true if the left side is a member of the set on the right side; here, the set contains two text values. The rule is violated when the text of the analyzed number (token) is neither "0" nor "1".
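Rule 1 can be restated in plain Java to make its decision logic explicit. The `MagicNumberRule` class is a hypothetical illustration that works on already-extracted values (the ancestor level and the token text) rather than on Codespector's internal structures.

```java
import java.util.Set;

// Hypothetical plain-Java restatement of Rule 1.
final class MagicNumberRule {
    private static final Set<String> ALLOWED = Set.of("0", "1");

    // for section: this.getAncestorLevel("field") = null
    static boolean applies(Integer fieldAncestorLevel) {
        return fieldAncestorLevel == null;
    }

    // check section: this.getTokenText in { "0", "1" }
    static boolean passes(String tokenText) {
        return ALLOWED.contains(tokenText);
    }

    // A violation is reported only when the rule applies and the check fails.
    static boolean isViolation(Integer fieldAncestorLevel, String tokenText) {
        return applies(fieldAncestorLevel) && !passes(tokenText);
    }
}
```

Because the comparison is on token text, literals such as `00` or `1L` would also be reported, even though they denote 0 and 1.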

The second rule verifies whether the statements within an if instruction are enclosed in a block (compound statement). Putting statements in a block helps to add new statements later without introducing errors [3].

The triggering event of the rule is if, because if statements are the fragments to be checked. The for section is empty, so every if statement is analyzed further in the check section.

The condition expression in the check section is nested: two partial expressions are connected by the and operator. The first expression checks whether the statement executed in the normal case (when the if condition is true) is a compound statement. The code fragment corresponding to this statement is obtained by invoking the CRL function getThenPart. The function getPartType returns the name of the fragment's category, which must be "block".

The second expression checks whether the else branch of the if statement is also a compound statement, evaluated in the same way as the first expression. However, getPartType can be invoked only if the else fragment exists, since invoking a function on null is not allowed. That is why the expression first checks for the null value.
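The role of the null check in Rule 2 is easiest to see in an equivalent Java expression, where the short-circuiting `||` prevents a call on a missing else part. The `Part`, `IfStatement` and `BlockRule` types are hypothetical simplifications; the sketch assumes that CRL's or likewise stops evaluating once its left side is true, which the null check in the rule implies.

```java
// Hypothetical, simplified code fragments: a part knows its category name,
// and an if statement has a then part and an optional (nullable) else part.
record Part(String partType) {}

record IfStatement(Part thenPart, Part elsePart) {}

final class BlockRule {
    // Mirrors: this.getThenPart.getPartType = "block" and
    //          (this.getElsePart = null or this.getElsePart.getPartType = "block")
    static boolean passes(IfStatement s) {
        return s.thenPart().partType().equals("block")
            // || short-circuits, so partType() is never called on a null else part
            && (s.elsePart() == null || s.elsePart().partType().equals("block"));
    }
}
```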

3.3. The Design of the Tool

The Codespector tool is designed as a plug-in for the Eclipse environment (a very popular IDE). This section briefly describes its architecture, which is mainly determined by the functionality and properties of the proposed tool.

Four architectural parts (modules) of the program are identified; they are presented in Figure 1. Three of them are functional modules: the compilation module, the structure module and the validation module, which together form the core of the tool. The user interface module is responsible for the interaction between the core and its environment. The main role of the compilation module is to compile the input CRL rules file. The compiler, which detects syntax errors and some semantic errors, is automatically generated by the Grammatica parser generator. An abstract syntax tree of the ruleset file is created during compilation and is then used to construct the internal representation of the ruleset.

Figure 1. The architecture of the Codespector tool

The structure module transforms the input source code into the corresponding internal representation, which is compatible with the CRL language. The module uses the Java tokenizer (lexical scanner) included in the Eclipse JDT (Java Development Tools). The JDT classes are also used to obtain the AST of the analyzed source code. The resulting syntax tree and list of tokens are transformed into the parts tree and the token list compatible with CRL, and these are then merged to build one consistent code structure.
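The paper does not say how the token list and the parts tree are merged. One plausible approach, sketched below under that assumption, uses source offsets (the JDT AST does report a start position and length for each node): each token is attached to the deepest part whose source range contains it. The `Token` and `CodePart` types are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token with its character offset in the source file.
record Token(String text, int offset) {}

// Hypothetical part covering the half-open source range [start, end).
final class CodePart {
    final String category;
    final int start;
    final int end;
    final List<CodePart> children = new ArrayList<>();
    final List<Token> tokens = new ArrayList<>();

    CodePart(String category, int start, int end) {
        this.category = category;
        this.start = start;
        this.end = end;
    }

    CodePart add(CodePart child) {
        children.add(child);
        return child;
    }

    boolean covers(int offset) {
        return offset >= start && offset < end;
    }

    // Attach a token to the deepest part whose range contains its offset.
    void attach(Token t) {
        for (CodePart c : children) {
            if (c.covers(t.offset())) {
                c.attach(t);
                return;
            }
        }
        tokens.add(t); // no child covers it, so it belongs to this part
    }
}
```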

The main functionality of the tool is realized by the validation module. Its input data are the internal representation of the compiled CRL ruleset file and the internal representation of the analyzed source code (the code structure). The Visitor design pattern is used to traverse the code structure and visit each token and code part. For every such code fragment, the set of corresponding rules is found and applied to the fragment to check its conformance to them. Every rule violation is recorded by adding the appropriate information to the validation report.
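A minimal sketch of the Visitor-based traversal follows. The `Fragment`, `TokenFragment`, `PartFragment` and `EventCollector` types are hypothetical and are not Codespector's classes. Instead of checking rules, the visitor here only records the triggering event of each visited fragment, which shows that every fragment is visited exactly once, parents before children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical code structure: every fragment is a token or a part, and its
// type name is the triggering event for rules.
interface Fragment {
    String type();
    void accept(FragmentVisitor v);
}

interface FragmentVisitor {
    void visitToken(TokenFragment t);
    void visitPart(PartFragment p);
}

record TokenFragment(String type, String text) implements Fragment {
    public void accept(FragmentVisitor v) {
        v.visitToken(this);
    }
}

record PartFragment(String type, List<Fragment> children) implements Fragment {
    public void accept(FragmentVisitor v) {
        v.visitPart(this);
        for (Fragment c : children) {
            c.accept(v); // each child is visited exactly once
        }
    }
}

// Collects triggering events in visiting order; a real validator would
// instead look up and check the rules registered for each event.
final class EventCollector implements FragmentVisitor {
    final List<String> events = new ArrayList<>();

    public void visitToken(TokenFragment t) {
        events.add(t.type());
    }

    public void visitPart(PartFragment p) {
        events.add(p.type());
    }
}
```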

The user interface of the tool handles the interaction with the plug-in user and with the plug-in environment (the Eclipse IDE). The current user interface is Eclipse-specific, but it may be replaced by another one (e.g. a command-line UI), because the core of the tool is completely independent of the particular user interface.

The Codespector tool has already been implemented. Figure 2 presents an example validation report. The report lists violations of the rules defined in the loaded ruleset file, together with their type (i.e. error, warning, note). The user can easily navigate to the code that violates a given rule.

3.4. Verification of the Tool

The functions of Codespector were verified by manual testing. For that purpose, a file with 15 different code rules based on real recommendations of good programming style was prepared. This ruleset file was used to analyze five arbitrarily chosen Java files, each taken from a different open-source project, and one file prepared by the author. The files differ in length. They were:

jdbcDatabaseMetaData.java – the file from the relational database system HSQLDB (http://hsqldb.sourceforge.net)

CppCodeGenerator.java – the file from the ANTLR project – a parser generator (http://www.antlr.org)

WorkbenchWindow.java – the file from the Eclipse project

Emitter.java – the file from the JFlex tool – a free lexical analyzer (scanner) generator (http://jflex.de)

ButtonDemo.java – the file from an example program presenting the usage of the Swing library (standard distribution of JDK 1.4.1)

TestClass.java – the file, prepared by the author

The number of all violations, and the number of violations of a given type (i.e. error, warning, note) for particular test files are summarized in Table 1. The Table 2 presents the number of violations of each rule for different test files.

Table 1. The number of all violations and violations of a given type

                                            Violations of a given type
File name                  All violations   error   warning    note
jdbcDatabaseMetaData.java             680       0        10     670
CppCodeGenerator.java                3483     134        77    3272
WorkbenchWindow.java                 1565      62        58    1445
Emitter.java                         1459      90        69    1300
ButtonDemo.java                       865       7        18     840
TestClass.java                         56       2        12      42

As could easily be foreseen, the file prepared by the author violated every rule. The most frequently violated rule was rule 8, which concerns placing spaces between parentheses and their content. This rule is controversial and rarely used.

The code rules presented in detail in the previous section appear in Table 2 as rule 11 (magic numbers) and rule 7 (if statement).

4. Summary

The quality of the source code determines the overall quality of the final software product. Following a programming style is a good practical approach to improving code quality, and checking the conformance of code to a programming style can be automated by various tools.

Table 2. The number of violations of particular rules for different test files

                                                        Rule number
File name                     1    2    3    4    5    6    7     8    9   10   11   12   13   14   15
jdbcDatabaseMetaData.java     1    1    0    1    0    0    0   666    0    0    2    0    7    2    0
CppCodeGenerator.java         3   16    0   12    1   20  134  3157   24    0   66    1   44    5    0
WorkbenchWindow.java          2   17    0    3    3    0   62  1413   28    0    9    3   22    0    3
Emitter.java                  0   10    0   17    2    5   90  1248   39    0   18    2   27    1    0
ButtonDemo.java               0    8    0    0    0    0    7   828   11    1    3    0    7    0    0
TestClass.java                3    5    1    2    2    6    2    15    2    1   12    1    2    1    1

The paper presents the Codespector tool, which works as a plug-in for the Eclipse IDE and checks Java source files. The flexibility and simplicity of defining user rules, which form part of the user's own coding standard, were the basic motivations for designing and implementing it. That is why the most important element of the proposed solution is the CRL language. The CRL language enables the user to express many rules in an easy

Source: Software Engineering - Evolution and Emerging Technologies, pages 138-146.