Automatic software measurement data collection for students


1. Automatic software measurement within a software engineering class

Software is invisible and complex, so it is difficult to understand the current status of the product under development and hard to measure progress. Practice offers several solutions to this problem; one of them is software measurement.

Software measurement aims to make software visible, i.e., to measure it and to describe to developers, managers, users, etc. several interesting aspects of software so that they can decide how to move on.

Usually software measurement requires effort from the developers. They need to:

- analyze their code,
- fill in the effort reports manually,
- analyze the collected data, etc.

This makes developers lose time that could have been spent on development itself. Also, data entered by the developers themselves can be biased, so it is not fully reliable. These problems can be solved by making the data collection process automatic.

Automatic software measurement doesn’t require a lot of effort from the developer and makes it easier to understand the characteristics of the software being developed.

Figure 1 shows what we mean: the little lamp represents our toolset. We do not want to check every brick manually; we only want to be informed if there is a problem.

Figure 1: Automatic measurement¹

¹ http://sese.hpu.edu.cn/ie/mirror/leanword.htm


Using the automatic software measurement tool in a software engineering class has the following advantages:

1. During the development of the project students have the possibility to analyze and understand their own software development process. Typical questions that can be studied are:

a. How fast is our team?

b. Where do we have the most problems?

c. How good is our code?

d. How good are our estimations? Why?

Answering these questions makes it possible to understand the issues and problems within a typical software development project and helps the team avoid the same mistakes in the future.

2. Students can track their effort, automatically analyze the source code, calculate metrics, and try to improve the development process according to this information.

3. The evolution of the code can be visualized and compared among the teams. This should lead to interesting results when comparing teams that use different software development approaches, such as Agile, Waterfall, or Spiral.

4. The toolset also tracks the different applications used during development. This makes it possible to study how important the Internet, e-mail, online help pages, etc. are for development.

2. How the system is structured

The automatic measurement framework is a set of tools that collects, stores, and analyzes software measurement data automatically. Developers can therefore focus entirely on their task without being interrupted by manual data collection, which is usually perceived as an annoying distraction from development itself. The measurement software collects and analyzes two types of data:

- Data about the amount of time spent working on a project, which is sent to the server continuously. This data is collected by plug-ins for IDEs (Eclipse or Visual Studio), which can identify the particular method or class that the user is working on. Data about the use of other software alongside the IDE is also collected, so that students can see how much effort goes into coding, writing the report, browsing the web for possible solutions, reading articles, etc.

- Data about the properties of the code. This data is collected by analyzing the source code in the repository (e.g., SVN) of the project. This analysis tells the student how big, complex, and reusable the developed software is. These metrics are calculated once per defined time frame.


The information collected by the plug-ins and the information obtained from the source code analysis are stored on the application server, where they can be analyzed further. This analysis produces diagrams that can be used to review the work performed.

Figure 2: Overview of the collected data

3. How students interact with the system

This section lists the possible use cases of the system and the insights that it provides to students of software engineering. The students use the system by studying the visualizations and relating them to the code they have written.

Figure 3: Possible visualizations of properties of the code

Some examples of which data is collected and how students can use it to understand their code better are as follows:


Effort analysis

Input: Students develop code; the system automatically measures the time spent per method/class/namespace/file/folder/project

Output: Aggregated effort per method/class/namespace/file/folder/project

Use: Understand if a certain part of the system was very difficult to develop, if a certain part of the system is too big, if a certain part of the system is too complex, where the team is losing time, …

Source code analysis

Input: Source code

Output: CK metrics², some examples of which are as follows:

- Coupling Between Objects (CBO): the number of other classes to which a class is coupled, not counting inheritance relationships. Low coupling between objects is a sign of a well-designed, easily maintainable system.

- Lack of Cohesion in Methods (LCOM): measures the extent to which the methods of a class do not share the object's state variables. The lower the LCOM value, the more cohesive the class and the higher the quality of the code. A high LCOM indicates weak encapsulation.

- Lines of Code: shows the size of the whole system in lines of code.

- Weighted Methods per Class (WMC): the sum of the weights of the methods of a class, where the weight of a method is its complexity. Higher WMC values usually correlate with higher development, testing, and maintenance effort.

- Depth of Inheritance Tree (DIT): shows the maximum number of levels in the class's inheritance paths. A higher DIT tends to correspond with greater error density and lower quality.

- Number Of Children (NOC): shows how widely a class is reused to build other classes. The higher the value, the greater the reuse of the class. A high value indicates a need for increased testing and could also indicate a misuse of subclassing.

- Response For a Class (RFC): measures the overall complexity of the calling hierarchy of the methods making up a class. A larger RFC indicates increased testing requirements.

Use: Analyze the software metrics to evaluate the quality of the code.
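To make one of these metrics concrete, the following sketch computes Weighted Methods per Class for Python source code. It is not the toolset's implementation: the choice of Python, the function names, and the weighting (1 plus the number of decision points as a rough stand-in for cyclomatic complexity) are all assumptions made for illustration.

```python
import ast

# Node types counted as decision points (an assumption of this sketch;
# real tools may count more or different constructs).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)


def method_complexity(func):
    """Approximate complexity of one method: 1 + number of decision points."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))


def wmc_per_class(source):
    """Return a dict mapping each class name to the sum of its method weights."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [item for item in node.body
                       if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))]
            result[node.name] = sum(method_complexity(m) for m in methods)
    return result


# Made-up sample class used only to exercise the functions above.
SAMPLE = '''
class Account:
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def is_empty(self):
        return self.balance == 0
'''

if __name__ == "__main__":
    print(wmc_per_class(SAMPLE))
```

In the sample, `deposit` contains one `if` and therefore has weight 2, while `is_empty` has weight 1, so the class `Account` has a WMC of 3 under this weighting.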

Combined analysis

Input: Effort + Source code metrics

² Chidamber, Kemerer, A metrics suite for object oriented design, IEEE Transactions on Software Engineering, Vol. 20, No. 6, June 1994

Output: Productivity: The team can compare their effort with their output and understand their productivity.
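The text does not define how productivity is computed. As one possible reading, the sketch below relates output to effort as lines of code per hour; both the measure and the function name `productivity` are assumptions, and a team could substitute any other output metric.

```python
def productivity(loc_added, effort_seconds):
    """Lines of code produced per hour of measured effort.

    Returns 0.0 when no effort was recorded, to avoid dividing by zero.
    """
    if effort_seconds <= 0:
        return 0.0
    return loc_added / (effort_seconds / 3600)


if __name__ == "__main__":
    # Made-up example: 450 lines written in 3 hours of measured effort.
    print(productivity(450, 3 * 3600))
```

Lines of code per hour is a crude measure, so it is best read together with the quality metrics above rather than on its own.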

4. Data quality

We are interested in helping students understand their software development process during a specific course. The collected data should reflect only their activities during software development. Therefore:

- the student should turn the software off when not working on the project;

- the student should review the submitted data and delete records that are not related to the project.

5. Privacy

It is not necessary to collect the data under the real names of the students; we can agree on a naming scheme instead. On the other hand, it would be useful to define the teams (who is in which team) so that we can provide data at the team level.
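A naming scheme of this kind could look like the following sketch, which keeps team membership visible while replacing real names. The `T<team>-S<student>` pattern and the function name `build_pseudonyms` are hypothetical; the actual scheme is left to be agreed on.

```python
def build_pseudonyms(teams):
    """Map each real name to a pseudonym such as 'T1-S2'.

    `teams` maps a team name to the list of its members. Teams and
    members are numbered in alphabetical order so that the result is
    reproducible; a real deployment might prefer a random order so
    that the numbering reveals nothing about the names.
    """
    pseudonyms = {}
    for team_index, team in enumerate(sorted(teams), start=1):
        for student_index, student in enumerate(sorted(teams[team]), start=1):
            pseudonyms[student] = f"T{team_index}-S{student_index}"
    return pseudonyms


if __name__ == "__main__":
    # Made-up team lists.
    print(build_pseudonyms({"alpha": ["Bob", "Alice"], "beta": ["Carol"]}))
```

Because the team number is part of each pseudonym, data can still be aggregated at the team level without storing real names alongside the measurements.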