BUILDING, VERSIONING, AND RELEASING BEYOND SELECTING A BUTTON ON THE GUI

BY PANKAJ KAMTHAN

1. INTRODUCTION

This document explores certain aspects of building, versioning, and releasing programs and software, which are relevant to large-scale programming and software engineering, respectively.

In particular, these aspects are important, for example, for proper software release engineering and, more generally, for a successful realization of practices inherent to agile methodologies and DevOps [Howard, 2012; Hüttermann, 2012; Kort, 2016, Chapter 12; McNutt, 2016; Karvonen, Behutiye, Oivo, Kuvaja, 2017; Kodumal, 2019; IEEE, 2021]. In that regard, the notions of automation, containerization, or virtualization can help simplify some of these elements [Goasguen, 2016; Mouat, 2016; Miell, Sayers, 2019; Nickoloff, Kuenzli, 2019]. However, this document does not discuss these at any length.

There have been a number of pioneering developments in computing, of which two are of special interest.

1.1. PHENOMENON 1: THE CHANGE IN THE DIRECTION OF ATTENTION

The direction of attention in computing has reversed over time. In the past, computing was machine-oriented: machines were the primary concern and humans were the secondary concern. In the present, computing is human-oriented: humans are the primary concern and machines are the secondary concern.


1.2. PHENOMENON 2: THE MAGNITUDE OF THE DISTANCE BETWEEN HUMAN AND MACHINE

Figure 1 shows (schematically, not to any scale) that the distance between humans (programmers and users) and machines has increased, perhaps nonlinearly, over time. This has occurred through abstraction, sometimes by necessity, and other times by choice.

Figure 1. An abstract illustration of relative positioning in the history of computing of human and machine.

Today, build systems and configuration management systems are sophisticated, and “mask” many of the complexities of building and configuration management, respectively. If current surveys are any indicator, this trend of abstraction and sophistication is likely to continue in the foreseeable future.

However, there is more to building software than merely using an IDE, and there is more to configuration management than merely installing and using a configuration management system [Summers, 2013, Section 6.2].


2. BUILDING

In essence, build engineering is the discipline of turning source code into binary executables, effectively and efficiently [Aiello, Sachs, 2011, Chapter 2].

Definition [Build] [IEEE, 2012].

(Noun) An operational version of a system or component that incorporates a specified subset of the capabilities that the final product will provide.

(Verb) To perform the steps required to produce an instance of the product.

NOTE—In software, this means processing source files to derive target files. In hardware, this means assembling a physical object.

The main purpose of a build system is to translate human-readable source code into an executable program.

In addition, a build system supports related activities, such as the generation of documentation (for example, in PDF or HTML) and the automatic analysis of source code (to find and report bugs).

In fact, a build system can handle any activity in which output files are created from input files. This includes removing files and copying of files from one place to another.

REMARKS

In some ways, a build reflects, but goes beyond, a high-fidelity prototype, and a build system reflects, but goes beyond, a compiler.

2.1. “UNDER THE HOOD”: TAKE ONE

The make program, available on certain operating systems, including UNIX, is intended to automate the manual and mundane aspects of transforming source code into an executable.

EXAMPLE

     1  SRCS = add.c calculate.c divide.c multiply.c subtract.c
     2  OBJS = $(SRCS:.c=.o)
     3  PROG = calculator
     4  CC = gcc
     5  CFLAGS = -g
     6  INSTALL_ROOT = /usr/local
     7
     8  $(PROG): $(OBJS)
     9          $(CC) $(CFLAGS) -o $@ $^
    10
    11  $(OBJS): numbers.h
    12
    13  clean:
    14          rm -f $(OBJS) $(PROG)
    15
    16  install: $(PROG)
    17          cp $(PROG) $(INSTALL_ROOT)/bin

Listing 1. An example of a make file.

• Line 1: This defines the SRCS variable to include the full list of source files in the program.

• Line 2: This replaces .c with .o in each file’s name in the list of source files. OBJS is therefore defined as the complete list of object files.

• Line 3: This defines the name of the executable program.

• Line 4: This defines the name of the compilation tool.

• Line 5: This sets the CFLAGS variable (that is, the -g flag) to enable debugging information.

• Line 6: This specifies the directory on the target machine under which the program is to be installed.

• Line 9: In this line, $@ refers to the target named on the left side of the rule (namely, calculator), and $^ refers to the files listed on the right side of the rule (namely, all the object files).

• Line 11: This states that all object files depend on numbers.h.

• Line 14: This provides a directive to remove any generated files that were created while compiling the software.

• Line 16: This makes the install target depend on the program, which ensures that the entire calculator program is brought up-to-date automatically before it is installed.

• Line 17: This provides a directive to copy the executable program into the bin directory under the directory specified by the INSTALL_ROOT variable.
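The decision that make takes for each rule in Listing 1 can be illustrated with a small sketch. The following Python fragment is only an approximation, under the assumption that a target is rebuilt when it is missing or when any of its prerequisites has a more recent modification time. The names needs_rebuild and demo are hypothetical and are not part of make.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Replay three situations for add.o, which depends on add.c and numbers.h."""
    with tempfile.TemporaryDirectory() as directory:
        source = os.path.join(directory, "add.c")
        header = os.path.join(directory, "numbers.h")
        obj = os.path.join(directory, "add.o")
        for path in (source, header):
            open(path, "w").close()
            os.utime(path, (100, 100))  # fixed, artificial timestamps

        results = []
        # 1. The object file does not exist yet.
        results.append(needs_rebuild(obj, [source, header]))
        # 2. The object file is newer than both prerequisites.
        open(obj, "w").close()
        os.utime(obj, (200, 200))
        results.append(needs_rebuild(obj, [source, header]))
        # 3. The header is edited after the object file was produced.
        os.utime(header, (300, 300))
        results.append(needs_rebuild(obj, [source, header]))
        return results


print(demo())
```

The third situation corresponds to Line 11 of Listing 1: since every object file depends on numbers.h, a change to that header makes every object file out of date.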

2.2. THE ARCHITECTURE OF A BUILD SYSTEM FOR COMPILED PROGRAMMING LANGUAGES

Figure 2 shows a high-level view of a typical build system for compiled languages.

Figure 2. The architecture of a build system for compiled programming languages. (Source: [Smith, 2011, Chapter 1].)

In this model, source files are compiled into object files, which are then linked into code libraries or executable programs. The resulting files are collected into a release package that can be installed on a target machine.

• Version-Control Tool: This is a tool that stores the program’s source code and enables multiple developers to make concurrent changes to the code base. It also facilitates the retrieval of historical versions of the code.

• Source Trees and Object Trees: These are the sets of source files and compiled object files that a particular software engineer works with. Software engineers can make their own private changes in these trees, without impacting the work of others, assuming that the corresponding files are human-readable and human-editable.

(There are several possible intermediate representations, including, but not limited to, object code. An intermediate representation is expressed in an intermediate language. For example, register transfer language (RTL) is an intermediate language used by certain C compilers, and leads to intermediate representations that are indeed human-readable and human-editable.)

• Compilation Tools: This is the set of tools that take input files and generate output files. In doing so, these tools may, for example, convert source code files into object code and executable programs. For example, a C compiler is a compilation tool. (These tools may also include documentation generators and unit test generators.)

• Build Machines: This is the computing equipment on which the compilation tools are executed.

• Release Packaging and Target Machines: This is the method by which the software is packaged, distributed to end users, and then installed on the target machine.

2.3. THE ARCHITECTURE OF A BUILD SYSTEM FOR INTERPRETED PROGRAMMING LANGUAGES

Figure 3 shows the high-level view of a typical build system for interpreted languages.

Figure 3. The overview of the architecture of a build system for interpreted programming languages. (Source: [Smith, 2011, Chapter 1].)

In this model, the source files are collected into a release package, ready to be installed on the target machine. If compilation tools are required in this type of build system, then the focus of such tools is on transforming source files and storing them in the release package. The compilation into machine code is not performed at build time, even though it may happen at runtime.

2.4. THE ARCHITECTURE OF A BUILD SYSTEM FOR ENTERPRISE WEB APPLICATIONS

Figure 4 shows the high-level view of a typical build system for Enterprise Web Applications.

Figure 4. The overview of the architecture of a build system for Web Applications.


The build system for a Web Application is a mix of compiled code, interpreted code, and configuration or data files. In this case, some files (such as HTML or XML files) are copied directly from the source tree to the release package, whereas others (such as Java source files) are first compiled into object code. In addition, both the Web Application server and the end user’s Web user agent play a role in interpreting or compiling code, as necessary, but that is beyond the scope of this build system.

In general, a typical Enterprise Web Application deals with several file types, such as the following:

• HTML or XML files, containing markup documents to be displayed in a Web user agent. These files are copied directly to the release package.

• JavaScript files containing source code to be interpreted by an end user’s Web user agent. These files are also copied directly to the release package.

• JSP, ASP, or PHP files, containing a mix of HTML or XML and executable source code. These files are compiled and executed by the Web Application server rather than by the build system. These files are also copied to the release package, ready for installation on the Web server.

• Java source code files to be compiled into object code and packaged as part of the Web Application [Clark, 2004]. The build system performs this transformation before packaging the Java class files. The Java classes are executed on the Web Application server or, sometimes, even within the Web user agent (using a Java applet).
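The treatment of the file types listed above can be summarized in a short sketch. It is a simplification, assuming that a file's type is recognized by its extension alone; the function plan_build and the two extension sets are hypothetical names introduced only for illustration.

```python
from pathlib import PurePosixPath

# Assumption: file types are recognized purely by their extension.
COPY_EXTENSIONS = {".html", ".xml", ".js", ".jsp", ".asp", ".php"}
COMPILE_EXTENSIONS = {".java"}


def plan_build(source_files):
    """Pair each source file with the action the build system takes for it."""
    plan = []
    for name in source_files:
        extension = PurePosixPath(name).suffix.lower()
        if extension in COMPILE_EXTENSIONS:
            # Compiled to class files first, then packaged.
            plan.append((name, "compile"))
        elif extension in COPY_EXTENSIONS:
            # Copied to the release package as-is.
            plan.append((name, "copy"))
        else:
            # Not covered by the list above.
            plan.append((name, "unknown"))
    return plan


for entry in plan_build(["index.html", "app.js", "cart.jsp", "Cart.java"]):
    print(entry)
```

Note that JSP, ASP, and PHP files fall under "copy" here: as stated above, their compilation and execution is the responsibility of the Web Application server, not of the build system.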

2.5. OTHER

There are build systems for other, specialized, cases, such as unit tests.

3. CONFIGURATION MANAGEMENT

Definition [Configuration Management] [IEEE, 2012].

(1) A discipline applying technical and administrative direction and surveillance to: identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, record and report change processing and implementation status, and verify compliance with specified requirements.

(2) [A collection of] technical and organizational activities comprising configuration identification, control, status accounting, and auditing.

REMARKS

It has been pointed out [Hass, 2002] that configuration, “to form from or after”, derives from the Latin “com”, meaning “with” or “together”, and “figurare”, “to form”. It also means “a relative arrangement of parts or elements”. Thus, configuration management refers to managing a relative arrangement of parts or elements.

3.1. HISTORY OF CONFIGURATION MANAGEMENT

The origins of configuration management go back to the late 1950s as a technical management discipline for hardware material items [Moreira, 2010, Section 2.1; Leon, 2015, Chapter 1; Christensen, 2010, Chapter 33]. It became a technical discipline in its own right in the late 1960s when the United States Department of Defense (DoD) developed a series of military standards called the “480 series” (that is, MIL-STD-480, MIL-STD-481, and MIL-STD-483).

3.2. MOTIVATION FOR CONFIGURATION MANAGEMENT

There are (at least) the following problems that plague current software development (and which configuration management aims to address and solve, at least to some extent) [Sink, 2011, Chapter 1; Leon, 2015, Chapter 3; Quigley, Robertson, 2015, Chapter 7]:

• Communication Breakdown Problem

• Shared Resource Problem

• Multiple Maintenance Problem

• Simultaneous Update Problem

3.2.1. COMMUNICATION BREAKDOWN PROBLEM

If there are two people working on the same project, then there are two communicators and two listeners with four communication paths.

This changes nonlinearly as the number of people involved increases (that is, the team size grows). In general, if there are n people working on the same project, then there are

4·C(n, 2) = 2n(n – 1)

communication paths.
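The growth of this quantity with team size can be tabulated with a few lines of Python. This is merely a sketch of the formula above; the function name communication_paths is hypothetical.

```python
from math import comb


def communication_paths(n):
    """Number of communication paths among n people, using 4·C(n, 2)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Each of the C(n, 2) pairs of people contributes four paths.
    return 4 * comb(n, 2)


for team_size in (2, 4, 8, 16):
    print(team_size, communication_paths(team_size))
```

For n = 2 and n = 4, this yields 4 and 24 paths, respectively, in agreement with the values given in Figure 5. Each doubling of the team size roughly quadruples the number of paths.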

The potential for communication breakdown increases not only because of a steep rise in the number of communication paths, but also because of the problem of interpretative cognition.

For the sake of this document, interpretative cognition is a part of the process that occurs when two or more people communicate with one another. It is a measure of how much of a person’s communication is understood by the other person(s).

(a) Communicators = 2, Listeners = 2, Communication Paths = 4.

(c) Communicators = 4, Listeners = 4, Communication Paths = 24.

Figure 5. An illustration of the communication breakdown problem. (Source: [Leon, 2015, Chapter 3].)

3.2.2. SHARED RESOURCE PROBLEM

This problem is illustrated in Figure 6. This problem occurs in any environment where two or more programmers share a common resource.

For example, the resource could be some function or data. The problem arises if one programmer makes a change to any of the shared resources, and the other programmers are not aware of the change.

For instance, this can occur if the programmer or the technical environment does not inform the other programmers of the change. Having every programmer with his or her own personal copy of the resource alleviates, but does not solve, the problem.

Figure 6. An illustration of (a) the shared resource problem, and (b) a partial solution. (Source: [Leon, 2015, Chapter 3].)

Board Time!

Explain why having every programmer with his or her own personal copy of the resource does not solve the shared resource problem.

3.2.3. MULTIPLE MAINTENANCE PROBLEM

This problem is illustrated in Figure 7. This problem is a variation of the shared resource problem. It occurs when there are multiple copies of the shared resources in the system.

Figure 7. An illustration of (a) the multiple maintenance problem, and (b) a partial solution. (Source: [Leon, 2015, Chapter 3].)

3.2.4. SIMULTANEOUS UPDATE PROBLEM

This problem is illustrated in Figure 8. The following scenarios illustrate this problem.

There are two programmers, A and B.

Programmer A has found a bug and has fixed it. Programmer A copies the bug-fixed version to the repository, thus overwriting the existing copy. There are two possibilities:

1. Programmer B has found the same bug, but is not aware of the fact that Programmer A has also found it, and is fixing it or has already fixed it. Programmer B fixes the bug as well and then copies the function to the repository, thus overwriting the copy that was created by Programmer A. Thus, the work that was done by Programmer A to fix the bug is lost.

2. Programmer B has found a different bug and has fixed it. However, depending on which programmer updates the repository copy last, the other programmer’s work is lost. In any case, both bug fixes are necessary and need to be incorporated into

Figure 8. An illustration of the simultaneous update problem. (Source: [Leon, 2015, Chapter 3].)
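The second scenario can be reproduced in a few lines of Python. The sketch below is not taken from the cited sources. It contrasts a repository that blindly overwrites its copy with one that, as one possible countermeasure assumed here only for illustration, rejects a check-in based on an outdated revision. All class and function names are hypothetical.

```python
class NaiveRepository:
    """Keeps one copy of the content; every check-in overwrites it."""

    def __init__(self, content):
        self.content = content

    def check_out(self):
        return self.content

    def check_in(self, new_content):
        self.content = new_content


class StaleCopyError(Exception):
    """Raised when a check-in is based on an outdated revision."""


class CheckedRepository:
    """Rejects a check-in unless it is based on the current revision."""

    def __init__(self, content):
        self.content = content
        self.revision = 1

    def check_out(self):
        return self.content, self.revision

    def check_in(self, new_content, base_revision):
        if base_revision != self.revision:
            raise StaleCopyError("repository has changed since check-out")
        self.content = new_content
        self.revision += 1


def lost_update_demo():
    """Programmers A and B fix different bugs starting from the same copy."""
    repository = NaiveRepository("bug1 bug2")
    copy_a = repository.check_out()
    copy_b = repository.check_out()
    repository.check_in(copy_a.replace("bug1", "fix1"))  # A checks in first
    repository.check_in(copy_b.replace("bug2", "fix2"))  # B overwrites A
    return repository.content


print(lost_update_demo())  # the fix for bug1 is no longer present
```

With CheckedRepository, the second check-in raises StaleCopyError instead of silently discarding the work of Programmer A; Programmer B then has to check-out the current revision and reapply the fix on top of it.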

3.3. CONFIGURATION ITEM AND BASELINE

Definition [Configuration Item] [IEEE, 2012]. [The] aggregation of work products that is designated for configuration management and treated as a single entity in the configuration management process.

Definition [Software Item] [IEEE, 2012].

(1) It can be source code, object code, control code, control data, or a collection of these items.

(2) It can be an aggregation of software, such as a computer program or database, that satisfies an end use function and is designated for specification, qualification testing, interfacing, configuration management, or other purposes.

(3) It can be an identifiable part of a software product.

Definition [Baseline] [IEEE, 2012].

(1) [The] specification or product that has been formally reviewed and agreed upon, that thereafter serves as the basis for further development, and that can be changed only through formal change control procedures.

(2) [A] formally approved version of a configuration item, regardless of media, formally designated and fixed at a specific time during the configuration item’s life cycle.

NOTE—A software baseline is a set (one or more) of software configuration items formally designated and fixed at a specific time during the software life cycle. A

baseline, together with all approved changes to the baseline, represents the current approved configuration. The term is thus used to refer to a particular version of a software configuration item that has been agreed on, for example, as

a stable base for further development or to mark a specific project milestone. In either case, any new baseline is agreed through the project’s agreed change control procedures.

The baselines are important because they serve as reference points.

There can be different kinds of baselines, such as development baseline, test baseline, deployment baseline, and release baseline.

3.3.1. ON THE CONFIGURATION ITEMS IN A BASELINE

The configuration items to be included in a baseline depend on the purpose for which the baseline is created. For example, a release baseline will only contain those configuration items that are to be delivered to the customer.

It is important to note that a baseline does not represent simply any collection of configuration items; rather, it is a particular grouping of configuration items, at a specific point in time, for a given purpose.

EXAMPLE

For example, just because an item has been checked-in does not automatically imply that it should (or will) become part of a build. It may, for example, simply be a part of


3.3.2. TIMELINE OF A BASELINE

It is important to establish a baseline as early as possible, but a baseline should not be rushed.

EXAMPLE

For example, consider a programmer developing a program. The programmer has completed coding, but, during unit testing, discovers a better algorithm to accomplish some aspect of the program. If the source code is not part of a baseline, the programmer can make the necessary change, and continue with the testing.

However, if the source code has already become part of a baseline, then the programmer will need to make a change request and then follow the change management process to make the necessary change.

3.4. SIGNIFICANCE OF SOFTWARE CONFIGURATION MANAGEMENT

The reasons for having configuration management support in software projects include the following [Leon, 2015, Chapter 5]:

• Improved Software Development Productivity. For example, proper configuration management facilitates communication among software engineers, enables sharing source code, reduces the potential of duplicated effort, and provides a solution to the “simultaneous update problem”.

• Improved Maintainability. For example, proper configuration management assists corrective, preventive, adaptive, and/or perfective maintenance. At the center of any maintenance is change, and management of such change: what was changed exactly, why there was a change, when the change occurred, and who was involved in the change.

• Improved Security. For example, proper configuration management assures copies of the original, which can be recovered in case the source code gets corrupted,

3.4.1. SIGNIFICANCE OF SOFTWARE CONFIGURATION MANAGEMENT FOR AGILE METHODOLOGIES

There is an intimate relationship between software configuration management and agile methodologies [Appleton, Berczuk, Cowham, 2006; Moreira, 2010; Humble, Farley, 2011; Moran, 2015, Chapter 9].

For example, many of the agile practices, including “Continuous Integration”, “Embracing Change”, and “Release Early, Release Often”, can be realized effectively only when proper software configuration management is in place [Meyer, 2014; Bryant, Marín-Pérez, 2019].

4. VERSIONING

For the sake of this document, versioning is one of the practices of configuration management [Leon, 2015].

The idea of versioning goes beyond software engineering. For example, certain books have multiple editions and models of cars have multiple years. The latest edition is supposed to be an ‘improvement’ over previous editions.

It has been said that if there is any constant in software engineering, then that constant is change.

Definition [Change]. A change (or, equivalently, diff or delta) represents a specific modification to an item under version control. The granularity of the modification that is considered a change varies across version control systems.

Figure 9 illustrates the concept of deltas [Leon, 2015, Chapter 5].

The use of deltas is a space-time trade-off.

The advantage of a delta is that, instead of storing complete copies of all versions, one version and the deltas are stored. The deltas are smaller than the source code of a system version, so the amount of disk space required for version management is reduced significantly. Then, the required version can be derived at any point in time by applying the relevant deltas to the base version.

The disadvantage of a delta is retrieval time. This is especially the case with forward

Figure 9. An illustrative use of deltas.

4.1. FORWARD DELTA AND REVERSE DELTA

In the case of forward delta storage, a complete copy of the original file is kept. If a new version is checked-in, the two versions are compared and a delta is created. This delta is stored, instead of storing the complete copy of the new version. If the new version is required, the delta is applied to the original file to get the new version.

In the case of reverse delta storage, a complete copy of only the most recent version of the file is kept. If a new version is checked-in, it is compared to the previous version and a delta is created that leads from the new version back to the previous version. Then, the complete copy of the previous version is deleted, and the new version is stored together with that delta. If an older version is required, the deltas are applied, starting from the most recent version, in the reverse order of check-in.
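Both storage schemes can be illustrated with a small sketch. The fragment below is not how any particular version control system stores its data; it assumes a line-based delta made of "copy" and "insert" instructions, computed with SequenceMatcher from the Python standard library. The names make_delta, apply_delta, and retrieve are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to build the lines of `new` from the lines of `old`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # store only the new lines
        # "delete": the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild a version by applying a delta to the lines of `old`."""
    result = []
    for operation in delta:
        if operation[0] == "copy":
            result.extend(old[operation[1]:operation[2]])
        else:
            result.extend(operation[1])
    return result


def retrieve(stored_version, deltas, steps):
    """Apply the first `steps` deltas, in order, to the stored full version."""
    version = stored_version
    for delta in deltas[:steps]:
        version = apply_delta(version, delta)
    return version


v1 = ["int add(int a, int b) {", "  return a - b;", "}"]
v2 = ["int add(int a, int b) {", "  return a + b;", "}"]
v3 = ["/* adds two numbers */"] + v2

# Forward delta storage: the original (v1) is stored in full.
forward = [make_delta(v1, v2), make_delta(v2, v3)]
# Reverse delta storage: the most recent version (v3) is stored in full.
reverse = [make_delta(v3, v2), make_delta(v2, v1)]

print(retrieve(v1, forward, 2) == v3)  # latest version costs two applications
print(retrieve(v3, reverse, 0) == v3)  # latest version costs none
```

The design choice is visible in the last two lines: with forward deltas the cost of retrieving the most recent version grows with the number of check-ins, whereas with reverse deltas that cost is shifted to the older versions.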

4.2. ‘INSIDE’ DELTA

A delta in and of itself is less important than the contents of the delta and the size of the delta.

Definition [Change List]. There are version control systems that allow only atomic multi-change commits. A change list, update, or patch identifies the set of changes made in a single commit.

Definition [Version] [IEEE, 2012].

(1) An initial release or re-release of a computer software configuration item, associated with a complete compilation or recompilation of the computer software configuration item.

(2) An initial release or complete re-release of a document, as opposed to a revision resulting from issuing change pages to a previous release.

Definition [Version] [ISO/IEC, 2005]. [It is] a state of an evolving item.

A version of a software item is a particular identified and specified item [ISO/IEC, 2005].

4.3. VERSION VERSUS REVISION VERSUS VARIANT

Definition [Revision] [ISO/IEC, 2005]. [It is] a new version of an item that is intended to replace the old version of the item.

Definition [Variant] [ISO/IEC, 2005]. [It is] a new version of an item that will be added

EXAMPLE

In some cases, multiple incarnations of the same software system can be functionally equivalent, but may be designed for different hardware or software environments. In such cases, these systems are variants rather than different versions.

For example, two different instances of the same item—say, one for Microsoft Windows and the other for Linux—are variants, rather than different versions.

In general, unlike versions, variants are not comparable [Reussner, Goedicke, Hasselbring, Vogel-Heuser, Keim, Märtin, 2019, Section 2.2.4]. For example, unlike a version, one variant of an item is in itself not an improvement on another variant.

Definition [Versioning] [IEEE, 2012]. The assignment of either unique version names or unique version numbers to unique states of software configuration items, usually for a specific purpose, such as a release of the software product to an external group or the identification of a specific baseline.

4.4. MOTIVATION FOR VERSION CONTROL

The significance of version control has been underscored in “Put Everything Under Version Control” and “Know Your Next Commit”, among the 97 Things Every Programmer Should Know [Henney, 2010], and in “Keep Everything in Version Control”, one of the principles of software delivery [Humble, Farley, 2011].

There are a number of reasons for version control [Humble, Farley, 2011; Kemper, Oxley, 2012, Chapter 4; Leon, 2015, Chapter 5; Scott, 2017, Chapter 2; Visser, 2017, Chapter 4; Lemaire, 2021, Chapter 3]:

• To Support Individual As Well As Communal Work. It is understood that any non-trivial software development requires (not only individual, but also) communal work. To do that successfully, appropriate infrastructure for collaboration is necessary.

• To Allow Multiple Releases. It is not uncommon for development of an item to proceed in multiple directions.

• To Have A Team Memory. The multiple versions of an item, as a collective, are part of software project team memory, even if some of those versions are discarded or not released. Such a memory could highlight patterns of sizes of commits and patterns of frequencies of commits by the members of a team or of the team as a whole (by examining, say, the history of commits).

• To Support Evolutionary Development. It is understood that any non-trivial software development is carried out iteratively and incrementally over space and over time. In such an approach, it is important to keep track of things that are evolving.

• To Enable Change Management. It is understood that change is inevitable in any non-trivial software development. The management of change needs to be disciplined and systematic. The provision for multiple versions allows the software project team to make changes confidently, to reduce the potential for conflicts readily, to run regression tests easily, and to revert to an earlier version if necessary.

4.5. CANDIDATES AND NON-CANDIDATES FOR VERSION CONTROL

Version control systems are intended primarily for relatively small text files, not for relatively large binary files.

TO BE … (UNDER VERSION CONTROL)

In general, all human-created software project artifacts should be placed under version control [Kandt, 2006, Chapter 5; Wilson, Bryan, Cranston, Kitzes, Nederbragt, Teal, 2017; Laporte, April, 2018, Chapter 8].

Furthermore, all human-created software project artifacts used to create software project artifacts should be placed under version control [Kandt, 2006, Chapter 5].

For example, libraries used for software implementation are such artifacts.

… OR NOT TO BE (UNDER VERSION CONTROL)

Then, by reference, data files, object files, and executable programs should not be placed under version control.

Furthermore, certain files that must have one and only one copy on the system should not be placed under version control. These include build management scripts.

EXAMPLE

For example, a script that advises developers of which build machine is currently the fastest or which file system currently has the most disk space should not be committed to version control.

This is because of a number of reasons, including the following:

• The same script is used for all code branches. The script does not need to behave any differently for one branch of the source code versus another. Therefore, there is no use of a different version of such a script for one release of the software versus another.

• The script is concerned with only the build environment’s present, not its past.

• If changes to such a script are deemed necessary, then that can be done in a single place.

4.6. REPOSITORIES

Definition [Repository] [IEEE, 2012].

(1) A collection of all software-related artifacts belonging to a system.

(2) The location/format in which such a collection is stored.

Definition [Software Repository] [IEEE, 2012]. A software library providing permanent, archival storage for software and related documentation.

This definition is out-of-date. For example, a software repository could archive conceptual models.

In the pre-Web era, software repositories were essentially limited to microcomputers, minicomputers, and mainframe computers, and limited to local area networks. Today, there are several global repositories, including source code repositories, on a cloud computing platform.

Definition [Working Copy]. The local copy of files from a repository, at a specific time or revision. (It is called working copy as all work done to the files in a repository is done initially on a working copy.)

Definition [Check-In]. To check-in (or, equivalently, commit) is to write or merge the changes made in the working copy back to the repository.

A checked-in item is a controlled item, and subject to the change management process. This means that a checked-in item cannot be arbitrarily taken out and modified (even by its author).

It is recommended that an item go through a review before being checked-in [Rüping, 2003]. For example, a source code fragment should go through a source code review before being checked-in.

Definition [Check-Out]. To check-out is to create a local working copy from the repository. A user may specify a specific revision or obtain the latest.

An item could be checked-out if there is a change request.

It is customary to describe a check-in. (In Git, such a description constitutes a commit message [Blischak, Davenport, Wilson, 2016; Demaree, 2016; Stack Overflow Community, 2018; Tsitoara, 2020], which could be generated automatically or created manually.) It is crucial that a check-in description adhere to the characteristics of proper writing, namely that it be clear, concise, and consistent; most importantly, it must be useful.

Definition [Import]. The act of copying a local directory tree (that is, not currently a


Definition [Export]. The act of obtaining the files from the repository. It is similar to checking-out, except that it creates a clean directory tree without the version-control metadata used in a working copy. This is often done prior to, for example, publishing an item.

Definition [Update]. The act of merging changes made in the repository (by other people, for example) into the local working copy. This is the same as check-out in version control systems that require each repository to have exactly one working copy (as is the case in many distributed version control systems).
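The relationship among check-in, check-out, and export can be made concrete with a toy, in-memory model. The sketch below assumes that every check-in stores a complete snapshot and that revisions are numbered from 1; it deliberately omits update (merging) and does not reflect the interface of any real version control system. The class names Repository and WorkingCopy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingCopy:
    """Local files plus the metadata tying them to a repository revision."""
    files: dict = field(default_factory=dict)
    revision: int = 0  # 0 means "not yet checked-in"


class Repository:
    def __init__(self):
        self._history = []  # one (message, files) snapshot per check-in

    def _snapshot(self, revision):
        if revision is None:
            revision = len(self._history)  # default to the latest revision
        if not 1 <= revision <= len(self._history):
            raise ValueError(f"no such revision: {revision}")
        _, files = self._history[revision - 1]
        return revision, dict(files)

    def check_in(self, working_copy, message):
        """Write the working copy back to the repository, with a description."""
        self._history.append((message, dict(working_copy.files)))
        working_copy.revision = len(self._history)
        return working_copy.revision

    def check_out(self, revision=None):
        """Create a working copy of a specific revision, or of the latest."""
        revision, files = self._snapshot(revision)
        return WorkingCopy(files=files, revision=revision)

    def export(self, revision=None):
        """Return the plain files, without any version-control metadata."""
        _, files = self._snapshot(revision)
        return files

    def log(self):
        """List (revision, description) pairs, oldest first."""
        return [(n, message) for n, (message, _) in enumerate(self._history, 1)]


repository = Repository()
work = WorkingCopy({"add.c": "return a - b;"})
repository.check_in(work, "Add first version of add.c")
work.files["add.c"] = "return a + b;"
repository.check_in(work, "Fix sign error in add.c")
print(repository.log())
```

The difference between check_out and export in this model is only the wrapper: check_out returns a WorkingCopy that remembers its revision, whereas export returns the bare files.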

4.7. USES OF VERSION CONTROL SYSTEM

There are a number of uses of a version control system (VCS):

• Obtain Copy. A VCS allows one to obtain a copy of the source code, ready for private examination or modifications to be made.

• Control Commits. A VCS allows one to control check-ins or commits so that private changes can be made available for other developers to use.

• Manage Development and Maintenance. A VCS allows one to facilitate the creation of multiple code streams to manage the development and maintenance of different versions of the same product.

• Control Access. A VCS allows one to control access to files so that only authorized developers can change certain source files.

• View History. A VCS enables one to view older versions of each source file, even if newer revisions have superseded them.

This, as shown in Figure 11, allows different developers to work on different versions of the software, while ensuring the necessary level of separation in their work, a type of separation of concerns.

If a customer reports a bug in an old version of source code, a developer can reproduce the exact set of source files that were used to compile that older version,

Figure 11. Two possible directions (branches) in development. (Source: [Smith, 2011, Chapter 1].)

A comparison of old and new versions is one type of traceability [Laporte, April, 2018, Chapter 8].

In Git, there are log files that allow browsing and searching the history of commits [Stack Overflow Community, 2018, Chapter 2].

4.8. A CLASSIFICATION OF VERSION CONTROL (SYSTEMS)

There are a number of ways of classifying version control (systems) [Chacon, 2009; Sink, 2011; Chacon, Straub, 2014], one of which is scope.

In the rest of the section, a classification of version control (systems), namely into localized, centralized, and distributed version control (systems), is motivated by the history of version control [Humble, Farley, 2011, Chapter 14; Perforce, 2015] and by attempts to overcome the limitations of the different approaches for version control.

4.8.1. LOCALIZED

In the simplest realization of version control, a file can be copied to another, time-stamped, directory. However, this process is error-prone, as, for example, the wrong files could be copied or existing files could be overwritten. In the case of a large number of files, this process is also tedious and mundane.
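The copy-to-a-time-stamped-directory approach can be sketched in a few lines of Python. This is only an illustration: the function name snapshot, the archive directory layout, and the timestamp format are assumptions, not part of any particular tool.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def snapshot(source: Path, archive_root: Path, now: datetime) -> Path:
    """Copy `source` into a new time-stamped directory under `archive_root`."""
    stamp = now.strftime("%Y%m%dT%H%M%S")  # assumed naming convention
    target_dir = archive_root / stamp
    # exist_ok=False makes an accidental reuse of an existing snapshot fail loudly.
    target_dir.mkdir(parents=True, exist_ok=False)
    return Path(shutil.copy2(source, target_dir / source.name))


def demo():
    """Take two snapshots of a hypothetical file and return what was archived."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "main.c"
        src.write_text("int main(void) { return 0; }\n")
        first = snapshot(src, root / "archive", datetime(2024, 1, 1, 9, 0, 0))
        src.write_text("int main(void) { return 1; }\n")
        second = snapshot(src, root / "archive", datetime(2024, 1, 1, 10, 0, 0))
        return [first.parent.name, second.parent.name, first.read_text()]
```

Even in this small form, the weaknesses mentioned above are visible: nothing records why a snapshot was taken, and every snapshot stores a full copy of the file.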

Figure 12. An abstract model of a localized version control system.

EXAMPLE

For example, the Revision Control System (RCS) is such a VCS.

RCS keeps patch sets (that is, the differences between files) in a special format on disk. It can, at any point in time, re-create a file by adding up all its patches.
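The idea of re-creating a file by adding up its patches can be illustrated as follows. This is a minimal sketch that uses Python's standard difflib module; the patch representation (a list of line-range replacements) and the function names are assumptions made for illustration, and do not reflect the actual on-disk format used by RCS.

```python
from difflib import SequenceMatcher


def make_patch(old: list[str], new: list[str]) -> list[tuple]:
    """Record only the differences needed to turn `old` into `new`."""
    patch = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old[i1:i2] with these lines; unchanged runs are not stored.
            patch.append((i1, i2, new[j1:j2]))
    return patch


def apply_patch(old: list[str], patch: list[tuple]) -> list[str]:
    """Re-create the newer version from the older one plus its patch."""
    result = list(old)
    # Apply from the end of the file so that earlier indices stay valid.
    for i1, i2, lines in reversed(patch):
        result[i1:i2] = lines
    return result


def checkout(base: list[str], patches: list[list[tuple]], revision: int) -> list[str]:
    """Re-create revision `revision` (0 = base) by applying patches in order."""
    content = base
    for patch in patches[:revision]:
        content = apply_patch(content, patch)
    return content
```

Here, only the first version is stored in full; every later version is recovered by applying the stored patches one after another.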

4.8.2. CENTRALIZED

A localized VCS has certain limitations. For example, it does not support collaboration among developers on different machines.

Figure 13. An abstract model of a centralized version control system.

EXAMPLE

For example, Concurrent Versions System (CVS), Subversion, and Perforce are such VCSs.

These have a single server that contains all the versioned files, and a number of clients that check out files from that central place. This allows everyone involved to know, to a certain degree, what everyone else on the project is doing. Furthermore, it is easier to administer a centralized VCS, and to control who can do what, than it is to deal with local databases on every client.

4.8.3. DISTRIBUTED

To deal with the limitations of centralized VCSs, distributed VCSs were invented. A model of a distributed version control system is shown in Figure 14.

It allows mirroring a repository. In fact, several mirrors of the same repository are possible, which enables synchronized collaboration among different groups of people, in different ways, on the same project.

Figure 14. An abstract model of a distributed version control system.

EXAMPLE

For example, Git and Mercurial are such VCSs.

To overcome the limitations of distributed VCSs, there is a movement towards “hybrid” VCSs, that is, VCSs that are combinations of centralized VCSs and distributed VCSs [Perforce, 2015].

REMARKS

The scope of distributed VCSs has, over the years, broadened to provide support for project management practices (such as collaborative authoring and sharing using a Wiki) and programming practices (such as source code review).

4.9. STRUCTURE OF A VERSION-CONTROLLED (SOFTWARE) PROJECT

The structure of a version-controlled (software) project is a rooted directed acyclic graph (DAG). The representation of this DAG is spatial, but the phenomenon which the DAG reflects is temporal. The graph is rooted because there is an oldest version, it is directed because children are always forward in time, and it is acyclic because parents are always backward in time.
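The three properties (rooted, directed, acyclic) can be checked mechanically. The following is a minimal sketch in which a history is represented as a mapping from each revision to the list of its parents; the revision names and the HISTORY example are hypothetical, and real version control systems store considerably more information per revision.

```python
def roots(parents: dict[str, list[str]]) -> list[str]:
    """Revisions with no parent, that is, candidates for the oldest version."""
    return sorted(rev for rev, ps in parents.items() if not ps)


def is_acyclic(parents: dict[str, list[str]]) -> bool:
    """True if following parent links never leads back to a revision on the current path."""
    unvisited, in_progress, done = 0, 1, 2
    state = {rev: unvisited for rev in parents}

    def visit(rev: str) -> bool:
        if state[rev] == in_progress:
            return False  # reached a revision that is its own ancestor
        if state[rev] == done:
            return True
        state[rev] = in_progress
        ok = all(visit(parent) for parent in parents[rev])
        state[rev] = done
        return ok

    return all(visit(rev) for rev in parents)


# Hypothetical history: r1 is the oldest version, b1 branches off r2,
# and r4 merges the branch back, so r4 has two parents.
HISTORY = {
    "r1": [],
    "r2": ["r1"],
    "r3": ["r2"],
    "b1": ["r2"],
    "r4": ["r3", "b1"],
}
```

For HISTORY, roots returns a single revision (the graph is rooted) and is_acyclic returns True. The edges are directed because each entry points only from a child to its parents.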


Figure 15. A part of the structure of a version-controlled (software) project.

Definition [Branch]. A set of items under version control may be branched or forked at a point in time so that, from that time forward, two or more copies of those items may develop at different speeds, or in different ways, independently of each other.

In linear development, there are no branches. The motivation for branching is to enable nonlinear development, specifically, parallel development, such that each branch has a different goal.


Figure 16. An assortment of trunks and branches, illustrating linear and nonlinear development. (Source: [Leon, 2015, Chapter 5].)

EXAMPLE

For example, one software engineer may work towards fixing bugs in earlier versions of the product (corrective maintenance), while another software engineer may work towards adding new features to the product (perfective maintenance).

Definition [Trunk]. The unique line of development that is not a branch.

Definition [Merge]. An operation in which two sets of changes are applied to a file or set of files. (For example, a user updates his or her working copy, or synchronizes his or her working copy with changes made by other users, and checks it into the repository.)

Usually, a versioning scheme by itself carries limited semantics. This motivates the need for a means of attaching metainformation.

Definition [Tag]. A label that refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with an easy-to-understand and meaningful name or number.

OBSERVATIONS

• There can be one or more branches.

• A branch can change.

• A branch can have other branches.

• A branch may or may not be merged back into the trunk.

• A discontinued development branch cannot have more branches and is not merged back into the trunk.

4.9.1. BRANCHING STRATEGIES

There are different branching strategies. The addition of each branch entails a certain management cost that will be incurred later, say in the integration and testing phases [Crouch, 2018; Laporte, April, 2018, Chapter 8]. Therefore, it is important to choose a strategy that is aligned with the type of software project.

BRANCHING STRATEGY 1

This strategy requires the creation of two types of branches, namely (1) the main branch (trunk), and (2) a version branch, as shown in Figure 17.

Figure 17. An illustration of branching strategy 1.

In this branching strategy, the main branch is used by the developers of the software. This implies that the main branch must remain stable at all times as the developers are always in it.

At the point in time when the item is stable and the development team wants to deliver a version to its customer, a version branch is created (say, 1.0.1) that contains a complete version of all the artifacts from the first production version. The development team can

In time, when the development team is ready to deliver a subsequent release, it creates another release branch (say, 1.1.3), which becomes the new production version for the customer. The previous branch could be kept as historical, or may be archived, given that the customer has been provided a new version.

APPLICABILITY OF BRANCHING STRATEGY 1

This strategy is appropriate for the following situations:

• The project is relatively small. (For example, the development of a small-scale Web Application.)

• The development team is relatively small and co-located.

• Corrections to production software require merging into the main branch. These must be closely controlled because they are made directly within the development branch.

• The new version shipped from the main branch includes all previous changes from this branch.

BRANCHING STRATEGY 2

This strategy requires the creation of three types of branches, namely (1) the main branch (trunk), (2) a development branch, and (3) a production branch, as shown in Figure 18.

In this branching strategy, no one works directly in the main branch. It is used only for the integration of items from the developers. This strategy compels the developers to work in a development branch.

The main branch receives an intermediary version from time to time that marks progress and can be used for demonstrations. Its rate of change (and frequency of release) depends on the number of merges coming from the development branch (which could range anywhere from every day to every few days, but at least once a week).

The development branch is used for most of the activities and changes. The development team must control the content of the development branch. In principle, items in a development branch should always work (say, be able to compile without error). As one of the best practices of version control says, “Don’t Break The Tree” [Sink, 2011, Chapter 13].

The production branch is used to solve production problems, such as fixing a bug. If it is changed, then someone must merge those changes to the development branch to ensure synchronization. It is expected that, eventually, the production branch also becomes stable, as it contains the production version.

APPLICABILITY OF BRANCHING STRATEGY 2

This strategy is appropriate for the following situations:

• The project is relatively large. (For example, the development of a large-scale operating system or maintenance of a social network.)

• The development team is relatively large and not necessarily co-located.

• There is a need for delivery of a unique major version.

• There is a need for delivery of a major version at regular (possibly, periodic) intervals.

4.10. VERSION NUMBERING

Definition [Software Version Identifier] [IEEE, 2012]. An explicit and immutable version identifier (name or number) inserted into each configuration item, including each individual release, that can be used to identify the exact version of the configuration item in any instance or repository.

The version identifier is usually, but not always, a number. Successive version identifiers must increase.

There is currently no ‘standard’ for version numbering. The following are some possible version sequences:

• R1, R2, R3

• 1.2.0, 1.2.1, 1.2.2, 1.3.0

• 3.1, 95, NT, 98, Me, 2000, XP, Vista, 7, 8, 10

• SE 6, SE 7, SE 8, SE 9, SE 10, SE 11

• 737-300, 737-400, 737-500, 747-300, 747-400

Currently, the most common version numbering scheme for software seems to be the three-number approach. The three-number approach distinguishes among large feature changes, small feature changes, and bug fixes. In this approach, a version number should follow the format X.Y.Z (Build B).

• X increments whenever major feature changes are made to the software. This often means that configuration and data files that were used in previous versions of the software are no longer compatible (and must be upgraded).

• Y increments whenever minor feature changes are made. These changes add new capabilities to the software, but do not significantly change the way the software is used or result in a disruptive upgrade.

• Z increments for every new bug fix (or set of bug fixes). No new functionality is added to the software, but the user can expect that the software quality has improved.

• (Build B) increments with every release build of the software. This number is typically large and has no relation to the values of X, Y, or Z. The customer need not be concerned with this number: It merely indicates the number of times the tester has received a new package to test. (It does not say anything about the new features or bug fixes that may be present in the package.)
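The X.Y.Z (Build B) format lends itself to a small data type. The following is a sketch only: the class name Version, the field names, and the rule that lower-order parts reset to 0 when a higher-order part is incremented (as in the sequence 1.2.2, 1.3.0) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+) \(Build (\d+)\)$")


@dataclass(frozen=True, order=True)
class Version:
    major: int  # X: major feature changes
    minor: int  # Y: minor feature changes
    patch: int  # Z: bug fixes
    build: int  # B: release build counter

    @classmethod
    def parse(cls, text: str) -> "Version":
        match = _PATTERN.match(text.strip())
        if match is None:
            raise ValueError(f"not in X.Y.Z (Build B) form: {text!r}")
        return cls(*(int(part) for part in match.groups()))

    def bump(self, part: str) -> "Version":
        """Increment X, Y, or Z; lower-order parts reset to 0 (an assumption), B is kept."""
        if part == "major":
            return Version(self.major + 1, 0, 0, self.build)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0, self.build)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1, self.build)
        raise ValueError(f"unknown part: {part!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch} (Build {self.build})"
```

Storing the parts as integers, rather than comparing strings, means that 1.10.0 is correctly ordered after 1.2.9.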

EXAMPLE

It is possible to use version numbering to show evolution, as shown in Figure 19.

Figure 19. The evolution of Kubernetes. (Source: Wikipedia.)

4.10.1. MANAGING THE VERSION NUMBER

In a release build system, successive software releases use successive version numbers, incrementing the individual parts of the number as appropriate.

The version number could be stored in one of the following ways:

1. External Disk File: In this case, the release engineer maintains a disk file containing the current version number, such as 1.2.3 (Build 456). There is a need for a script which, after each successful build, increments the build number.

The build number B is incremented automatically after every successful build, but X, Y, and Z are incremented manually after a conscious decision by the product managers.
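A script of the kind mentioned above, which increments only the build number in the disk file, might look as follows. This is a sketch under stated assumptions: the file name VERSION and the function name increment_build are hypothetical, and the file is assumed to contain a single line in the X.Y.Z (Build B) format.

```python
import re
import tempfile
from pathlib import Path

_BUILD = re.compile(r"\(Build (\d+)\)")


def increment_build(version_file: Path) -> str:
    """Rewrite the version file with B + 1; X, Y, and Z are left untouched."""
    text = version_file.read_text().strip()
    match = _BUILD.search(text)
    if match is None:
        raise ValueError(f"no build number in {text!r}")
    next_build = int(match.group(1)) + 1
    updated = _BUILD.sub(f"(Build {next_build})", text, count=1)
    version_file.write_text(updated + "\n")
    return updated


def demo():
    """Run the script once against a throwaway version file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "VERSION"  # hypothetical file name
        path.write_text("1.2.3 (Build 456)\n")
        return increment_build(path), path.read_text()
```

In a build system, such a function would be called only after a build has succeeded, so that failed builds do not consume build numbers.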

EXAMPLE

For example, here is a sequence of release builds:

2.3.0 (Build 523): Released to Customers

2.3.1 (Build 524): Internal Only

2.3.1 (Build 525): Internal Only

2.3.1 (Build 526): Internal Only

2.3.1 (Build 527): Released to Customers

Board Time!

Give a reason why X, Y, or Z are not incremented automatically.

4.10.2. RELEVANCY OF THE VERSION NUMBER TO STAKEHOLDERS AND USE OF THE VERSION NUMBER BY STAKEHOLDERS

The version numbers are relevant to and used by different stakeholders for different reasons.

CUSTOMERS

For strategic as well as technical reasons, the customers (and other external stakeholders) need not be made aware of the build number. For example, the customers (and other external stakeholders) should not have to care about the number of times the package has been built and sent to the tester. Therefore, the build number should be hidden before a release.

TESTERS

RELEASE ENGINEERS

The release engineer tags the VCS to indicate that a particular build, such as 527, was the official release of version 2.3.1. If developers wanted to reproduce the source code for this version, they would reference the appropriate tag, such as Release_2.3.1. To reproduce internal releases, the tag would be Release_2.3.1_Build_526.
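The tag naming convention used in this example can be captured in a small helper. This is a sketch: the function name release_tag is hypothetical, and the convention itself is only the one illustrated by the tags Release_2.3.1 and Release_2.3.1_Build_526 above.

```python
from typing import Optional


def release_tag(version: str, build: Optional[int] = None) -> str:
    """Build a VCS tag name for a release of `version` (in X.Y.Z form)."""
    tag = f"Release_{version}"
    if build is not None:
        # Internal releases share X.Y.Z, so the build number disambiguates them.
        tag += f"_Build_{build}"
    return tag
```

For example, release_tag("2.3.1") names the official release, while release_tag("2.3.1", 526) names one of the internal builds that preceded it.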

4.11. LIMITATIONS OF CONFIGURATION MANAGEMENT

The current limitations of configuration management are mostly human-related.

If the configuration management process is deemed to be overly bureaucratic, the cost of acquiring and operating a configuration management system is excessive, or the access to a configuration management system is prohibitively slow, then programmers can be discouraged from committing to a configuration management system, or from using it to its full potential.

5. RELEASING

Definition [Release] [IEEE, 2012].

(1) A delivered version of an application that may include all or part of an application.

(2) [A] collection of new and/or changed configuration items that are tested and introduced into the live environment together.

(3) A software version that is made formally available to a wider community.

OBSERVATIONS

• For software to be a candidate for release, it must have been tested thoroughly [Humble, Farley, 2011; Nygard, 2018].

• Usually, a release consists of more than the executable code. For example, a release could include installation instructions, sample data, and documentation.

• Usually, there are more revisions of a system than releases of that system.

• It has been pointed out that Frequent Releases is one of the 97 Things Every Java Programmer Should Know [Henney, Gee, 2020].

Size and frequency are two interrelated concepts in software release engineering. Let R_i and R_{i+1} be two consecutive releases. Let || · || be a suitable norm for size. It is (1) relatively easier, (2) relatively less time-consuming, and (3) relatively less risky to have small || R_{i+1} – R_i || than to have large || R_{i+1} – R_i ||. Therefore, releases with small changes over their previous incarnations can be produced with relatively high frequency.

5.1. SOFTWARE RELEASE ENGINEERING

Definition [Software Release Management] [IEEE, 2012]. [The] management of the activities surrounding the release of one or more versions of software to one or more customers, including identifying, packaging, and delivering the elements of a product.

According to Wikipedia, software release management is a sub-discipline of software release engineering that, in turn, is a sub-discipline of software engineering. It can be expected that the support for software release engineering contributes to the process maturity of an organization.

The Release Management Wiki, supported by Electric Cloud, is a compilation of resources on Release Management.

ACKNOWLEDGEMENT

The inclusion of images from external sources is solely for non-commercial educational purposes, and their use is hereby acknowledged.


REFERENCES

[Aiello, Sachs, 2011] Configuration Management Best Practices: Practical Methods that Work in the Real World. By B. Aiello, L. Sachs. Addison-Wesley. 2011.

[Appleton, Berczuk, Cowham, 2006] Principles of Agile Version Control: From OOD to TBD. By B. Appleton, S. Berczuk, R. Cowham. June 19, 2006.

[Blischak, Davenport, Wilson, 2016] A Quick Introduction to Version Control with Git and GitHub. By J. D. Blischak, E. R. Davenport, G. Wilson. PLOS Computational Biology. Volume 12. Number 1. 2016. Pages 1-18.

[Bryant, Marín-Pérez, 2019] Continuous Delivery in Java: Essential Tools and Best Practices for Deploying Code to Production. By D. Bryant, A. Marín-Pérez. O’Reilly Media. 2019.

[Chacon, 2009] Pro Git. By S. Chacon. Apress. 2009.

[Chacon, Straub, 2014] Pro Git. By S. Chacon, B. Straub. Second Edition. Apress. 2014.

[Christensen, 2010] Flexible, Reliable Software: Using Patterns and Agile Development. By H. B. Christensen. CRC Press. 2010.

[Clark, 2004] Pragmatic Project Automation: How to Build, Deploy, and Monitor Java Apps. By M. Clark. The Pragmatic Bookshelf. 2004.

[Crouch, 2018] Picking the Right Branch-Merge Strategy. By A. Crouch. CMCrossroads. November 14, 2018. URL: https://www.cmcrossroads.com/print/article/picking-right-branch-merge-strategy.

[Demaree, 2016] Git for Humans. By D. Demaree. A Book Apart. 2016.

[Goasguen, 2016] Docker Cookbook: Solutions and Examples for Building Distributed Applications. By S. Goasguen. O’Reilly Media. 2016.


[Henney, Gee, 2020] 97 Things Every Java Programmer Should Know: Collective Wisdom from the Experts. By K. Henney, T. Gee. O’Reilly Media. 2020.

[Howard, 2012] IT Release Management: A Hands-on Guide. By D. Howard. CRC Press. 2012.

[Humble, Farley, 2011] Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. By J. Humble, D. Farley. Addison-Wesley. 2011.

[Hüttermann, 2012] DevOps for Developers. By M. Hüttermann. Apress. 2012.

[IEEE, 2012] IEEE Standard 828-2012. IEEE Standard for Configuration Management in Systems and Software Engineering. The Institute of Electrical and Electronics Engineers (IEEE) Computer Society. 2012.

[IEEE, 2021] IEEE Standard 2675-2021. IEEE Standard for DevOps: Building Reliable and Secure Systems Including Application Build, Package, and Deployment. The Institute of Electrical and Electronics Engineers (IEEE) Computer Society. 2021.

[ISO/IEC, 2005] ISO/IEC TR 19759:2005. Guide to the Software Engineering Body of Knowledge (SWEBOK). The International Organization for Standardization (ISO)/The International Electrotechnical Commission (IEC). 2005.

[Kandt, 2006] Software Engineering Quality Practices. By R. K. Kandt. Auerbach Publications. 2006.

[Karvonen, Behutiye, Oivo, Kuvaja, 2017] Systematic Literature Review on the Impacts of Agile Release Engineering Practices. By T. Karvonen, W. Behutiye, M. Oivo, P. Kuvaja. Information and Software Technology. Volume 86. 2017. Pages 87-100.

[Kemper, Oxley, 2012] Foundation Version Control for Web Developers. By C. Kemper, I. Oxley. Friends of Ed. 2012.

[Kodumal, 2019] Effective Feature Management: Releasing and Operating Software in the Age of Continuous Delivery. By J. Kodumal. O’Reilly Media. 2019.


[Laporte, April, 2018] Software Quality Assurance. By C. Y. Laporte, A. April. IEEE Computer Society. 2018.

[Lemaire, 2021] Refactoring at Scale: Regaining Control of Your Codebase. By M. Lemaire. O’Reilly Media. 2021.

[Leon, 2015] Software Configuration Management Handbook. By A. Leon. Third Edition. Artech House. 2015.

[McNutt, 2016] Release Engineering: How Google Builds and Delivers Software. By D. McNutt. O’Reilly Media. 2016.

[Meyer, 2014] Continuous Integration and Its Tools. By M. Meyer. IEEE Software. Volume 31. Number 3. 2014. Pages 14-16.

[Miell, Sayers, 2019] Docker in Practice. By I. Miell, A. H. Sayers. Second Edition. Manning Publications. 2019.

[Moran, 2015] Managing Agile: Strategy, Implementation, Organisation and People. By A. Moran. Springer International Publishing. 2015.

[Moreira, 2010] Adapting Configuration Management for Agile Teams: Balancing Sustainability and Speed. By M. E. Moreira. John Wiley and Sons. 2010.

[Mouat, 2016] Using Docker: Developing and Deploying Software with Containers. By A. Mouat. O’Reilly Media. 2016.

[Nickoloff, Kuenzli, 2019] Docker in Action. By J. Nickoloff, S. Kuenzli. Second Edition. Manning Publications. 2019.

[Nygard, 2018] Release It! Design and Deploy Production-Ready Software. By M. T. Nygard. Second Edition. The Pragmatic Bookshelf. 2018.

[Perforce, 2015] Best Practices for Version Management. By Perforce Software. White Paper. 2015.


[Reussner, Goedicke, Hasselbring, Vogel-Heuser, Keim, Märtin, 2019] Managed Software Evolution. By R. Reussner, M. Goedicke, W. Hasselbring, B. Vogel-Heuser, J. Keim, L. Märtin (Editors). Springer Nature. 2019.

[Rüping, 2003] Agile Documentation: A Pattern Guide to Producing Lightweight Documents for Software Projects. By A. Rüping. John Wiley and Sons. 2003.

[Scott, 2017] Collaborative Web Development. By A. D. Scott. O’Reilly Media. 2017.

[Shmueli, Ronen, 2017] Excessive Software Development: Practices and Penalties. By O. Shmueli, B. Ronen. International Journal of Project Management. Volume 35. 2017. Pages 13-27.

[Sink, 2011] Version Control by Example. By E. Sink. Pyrenean Gold Press. 2011.

[Smith, 2011] Software Build Systems: Principles and Experience. By P. Smith. Addison-Wesley. 2011.

[Stack Overflow Community, 2018] Git® Notes for Professionals. By Stack Overflow Community. 2018.

[Summers, 2013] Effective Methods for Software and Systems Integration. By B. L. Summers. CRC Press. 2013.

[Tsitoara, 2020] Beginning Git and GitHub: A Comprehensive Guide to Version Control, Project Management, and Teamwork for the New Developer. By M. Tsitoara. Apress. 2020.

[Visser, 2017] Building Software Teams: Ten Best Practices for Effective Software Development. By J. Visser. O’Reilly Media. 2017.

[Wilson, Bryan, Cranston, Kitzes, Nederbragt, Teal, 2017] Good Enough Practices in Scientific Computing. By G. Wilson, J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, T. K. Teal. PLOS Computational Biology. Volume 13. Number 6. 2017. Pages 1-20.
