Since the representation of program state is a logical predicate, there is an alternative to keeping a complete representation of the state at
Chapter 8: Finite State Verification
8.8 Data Model Verification with Relational Algebra
Many information systems have relatively simple logic and algorithms, with much of their complexity in the structure of the data they maintain. A data model is a key design description for such systems. It is typically described, for example, in the class and object diagrams of a Unified Modeling Language (UML) design document, possibly augmented by assertions in the Object Constraint Language (OCL). The finite state verification techniques we have described are suited to reasoning about complex or subtle program logic, but are quite limited in dealing with complex data. Fortunately, suitable finite state verification techniques can also be devised for reasoning about data models.
The data model consists of sets of data and relations among them. Often a data model describes many individual relations and constraints; the challenge is in knowing whether all of the individual constraints are consistent, and whether together they ensure the desired properties of the system as a whole. Constructing and testing a portion or partial version of the system may provide some increased confidence in the realizability of the system, but even with incremental development it can happen that a fundamental problem in the data model is discovered only after a great deal of development effort has been invested in the flawed model. Reasoning about the model itself is a more timely and cost-effective way to find and correct these flaws.
Let us consider, for example, a simple Web site with a data model described as sets and relations as follows:
A set of pages, divided among restricted, unrestricted, and maintenance pages.
Unrestricted pages are freely accessible, while restricted pages are accessible only to registered users, and pages in maintenance are currently inaccessible to both sets of users.
A set of users, classified as administrator, registered, and unregistered users.
A set of links relations among pages. Different relations describe different kinds of links. Private links lead to restricted pages, public links lead to unrestricted pages, and maintenance links lead to pages undergoing maintenance.
A set of access rights relations between users and pages, relating different classes of users to the pages they can access. Unregistered users can access only unrestricted pages, registered users can access both restricted and unrestricted pages, and an administrator can access all pages, including pages under maintenance.
So far we have identified the sets involved in the relations, which we call their signature. To complete the description we need to indicate the rules that constrain relations among specific elements. For example, we may:
Exclude self loops from "links" relations; that is, specify that a page should not be directly linked to itself.
Allow at most one type of link between two pages. Note that relations need not be symmetric; that is, the relation between A and B is distinct from the relation between B and A, so there can be a link of type private from A to B and a link of type public from B back to A.
Require the Web site to be connected; that is, require that there be at least one way of following links from the home page to each other page of the site.
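The first two rules above can be sketched as executable checks. This is a minimal illustration in Python, not part of the original model: each kind of link is assumed to be stored as a set of (source, target) pairs, and the names `links`, `no_self_loops`, and `at_most_one_link_type` are hypothetical.

```python
def no_self_loops(links):
    """True if no link relation contains a pair (p, p)."""
    return all(src != dst for rel in links.values() for (src, dst) in rel)


def at_most_one_link_type(links):
    """True if no ordered pair of pages appears in two different relations."""
    seen = set()
    for rel in links.values():
        if seen & rel:
            return False
        seen |= rel
    return True


# Hypothetical instance: a private link A -> B and a public link B -> A.
# The pairs are ordered, so these two links do not conflict.
links = {
    "private": {("A", "B")},
    "public": {("B", "A")},
    "maintenance": set(),
}
print(no_self_loops(links), at_most_one_link_type(links))
```

The third rule, connectivity, needs the transitive closure of the links relations and so is not covered by this sketch.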
A data model can be visualized as a diagram with nodes corresponding to sets and edges representing relations, as in Figure 8.14.
Figure 8.14: The data model of a simple Web site.
We can reason about sets and relations using mathematical laws. For example, set union and set intersection obey many of the same algebraic laws as addition and multiplication of integers:
A ∪ B = B ∪ A (commutative law)
A ∩ B = B ∩ A (commutative law)
(A ∪ B) ∪ C = A ∪ (B ∪ C) (associative law)
(A ∩ B) ∩ C = A ∩ (B ∩ C) (associative law)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (distributive law)
etc.
These and many other laws together make up relational algebra, which is used extensively in database processing and has many other uses.
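As a small illustration, the laws listed above can be checked exhaustively over every subset of a small universe. The sketch below uses Python's built-in set operators; the function names and the three-element universe are assumptions made for the example. It confirms the laws only for that universe and proves nothing in general.

```python
from itertools import combinations, product


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def laws_hold(universe):
    """Check the commutative, associative, and distributive laws
    for every triple of subsets of `universe`."""
    for a, b, c in product(subsets(universe), repeat=3):
        if a | b != b | a or a & b != b & a:
            return False
        if (a | b) | c != a | (b | c) or (a & b) & c != a & (b & c):
            return False
        if a & (b | c) != (a & b) | (a & c):
            return False
    return True


print(laws_hold({"p0", "p1", "p2"}))
```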
It would be inconvenient to write down a data model directly as a collection of mathematical formulas. Instead, we use some notation whose meaning is the same as the mathematical formulas, but is easier to write, maintain, and comprehend. Alloy is one such modeling notation, with the additional advantage that it can be processed by a finite state verification tool.
The definition of the data model as sets and relations can be formalized and verified with relational algebra by specifying signatures and constraints. Figure 8.15 presents a formalization of the data model of the Web site in Alloy. Keyword sig (signature) identifies three sets: Page, User, and Site. The definition of set Page also defines three disjoint relations among pages: linksPriv (private links), linksPub (public links), and linksMain (maintenance links). The definition of User also defines a relation between users and pages. User is partitioned into three disjoint sets (Administrator, Registered, and Unregistered). The definition of Site aggregates pages into the site and identifies the home page. Site is defined static since it is a fixed classification of objects.
module WebSite

// Pages include three disjoint sets of links
sig Page { disj linksPriv, linksPub, linksMain: set Page }
// Each type of link points to a particular class of page
fact connPub { all p: Page, s: Site | p.linksPub in s.unres }
fact connPriv { all p: Page, s: Site | p.linksPriv in s.res }
fact connMain { all p: Page, s: Site | p.linksMain in s.main }
// Self loops are not allowed
fact noSelfLoop { no p: Page | p in p.linksPriv+p.linksPub+p.linksMain }

// Users are characterized by the set of pages that they can access
sig User { pages: set Page }
// Users are partitioned into three sets
part sig Administrator, Registered, Unregistered extends User {}
// Unregistered users can access only the home page, and unrestricted pages
fact accUnregistered {
    all u: Unregistered, s: Site | u.pages = (s.home+s.unres) }
// Registered users can access home, restricted and unrestricted pages
fact accRegistered {
    all u: Registered, s: Site |
        u.pages = (s.home+s.res+s.unres)
}
// Administrators can access all pages
fact accAdministrator {
    all u: Administrator, s: Site |
        u.pages = (s.home+s.res+s.unres+s.main)
}

// A web site includes one home page and three disjoint sets
// of pages: restricted, unrestricted and maintenance
static sig Site {
    home: Page,
    disj res, unres, main: set Page
}{
    // All pages are accessible from the home page ('^' is transitive closure)
    all p: (res+unres+main) | p in home.^(linksPub+linksPriv+linksMain)
}
Figure 8.15: Alloy model of a Web site with different kinds of pages, users, and access rights (data model part). Continued in Figure 8.16.
module WebSite
...
// We consider one Web site that includes one home page
// and some other pages
...
// We consider one administrator and some registered and unregistered users
fun initUsers() {one Administrator and
    some Registered and
...
// Check if unregistered users can visit all unrestricted pages,
// i.e., all unrestricted pages are connected to the home page with
// at least a path of public links.
// Perform analysis with sets of at most 3 objects.
// '*' indicates the transitive closure including the source element.

assert browsePub {
    all p: Page, s: Site | p in s.unres implies p in s.home.*linksPub
}
check browsePub for 3
Figure 8.16: Alloy model of a Web site with different kinds of pages, users, and access rights, continued from Figure 8.15.
The keyword fact introduces constraints.[6] The constraints connPub, connPriv, and connMain restrict the target of the links relations, while noSelfLoop excludes links from a page to itself. The constraints accAdministrator, accRegistered, and accUnregistered map users to pages. The constraint that follows the definition of Site forces the Web site to be connected by requiring each page to belong to the transitive closure of links starting from the home page (operator '^').
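The connectivity constraint rests on the transitive closure of the links relations. The following Python sketch shows one simple way to compute such a closure for a relation stored as a set of pairs, and to test whether a given set of pages is reachable from a home page. It illustrates the idea only and says nothing about how the Alloy tool evaluates '^'; the function names and the sample pages are hypothetical.

```python
def transitive_closure(relation):
    """Return the smallest transitive relation containing `relation`
    (the analogue of the '^' operator)."""
    closure = set(relation)
    while True:
        new_pairs = {(a, d) for (a, b) in closure
                     for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def reachable(start, relation):
    """Pages reachable from `start` by following one or more links."""
    return {dst for (src, dst) in transitive_closure(relation) if src == start}


def is_connected(home, pages, relation):
    """True if every page in `pages` is reachable from `home`."""
    return pages <= reachable(home, relation)


links = {("home", "a"), ("a", "b")}
print(is_connected("home", {"a", "b"}, links))       # b is reached through a
print(is_connected("home", {"a", "b", "c"}, links))  # c has no incoming link
```

The reflexive variant (operator '*') would additionally include the starting page itself in the reachable set.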
A relational algebra specification may be over- or underconstrained. Overconstrained specifications are not satisfiable by any implementation, while underconstrained specifications allow undesirable implementations; that is, implementations that violate important properties.
In general, specifications identify infinite sets of solutions, each characterized by a different set of objects and relations (e.g., the infinite set of Web sites with different sets of pages, users, and correct relations among them). Thus in general, properties of a relational specification are undecidable because proving them would require examining an infinite set of possible solutions. While attempting to prove absence of a solution may be inconclusive, often a (counter) example that invalidates a property can be found within a finite set of small models.
We can verify a specification over a finite set of solutions by limiting the cardinality of the sets. In the example, we first verify that the model admits solutions for sets with at most five elements (run init for 5, issued after an initialization of the system). A positive outcome indicates that the specification is not overconstrained: there are no logical contradictions. A negative outcome would not allow us to conclude that no solution exists, but tells us that no "reasonably small" solution exists.
We then verify that the example is not underconstrained with respect to property browsePub, which states that unregistered users must be able to visit all unrestricted pages by accessing the site from the home page. The property is asserted by requiring that all unrestricted pages belong to the reflexive transitive closure of the linksPub relation from the home page (here we use operator '*' instead of '^' because the home page is included in the closure). If we check whether the property holds for sets with at most three elements (check browsePub for 3) we obtain a counter-example like the one shown in Figure 8.17, which shows how the property can be violated.
Figure 8.17: A Web site that violates the "browsability" property, because public page Page_2 is not reachable from the home page using only unrestricted links. This diagram was generated by the Alloy tool.
The simple Web site in the example consists of two unrestricted pages (Page_1, the home page, and Page_2), one restricted page (Page_0), and one unregistered user (User_2). User_2 cannot visit one of the unrestricted pages (Page_2) because the only path from the home page to Page_2 goes through the restricted page Page_0. The property is violated because unrestricted browsing paths can be "interrupted" by restricted pages or pages under maintenance, for example, when a previously unrestricted page is reserved or disabled for maintenance by the administrator.
The problem appears only when there are public links from maintenance or restricted pages, as we can check by excluding them:
fact descendant {
    all p: Page, s: Site | p in s.main+s.res implies no p.linksPub
}
With this additional fact, the analysis finds no counter-example in a space of cardinality 3. We cannot conclude that no larger counter-example exists, but we may be satisfied that there is no reason to expect this property to be violated only in larger models.
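The idea of a bounded check can be illustrated with a brute-force search over all small sites. The Python sketch below is not how the Alloy tool works; it simply enumerates every classification of pages, choice of home page, and set of links up to a given scope, keeps those that satisfy the facts about pages, and looks for a violation of browsePub. Several simplifications are assumptions of the sketch: users are omitted because they do not affect the property, a link's type is derived from the class of its target page, and the names `reach`, `counterexample`, `check_browse_pub`, and `with_descendant_fact` are hypothetical.

```python
from itertools import product

KINDS = ("res", "unres", "main", None)  # None: page in none of the three sets


def reach(start, edges, reflexive):
    """Pages reachable from `start` over `edges`. Include `start` itself when
    `reflexive` is True (like '*'); otherwise require at least one link ('^')."""
    found = {start} if reflexive else set()
    frontier = [start]
    while frontier:
        page = frontier.pop()
        for (src, dst) in edges:
            if src == page and dst not in found:
                found.add(dst)
                frontier.append(dst)
    return found


def counterexample(n_pages, with_descendant_fact=False):
    """Search all sites with exactly `n_pages` pages for one that satisfies
    the facts but violates browsePub. Return it, or None if there is none."""
    pages = range(n_pages)
    for kinds in product(KINDS, repeat=n_pages):
        # connPub/connPriv/connMain and noSelfLoop: a link must target a
        # classified page other than its source.
        pairs = [(a, b) for a in pages for b in pages
                 if a != b and kinds[b] is not None]
        classified = {p for p in pages if kinds[p] is not None}
        for home in pages:
            for mask in product((False, True), repeat=len(pairs)):
                links = [pair for pair, used in zip(pairs, mask) if used]
                pub = [(a, b) for (a, b) in links if kinds[b] == "unres"]
                # Optional fact: no public links leave res or main pages.
                if with_descendant_fact and any(
                        kinds[a] in ("res", "main") for (a, _) in pub):
                    continue
                # Site fact: every classified page is reachable from home.
                if classified - reach(home, links, reflexive=False):
                    continue
                # browsePub: unrestricted pages reachable by public links only.
                browsable = reach(home, pub, reflexive=True)
                if any(kinds[p] == "unres" and p not in browsable
                       for p in pages):
                    return kinds, home, links
    return None


def check_browse_pub(scope, **options):
    """Look for a counter-example with at most `scope` pages."""
    for n in range(1, scope + 1):
        found = counterexample(n, **options)
        if found is not None:
            return found
    return None


print(check_browse_pub(3))
print(check_browse_pub(3, with_descendant_fact=True))
```

Under these assumptions the first call returns a three-page site that violates the property, and the second returns None. As in the text, a None result within the scope does not by itself rule out larger counter-examples.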
Summary
Finite state verification techniques fill an important niche in verifying critical properties of programs. They are particularly crucial where nondeterminism makes program testing ineffective, as in concurrent execution. In principle, finite state verification of concurrent execution and of data models can be seen as systematically exploring an enormous space of possible program states. From a user's perspective, the challenge is to construct a suitable model of the software that can be analyzed with reasonable expenditure of human and computational resources, captures enough significant detail for verification to succeed, and can be shown to be consistent with the actual software.
Further Reading
There is a large literature on finite state verification techniques reaching back at least to the 1960s, when Bartlett et al. [BSW69] employed what is recognizably a manual version of state space exploration to justify the correctness of a communication protocol. A number of early state space verification tools were developed initially for communication protocol verification, including the Spin tool. Holzmann's journal description of Spin's design and use [Hol97], though now somewhat out of date, remains an adequate introduction to the approach, and a full primer and reference manual [Hol03] is available in book form.
The ordered binary decision diagram representation of Boolean functions, used in the first symbolic model checkers, was introduced by Randal Bryant [Bry86]. The representation of transition relations as OBDDs in this chapter is meant to illustrate basic ideas but is simplified and far from complete; Bryant's survey paper [Bry92] is a good source for understanding applications of OBDDs, and Huth and Ryan [HR00] provide a thorough and clear step-by-step description of how OBDDs are used in the SMV symbolic model checker.
Model refinement based on iterative refinements of an initial coarse model was introduced by Ball and Rajamani in the tools Slam [BR01a] and Bebop [BR01b], and by Henzinger and his colleagues in Blast [HJMS03]. The complementary refinement approach of FLAVERS was introduced by Dwyer and colleagues [DCCN04].
Automated analysis of relational algebra for data modeling was introduced by Daniel Jackson and his students with the Alloy notation and associated tools [Jac02].
Exercises
8.1
We stated, on the one hand, that finite state verification falls between basic flow analysis and formal verification in power and cost, but we also stated that finite state verification techniques are often designed to provide results that are tantamount to formal proofs of program properties. Are these two statements contradictory? If not, how can a technique that is less powerful than formal verification produce results that are tantamount to formal proofs?
8.2
Construct an ordered binary decision diagram (OBDD) for the proposition [formula missing]
8.3
1. How does the size of the OBDD representation of [formula missing] differ depending on which variable (x, y, or z) is first in the variable ordering (i.e., appears in the root node of the OBDD representation)? Is the size of the OBDD equivalent for some different orderings of the variables? Why or why not?
2. Predict whether the order of variables would make a difference for [formula missing]
8.4
A property like "if the button is pressed, then eventually the elevator will come" is classified as a liveness property. However, the stronger real-time version "if the button is pressed, then the elevator will arrive within 30 seconds" is technically a safety property rather than a liveness property. Why?
[6] The order in which relations and constraints are given is irrelevant. We list constraints after the relations they refer to.