Decision problems for language equations


Journal of Computer and System Sciences


Alexander Okhotin (a, b, ∗)

(a) Academy of Finland, Finland

(b) Department of Mathematics, University of Turku, Turku FIN-20014, Finland

Article info

Article history: Received 13 February 2004. Received in revised form 5 September 2008. Available online 21 August 2009.

Keywords: Language equations; Boolean operations; Computability

Abstract

Equations with formal languages as unknowns using all Boolean operations and concatenation are studied. Their main properties, such as solution existence and uniqueness, are characterized by first-order formulae. It is shown that testing solution existence is $\Pi_1$-complete, while solution uniqueness and existence of a least and of a greatest solution are all $\Pi_2$-complete problems. The families of languages defined by components of unique, least and greatest solutions of such systems are shown to coincide with the classes of recursive, recursively enumerable and co-recursively enumerable sets, respectively.

1. Introduction

Equations in which variables assume values of formal languages over a finite alphabet, the constants are formal languages as well, and the operations used are the language-theoretic operations (such as concatenation, Boolean operations, Kleene star, etc.) are known as language equations. Being a mathematical abstraction for reasoning about sets of strings, language equations naturally arise in different areas of computer science, and problems that can be formally described by language equations can be found in all kinds of applications.

Language equations were first used in connection to their most natural application: the description of syntax. The basic and the most well-known model of syntax is a context-free grammar, and Ginsburg and Rice [9] defined the semantics of these grammars by systems of equations of the form

$$
\left\{
\begin{aligned}
X_1 &= \varphi_1(X_1, \ldots, X_n), \\
&\;\;\vdots \\
X_n &= \varphi_n(X_1, \ldots, X_n),
\end{aligned}
\right.
\qquad (*)
$$

where each variable $X_i$ represents an unknown language (a "nonterminal symbol" of a grammar), while each expression $\varphi_i$ is a union of concatenations of variables and singleton constants. This semantics is in many senses preferable to the Chomskian definition based on derivation: as rightfully observed by Autebert et al. [2], a specification of the form "an instruction is . . . " is more natural than "a symbol for instruction derives . . . ". Furthermore, the approach of Ginsburg and Rice [9] can be generalized to obtain more powerful models of syntax based upon language equations, such as conjunctive grammars [17,18], which extend context-free grammars with an explicit conjunction operation interpreted by intersection in $(*)$, and Boolean grammars [19] that allow the use of all propositional connectives represented by the corresponding Boolean operations on languages.

A preliminary version of this paper was presented at the ICALP 2003 conference held in Eindhoven, the Netherlands, June 30–July 4, 2003. This research was done during the author's studies at the School of Computing, Queen's University (Kingston, Ontario, Canada).

∗ Address for correspondence: Department of Mathematics, University of Turku, Turku FIN-20014, Finland. E-mail address: alexander.okhotin@utu.fi.


Another recurring application of language equations is representing various properties of computation. Systems of the form $(*)$ were used to represent finite automata in an early monograph by Salomaa [21], and Brzozowski and Leiss [5] generalized these systems to define alternation in finite automata. Later, more general equations over sets of terms, known as set constraints, were used to represent some properties of programs, and efficient algorithms for their analysis were obtained by Aiken et al. [1]. Connections between set constraints and language equations of a more general form were investigated by Charatonik [6], who is notable for obtaining the first undecidability result for language equations.

Language equations have been adopted as a model in several applied areas of computer science. For instance, language equations with special operations on strings were used by Daley, Ibarra and Kari [8] to model recombination of genes in DNA, Yevtushenko et al. [22] applied language equations to represent component design of complex systems, and more recently Kari and Konstantinidis [13] used language equations of another form to analyze error-detection properties of a communication channel. In applied logic, Baader and Narendran [3] used finite solutions of language equations to represent unification and matching in the description logic $\mathcal{FL}_0$, while Zhang [23] characterized an extension of propositional domain logic by fixed points of language equations.

Besides the research motivated by applications, some purely theoretical work in the area has been done as well. Conway [7] was the first to study a large class of systems with regular solutions, as well as to raise important questions on more general equations. Later Kari and Thierrin [14] studied equations of a simple form using generalized string operations, Leiss [16] constructed the first example of a language equation over a one-letter alphabet with a nonperiodic solution, Karhumäki and Petre [12] investigated the equation $XL = LX$ proposed by Conway [7], while Karhumäki and Lisovik [11] were the first to consider infinite systems of language equations and to show undecidability of their basic properties.

Since language equations naturally occur whenever sets of strings are being considered, a general theory of these equations would be helpful for many applications. However, as sharply noted by Karhumäki and Petre [12] in 2003, the present knowledge on language equations amounts to "very little, or in fact almost nothing". Though some interesting results on particular kinds of equations have been obtained, no general methods of reasoning about language equations have been developed. The goal of this paper is to introduce such methods and to develop basic formal properties of equations with Boolean operations and concatenation.

As the main model, this paper adopts systems of the form $(*)$, in which the right-hand sides $\varphi_i$ may contain any Boolean operations and concatenation. As shown in Section 2, these systems are as powerful as more general systems involving arbitrary equalities $\varphi(X) = \psi(X)$ and inequalities $\varphi(X) \subseteq \psi(X)$, and hence almost all language equations ever studied fall under this case. Section 3 then presents technical results on encoding computations of Turing machines using language equations, which are subsequently used in all hardness arguments.

The main tool for the analysis of language equations introduced in this paper is the notion of a solution modulo a language, defined in Section 4. The idea is to take a system of equations and to consider only finitely many strings, thus turning the system into a finitely manageable object. Then, applying quantification over the finite set of strings under consideration (the modulus), one can represent various properties of the system. This is used in the following Sections 5 and 6 to characterize the conditions of having a solution and having a unique solution by first-order formulae, and to determine the exact undecidability levels of the corresponding decision problems. A similar study of least and greatest solutions is conducted in Section 7.

Finally, the families of languages deﬁned by unique, least and greatest solutions of systems of language equations are considered in Section 8, where it is proved that these are exactly the recursive, the recursively enumerable (r.e.) and the co-recursively enumerable sets.

2. Language equations

Let us begin with the basic notation of formal language theory used in the paper. An alphabet $\Sigma$ is a finite nonempty set; its elements are called symbols or letters. A string over $\Sigma$ is a finite sequence $w = a_1 \ldots a_\ell$ with $\ell \geq 0$ and $a_i \in \Sigma$, where the number $\ell$ is the length of the string, denoted $|w|$. The unique empty string of length 0 is denoted by $\varepsilon$. The set of all strings is denoted $\Sigma^*$, and any subset of this set is called a language. The concatenation of two strings $u, v \in \Sigma^*$ is the string $uv \in \Sigma^*$, and the concatenation of two languages $K, L \subseteq \Sigma^*$ is the language $KL = \{uv \mid u \in K,\ v \in L\}$. The concatenation of $k$ copies of the same string or language is denoted by $w^k$ and $L^k$, respectively. The Kleene star of a language $L \subseteq \Sigma^*$ is the language $L^* = \bigcup_{k \geq 0} L^k$. Define $L^+ = \bigcup_{k \geq 1} L^k$.
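For finite languages, these operations can be tried out directly. The following is a minimal sketch in which languages are Python sets of strings; the function names `concat`, `power` and `star_up_to` are chosen for this illustration, and the Kleene star is truncated at a length bound because $L^*$ is usually infinite.

```python
def concat(K, L):
    """Concatenation KL = {uv | u in K, v in L} of two finite languages."""
    return {u + v for u in K for v in L}


def power(L, k):
    """L^k, the concatenation of k copies of L; L^0 is {""} (the empty string)."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result


def star_up_to(L, max_len):
    """All strings of L* whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from L,
        # dropping anything too long or already known.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result
```

With this representation, `power({"a", "b"}, 2)` is the set of all four strings of length two over $\{a, b\}$.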

The language equations considered in this paper may use concatenation and all Boolean operations, and are constructed of expressions of the following form.

Definition 1 (Language expressions). Let $\Sigma$ be an alphabet, and let $X = (X_1, \ldots, X_n)$ with $n \geq 1$ be a vector of language variables. The set of expressions over $\Sigma$ in variables $X$ is defined inductively as follows:

• any constant language $L_0 \subseteq \Sigma^*$ is an expression;

• any variable from $X$ is an expression;

• if $\varphi$ and $\psi$ are expressions, then so are the concatenation $(\varphi\psi)$, the union $(\varphi \cup \psi)$, the intersection $(\varphi \cap \psi)$ and the complement $\overline{\varphi}$.


The concatenation in an expression is said to be linear if for every subformula $(\varphi\psi)$, either $\varphi$ or $\psi$ is a constant language.

It will be assumed that concatenation has a higher precedence than Boolean operations. This, in addition to associativity of concatenation, union and intersection, allows omitting some of the parentheses in the expressions. For succinctness, when a singleton constant $\{w\}$ is concatenated to any expression, it will be denoted by $w$.

Definition 2 (Resolved system of equations). Let $\Sigma$ be an alphabet. Let $n \geq 1$. Let $X = (X_1, \ldots, X_n)$ be a vector of language variables. Let $\varphi = (\varphi_1, \ldots, \varphi_n)$ be a vector of expressions over the alphabet $\Sigma$ and in variables $X$. Then

$$
\left\{
\begin{aligned}
X_1 &= \varphi_1(X_1, \ldots, X_n), \\
&\;\;\vdots \\
X_n &= \varphi_n(X_1, \ldots, X_n)
\end{aligned}
\right.
$$

or $X = \varphi(X)$ in vector form, is called a (resolved) system of language equations over $\Sigma$ in variables $X$. A vector of languages $L = (L_1, \ldots, L_n)$ is said to be a solution of the system if the substitution $X_i = L_i$ for all $i$ turns each $j$-th equation into an equality $L_j = \varphi_j(L_1, \ldots, L_n)$. In vector form, this is denoted as $L = \varphi(L)$.

Besides the resolved systems defined above, one can consider unresolved systems formed of any equations $\varphi(X) = \psi(X)$, as well as the more general inequalities $\varphi(X) \subseteq \psi(X)$. However, it turns out that any such equations and inequalities are expressible using resolved equations. In order to impose a restriction $\varphi(X) \subseteq \psi(X)$, it suffices to add an auxiliary variable $Y$ and an equation

$$Y = \overline{Y} \cap \varphi \cap \overline{\psi},$$

which is a contradiction unless the mentioned inclusion holds. Multiple inclusions $\varphi_i(X) \subseteq \psi_i(X)$ ($1 \leq i \leq m$) can be specified in a similar way using a single auxiliary variable:

$$Y = \overline{Y} \cap \bigcup_{i=1}^{m} \bigl(\varphi_i \cap \overline{\psi_i}\bigr). \qquad (1)$$

An equality $\varphi(X) = \psi(X)$ can be expressed by two inclusions. In particular, Conway's [7] commutation equation $XL = LX$, where $L$ is a constant language, can be written down as the following resolved system:

$$
\left\{
\begin{aligned}
X &= X, \\
Y &= \overline{Y} \cap \bigl[(XL \cap \overline{LX}) \cup (\overline{XL} \cap LX)\bigr].
\end{aligned}
\right.
$$

This shows that unresolved systems of relations do not provide any additional expressive power in comparison to resolved systems of equations, and without loss of generality one can restrict attention to systems as in Definition 2.
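The reason why the auxiliary equation enforces a single inclusion can be spelled out string by string. The following worked derivation only restates the claim made above; $w$ denotes an arbitrary string and is a name introduced for this illustration.

```latex
% Assume that (X, Y) satisfies  Y = \overline{Y} \cap \varphi(X) \cap \overline{\psi(X)}.
% Then, for every string w,
\begin{align*}
w \in Y
  &\iff w \notin Y \ \text{ and } \ w \in \varphi(X) \ \text{ and } \ w \notin \psi(X).
\end{align*}
% Case 1: some w lies in \varphi(X) \setminus \psi(X).
%   The condition collapses to  w \in Y \iff w \notin Y,  which is impossible.
%   Hence every solution satisfies \varphi(X) \subseteq \psi(X).
% Case 2: \varphi(X) \subseteq \psi(X).
%   Then \varphi(X) \cap \overline{\psi(X)} = \emptyset, the right-hand side is empty,
%   and the equation is satisfied exactly by Y = \emptyset.
```

So the added equation never restricts $X$ beyond the intended inclusion, and it always forces $Y = \emptyset$.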

Consider the following example of a resolved system:

Example 1. The system of language equations

$$
\left\{
\begin{aligned}
X_1 &= \overline{X_2 X_3} \cap \overline{X_3 X_2} \cap X_4, \\
X_2 &= \{a, b\} X_2 \{a, b\} \cup \{a\}, \\
X_3 &= \{a, b\} X_3 \{a, b\} \cup \{b\}, \\
X_4 &= \{aa, ab, ba, bb\} X_4 \cup \{\varepsilon\}
\end{aligned}
\right.
$$

over the alphabet $\Sigma = \{a, b\}$ has the unique solution $X_1 = \{ww \mid w \in \{a, b\}^*\}$, $X_2 = \{xay \mid x, y \in \{a, b\}^*,\ |x| = |y|\}$, $X_3 = \{xby \mid x, y \in \{a, b\}^*,\ |x| = |y|\}$, $X_4 = \{u \mid u \in \{a, b\}^{2n},\ n \geq 0\}$.

If the first variable of the system is interpreted as the main variable (cf. the start symbol of a context-free grammar), then the system in Example 1 specifies the language $\{ww \mid w \in \{a, b\}^*\}$, which is a well-known non-context-free language.
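The claimed solution of Example 1 can be sanity-checked on short strings. The sketch below assumes the first equation reads $X_1 = \overline{X_2 X_3} \cap \overline{X_3 X_2} \cap X_4$ and compares both sides of every equation on all strings of length at most `N`, an arbitrary small bound chosen for this illustration. It is a finite check, not a proof of uniqueness.

```python
from itertools import product

SIGMA = "ab"
N = 6  # illustrative bound: only strings of length <= N are inspected

# M is the set of all strings over {a, b} of length at most N.
M = {"".join(p) for k in range(N + 1) for p in product(SIGMA, repeat=k)}


def concat(K, L):
    """Concatenation of K and L, keeping only results of length <= N."""
    return {u + v for u in K for v in L if len(u) + len(v) <= N}


def complement(K):
    """Complement relative to M."""
    return M - K


def middle(w):
    """Middle symbol of an odd-length string."""
    return w[len(w) // 2]


# The claimed solution, restricted to M.
X1 = {w for w in M if len(w) % 2 == 0 and w[:len(w) // 2] == w[len(w) // 2:]}
X2 = {w for w in M if len(w) % 2 == 1 and middle(w) == "a"}
X3 = {w for w in M if len(w) % 2 == 1 and middle(w) == "b"}
X4 = {w for w in M if len(w) % 2 == 0}

AB = {"a", "b"}
PAIRS = {"aa", "ab", "ba", "bb"}

# Right-hand sides of the four equations, evaluated on strings in M.
rhs1 = complement(concat(X2, X3)) & complement(concat(X3, X2)) & X4
rhs2 = concat(concat(AB, X2), AB) | {"a"}
rhs3 = concat(concat(AB, X3), AB) | {"b"}
rhs4 = concat(PAIRS, X4) | {""}

checks = [rhs1 == X1, rhs2 == X2, rhs3 == X3, rhs4 == X4]
print(checks)
```

Restricting every intermediate result to short strings is harmless here because a string of length at most `N` can only be a concatenation of strings that are themselves of length at most `N`.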

Judging by this example, language equations with Boolean operations appear to be an extension of context-free grammars enriched by intersection and complementation. Indeed, a large subset of these equations define the language inductively, with the membership of longer strings in the solution determined by the membership of the shorter strings. This subclass gives rise to the family of Boolean grammars [19], which inherit many important properties of the context-free grammars, in particular, efficient parsing algorithms.

However, the general case of these language equations turns out to have fundamentally different properties, and a much higher expressive power. The key distinction is that in general the deﬁnition of the language is not bound to be inductive, which is illustrated in the following example:


Example 2. Let $\Sigma = \{a\}$. The system of language equations

$$
\left\{
\begin{aligned}
X &= X, \\
Y &= \overline{Y} \cap aX
\end{aligned}
\right.
$$

has the unique solution $X = \emptyset$, $Y = \emptyset$.

Note that the explicit definition of $X$ tells absolutely nothing about $X$, as every language satisfies $X = X$. However, if $X \neq \emptyset$ and there is a string $a^\ell \in X$, then the equation for $Y$ becomes a contradiction of the form "$a^{\ell+1} \in Y$ if and only if $a^{\ell+1} \notin Y$". So $X$ may only be $\emptyset$, and then $Y$ is $\emptyset$ as well.

3. Computations of Turing machines

An important language expressible by language equations in the inductive, "context-free" way is the language of valid accepting computations of every Turing machine $T$. It is defined as the set of all strings of the form $w \natural C_T(w)$, where $w$ is a string accepted by $T$, $\natural$ is a separator symbol, while $C_T(w)$ is a certain encoding of the computation of $T$ on $w$, which is basically a concatenation of consecutive configurations of $T$ in its computation on $w$.

This language was discovered by Hartmanis [10], who proved that it is an intersection of two context-free languages, while its complement is context-free, and inferred many undecidability results for context-free grammars from this fact. The construction was later refined by Baker and Book [4] by using linear context-free grammars. This result is now standard and can be formulated as follows:

Proposition 1. For every Turing machine $T$ with an input alphabet $\Sigma$ there exists an alphabet $\Gamma$ disjoint from $\Sigma$ and a mapping $C_T : L(T) \to \Gamma^+$, such that the language

$$\mathrm{VALC}(T) = \{\, w \natural C_T(w) \mid T \text{ halts on } w \text{ and accepts} \,\} \subseteq \Sigma^* \natural \Gamma^*, \qquad (2)$$

where $\natural \notin \Sigma \cup \Gamma$, is an intersection of two linear context-free languages $L_1, L_2 \subseteq \Sigma^* \natural \Gamma^*$. Given $T$, two such linear context-free grammars can be effectively constructed.

Sketch of a proof. Let $T$ be a Turing machine with the work alphabet $V$ and with the set of states $Q$, which contains the initial state $q_0$, the accepting state $q_{acc}$ and the rejecting state $q_{rej}$. Assume that $\Sigma \cap V = \emptyset$ and $\{a' \mid a \in \Sigma\} \subseteq V$, that is, the input symbols written on the tape are distinguished. Also assume, without loss of generality, that whenever $T$ halts, it halts after an even number of steps. Denote its instantaneous descriptions by strings from $V^* Q V^+$; $x \vdash_T y$ means that there is a transition of $T$ from instantaneous description $x$ to $y$. Let $\# \notin V \cup Q$. Let $\Gamma = V \cup Q \cup \{\#\}$. For every string $w = a_1 \ldots a_\ell$ accepted by $T$, define

$$C_T(a_1 \ldots a_\ell) = x_1 \# x_3 \# \cdots \# x_{2k-1} \# x_{2k}^R \# x_{2k-2}^R \# \cdots \# x_2^R \# x_0^R,$$

where $x_0 = q_0 a'_1 \ldots a'_\ell$, $x_i \vdash_T x_{i+1}$ for all $i$ ($0 \leq i \leq 2k-1$), and $x_{2k} \in \Gamma^* q_{acc} \Gamma^*$.

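Then the language (2) is an intersection of the following two languages over $\Sigma \cup \Gamma \cup \{\natural\}$:

$$
\begin{aligned}
L_1 = \{\, & a_1 \ldots a_\ell \natural x_1 \# x_3 \# \cdots \# x_{2k-1} \# x_{2k}^R \# x_{2k-2}^R \# \cdots \# x_2^R \# x_0^R \mid \\
& \ell \geq 0,\ a_j \in \Sigma,\ x_0 = q_0 a'_1 \ldots a'_\ell,\ x_{2i-1} \vdash_T x_{2i} \text{ for all } i\ (1 \leq i \leq k) \,\}, \\
L_2 = \{\, & a_1 \ldots a_\ell \natural x_1 \# x_3 \# \cdots \# x_{2k-1} \# x_{2k}^R \# x_{2k-2}^R \# \cdots \# x_2^R \# x_0^R \mid \\
& \ell \geq 0,\ a_j \in \Sigma,\ x_{2i} \vdash_T x_{2i+1} \text{ for all } i\ (0 \leq i < k),\ x_{2k} \in \Gamma^* q_{acc} \Gamma^* \,\}.
\end{aligned}
$$

Note that each of these languages specifies only nested dependencies between the $x_i$, and hence both $L_1$ and $L_2$ are linear context-free. The intersection $L_1 \cap L_2$ checks both sets of nested dependencies and thus equals (2). □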
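The layout of the encoding $C_T(w)$, with odd-numbered configurations listed forwards and even-numbered ones reversed and listed backwards, can be made concrete in a few lines. This is a minimal sketch that assumes every tape symbol and state is a single character and that the configurations $x_0, \ldots, x_{2k}$ are already given as strings; the function name `encode_computation` is hypothetical, and the code does not check that consecutive configurations are related by a transition of $T$.

```python
def encode_computation(configs):
    """Build x_1 # x_3 # ... # x_{2k-1} # x_{2k}^R # x_{2k-2}^R # ... # x_2^R # x_0^R
    from the list of configurations [x_0, x_1, ..., x_{2k}]."""
    configs = list(configs)
    if len(configs) % 2 == 0:
        raise ValueError("expected an odd number of configurations x_0 .. x_{2k}")
    # x_1, x_3, ..., x_{2k-1} in their original order.
    odd_part = configs[1::2]
    # x_{2k}^R, x_{2k-2}^R, ..., x_0^R: reverse each string, then reverse the order.
    even_part = [x[::-1] for x in configs[0::2]][::-1]
    return "#".join(odd_part + even_part)
```

In the result, each odd-numbered configuration $x_{2i-1}$ and the reversed $x_{2i}$ form a nested pair, as do $x_{2i+1}$ and the reversed $x_{2i}$, which is what allows each of the two sets of conditions to be checked by a linear context-free grammar.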

The above construction requires adding new symbols to the alphabet $\Sigma$. In order to obtain more precise characterizations in the later parts of this paper, it is important to establish this statement without adding any extra symbols. Fortunately, an improved result may be inferred from Proposition 1 by encoding $\mathrm{VALC}(T)$ as follows:

Lemma 1. Let $T$ be a Turing machine over an alphabet $\Sigma$ with $|\Sigma| \geq 2$, let $a, b$ be two distinct symbols in $\Sigma$, and let $\Gamma$, $\natural$, $C_T$ and $L_1, L_2 \subseteq \Sigma^* \natural \Gamma^*$ be as in Proposition 1. Define the homomorphism $h : (\Sigma \cup \{\natural\} \cup \Gamma)^* \to \Sigma^*$ as a block code with respect to $\Gamma \cup \{\natural\}$ (with $h(s) \neq h(t)$ and $|h(s)| = |h(t)|$ for all $s, t \in \Gamma \cup \{\natural\}$, $s \neq t$), and as identity on $\Sigma$ (with $h(c) = c$ for all $c \in \Sigma$). Then the language

$$h(\mathrm{VALC}(T)) = \{\, w \cdot h(\natural) \cdot h(C_T(w)) \mid w \in L(T) \,\} \subseteq \Sigma^* \qquad (3)$$

is an intersection of two linear context-free languages $L'_1, L'_2 \subseteq \Sigma^* \cdot h(\natural) \cdot h(\Gamma^*)$. Given $T$, two such linear context-free grammars can be effectively constructed.


Proof. Let $L_1, L_2$ be the linear context-free languages given by Proposition 1, with $\mathrm{VALC}(T) = L_1 \cap L_2$. It is claimed that

$$h(L_1 \cap L_2) = h(L_1) \cap h(L_2). \qquad (4)$$

It is known that every injective mapping (in particular, a code) respects intersection in the above sense. Though $h$ is not a code, it is injective on $\Sigma^* \natural \Gamma^*$, that is, $h(u \natural x) = h(v \natural y)$ for $u, v \in \Sigma^*$ and $x, y \in \Gamma^*$ implies $u = v$ and $x = y$. Indeed, if $x \neq y$, then the strings $h(\natural x)$ and $h(\natural y)$ are suffix-incomparable because $h$ is a block code on $(\Gamma \cup \{\natural\})^*$, and hence $h(u \natural x)$ cannot be equal to $h(v \natural y)$. If $x = y$, then cancelling the common suffix $h(\natural x)$ gives $u = v$, since $h$ is the identity on $\Sigma$. Since $L_1, L_2 \subseteq \Sigma^* \natural \Gamma^*$, the statement (4) follows.

It is left to define $L'_1 = h(L_1)$ and $L'_2 = h(L_2)$. These languages are linear context-free by the closure of this language family under homomorphisms, and the corresponding grammars can be constructed from the grammars given by Proposition 1. Then the required representation is $h(\mathrm{VALC}(T)) = L'_1 \cap L'_2$. □
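The distinction between "$h$ is a code" and "$h$ is injective on strings of the form $u \natural x$" can be seen on a toy instance. In the sketch below the alphabets, the stand-in separator `"$"` and the codewords in `CODE` are all hypothetical choices made for this illustration; the only properties taken from the lemma are that the codewords are distinct, of equal length, and that $h$ is the identity on $\Sigma$.

```python
from itertools import product

SIGMA = "ab"
GAMMA = "#q"   # toy stand-in for the alphabet of configurations
SEP = "$"      # toy stand-in for the separator symbol
# Block code: distinct codewords of equal length over SIGMA.
CODE = {"#": "aa", "q": "ab", SEP: "ba"}


def h(s):
    """Homomorphism: block code on GAMMA and SEP, identity on SIGMA."""
    return "".join(CODE.get(c, c) for c in s)


def strings_up_to(alphabet, n):
    """All strings over the given alphabet of length at most n."""
    return ["".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)]


def injective_on_samples(n):
    """Check that h(u SEP x) determines (u, x) for all u, x of length <= n."""
    seen = {}
    for u in strings_up_to(SIGMA, n):
        for x in strings_up_to(GAMMA, n):
            image = h(u + SEP + x)
            if image in seen and seen[image] != (u, x):
                return False
            seen[image] = (u, x)
    return True
```

Here `h("#")` and `h("aa")` coincide, so `h` is not injective on all strings, while the sample check over strings of the restricted form finds no collision, in line with the argument in the proof.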

Remark 1. An obvious modification in the proof of Proposition 1 allows representing the following language of terminating and rejecting computations of Turing machines:

$$h(\mathrm{VALC}_{rej}(T)) = \{\, w \cdot h(\natural) \cdot h(C_T(w)) \mid T \text{ halts on } w \text{ and rejects} \,\} \subseteq \Sigma^*.$$

4. Solutions modulo a language

A vector of languages $(L_1, \ldots, L_n)$ is a solution of a system if the substitution $X_i = L_i$ turns every equation into an equality, that is, every string $w \in \Sigma^*$ belongs to its left-hand side if and only if it is in its right-hand side. If instead of all strings $w$, only strings belonging to a subset $M \subset \Sigma^*$ are considered, this defines a solution modulo $M$. This notion is formalized in the following definitions.

Definition 3. Two languages $K, L \subseteq \Sigma^*$ are called equal modulo a third language $M \subseteq \Sigma^*$ (denoted $K = L \pmod{M}$), if $K \cap M = L \cap M$. This relation is extended to vectors of languages by saying that $K = (K_1, \ldots, K_n)$ equals $L = (L_1, \ldots, L_n)$ modulo $M$ if $K_i = L_i \pmod{M}$ for all $i$.

In particular, every two languages are equal modulo $\emptyset$. Equality modulo $\Sigma^*$ means equality in the ordinary sense. Obviously, equality modulo $M$ implies equality modulo every subset of $M$. For every fixed $M$, equality modulo $M$ is an equivalence relation.
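For finite sets of strings, equality modulo $M$ is a one-line comparison. The following minimal sketch represents languages as Python sets; the function names are chosen for this illustration.

```python
def equal_modulo(K, L, M):
    """K = L (mod M): the two languages agree on every string of M."""
    return K & M == L & M


def vectors_equal_modulo(Ks, Ls, M):
    """Componentwise equality modulo M for two vectors of languages."""
    return len(Ks) == len(Ls) and all(
        equal_modulo(K, L, M) for K, L in zip(Ks, Ls)
    )
```

As stated above, any two languages are equal modulo the empty set, and equality modulo a set carries over to each of its subsets.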

The moduli $M$ used throughout this paper shall be languages of the following special form:

Definition 4. For every string $w \in \Sigma^*$, let $\mathrm{substrings}(w) = \{y \mid w = xyz \text{ for some } x, z \in \Sigma^*\}$. For every language $L \subseteq \Sigma^*$, define $\mathrm{substrings}(L) = \bigcup_{w \in L} \mathrm{substrings}(w)$. A language $L$ is said to be closed under substrings (or substring-closed), if $\mathrm{substrings}(L) = L$, that is, all substrings of every string from $L$ are also in $L$.

For instance, the languages $\emptyset$, $\Sigma^{\leq \ell}$ with $\ell \geq 0$, and $\Sigma^*$ are substring-closed.
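The substring closure of a finite language is easy to compute. This is a minimal sketch with languages as Python sets of strings; the function names mirror the definition but are otherwise choices made for this illustration.

```python
def substrings(w):
    """All y such that w = x y z for some strings x and z (including the empty string)."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def substrings_of_language(L):
    """Union of substrings(w) over all w in the finite language L."""
    result = set()
    for w in L:
        result |= substrings(w)
    return result


def is_substring_closed(L):
    """True if every substring of every string of L is again in L."""
    return substrings_of_language(L) == set(L)
```

For any finite `L`, the set `substrings_of_language(L)` is a finite substring-closed modulus containing `L`.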

The reason for using only substring-closed moduli is that the equations may contain concatenation, and the membership of a string in the concatenation of some languages depends on the membership of its substrings in those languages. Thus the closure of the modulus under substrings is essential for the following basic property to hold:

Lemma 2. Let $\varphi(X_1, \ldots, X_n)$ be an expression on languages over $\Sigma$. Let $M \subseteq \Sigma^*$ be any substring-closed language. Then, if two vectors of languages, $L = (L_1, \ldots, L_n)$ and $L' = (L'_1, \ldots, L'_n)$, are equal modulo $M$, then $\varphi(L_1, \ldots, L_n)$ and $\varphi(L'_1, \ldots, L'_n)$ are also equal modulo $M$.

Proof. By symmetry, it is enough to prove that if $w \in M$ is in $\varphi(L_1, \ldots, L_n)$, then $w$ is in $\varphi(L'_1, \ldots, L'_n)$ as well. The proof is a straightforward induction on the structure of $\varphi$.

• If $\varphi = C \subseteq \Sigma^*$ is a constant language, the statement is clear.

• Let $\varphi = X_i$. Since $L_i = L'_i \pmod{M}$ by assumption, the result holds.

• Let $\varphi = \psi\xi$. If $w \in \varphi(L)$, then there exists a factorization $w = uv$, with $u \in \psi(L)$ and $v \in \xi(L)$. Since $u, v \in M$ by the closure of $M$ under substrings, by the induction hypothesis, $u \in \psi(L')$ and $v \in \xi(L')$. Therefore, $uv \in \varphi(L')$.

• The cases of Boolean operations are proved analogously, without using the closure of $M$ under substrings.

□
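The induction in this proof is also a recipe for computing $\varphi(L) \cap M$ from the restricted arguments $L_i \cap M$ alone. The sketch below assumes a hypothetical tuple encoding of expressions (`"const"`, `"var"`, `"cat"`, `"or"`, `"and"`, `"not"`), finite constant languages, and a finite substring-closed `M`; none of these representation choices come from the paper.

```python
def evaluate(expr, values, M):
    """Return phi(values) restricted to M, for a finite substring-closed set M.

    expr is one of:
      ("const", set_of_strings), ("var", index),
      ("cat", e1, e2), ("or", e1, e2), ("and", e1, e2), ("not", e).
    """
    op = expr[0]
    if op == "const":
        return set(expr[1]) & M
    if op == "var":
        return set(values[expr[1]]) & M
    if op == "not":
        # Complement is taken relative to M.
        return M - evaluate(expr[1], values, M)
    left = evaluate(expr[1], values, M)
    right = evaluate(expr[2], values, M)
    if op == "or":
        return left | right
    if op == "and":
        return left & right
    if op == "cat":
        # If uv is in M, then u and v are in M as well (M is substring-closed),
        # so the restricted operands are all that is needed.
        return {u + v for u in left for v in right if u + v in M}
    raise ValueError(f"unknown operation: {op!r}")
```

For example, with `M = {"", "a", "aa"}` and the expression $\overline{a X_1}$, the values `{"a", "aaa"}` and `{"a"}` for $X_1$ are equal modulo `M` and give the same result, as Lemma 2 predicts.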

Now the notion of a solution modulo a substring-closed language can be deﬁned:

Definition 5. Let $X = \varphi(X)$ be a system of equations and let $M$ be a substring-closed language. A vector $L = (L_1, \ldots, L_n)$ is said to be a solution modulo $M$ of the system if $\varphi_i(L) = L_i \pmod{M}$ for every $i$.


An important fact concerning this notion is that whether a vector $L = (L_1, \ldots, L_n)$ is a solution modulo $M$ depends entirely on this vector taken modulo $M$, that is, on $(L_1 \cap M, \ldots, L_n \cap M)$. Therefore, for a finite $M$ there are only finitely many candidates for being solutions modulo $M$. This is proved as follows:

Lemma 3. Let $X = \varphi(X)$ be a system, let $M$ be a substring-closed language, let $L$ and $L'$ be two vectors of languages equal modulo $M$. Then $L$ is a solution of the system modulo $M$ if and only if $L'$ is a solution of the system modulo $M$.

Proof. Suppose that $L$ is a solution modulo $M$. Then $L' = L = \varphi(L) = \varphi(L') \pmod{M}$, where the first equality is by assumption, the second one holds since $L$ is a solution modulo $M$, and the third one follows by Lemma 2. Therefore, $L' = \varphi(L') \pmod{M}$, that is, $L'$ is a solution modulo $M$. □

In view of this property, equality of solutions modulo some $M$ shall always be considered in the sense of equality modulo $M$. This notion of equality will be used whenever a solution modulo $M$ is said to be unique.

Let us state an obvious property of solutions modulo a language:

Proposition 2 (On nested moduli). A solution of a system $X = \varphi(X)$ modulo some language $M$ closed under substrings is its solution modulo every substring-closed subset of $M$. In particular, every solution in the ordinary sense (that is, modulo $\Sigma^*$) is a solution modulo every substring-closed language.

Besides being substring-closed, the moduli considered in the following will typically be finite. There are countably many such moduli. The reason to consider solutions modulo finite substring-closed languages is that some properties of solutions of a system may be reformulated as statements on solutions modulo languages of this form, using quantification over moduli. Consider a trivial case of such reformulation:

Proposition 3. If two languages (vectors of languages) $K$, $L$ are equal modulo every finite substring-closed language $M$, then $K = L$. Equivalently, if two languages (vectors of languages) are not equal, then they are not equal modulo some finite substring-closed language.

Indeed, if $K \neq L$, then the symmetric difference $K \mathbin{\triangle} L$ contains some string $w$, and therefore $K$ and $L$ are not equal modulo $\mathrm{substrings}(w)$.

In the following, several such statements will be established for solutions of language equations. Note that directly stated properties of solutions, such as "the system $X = \varphi(X)$ has a unique solution", are second-order formulae, as the quantification is over sets of strings. On the other hand, their reformulations through solutions modulo finite languages will be first-order formulae by definition. These first-order characterizations will form the basis of the analysis of language equations.

5. Existence of a solution

For some families of language equations, the question of solution existence is trivial, as there is always a solution. This is the case, for instance, for the systems of Ginsburg and Rice [9], as well as for their generalization with intersection [18]. However, it is easy to see that a system of language equations with complementation does not necessarily have a solution: consider the equation $X = \overline{X}$. In this section, a necessary and sufficient condition of existence of solutions is developed, which is based upon solutions modulo finite languages.

Several useful results on the relationship between solutions modulo finite languages and solutions in the ordinary sense (which may be regarded in this context as solutions modulo $\Sigma^*$) have to be proved first.

Lemma 4 (Finite refutation of a non-solution). If $L = (L_1, \ldots, L_n)$ is not a solution of a system $X = \varphi(X)$, then there exists a finite language $M$ closed under substrings, such that $L$ is not a solution of the system modulo $M$.

Equivalently, if a vector of languages $L$ is a solution of a system $X = \varphi(X)$ modulo every finite language $M$ closed under substrings, then $L$ is a solution of the system.

Proof. If $L$ is not a solution of $X = \varphi(X)$, then $L \neq \varphi(L)$. By Proposition 3, there exists a finite modulus $M$ closed under substrings, such that $L \neq \varphi(L) \pmod{M}$, which means that $L$ is not a solution modulo $M$. □

In order to state the next result, some new terminology has to be introduced.

Definition 6 (Extension and refutation of solutions modulo $M$). Let $X = \varphi(X)$ be a system. Let $M \subseteq M'$ be two substring-closed languages. Let $L$ be a solution modulo $M$. Then $L$ is said to be extendable to $M'$ if there exists a solution $L'$ modulo $M'$ with $L = L' \pmod{M}$; in this case $L'$ is called an extension of $L$ to $M'$. Otherwise, if there is no such $L'$, then $L$ is said to be refuted modulo $M'$.


A solution $L$ modulo a finite substring-closed $M$ is said to be refutable if it is refuted modulo some finite substring-closed $M' \supseteq M$, and unrefutable otherwise.

Consider the system in Example 2 and the modulus $M = \{\varepsilon\}$. There are two solutions modulo $M$: $(\emptyset, \emptyset)$ and $(\{\varepsilon\}, \emptyset)$. The former is extendable to every $M'$, as $(\emptyset, \emptyset)$ is a solution of the system. The other one, $(\{\varepsilon\}, \emptyset)$, is refuted modulo $M' = \{\varepsilon, a\}$, as $\varepsilon \in X$ turns the equation for $Y$ into a contradiction of the form "$a \in Y$ if and only if $a \notin Y$". Therefore, all refutable solutions modulo $M$ are refuted modulo this $M'$. The next lemma shows that such an $M'$ always exists:
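This small case can be reproduced by brute force. The sketch below assumes that the second equation of Example 2 reads $Y = \overline{Y} \cap aX$, uses a hypothetical tuple encoding of expressions, and enumerates all vectors of subsets of a finite modulus; the function names are choices made for this illustration, and the enumeration is exponential in the size of the modulus.

```python
from itertools import combinations, product


def evaluate(expr, values, M):
    """Value of an expression restricted to the finite substring-closed set M."""
    op = expr[0]
    if op == "const":
        return set(expr[1]) & M
    if op == "var":
        return set(values[expr[1]]) & M
    if op == "not":
        return M - evaluate(expr[1], values, M)
    left = evaluate(expr[1], values, M)
    right = evaluate(expr[2], values, M)
    if op == "or":
        return left | right
    if op == "and":
        return left & right
    if op == "cat":
        return {u + v for u in left for v in right if u + v in M}
    raise ValueError(f"unknown operation: {op!r}")


def subsets(M):
    """All subsets of M, in a fixed order (by size, then lexicographically)."""
    items = sorted(M)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def solutions_modulo(system, M):
    """All vectors of subsets of M that satisfy every equation modulo M."""
    n = len(system)
    return [vec for vec in product(subsets(M), repeat=n)
            if all(evaluate(system[i], vec, M) == vec[i] for i in range(n))]


def is_extendable(L, M, M_big, system):
    """True if some solution modulo M_big agrees with L on M."""
    return any(all(Lp[i] & M == L[i] & M for i in range(len(L)))
               for Lp in solutions_modulo(system, M_big))


# Example 2 over the alphabet {a}:  X = X,  Y = complement(Y) & aX.
EXAMPLE_2 = [
    ("var", 0),
    ("and", ("not", ("var", 1)), ("cat", ("const", {"a"}), ("var", 0))),
]
M_small = {""}       # the modulus {epsilon}
M_big = {"", "a"}    # the modulus {epsilon, a}
sols = solutions_modulo(EXAMPLE_2, M_small)
```

Here `sols` lists the two solutions modulo $\{\varepsilon\}$ named in the text, and `is_extendable` distinguishes the one that survives modulo $\{\varepsilon, a\}$ from the one that is refuted there.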

Lemma 5 (Refutation of refutable solutions). Let $X = \varphi(X)$ be a system of language equations and let $M$ be a finite language closed under substrings. Then there exists a finite language $M' \supseteq M$ closed under substrings, such that all refutable solutions modulo $M$ are refuted modulo $M'$.

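Proof. Let $L^{(1)}, \ldots, L^{(m)}$ be all refutable solutions of the system modulo $M$. For all $i$ with $1 \leq i \leq m$, let $M_i$ be a finite substring-closed language modulo which $L^{(i)}$ is refuted. Define $M' = \bigcup_{i=1}^{m} M_i$, which is finite and closed under substrings. Then $L^{(1)}, \ldots, L^{(m)}$ are all refuted modulo $M'$, since an extension of $L^{(i)}$ to $M'$ would, by Proposition 2, also be an extension of $L^{(i)}$ to $M_i$. □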

By definition, a solution modulo $M$ is unrefutable if it can be extended to a solution modulo every finite substring-closed superset of $M$. However, that solution modulo a superset can be refutable itself, and, in fact, one can imagine a hypothetical situation that a solution $L$ modulo $M$ might be extendable to every $M' \supseteq M$, but every such extension would be refutable. The following stronger claim rules out this possibility:

Lemma 6 (Finite extension of an unrefutable solution). Let $X = \varphi(X)$ be a system, let $M$ be a finite substring-closed language, let $L$ be an unrefutable solution modulo $M$. Then, for every finite $M' \supseteq M$ closed under substrings, $L$ can be extended to an unrefutable solution modulo $M'$.

In other words, for every such $M'$ there exists an unrefutable solution modulo $M'$ that coincides with $L$ modulo $M$.

Proof. Let

$$L^{[1]}, L^{[2]}, \ldots, L^{[m]} \qquad (5)$$

be all solutions modulo $M'$ that coincide with $L$ modulo $M$. Let us prove that at least one of these solutions modulo $M'$ must be unrefutable. Suppose the contrary, that is, that each $L^{[i]}$ is refutable. Then, by Lemma 5, all (5) are refuted modulo some language $M'' \supseteq M'$.

Since $L$ is an unrefutable solution modulo $M$, it is not refuted modulo $M''$, that is, there exists a solution $L''$ modulo $M''$, which coincides with $L$ modulo $M$. Define $L'$ as the restriction of $L''$ modulo $M'$, which is a solution modulo $M'$ by Proposition 2. By the construction of the collection (5), $L'$ must be among $\{L^{[i]}\}_{i=1}^{m}$ and thus be refuted modulo $M''$. However, $L''$ is a witness to the contrary.

The contradiction obtained proves that one of the solutions (5) modulo $M'$ must be unrefutable. Since all (5) are extensions of $L$ to $M'$, it has been proved that $L$ can be extended to an unrefutable solution modulo $M'$. □

The next step is to apply Lemma 6 for larger and larger languages M, and to take the limit of this process. This will lead to an extension to a solution in the ordinary sense.

Lemma 7 (Infinite extension of an unrefutable solution). Let $X = \varphi(X)$ be a system, let $M$ be a finite substring-closed language, let $L_M$ be an unrefutable solution modulo $M$. Then $L_M$ can be extended to a solution $L$ of the system.

The statement of the lemma can be reformulated without using the notion of unrefutable solution as follows: if for every finite language $M' \supseteq M$ closed under substrings the system has a solution modulo $M'$ which coincides with $L_M$ modulo $M$, then the system has a solution that also coincides with $L_M$ modulo $M$.

Proof. Consider any ascending sequence of nested finite moduli (each closed under substrings) $M = M_0 \subset M_1 \subset M_2 \subset \cdots \subset M_k \subset \cdots$ that converges to $\Sigma^*$ in the sense that $\bigcup_{k=0}^{\infty} M_k = \Sigma^*$. Let us show that there exists a corresponding sequence of vectors of finite languages

$$L_M = L^{(0)}, L^{(1)}, L^{(2)}, \ldots, L^{(k)}, \ldots, \qquad (6)$$

where every $L^{(k)}$ is an unrefutable solution modulo the corresponding $M_k$, and which is componentwise increasing in the sense that $L_i^{(k)} \subseteq L_i^{(k+1)}$. The proof is not constructive; the existence of consecutive terms of this sequence is shown inductively.


Basis. $L^{(0)} = L_M$ is an unrefutable solution modulo $M$ by the assumption.

Induction step. Let $L^{(k)}$ be an unrefutable solution modulo $M_k$. By Lemma 6, it can be extended to an unrefutable solution $L^{(k+1)}$ modulo $M_{k+1}$. Next, by the definition of extension, $L^{(k)} = L^{(k+1)} \pmod{M_k}$. Then $L_i^{(k)} \subseteq L_i^{(k+1)}$ for each $i$.

Having obtained the increasing sequence (6), consider its limit

$$L = \Bigl( \bigcup_{k=0}^{\infty} L_1^{(k)}, \ldots, \bigcup_{k=0}^{\infty} L_n^{(k)} \Bigr).$$

Clearly, $L = L^{(k)} \pmod{M_k}$ for every $k$, and therefore, by Lemma 3, $L$ is a solution modulo every $M_k$. It remains to show that $L$ is a solution modulo every finite language $M'$ closed under substrings. Since the sequence $\{M_k\}_{k=0}^{\infty}$ is ascending and $\bigcup_{k=0}^{\infty} M_k = \Sigma^*$, there exists $k$ such that $M' \subseteq M_k$. Because $L$ is a solution modulo $M_k$, it is a solution modulo $M'$ by Proposition 2. Therefore, $L$ is a solution of the whole system by Lemma 4. □

Using Lemma 7, the following characterization of systems of equations that have solutions can be obtained:

Theorem 1 (Criterion of solution existence). A system has a solution if and only if it has a solution modulo every finite substring-closed language.

Proof. ($\Rightarrow$) If $L = (L_1, \ldots, L_n)$ is a solution, then it is a solution modulo every finite language closed under substrings by Proposition 2.

($\Leftarrow$) Let $M = \varnothing$ and note that $L_M = (\varnothing, \ldots, \varnothing)$ is, trivially, a solution modulo $M$. Assume the system has a solution modulo every finite substring-closed language $M' \subseteq \Sigma^*$. Then $L_M$ is extendable to every such $M'$, that is, $L_M$ is an unrefutable solution modulo $M$. Therefore, by Lemma 7, the system has a solution. □

The condition given by Theorem 1 is actually a first-order formula with one universal quantifier over a countable set. Hence, the set of systems that have at least one solution is co-recursively enumerable. The problem is hard for this class as well (that is, undecidable), which was first proved by Charatonik [6] in a different context.

Theorem 2. The set of systems of language equations with Boolean operations, linear concatenation and singleton constants that have solutions is co-r.e.-complete. It remains co-r.e. for unrestricted concatenation and any recursive constants.

Proof. Membership in co-r.e. (unrestricted concatenation, recursive constants). The complement of the problem is accepted by a Turing machine that enumerates all finite substring-closed moduli $M$ and accepts as soon as it finds one modulo which the given system has no solutions. If no such modulus is found, the machine does not terminate. Then, according to Theorem 1, the machine accepts if and only if the system has no solutions.
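The search performed by this machine can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions that are not part of the paper: the alphabet is unary, a string $a^k$ is encoded by its length $k$, the system has a single variable, and the nonempty moduli are therefore exactly $a^{\le \ell} = \{0, \ldots, \ell\}$ (the empty modulus always admits a solution and is skipped). The names `has_solution_mod`, `refute_existence` and the callback `phi` are hypothetical, and the bound `max_ell` is added only so that the demonstration terminates.

```python
from itertools import combinations

def subsets(universe):
    """Yield every subset of a finite set as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def has_solution_mod(phi, ell):
    """Brute-force test: does X = phi(X) have a solution modulo a^{<=ell}?

    `phi(X, M)` is an assumed callback returning the right-hand side
    restricted to the modulus M = {0, ..., ell}.
    """
    M = frozenset(range(ell + 1))
    return any(phi(X, M) == X for X in subsets(M))

def refute_existence(phi, max_ell):
    """Return the first ell whose modulus admits no solution, or None.

    The machine in the proof would search forever; max_ell is an
    artificial cut-off used only in this sketch.
    """
    for ell in range(max_ell + 1):
        if not has_solution_mod(phi, ell):
            return ell
    return None

def contradictory(X, M):
    """X = complement(X) ∩ {aa}: unsolvable once aa belongs to the modulus."""
    return (M - X) & frozenset({2})

def star(X, M):
    """X = {ε} ∪ aX: solvable modulo every a^{<=ell}."""
    return (frozenset({0}) | frozenset(k + 1 for k in X)) & M

print(refute_existence(contradictory, 5))
print(refute_existence(star, 5))
```

For the first system a refuting modulus is found, so the system has no solution; for the second the bounded search gives up, which by itself proves nothing, in line with the one-sided nature of the procedure.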

Co-r.e.-hardness (linear concatenation, singleton constants). Reduction from the co-r.e.-complete Turing machine emptiness problem. Let $T$ be any given Turing machine and construct a system of language equations $X_i = \varphi_i(X_1, \ldots, X_n)$ representing the language $\mathrm{VALC}(T)$. This is done according to Proposition 1: linear context-free grammars are transcribed as equations, and then intersection is used in the equation for $X_1$ to obtain $X_1 = \mathrm{VALC}(T)$ in the unique solution. Then consider the system of equations

$$
\begin{aligned}
Y &= \overline{Y} \cap X_1, \\
X_1 &= \varphi_1(X_1, \ldots, X_n), \\
&\;\;\vdots \\
X_n &= \varphi_n(X_1, \ldots, X_n),
\end{aligned}
$$

where the equations for $X_1, \ldots, X_n$ form a system for the language $\mathrm{VALC}(T) = \{ w C_T(w) \mid T \text{ halts on } w \text{ and accepts} \}$, with $w \in \Sigma^*$ and $C_T(w) \in \Gamma^*$.

The equation for $Y$ is a contradiction unless $X_1 = \varnothing$, and therefore the system has a solution if and only if $\mathrm{VALC}(T) = \varnothing$, which holds if and only if $L(T) = \varnothing$. This completes the reduction. □

6. Uniqueness of a solution

In Section 5 it was proved that a system has solutions if and only if it has solutions modulo every finite language closed under substrings. However, it turns out that the same property does not hold with respect to the uniqueness of a solution: a system can have multiple solutions modulo every finite language, and still have a unique solution.

This is demonstrated by the system in Example 2. It has the unique solution $(\varnothing, \varnothing)$. However, for every finite nonempty $M \subset a^*$ closed under substrings, which is of the form $a^{\leqslant \ell} = \{\varepsilon, a, aa, \ldots, a^{\ell}\}$ for some $\ell \geqslant 0$, the system has exactly two solutions modulo $M$, one of which cannot be extended to the modulus $a^{\leqslant \ell+1}$. Thus, in order to check the membership of a string of length $\ell$ in the components of the unique solution, one has to consider strings of length $\ell + 1$.

This illustrates the following property of systems of language equations with a unique solution: the membership of longer strings in the solution may in fact determine the membership of shorter strings by refuting one of the alternative solutions modulo the smaller language. From Lemma 4 it is known that every “wrong” solution modulo a finite language (that is, one that is not extendable to a solution) has a refutation modulo some greater finite language, and thus if a system has a unique solution, then, for every finite M closed under substrings, all but one of the solutions modulo M should be refutable. This necessary condition of solution uniqueness is actually sufficient, and the following theorem provides a first-order characterization of systems with a unique solution similar to Theorem 1:

Theorem 3 (Criterion of solution uniqueness). A system has a unique solution if and only if for every finite language $M$ closed under substrings there exists a finite language $M' \supseteq M$ closed under substrings, such that the system has at least one solution modulo $M'$, and all the solutions modulo $M'$ are equal modulo $M$.

Proof. ($\Rightarrow$) Let a system $X = \varphi(X)$ have a unique solution $L$, and fix a finite substring-closed $M$. By Lemma 5, all refutable solutions modulo $M$ are refuted modulo some finite superset of $M$; denote it by $M'$. $L$ is a solution modulo $M'$ by Proposition 2; it remains to argue that all solutions modulo $M'$ must coincide modulo $M$.

Suppose the contrary, that there exist two solutions $L'$ and $L''$ modulo $M'$, which are different modulo $M$. Let $L'_M \neq L''_M$ be these solutions taken modulo $M$. They are not refuted modulo $M'$, and therefore, by the choice of $M'$ and by Lemma 5, they are unrefutable. Hence, by Lemma 7, they can be extended to distinct solutions of the whole system, which contradicts the uniqueness of the solution and proves the necessity claim.

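($\Leftarrow$) Let a system $X = \varphi(X)$ be such that for every finite modulus $M$ closed under substrings there exists a finite modulus $M' \supseteq M$ closed under substrings, such that the system has at least one solution modulo $M'$ and all solutions of the system modulo $M'$ are equal modulo $M$.

By Proposition 2, a solution modulo $M'$ is also a solution modulo $M \subseteq M'$, so the system has a solution modulo every finite substring-closed language, and hence has at least one solution by Theorem 1.

Suppose that the system has at least two distinct solutions, $L' = (L'_1, \ldots, L'_n)$ and $L'' = (L''_1, \ldots, L''_n)$. Then $L' \neq L''$ implies that $L' \neq L'' \pmod{M}$ for some finite substring-closed modulus $M$. By assumption, for this particular $M$ there exists a finite modulus $M' \supseteq M$ closed under substrings, such that all solutions modulo $M'$ are equal modulo $M$. By Proposition 2, $L'$ and $L''$ are solutions of the system modulo $M'$, and therefore must coincide modulo $M$, which yields a contradiction. □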

The necessary and sufficient condition of solution uniqueness given by Theorem 3 specifies the set of systems that have a unique solution by a first-order formula with one universal quantifier and one existential quantifier over a countable set. Therefore, the problem is in the second level of the arithmetical hierarchy, namely in $\Pi_2$. The next theorem shows that it is complete for this class:

Theorem 4. The set of systems of language equations with Boolean operations, linear concatenation and singleton constants that have exactly one solution is $\Pi_2$-complete. The similar set for systems with unrestricted concatenation and any recursive constants remains in $\Pi_2$.

Proof. Membership in $\Pi_2$ (unrestricted concatenation). According to Theorem 3, the uniqueness of a solution is expressed by the following first-order formula

$$\phi(w) = \forall x\, \exists y\, R(x, y, w), \tag{7}$$

where $R$ is a recursive predicate that evaluates to true on a triple $(x, y, w)$ if and only if

(i) $w$ is a syntactically valid description of an alphabet $\Sigma$ and of a system of language equations over $\Sigma$,
(ii) $x$ and $y$ describe two finite languages $M_x \subseteq M_y \subset \Sigma^*$, each closed under substrings,
(iii) the system specified by $w$ has solutions modulo the language given by $y$, and all of these solutions coincide modulo the language given by $x$.

The correctness of this representation is given by Theorem 3, while first-order formulae of the form (7) are precisely those that form the class $\Pi_2$.
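To make the recursiveness of conditions (ii) and (iii) concrete, the following Python sketch evaluates them by exhaustive search. It rests on assumptions that are not in the paper: the alphabet is unary, $a^k$ is encoded as the integer $k$, the moduli are given by their maximal lengths `ell_x` and `ell_y`, and the system is passed as a hypothetical callback `phi` acting on tuples of finite sets rather than as a parsed description $w$, so condition (i) is not modelled.

```python
from itertools import combinations, product

def subsets(universe):
    """Yield every subset of a finite set as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def solutions_mod(phi, n, ell):
    """All n-tuples V of subsets of M = {0..ell} with phi(V, M) == V."""
    M = frozenset(range(ell + 1))
    candidates = list(subsets(M))
    return [V for V in product(candidates, repeat=n) if phi(V, M) == V]

def R(ell_x, ell_y, phi, n):
    """Conditions (ii)-(iii): M_x ⊆ M_y, solutions modulo M_y exist,
    and all of them coincide modulo M_x."""
    if ell_x > ell_y:
        return False
    Mx = frozenset(range(ell_x + 1))
    restricted = {tuple(c & Mx for c in V)
                  for V in solutions_mod(phi, n, ell_y)}
    return len(restricted) == 1

def star(V, M):
    """X = {ε} ∪ aX, whose unique solution is a*."""
    (X,) = V
    return ((frozenset({0}) | frozenset(k + 1 for k in X)) & M,)

def identity(V, M):
    """X = X, satisfied by every language."""
    return V

print(R(2, 4, star, 1), R(2, 4, identity, 1))
```

The search is finite because only subsets of the modulus need to be tried, which is what makes $R$ decidable; the two quantifiers of (7) then range over `ell_x` and `ell_y`.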

$\Pi_2$-hardness (linear concatenation, singleton constants). Reduction from the Turing machine universality problem, which is stated as “Given a Turing machine $T$ over an alphabet $\Sigma$, determine whether $L(T) = \Sigma^*$” and is known to be complete for $\Pi_2$ [20, §14.8].

Fix $T$, a Turing machine over an alphabet $\Sigma$. Let $X_i = \varphi_i(X_1, \ldots, X_n)$ with $i \in \{1, \ldots, n\}$ be a system of language equations with a unique solution $(L_1, \ldots, L_n)$, in which $L_1$ is the language $\mathrm{VALC}(T)$ of valid accepting computations of $T$. Such a system exists and can be effectively constructed by Proposition 1. Add four more variables, $Y$, $Z_1$, $Z_2$ and $T$, and construct the following system:

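$$
\begin{aligned}
Y &= Y, && \text{(8a)} \\
Z_1 &= Y \cup \bigcup_{s \in \Gamma} Z_1 s, && \text{(8b)} \\
Z_2 &= \{\varepsilon\} \cup \bigcup_{a \in \Sigma} a Z_2, && \text{(8c)} \\
T &= \overline{T} \cap \bigl[ (X_1 \cap \overline{Z_1}) \cup (Y \cap \overline{Z_2}) \bigr], && \text{(8d)} \\
X_1 &= \varphi_1(X_1, \ldots, X_n), \\
&\;\;\vdots && \text{(8e)} \\
X_n &= \varphi_n(X_1, \ldots, X_n),
\end{aligned}
$$

where the equations (8e) form a system for the language $\mathrm{VALC}(T) = \{ w C_T(w) \mid T \text{ halts on } w \text{ and accepts} \}$, with $w \in \Sigma^*$ and $C_T(w) \in \Gamma^*$.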

Here the equation (8b) specifies $Z_1 = Y\Gamma^*$, while the equation (8c) represents $Z_2 = \Sigma^*$. Hence the equation for $T$ implements the following two inclusions using the method (1):

$$\mathrm{VALC}(T) \subseteq Y\Gamma^*, \tag{9a}$$

$$Y \subseteq \Sigma^*. \tag{9b}$$

The inclusion (9a) states that for every string $w \in \Sigma^*$ accepted by the Turing machine, the corresponding computation history $w C_T(w)$ must be in $Y\Gamma^*$. This implies $w \in Y$, that is, every string accepted by $T$ must be in $Y$. The second constraint (9b) restricts $Y$ to subsets of $\Sigma^*$. Therefore, the set of solutions of the system (8) is

$$\bigl\{ (L, L\Gamma^*, \Sigma^*, \varnothing, L_1, \ldots, L_n) \bigm| \underbrace{L(T) \subseteq L \subseteq \Sigma^*}_{(10^*)} \bigr\}. \tag{10}$$

Clearly, the solution of (8) is unique if and only if the bounds (10*) are tight, that is, if $L(T) = \Sigma^*$.

This completes the reduction from the Turing machine universality problem. Since the latter is $\Pi_2$-complete, the $\Pi_2$-hardness of the solution uniqueness problem for systems of language equations is established. □

7. Least and greatest solutions

Every system of language equations of the kind defined by Ginsburg and Rice [9], possibly with intersection [18], is known to have two special solutions: the least and the greatest one, which are the componentwise intersection and the componentwise union, respectively, of all solutions of the system. As the right-hand sides of such systems are monotone and continuous functions, these solutions always exist and can be obtained by fixpoint iteration starting from the vectors $(\varnothing, \ldots, \varnothing)$ and $(\Sigma^*, \ldots, \Sigma^*)$. However, once the use of complementation in equations is allowed, this property is lost. For instance, the system

$$X = \overline{Y}, \qquad Y = Y$$

has the set of solutions $\{ (\overline{L}, L) \mid L \subseteq \Sigma^* \}$, and these solutions are pairwise incomparable. Accordingly, having a least (greatest) solution is a nontrivial property of a system, which should be studied in the same way as solution uniqueness.

Let us formally introduce comparison of $n$-tuples of languages:

Definition 7. A partial order “$\leqslant$” on the set of languages over an alphabet $\Sigma$ is defined as $K \leqslant L$ if $K \subseteq L$. For each $n \geqslant 1$, this order is extended to vectors of $n$ languages as $(K_1, \ldots, K_n) \leqslant (L_1, \ldots, L_n)$ if $K_i \leqslant L_i$ for all $i$. Languages (vectors) $K$ and $L$ are said to be incomparable, if $K \nleqslant L$ and $K \ngeqslant L$.

Definition 8. Let $X = \varphi(X)$ be a system of language equations. A vector $L$ is said to be the least (greatest) solution of the system if it is a solution and for every solution $L'$ it holds that $L \leqslant L'$ ($L \geqslant L'$, respectively).

Comparison of languages can also be done modulo a language M, similarly to equality modulo languages.

Definition 9. A language $K \subseteq \Sigma^*$ is said to be less than or equal to $L \subseteq \Sigma^*$ modulo $M \subseteq \Sigma^*$, denoted $K \leqslant L \pmod{M}$, if $K \cap M \subseteq L \cap M$. This relation is extended to vectors of languages so that $(K_1, \ldots, K_n) \leqslant (L_1, \ldots, L_n) \pmod{M}$ if $K_i \leqslant L_i \pmod{M}$ for all $i$. Languages (vectors) $K$ and $L$ are said to be incomparable modulo $M$, if $K \nleqslant L \pmod{M}$ and $L \nleqslant K \pmod{M}$.

It is easy to note that $K \leqslant L \pmod{M}$ implies $K \leqslant L \pmod{M_0}$ for all $M_0 \subseteq M$. If $K, L$ are incomparable modulo some $M$, they are incomparable modulo every superset of $M$.

Proposition 4. If for two languages (vectors of languages) $K, L$ it holds that $K \leqslant L \pmod{M}$ for every finite $M$ closed under substrings, then $K \leqslant L$. Equivalently, if for two languages (vectors of languages) $K, L$ it holds that $K \nleqslant L$, then $K \nleqslant L \pmod{M}$ for some finite substring-closed $M$.

If two languages (vectors of languages) are incomparable, then they are incomparable modulo some finite substring-closed language.

Proof. The proof of the first part is immediate: if $K \nleqslant L$, then there exists a string $w \in K \setminus L$, and hence $K \nleqslant L \pmod{\mathrm{substrings}(w)}$.

If $K, L$ are incomparable, then $K \nleqslant L$ and $K \ngeqslant L$. According to the first part, there exist finite $M_1, M_2$ closed under substrings, such that $K \nleqslant L \pmod{M_1}$ and $K \ngeqslant L \pmod{M_2}$. Then $K$ and $L$ are incomparable modulo $M_1 \cup M_2$. □
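The witness modulus used in this proof is easy to compute for finite languages. The Python sketch below is only an illustration: languages are represented as finite sets of strings, and the helper names `substrings`, `leq_mod` and `witness_modulus` are chosen here and are not notation from the paper.

```python
def substrings(w):
    """All substrings of w, including the empty string and w itself."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def leq_mod(K, L, M):
    """K <= L (mod M): every string of K that lies in M also lies in L."""
    return (K & M) <= (L & M)

def witness_modulus(K, L):
    """Return a finite substring-closed M with K not <= L (mod M),
    or None if K is a subset of L."""
    difference = K - L
    if not difference:
        return None
    # Any string of K \ L works; the shortest one gives a small modulus.
    w = min(difference, key=lambda s: (len(s), s))
    return substrings(w)

K = {"a", "ab"}
L = {"a", "b"}
M1 = witness_modulus(K, L)   # refutes K <= L
M2 = witness_modulus(L, K)   # refutes L <= K
print(sorted(M1), sorted(M2))
print(leq_mod(K, L, M1), leq_mod(L, K, M2), leq_mod(K, L, M1 | M2))
```

Since a union of substring-closed languages is again substring-closed, `M1 | M2` serves as the single modulus of the second part of the proposition.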

Similarly to the case of uniqueness of solution, the existence of a least (greatest) solution cannot be reduced to having a least (greatest) solution modulo every finite $M$. Consider the following variant of Example 2:

Example 3. The system

$$X = X, \qquad Y = \overline{Y} \cap \bigl( (X \cap aX) \cup a^2 X \bigr)$$

has a unique solution $(\varnothing, \varnothing)$, which is trivially the least and the greatest.

For each substring-closed language $M = a^{\leqslant \ell}$, there are three solutions modulo $M$: $L = (\varnothing, \varnothing)$, $L' = (\{a^{\ell-1}\}, \varnothing)$ and $L'' = (\{a^{\ell}\}, \varnothing)$. Of these, $L$ is unrefutable, $L'$ is refuted modulo $a^{\leqslant \ell+1}$ and $L''$ is refuted modulo $a^{\leqslant \ell+2}$.

In this example, there is a least solution and it is the least modulo every $M$. On the other hand, though there is a greatest solution, there is no greatest solution modulo any $M$ (except for $M = \varnothing$ and $M = \{\varepsilon\}$), but if one considers only unrefutable solutions modulo $M$, then there is always the greatest among them.

The next lemma shows that such a property holds for every system with a least (greatest) solution. Furthermore, since all refutable solutions are known to be refuted modulo some finite language by Lemma 5, this fact can be stated as follows:

Lemma 8. If a system has a least (greatest) solution $L$ then for every finite substring-closed language $M \subset \Sigma^*$ there exists a finite substring-closed language $M' \supseteq M$, such that $L$ is the least (greatest, respectively) solution modulo $M$ extendable to $M'$.

Proof. Fix an arbitrary finite language $M$ closed under substrings. By Lemma 5, there should exist a finite substring-closed language, modulo which all refutable solutions modulo $M$ are refuted. Denote it by $M'$.

Suppose $L$ is not the least solution modulo $M$ extendable to $M'$, that is, there exists a solution $\widetilde{L}$ modulo $M$, extendable to $M'$, with $L \nleqslant \widetilde{L} \pmod{M}$. By the choice of $M'$, $\widetilde{L}$ is unrefutable modulo $M$, and therefore, in accordance with Lemma 7, it can be extended to a solution $\widehat{L}$ of the whole system, which inherits the property $L \nleqslant \widehat{L}$. The latter contradicts the assumption that $L$ is the least solution of the system. □

The condition in Lemma 8 can be developed into the following necessary and suﬃcient condition of having a least or a greatest solution:

Theorem 5 (Criterion of least/greatest solutions). A system has a least (greatest) solution if and only if for every finite language $M$ closed under substrings there exists a finite language $M' \supseteq M$ closed under substrings, such that there is the least (the greatest) among the solutions modulo $M$ extendable to $M'$.

Note that $M'$ in Theorem 5 need not be the language modulo which all refutable solutions modulo $M$ are refuted. In particular, looking at Example 3, for $M = a^{\leqslant \ell}$, one possible value of $M'$ is $a^{\leqslant \ell+1}$: the solutions $L$ and $L''$ modulo $M$ are extendable to $a^{\leqslant \ell+1}$, while the solution $L'$ modulo $M$ is refuted modulo $a^{\leqslant \ell+1}$, and since $L < L''$, the condition of Theorem 5 is met. The modulus $M' = a^{\leqslant \ell+2}$ satisfies the theorem as well, and $L$ is the unique solution modulo $M$ extendable to this $M'$. But even though not every modulus $M'$ in the theorem gives reliable information about the least (greatest) solution modulo $M$, the criterion holds as stated.

Proof. The proof below applies to least solutions; the case of greatest solutions is proved identically.

($\Rightarrow$) Assuming that the system has a least solution, for every such $M$ the corresponding $M'$ is given by Lemma 8.

($\Leftarrow$) Let a system $X = \varphi(X)$ satisfy the condition written in the statement of the theorem. Consider an arbitrary ascending sequence

$$M_1 \subset M_2 \subset \cdots \subset M_n \subset \cdots \tag{11}$$

of finite substring-closed languages that converges to $\Sigma^*$ in the sense that $\bigcup_{n=1}^{\infty} M_n = \Sigma^*$. For each $i \geqslant 1$, let $M'_i \supseteq M_i$ be the finite substring-closed language modulo which all refutable solutions modulo $M_i$ are refuted; it is known to exist by Lemma 5.

For every such $M'_i$, by assumption, there exists a finite $M''_i \supseteq M'_i$ closed under substrings, such that there is the least among the solutions modulo $M'_i$ that are not refuted modulo $M''_i$. Denote this solution modulo $M'_i$ by $L'_i$. Let $L_i$ be $L'_i$ taken modulo $M_i \subseteq M'_i$. Let us establish the properties of the sequence $\{L_i\}$.

Claim 1. $L_i$ is an unrefutable solution modulo $M_i$.

Indeed, by its construction, $L_i$ is not refuted modulo $M'_i$ (this is witnessed by $L'_i$), while every refutable solution modulo $M_i$ is (by the choice of $M'_i$).

Claim 2. $L_i$ is the least among unrefutable solutions modulo $M_i$.

Suppose there is another unrefutable solution $\widetilde{L}_i$ modulo $M_i$, such that $L_i \nleqslant \widetilde{L}_i \pmod{M_i}$. By Lemma 6, there consequently exists an unrefutable solution $\widetilde{L}'_i$ modulo $M'_i$, with $\widetilde{L}_i = \widetilde{L}'_i \pmod{M_i}$. This implies $L'_i = L_i \nleqslant \widetilde{L}_i = \widetilde{L}'_i \pmod{M_i}$, and hence $L'_i \nleqslant \widetilde{L}'_i \pmod{M'_i}$. As both $L'_i$ and $\widetilde{L}'_i$ are solutions modulo $M'_i$ that are not refuted modulo $M''_i$, this means that $L'_i$ is not the least among such solutions, which contradicts its construction.

Claim 3. $L_i \leqslant L_{i+1}$ for all $i$.

Since $L_i$ is the least among unrefutable solutions modulo $M_i$, while $L_{i+1}$, taken modulo $M_i$, is an unrefutable solution modulo $M_i$, it follows that $L_i \leqslant L_{i+1} \pmod{M_i}$. On the other hand, $L_i$, by definition, is taken modulo $M_i$, and thus none of its components contain strings outside of $M_i$, that is, $\varnothing = L_i \leqslant L_{i+1} \pmod{\overline{M_i}}$. Therefore, $L_i \leqslant L_{i+1}$.

Claim 4. $L_i = L_{i+1} \pmod{M_i}$ for all $i$.

The unrefutable solution $L_i$ modulo $M_i$ is extendable to some unrefutable solution $\widetilde{L}_{i+1}$ modulo $M_{i+1}$ by Lemma 6. Since $L_{i+1}$ is the least among unrefutable solutions modulo $M_{i+1}$ by Claim 2, $L_{i+1} \leqslant \widetilde{L}_{i+1}$. This implies

$$L_{i+1} \leqslant \widetilde{L}_{i+1} = L_i \pmod{M_i},$$

and since the converse inequality $L_i \leqslant L_{i+1} \pmod{M_i}$ is known from Claim 3, the statement is proved.

Thus an increasing sequence of solutions modulo the languages (11) has been obtained:

$$L_1 \leqslant L_2 \leqslant \cdots \leqslant L_n \leqslant \cdots, \tag{12}$$

where each $L_i$ is the least among unrefutable solutions modulo $M_i$. As a monotone sequence, (12) converges to a certain vector $L$, such that $L = L_i \pmod{M_i}$ for all $i \geqslant 1$. It is left to prove that $L$ is the least solution of the system.

As in the proof of Lemma 7, it can be inferred that $L$ is a solution. Suppose the existence of some other solution $\widetilde{L}$, with $L \nleqslant \widetilde{L}$. Then, by Proposition 4, there exists a finite language $M$ closed under substrings, for which $L \nleqslant \widetilde{L} \pmod{M}$. Let $i$ be a number, such that $M \subseteq M_i$. Then $L \nleqslant \widetilde{L} \pmod{M_i}$, and therefore for $L_i$ (which coincides with $L$ taken modulo $M_i$ by Claim 4) it holds that $L_i \nleqslant \widetilde{L} \pmod{M_i}$, where $\widetilde{L}$ taken modulo $M_i$ is an unrefutable solution modulo this language. Hence, $L_i$ is not the least among unrefutable solutions modulo $M_i$, which contradicts Claim 2. □

As in the case of solution uniqueness, the statement of Theorem 5 is again a $\Pi_2$-formula, which leads to the following computational characterization:

Theorem 6. The set of systems of language equations with Boolean operations, linear concatenation and singleton constants that have a least (greatest) solution is $\Pi_2$-complete. It remains $\Pi_2$-complete if concatenation is unrestricted and any recursive constants are allowed.

Proof. Membership in $\Pi_2$ (unrestricted concatenation, recursive constants). As in the proof of Theorem 4, the property of having a least (greatest) solution is represented by the following first-order formula:

$$\phi(w) = \forall x\, \exists y\, R(x, y, w),$$

where $R$ is true on $(x, y, w)$ if and only if

(i) $w$ describes an alphabet $\Sigma$ and a system of language equations over $\Sigma$,
(ii) $x$ and $y$ describe finite substring-closed languages $M_x \subseteq M_y \subset \Sigma^*$,
(iii) the system specified by $w$ has solutions modulo the language represented by $y$, and among them there is the least (the greatest, resp.) modulo the language given by $x$.

The correctness of this $\Pi_2$ representation is given by Theorem 5.

$\Pi_2$-hardness (linear concatenation, singleton constants). Reduction from the Turing machine universality problem. Given a Turing machine, consider the system (8) augmented by an additional equation

$$Y' = \overline{Y}. \tag{13}$$

Let us show that the resulting system (13), (8) in variables $(Y', Y, Z_1, Z_2, T, X_1, \ldots, X_n)$ has both a least and a greatest solution if $L(T) = \Sigma^*$, and neither a least nor a greatest solution if $L(T) \neq \Sigma^*$.

Indeed, if $L(T) = \Sigma^*$, then (8) has the unique solution

$$(\Sigma^*, \Sigma^*\Gamma^*, \Sigma^*, \varnothing, L_1, \ldots, L_n),$$

and therefore (13), (8) has the unique solution

$$(\varnothing, \Sigma^*, \Sigma^*\Gamma^*, \Sigma^*, \varnothing, L_1, \ldots, L_n),$$

which is at the same time the least and the greatest. If $L(T) \neq \Sigma^*$, then the set of solutions of (13), (8) is

$$\bigl\{ (\overline{L}, L, L\Gamma^*, \Sigma^*, \varnothing, L_1, \ldots, L_n) \bigm| \underbrace{L(T) \subseteq L \subseteq \Sigma^*}_{(14^*)} \bigr\}, \tag{14}$$

which consists of multiple pairwise incomparable vectors of languages. This shows the correctness of the reduction and proves $\Pi_2$-hardness. □

8. Families of languages

Consider systems of language equations that have a unique, least or greatest solution. Such systems can be regarded as specifications of the components of these solutions, and every class of language equations accordingly defines a family of formal languages. The family defined by equations with union and concatenation is the family of context-free languages [9]. Equations with union, intersection and concatenation correspond to conjunctive grammars, an extension of the context-free grammars [17,18]. Normal form theorems for these grammars show that unique, least and greatest solutions yield the same family of languages. On the other hand, for language equations with all Boolean operations and concatenation, the expressive power of unique, least and greatest solutions turns out to be different, and it will now be determined.

Theorem 7. For every alphabet $\Sigma$ with $|\Sigma| \geqslant 2$, the family of languages defined by components of unique solutions of systems of language equations with Boolean operations, linear concatenation and singleton constants is exactly the class of recursive sets over $\Sigma$. The same result holds for unrestricted concatenation and any recursive constants.

Proof. The first claim is that if a system $X = \varphi(X)$ has a unique solution, then each of its components is recursive. This is given by the following decision procedure that determines the membership of strings in the first component:

Given $w \in \Sigma^*$, let $M = \mathrm{substrings}(w)$.
For all finite moduli $M' \supseteq M$ closed under substrings:
    If all solutions of $X = \varphi(X)$ modulo $M'$ coincide modulo $M$:
        Let $(L_1, \ldots, L_n)$ be the common part modulo $M$ of solutions modulo $M'$.
        Accept if $w \in L_1$, reject if $w \notin L_1$.

The loop "for all finite moduli" considers all finite substring-closed languages in any order. There are countably many of them. Since $X = \varphi(X)$ has a unique solution, by Theorem 3, the modulus sought in the "if" statement will eventually be found, and therefore this algorithm always terminates. What it computes is the common part modulo $M$ of all solutions modulo $M'$, which must be the unique solution of the system taken modulo $M$. This shows that the membership of $w$ is determined correctly.
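A runnable sketch of this procedure is given below in Python. It is a toy version under assumptions made only for illustration: the alphabet is unary (whereas the theorem concerns $|\Sigma| \geqslant 2$), $a^k$ is encoded as the integer $k$, the moduli $M'$ are tried in the order $a^{\leqslant \ell}, a^{\leqslant \ell+1}, \ldots$, and the system is supplied as a hypothetical callback `phi` on tuples of finite sets. The second test system is Example 3, read with the complement of $Y$ on its right-hand side. If the given system has no unique solution, the loop may run forever, as in the original procedure.

```python
from itertools import combinations, product

def subsets(universe):
    """Yield every subset of a finite set as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def solutions_mod(phi, n, ell):
    """All n-tuples V of subsets of M' = {0..ell} with phi(V, M') == V."""
    M = frozenset(range(ell + 1))
    candidates = list(subsets(M))
    return [V for V in product(candidates, repeat=n) if phi(V, M) == V]

def member_of_first_component(length, phi, n):
    """Decide whether a^length lies in the first component of the unique
    solution; also report the ell of the modulus M' that settled it."""
    M = frozenset(range(length + 1))          # substrings(a^length)
    ell = length
    while True:
        common = {tuple(c & M for c in V)
                  for V in solutions_mod(phi, n, ell)}
        if len(common) == 1:                  # all solutions agree modulo M
            (L,) = common
            return (length in L[0]), ell
        ell += 1

def even_lengths(V, M):
    """X = {ε} ∪ aaX, whose unique solution is (aa)*."""
    (X,) = V
    return ((frozenset({0}) | frozenset(k + 2 for k in X)) & M,)

def example3(V, M):
    """X = X, Y = complement(Y) ∩ ((X ∩ aX) ∪ a^2 X)."""
    X, Y = V
    aX = frozenset(k + 1 for k in X) & M
    aaX = frozenset(k + 2 for k in X) & M
    return (X, (M - Y) & ((X & aX) | aaX))

print(member_of_first_component(4, even_lengths, 1))
print(member_of_first_component(3, example3, 2))
```

For the first system the answer is settled already at $M' = M$, while for Example 3 the search has to look two lengths ahead before all solutions modulo $M'$ agree modulo $M$.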

Now consider an arbitrary recursive set $L \subseteq \Sigma^*$. The task is to construct a system of language equations with a unique solution, whose first component is $L$. Let $T$ be a Turing machine over $\Sigma$

that halts on every input and recognizes the language L. Let a

,