
The String Pointer Reduction System

In this chapter we consider the string pointer reduction system, which we will recall now (see also [11] and Chapter 9 in [12]).

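We fix $\kappa \geq 2$, and define the alphabet $\Delta = \{2, 3, \ldots, \kappa\}$. For $D \subseteq \Delta$, we define $\bar{D} = \{\bar{a} \mid a \in D\}$ and $\Pi_D = D \cup \bar{D}$; also $\Pi = \Pi_\Delta$. We will use the alphabet $\Pi$ to formally denote the pointers: the intuition is that the pointer $p_i$ will be denoted by either $i$ or $\bar{i}$. Accordingly, elements of $\Pi$ will also be called pointers.

We use the 'bar operator' to move from $\Delta$ to $\bar{\Delta}$ and back from $\bar{\Delta}$ to $\Delta$. Hence, for $p \in \Pi$, $\bar{\bar{p}} = p$. For a string $u = x_1 x_2 \cdots x_n$ with $x_i \in \Pi$, the inverse of $u$ is the string $\bar{u} = \bar{x}_n \bar{x}_{n-1} \cdots \bar{x}_1$. For $p \in \Pi$, we define

$$\mathbf{p} = \begin{cases} p & \text{if } p \in \Delta, \\ \bar{p} & \text{if } p \in \bar{\Delta}, \end{cases}$$

i.e., $\mathbf{p}$ is the 'unbarred' variant of $p$. The domain of a string $v \in \Pi^*$ is $\mathrm{dom}(v) = \{\mathbf{p} \mid p \text{ occurs in } v\}$. A legal string is a string $u \in \Pi^*$ such that for each $p \in \Pi$ that occurs in $u$, $u$ contains exactly two occurrences from $\{p, \bar{p}\}$.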
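The definitions above can be made concrete with a minimal Python sketch. The encoding is an assumption made here for illustration and is not part of the text: a pointer $i$ is the integer `i`, its barred form $\bar{i}$ is `-i`, and a string over $\Pi$ is a list of such integers. The function names are likewise hypothetical.

```python
from collections import Counter

# Assumed encoding (not from the text): pointer i is the int i,
# its barred form is -i, and a string over Pi is a list of such ints.

def bar(p):
    """Bar operator: maps p to p-bar and back, so bar(bar(p)) == p."""
    return -p

def inverse(u):
    """Inverse of a string: reverse it and bar every symbol."""
    return [bar(x) for x in reversed(u)]

def unbarred(p):
    """The 'unbarred' variant of a pointer."""
    return abs(p)

def dom(v):
    """Domain of a string: the unbarred variants of its pointers."""
    return {unbarred(p) for p in v}

def is_legal(u):
    """Each pointer that occurs must occur exactly twice, counting
    p and p-bar as occurrences of the same pointer."""
    counts = Counter(unbarred(p) for p in u)
    return all(c == 2 for c in counts.values())

u = [3, 2, 4, 5, -4, 5, -3, -2]
print(inverse(u))
print(dom(u))
print(is_legal(u), is_legal([2, 3, 2]))
```

Under this encoding the empty list plays the role of the empty string, which is legal because the condition holds vacuously.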

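We define the alphabet $\Theta_\kappa = \{M_i, \bar{M}_i \mid 1 \leq i \leq \kappa\}$; these symbols denote the MDSs and their inversions. With each string over $\Theta_\kappa$, we associate a unique string over $\Pi$ through the homomorphism $\pi_\kappa : \Theta_\kappa^* \to \Pi^*$ defined by:

$$\pi_\kappa(M_1) = 2, \quad \pi_\kappa(M_\kappa) = \kappa, \quad \pi_\kappa(M_i) = i(i+1) \text{ for } 1 < i < \kappa,$$

and $\pi_\kappa(\bar{M}_j) = \overline{\pi_\kappa(M_j)}$ for $1 \leq j \leq \kappa$. Here $i(i+1)$ is the two-letter string consisting of $i$ followed by $i+1$. A permutation of the string $M_1 M_2 \cdots M_\kappa$, with possibly some of its elements inverted, is called a micronuclear pattern since it can describe the MIC form of a gene. A string $u$ is realistic if there is a micronuclear pattern $\delta$ such that $u = \pi_\kappa(\delta)$.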
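The homomorphism $\pi_\kappa$ can be sketched in a few lines of Python. The encoding is an assumption made for illustration only: the MDS $M_i$ is the integer `i`, its inversion is `-i`, and pointers are encoded the same way (`-p` for a barred pointer). The names `pi_mds` and `pi` are hypothetical. The pattern used in the usage lines is the actin pattern of Example 1.

```python
# Assumed encoding (not from the text): MDS M_i is the int i, its
# inversion is -i; pointer p is the int p, its barred form is -p.

def pi_mds(i, kappa):
    """Image of the uninverted MDS M_i under pi_kappa."""
    if i == 1:
        return [2]
    if i == kappa:
        return [kappa]
    return [i, i + 1]

def pi(pattern, kappa):
    """Homomorphism pi_kappa: concatenate the images of the symbols."""
    out = []
    for m in pattern:
        image = pi_mds(abs(m), kappa)
        if m < 0:
            # Inverted MDS: take the inverse of the image, i.e.
            # reverse it and bar every pointer.
            image = [-p for p in reversed(image)]
        out.extend(image)
    return out

# Micronuclear pattern M3 M4 M6 M5 M7 M9 M2-bar M1 M8 with kappa = 9.
actin = [3, 4, 6, 5, 7, 9, -2, 1, 8]
print(pi(actin, 9))
```

Because every pointer $2, \ldots, \kappa$ is contributed by exactly two adjacent MDSs, the output of `pi` on a micronuclear pattern always contains each pointer twice, which is why realistic strings are legal.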

Example 1

The MIC form of the gene that encodes the actin protein in the stichotrich Sterkiella nova is described by micronuclear pattern

$$\delta = M_3 M_4 M_6 M_5 M_7 M_9 \bar{M}_2 M_1 M_8$$

(see [22, 12]). The associated realistic string is $\pi_9(\delta) = 3\,4\,4\,5\,6\,7\,5\,6\,7\,8\,9\,\bar{3}\,\bar{2}\,2\,8\,9$. Note that every realistic string is legal, but a legal string need not be realistic. For example, a realistic string cannot have 'gaps' (missing pointers): thus $2244$ is legal but not realistic. It is also easy to produce examples of legal strings which do not have gaps but still are not realistic; $3322$ is such an example.

For a pointer $p$ and a legal string $u$, if both $p$ and $\bar{p}$ occur in $u$ then we say that both $p$ and $\bar{p}$ are positive in $u$; if on the other hand only $p$ or only $\bar{p}$ occurs in $u$, then both $p$ and $\bar{p}$ are negative in $u$. So, every pointer occurring in a legal string is either positive or negative in it. A nonempty legal string with no proper nonempty legal substrings is called elementary. For example, the legal string $234324$ is elementary, while the legal string $234342$ is not (because $3434$ is a proper legal substring).

Definition 1

Let $u = x_1 x_2 \cdots x_n$ be a legal string with $x_i \in \Pi$ for $1 \leq i \leq n$. For a pointer $p \in \Pi$ such that $\{x_i, x_j\} \subseteq \{p, \bar{p}\}$ and $1 \leq i < j \leq n$, the $p$-interval of $u$ is the substring $x_i x_{i+1} \cdots x_j$. Two distinct pointers $p, q \in \Pi$ overlap in $u$ if the $p$-interval of $u$ overlaps with the $q$-interval of $u$.
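The notions of positive and negative pointers, elementary strings, and overlapping pointers can be checked mechanically. The following Python sketch assumes the same illustrative encoding as before (pointer $i$ as `i`, barred pointer as `-i`, strings as lists); all function names are hypothetical. It also assumes that two intervals "overlap" when they are neither disjoint nor nested, i.e. the occurrences of the two pointers interleave; that reading is an interpretation, not something the definition spells out.

```python
from collections import Counter

# Assumed encoding (not from the text): pointer i is the int i,
# its barred form is -i, and a string over Pi is a list of such ints.

def is_legal(u):
    counts = Counter(abs(x) for x in u)
    return all(c == 2 for c in counts.values())

def positions(u, p):
    """Indices of the two occurrences of p or p-bar in a legal string."""
    return [k for k, x in enumerate(u) if abs(x) == abs(p)]

def is_positive(u, p):
    """p is positive if both p and p-bar occur, negative otherwise."""
    i, j = positions(u, p)
    return u[i] == -u[j]

def p_interval(u, p):
    """The substring from the first to the second occurrence."""
    i, j = positions(u, p)
    return u[i:j + 1]

def overlap(u, p, q):
    """Assumed reading: the intervals are neither disjoint nor nested."""
    i1, i2 = positions(u, p)
    j1, j2 = positions(u, q)
    return i1 < j1 < i2 < j2 or j1 < i1 < j2 < i2

def is_elementary(u):
    """Nonempty, legal, and no proper nonempty legal substring."""
    n = len(u)
    if n == 0 or not is_legal(u):
        return False
    for i in range(n):
        for j in range(i + 1, n + 1):
            if j - i < n and is_legal(u[i:j]):
                return False
    return True

print(is_elementary([2, 3, 4, 3, 2, 4]))  # the string 234324
print(is_elementary([2, 3, 4, 3, 4, 2]))  # the string 234342
```

The elementary check is a brute-force scan over all substrings, which is adequate for short examples but not intended as an efficient algorithm.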

The string pointer reduction system consists of three types of reduction rules operating on legal strings. For all $p, q \in \Pi$ with $\mathbf{p} \neq \mathbf{q}$, we define:

• the string negative rule for $p$ by $\mathrm{snr}_p(u_1 p p u_2) = u_1 u_2$,

• the string positive rule for $p$ by $\mathrm{spr}_p(u_1 p u_2 \bar{p} u_3) = u_1 \bar{u}_2 u_3$,

• the string double rule for $p, q$ by $\mathrm{sdr}_{p,q}(u_1 p u_2 q u_3 p u_4 q u_5) = u_1 u_4 u_3 u_2 u_5$,

where $u_1, u_2, \ldots, u_5$ are arbitrary strings over $\Pi$.

Note that each of these rules is defined only on legal strings that satisfy the given form. For example, $\mathrm{snr}_2$ is not defined on the legal string $2323$. It is important to realize that for every non-empty legal string there is at least one reduction rule applicable. Indeed, every legal string for which no string positive rule and no string double rule is applicable must have only nonoverlapping, negative pointers and thus a string negative rule is applicable.
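The three rules and the applicability claim can be illustrated with a minimal Python sketch. As before, the encoding (pointer $i$ as `i`, barred pointer as `-i`, strings as lists) and all names are assumptions made for illustration. Each rule returns `None` when the string does not have the required form. The helper `reduce_fully` is a hypothetical greedy strategy that keeps applying the first applicable rule it finds; it is one way to reach the empty string, not a method taken from the text.

```python
# Assumed encoding (not from the text): pointer i is the int i,
# its barred form is -i, and a string over Pi is a list of such ints.

def inverse(u):
    """Inverse of a string: reverse it and bar every pointer."""
    return [-x for x in reversed(u)]

def snr(u, p):
    """snr_p(u1 p p u2) = u1 u2; None if u lacks that form."""
    for k in range(len(u) - 1):
        if u[k] == p and u[k + 1] == p:
            return u[:k] + u[k + 2:]
    return None

def spr(u, p):
    """spr_p(u1 p u2 pbar u3) = u1 inverse(u2) u3; None if not applicable."""
    if p in u and -p in u:
        i, j = u.index(p), u.index(-p)
        if i < j:
            return u[:i] + inverse(u[i + 1:j]) + u[j + 1:]
    return None

def sdr(u, p, q):
    """sdr_{p,q}(u1 p u2 q u3 p u4 q u5) = u1 u4 u3 u2 u5; None otherwise."""
    if abs(p) == abs(q) or u.count(p) != 2 or u.count(q) != 2:
        return None
    i1 = u.index(p)
    i2 = u.index(p, i1 + 1)
    j1 = u.index(q)
    j2 = u.index(q, j1 + 1)
    if not i1 < j1 < i2 < j2:
        return None
    u1, u2, u3 = u[:i1], u[i1 + 1:j1], u[j1 + 1:i2]
    u4, u5 = u[i2 + 1:j2], u[j2 + 1:]
    return u1 + u4 + u3 + u2 + u5

def first_step(u):
    """Return (rule description, result) for some applicable rule."""
    pointers = sorted(set(u), key=lambda x: (abs(x), x))
    for p in pointers:
        for name, rule in (("snr", snr), ("spr", spr)):
            v = rule(u, p)
            if v is not None:
                return (name, p), v
    for p in pointers:
        for q in pointers:
            v = sdr(u, p, q)
            if v is not None:
                return ("sdr", p, q), v
    return None  # not expected for a nonempty legal string

def reduce_fully(u):
    """Greedily apply rules to a legal string until it is empty."""
    steps = []
    while u:
        step, u = first_step(u)
        steps.append(step)
    return steps

print(snr([2, 3, 2, 3], 2))       # None: snr_2 is not defined on 2323
print(reduce_fully([2, 3, 2, 3]))
```

Each step removes one or two pointers entirely, so the loop in `reduce_fully` terminates on legal input after at most as many steps as there are pointers in the domain.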

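We also define $\mathrm{Snr} = \{\mathrm{snr}_p \mid p \in \Pi\}$, $\mathrm{Spr} = \{\mathrm{spr}_p \mid p \in \Pi\}$ and $\mathrm{Sdr} = \{\mathrm{sdr}_{p,q} \mid p, q \in \Pi, \mathbf{p} \neq \mathbf{q}\}$ to be the sets containing all the reduction rules of a specific type.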

The string negative rule corresponds to the loop recombination operation, the string positive rule corresponds to the hairpin recombination operation, and the string double rule corresponds to the double-loop recombination operation. Note that the fact (pointed out at the end of Section 2.2) that the molecular operations remove pointers is explicit in the string pointer reduction system: when a string rule for a pointer $p$ (or pointers $p$ and $q$) is applied, all occurrences of $p$ and $\bar{p}$ (or $p$, $\bar{p}$, $q$ and $\bar{q}$) are removed.

Definition 2

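The domain $\mathrm{dom}(\rho)$ of a reduction rule $\rho$ equals the set of unbarred variants of the pointers the rule is applied to, i.e., $\mathrm{dom}(\mathrm{snr}_p) = \mathrm{dom}(\mathrm{spr}_p) = \{\mathbf{p}\}$ and $\mathrm{dom}(\mathrm{sdr}_{p,q}) = \{\mathbf{p}, \mathbf{q}\}$ for $p, q \in \Pi$. For a composition $\varphi = \varphi_1\ \varphi_2 \cdots \varphi_n$ of reduction rules $\varphi_1, \varphi_2, \ldots, \varphi_n$, the domain $\mathrm{dom}(\varphi)$ is the union of the domains of its constituents, i.e., $\mathrm{dom}(\varphi) = \mathrm{dom}(\varphi_1) \cup \mathrm{dom}(\varphi_2) \cup \cdots \cup \mathrm{dom}(\varphi_n)$.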

Definition 3

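Let $u$ and $v$ be legal strings and $S \subseteq \{\mathrm{Snr}, \mathrm{Spr}, \mathrm{Sdr}\}$. Then a composition $\varphi$ of reduction rules from $S$ is called an ($S$-)reduction of $u$, if $\varphi$ is applicable to (defined on) $u$. A successful reduction $\varphi$ of $u$ is a reduction of $u$ such that $\varphi(u) = \lambda$, where $\lambda$ denotes the empty string. We then also say that $\varphi$ is successful for $u$. We say that $u$ is reducible to $v$ in $S$ if there is an $S$-reduction $\varphi$ of $u$ such that $\varphi(u) = v$. We simply say that $u$ is reducible to $v$ if $u$ is reducible to $v$ in $\{\mathrm{Snr}, \mathrm{Spr}, \mathrm{Sdr}\}$. We say that $u$ is successful in $S$ if $u$ is reducible to $\lambda$ in $S$.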

Note that if $\varphi$ is a reduction of $u$, then $\mathrm{dom}(\varphi) = \mathrm{dom}(u) \setminus \mathrm{dom}(\varphi(u))$. Because (as pointed out already) for every non-empty legal string there is at least one reduction rule applicable, we easily obtain Theorem 9.1 in [12], which states that every legal string is successful in $\{\mathrm{Snr}, \mathrm{Spr}, \mathrm{Sdr}\}$.

Example 2

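Let $S = \{\mathrm{Snr}, \mathrm{Spr}\}$, $u = 3\,2\,4\,5\,\bar{4}\,5\,\bar{3}\,\bar{2}$, and $v = \bar{5}\,4\,\bar{5}\,\bar{4}$. Then $u$ is reducible to $v$ in $S$, because $(\mathrm{snr}_3\ \mathrm{spr}_2)(u) = v$. The composition is read as function composition, so the rightmost rule is applied first: $\mathrm{spr}_2(u) = 3\,3\,\bar{5}\,4\,\bar{5}\,\bar{4}$, and $\mathrm{snr}_3$ then removes the substring $33$. Since applying $\varphi = \mathrm{spr}_{\bar{5}}\ \mathrm{spr}_4\ \mathrm{snr}_{\bar{2}}\ \mathrm{spr}_3$ to $u$ yields $\lambda$, $\varphi$ is successful for $u$. On the other hand, $u = 3232$ is not reducible to any $v$ in $S$, because none of the rules in $\mathrm{Snr}$ and none of the rules in $\mathrm{Spr}$ is applicable to this $u$.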
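The reductions in Example 2 can be replayed with a short Python sketch. The encoding (pointer $i$ as `i`, barred pointer as `-i`, strings as lists) and the helper names are assumptions made for illustration. Only the rules in Snr and Spr are modelled, and `compose` applies its rightmost argument first, which is the only reading under which $\mathrm{snr}_3\ \mathrm{spr}_2$ is applicable to $u$. The sketch also checks the identity $\mathrm{dom}(\varphi) = \mathrm{dom}(u) \setminus \mathrm{dom}(\varphi(u))$ on this example.

```python
from functools import reduce

# Assumed encoding (not from the text): pointer i is the int i,
# its barred form is -i, and a string over Pi is a list of such ints.

def snr(p):
    """Build snr_p as a function on strings, tagged with its domain."""
    def rule(u):
        for k in range(len(u) - 1):
            if u[k] == p and u[k + 1] == p:
                return u[:k] + u[k + 2:]
        raise ValueError(f"snr for {p} is not applicable to {u}")
    rule.dom = {abs(p)}
    return rule

def spr(p):
    """Build spr_p as a function on strings, tagged with its domain."""
    def rule(u):
        if p in u and -p in u and u.index(p) < u.index(-p):
            i, j = u.index(p), u.index(-p)
            middle = [-x for x in reversed(u[i + 1:j])]  # inverse of u2
            return u[:i] + middle + u[j + 1:]
        raise ValueError(f"spr for {p} is not applicable to {u}")
    rule.dom = {abs(p)}
    return rule

def compose(*rules):
    """Function composition: the rightmost rule is applied first."""
    def phi(u):
        return reduce(lambda w, rule: rule(w), reversed(rules), u)
    phi.dom = set().union(*(r.dom for r in rules))
    return phi

def dom(u):
    return {abs(x) for x in u}

u = [3, 2, 4, 5, -4, 5, -3, -2]
v = compose(snr(3), spr(2))(u)
phi = compose(spr(-5), spr(4), snr(-2), spr(3))

print(v)                                  # the string v of Example 2
print(phi(u))                             # the empty list, i.e. lambda
print(phi.dom == dom(u) - dom(phi(u)))
```

Calling `snr(3)([3, 2, 3, 2])` or any other Snr or Spr rule on the string $3232$ raises `ValueError` in this sketch, matching the observation that no rule from $S$ applies to it.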

Referring to the Introduction, in Theorem 11 we present a characterization of the intermediate strings that may be constructed during the transformation of a given gene from its micronuclear form to its macronuclear form. Formally, this is a characterization of reducibility, which allows one to determine, for any given legal strings $u$ and $v$ and $S \subseteq \{\mathrm{Snr}, \mathrm{Spr}, \mathrm{Sdr}\}$, whether or not $u$ is reducible to $v$ in $S$. This result can be seen as a generalization of the results from Chapter 13 in [12], which provide a characterization of successfulness for realistic strings, that is, for the case where $u$ is realistic and $v = \lambda$.