On the complexity of parallel parsing of general context free languages

(1)

http://wrap.warwick.ac.uk/

Original citation:

Rytter, W. (1986) On the complexity of parallel parsing of general context-free languages. University of Warwick. Department of Computer Science. (Department of Computer Science Research Report). (Unpublished) CS-RR-075

Permanent WRAP url:

http://wrap.warwick.ac.uk/60774

Copyright and reuse:

The Warwick Research Archive Portal (WRAP) makes this work by researchers of the University of Warwick available open access under the following conditions. Copyright © and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable the material made available in WRAP has been checked for eligibility before being made available.

Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge. Provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

A note on versions:

The version presented in WRAP is the published version or, version of record, and may be cited as it appears here.For more information, please contact the WRAP Team at:

(2)

R.esearch

report

OH TnE CotupuexmY oF PRnauleu PARSING

OF GENERAL CONTEXT.FREE LANGUAGES

by

Wojciech Rytter

(RR75)

Abstract

Let T(n) be the time to recognize context-free languages on

a parallel random access machine without write conflicts (P-RAM) using

a

polynomial number

of

processors. We

assume that T(n) ₌Q(log n). Let P(n) be the time to compute

a

representation of a parsing tree tor strings of length n

using a polynomial number of processors. Then we prove:

P(n) ₌O(T(n)). A related result is

a

parallel time log n

computation of the transitive closure of directed graphs having special structure.

Department of Computer Science

University of Warwick

(3)

@I{TEff.FREE IAI{GUAGES

Wojciech Rytter,

University of Warwick _,Department_of_Comp.Science

Coventry CV47AL

and

Warsaw University, Institute of Informatics

Abstract

Let T(n) be the time.to recognize _{context{ree languages}_{on a}_parallel random access machine _{without write conflicts (p-nnrrrrl}_{using a}

polynomial number of processors.

_we

_{assume that T(n)}_=Cl(lo6_n). Let P(n) be the time

to

compute a representation of a parsing _tree_for strings o_f

lenglll

using _{a polynomiai}_number_{of processors.}

Then we

prove:

P(n)= O(T(n)).

A related result is a parallel time tog n computation of the transitive closure of directed graphs _having_special_structure.

The problem of parsing for context-free _{tanguages (cfl's,shor1y) seems}_to be harder than the problem of recognition. lt _{was proved by}_Ruzzo_[₁_]_that

if

r'(n)

_{is the}_{time to recognize}_cft's_{on a}_RAM_{(sequentially)}

then

the time needed to parse cfl's on a RAM is o(T'(n)log

n)). we

show that when one _{considers parallel time then parsing}_{is not harder}_than_recognition

(however the number of processors _{can grow considerably, though}

(4)

Our model of parallel computation is a parallel random access machine without write conflicts _{(known also as}_{a CREW p-RAM).}_{Such a machine}

consists of _{a number of synchronously working processors (RAM's)}_which

are using a common memory . No two processors _{can attempt}_{to write}_in the same step into the same location, however many processors _can_read

from the same location. Such a modet corresponds to bounded fan-in

circuits.

The best algorithms for parallel _{general context-free recognition}_on_a P-RAM work in log2 (n) time using o(n6) processors, _see

_[2,3]

_(such complexity can even be achieved on much weaker models of parallel

computations, cube connected computers and perfect shuffle computers, see

_[4]).

lt is a hard _{open problem whether general}_context-free

recognition _{can be}_done_in_parailel_time_{log(n) on}_{a p-RAM.}

For

unambiguous languages log n time is enough,see _[5],and if the language is

a bracket _cfl_{then the}_{number of}_processors_can_{be linear,see}

[6]. Optimal (log n _{time and}_n/rog(n)_processors)_parartet_parsing_and_recognition

algorithms can be constructed _for_one-sided_{Dyck languages.}

The algorithms for generat _{context-free recognition}_use_the_parsing

matrix (to be defined later). We show that using this _{matrix a parsing tree}

can be constructed _{in log}(n) time _{using a}_cubic_{number of}_processors._Even

if

the

_{recognition does}_not_{construct the parsing}_matrix

then we can

construct it _{by executing simultaneously}_o(n2)_parallel_recognizing algorithms. The parallel time does not increase, _however_the_{number of} processors should be multiplied by n2 in this case.

Throughout the paper _{we assume}_(for_{ease of exposition)}

that the

grammars are in Chomsky normal form. Let G be a _{context-free grammar}_in

Chomsky normal form, let N _,_{Ter, denote the}_set_of_{nonterminal, terminal} symbols, respectively.

_we

_{write A}_-_>B_if_{the grammar}_has_such

a

production, _and

_A->*w

_iff_{the string w}

(5)

3

starting symbol of the grammar.

Let w=a[1]a[2]....a[n] be a given input string of length n. The recognition

problem is: decide whether A->*w, where A e N

The parsing problem is: construct a parsing tree PT.

This tree has 2n-1 nodes numbered 1..2n-1. With each node x there is

associated the following information :

Father[x], Left[x],Right[x], (left and right sons, if x is not a leaf) and

Label[x] (an element of N).

The tree should satisfy locally the rules of the grammar:

Label[root]=S;

Label[x]->Label[Left[x]lLabel[Right[x]l if x is not a leaf ;

Label[x]->alil if x is the

ith

leaf, (from the left).

Fathe r[Left[x]l= Father[Right[x]l=y.

It is enough to compute only the tables Father and Label. Left and Right can

be computed then easily on a P-RAM in log(n) time using a small number of

processors. On the other hand, if we have Left and Right then

this does not determine Father,

since

PT might be any directed acyclic

graph whose nonleaf nodes have outdegree 2.

The parsing table Tab is of the type _{array[0..n,0..n] of subsets of N,}

Tab[i,i]=if icj then _{A : A->.a[i+1]...a[i]]

_else

s.

Example.

Let G be the following grammar:

S->CS S->AS S->CA S.>DD S->AC

C->AA C->BB

D->AA D->DC

A->a

B->b.

(6)

4

Let w=aabba.

The parsing table is presented _in _{Fig.2, and}_{the parsing tree in}

Fig.1 .

S..

Node

/"*r.t,

)

/'f,

G)!_\^

io

&,

_,X,

j

\u

tn'

_:

(7) (B)

_,,

9

Lobel Father Left

S2

c13

A2

st6

c57

B6

A5

Right 5

:

9

[image:6.595.99.429.153.307.2]

:

[image:6.595.146.341.415.593.2]

Fig.l.Porsing tree and

its

representotion

Fig.2. The persing _{table for}_G_end_w=E6bbo.

Let G be _{a directed acyclic graph given}_by_{the relation}

R, where R(u,v)

holds whenever (u,v) is _{an edge of G.}

_we

_{say that G}_{satisfies the}

unique path condition

_(upc,

_shorily)_iff_for_every_two_nodes

v, u there is at most _

2

3

4

q

(7)

5

one path from u to v. (Equivalenily, _{one can say}_{that G}_is_weakly

acyclic,after removing orientation the undirected graph _is_acyclic). The best known upper bound to compute the transitive closure R" of

directed graphs is log2n, however if the graph _{satisfies UpC then}_it_can_be

improved

Lemma.

lf the directed graph G with m nodes satisfies UPC

then

the transitive

closure of G can be computed in log(m) parallel time on a P-RAM using

m3 processors.

Proof .

Let R be the relation _{corresponding to}_a_{directed acyclic graph G}_satisfying

UPC, and V be the set of nodes. _Assume_that_the_nodes_{are numbered}_1..m.

We say that a node is a sink iff it has outdegreezero. Let

s=

log m . First

we compute the tables R1[vl (corresponding _{to some}

_relations),

_forg<k<s.

fo,=R;

for each sink v

do

in parallel _{Rg[v,v]:=true;}

for k:=1 to s do

for

_each

v1 _,v2,v3_{such that}_{R6_1[v1,v2]}_and_{Rp_1[v2,v3]}

do in

parallel

R1[v1 _,v3]:=true;

We claim that there are no _{write conflicts}_in_the_{above algorithm}_and

Rr[v1 _{,v21 holds}_iff_there_is_a_path_from_v1_{to v2 and v2}

is a sink.

(8)

(-)

if

Rp[v,v1] and Rp[v,v2l and v, v1, vz are lying on the same path in G

then v1=v2, for k=0..s.

This invariant implies that whenever _{we have}_{Rt_t}_[v1_,v2]_and_{Rp_1}

[v2,v3]

and Rp-1[v1,v2'] and Rp-1[v2',v3] then v2:v2', because

upc

_{guarantees that}

all the nodes involved lie on the same path.

The second fact follows from our _{doubling technique. We are doubling}_the

distances between v1, v2tor

which

Rp[v1,v2] hotds _,until ultimately vz

becomes a sink. The foilowing invariant can be easily proved:

(**)

lf Rp[x,yl and y is not a sink then dist(x,y)=2k

(dist is the length of the path from x to y in the graph G).

we

have computed a part of R*, if y is a sink then R.(x,y)=pr[x,yl. Now we

compute

R*

for all nonsink nodes.

we

introduce two relations R';, and Dp (k=0..s), represented by

two-dimensional tables with the same names.

R'p[x,yl holds

iff

R1[x,yJ _{holds and}_y_{is not a}_sink.

Dp[x,yl holds iff dist(x,y).2k*1, _{for nonsink}_nodes_X,y_.

Let lD denote the identity relation _{and . denote the composition}_of

relations, we consider only the nodes _{which are}_not_{sinks. The retations}_D1

can be computed _{using the}_following_recurrence_{formula (following}_from

invariant (*-)):

DO= R+lD; Dk+1 _{=Dk+Dk.R'k*1.}

we

can easily compute R'g ?nd Dg _,n€xt we apply the _{recurrence equation}

log(n) times.

fork:=1

_bsdg

(9)

for each x,z do in parallel

jI

_{Df_t [x,z]}then D1[x,z]:=true;

for each x,y,z such that Dg_1[x,y] _and_R'p[y,zl_do_inparallel _{D1[x,z]:=true}

end

There are no write conflicts here because it x,y,z are lying on the same

path and R'1[y,z] holds then y _{is uniquely determined by}_{x, z}_(as_{a node}_lying

on the path from x to z, whose _{distance to}z is _{zk \. observe that}in this

algorithm Dp could be replaced by D (in _{fact the subscript k}_is_not_needed,

though it helps to apply the recurrence formula direcily).

Now we can compute R*=

Rs*D,

in one parallel step. This completes the

proof .

Theorem

Assume that context-free _{recognition can be done}_in_{the parallel time}

T(n) using R(n) processors of a p-RAM, and T(n)=911sg n).

Then the representation of a parsing tree (if there is any) can be

constructed in parallel time _{o(T(n)) using o(R(n)}_n2*n3)_processors.

lf the recognition procedure _constructs_the_parsing_{table then}_O(R(n)+n3)

processors _are_enough.

Proof .

First we construct the parsing table Tab for a given input string

w=a[1]a[2]...a[n], and a given grammar G in _{Chomsky normal form. This}_can

be done in parallel time T(n) using n2 R(n) processors. For each

A,i.j,

simultaneously we check whether A->*a[i+1 _]...atil.

lf s->"w then we start _{to compute a}parsing tree etse we stop here, because we know that in such case there is no parsing tree.

We construct now the following acyclic directed graph

G

represented by

the relation R (the relation:to _bea possible father). _{The set of nodes}_is

(10)

I

for

each

node (A,i,j),

i<j,

do in parallel find ickcj, B,C such that

there is a production _A->Bc_and_B_e_Tab[i,k]_and

_c

_eTab[k,i]; R[( B, i, k), (A, _i,j)]:= R[(C, k,j), (A, i,j)] :=tru e ;

((A,i,i) becomes a possible father of (B,i,k),(c,k,i), the number _{k and} nonterminals _{B, C}_can_be_found_using_a_tinear_number

of processors for fixed i,j,these processors _{coutd be organized into}_{the tree to search}

i-th

row and _{j-th column (simultaneously)}_{of Tab}_{for suitable}_B,C.

There can be

many k's possible, but we choose any one of _{them and it}is then fixed _). The graph corresponding _{to our exampte grammar and}_{the string}_aabba

is

shown in Fig.3. observe _{that the tree}_from_Fig.1_{corresponds to}_{a subgraph}

of this graph.

s,0,5

_

\

4,4,5

4,1 ,2 8,2,3 _8,3,4

Fig.3. The groph G.

Next _{we compute the transitive closure}_R*._{The graph G satisfies}

_upc

and

we can use _{algorithm from the}_lemma._{Let vg}_=(s,O,n)._{The parsing tree}

_pr

consists of all nodes v such that R-(v,vg) holds _{.The root}_of

pr

_is_vg_._The

(11)

I

function Father is computed as follows:

for

each

u, V e

PT

do in parallel if R(u,v) then Father[u]:=v;

The tables Left and Right can be computed in parallel time log n using the

table _{Father.so far the nodes are}_{not numbered}_(from₁_{to 2n,1)}_as

required, each node is a triple of the form _{(A,i,j) ,and all the}tables have

entries indexed by such triples. The set of such triples belonging to pT can

be numbered from 1 to 2n-1 using the algorithm of Tarjan and Vishkin _[7 _]

for preorder (or postorder) _{numbering of trees.This can be}_{done also}_by

arranging all possible triples in any initiat order (e.g. lexicographically), and then the _{final number of}_a_triple_belonging_to_PT_could_be_obtained_by

counting the number of preceding triples which are elements of pT (using a prefix computation). lf num is the numbering obtained _,then

Label _[nu m (A, i,j)] :=A.

The constructed tree now satisfies all the requirements. _{This completes}

the

proof.

L

It was proved in _[5]that every unambiguous cft _{can be recognized in}_log(n)

time on a P-RAM using a polynomial _{number of processors.}_Now_we_can

strengthen the result of _[5].

Corollary.

Every unambiguous cfl can parsed on a P-RAM in log n time using

polynomial number of processors.

Acknowledgments.

(12)

References.

11l W. Ruzzo. On the complexity of general context free language parsing

and recognition. Automata, languages and programming. Lect.Notes in

Computer Science,1979

tzl

W.Ruzzo. Tree-size bounded alternation. Journal of Comp. and

Syst.Sciences 21, 218-235, 1 980

t3l

W.Rytter. The complexity of two way pushdown automata and recursive

'

_{programs. NATO Advanced}_Research_{Workshop "Combinatorial}_algorithms_on

words" (ed.A.Apostolico,Z.Gal i l),Spri n ger-Ve rlag 1 985

t4l

W.Rytter. On the recognition of context free languages. Computation Theory, Lect.Notes in Comp.S'cience,springer Verlag 1985

tsl

W.Rytter. Parallel time log n recognition of unambiguous cfl's. Fund.of Computation Theory, Lect.Notes in Computer Science 1985

16l W.Rytter,R.Giancarlo. Parallel parsing of bracket and recognition of

-

input driven languages. manuscript,1985

l7l

R.Tarjan,U.Vishkin. Finding biconnected components and computing tree functions in logarithmic parallel time. Proceedings of IEEE Symp. on