For each trappable exon, the sequence at the splice acceptor (positions -20 to +2) and splice donor (positions -3 to +10) sites was recorded. These ranges of positions were chosen to include the region for which base frequency tables (and therefore consensus sequences) are available, plus a few neighbouring bases. Frequency tables were constructed for each base at each position (see tables 3.4 and 3.5), and for each position in the splice sites a chi-squared test was performed to test the null hypothesis: ‘The frequency of each base at position N is the same in exons which were and were not trapped’ (see tables 3.6 and 3.7).
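The counting step can be sketched in a few lines of Python. This is a minimal sketch that assumes each splice site has already been cut out as an aligned string with one base per position; the function name `frequency_table` and the two example sequences are hypothetical, not data or code from this study.

```python
from collections import Counter

BASES = "ACGT"

# Donor numbering used in table 3.5: -3..-1, then +1..+10 (there is no position 0).
DONOR_POSITIONS = [p for p in range(-3, 11) if p != 0]


def frequency_table(sequences, positions):
    """Count each base at each position across aligned splice-site sequences.

    Every sequence must contain exactly one base per entry in `positions`.
    Returns {position: {base: count}} with a zero entry for unseen bases.
    """
    for seq in sequences:
        if len(seq) != len(positions):
            raise ValueError(f"{seq!r} has {len(seq)} bases, expected {len(positions)}")
    table = {}
    for i, pos in enumerate(positions):
        counts = Counter(seq[i].upper() for seq in sequences)
        table[pos] = {b: counts[b] for b in BASES}
    return table


# Two made-up donor sites (13 bases each), for illustration only.
example = frequency_table(["CAGGTAAGTATTG", "AAGGTGAGTCCTA"], DONOR_POSITIONS)
print(example[1])   # {'A': 0, 'C': 0, 'G': 2, 'T': 0}
print(example[-3])  # {'A': 1, 'C': 1, 'G': 0, 'T': 0}
```

For acceptor sites the same function would be called with `[p for p in range(-20, 3) if p != 0]`, giving the 22 columns used in table 3.4.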
Acceptor splice site for trapped exons (n = 39)
    -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  +1  +2
A     4   5   3   1   2   4   4   1   1   0   2   3   1   2   4   3   7   1  39   0   7  11
C    14  17  14  17  16  19  19  13  15  17  12  18  19  21  18  19  22  34   0   0   8   4
G     7   5   9  11   9   7   4   7   7   4   9   1   4   3   3   4   7   0   0  39  22  10
T    14  12  13  10  12   9  12  18  16  18  16  17  15  13  14  13   3   4   0   0   2  14
Table 3.4a Frequency table of splice acceptor sequences of exons successfully trapped

Acceptor splice site for exons not trapped (n = 78)
    -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  +1  +2
A    14  12  11   8   6   9   4   3   3   7   6   6   4   4   6   4   9   1  78   0  12  18
C    32  24  31  35  25  34  30  23  34  33  28  31  44  34  32  33  34  62   0   0  16  16
G    13  18  15  14  11  11  11  16  12  10  11  14  10  10   7   3  24   0   0  78  38  16
T    19  24  21  21  36  24  33  36  29  28  33  27  20  30  33  38  11  15   0   0  12  28
Table 3.4b Frequency table of splice acceptor sequences of exons not trapped
Donor splice site in trapped exons (n = 39)
     -3  -2  -1  +1  +2  +3  +4  +5  +6  +7  +8  +9 +10
A    16  17   4   0   0  15  26   2   4   8   6   4   3
C    13   6   1   0   0   1   4   1   6   6  11  15  20
G     5   7  31  39   0  22   8  34   9  23  11  16   9
T     5   9   3   0  39   1   1   2  20   2  11   4   7
Table 3.5a Frequency table of splice donor sequences of exons successfully trapped

Donor splice site in exons not trapped (n = 78)
     -3  -2  -1  +1  +2  +3  +4  +5  +6  +7  +8  +9 +10
A    21  46   9   0   0  37  54   3   7  17  14   9  19
C    33   6   4   0   0   1  10   6  17  23  32  32  26
G    15  15  61  78   0  40   9  68  16  28  22  20  14
T     9  11   4   0  78   0   5   1  38  10  10  17  19
Table 3.5b Frequency table of splice donor sequences of exons not trapped
Total chi-squared value, splice acceptor
    -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  +1  +2
    2.4 2.7 1.5 3.3 3.6   1 2.9 0.3 0.5 4.3 1.7 5.6 2.3 1.4 0.7 3.7 4.2 1.7   0   0 2.7 2.2
Table 3.6 Chi-squared test for sequence at each position in the splice acceptor site
Chi-squared values for whether each position in the splice acceptor site is different between exons which were trapped successfully and exons which were not trapped.
Total chi-squared value, splice donor
     -3  -2  -1  +1  +2  +3  +4  +5  +6  +7  +8  +9 +10
    2.8 3.9 0.7   0   0 2.9 2.4 2.7 0.7 6.8 4.7   ? 6.8
Table 3.7 Chi-squared test for sequence at each position in the splice donor site
Chi-squared values for whether each position in the splice donor site is different between exons which were trapped successfully and exons which were not trapped.
The critical value of chi squared for a table with three degrees of freedom ((2 - 1) × (4 - 1) = 3, for two groups of exons and four bases) at P = 0.05 is 7.815. None of the calculated chi-squared values exceeds the critical value, suggesting that there is no single position in the range tested which is important in determining whether an exon can be trapped.
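The per-position test can be sketched as a standard chi-squared test of homogeneity on a 2 × 4 table (trapped vs. not trapped, by base), with expected counts taken from the row and column totals. That this is exactly how tables 3.6 and 3.7 were computed is an assumption, but the sketch reproduces the tabulated values (for example 4.2 at acceptor position -4 and 1.7 at -3). The function name `chi_squared_position` is hypothetical.

```python
# Sketch (assumed method): chi-squared test of homogeneity at one splice-site
# position, comparing base counts in trapped vs. not-trapped exons (2 x 4 table).

BASES = "ACGT"
CRITICAL_3DF_P05 = 7.815  # critical value quoted in the text (3 df, P = 0.05)


def chi_squared_position(trapped, not_trapped):
    """Return the chi-squared statistic for one position.

    `trapped` and `not_trapped` map each base to its count at that position.
    Expected counts come from the row and column totals; a base that is absent
    from both groups has an expected count of zero and is skipped.
    """
    rows = [trapped, not_trapped]
    row_totals = [sum(row[b] for b in BASES) for row in rows]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for b in BASES:
        col_total = sum(row[b] for row in rows)
        if col_total == 0:
            continue
        for row, row_total in zip(rows, row_totals):
            expected = row_total * col_total / grand_total
            chi2 += (row[b] - expected) ** 2 / expected
    return chi2


# Counts at acceptor position -4, from table 3.4.
trapped_minus4 = {"A": 7, "C": 22, "G": 7, "T": 3}
not_trapped_minus4 = {"A": 9, "C": 34, "G": 24, "T": 11}

value = chi_squared_position(trapped_minus4, not_trapped_minus4)
print(round(value, 1), value > CRITICAL_3DF_P05)  # 4.2 False
```

Applying the function to each column of tables 3.4 and 3.5 in turn gives one statistic per position, each of which is then compared with the critical value. At invariant positions (the AG and GT dinucleotides) every observed count equals its expected count, so the statistic is 0.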
Testing the whole splice site at once
Considering each position individually is a very simple way of looking at the splice site sequences, but a signal allowing splicing to occur may not always be found at the same position relative to the splice site, and may involve more than one base. One way to look at more than a single position at a time is to assess how well each splice site as a whole fits the consensus for that splice site. Published data collected from known splice sites (Senapathy et al. 1990) give a short consensus sequence at each site, described by the frequency tables given in tables 3.8 and 3.9.
    -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  +1
A    10   8   6   6   9   9   8   9   6   6  23   2 100   0  28
C    31  36  34  34  37  38  44  41  44  40  28  79   0   0  14
G    14  14  12   8   9  10   9   8   6   6  26   1   0 100  47
T    44  43  48  52  45  44  40  41  45  48  23  18   0   0  11
Consensus: Y Y Y Y Y Y Y Y Y Y N C A G G (positions -14 to +1; Y = T or C)
Table 3.8 Published splice acceptor frequency table (%)
     -3  -2  -1  +1  +2  +3  +4  +5  +6
A    28  59   ?   0   0  54  74   ?  16
C    40  14   ?   0   0   ?   ?   ?  18
G    17  13  81 100   0  42  11  85  21
T    14  14   ?   0 100   ?   ?   ?  45
Consensus: A G T A G
Table 3.9 Published splice donor frequency table (%)
Calculating scores for each splice site
It is possible that the factor determining ease of exon trapping is how well the splice sites conform to these consensus sequences. A simple way to measure how well a sequence fits the consensus described by a frequency table is given by the GCG program FitConsensus (Staden 1984). For each position in the test sequence, the score is equal to the frequency of that base at that position in the table. The individual scores are totalled and divided by the number of positions in the table. For example, the donor sequence GGGGTGAGT (positions -3 to +6) gets the score (17 + 13 + 81 + 100 + 100 + 42 + 74 + 85 + 45)/9 = 61.9. Scores were calculated for acceptor and donor splice sites separately, as well as a combined score for both splice sites of an exon (calculated by treating the two frequency tables as a single table). Table 3.10 shows the maximum and minimum possible scores for each splice site, and for the combined score.
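The scoring rule described above can be sketched in Python using the acceptor frequencies of table 3.8. This follows the description in the text and is not the GCG FitConsensus source; `fit_score`, `score_range` and the test sequence are hypothetical. The rule used for the minimum (ignoring bases with zero frequency, so the invariant AG always contributes 100 + 100) is an assumption, chosen because it reproduces the acceptor minimum of 20.7 in table 3.10.

```python
# Sketch of the consensus-fit score described in the text: the mean, over all
# positions, of the table frequency (%) of the observed base. Not the GCG source.

# Table 3.8 (Senapathy et al. 1990): acceptor positions -14..-1 and +1, in %.
ACCEPTOR_TABLE = {
    "A": [10, 8, 6, 6, 9, 9, 8, 9, 6, 6, 23, 2, 100, 0, 28],
    "C": [31, 36, 34, 34, 37, 38, 44, 41, 44, 40, 28, 79, 0, 0, 14],
    "G": [14, 14, 12, 8, 9, 10, 9, 8, 6, 6, 26, 1, 0, 100, 47],
    "T": [44, 43, 48, 52, 45, 44, 40, 41, 45, 48, 23, 18, 0, 0, 11],
}


def fit_score(sequence, table):
    """Mean frequency of each base of `sequence` at its position in `table`.

    `sequence` must have exactly one base per table column; letters other
    than A, C, G, T raise KeyError.
    """
    n = len(table["A"])
    if len(sequence) != n:
        raise ValueError(f"expected {n} bases, got {len(sequence)}")
    return sum(table[base][i] for i, base in enumerate(sequence.upper())) / n


def score_range(table):
    """Highest and lowest achievable scores for a frequency table.

    Assumption: the minimum only considers bases with non-zero frequency, so
    invariant positions (the AG of the acceptor) always contribute 100.
    """
    n = len(table["A"])
    columns = [[table[b][i] for b in "ACGT"] for i in range(n)]
    best = sum(max(col) for col in columns) / n
    worst = sum(min(f for f in col if f > 0) for col in columns) / n
    return best, worst


print(round(fit_score("TTTTTTCCTTCCAGG", ACCEPTOR_TABLE), 1))    # 53.9
print(tuple(round(s, 1) for s in score_range(ACCEPTOR_TABLE)))  # (53.9, 20.7)

# Worked donor example from the text: GGGGTGAGT, positions -3..+6.
DONOR_EXAMPLE_FREQS = [17, 13, 81, 100, 100, 42, 74, 85, 45]
print(round(sum(DONOR_EXAMPLE_FREQS) / len(DONOR_EXAMPLE_FREQS), 1))  # 61.9
```

The sequence TTTTTTCCTTCCAGG takes the most frequent base in every column of table 3.8, so its score equals the acceptor maximum.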
           Acceptor  Donor  Combined
Maximum        53.9   70.9      60.3
Minimum        20.7   29.1      23.9
Table 3.10 Maximum and minimum splice site scores
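Because the combined score pools the 15 acceptor positions of table 3.8 and the 9 donor positions of table 3.9 into one table, it should work out as a length-weighted mean of the two separate scores rather than their simple average. Here f^acc_i and f^don_j stand for the table frequency (%) of the observed base at the i-th acceptor and j-th donor position, with S_acc and S_don the separate scores; this notation is introduced for illustration. A quick check under that reading, using the rounded values in table 3.10:

```latex
S_{\mathrm{comb}}
  = \frac{\sum_{i=1}^{15} f^{\mathrm{acc}}_{i} + \sum_{j=1}^{9} f^{\mathrm{don}}_{j}}{15 + 9}
  = \frac{15\,S_{\mathrm{acc}} + 9\,S_{\mathrm{don}}}{24}

\max S_{\mathrm{comb}} \approx \frac{15 \times 53.9 + 9 \times 70.9}{24} = \frac{1446.6}{24} \approx 60.3,
\qquad
\min S_{\mathrm{comb}} \approx \frac{15 \times 20.7 + 9 \times 29.1}{24} = \frac{572.4}{24} \approx 23.9
```

Both results agree with the combined column of table 3.10, which is consistent with the two sites being scored independently and then pooled.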