Gene Expression Analysis
Annotation/Functional Analysis
DE analysis at gene level (Trinity)
• Filtered genes from DESeq2 -> filtered isoforms
• In Excel
• Save list of names of "genes", e.g.
TRINITY_DN7498_c0_g1
• Divide genes into groups based on expression pattern
• Up, Equal, Down by LFC cutoff and P-value
• Two comparisons between time points 1, 2, 3
UU, UE, UD
EU, EE, ED
DU, DE, DD
• Save lists of genes to separate text files
• Extract all isoforms corresponding to filtered genes from transcriptome fasta file
fasta_select.py list_file fasta_file > isoform.fa
• Need peptide sequences for most annotation analyses
• Use TransDecoder to get probable protein sequences
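The isoform extraction step can be sketched as follows. This is not the fasta_select.py script named above, whose contents are not shown; it is a minimal illustration that assumes Trinity-style IDs, where the gene name is the isoform ID without its trailing `_i<N>`. The function names and the toy records are hypothetical.

```python
import re

def gene_of(isoform_id):
    """Strip a trailing _i<N> from a Trinity isoform ID to get the gene ID."""
    return re.sub(r"_i\d+$", "", isoform_id)

def select_isoforms(fasta_lines, wanted_genes):
    """Yield the FASTA lines of every record whose gene ID is in wanted_genes."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            keep = gene_of(seq_id) in wanted_genes
        if keep:
            yield line

# Toy input (made-up sequences) standing in for the transcriptome fasta file.
fasta = [
    ">TRINITY_DN7498_c0_g1_i1 len=8",
    "ATGCATGC",
    ">TRINITY_DN7498_c0_g1_i2 len=4",
    "ATGC",
    ">TRINITY_DN8_c0_g1_i3 len=4",
    "GGCC",
]
selected = list(select_isoforms(fasta, {"TRINITY_DN7498_c0_g1"}))
print("\n".join(selected))
```

In practice the gene set would be read from one of the saved text files, one gene name per line.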
Computational Genomics 2020
Week 9
# 2
[Figure: percentage of genes in each expression pattern group: DD 2%, DE 40%, ED 2%, DU 8%, UD 4%, EU 3%, UE 38%, UU 3%]
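A minimal sketch of how the two-letter groups could be assigned from DESeq2 results. The cutoffs (|LFC| >= 1, padj < 0.05), the choice of comparisons (time 1 vs 2, then time 2 vs 3), and the gene values are all assumptions for illustration; the slides do not state them.

```python
def call_direction(lfc, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return 'U' (up), 'D' (down) or 'E' (equal) for one comparison."""
    if padj is not None and padj < padj_cutoff:
        if lfc >= lfc_cutoff:
            return "U"
        if lfc <= -lfc_cutoff:
            return "D"
    return "E"

def pattern(first, second, **cutoffs):
    """Combine two (lfc, padj) comparisons into a group label such as 'UE'."""
    return call_direction(*first, **cutoffs) + call_direction(*second, **cutoffs)

# Hypothetical (LFC, padj) pairs: time 1 vs 2, then time 2 vs 3.
genes = {
    "gene_a": ((2.3, 0.001), (0.1, 0.8)),
    "gene_b": ((-1.7, 0.01), (1.9, 0.002)),
    "gene_c": ((0.4, 0.3), (-0.2, 0.6)),
}
groups = {}
for name, (first, second) in genes.items():
    groups.setdefault(pattern(first, second), []).append(name)
print(groups)
```

Each list in `groups` corresponds to one of the text files of gene names described above.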
Annotation/Functional Analysis
Observed transcripts – genes to transcripts
• Stringtie
• use gffread to write out all transcript sequences from merged GTF file
• copy transcript file and remove isoform numbers
• I used vi with the command
:%s/\.\d* CDS/ CDS/
replaces period, followed by any number of digits, space, CDS with space, CDS
• work on a copy; after editing, this file will have duplicate names
• write out list of selected genes from DESeq2 analysis
• all genes without padj or LFC selection
• one gene per line (if you used the gene count file there will be no transcript numbers)
• use seqtk subseq to select the sequences
seqtk subseq merged_transcript.fa stringtie_filtered.list > stringtie_filtered.fa
• count number of genes with grep and wc
• 20610 selected genes -> 36310 transcripts
>MSTRG.1.2 CDS=1-869
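The gene-versus-transcript count can also be sketched in Python rather than with vi, grep and wc. This assumes StringTie-style IDs in which the last `.<digits>` is the isoform number; only the MSTRG.1.2 header comes from the slide, the other two headers are made up.

```python
import re
from collections import Counter

def gene_id(transcript_id):
    """Drop the trailing .<digits> isoform number, e.g. MSTRG.1.2 -> MSTRG.1."""
    return re.sub(r"\.\d+$", "", transcript_id)

# Only the second header is from the slide; the others are illustrative.
headers = [">MSTRG.1.1 CDS=1-500", ">MSTRG.1.2 CDS=1-869", ">MSTRG.2.1"]
per_gene = Counter(gene_id(h[1:].split()[0]) for h in headers)
print(len(per_gene), "genes ->", sum(per_gene.values()), "transcripts")
```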
Transcriptome Assembly
Trinity output
• Trinity.fasta.gene_trans_map
maps transcripts to genes
• recursive_trinity.cmds
butterfly commands
• recursive_trinity.cmds.completed
successful commands
• recursive_trinity.cmds.failed
rerun these
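A short sketch of reading Trinity.fasta.gene_trans_map, assuming its usual two-column layout of gene ID, a tab, then transcript ID. The function name is hypothetical, and the three example lines are built from IDs shown in these slides.

```python
from collections import defaultdict

def read_gene_trans_map(lines):
    """Return {gene_id: [transcript_id, ...]} from gene<TAB>transcript lines."""
    genes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, transcript = line.split("\t")
        genes[gene].append(transcript)
    return dict(genes)

example = [
    "TRINITY_DN8_c0_g1\tTRINITY_DN8_c0_g1_i1",
    "TRINITY_DN8_c0_g1\tTRINITY_DN8_c0_g1_i3",
    "TRINITY_DN8_c0_g2\tTRINITY_DN8_c0_g2_i4",
]
gene_map = read_gene_trans_map(example)
for gene, transcripts in gene_map.items():
    print(gene, len(transcripts))
```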
Computational Genomics 2020
Week 7
# 4
Transcriptome Assembly
Trinity output
• Trinity produces many files
• Trinity files take a lot of space
• You MUST compress
• I suggest, backup the entire result with tar, then
Transcriptome Assembly
Trinity output
• cleanup
• tar with parallel compression
• takes about 77 min
• final size of 200421_avocado_trinity.tar.tgz = 105 G
Transcriptome Assembly
Trinity genome guided assembly
• Use reads mapped against reference genome
• I used reads from HISAT from stringtie analysis
• merge into single bam file with samtools merge
• recommended only if genome is fairly complete
• with defaults, produced 234415 predicted transcripts (compared to 265348 for de novo)
Transcriptome Assembly
Trinity results
• By default, results are written to the file trinity_out_dir/Trinity.fasta
• Change this name to a more informative one immediately
• avocado_trinity_200422.fa
• how many predicted transcripts
grep '>' avocado_trinity_200422.fa | wc
265348 1473154 20582035
• in the predicted transcripts file each sequence is on a single line; this may not work for all downstream programs
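If a downstream program does need wrapped sequence lines, the file can be re-wrapped. The sketch below counts records (the same number the grep/wc pipeline reports) and wraps sequence lines at 60 columns; the width and the function names are arbitrary choices for illustration.

```python
def count_records(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))

def wrap_fasta(lines, width=60):
    """Yield FASTA lines with each sequence line split into `width`-column pieces."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") or not line:
            yield line
        else:
            for i in range(0, len(line), width):
                yield line[i:i + width]

# Toy record: a 130-base sequence on a single line.
records = [">example_i1 len=130", "ACGTACGTAC" * 13]
wrapped = list(wrap_fasta(records))
print(count_records(records), [len(x) for x in wrapped[1:]])
```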
>TRINITY_DN8_c0_g1_i3 len=630 path=[0:0-251 2:252-503 3:504-504 5:505-629]
ATCAAATCTTACGAGGTGTGAAAGTCAGGTTCCATGACAAAGAGGGAAAAGGCTGCAGACAGAAAGAAACAGCCACTCTGCAACTCAGTATTAATAGGAAAATG
TCTTTAATGGAGAAAAGCTCTCATCCAAGGGCGGGAATAATAGCAGGCCTGTTTTGAAAATGATTTCATTTTCATCATCTACCTGTGCGTTCTTAAATAAGTTG
GAGCTGAAAACTAAGCACTCTTTGAAACTCTCTCTCTATATCAGGTAGTAAGAAAAAGAGAAGAAAGAAGAAAGAAATGCCCGTCTTTGATCTTATATCTGCTA
TCCAACTGCATTTCTACTGACATTCTATAGATTTTTATGCCATTCACTACTTCTTGTCATCTTTTTACCTGGGTTTGTTTGGACTATGGATTTCTGGTTTATAT
ATGAATCAATAAATTATGGATATAAACCTACAAGTTTTTCCCCTTTTTCTTGACTGGGAGTTCGAAAATCAGTGTTTTTGCTTTGATGGGTACTTCTGTTCTCT
GTTCCACCCTCTCTCGATCTTTTCTGTTGGTTTTTCTCTGATGGGTTATTCCTGTATTCAGCTGAGGAGTTGATGTCAGTGAATTTTTTTTTTTTTTTTTCAGT
TTTCTG
>TRINITY_DN8_c0_g1_i1 len=3405 path=[0:0-251 2:252-503 3:504-504 4:505-3404]
Transcriptome Assembly
Trinity predicted transcript IDs
• _DN read cluster (inchworm), contains overlapping k-mers
• _c component (chrysalis), has read support
• _g gene (butterfly), alternative de Bruijn graph traces
• _i isoform
>TRINITY_DN8_c0_g1_i3 len=630 path=[0:0-251 2:252-503 3:504-504 5:505-629]
>TRINITY_DN8_c0_g1_i1 len=3405 path=[0:0-251 2:252-503 3:504-504 4:505-3404]
>TRINITY_DN8_c0_g1_i2 len=3207 path=[1:0-305 3:306-306 4:307-3206]
>TRINITY_DN8_c0_g1_i4 len=3152 path=[0:0-251 4:252-3151]
>TRINITY_DN8_c0_g2_i4 len=3043 path=[2:0-675 4:676-819 5:820-823 6:824-2349 8:2350-3042]
>TRINITY_DN8_c0_g2_i3 len=788 path=[1:0-94 8:95-787]
>TRINITY_DN8_c0_g2_i6 len=338 path=[0:0-121 4:122-265 5:266-269 7:270-337]
>TRINITY_DN8_c0_g2_i2 len=2489 path=[0:0-121 4:122-265 5:266-269 6:270-1795 8:1796-2488]
>TRINITY_DN8_c0_g2_i5 len=2343 path=[3:0-119 5:120-123 6:124-1649 8:1650-2342]
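The ID fields can be pulled apart with a regular expression. This is a minimal sketch that assumes every ID follows the TRINITY_DN<n>_c<n>_g<n>_i<n> pattern seen in the headers above; the function name is hypothetical.

```python
import re

TRINITY_ID = re.compile(r"^TRINITY_DN(\d+)_c(\d+)_g(\d+)_i(\d+)$")

def parse_trinity_id(seq_id):
    """Split a Trinity transcript ID into cluster, component, gene and isoform."""
    m = TRINITY_ID.match(seq_id)
    if m is None:
        raise ValueError(f"not a Trinity transcript ID: {seq_id!r}")
    cluster, component, gene, isoform = (int(x) for x in m.groups())
    return {
        "cluster": cluster,
        "component": component,
        "gene": gene,
        "isoform": isoform,
        "gene_id": f"TRINITY_DN{cluster}_c{component}_g{gene}",
    }

header = ">TRINITY_DN8_c0_g2_i4 len=3043 path=[2:0-675 4:676-819 5:820-823 6:824-2349 8:2350-3042]"
info = parse_trinity_id(header[1:].split()[0])
print(info)
```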
Annotation/Functional Analysis
Some selected genes
• Many genes have multiple isoforms; which are interesting?
Annotation/Functional Analysis
TransDecoder (trinity)
• Try to identify "best" proteins
• For stranded data run in -S mode
• TransDecoder.LongOrfs
Find the longest ORFs
• minimum ORF length is 100 amino acids, change with -m
• --gene_trans_map
• ORFS must start with M
not good for fragments or alternative start codons
• TransDecoder.Predict
Evaluate ORFs
• 5th-order (hexamer) Markov model based on longest ORFs in set of predicted transcripts
hexamer model includes
amino acid frequencies in proteins
amino acid pair frequencies in proteins
codon usage in the organism of interest
TransDecoder.Predict
• longest_orfs.cds.scores
-log likelihood
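To make the "ORFs must start with M" rule concrete, here is a simplified ORF scan. It is an illustration only, not TransDecoder's implementation: it finds ATG-to-stop ORFs, counts the stop codon in the length, and ignores partial ORFs entirely. The `stranded` argument mimics the idea of -S (top strand only); all names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Yield 0-based half-open (start, end) spans of ATG...stop ORFs in 3 frames."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:  # length includes the stop
                    yield start, i + 3
                start = None

def longest_orf(seq, min_codons=100, stranded=False):
    """Return (strand, start, end) of the longest complete ORF, or None.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if not stranded:
        strands.append(("-", reverse_complement(seq)))
    best = None
    for strand, s in strands:
        for start, end in orfs_in_strand(s, min_codons):
            if best is None or end - start > best[2] - best[1]:
                best = (strand, start, end)
    return best

toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(longest_orf(toy, min_codons=3))
```

Because the scan requires an ATG, a transcript fragment that is missing its 5' end yields no ORF here, which is the limitation noted in the bullets above.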
Annotation/Functional Analysis
TransDecoder.LongOrfs predicted coding regions
>DN8_c0_g1 len=630 path=[0:0-251 2:252-503 3:504-504 5:505-629]
>DN8_c0_g1 len=3405 path=[0:0-251 2:252-503 3:504-504 4:505-3404]
>DN8_c0_g1 len=3207 path=[1:0-305 3:306-306 4:307-3206]
Annotation/Functional Analysis
TransDecoder.LongOrfs predicted coding regions
• are g1 and g2 really different genes
• Is the longest predicted transcript the best
• is the longest predicted ORF the best
>DN8_c0_g2.p1 type:complete len:676 gc:universal DN8_c0_g2:719-2746(+)
>DN8_c0_g2.p2 type:complete len:98 gc:universal DN8_c0_g2:1297-1004(-)
>DN8_c0_g2.p3 type:complete len:95 gc:universal DN8_c0_g2:872-588(-)
>DN8_c0_g2.p4 type:complete len:74 gc:universal DN8_c0_g2:1173-1394(+)
>DN8_c0_g2.p5 type:complete len:67 gc:universal DN8_c0_g2:1914-1714(-)
>DN8_c0_g2.p6 type:complete len:66 gc:universal DN8_c0_g2:2774-2971(+)
>DN8_c0_g2.p7 type:complete len:61 gc:universal DN8_c0_g2:2535-2717(+)
>DN8_c0_g2.p8 type:complete len:53 gc:universal DN8_c0_g2:1452-1294(-)
>DN8_c0_g2.p9 type:complete len:96 gc:universal DN8_c0_g2:204-491(+)
>DN8_c0_g2.p10 type:complete len:66 gc:universal DN8_c0_g2:519-716(+)
>DN8_c0_g2.p11 type:complete len:61 gc:universal DN8_c0_g2:280-462(+)
>DN8_c0_g2.p12 type:complete len:60 gc:universal DN8_c0_g2:71-250(+)
>DN8_c0_g2.p13 type:complete len:676 gc:universal DN8_c0_g2:165-2192(+)
>DN8_c0_g2.p14 type:complete len:98 gc:universal DN8_c0_g2:743-450(-)
>DN8_c0_g2.p15 type:complete len:74 gc:universal DN8_c0_g2:619-840(+)
>DN8_c0_g2.p16 type:complete len:67 gc:universal DN8_c0_g2:1360-1160(-)
>DN8_c0_g2.p17 type:complete len:66 gc:universal DN8_c0_g2:2220-2417(+)
>DN8_c0_g2.p18 type:complete len:66 gc:universal DN8_c0_g2:318-121(-)
>DN8_c0_g2.p19 type:complete len:61 gc:universal DN8_c0_g2:1981-2163(+)
>DN8_c0_g2.p20 type:complete len:53 gc:universal DN8_c0_g2:898-740(-)
>DN8_c0_g2.p21 type:complete len:644 gc:universal DN8_c0_g2:115-2046(+)
>DN8_c0_g2.p22 type:complete len:98 gc:universal DN8_c0_g2:597-304(-)
>DN8_c0_g2.p23 type:complete len:74 gc:universal DN8_c0_g2:473-694(+)
>DN8_c0_g2.p24 type:complete len:67 gc:universal DN8_c0_g2:1214-1014(-)
>DN8_c0_g2.p25 type:complete len:66 gc:universal DN8_c0_g2:2074-2271(+)
>DN8_c0_g2.p26 type:complete len:61 gc:universal DN8_c0_g2:1835-2017(+)
>DN8_c0_g2.p27 type:complete len:53 gc:universal DN8_c0_g2:752-594(-)
>DN8_c0_g2 len=3043 path=[2:0-675 4:676-819 5:820-823 6:824-2349 8:2350-3042]
>DN8_c0_g2 len=788 path=[1:0-94 8:95-787]
>DN8_c0_g2 len=338 path=[0:0-121 4:122-265 5:266-269 7:270-337]
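The ORF headers above follow a regular pattern, so they can be parsed for downstream filtering. A minimal sketch, assuming each header sits on one line in the form shown; the regular expression and function name are hypothetical. It also checks that the coordinate span equals three times the reported length.

```python
import re

ORF_HEADER = re.compile(
    r"^>(?P<orf>\S+) type:(?P<type>\S+) len:(?P<len>\d+) gc:(?P<gc>\S+) "
    r"(?P<transcript>\S+):(?P<a>\d+)-(?P<b>\d+)\((?P<strand>[+-])\)$"
)

def parse_orf_header(line):
    """Parse one LongOrfs-style header line into a dict."""
    m = ORF_HEADER.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised ORF header: {line!r}")
    a, b = int(m["a"]), int(m["b"])
    return {
        "orf": m["orf"],
        "type": m["type"],
        "codons": int(m["len"]),
        "transcript": m["transcript"],
        "start": min(a, b),
        "end": max(a, b),
        "strand": m["strand"],
        "nt_length": abs(b - a) + 1,
    }

plus = parse_orf_header(">DN8_c0_g2.p1 type:complete len:676 gc:universal DN8_c0_g2:719-2746(+)")
minus = parse_orf_header(">DN8_c0_g2.p2 type:complete len:98 gc:universal DN8_c0_g2:1297-1004(-)")
print(plus["nt_length"], 3 * plus["codons"])
print(minus["nt_length"], 3 * minus["codons"])
```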
Annotation/Functional Analysis
TransDecoder.LongOrfs predicted coding regions
• is DN78 a coding gene?
• which is the best ORF for DN18
>DN78_c0_g1 len=891 path=[0:0-890]
>DN18_c0_g1 len=1354 path=[2:0-119 3:120-152 4:153-178 5:179-230 7:231-265 9:266-284 11:285-374 12:375-424 14:425-578 15:579-616 17:617-689 18:690-869 19:870-1027 22:1028-1353]
>DN18_c0_g1 len=1791 path=[10:0-721 11:722-811 12:812-861 14:862-1015 15:1016-1053 17:1054-1126 18:1127-1306 19:1307-1464 22:1465-1790]
>DN18_c0_g1 len=1384 path=[0:0-257 4:258-283 6:284-335 7:336-370 8:371-479 12:480-529 13:530-683 15:684-721 16:722-794 18:795-974 20:975-1383]
>DN18_c0_g1 len=399 path=[18:0-179 19:180-337 21:338-398]
>DN18_c0_g1 len=1252 path=[0:0-257 4:258-283 5:284-335 7:336-370 9:371-389 10:390-1111 11:1112-1201 12:1202-1251]
>DN18_c0_g1 len=1453 path=[1:0-218 3:219-251 4:252-277 5:278-329 7:330-364 9:365-383 11:384-473 12:474-523 14:524-677 15:678-715 17:716-788 18:789-968 19:969-1126 22:1127-1452]
>DN78_c0_g1.p1 type:5prime_partial len:193 gc:universal DN78_c0_g1:2-580(+)
>DN78_c0_g1.p2 type:complete len:136 gc:universal DN78_c0_g1:547-140(-)
>DN78_c0_g1.p3 type:3prime_partial len:90 gc:universal DN78_c0_g1:267-1(-)
>DN78_c0_g1.p4 type:5prime_partial len:67 gc:universal DN78_c0_g1:889-689(-)
>DN78_c0_g1.p5 type:5prime_partial len:60 gc:universal DN78_c0_g1:1-180(+)
>DN18_c0_g1.p1 type:complete len:327 gc:universal DN18_c0_g1:130-1110(+)
>DN18_c0_g1.p2 type:complete len:109 gc:universal DN18_c0_g1:749-1075(+)
>DN18_c0_g1.p3 type:complete len:92 gc:universal DN18_c0_g1:395-120(-)
>DN18_c0_g1.p4 type:3prime_partial len:75 gc:universal DN18_c0_g1:1132-1353(+)
>DN18_c0_g1.p5 type:complete len:59 gc:universal DN18_c0_g1:1166-1342(+)
>DN18_c0_g1.p6 type:complete len:302 gc:universal DN18_c0_g1:642-1547(+)
>DN18_c0_g1.p7 type:complete len:109 gc:universal DN18_c0_g1:1186-1512(+)
>DN18_c0_g1.p8 type:complete len:81 gc:universal DN18_c0_g1:832-590(-)
>DN18_c0_g1.p9 type:3prime_partial len:75 gc:universal DN18_c0_g1:1569-1790(+)
>DN18_c0_g1.p10 type:complete len:59 gc:universal DN18_c0_g1:1603-1779(+)
>DN18_c0_g1.p11 type:complete len:327 gc:universal DN18_c0_g1:235-1215(+)
>DN18_c0_g1.p12 type:complete len:198 gc:universal DN18_c0_g1:1185-592(-)
>DN18_c0_g1.p13 type:complete len:68 gc:universal DN18_c0_g1:1118-915(-)
>DN18_c0_g1.p14 type:complete len:66 gc:universal DN18_c0_g1:500-303(-)
>DN18_c0_g1.p15 type:complete len:58 gc:universal DN18_c0_g1:854-1027(+)
>DN18_c0_g1.p16 type:5prime_partial len:126 gc:universal DN18_c0_g1:1-378(+)
>DN18_c0_g1.p17 type:complete len:109 gc:universal DN18_c0_g1:59-385(+)
>DN18_c0_g1.p18 type:complete len:81 gc:universal DN18_c0_g1:1222-980(-)
>DN18_c0_g1.p19 type:3prime_partial len:74 gc:universal DN18_c0_g1:1032-1250(+)
>DN18_c0_g1.p20 type:complete len:64 gc:universal DN18_c0_g1:235-426(+)
Annotation/Functional Analysis
TransDecoder.LongOrfs predicted coding regions
• Longest transcript
• Longest ORF
>DN53_c0_g1 len=3139 path=[0:0-2584 2:2585-3138]
>DN53_c0_g1 len=3254 path=[0:0-2584 1:2585-2699 2:2700-3253]
>DN53_c0_g3 len=2958 path=[0:0-555 1:556-2736 4:2737-2957]
>DN53_c0_g3 len=3068 path=[0:0-555 1:556-2736 3:2737-2846 4:2847-3067]
>DN53_c0_g3 len=698 path=[0:0-555 2:556-697]
Annotation/Functional Analysis
TransDecoder.LongOrfs predicted coding regions
• Questions
• are different _g isoforms really different genes
• Is the longest predicted transcript the best
• is the longest predicted ORF the best
• How good is TransDecoder at predicting the CDS?
• Method
• compare to protein library
• blastp of predicted protein
• blastx of predicted transcript
• use diamond, 1000 times faster than Blast
• use uniref50 condensed database (clustered at 50% identity)
• best should have longest match to known protein
Annotation/Functional Analysis
Transdecoder predicted coding regions
• diamond blastx
• make sure to set --threads, or the maximum number of threads will be used
diamond v0.9.14.115 | by Benjamin Buchfink <[email protected]>
Licensed under the GNU AGPL <https://www.gnu.org/licenses/agpl.txt>
Check http://github.com/bbuchfink/diamond for updates.
Syntax: diamond COMMAND [OPTIONS]
Commands:
makedb Build DIAMOND database from a FASTA file
blastp Align amino acid query sequences against a protein reference database
blastx Align DNA query sequences against a protein reference database
view View DIAMOND alignment archive (DAA) formatted file
help Produce help message
version Display version information
getseq Retrieve sequences from a DIAMOND database file
dbinfo Print information about a DIAMOND database file
General options:
--threads (-p) number of CPU threads
--db (-d) database file
Annotation/Functional Analysis
Transdecoder predicted coding regions
--outfmt (-f) output format
0 = BLAST pairwise
5 = BLAST XML
6 = BLAST tabular, Value 6 may be followed by a space-separated list of these keywords:
qseqid means Query Seq - id
qlen means Query sequence length
sseqid means Subject Seq - id
sallseqid means All subject Seq - id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive - scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive - scoring matches
qframe means Query frame
btop means Blast traceback operations(BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
qtitle means Query title
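Using the options listed above, a blastx command line with an explicit thread count and a custom tabular format could be assembled as below. The snippet only builds and prints the command; it does not run DIAMOND. The file names, the database name and the particular field selection are assumptions for illustration.

```python
import shlex

def diamond_blastx_command(query, db, out, threads=4):
    """Build a diamond blastx argument list with tabular (format 6) output."""
    fields = ["qseqid", "qlen", "qstart", "qend", "sseqid", "slen",
              "sstart", "send", "length", "evalue", "bitscore", "stitle"]
    return ["diamond", "blastx",
            "--query", query,
            "--db", db,
            "--out", out,
            "--threads", str(threads),   # set explicitly so not all cores are taken
            "--outfmt", "6", *fields]

# Hypothetical file and database names.
cmd = diamond_blastx_command("isoform.fa", "uniref50", "isoform_vs_uniref50.tsv", threads=8)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where diamond and the database are installed.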
Annotation/Functional Analysis
Transdecoder predicted coding regions
• Diamond/blastx search vs uniref50
DN8_c0_g1.p5 675 1-674 A0A1U7Z5Q7 679 1-679 700 2.0e-129 470.3 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g1.p5 675 1-407 A0A5B6ZQX7 449 1-404 409 4.2e-127 462.6 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g1.p5 675 1-666 F6H4F3 672 1-666 674 4.1e-122 446.0 F6H4F3 Uncharacterized protein n=87 Tax=Mesangiospermae TaxID=1437183 RepID=F6H4F3_VITVI
DN8_c0_g1.p16 675 1-674 A0A1U7Z5Q7 679 1-679 700 2.0e-129 470.3 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g1.p16 675 1-407 A0A5B6ZQX7 449 1-404 409 4.2e-127 462.6 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g1.p16 675 1-666 F6H4F3 672 1-666 674 4.1e-122 446.0 F6H4F3 Uncharacterized protein n=87 Tax=Mesangiospermae TaxID=1437183 RepID=F6H4F3_VITVI
DN8_c0_g1.p27 675 1-674 A0A1U7Z5Q7 679 1-679 700 2.0e-129 470.3 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g1.p27 675 1-407 A0A5B6ZQX7 449 1-404 409 4.2e-127 462.6 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g1.p27 675 1-666 F6H4F3 672 1-666 674 4.1e-122 446.0 F6H4F3 Uncharacterized protein n=87 Tax=Mesangiospermae TaxID=1437183 RepID=F6H4F3_VITVI
DN8_c0_g2.p1 676 1-407 A0A5B6ZQX7 449 1-406 411 9.1e-130 471.5 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g2.p1 676 1-668 A0A1U7Z5Q7 679 1-673 679 6.1e-126 458.8 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g2.p1 676 1-584 A0A4S4DQK1 735 1-645 650 2.8e-123 449.9 A0A4S4DQK1 Uncharacterized protein n=33 Tax=Mesangiospermae TaxID=1437183 RepID=A0A4S4DQK1_CAMSI
DN8_c0_g2.p13 676 1-407 A0A5B6ZQX7 449 1-406 411 9.1e-130 471.5 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g2.p13 676 1-668 A0A1U7Z5Q7 679 1-673 679 6.1e-126 458.8 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g2.p13 676 1-584 A0A4S4DQK1 735 1-645 650 2.8e-123 449.9 A0A4S4DQK1 Uncharacterized protein n=33 Tax=Mesangiospermae TaxID=1437183 RepID=A0A4S4DQK1_CAMSI
DN8_c0_g2.p13 676 1-407 A0A5B6ZQX7 449 1-406 411 9.1e-130 471.5 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g2.p13 676 1-668 A0A1U7Z5Q7 679 1-673 679 6.1e-126 458.8 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g2.p13 676 1-584 A0A4S4DQK1 735 1-645 650 2.8e-123 449.9 A0A4S4DQK1 Uncharacterized protein n=33 Tax=Mesangiospermae TaxID=1437183 RepID=A0A4S4DQK1_CAMSI
DN8_c0_g2.p21 644 9-375 A0A5B6ZQX7 449 40-406 369 2.7e-115 423.3 A0A5B6ZQX7 Putative filament-like plant protein (Fragment) n=2 Tax=Pentapetalae TaxID=1437201
DN8_c0_g2.p21 644 2-636 A0A1U7Z5Q7 679 37-673 643 5.6e-113 415.6 A0A1U7Z5Q7 filament-like plant protein isoform X1 n=12 Tax=Magnoliopsida TaxID=3398 RepID=A0A1U7Z5Q7_NELNU
DN8_c0_g2.p21 644 11-552 A0A4S4DQK1 735 42-645 606 5.8e-110 405.6 A0A4S4DQK1 Uncharacterized protein n=33 Tax=Mesangiospermae TaxID=1437183 RepID=A0A4S4DQK1_CAMSI
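A sketch of picking one hit per query by the "longest match to known protein" rule. It parses the column layout displayed above (query, query length, query range, subject, subject length, subject range, alignment length, E-value, bit score, title), which is an assumption about how the slide was formatted rather than raw DIAMOND output. Ties on alignment length are broken by bit score, which is my own choice.

```python
def parse_hit(line):
    """Parse one hit line in the layout displayed on the slide."""
    parts = line.split(maxsplit=9)
    qstart, qend = (int(x) for x in parts[2].split("-"))
    sstart, send = (int(x) for x in parts[5].split("-"))
    return {
        "query": parts[0], "qlen": int(parts[1]), "qstart": qstart, "qend": qend,
        "subject": parts[3], "slen": int(parts[4]), "sstart": sstart, "send": send,
        "align_len": int(parts[6]), "evalue": float(parts[7]),
        "bitscore": float(parts[8]),
        "title": parts[9] if len(parts) > 9 else "",
    }

def subject_coverage(hit):
    """Fraction of the subject protein covered by the alignment."""
    return (hit["send"] - hit["sstart"] + 1) / hit["slen"]

def best_hits(lines):
    """Keep, per query, the hit with the longest alignment (then highest bit score)."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        hit = parse_hit(line)
        current = best.get(hit["query"])
        key = (hit["align_len"], hit["bitscore"])
        if current is None or key > (current["align_len"], current["bitscore"]):
            best[hit["query"]] = hit
    return best

lines = [
    "DN8_c0_g1.p5 675 1-674 A0A1U7Z5Q7 679 1-679 700 2.0e-129 470.3 A0A1U7Z5Q7 filament-like plant protein isoform X1",
    "DN8_c0_g1.p5 675 1-407 A0A5B6ZQX7 449 1-404 409 4.2e-127 462.6 A0A5B6ZQX7 Putative filament-like plant protein (Fragment)",
    "DN8_c0_g1.p5 675 1-666 F6H4F3 672 1-666 674 4.1e-122 446.0 F6H4F3 Uncharacterized protein",
]
top = best_hits(lines)["DN8_c0_g1.p5"]
print(top["subject"], top["align_len"], round(subject_coverage(top), 2))
```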
Annotation/Functional Analysis
               10        20        30        40        50        60        70        80
DN8_c0 MENRSWLWRKKSSEKSPGETESSGSVSSHSERFSDDQEASRGPPNHSQSPEISSNLAGSKVQDTVKSLTERLSAALSNIS
       :: ::::::.::::::::::::::::::   ::::::::::. :::..:::.:::::::..:::::::::.:::::::::
DN8_c0 MERRSWLWRRKSSEKSPGETESSGSVSSG--RFSDDQEASRASPNHTRSPEVSSNLAGSEAQDTVKSLTEKLSAALSNIS
Annotation/Functional Analysis
Transdecoder predicted coding regions
• DN8_c0_g1 vs DN8_c0_g2
• match to same set of proteins
• similar but not identical assembly block structure
• differences are not the block differences expected from mis-assembly
• align well at RNA and protein level, with consistent small differences
• about 16% different at the amino acid level
• more likely to be duplicated gene than alleles
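The "about 16% different" figure can be checked on the 80 aligned columns of the two DN8_c0 sequences shown in the alignment earlier. This is a minimal sketch that counts a gap opposite a residue as a difference, which is one of several possible conventions; the function name is hypothetical.

```python
def percent_different(a, b):
    """Percentage of aligned columns where the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    differing = sum(1 for x, y in columns if x != y)
    return 100.0 * differing / len(columns)

# The two aligned DN8_c0 sequences (first 80 columns) from the slide.
seq_a = "MENRSWLWRKKSSEKSPGETESSGSVSSHSERFSDDQEASRGPPNHSQSPEISSNLAGSKVQDTVKSLTERLSAALSNIS"
seq_b = "MERRSWLWRRKSSEKSPGETESSGSVSSG--RFSDDQEASRASPNHTRSPEVSSNLAGSEAQDTVKSLTEKLSAALSNIS"
print(percent_different(seq_a, seq_b))
```

Thirteen of the 80 columns differ in this window, consistent with the estimate in the bullets above.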
Annotation/Functional Analysis
Transdecoder predicted coding regions
query length source transcript length begin end subject title length begin end align length E score taxonomy
DN78_c0_g1.p1 193 2-580 (+) 891 42 192 A0A3S3R0T1 Lipoyl synthase 151 1 151 151 1.10E-77 123.6 Cinnamomum micranthum
DN78_c0_g1.p2 136 547-140 (-) no
DN78_c0_g1.p3 90 267-1 (-) no
DN78_c0_g1.p4 67 889-689 (-) no
DN78_c0_g1.p5 60 1-180 (+) no
DN18_c0_g1.p1 327 130-1110 (+) 1345 9 326 A0A498HHW2 Cysteine synthase 1103 786 1103 318 1.50E-162 579.3 Magnoliopsida
DN18_c0_g1.p2 109 749-1075 (+) no
DN18_c0_g1.p3 92 395-120 (-) 1 74 A0A0A9NZJ6 Uncharacterized 98 25 98 74 2.00E-11 75.5 Arundo donax
DN18_c0_g1.p4 75 1132-1353 (+) no
DN18_c0_g1.p5 59 1166-1342 (+) no
DN18_c0_g1.p6 302 642-1547 (+) 1791 27 301 A0A498HHW2 Cysteine synthase 1103 829 1103 275 3.30E-140 505 Magnoliopsida
DN18_c0_g1.p7 109 1186-1512 (+) no
DN18_c0_g1.p8 81 832-590 (-) no
DN18_c0_g1.p9 75 1569-1790 (+) no
DN18_c0_g1.p10 59 1603-1779 (+) no
DN18_c0_g1.p11 327 235-1215 (+) 399 9 323 F6HTU8 Cysteine synthase 701 89 403 315 4.10E-160 571.2 Mesangiospermae
DN18_c0_g1.p12 198 1185-592 (-) no
DN18_c0_g1.p13 68 1118-915 (-) 1 67 I3STT7 Uncharacterized 82 16 82 67 3.50E-13 80.9 Lotus japonicus
DN18_c0_g1.p14 66 500-303 (-) 3 63 A0A448Z883 Uncharacterized 354 235 295 61 1.20E-07 62.4 Pseudo-nitzschia multistriata
DN18_c0_g1.p15 58 854-1027 (+) no
DN18_c0_g1.p16 126 1-378 (+) 1252 1 125 A0A498HHW2 Cysteine synthase 1103 965 1089 125 2.90E-58 231.5 Magnoliopsida
DN18_c0_g1.p17 109 59-385 (+) no
DN18_c0_g1.p18 81 1222-980 (-) no
DN18_c0_g1.p19 74 1032-1250 (+) 27 73 B7FKU7 Cysteine synthase 325 51 97 47 5.50E-17 93.6 Pentapetalae
DN18_c0_g1.p20 64 235-426 (+) 4 52 A0A2I4HII2 Cysteine synthase 81 3 51 49 9.20E-16 89.4 Cellular organism
DN18_c0_g1.p21 327 229-1209 (+) 1453 9 326 A0A498HHW2 Cysteine synthase 1103 786 1103 318 1.50E-162 579.3 Magnoliopsida
DN18_c0_g1.p22 126 494-117 (-) 1 74 A0A0A9NZJ6 Uncharacterized 98 25 98 74 2.70E-11 75.5 Arundo donax
DN18_c0_g1.p23 109 848-1174 (+) no
DN18_c0_g1.p24 75 1231-1452 (+) no
Annotation/Functional Analysis
Transdecoder predicted coding regions
query length source transcript length begin end subject title length begin end align length E score taxonomy
DN53_c0_g1.p1 769 239-2545 (+) 3139 62 756 A0A443PRB7 SWIM-type 783 6 781 778 1.40E-283 982.6 Cinnamomum micranthum
DN53_c0_g1.p2 80 2514-2275 (-) no
DN53_c0_g1.p3 70 1648-1439 (-) no
DN53_c0_g1.p4 65 2333-2139 (-) no
DN53_c0_g1.p5 64 3-194 (+) no
DN53_c0_g1.p6 63 2752-2940 (+) no
DN53_c0_g1.p7 54 2012-1851 (-) no
DN53_c0_g1.p8 50 1785-1934 (+) no
DN53_c0_g1.p9 50 2531-2382 (-) no
DN53_c0_g1.p10 769 239-2545 (+) 3254 62 756 A0A443PRB7 783 783 6 781 778 1.40E-283 982.6 Cinnamomum micranthum
DN53_c0_g1.p11 80 2514-2275 (-) no
DN53_c0_g1.p12 70 1648-1439 (-) no
DN53_c0_g1.p13 65 2333-2139 (-) no
DN53_c0_g1.p14 64 3-194 (+) no
DN53_c0_g1.p15 63 2867-3055 (+) no
DN53_c0_g1.p16 54 2012-1851 (-) no
DN53_c0_g1.p17 50 1785-1934 (+) no
DN53_c0_g1.p18 50 2531-2382 (-) no
DN53_c0_g3.p1 828 205-2688 (+) 2958 58 827 A0A443PRB7 SWIM-type 783 1 783 783 0.00E+00 1353.2 Cinnamomum micranthum
DN53_c0_g3.p2 187 1031-471 (-) no
DN53_c0_g3.p3 97 461-171 (-) 1 87 A0A0A9E9H7 Uncharacterized 128 24 110 87 5.80E-06 57.4 Arundo donax
DN53_c0_g3.p4 81 2508-2266 (-) no
DN53_c0_g3.p5 69 2681-2887 (+) no
DN53_c0_g3.p6 61 764-946 (+) no
DN53_c0_g3.p7 58 2801-2628 (-) no
DN53_c0_g3.p8 55 2-166 (+) no
DN53_c0_g3.p9 51 2309-2157 (-) no
DN53_c0_g3.p10 828 205-2688 (+) 3068 58 827 A0A443PRB7 SWIM-type 783 1 783 783 0.00E+00 1353.2 Cinnamomum micranthum
DN53_c0_g3.p11 187 1031-471 (-) no
DN53_c0_g3.p12 97 461-171 (-) 1 87 A0A0A9E9H7 Uncharacterized 128 24 110 87 5.80E-06 57.4 Arundo donax
DN53_c0_g3.p13 81 2508-2266 (-) no
DN53_c0_g3.p14 61 764-946 (+) no
DN53_c0_g3.p15 55 2-166 (+) no
DN53_c0_g3.p16 51 2309-2157 (-) no
DN53_c0_g3.p17 50 2848-2997 (+) no
Annotation/Functional Analysis
Transdecoder predicted coding regions
DN8_c0_g1.p5 675 1 674 A0A1U7Z5Q7 679 1 679 700 2.00E-129 470.3 filament-like Magnoliopsida
DN8_c0_g1 3405 546 2567 A0A1U7Z5Q7 679 1 679 700 6.80E-130 472.6 filament-like Magnoliopsida
DN8_c0_g2.p1 676 1 407 A0A5B6ZQX7 449 1 406 411 9.10E-130 471.5 filament-like Pentapetalae
DN8_c0_g2 3043 719 1939 A0A5B6ZQX7 449 1 406 411 8.00E-130 472.2 filament-like Pentapetalae
DN78_c0_g1.p1 193 42 192 A0A3S3R0T1 151 1 151 151 1.10E-77 296.6 Lipoyl synthase Cinnamomum
DN78_c0_g1 891 125 577 A0A3S3R0T1 151 1 151 151 1.00E-77 297.4 Lipoyl synthase Cinnamomum
DN18_c0_g1.p1 327 9 326 A0A498HHW2 1103 786 1103 318 1.50E-162 579.3 Cysteine synthase Magnoliopsida
DN18_c0_g1 1354 109 1107 A0A498HHW2 1103 772 1103 333 4.10E-163 581.6 Cysteine synthase Magnoliopsida
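The blastp rows (predicted proteins, IDs ending in .p) and the blastx rows (whole transcripts) can be compared by converting the protein match range into transcript coordinates, using the ORF position that TransDecoder reports. A minimal sketch with a hypothetical function name; coordinates are 1-based and inclusive, as in the tables.

```python
def protein_to_transcript(orf_start, orf_end, strand, aa_begin, aa_end):
    """Map a 1-based amino-acid range within an ORF onto transcript coordinates."""
    if strand == "+":
        first = min(orf_start, orf_end)
        return first + 3 * (aa_begin - 1), first + 3 * aa_end - 1
    # Minus strand: the ORF starts at the higher coordinate and runs downward.
    high = max(orf_start, orf_end)
    return high - 3 * aa_end + 1, high - 3 * (aa_begin - 1)

# ORF positions are taken from the TransDecoder headers shown earlier.
print(protein_to_transcript(719, 2746, "+", 1, 407))   # DN8_c0_g2.p1
print(protein_to_transcript(2, 580, "+", 42, 192))     # DN78_c0_g1.p1
print(protein_to_transcript(130, 1110, "+", 9, 326))   # DN18_c0_g1.p1
```

For DN8_c0_g2 and DN78_c0_g1 the converted ranges coincide with the blastx ranges in the table; for DN18_c0_g1 the end coordinate agrees while the blastx alignment starts further upstream.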
Annotation/Functional Analysis
TransDecoder.Predict predicted coding regions
• 74 predicted isoforms
• ranked by LL
>TRINITY_DN0_c0_g2_i14.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i14.p1 ORF type:complete len:915 (+),score=220.37 TRINITY_DN0_c0_g2_i14:195-2939(+)
>TRINITY_DN0_c0_g2_i29.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i29.p1 ORF type:complete len:908 (+),score=219.50 TRINITY_DN0_c0_g2_i29:195-2918(+)
>TRINITY_DN0_c0_g2_i25.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i25.p1 ORF type:complete len:958 (+),score=214.17 TRINITY_DN0_c0_g2_i25:148-2874(+)
>TRINITY_DN0_c0_g2_i57.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i57.p1 ORF type:complete len:951 (+),score=213.30 TRINITY_DN0_c0_g2_i57:148-2853(+)
>TRINITY_DN0_c0_g2_i16.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i16.p1 ORF type:complete len:879 (+),score=205.19 TRINITY_DN0_c0_g2_i16:148-2637(+)
>TRINITY_DN0_c0_g2_i65.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i65.p1 ORF type:complete len:872 (+),score=204.32 TRINITY_DN0_c0_g2_i65:148-2616(+)
>TRINITY_DN0_c0_g2_i7.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i7.p1 ORF type:complete len:821 (+),score=199.81 TRINITY_DN0_c0_g2_i7:590-3052(+)
>TRINITY_DN0_c0_g2_i41.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i41.p1 ORF type:complete len:833 (+),score=196.98 TRINITY_DN0_c0_g2_i41:148-2499(+)
>TRINITY_DN0_c0_g2_i18.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i18.p1 ORF type:complete len:826 (+),score=196.11 TRINITY_DN0_c0_g2_i18:148-2478(+)
>TRINITY_DN0_c0_g2_i30.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i30.p1 ORF type:complete len:777 (+),score=190.38 TRINITY_DN0_c0_g2_i30:195-2525(+)
>TRINITY_DN0_c0_g2_i48.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i48.p1 ORF type:complete len:770 (+),score=189.50 TRINITY_DN0_c0_g2_i48:195-2504(+)
>TRINITY_DN0_c0_g2_i26.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i26.p1 ORF type:complete len:652 (+),score=147.66 TRINITY_DN0_c0_g2_i26:148-1956(+)
>TRINITY_DN0_c0_g2_i74.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i74.p1 ORF type:complete len:645 (+),score=146.79 TRINITY_DN0_c0_g2_i74:148-1935(+)
>TRINITY_DN0_c0_g2_i68.p1 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i68.p1 ORF type:complete len:295 (+),score=72.21 TRINITY_DN0_c0_g2_i68:195-1079(+)
>TRINITY_DN0_c0_g2_i68.p2 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i68.p2 ORF type:3partial len:215 (+),score=52.78 TRINITY_DN0_c0_g2_i68:1432-2073(+)
>TRINITY_DN0_c0_g2_i22.p2 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i22.p2 ORF type:complete len:116 (+),score=11.57 TRINITY_DN0_c0_g2_i22:3672-4019(+)
>TRINITY_DN0_c0_g2_i7.p5 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i7.p5 ORF type:complete len:74 (+),score=2.15 TRINITY_DN0_c0_g2_i7:179-400(+)
>TRINITY_DN0_c0_g2_i18.p4 TRINITY_DN0_c0_g2~~TRINITY_DN0_c0_g2_i18.p4 ORF type:complete len:66 (+),score=1.99 TRINITY_DN0_c0_g2_i18:3933-4130(+)
score=220.37 TRINITY_DN0_c0_g2_i14.p1 915 1 913 A2Q5Q7 901 1 901 940 1.30E-285 989.6 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=219.50 TRINITY_DN0_c0_g2_i29.p1 908 1 906 A2Q5Q7 901 1 901 940 1.10E-279 969.9 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=214.17 TRINITY_DN0_c0_g2_i25.p1 909 1 907 A2Q5Q7 901 1 901 927 4.80E-251 874.8 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=213.30 TRINITY_DN0_c0_g2_i57.p1 902 1 900 A2Q5Q7 901 1 901 927 3.90E-245 855.1 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=205.19 TRINITY_DN0_c0_g2_i16.p1 830 1 826 A2Q5Q7 901 1 820 846 5.80E-203 714.9 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=204.32 TRINITY_DN0_c0_g2_i65.p1 823 1 819 A2Q5Q7 901 1 820 846 3.60E-197 695.7 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=199.81 TRINITY_DN0_c0_g2_i7.p1 821 1 819 A2Q5Q7 901 95 901 846 3.70E-234 818.5 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=196.98 TRINITY_DN0_c0_g2_i41.p1 784 1 781 A0A3B6I0Y8 763 1 736 800 3.90E-185 655.6 Not3 domain-containing protein
score=196.11 TRINITY_DN0_c0_g2_i18.p1 777 1 774 A0A3B6I0Y8 763 1 736 800 2.40E-179 636.3 Not3 domain-containing protein
score=190.38 TRINITY_DN0_c0_g2_i30.p1 777 1 775 A2Q5Q7 901 1 901 940 1.50E-213 750 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=189.50 TRINITY_DN0_c0_g2_i48.p1 770 1 768 A2Q5Q7 901 1 901 940 1.20E-207 730.3 Not CCR4-Not complex component, N-terminal; tRNA-binding arm
score=147.66 TRINITY_DN0_c0_g2_i26.p1 603 1 590 A0A0D2U011 623 1 558 597 9.70E-184 650.6 Not3 domain-containing protein
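To see how far the TransDecoder score ranking agrees with the protein-match ranking, the two orderings can be put side by side. The values below are copied from the table above (isoform ORF, TransDecoder score, bit score of the match); the comparison itself is only an illustrative sketch.

```python
# (isoform ORF, TransDecoder score, bit score of the match), from the table above.
rows = [
    ("i14.p1", 220.37, 989.6), ("i29.p1", 219.50, 969.9),
    ("i25.p1", 214.17, 874.8), ("i57.p1", 213.30, 855.1),
    ("i16.p1", 205.19, 714.9), ("i65.p1", 204.32, 695.7),
    ("i7.p1", 199.81, 818.5), ("i41.p1", 196.98, 655.6),
    ("i18.p1", 196.11, 636.3), ("i30.p1", 190.38, 750.0),
    ("i48.p1", 189.50, 730.3), ("i26.p1", 147.66, 650.6),
]

by_score = [name for name, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
by_bits = [name for name, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)]

print("ORF      score_rank  bitscore_rank")
for name in by_score:
    print(f"{name:8s} {by_score.index(name) + 1:10d} {by_bits.index(name) + 1:14d}")
```

The top four ORFs rank the same either way, while lower down the two orderings diverge (for example i7.p1 and i30.p1 rank higher by bit score than by TransDecoder score).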
Annotation/Functional Analysis
TransDecoder predicted coding regions
Query 1 MGASRKLQGEIDRVLKKVQEGVDVFDSIWNKVYDTDNANQKEKFEADLKKEIKKLQRYRD 60
MGASRKLQGEIDRVLKKVQEGV+VFDSIWNKVYDTDNANQKEKFEADLKKEIKKLQRYRD
Sbjct 1 MGASRKLQGEIDRVLKKVQEGVEVFDSIWNKVYDTDNANQKEKFEADLKKEIKKLQRYRD 60
Query 61 QIKTWIQSSEIKDKKVSASYEQALLESRKQIEREMERFKVCEKETKTKAFSKEGLVQQPK 120
QIKTWIQSSEIKDKKVSASYEQAL+++RK IEREMERFK+CEKETKTKAFSKEGL QQPK
Sbjct 61 QIKTWIQSSEIKDKKVSASYEQALVDARKLIEREMERFKICEKETKTKAFSKEGLGQQPK 120
Query 121 TDPKEKAKSETRDWLNNVVGELESQIDNFEAELEGLFVKKGKTRPPRLTHLETSIVRHKA 180
TDP+EKAKSETRDWLNNVVGELESQIDNFEAELEGL VKKGK RP RLTHLETSI RHKA
Sbjct 121 TDPREKAKSETRDWLNNVVGELESQIDNFEAELEGLTVKKGKNRPSRLTHLETSITRHKA 180
Query 181 HIMKLELILRLLDNDELSPDQVNDVKDFLDDYVERNQEQFDEFSDVDELYSSLPLDKVES 240
HI K EL+LRLLDNDELSP++VNDVKDFLDDYVERNQ+ FDEF DVDELYSSLPLDKV++
Sbjct 181 HIKKCELVLRLLDNDELSPEEVNDVKDFLDDYVERNQDDFDEFDDVDELYSSLPLDKVDT 240
Query 241 LEDLVAIGTPALVVKGVS--PISTGSAV---LSLKTSVATSPTHSSA 282
LEDLV I T V K +S P+ G + LSLKT +A S + S++
Sbjct 241 LEDLVTIPTSVAVAKTISSLPLDEGKTLEDLVTIPTGLAKVAPGLSLKTPLAASASQSAS 300
Query 283 TLPSTAQQVSSVQDQAEETASQDSNSDSAPRTPPSKSGMMGSSVSSVSSAVGSIPTGSNT 342
S +QA+ETASQDSNSD +TPP KSG + SS S+ PTG++
Sbjct 301 ---SQTSEQADETASQDSNSDIVAKTPPPKSGGISSSTST---PTGNH- 342
Query 343 TVATPAR-NLAG----GSTASAILSGPGYIRGVMENAPAAVSSSLANLSSSVQEDDVSSF 397
ATPA N++G + A+AILG +R ++ENA + N S+S +E+++++F
Sbjct 343 --ATPASVNVSGLNLSSAPAAAILPGSNSVRNILENA---IVNQSTSPKEEEINNF 393
Query 398 PGRRSSPALPEIGIGKGIGRGSVVAGLSSPVSGVSLNLTSGNGLPSNGALGTTPVVSDMA 457
P RR SP+L + + + GR S+ S + S+ L SGN + S GALG P S++
Sbjct 394 PTRRPSPSLSDAALVR--GRNSL---SNQATASIPLGSGNTVSSIGALGVVPSASEIT 446
Query 458 KRNLLGADERIGNG--AQPLVSPLSNRMLLQQVSKTMDGIVSSDSNNIGE-GVTAGRTFS 514
KRN+LGAD+R+G+ QPLVSPLSNR++L Q+ K DG S DS+ + E +GR FS
Sbjct 447 KRNILGADDRLGSSGMVQPLVSPLSNRLILPQIGKANDGAASVDSSIVNEAAAVSGRVFS 506
Query 515 PSAVSGVQWRPQSPSSFQNQNEMGQFRGRTEIAPDQREKFLQRLQQVQQQGHSNLLGVSH 574
PS V G+QWRP SP FQNQN+ GQ RGRTEIAPDQREKFLQ+ QQVQQQG S LL +
Sbjct 507 PSVVPGMQWRPGSP--FQNQNDAGQLRGRTEIAPDQREKFLQKFQQVQQQGPSTLLNMPS 564
Query 575 LPGANHKQFPTQ---QQFNSQSSSLSPQVGLGLGVQSSVGLTAVTSSSLQQQSAIHQ 628
L G NHKQF +Q QQFNSQ SS+S Q +GLG QS L ++S SLQQ +++H
Sbjct 565 LVGGNHKQFSSQQQSPLLQQFNSQGSSVSSQSSMGLGAQSP-SLGGISSVSLQQLNSVHS 623
Query 629 QSAQHALMPAGPRDTDAAQVKIEDQQQQHNSSDDVNTELATNPELNKILMNEDDLKTSYM 688
S QH +D D K E+ QQ N D+ TE ++ + K L EDDLK++Y
Sbjct 624 PSGQHPFAGVA-KDAD----KFEEHQQHQNFPDESTTESTSSTGIGKNLTVEDDLKSAYA 678
Query 689 ----AGGTGSSKDATQVPRDTDLSPRQPLPFNQSSADLGVIGRRSVPDLGAIGDNLSQST 744
AG + S +A Q RD DLSP QPL NQS+ +LGVIGRR+ +LGAIGD+ S+
Sbjct 679 LDSPAGLSASLPEAAQTFRDIDLSPGQPLQSNQSTGNLGVIGRRNGVELGAIGDSFGASS 738
Query 745 VNNGLMQERLYSLQMLDAAYHRLPQSKDSERAKNYTPRHPTKTPASFPQVQAPIVDNPAF 804
VN+G ++++LY+LQML+AA+ R+PQ +DSER + YTPRHP TP+S+PQVQAPIV+NPAF
Sbjct 739 VNSGGVRDQLYNLQMLEAAHFRMPQPRDSERPRTYTPRHPAITPSSYPQVQAPIVNNPAF 798
Query 805 WERLSLDSVGTDTLFFAFYYQQNTYQQYLAARELKKQSWRYHRKYSTWFQRHEEPKVTTD 864
WERL L+ GTDTLFFAFYYQQNTYQQYLAA+ELKKQSWRYHRKY+TWFQRHEEPKV TD
Sbjct 799 WERLGLEPFGTDTLFFAFYYQQNTYQQYLAAKELKKQSWRYHRKYNTWFQRHEEPKVATD 858
Query 865 EYEQGTYVYFDFHIANDDLNHGWCQRIKTEFTFEYSYLEDELL 907
+YEQGTYVYFDFHIANDDL HGWCQRIK +FTFEY+YLEDEL+
Sbjct 859 DYEQGTYVYFDFHIANDDLQHGWCQRIKNDFTFEYNYLEDELV 901

Query 1 MGASRKLQGEIDRVLKKVQEGVDVFDSIWNKVYDTENANQKEKFEADLKKEIKKLQRYRD 60
MGASRKLQGEIDRVLKKVQEGV+VFDSIWNKVYDT+NANQKEKFEADLKKEIKKLQRYRD
Sbjct 1 MGASRKLQGEIDRVLKKVQEGVEVFDSIWNKVYDTDNANQKEKFEADLKKEIKKLQRYRD 60
Query 61 QIKTWIQSSEIKDKKVSASYEQALLDARKIIEREMERFKVCEKETKTKAFSKEGLGQQPK 120
QIKTWIQSSEIKDKKVSASYEQAL+DARK+IEREMERFK+CEKETKTKAFSKEGLGQQPK
Sbjct 61 QIKTWIQSSEIKDKKVSASYEQALVDARKLIEREMERFKICEKETKTKAFSKEGLGQQPK 120
Query 121 TDPKEKAKSETRDWLNNVVSELESQVDNFEAEIEGLSFKKGKTRPPRLTHLETSIVRHKA 180
TDP+EKAKSETRDWLNNVV ELESQ+DNFEAE+EGL+ KKGK RP RLTHLETSI RHKA
Sbjct 121 TDPREKAKSETRDWLNNVVGELESQIDNFEAELEGLTVKKGKNRPSRLTHLETSITRHKA 180
Query 181 HIMKLELILRLLDNDELSPDQVNDVRDFLEDYVERNQEQFDEFSDVDELYNTLPLDKVES 240
HI K EL+LRLLDNDELSP++VNDV+DFL+DYVERNQ+ FDEF DVDELY++LPLDKV++
Sbjct 181 HIKKCELVLRLLDNDELSPEEVNDVKDFLDDYVERNQDDFDEFDDVDELYSSLPLDKVDT 240
Query 241 LEDLVAIGPP-ALVKGVTSVP---AAGAVLGLKTSLATSATQLPATSP--STAQQGAS 292
LEDLV I A+ K ++S+P ++ + T LA A L +P ++A Q AS
Sbjct 241 LEDLVTIPTSVAVAKTISSLPLDEGKTLEDLVTIPTGLAKVAPGLSLKTPLAASASQSAS 300
Query 293 IQ--DQAEETASQDSNSDVILRTPPSKNGVMGSSVSSSTTAIGSATPAGSNIATAAGNIS 350
Q +QA+ETASQDSNSD++ +TPP K+G +SSST +TP G++ A+ N+S
Sbjct 301 SQTSEQADETASQDSNSDIVAKTPPPKSG----GISSST---STPTGNHATPASVNVS 351
Query 351 AHSLVGGPTASAIL--SSPVRGTMDNTTAAASQPPVNLPSSIKEDENATVPNRRPSPALA 408
+L P A+AIL S+ VR ++N VN +S KE+E P RRPSP+L+
Sbjct 352 GLNLSSAP-AAAILPGSNSVRNILENAI---VNQSTSPKEEEINNFPTRRPSPSLS 403
Query 409 DVGLAKAIGRGSAVGGMSSQ-LSGISLSSGNGIPSDAALGGGPTVSDIAKHNILGADERI 467
D L + GR S +S+Q + I L SGN + S ALG P+ S+I K NILGAD+R+
Sbjct 404 DAALVR--GRNS----LSNQATASIPLGSGNTVSSIGALGVVPSASEITKRNILGADDRL 457
Query 468 G-NGSLQPLVSPLSNRMLLQPASRASDGTVSTESSNVGDSTVIGGRVFSPS-VPGVQWKP 525
G +G +QPLVSPLSNR++L +A+DG S +SS V ++ + GRVFSPS VPG+QW+P
Sbjct 458 GSSGMVQPLVSPLSNRLILPQIGKANDGAASVDSSIVNEAAAVSGRVFSPSVVPGMQWRP 517
Query 526 HNTGSFPNTNEMGQFRGRTEIAPDQREKFLQRLQQV-QQGHSTLLGVPHLAGANHKQFAT 584
+ F N N+ GQ RGRTEIAPDQREKFLQ+ QQV QQG STLL +P L G NHKQF++
Sbjct 518 GS--PFQNQNDAGQLRGRTEIAPDQREKFLQKFQQVQQQGPSTLLNMPSLVGGNHKQFSS 575
Query 585 QPQSSLLQQFNSQSSPVSPQVGLGPGVQ--SLAGATATSSSLQITMHQQSGQHALLSVGP 642
Q QS LLQQFNSQ S VS Q +G G Q SL G ++ S ++H SGQH V
Sbjct 576 QQQSPLLQQFNSQGSSVSSQSSMGLGAQSPSLGGISSVSLQQLNSVHSPSGQHPFAGVA- 634
Query 643 KDTDAAHVKVEDQQQHQNPSDDLKTEPATNSGLSKNLMNEDDLKFSYAADTPSGGSGPLT 702
KD D K E+ QQHQN D+ TE +++G+ KNL EDDLK +YA D+P+G S L
Sbjct 635 KDAD----KFEEHQQHQNFPDESTTESTSSTGIGKNLTVEDDLKSAYALDSPAGLSASLP 690
Query 703 EAVHEPRDVDLSPRQPLQSNQSSAGLGVIGRRSVSDLGAIGDNLSASTANSGAIQEQLYN 762
EA RD+DLSP QPLQSNQS+ LGVIGRR+ +LGAIGD+ AS+ NSG +++QLYN
Sbjct 691 EAAQTFRDIDLSPGQPLQSNQSTGNLGVIGRRNGVELGAIGDSFGASSVNSGGVRDQLYN 750
Query 763 LQMLEAAFCKLPQPKDSERTKHYIPRHPVKTPPSFPQVPAPVVDNPAFWERLSLEPLGTD 822
LQMLEAA ++PQP+DSER + Y PRHP TP S+PQV AP+V+NPAFWERL LEP GTD
Sbjct 751 LQMLEAAHFRMPQPRDSERPRTYTPRHPAITPSSYPQVQAPIVNNPAFWERLGLEPFGTD 810
Query 823 TLFFAFYYQPNTYQQYLAARELKKQSWRYHRKYSTWFQRHEEPKVTTDEYEQGTYVYFDF 882
TLFFAFYYQ NTYQQYLAA+ELKKQSWRYHRKY+TWFQRHEEPKV TD+YEQGTYVYFDF
Sbjct 811 TLFFAFYYQQNTYQQYLAAKELKKQSWRYHRKYNTWFQRHEEPKVATDDYEQGTYVYFDF 870
Query 883 HVANDDSQNGWCQRIKTEFTFEYLYLEDELV 913
H+ANDD Q+GWCQRIK +FTFEY YLEDELV
Sbjct 871 HIANDDLQHGWCQRIKNDFTFEYNYLEDELV 901