Uncertainty and Sensitivity Analysis of Dynamic Biological Systems under Data Sparseness.


Academic year: 2020


ABSTRACT    

BASNET,  LOCHAN.  Uncertainty  and  Sensitivity  Analysis  of  Dynamic  Biological  Systems  under   Data  Sparseness.  (Under  the  direction  of  Joel  J.  Ducoste)  

   

Systems biology integrates experimental observations with mathematical models to infer the behavior of biological organisms at the genetic and metabolic levels. Ordinary Differential Equation (ODE) based dynamic models are frequently used to describe these biological behaviors under dynamic and steady-state conditions. ODE-based models are flexible in structure but may involve a large number of parameters. Under data sparseness arising from the lack of experimental replicates under transient conditions, the estimated parameters have uncertainties associated with them. Parameter uncertainties propagate through the model into the outputs, which can result in misleading predictions and inferences. A comprehensive understanding of these uncertainties can, however, prevent such misleading deductions and provide valuable information about the behavior of the biological systems. Despite the wide use of dynamic models in biological studies, such comprehensive uncertainty studies for these models are lacking.

 

In this research, our aim was to perform a comprehensive uncertainty study of ODE-based dynamic models under data sparseness. Through this study, our goal was to quantify the uncertainty in the model parameters and outputs, assess the identifiability of these

output uncertainties under data sparseness. We studied two ODE-based dynamic models: the yeast synthetic network model in Saccharomyces cerevisiae and a preliminary iron deficiency response model in Arabidopsis thaliana. We assessed the accuracy of these methods in deducing biological information for both systems. As part of this assessment, we demonstrated the applicability of the Morris Screening method for sensitivity analysis of dynamic models. Through the bootstrap method of uncertainty analysis, we calculated confidence intervals that contained the true parameter values as well as the true expression of the genes in the yeast synthetic network model. Assessment of practical identifiability using profile likelihood showed that the data sparseness was high for both system models. Finally, the Morris Screening method was able to identify the important and the unimportant parameters based on their contribution to the output uncertainties for both models. Based on the results from Morris Screening, we were able to explain the regulatory effects of the SWI5 and ASH1 genes on the CBF1 gene in the yeast synthetic network model. For the iron transcription factors in the Arabidopsis thaliana gene regulatory network, we identified a clear influence of iron on the gene expression of COL4 and of constitutive transcription and degradation on the gene expression of bHLH104; however, similar deductions could not be made for the remaining transcription factors in the network. Sensitivity results identified the least important parameters in the yeast synthetic network model, which could be screened out to address data sparseness in future analyses.

© Copyright 2017 Lochan Basnet

Uncertainty  and  Sensitivity  Analysis  of  Dynamic  Biological  Systems   under  Data  Sparseness  

       

by   Lochan  Basnet    

     

A  thesis  submitted  to  the  Graduate   Faculty  of  North  Carolina  State  University  

in  partial  fulfillment  of  the   requirements  for  the  Degree  of  

Master  of  Science    

 

Civil  Engineering    

   

Raleigh,  North  Carolina    

2017    

   

APPROVED  BY:    

   

_________________________                _________________________
Cranos M. Williams, PhD                  Ranji Ranjithan, PhD

_________________________
Joel J. Ducoste, PhD

DEDICATION        

To  my  sister  Ritu  Basnet,  

Thank  you  for  the  support,  love  and  guidance  in  every  walk  of  my  life.    

     

To  my  mother  and  father,  

Thank you for your love, care, and all the opportunities you have provided, for which I feel blessed.

BIOGRAPHY  

Lochan Basnet was born in Pokhara, Nepal. He completed his undergraduate studies in Civil Engineering in Nepal in 2013. In the fall of 2015, he joined North Carolina State University and began his pursuit of a Master of Science in Civil Engineering under the guidance of Dr. Ducoste.

ACKNOWLEDGEMENTS  

I would first like to thank my advisor, Dr. Joel J. Ducoste, for providing me the opportunity to be a part of such wonderful research. I am grateful for his continuous support, encouragement, guidance, and valuable advice, without which this thesis would not have been possible. You are a great teacher and a wonderful person.

I also want to thank my thesis committee member and the principal investigator for the research, Dr. Cranos M. Williams, for the critical analysis of my work and the valuable insights that have helped me throughout my study. I am very grateful to the entire INSPIRE group for providing me the data and the model used in this study.

I am thankful to Dr. Ranji Ranjithan for taking the time to serve on my thesis committee. I am also thankful to all the faculty and staff of the Department of Civil Engineering. Many thanks to Renee Howard for all the timely information that helped me remain on the right track throughout the pursuit of my Master’s degree.

I would like to thank the National Science Foundation for the financial support.

TABLE  OF  CONTENTS  

LIST OF TABLES
LIST OF FIGURES
SUPPLEMENTAL TABLES
SUPPLEMENTAL FIGURES

CHAPTER 1
Introduction
1.1   Objectives

CHAPTER 2
Background
2.1   Dynamic system models and uncertainty
2.2   Uncertainty Analysis
2.2.1   Non-sampling based Uncertainty Analysis
2.2.2   Sampling based Uncertainty Analysis
2.3   Sensitivity Analysis
2.3.1   Local Sensitivity Analysis
2.3.2   Global Sensitivity Analysis
2.4   Preliminary Iron deficiency response model in Arabidopsis thaliana

CHAPTER 3
Methods
3.1   Model
3.1.1   Yeast Synthetic Network Model in Saccharomyces cerevisiae
3.1.2   Preliminary Iron Deficiency Response Model in Arabidopsis thaliana
3.2   Bootstrap Sampling for Data Generation
3.3   Parameter Estimation
3.4   Uncertainty Analysis
3.5   Identifiability Analysis
3.6   Sensitivity Analysis
3.6.1   Extension to Morris Method

CHAPTER 4
4.1   Introduction
4.2   Background
4.2.1   Dynamic system models and uncertainty
4.2.2   Uncertainty Analysis
4.2.3   Sensitivity Analysis
4.2.4   Iron deficiency response model in Arabidopsis thaliana
4.3   Methods
4.3.1   Model
4.3.2   Bootstrap Sampling for Data Generation
4.3.3   Parameter Estimation
4.3.4   Uncertainty Analysis
4.3.5   Identifiability Analysis
4.3.6   Morris Screening for Sensitivity Analysis
4.4   Results
4.4.1   Yeast Synthetic Model in Saccharomyces cerevisiae
4.4.2   Iron Deficiency Model in Arabidopsis thaliana
4.5   Discussion
4.6   Conclusion

CHAPTER 5
Conclusion and Future Work

BIBLIOGRAPHY
APPENDIX

LIST  OF  TABLES  

Table 2.1: Comparison of frequently used UA techniques
Table 2.3: Comparison of commonly used SA methods
Table 4.1: Parameters ranked by Savage scores under Fe- condition

LIST  OF  FIGURES  

Figure 1: Iron Deficiency Gene Regulatory Network in Arabidopsis thaliana
Figure 2: Yeast Synthetic Network in Saccharomyces cerevisiae
Figure 4.1: Iron Deficiency Gene Regulatory Network in Arabidopsis thaliana
Figure 4.2: Yeast Synthetic Network in Saccharomyces cerevisiae
Figure 4.3: Boxplots of few parameters in Yeast synthetic model
Figure 4.4: Profile Likelihood plots for assessing practical identifiability
Figure 4.5: Output Uncertainty for CBF1
Figure 4.6: Parameter distributions for COL4 parameters
Figure 4.7: Uncertainty for COL4

SUPPLEMENTAL  TABLES  

Table S1: Comparison of frequently used UA techniques
Table S2: Comparison of commonly used SA methods
Table S3: Coefficient of Variation for the 30 parameters in the Yeast synthetic network model in Saccharomyces cerevisiae
Table S4: Sensitivity rankings based on absolute mean elementary effects calculated by Morris Screening and the overall rank based on the Savage scores for Yeast synthetic network model in Saccharomyces cerevisiae
Table S5: Decay rates obtained from statistical analysis on the half-life experiment

SUPPLEMENTAL  FIGURES  

Figure S1: Comparison of model dynamics
Figure S2: Bootstrap size selection
Figure S3: Boxplots for the 30 parameters of Yeast synthetic network model in Saccharomyces cerevisiae
Figure S4: Uncertainty in the gene expression outputs for the 5 genes in Yeast synthetic network model
Figure S5: Time varying sensitivity diagrams for GAL4
Figure S6: Parameter uncertainties in the iron deficiency response model in Arabidopsis thaliana
Figure S7: Profile Likelihood plots for identifiability analysis
Figure S8: Total uncertainty in gene expressions under Fe- for genes in iron deficiency response model in Arabidopsis thaliana
Figure S9: 95% confidence intervals for gene expressions under Fe+ (dark grey region) and

CHAPTER  1  

Introduction  

Systems Biology [1], [2] aims to understand the behavior of interacting biological system components by gathering experimental observations and deciphering the implications of these interactions with mathematical reasoning [3]. These observations and this reasoning can be used to understand and develop gene regulatory networks and to perform metabolic pathway analyses that explain how organisms function relative to environmental stresses or under transgenic conditions [4]–[6].

At the genetic level, high-throughput experimental methods like microarray, qRT-PCR [7], and RNA-Seq [8], [9] are primarily employed to collect the information required to develop a gene regulatory network and to help build predictive mathematical relationships. However, many of these experimental techniques are costly and sometimes time consuming to repeat, resulting in insufficient data. This lack of data poses a barrier for researchers studying a biological system. The barrier is even greater when complex dynamic mathematical models are used to identify and understand the interactions occurring in the biological system.

 

Ordinary  Differential  Equations  (ODEs),  a  type  of  mathematical  model,  are  frequently  

employed  as  dynamic  models  to  track  important  inputs  and  their  influence  on  transient  

important to estimate to achieve the best model fit to the experimental data. The lack of sufficient experimental data, however, leads to a condition of sparseness in which the number of model parameters exceeds the available experimental data. The parameter values obtained under such sparse conditions can have uncertainties associated with them that lead to inaccurate model outputs [4], [14]–[17]. If these parameter uncertainties are ignored, erroneous conclusions can be drawn from the modeling results. The need to understand and quantify the uncertainties related to mathematical models has led to the introduction of uncertainty analysis and sensitivity analysis [18]–[20]. These approaches have been used extensively in other areas such as risk analysis [21]–[23].

 

Several studies that make use of uncertainty and sensitivity analyses in systems biology have been reported [24]–[28]. However, these studies leave one or more points unaddressed, which might cause confusion among researchers looking to incorporate uncertainty and sensitivity analyses in their own work. These points include the following:

•   The  quantification  of  output  uncertainty  due  to  variability  in  parameters  involves  

two  steps:  parameter  uncertainty  quantification  and  forward  propagation  of  

parameter  uncertainty.  Forward  propagation  refers  to  propagating  the  uncertainty  

of  the  parameters  by  applying  the  uncertainty  in  the  model  equations.  However,  in  

most  of  the  systems  biology  studies,  the  parameter  uncertainties  are  not  calculated  

differently under varied conditions, the assumed parameter uncertainties might not be accurate, resulting in an incorrect assessment of output uncertainties.

•   A comprehensive understanding of uncertainty includes quantifying the variation in the parameters and the outputs (i.e., uncertainty analysis/quantification), determining the importance of parameters in relation to the change in output (i.e., sensitivity analysis), and assessing whether the parameters are identifiable (i.e., identifiability analysis). These three analyses provide different information about the model and the system under study. The information obtained is often complementary, with each analysis contributing information the others do not, and together they give a better understanding of the model and of the behavior of the system under study. However, most systems biology studies focus on only one of these aspects of uncertainty, which potentially results in an incomplete understanding of the system.

•   Sensitivity analyses of dynamic models include studies demonstrating the application of variance-based methods [29], [30]. However, researchers have shown that these methods are computationally intensive and infeasible for complex systems. Screening methods, especially the method of Morris, are considered an efficient alternative. However, the practical relevance of Morris Screening, particularly to biological models, is somewhat unclear.

 

1.1   Objectives  

•   To affirm the need to conduct uncertainty analysis in the presence of data sparseness;

•   To perform the uncertainty analysis, the sensitivity analysis, and the identifiability analysis on a dynamical system model to illustrate a comprehensive understanding of mathematical model uncertainty;

•   To illustrate the use of Morris Screening in dynamic models;

•   To use the uncertainty information to develop a better understanding of iron effects on gene expression of transcription factors in the Arabidopsis thaliana gene regulatory network; and

•   To assess whether the parameter sensitivity analysis can be used to decrease the sparseness in a dynamical biosystem model.

A successful completion of these analyses will provide insight into the model behavior and the model parameters. This information in turn will help decipher the different biological interactions and properties of the biosystems under study.


CHAPTER  2  

Background  

2.1   Dynamic  system  models  and  uncertainty  

There are a significant number of different models for biological systems. Broadly, these models fall into two types: static models and dynamic models. Although static models have a successful history of application in systems biology studies [31], [32], dynamic models have greater potential because they describe the system's behavior over time, whereas static models provide information only at a particular state [33]. However, dynamic models have their own limitation: they typically include a large number of parameters that must be estimated from sparse data. Under such sparse conditions, large uncertainties are associated with the parameters as well as the predictions. These uncertainties could lead to incorrect inferences about the system. To ensure a better understanding of the system, the uncertainties in the dynamic models must be properly understood. However, studies that perform a comprehensive assessment of uncertainties in dynamic models are lacking.

 

2.2   Uncertainty  Analysis  

Parameter  uncertainty  analysis  is  important  in  the  evaluation  of  any  mathematical  model  

and  critically  important  when  these  models  are  evaluated  with  sparse  data  sets.  Uncertainty  

Analysis  (UA)  consists  of  two  components:  parameter  uncertainty  quantification  and  

the uncertainty would be to initially assess the parameter uncertainty and then propagate this uncertainty into the model to determine the model output uncertainty. Yet most of the uncertainty analyses performed in systems biology have used only forward propagation based on some form of random sampling [35]. The parameter uncertainty in these studies is either based on a priori knowledge or assumed. Such assessments are not accurate because system behavior and properties vary under different conditions. Several uncertainty analysis methods frequently used in systems biology studies as well as in other disciplines are listed in Table 2.1. The table summarizes the type of results obtained, the advantages, and the limitations of each method. In the following sections, these methods are discussed based on the findings from previous studies. Moreover, the advantages and applicability of the bootstrap method, the uncertainty analysis method used in this study, are highlighted.
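The forward-propagation step discussed above can be illustrated with a short sketch. It is not the implementation used in this study: the one-gene model dx/dt = k_s - k_d x, the names `k_s` and `k_d`, and the log-normal parameter distributions are assumptions chosen only to keep the example small. In practice the parameter distributions would come from the parameter uncertainty quantification step rather than being assumed.

```python
import math
import random

def expression(t, k_s, k_d, x0=0.0):
    """Closed-form solution of the hypothetical model dx/dt = k_s - k_d * x."""
    x_ss = k_s / k_d
    return x_ss + (x0 - x_ss) * math.exp(-k_d * t)

def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    rank = math.ceil(q / 100 * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, rank))]

def propagate(times, n_samples=2000, seed=0):
    """Return (2.5%, 50%, 97.5%) output percentiles at each time point."""
    rng = random.Random(seed)
    # Step 1 (assumed here): parameter uncertainty as log-normal distributions.
    draws = [(rng.lognormvariate(math.log(2.0), 0.2),
              rng.lognormvariate(math.log(0.5), 0.2))
             for _ in range(n_samples)]
    # Step 2: forward propagation of every parameter draw through the model.
    bands = []
    for t in times:
        outputs = sorted(expression(t, k_s, k_d) for k_s, k_d in draws)
        bands.append((percentile(outputs, 2.5),
                      percentile(outputs, 50),
                      percentile(outputs, 97.5)))
    return bands

times = [0.5, 1.0, 2.0, 5.0, 10.0]
bands = propagate(times)
for t, (low, median, high) in zip(times, bands):
    print(f"t={t:>4}: median {median:.2f}, 95% band [{low:.2f}, {high:.2f}]")
```

If the assumed distributions in step 1 are wrong, the resulting output band is wrong as well, which is the concern raised above about studies that rely on forward propagation alone.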

Table 2.1: Comparison of frequently used UA techniques

| Name | Result | Advantages | Limitations |
|---|---|---|---|
| Polynomial Chaos Method (PC) [36] | Output distribution | Efficient compared to MCS | Can handle limited number of parameters; inefficient for large number of parameters |
| Response Surface Method (RSM) [35] | Output distribution | Efficient compared to MCS | Requires correlation among selected parameters |
| Bounded-Error Method [37] | Parameter bounds | Only requires information about uncertain parameter bounds | Computationally inefficient |
| Markov Chain Monte Carlo Method [38] | Posterior distribution | Complete information about parameter uncertainty | Requires weak identifiability of parameters; inefficient |
| Generalized Likelihood Uncertainty Estimation [39] | Posterior distribution | Conceptually simple and easy to implement | Subjective posterior and Confidence Interval (CI); inefficient |
| Monte Carlo Sampling Method (MCS) | Output distribution | Straightforward implementation and accurate results | Requires knowledge of priors; inefficient |
| Fisher Information Matrix | Confidence intervals | Computationally efficient | Unreliable upper confidence bounds under non-linearity |
| Profile Likelihood (PL) Method | Confidence intervals | Provides additional information on identifiability | CI absent under non-identifiability |
| Multi-start Method | Distribution of parameters and outputs | Provides good approximation of parameter values | Requires multiple replicates |
| Bootstrap Method | Distribution of parameters and outputs | Generates multiple replicates based on experimental data | Multiple model-fitting can be inefficient |

2.2.1   Non-sampling based Uncertainty Analysis

Polynomial Chaos (PC) method and Response Surface Method (RSM) [35] are two popular non-sampling based methods. PC, based on the concept of Wiener [36], uses spectral

on applying and improving PC methods. These studies show that PC methods are efficient compared to Monte Carlo methods, yet they become inefficient in the presence of a large number of uncertain parameters. Debusschere et al. [47] discussed the numerical challenges of applying PC methods and showed that high PC orders, which come at a high computational cost, are necessary to avoid truncation errors and negative values for strictly non-negative parameters. RSM [35] develops a response surface using a few selected parameters to describe the relationship between parameters and output. RSMs work better when the selected parameters are highly correlated but may not be able to detect important interactions when two parameters are super-additive [48]. Both PC and RSM are forward-propagation uncertainty methods. Another non-sampling based method is the bounded-error method [37]. The bounded-error method is based on interval analysis, which does not require probability distribution information for the uncertain parameters; instead, it assumes bounds on the parameters to quantify uncertainty. Several works have used this method, and improvements have been put forward [49]–[51]. Yet the interval analysis based bounded-error methods can still be computationally infeasible for complex biological models.
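The idea of working with parameter bounds rather than distributions can be shown on a deliberately simple case. The sketch below is an assumption-laden illustration, not the method of [37]: it encloses the steady state x_ss = k_s / k_d of a hypothetical one-gene model, using the fact that for positive parameters x_ss increases with `k_s` and decreases with `k_d`, so the extremes occur at corners of the parameter box.

```python
def steady_state_interval(k_s_bounds, k_d_bounds):
    """Enclose x_ss = k_s / k_d given interval bounds on both parameters.

    Assumes k_s >= 0 and k_d > 0 (hypothetical synthesis and decay rates).
    """
    ks_lo, ks_hi = k_s_bounds
    kd_lo, kd_hi = k_d_bounds
    if ks_lo < 0 or kd_lo <= 0 or ks_lo > ks_hi or kd_lo > kd_hi:
        raise ValueError("need 0 <= ks_lo <= ks_hi and 0 < kd_lo <= kd_hi")
    # Smallest ratio: smallest numerator over largest denominator, and vice versa.
    return ks_lo / kd_hi, ks_hi / kd_lo

x_lo, x_hi = steady_state_interval((1.0, 3.0), (0.5, 2.0))
print(x_lo, x_hi)  # 0.5 6.0

# Brute-force check on a grid: every parameter combination stays inside the bounds.
grid = [i / 10 for i in range(11)]
enclosed = all(x_lo <= (1.0 + 2.0 * a) / (0.5 + 1.5 * b) <= x_hi
               for a in grid for b in grid)
```

For a full dynamic model no such closed form exists, and the bounds must be propagated through the ODE solution, which is where the computational cost noted above arises.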

 

2.2.2   Sampling  based  Uncertainty  Analysis  

Sampling  based  uncertainty  analysis  falls  under  two  approaches:  Bayesian  approach  and  

Frequentist  approach.  Bayesian  methods  are  probabilistic  approaches  based  on  Bayesian  

theory  while  the  frequentist  approaches  are  based  on  multiple  model  fitting  [52].  Most  

(GLUE) of Beven and Binley [39] and Markov Chain Monte Carlo (MCMC) [38] methods. GLUE and MCMC yield a posterior probability distribution that expresses the parameter uncertainty. However, the posteriors obtained from GLUE are subjective and may not be the true parameter uncertainty distribution. Like all Bayesian methods, both GLUE and MCMC require knowledge of priors [53]. Vrugt et al. [54] compared GLUE with two MCMC-based approaches, Differential Evolution Adaptive Metropolis (DREAM) [55] and Delayed Rejection Adaptive Metropolis (DRAM) [56]. They observed that GLUE produced estimates of parameter and prediction uncertainty similar to those of the MCMC approaches. However, the MCMC-based DREAM was much more efficient than GLUE and DRAM with regard to the number of model evaluations. According to Iorgulescu et al. [57], GLUE may require billions of model evaluations to generate only a few good solutions. Although DREAM is efficient in comparison with GLUE and DRAM, it can still be inefficient and infeasible for complex and stiff biosystem models [58]. Yang et al. [48] pointed out the difficulty of constructing a likelihood function and of assessing multi-modal distributions with MCMC approaches.
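To make the MCMC idea concrete, the following is a minimal random-walk Metropolis sampler. It is neither DREAM nor DRAM, and everything specific in it is an assumption for illustration: the one-parameter decay model, the uniform prior on `k`, the Gaussian likelihood with known noise level `sigma`, and the synthetic data.

```python
import math
import random

def model(t, k):
    """Hypothetical decay model x(t) = exp(-k * t)."""
    return math.exp(-k * t)

def log_posterior(k, times, data, sigma, prior=(0.0, 5.0)):
    """Gaussian log-likelihood plus a uniform prior on k (both assumed)."""
    if not prior[0] < k < prior[1]:
        return -math.inf
    sse = sum((y - model(t, k)) ** 2 for t, y in zip(times, data))
    return -0.5 * sse / sigma ** 2

def metropolis(times, data, sigma, n_steps=6000, step=0.05, k_start=1.0, seed=1):
    rng = random.Random(seed)
    k = k_start
    lp = log_posterior(k, times, data, sigma)
    chain = []
    for _ in range(n_steps):
        candidate = k + rng.gauss(0.0, step)
        lp_candidate = log_posterior(candidate, times, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_candidate - lp)):
            k, lp = candidate, lp_candidate
        chain.append(k)
    return chain

# Synthetic sparse data: six time points, true k = 0.5, noise sd = 0.05.
data_rng = random.Random(0)
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
sigma = 0.05
data = [model(t, 0.5) + data_rng.gauss(0.0, sigma) for t in times]

chain = metropolis(times, data, sigma)
samples = sorted(chain[1000:])  # discard the first 1000 steps as burn-in
posterior_mean = sum(samples) / len(samples)
ci_95 = (samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))])
print(f"posterior mean of k: {posterior_mean:.3f}, 95% interval: {ci_95}")
```

Each step costs one model evaluation, so for a stiff ODE model with many parameters the thousands of steps needed here per parameter dimension quickly become expensive, consistent with the efficiency concerns above.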

 

Methods that fall under the frequentist approach are the Fisher Information Matrix (FIM), the Profile Likelihood (PL) method, the Multi-start method, and the Bootstrap method. These methods yield confidence intervals (CIs) for the parameters. Multi-start and bootstrap methods yield parameter distributions [59]–[62]. Studies have shown that the

parameters are identifiable [63], [64]. However, Joshi et al. [62] showed that FIM has two important shortcomings: 1) it provides only lower bounds when the model is non-linear, and 2) the confidence interval is symmetric, which may not be true for all models. In addition, FIM is a parameter-dependent measure and requires prior knowledge about the parameters [65].

The PL approach provides information about the identifiability of parameters [16], [66], [67]. Multi-start methods can be used along with bootstrap methods to quantify parameter uncertainties. Bootstrap methods can generate multiple samples from a few experimental observations. These samples are used to quantify the parameter uncertainties and subsequently the output uncertainties, which makes bootstrap methods suitable for systems biology studies. However, fitting the model to many bootstrap samples can be computationally expensive for complex models.
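The bootstrap idea described above (resample the few available observations, refit the model to each resampled data set, and read parameter uncertainty off the resulting estimates) can be sketched as follows. This is an illustrative assumption, not the procedure of this study: the exponential decay model, the log-linear least-squares fit, the replicate values, and the choice to resample replicates within each time point are all hypothetical.

```python
import math
import random

def fit_decay(times, values):
    """Least-squares fit of log(x) = log(x0) - k * t; returns (x0, k)."""
    n = len(times)
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    return math.exp(y_mean - slope * t_mean), -slope

def flatten(replicates):
    """Turn {time: [replicate values]} into parallel time and value lists."""
    times, values = [], []
    for t, reps in replicates.items():
        times.extend([t] * len(reps))
        values.extend(reps)
    return times, values

def bootstrap_decay(replicates, n_boot=1000, seed=0):
    """Refit the model to n_boot data sets resampled with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = {t: rng.choices(reps, k=len(reps))
                     for t, reps in replicates.items()}
        estimates.append(fit_decay(*flatten(resampled)))
    return estimates

# Hypothetical measurements: three replicates at each of four time points.
replicates = {0.0: [1.02, 0.95, 1.05], 1.0: [0.62, 0.58, 0.66],
              2.0: [0.35, 0.39, 0.37], 4.0: [0.14, 0.12, 0.15]}

x0_hat, k_hat = fit_decay(*flatten(replicates))
ks = sorted(k for _, k in bootstrap_decay(replicates))
k_ci = (ks[int(0.025 * len(ks))], ks[int(0.975 * len(ks))])
print(f"k = {k_hat:.3f}, 95% bootstrap interval = ({k_ci[0]:.3f}, {k_ci[1]:.3f})")
```

Here each refit is a closed-form regression; for an ODE model each of the bootstrap fits is a full nonlinear optimization, which is the computational cost noted above.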

 

2.3   Sensitivity  Analysis  

Sensitivity  Analysis  (SA)  quantifies  the  impact  of  parameter  uncertainties  on  the  model  

outputs  [68].  The  objective  of  SA  is  to  identify  critical  inputs  (parameters  and  initial  

conditions)  of  a  model.  Some  of  the  important  and  frequently  used  SA  methods  are  

compared  in  Table  2.3.  The  review  below  reports  the  findings  of  several  SA  studies  to  

establish  the  appropriate  SA  method  for  dynamic  systems.  

Table 2.3: Comparison of commonly used SA methods

| Name | Type | Advantages | Disadvantages |
|---|---|---|---|
| 1. Local SA | Local | Easy implementation; low computational cost | Non-generalizable results; unable to detect parameter interactions |
| 2. Global SA: i. Partial Rank Correlation Coefficient (PRCC) [69]; ii. Standardized Rank Regression Coefficient (SRRC) [69] | Correlation/Regression based | Computationally efficient | Unreliable under non-monotonicity |
| 2. Global SA: iii. Sobol Method [70]; iv. Extended Fourier Amplitude Sensitivity Test [71] | Variance-based | Model independent; quantitative measure of sensitivity | Computationally inefficient |
| 2. Global SA: v. Morris Screening [72] | Screening method | Computationally efficient; comparable to Sobol indices | Provides qualitative information only |

 

2.3.1   Local  Sensitivity  Analysis  

Local SA studies the effect of small perturbations of parameters, typically about the mean value, on the model output. Applications of local SA to signal transduction and metabolic pathway models [73], [74] report ease of implementation and low computational time. However, these methods cannot capture the effects of simultaneous parameter variations, and the sensitivity is investigated in the immediate

parameter  interactions  and  are  not  generalizable  over  the  entire  parameter  range.  A  list  of  

commonly  used  local  SA  methods  can  be  found  in  Hamby  [75].  
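A local sensitivity coefficient of the kind discussed in this section can be approximated by a finite difference around nominal parameter values. The sketch below is a generic illustration under assumed names: the one-gene model, the nominal values of `k_s` and `k_d`, and the normalization (dy/dp)(p/y) are choices made for the example, not taken from the cited studies.

```python
import math

def expression(t, params):
    """Hypothetical model output x(t) = (k_s / k_d) * (1 - exp(-k_d * t))."""
    k_s, k_d = params["k_s"], params["k_d"]
    return (k_s / k_d) * (1.0 - math.exp(-k_d * t))

def local_sensitivity(model, t, params, name, rel_step=1e-4):
    """Normalized sensitivity (dy/dp) * (p / y) by central finite difference."""
    p = params[name]
    h = rel_step * p
    upper = dict(params, **{name: p + h})
    lower = dict(params, **{name: p - h})
    dy_dp = (model(t, upper) - model(t, lower)) / (2.0 * h)
    return dy_dp * p / model(t, params)

nominal = {"k_s": 2.0, "k_d": 0.5}
s_ks = local_sensitivity(expression, 4.0, nominal, "k_s")
s_kd = local_sensitivity(expression, 4.0, nominal, "k_d")
print(f"t=4: sensitivity to k_s = {s_ks:.3f}, to k_d = {s_kd:.3f}")
```

The values describe the model only near the nominal point, and each parameter is perturbed alone, which is why such results do not generalize over the parameter range or reveal interactions.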

 

2.3.2   Global  Sensitivity  Analysis  

In contrast to local SA, global SA considers the entire parameter range, facilitates the simultaneous variation of parameter values, and allows the exploration of parameter interaction effects [68]. There are three classes of global SA methods: correlation and regression methods, variance-based methods, and screening methods. Partial Rank Correlation Coefficient (PRCC) [69] and Standardized Rank Regression Coefficient (SRRC) are the most efficient and reliable correlation and regression methods [76]. However, these methods are suitable for monotonic parameter-output relationships and are not reliable under the non-monotonicity that is usually present in biological models. Variance-based methods provide information about the fraction of the output variance contributed by individual parameters and groups of parameters. They do not depend on assumptions of linearity or monotonicity, which makes them suitable for biological models [30]. The method of Sobol [70] and the extended Fourier amplitude sensitivity test (eFAST) [71] are the two most commonly used variance-based methods. The method of Sobol is relatively easy to implement compared to other variance-based methods [77], and its modified form is as efficient as the eFAST method [78]. Zheng & Rundell [29] compared sensitivity results of the local SA, the PRCC, the method of Sobol, and the eFAST methods on a model of the ErK-

and the global SA. Interestingly, the PRCC produced results consistent with those of the Sobol and the eFAST methods, suggesting that the sensitivity pattern of the model was robust to the monotonicity assumption. A comparison of the PRCC and the eFAST method on different deterministic and stochastic biological models by Marino et al. [30] revealed differences between the PRCC and eFAST results under non-linear and non-monotonic relationships. Because PRCC performs well only under monotonicity, its results could be misleading under non-monotonic and non-linear conditions. Overall, the variance-based methods can be impractical to use if the model is very complex and the number of parameters is large [79].

 

Screening methods are an efficient alternative to the variance-based methods. The trade-off, however, is that screening methods provide only qualitative information, i.e., the rank of the parameters in terms of sensitivity. Morris Screening [72] is a commonly used screening technique because it considers the entire parameter space and allows variation of multiple parameters, unlike other screening methods such as iterated fractional factorial design (IFFD) [80], sequential bifurcation (SB) [81] and Cotter's design [82]. Campolongo et al. [83] suggested an extension to the Morris method to improve the sensitivity assessment under non-monotonicity. The study found that the new measure, the mean of the absolute values of the elementary effects, was comparable to the total sensitivity index from the method of Sobol. Sensitivity analyses for dynamic models should be able to consider the entire dynamics. Focusing on a particular time point of interest, as done in most of the existing studies in the

period. Some recent studies have included the dynamical aspect in the SA of the system [29], [30], [74], [84] using the PRCC, Sobol and eFAST methods. However, similar studies using Morris screening are lacking, and such an analysis is part of the current study.

 

2.4   Preliminary  Iron  deficiency  response  model  in  Arabidopsis  thaliana  

A  successful  study  that  can  explain  the  various  interactions  occurring  under  dynamic  iron  

conditions  could  be  very  important  since  iron  is  required  for  the  growth  and  development  

of  plants.  Such  studies  could  help  develop  ways  to  sustain  the  plant’s  growth  and  

productivity  under  iron  deficient  conditions.  This  kind  of  study  has  been  of  great  interest  in  

biology  and  more  so  in  the  light  of  systems  understanding  through  gene  regulatory  network  

construction and gene interaction modeling. Several studies have examined iron responses in Arabidopsis thaliana [85]–[92]. These responses are known to be

transcriptionally  induced  by  transcription  factors.  A  set  of  transcription  factors  (TFs)  COL4,  

ETF9,  ASIL2,  and  MYB55  that  influence  known  iron  response  TFs  were  identified  by  

Koryachko  et  al.  [85].  Additional  TFs  bHLH34,  bHLH104,  and  bHLH05  (ILR3)  were  identified  

by  Li  et  al.  [90]  and  Zhang  et  al.  [91].  Koryachko  et  al.  (yet  to  be  published)  incorporated  

these  TFs  and  their  targets  to  formulate  a  gene  regulatory  network  (Figure  1)  and  an  

Ordinary  Differential  Equation  (ODE)  based  iron  deficiency  response  model  to  describe  the  

interactions  occurring  under  iron  deficient  conditions  in  Arabidopsis  thaliana.  The  iron   deficiency  model  is  still  under  study  and  the  ODE-­‐based  equations  used  in  this  study  is  a  

parameters  and  the  outputs  could  help  in  better  understanding  the  iron  responses  in  

Arabidopsis  thaliana.  

 

Figure  1:  Iron  Deficiency  Gene  Regulatory  Network  in  Arabidopsis  thaliana:  Top  row  shows  the  TFs   and  the  bottom  row  shows  their  targets.  Lines  with  arrow-­‐head  indicate  activation  while  lines  with   flat  end  indicate  inhibition.  Bold  lines  indicate  direct  binding  between  regulator's  protein  and   target's  promoter.  

 

CHAPTER  3  

Methods  

In  the  previous  chapter,  various  uncertainty  and  sensitivity  analysis  techniques  were  

discussed.  This  chapter  describes,  in  detail,  the  methods  used  for  this  study  that  are  

applicable  for  a  comprehensive  understanding  of  uncertainty  for  any  complex  dynamic  

biological  model  under  sparse  data  conditions.  

 

3.1   Model  

The  methods  described  in  this  section  were  first  applied  to  a  Yeast  synthetic  network  model  

in  Saccharomyces  cerevisiae  and  then  to  the  preliminary  iron  deficiency  model  in   Arabidopsis  thaliana.  

 

3.1.1   Yeast  Synthetic  Network  Model  in  Saccharomyces  cerevisiae  

The  synthetic  network  model  used  in  this  study  is  based  on  the  synthetic  network  in  the  

yeast  Saccharomyces  cerevisiae  built  by  Cantone  et  al.  [93].  Figure  2  displays  the  synthetic   network.  The  network  consists  of  five  interacting  genes.  Cantone  et  al.  described  the  

network using a system of nonlinear Delay Differential Equations (DDEs) and Hill kinetics

[93].  The  variables  in  the  model  represent  the  mRNA  abundances  of  each  gene  and  the  

degradation  kinetics  were  assumed  first-­‐order.  In  our  study,  we  made  two  changes  to  the  

original  DDE  based  model.  The  delay  function  and  the  rectangular  window  function  were  

simplify  the  model  and  to  modify  the  model  into  an  Ordinary  Differential  Equation  (ODE)  

based system of equations like the preliminary iron deficiency model in Arabidopsis thaliana. The modification to the Saccharomyces cerevisiae model, however, did not change the dynamics of the system or the output expression, as shown in the supplemental section (Figure S1).

 

Following these changes, the rates of change of the mRNA concentrations of the genes in the yeast synthetic network (Figure 2) were expressed as:

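$$\frac{dx_1}{dt} = \alpha_1 + v_1 \frac{x_3^{h_1}}{\left(k_1^{h_1} + x_3^{h_1}\right)\left(1 + \dfrac{x_5^{h_2}}{k_2^{h_2}}\right)} - d_1 x_1 \cdot \frac{t^{c_1}}{c_2^{c_1} + t^{c_1}} \tag{1}$$

$$\frac{dx_2}{dt} = \alpha_2 + v_2 \frac{x_1^{h_3}}{k_3^{h_3} + x_1^{h_3}} - \left(d_2 - \beta_1 \cdot \frac{c_3^{c_1}}{c_3^{c_1} + t^{c_1}}\right) \cdot x_2 \tag{2}$$

$$\frac{dx_3}{dt} = \alpha_3 + v_3 \frac{x_2^{h_4}}{k_4^{h_4} + x_2^{h_4}\left(1 + \dfrac{x_4^{4}}{\gamma^{4}}\right)} - d_3 x_3 \tag{3}$$

$$\frac{dx_4}{dt} = \alpha_4 + v_4 \frac{x_3^{h_5}}{k_5^{h_5} + x_3^{h_5}} - \left(d_4 - \beta_2 \cdot \frac{c_3^{c_1}}{c_3^{c_1} + t^{c_1}}\right) \cdot x_4 \tag{4}$$

$$\frac{dx_5}{dt} = \alpha_5 + v_5 \frac{x_3^{h_6}}{k_6^{h_6} + x_3^{h_6}} - d_5 x_5 \tag{5}$$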

where [CBF1] = $x_1$; [GAL4] = $x_2$; [SWI5] = $x_3$; [GAL80] = $x_4$; [ASH1] = $x_5$; $d_i$, $i = 1, \dots, 5$ are the degradation rates, $k_i$ are the Michaelis-Menten constants, $h_i$ are the Hill coefficients, $\alpha_i$ are the basal activities, $v_i$ represent the maximal transcription rates, and $\gamma$ is an affinity constant. $c_1$, $c_2$, $c_3$ are constants with values 16, 100 and 10 respectively, which are used to model the 100-minute delay in the HO promoter binding for the CBF1 gene and the 10-minute transient increase in mRNA stability due to experimental washing steps. The concentrations are reported in arbitrary units [a.u.], the degradation rates $d_1$, $d_2$, $d_3$, $d_4$, $d_5$ in [min$^{-1}$], the Michaelis-Menten constants $k_1$, $k_2$, $k_3$, $k_4$, $k_5$, $k_6$ in [a.u.], the affinity constant in [a.u.], the basal activities $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$ in [a.u. min$^{-1}$], the maximal transcription rates in [a.u. min$^{-1}$], and the coefficients $\beta_1$ and $\beta_2$ for the magnitude of the increase in mRNA stability in [min$^{-1}$].
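To illustrate how the constants 16, 100 and 10 can reproduce a delay and a short transient without a delay term, the following minimal sketch evaluates two Hill-type functions of time. The functional forms (a rising switch $t^{c_1}/(c_2^{c_1} + t^{c_1})$ and a falling window $c_3^{c_1}/(c_3^{c_1} + t^{c_1})$) are an assumed reading of the model equations, and the function names are hypothetical.

```python
# Assumed Hill-type time functions standing in for the delay and window terms.
C1 = 16      # steepness exponent (value given in the text)
C2 = 100.0   # delay threshold in minutes (value given in the text)
C3 = 10.0    # duration of the transient in minutes (value given in the text)


def delay_switch(t, c1=C1, c2=C2):
    """Close to 0 before t = c2 minutes and close to 1 afterwards."""
    return t**c1 / (c2**c1 + t**c1)


def transient_window(t, c1=C1, c3=C3):
    """Close to 1 before t = c3 minutes and close to 0 afterwards."""
    return c3**c1 / (c3**c1 + t**c1)


for t in (0.0, 5.0, 10.0, 50.0, 100.0, 200.0):
    print(f"t={t:6.1f}  switch={delay_switch(t):.5f}  window={transient_window(t):.5f}")
```

With an exponent of 16 both functions behave almost like step functions, which is why they can approximate the original delay and rectangular window while keeping the system an ODE.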

 

Figure  2:  Yeast  Synthetic  Network  in  Saccharomyces  cerevisiae  [93]:  Lines  with  arrow  heads  indicate   activation  while  lines  with  flat  ends  indicate  inhibition  

3.1.2   Preliminary  Iron  Deficiency  Response  Model  in  Arabidopsis  thaliana  

The  preliminary  iron  deficiency  response  model  is  based  on  the  gene  regulatory  network  in  

Arabidopsis thaliana that consists of 6 regulator genes and their 7 target genes (Figure 1). For our study, we focused on the 6 regulators in the GRN. Two sets of ODEs were used to describe the gene expression under iron sufficient (Fe+) and iron depleted (Fe-) conditions:

$$\frac{dx}{dt} = a - b x \tag{6}$$

$$\frac{dx}{dt} = a + q\, u_i \frac{1}{1 + \left(\dfrac{u_i}{u_K} - 1\right) e^{-u_i t}} - b x \tag{7}$$

where $x$ represents the gene expression, $a$ the constitutive transcription rate, $b$ the degradation rate, $q$ the sensitivity of gene expression to iron, $u_i$ the rate of rise in the iron signal and $u_K$ the delay in the iron effect. Equation (6) describes the Fe+ condition and Equation (7) describes the Fe- condition.
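As a worked check of the Fe+ equation, $dx/dt = a - bx$ has the closed-form solution $x(t) = a/b + (x_0 - a/b)e^{-bt}$, so the expression relaxes to the steady state $a/b$. The sketch below compares a fourth-order Runge-Kutta integration with this closed form. The parameter values and function names are hypothetical and chosen only for illustration.

```python
import math


def fe_sufficient_rhs(x, a, b):
    """Right-hand side of Equation (6): constitutive production minus degradation."""
    return a - b * x


def exact_solution(t, x0, a, b):
    """Closed-form solution of dx/dt = a - b*x with x(0) = x0."""
    return a / b + (x0 - a / b) * math.exp(-b * t)


def rk4(x0, a, b, t_end, n_steps):
    """Integrate Equation (6) from 0 to t_end with classical Runge-Kutta steps."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        k1 = fe_sufficient_rhs(x, a, b)
        k2 = fe_sufficient_rhs(x + 0.5 * h * k1, a, b)
        k3 = fe_sufficient_rhs(x + 0.5 * h * k2, a, b)
        k4 = fe_sufficient_rhs(x + h * k3, a, b)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x


# Hypothetical values: a = 2.0, b = 0.5, x0 = 1.0, so the steady state is a/b = 4.0.
numeric = rk4(1.0, 2.0, 0.5, t_end=10.0, n_steps=1000)
analytic = exact_solution(10.0, 1.0, 2.0, 0.5)
print(numeric, analytic)
```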

3.2   Bootstrap  Sampling  for  Data  Generation  

Due  to  the  high  expense  and  the  long  duration  associated  with  high-­‐throughput  methods  

like microarrays, qRT-PCR, and RNA-Seq used to measure gene expression, the number of data replicates is limited. The lack of multiple data replicates makes it challenging to quantify the parameter distribution. However, a bootstrap method can generate the replicates required to characterize the parameter distribution [62]. We used the parametric

mean  expression  to  create  data  samples.  The  mean  expression  and  the  sample  were  

created  as  follows:  

(i)   Mean  expression:  The  parameter  values  provided  in  [93]  were  used  as  the  true   values  for  the  yeast  synthetic  model  parameters  in  this  study.  These  values  were  

substituted  in  the  model  equations  (1)  –  (5)  to  obtain  the  mean  expression.  To  

create data sparseness, expression values at 13 time-points: 0, 10, 20, 40, 70,

100, 120, 150, 170, 200, 220, 240 and 280 minutes were taken. For the iron

deficiency  response  model,  however,  the  3  qRT-­‐PCR  measurements  at  4  

different  time-­‐points  each  for  Fe+  and  Fe-­‐  conditions  were  available.  The  mean  

expression  was  the  average  of  these  measurements.    

(ii)   Parametric  Bootstrap  sample  generation:    Once  the  mean  expressions  were   calculated,  the  next  step  involved  generation  of  noise  samples.  Parametric  

bootstrap  creates  random  noise  based  on  the  information  on  the  distribution  of  

the data. As a general practice, we considered a Gaussian distribution. A set of random Gaussian noise values was created and added to the mean expression to obtain the bootstrap data samples. In the case of the yeast synthetic

model,  the  variance  of  the  Gaussian  noise  was  selected  such  that  the  noise  to  

signal  ratio  was  10%.  For  the  iron  deficiency  model,  the  variances  of  the  

measurements  were  used  to  generate  the  Gaussian  noise.  Statistically,  an  

estimate  of  variance  based  on  3  samples  may  not  be  truly  reflective  of  the  

considered  those  variances  as  approximate  estimations.  More  than  20,000  

bootstrap samples were generated using this approach. Outliers were removed from the resulting samples using Tukey's outlier analysis [94]. Out of the 20,000 samples, 2,500 random bootstrap samples were chosen for further analysis. The bootstrap size of 2,500 was selected because the mean values of the parameters did not change for any larger bootstrap size (Figure S2 in the Supplemental section).
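The two steps above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the mean expression values are hypothetical, the 10% noise-to-signal ratio is interpreted as a standard deviation equal to 10% of the mean at each time point, and Tukey's fences (1.5 times the interquartile range) are applied per time point, which is an assumption about how the outlier analysis was carried out.

```python
import random
import statistics


def parametric_bootstrap(mean_expr, noise_ratio, n_samples, seed=0):
    """Add Gaussian noise (sd = noise_ratio * |mean|) to the mean expression."""
    rng = random.Random(seed)
    return [[rng.gauss(m, noise_ratio * abs(m)) for m in mean_expr]
            for _ in range(n_samples)]


def tukey_filter(samples, k=1.5):
    """Keep samples whose values lie inside Tukey's fences at every time point."""
    fences = []
    for j in range(len(samples[0])):
        column = [s[j] for s in samples]
        q1, _, q3 = statistics.quantiles(column, n=4)
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [s for s in samples
            if all(lo <= v <= hi for v, (lo, hi) in zip(s, fences))]


# Hypothetical mean expression at four time points (arbitrary units).
mean_expr = [1.0, 1.5, 2.2, 3.0]
samples = parametric_bootstrap(mean_expr, noise_ratio=0.10, n_samples=2000)
kept = tukey_filter(samples)
print(len(samples), len(kept))
```

A random subset of the retained samples (2,500 in the study) would then be drawn, for example with `random.sample`, and passed on to parameter estimation.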

3.3   Parameter  Estimation  

Parameter  values  were  estimated  by  fitting  the  model  to  the  bootstrap  generated  sample  

data. For model fitting, the local optimization routine “fmincon” in MATLAB was employed. A cost function based on the weighted sum of squares was defined for both models as:

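$$J = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\left(y_i^{\mathrm{calc}}(j) - y_i^{\mathrm{obs}}(j)\right)^2}{\sigma_{ij}^2} \tag{8}$$

where $j$ denotes the time point and $i$ represents the gene, $y_i^{\mathrm{calc}}(j)$ is the expression computed by the model, $y_i^{\mathrm{obs}}(j)$ is the corresponding value in the bootstrap data sample, and $\sigma_{ij}^2$ is the corresponding variance.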
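A minimal sketch of a weighted sum-of-squares cost of this kind is shown below. The nested-list layout (one row per gene, one column per time point) and the function name are assumptions made for illustration. In the study the cost was minimized with MATLAB's fmincon, which is not reproduced here.

```python
def weighted_sse(y_model, y_data, variances):
    """Weighted sum of squared residuals over genes (rows) and time points (columns)."""
    total = 0.0
    for model_row, data_row, var_row in zip(y_model, y_data, variances):
        for ym, yd, var in zip(model_row, data_row, var_row):
            total += (ym - yd) ** 2 / var
    return total


# Two hypothetical genes at two time points with unit variances.
y_model = [[1.0, 2.0], [3.0, 4.0]]
y_data = [[1.0, 3.0], [2.0, 4.0]]
unit_var = [[1.0, 1.0], [1.0, 1.0]]
print(weighted_sse(y_model, y_data, unit_var))
```

Dividing each squared residual by its variance means that noisy measurements contribute less to the fit than precise ones.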

 

3.4   Uncertainty  Analysis  

Uncertainty  analysis  involves  quantifying  the  parameter  uncertainty  and  the  uncertainty  in  

the  model  outputs  resulting  from  these  parameter  uncertainties.  To  quantify  the  parameter  

uncertainty,  model-­‐fitting  was  performed  on  each  of  the  2,500  bootstrap  sample  data  that  

respective  distributions  were  constructed  for  each  parameter.  Next,  the  output  

uncertainties  were  characterized  using  forward  propagation  of  the  parameter  uncertainty.  

In  forward  propagation,  the  2,500  parameter  combinations  were  applied  to  the  model  to  

generate  the  output  distribution.  Finally,  confidence  intervals  were  calculated  for  the  

parameters  as  well  as  the  outputs.  
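Forward propagation can be sketched as follows: every parameter combination is pushed through the model and the spread of the outputs at each time point is summarized. The sketch uses the closed-form solution of $dx/dt = a - bx$ as a stand-in model, randomly drawn parameter pairs as stand-ins for the fitted bootstrap estimates, and percentile-based intervals. All three choices are assumptions for illustration, since the text does not state how the intervals were computed.

```python
import math
import random


def model_output(t, a, b, x0=0.0):
    """Closed-form solution of dx/dt = a - b*x, used here as a stand-in model."""
    return a / b + (x0 - a / b) * math.exp(-b * t)


def percentile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def propagate(param_sets, times, level=0.95):
    """Return (lower, median, upper) of the model output at each time point."""
    tail = (1.0 - level) / 2.0
    bands = []
    for t in times:
        outputs = sorted(model_output(t, a, b) for a, b in param_sets)
        bands.append((percentile(outputs, tail),
                      percentile(outputs, 0.5),
                      percentile(outputs, 1.0 - tail)))
    return bands


# Hypothetical stand-ins for 2,500 fitted (a, b) pairs.
rng = random.Random(1)
param_sets = [(rng.gauss(2.0, 0.2), rng.gauss(0.5, 0.05)) for _ in range(2500)]
times = [0.0, 1.0, 2.0, 5.0, 10.0]
bands = propagate(param_sets, times)
for t, (lo, med, hi) in zip(times, bands):
    print(f"t={t:5.1f}  {lo:.3f}  {med:.3f}  {hi:.3f}")
```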

 

3.5   Identifiability  Analysis  

Identifiability  denotes  the  ambiguity  in  the  estimation  of  parameters  [95].  A  parameter  is  

non-­‐identifiable  if  it  cannot  be  estimated  with  a  finite  confidence  interval.  Two  types  of  

non-­‐identifiability  occur  in  a  model,  namely  structural  non-­‐identifiability  and  practical  non-­‐

identifiability  [16].  Structural  non-­‐identifiability,  as  the  name  suggests,  is  related  to  the  

structure  of  the  model  while  practical  non-­‐identifiability  is  related  to  the  quality  and  

amount of data. Since the focus of our study was on sparseness of data, we limited our identifiability tests to the practical non-identifiability of the parameters. We used the profile likelihood approach [16] to assess the practical non-identifiability of the parameters in the two models.

The profile likelihood approach detects non-identifiability by assessing the flatness of the likelihood. If the cost function is defined as $\chi^2$, then the profile likelihood for a particular parameter $\theta_i$ is:

$$\chi^2_{PL}(\theta_i) = \min_{\theta_{j \neq i}} \chi^2(\theta) \tag{9}$$

As  shown  in  Equation  9,  the  cost  function  is  re-­‐optimized  with  respect  to  all  other  

identifiability of the parameters, a threshold is defined. The threshold corresponds to the inverse chi-squared value for a confidence level $\alpha$ and degrees of freedom equal to the total number of parameters. The profile likelihood of each parameter is plotted and checked for intersection with the threshold on either side of the minimum of the cost function. A parameter is considered practically non-identifiable if its profile likelihood flattens out at either end (i.e., the profile likelihood does not intersect the threshold at that end).
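The procedure can be sketched on a toy problem. The example below profiles the intercept of a two-parameter linear model, for which the re-optimization of the other parameter has a closed form. The data, the noise level and the function names are hypothetical, and the threshold is taken as the minimum cost plus the chi-squared quantile, which is an assumption about how the threshold is positioned. For two degrees of freedom the chi-squared quantile is $-2\ln(1-\alpha)$, so no statistics library is needed.

```python
import math

# Hypothetical data: roughly y = 1 + 2t with small noise, sd = 0.1.
T = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.05, 6.95, 9.0]
SIGMA = 0.1


def chi2(theta1, theta2):
    """Weighted sum of squared residuals for the model y = theta1 + theta2 * t."""
    return sum(((y - theta1 - theta2 * t) / SIGMA) ** 2 for t, y in zip(T, Y))


def profile_theta1(theta1):
    """Cost at fixed theta1 after re-optimizing theta2 (closed form for this model)."""
    theta2 = sum(t * (y - theta1) for t, y in zip(T, Y)) / sum(t * t for t in T)
    return chi2(theta1, theta2)


def chi2_quantile_df2(alpha):
    """Inverse chi-squared CDF for 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - alpha)


def crosses_both_sides(profile, threshold):
    """True if the profile exceeds the threshold on both sides of its minimum."""
    i_min = min(range(len(profile)), key=profile.__getitem__)
    left = any(v > threshold for v in profile[:i_min])
    right = any(v > threshold for v in profile[i_min + 1:])
    return left and right


grid = [i * 0.01 for i in range(201)]          # theta1 scanned from 0 to 2
profile = [profile_theta1(g) for g in grid]
threshold = min(profile) + chi2_quantile_df2(0.95)
identifiable = crosses_both_sides(profile, threshold)
print(identifiable)
```

A profile that stays below the threshold on one side over the whole scanned range would be reported as practically non-identifiable by `crosses_both_sides`.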

 

3.6   Sensitivity  Analysis  

Sensitivity analysis quantifies the change in output due to the change in the parameters. For our study, we chose the Morris Screening method, one of the screening methods described earlier. The Morris Screening method [72], like any other screening method, is based on one-at-a-time (OAT) designs. These designs are local methods because they evaluate the effect on the outputs of varying a single parameter at a time around its nominal value. However, unlike most OAT designs, Morris Screening calculates sensitivities at multiple points in the parameter space, thus making it a global method [96].

 

Morris defines the elementary effect for determining the sensitivity of the parameters. For a model with $k$ parameters, the region of experimentation, i.e., the entire parameter space, is denoted by $\omega$, which is a regular $k$-dimensional $p$-level grid. In this grid, each parameter $x_i$ may take values from $\{0, 1/(p-1), 2/(p-1), \dots, 1\}$. For a vector of values $X = (x_1, \dots, x_k)$, the elementary

$$EE_i(X) = \frac{y(x_1, \dots, x_{i-1}, x_i + \Delta, x_{i+1}, \dots, x_k) - y(X)}{\Delta} \tag{10}$$

where  ∆  is  a  multiple  of  1/(𝑝 − 1)  and  𝑋  is  such  that  𝑋   +  ∆  is  still  in  the  region  of  

experimentation.  A  total  of  𝑟  elementary  effects  were  calculated.  Morris  calculates  the  

mean,  𝜇,  and  standard  deviation,  𝜎,  of  the  𝑟  EEs  to  indicate  the  sensitivity.  However,  under  

non-monotonicity, it was found that the mean underestimated the sensitivities because elementary effects of opposite sign cancel out [83]. In our study, we calculated the mean of the absolute values of the EEs, as suggested in [83], to represent the sensitivities. The absolute mean indicates the overall effect of a parameter on the output while the standard deviation indicates the effect due to interactions with other parameters. The vectors of parameter values used to calculate the EEs were sampled based on the radial points described by Campolongo et al. [97].
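The calculation of elementary effects and their summary statistics can be sketched as below. This is a simplified illustration under stated assumptions: the toy model is hypothetical, base points are drawn at random from the $p$-level grid rather than from the radial design of [97], and $\Delta = p/(2(p-1))$ is used as the step, which is a common choice but is not stated in the text.

```python
import random
import statistics


def toy_model(x):
    """Hypothetical model: x[0] acts linearly, x[1] and x[2] act only through interaction."""
    return 2.0 * x[0] + x[1] * x[2]


def morris_screening(model, k, p=4, r=20, seed=0):
    """Return (mu_star, sigma) of r elementary effects for each of k parameters."""
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))
    # Base levels for which x_i + delta stays inside the unit interval.
    levels = [j / (p - 1) for j in range(p) if j / (p - 1) + delta <= 1.0 + 1e-12]
    effects = [[] for _ in range(k)]
    for _ in range(r):
        base = [rng.choice(levels) for _ in range(k)]
        y0 = model(base)
        for i in range(k):
            moved = list(base)
            moved[i] += delta          # perturb one parameter at a time
            effects[i].append((model(moved) - y0) / delta)
    mu_star = [statistics.fmean(abs(e) for e in ee) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma


mu_star, sigma = morris_screening(toy_model, k=3)
for i, (m, s) in enumerate(zip(mu_star, sigma)):
    print(f"x{i}: mu*={m:.3f}  sigma={s:.3f}")
```

For a dynamic model, the same calculation would be repeated with `model` returning the output at each time point of interest, giving one pair of statistics per parameter and time point.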

 

3.6.1   Extension  to  Morris  Method  

Studies  that  have  applied  Morris  Screening  method  on  dynamic  models  are  rare.  In  the  few  

studies  where  the  method  has  been  applied  to  a  dynamic  model,  sensitivity  is  defined  at  a  

specific  time  point  of  interest.  This  approach  cannot  provide  the  time-­‐varying  nature  of  

parameter  sensitivities.  In  our  study,  we  calculated  the  sensitivities  at  all  time-­‐points  to  

examine the time-varying sensitivities. Moreover, for multiple outputs, Morris Screening provides multiple rankings, and it is difficult to determine how a parameter ranks overall. So, to get an overall rank, we used Savage Scores [98]. Savage Scores were calculated as

$$S_i = \sum_{j=i}^{n} \frac{1}{j} \tag{11}$$

where  𝑖  is  the  rank  assigned  to  the  𝑖th  order  statistic  in  a  sample  of  size  𝑛.  
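A short sketch of the Savage score calculation is given below. The score for rank $i$ is taken as the sum of $1/j$ from $j = i$ to $n$, so the top-ranked parameter receives the largest score. Summing the scores of a parameter across outputs to obtain an overall ranking is an assumption made for illustration, and the parameter names and ranks are hypothetical.

```python
def savage_scores(n):
    """Savage scores for ranks 1..n: S_i = sum_{j=i}^{n} 1/j."""
    scores = [0.0] * n
    running = 0.0
    for i in range(n, 0, -1):
        running += 1.0 / i
        scores[i - 1] = running
    return scores


def overall_ranking(rank_table):
    """Order parameters by the sum of their Savage scores across outputs.

    rank_table maps a parameter name to its ranks (1 = most sensitive),
    one rank per output.
    """
    scores = savage_scores(len(rank_table))
    totals = {name: sum(scores[r - 1] for r in ranks)
              for name, ranks in rank_table.items()}
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical ranks of three parameters for three outputs.
ranks = {"alpha": [1, 1, 2], "beta": [2, 3, 1], "gamma": [3, 2, 3]}
print(savage_scores(3))
print(overall_ranking(ranks))
```

Because the scores decay roughly as $1/i$, agreement on the top ranks counts for more than agreement on the lower ranks.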

 

CHAPTER  4  

Journal  Article  (IEEE  Transactions  on  Systems,  Man,  and  Cybernetics:  Systems)    

Uncertainty  and  Sensitivity  Analysis  in  Dynamic  Biological  Systems  under  Data  Sparseness    

Basnet,  L.,  Koryachko,  A.,  Matthiadis,  A.,  Muhammad,  D.,  Tuck,  J.,  Williams,  C.M.,  Long,   T.,  Ducoste,  J.J.  

 

Abstract  

A  comprehensive  understanding  of  the  uncertainties  due  to  data  sparseness  in  dynamic  

models  used  in  systems  biology  studies  can  prevent  misleading  deductions  and  provide  

valuable  biological  information  about  their  systems.  Given  the  scope  and  flexibility  of  

Ordinary  Differential  Equation  (ODE)  based  dynamic  models  in  biological  studies,  

comprehensive  uncertainty  studies  for  these  models  are  lacking.  In  this  research,  two  ODE-­‐

based  dynamical  models:  the  yeast  synthetic  network  model  in  Saccharomyces  cerevisiae   and  the  preliminary  iron  deficiency  response  model  in  Arabidopsis  thaliana  were  studied   under  data  sparseness.  The  goal  was  to  quantify  the  variability  in  the  model  parameters  and  

outputs,  assess  the  identifiability  of  parameters  and  identify  the  importance  of  parameters  

to  deduce  important  biological  information.  Bootstrap  results  showed  high  uncertainties  in  

parameters  and  gene  expressions  in  the  yeast  synthetic  network  model.  Practical  

identifiability  assessment  using  profile  likelihood  suggested  high  data  sparseness  for  both  

models.  Morris  screening  sensitivity  results  explained  the  regulatory  effects  of  SWI5  and  

ASH1  genes  on  CBF1  in  the  yeast  synthetic  network  model.  Clear  influence  of  iron  effect  in  

COL4  gene  expression,  and  of  constitutive  transcription  and  degradation  in  bHLH104  gene  

expression were identified in the Arabidopsis thaliana gene regulatory network.

Keywords

Systems biology; Dynamic model; Data Sparseness; Uncertainty; Identifiability; Sensitivity

 

4.1   Introduction  

Systems  Biology  [1],  [2]  aims  to  understand  the  behavior  of  interacting  biological  system  

components  by  gathering  experimental  observations  and  deciphering  the  implication  of  

these  interactions  with  mathematical  reasoning  [3].  These  observations  and  reasoning  can  

be  used  to  understand  and  develop  gene  regulatory  networks  and  perform  metabolic  

pathway  analyses  to  explain  how  organisms  function  relative  to  environmental  stresses  or  

under  transgenic  conditions  [4]–[6].    

 

 At  the  genetic  level,  high-­‐throughput  experimental  methods  like  microarray,  qRT-­‐PCR  [7],  

and  RNA-­‐SEQ  [8],  [9]  are  primarily  employed  to  collect  information  required  to  develop  a  

gene  regulatory  network  and  to  help  build  predictive  mathematical  relationships.    However,  
