3. Materials and Methods
7.2 Background Frequencies Computed from SwissProt Sequences Amino Acid Frequency
7.3.2 conservation::Statistics Module
SYNOPSIS
use conservation::Statistics;
# create a statistics object, using the alignment object
$stats = conservation::Statistics->new(-alnObj => $para->{'-alnObj'});
# another form of creation, using file directly
$stats = conservation::Statistics->new(-file => 'filename', -format => 'alignment_format');
# get the unweighted frequencies for residues in the alignment columns
$uwFreqs = $stats->uwFrequencies();
# get the independent count based frequencies for the residues in the alignment
$icFreqs = $stats->independentcounts();
# get the Henikoff and Henikoff (1994) based frequencies for the residues in the alignment
$wFreqs = $stats->wFrequencies();
# get the distribution of each amino acid in the whole alignment using the unweighted scheme
$uwTFreqs = $stats->uwTotalFrequency();
# get the distribution of each amino acid in the whole alignment based on the ind. count scheme
$icTFreqs = $stats->icTotalFrequency();
# get the distribution of each amino acid in the whole alignment based on the Henikoff and Henikoff (1994) scheme
$wTFreqs = $stats->wTotalFrequency();
DESCRIPTION
The Statistics class provides the basic statistics needed to calculate the various conservation measures. It relies directly on the conservation::AlnWrapper object, which makes the various sub-parts of the alignment available, and on the conservation::Weights object, which makes available the calculated weights of the sequences in the alignment as well as the independent counts of the amino acids in each alignment column.
METHODS
Title: new
Usage: $stats = conservation::Statistics->new(-alnObj => alignment_object);
# another form of creation, using file directly
$stats = conservation::Statistics->new(-file => 'filename', -format => 'alignment_format');
Function: Creates a new conservation::Statistics object
Returns: conservation::Statistics object
Args: alignment_object - Bio::Align::AlignI compliant object.
filename - Name of the file containing the alignment.
BioPerl will try to guess the alignment format.
Title: gapChar
Usage: $gpchar = $stats->gapChar();
Function: Returns the character used to represent gaps in the current
alignment
Returns: Character
Args:
Title: getBGFreqs
Usage: $bgfreq = $stats->getBGFreqs();
Function: Returns the background frequencies of the amino acids, as
pre-calculated from amino acids in SwissProt.
Returns: Reference to a hash, in which the amino acids are the keys and
the values are the frequency of each amino acid from SwissProt.
Args:
Title: uwFrequencies
Usage: $uwFreq = $stats->uwFrequencies();
Function: Calculates and returns the unweighted frequencies of each
amino acid in each column of the alignment.
Returns: Reference to a hash of hashes. The outer hash is keyed by column index in the alignment; each value is a reference to an inner hash whose keys are the amino acids in that column and whose values are the frequencies of those amino acids.
Args:
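The hash-of-hashes layout described for uwFrequencies can be illustrated with a small stand-alone sketch. The helper column_frequencies below is hypothetical and is not part of conservation::Statistics; it assumes that columns are indexed from 0 and that gap characters are left out of both the counts and the denominator, neither of which is confirmed by the module documentation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of conservation::Statistics.
# Assumptions: columns are indexed from 0, and gap characters are
# excluded from both the counts and the denominator.
sub column_frequencies {
    my ($seqs, $gap) = @_;
    my (%count, %total);
    for my $seq (@$seqs) {
        my @res = split //, $seq;
        for my $col (0 .. $#res) {
            next if $res[$col] eq $gap;
            $count{$col}{ $res[$col] }++;
            $total{$col}++;
        }
    }
    # Turn the per-column counts into per-column frequencies.
    my %freq;
    for my $col (keys %count) {
        for my $aa (keys %{ $count{$col} }) {
            $freq{$col}{$aa} = $count{$col}{$aa} / $total{$col};
        }
    }
    return \%freq;
}

# Toy alignment of four sequences and three columns.
my $freqs = column_frequencies([ 'ACD', 'ACE', 'A-E', 'GCE' ], '-');
for my $col (sort { $a <=> $b } keys %$freqs) {
    for my $aa (sort keys %{ $freqs->{$col} }) {
        printf "%d %s %.2f\n", $col, $aa, $freqs->{$col}{$aa};
    }
}
```

The nested loop at the end shows how a structure of this shape can be traversed: first by column index, then by residue.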
Title: uwPositionalAAFreq
Usage: $uwCount = $stats->uwPositionalAAFreq();
Function: Calculates the unweighted count of each amino acid at each
position.
Returns: Reference to a hash of hashes. The outer hash is keyed by column index in the alignment; each value is a reference to an inner hash whose keys are the amino acids in that column and whose values are the counts of those amino acids.
Args:
Title: uwTotalPositionalFreq
Usage: $uwCountSum = $stats->uwTotalPositionalFreq();
Function: Calculates the sum of the unweighted counts for each
position.
Returns: A hash reference, with the keys being column index and the
value the sum of the unweighted counts
Args:
Title: independentcounts
Usage: $indCounts = $stats->independentcounts();
Function: Calculates and returns the independent counts for the entire
alignment.
Returns: A hash reference, with the keys being column index and the
value being a reference to another hash, having the amino acids in the column as keys and their estimated independent count as values.
Args:
Title: wFrequencies
Usage: $wFreq = $stats->wFrequencies();
Function: Calculates and returns the weighted frequencies for the whole alignment.
Returns: A hash reference, with the keys being column index and the
value being a reference to another hash, having the amino acids in the column as keys and their weighted frequencies as values.
Args:
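As an illustration of what a Henikoff and Henikoff (1994) style weighting involves, the following stand-alone sketch computes position-based sequence weights and then weighted column frequencies. Both helpers are hypothetical and are not taken from conservation::Weights or conservation::Statistics. The sketch assumes equal-length sequences, 0-based column indices, and that a gap is treated like any other symbol; the module may handle gaps differently.

```perl
use strict;
use warnings;

# Hypothetical helper: position-based sequence weights in the style of
# Henikoff and Henikoff (1994). In each column a sequence receives
# 1 / (k * n), where k is the number of distinct symbols in the column
# and n is the number of sequences sharing this sequence's symbol.
# Assumption: gaps are counted like any other symbol.
sub position_based_weights {
    my ($seqs) = @_;
    my @rows = map { [ split //, $_ ] } @$seqs;
    my $ncol = scalar @{ $rows[0] };
    my @w    = (0) x scalar @rows;
    for my $col (0 .. $ncol - 1) {
        my %n;
        $n{ $_->[$col] }++ for @rows;
        my $k = scalar keys %n;
        for my $i (0 .. $#rows) {
            $w[$i] += 1 / ($k * $n{ $rows[$i][$col] });
        }
    }
    my $sum = 0;
    $sum += $_ for @w;
    return [ map { $_ / $sum } @w ];    # normalised so the weights sum to 1
}

# Hypothetical helper: weighted frequency of each symbol in each column,
# obtained by summing the weights of the sequences that carry it.
sub weighted_column_frequencies {
    my ($seqs, $weights) = @_;
    my %freq;
    for my $i (0 .. $#$seqs) {
        my @res = split //, $seqs->[$i];
        for my $col (0 .. $#res) {
            $freq{$col}{ $res[$col] } += $weights->[$i];
        }
    }
    return \%freq;
}

my @aln     = ('AA', 'AA', 'GA');
my $weights = position_based_weights(\@aln);
my $wfreq   = weighted_column_frequencies(\@aln, $weights);
printf "weight of sequence %d: %.4f\n", $_, $weights->[$_] for 0 .. $#aln;
```

In this toy alignment the third sequence is the only one carrying G in the first column, so it receives a larger weight than the two identical sequences.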
Title: wPositionalAAFreq
Usage: $wCounts = $stats->wPositionalAAFreq();
Function: Calculates the weighted count of each amino acid at each
position.
Returns: Reference to a hash of hashes. The outer hash is keyed by column index in the alignment; each value is a reference to an inner hash whose keys are the amino acids in that column and whose values are the weighted counts of those amino acids.
Args:
Title: wTotalPositionalFreq
Usage: $wCountSum = $stats->wTotalPositionalFreq();
Function: Calculates the sum of the weighted counts for each position.
Returns: A hash reference, with the keys being column index and the
value the sum of the weighted counts
Args:
Title: uwTotalFrequency
Usage: $uwTFreq = $stats->uwTotalFrequency();
Function: Calculates the unweighted frequency of each amino acid in the
whole alignment.
Returns: A hash reference, with the keys being amino acids and the
values the associated alignment-wide frequency.
Args:
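One plausible way to obtain an alignment-wide frequency is to pool per-column counts and divide by the overall total. The sketch below does this for a hash of hashes shaped like the structure described for uwPositionalAAFreq. The helper total_frequency is hypothetical, and whether the module pools counts in exactly this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of conservation::Statistics.
# Input: { column_index => { amino_acid => count } }.
# Assumption: the alignment-wide frequency of a residue is its pooled
# count over all columns divided by the total count of all residues.
sub total_frequency {
    my ($counts) = @_;
    my %sum;
    my $total = 0;
    for my $col (keys %$counts) {
        for my $aa (keys %{ $counts->{$col} }) {
            $sum{$aa} += $counts->{$col}{$aa};
            $total    += $counts->{$col}{$aa};
        }
    }
    my %freq;
    $freq{$_} = $sum{$_} / $total for keys %sum;
    return \%freq;
}

# Toy per-column counts for a two-column alignment of four sequences.
my $tfreq = total_frequency({ 0 => { A => 3, G => 1 }, 1 => { A => 4 } });
printf "%s %.3f\n", $_, $tfreq->{$_} for sort keys %$tfreq;
```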
Title: icTotalFrequency
Usage: $icTFreq = $stats->icTotalFrequency();
Function: Calculates the frequency of each amino acid in the whole
alignment based on the independent count scheme.
Returns: A hash reference, with the keys being amino acids and the
values the associated alignment-wide ind. count frequencies.
Args:
Title: wTotalFrequency
Usage: $wTFreq = $stats->wTotalFrequency();
Function: Calculates the weighted frequency of each amino acid in the
whole alignment.
Returns: A hash reference, with the keys being amino acids and the
values the associated alignment-wide weighted frequency.
Args:
Title: getCutoffIndices
Usage: $cutoffInd = $stats->getCutoffIndices(-cutoff => cutoff_value, -method => 'method_value');
Function: Returns the indices of those columns whose percentage of gaps is less than the specified cutoff.
Returns: A hash reference, containing as keys only the indices of the
columns to be retained and the values set to 1.
Args: cutoff_value - figure between 0 and 1 specifying the gap percentage above which the conservation score of a column is discarded.
method_value - could be 'hh94' (for Henikoff and Henikoff (1994) weights), 'indcount', or it could be left out
(the unweighted scheme is used for indcount and unweighted frequencies).
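The filtering step described for getCutoffIndices can be sketched as follows. The helper columns_below_cutoff is hypothetical; it takes a hash of per-column gap fractions (values between 0 and 1, an assumption based on the stated range of the cutoff) and keeps the columns whose gap fraction is strictly below the cutoff.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of conservation::Statistics.
# Input: { column_index => gap_fraction } and a cutoff between 0 and 1.
# Output: { column_index => 1 } for the columns to be retained.
sub columns_below_cutoff {
    my ($gap_fraction, $cutoff) = @_;
    my %keep;
    for my $col (keys %$gap_fraction) {
        $keep{$col} = 1 if $gap_fraction->{$col} < $cutoff;
    }
    return \%keep;
}

my $keep = columns_below_cutoff({ 0 => 0, 1 => 0.75, 2 => 0.25 }, 0.5);
print join(',', sort { $a <=> $b } keys %$keep), "\n";
```

Returning a hash with values set to 1 lets a caller test membership with a simple lookup such as `if ($keep->{$col})`.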
Title: percentageGaps
Usage: $pgaps = $stats->percentageGaps(-method => 'method_value');
Function: Calculates the percentage of gaps in each column of the alignment.
Returns: A hash reference, containing as keys the indices of each column
and the percentage of gaps in the column as values.
Args: method_value - could be 'hh94' (for Henikoff and Henikoff (1994)
weights), 'indcount', or it could be left out (the unweighted scheme is used for indcount and unweighted frequencies).
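For the unweighted case, a per-column gap proportion can be sketched as below. The helper gap_fractions is hypothetical and is not the module's implementation. It assumes 0-based column indices and reports values as fractions between 0 and 1; the weighted ('hh94') variant is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of conservation::Statistics.
# Returns { column_index => fraction of sequences with a gap there }.
sub gap_fractions {
    my ($seqs, $gap) = @_;
    my %gaps;
    for my $seq (@$seqs) {
        my @res = split //, $seq;
        for my $col (0 .. $#res) {
            $gaps{$col} += ($res[$col] eq $gap) ? 1 : 0;
        }
    }
    my $nseq = scalar @$seqs;
    my %frac;
    $frac{$_} = $gaps{$_} / $nseq for keys %gaps;
    return \%frac;
}

my $frac = gap_fractions([ 'A-C', 'A--', 'AGC', 'A-C' ], '-');
printf "%d %.2f\n", $_, $frac->{$_} for sort { $a <=> $b } keys %$frac;
```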
Title: maxF
Usage: $mFreq = $stats->maxF(-freq => $frequencies);
Function: Gets the residues having the maximum frequency in a column.
Returns: A reference to an array containing the residue(s) having the
maximum frequency.
Args: frequencies - A hash ref containing the residues in a column and
their associated frequencies
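The behaviour described for maxF, including ties, can be illustrated with a small sketch. The helper max_frequency_residues is hypothetical; it takes a hash reference of residues and frequencies for one column and returns all residues sharing the highest frequency. Sorting the result alphabetically is an assumption made here only to get a stable order.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of conservation::Statistics.
# Input: { residue => frequency } for a single column.
# Output: array reference of the residue(s) with the highest frequency.
sub max_frequency_residues {
    my ($freq) = @_;
    my $max = 0;
    for my $f (values %$freq) {
        $max = $f if $f > $max;
    }
    my @best = grep { $freq->{$_} == $max } keys %$freq;
    return [ sort @best ];
}

my $best = max_frequency_residues({ A => 0.4, G => 0.4, C => 0.2 });
print join(',', @$best), "\n";
```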
DEPENDENCIES