EnzymaticDigestion Class Reference

Class for the enzymatic digestion of proteins. More...

#include <OpenMS/CHEMISTRY/EnzymaticDigestion.h>

Classes

struct  BindingSite
 
struct  CleavageModel
 

Public Types

enum  Enzyme { ENZYME_TRYPSIN, SIZE_OF_ENZYMES }
 Possible enzymes for the digestion (adapt NamesOfEnzymes & nextCleavageSite_() if you add more enzymes here) More...
 
enum  Specificity { SPEC_FULL, SPEC_SEMI, SPEC_NONE, SIZE_OF_SPECIFICITY }
 when querying for valid digestion products, this determines if the specificity of the two peptide ends is considered important More...
 

Public Member Functions

 EnzymaticDigestion ()
 Default constructor. More...
 
 EnzymaticDigestion (const EnzymaticDigestion &rhs)
 Copy constructor. More...
 
EnzymaticDigestion & operator= (const EnzymaticDigestion &rhs)
 Assignment operator. More...
 
SignedSize getMissedCleavages () const
 Returns the number of missed cleavages for the digestion. More...
 
void setMissedCleavages (SignedSize missed_cleavages)
 Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used. More...
 
Enzyme getEnzyme () const
 Returns the enzyme for the digestion. More...
 
void setEnzyme (Enzyme enzyme)
 Sets the enzyme for the digestion (default is ENZYME_TRYPSIN). More...
 
Specificity getSpecificity () const
 Returns the specificity for the digestion. More...
 
void setSpecificity (Specificity spec)
 Sets the specificity for the digestion (default is SPEC_FULL). More...
 
void digest (const AASequence &protein, std::vector< AASequence > &output) const
 Performs the enzymatic digestion of a protein. More...
 
Size peptideCount (const AASequence &protein)
 Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings. More...
 
bool isLogModelEnabled () const
 Returns true if the trained model is used when digesting. More...
 
void setLogModelEnabled (bool enabled)
 Enables or disables the trained model. More...
 
double getLogThreshold () const
 Returns the threshold which needs to be exceeded to call a cleavage (only for the trained cleavage model on real data) More...
 
void setLogThreshold (double threshold)
 
bool isValidProduct (const AASequence &protein, Size pep_pos, Size pep_length)
 Returns true if the peptide at position pep_pos with length pep_length within the sequence protein was generated by the current model. More...
 

Static Public Member Functions

static Enzyme getEnzymeByName (const String &name)
 
static Specificity getSpecificityByName (const String &name)
 

Static Public Attributes

static const std::string NamesOfEnzymes [SIZE_OF_ENZYMES]
 Names of the Enzymes. More...
 
static const std::string NamesOfSpecificity [SIZE_OF_SPECIFICITY]
 Names of the Specificity. More...
 

Protected Member Functions

void nextCleavageSite_ (const AASequence &sequence, AASequence::ConstIterator &p) const
 moves the iterator p behind (i.e., C-term) the next cleavage site of the sequence More...
 
bool isCleavageSite_ (const AASequence &sequence, const AASequence::ConstIterator &p) const
 tests if position pointed to by p (N-term side) is a valid cleavage site More...
 

Protected Attributes

SignedSize missed_cleavages_
 Number of missed cleavages. More...
 
Enzyme enzyme_
 Used enzyme. More...
 
Specificity specificity_
 specificity of enzyme More...
 
bool use_log_model_
 use the log model or naive digestion (with missed cleavages) More...
 
double log_model_threshold_
 Threshold to decide if position is cleaved or missed (only for the model) More...
 
Map< BindingSite, CleavageModel > model_data_
 Holds the cleavage model. More...
 

Detailed Description

Class for the enzymatic digestion of proteins.

Digestion can be performed using simple regular expressions, e.g. [KR] | [^P] for trypsin (cleave after K or R unless the next residue is P). Missed cleavages can also be modelled, i.e. adjacent peptides that stay joined because the enzyme failed to cut or could not access the site. If n missed cleavages are given, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned; no random selection of just n specific missed cleavage sites is performed.
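The enumeration of peptides with up to n missed cleavages described above can be sketched without OpenMS, using plain std::string. The helper names isTrypticSite and digestWithMissedCleavages are hypothetical and not part of this class, and the output order (all peptides with 0 missed cleavages first, then 1, and so on) is an assumption of the sketch, not a documented guarantee of digest().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenMS function): true if the bond after
// residue i is a tryptic site, i.e. residue i is K or R and residue i+1 is not P.
bool isTrypticSite(const std::string& protein, std::size_t i)
{
  if (i + 1 >= protein.size()) return false; // no bond after the last residue
  const char here = protein[i];
  const char next = protein[i + 1];
  return (here == 'K' || here == 'R') && next != 'P';
}

// Hypothetical helper: all peptides with 0..max_missed missed cleavages.
std::vector<std::string> digestWithMissedCleavages(const std::string& protein,
                                                   std::size_t max_missed)
{
  std::vector<std::string> peptides;
  if (protein.empty()) return peptides;

  // Fragment boundaries: 0, every position right after a cleavage site, and the end.
  std::vector<std::size_t> bounds{0};
  for (std::size_t i = 0; i < protein.size(); ++i)
  {
    if (isTrypticSite(protein, i)) bounds.push_back(i + 1);
  }
  bounds.push_back(protein.size());

  // A peptide with m missed cleavages spans m + 1 consecutive fragments.
  for (std::size_t m = 0; m <= max_missed; ++m)
  {
    for (std::size_t start = 0; start + m + 1 < bounds.size(); ++start)
    {
      const std::size_t from = bounds[start];
      const std::size_t to = bounds[start + m + 1];
      peptides.push_back(protein.substr(from, to - from));
    }
  }
  return peptides;
}
```

For the sequence AKRPGKLR the bond after the first R is not a site because P follows it, so the sketch finds two cleavage sites and three fully cleaved fragments.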

An alternative model is also available, in which the protein is cleaved only at positions where a cleavage model trained on real data exceeds a certain threshold. The model is published in Siepen et al. (2007), "Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics", doi: 10.1021/pr060507u. The model is only available for trypsin and ignores the missed cleavage setting. Use setLogThreshold() to adjust the false-positive vs. false-negative rate. Because a cleavage is called only where the model score exceeds the threshold, a higher threshold reduces the number of cleavages predicted (i.e. yields more missed cleavages).
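The thresholding step alone can be illustrated with a minimal sketch. It assumes each candidate site already has a numeric score; how the trained model computes that score is not shown here, and both the function name callCleavages and any scores passed to it are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenMS function): returns the indices of the
// candidate sites whose score strictly exceeds the threshold.
std::vector<std::size_t> callCleavages(const std::vector<double>& site_scores,
                                       double threshold)
{
  std::vector<std::size_t> called;
  for (std::size_t i = 0; i < site_scores.size(); ++i)
  {
    if (site_scores[i] > threshold) called.push_back(i);
  }
  return called;
}
```

In this sketch a site is called only when its score exceeds the threshold, so raising the threshold can only remove sites from the result, never add them.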

Member Enumeration Documentation

enum Enzyme

Possible enzymes for the digestion (adapt NamesOfEnzymes & nextCleavageSite_() if you add more enzymes here)

Enumerator
ENZYME_TRYPSIN 
SIZE_OF_ENZYMES 

enum Specificity

When querying for valid digestion products, this determines if the specificity of the two peptide ends is considered important

Enumerator
SPEC_FULL 
SPEC_SEMI 
SPEC_NONE 
SIZE_OF_SPECIFICITY 

Constructor & Destructor Documentation

EnzymaticDigestion ( )

Default constructor.

EnzymaticDigestion ( const EnzymaticDigestion & rhs)

Copy constructor.

Member Function Documentation

void digest ( const AASequence & protein,
std::vector< AASequence > & output
) const

Performs the enzymatic digestion of a protein.

Referenced by SimpleSearchEngine::main_().

Enzyme getEnzyme ( ) const

Returns the enzyme for the digestion.

static Enzyme getEnzymeByName ( const String & name)
static

Converts an enzyme name string to the corresponding enum value. Returns SIZE_OF_ENZYMES if name is not valid.

Referenced by seqan::_charComparator().
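The pattern of a name array sized by the trailing enumerator, with that enumerator doubling as the "not found" value, can be sketched as follows. All identifiers here (EnzymeSketch, kNamesSketch, enzymeByNameSketch) are hypothetical, and the string "Trypsin" and the case-sensitive comparison are assumptions; this page does not list the contents of NamesOfEnzymes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mirror of the Enzyme enum: the last enumerator counts the entries.
enum EnzymeSketch { SK_TRYPSIN, SK_SIZE_OF_ENZYMES };

// Assumed name table, indexed by the enum value.
const std::string kNamesSketch[SK_SIZE_OF_ENZYMES] = {"Trypsin"};

// Returns the enumerator whose name matches exactly, or the size sentinel
// if no entry matches.
EnzymeSketch enzymeByNameSketch(const std::string& name)
{
  for (std::size_t i = 0; i < SK_SIZE_OF_ENZYMES; ++i)
  {
    if (kNamesSketch[i] == name) return static_cast<EnzymeSketch>(i);
  }
  return SK_SIZE_OF_ENZYMES;
}
```

A caller would compare the result against the sentinel before using it, which is the check implied by the description of getEnzymeByName() and getSpecificityByName().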

double getLogThreshold ( ) const

Returns the threshold which needs to be exceeded to call a cleavage (only for the trained cleavage model on real data)

SignedSize getMissedCleavages ( ) const

Returns the number of missed cleavages for the digestion.

Specificity getSpecificity ( ) const

Returns the specificity for the digestion.

static Specificity getSpecificityByName ( const String & name)
static

Converts a specificity name string to the corresponding enum value. Returns SIZE_OF_SPECIFICITY if name is not valid.

Referenced by seqan::_charComparator().

bool isCleavageSite_ ( const AASequence & sequence,
const AASequence::ConstIterator & p
) const
protected

tests if position pointed to by p (N-term side) is a valid cleavage site

bool isLogModelEnabled ( ) const

Returns true if the trained model is used when digesting.

bool isValidProduct ( const AASequence & protein,
Size  pep_pos,
Size  pep_length
)

Returns true if the peptide at position pep_pos with length pep_length within the sequence protein was generated by the current model.

Referenced by FoundProteinFunctor::addHit().
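How the Specificity setting could affect such a check is sketched below under explicit assumptions: full specificity requires both peptide ends to lie on a cleavage site or protein terminus, semi specificity requires at least one, and no specificity accepts any in-range peptide. This is the usual proteomics reading of these terms, not a statement of what isValidProduct() does internally. The sketch uses the simple trypsin rule, ignores the missed cleavage count and the log model, and all names in it are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Specificity enum.
enum class SpecSketch { Full, Semi, None };

// True if a peptide may start or end at boundary b (0..size). Protein termini
// always qualify; otherwise the residue before b must be K or R and the
// residue at b must not be P.
bool isBoundaryCleavable(const std::string& protein, std::size_t b)
{
  if (b == 0 || b == protein.size()) return true;
  const char before = protein[b - 1];
  const char after = protein[b];
  return (before == 'K' || before == 'R') && after != 'P';
}

// Hypothetical check: is protein[pep_pos, pep_pos + pep_length) acceptable
// under the given specificity?
bool isValidProductSketch(const std::string& protein, std::size_t pep_pos,
                          std::size_t pep_length, SpecSketch spec)
{
  // Reject empty or out-of-range peptides first.
  if (pep_length == 0 || pep_pos >= protein.size() ||
      pep_length > protein.size() - pep_pos)
  {
    return false;
  }
  if (spec == SpecSketch::None) return true;

  const bool n_ok = isBoundaryCleavable(protein, pep_pos);
  const bool c_ok = isBoundaryCleavable(protein, pep_pos + pep_length);
  return spec == SpecSketch::Full ? (n_ok && c_ok) : (n_ok || c_ok);
}
```

For the sequence AKRPGKLR, the peptide RPGK passes the full check, PGK passes only the semi check (its N-terminal end follows R but starts with P), and PG passes only when specificity is disabled.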

void nextCleavageSite_ ( const AASequence & sequence,
AASequence::ConstIterator & p
) const
protected

moves the iterator p behind (i.e., C-term) the next cleavage site of the sequence

EnzymaticDigestion& operator= ( const EnzymaticDigestion & rhs)

Assignment operator.

Size peptideCount ( const AASequence & protein)

Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings.
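For the simple (non-model) digestion, the expected count follows from counting runs of consecutive fragments: a non-empty protein with k cleavage sites has k + 1 fragments, and there are k + 1 - m peptides with exactly m missed cleavages. The sketch below sums this for m from 0 to n. It is an arithmetic illustration with a hypothetical function name, assuming a non-empty protein, and not a description of how peptideCount() is implemented.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: number of peptides for a non-empty protein with
// `cleavage_sites` sites when up to `max_missed` missed cleavages are allowed.
std::size_t peptideCountSketch(std::size_t cleavage_sites, std::size_t max_missed)
{
  const std::size_t fragments = cleavage_sites + 1;
  // A peptide cannot miss more sites than exist.
  const std::size_t m_max = std::min(max_missed, cleavage_sites);
  std::size_t count = 0;
  for (std::size_t m = 0; m <= m_max; ++m)
  {
    count += fragments - m; // runs of m + 1 consecutive fragments
  }
  return count;
}
```

With two cleavage sites and one allowed missed cleavage this gives 3 + 2 = 5 peptides.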

void setEnzyme ( Enzyme  enzyme)

Sets the enzyme for the digestion (default is ENZYME_TRYPSIN).

Referenced by seqan::_charComparator(), and SimpleSearchEngine::main_().

void setLogModelEnabled ( bool  enabled)

Enables or disables the trained model.

void setLogThreshold ( double  threshold)

Sets the threshold which needs to be exceeded to call a cleavage (only for the trained cleavage model on real data). Default is 0.25.

void setMissedCleavages ( SignedSize  missed_cleavages)

Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used.

Referenced by SimpleSearchEngine::main_().

void setSpecificity ( Specificity  spec)

Sets the specificity for the digestion (default is SPEC_FULL).

Referenced by seqan::_charComparator().

Member Data Documentation

Enzyme enzyme_
protected

Used enzyme.

double log_model_threshold_
protected

Threshold to decide if position is cleaved or missed (only for the model)

SignedSize missed_cleavages_
protected

Number of missed cleavages.

Map<BindingSite, CleavageModel> model_data_
protected

Holds the cleavage model.

const std::string NamesOfEnzymes[SIZE_OF_ENZYMES]
static

Names of the Enzymes.

Referenced by seqan::_charComparator().

const std::string NamesOfSpecificity[SIZE_OF_SPECIFICITY]
static

Names of the Specificity.

Referenced by seqan::_charComparator().

Specificity specificity_
protected

specificity of enzyme

bool use_log_model_
protected

use the log model or naive digestion (with missed cleavages)


OpenMS / TOPP release 2.0.0 Documentation generated on Wed Mar 30 2016 12:49:27 using doxygen 1.8.11