Compute peptide and protein abundances from annotated feature/consensus maps or from identification results.
potential predecessor tools | potential successor tools |
IDMapper | external tools (e.g. for statistical analysis) |
FeatureLinkerUnlabeled (or another feature grouping tool) | |
Reference:
Weisser et al.: An automated pipeline for high-throughput label-free quantitative proteomics (J. Proteome Res., 2013, PMID: 23391308).
Input: featureXML or consensusXML
Quantification is based on the intensity values of the features in the input files. Feature intensities are first accumulated to peptide abundances, according to the peptide identifications annotated to the features/feature groups. Then, abundances of the peptides of a protein are averaged to compute the protein abundance.
The peptide-to-protein step uses the (e.g. 3) most abundant proteotypic peptides per protein to compute the protein abundances. This is a general version of the "top 3 approach" (but only for relative quantification) described in:
Silva et al.: Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition (Mol. Cell. Proteomics, 2006, PMID: 16219938).
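The peptide-to-protein step described above can be sketched as follows. This is a minimal illustration in plain Python, not the tool's implementation; the function name, the example sequences and the abundance values are hypothetical. It ignores the case where fewer peptides than requested are available.

```python
from statistics import mean, median

# Hypothetical helper for illustration; not part of the tool's API.
AVERAGING = {"mean": mean, "median": median, "sum": sum}


def top_n_protein_abundance(peptide_abundances, top=3, average="mean"):
    """Combine proteotypic peptide abundances of one protein in one sample.

    peptide_abundances: dict mapping peptide sequence -> abundance.
    top: number of most abundant peptides to use (0 means all peptides).
    average: how to combine them ("mean", "median" or "sum").
    """
    values = sorted(peptide_abundances.values(), reverse=True)
    if top > 0:
        values = values[:top]
    if not values:
        return None  # nothing to quantify
    return AVERAGING[average](values)


# Made-up abundances for four proteotypic peptides of one protein.
peptides = {"AAAK": 400.0, "CCCK": 200.0, "DDDK": 150.0, "EEEK": 20.0}
print(top_n_protein_abundance(peptides, top=3, average="mean"))  # 250.0
```

With top set to 3, the least abundant peptide (EEEK) does not contribute to the result.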
Only features/feature groups with unambiguous peptide annotation are used for peptide quantification, and generally only proteotypic peptides (i.e. those matching to exactly one protein) are used for protein quantification. As an exception to this rule, if protein inference results (ProteinProphet: convert protXML to idXML using IDFileConverter; Fido: use FidoAdapter) for the whole sample set are provided with the protein_groups option, or are already included in a featureXML input, groups of indistinguishable proteins will also be quantified. The reported quantity then refers to the total for the whole group.
Peptide/protein IDs from multiple identification runs can be handled, but will not be differentiated (i.e. protein accessions for a peptide will be accumulated over all identification runs).
Peptides with the same sequence, but with different modifications are quantified separately on the peptide level, but treated as one peptide for the protein quantification (i.e. the contributions of differently-modified variants of the same peptide are accumulated).
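As a rough sketch of the two accumulation rules above (feature intensities to peptide abundances, and merging of modified variants for the protein level), consider the following plain-Python example. The data layout, the function names and the parenthesised modification notation are assumptions made for illustration. The sketch also assumes that a feature counts as unambiguously annotated when all of its annotations agree on one peptide.

```python
import re
from collections import defaultdict


def peptide_abundances(features):
    """Sum feature intensities per (modified) peptide.

    features: list of (annotations, intensity) pairs, where annotations is
    the list of peptide sequences assigned to the feature.
    Features without annotation or with conflicting annotations are skipped.
    """
    totals = defaultdict(float)
    for annotations, intensity in features:
        if len(set(annotations)) != 1:
            continue  # unannotated or ambiguous: not used for quantification
        totals[annotations[0]] += intensity
    return dict(totals)


def strip_modifications(sequence):
    """Remove modifications written as '(Name)' (assumed notation)."""
    return re.sub(r"\([^)]*\)", "", sequence)


def merge_modified_variants(abundances):
    """Accumulate differently modified variants of the same peptide."""
    merged = defaultdict(float)
    for sequence, abundance in abundances.items():
        merged[strip_modifications(sequence)] += abundance
    return dict(merged)


features = [
    (["PEPTIDE"], 100.0),
    (["PEPTIDE"], 50.0),
    (["PEPM(Oxidation)TIDE"], 30.0),
    (["PEPMTIDE"], 20.0),
    (["AAAK", "CCCK"], 999.0),  # conflicting annotation, ignored
    ([], 5.0),                  # no annotation, ignored
]
per_peptide = peptide_abundances(features)       # modified forms kept apart
per_sequence = merge_modified_variants(per_peptide)  # used for proteins
```

Here `per_peptide` keeps the oxidized and unmodified forms separate, while `per_sequence` holds their combined abundance for the protein-level step.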
Input: idXML
Quantification based on identification results uses spectral counting, i.e. the abundance of each peptide is the number of times that peptide was identified from an MS2 spectrum (considering only the best hit per spectrum). Different identification runs in the input are treated as different samples; this makes it possible to quantify several related samples at once by merging the corresponding idXML files with IDMerger. Output format and applicable parameters are the same as for featureXML if the input contains a single identification run, and the same as for consensusXML if it contains multiple runs.
The notes above regarding quantification on the protein level and the treatment of modifications also apply to idXML input. In particular, this means that the settings top 0 and average sum should be used to get the "classical" spectral counting quantification on the protein level (where all identifications of all peptides of a protein are summed up).
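Spectral counting as described above could look like this in plain Python. The data layout (runs as lists of spectra, each spectrum a list of scored hits), the function names and the convention that a higher score is better are assumptions made for this sketch; real search engine scores may be oriented differently.

```python
from collections import Counter


def peptide_spectral_counts(runs):
    """Count, per run, how often each peptide is the best hit of a spectrum.

    runs: dict mapping run name -> list of spectra; each spectrum is a list
    of (peptide, score) hits. Assumption: higher score means better hit.
    """
    counts = {}
    for run, spectra in runs.items():
        run_counts = Counter()
        for hits in spectra:
            if not hits:
                continue  # spectrum without identification
            best_peptide, _ = max(hits, key=lambda hit: hit[1])
            run_counts[best_peptide] += 1
        counts[run] = dict(run_counts)
    return counts


def protein_spectral_counts(peptide_counts, peptide_to_protein):
    """Sum the counts of all peptides of a protein (top 0, average sum).

    peptide_to_protein: dict mapping each proteotypic peptide to its protein.
    """
    result = {}
    for run, run_counts in peptide_counts.items():
        totals = Counter()
        for peptide, count in run_counts.items():
            if peptide in peptide_to_protein:
                totals[peptide_to_protein[peptide]] += count
        result[run] = dict(totals)
    return result


runs = {
    "run1": [[("AAAK", 0.9), ("CCCK", 0.4)], [("AAAK", 0.8)], [("DDDK", 0.7)]],
    "run2": [[("DDDK", 0.95)], []],
}
peptide_counts = peptide_spectral_counts(runs)
protein_counts = protein_spectral_counts(
    peptide_counts, {"AAAK": "P1", "DDDK": "P1", "CCCK": "P2"}
)
```

Each run is treated as one sample, so `protein_counts` holds one spectral count per protein and run.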
More information below the parameter specification.
The command line parameters of this tool are:
INI file documentation of this tool:
Output format
The output files produced by this tool have a table format, with columns as described below:
Protein output (one protein/set of indistinguishable proteins per line):
[...] top).
Peptide output (one peptide or - if filter_charge is set - one charge state of a peptide per line):
[...] filter_charge was set.
[...] consensus:normalize was set.
Protein quantification examples
While quantification on the peptide level is fairly straightforward, a number of options influence quantification on the protein level - especially for consensusXML input. The three parameters top, include_all and consensus:fix_peptides determine which peptides are used to quantify proteins in different samples.
As an example, consider a protein with four proteotypic peptides. Each peptide is detected in a subset of three samples, as indicated in the table below. The peptides are ranked by abundance (1: highest, 4: lowest; assuming for simplicity that the order is the same in all samples).
peptide | sample 1 | sample 2 | sample 3 |
peptide 1 | X | | X |
peptide 2 | X | X | |
peptide 3 | X | X | X |
peptide 4 | X | X | |
Different parameter combinations lead to different quantification scenarios, as shown here:
In the parameter columns (top, include_all, consensus:fix_peptides), "*" means: no effect in this case. The sample columns list the peptides used for quantification; "(...)" means: not quantified here because ...
top | include_all | consensus:fix_peptides | sample 1 | sample 2 | sample 3 | explanation |
0 | * | no | 1, 2, 3, 4 | 2, 3, 4 | 1, 3 | all peptides |
1 | * | no | 1 | 2 | 1 | single most abundant peptide |
2 | * | no | 1, 2 | 2, 3 | 1, 3 | two most abundant peptides |
3 | no | no | 1, 2, 3 | 2, 3, 4 | (too few peptides) | three most abundant peptides |
3 | yes | no | 1, 2, 3 | 2, 3, 4 | 1, 3 | three or fewer most abundant peptides |
4 | no | * | 1, 2, 3, 4 | (too few peptides) | (too few peptides) | four most abundant peptides |
4 | yes | * | 1, 2, 3, 4 | 2, 3, 4 | 1, 3 | four or fewer most abundant peptides |
0 | * | yes | 3 | 3 | 3 | all peptides present in every sample |
1 | * | yes | 3 | 3 | 3 | single peptide present in most samples |
2 | no | yes | 1, 3 | (peptide 1 missing) | 1, 3 | two peptides present in most samples |
2 | yes | yes | 1, 3 | 3 | 1, 3 | two or fewer peptides present in most samples |
3 | no | yes | 1, 2, 3 | (peptide 1 missing) | (peptide 2 missing) | three peptides present in most samples |
3 | yes | yes | 1, 2, 3 | 2, 3 | 1, 3 | three or fewer peptides present in most samples |
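The scenarios in the table can be reproduced with a short plain-Python sketch. It is a reading of the table, not the tool's code: the function name and data layout are hypothetical, peptides are identified by their abundance rank, and ties in the number of samples are assumed to be broken by abundance.

```python
def select_peptides(detected, top, include_all, fix_peptides):
    """Choose the peptides used to quantify one protein in each sample.

    detected: dict mapping sample -> list of detected peptides, given as
    abundance ranks (1 = most abundant).
    Returns a dict mapping sample -> list of peptides, or None where the
    protein is not quantified.
    """
    samples = list(detected)
    if fix_peptides:
        peptides = sorted({p for s in samples for p in detected[s]})
        n_present = {p: sum(p in detected[s] for s in samples) for p in peptides}
        if top == 0:
            # Only peptides present in every sample.
            chosen = [p for p in peptides if n_present[p] == len(samples)]
        else:
            # Peptides present in most samples; ties broken by abundance rank.
            ranked = sorted(peptides, key=lambda p: (-n_present[p], p))
            chosen = sorted(ranked[:top])
        candidates = {s: [p for p in chosen if p in detected[s]] for s in samples}
    else:
        candidates = {s: sorted(detected[s]) for s in samples}

    result = {}
    for s in samples:
        used = candidates[s]
        if top > 0:
            used = used[:top]
            if len(used) < top and not include_all:
                used = None  # too few peptides, or a fixed peptide is missing
        result[s] = used
    return result


detected = {"sample 1": [1, 2, 3, 4], "sample 2": [2, 3, 4], "sample 3": [1, 3]}
print(select_peptides(detected, top=2, include_all=False, fix_peptides=True))
# {'sample 1': [1, 3], 'sample 2': None, 'sample 3': [1, 3]}
```

The printed case corresponds to the row with top 2, include_all no and consensus:fix_peptides yes, where sample 2 is not quantified because peptide 1 is missing.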
Further considerations for parameter selection
With filter_charge and average, there is a trade-off between comparability of protein abundances within a sample and of abundances for the same protein across different samples.
Setting filter_charge may increase reproducibility between samples, but will distort the proportions of protein abundances within a sample. The reason is that ionization properties vary between peptides, but should remain constant across samples. Filtering by charge state can help to reduce the impact of feature detection differences between samples.
For average, there is a qualitative difference between (intensity weighted) mean/median and sum in the effect that missing peptide abundances have (only if include_all is set or top is 0): (intensity weighted) mean and median ignore missing cases, averaging only present values. If low-abundant peptides are not detected in some samples, the computed protein abundances for those samples may thus be too high. In contrast, sum implicitly treats missing values as zero, so this problem does not occur and comparability across samples is ensured. However, with sum the total number of peptides ("summands") available for a protein may affect the abundances computed for it (depending on top), so the abundances of different proteins within a sample may no longer be proportional.
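A small numeric illustration of this difference, using made-up abundances and a hypothetical helper function (the intensity weighted variant is left out):

```python
from statistics import mean, median


def combine(values, method):
    """Combine peptide abundances; None marks a peptide that was not detected."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    if method == "sum":
        return sum(present)  # missing values contribute nothing, i.e. zero
    if method == "mean":
        return mean(present)  # missing values are ignored
    if method == "median":
        return median(present)  # missing values are ignored
    raise ValueError(f"unknown method: {method}")


# Same protein in two samples; the weakest peptide is missed in sample B.
sample_a = [900.0, 300.0, 60.0]
sample_b = [900.0, 300.0, None]

for method in ("mean", "median", "sum"):
    print(method, combine(sample_a, method), combine(sample_b, method))
# mean 420.0 600.0
# median 300.0 600.0
# sum 1260.0 1200.0
```

Although the two detected peptides have identical abundances in both samples, mean and median report a clearly higher protein abundance for sample B, whereas sum changes only by the small contribution of the missing peptide.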
OpenMS / TOPP release 2.0.0 | Documentation generated on Wed Mar 30 2016 12:49:26 using doxygen 1.8.11 |