Package picard.sam

Class DuplicationMetrics


  • public class DuplicationMetrics
    extends MergeableMetricBase
    Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.
    • Field Detail

      • LIBRARY

        public String LIBRARY
        The library on which the duplicate marking was performed.
      • UNPAIRED_READS_EXAMINED

        public long UNPAIRED_READS_EXAMINED
        The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
      • READ_PAIRS_EXAMINED

        public long READ_PAIRS_EXAMINED
        The number of mapped read pairs examined. (Primary, non-supplemental)
      • SECONDARY_OR_SUPPLEMENTARY_RDS

        public long SECONDARY_OR_SUPPLEMENTARY_RDS
        The number of reads that were either secondary or supplementary.
      • UNMAPPED_READS

        public long UNMAPPED_READS
        The total number of unmapped reads examined. (Primary, non-supplemental)
      • UNPAIRED_READ_DUPLICATES

        public long UNPAIRED_READ_DUPLICATES
        The number of fragments that were marked as duplicates.
      • READ_PAIR_DUPLICATES

        public long READ_PAIR_DUPLICATES
        The number of read pairs that were marked as duplicates.
      • READ_PAIR_OPTICAL_DUPLICATES

        public long READ_PAIR_OPTICAL_DUPLICATES
        The number of read pair duplicates that were caused by optical duplication. Value is always <= READ_PAIR_DUPLICATES, which counts all duplicates regardless of source.
      • PERCENT_DUPLICATION

        public Double PERCENT_DUPLICATION
        The fraction of mapped sequence that is marked as duplicate.
      • ESTIMATED_LIBRARY_SIZE

        public Long ESTIMATED_LIBRARY_SIZE
        The estimated number of unique molecules in the library, based on paired-end (PE) duplication.
    • Constructor Detail

      • DuplicationMetrics

        public DuplicationMetrics()
    • Method Detail

      • calculateDerivedFields

        public void calculateDerivedFields()
        Fills in ESTIMATED_LIBRARY_SIZE, where possible, from the paired read data examined, and fills in PERCENT_DUPLICATION.
        Overrides:
        calculateDerivedFields in class MergeableMetricBase
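
        As an illustration of the "fraction of mapped sequence that is marked as duplicate", the following is a minimal stand-alone sketch and not Picard's code. The class and method names are hypothetical. It assumes that each read pair contributes two reads to both the examined and duplicate totals, and that an empty input yields 0.0; neither assumption is stated on this page.

```java
/** Hypothetical stand-alone sketch; not part of Picard. */
final class DuplicationFractionSketch {

    /**
     * Fraction of examined mapped reads that were marked as duplicates.
     * Assumption: each read pair counts as two reads in both totals.
     */
    static double duplicationFraction(long unpairedReadsExamined,
                                      long readPairsExamined,
                                      long unpairedReadDuplicates,
                                      long readPairDuplicates) {
        long examined = unpairedReadsExamined + 2 * readPairsExamined;
        long duplicates = unpairedReadDuplicates + 2 * readPairDuplicates;
        // Assumption: report 0.0 rather than NaN when nothing was examined.
        return examined == 0 ? 0.0 : (double) duplicates / examined;
    }

    public static void main(String[] args) {
        // 100 unpaired reads and 450 pairs examined; 10 unpaired and 45 pair duplicates.
        System.out.println(duplicationFraction(100, 450, 10, 45));
    }
}
```

        Note that the result is a fraction between 0 and 1, matching the field description, even though the field name says PERCENT.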
      • calculateDerivedMetrics

        @Deprecated
        public void calculateDerivedMetrics()
        Deprecated.
        Fills in ESTIMATED_LIBRARY_SIZE, where possible, from the paired read data examined, and fills in PERCENT_DUPLICATION.

        Deprecated; use calculateDerivedFields() instead.

      • estimateLibrarySize

        public static Long estimateLibrarySize​(long readPairs,
                                               long uniqueReadPairs)
        Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.

        Based on the Lander-Waterman equation, which states:

            C/X = 1 - exp( -N/X )

        where
            X = number of distinct molecules in the library
            N = number of read pairs
            C = number of distinct fragments observed in read pairs
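
        The equation has no closed-form solution for X, so it has to be solved numerically. The following is a minimal sketch of one way to do that, using bisection on the residual C/X - 1 + exp(-N/X). It is not presented as Picard's implementation: the class name, the double return type, and the exception for inputs without duplicates are all assumptions made for the example.

```java
/** Hypothetical stand-alone sketch; not part of Picard. */
final class LibrarySizeSketch {

    /** Residual of the equation: C/X - 1 + exp(-N/X). Zero where it holds. */
    static double residual(double x, double c, double n) {
        // expm1(t) = exp(t) - 1, which is more accurate when N/X is small.
        return c / x + Math.expm1(-n / x);
    }

    /**
     * Solves C/X = 1 - exp(-N/X) for X by bisection.
     * Assumption: inputs with no duplicates are rejected with an exception.
     */
    static double estimateLibrarySize(long readPairs, long uniqueReadPairs) {
        if (uniqueReadPairs <= 0 || uniqueReadPairs >= readPairs) {
            throw new IllegalArgumentException("need 0 < uniqueReadPairs < readPairs");
        }
        double c = uniqueReadPairs;
        double n = readPairs;

        // At X = C the residual is exp(-N/C) > 0, so C is a valid lower bound.
        double lo = c;
        // Grow the upper bound until the residual is no longer positive.
        double hi = 10.0 * c;
        while (residual(hi, c, n) > 0) {
            hi *= 10.0;
        }

        // Halve the bracket until it is far narrower than one molecule.
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2.0;
            if (residual(mid, c, n) > 0) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (lo + hi) / 2.0;
    }

    public static void main(String[] args) {
        // 5000 read pairs, of which 3935 were distinct.
        System.out.println(estimateLibrarySize(5000, 3935));
    }
}
```

        Bisection is chosen here because the residual is positive at X = C and negative for sufficiently large X whenever C < N, so a sign change is guaranteed inside the bracket.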

      • estimateRoi

        public static double estimateRoi​(long estimatedLibrarySize,
                                         double x,
                                         long pairs,
                                         long uniquePairs)
        Estimates the ROI (return on investment) that one would see if a library were sequenced to x times the observed coverage.
        Parameters:
        estimatedLibrarySize - the estimated number of molecules in the library
        x - the multiple of the observed sequencing to simulate (for example, 2 for twice as many read pairs)
        pairs - the number of pairs observed in the actual sequencing
        uniquePairs - the number of unique pairs observed in the actual sequencing
        Returns:
        an estimate z, with z <= x, such that sequencing pairs*x read pairs would be expected to yield uniquePairs*z unique pairs.
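
        One way to obtain such a z is to apply the Lander-Waterman relation at the larger read count: the expected number of distinct pairs after pairs*x read pairs is X * (1 - exp(-pairs*x / X)), and dividing by uniquePairs gives the multiple. The sketch below follows that reasoning; it is an illustration derived from the equation, with a hypothetical class name, and is not presented as Picard's code.

```java
/** Hypothetical stand-alone sketch; not part of Picard. */
final class RoiSketch {

    /**
     * Expected unique pairs at x times the observed sequencing,
     * expressed as a multiple of the unique pairs actually observed.
     */
    static double estimateRoi(long estimatedLibrarySize, double x,
                              long pairs, long uniquePairs) {
        // Lander-Waterman with N replaced by x * pairs.
        double expectedUnique = estimatedLibrarySize
                * (1.0 - Math.exp(-(x * pairs) / estimatedLibrarySize));
        return expectedUnique / uniquePairs;
    }

    public static void main(String[] args) {
        // Library of 10000 molecules; 5000 pairs observed, 3935 of them unique.
        System.out.println(estimateRoi(10000, 2.0, 5000, 3935));
    }
}
```

        Because the curve saturates as the library is exhausted, doubling the sequencing yields less than double the unique pairs, which is why z does not exceed x.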
      • calculateRoiHistogram

        public htsjdk.samtools.util.Histogram<Double> calculateRoiHistogram()
        Calculates a histogram using the estimateRoi method to estimate the effective yield of sequencing at x times the observed amount, for x = 1..10.
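
        The loop over x = 1..10 can be sketched as follows. To stay self-contained, the example uses a java.util.TreeMap as a stand-in for htsjdk's Histogram and repeats a hypothetical ROI helper derived from the Lander-Waterman equation; the names and the map-based return type are assumptions, not Picard's API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-alone sketch; not part of Picard or htsjdk. */
final class RoiHistogramSketch {

    /** Expected unique pairs at x times the sequencing, relative to those observed. */
    static double estimateRoi(long librarySize, double x, long pairs, long uniquePairs) {
        return librarySize * (1.0 - Math.exp(-(x * pairs) / librarySize)) / uniquePairs;
    }

    /** Maps each sequencing multiple x = 1..10 to its estimated ROI. */
    static Map<Double, Double> roiByMultiple(long librarySize, long pairs, long uniquePairs) {
        Map<Double, Double> bins = new TreeMap<>();
        for (double x = 1; x <= 10; x++) {
            bins.put(x, estimateRoi(librarySize, x, pairs, uniquePairs));
        }
        return bins;
    }

    public static void main(String[] args) {
        // Print one line per multiple: x, then the estimated ROI.
        roiByMultiple(10000, 5000, 3935)
                .forEach((x, roi) -> System.out.println(x + "\t" + roi));
    }
}
```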
      • main

        public static void main​(String[] args)