Available genetic codes
[1]:
from cogent3 import available_codes
available_codes()
[1]:
| Code ID | Name |
|---|---|
| 1 | Standard Nuclear |
| 2 | Vertebrate Mitochondrial |
| 3 | Yeast Mitochondrial |
| 4 | Mold, Protozoan, and Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma Nuclear |
| 5 | Invertebrate Mitochondrial |
| 6 | Ciliate, Dasycladacean and Hexamita Nuclear |
| 9 | Echinoderm and Flatworm Mitochondrial |
| 10 | Euplotid Nuclear |
| 11 | Bacterial Nuclear and Plant Plastid |
| 12 | Alternative Yeast Nuclear |
| 13 | Ascidian Mitochondrial |
| 14 | Alternative Flatworm Mitochondrial |
| 15 | Blepharisma Nuclear |
| 16 | Chlorophycean Mitochondrial |
| 20 | Trematode Mitochondrial |
| 22 | Scenedesmus obliquus Mitochondrial |
| 23 | Thraustochytrium Mitochondrial |
17 rows x 2 columns
Where a cogent3 object method has a gc argument, you can simply pass the number from the “Code ID” column.
For example:
[2]:
from cogent3 import load_aligned_seqs
nt_seqs = load_aligned_seqs("../data/brca1-bats.fasta", moltype="dna")
nt_seqs[:21]
[2]:
| | 0 |
|---|---|
| TombBat | TGTGGCACAAGTACTCATGCC |
| FlyingFox | ..........A.G........ |
| DogFaced | ..........A.......... |
| FreeTaile | .........GA.......... |
| LittleBro | .........GA.......... |
5 x 21 dna alignment
We specify the genetic code with gc=1 and set incomplete_ok=True, so that codons left incomplete because they contain a gap are converted to ? instead of causing an error.
[3]:
aa_seqs = nt_seqs.get_translation(gc=1, incomplete_ok=True)
aa_seqs[:20]
[3]:
| | 0 |
|---|---|
| TombBat | CGTSTHASSVQHENSSLLLT |
| FlyingFox | ...NA....L....-...Y. |
| DogFaced | ...N...N.L........Y. |
| FreeTaile | ...D.....L.......... |
| LittleBro | ...D.....L.......... |
5 x 20 protein alignment
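To make the effect of incomplete_ok concrete, here is a minimal pure-Python sketch of codon-by-codon translation. It is not cogent3's implementation. The helper name translate_gapped, the raising of ValueError when incomplete codons are not allowed, and the use of NCBI's compact 64-character amino-acid string for the standard code are assumptions chosen to mirror the behaviour shown above: a fully gapped codon becomes "-" and a partly gapped codon becomes "?".

```python
from itertools import product

BASES = "TCAG"
# NCBI-style amino-acid string for the standard code (ID 1); codons ordered TCAG,
# with the third base varying fastest ('*' marks a stop codon)
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def translate_gapped(seq: str, incomplete_ok: bool = False) -> str:
    """Translate a gapped DNA sequence codon by codon (hypothetical helper).

    Ambiguity codes such as N are not handled in this sketch (KeyError).
    """
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        codon = seq[i : i + 3].upper()
        if codon == "---":
            protein.append("-")  # a fully gapped codon stays a gap
        elif "-" in codon:
            if not incomplete_ok:
                raise ValueError(f"incomplete codon {codon!r} at position {i}")
            protein.append("?")  # partial codon: amino acid is unknown
        else:
            protein.append(CODON_TABLE[codon])
    return "".join(protein)


print(translate_gapped("TGTGGCACA---AG-", incomplete_ok=True))  # → CGT-?
```

The first three codons are the start of the TombBat sequence, so the sketch reproduces its leading CGT; with incomplete_ok left at False, the trailing AG- codon would raise an error instead.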
Getting a genetic code with get_code()
You can also call get_code() directly to obtain a genetic code. Here we get the code with ID 4.
[4]:
from cogent3 import get_code
gc = get_code(4)
gc
[4]:
| aa | IUPAC code | codons |
|---|---|---|
| Alanine | A | GCT,GCC,GCA,GCG |
| Cysteine | C | TGT,TGC |
| Aspartic Acid | D | GAT,GAC |
| Glutamic Acid | E | GAA,GAG |
| Phenylalanine | F | TTT,TTC |
| Glycine | G | GGT,GGC,GGA,GGG |
| Histidine | H | CAT,CAC |
| Isoleucine | I | ATT,ATC,ATA |
| Lysine | K | AAA,AAG |
| Leucine | L | TTA,TTG,CTT,CTC,CTA,CTG |
| Methionine | M | ATG |
| Asparagine | N | AAT,AAC |
| Proline | P | CCT,CCC,CCA,CCG |
| Glutamine | Q | CAA,CAG |
| Arginine | R | CGT,CGC,CGA,CGG,AGA,AGG |
| Serine | S | TCT,TCC,TCA,TCG,AGT,AGC |
| Threonine | T | ACT,ACC,ACA,ACG |
| Valine | V | GTT,GTC,GTA,GTG |
| Tryptophan | W | TGA,TGG |
| Tyrosine | Y | TAT,TAC |
| STOP | * | TAA,TAG |
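Comparing this table with the standard code shows the key difference of code 4: TGA is read as tryptophan rather than as a stop. As a hedged illustration (not cogent3's internals), the sketch below keeps two codes in a hypothetical dictionary keyed by their integer ID, storing each as an NCBI-style 64-character amino-acid string, then groups codons by amino acid as in the "codons" column above and lists the codons whose meaning differs between two codes. The names CODES, codon_map, codons_by_aa and differences are all illustrative.

```python
from itertools import product

BASES = "TCAG"
# Hypothetical registry: code ID -> NCBI-style amino-acid string (codons ordered TCAG)
CODES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}
# All 64 codons in TCAG order, third base varying fastest
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def codon_map(code_id: int) -> dict[str, str]:
    """Map each codon to its one-letter amino acid ('*' = stop)."""
    return dict(zip(CODONS, CODES[code_id]))


def codons_by_aa(code_id: int) -> dict[str, list[str]]:
    """Group codons by amino acid, like the 'codons' column of the table."""
    groups: dict[str, list[str]] = {}
    for codon, aa in codon_map(code_id).items():
        groups.setdefault(aa, []).append(codon)
    return groups


def differences(a: int, b: int) -> dict[str, tuple[str, str]]:
    """Return the codons that codes a and b translate differently."""
    map_a, map_b = codon_map(a), codon_map(b)
    return {c: (map_a[c], map_b[c]) for c in CODONS if map_a[c] != map_b[c]}


print(codons_by_aa(4)["W"])  # → ['TGA', 'TGG']
print(differences(1, 4))  # → {'TGA': ('*', 'W')}
```

Because the codons are generated in TCAG order, the grouped lists come out in the same order as the table above (for example, leucine is TTA,TTG,CTT,CTC,CTA,CTG).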