{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Apply a non-stationary nucleotide model to an alignment with 3 sequences\n",
    "\n",
    "We load some sample data first and select just 3 sequences, all via ``apps``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Human', 'Rhesus', 'Galago']"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from cogent3.app import io, sample\n",
    "\n",
    "reader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n",
    "select_seqs = sample.take_named_seqs(\"Human\", \"Rhesus\", \"Galago\")\n",
    "process = reader + select_seqs\n",
    "aln = process(\"../data/primate_brca1.fasta\")\n",
    "aln.names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We analyses these using the general Markov nucleotide, GN, model. Because we analyse just 3 sequences, there is only one possible unrooted tree. It's not required to specify the tree in this instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "model(type='model', sm='GN', tree=None, name=None, sm_args=None, lf_args=None, time_het=None, param_rules=None, opt_args=None, split_codons=False, show_progress=False, verbose=False)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from cogent3.app import evo\n",
    "\n",
    "gn = evo.model(\"GN\")\n",
    "gn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We apply this to `aln`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "cogent3.app.result.model_result"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted = gn(aln)\n",
    "type(fitted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `model_result`\n",
    "\n",
    "As the output above indicates, `fitted` is a `model_result` object.\n",
    "\n",
    "This object provides an interface for accessing attributes of a fitted model. The representation display (below), a styled table in a jupyter notebook, presents a summary view with the log-likelihood (`lnL`), number of free parameters (`nfp`) and whether all matrices satisfied the identifiability conditions diagonal largest in column (DLC) and a unique mapping of Q to P. (For description of these quantities and why they matter see [Chang 1996](https://www.ncbi.nlm.nih.gov/pubmed/?term=8854662) and [Kaehler et al](http://www.ncbi.nlm.nih.gov/pubmed/25503772).)\n",
    "\n",
    "`model_result` has dictionary behaviour, hence the `key` column. This will be demonstrated below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">GN</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>key</th>\n",
       "<th>lnL</th>\n",
       "<th>nfp</th>\n",
       "<th>DLC</th>\n",
       "<th>unique_Q</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td></td>\n",
       "<td style=\"font-family: monospace !important;\">-5964.0129</td>\n",
       "<td style=\"font-family: monospace !important;\">17</td>\n",
       "<td>True</td>\n",
       "<td>True</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "GN\n",
       "============================================\n",
       "key           lnL    nfp     DLC    unique_Q\n",
       "--------------------------------------------\n",
       "       -5964.0129     17    True        True\n",
       "--------------------------------------------\n",
       "\n",
       "1 rows x 5 columns"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More detail on the fitted model are available via attributes. For instance, display the maximum likelihood estimates via the likelihood function attribute"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h4>GN</h4>\n",
       "<p>log-likelihood = -5964.0129</p>\n",
       "<p>number of free parameters = 17</p>\n",
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">Global params</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>A&gt;C</th>\n",
       "<th>A&gt;G</th>\n",
       "<th>A&gt;T</th>\n",
       "<th>C&gt;A</th>\n",
       "<th>C&gt;G</th>\n",
       "<th>C&gt;T</th>\n",
       "<th>G&gt;A</th>\n",
       "<th>G&gt;C</th>\n",
       "<th>G&gt;T</th>\n",
       "<th>T&gt;A</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">1.0628</td>\n",
       "<td style=\"font-family: monospace !important;\">3.1833</td>\n",
       "<td style=\"font-family: monospace !important;\">1.0207</td>\n",
       "<td style=\"font-family: monospace !important;\">1.7952</td>\n",
       "<td style=\"font-family: monospace !important;\">2.3276</td>\n",
       "<td style=\"font-family: monospace !important;\">5.6847</td>\n",
       "<td style=\"font-family: monospace !important;\">9.0910</td>\n",
       "<td style=\"font-family: monospace !important;\">1.1136</td>\n",
       "<td style=\"font-family: monospace !important;\">0.8313</td>\n",
       "<td style=\"font-family: monospace !important;\">1.4997</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n",
       "<table>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>T&gt;C</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">3.5575</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n",
       "\n",
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">Edge params</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>edge</th>\n",
       "<th>parent</th>\n",
       "<th>length</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"background: rgba(161, 195, 209, 0.25); font-weight: 600;\">Human</td>\n",
       "<td>root</td>\n",
       "<td style=\"font-family: monospace !important;\">0.0214</td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"background: rgba(161, 195, 209, 0.25); font-weight: 600;\">Rhesus</td>\n",
       "<td>root</td>\n",
       "<td style=\"font-family: monospace !important;\">0.0208</td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"background: rgba(161, 195, 209, 0.25); font-weight: 600;\">Galago</td>\n",
       "<td>root</td>\n",
       "<td style=\"font-family: monospace !important;\">0.1781</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n",
       "\n",
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">Motif params</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>A</th>\n",
       "<th>C</th>\n",
       "<th>G</th>\n",
       "<th>T</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">0.3740</td>\n",
       "<td style=\"font-family: monospace !important;\">0.1755</td>\n",
       "<td style=\"font-family: monospace !important;\">0.2098</td>\n",
       "<td style=\"font-family: monospace !important;\">0.2408</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "GN\n",
       "log-likelihood = -5964.0129\n",
       "number of free parameters = 17\n",
       "============================================================================\n",
       "   A>C       A>G       A>T       C>A       C>G       C>T       G>A       G>C\n",
       "----------------------------------------------------------------------------\n",
       "1.0628    3.1833    1.0207    1.7952    2.3276    5.6847    9.0910    1.1136\n",
       "----------------------------------------------------------------------------\n",
       "\n",
       "continued: \n",
       "==========================\n",
       "   G>T       T>A       T>C\n",
       "--------------------------\n",
       "0.8313    1.4997    3.5575\n",
       "--------------------------\n",
       "\n",
       "==========================\n",
       "  edge    parent    length\n",
       "--------------------------\n",
       " Human      root    0.0214\n",
       "Rhesus      root    0.0208\n",
       "Galago      root    0.1781\n",
       "--------------------------\n",
       "====================================\n",
       "     A         C         G         T\n",
       "------------------------------------\n",
       "0.3740    0.1755    0.2098    0.2408\n",
       "------------------------------------"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted.lf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(-5964.012906757179, 17)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted.lnL, fitted.nfp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'../data/primate_brca1.fasta'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted.source"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `model_result.tree` attribute is an \"annotated tree\". Maximum likelihood estimates from the model have been assigned to the tree. Of particular significance, the \"length\" attribute corresponds to the expected number of substitutions (or ENS). For a non-stationary model, like GN, this can be different to the conventional length ([Kaehler et al](http://www.ncbi.nlm.nih.gov/pubmed/25503772))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Tree(\"(Human,Rhesus,Galago)root;\"),\n",
       " 3 x 2814 dna alignment: Human[TGTGGCACAAA...], Rhesus[TGTGGCACAAA...], Galago[TGTGGCAAAAA...])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted.tree, fitted.alignment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access the sum of all branch lengths. Either as \"ENS\" or \"paralinear\" using the `total_length()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9280297908804234"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted.total_length(length_as=\"paralinear\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fitting a separate nucleotide model to each codon position\n",
    "\n",
    "Controlled by setting `split_codons=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">GN</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>key</th>\n",
       "<th>lnL</th>\n",
       "<th>nfp</th>\n",
       "<th>DLC</th>\n",
       "<th>unique_Q</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td></td>\n",
       "<td style=\"font-family: monospace !important;\">-5866.9371</td>\n",
       "<td style=\"font-family: monospace !important;\">51</td>\n",
       "<td>True</td>\n",
       "<td>True</td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">1</td>\n",
       "<td style=\"font-family: monospace !important;\">-1955.5141</td>\n",
       "<td style=\"font-family: monospace !important;\">17</td>\n",
       "<td></td>\n",
       "<td></td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">2</td>\n",
       "<td style=\"font-family: monospace !important;\">-1934.2598</td>\n",
       "<td style=\"font-family: monospace !important;\">17</td>\n",
       "<td></td>\n",
       "<td></td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">3</td>\n",
       "<td style=\"font-family: monospace !important;\">-1977.1632</td>\n",
       "<td style=\"font-family: monospace !important;\">17</td>\n",
       "<td></td>\n",
       "<td></td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "GN\n",
       "============================================\n",
       "key           lnL    nfp     DLC    unique_Q\n",
       "--------------------------------------------\n",
       "       -5866.9371     51    True        True\n",
       "  1    -1955.5141     17                    \n",
       "  2    -1934.2598     17                    \n",
       "  3    -1977.1632     17                    \n",
       "--------------------------------------------\n",
       "\n",
       "4 rows x 5 columns"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gn = evo.model(\"GN\", split_codons=True)\n",
    "\n",
    "fitted = gn(aln)\n",
    "fitted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model fit statistics, `lnL` and `nfp` are now sums of the corresponding values from the fits to the individual positions. The `DLC` and `unique_Q` are also a summary across all models. These only achieve the value `True` when all matrices, from all models, satisfy the condition.\n",
    "\n",
    "We get access to the likelihood functions of the individual positions via the indicated dict keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h4>3</h4>\n",
       "<p>log-likelihood = -1977.1632</p>\n",
       "<p>number of free parameters = 17</p>\n",
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">Global params</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>A&gt;C</th>\n",
       "<th>A&gt;G</th>\n",
       "<th>A&gt;T</th>\n",
       "<th>C&gt;A</th>\n",
       "<th>C&gt;G</th>\n",
       "<th>C&gt;T</th>\n",
       "<th>G&gt;A</th>\n",
       "<th>G&gt;C</th>\n",
       "<th>G&gt;T</th>\n",
       "<th>T&gt;A</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">1.7484</td>\n",
       "<td style=\"font-family: monospace !important;\">3.4215</td>\n",
       "<td style=\"font-family: monospace !important;\">2.4111</td>\n",
       "<td style=\"font-family: monospace !important;\">2.3729</td>\n",
       "<td style=\"font-family: monospace !important;\">3.4060</td>\n",
       "<td style=\"font-family: monospace !important;\">16.1603</td>\n",
       "<td style=\"font-family: monospace !important;\">16.9889</td>\n",
       "<td style=\"font-family: monospace !important;\">2.0343</td>\n",
       "<td style=\"font-family: monospace !important;\">1.7432</td>\n",
       "<td style=\"font-family: monospace !important;\">1.9241</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n",
       "<table>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>T&gt;C</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">5.2795</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n",
       "\n",
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">Edge params</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>edge</th>\n",
       "<th>parent</th>\n",
       "<th>length</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"background: rgba(161, 195, 209, 0.25); font-weight: 600;\">Human</td>\n",
       "<td>root</td>\n",
       "<td style=\"font-family: monospace !important;\">0.0243</td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"background: rgba(161, 195, 209, 0.25); font-weight: 600;\">Rhesus</td>\n",
       "<td>root</td>\n",
       "<td style=\"font-family: monospace !important;\">0.0319</td>\n",
       "</tr>\n",
       "<tr>\n",
       "<td style=\"background: rgba(161, 195, 209, 0.25); font-weight: 600;\">Galago</td>\n",
       "<td>root</td>\n",
       "<td style=\"font-family: monospace !important;\">0.1797</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n",
       "\n",
       "<table>\n",
       "<style>\n",
       "tr:last-child {border-bottom: 1px solid #000;} tr > th {text-align: center !important;} tr > td {text-align: left !important;}\n",
       "</style>\n",
       "<caption style=\"color: rgb(250, 250, 250); background: rgba(30, 140, 200, 1); align=top;\"><span style=\"font-weight: bold;\">Motif params</span><span></span></caption>\n",
       "<thead style=\"background: rgba(161, 195, 209, 0.75); font-weight: bold; text-align: center;\">\n",
       "<th>A</th>\n",
       "<th>C</th>\n",
       "<th>G</th>\n",
       "<th>T</th>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr>\n",
       "<td style=\"font-family: monospace !important;\">0.3375</td>\n",
       "<td style=\"font-family: monospace !important;\">0.1395</td>\n",
       "<td style=\"font-family: monospace !important;\">0.1647</td>\n",
       "<td style=\"font-family: monospace !important;\">0.3583</td>\n",
       "</tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "3\n",
       "log-likelihood = -1977.1632\n",
       "number of free parameters = 17\n",
       "==============================================================================\n",
       "   A>C       A>G       A>T       C>A       C>G        C>T        G>A       G>C\n",
       "------------------------------------------------------------------------------\n",
       "1.7484    3.4215    2.4111    2.3729    3.4060    16.1603    16.9889    2.0343\n",
       "------------------------------------------------------------------------------\n",
       "\n",
       "continued: \n",
       "==========================\n",
       "   G>T       T>A       T>C\n",
       "--------------------------\n",
       "1.7432    1.9241    5.2795\n",
       "--------------------------\n",
       "\n",
       "==========================\n",
       "  edge    parent    length\n",
       "--------------------------\n",
       " Human      root    0.0243\n",
       "Rhesus      root    0.0319\n",
       "Galago      root    0.1797\n",
       "--------------------------\n",
       "====================================\n",
       "     A         C         G         T\n",
       "------------------------------------\n",
       "0.3375    0.1395    0.1647    0.3583\n",
       "------------------------------------"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fitted[3]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.1"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
