{ "cells": [ { "cell_type": "markdown", "id": "9c16be61-4d87-4043-b2d6-755f6d6aff32", "metadata": {}, "source": [ "# Getting Start" ] }, { "cell_type": "markdown", "id": "e351e332-1431-4bce-baa8-6abaebcda43b", "metadata": {}, "source": [ "Examples of scPhyloX and steps are provided in Jupyter Notebook under notebooks folder. For start, please refer to records analyzing simulation, hek293T, Fly organs, Human HSC/MPPs and mouse CRC tumor datasets." ] }, { "cell_type": "markdown", "id": "ea9d4836-e52f-4832-aea2-dfbfca5cb2b1", "metadata": {}, "source": [ "## ScPhyloX on New Dataset" ] }, { "cell_type": "markdown", "id": "8785bca4-a143-4602-9565-efd4e744f7c4", "metadata": {}, "source": [ "ScPhyloX provides an integrated function for Phylodynamics Inference by default whilst specific configurations might need to be adjusted accordingly." ] }, { "cell_type": "markdown", "id": "bff369a7-4aea-4b4f-9490-ff543ab8c5a0", "metadata": {}, "source": [ "### 1. import package" ] }, { "cell_type": "markdown", "id": "966984ec-0692-4bca-9643-0ef653142ef5", "metadata": {}, "source": [ "```import scPhyloX as spx```" ] }, { "cell_type": "markdown", "id": "a2d181d5-4fdf-44a3-b020-0b3af9fcfda3", "metadata": {}, "source": [ "### 2. Data preprocessing" ] }, { "cell_type": "markdown", "id": "0559c551-d325-4741-9ac5-be384e2a1268", "metadata": {}, "source": [ "#### 2.1 Analysis with phylogenetic tree (rooted, with branch length)" ] }, { "cell_type": "markdown", "id": "14b6e02d-2ba5-4ec1-b992-74780d003caa", "metadata": {}, "source": [ "```\n", "from bio import Phylo\n", "tree = Phylo.read('path_to_your_tree.nwk', format='newick')\n", "depths = tree.get_depths()\n", "lr_dist = [depths[i] for i in tree.get_terminals()]\n", "lp_dist = [i.branch_length for i in tree.get_terminals()]\n", "```\n" ] }, { "cell_type": "markdown", "id": "9a620051-f811-4f0a-8403-52d4341bcb2f", "metadata": {}, "source": [ "#### 2.2 Analysis with character matrix" ] }, { "cell_type": "markdown", "id": "5fe8be5f-3053-4e6c-87d4-9604cd8aeac0", "metadata": {}, "source": [ "```\n", "import pandas as pd\n", "charater_matrix = pd.read_csv('path_to_your_mat.csv', index_col=0)\n", "lr_dist = spx.data_factory.get_mutnum(charater_matrix)\n", "lp_dist = spx.data_factory.get_branchlen(charater_matrix)\n", "```" ] }, { "cell_type": "markdown", "id": "95ae22d9-839d-42c8-8aec-588d0309d5b1", "metadata": {}, "source": [ "### 3. Estimation of mutation rate and cell generation" ] }, { "cell_type": "markdown", "id": "0b24d660-c9bb-45d0-a622-1b6d1e98e568", "metadata": {}, "source": [ "```\n", "import arviz as az\n", "idata_bl = spx.est_mr.mutation_rate_mcmc(lp_dist, draw=500, tune=500)\n", "ge = spx.est_mr.GenerationEst(lr_dist, az.summary(idata_bl).loc['mu']['mean'])\n", "gen_num = ge.estimate(cell_number) ## Cell number is the total number of cells in the study tissue, determined by prior knowledge.\n", "```" ] }, { "cell_type": "markdown", "id": "fc01b387-3507-4cb3-a307-934629323104", "metadata": {}, "source": [ "### 4. Perform phylodynamics inference" ] }, { "cell_type": "markdown", "id": "e062caeb-32f0-4664-bee7-31e89981c0a5", "metadata": {}, "source": [ "#### 4.1 Tissue development model" ] }, { "cell_type": "markdown", "id": "f558bb71-b353-4d7d-a71b-434af2734e55", "metadata": {}, "source": [ "```\n", "res_de = spx.est_tissue.para_inference_DE(gen_num, T=35) ## T is the time of tissue growth, unlimited time unit.\n", "axh, bxh, rh, dh, kh, t0h, c0 = res_de[0][-1]\n", "dh = 10**(-dh)\n", "mcmc_prior = (axh, bxh, rh, dh, kh, t0h)\n", "idata = spx.est_tissue.mcmc_inference(gen_num, mcmc_prior, T=35, c0=c0, sigma=100) ## sigma is hyper-parameter, determined by total cell number \n", "```" ] }, { "cell_type": "markdown", "id": "e25f5de5-41c0-446c-b7dc-c19854a6417a", "metadata": {}, "source": [ "#### 4.2 Tumor growth model" ] }, { "cell_type": "markdown", "id": "9ac83ecb-ccb9-41d4-8298-7b9356b316aa", "metadata": {}, "source": [ "```\n", "res_de = spx.est_tumor.para_inference_DE(gen_num, T=35, c0=100)\n", "rh, ah, sh, uh, c0 = res_de[0][-1]\n", "uh = 10**(-uh)\n", "mcmc_prior = (rh, ah, sh, uh)\n", "idata = spx.est_tumor.mcmc_inference(gen_num, mcmc_prior, T=35, c0=c0, sigma=100)\n", "```" ] }, { "cell_type": "markdown", "id": "4c0a41a5-9ffe-4c8e-8bfe-32cd9043cab9", "metadata": {}, "source": [ "### 5. Convergence analysis of MCMC" ] }, { "cell_type": "markdown", "id": "ab07d764-b08b-4711-b5fe-38f60cd9d173", "metadata": {}, "source": [ "```\n", "print(az.summary(idata_bl))\n", "print(az.summary(idata))\n", "fig = plt.figure(layout='constrained',figsize=(11,3))\n", "gs = GridSpec(2, 3, figure=fig)\n", "ax1 = fig.add_subplot(gs[:,0])\n", "ls = 'solid,dotted,dashed,dashdot'.split(',')\n", "for i, l in enumerate(ls): \n", " sns.kdeplot(idata_bl.posterior['mu'].to_numpy()[i], linestyle=l, ax=ax1, label=f'Chain {i+1}')\n", "ax1.vlines(2, 0, 1.7, color='black')\n", "ax1.legend(fontsize=10,loc=2)\n", "ax1.set_title(r'$\\mu$ distribution')\n", "\n", "ax2 = fig.add_subplot(gs[:,1])\n", "for i, l in enumerate(ls): \n", " sns.kdeplot(idata_bl.posterior['delta'].to_numpy()[i], linestyle=l, ax=ax2, label=f'Chain {i+1}')\n", "ax2.set_title(r'$\\delta$ distribution')\n", "ax3 = fig.add_subplot(gs[0,2])\n", "for i, l in enumerate(ls):\n", " ax3.plot(idata_bl.posterior['mu'][i], linestyle=l, label=f'Chain {i+1}')\n", "ax3.set_title('$\\mu$ mcmc trace')\n", "ax3.set_xlabel('step')\n", "ax4 = fig.add_subplot(gs[1,2])\n", "for i, l in enumerate(ls):\n", " ax4.plot(idata_bl.posterior['delta'][i], linestyle=l, label=f'Chain {i+1}')\n", "ax4.set_title(r'$\\delta$ mcmc trace')\n", "ax4.set_xlabel('step')\n", "\n", "fig = plt.figure(figsize=(12,7))\n", "plt.rcParams['font.size'] = 12\n", "gt = [0.5, 0.6, 0.2, 1e-3]\n", "gs = GridSpec(4, 3, figure=fig)\n", "ls = 'solid.dotted.dashed.dashdot.solid'.split('.')\n", "for ind, sym in enumerate('r,a,s,u'.split(',')):\n", " if ind >= 2:\n", " ax = fig.add_subplot(gs[2:, ind-2])\n", " else:\n", " ax = fig.add_subplot(gs[:2, ind])\n", " for i, l in enumerate(ls): \n", " sns.kdeplot(idata.posterior[sym].to_numpy()[i], linestyle=l, ax=ax, label=f'Chain {i+1}')\n", " ax.legend(fontsize=10,loc=2)\n", " ylim = ax.get_ylim()\n", " ax.vlines(gt[ind], *ylim, color='black')\n", " ax.set_ylim(ylim)\n", " ax.set_title(fr'${sym}$ distribution')\n", "\n", "ax0 = fig.add_subplot(gs[-1, 2])\n", "for i, l in enumerate(ls): \n", " ax0.plot(idata.posterior['u'][i], linestyle=l, label=f'Chain {i+1}')\n", " ax0.set_title(fr'$u$ mcmc trace')\n", "for ind, sym in enumerate('r,a,s'.split(',')):\n", " ax = fig.add_subplot(gs[ind, 2], sharex=ax0)\n", " for i, l in enumerate(ls): \n", " ax.plot(idata.posterior[sym][i], linestyle=l, label=f'Chain {i+1}')\n", " ax.set_title(fr'${sym}$ mcmc trace')\n", " plt.setp(ax.get_xticklabels(), visible=False)\n", "ax0.set_xlabel('step')\n", "plt.tight_layout()\n", "```" ] }, { "cell_type": "markdown", "id": "e9a60bfa-49e9-4908-9efa-7ff3568af4d5", "metadata": {}, "source": [ "### 6. Phylodynamics inference results visualization" ] }, { "cell_type": "markdown", "id": "9a8edfac-28e6-4a3e-95e3-b39dd96b4f3f", "metadata": {}, "source": [ "```\n", "theta_h = az.summary(idata).loc['ax,bx,r,d,k,t0'.split(',')]['mean'].to_numpy()\n", "x0 = [c0, 0]\n", "n_stemcells = np.array([[spx.est_tissue.ncyc(i, j, c0, *theta_h) for j in range(T)] for i in range(max(100, len(gen_num)))])\n", "n_nonstemcells = np.array([[spx.est_tissue.nnc(i, j, c0, *theta_h) for j in range(T)] for i in range(max(100, len(gen_num)))])\n", "fig, ax = plt.subplots()\n", "ax.set_xlabel('Time')\n", "ax.set_ylabel('Cell number')\n", "ax.plot(n_stemcells.sum(0), c='#9098d9', lw=4, label='stem cell')\n", "ax.plot(n_nonstemcells.sum(0), c='#ed9e44', lw=4, label='Non-stem cell')\n", "ax.ticklabel_format (style='sci', scilimits= (-1,2), axis='y')\n", "ax.spines['right'].set_visible(False)\n", "ax.spines['top'].set_visible(False)\n", "ax.legend(loc=2)\n", "ax.ticklabel_format (style='sci', scilimits= (-1,2), axis='y', useMathText=True)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "a273dde5-c42d-4a6e-ba22-ac2e4cb619a7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }