
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_seqalign.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_seqalign.py:

Sequence Alignment Plot
=======================

.. GENERATED FROM PYTHON SOURCE LINES 5-18

.. code-block:: Python

    from collections import Counter

    import numpy as np
    import pandas as pd

    import marsilea as ma
    import marsilea.plotter as mp

    import matplotlib as mpl
    import mpl_fontkit as fk

    fk.install("Roboto Mono", verbose=False)

    mpl.rcParams["font.size"] = 30

.. GENERATED FROM PYTHON SOURCE LINES 19-21

Load data
---------

.. GENERATED FROM PYTHON SOURCE LINES 21-24

.. code-block:: Python

    seq = ma.load_data("seq_align")
    seq = seq.iloc[:, 130:175]

.. GENERATED FROM PYTHON SOURCE LINES 25-27

Calculate the height of each amino acid.
See https://en.wikipedia.org/wiki/Sequence_logo

.. GENERATED FROM PYTHON SOURCE LINES 27-50

.. code-block:: Python

    collect = []
    for _, col in seq.items():
        collect.append(Counter(col))

    hm = pd.DataFrame(collect)
    del hm["-"]
    hm = hm.T.fillna(0.0)
    hm.columns = seq.columns
    hm /= hm.sum(axis=0)

    n = hm.shape[1]
    s = 20
    En = (1 / np.log(2)) * ((s - 1) / (2 * n))

    heights = []
    for _, col in hm.items():
        H = -(np.log2(col) * col).sum()
        R = np.log2(20) - (H + En)
        heights.append(col * R)

    logo = pd.DataFrame(heights).T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/marsilea/envs/v0.3.2/lib/python3.10/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: divide by zero encountered in log2
      result = getattr(ufunc, method)(*inputs, **kwargs)

.. GENERATED FROM PYTHON SOURCE LINES 51-53

Prepare color palette and data
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 53-96

.. code-block:: Python

    color_encode = {
        'A': '#f76ab4',
        'C': '#ff7f00',
        'D': '#e41a1c',
        'E': '#e41a1c',
        'F': '#84380b',
        'G': '#f76ab4',
        'H': '#3c58e5',
        'I': '#12ab0d',
        'K': '#3c58e5',
        'L': '#12ab0d',
        'M': '#12ab0d',
        'N': '#972aa8',
        'P': '#12ab0d',
        'Q': '#972aa8',
        'R': '#3c58e5',
        'S': '#ff7f00',
        'T': '#ff7f00',
        'V': '#12ab0d',
        'W': '#84380b',
        'Y': '#84380b',
        '-': 'white'
    }

    max_aa = []
    freq = []

    for _, col in hm.items():
        ix = np.argmax(col)
        max_aa.append(hm.index[ix])
        freq.append(col[ix])

    position = []
    mock_ticks = []
    for i in seq.columns:
        if int(i) % 10 == 0:
            position.append(i)
            mock_ticks.append("^")
        else:
            position.append("")
            mock_ticks.append("")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/marsilea/checkouts/v0.3.2/examples/plot_seqalign.py:84: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
      freq.append(col[ix])

.. GENERATED FROM PYTHON SOURCE LINES 97-99

Plot
----

.. GENERATED FROM PYTHON SOURCE LINES 99-115

.. code-block:: Python

    height = 5
    width = height * seq.shape[1] / seq.shape[0]

    ch = ma.CatHeatmap(seq.to_numpy(), palette=color_encode, height=height, width=width)
    ch.add_layer(ma.plotter.TextMesh(seq.to_numpy()))
    ch.add_top(ma.plotter.SeqLogo(logo, color_encode=color_encode), pad=.1, size=2)
    ch.add_left(ma.plotter.Labels(seq.index), pad=.1)
    ch.add_bottom(ma.plotter.Labels(mock_ticks, rotation=0), pad=.1)
    ch.add_bottom(ma.plotter.Labels(position, rotation=0))
    ch.add_bottom(ma.plotter.Numbers(freq, width=.9, color="#FFB11B", show_value=False),
                  name="freq_bar", size=2)
    ch.add_bottom(ma.plotter.Labels(max_aa, rotation=0), pad=.1)
    ch.render()

    ch.get_ax("freq_bar").set_axis_off()

.. image-sg:: /auto_examples/images/sphx_glr_plot_seqalign_001.png
   :alt: plot seqalign
   :srcset: /auto_examples/images/sphx_glr_plot_seqalign_001.png, /auto_examples/images/sphx_glr_plot_seqalign_001_2_00x.png 2.00x
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 11.654 seconds)


.. _sphx_glr_download_auto_examples_plot_seqalign.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_seqalign.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_seqalign.py `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
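
The per-position height calculation in the example above can also be sketched as a standalone, vectorized helper. This is a minimal sketch and not part of Marsilea: the function name ``logo_heights`` and its parameters ``n_seq`` and ``s`` are hypothetical. It assumes the convention that ``0 * log2(0)`` counts as 0, which avoids the ``divide by zero`` warning shown in the script output, and it takes the number of aligned sequences as an explicit argument for the small-sample correction, following the definition on the Wikipedia page linked above.

```python
import numpy as np
import pandas as pd


def logo_heights(freq: pd.DataFrame, n_seq: int, s: int = 20) -> pd.DataFrame:
    """Scale residue frequencies by per-position information content (bits).

    freq  : rows are residues, columns are alignment positions;
            each column is assumed to sum to 1.
    n_seq : number of aligned sequences (hypothetical parameter name),
            used in the small-sample correction.
    s     : alphabet size (20 for amino acids).
    """
    p = freq.to_numpy(dtype=float)
    # log2 is only evaluated where p > 0; other entries keep the 0 from `out`,
    # so absent residues contribute nothing to the entropy.
    safe_log = np.log2(p, out=np.zeros_like(p), where=p > 0)
    entropy = -(p * safe_log).sum(axis=0)           # one value per position
    correction = (1 / np.log(2)) * ((s - 1) / (2 * n_seq))
    info = np.log2(s) - (entropy + correction)      # one value per position
    # Broadcast the per-position information across the residue rows.
    return pd.DataFrame(p * info, index=freq.index, columns=freq.columns)


# Toy input: position 0 is fully conserved, position 1 is uniform.
toy = pd.DataFrame(
    {0: [1.0, 0.0, 0.0, 0.0], 1: [0.25, 0.25, 0.25, 0.25]},
    index=list("ACGT"),
)
heights = logo_heights(toy, n_seq=10, s=4)
```

The returned frame has the same layout as ``hm`` (residues by positions). Whether it can be passed to ``SeqLogo`` in place of ``logo`` has not been verified here.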