diff --git a/BINF2025_TP3.ipynb b/BINF2025_TP3.ipynb
index 61e87c2..abc318e 100644
--- a/BINF2025_TP3.ipynb
+++ b/BINF2025_TP3.ipynb
@@ -1,481 +1,598 @@
{
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
- "colab": {
- "provenance": [],
- "authorship_tag": "ABX9TyNSXnqaXAUgZK9rmJ1TWbGo"
- },
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3"
- },
- "language_info": {
- "name": "python"
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "V09wQ1WIOmgn"
+ },
+ "source": [
+ "# BINF TP3 - Algorithmes d'alignement par paire"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "er6CtAyOxC6F"
+ },
+ "source": [
+ "Dans ce TP nous allons manipuler les algorithmes d'alignement par paire."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "BqEa3BJ1xICM"
+ },
+ "source": [
+ "# Exercice 0 - Echauffement"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qqiiq5bcxYvM"
+ },
+ "source": [
+ "Q1. Donnez le score de la superposition :\n",
+ "\n",
+ "| | |\n",
+ "| :---: | :---: |\n",
+ "x | ATGTCATGA---TAC |\n",
+ "y | AT--CTAAATGTTAC |\n",
+ "\n",
+ "\n",
+ "étant donne le schéma d'évaluation :\n",
+ "\n",
+ "| | A | T | G | C |\n",
+ "| :---: | :---: | :---: | :---: | :---: |\n",
+ "| **A** | 1 | -1 | -1 | -1 |\n",
+ "| **T** | -1 | 1 | -1 | -1 |\n",
+ "| **G** | -1 | -1 | 1 | -1 |\n",
+ "| **C** | -1 | -1 | -1 | 1 |\n",
+ "\n",
+ "et\n",
+ "\n",
+ "$\\gamma(g) = 0.5 |g| + 0.5$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kCJGGGYQ2GNi"
+ },
+ "source": [
+ "```markdown\n",
+ "7 identiques - 3 different - gamma(x) - gamma(y) = 7 - 3 - 2 -1,5 = 0,5\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XyhXAhK-2NKJ"
+ },
+ "source": [
+ "Q2. Alignez les séquences suivantes avec l'algorithme de Levenshtein : x = ATG et y = ACTG."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "b9iovhyZ2bXw"
+ },
+ "source": [
+ "```markdown\n",
+ "A_TG\n",
+ "ACTG\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OV_YaQHr2elB"
+ },
+ "source": [
+ "Q3.\tAlignez les séquences suivantes avec l'algorithme de Needleman-Wunsch global x = TAT et y = ATGAC en considérant le schéma d'évaluation suivant\n",
+ "\n",
+ "| | A | T | G | C |\n",
+ "| :---: | :---: | :---: | :---: | :---: |\n",
+ "| **A** | 1 | -0.5 | -0.5 | -0.5 |\n",
+ "| **T** | -0.5 | 1 | -0.5 | -0.5 |\n",
+ "| **G** | -0.5 | -0.5 | 1 | -0.5 |\n",
+ "| **C** | -0.5 | -0.5 | -0.5 | 1 |\n",
+ "\n",
+ "et\n",
+ "\n",
+ "$\\gamma(g) = 0.5 |g|$\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "g_MrecVs3Nrw"
+ },
+ "source": [
+ "```markdown\n",
+ "Votre réponse ici\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "y1YF-G6E3Qoo"
+ },
+ "source": [
+ "Q4. Alignez les séquences suivantes avec l'algorithme de Smith-Waterman x = TTGG y = ATGAC en utilisant le schéma d'évaluation de la question précédente.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LLMECocb3pgI"
+ },
+ "source": [
+ "```markdown\n",
+ "TTGG_\n",
+ "ATGAC\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "46gw0avh3wGw"
+ },
+ "source": [
+ "# Exercice 1 : Algorithme de Levenshtein - version récursive"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZKc09Kyg4a6v"
+ },
+ "source": [
+ "Q1. Ecrivez une fonction\n",
+ "\n",
+ "levenshtein(x: str, y: str) -> int\n",
+ "\n",
+ "qui retourne la distance de Levenshtein entre les séquences x et y en utilisant la version récursive de l'algorithme."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "id": "FJR69IEQ4aHv"
+ },
+ "outputs": [],
+ "source": [
+ "#Votre code ici\n",
+ "\n",
+ "def levenshtein(x,y):\n",
+ " if len(y) == 0:\n",
+ " return len(x)\n",
+ " elif len(x) == 0:\n",
+ " return len(y)\n",
+ " elif x[0] == y[0]:\n",
+ " return levenshtein(x[1:],y[1:])\n",
+ " else:\n",
+ " return 1+ min(levenshtein(x[1:],y[1:]),levenshtein(x,y[1:]),levenshtein(x[1:],y))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "arFVwA6E5NWn"
+ },
+ "source": [
+ "Q2. Vous pouvez tester votre code sur les exemples suivants:\n",
+ "\n",
+ "\n",
+ "* $L('CCAG', 'CA') = 2$\n",
+ "* $L('CCGT', 'CGTCA') = 3$\n",
+ "* $L(AY678264^*, OQ870305^*) = 310$\n",
+ "\n",
+ "$^*$ ids genbank de deux sequences."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2\n",
+ "3\n"
+ ]
}
+ ],
+ "source": [
+ "print(levenshtein('CCAG','CA'))\n",
+ "print(levenshtein('CCGT','CGTCA'))"
+ ]
},
- "cells": [
- {
- "cell_type": "markdown",
- "source": [
- "# BINF TP3 - Algorithmes d'alignement par paire"
- ],
- "metadata": {
- "id": "V09wQ1WIOmgn"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Dans ce TP nous allons manipuler les algorithmes d'alignement par paire."
- ],
- "metadata": {
- "id": "er6CtAyOxC6F"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "# Exercice 0 - Echauffement"
- ],
- "metadata": {
- "id": "BqEa3BJ1xICM"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q1. Donnez le score de la superposition :\n",
- "\n",
- "| | |\n",
- "| :---: | :---: |\n",
- "x | ATGTCATGA---TAC |\n",
- "y | AT--CTAAATGTTAC |\n",
- "\n",
- "\n",
- "étant donne le schéma d'évaluation :\n",
- "\n",
- "| | A | T | G | C |\n",
- "| :---: | :---: | :---: | :---: | :---: |\n",
- "| **A** | 1 | -1 | -1 | -1 |\n",
- "| **T** | -1 | 1 | -1 | -1 |\n",
- "| **G** | -1 | -1 | 1 | -1 |\n",
- "| **C** | -1 | -1 | -1 | 1 |\n",
- "\n",
- "et\n",
- "\n",
- "$\\gamma(g) = 0.5 |g| + 0.5$"
- ],
- "metadata": {
- "id": "qqiiq5bcxYvM"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "kCJGGGYQ2GNi"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q2. Alignez les séquences suivantes avec l'algorithme de Levenshtein : x = ATG et y = ACTG."
- ],
- "metadata": {
- "id": "XyhXAhK-2NKJ"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "b9iovhyZ2bXw"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q3.\tAlignez les séquences suivantes avec l'algorithme de Needleman-Wunsch global x = TAT et y = ATGAC en considérant le schéma d'évaluation suivant\n",
- "\n",
- "| | A | T | G | C |\n",
- "| :---: | :---: | :---: | :---: | :---: |\n",
- "| **A** | 1 | -0.5 | -0.5 | -0.5 |\n",
- "| **T** | -0.5 | 1 | -0.5 | -0.5 |\n",
- "| **G** | -0.5 | -0.5 | 1 | -0.5 |\n",
- "| **C** | -0.5 | -0.5 | -0.5 | 1 |\n",
- "\n",
- "et\n",
- "\n",
- "$\\gamma(g) = 0.5 |g|$\n"
- ],
- "metadata": {
- "id": "OV_YaQHr2elB"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "g_MrecVs3Nrw"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q4. Alignez les séquences suivantes avec l'algorithme de Smith-Waterman x = TTGG y = ATGAC en utilisant le schéma d'évaluation de la question précédente.\n"
- ],
- "metadata": {
- "id": "y1YF-G6E3Qoo"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "LLMECocb3pgI"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "# Exercice 1 : Algorithme de Levenshtein - version récursive"
- ],
- "metadata": {
- "id": "46gw0avh3wGw"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q1. Ecrivez une fonction\n",
- "\n",
- "levenshtein(x: str, y: str) -> int\n",
- "\n",
- "qui retourne la distance de Levenshtein entre les séquences x et y en utilisant la version récursive de l'algorithme."
- ],
- "metadata": {
- "id": "ZKc09Kyg4a6v"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
- "metadata": {
- "id": "FJR69IEQ4aHv"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q2. Vous pouvez tester votre code sur les exemples suivants:\n",
- "\n",
- "\n",
- "* $L('CCAG', 'CA') = 2$\n",
- "* $L('CCGT', 'CGTCA') = 3$\n",
- "* $L(AY678264^*, OQ870305^*) = 310$\n",
- "\n",
- "$^*$ ids genbank de deux sequences."
- ],
- "metadata": {
- "id": "arFVwA6E5NWn"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "# Exercice 2 : Algorithme de Smith-Waterman - version itérative"
- ],
- "metadata": {
- "id": "erCpfG7O7BV-"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q1. Ecrivez la fonction\n",
- "\n",
- "sw_fwd(x: str, y: str, cmap: dict, sigma: array, (go, ge): list) -> (array, array)\n",
- "\n",
- "qui construit les matrices $S$ et $B$ en utilisant l'algorithme de Smith-Waterman pour aligner les séquences x et y suivant le schéma d'évaluation donné par la matrice de substitution $\\Sigma$ et la fonction d'évaluation des trous $\\gamma(n)= g_o + g_e \\times n$. Le dictionnaire cmap donne la position des différents nucléotides dans la matrice $\\Sigma$. La fonction retourne la paire de matrices de score $S$ et de retour $B$."
- ],
- "metadata": {
- "id": "rv2Y78y37IOd"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
- "metadata": {
- "id": "njn3JB0b-WHj"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q2. Ecrivez la fonction\n",
- "\n",
- "sw_bwd(x: str, y: str, S: array, B: array) -> (str, str, float)\n",
- "\n",
- "qui effectue l'etape de retour de l'algorithme de Smith-Waterman etant donné les séquences $x$ et $y$ et les matrices de score $S$ et de retour $B$. La fonction retourne un tuple contenant les alignements des séquences x et y et le score de l'alignement."
- ],
- "metadata": {
- "id": "55n8mt9U-Wai"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
- "metadata": {
- "id": "ij9JDpBm_UZ7"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q3. Vous pouvez tester votre code en utilisant le schéma d'évaluation suivant :"
- ],
- "metadata": {
- "id": "kwmxg2dxAiwS"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "cmap = {\"A\": 0, \"T\": 1, \"G\": 2, \"C\": 3}\n",
- "m = np.array([[1, -0.5, -0.5, -0.5],\n",
- " [-0.5, 1, -0.5, -0.5],\n",
- " [-0.5, -0.5, 1, -0.5],\n",
- " [-0.5, -0.5, -0.5, 1]])\n",
- "go = 0\n",
- "ge = 0.5"
- ],
- "metadata": {
- "id": "JUtYRFTBAwwZ"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "* $SW('TCGC', 'CTTAG')$ retourne un score de $1.5$ à la position $(3,5)$ et l'alignement"
- ],
- "metadata": {
- "id": "eMGh4K5aIFxE"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "HTML(\"
\")"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 60
- },
- "id": "joHNwJ9AIf6F",
- "outputId": "a9206810-a083-4d86-8b14-38183f1dd80c"
- },
- "execution_count": null,
- "outputs": [
- {
- "output_type": "execute_result",
- "data": {
- "text/plain": [
- ""
- ],
- "text/html": [
- ""
- ]
- },
- "metadata": {},
- "execution_count": 18
- }
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "erCpfG7O7BV-"
+ },
+ "source": [
+ "# Exercice 2 : Algorithme de Smith-Waterman - version itérative"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rv2Y78y37IOd"
+ },
+ "source": [
+ "Q1. Ecrivez la fonction\n",
+ "\n",
+ "sw_fwd(x: str, y: str, cmap: dict, sigma: array, (go, ge): list) -> (array, array)\n",
+ "\n",
+ "qui construit les matrices $S$ et $B$ en utilisant l'algorithme de Smith-Waterman pour aligner les séquences x et y suivant le schéma d'évaluation donné par la matrice de substitution $\\Sigma$ et la fonction d'évaluation des trous $\\gamma(n)= g_o + g_e \\times n$. Le dictionnaire cmap donne la position des différents nucléotides dans la matrice $\\Sigma$. La fonction retourne la paire de matrices de score $S$ et de retour $B$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "metadata": {
+ "id": "njn3JB0b-WHj"
+ },
+ "outputs": [],
+ "source": [
+ "#Votre code ici\n",
+ "def sw_fwd(x: str, y: str, cmap: dict, sigma, go, ge):\n",
+ " S = np.zeros((len(x)+1, len(y)+1))\n",
+ " B = np.zeros((len(x)+1, len(y)+1))\n",
+ " for i in range(len(x)+1):\n",
+ " for j in range(len(y)+1):\n",
+ " if i == 0 or j == 0:\n",
+ " continue\n",
+ " a= sigma[cmap[x[i-1]]][cmap[y[j-1]]]\n",
+ " list_b = [( ge +S[i-k][j]) for k in([1, i])]\n",
+ " b= max(list_b)\n",
+ " c=max([( ge +S[i][j-k]) for k in ([1, j])])\n",
+ " print('i', i, 'j', j, 'a', a, 'b', b, 'c', c)\n",
+ " S[i][j]= max(0, a,b,c)\n",
+ " \n",
+ " print(S)\n",
+ " return S, B"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "55n8mt9U-Wai"
+ },
+ "source": [
+ "Q2. Ecrivez la fonction\n",
+ "\n",
+ "sw_bwd(x: str, y: str, S: array, B: array) -> (str, str, float)\n",
+ "\n",
+ "qui effectue l'etape de retour de l'algorithme de Smith-Waterman etant donné les séquences $x$ et $y$ et les matrices de score $S$ et de retour $B$. La fonction retourne un tuple contenant les alignements des séquences x et y et le score de l'alignement."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "metadata": {
+ "id": "ij9JDpBm_UZ7"
+ },
+ "outputs": [],
+ "source": [
+ "#Votre code ici"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kwmxg2dxAiwS"
+ },
+ "source": [
+ "Q3. Vous pouvez tester votre code en utilisant le schéma d'évaluation suivant :"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {
+ "id": "JUtYRFTBAwwZ"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "i 1 j 1 a -0.5 b 0.5 c 0.5\n",
+ "i 1 j 2 a 1.0 b 0.5 c 1.0\n",
+ "i 1 j 3 a 1.0 b 0.5 c 1.5\n",
+ "i 1 j 4 a -0.5 b 0.5 c 2.0\n",
+ "i 1 j 5 a -0.5 b 0.5 c 2.5\n",
+ "i 2 j 1 a 1.0 b 1.0 c 0.5\n",
+ "i 2 j 2 a -0.5 b 1.5 c 1.5\n",
+ "i 2 j 3 a -0.5 b 2.0 c 2.0\n",
+ "i 2 j 4 a -0.5 b 2.5 c 2.5\n",
+ "i 2 j 5 a -0.5 b 3.0 c 3.0\n",
+ "i 3 j 1 a -0.5 b 1.5 c 0.5\n",
+ "i 3 j 2 a -0.5 b 2.0 c 2.0\n",
+ "i 3 j 3 a -0.5 b 2.5 c 2.5\n",
+ "i 3 j 4 a -0.5 b 3.0 c 3.0\n",
+ "i 3 j 5 a 1.0 b 3.5 c 3.5\n",
+ "i 4 j 1 a 1.0 b 2.0 c 0.5\n",
+ "i 4 j 2 a -0.5 b 2.5 c 2.5\n",
+ "i 4 j 3 a -0.5 b 3.0 c 3.0\n",
+ "i 4 j 4 a -0.5 b 3.5 c 3.5\n",
+ "i 4 j 5 a -0.5 b 4.0 c 4.0\n",
+ "[[0. 0. 0. 0. 0. 0. ]\n",
+ " [0. 0.5 1. 1.5 2. 2.5]\n",
+ " [0. 1. 1.5 2. 2.5 3. ]\n",
+ " [0. 1.5 2. 2.5 3. 3.5]\n",
+ " [0. 2. 2.5 3. 3.5 4. ]]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(array([[0. , 0. , 0. , 0. , 0. , 0. ],\n",
+ " [0. , 0.5, 1. , 1.5, 2. , 2.5],\n",
+ " [0. , 1. , 1.5, 2. , 2.5, 3. ],\n",
+ " [0. , 1.5, 2. , 2.5, 3. , 3.5],\n",
+ " [0. , 2. , 2.5, 3. , 3.5, 4. ]]),\n",
+ " array([[0., 0., 0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0., 0., 0.]]))"
]
+ },
+ "execution_count": 62,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cmap = {\"A\": 0, \"T\": 1, \"G\": 2, \"C\": 3}\n",
+ "m = np.array([[1, -0.5, -0.5, -0.5],\n",
+ " [-0.5, 1, -0.5, -0.5],\n",
+ " [-0.5, -0.5, 1, -0.5],\n",
+ " [-0.5, -0.5, -0.5, 1]])\n",
+ "go = 0\n",
+ "ge = 0.5\n",
+ "\n",
+ "sw_fwd('TCGC','CTTAG', cmap, m, go, ge)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eMGh4K5aIFxE"
+ },
+ "source": [
+ "* $SW('TCGC', 'CTTAG')$ retourne un score de $1.5$ à la position $(3,5)$ et l'alignement"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 60
+ },
+ "id": "joHNwJ9AIf6F",
+ "outputId": "a9206810-a083-4d86-8b14-38183f1dd80c"
+ },
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'HTML' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[1;32mIn[56], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m HTML(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
+ "\u001b[1;31mNameError\u001b[0m: name 'HTML' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "HTML(\"\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JJlU5yvZI43D"
+ },
+ "source": [
+ "* $SW(AY678264^*, OQ870305^*)$ retourne un score de $342.1$ à la position $(708,717)$ et l'alignement"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 80
},
+ "id": "HUELvWKMFtIO",
+ "outputId": "976bab6f-f1fc-4c5a-c69c-8de02fc838d0"
+ },
+ "outputs": [
{
- "cell_type": "markdown",
- "source": [
- "* $SW(AY678264^*, OQ870305^*)$ retourne un score de $342.1$ à la position $(708,717)$ et l'alignement"
- ],
- "metadata": {
- "id": "JJlU5yvZI43D"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "from IPython.display import HTML\n",
- "HTML(\"| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
\")"
+ "data": {
+ "text/html": [
+ "| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
"
],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 80
- },
- "id": "HUELvWKMFtIO",
- "outputId": "976bab6f-f1fc-4c5a-c69c-8de02fc838d0"
- },
- "execution_count": null,
- "outputs": [
- {
- "output_type": "execute_result",
- "data": {
- "text/plain": [
- ""
- ],
- "text/html": [
- "| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
"
- ]
- },
- "metadata": {},
- "execution_count": 15
- }
+ "text/plain": [
+ ""
]
- },
- {
- "cell_type": "markdown",
- "source": [
- "# Exercice 3 : Distribution des scores d’alignement pour des séquences aléatoires\n",
- "\n",
- "Pour tester si un alignement reflète une réelle similarité biologique, on va évaluer la distribution des scores d’alignement pour des paires de séquences aléatoires."
- ],
- "metadata": {
- "id": "Q5jVeLfgMMtA"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q1. En considérant deux séquences aléatoires de même taille N, où chaque nucléotide apparaît avec une probabilité uniforme de ¼, calculer le score moyen attendu pour une superposition sans trou dans le cas où une identité vaut +1 et une différence vaut 0."
- ],
- "metadata": {
- "id": "6xyXw0HsMQGf"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "meF18gt-Mhcn"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q2. La question précédente peut se resoudre analytiquement car on ne considère pas de trou. Pour étendre le résultat precedent à un alignement avec trous, on va se baser sur la simulation de séquences aleatoires.\n",
- "\n",
- "Générez $R$ paires de séquences aléatoires de tailles $N$ avec des probabilitées uniformes d'apparition de nucléotides $p_A = p_T = p_G = p_C = $ ¼. Affichez sous forme de violinplots les distribution des scores d'alignements entre chaque paire, obtenu par :\n",
- " 1. un alignement sans trou (cf. Q1) ;\n",
- " 2. un alignement local via Smith-Waterman (utilisez le code de l'exercice précédent)\n",
- "\n",
- "Utilisez le schéma d'évaluation suivant :"
- ],
- "metadata": {
- "id": "fP5_mHnYMkNI"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "rmap = {\"A\": 0, \"T\": 1, \"G\": 2, \"C\": 3}\n",
- "sigma = np.array([[1, -0.5, -0.5, -0.5],\n",
- " [-0.5, 1, -0.5, -0.5],\n",
- " [-0.5, -0.5, 1, -0.5],\n",
- " [-0.5, -0.5, -0.5, 1]])\n",
- "go =0\n",
- "ge = 0.5"
- ],
- "metadata": {
- "id": "akUVqotnOLkH"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
- "metadata": {
- "id": "UX0afNaqOVZ2"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q3. Qu'observez-vous ?"
- ],
- "metadata": {
- "id": "UNn9fUuXO4Le"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "dSQEl0XXO8IG"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Q4. Quelle conclusion peut-on en tirer sur la significativité d'un alignement ?"
- ],
- "metadata": {
- "id": "xHfVXpQhf15n"
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "```markdown\n",
- "Votre réponse ici\n",
- "```"
- ],
- "metadata": {
- "id": "5KjhEeHDgDns"
- }
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
}
- ]
-}
\ No newline at end of file
+ ],
+ "source": [
+ "from IPython.display import HTML\n",
+ "HTML(\"| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Q5jVeLfgMMtA"
+ },
+ "source": [
+ "# Exercice 3 : Distribution des scores d’alignement pour des séquences aléatoires\n",
+ "\n",
+ "Pour tester si un alignement reflète une réelle similarité biologique, on va évaluer la distribution des scores d’alignement pour des paires de séquences aléatoires."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6xyXw0HsMQGf"
+ },
+ "source": [
+ "Q1. En considérant deux séquences aléatoires de même taille N, où chaque nucléotide apparaît avec une probabilité uniforme de ¼, calculer le score moyen attendu pour une superposition sans trou dans le cas où une identité vaut +1 et une différence vaut 0."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "meF18gt-Mhcn"
+ },
+ "source": [
+ "```markdown\n",
+ "Votre réponse ici\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fP5_mHnYMkNI"
+ },
+ "source": [
+ "Q2. La question précédente peut se resoudre analytiquement car on ne considère pas de trou. Pour étendre le résultat precedent à un alignement avec trous, on va se baser sur la simulation de séquences aleatoires.\n",
+ "\n",
+ "Générez $R$ paires de séquences aléatoires de tailles $N$ avec des probabilitées uniformes d'apparition de nucléotides $p_A = p_T = p_G = p_C = $ ¼. Affichez sous forme de violinplots les distribution des scores d'alignements entre chaque paire, obtenu par :\n",
+ " 1. un alignement sans trou (cf. Q1) ;\n",
+ " 2. un alignement local via Smith-Waterman (utilisez le code de l'exercice précédent)\n",
+ "\n",
+ "Utilisez le schéma d'évaluation suivant :"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "akUVqotnOLkH"
+ },
+ "outputs": [],
+ "source": [
+ "rmap = {\"A\": 0, \"T\": 1, \"G\": 2, \"C\": 3}\n",
+ "sigma = np.array([[1, -0.5, -0.5, -0.5],\n",
+ " [-0.5, 1, -0.5, -0.5],\n",
+ " [-0.5, -0.5, 1, -0.5],\n",
+ " [-0.5, -0.5, -0.5, 1]])\n",
+ "go =0\n",
+ "ge = 0.5"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "UX0afNaqOVZ2"
+ },
+ "outputs": [],
+ "source": [
+ "#Votre code ici"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UNn9fUuXO4Le"
+ },
+ "source": [
+ "Q3. Qu'observez-vous ?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dSQEl0XXO8IG"
+ },
+ "source": [
+ "```markdown\n",
+ "Votre réponse ici\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xHfVXpQhf15n"
+ },
+ "source": [
+ "Q4. Quelle conclusion peut-on en tirer sur la significativité d'un alignement ?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5KjhEeHDgDns"
+ },
+ "source": [
+ "```markdown\n",
+ "Votre réponse ici\n",
+ "```"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "authorship_tag": "ABX9TyNSXnqaXAUgZK9rmJ1TWbGo",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}