diff --git a/BINF2025_TP3.ipynb b/BINF2025_TP3.ipynb
index 61e87c2..f9667b7 100644
--- a/BINF2025_TP3.ipynb
+++ b/BINF2025_TP3.ipynb
@@ -1,49 +1,37 @@
{
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
- "colab": {
- "provenance": [],
- "authorship_tag": "ABX9TyNSXnqaXAUgZK9rmJ1TWbGo"
- },
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3"
- },
- "language_info": {
- "name": "python"
- }
- },
"cells": [
{
"cell_type": "markdown",
- "source": [
- "# BINF TP3 - Algorithmes d'alignement par paire"
- ],
"metadata": {
"id": "V09wQ1WIOmgn"
- }
+ },
+ "source": [
+ "# BINF TP3 - Algorithmes d'alignement par paire"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Dans ce TP nous allons manipuler les algorithmes d'alignement par paire."
- ],
"metadata": {
"id": "er6CtAyOxC6F"
- }
+ },
+ "source": [
+ "Dans ce TP nous allons manipuler les algorithmes d'alignement par paire."
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "# Exercice 0 - Echauffement"
- ],
"metadata": {
"id": "BqEa3BJ1xICM"
- }
+ },
+ "source": [
+ "# Exercice 0 - Echauffement"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "qqiiq5bcxYvM"
+ },
"source": [
"Q1. Donnez le score de la superposition :\n",
"\n",
@@ -65,44 +53,72 @@
"et\n",
"\n",
"$\\gamma(g) = 0.5 |g| + 0.5$"
- ],
- "metadata": {
- "id": "qqiiq5bcxYvM"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "kCJGGGYQ2GNi"
+ },
"source": [
"```markdown\n",
- "Votre réponse ici\n",
+ "x = ATGTCATGA---TAC\n",
+ "y = AT--CTAAATGTTAC\n",
+ "\n",
+ "score de superposition = AA + TT - gamma(2) + CC + AT + TA + GA + AA - gamma(3) + TT + AA + CC\n",
+ "score de superposition = 1+1-1.5+1-1-1-1+1-2+1+1+1\n",
+ "score de superposition = 0.5\n",
"```"
- ],
- "metadata": {
- "id": "kCJGGGYQ2GNi"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Q2. Alignez les séquences suivantes avec l'algorithme de Levenshtein : x = ATG et y = ACTG."
- ],
"metadata": {
"id": "XyhXAhK-2NKJ"
- }
+ },
+ "source": [
+ "Q2. Alignez les séquences suivantes avec l'algorithme de Levenshtein : x = ATG et y = ACTG."
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "b9iovhyZ2bXw"
+ },
"source": [
+ "| | Ø | A | C | T | G |\n",
+ "| :---: | :---: | :---: | :---: | :---: | :---: |\n",
+ "| **Ø** | 0 | 1 | 2 | 3 | 4 |\n",
+ "| **A** | 1 | 0 | 1 | 2 | 3 |\n",
+ "| **T** | 2 | 1 | 1 | 1 | 2 |\n",
+ "| **G** | 3 | 2 | 2 | 2 | 1 |\n",
+ "\n",
"```markdown\n",
- "Votre réponse ici\n",
+ "D(1,1) = min(D(0,1)+1, D(1,0)+1, D(0,0)+0) = min(2, 2, 0) = 0 \n",
+ "D(1,2) = min(D(0,2)+1, D(1,1)+1, D(0,1)+1) = min(3, 1, 2) = 1\n",
+ "D(1,3) = min(D(0,3)+1, D(1,2)+1, D(0,2)+1) = min(4, 2, 3) = 2\n",
+ "D(1,4) = min(D(0,4)+1, D(1,3)+1, D(0,3)+1) = min(5, 3, 4) = 3\n",
+ "\n",
+ "D(2,1) = min(D(1,1)+1, D(2,0)+1, D(1,0)+1) = min(1, 3, 2) = 1\n",
+ "D(2,2) = min(D(1,2)+1, D(2,1)+1, D(1,1)+1) = min(2, 2, 1) = 1\n",
+ "D(2,3) = min(D(1,3)+1, D(2,2)+1, D(1,2)+0) = min(3, 2, 1) = 1\n",
+ "D(2,4) = min(D(1,4)+1, D(2,3)+1, D(1,3)+1) = min(4, 2, 3) = 2\n",
+ "\n",
+ "D(3,1) = min(D(2,1)+1, D(3,0)+1, D(2,0)+1) = min(2, 4, 3) = 2\n",
+ "D(3,2) = min(D(2,2)+1, D(3,1)+1, D(2,1)+1) = min(2, 3, 2) = 2\n",
+ "D(3,3) = min(D(2,3)+1, D(3,2)+1, D(2,2)+1) = min(2, 3, 2) = 2\n",
+ "D(3,4) = min(D(2,4)+1, D(3,3)+1, D(2,3)+0) = min(3, 3, 1) = 1\n",
+ "\n",
+ "La distance de Levenshtein correspond à D(3,4) = 1. \n",
+ "On remonte la matrice et on détermine qu'il faut insérer le C de y dans x après le A, ce qui donne l'alignement x = A-TG, y = ACTG.\n",
"```"
- ],
- "metadata": {
- "id": "b9iovhyZ2bXw"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "OV_YaQHr2elB"
+ },
"source": [
"Q3.\tAlignez les séquences suivantes avec l'algorithme de Needleman-Wunsch global x = TAT et y = ATGAC en considérant le schéma d'évaluation suivant\n",
"\n",
@@ -116,77 +132,107 @@
"et\n",
"\n",
"$\\gamma(g) = 0.5 |g|$\n"
- ],
- "metadata": {
- "id": "OV_YaQHr2elB"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "g_MrecVs3Nrw"
+ },
"source": [
+ "| | Ø | A | C | T | G | G |\n",
+ "| :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
+ "| **Ø** | 0 | -0.5 | -1 | -1.5 | -2 | -2.5 |\n",
+ "| **A** | -0.5 | -0.5 | 0.5 | 0 | -0.5 | -1 |\n",
+ "| **T** | -1 | 0.5 | 0 | 0 | 1 | 0.5 |\n",
+ "| **G** | -1.5 | 0 | 1.5 | 1 | 0.5 | 0.5 |\n",
+ "\n",
"```markdown\n",
- "Votre réponse ici\n",
+ "x : _T_AT\n",
+ "y : ATGAC\n",
"```"
- ],
- "metadata": {
- "id": "g_MrecVs3Nrw"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Q4. Alignez les séquences suivantes avec l'algorithme de Smith-Waterman x = TTGG y = ATGAC en utilisant le schéma d'évaluation de la question précédente.\n"
- ],
"metadata": {
"id": "y1YF-G6E3Qoo"
- }
+ },
+ "source": [
+ "Q4. Alignez les séquences suivantes avec l'algorithme de Smith-Waterman x = TTGG y = ATGAC en utilisant le schéma d'évaluation de la question précédente.\n"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "LLMECocb3pgI"
+ },
"source": [
"```markdown\n",
"Votre réponse ici\n",
"```"
- ],
- "metadata": {
- "id": "LLMECocb3pgI"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "# Exercice 1 : Algorithme de Levenshtein - version récursive"
- ],
"metadata": {
"id": "46gw0avh3wGw"
- }
+ },
+ "source": [
+ "# Exercice 1 : Algorithme de Levenshtein - version récursive"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "ZKc09Kyg4a6v"
+ },
"source": [
"Q1. Ecrivez une fonction\n",
"\n",
"levenshtein(x: str, y: str) -> int\n",
"\n",
"qui retourne la distance de Levenshtein entre les séquences x et y en utilisant la version récursive de l'algorithme."
- ],
- "metadata": {
- "id": "ZKc09Kyg4a6v"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
+ "execution_count": null,
"metadata": {
"id": "FJR69IEQ4aHv"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "def levenshtein(x: str, y: str) -> int:\n",
+ " \"\"\"\n",
+ " Algorithme de Levenshtein - version récursive\n",
+ "\n",
+ " Args:\n",
+ " x (str): séquence x \n",
+ " y (str): séquence y\n",
+ " Returns:\n",
+ " int : distance de Levenshtein entre x et y\n",
+ " \"\"\"\n",
+ " if len(x) == 0:\n",
+ " return len(y)\n",
+ " if len(y) == 0:\n",
+ " return len(x)\n",
+ "\n",
+ " cost = 0 if x[0] == y[0] else 1\n",
+ " \n",
+ " # min(Suppression, Insertion, Substitution)\n",
+ " return min(\n",
+ " levenshtein(x[1:], y) + 1,\n",
+ " levenshtein(x, y[1:]) + 1,\n",
+ " levenshtein(x[1:], y[1:]) + cost\n",
+ " )"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "arFVwA6E5NWn"
+ },
"source": [
"Q2. Vous pouvez tester votre code sur les exemples suivants:\n",
"\n",
@@ -196,80 +242,216 @@
"* $L(AY678264^*, OQ870305^*) = 310$\n",
"\n",
"$^*$ ids genbank de deux sequences."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "L('CCAG', 'CA') = 2, should be 2\n",
+ "L('CCGT', 'CGTCA') = 3, should be 3\n"
+ ]
+ },
+ {
+ "ename": "KeyboardInterrupt",
+ "evalue": "",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[4], line 12\u001b[0m\n\u001b[1;32m 9\u001b[0m OQ870305 \u001b[38;5;241m=\u001b[39m f\u001b[38;5;241m.\u001b[39mreadline()\n\u001b[1;32m 10\u001b[0m OQ870305 \u001b[38;5;241m=\u001b[39m f\u001b[38;5;241m.\u001b[39mread()\u001b[38;5;241m.\u001b[39mreplace(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m---> 12\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mL(AY678264, OQ870305) = \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mAY678264\u001b[49m\u001b[43m,\u001b[49m\u001b[38;5;250;43m \u001b[39;49m\u001b[43mOQ870305\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, should be 310\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
+ "Cell \u001b[0;32mIn[2], line 20\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[0;32m---> 20\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 21\u001b[0m levenshtein(x, y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ "Cell \u001b[0;32mIn[2], line 20\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[0;32m---> 20\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 21\u001b[0m levenshtein(x, y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ " \u001b[0;31m[... skipping similar frames: levenshtein at line 20 (705 times)]\u001b[0m\n",
+ "Cell \u001b[0;32mIn[2], line 21\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 21\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ "Cell \u001b[0;32mIn[2], line 21\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 21\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ " \u001b[0;31m[... skipping similar frames: levenshtein at line 21 (64 times), levenshtein at line 20 (1 times)]\u001b[0m\n",
+ "Cell \u001b[0;32mIn[2], line 20\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[0;32m---> 20\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 21\u001b[0m levenshtein(x, y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ " \u001b[0;31m[... skipping similar frames: levenshtein at line 21 (57 times)]\u001b[0m\n",
+ "Cell \u001b[0;32mIn[2], line 21\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 21\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ "Cell \u001b[0;32mIn[2], line 22\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 21\u001b[0m levenshtein(x, y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 22\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ "Cell \u001b[0;32mIn[2], line 21\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 21\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ "Cell \u001b[0;32mIn[2], line 21\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 21\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ " \u001b[0;31m[... skipping similar frames: levenshtein at line 21 (182 times)]\u001b[0m\n",
+ "Cell \u001b[0;32mIn[2], line 21\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mmin\u001b[39m(\n\u001b[1;32m 20\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[0;32m---> 21\u001b[0m \u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 22\u001b[0m levenshtein(x[\u001b[38;5;241m1\u001b[39m:], y[\u001b[38;5;241m1\u001b[39m:]) \u001b[38;5;241m+\u001b[39m cost\n\u001b[1;32m 23\u001b[0m )\n",
+ "Cell \u001b[0;32mIn[2], line 19\u001b[0m, in \u001b[0;36mlevenshtein\u001b[0;34m(x, y)\u001b[0m\n\u001b[1;32m 16\u001b[0m cost \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m x[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m==\u001b[39m y[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m# min(Suppression, Insertion, Substitution)\u001b[39;00m\n\u001b[0;32m---> 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mmin\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[1;32m 20\u001b[0m \u001b[43m \u001b[49m\u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 21\u001b[0m \u001b[43m \u001b[49m\u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 22\u001b[0m \u001b[43m \u001b[49m\u001b[43mlevenshtein\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mcost\u001b[49m\n\u001b[1;32m 23\u001b[0m 
\u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
+ ]
+ }
],
- "metadata": {
- "id": "arFVwA6E5NWn"
- }
+ "source": [
+ "print(f\"L('CCAG', 'CA') = {levenshtein('CCAG', 'CA')}, should be 2\")\n",
+ "print(f\"L('CCGT', 'CGTCA') = {levenshtein('CCGT', 'CGTCA')}, should be 3\")\n",
+ "\n",
+ "with open(\"data/AY678264.fna\", \"r\") as f:\n",
+ " AY678264 = f.readline()\n",
+ " AY678264 = f.read().replace(\"\\n\", \"\")\n",
+ "\n",
+ "with open(\"data/OQ870305.fna\", \"r\") as f:\n",
+ " OQ870305 = f.readline()\n",
+ " OQ870305 = f.read().replace(\"\\n\", \"\")\n",
+ "\n",
+ "# Prend trop de temps - inutilisable\n",
+ "print(f\"L(AY678264, OQ870305) = {levenshtein(AY678264, OQ870305)}, should be 310\")"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "# Exercice 2 : Algorithme de Smith-Waterman - version itérative"
- ],
"metadata": {
"id": "erCpfG7O7BV-"
- }
+ },
+ "source": [
+ "# Exercice 2 : Algorithme de Smith-Waterman - version itérative"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "rv2Y78y37IOd"
+ },
"source": [
"Q1. Ecrivez la fonction\n",
"\n",
"sw_fwd(x: str, y: str, cmap: dict, sigma: array, (go, ge): list) -> (array, array)\n",
"\n",
"qui construit les matrices $S$ et $B$ en utilisant l'algorithme de Smith-Waterman pour aligner les séquences x et y suivant le schéma d'évaluation donné par la matrice de substitution $\\Sigma$ et la fonction d'évaluation des trous $\\gamma(n)= g_o + g_e \\times n$. Le dictionnaire cmap donne la position des différents nucléotides dans la matrice $\\Sigma$. La fonction retourne la paire de matrices de score $S$ et de retour $B$."
- ],
- "metadata": {
- "id": "rv2Y78y37IOd"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
+ "execution_count": null,
"metadata": {
"id": "njn3JB0b-WHj"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sw_fwd(x: str, y: str, cmap: dict, sigma: np.ndarray, gap_penalties: tuple) -> tuple:\n",
+ " # sourcery skip: use-itertools-product\n",
+ " \"\"\"\n",
+ " Étape aller de l'algorithme de Smith-Waterman.\n",
+ "\n",
+ " Args:\n",
+ " x (str): séquence x à aligner\n",
+ " y (str): séquence y à aligner\n",
+ " cmap (dict): dictionnaire qui mappe chaque nucléotide à son indice dans sigma\n",
+ " sigma (array): matrice de substitution\n",
+ " gap_penalties (tuple): tuple (go, ge) où go est le coût d'ouverture et ge le coût d'extension\n",
+ " \n",
+ " Returns:\n",
+ " S : matrice des scores (array)\n",
+ " B : matrice des pointeurs de backtracking (array d'objets)\n",
+ " \"\"\"\n",
+ " go, ge = gap_penalties\n",
+ " m, n = len(x), len(y)\n",
+ " \n",
+ " # Initialisation des matrices S et B.\n",
+ " S = np.zeros((m + 1, n + 1))\n",
+ " B = np.empty((m + 1, n + 1), dtype=object)\n",
+ " B.fill('')\n",
+ " \n",
+ "\n",
+ " for i in range(1, m + 1):\n",
+ " for j in range(1, n + 1):\n",
+ " # Score diagonal (match ou substitution)\n",
+ " diag = S[i - 1, j - 1] + sigma[cmap[x[i - 1]], cmap[y[j - 1]]]\n",
+ " \n",
+ " # Score pour une insertion (gap dans x)\n",
+ " left = float('-inf')\n",
+ " for l in range(1, j + 1):\n",
+ " score = S[i, j - l] - (go + ge * l)\n",
+ " if score > left:\n",
+ " left = score\n",
+ " \n",
+ " # Score pour une suppression (gap dans y)\n",
+ " up = float('-inf')\n",
+ " for k in range(1, i + 1):\n",
+ " score = S[i - k, j] - (go + ge * k)\n",
+ " if score > up:\n",
+ " up = score\n",
+ " \n",
+ " # Choix du meilleur score en incluant la possibilité d'un alignement local (score 0)\n",
+ " best = max(0, diag, left, up)\n",
+ " S[i, j] = best\n",
+ " \n",
+ " # Stockage du choix dans la matrice B\n",
+ " if best == 0:\n",
+ " B[i, j] = None\n",
+ " elif best == diag:\n",
+ " B[i, j] = 'DIAG'\n",
+ " elif best == left:\n",
+ " B[i, j] = 'LEFT'\n",
+ " elif best == up:\n",
+ " B[i, j] = 'UP'\n",
+ " \n",
+ " return S, B"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "55n8mt9U-Wai"
+ },
"source": [
"Q2. Ecrivez la fonction\n",
"\n",
"sw_bwd(x: str, y: str, S: array, B: array) -> (str, str, float)\n",
"\n",
"qui effectue l'etape de retour de l'algorithme de Smith-Waterman etant donné les séquences $x$ et $y$ et les matrices de score $S$ et de retour $B$. La fonction retourne un tuple contenant les alignements des séquences x et y et le score de l'alignement."
- ],
- "metadata": {
- "id": "55n8mt9U-Wai"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
+ "execution_count": 10,
"metadata": {
"id": "ij9JDpBm_UZ7"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "def sw_bwd(x: str, y: str, S: np.ndarray, B: np.ndarray) -> tuple:\n",
+ " \"\"\"\n",
+ " Étape retour de l'algorithme de Smith-Waterman.\n",
+ "\n",
+ " Args:\n",
+ " x (str): séquence x d'origine\n",
+ " y (str): séquence y d'origine\n",
+ " S (np.ndarray): matrice des scores calculée par sw_fwd\n",
+ " B (np.ndarray): matrice des pointeurs de retour calculée par sw_fwd\n",
+ " \n",
+ " Returns:\n",
+ " aligned_x : alignement de la séquence x\n",
+ " aligned_y : alignement de la séquence y\n",
+ " max_score : score de l'alignement optimal\n",
+ " \"\"\"\n",
+ " #TODO\n",
+ " return (\"A\", \"B\", 0)"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Q3. Vous pouvez tester votre code en utilisant le schéma d'évaluation suivant :"
- ],
"metadata": {
"id": "kwmxg2dxAiwS"
- }
+ },
+ "source": [
+ "Q3. Vous pouvez tester votre code en utilisant le schéma d'évaluation suivant :"
+ ]
},
{
"cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "id": "JUtYRFTBAwwZ"
+ },
+ "outputs": [],
"source": [
+ "import numpy as np\n",
+ "\n",
"cmap = {\"A\": 0, \"T\": 1, \"G\": 2, \"C\": 3}\n",
"m = np.array([[1, -0.5, -0.5, -0.5],\n",
" [-0.5, 1, -0.5, -0.5],\n",
@@ -277,27 +459,20 @@
" [-0.5, -0.5, -0.5, 1]])\n",
"go = 0\n",
"ge = 0.5"
- ],
- "metadata": {
- "id": "JUtYRFTBAwwZ"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "* $SW('TCGC', 'CTTAG')$ retourne un score de $1.5$ à la position $(3,5)$ et l'alignement"
- ],
"metadata": {
"id": "eMGh4K5aIFxE"
- }
+ },
+ "source": [
+ "* $SW('TCGC', 'CTTAG')$ retourne un score de $1.5$ à la position $(3,5)$ et l'alignement"
+ ]
},
{
"cell_type": "code",
- "source": [
- "HTML(\"\")"
- ],
+ "execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -306,38 +481,40 @@
"id": "joHNwJ9AIf6F",
"outputId": "a9206810-a083-4d86-8b14-38183f1dd80c"
},
- "execution_count": null,
"outputs": [
{
- "output_type": "execute_result",
- "data": {
- "text/plain": [
- ""
- ],
- "text/html": [
- ""
- ]
- },
- "metadata": {},
- "execution_count": 18
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SW('TCGC', 'CTTAG') = 0 should be 1.5, aligned_x is A, aligned_y is B.\n"
+ ]
}
+ ],
+ "source": [
+ "HTML(\"\")\n",
+ "x = \"TCGC\"\n",
+ "y = \"CTTAG\"\n",
+ "\n",
+ "# First step\n",
+ "S, B = sw_fwd(x, y, cmap, m, (go, ge))\n",
+ "\n",
+ "# Second step\n",
+ "aligned_x, aligned_y, score = sw_bwd(x, y, S, B)\n",
+ "print(f\"SW('TCGC', 'CTTAG') = {score} should be 1.5, aligned_x is {aligned_x}, aligned_y is {aligned_y}.\")"
]
},
{
"cell_type": "markdown",
- "source": [
- "* $SW(AY678264^*, OQ870305^*)$ retourne un score de $342.1$ à la position $(708,717)$ et l'alignement"
- ],
"metadata": {
"id": "JJlU5yvZI43D"
- }
+ },
+ "source": [
+ "* $SW(AY678264^*, OQ870305^*)$ retourne un score de $342.1$ à la position $(708,717)$ et l'alignement"
+ ]
},
{
"cell_type": "code",
- "source": [
- "from IPython.display import HTML\n",
- "HTML(\"| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
\")"
- ],
+ "execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -346,56 +523,62 @@
"id": "HUELvWKMFtIO",
"outputId": "976bab6f-f1fc-4c5a-c69c-8de02fc838d0"
},
- "execution_count": null,
"outputs": [
{
- "output_type": "execute_result",
"data": {
- "text/plain": [
- ""
- ],
"text/html": [
"| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
"
+ ],
+ "text/plain": [
+ ""
]
},
+ "execution_count": 12,
"metadata": {},
- "execution_count": 15
+ "output_type": "execute_result"
}
+ ],
+ "source": [
+ "from IPython.display import HTML\n",
+ "HTML(\"| x: | ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGC-A-CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAG---GGCGAGGGCGAGGGC--CGCC-CCTACGAGGGCACCCAGACCGC-CAAGCTGAAGGTG-ACCA-AGG---G-TGGCC---CCCT-GCCCTTCGCCT-GGGA-CATCCTGTCC--C--C-T-CAGTTCATGT-A-CGGCT-CCAAGGCCTACGTG-A--AGCAC--C--C--C--G-CCGACATCCCCG-A--CTAC-T--TGAAGCTG-TCCTTC--C--C-----CGA-GG--GCTTCAAGTGGGAGCG-CGTGATGAACTTCGAGGACGGCGGCGTGGTG-ACCG--T-GA-C-CCAGGAC-TC--CTCCCTGCAGGACGGCGAGTTCATCTACAAGGTG---AAGCTGCGCGGCACCAACTTCCCCT-CCGACGGCCCCGTA-ATGCA-GAAGAAGACCATGGGCTG--GGA-GGCCTCCTCCGAGCGGATGTACCCCGAGGA-CGGCGCC-CTGAAGGGCGAGATCAAGCAGA-GGCTGAAGC-TGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACA-AGGCCAAGAAG-CCCGTGCAGCTGCCCGGC-GCCTACAACGTCAACATCAAGT-TG----GA-CATCACCTCCCACAACGAGGA-CTAC-A-C-CA---T-C-G-TGGAACAGTACG-AACGCGCCGAGGGCCGCCACTCCAC-CGGCGGCATGGACGAGCTGTACAAG |
|---|
| y: | ATGGTGAGCAAGGGCGAGGA-G----C-T-G--TTCA-C-CGG-GGTGGTGCCCATCCTGGT-CGAGC-TGGACGGCGACGTAAACGGCCACAAGTTC-AG--CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC---GGCAAGCTGACC-CTGAAG-TTCATTTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC-AC-CCTCGTGACCACCCTGACCTACGGCGTGCAGTGC-T-TCAGCCGCTACCCCGACC-ACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC-GCACCATCTTCTTCAAGGACGACGGCAACTACAAGA-CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC-A--ACATC--C-TGGGGCACAAGCTG-G-AGTA-CAACTACAACAGCC-ACAACGTC-TATAT-CATG--GCCGA-CAA--GCAGAAGAACGG-CA--T-C-A-AGG-TGAACTTC-AAGATC--CGCCAC--AA---C---ATCGAG--GACGGC---AGCGTGCAGCTCGCCGACCACTACCA-GC--A-G--AACACC-CC--CATCGGCGACG--GCCCCGTGCTGCTGCCCGACAACC-ACTACCTGAGCACCCAGTCCGCCCTGAGCAA-A-GACCC-CAACGAGAAGC-GCGATCACATGGTCCTGCTGG---AGTTCGTGAC-CGCC----GCCGGGA-T-CACTC-TCGGCATGGACGAGCTGTACAAG |
|---|
\")"
]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "Q5jVeLfgMMtA"
+ },
"source": [
"# Exercice 3 : Distribution des scores d’alignement pour des séquences aléatoires\n",
"\n",
"Pour tester si un alignement reflète une réelle similarité biologique, on va évaluer la distribution des scores d’alignement pour des paires de séquences aléatoires."
- ],
- "metadata": {
- "id": "Q5jVeLfgMMtA"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Q1. En considérant deux séquences aléatoires de même taille N, où chaque nucléotide apparaît avec une probabilité uniforme de ¼, calculer le score moyen attendu pour une superposition sans trou dans le cas où une identité vaut +1 et une différence vaut 0."
- ],
"metadata": {
"id": "6xyXw0HsMQGf"
- }
+ },
+ "source": [
+ "Q1. En considérant deux séquences aléatoires de même taille N, où chaque nucléotide apparaît avec une probabilité uniforme de ¼, calculer le score moyen attendu pour une superposition sans trou dans le cas où une identité vaut +1 et une différence vaut 0."
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "meF18gt-Mhcn"
+ },
"source": [
"```markdown\n",
"Votre réponse ici\n",
"```"
- ],
- "metadata": {
- "id": "meF18gt-Mhcn"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "fP5_mHnYMkNI"
+ },
"source": [
"Q2. La question précédente peut se resoudre analytiquement car on ne considère pas de trou. Pour étendre le résultat precedent à un alignement avec trous, on va se baser sur la simulation de séquences aleatoires.\n",
"\n",
@@ -404,13 +587,15 @@
" 2. un alignement local via Smith-Waterman (utilisez le code de l'exercice précédent)\n",
"\n",
"Utilisez le schéma d'évaluation suivant :"
- ],
- "metadata": {
- "id": "fP5_mHnYMkNI"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "akUVqotnOLkH"
+ },
+ "outputs": [],
"source": [
"rmap = {\"A\": 0, \"T\": 1, \"G\": 2, \"C\": 3}\n",
"sigma = np.array([[1, -0.5, -0.5, -0.5],\n",
@@ -419,63 +604,82 @@
" [-0.5, -0.5, -0.5, 1]])\n",
"go =0\n",
"ge = 0.5"
- ],
- "metadata": {
- "id": "akUVqotnOLkH"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "code",
- "source": [
- "#Votre code ici"
- ],
+ "execution_count": null,
"metadata": {
"id": "UX0afNaqOVZ2"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "#Votre code ici"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Q3. Qu'observez-vous ?"
- ],
"metadata": {
"id": "UNn9fUuXO4Le"
- }
+ },
+ "source": [
+ "Q3. Qu'observez-vous ?"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "dSQEl0XXO8IG"
+ },
"source": [
"```markdown\n",
"Votre réponse ici\n",
"```"
- ],
- "metadata": {
- "id": "dSQEl0XXO8IG"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Q4. Quelle conclusion peut-on en tirer sur la significativité d'un alignement ?"
- ],
"metadata": {
"id": "xHfVXpQhf15n"
- }
+ },
+ "source": [
+ "Q4. Quelle conclusion peut-on en tirer sur la significativité d'un alignement ?"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "5KjhEeHDgDns"
+ },
"source": [
"```markdown\n",
"Votre réponse ici\n",
"```"
- ],
- "metadata": {
- "id": "5KjhEeHDgDns"
- }
+ ]
}
- ]
-}
\ No newline at end of file
+ ],
+ "metadata": {
+ "colab": {
+ "authorship_tag": "ABX9TyNSXnqaXAUgZK9rmJ1TWbGo",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
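
Note on Exercise 3, Q1: under the stated model (two independent sequences of length N, each nucleotide uniform with probability ¼, identity +1, mismatch 0), each position matches with probability 4 × (¼)² = ¼, so by linearity of expectation the mean score should be N/4. The sketch below is one possible way to check this by simulation. It is not part of the notebook, and the function names `ungapped_identity_score` and `mean_random_score` are hypothetical helpers introduced here for illustration only.

```python
import random


def ungapped_identity_score(x, y):
    """Score a gap-free superposition: +1 per identical position, 0 otherwise."""
    return sum(a == b for a, b in zip(x, y))


def mean_random_score(n, trials=2000, seed=0):
    """Average ungapped score over `trials` pairs of random sequences of length n.

    Assumption: nucleotides are drawn independently and uniformly from A, T, G, C.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = rng.choices("ATGC", k=n)
        y = rng.choices("ATGC", k=n)
        total += ungapped_identity_score(x, y)
    return total / trials


if __name__ == "__main__":
    n = 100
    # Compare the empirical mean with the analytical value N/4.
    print("simulated mean:", mean_random_score(n))
    print("analytical N/4:", n / 4)
```

The simulated mean should land close to N/4. The same loop could presumably be reused for Q2 by swapping `ungapped_identity_score` for the notebook's own Needleman-Wunsch or Smith-Waterman scoring functions.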