{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install pandas\n", "!pip install pysmiles\n", "!pip install networkx" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import networkx as nx\n", "from pysmiles import read_smiles" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# TD / TP -- Discriminant graph mining\n", "\n", "The objective of this TD is to implement a depth-first search (DFS) algorithm to mine discriminant molecular graphs. \n", "More specifically, we will investigate blood-brain barrier (BBB) traversal. The BBB prevents large molecules in the blood from penetrating the brain. \n", "Knowing whether a candidate drug can cross the BBB is essential for designing drugs against brain diseases. \n", "Consequently, a lot of research effort is devoted to predicting whether a molecule (candidate drug) can cross the BBB.\n", "\n", "In this TP, we will use a discriminant graph mining technique. \n", "\n", "The TP is organized in three parts:\n", "1. you will get to know the data and the library that we propose to use to manipulate molecules\n", "2. you will implement a frequent graph mining algorithm. As we have already seen, the problem with graphs is that there are potentially a lot of redundancies: you will do your best to address this issue.\n", "3. 
you will implement a discriminant graph mining approach" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Discovery of the data and the library" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Molecule data as a graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the dataset. Use the raw file URL: the /blob/ page URL returns GitHub's HTML page, not the TSV.\n", "!wget https://raw.githubusercontent.com/theochem/B3DB/main/B3DB/B3DB_classification.tsv" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | SMILES | BBB+/BBB- |\n", "
|---|---|---|\n", "
| 0 | O=C(O)c1cc(N=Nc2ccc(S(=O)(=O)Nc3ccccn3)cc2)ccc1O | BBB- |\n", "
| 1 | COC1(NC(=O)C(C(=O)O)c2ccc(O)cc2)C(=O)N2C(C(=O)... | BBB- |\n", "
| 2 | Oc1c(I)cc(Cl)c2cccnc12 | BBB- |\n", "
| 3 | CCNC(=NCCSCc1ncccc1Br)NC#N | BBB- |\n", "
| 4 | CN1CC[C@]23c4c5ccc(OC6O[C@H](C(=O)O)[C@@H](O)[... | BBB- |\n", "
| ... | ... | ... |\n", "
| 7802 | c1ccc(CN(CC2=NCCN2)c2ccccc2)cc1 | BBB- |\n", "
| 7803 | CCOCCn1c(N2CCCN(C)CC2)nc2ccccc21 | BBB+ |\n", "
| 7804 | CN1CCC(=C2c3ccccc3CC(=O)c3sccc32)CC1 | BBB+ |\n", "
| 7805 | Cc1[nH]c(=O)c(C#N)cc1-c1ccncc1 | BBB- |\n", "
| 7806 | Nc1cc(-c2ccncc2)c[nH]c1=O | BBB- |\n", "
\n", "
7807 rows × 2 columns\n", "