iCCBDR: iMDLab Computational and Chemical Biology Data Repository


Please contribute to the repository: This page is widely used and disseminated. To increase exposure of your publications and promote your discoveries, please consider sharing your data here with the community. This includes cheminformatics, bioinformatics, genomics, and machine learning data.




Data curated/collected by iMDLab:

Name Description Dataset Reference
RNA-ligand binding dataset The PDF file contains the binding free energies and dissociation constants of 45 RNA-ligand complexes, along with their original references. Download Reference
hERG Dataset The zipped sdf file contains structure-activity data of 639 compounds targeted to hERG. Download Reference
Caco-2 dataset The zipped sdf file contains the largest collection of structure-activity data targeting Caco-2 (174 compounds). Download Reference
Drug-like small molecules The zipped sdf file contains structural information for 6516 drug-like molecules, along with properties such as molecular weight, LogP, LogS, solubility, refractivity, and SMILES representation. Download Reference
Approved small molecules The zipped sdf file contains structural information for 1424 approved drugs, along with properties such as molecular weight, LogP, LogS, solubility, refractivity, and SMILES representation. Download Reference
Withdrawn small molecules The zipped sdf file contains structural information for 67 drug molecules withdrawn from the market, along with properties such as molecular weight, LogP, LogS, solubility, refractivity, and SMILES representation. Download Reference
Small molecules from Ligand Depot The zipped sdf file consists of structural information of 12892 small molecules from Ligand Depot. Download Reference
MyriaScreen dataset The zipped sdf file contains 10000 premium molecular candidates hand-picked from Sigma-Aldrich and TimTec compound stocks. The selection of 10,000 high-purity and diverse molecules is the result of careful evaluation, multi-stage filtering, and refinement of two companies’ compound pools. Download Reference
Maybridge dataset The zipped mol2 file contains 14,400 premier compounds representing the drug-like diversity of the Maybridge Screening Collection. Download Reference
Veber Oral Availability 275 drugs and their human oral bioavailability values. Download Reference
Guha Artemisinin QSAR 179 Compounds with activity data External site Download Reference
Talele Cytochrome P450 Inhibition Diverse set of 13 azole antifungal compounds with Cytochrome P450-14alphaDM inhibition constants External site Download Reference
Burns Blood-Brain Barrier Dataset of 80 compounds with their "Blood-Brain Barrier" (BBB) values. Download Reference
Fontaine Factor Xa Inhibitors 435 Factor Xa Inhibitors used for binary classification, but real-valued Ki values are also given in the file External site Download Reference
Jorissen Virtual Screening 50 CDK2 Inhibitors, 50 COX2 Inhibitors, 50 FXa Inhibitors, 50 PDE5 Inhibitors, 50 A1A Antagonists, Plus Decoy Structures External site Download Reference
Guha PDGFR Inhibitors 79 Compounds with activity data External site Download Reference
Silverman Benzodiazepine Inverse Agonists 37 beta-Carbolines, Pyridodiindoles and CGS compounds binding to Benzodiazepine Inverse Agonist Site External site Download Reference
Sutherland 4 QSAR sets Inhibitors of ACE, GPB, THER, THR External site Download Reference
Karthikeyan Melting Point Melting Points for 4173 Training Set Molecules and 277 Test Set Compounds (Drug-Like) External site Download Reference
Bergstrom Melting Point Melting Point Data for 185 Training Set and 92 Test Set Compounds (Drug-Like) External site Download Reference
Stahl Virtual Screening 128 COX2 Inhibitors, 55 Estrogen Receptor Ligands, 43 Gelatinase A and General MMP Ligands, 17 Neuraminidase Inhibitors, 25 p38 MAP Kinase Inhibitors, 67 Thrombin Inhibitors External site Download Reference
Huuskonen Aqueous Solubility Aqueous Solubility Data for: Training Set (1033 Compounds), Test Set 1 (258 Compounds), Test Set 2 (21 Compounds) External site Download Reference
Delaney Aqueous Solubility Aqueous Solubility Data for 1144 low molecular weight compounds Download Reference
Bursi Mutagenicity 4337 Compounds with Mutagenicity (AMES) Classification External site Download Reference
Helma CPDB Mutagenicity 684 compounds with mutagenicity data - "cleaned" subset of CPDB External site Download Reference
Li Blood-Brain-Barrier Penetration 415 molecules with Binary Blood-Brain-Barrier Penetration Data (Penetrating/Non-Penetrating) with References External site Download Reference
Liu Blood-Brain-Barrier Penetration Blood-Brain-Barrier Penetration Data for Training Set (57 Compounds) and Test Set (13 Compounds) External site Download Reference
Timofei Epoxide-Enantioselectivity 28 Epoxides with associated enantioselectivity ratios External site Download Reference
Thummel Oral Availability Oral Availability (%) values have been taken from Table A-II-1 given in Appendix II of the mentioned reference. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had Availability values indicated for them have been provided in the files here. Download Reference
Thummel Urinary Excretion Urinary Excretion (%) values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had Urinary Excretion values indicated for them have been provided in the files here. Download Reference
Thummel Percentage Plasma Binding 'Bound in Plasma' (%) values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had 'Bound in Plasma' values indicated for them have been provided in the files here. Download Reference
Thummel Clearance 'Clearance' values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. These are in (ml/min/kg) units unless otherwise indicated. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had 'Clearance' values indicated for them have been provided in the files here. Download Reference
Thummel Volume of Distribution 'Volume of Distribution' values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. These are in liters/kg units unless otherwise indicated. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had 'Volume of Distribution' values indicated for them have been provided in the files here. In all, 284 drugs and their 'Volume of Distribution' values are given in the files here. Download Reference
Thummel Half-life 'Half-Life' values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. These are in hours unless otherwise indicated. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had 'Half-Life' values indicated for them have been provided in the files here. In all, 304 drugs and their 'Half-Life' values are given in the files here. Download Reference
He QSAR Reliability Assessment 322 organic compounds, with fathead minnow acute toxicity as activity of interest Download Reference
Thummel Peak Concentration 'Peak-Concentrations' values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. The units are indicated along with the values. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. In all, 304 drugs and their 'Peak-Concentration' values are given in the files here. Download Reference
Lombardo Volume of Distribution VDss (volume of distribution at steady state) and Fu (fraction unbound in human plasma) values are given for 120 compounds. They are provided here in .sdf and .txt files. Download Reference
Porter VLA-4 integrin antagonists The dataset of 94 compounds targeting VLA-4 integrin. Download Reference Reference Reference Reference
Parrott Intestinal Absorption A total of 28 drugs and their 'Fraction absorbed (%)' values are given in a table in the paper. Compound "Aciclovir" was retrieved as "Acyclovir" from ChemIDplus. Download Reference
Zhao Oral Absorption A total of 238 drugs and their %Absorption values taken from the table given in the paper. Download Reference
Klopman Intestinal Absorption A total of 50 drugs and their %HIA (Human Intestinal Absorption) values taken from the table given in the paper. "Flucloxacillin" was retrieved as "Floxacillin" from ChemIDplus. "Azimilide" had to be drawn as structure was not available for download from ChemIDplus (though it was depicted there). Download Reference
Kustrin Intestinal Absorption 86 drugs and their experimentally-derived Intestinal Absorption (%) values Download Reference
Stenberg Intestinal Absorption 23 compounds with their Intestinal Absorption values provided here. Download Reference
Irvine Membrane Permeability A total of 55 drugs and their human % absorption values Download Reference
Sanghvi Intestinal Absorption 131 compounds and their 'Fraction Absorbed' values have been obtained from the table given in the paper. Some compounds had multiple values reported (as obtained from various references). These have been averaged and provided by us in the SDF and AMP files. Download Reference
Raevsky Human Drug Absorption Absorption Fraction (FA) values are given for 32 compounds. The authors selected these from Palm et al. and Kansy et al., averaging the values for compounds common to both. Download Reference
Ghuloum Molecular Characterization 20 drugs and their Fraction Absorbed (%) values have been obtained from a table in the paper. Download Reference
Clark Intestinal Absorption This paper gives three tables of %FA values. One is from Palm et al. (20 compounds, available on this web page). Another is from Wessel et al. (86 compounds, available on this web page). The third is from Kansy et al. and is made available here. Download Reference
Wessel Intestinal Absorption A total of 86 drugs and their %HIA (Human Intestinal Absorption) values. Download Reference
Palm Intestinal Absorption A total of 20 drugs and their oral drug absorption in humans (FA) values taken from the table given in the paper. "Phenazone" was retrieved as "Antipyrine" from ChemIDplus. In the paper, values were given as ± s.d. In the AMP file, only the value (and not its s.d.) is taken as the end-point for modelling. Download Reference
Dorronsoro Oral Absorption and Blood-Brain Barrier Permeability 28 compounds and their % Bioavailable values are given in a table in the paper. "L-Dopa" was retrieved as Levodopa and "alpha-methyldopa" was retrieved as "Methyldopa" from ChemIDplus. Download Reference
Kansy Passive Absorption 25 compounds and their % Absorption values (humans) are given in the paper. These have been provided here. Download Reference
Linnankoski Oral Absorption 23 compounds and their Experimental FA values are given in the paper. These have been provided here. Download Reference
Gunturi Intestinal Absorption This is a subset of the Zhao et al. dataset containing 174 compounds. Their %HIA values have been given in the paper. The same are provided here. "AAFC" was retrieved as "Flurocitabine", "HBED" was retrieved as "N,N'-Bis(2-hydroxybenzyl)ethylenediamine-N,N'-diacetic acid", and "Amrinone" as "Inamrinone" from ChemIDplus. Download Reference
Balon Intestinal Absorption 21 compounds and their Intestinal absorption (%) values have been given in the paper. The same are provided here. Download Reference
Raevsky Intestinal Absorption 100 compounds and their FA (fraction absorbed) values have been given in the paper. The same are provided here. Download Reference
Varma Intestinal Absorption 136 compounds and their % HIA values have been given in the paper. 73 of these have been indicated as "Drugs Which Are Not Substrates to P-gp" while the remaining 63 compounds have been indicated as "P-gp Substrates". We have indicated this information in the files provided here. Download Reference
Recanatini QT prolongation 22 compounds and their pIC50 values have been provided here from the table given in the paper. Download Reference
Winiwarter Human Jejunal Permeability 22 compounds and their experimentally determined permeability values (log Peff) Download Reference
Ekins Potassium Channel Inhibition 99 compounds, their logIC50 values and cell types used have been given as the 'training set' in Table 1 in the paper. 35 compounds, their logIC50 values and cell types used have been given as the 'test set' in Table 2 in the paper. Download Reference
Thummel Peak Time 'Peak-Time' values have been taken from the Table A-II-1 given in Appendix II of the above mentioned reference. These are in hours unless otherwise indicated. Users are requested to go through the comments associated with these values, which have been given as footnotes in the table. Only drugs that had 'Peak-Time' values indicated for them have been provided in the files here. In all, 270 drugs and their 'Peak-Time' values are given in the files here. Download Reference
Patterson Neighbourhood Behaviour 20 QSAR datasets from David Patterson's Neighbourhood Behaviour Paper with nM activity data External site Download Reference
Sutherland SAR 405 Benzodiazepine Receptor Ligands / IC50, 467 Cox2 Inhibitors / IC50, 756 DHFR inhibitors (of P. carinii DHFR) with IC50, 616 nonredundant ER ligands from National Toxicology Program of the NIH with binding affinities relative to beta-estradiol, 393 ER ligands selected from literature (see reference for details) with binding affinities relative to beta-estradiol External site Download Reference
Bohm Serine Protease Inhibitors Inhibitors of Thrombin/ Trypsin/ Factor Xa with activity data, Training set (72 Compounds), Test Set (16 Compounds) External site Download Reference
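Many of the datasets above are distributed as zipped sdf files whose records carry property values next to each structure. The sketch below shows one way to list those values using only the Python standard library. It assumes the archive contains plain-text .sdf members in the usual layout (records ending with a `$$$$` line, data items introduced by `> <TAG>` lines). The tag name `LogP`, the titles `mol_A` and `mol_B`, and the in-memory demo archive are illustrative assumptions, not taken from any file in this repository. Working with the structures themselves would need a cheminformatics toolkit.

```python
import io
import re
import zipfile

# Matches SDF data headers such as "> <LogP>" or ">  <LogP> (1)".
TAG_LINE = re.compile(r">[^<]*<([^>]+)>")


def parse_record(lines):
    """Return the title line and the data fields of one SDF record."""
    fields, tag = {}, None
    for line in lines:
        header = TAG_LINE.match(line)
        if header:
            tag = header.group(1)
            fields[tag] = []
        elif tag is not None:
            if line.strip():
                fields[tag].append(line.strip())
            else:
                tag = None  # a blank line closes the current data item
    title = lines[0].strip() if lines else ""
    return {"title": title, "fields": {k: "\n".join(v) for k, v in fields.items()}}


def read_sdf_records(sdf_text):
    """Split SDF text on '$$$$' delimiter lines and parse each record."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append(parse_record(current))
    return records


def read_zipped_sdf(source):
    """Read every .sdf/.sd member of a zip archive (path or file-like object)."""
    records = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".sdf", ".sd")):
                text = archive.read(name).decode("utf-8", errors="replace")
                records.extend(read_sdf_records(text))
    return records


def demo_record(title, logp):
    """Build a tiny one-atom record; contents are made up for the demo."""
    return "\n".join([
        title,
        "  demo",
        "",
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
        "> <LogP>",
        logp,
        "",
        "$$$$",
    ]) + "\n"


# Hypothetical archive held in memory so the sketch runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.sdf", demo_record("mol_A", "1.09") + demo_record("mol_B", "-0.31"))
buffer.seek(0)

records = read_zipped_sdf(buffer)
for record in records:
    print(record["title"], record["fields"])
```

To use it on a downloaded archive, pass the path of the zip file to `read_zipped_sdf` and inspect the `fields` keys of the first record to see which property tags that dataset actually provides.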


Data contributed by the community:

Name Description Files Reference
LIT-PCBA An unbiased dataset for machine learning and virtual screening. It is from 149 dose–response PubChem bioassays that were additionally processed to remove false positives and assay artifacts and keep active and inactive compounds within similar molecular property ranges. It consists of 15 targets and 7844 confirmed active and 407,381 confirmed inactive compounds. Download Reference
DrVAE Using a unified probabilistic approach, Drug Response Variational Autoencoder (Dr.VAE), simultaneously models both drug response in terms of viability and transcriptomic perturbations. Download Reference
Prediction of Drug Response A computational framework for the parameterization of large-scale mechanistic models and its application to the prediction of drug response of cancer cell lines from exome and transcriptome sequencing data Download Reference
CDRscan A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature Download Reference
Blood Brain Barrier Dataset 415 molecules with binary BBB penetration data Download Reference
Updated DUDE Diverse Set The Updated DUD-E Diverse Subset for binding affinity prediction Download Reference
208 macrocycles A novel method for exploring macrocycle conformational space, Prime macrocycle conformational sampling (Prime-MCS), is introduced and evaluated in the context of other available algorithms (Molecular Dynamics, LowModeMD in MOE, and MacroModel Baseline Search). The algorithms were benchmarked on a data set of 208 macrocycles which was curated for diversity from the Cambridge Structural Database, the Protein Data Bank, and the Biologically Interesting Molecule Reference Dictionary. Download Reference
tmQM Dataset Quantum Geometries and Properties of 86k Transition Metal Complexes Download Reference
Mycobacterium tuberculosis Gyrase Inhibitors 30 hit compounds against Mycobacterium tuberculosis Gyrase with experimentally measured activities Download Reference
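The LIT-PCBA counts quoted above (7844 confirmed actives, 407,381 confirmed inactives) imply a strong class imbalance, which matters when the data is used to train or evaluate classifiers. The short sketch below works through that arithmetic. The inverse-frequency class weights at the end are a common general-purpose heuristic, shown here as an assumption; they are not something the dataset's authors are known to prescribe.

```python
# Counts as quoted in the LIT-PCBA description above.
n_active = 7844
n_inactive = 407_381

total = n_active + n_inactive
active_fraction = n_active / total
inactive_per_active = n_inactive / n_active

# Assumed heuristic: "balanced" weights, total / (number of classes * class count).
class_weights = {
    "active": total / (2 * n_active),
    "inactive": total / (2 * n_inactive),
}

print(f"total compounds:      {total}")
print(f"active fraction:      {active_fraction:.4f}")
print(f"inactives per active: {inactive_per_active:.1f}")
print(f"class weights:        {class_weights}")
```

Roughly 1.9% of the compounds are active, about 52 inactives for every active, so plain accuracy is a poor metric for models built on this set.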