Drug-Target Interactions

1 Introduction

1.1 Overview
1.2 Install Package
1.3 Load Package and Access Help

2 Working Environment

2.1 Required Files and Directories

3 Produce Results Quickly

4 Retrieve UniProt IDs

4.1 UniProt’s UNIREF Clusters
4.2 BioMart’s Paralogs

5 Query Drug-Target Annotations

5.1 Using drugTargetAnnot

5.1.1 Query with Compound IDs
5.1.2 Query with Protein IDs
5.1.3 Query with Gene IDs

5.2 Using getDrugTarget

5.2.1 Query with Compound IDs
5.2.2 Query with Protein IDs

6 Query Bioassay Data

6.1 Query with Compound IDs
6.2 Query with Protein IDs

7 Workflow to Run Everything

7.1 ID mapping

7.1.1 Query with Gene Names
7.1.2 Query with ENSEMBL Gene IDs
7.1.3 Query with UniProt IDs

7.2 Retrieve UniProt IDs

7.2.1 UNIREF Cluster
7.2.2 BioMart Paralogs

7.3 Drug-Target Data
7.4 Drug-Target Frequency
7.5 Write Results to Tabular Files

8 Session Info

References

1 Introduction

1.1 Overview

The drugTargetInteractions package provides utilities for identifying drug-target interactions for sets of small molecule and/or gene/protein identifiers (Wu et al. 2006). The required drug-target interaction information is obtained from a downloaded SQLite instance of the ChEMBL database (Gaulton et al. 2012; Bento et al. 2014). ChEMBL has been chosen for this purpose because it provides one of the most comprehensive and best annotated knowledge resources for drug-target information available in the public domain.

1.2 Install Package

As a Bioconductor package, drugTargetInteractions can be installed with the BiocManager::install() function.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("drugTargetInteractions")

Alternatively, the package can be installed from GitHub as follows.

devtools::install_github("girke-lab/drugTargetInteractions", build_vignettes=TRUE) # Installs from github

1.3 Load Package and Access Help

To use drugTargetInteractions, the package needs to be loaded in a user’s R session.

library("drugTargetInteractions") # Loads the package

The following commands list the available help files and open the vignette of the package.

library(help="drugTargetInteractions") # Lists package info
vignette(topic="drugTargetInteractions", package="drugTargetInteractions") # Opens vignette

2 Working Environment

2.1 Required Files and Directories

The drugTargetInteractions package uses a downloaded SQLite instance of the ChEMBL database. The code in this vignette uses a slimmed-down toy version of this database that is small enough to be included in the package for demonstrating the usage of its functions. For real drug-target analysis work, users need to download and uncompress a recent version of the ChEMBL SQLite database from the ChEMBL download site, and then replace the path assigned to chembldb below with the path to the full version of the ChEMBL database on their system. Since the SQLite database from ChEMBL can be used by this package as is, creating a copy of the ChEMBL SQLite database on Bioconductor’s AnnotationHub is not necessary at this point. This way users can always use the latest or historical versions of ChEMBL without having to maintain a mirror instance.

The following genConfig function call creates a list containing the paths to the input and output directories used by the sample code in this vignette. For real analyses, users will want to customize these paths to match the environment on their system. Usually, this means that all paths generated by the system.file function need to be changed, since those point to the test data shipped with the package (e.g. the toy ChEMBL SQLite instance).

chembldb <- system.file("extdata", "chembl_sample.db", package="drugTargetInteractions")
resultsPath <- system.file("extdata", "results", package="drugTargetInteractions")
config <- genConfig(chemblDbPath=chembldb, resultsPath=resultsPath)
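
When switching from the toy database to a full ChEMBL download, it can help to verify the paths before building the configuration. The following is a minimal sketch, assuming a hypothetical helper named check_dti_paths that is not part of drugTargetInteractions; the temporary file and directory only stand in for a real database and results folder.

```r
# Hypothetical helper, not part of drugTargetInteractions: verifies that the
# ChEMBL SQLite file exists and creates the results directory if needed.
check_dti_paths <- function(chemblDbPath, resultsPath) {
    if (!file.exists(chemblDbPath)) {
        stop("ChEMBL SQLite file not found: ", chemblDbPath)
    }
    if (!dir.exists(resultsPath)) {
        dir.create(resultsPath, recursive = TRUE)
    }
    list(chemblDbPath = normalizePath(chemblDbPath),
         resultsPath = normalizePath(resultsPath))
}

# Demonstration with temporary stand-ins for a real database and results folder
demo_db <- tempfile(fileext = ".db")
file.create(demo_db)
demo_results <- file.path(tempdir(), "dti_results")
paths <- check_dti_paths(demo_db, demo_results)

# With the package loaded, the checked paths would be passed on in the same
# way as in the vignette code:
# config <- genConfig(chemblDbPath=paths$chemblDbPath, resultsPath=paths$resultsPath)
```

Failing early on a missing database file tends to give a clearer error than a failed SQL query later in the workflow.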

In addition, a lookup table is downloaded on the fly from UniChem (see its web page and the corresponding FTP site). This lookup table is used by drugTargetInteractions to translate compound identifiers across different drug databases. Currently, this includes three types of compound identifiers: DrugBank, PubChem, and ChEBI.

downloadUniChem(config=config)
cmpIdMapping(config=config)

3 Produce Results Quickly

Users mainly interested in generating analysis results can skip the technical details in the following sections and continue with the section entitled Workflow to Run Everything.

4 Retrieve UniProt IDs

This section shows how to obtain, for a set of query IDs (e.g. ENSEMBL gene IDs), the corresponding UniProt IDs based on strict ID matching as well as a more relaxed sequence similarity-based approach. The sequence similarity associations are obtained with the getUniprotIDs or the getParalogs functions using UniProt’s UNIREF clusters or BioMart’s paralog annotations, respectively.

4.1 UniProt’s UNIREF Clusters

The UniProt.ws package is used to return, for a set of query IDs (here Ensembl gene IDs), the corresponding UniProt IDs based on two independent approaches: ID mappings (IDMs) and sequence similarity nearest neighbors (SSNNs) using UNIREF clusters. The latter have been generated by UniProt with the MMSeqs2 and Linclust algorithms (Steinegger and Söding 2017; Steinegger and Söding 2018). Additional details on the UNIREF clusters are available in the UniProt documentation. The organism, query ID type and sequence similarity level can be selected with the taxId, kt and seq_cluster arguments, respectively. The seq_cluster argument can be assigned one of: UNIREF100, UNIREF90 or UNIREF50. The result is a list with two data.frames. The first one is based on IDMs and the second one on SSNNs.

keys <- c("ENSG00000145700", "ENSG00000135441", "ENSG00000120071")
res_list90 <- getUniprotIDs(taxId=9606, kt="ENSEMBL", keys=keys, seq_cluster="UNIREF90")
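
The difference between the two approaches can be illustrated with a small toy table. This is only a conceptual sketch using synthetic IDs and base R; it does not reproduce how getUniprotIDs is implemented internally.

```r
# Toy protein table (synthetic IDs): each protein belongs to one gene and
# to one sequence similarity cluster.
proteins <- data.frame(
    ID = c("P1", "P2", "P3", "P4", "P5"),
    ENSEMBL = c("G1", "G1", "G2", "G3", "G4"),
    CLUSTER = c("C_a", "C_b", "C_a", "C_b", "C_c"),
    stringsAsFactors = FALSE
)
query_genes <- c("G1")

# IDM: proteins annotated directly to the query gene
idm <- proteins[proteins$ENSEMBL %in% query_genes, ]

# SSNN: all proteins sharing a cluster with any of the IDM proteins
ssnn <- proteins[proteins$CLUSTER %in% idm$CLUSTER, ]

idm$ID
ssnn$ID
```

In this toy case the SSNN set contains the IDM proteins plus the cluster neighbors P3 and P4, which belong to other genes.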

The following shows the first data.frame containing the ID mapping results.

library(DT)
datatable(res_list90[[1]], options = list(scrollX=TRUE, scrollY="600px", autoWidth = TRUE))

	ENSEMBL	ID	GENES	UNIREF100	UNIREF90	UNIREF50	ORGANISM	PROTEIN-NAMES
1	ENSG00000145700	A0A669KAY2	ANKRD31	UniRef100_A0A669KAY2	UniRef90_Q8N7Z5	UniRef50_Q8N7Z5	Homo sapiens (Human)	Ankyrin repeat domain-containing protein 31
2	ENSG00000145700	D6RJB7	ANKRD31	UniRef100_D6RJB7	UniRef90_Q8N7Z5	UniRef50_Q8N7Z5	Homo sapiens (Human)	Ankyrin repeat domain-containing protein 31
3	ENSG00000145700	Q8N7Z5	ANKRD31	UniRef100_Q8N7Z5	UniRef90_Q8N7Z5	UniRef50_Q8N7Z5	Homo sapiens (Human)	Ankyrin repeat domain-containing protein 31
4	ENSG00000135441	F8VP73	BLOC1S1	UniRef100_F8VP73	UniRef90_F8VP73	UniRef50_F8VP73	Homo sapiens (Human)	Biogenesis of lysosome-related organelles complex 1 subunit 1
5	ENSG00000135441	F8W606	BLOC1S1	UniRef100_F8VP73	UniRef90_F8VP73	UniRef50_F8VP73	Homo sapiens (Human)	Biogenesis of lysosome-related organelles complex 1 subunit 1
6	ENSG00000135441	G8JLQ3	BLOC1S1	UniRef100_Q2NKW0	UniRef90_Q2NKW0	UniRef50_Q2NKW0	Homo sapiens (Human)	Biogenesis of lysosome-related organelles complex 1 subunit 1
7	ENSG00000135441	H0YHA4	BLOC1S1	UniRef100_H0YHA4	UniRef90_H0YHA4	UniRef50_H0YHA4	Homo sapiens (Human)	Biogenesis of lysosome-related organelles complex 1 subunit 1 (Fragment)
8	ENSG00000135441	P78537	BLOC1S1 BLOS1 GCN5L1 RT14	UniRef100_P78537	UniRef90_P78537	UniRef50_P78537	Homo sapiens (Human)	Biogenesis of lysosome-related organelles complex 1 subunit 1 (BLOC-1 subunit 1) (GCN5-like protein 1) (Protein RT14)
9	ENSG00000120071	A0A0G2JQP8	KANSL1	UniRef100_A0A0G2JQP8			Homo sapiens (Human)	KAT8 regulatory NSL complex subunit 1 (Fragment)
10	ENSG00000120071	A0A1W2PPV8	KANSL1	UniRef100_A0A1W2PPV8	UniRef90_A0A1W2PPV8	UniRef50_Q7Z3B3	Homo sapiens (Human)	KAT8 regulatory NSL complex subunit 1 (Fragment)

Showing 1 to 10 of 18 entries

The following shows how to return the dimensions of the two data.frames and how to obtain the UniProt IDs as character vectors required for the downstream analysis steps.

sapply(res_list90, dim, simplify=FALSE)

## $IDM
## [1] 18  8
## 
## $SSNN
## [1] 22  8

sapply(names(res_list90), function(x) unique(na.omit(res_list90[[x]]$ID)))

## $IDM
##  [1] "A0A669KAY2" "D6RJB7"     "Q8N7Z5"     "F8VP73"     "F8W606"     "G8JLQ3"     "H0YHA4"    
##  [8] "P78537"     "A0A0G2JQP8" "A0A1W2PPV8" "A0A1W2PQT4" "A0A1W2PRA9" "A0A1W2PRB5" "A0A1W2PRR3"
## [15] "A0A1W2PS83" "A0A3B3IT55" "I3L233"     "Q7Z3B3"    
## 
## $SSNN
##  [1] "A0A669KAY2" "D6RJB7"     "Q8N7Z5"     "A0A087WSV2" "F8VP73"     "F8W606"     "G8JLQ3"    
##  [8] "F8VNQ1"     "H0YHA4"     "P78537"     "A0A1W2PPV8" "A0A1W2PQT4" "A0A1W2PRA9" "A0A1W2PRB5"
## [15] "A0A1W2PRR3" "A0A1W2PS83" "A0A3B3IT55" "A0A024R9Y2" "A0A0G2JNT7" "A0A0G2JQF4" "I3L233"    
## [22] "Q7Z3B3"

4.2 BioMart’s Paralogs

The getParalogs function uses biomaRt to obtain, for a set of query genes, the corresponding UniProt IDs as well as their paralogs. The query genes can be Gene Names or ENSEMBL Gene IDs from H. sapiens. The result is similar to the IDMs and SSNNs from the getUniprotIDs function, but instead of UNIREF clusters, biomaRt’s paralogs are used to obtain the SSNNs.

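# Query by gene names ...
queryBy <- list(molType="gene", idType="external_gene_name", ids=c("ANKRD31", "BLOC1S1", "KANSL1"))
# ... or by ENSEMBL gene IDs (this assignment overwrites the one above and is used below)
queryBy <- list(molType="gene", idType="ensembl_gene_id", ids=c("ENSG00000145700", "ENSG00000135441", "ENSG00000120071"))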
res_list <- getParalogs(queryBy)

The following shows the first data.frame containing the ID mapping results.

library(DT)
datatable(res_list[[1]], options = list(scrollX = TRUE, scrollY="400px", autoWidth = TRUE))

	QueryID	ENSEMBL	GENES	ID_up_sp	ID_up_sp_tr
1	ENSG00000120071	ENSG00000120071	KIAA1267	Q7Z3B3	A0A024R9Y2
2	ENSG00000120071	ENSG00000120071	KANSL1	Q7Z3B3	A0A024R9Y2
3	ENSG00000120071	ENSG00000120071	KIAA1267	Q7Z3B3
4	ENSG00000120071	ENSG00000120071	KANSL1	Q7Z3B3
5	ENSG00000120071	ENSG00000120071	KIAA1267		A0A1W2PRB5
6	ENSG00000120071	ENSG00000120071	KANSL1		A0A1W2PRB5
7	ENSG00000120071	ENSG00000120071	KIAA1267		A0A3B3IT55
8	ENSG00000120071	ENSG00000120071	KANSL1		A0A3B3IT55
9	ENSG00000120071	ENSG00000120071	KIAA1267		A0A1W2PS83
10	ENSG00000120071	ENSG00000120071	KANSL1		A0A1W2PS83

Showing 1 to 10 of 30 entries

The following shows how to return the dimensions of the two data.frames and how to obtain the UniProt IDs as character vectors required for the downstream analysis steps.

sapply(res_list, dim, simplify=FALSE)

## $IDM
## [1] 30  5
## 
## $SSNN
## [1] 33  7

sapply(names(res_list), function(x) unique(na.omit(res_list[[x]]$ID_up_sp)))

## $IDM
## [1] "Q7Z3B3" "P78537" "Q8N7Z5"
## 
## $SSNN
## [1] "Q7Z3B3" "A0AUZ9" "P78537" "Q8N7Z5" "Q5JPF3" "A6QL64" "Q8N2N9"

5 Query Drug-Target Annotations

The drugTargetAnnot function returns, for a set of compound or gene/protein IDs, the corresponding known drug-target annotation data available in ChEMBL. A related function called getDrugTarget is described in the subsequent subsection. It generates very similar results, but internally uses pre-computed annotation summary tables, which is less flexible than the use of pure SQL statements.

5.1 Using drugTargetAnnot

The drugTargetAnnot function queries the ChEMBL database with SQL statements without depending on pre-computed annotation tables.

5.1.1 Query with Compound IDs

queryBy <- list(molType="cmp", idType="chembl_id", ids=c("CHEMBL17", "CHEMBL19", "CHEMBL1201117", "CHEMBL25", "nomatch", "CHEMBL1742471"))
qresult1 <- drugTargetAnnot(queryBy, config=config)

library(DT)
datatable(qresult1, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	QueryIDs	chembl_id	molregno	PubChem_ID	DrugBank_ID	Drug_Name	MOA	Action_Type	First_Approval	ChEMBL_TID	TID	UniProt_ID	Desc	Organism	Mesh_Indication
1	CHEMBL17	CHEMBL17	1085	3038	DB01144	DICHLORPHENAMIDE	Carbonic anhydrase XII inhibitor	INHIBITOR	1958	CHEMBL3242	12209	O43570	Carbonic anhydrase 12	Homo sapiens	D005901: Glaucoma,D020513: Paralysis, Hyperkalemic Periodic,D020514: Hypokalemic Periodic Paralysis
2	CHEMBL17	CHEMBL17	1085	3038	DB01144	DICHLORPHENAMIDE	Carbonic anhydrase I inhibitor	INHIBITOR	1958	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D005901: Glaucoma,D020513: Paralysis, Hyperkalemic Periodic,D020514: Hypokalemic Periodic Paralysis
3	CHEMBL17	CHEMBL17	1085	3038	DB01144	DICHLORPHENAMIDE	Carbonic anhydrase II inhibitor	INHIBITOR	1958	CHEMBL205	15	P00918	Carbonic anhydrase 2	Homo sapiens	D005901: Glaucoma,D020513: Paralysis, Hyperkalemic Periodic,D020514: Hypokalemic Periodic Paralysis
4	CHEMBL17	CHEMBL17	1085	3038	DB01144	DICHLORPHENAMIDE	Carbonic anhydrase IV inhibitor	INHIBITOR	1958	CHEMBL3729	12896	P22748	Carbonic anhydrase 4	Homo sapiens	D005901: Glaucoma,D020513: Paralysis, Hyperkalemic Periodic,D020514: Hypokalemic Periodic Paralysis
5	CHEMBL19	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase I inhibitor	INHIBITOR	1959	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
6	CHEMBL19	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase II inhibitor	INHIBITOR	1959	CHEMBL205	15	P00918	Carbonic anhydrase 2	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
7	CHEMBL19	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase IV inhibitor	INHIBITOR	1959	CHEMBL3729	12896	P22748	Carbonic anhydrase 4	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
8	CHEMBL19	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase VII inhibitor	INHIBITOR	1959	CHEMBL2326	11060	P43166	Carbonic anhydrase 7	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
9	CHEMBL1201117	CHEMBL1201117	675068	4107	DB00423	METHOCARBAMOL	Carbonic anhydrase I inhibitor	INHIBITOR	1957	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D008103: Liver Cirrhosis,D010146: Pain
10	CHEMBL25	CHEMBL25	1280	2244	DB00945	ASPIRIN	Cyclooxygenase inhibitor	INHIBITOR	1950	CHEMBL2094253	104725	P23219	Prostaglandin G/H synthase 1	Homo sapiens	D000026: Abortion, Habitual,D000236: Adenoma,D000755: Anemia, Sickle Cell,D001024: Aortic Valve Stenosis,D001249: Asthma,D001281: Atrial Fibrillation,D001471: Barrett Esophagus,D001714: Bipolar Disorder,D001943: Breast Neoplasms,D002289: Carcinoma, Non-Small-Cell Lung,D002294: Carcinoma, Squamous Cell,D002312: Cardiomyopathy, Hypertrophic,D002318: Cardiovascular Diseases,D002386: Cataract,D002532: Intracranial Aneurysm,D002537: Intracranial Arteriosclerosis,D002561: Cerebrovascular Disorders,D003093: Colitis, Ulcerative,D003110: Colonic Neoplasms,D003123: Colorectal Neoplasms, Hereditary Nonpolyposis,D003139: Common Cold,D003324: Coronary Artery Disease,D003327: Coronary Disease,D003866: Depressive Disorder,D003920: Diabetes Mellitus,D003922: Diabetes Mellitus, Type 1,D003924: Diabetes Mellitus, Type 2,D004066: Digestive System Diseases,D004381: Duodenal Ulcer,D004938: Esophageal Neoplasms,D005221: Fatigue,D005334: Fever,D005909: Glioblastoma,D006258: Head and Neck Neoplasms,D006261: Headache,D006331: Heart Diseases,D006333: Heart Failure,D006470: Hemorrhage,D006528: Carcinoma, Hepatocellular,D006937: Hypercholesterolemia,D006973: Hypertension,D006976: Hypertension, Pulmonary,D007246: Infertility,D007249: Inflammation,D007319: Sleep Initiation and Maintenance Disorders,D007511: Ischemia,D007972: Leukoplakia, Oral,D008180: Lupus Erythematosus, Systemic,D008268: Macular Degeneration,D008545: Melanoma,D008581: Meningitis,D008881: Migraine Disorders,D009080: Mucocutaneous Lymph Node Syndrome,D009101: Multiple Myeloma,D009103: Multiple Sclerosis,D009203: Myocardial Infarction,D009303: Nasopharyngeal Neoplasms,D009369: Neoplasms,D010020: Osteonecrosis,D010051: Ovarian Neoplasms,D010146: Pain,D010437: Peptic Ulcer,D010518: Periodontitis,D010927: Placental Insufficiency,D011014: Pneumonia,D011087: Polycythemia Vera,D011125: Adenomatous Polyposis Coli,D011225: Pre-Eclampsia,D011247: Pregnancy,D011471: Prostatic Neoplasms,D011618: Psychotic Disorders,D012004: Rectal Neoplasms,D012128: Respiratory Distress Syndrome, Adult,D012170: Retinal Vein Occlusion,D012214: Rheumatic Heart Disease,D012559: Schizophrenia,D012883: Skin Ulcer,D013035: Spasm,D013276: Stomach Ulcer,D013920: Thrombocythemia, Essential,D013927: Thrombosis,D014029: Tobacco Use Disorder,D014652: Vascular Diseases,D014947: Wounds and Injuries,D015179: Colorectal Neoplasms,D015228: Hypertriglyceridemia,D015356: Retinal Artery Occlusion,D015658: HIV Infections,D016491: Peripheral Vascular Diseases,D016893: Carotid Stenosis,D017202: Myocardial Ischemia,D018805: Sepsis,D019851: Thrombophilia,D020246: Venous Thrombosis,D020520: Brain Infarction,D020521: Stroke,D024821: Metabolic Syndrome,D029424: Pulmonary Disease, Chronic Obstructive,D050197: Atherosclerosis,D050723: Fractures, Bone,D051436: Renal Insufficiency, Chronic,D052439: Lipid Metabolism Disorders,D054058: Acute Coronary Syndrome,D054556: Venous Thromboembolism,D055371: Acute Lung Injury,D056586: Acute Chest Syndrome,D058729: Peripheral Arterial Disease,D060050: Angina, Stable

Showing 1 to 10 of 14 entries

5.1.2 Query with Protein IDs

queryBy <- list(molType="protein", idType="UniProt_ID", ids=c("P43166", "P00915"))
qresult2 <- drugTargetAnnot(queryBy, config=config)

library(DT)
datatable(qresult2, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	QueryIDs	chembl_id	molregno	PubChem_ID	DrugBank_ID	Drug_Name	MOA	Action_Type	First_Approval	ChEMBL_TID	TID	UniProt_ID	Desc	Organism	Mesh_Indication
1	P43166	CHEMBL18	1096	3295	DB00311	ETHOXZOLAMIDE	Carbonic anhydrase inhibitor	INHIBITOR	1982	CHEMBL2095180	104764	P43166	Carbonic anhydrase 7	Homo sapiens
2	P43166	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase VII inhibitor	INHIBITOR	1959	CHEMBL2326	11060	P43166	Carbonic anhydrase 7	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
3	P00915	CHEMBL17	1085	3038	DB01144	DICHLORPHENAMIDE	Carbonic anhydrase I inhibitor	INHIBITOR	1958	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D005901: Glaucoma,D020513: Paralysis, Hyperkalemic Periodic,D020514: Hypokalemic Periodic Paralysis
4	P00915	CHEMBL18	1096	3295	DB00311	ETHOXZOLAMIDE	Carbonic anhydrase inhibitor	INHIBITOR	1982	CHEMBL2095180	104764	P00915	Carbonic anhydrase 1	Homo sapiens
5	P00915	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase I inhibitor	INHIBITOR	1959	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
6	P00915						Carbonic anhydrase I inhibitor	INHIBITOR		CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens
7	P00915	CHEMBL166863	277493	9841854	DB12399	POLMACOXIB	Carbonic anhydrase I inhibitor	INHIBITOR		CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D010003: Osteoarthritis,D012216: Rheumatic Diseases,D015207: Osteoarthritis, Hip
8	P00915	CHEMBL1200814	674765	13290219, 90469712		ACETAZOLAMIDE SODIUM	Carbonic anhydrase I inhibitor	INHIBITOR	1990	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D004487: Edema,D005901: Glaucoma,D012640: Seizures,D015812: Glaucoma, Angle-Closure
9	P00915	CHEMBL1201117	675068	4107	DB00423	METHOCARBAMOL	Carbonic anhydrase I inhibitor	INHIBITOR	1957	CHEMBL261	10193	P00915	Carbonic anhydrase 1	Homo sapiens	D008103: Liver Cirrhosis,D010146: Pain

Showing 1 to 9 of 9 entries

5.1.3 Query with Gene IDs

The following returns drug-target annotations for a set of query Ensembl gene IDs. For this the Ensembl gene IDs are translated into UniProt IDs using both the IDM and SSNN approaches.

keys <- c("ENSG00000120088", "ENSG00000135441", "ENSG00000120071")
res_list90 <- getUniprotIDs(taxId=9606, kt="ENSEMBL", keys=keys, seq_cluster="UNIREF90")

id_list <- sapply(names(res_list90), function(x) unique(na.omit(res_list90[[x]]$ID)))

Next, drug-target annotations are returned for the UniProt IDs obtained with the IDM or SSNN methods. The following example uses the UniProt IDs of the SSNN method. Note that, to include the upstream Ensembl gene IDs in the final result table, the Ensembl ID collapse step via tapply below is necessary, since UniProt IDs are occasionally assigned to several Ensembl gene IDs (e.g. recent gene duplications).

queryBy <- list(molType="protein", idType="UniProt_ID", ids=id_list[[2]])
qresultSSNN <- drugTargetAnnot(queryBy, config=config)
ensidsSSNN <- tapply(res_list90[[2]]$ENSEMBL, res_list90[[2]]$ID, paste, collapse=", ")
qresultSSNN <- data.frame(Ensembl_IDs=ensidsSSNN[as.character(qresultSSNN$QueryIDs)], qresultSSNN)
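
The collapse step can be tried in isolation on a toy table. The sketch below uses synthetic IDs and base R only; the names ssnn, ens_lookup and annotated are illustrative and not part of the package.

```r
# Toy SSNN table (synthetic IDs): protein U2 is assigned to two genes
ssnn <- data.frame(
    ENSEMBL = c("G1", "G1", "G2", "G3"),
    ID = c("U1", "U2", "U2", "U3"),
    stringsAsFactors = FALSE
)

# Collapse the gene IDs of each protein into one comma-separated string
ens_by_protein <- tapply(ssnn$ENSEMBL, ssnn$ID, paste, collapse = ", ")

# Convert to a plain named character vector to use it as a lookup table
ens_lookup <- setNames(as.vector(ens_by_protein), names(ens_by_protein))

# Annotate a result table in which the same protein occurs in several rows
query_ids <- c("U2", "U3", "U2")
annotated <- data.frame(
    Ensembl_IDs = unname(ens_lookup[query_ids]),
    QueryIDs = query_ids,
    stringsAsFactors = FALSE
)
annotated
```

Without the collapse, a protein such as U2 would have two candidate gene IDs and could not be mapped to a single value per result row.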

library(DT)
datatable(qresultSSNN, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	Ensembl_IDs	QueryIDs
A0A669KAY2	ENSG00000145700	A0A669KAY2
D6RJB7	ENSG00000145700	D6RJB7
Q8N7Z5	ENSG00000145700	Q8N7Z5
A0A087WSV2	ENSG00000258311	A0A087WSV2
F8VP73	ENSG00000135441	F8VP73
F8W606	ENSG00000135441	F8W606
G8JLQ3	ENSG00000135441	G8JLQ3
F8VNQ1	ENSG00000258311	F8VNQ1
H0YHA4	ENSG00000135441	H0YHA4
P78537	ENSG00000135441	P78537

Showing 1 to 10 of 22 entries

5.2 Using getDrugTarget

The getDrugTarget function generates results similar to those of drugTargetAnnot, but depends on a pre-computed query table (here drugTargetAnnot.xls).

5.2.1 Query with Compound IDs

id_mapping <- c(chembl="chembl_id", pubchem="PubChem_ID", uniprot="UniProt_ID", drugbank="DrugBank_ID")
# Three alternative example queries; each assignment overwrites the previous one,
# so only the last one (DrugBank IDs) is used in the call below.
queryBy <- list(molType="cmp", idType="pubchem", ids=c("2244", "65869", "2244"))
queryBy <- list(molType="protein", idType="uniprot", ids=c("P43166", "P00915", "P43166"))
queryBy <- list(molType="cmp", idType="drugbank", ids=c("DB00945", "DB01202"))
# Variant of the call that returns additional columns:
#qresult3 <- getDrugTarget(queryBy=queryBy, id_mapping=id_mapping, columns=c(1,5,8,16,17,39,46:53),config=config)
qresult3 <- getDrugTarget(queryBy=queryBy, id_mapping=id_mapping, columns=c(1,5,8,16,17),config=config)

library(DT)
datatable(qresult3, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	QueryIDs	mechanism_of_action	action_type	pref_name	chembl_id
1	DB00945	Cyclooxygenase inhibitor	INHIBITOR	ASPIRIN	CHEMBL25
2	DB01202

Showing 1 to 2 of 2 entries

5.2.2 Query with Protein IDs

queryBy <- list(molType="protein", idType="chembl", ids=c("CHEMBL25", "nomatch", "CHEMBL1742471"))
#qresult4 <- getDrugTarget(queryBy=queryBy, id_mapping, columns=c(1,5,8,16,17,39,46:52),config=config) 
qresult4 <- getDrugTarget(queryBy=queryBy, id_mapping=id_mapping, columns=c(1,5,8,16,17),config=config)

library(DT)
datatable(qresult4, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	QueryIDs	mechanism_of_action	action_type	pref_name	chembl_id
1	CHEMBL25	Cyclooxygenase inhibitor	INHIBITOR	ASPIRIN	CHEMBL25
2	nomatch
3	CHEMBL1742471	Histamine H2 receptor antagonist	ANTAGONIST	EBROTIDINE	CHEMBL1742471
4	CHEMBL1742471	Urease inhibitor	INHIBITOR	EBROTIDINE	CHEMBL1742471

Showing 1 to 4 of 4 entries

6 Query Bioassay Data

The drugTargetBioactivity function returns for a set of compound or gene/protein IDs the corresponding bioassay data available in ChEMBL.

6.1 Query with Compound IDs

Example query for compound IDs.

queryBy <- list(molType="cmp", idType="DrugBank_ID", ids=c("DB00945", "DB00316", "DB01050"))
qresultBAcmp <- drugTargetBioactivity(queryBy, config=config)

library(DT)
datatable(qresultBAcmp, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	chembl_id	molregno	PubChem_ID	DrugBank_ID	pref_name	activity_id	chembl_assay_id	UniProt_ID	description	organism	standard_type
1	CHEMBL25	1280	2244	DB00945	ASPIRIN	7617428	CHEMBL1909186	O76074	cGMP-specific 3',5'-cyclic phosphodiesterase	Homo sapiens	IC50
2	CHEMBL25	1280	2244	DB00945	ASPIRIN	7617429	CHEMBL1909186	O76074	cGMP-specific 3',5'-cyclic phosphodiesterase	Homo sapiens	Ki
3	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	7641402	CHEMBL1909186	O76074	cGMP-specific 3',5'-cyclic phosphodiesterase	Homo sapiens	IC50
4	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	7641403	CHEMBL1909186	O76074	cGMP-specific 3',5'-cyclic phosphodiesterase	Homo sapiens	Ki
5	CHEMBL521	11674	3672	DB01050	IBUPROFEN	7664497	CHEMBL1909186	O76074	cGMP-specific 3',5'-cyclic phosphodiesterase	Homo sapiens	IC50
6	CHEMBL521	11674	3672	DB01050	IBUPROFEN	7664498	CHEMBL1909186	O76074	cGMP-specific 3',5'-cyclic phosphodiesterase	Homo sapiens	Ki
7	CHEMBL25	1280	2244	DB00945	ASPIRIN	7618801	CHEMBL1909203	P00533	Epidermal growth factor receptor	Homo sapiens	IC50
8	CHEMBL25	1280	2244	DB00945	ASPIRIN	7618802	CHEMBL1909203	P00533	Epidermal growth factor receptor	Homo sapiens	Ki
9	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	7641436	CHEMBL1909203	P00533	Epidermal growth factor receptor	Homo sapiens	IC50
10	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	7641437	CHEMBL1909203	P00533	Epidermal growth factor receptor	Homo sapiens	Ki

Showing 1 to 10 of 1,318 entries
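
Bioassay tables of this size are usually filtered or summarized before inspection. The following base R sketch works on a toy data.frame whose rows are copied from the first ten entries shown above (three columns only); the object names bio, ic50 and counts are illustrative.

```r
# Toy excerpt of the bioassay result (first ten rows, three columns)
bio <- data.frame(
    pref_name = c("ASPIRIN", "ASPIRIN", "ACETAMINOPHEN", "ACETAMINOPHEN",
                  "IBUPROFEN", "IBUPROFEN", "ASPIRIN", "ASPIRIN",
                  "ACETAMINOPHEN", "ACETAMINOPHEN"),
    UniProt_ID = c(rep("O76074", 6), rep("P00533", 4)),
    standard_type = rep(c("IC50", "Ki"), 5),
    stringsAsFactors = FALSE
)

# Keep only the IC50 measurements
ic50 <- bio[bio$standard_type == "IC50", ]

# Count IC50 records per compound and target protein
counts <- table(ic50$pref_name, ic50$UniProt_ID)
counts
```

The same subsetting and table calls should apply to the full qresultBAcmp object, provided it contains the columns listed in the header row above.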

6.2 Query with Protein IDs

Example query for protein IDs. Note, the Ensembl gene to UniProt ID mappings are derived from above and stored in the named character vector ensidsSSNN.

queryBy <- list(molType="protein", idType="uniprot", ids=id_list[[1]])
qresultBApep <- drugTargetBioactivity(queryBy, config=config)
qresultBApep <- data.frame(Ensembl_IDs=ensidsSSNN[as.character(qresultBApep$UniProt_ID)], qresultBApep)

library(DT)
datatable(qresultBApep, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	Ensembl_IDs	chembl_id	molregno	PubChem_ID	DrugBank_ID	pref_name	activity_id	chembl_assay_id	UniProt_ID	description	organism	standard_value	standard_units	standard_flag	standard_type
No data available in table

Showing 0 to 0 of 0 entries

7 Workflow to Run Everything

This section explains how to run all of the above drug-target interaction analysis steps with a few convenience meta functions. Users mainly interested in generating analysis results quickly can focus on this section only.

7.1 ID mapping

The getSymEnsUp function returns for a query of gene or protein IDs a mapping table containing: ENSEMBL Gene IDs, Gene Names/Symbols, UniProt IDs and ENSEMBL Protein IDs. Internally, the function uses the ensembldb package. Its results are returned in a list where the first slot contains the ID mapping table, while the subsequent slots include the corresponding named character vectors: ens_gene_id, up_ens_id, and up_gene_id. Currently, the following query IDs are supported by getSymEnsUp: GENE_NAME, ENSEMBL_GENE_ID and UNIPROT_ID.

7.1.1 Query with Gene Names

gene_name <- c("CA7", "CFTR")
idMap <- getSymEnsUp(EnsDb="EnsDb.Hsapiens.v86", ids=gene_name, idtype="GENE_NAME")
ens_gene_id <- idMap$ens_gene_id
ens_gene_id

## ENSG00000168748 ENSG00000001626 
##           "CA7"          "CFTR"
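
The named character vectors returned by getSymEnsUp act as lookup tables. The sketch below rebuilds the vector printed above by hand to show how names and values are used; gene_to_ens is an illustrative name, not an object created by the package.

```r
# Manual copy of the named vector printed above:
# names are ENSEMBL gene IDs, values are gene names
ens_gene_id <- c(ENSG00000168748 = "CA7", ENSG00000001626 = "CFTR")

# ENSEMBL gene IDs, as used for the downstream queries via names(ens_gene_id)
names(ens_gene_id)

# Look up a gene name by its ENSEMBL gene ID
ens_gene_id["ENSG00000001626"]

# Reverse lookup: ENSEMBL gene ID by gene name
gene_to_ens <- setNames(names(ens_gene_id), ens_gene_id)
gene_to_ens[["CA7"]]
```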

7.1.2 Query with ENSEMBL Gene IDs

ensembl_gene_id <- c("ENSG00000001626", "ENSG00000168748")
idMap <- getSymEnsUp(EnsDb="EnsDb.Hsapiens.v86", ids=ensembl_gene_id, idtype="ENSEMBL_GENE_ID")
ens_gene_id <- idMap$ens_gene_id

7.1.3 Query with UniProt IDs

uniprot_id <- c("P43166", "P13569") 
idMap <- getSymEnsUp(EnsDb="EnsDb.Hsapiens.v86", ids=uniprot_id, idtype="UNIPROT_ID")
ens_gene_id <- idMap$ens_gene_id

7.2 Retrieve UniProt IDs

The perfect match and nearest neighbor UniProt IDs can be obtained from UniProt’s UNIREF cluster or BioMart’s paralog annotations.

7.2.1 UNIREF Cluster

The corresponding IDM and SSNN UniProt IDs for the above ENSEMBL gene IDs are obtained with the getUniprotIDs function. This step is slow since the queries have to be performed with chunksize=1 in order to reliably track the query ENSEMBL gene ID information in the results.

res_list90 <- getUniprotIDs(taxId=9606, kt="ENSEMBL", keys=names(ens_gene_id), seq_cluster="UNIREF90", chunksize=1)
sapply(res_list90, dim)
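
Why one key per chunk simplifies tracking can be shown with a mock lookup. This is a conceptual sketch only: mock_lookup is a hypothetical stand-in for a remote service whose results do not carry the query key, and the code does not reflect the internals of getUniprotIDs.

```r
# Hypothetical stand-in for a remote lookup; the returned rows do not
# record which query key produced them.
mock_lookup <- function(chunk) {
    hits <- list(G1 = c("U1", "U2"), G2 = "U3", G3 = character(0))
    data.frame(ID = unlist(hits[chunk], use.names = FALSE),
               stringsAsFactors = FALSE)
}

keys <- c("G1", "G2", "G3")
chunksize <- 1

# Split the keys into chunks of the requested size
chunks <- split(keys, ceiling(seq_along(keys) / chunksize))

# With one key per chunk, every returned row can be labeled with its query key
results <- lapply(chunks, function(chunk) {
    res <- mock_lookup(chunk)
    res$QUERY <- rep(chunk, nrow(res))
    res
})
combined <- do.call(rbind, results)
combined
```

With larger chunks the labeling step would be ambiguous, because a chunk of several keys returns one pooled set of rows.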

7.2.2 BioMart Paralogs

Here the corresponding perfect match (IDM) and paralog (SSNN) UniProt IDs for the above ENSEMBL gene IDs are obtained with the getParalogs function. It is much faster than getUniprotIDs and also covers wider evolutionary distances. Thus, it may be the preferred method for many use cases.

queryBy <- list(molType="gene", idType="ensembl_gene_id", ids=names(ens_gene_id))
res_list <- getParalogs(queryBy)

sapply(res_list, dim)

##      IDM SSNN
## [1,]  13  127
## [2,]   5    7

7.3 Drug-Target Data

Both drug-target annotation and bioassay data are obtained with a meta function called runDrugTarget_Annot_Bioassay that internally uses the main processing functions drugTargetAnnot and drugTargetBioactivity. It organizes the result in a list with the annotation and bioassay data (data.frames) in the first and second slot, respectively. Importantly, the results from the IDM and SSNN UniProt IDs are combined in a single table, from which duplicated rows have been removed. To track which method was used for obtaining the UniProt IDs, an ID_Mapping_Type column has been added to the result table. Note that the gene IDs in the SSNN rows have the string Query_ prepended to indicate that they are not necessarily the genes encoding the SSNN UniProt proteins listed in the corresponding rows. Instead, they are the genes encoding the query proteins used for searching for SSNNs.

drug_target_list <- runDrugTarget_Annot_Bioassay(res_list=res_list, up_col_id="ID_up_sp", ens_gene_id, config=config) 
sapply(drug_target_list, dim)

##      Annotation Bioassay
## [1,]         55       35
## [2,]         18       17
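
The combination logic described above can be sketched with base R on a toy example. The rows are loosely modeled on the annotation table of this section, with one deliberately duplicated row. Excluding proteins already covered by IDM from the SSNN rows is an assumption inferred from the SSNN_noIDM label, not a confirmed detail of runDrugTarget_Annot_Bioassay.

```r
# Toy IDM and SSNN results (the duplicated SSNN row is deliberate)
idm <- data.frame(
    GeneName = "CA7", UniProt_ID = "P43166",
    Drug_Name = c("ETHOXZOLAMIDE", "METHAZOLAMIDE"),
    stringsAsFactors = FALSE
)
ssnn <- data.frame(
    GeneName = "CA7",
    UniProt_ID = c("P43166", "O43570", "O43570"),
    Drug_Name = c("ETHOXZOLAMIDE", "DICHLORPHENAMIDE", "DICHLORPHENAMIDE"),
    stringsAsFactors = FALSE
)

# Assumption: SSNN rows keep only proteins not already covered by IDM
ssnn_only <- ssnn[!ssnn$UniProt_ID %in% idm$UniProt_ID, ]

# Mark the gene as the query gene rather than the gene encoding the SSNN protein
ssnn_only$GeneName <- paste0("Query_", ssnn_only$GeneName)

# Combine both result sets, track their origin and drop duplicated rows
combined <- rbind(
    data.frame(ID_Mapping_Type = "IDM", idm, stringsAsFactors = FALSE),
    data.frame(ID_Mapping_Type = "SSNN_noIDM", ssnn_only, stringsAsFactors = FALSE)
)
combined <- combined[!duplicated(combined), ]
rownames(combined) <- NULL
combined
```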

View content of annotation result slot:

datatable(drug_target_list$Annotation, options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	ID_Mapping_Type	GeneName	Ensembl_IDs	UniProt_QueryIDs	CHEMBL_CMP_ID	Molregno	PubChem_ID	DrugBank_ID	Drug_Name	MOA	Action_Type	First_Approval	ChEMBL_TID	TID	UniProt_ID	Target_Desc	Organism	Mesh_Indication
1	IDM	CA7	ENSG00000168748	P43166	CHEMBL18	1096	3295	DB00311	ETHOXZOLAMIDE	Carbonic anhydrase inhibitor	INHIBITOR	1982	CHEMBL2095180	104764	P43166	Carbonic anhydrase 7	Homo sapiens
2	IDM	CA7	ENSG00000168748	P43166	CHEMBL19	1124			METHAZOLAMIDE	Carbonic anhydrase VII inhibitor	INHIBITOR	1959	CHEMBL2326	11060	P43166	Carbonic anhydrase 7	Homo sapiens	D000532: Altitude Sickness,D002583: Uterine Cervical Neoplasms,D005901: Glaucoma
3	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	O43570	CHEMBL17	1085	3038	DB01144	DICHLORPHENAMIDE	Carbonic anhydrase XII inhibitor	INHIBITOR	1958	CHEMBL3242	12209	O43570	Carbonic anhydrase 12	Homo sapiens	D005901: Glaucoma,D020513: Paralysis, Hyperkalemic Periodic,D020514: Hypokalemic Periodic Paralysis
4	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	O43570	CHEMBL18	1096	3295	DB00311	ETHOXZOLAMIDE	Carbonic anhydrase inhibitor	INHIBITOR	1982	CHEMBL2095180	104764	O43570	Carbonic anhydrase 12	Homo sapiens
5	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	O43570						Carbonic anhydrase XII inhibitor	INHIBITOR		CHEMBL3242	12209	O43570	Carbonic anhydrase 12	Homo sapiens
6	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	O43570	CHEMBL1200814	674765	13290219, 90469712		ACETAZOLAMIDE SODIUM	Carbonic anhydrase XII inhibitor	INHIBITOR	1990	CHEMBL3242	12209	O43570	Carbonic anhydrase 12	Homo sapiens	D004487: Edema,D005901: Glaucoma,D012640: Seizures,D015812: Glaucoma, Angle-Closure
7	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	P35218									CHEMBL4789		P35218	Carbonic anhydrase 5A, mitochondrial	Homo sapiens
8	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	P35218	CHEMBL18	1096	3295	DB00311	ETHOXZOLAMIDE	Carbonic anhydrase inhibitor	INHIBITOR	1982	CHEMBL2095180	104764	P35218	Carbonic anhydrase 5A, mitochondrial	Homo sapiens
9	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	P23280									CHEMBL3025		P23280	Carbonic anhydrase 6	Homo sapiens
10	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	P23280	CHEMBL18	1096	3295	DB00311	ETHOXZOLAMIDE	Carbonic anhydrase inhibitor	INHIBITOR	1982	CHEMBL2095180	104764	P23280	Carbonic anhydrase 6	Homo sapiens

View content of bioassay result slot (restricted to first 500 rows):

datatable(drug_target_list$Bioassay[1:500,], options = list(scrollX = TRUE, scrollY="600px", autoWidth = TRUE))

	ID_Mapping_Type	GeneName	Ensembl_IDs	CHEMBL_CMP_ID	Molregno	PubChem_ID	DrugBank_ID	Drug_Name	Activity_ID	CHEMBL_Assay_ID	UniProt_ID	Target_Desc	Organism	Standard_Value	Standard_Units	Standard_Flag	Standard_Type
1	IDM	CA7	ENSG00000168748	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	2573014	CHEMBL992757	P43166	Carbonic anhydrase 7	Homo sapiens	9100	nM	1	Ki
2	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	2172309	CHEMBL933690	P00918	Carbonic anhydrase 2	Homo sapiens	6200	nM	1	Ki
3	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	2573068	CHEMBL991856	P00918	Carbonic anhydrase 2	Homo sapiens	6200	nM	1	Ki
4	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL25	1280	2244	DB00945	ASPIRIN	2259037	CHEMBL1002070	P00918	Carbonic anhydrase 2	Homo sapiens	1160000	nM	1	IC50
5	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL25	1280	2244	DB00945	ASPIRIN	2259057	CHEMBL1002074	P00918	Carbonic anhydrase 2	Homo sapiens	3660000	nM	1	Ki
6	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL25	1280	2244	DB00945	ASPIRIN	7615970	CHEMBL1909123	P00918	Carbonic anhydrase 2	Homo sapiens	0		0	IC50
7	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL25	1280	2244	DB00945	ASPIRIN	7615971	CHEMBL1909123	P00918	Carbonic anhydrase 2	Homo sapiens	0		0	Ki
8	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	7640002	CHEMBL1909123	P00918	Carbonic anhydrase 2	Homo sapiens	0		0	IC50
9	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL112	16450	1983	DB00316	ACETAMINOPHEN	7640003	CHEMBL1909123	P00918	Carbonic anhydrase 2	Homo sapiens	0		0	Ki
10	SSNN_noIDM	Query_CA7	Query_ENSG00000168748	CHEMBL521	11674	3672	DB01050	IBUPROFEN	7662367	CHEMBL1909123	P00918	Carbonic anhydrase 2	Homo sapiens	0		0	IC50
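The rows above mix measured activities with records that have a Standard_Value of 0, no unit and a Standard_Flag of 0. As a minimal sketch, assuming the column names shown in the table, such a table could be restricted to flagged values reported in nM and sorted by Standard_Value. The toy data frame `bioassay` is a hypothetical stand-in for drug_target_list$Bioassay, with a few values copied from the rows above; whether unflagged rows should be excluded depends on the analysis.

```r
# Hypothetical stand-in for drug_target_list$Bioassay (values copied from the table above)
bioassay <- data.frame(
    Drug_Name = c("ACETAMINOPHEN", "ACETAMINOPHEN", "ASPIRIN", "ASPIRIN", "ASPIRIN"),
    UniProt_ID = c("P43166", "P00918", "P00918", "P00918", "P00918"),
    Standard_Value = c(9100, 6200, 1160000, 3660000, 0),
    Standard_Units = c("nM", "nM", "nM", "nM", ""),
    Standard_Flag = c(1, 1, 1, 1, 0),
    Standard_Type = c("Ki", "Ki", "IC50", "Ki", "IC50"),
    stringsAsFactors = FALSE
)

# Keep rows with a Standard_Flag of 1 and values reported in nM
keep <- bioassay$Standard_Flag == 1 & bioassay$Standard_Units == "nM"
measured <- bioassay[keep, ]

# Sort by ascending Standard_Value so that the lowest values come first
measured <- measured[order(measured$Standard_Value), ]
measured[, c("Drug_Name", "UniProt_ID", "Standard_Value", "Standard_Type")]
```

Applied to the real object, the same two steps would operate on drug_target_list$Bioassay instead of the toy data frame.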

7.4 Drug-Target Frequency

The following generates a summary table that lists, for each query gene, the unique ChEMBL compound IDs found in the annotation table together with their count.

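# Use the annotation slot and strip the "Query_" prefix so that all rows are grouped by gene name
df <- drug_target_list$Annotation
df[,"GeneName"] <- gsub("Query_", "", as.character(df$GeneName))
# Collect the unique compound IDs per gene; simplify=FALSE always returns a list
stats <- tapply(df$CHEMBL_CMP_ID, as.factor(df$GeneName), unique, simplify=FALSE)
# Drop empty strings and NAs; simplify=FALSE prevents sapply from collapsing equal-length results into a matrix
stats <- sapply(names(stats), function(x) stats[[x]][nchar(stats[[x]]) > 0], simplify=FALSE)
stats <- sapply(names(stats), function(x) stats[[x]][!is.na(stats[[x]])], simplify=FALSE)
# One row per gene: comma-separated compound IDs and their number
statsDF <- data.frame(GeneNames=names(stats), Drugs=sapply(stats, paste, collapse=", "), N_Drugs=sapply(stats, length))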

Print drug-target frequency table.

datatable(statsDF, options = list(scrollX = TRUE, scrollY="150px", autoWidth = TRUE))

	GeneNames	Drugs	N_Drugs
CA7	CA7	CHEMBL18, CHEMBL19, CHEMBL17, CHEMBL1200814, CHEMBL166863, CHEMBL1201117	6
CFTR	CFTR		0
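The per-gene counting can also be tried without the ChEMBL database. The following is a minimal sketch on a toy data frame; `annot` and `toyDF` are hypothetical names introduced only for illustration, and only the two columns used by the summary are mimicked. It uses split() and lengths() rather than the tapply/sapply approach of the vignette, so it is an alternative formulation and not the package's own code.

```r
# Hypothetical stand-in for drug_target_list$Annotation (only the two columns needed)
annot <- data.frame(
    GeneName = c("CA7", "CA7", "Query_CA7", "Query_CA7", "Query_CA7", "CFTR"),
    CHEMBL_CMP_ID = c("CHEMBL18", "CHEMBL19", "CHEMBL17", "CHEMBL18", "", NA),
    stringsAsFactors = FALSE
)

# Strip the "Query_" prefix so that all rows are grouped by gene name
annot$GeneName <- sub("^Query_", "", annot$GeneName)

# Flag rows that carry a usable compound ID (neither NA nor empty)
has_id <- !is.na(annot$CHEMBL_CMP_ID) & nzchar(annot$CHEMBL_CMP_ID)

# Splitting on a factor with all gene levels keeps genes without any compound
genes <- factor(annot$GeneName[has_id], levels = unique(annot$GeneName))
ids <- lapply(split(annot$CHEMBL_CMP_ID[has_id], genes), unique)

# One row per gene: comma-separated compound IDs and their number
toyDF <- data.frame(GeneNames = names(ids),
                    Drugs = vapply(ids, paste, character(1), collapse = ", "),
                    N_Drugs = lengths(ids))
toyDF
```

Setting the factor levels explicitly is what keeps a gene such as CFTR in the result with a count of 0 instead of dropping it.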

7.5 Write Results to Tabular Files

The annotation and bioassay data of the drug_target_list object, along with the drug-target frequency table, can be exported to separate tab-delimited files as follows.

write.table(drug_target_list$Annotation, "DrugTargetAnnotation.xls", row.names=FALSE, quote=FALSE, na="", sep="\t")
write.table(drug_target_list$Bioassay, "DrugTargetBioassay.xls", row.names=FALSE, quote=FALSE, na="", sep="\t")
write.table(statsDF, "statDF.xls", row.names=FALSE, quote=FALSE, na="", sep="\t")
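Files written with these settings are plain tab-delimited text, whatever extension they carry. As a minimal sketch of a round trip, the following writes a toy data frame to a temporary file with the same write.table() arguments and reads it back with read.delim(). The objects `toy`, `outfile` and `back` are hypothetical and exist only for this illustration.

```r
# Toy table standing in for one of the exported data frames
toy <- data.frame(GeneNames = c("CA7", "CFTR"),
                  Drugs = c("CHEMBL18, CHEMBL19", ""),
                  N_Drugs = c(2L, 0L),
                  stringsAsFactors = FALSE)

# Write with the same arguments as above, but to a temporary file
outfile <- tempfile(fileext = ".tsv")
write.table(toy, outfile, row.names = FALSE, quote = FALSE, na = "", sep = "\t")

# Read back; quote = "" mirrors quote=FALSE so that quote characters in fields are taken literally
back <- read.delim(outfile, quote = "", stringsAsFactors = FALSE)
back

unlink(outfile)
```

Because na="" writes missing values as empty fields, NAs in character columns come back as empty strings, while empty fields in numeric columns are read as NA.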

8 Session Info

sessionInfo()

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] EnsDb.Hsapiens.v86_2.99.0     ensembldb_2.26.0              AnnotationFilter_1.26.0      
##  [4] GenomicFeatures_1.54.1        AnnotationDbi_1.64.0          Biobase_2.62.0               
##  [7] GenomicRanges_1.54.1          GenomeInfoDb_1.38.0           IRanges_2.36.0               
## [10] S4Vectors_0.40.1              BiocGenerics_0.48.0           DT_0.30                      
## [13] drugTargetInteractions_1.10.1 BiocStyle_2.30.0             
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.0            dplyr_1.1.3                 blob_1.2.4                 
##  [4] filelock_1.0.2              Biostrings_2.70.1           bitops_1.0-7               
##  [7] lazyeval_0.2.2              fastmap_1.1.1               RCurl_1.98-1.12            
## [10] BiocFileCache_2.10.1        GenomicAlignments_1.38.0    XML_3.99-0.14              
## [13] digest_0.6.33               lifecycle_1.0.3             ellipsis_0.3.2             
## [16] ProtGenerics_1.34.0         KEGGREST_1.42.0             RSQLite_2.3.2              
## [19] magrittr_2.0.3              compiler_4.3.1              rlang_1.1.1                
## [22] sass_0.4.7                  progress_1.2.2              tools_4.3.1                
## [25] utf8_1.2.4                  yaml_2.3.7                  rtracklayer_1.62.0         
## [28] knitr_1.45                  htmlwidgets_1.6.2           prettyunits_1.2.0          
## [31] S4Arrays_1.2.0              bit_4.0.5                   curl_5.1.0                 
## [34] DelayedArray_0.28.0         xml2_1.3.5                  abind_1.4-5                
## [37] BiocParallel_1.36.0         withr_2.5.2                 purrr_1.0.2                
## [40] grid_4.3.1                  fansi_1.0.5                 biomaRt_2.58.0             
## [43] SummarizedExperiment_1.32.0 cli_3.6.1                   rmarkdown_2.25             
## [46] crayon_1.5.2                generics_0.1.3              httr_1.4.7                 
## [49] rjson_0.2.21                BiocBaseUtils_1.4.0         DBI_1.1.3                  
## [52] cachem_1.0.8                stringr_1.5.0               zlibbioc_1.48.0            
## [55] parallel_4.3.1              BiocManager_1.30.22         XVector_0.42.0             
## [58] restfulr_0.0.15             matrixStats_1.0.0           vctrs_0.6.4                
## [61] Matrix_1.6-1.1              jsonlite_1.8.7              bookdown_0.36              
## [64] hms_1.1.3                   bit64_4.0.5                 crosstalk_1.2.0            
## [67] jquerylib_0.1.4             glue_1.6.2                  codetools_0.2-19           
## [70] stringi_1.7.12              BiocIO_1.12.0               tibble_3.2.1               
## [73] pillar_1.9.0                rappdirs_0.3.3              htmltools_0.5.6.1          
## [76] GenomeInfoDbData_1.2.11     R6_2.5.1                    dbplyr_2.4.0               
## [79] lattice_0.22-5              evaluate_0.22               png_0.1-8                  
## [82] Rsamtools_2.18.0            memoise_2.0.1               bslib_0.5.1                
## [85] rjsoncons_1.0.0             SparseArray_1.2.0           xfun_0.40                  
## [88] MatrixGenerics_1.14.0       UniProt.ws_2.42.0           pkgconfig_2.0.3

References

Bento, A Patrícia, Anna Gaulton, Anne Hersey, Louisa J Bellis, Jon Chambers, Mark Davies, Felix A Krüger, et al. 2014. “The ChEMBL bioactivity database: an update.” Nucleic Acids Res. 42 (Database issue): D1083–90. https://doi.org/10.1093/nar/gkt1031.

Gaulton, Anna, Louisa J Bellis, A Patricia Bento, Jon Chambers, Mark Davies, Anne Hersey, Yvonne Light, et al. 2012. “ChEMBL: a large-scale bioactivity database for drug discovery.” Nucleic Acids Res. 40 (Database issue): D1100–7. https://doi.org/10.1093/nar/gkr777.

Steinegger, Martin, and Johannes Söding. 2018. “Clustering huge protein sequence sets in linear time.” Nat. Commun. 9 (1): 2542. https://doi.org/10.1038/s41467-018-04964-5.

Steinegger, Martin, and Johannes Söding. 2017. “MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets.” Nat. Biotechnol. 35 (11): 1026–8. https://doi.org/10.1038/nbt.3988.

Wu, C H, R Apweiler, A Bairoch, D A Natale, W C Barker, B Boeckmann, S Ferro, et al. 2006. “The Universal Protein Resource (UniProt): an expanding universe of protein information.” Nucleic Acids Res. 34 (Database issue): D187–D191. https://doi.org/10.1093/nar/gkj161.

Last update: 31 October, 2023
