---
output:
html_document:
self_contained: true
number_sections: no
theme: flatly
highlight: tango
mathjax: null
toc: true
toc_float: true
toc_depth: 2
css: style.css
bibliography: bibliography.bib
vignette: >
%\VignetteIndexEntry{"3.5 - Identifying regulatory TFs"}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
# Identifying regulatory TFs
This step is to identify regulatory TF whose expression associates with TF binding motif
DNA methylation which is carried out by function `get.TFs`.
For each motif considered to be enriched within a particular probe set, it will
compare the average DNA methylation at all distal enhancer probes within $\pm250bp$
of a motif occurrence, to the expression of human TFs. A statistical test
was performed for each motif-TF pair, as follows. The samples (all groups samples)
were divided into two groups: the M group, which consisted
of the 20\% of samples with the highest average methylation at all motif-adjacent
probes, and the U group, which consisted of the 20\% of samples with the lowest
methylation. For each candidate motif-TF pair, the Mann-Whitney U test was used to test
the null hypothesis that overall gene expression in group M was greater or equal
than that in group U.
All TFs were ranked by the $-log_{10}(P_{r})$, and those falling within the top 5% of
this ranking were considered candidate upstream regulators.
![Source: Yao, Lijing, et al. "Inferring regulatory element landscapes and transcription factor networks from cancer methylomes." Genome biology 16.1 (2015): 105.](figures/paper_get_pairs.png) [@yao2015inferring,@yao2015demystifying]
Main get.pair arguments
| Argument | Description |
|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| data | A multiAssayExperiment with DNA methylation and Gene Expression data. See `createMAE` function. |
| enriched.motif | A list containing output of get.enriched.motif function or a path of XX.rda file containing output of get.enriched.motif function. |
| group.col | A column defining the groups of the sample. You can view the available columns using: `colnames(MultiAssayExperiment::colData(data))`.
| group1 | A group from group.col. |
| group2 | A group from group.col. |
| minSubgroupFrac | A number ranging from 0 to 1 specifying the percentage of samples used to create the groups U (unmethylated) and M (methylated) used to link probes to TF expression. Default is 0.4 (lowest quintile of all samples will be in the U group and the highest quintile of all samples in the M group). |
| mode | A character. Can be "unsupervised" or "supervised". If unsupervised is set the U (unmethylated) and M (methylated) groups will be selected among all samples based on methylation of each probe. Otherwise U group and M group will set as the samples of group1 or group2 as described below: If diff.dir is "hypo, U will be the group 1 and M the group2. If diff.dir is "hyper" M group will be the group1 and U the group2. |
| diff.dir | A character can be "hypo" or "hyper", showing differential methylation dirction in group 1. It can be "hypo" which means the probes are hypomethylated in group1; "hyper" which means the probes are hypermethylated in group1; This argument is used only when mode is supervised nad it should be the same value from get.diff.meth function. |
```{r,eval=TRUE, message=FALSE, warning = FALSE, results = "hide"}
# Load results from previous sections
mae <- get(load("mae.rda"))
load("result/getMotif.hypo.enriched.motifs.rda")
```
```{r,eval=TRUE, message=FALSE, warning = FALSE, results = "hide"}
## identify regulatory TF for the enriched motifs
TF <- get.TFs(data = mae,
group.col = "definition",
group1 = "Primary solid Tumor",
group2 = "Solid Tissue Normal",
minSubgroupFrac = 0.4,
enriched.motif = enriched.motif,
dir.out = "result",
cores = 1,
label = "hypo")
```
```{r,eval=TRUE, message=FALSE, warning = FALSE}
# get.TFs automatically save output files.
# getTF.hypo.TFs.with.motif.pvalue.rda contains statistics for all TF with average
# DNA methylation at sites with the enriched motif.
# getTF.hypo.significant.TFs.with.motif.summary.csv contains only the significant probes.
dir(path = "result", pattern = "getTF")
# TF ranking plot based on statistics will be automatically generated.
dir(path = "result/TFrankPlot_family/", pattern = "pdf")
```
# Bibliography