--- title: "cbaf: cBioPortal Automated Functions" author: Arman Shahrisa date: "`r Sys.Date()`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{cbaf} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction `cbaf` is a _Bioconductor_ package that facilitates working with the high-throughput data stored on . The official CRAN package that is designed for obtaining data from cBioPortal in R, is `cgdsr`. To obtain data with this package, users have to pass a multistep procedure. Besides, the index of cancers and their subgroups changes frequently, which in turn, requires changing the R code. cbaf makes this procedure automated for __RNA-Seq__, __microRNA-Seq__, __microarray__ and __methylation__ data. In addition, comparing the genetic data across multiple cancer studies/subgroups of a cancer study becomes much faster and easier. The results are stored as excel file(s) and multiple heatmaps. # Package Installation ## Prerequisites The package itself doesn't need anything outside of R, but one of the dependant packages `rjava` needs some prerequisites. Since preparing the prerequisites may be complicated sometimes, they are briefly described in this section. In a __32 bit windows__, 32 bit version of _Java Runtime Environment_ must be installed first. In a __64 bit windows__, it is highly recommended that both 32 and 64 bit versions of _Java Runtime Environment_ be installed. In __ubuntu__, run the following commands in terminal in the same order as specified: ``` sudo apt-get install default-jdk sudo R CMD javareconf sudo apt-get install r-cran-rjava sudo apt-get install libgdal1-dev libproj-dev export LD_LIBRARY_PATH=/usr/lib/jvm/jre/lib/amd64:/usr/lib/jvm/jre/lib/amd64/default sudo apt-get install libcurl4-openssl-dev libssl-dev ``` ## Installation and Loading The package can be installed through the `biocLite` script. ```{r, eval = FALSE} source("http://www.bioconductor.org/biocLite.R") biocLite("cbaf", dependencies = TRUE) ``` After that, the pachage can be loaded into _R_ workspace by ```{r, results='hide', warning=FALSE, message=FALSE} library(cbaf) ``` # How to Use the _cbaf_ The package contains seven low-level functions: `availableData()`, `obtainOneStudy()`, `obtainMultipleStudies()`, `automatedStatistics()`, `heatmapOutput()`, `xlsxOutput()` and `cleanDatabase()`. In addition, there are also two high-level functions, `processOneStudy()` and `processMultipleStudies()`, that execute some of the mentioned functions in an ordered manner to speed up the overal process. It is recommended that users only work with two low-level functions - `availableData()` and `cleanDatabase()` - directly, since they are independant of other low-level functions. For the rest, please use high-level functions instead. This allows all functions to work with a higher efficiency. ## Low-level Functions ### availableData() This function scans all the cancer studies to examine presence of _RNA-Seq_, _microRNA-Seq_, _microarray_ and _methylation_ data. It requires a name to label the output excel file. In the following example, the entered name is `"list.2017-09-08"`. ```{r, eval = FALSE} availableData("list.2017-09-08") ``` Upon finishing, the output excel file is accessible at the present (working) directory. It contains different columns: cancer_study_id, cancer_study_name, RNA.Seq, microRNA.Seq, microarray of mRNA, microarray of miRNA, methylation and description. if there is already an excel file with the given name in the working directory, the function prints a message, asking the user whether or not it should proceeds. If the answer is no, the function prints a message to inform the user that it has stopped further processing. If the user types yes, `availableData()` will overwrite the excel file after it has obtained the requested data. ### obtainOneStudy() This function obtains and stores the supported data for at least one group of genes across multiple subgroups of a cancer study. In addion, it can check whether or not all genes are included in different subgroups of a cancer study and, if not, looks for the alternative gene names. It requires at least four arguments: - __genesList__, a list that contains at least one gene group. There is no limit on the number of gene groups, users can set as many as gene groups they desire. - __submissionName__, a character string containing name of interest. It is used for naming the process. - __studyName__, a character string showing the desired cancer name. It is an standard cancer study name that exists on cbioportal.org, such as `Acute Myeloid Leukemia (TCGA, NEJM 2013)`. - __desiredTechnique__, one of the five supported high-throughput studies: `RNA-Seq`, `microRNA-Seq`, `microarray.mRNA`, `microarray.microRNA` or `methylation`. Function also contains two other options: - __desiredCaseList__ a numeric vector that contains the index of desired cancer subgroups, assuming the user knows index of desired subgroups. If not, desiredCaseList must be set as 'none', function will show the available subgroups asking the user to enter the desired ones during the process. The default value is 'none'. - __validateGenes__ a logical value that, if set to be `TRUE`, function will check each cancer subgroup to find whether or not every gene has a record. If the subgroup doesn't have a record for the specific gene, function checks for alternative gene names that cbioportal might use instead of the given gene name. Consider the following example, where _genes_ consists of two gene groups K.demethylases and K.acetyltransferases, _submissionName_ is `test`, _cancername_ is `Breast Invasive Carcinoma (TCGA, Cell 2015)` and the _desiredTechnique_ is `RNA-Seq`. If `desired.case.list = "none"`, all subgroups of the requested cancer study appear on console, function asks the user to choose the index of desired subgroups. Alterntively, user can enter the index of desired cases by changing the argument `desired.case.list = "none"` to, e.g. `desiredCaseList = c(2,3,4,5)`. After the user has entered the desired subgroups, function continues by getting the data and informs the user with a progress bar. ```{r} genes <- list(K.demethylases = c("KDM1A", "KDM1B", "KDM2A"), K.acetyltransferases = c("CLOCK", "CREBBP", "ELP3", "EP300")) obtainOneStudy(genes, "test", "Breast Invasive Carcinoma (TCGA, Cell 2015)", "RNA-Seq", desiredCaseList = c(2,3,4,5)) ``` ### obtainMultipleStudies() This function obtains and stores the supported data for at least one group of genes across multiple cancer studies. It can check whether or not all genes are included in each cancer study and, if not, it looks for the alternative gene names. It requires at least four arguments: - __genes__, a list that contains at least one group of genes. There is no limit for the number of gene groups, users can set as many as gene groups they desire. - __submissionName__, a character string containing name of interest. It is used for naming the process. - __cancernames__, a character vector or a matrix possessing names of desired cancer studies. The character vector contains standard cancer names that can be found on cbioportal.org, such as `Acute Myeloid Leukemia (TCGA, NEJM 2013)`. Alternatively, a matrix can be used if user prefers user-defined cancer names. In this case, the first column of matrix comprises the standard cancer names while the second column must contain the desired cancer names. - __desiredTechnique__, one of the five supported high-throughput studies: `RNA-Seq`, `microRNA-Seq`, `microarray.mRNA`, `microarray.microRNA` or `methylation`. Function also contains two other options: - __cancerCode__, if `TRUE`, will force the function to use the standard abbreviated cancer names instead of complete cancer names. For example, `laml_tcga_pub` is the shortened name for `Acute Myeloid Leukemia (TCGA, NEJM 2013)`. - __validateGenes__ , if `TRUE`, causes the function to check all cancer studies to find which genes from the input data are available. In addition, function checks for alternative gene names that cbioportal might use instead of the given gene name. In the following example, _genes_ consists of two gene groups K.demethylases and K.acetyltransferases, _submissionName_ is `test2`, _cancername_ has complete name of five cancer studies and the desired high-throughput study is `RNA-Seq`. ```{r} genes <- list(K.demethylases = c("KDM1A", "KDM1B", "KDM2A"), K.acetyltransferases = c("CLOCK", "CREBBP", "ELP3", "EP300")) # Specifying names of cancer studies by standard study names cancernames <- c("Acute Myeloid Leukemia (TCGA, Provisional)", "Adrenocortical Carcinoma (TCGA, Provisional)", "Bladder Urothelial Carcinoma (TCGA, Provisional)", "Brain Lower Grade Glioma (TCGA, Provisional)", "Breast Invasive Carcinoma (TCGA, Provisional)") # Specifying names of cancer studies by creating a matrix that includes standard and desired study names cancernames <- matrix(c("Acute Myeloid Leukemia (TCGA, Provisional)", "acute myeloid leukemia", "Adrenocortical Carcinoma (TCGA, Provisional)", "adrenocortical carcinoma", "Bladder Urothelial Carcinoma (TCGA, Provisional)", "bladder urothelial carcinoma", "Brain Lower Grade Glioma (TCGA, Provisional)", "brain lower grade glioma", "Breast Invasive Carcinoma (TCGA, Provisional)", "breast invasive carcinoma"), nrow = 5, ncol=2 , byrow = TRUE) obtainMultipleStudies(genes, "test2", cancernames, "RNA-Seq") ``` ### automatedStatistics() The function calculates the statistics of the data obtained by `obtainOneStudy()` or `obtainMultipleStudies()` functions. Based on user's preference, these statistics can include _frequency percentage_, _frequency ratio_, _mean value_ and _median value_ of samples greater than specific value. Furthermore, it can look for the genes that comprise the highest values in each cancer and list the top 5 genes for _frequency percentage_, _mean value_ and _median value_. It requires at least two arguments: - __submissionName__, a character string containing name of interest. It is used for naming the process and should be the same as submissionName for either of `obtainOneStudy()` or `obtainMultipleStudies()` functions. - __obtainedDataType__, a character string that identifies the type of input data produced by the previous function. Two options are available: `single study` for `obtainOneStudy()` and `multiple studies` for `obtainMultipleStudies()`. The function uses `obtainedDataType` and `submissionName` to construct the name of the BiocFileCach object and then finds the appropriate data inside it. Default value is `multiple studies`. Function also contains four other options: - __calculate__, a character vector that contains the desired statistical procedures. Default input is `c("frequencyPercentage", "frequencyRatio", "meanValue")`. To get all the statistics, use the following instead: `c("frequencyPercentage", "frequencyRatio", "meanValue", "medianValue")`. This will tell the function to compute the following: + _frequencyPercentage_, which is the percentage of samples having the value greather than specific cutoff divided by the total sample size for every study / study subgroup + _frequency ratio_, which shows the number of selected samples divided by the total number of samples that give the frequency percentage. It shows the selected and total sample sizes. + _Mean Value_, which contains mean value of selected samples for each study. + _Median Value_, which shows the median value of selected samples for every study. - __topGenes__, a logical value that, if set as TRUE, causes the function to create three data.frame that contain the five top genes for each cancer. To get all the three data.frames, _frequencyPercentage_, _meanValue_ and _median_ must have been included for __calculate__. - __cutoff__, a number used to limit samples to those that are greater than this number (cutoff). The default value for methylation data is 0.6 while gene expression studies use default value of 2. For methylation studies, it is _observed/expected ratio_, for the rest, it is _z-score_. To change the cutoff to any desired number, change the option to `cutoff = desiredNumber` in which desiredNumber is the number of interest. - __round__, a logical value that forces the function to round all the calculated values to two decimal places. The default value is `TRUE`. In the following example, _submissionName_ is `test`, and the _obtainedDataType_ is `multiple studies`. We exclude _mean value_ and _median value_ from `calculate`. Note that top genes for these two statistics will also be skipped. ```{r} automatedStatistics("test", obtainedDataType = "single study", calculate = c("frequencyPercentage", "frequencyRatio")) ``` ### heatmapOutput() This function prepares heatmap for _frequency percentage_, _mean value_ and _median value_ data provided by `automatedStatistics()` function. Heatmaps for every gene group are stored in separate folder. It requires at least one argument: - __submissionName__, a character string containing name of interest. It is used for naming the process and should be the same as submissionName for either of `obtainOneStudy()` or `obtainMultipleStudies()` functions. Function also contains thirteen other options: - __shortenStudyNames__ a logical value that causes the function to remove the last part of cancer names aiming to shorten them. The removed segment usually contains the name of scientific group that has conducted the experiment. - __genelimit__ if large number of genes exist in at least one gene group, this option can be used to limit the number of genes that are shown on heatmap. For instance, `genelimit=50` will limit the heatmap to 50 genes that show the most variation across multiple study / study subgroups. The default value is `none`. - __resolution__ This option can be used to adjust the resolution of the output heatmaps as 'dot per inch'. The defalut resolution is 600. - __RowCex__ a number that specifies letter size in heatmap row names. - __ColCex__ a number that specifies letter size in heatmap column names. - __heatmapMargines__ a numeric vector that is used to set heatmap margins. The default value is `heatmapMargines=c(15,07)`. - __angleForYaxisNames__ a number that determines the angle with which the studies/study subgroups names are shown on heatmaps. The default value is 45 degree. - __heatmapColor__ a character string that defines heatmap color. The default value is "RdBu". "redgreen" is also a popular color in genomic studies. To see the rest of colors, please type `library(RColorBrewer)` and then `display.brewer.all()`. - __reverseColor__ a logical value that reverses the color gradient for heatmap(s). - __transposedHeatmap__ a logical value that transposes heatmap rows to columns and vice versa. - __simplify__ a logical value that tells the function whether or not to change values under _simplifictionCuttoff_ to zero. The purpose behind this option is to facilitate seeing the candidate genes. Therefore, it is not suited for publications. - __simplifictionCuttoff__ a logical value that, if `simplify.visulization = TRUE`, needs to be set as a desired cutoff for _simplify.visulization_. It has the same unit as _cutoff_. - __genesToDrop__ a character vector. Gene names within this vector will be omitted from heatmap. In the following example, _submissionName_ is `test`. ```{r, eval = FALSE} heatmapOutput("test", shortenStudyNames = TRUE, heatmapMargines = c(13,5), heatmapColor = "redgreen", genesToDrop = c("PVT1", "SNHG6"), RowCex = 1, ColCex = 1, reverseColor = FALSE) ``` If the requested heatmaps already exist, it doesn't rewrite the heatmaps. The number of skipped heatmaps is then printed. ### xlsxOutput() This function exports the output of `automatedStatistics()` and the _gene validation_ result of one of the `obtainOneStudy()` or `obtainMultipleStudies()` functions as an excel file. For every gene group, an excel file will be generated and stored in the same folder as heatmaps. It requires one argument: - __submissionName__, a character string containing name of interest. It is used for naming the process and should be the same as submissionName for either of `obtainOneStudy()` or `obtainMultipleStudies()` functions. In the following example, _submissionName_ is `test`. ```{r, eval = FALSE} xlsxOutput("test") ``` If the requested excel files already exist, the function avoids rewriting them. The number of skipped excel files is then printed. ### cleanDatabase() This function removes the created databases in the cbaf package directory. This helps users to obtain the fresh data from cbioportal.org. It contains one optional argument: - __databaseNames__, a character vector that contains name of databases that will be removed. The default value in null. In the following example, _databaseNames_ is `Whole2`. ```{r, eval=FALSE} cleanDatabase("Whole2") ``` If the _databaseNames_ left unentered, it will print the available databases and allow the user to choose the desired ones. ## High-level Functions ### processOneStudy() This function combines four of the mentioned functions for the ease of use. It is recommended that users only use this parent function to obtain and process gene data across multiple subsections of a cancer study so that child functions work with maximum efficiency. `processOneStudy()` uses the following functions: - __obtainOneStudy()__ - __automatedStatistics()__ - __heatmapOutput()__ - __xlsxOutput()__ It requires at least four arguments. All function arguments are the same as low-level functions: - __genesList__, a list that contains at least one gene group. There is no limit on the number of gene groups, users can set as many as gene groups they desire. - __submissionName__, a character string containing name of interest. It is used for naming the process and should be the same as submissionName for either of `obtainOneStudy()` or `obtainMultipleStudies()` functions. - __studyName__, a character string showing the desired cancer name. It is an standard cancer study name that exists on cbioportal.org, such as `Acute Myeloid Leukemia (TCGA, NEJM 2013)`. - __desiredTechnique__, one of the five supported high-throughput studies: `RNA-Seq`, `microRNA-Seq`, `microarray.mRNA`, `microarray.microRNA` or `methylation`. Function also contains nineteen other options: - __desiredCaseList__ a numeric vector that contains the index of desired cancer subgroups, assuming the user knows index of desired subgroups. If not, desiredCaseList must be set as 'none', function will show the available subgroups asking the user to enter the desired ones during the process. The default value is 'none'. - __validateGenes__ a logical value that, if set to be `TRUE`, function will check each cancer subgroup to find whether or not every gene has a record. If the subgroup doesn't have a record for the specific gene, function checks for alternative gene names that cbioportal might use instead of the given gene name. - __calculate__, a character vector that contains the desired statistical procedures. Default input is `c("frequencyPercentage", "frequencyRatio", "meanValue")`. To get all the statistics, use the following instead: `c("frequencyPercentage", "frequencyRatio", "meanValue", "medianValue")`. - __cutoff__, a number used to limit samples to those that are greater than this number (cutoff). The default value for methylation data is 0.6 while gene expression studies use default value of 2. For methylation studies, it is _observed/expected ratio_, for the rest, it is _z-score_. To change the cutoff to any desired number, change the option to `cutoff = desiredNumber` in which desiredNumber is the number of interest. - __round__, a logical value that forces the function to round all the calculated values to two decimal places. The default value is `TRUE`. - __topGenes__, a logical value that, if set as TRUE, causes the function to create three data.frame that contain the five top genes for each cancer. To get all the three data.frames, _frequencyPercentage_, _meanValue_ and _median_ must have been included for __calculate__. - __shortenStudyNames__ a logical value that causes the function to remove the last part of cancer names aiming to shorten them. The removed segment usually contains the name of scientific group that has conducted the experiment. - __genelimit__ if large number of genes exist in at least one gene group, this option can be used to limit the number of genes that are shown on heatmap. For instance, `genelimit=50` will limit the heatmap to 50 genes that show the most variation across multiple study / study subgroups. The default value is `none`. - __resolution__ This option can be used to adjust the resolution of the output heatmaps as 'dot per inch'. The defalut resolution is 600. - __RowCex__ a number that specifies letter size in heatmap row names. - __ColCex__ a number that specifies letter size in heatmap column names. - __heatmapMargines__ a numeric vector that is used to set heatmap margins. The default value is `heatmapMargines=c(15,07)`. - __angleForYaxisNames__ a number that determines the angle with which the studies/study subgroups names are shown on heatmaps. The default value is 45 degree. - __heatmapColor__ a character string that defines heatmap color. The default value is "RdBu". "redgreen" is also a popular color in genomic studies. To see the rest of colors, please type `library(RColorBrewer)` and then `display.brewer.all()`. - __reverseColor__ a logical value that reverses the color gradient for heatmap(s). - __transposedHeatmap__ a logical value that transposes heatmap rows to columns and vice versa. - __simplify__ a logical value that tells the function whether or not to change values under _simplifictionCuttoff_ to zero. The purpose behind this option is to facilitate seeing the candidate genes. Therefore, it is not suited for publications. - __simplifictionCuttoff__ a logical value that, if `simplify.visulization = TRUE`, needs to be set as a desired cutoff for _simplify.visulization_. It has the same unit as _cutoff_. - __genesToDrop__ a character vector. Gene names within this vector will be omitted from heatmap. To get more information about the function options, please refer to the child function to whom they correspond, for example `genesList` lies within `obtainMultipleStudies()` function. The following is an example showing how this function can be used: ```{r} genes <- list(K.demethylases = c("KDM1A", "KDM1B", "KDM2A", "KDM2B", "KDM3A", "KDM3B", "JMJD1C", "KDM4A"), K.methyltransferases = c("SUV39H1", "SUV39H2", "EHMT1", "EHMT2", "SETDB1", "SETDB2", "KMT2A", "KMT2A")) processOneStudy(genes, "test", "Breast Invasive Carcinoma (TCGA, Cell 2015)", "RNA-Seq", desiredCaseList = c(2,3,4,5), calculate = c("frequencyPercentage", "frequencyRatio"), RowCex = 1, ColCex = 1) ``` The output excel file and heatmaps are stored in separate folders for every gene group. Ultimately, all the folders are located inside another folder, which its name is the combination of _submissionName_ and “output for multiple studies”, for example “test output for multiple studies”. ### processMultipleStudies() This function combines four of the mentioned above functions for the ease of use. It is recommended that users only use this parent function to obtain and process gene data across multiple cancer studies for maximum efficiency. `processMultipleStudies()` uses the following functions: - __obtainMultipleStudies()__ - __automatedStatistics()__ - __heatmapOutput()__ - __xlsxOutput()__ It requires at least four arguments. All function arguments are the same as low-level functions: - __genesList__, a list that contains at least one gene group. There is no limit on the number of gene groups, users can set as many as gene groups they desire. - __submissionName__, a character string containing name of interest. It is used for naming the process and should be the same as submissionName for either of `obtainOneStudy()` or `obtainMultipleStudies()` functions. - __studyName__, a character string showing the desired cancer name. It is an standard cancer study name that exists on cbioportal.org, such as `Acute Myeloid Leukemia (TCGA, NEJM 2013)`. - __desiredTechnique__, one of the five supported high-throughput studies: `RNA-Seq`, `microRNA-Seq`, `microarray.mRNA`, `microarray.microRNA` or `methylation`. Function also contains nineteen other options: - __cancerCode__, if `TRUE`, will force the function to use the standard abbreviated cancer names instead of complete cancer names. For example, `laml_tcga_pub` is the shortened name for `Acute Myeloid Leukemia (TCGA, NEJM 2013)`. - __validateGenes__ a logical value that, if set to be `TRUE`, function will check each cancer subgroup to find whether or not every gene has a record. If the subgroup doesn't have a record for the specific gene, function checks for alternative gene names that cbioportal might use instead of the given gene name. - __calculate__, a character vector that contains the desired statistical procedures. Default input is `c("frequencyPercentage", "frequencyRatio", "meanValue")`. To get all the statistics, use the following instead: `c("frequencyPercentage", "frequencyRatio", "meanValue", "medianValue")`. - __cutoff__, a number used to limit samples to those that are greater than this number (cutoff). The default value for methylation data is 0.6 while gene expression studies use default value of 2. For methylation studies, it is _observed/expected ratio_, for the rest, it is _z-score_. To change the cutoff to any desired number, change the option to `cutoff = desiredNumber` in which desiredNumber is the number of interest. - __round__, a logical value that forces the function to round all the calculated values to two decimal places. The default value is `TRUE`. - __topGenes__, a logical value that, if set as TRUE, causes the function to create three data.frame that contain the five top genes for each cancer. To get all the three data.frames, _frequencyPercentage_, _meanValue_ and _median_ must have been included for __calculate__. - __shortenStudyNames__ a logical value that causes the function to remove the last part of cancer names aiming to shorten them. The removed segment usually contains the name of scientific group that has conducted the experiment. - __genelimit__ if large number of genes exist in at least one gene group, this option can be used to limit the number of genes that are shown on heatmap. For instance, `genelimit=50` will limit the heatmap to 50 genes that show the most variation across multiple study / study subgroups. The default value is `none`. - __resolution__ This option can be used to adjust the resolution of the output heatmaps as 'dot per inch'. The defalut resolution is 600. - __RowCex__ a number that specifies letter size in heatmap row names. - __ColCex__ a number that specifies letter size in heatmap column names. - __heatmapMargines__ a numeric vector that is used to set heatmap margins. The default value is `heatmapMargines=c(15,07)`. - __angleForYaxisNames__ a number that determines the angle with which the studies/study subgroups names are shown on heatmaps. The default value is 45 degree. - __heatmapColor__ a character string that defines heatmap color. The default value is "RdBu". "redgreen" is also a popular color in genomic studies. To see the rest of colors, please type `library(RColorBrewer)` and then `display.brewer.all()`. - __reverseColor__ a logical value that reverses the color gradient for heatmap(s). - __transposedHeatmap__ a logical value that transposes heatmap rows to columns and vice versa. - __simplify__ a logical value that tells the function whether or not to change values under _simplifictionCuttoff_ to zero. The purpose behind this option is to facilitate seeing the candidate genes. Therefore, it is not suited for publications. - __simplifictionCuttoff__ a logical value that, if `simplify.visulization = TRUE`, needs to be set as a desired cutoff for _simplify.visulization_. It has the same unit as _cutoff_. - __genesToDrop__ a character vector. Gene names within this vector will be omitted from heatmap. To get more information about the function options, please refer to the child function to whom they correspond, for example `genesList` lies within `obtainMultipleStudies()` function. The following is an example showing how this function can be used: ```{r} genes <- list(K.demethylases = c("KDM1A", "KDM1B", "KDM2A", "KDM2B", "KDM3A", "KDM3B", "JMJD1C", "KDM4A"), K.methyltransferases = c("SUV39H1", "SUV39H2", "EHMT1", "EHMT2", "SETDB1", "SETDB2", "KMT2A", "KMT2A")) studies <- c("Acute Myeloid Leukemia (TCGA, Provisional)", "Adrenocortical Carcinoma (TCGA, Provisional)", "Bladder Urothelial Carcinoma (TCGA, Provisional)", "Brain Lower Grade Glioma (TCGA, Provisional)", "Breast Invasive Carcinoma (TCGA, Provisional)") processMultipleStudies(genes, "test2", studies, "RNA-Seq", calculate = c("frequencyPercentage", "frequencyRatio"), heatmapMargines = c(15,10)) ``` The output excel file and heatmaps are stored in separate folders for every gene group. Ultimately, all the folders are located inside another folder, which it's name that is the combination of _submissionName_ and "output for multiple studies", for example "test output for multiple studies".