# Bach mouse mammary gland (10X Genomics)
## Introduction
This performs an analysis of the @bach2017differentiation 10X Genomics dataset,
from which we will consider a single sample of epithelial cells from the mouse mammary gland during gestation.
## Data loading
``` r
library(scRNAseq)
sce.mam <- BachMammaryData(samples="G_1")
```
``` r
library(scater)
rownames(sce.mam) <- uniquifyFeatureNames(
rowData(sce.mam)$Ensembl, rowData(sce.mam)$Symbol)
library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
rowData(sce.mam)$SEQNAME <- mapIds(ens.mm.v97, keys=rowData(sce.mam)$Ensembl,
keytype="GENEID", column="SEQNAME")
```
## Quality control
``` r
unfiltered <- sce.mam
```
``` r
is.mito <- rowData(sce.mam)$SEQNAME == "MT"
stats <- perCellQCMetrics(sce.mam, subsets=list(Mito=which(is.mito)))
qc <- quickPerCellQC(stats, percent_subsets="subsets_Mito_percent")
sce.mam <- sce.mam[,!qc$discard]
```
``` r
colData(unfiltered) <- cbind(colData(unfiltered), stats)
unfiltered$discard <- qc$discard
gridExtra::grid.arrange(
plotColData(unfiltered, y="sum", colour_by="discard") +
scale_y_log10() + ggtitle("Total count"),
plotColData(unfiltered, y="detected", colour_by="discard") +
scale_y_log10() + ggtitle("Detected features"),
plotColData(unfiltered, y="subsets_Mito_percent",
colour_by="discard") + ggtitle("Mito percent"),
ncol=2
)
```
(\#fig:unref-bach-qc-dist)Distribution of each QC metric across cells in the Bach mammary gland dataset. Each point represents a cell and is colored according to whether that cell was discarded.
``` r
plotColData(unfiltered, x="sum", y="subsets_Mito_percent",
colour_by="discard") + scale_x_log10()
```
(\#fig:unref-bach-qc-comp)Percentage of mitochondrial reads in each cell in the Bach mammary gland dataset compared to its total count. Each point represents a cell and is colored according to whether that cell was discarded.
``` r
colSums(as.matrix(qc))
```
```
## low_lib_size low_n_features high_subsets_Mito_percent
## 0 0 143
## discard
## 143
```
## Normalization
``` r
library(scran)
set.seed(101000110)
clusters <- quickCluster(sce.mam)
sce.mam <- computeSumFactors(sce.mam, clusters=clusters)
sce.mam <- logNormCounts(sce.mam)
```
``` r
summary(sizeFactors(sce.mam))
```
```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.264 0.520 0.752 1.000 1.207 10.790
```
``` r
plot(librarySizeFactors(sce.mam), sizeFactors(sce.mam), pch=16,
xlab="Library size factors", ylab="Deconvolution factors", log="xy")
```
(\#fig:unref-bach-norm)Relationship between the library size factors and the deconvolution size factors in the Bach mammary gland dataset.
## Variance modelling
We use a Poisson-based technical trend to capture more genuine biological variation in the biological component.
``` r
set.seed(00010101)
dec.mam <- modelGeneVarByPoisson(sce.mam)
top.mam <- getTopHVGs(dec.mam, prop=0.1)
```
``` r
plot(dec.mam$mean, dec.mam$total, pch=16, cex=0.5,
xlab="Mean of log-expression", ylab="Variance of log-expression")
curfit <- metadata(dec.mam)
curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
```
(\#fig:unref-bach-var)Per-gene variance as a function of the mean for the log-expression values in the Bach mammary gland dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to simulated Poisson counts.
## Dimensionality reduction
``` r
library(BiocSingular)
set.seed(101010011)
sce.mam <- denoisePCA(sce.mam, technical=dec.mam, subset.row=top.mam)
sce.mam <- runTSNE(sce.mam, dimred="PCA")
```
``` r
ncol(reducedDim(sce.mam, "PCA"))
```
```
## [1] 15
```
## Clustering
We use a higher `k` to obtain coarser clusters (for use in `doubletCluster()` later).
``` r
snn.gr <- buildSNNGraph(sce.mam, use.dimred="PCA", k=25)
colLabels(sce.mam) <- factor(igraph::cluster_walktrap(snn.gr)$membership)
```
``` r
table(colLabels(sce.mam))
```
```
##
## 1 2 3 4 5 6 7 8 9 10
## 550 847 639 477 54 88 39 22 32 24
```
``` r
plotTSNE(sce.mam, colour_by="label")
```
(\#fig:unref-bach-tsne)Obligatory $t$-SNE plot of the Bach mammary gland dataset, where each point represents a cell and is colored according to the assigned cluster.