Chapter 9 Grun mouse HSC (CEL-seq)
9.1 Introduction
This performs an analysis of the mouse haematopoietic stem cell (HSC) dataset generated with CEL-seq (Grun et al. 2016). Despite its name, this dataset actually contains both sorted HSCs and a population of micro-dissected bone marrow cells.
9.2 Data loading
library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
anno <- select(ens.mm.v97, keys=rownames(sce.grun.hsc),
keytype="GENEID", columns=c("SYMBOL", "SEQNAME"))
rowData(sce.grun.hsc) <- anno[match(rownames(sce.grun.hsc), anno$GENEID),]
After loading and annotation, we inspect the resulting SingleCellExperiment
object:
## class: SingleCellExperiment
## dim: 21817 1915
## metadata(0):
## assays(1): counts
## rownames(21817): ENSMUSG00000109644 ENSMUSG00000007777 ...
## ENSMUSG00000055670 ENSMUSG00000039068
## rowData names(3): GENEID SYMBOL SEQNAME
## colnames(1915): JC4_349_HSC_FE_S13_ JC4_350_HSC_FE_S13_ ...
## JC48P6_1203_HSC_FE_S8_ JC48P6_1204_HSC_FE_S8_
## colData names(2): sample protocol
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
9.3 Quality control
For some reason, no mitochondrial transcripts are available, and we have no spike-in transcripts, so we only use the number of detected genes and the library size for quality control. We block on the protocol used for cell extraction, ignoring the micro-dissected cells when computing this threshold. This is based on our judgement that a majority of micro-dissected plates consist of a majority of low-quality cells, compromising the assumptions of outlier detection.
library(scuttle)
stats <- perCellQCMetrics(sce.grun.hsc)
qc <- quickPerCellQC(stats, batch=sce.grun.hsc$protocol,
subset=grepl("sorted", sce.grun.hsc$protocol))
sce.grun.hsc <- sce.grun.hsc[,!qc$discard]
We examine the number of cells discarded for each reason.
## low_lib_size low_n_features discard
## 465 482 488
We create some diagnostic plots for each metric (Figure 9.1). The library sizes are unusually low for many plates of micro-dissected cells; this may be attributable to damage induced by the extraction protocol compared to cell sorting.
colData(unfiltered) <- cbind(colData(unfiltered), stats)
unfiltered$discard <- qc$discard
library(scater)
gridExtra::grid.arrange(
plotColData(unfiltered, y="sum", x="sample", colour_by="discard",
other_fields="protocol") + scale_y_log10() + ggtitle("Total count") +
facet_wrap(~protocol),
plotColData(unfiltered, y="detected", x="sample", colour_by="discard",
other_fields="protocol") + scale_y_log10() +
ggtitle("Detected features") + facet_wrap(~protocol),
ncol=1
)
9.4 Normalization
library(scran)
set.seed(101000110)
clusters <- quickCluster(sce.grun.hsc)
sce.grun.hsc <- computeSumFactors(sce.grun.hsc, clusters=clusters)
sce.grun.hsc <- logNormCounts(sce.grun.hsc)
We examine some key metrics for the distribution of size factors, and compare it to the library sizes as a sanity check (Figure 9.2).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.027 0.290 0.603 1.000 1.201 16.433
plot(librarySizeFactors(sce.grun.hsc), sizeFactors(sce.grun.hsc), pch=16,
xlab="Library size factors", ylab="Deconvolution factors", log="xy")
9.5 Variance modelling
We create a mean-variance trend based on the expectation that UMI counts have Poisson technical noise. We do not block on sample here as we want to preserve any difference between the micro-dissected cells and the sorted HSCs.
set.seed(00010101)
dec.grun.hsc <- modelGeneVarByPoisson(sce.grun.hsc)
top.grun.hsc <- getTopHVGs(dec.grun.hsc, prop=0.1)
The lack of a typical “bump” shape in Figure 9.3 is caused by the low counts.
plot(dec.grun.hsc$mean, dec.grun.hsc$total, pch=16, cex=0.5,
xlab="Mean of log-expression", ylab="Variance of log-expression")
curfit <- metadata(dec.grun.hsc)
curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
9.6 Dimensionality reduction
set.seed(101010011)
sce.grun.hsc <- denoisePCA(sce.grun.hsc, technical=dec.grun.hsc, subset.row=top.grun.hsc)
sce.grun.hsc <- runTSNE(sce.grun.hsc, dimred="PCA")
We check that the number of retained PCs is sensible.
## [1] 9
9.7 Clustering
snn.gr <- buildSNNGraph(sce.grun.hsc, use.dimred="PCA")
colLabels(sce.grun.hsc) <- factor(igraph::cluster_walktrap(snn.gr)$membership)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 259 148 221 103 177 108 48 122 98 63 62 18
short <- ifelse(grepl("micro", sce.grun.hsc$protocol), "micro", "sorted")
gridExtra:::grid.arrange(
plotTSNE(sce.grun.hsc, colour_by="label"),
plotTSNE(sce.grun.hsc, colour_by=I(short)),
ncol=2
)
9.8 Marker gene detection
markers <- findMarkers(sce.grun.hsc, test.type="wilcox", direction="up",
row.data=rowData(sce.grun.hsc)[,"SYMBOL",drop=FALSE])
To illustrate the manual annotation process, we examine the marker genes for one of the clusters. Upregulation of Camp, Lcn2, Ltf and lysozyme genes indicates that this cluster contains cells of neuronal origin.
chosen <- markers[['6']]
best <- chosen[chosen$Top <= 10,]
aucs <- getMarkerEffects(best, prefix="AUC")
rownames(aucs) <- best$SYMBOL
library(pheatmap)
pheatmap(aucs, color=viridis::plasma(100))
Session Info
R version 4.4.0 beta (2024-04-15 r86425)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] pheatmap_1.0.12 scran_1.32.0
[3] scater_1.32.0 ggplot2_3.5.1
[5] scuttle_1.14.0 AnnotationHub_3.12.0
[7] BiocFileCache_2.12.0 dbplyr_2.5.0
[9] ensembldb_2.28.0 AnnotationFilter_1.28.0
[11] GenomicFeatures_1.56.0 AnnotationDbi_1.66.0
[13] scRNAseq_2.18.0 SingleCellExperiment_1.26.0
[15] SummarizedExperiment_1.34.0 Biobase_2.64.0
[17] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
[19] IRanges_2.38.0 S4Vectors_0.42.0
[21] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
[23] matrixStats_1.3.0 BiocStyle_2.32.0
[25] rebook_1.14.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 jsonlite_1.8.8
[3] CodeDepends_0.6.6 magrittr_2.0.3
[5] ggbeeswarm_0.7.2 gypsum_1.0.0
[7] farver_2.1.1 rmarkdown_2.26
[9] BiocIO_1.14.0 zlibbioc_1.50.0
[11] vctrs_0.6.5 memoise_2.0.1
[13] Rsamtools_2.20.0 DelayedMatrixStats_1.26.0
[15] RCurl_1.98-1.14 htmltools_0.5.8.1
[17] S4Arrays_1.4.0 curl_5.2.1
[19] BiocNeighbors_1.22.0 Rhdf5lib_1.26.0
[21] SparseArray_1.4.0 rhdf5_2.48.0
[23] sass_0.4.9 alabaster.base_1.4.0
[25] bslib_0.7.0 alabaster.sce_1.4.0
[27] httr2_1.0.1 cachem_1.0.8
[29] GenomicAlignments_1.40.0 igraph_2.0.3
[31] mime_0.12 lifecycle_1.0.4
[33] pkgconfig_2.0.3 rsvd_1.0.5
[35] Matrix_1.7-0 R6_2.5.1
[37] fastmap_1.1.1 GenomeInfoDbData_1.2.12
[39] digest_0.6.35 colorspace_2.1-0
[41] paws.storage_0.5.0 dqrng_0.3.2
[43] irlba_2.3.5.1 ExperimentHub_2.12.0
[45] RSQLite_2.3.6 beachmat_2.20.0
[47] labeling_0.4.3 filelock_1.0.3
[49] fansi_1.0.6 httr_1.4.7
[51] abind_1.4-5 compiler_4.4.0
[53] bit64_4.0.5 withr_3.0.0
[55] BiocParallel_1.38.0 viridis_0.6.5
[57] DBI_1.2.2 highr_0.10
[59] HDF5Array_1.32.0 alabaster.ranges_1.4.0
[61] alabaster.schemas_1.4.0 rappdirs_0.3.3
[63] DelayedArray_0.30.0 bluster_1.14.0
[65] rjson_0.2.21 tools_4.4.0
[67] vipor_0.4.7 beeswarm_0.4.0
[69] glue_1.7.0 restfulr_0.0.15
[71] rhdf5filters_1.16.0 grid_4.4.0
[73] Rtsne_0.17 cluster_2.1.6
[75] generics_0.1.3 gtable_0.3.5
[77] metapod_1.12.0 ScaledMatrix_1.12.0
[79] BiocSingular_1.20.0 utf8_1.2.4
[81] XVector_0.44.0 ggrepel_0.9.5
[83] BiocVersion_3.19.1 pillar_1.9.0
[85] limma_3.60.0 dplyr_1.1.4
[87] lattice_0.22-6 rtracklayer_1.64.0
[89] bit_4.0.5 tidyselect_1.2.1
[91] paws.common_0.7.2 locfit_1.5-9.9
[93] Biostrings_2.72.0 knitr_1.46
[95] gridExtra_2.3 bookdown_0.39
[97] ProtGenerics_1.36.0 edgeR_4.2.0
[99] xfun_0.43 statmod_1.5.0
[101] UCSC.utils_1.0.0 lazyeval_0.2.2
[103] yaml_2.3.8 evaluate_0.23
[105] codetools_0.2-20 tibble_3.2.1
[107] alabaster.matrix_1.4.0 BiocManager_1.30.22
[109] graph_1.82.0 cli_3.6.2
[111] munsell_0.5.1 jquerylib_0.1.4
[113] Rcpp_1.0.12 dir.expiry_1.12.0
[115] png_0.1-8 XML_3.99-0.16.1
[117] parallel_4.4.0 blob_1.2.4
[119] sparseMatrixStats_1.16.0 bitops_1.0-7
[121] viridisLite_0.4.2 alabaster.se_1.4.0
[123] scales_1.3.0 purrr_1.0.2
[125] crayon_1.5.2 rlang_1.1.3
[127] cowplot_1.1.3 KEGGREST_1.44.0
References
Grun, D., M. J. Muraro, J. C. Boisset, K. Wiebrands, A. Lyubimova, G. Dharmadhikari, M. van den Born, et al. 2016. “De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data.” Cell Stem Cell 19 (2): 266–77.