Clustering and cell annotation
In this vignette we showcase cyCONDOR functions for clustering and cell annotation.
Load an example dataset
condor <- readRDS("../.test_files/conodr_example_016.rds")
The cyCONDOR ecosystem provides two clustering methods, Phenograph and FlowSOM. It also provides a convenient way to assign cell annotations to the clustered object.
The clustering functions take three main inputs:
- the condor object, as fcd;
- the type of data to cluster on, as input_type (the expression matrix or the PCA coordinates, e.g. pca);
- the data slot to use, as data_slot (e.g. orig or norm).

We recommend using the PCA coordinates as input_type to compensate for fluctuations in the expression data.

Each function can be run several times with different settings. Defining a prefix, which is incorporated into the name of the output slot, keeps the results of each run apart. The functions return the fcd with an additional data frame containing the calculated clustering, saved in fcd$clustering. The name of this data frame consists of the prefix (if given), the clustering method, the input_type and the data_slot. For Phenograph, the value of k is appended as well (e.g. phenograph_pca_orig_k_60).
Additionally, when using the expression matrix as input, the user can specify which markers should be used for the calculation by listing them under markers. The list can either be written out manually or be extracted directly from the condor object with the implemented functions measured_markers and used_markers. By default, all available markers in the condor object are used.

If discard is set to TRUE, all markers except the ones listed under markers are used for the calculation. This makes it easy to exclude individual markers. When using pca as input, it is possible to specify the number of principal components (PCs) to be used.
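The marker and PC selection rules above can be made concrete with a minimal base R sketch. It does not call cyCONDOR. select_markers() and select_pcs() are hypothetical helpers that only mimic the described behavior of markers, discard and the number of PCs on a plain matrix. The toy marker names and values are made up for illustration.

```r
# Hypothetical helper mimicking the described `markers` / `discard` semantics
select_markers <- function(expr, markers = NULL, discard = FALSE) {
  if (is.null(markers)) {
    return(expr)  # default: all available markers are used
  }
  unknown <- setdiff(markers, colnames(expr))
  if (length(unknown) > 0) {
    stop("Unknown markers: ", paste(unknown, collapse = ", "))
  }
  # discard = TRUE keeps everything EXCEPT the listed markers
  keep <- if (discard) setdiff(colnames(expr), markers) else markers
  expr[, keep, drop = FALSE]
}

# Hypothetical helper keeping only the first n principal components
select_pcs <- function(pca_scores, n_pcs) {
  stopifnot(n_pcs <= ncol(pca_scores))
  pca_scores[, seq_len(n_pcs), drop = FALSE]
}

# Toy expression matrix: 4 cells x 4 markers (made-up values)
expr <- matrix(c(1, 2, 3, 4,
                 2, 3, 4, 5,
                 0, 1, 0, 1,
                 5, 5, 6, 6),
               nrow = 4,
               dimnames = list(NULL, c("CD3", "CD4", "CD8", "CD19")))

select_markers(expr, markers = c("CD3", "CD4"))         # only CD3 and CD4
select_markers(expr, markers = "CD19", discard = TRUE)  # every marker except CD19

# PCA scores of the toy data, truncated to the first two PCs
pcs <- prcomp(expr, center = TRUE, scale. = FALSE)$x
select_pcs(pcs, n_pcs = 2)
```

In cyCONDOR itself, this selection is requested through the markers and discard arguments of the clustering functions rather than through separate helpers.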
Phenograph clustering
runPhenograph is based on the package Rphenoannoy, an optimized version of Rphenograph. This clustering method is designed for high-dimensional single-cell data analysis and uses an approximate k-nearest neighbor (kNN) search to construct the graph.

The k parameter defines the number of nearest neighbors used to build the nearest-neighbor graph; higher values generally result in fewer clusters. It can be useful to try out different settings here to reach the desired cluster resolution. A seed can be set to ensure reproducibility of the clustering.
For more details see:
Chen H (2015). “Rphenograph: R implementation of the phenograph algorithm”. R package version 0.99.1. https://github.com/JinmiaoChenLab/Rphenograph
Stuchly J (2020). “Rphenoannoy: R implementation of the phenograph algorithm - approximate KNN modification, based on Rphenograph package”. R package version 0.1.0. https://github.com/stuchly/Rphenoannoy
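The first two steps reported in the Phenograph log (finding nearest neighbors and computing Jaccard coefficients between nearest-neighbor sets) can be illustrated with a small base R sketch. This is not the Rphenoannoy code. For simplicity it uses an exact, brute-force kNN search instead of an approximate one, and knn_indices() and jaccard_edges() are hypothetical helpers. Each cell is linked to its k nearest neighbors, and each link is weighted by the Jaccard coefficient of the two cells' neighbor sets.

```r
# Exact, brute-force kNN (Rphenoannoy uses an approximate search instead)
knn_indices <- function(x, k) {
  d <- as.matrix(dist(x))  # Euclidean distances between all cells
  diag(d) <- Inf           # a cell is not its own neighbor
  nn <- apply(d, 1, function(row) order(row)[seq_len(k)])
  # row i = indices of the k nearest neighbors of cell i
  matrix(as.vector(nn), ncol = k, byrow = TRUE)
}

# One edge per (cell, neighbor) pair, weighted by the Jaccard coefficient
# of the two neighbor sets: |intersection| / |union|
jaccard_edges <- function(nn) {
  k <- ncol(nn)
  from <- rep(seq_len(nrow(nn)), each = k)
  to <- as.vector(t(nn))
  weight <- mapply(function(i, j) {
    shared <- length(intersect(nn[i, ], nn[j, ]))
    shared / (2 * k - shared)  # both sets have size k, so |union| = 2k - shared
  }, from, to)
  data.frame(from = from, to = to, weight = weight)
}

# Toy data: two well-separated groups of five cells each
x <- cbind(c(0, 0.1, 0.2, 0.3, 0.4, 10, 10.1, 10.2, 10.3, 10.4),
           c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5))
edges <- jaccard_edges(knn_indices(x, k = 2))
head(edges)
```

In the full method, these weighted links are turned into an undirected graph that is partitioned with Louvain clustering, as the log below reports. With igraph this could be done via graph_from_data_frame() and cluster_louvain(). That step is omitted here to keep the sketch dependency-free.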
condor <- runPhenograph(fcd = condor,
input_type = "pca",
data_slot = "orig",
k = 60,
seed = 91)
## Run Rphenograph starts:
## -Input data of 59049 rows and 28 columns
## -k is set to 60
## Finding nearest neighbors...DONE ~ 38.809 s
## Compute jaccard coefficient between nearest-neighbor sets...
## Presorting knn...
## presorting DONE ~ 2.927 s
## Start jaccard
## DONE ~ 3.185 s
## Build undirected graph from the weighted links...DONE ~ 2.165 s
## Run louvain clustering on the graph ...DONE ~ 10.309 s
## Run Rphenograph DONE, totally takes 54.468s.
## Return a community class
## -Modularity value: 0.8749651
## -Number of clusters: 25
The output of the Phenograph clustering can be accessed with condor$clustering$phenograph_pca_orig_k_60.
FlowSOM clustering
runFlowSOM provides a fast algorithm to cluster a large number of cells. The nClusters parameter defines the final number of clusters to be generated. A seed can be set to ensure reproducibility of the clustering.
For more details see: Van Gassen S et al. (2015). “FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data.” Cytometry Part A 87: 636-645. https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.22625
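The idea behind FlowSOM (train a self-organizing map on the cells, then group its nodes into nClusters metaclusters) can be sketched in base R. This is a simplified stand-in, not the FlowSOM implementation:
- train_som_metaclusters() is a hypothetical helper;
- the grid size, learning-rate schedule and neighborhood radius are illustrative assumptions;
- the minimum spanning tree step is skipped;
- plain average-linkage hierarchical clustering replaces FlowSOM's own metaclustering.

```r
# Toy self-organizing map (SOM) followed by hierarchical metaclustering
train_som_metaclusters <- function(x, n_clusters, xdim = 3, ydim = 3,
                                   rlen = 10, alpha = c(0.05, 0.01), seed = 91) {
  set.seed(seed)  # reproducible initialization and presentation order
  grid <- as.matrix(expand.grid(seq_len(xdim), seq_len(ydim)))  # node positions on the map
  n_nodes <- nrow(grid)
  stopifnot(nrow(x) >= n_nodes, n_clusters <= n_nodes)
  grid_dist <- as.matrix(dist(grid))                    # distances between nodes on the map
  codes <- x[sample(nrow(x), n_nodes), , drop = FALSE]  # start nodes at random cells
  radius0 <- max(grid_dist) / 2
  # Index of the best-matching unit (closest node) for one cell
  bmu <- function(v) which.min(colSums((t(codes) - v)^2))

  n_steps <- rlen * nrow(x)
  step <- 0
  for (epoch in seq_len(rlen)) {
    for (i in sample(nrow(x))) {
      frac <- step / n_steps
      a <- alpha[1] + (alpha[2] - alpha[1]) * frac  # learning rate decays linearly
      r <- radius0 * (1 - frac)                     # neighborhood shrinks over time
      nb <- grid_dist[bmu(x[i, ]), ] <= r           # BMU and its neighbors on the map
      # Move the selected nodes a fraction `a` towards the current cell
      codes[nb, ] <- codes[nb, , drop = FALSE] +
        a * sweep(-codes[nb, , drop = FALSE], 2, x[i, ], "+")
      step <- step + 1
    }
  }
  # Metaclustering: group the trained nodes into n_clusters
  node_meta <- cutree(hclust(dist(codes), method = "average"), k = n_clusters)
  # Map every cell to its node, then to that node's metacluster
  cell_nodes <- apply(x, 1, bmu)
  list(codes = codes, node_meta = node_meta,
       labels = unname(node_meta[cell_nodes]))
}

# Toy data: two groups of 50 cells in 2 dimensions
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
res <- train_som_metaclusters(x, n_clusters = 2)
table(res$labels)
```

The two-step design explains why nClusters can be set directly. The map itself has a fixed number of nodes, and only the final grouping of those nodes is cut to the requested number of clusters.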
condor <- runFlowSOM(fcd = condor,
input_type = "pca",
data_slot = "orig",
nClusters = 15,
seed = 91,
ret_model = TRUE)
## Building SOM
## Mapping data to SOM
## Building MST
The output of the FlowSOM clustering can be accessed with condor$clustering$FlowSOM_pca_orig.
Metaclustering
Each cluster can now be labeled with its specific cell type using the metaclustering function. This function takes the condor object (fcd) and the cluster_slot as input, together with cluster_var, the column of that slot holding the cluster IDs (here Phenograph). The cluster_var_new parameter names the new column containing the cell types. The metaclusters parameter is a named vector acting as a translation table that assigns a cell type to each cluster. Several clusters can be mapped to the same cell type.
condor <- metaclustering(fcd = condor,
cluster_slot = "phenograph_pca_orig_k_60",
cluster_var = "Phenograph",
cluster_var_new = "metaclusters",
metaclusters = c("1" = "Classical Monocytes",
"2" = "CD4 CD45RA+ CD127+",
"3" = "CD8 CD45RA+ CD127+",
"4" = "NK dim",
"5" = "CD8 CD45RA+ CD127-",
"6" = "Classical Monocytes",
"7" = "Unconventional T cells",
"8" = "CD4 CD45RA- CD127+",
"9" = "CD16+ Monocytes",
"10" = "CD4 CD127-",
"11" = "Classical Monocytes",
"12" = "CD8 CD45RA- CD127+",
"13" = "CD8 CD45RA- CD127+",
"14" = "NK bright",
"15" = "CD8 CD45RA+ CD127-",
"16" = "CD4 CD25+",
"17" = "B cells",
"18" = "Unconventional T cells",
"19" = "Classical Monocytes",
"20" = "pDCs",
"21" = "CD8 CD45RA+ CD127+",
"22" = "Basophils",
"23" = "Mixed",
"24" = "B cells",
"25" = "NK bright"))
## cluster metacluster
## 1 1 Classical Monocytes
## 2 2 CD4 CD45RA+ CD127+
## 3 3 CD8 CD45RA+ CD127+
## 4 4 NK dim
## 5 5 CD8 CD45RA+ CD127-
## 6 6 Classical Monocytes
## 7 7 Unconventional T cells
## 8 8 CD4 CD45RA- CD127+
## 9 9 CD16+ Monocytes
## 10 10 CD4 CD127-
## 11 11 Classical Monocytes
## 12 12 CD8 CD45RA- CD127+
## 13 13 CD8 CD45RA- CD127+
## 14 14 NK bright
## 15 15 CD8 CD45RA+ CD127-
## 16 16 CD4 CD25+
## 17 17 B cells
## 18 18 Unconventional T cells
## 19 19 Classical Monocytes
## 20 20 pDCs
## 21 21 CD8 CD45RA+ CD127+
## 22 22 Basophils
## 23 23 Mixed
## 24 24 B cells
## 25 25 NK bright
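The idea of a named vector as translation table can be illustrated in base R. This is not the metaclustering implementation. The sketch uses a few cluster-to-cell-type pairs from the example above and made-up per-cell cluster IDs. It also shows a pitfall worth checking when annotating clusters by hand. If cluster IDs are stored as a factor (not assumed to be the case in cyCONDOR), indexing a named vector with the factor uses its integer codes rather than its labels. The IDs must therefore be converted to character first.

```r
# Subset of the cluster-to-cell-type table from the example above;
# several clusters can map to the same cell type
translation <- c("1"  = "Classical Monocytes",
                 "2"  = "CD4 CD45RA+ CD127+",
                 "6"  = "Classical Monocytes",
                 "10" = "CD4 CD127-")

# Made-up cluster IDs for five cells, stored as a factor
cluster_ids <- factor(c("10", "1", "6", "2", "1"))

# Guard: every cluster present in the data needs an annotation
stopifnot(all(levels(cluster_ids) %in% names(translation)))

# Correct: look up by label (convert the factor to character first)
cell_types <- unname(translation[as.character(cluster_ids)])
# "CD4 CD127-" "Classical Monocytes" "Classical Monocytes" "CD4 CD45RA+ CD127+" "Classical Monocytes"

# Pitfall: indexing with the factor itself uses its integer codes
# (levels are sorted as "1", "10", "2", "6"), giving the wrong cell types
wrong <- unname(translation[cluster_ids])
# "CD4 CD45RA+ CD127+" "Classical Monocytes" "CD4 CD127-" "Classical Monocytes" "Classical Monocytes"

table(cell_types)
```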
Session Info
info <- sessionInfo()
info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.2.1
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.34.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-19
## [5] CytoNorm_2.0.1 TH.data_1.1-2
## [7] vctrs_0.6.4 digest_0.6.33
## [9] png_0.1-8 shape_1.4.6
## [11] proxy_0.4-27 slingshot_2.8.0
## [13] ggrepel_0.9.4 parallelly_1.36.0
## [15] MASS_7.3-60 pkgdown_2.0.7
## [17] reshape2_1.4.4 httpuv_1.6.12
## [19] foreach_1.5.2 BiocGenerics_0.46.0
## [21] withr_2.5.1 ggrastr_1.0.2
## [23] xfun_0.40 ggpubr_0.6.0
## [25] ellipsis_0.3.2 survival_3.5-7
## [27] memoise_2.0.1 hexbin_1.28.3
## [29] ggbeeswarm_0.7.2 RProtoBufLib_2.12.1
## [31] princurve_2.1.6 profvis_0.3.8
## [33] ggsci_3.0.0 systemfonts_1.0.5
## [35] ragg_1.2.6 zoo_1.8-12
## [37] GlobalOptions_0.1.2 DEoptimR_1.1-3
## [39] Formula_1.2-5 prettyunits_1.2.0
## [41] promises_1.2.1 scatterplot3d_0.3-44
## [43] rstatix_0.7.2 globals_0.16.2
## [45] ps_1.7.5 rstudioapi_0.15.0
## [47] miniUI_0.1.1.1 generics_0.1.3
## [49] ggcyto_1.28.1 base64enc_0.1-3
## [51] processx_3.8.2 curl_5.1.0
## [53] S4Vectors_0.38.2 zlibbioc_1.46.0
## [55] flowWorkspace_4.12.2 polyclip_1.10-6
## [57] randomForest_4.7-1.1 GenomeInfoDbData_1.2.10
## [59] RBGL_1.76.0 ncdfFlow_2.46.0
## [61] RcppEigen_0.3.3.9.4 xtable_1.8-4
## [63] stringr_1.5.0 desc_1.4.2
## [65] doParallel_1.0.17 evaluate_0.22
## [67] S4Arrays_1.0.6 hms_1.1.3
## [69] glmnet_4.1-8 GenomicRanges_1.52.1
## [71] irlba_2.3.5.1 colorspace_2.1-0
## [73] harmony_1.1.0 reticulate_1.34.0
## [75] readxl_1.4.3 magrittr_2.0.3
## [77] lmtest_0.9-40 readr_2.1.4
## [79] Rgraphviz_2.44.0 later_1.3.1
## [81] lattice_0.22-5 future.apply_1.11.0
## [83] robustbase_0.99-0 XML_3.99-0.15
## [85] cowplot_1.1.1 matrixStats_1.1.0
## [87] xts_0.13.1 class_7.3-22
## [89] Hmisc_5.1-1 pillar_1.9.0
## [91] nlme_3.1-163 iterators_1.0.14
## [93] compiler_4.3.1 RSpectra_0.16-1
## [95] stringi_1.7.12 gower_1.0.1
## [97] minqa_1.2.6 SummarizedExperiment_1.30.2
## [99] lubridate_1.9.3 devtools_2.4.5
## [101] CytoML_2.12.0 plyr_1.8.9
## [103] crayon_1.5.2 abind_1.4-5
## [105] locfit_1.5-9.8 sp_2.1-1
## [107] sandwich_3.0-2 pcaMethods_1.92.0
## [109] dplyr_1.1.3 codetools_0.2-19
## [111] multcomp_1.4-25 textshaping_0.3.7
## [113] recipes_1.0.8 openssl_2.1.1
## [115] Rphenograph_0.99.1 TTR_0.24.3
## [117] bslib_0.5.1 e1071_1.7-13
## [119] destiny_3.14.0 GetoptLong_1.0.5
## [121] ggplot.multistats_1.0.0 mime_0.12
## [123] splines_4.3.1 circlize_0.4.15
## [125] Rcpp_1.0.11 sparseMatrixStats_1.12.2
## [127] cellranger_1.1.0 knitr_1.44
## [129] utf8_1.2.4 clue_0.3-65
## [131] lme4_1.1-35.1 fs_1.6.3
## [133] listenv_0.9.0 checkmate_2.3.0
## [135] DelayedMatrixStats_1.22.6 pkgbuild_1.4.2
## [137] ggsignif_0.6.4 tibble_3.2.1
## [139] Matrix_1.6-1.1 rpart.plot_3.1.1
## [141] callr_3.7.3 tzdb_0.4.0
## [143] tweenr_2.0.2 pkgconfig_2.0.3
## [145] pheatmap_1.0.12 tools_4.3.1
## [147] cachem_1.0.8 smoother_1.1
## [149] fastmap_1.1.1 rmarkdown_2.25
## [151] scales_1.2.1 grid_4.3.1
## [153] usethis_2.2.2 broom_1.0.5
## [155] sass_0.4.7 graph_1.78.0
## [157] carData_3.0-5 RANN_2.6.1
## [159] rpart_4.1.21 farver_2.1.1
## [161] yaml_2.3.7 MatrixGenerics_1.12.3
## [163] foreign_0.8-85 ggthemes_4.2.4
## [165] cli_3.6.1 purrr_1.0.2
## [167] stats4_4.3.1 lifecycle_1.0.3
## [169] uwot_0.1.16 askpass_1.2.0
## [171] caret_6.0-94 Biobase_2.60.0
## [173] mvtnorm_1.2-3 lava_1.7.3
## [175] sessioninfo_1.2.2 backports_1.4.1
## [177] cytolib_2.12.1 timechange_0.2.0
## [179] gtable_0.3.4 rjson_0.2.21
## [181] umap_0.2.10.0 ggridges_0.5.4
## [183] Rphenoannoy_0.1.0 parallel_4.3.1
## [185] pROC_1.18.5 limma_3.56.2
## [187] jsonlite_1.8.7 edgeR_3.42.4
## [189] RcppHNSW_0.5.0 bitops_1.0-7
## [191] ggplot2_3.4.4 Rtsne_0.16
## [193] FlowSOM_2.8.0 ranger_0.16.0
## [195] flowCore_2.12.2 jquerylib_0.1.4
## [197] timeDate_4022.108 shiny_1.7.5.1
## [199] ConsensusClusterPlus_1.64.0 htmltools_0.5.6.1
## [201] diffcyt_1.20.0 glue_1.6.2
## [203] XVector_0.40.0 VIM_6.2.2
## [205] RCurl_1.98-1.13 rprojroot_2.0.3
## [207] gridExtra_2.3 boot_1.3-28.1
## [209] TrajectoryUtils_1.8.0 igraph_1.5.1
## [211] R6_2.5.1 tidyr_1.3.0
## [213] SingleCellExperiment_1.22.0 vcd_1.4-11
## [215] cluster_2.1.4 pkgload_1.3.3
## [217] GenomeInfoDb_1.36.4 ipred_0.9-14
## [219] nloptr_2.0.3 DelayedArray_0.26.7
## [221] tidyselect_1.2.0 vipor_0.4.5
## [223] htmlTable_2.4.2 ggforce_0.4.1
## [225] CytoDx_1.20.0 car_3.1-2
## [227] future_1.33.0 ModelMetrics_1.2.2.2
## [229] munsell_0.5.0 laeken_0.5.2
## [231] data.table_1.14.8 htmlwidgets_1.6.2
## [233] ComplexHeatmap_2.16.0 RColorBrewer_1.1-3
## [235] rlang_1.1.1 remotes_2.4.2.1
## [237] colorRamps_2.3.1 ggnewscale_0.4.9
## [239] fansi_1.0.5 hardhat_1.3.0
## [241] beeswarm_0.4.0 prodlim_2023.08.28