Clustering and cell annotation
In this vignette we showcase cyCONDOR functions for
clustering and cell annotation.
Load an example dataset
condor <- readRDS("../.test_files/conodr_example_016.rds")
The cyCONDOR ecosystem provides two clustering
methods, Phenograph and FlowSOM. It also provides a convenient way to
assign cell annotations to the clustered object.
The clustering functions take the condor object as the
fcd input and the matrix used to calculate the clusters as the
data_slot input (e.g. orig or norm). We recommend using the PCA
coordinates as input_type to compensate for fluctuations in the
expression data. A prefix can be defined that is incorporated into the
slot name of the output, so each function can be run with different
settings and each result is saved under its own name. The functions return
an fcd with an additional data frame containing the calculated
clustering, saved in fcd$clustering. The name of the output
consists of the prefix (if given), the clustering method, the
input_type and the data_slot, followed by the main clustering
parameter (for example phenograph_pca_orig_k_60).
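To make the naming scheme concrete, the sketch below assembles a slot name from its parts. The helper make_slot_name is hypothetical and not part of cyCONDOR, and the prefix value "run2" is made up. The pattern is only inferred from the example name phenograph_pca_orig_k_60 used in this vignette, so the package's internal logic may differ.

```r
# Hypothetical helper (NOT a cyCONDOR function): builds a slot name following
# the pattern inferred from "phenograph_pca_orig_k_60". Placing the prefix
# first is an assumption.
make_slot_name <- function(method, input_type, data_slot, k, prefix = NULL) {
  # c() silently drops a NULL prefix, so no special case is needed
  parts <- c(prefix, method, input_type, data_slot, "k", k)
  paste(parts, collapse = "_")
}

slot_default  <- make_slot_name("phenograph", "pca", "orig", 60)
slot_prefixed <- make_slot_name("phenograph", "pca", "orig", 60, prefix = "run2")

print(slot_default)
print(slot_prefixed)
```

Because the settings are part of the name, two runs with different settings do not overwrite each other in fcd$clustering.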
Additionally, when using the expression matrix as input, the user has
the option to specifically state which markers should be used for the
calculation by listing them under markers. The list can be
either written out manually or be extracted directly from the
condor object using the implemented functions
measured_markers and used_markers. By default,
all available markers from the condor object are used. If the
discard option is set to TRUE, all markers
except the ones listed under markers are
used for the calculation. This makes it possible to exclude individual markers. When
using pca as input, it is possible to specify the number of PCs
to be used.
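The interplay between markers and discard can be illustrated with plain set operations. The function select_markers and the toy marker panel below are hypothetical and only mimic the behaviour described above; they are not how cyCONDOR implements it.

```r
# Hypothetical helper (NOT a cyCONDOR function) mimicking the described
# behaviour of the markers and discard arguments.
select_markers <- function(all_markers, markers = all_markers, discard = FALSE) {
  unknown <- setdiff(markers, all_markers)
  if (length(unknown) > 0) {
    stop("Unknown markers: ", paste(unknown, collapse = ", "))
  }
  if (discard) {
    setdiff(all_markers, markers)    # keep everything except the listed markers
  } else {
    intersect(all_markers, markers)  # keep only the listed markers
  }
}

all_markers <- c("CD3", "CD4", "CD8", "CD19", "CD56")  # toy marker panel

keep_listed <- select_markers(all_markers, markers = c("CD3", "CD4"))
drop_listed <- select_markers(all_markers, markers = "CD19", discard = TRUE)

print(keep_listed)
print(drop_listed)
```

With discard = TRUE a single marker can be excluded without writing out the rest of the panel.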
Phenograph clustering
runPhenograph is based on the package
Rphenoannoy, an optimized version of
Rphenograph. This clustering method is designed for
high-dimensional single-cell data analysis incorporating the approximate
k-nearest neighbor (kNN) technique for graph construction. The
k parameter defines the number of nearest neighbors to be
used for the nearest-neighbor graph, with higher values resulting in
fewer clusters. It can be useful to try out different settings here to
get the desired cluster resolution. A seed can be set to ensure
reproducibility of the clustering.
For more details see:
Chen H (2015). “Rphenograph: R implementation of the phenograph algorithm”. R package version 0.99.1. https://github.com/JinmiaoChenLab/Rphenograph
Stuchly J (2020). “Rphenoannoy: R implementation of the phenograph algorithm - approximate KNN modification, based on Rphenograph package”. R package version 0.1.0. https://github.com/stuchly/Rphenoannoy
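As a rough illustration of the graph construction step, the base R sketch below finds the k nearest neighbors of each cell in a small random toy matrix and weights each edge by the Jaccard coefficient of the two neighbor sets. It uses an exact neighbor search and made-up data, and it stops before community detection, so it is a simplified stand-in and not the Rphenoannoy implementation.

```r
# Toy data: 20 "cells" in 3 dimensions (random values, for illustration only)
set.seed(91)
cells <- matrix(rnorm(60), ncol = 3)
k <- 5

# Exact pairwise distances; a cell must not be its own neighbor
d <- as.matrix(dist(cells))
diag(d) <- Inf

# One row per cell, holding the indices of its k nearest neighbors
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Jaccard coefficient between two neighbor sets
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Weighted edge list: one edge per (cell, neighbor) pair
edges <- do.call(rbind, lapply(seq_len(nrow(knn)), function(i) {
  data.frame(
    from   = i,
    to     = knn[i, ],
    weight = vapply(knn[i, ],
                    function(j) jaccard(knn[i, ], knn[j, ]),
                    numeric(1))
  )
}))

print(head(edges))
```

A larger k gives every cell more neighbors and therefore a denser graph, which is why higher values tend to produce fewer, larger clusters.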
condor <- runPhenograph(fcd = condor,
                        input_type = "pca",
                        data_slot = "orig",
                        k = 60,
                        seed = 91)
## Run Rphenograph starts:
## -Input data of 59049 rows and 28 columns
## -k is set to 60
## Finding nearest neighbors...DONE ~ 49.669 s
## Compute jaccard coefficient between nearest-neighbor sets...
## Presorting knn...
## presorting DONE ~ 2.439 s
## Start jaccard
## DONE ~ 3.64 s
## Build undirected graph from the weighted links...DONE ~ 3.254 s
## Run louvain clustering on the graph ...DONE ~ 14.2 s
## Run Rphenograph DONE, totally takes 70.763s.
## Return a community class
## -Modularity value: 0.8746552
## -Number of clusters: 25
The output of the phenograph clustering can be accessed with
condor$clustering$phenograph_pca_orig_k_60.
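A quick check after clustering is to count the cells per cluster. The sketch below does this on a small mock list that only imitates the relevant part of a condor object. The mock values are made up, and the column name Phenograph is taken from the cluster_var used in the metaclustering call of this vignette.

```r
# Mock object for illustration only; a real condor object holds far more data.
mock_fcd <- list(
  clustering = list(
    phenograph_pca_orig_k_60 = data.frame(
      Phenograph = factor(c(1, 1, 2, 3, 3, 3))
    )
  )
)

# Access the clustering slot by name and count cells per cluster
clusters <- mock_fcd$clustering$phenograph_pca_orig_k_60
cells_per_cluster <- table(clusters$Phenograph)

print(cells_per_cluster)
```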
FlowSOM clustering
runFlowSOM provides a fast algorithm to cluster a high
number of cells. The nClusters parameter defines the final
number of clusters to be generated. A seed can be set to ensure
reproducibility of the clustering.
For more details see: Van Gassen S et al. (2015) “FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data.” Cytom Part J Int Soc Anal Cytol 87: 636-645. https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.22625
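The role of the seed can be shown with any clustering method that starts from random values. The sketch below uses base R kmeans as a stand-in (it is not FlowSOM) on random toy data. The wrapper run_clustering is hypothetical, and its argument names merely echo nClusters and seed from the text.

```r
# Toy data: 100 points in 2 dimensions (random values, for illustration only)
set.seed(1)
toy <- matrix(rnorm(200), ncol = 2)

# Hypothetical wrapper: fixes the random seed before clustering.
# kmeans is used as a stand-in for a clustering method with random starts.
run_clustering <- function(data, nClusters, seed) {
  set.seed(seed)
  kmeans(data, centers = nClusters)$cluster
}

first_run  <- run_clustering(toy, nClusters = 3, seed = 91)
second_run <- run_clustering(toy, nClusters = 3, seed = 91)

# Same data, same settings, same seed: the assignments match exactly
print(identical(first_run, second_run))
```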
condor <- runFlowSOM(fcd = condor,
                     input_type = "pca",
                     data_slot = "orig",
                     nClusters = 15,
                     seed = 91,
                     ret_model = TRUE)
## Building SOM
## Mapping data to SOM
## Building MST
The output of the FlowSOM clustering can be accessed with
condor$clustering$FlowSOM_pca_orig_k_15.
Metaclustering
Each cluster can now be labeled according to its cell type
with the metaclustering function. This function takes the
condor object (fcd) and the cluster_slot as
input. The cluster_var parameter names the column of that slot that
holds the cluster labels, and the cluster_var_new parameter names the new column
containing the cell types. The metaclusters parameter is a
named vector acting as a translation table to annotate each cell
cluster.
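The translation-table idea is plain named-vector indexing in R. The sketch below applies a three-entry table to a few made-up cluster labels; it illustrates the lookup only and is not the code used inside metaclustering.

```r
# Made-up cluster labels for five cells
cluster_ids <- c(1, 2, 2, 3, 1)

# Named vector: names are cluster labels, values are cell types
translation <- c("1" = "Classical Monocytes",
                 "2" = "CD4 CD45RA+ CD127+",
                 "3" = "CD8 CD45RA+ CD127+")

# Index by name (as character), then drop the names from the result
cell_types <- unname(translation[as.character(cluster_ids)])

print(cell_types)

# A cluster missing from the table would yield NA, so it is worth checking
print(anyNA(cell_types))
```

Several clusters can share one cell type, which is how clusters are merged into a single annotated population.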
condor <- metaclustering(fcd = condor,
                         cluster_slot = "phenograph_pca_orig_k_60",
                         cluster_var = "Phenograph",
                         cluster_var_new = "metaclusters",
                         metaclusters = c("1" = "Classical Monocytes",
                                          "2" = "CD4 CD45RA+ CD127+",
                                          "3" = "CD8 CD45RA+ CD127+",
                                          "4" = "NK dim",
                                          "5" = "CD8 CD45RA+ CD127-",
                                          "6" = "Classical Monocytes",
                                          "7" = "Unconventional T cells",
                                          "8" = "CD4 CD45RA- CD127+",
                                          "9" = "CD16+ Monocytes",
                                          "10" = "CD4 CD127-",
                                          "11" = "Classical Monocytes",
                                          "12" = "CD8 CD45RA- CD127+",
                                          "13" = "CD8 CD45RA- CD127+",
                                          "14" = "NK bright",
                                          "15" = "CD8 CD45RA+ CD127-",
                                          "16" = "CD4 CD25+",
                                          "17" = "B cells",
                                          "18" = "Unconventional T cells",
                                          "19" = "Classical Monocytes",
                                          "20" = "pDCs",
                                          "21" = "CD8 CD45RA+ CD127+",
                                          "22" = "Basophils",
                                          "23" = "Mixed",
                                          "24" = "B cells",
                                          "25" = "NK bright"))
## cluster metacluster
## 1 1 Classical Monocytes
## 2 2 CD4 CD45RA+ CD127+
## 3 3 CD8 CD45RA+ CD127+
## 4 4 NK dim
## 5 5 CD8 CD45RA+ CD127-
## 6 6 Classical Monocytes
## 7 7 Unconventional T cells
## 8 8 CD4 CD45RA- CD127+
## 9 9 CD16+ Monocytes
## 10 10 CD4 CD127-
## 11 11 Classical Monocytes
## 12 12 CD8 CD45RA- CD127+
## 13 13 CD8 CD45RA- CD127+
## 14 14 NK bright
## 15 15 CD8 CD45RA+ CD127-
## 16 16 CD4 CD25+
## 17 17 B cells
## 18 18 Unconventional T cells
## 19 19 Classical Monocytes
## 20 20 pDCs
## 21 21 CD8 CD45RA+ CD127+
## 22 22 Basophils
## 23 23 Mixed
## 24 24 B cells
## 25 25 NK bright
Session Info
info <- sessionInfo()
info
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.3.1
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.40.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-20
## [5] CytoNorm_2.0.1 TH.data_1.1-3
## [7] vctrs_0.6.5 digest_0.6.37
## [9] png_0.1-8 shape_1.4.6.1
## [11] proxy_0.4-27 slingshot_2.14.0
## [13] ggrepel_0.9.6 corrplot_0.95
## [15] parallelly_1.45.0 MASS_7.3-65
## [17] pkgdown_2.1.3 reshape2_1.4.4
## [19] httpuv_1.6.16 foreach_1.5.2
## [21] BiocGenerics_0.52.0 withr_3.0.2
## [23] ggrastr_1.0.2 xfun_0.52
## [25] ggpubr_0.6.1 ellipsis_0.3.2
## [27] survival_3.8-3 memoise_2.0.1
## [29] hexbin_1.28.5 ggbeeswarm_0.7.2
## [31] RProtoBufLib_2.18.0 princurve_2.1.6
## [33] profvis_0.4.0 ggsci_3.2.0
## [35] systemfonts_1.2.3 ragg_1.4.0
## [37] zoo_1.8-14 GlobalOptions_0.1.2
## [39] DEoptimR_1.1-3-1 Formula_1.2-5
## [41] promises_1.3.3 scatterplot3d_0.3-44
## [43] httr_1.4.7 rstatix_0.7.2
## [45] globals_0.18.0 rstudioapi_0.17.1
## [47] UCSC.utils_1.2.0 miniUI_0.1.2
## [49] generics_0.1.4 ggcyto_1.34.0
## [51] base64enc_0.1-3 curl_6.4.0
## [53] S4Vectors_0.44.0 zlibbioc_1.52.0
## [55] flowWorkspace_4.18.1 polyclip_1.10-7
## [57] randomForest_4.7-1.2 GenomeInfoDbData_1.2.13
## [59] SparseArray_1.6.2 RBGL_1.82.0
## [61] ncdfFlow_2.52.1 RcppEigen_0.3.4.0.2
## [63] xtable_1.8-4 stringr_1.5.1
## [65] desc_1.4.3 doParallel_1.0.17
## [67] evaluate_1.0.4 S4Arrays_1.6.0
## [69] hms_1.1.3 glmnet_4.1-9
## [71] GenomicRanges_1.58.0 irlba_2.3.5.1
## [73] colorspace_2.1-1 harmony_1.2.3
## [75] reticulate_1.42.0 readxl_1.4.5
## [77] magrittr_2.0.3 lmtest_0.9-40
## [79] readr_2.1.5 Rgraphviz_2.50.0
## [81] later_1.4.2 lattice_0.22-7
## [83] future.apply_1.20.0 robustbase_0.99-4-1
## [85] XML_3.99-0.18 cowplot_1.2.0
## [87] matrixStats_1.5.0 xts_0.14.1
## [89] class_7.3-23 Hmisc_5.2-3
## [91] pillar_1.11.0 nlme_3.1-168
## [93] iterators_1.0.14 compiler_4.4.2
## [95] RSpectra_0.16-2 stringi_1.8.7
## [97] gower_1.0.2 minqa_1.2.8
## [99] SummarizedExperiment_1.36.0 lubridate_1.9.4
## [101] devtools_2.4.5 CytoML_2.18.3
## [103] plyr_1.8.9 crayon_1.5.3
## [105] abind_1.4-8 locfit_1.5-9.12
## [107] sp_2.2-0 sandwich_3.1-1
## [109] pcaMethods_1.98.0 dplyr_1.1.4
## [111] codetools_0.2-20 multcomp_1.4-28
## [113] textshaping_1.0.1 recipes_1.3.1
## [115] openssl_2.3.3 Rphenograph_0.99.1
## [117] TTR_0.24.4 bslib_0.9.0
## [119] e1071_1.7-16 destiny_3.20.0
## [121] GetoptLong_1.0.5 ggplot.multistats_1.0.1
## [123] mime_0.13 splines_4.4.2
## [125] circlize_0.4.16 Rcpp_1.1.0
## [127] sparseMatrixStats_1.18.0 cellranger_1.1.0
## [129] knitr_1.50 clue_0.3-66
## [131] lme4_1.1-37 fs_1.6.6
## [133] listenv_0.9.1 checkmate_2.3.2
## [135] DelayedMatrixStats_1.28.1 Rdpack_2.6.4
## [137] pkgbuild_1.4.8 ggsignif_0.6.4
## [139] tibble_3.3.0 Matrix_1.7-3
## [141] rpart.plot_3.1.2 statmod_1.5.0
## [143] tzdb_0.5.0 tweenr_2.0.3
## [145] pkgconfig_2.0.3 pheatmap_1.0.13
## [147] tools_4.4.2 cachem_1.1.0
## [149] rbibutils_2.3 smoother_1.3
## [151] fastmap_1.2.0 rmarkdown_2.29
## [153] scales_1.4.0 grid_4.4.2
## [155] usethis_3.1.0 broom_1.0.8
## [157] sass_0.4.10 graph_1.84.1
## [159] carData_3.0-5 RANN_2.6.2
## [161] rpart_4.1.24 farver_2.1.2
## [163] reformulas_0.4.1 yaml_2.3.10
## [165] MatrixGenerics_1.18.1 foreign_0.8-90
## [167] ggthemes_5.1.0 cli_3.6.5
## [169] purrr_1.1.0 stats4_4.4.2
## [171] lifecycle_1.0.4 uwot_0.2.3
## [173] askpass_1.2.1 caret_7.0-1
## [175] Biobase_2.66.0 mvtnorm_1.3-3
## [177] lava_1.8.1 sessioninfo_1.2.3
## [179] backports_1.5.0 cytolib_2.18.2
## [181] timechange_0.3.0 gtable_0.3.6
## [183] rjson_0.2.23 umap_0.2.10.0
## [185] ggridges_0.5.6 Rphenoannoy_0.1.0
## [187] parallel_4.4.2 pROC_1.18.5
## [189] limma_3.62.2 jsonlite_2.0.0
## [191] edgeR_4.4.2 RcppHNSW_0.6.0
## [193] ggplot2_3.5.2 Rtsne_0.17
## [195] FlowSOM_2.14.0 ranger_0.17.0
## [197] flowCore_2.18.0 jquerylib_0.1.4
## [199] timeDate_4041.110 shiny_1.11.1
## [201] ConsensusClusterPlus_1.70.0 htmltools_0.5.8.1
## [203] diffcyt_1.26.1 glue_1.8.0
## [205] XVector_0.46.0 VIM_6.2.2
## [207] gridExtra_2.3 boot_1.3-31
## [209] TrajectoryUtils_1.14.0 igraph_2.1.4
## [211] R6_2.6.1 tidyr_1.3.1
## [213] SingleCellExperiment_1.28.1 vcd_1.4-13
## [215] cluster_2.1.8.1 pkgload_1.4.0
## [217] GenomeInfoDb_1.42.3 ipred_0.9-15
## [219] nloptr_2.2.1 DelayedArray_0.32.0
## [221] tidyselect_1.2.1 vipor_0.4.7
## [223] htmlTable_2.4.3 ggforce_0.5.0
## [225] CytoDx_1.26.0 car_3.1-3
## [227] future_1.58.0 ModelMetrics_1.2.2.2
## [229] laeken_0.5.3 data.table_1.17.8
## [231] htmlwidgets_1.6.4 ComplexHeatmap_2.22.0
## [233] RColorBrewer_1.1-3 rlang_1.1.6
## [235] remotes_2.5.0 colorRamps_2.3.4
## [237] ggnewscale_0.5.2 hardhat_1.4.1
## [239] beeswarm_0.4.0 prodlim_2025.04.28