
In this vignette we showcase cyCONDOR functions for clustering and cell annotation.

Load an example dataset

condor <- readRDS("../.test_files/conodr_example_016.rds")

The cyCONDOR ecosystem provides two clustering methods, Phenograph and FlowSOM. It also provides a convenient way to assign cell annotations to the clustered object.

The clustering functions take the condor object as fcd input and the matrix to be used for the calculation of the clusters as data_slot input (e.g. orig or norm). We recommend using the PCA coordinates as input_type to compensate for fluctuations in the expression data. By defining a prefix which gets incorporated into the slot name of the output, each function can be run with different settings and the results will be saved accordingly. The functions return a fcd with an additional data frame corresponding to the calculated clustering saved in fcd$clustering. The name of the output consists of the prefix (if given), the clustering method and the data_slot.
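The naming scheme described above can be sketched in a few lines of base R. The helper `make_slot_name` is hypothetical and not part of cyCONDOR; the order of the name parts is an assumption inferred from the slot name `phenograph_pca_orig_k_60` used later in this vignette.

```r
# Hypothetical helper (not part of cyCONDOR): assemble an output slot name
# from its parts. A NULL prefix is dropped by c(), so it is simply skipped.
make_slot_name <- function(method, input_type, data_slot, k, prefix = NULL) {
  parts <- c(prefix, method, input_type, data_slot, "k", k)
  paste(parts, collapse = "_")
}

# Without a prefix
make_slot_name("phenograph", "pca", "orig", k = 60)
# "phenograph_pca_orig_k_60"

# With a prefix, so a second run with other settings gets its own slot
make_slot_name("phenograph", "pca", "orig", k = 30, prefix = "test")
# "test_phenograph_pca_orig_k_30"
```

Because each combination of settings yields a distinct name, results from several runs can sit side by side in fcd$clustering without overwriting each other.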

Additionally, when using the expression matrix as input, the user can state which markers should be used for the calculation by listing them under markers. The list can either be written out manually or be extracted directly from the condor object using the implemented functions measured_markers and used_markers. By default, all available markers from the condor object are used. If the discard option is set to TRUE, all markers except the ones listed under markers are used for the calculation. This makes it possible to exclude single markers. When using pca as input, it is possible to specify the number of PCs to be used.
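The effect of markers and discard can be illustrated with a small base R sketch. The helper `select_markers` and the toy matrix are hypothetical; this is not how cyCONDOR implements the selection internally, only a minimal illustration of the behavior described above.

```r
# Hypothetical helper (not part of cyCONDOR): pick the columns of an
# expression matrix that would enter the clustering.
select_markers <- function(expr, markers = colnames(expr), discard = FALSE) {
  unknown <- setdiff(markers, colnames(expr))
  if (length(unknown) > 0) {
    stop("Unknown markers: ", paste(unknown, collapse = ", "))
  }
  # discard = TRUE inverts the selection: keep everything except `markers`
  keep <- if (discard) setdiff(colnames(expr), markers) else markers
  expr[, keep, drop = FALSE]
}

# Toy expression matrix: 5 cells x 4 markers (random values)
set.seed(91)
expr <- matrix(rnorm(20), nrow = 5,
               dimnames = list(NULL, c("CD3", "CD4", "CD8", "CD19")))

colnames(select_markers(expr))                                    # all markers
colnames(select_markers(expr, markers = c("CD3", "CD4")))         # only these two
colnames(select_markers(expr, markers = "CD19", discard = TRUE))  # all but CD19
```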

Phenograph clustering

runPhenograph is based on the package Rphenoannoy, an optimized version of Rphenograph. This clustering method is designed for high-dimensional single-cell data analysis incorporating the approximate k-nearest neighbor (kNN) technique for graph construction. The k parameter defines the number of nearest neighbors to be used for the nearest-neighbor graph, with higher values resulting in fewer clusters. It can be useful to try out different settings here to get the desired cluster resolution. A seed can be set to ensure reproducibility of the clustering.

For more details see:

Chen H (2015). “Rphenograph: R implementation of the phenograph algorithm”. R package version 0.99.1. https://github.com/JinmiaoChenLab/Rphenograph

Stuchly J (2020). “Rphenoannoy: R implementation of the phenograph algorithm - approximate KNN modification, based on Rphenograph package”. R package version 0.1.0. https://github.com/stuchly/Rphenoannoy
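To make the graph construction step more concrete, the following base R sketch computes k-nearest neighbors on toy data and the Jaccard coefficient between the neighbor sets of two cells. It is an illustration under simplifying assumptions, not the Rphenoannoy implementation: it uses an exact neighbor search on a full distance matrix rather than an approximate one, and the helpers `knn_index` and `jaccard` are hypothetical.

```r
# Toy data: two well-separated groups of 10 points each in 2D
set.seed(91)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))

# Exact k-nearest neighbours from the full distance matrix
# (Rphenoannoy uses an approximate search instead)
knn_index <- function(x, k) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # a point is not its own neighbour
  do.call(rbind, lapply(seq_len(nrow(d)),
                        function(i) order(d[i, ])[seq_len(k)]))
}

# Jaccard coefficient between the neighbour sets of cells i and j:
# shared neighbours divided by all neighbours of either cell
jaccard <- function(nn, i, j) {
  length(intersect(nn[i, ], nn[j, ])) / length(union(nn[i, ], nn[j, ]))
}

nn <- knn_index(x, k = 5)
dim(nn)             # one row per cell, one column per neighbour
jaccard(nn, 1, 2)   # two cells from the same group
jaccard(nn, 1, 11)  # cells from different groups: no shared neighbours expected
```

In Phenograph these coefficients become the edge weights of the graph on which the community detection runs, as the console output below also shows.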

condor <- runPhenograph(fcd = condor, 
                        input_type = "pca", 
                        data_slot = "orig", 
                        k = 60, 
                        seed = 91)
## Run Rphenograph starts:
##   -Input data of 59049 rows and 28 columns
##   -k is set to 60
##   Finding nearest neighbors...DONE ~ 38.594 s
##   Compute jaccard coefficient between nearest-neighbor sets...
## Presorting knn...
## presorting DONE ~ 2.915 s
##   Start jaccard
## DONE ~ 3.31 s
##   Build undirected graph from the weighted links...DONE ~ 2.3 s
##   Run louvain clustering on the graph ...DONE ~ 10.106 s
## Run Rphenograph DONE, totally takes 54.31s.
##   Return a community class
##   -Modularity value: 0.8749651 
##   -Number of clusters: 25

The output of the phenograph clustering can be accessed with condor$clustering$phenograph_pca_orig_k_60.

FlowSOM clustering

runFlowSOM provides a fast algorithm to cluster a high number of cells. The nClusters parameter defines the final number of clusters to be generated. A seed can be set to ensure reproducibility of the clustering.

For more details see: Van Gassen S et al. (2015) “FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data.” Cytom Part J Int Soc Anal Cytol 87: 636-645. https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.22625
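Why a seed matters can be shown with any clustering method that starts from random initial values. The sketch below uses stats::kmeans as a stand-in, not FlowSOM itself, and the wrapper `cluster_with_seed` is hypothetical: fixing the seed before each run yields identical cluster assignments.

```r
# Toy data: 100 points in 2D (random values)
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)

# Hypothetical wrapper: fix the random seed, then cluster.
# kmeans stands in for a stochastic clustering method; it is not FlowSOM.
cluster_with_seed <- function(x, centers, seed) {
  set.seed(seed)
  kmeans(x, centers = centers)$cluster
}

run1 <- cluster_with_seed(x, centers = 3, seed = 91)
run2 <- cluster_with_seed(x, centers = 3, seed = 91)

identical(run1, run2)  # TRUE: same seed, same assignments
```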

condor <- runFlowSOM(fcd = condor, 
                     input_type = "pca", 
                     data_slot = "orig", 
                     nClusters = 15, 
                     seed = 91, 
                     ret_model = TRUE)
## Building SOM
## Mapping data to SOM
## Building MST

The output of the FlowSOM clustering can be accessed with condor$clustering$FlowSOM_pca_orig_k_15.

Metaclustering

Each cluster can now be labeled with its cell type using the metaclustering function. The function takes the condor object (fcd) and the cluster_slot as input. The cluster_var_new parameter names the new column that will contain the cell types. The metaclusters parameter is a named vector that acts as a translation table: each name is a cluster ID and each value is the cell type assigned to that cluster.

condor <- metaclustering(fcd = condor, 
                         cluster_slot = "phenograph_pca_orig_k_60", 
                         cluster_var = "Phenograph", 
                         cluster_var_new = "metaclusters", 
                         metaclusters = c("1" = "Classical Monocytes",
                                          "2" = "CD4 CD45RA+ CD127+",
                                          "3" = "CD8 CD45RA+ CD127+", 
                                          "4" = "NK dim",
                                          "5" = "CD8 CD45RA+ CD127-",
                                          "6" = "Classical Monocytes",
                                          "7" = "Unconventional T cells", 
                                          "8" = "CD4 CD45RA- CD127+",
                                          "9" = "CD16+ Monocytes",
                                          "10" = "CD4 CD127-",
                                          "11" = "Classical Monocytes", 
                                          "12" = "CD8 CD45RA- CD127+", 
                                          "13" = "CD8 CD45RA- CD127+",
                                          "14" = "NK bright",
                                          "15" = "CD8 CD45RA+ CD127-",
                                          "16" = "CD4 CD25+",
                                          "17" = "B cells",
                                          "18" = "Unconventional T cells",
                                          "19" = "Classical Monocytes",
                                          "20" = "pDCs",
                                          "21" = "CD8 CD45RA+ CD127+",
                                          "22" = "Basophils",
                                          "23" = "Mixed",
                                          "24" = "B cells",
                                          "25" = "NK bright"))
##    cluster            metacluster
## 1        1    Classical Monocytes
## 2        2     CD4 CD45RA+ CD127+
## 3        3     CD8 CD45RA+ CD127+
## 4        4                 NK dim
## 5        5     CD8 CD45RA+ CD127-
## 6        6    Classical Monocytes
## 7        7 Unconventional T cells
## 8        8     CD4 CD45RA- CD127+
## 9        9        CD16+ Monocytes
## 10      10             CD4 CD127-
## 11      11    Classical Monocytes
## 12      12     CD8 CD45RA- CD127+
## 13      13     CD8 CD45RA- CD127+
## 14      14              NK bright
## 15      15     CD8 CD45RA+ CD127-
## 16      16              CD4 CD25+
## 17      17                B cells
## 18      18 Unconventional T cells
## 19      19    Classical Monocytes
## 20      20                   pDCs
## 21      21     CD8 CD45RA+ CD127+
## 22      22              Basophils
## 23      23                  Mixed
## 24      24                B cells
## 25      25              NK bright
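The translation-table idea behind the metaclusters argument can be reproduced in base R by indexing a named vector with the cluster IDs. This is a minimal sketch on hypothetical toy labels and does not claim to mirror the internals of metaclustering.

```r
# Toy cluster labels for six cells (hypothetical values)
clusters <- factor(c(1, 3, 2, 3, 1, 2))

# Named vector used as translation table: names are cluster IDs,
# values are cell type labels. Several clusters may share one label.
lookup <- c("1" = "Classical Monocytes",
            "2" = "B cells",
            "3" = "Classical Monocytes")

# Index by name; as.character() avoids indexing by the factor's integer codes
cell_types <- unname(lookup[as.character(clusters)])

table(cell_types)
```

Assigning the same label to several clusters, as done for the monocyte clusters above, merges them into one annotated population.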

Session Info

info <- sessionInfo()

info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cyCONDOR_0.2.0
## 
## loaded via a namespace (and not attached):
##   [1] IRanges_2.34.1              Rmisc_1.5.1                
##   [3] urlchecker_1.0.1            nnet_7.3-19                
##   [5] CytoNorm_2.0.1              TH.data_1.1-2              
##   [7] vctrs_0.6.4                 digest_0.6.33              
##   [9] png_0.1-8                   shape_1.4.6                
##  [11] proxy_0.4-27                slingshot_2.8.0            
##  [13] ggrepel_0.9.4               parallelly_1.36.0          
##  [15] MASS_7.3-60                 pkgdown_2.0.7              
##  [17] reshape2_1.4.4              httpuv_1.6.12              
##  [19] foreach_1.5.2               BiocGenerics_0.46.0        
##  [21] withr_2.5.1                 ggrastr_1.0.2              
##  [23] xfun_0.40                   ggpubr_0.6.0               
##  [25] ellipsis_0.3.2              survival_3.5-7             
##  [27] memoise_2.0.1               hexbin_1.28.3              
##  [29] ggbeeswarm_0.7.2            RProtoBufLib_2.12.1        
##  [31] princurve_2.1.6             profvis_0.3.8              
##  [33] ggsci_3.0.0                 systemfonts_1.0.5          
##  [35] ragg_1.2.6                  zoo_1.8-12                 
##  [37] GlobalOptions_0.1.2         DEoptimR_1.1-3             
##  [39] Formula_1.2-5               prettyunits_1.2.0          
##  [41] promises_1.2.1              scatterplot3d_0.3-44       
##  [43] rstatix_0.7.2               globals_0.16.2             
##  [45] ps_1.7.5                    rstudioapi_0.15.0          
##  [47] miniUI_0.1.1.1              generics_0.1.3             
##  [49] ggcyto_1.28.1               base64enc_0.1-3            
##  [51] processx_3.8.2              curl_5.1.0                 
##  [53] S4Vectors_0.38.2            zlibbioc_1.46.0            
##  [55] flowWorkspace_4.12.2        polyclip_1.10-6            
##  [57] randomForest_4.7-1.1        GenomeInfoDbData_1.2.10    
##  [59] RBGL_1.76.0                 ncdfFlow_2.46.0            
##  [61] RcppEigen_0.3.3.9.4         xtable_1.8-4               
##  [63] stringr_1.5.0               desc_1.4.2                 
##  [65] doParallel_1.0.17           evaluate_0.22              
##  [67] S4Arrays_1.0.6              hms_1.1.3                  
##  [69] glmnet_4.1-8                GenomicRanges_1.52.1       
##  [71] irlba_2.3.5.1               colorspace_2.1-0           
##  [73] harmony_1.1.0               reticulate_1.34.0          
##  [75] readxl_1.4.3                magrittr_2.0.3             
##  [77] lmtest_0.9-40               readr_2.1.4                
##  [79] Rgraphviz_2.44.0            later_1.3.1                
##  [81] lattice_0.22-5              future.apply_1.11.0        
##  [83] robustbase_0.99-0           XML_3.99-0.15              
##  [85] cowplot_1.1.1               matrixStats_1.1.0          
##  [87] xts_0.13.1                  class_7.3-22               
##  [89] Hmisc_5.1-1                 pillar_1.9.0               
##  [91] nlme_3.1-163                iterators_1.0.14           
##  [93] compiler_4.3.1              RSpectra_0.16-1            
##  [95] stringi_1.7.12              gower_1.0.1                
##  [97] minqa_1.2.6                 SummarizedExperiment_1.30.2
##  [99] lubridate_1.9.3             devtools_2.4.5             
## [101] CytoML_2.12.0               plyr_1.8.9                 
## [103] crayon_1.5.2                abind_1.4-5                
## [105] locfit_1.5-9.8              sp_2.1-1                   
## [107] sandwich_3.0-2              pcaMethods_1.92.0          
## [109] dplyr_1.1.3                 codetools_0.2-19           
## [111] multcomp_1.4-25             textshaping_0.3.7          
## [113] recipes_1.0.8               openssl_2.1.1              
## [115] Rphenograph_0.99.1          TTR_0.24.3                 
## [117] bslib_0.5.1                 e1071_1.7-13               
## [119] destiny_3.14.0              GetoptLong_1.0.5           
## [121] ggplot.multistats_1.0.0     mime_0.12                  
## [123] splines_4.3.1               circlize_0.4.15            
## [125] Rcpp_1.0.11                 sparseMatrixStats_1.12.2   
## [127] cellranger_1.1.0            knitr_1.44                 
## [129] utf8_1.2.4                  clue_0.3-65                
## [131] lme4_1.1-35.1               fs_1.6.3                   
## [133] listenv_0.9.0               checkmate_2.3.0            
## [135] DelayedMatrixStats_1.22.6   pkgbuild_1.4.2             
## [137] ggsignif_0.6.4              tibble_3.2.1               
## [139] Matrix_1.6-1.1              rpart.plot_3.1.1           
## [141] callr_3.7.3                 tzdb_0.4.0                 
## [143] tweenr_2.0.2                pkgconfig_2.0.3            
## [145] pheatmap_1.0.12             tools_4.3.1                
## [147] cachem_1.0.8                smoother_1.1               
## [149] fastmap_1.1.1               rmarkdown_2.25             
## [151] scales_1.2.1                grid_4.3.1                 
## [153] usethis_2.2.2               broom_1.0.5                
## [155] sass_0.4.7                  graph_1.78.0               
## [157] carData_3.0-5               RANN_2.6.1                 
## [159] rpart_4.1.21                farver_2.1.1               
## [161] yaml_2.3.7                  MatrixGenerics_1.12.3      
## [163] foreign_0.8-85              ggthemes_4.2.4             
## [165] cli_3.6.1                   purrr_1.0.2                
## [167] stats4_4.3.1                lifecycle_1.0.3            
## [169] uwot_0.1.16                 askpass_1.2.0              
## [171] caret_6.0-94                Biobase_2.60.0             
## [173] mvtnorm_1.2-3               lava_1.7.3                 
## [175] sessioninfo_1.2.2           backports_1.4.1            
## [177] cytolib_2.12.1              timechange_0.2.0           
## [179] gtable_0.3.4                rjson_0.2.21               
## [181] umap_0.2.10.0               ggridges_0.5.4             
## [183] Rphenoannoy_0.1.0           parallel_4.3.1             
## [185] pROC_1.18.5                 limma_3.56.2               
## [187] jsonlite_1.8.7              edgeR_3.42.4               
## [189] RcppHNSW_0.5.0              bitops_1.0-7               
## [191] ggplot2_3.4.4               Rtsne_0.16                 
## [193] FlowSOM_2.8.0               ranger_0.16.0              
## [195] flowCore_2.12.2             jquerylib_0.1.4            
## [197] timeDate_4022.108           shiny_1.7.5.1              
## [199] ConsensusClusterPlus_1.64.0 htmltools_0.5.6.1          
## [201] diffcyt_1.20.0              glue_1.6.2                 
## [203] XVector_0.40.0              VIM_6.2.2                  
## [205] RCurl_1.98-1.13             rprojroot_2.0.3            
## [207] gridExtra_2.3               boot_1.3-28.1              
## [209] TrajectoryUtils_1.8.0       igraph_1.5.1               
## [211] R6_2.5.1                    tidyr_1.3.0                
## [213] SingleCellExperiment_1.22.0 vcd_1.4-11                 
## [215] cluster_2.1.4               pkgload_1.3.3              
## [217] GenomeInfoDb_1.36.4         ipred_0.9-14               
## [219] nloptr_2.0.3                DelayedArray_0.26.7        
## [221] tidyselect_1.2.0            vipor_0.4.5                
## [223] htmlTable_2.4.2             ggforce_0.4.1              
## [225] CytoDx_1.20.0               car_3.1-2                  
## [227] future_1.33.0               ModelMetrics_1.2.2.2       
## [229] munsell_0.5.0               laeken_0.5.2               
## [231] data.table_1.14.8           htmlwidgets_1.6.2          
## [233] ComplexHeatmap_2.16.0       RColorBrewer_1.1-3         
## [235] rlang_1.1.1                 remotes_2.4.2.1            
## [237] colorRamps_2.3.1            ggnewscale_0.4.9           
## [239] fansi_1.0.5                 hardhat_1.3.0              
## [241] beeswarm_0.4.0              prodlim_2023.08.28