Dimensionality Reduction

library(cyCONDOR)

This vignette showcases the cyCONDOR functions for dimensionality reduction. We show how to perform Principal Component Analysis (PCA) and how to calculate a Uniform Manifold Approximation and Projection (UMAP), a t-Distributed Stochastic Neighbor Embedding (tSNE) and a Diffusion Map (DM).

All functions need the condor object as fcd input and the data_slot to be used for the calculation. The runPCA function always uses the expr slot for the calculation. For non-linear dimensionality reduction (UMAP, tSNE, DM), the user can choose either the expr data or the pca result as input_type.

Additionally, the user can state explicitly which markers should be used for the calculation by listing them under markers. The list can either be written out manually or be extracted directly from the condor object using the implemented functions measured_markers and used_markers. By default, all available markers from the condor object are used. If the discard option is set to TRUE, all markers except the ones listed under markers are used for the calculation. This enables the exclusion of single markers. When using pca as input, it is possible to specify the number of PCs to be used.

By defining a prefix which gets incorporated into the slot name of the output, each function can be run with different settings and the results will be saved accordingly.

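The functions return an fcd with an additional data frame for the chosen dimensionality reduction method, saved in fcd$reduction_method. The name of the output consists of the prefix (if given) and the data_slot, for example condor$pca$Tcell_orig. For the non-linear methods, the name also includes the input_type, for example condor$umap$pca_orig.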

Marker selection using the markers and discard variables

In all dimensionality reduction functions, the markers that form the basis of the calculation can be specified by combining the markers variable with the discard flag. markers takes a vector of marker names that should be either included (positive selection) or excluded (negative selection). Setting the discard flag to TRUE discards the specified markers (negative selection), while the default setting (discard = FALSE) keeps only the specified markers (positive selection).

The marker names should correspond to a specific column in the expression table and can be given manually or can be extracted from the condor object using the cyCONDOR function used_markers. When performing a marker selection the user should make sure that a prefix for the output name is set to avoid overwriting a previously calculated matrix.

The option of marker selection is implemented in all dimensionality reduction functions but we only demonstrate it for PCA.
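The difference between positive and negative selection can be illustrated on a toy matrix with base R. This is only a minimal sketch of the semantics described above: select_markers is a hypothetical helper written for this illustration, not a cyCONDOR function, and the expression values are random numbers.

```r
# Toy expression matrix: rows are cells, columns are markers (random values).
set.seed(91)
marker_names <- c("FSC-A", "SSC-A", "CD3", "CD4", "CD8", "CD19")
expr <- matrix(rnorm(5 * length(marker_names)),
               nrow = 5,
               dimnames = list(NULL, marker_names))

# Hypothetical helper mimicking the markers/discard semantics:
# discard = FALSE keeps only the listed markers (positive selection),
# discard = TRUE keeps everything except the listed markers (negative selection).
select_markers <- function(expr, markers, discard = FALSE) {
  missing_markers <- setdiff(markers, colnames(expr))
  if (length(missing_markers) > 0) {
    stop("Markers not found: ", paste(missing_markers, collapse = ", "))
  }
  if (discard) {
    keep <- setdiff(colnames(expr), markers)
  } else {
    keep <- markers
  }
  expr[, keep, drop = FALSE]
}

positive <- select_markers(expr, c("CD3", "CD4", "CD8"))
negative <- select_markers(expr, c("FSC-A", "SSC-A"), discard = TRUE)

colnames(positive)
colnames(negative)
```

The helper stops on unknown marker names, which mirrors the requirement that every name corresponds to a column of the expression table.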

Load an example dataset

condor <- readRDS("../.test_files/conodr_example_016.rds")

Principal Component Analysis (PCA)

The calculation of the principal components is based on the prcomp function from the R stats package (https://rdocumentation.org/packages/stats/versions/3.6.2).

condor <- runPCA(fcd = condor,
                 data_slot = "orig",
                 seed = 91)

The output data frame of the PCA can be accessed with condor$pca$orig.
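To get a feeling for what such a PCA data frame contains, the underlying prcomp call can be tried on a toy matrix. The sketch below uses only base R and random values. The centering and scaling settings are assumptions chosen for the illustration, since the exact arguments that runPCA passes to prcomp are not described here.

```r
# Toy expression matrix: 200 cells, 4 markers (random values).
set.seed(91)
expr <- matrix(rnorm(200 * 4),
               nrow = 200,
               dimnames = list(NULL, c("CD3", "CD4", "CD8", "CD19")))

# Assumed settings for illustration: center the markers, do not rescale them.
pca <- stats::prcomp(expr, center = TRUE, scale. = FALSE)

# One row per cell, one column per principal component (PC1, PC2, ...).
scores <- as.data.frame(pca$x)

# Fraction of the total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

head(scores)
round(var_explained, 3)
```

The explained variance per component is a common basis for deciding how many PCs to pass on to the non-linear methods.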

As a demonstration the following code shows a positive and negative selection with the corresponding discard flag setting.

PCA (Positive selection: Specifying the markers to be used as basis for the calculation)

condor <- runPCA(fcd = condor,
                 data_slot = "orig",
                 seed = 91,
                 prefix =  "Tcell",
                 markers = c("CD3", "CD4", "CD8"),
                 discard = FALSE)

The output data frame of the PCA with positive marker selection can be accessed with condor$pca$Tcell_orig.

PCA (Negative selection: Excluding a specific marker from the calculation)

condor <- runPCA(fcd = condor,
                 data_slot = "orig",
                 seed = 91,
                 prefix =  "scatter_exclusion",
                 markers = c("FSC-A", "SSC-A"),
                 discard = TRUE)

The output data frame of the PCA with negative marker selection can be accessed with condor$pca$scatter_exclusion_orig.

UMAP

The calculation of the UMAP is based on the umap function from the uwot package. For more details see: Melville J (2023). “uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction” https://github.com/jlmelville/uwot.

Besides the important parameters of the uwot umap function, e.g. the number of items that define a neighborhood around each point (nNeighbors) and the minimum distance between embedded points (min_dist), the runUMAP function implemented in cyCONDOR has additional parameters that can be adjusted. In addition to the selection of markers and an output prefix, the user can specify the number of PCs to be used for the UMAP calculation (nPC) and can save the UMAP model for future data projection (ret_model).

condor <- runUMAP(fcd = condor, 
                  input_type = "pca", 
                  data_slot = "orig", 
                  seed = 91)

The output data frame of the UMAP coordinates can be accessed with condor$umap$pca_orig.
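The idea of running UMAP on a limited number of PCs and keeping the model for later projection can be sketched directly with uwot on toy data. This is not the runUMAP implementation: the data are random, n_pc and the parameter values are arbitrary choices for the illustration, and the argument names are those of uwot (n_neighbors, min_dist, ret_model), which may differ from the cyCONDOR argument names.

```r
# Toy data: 300 cells, 6 markers (random values), reduced by PCA first.
set.seed(91)
expr <- matrix(rnorm(300 * 6), nrow = 300)
pcs <- stats::prcomp(expr)$x

# Use only the first n_pc principal components as UMAP input.
n_pc <- 3
umap_input <- pcs[, seq_len(n_pc), drop = FALSE]

if (requireNamespace("uwot", quietly = TRUE)) {
  # ret_model = TRUE returns a list that contains the embedding and
  # everything needed to project new data later.
  model <- uwot::umap(umap_input,
                      n_neighbors = 15,
                      min_dist = 0.2,
                      ret_model = TRUE)
  embedding <- model$embedding

  # Project "new" cells (here simply the first 10 rows) into the same space.
  new_cells <- umap_input[1:10, , drop = FALSE]
  projected <- uwot::umap_transform(new_cells, model)
}
```

Without ret_model = TRUE, uwot returns only the coordinate matrix, so a later projection of new cells would not be possible.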

tSNE

The tSNE calculation is based on the Rtsne function from the Rtsne package. The implementation in cyCONDOR allows the user to define the perplexity used in the tSNE calculation. This parameter controls how many nearest neighbors are taken into account when constructing the embedding. As in the UMAP function, the user can select the number of PCs to be used for the calculation. For more details see: Jesse H. Krijthe (2015). “Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation” https://github.com/jkrijthe/Rtsne.

condor <- runtSNE(fcd = condor, 
                  input_type = "pca", 
                  data_slot = "orig", 
                  seed = 91, 
                  perplexity = 30)

The output data frame of the tSNE coordinates can be accessed with condor$tSNE$pca_orig.
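When working with small subsets of cells, the perplexity cannot be chosen freely: Rtsne stops with an error if the perplexity is too large for the number of samples, which in practice means that 3 * perplexity must not exceed the number of cells minus one. The sketch below derives a safe value on toy data. max_perplexity is a hypothetical helper written for this illustration, not part of cyCONDOR or Rtsne, and the input matrix contains random values standing in for PCA scores.

```r
# Largest perplexity Rtsne accepts for a given number of cells,
# based on its check that nrow(X) - 1 must be at least 3 * perplexity.
max_perplexity <- function(n_cells) {
  floor((n_cells - 1) / 3)
}

# Toy input: 90 cells, 5 "principal components" (random values).
set.seed(91)
pcs <- matrix(rnorm(90 * 5), nrow = 90)

# Fall back to the largest allowed value if 30 is too high for this subset.
perplexity <- min(30, max_perplexity(nrow(pcs)))

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # pca = FALSE because the input is already assumed to be PCA-reduced.
  tsne <- Rtsne::Rtsne(pcs, dims = 2, perplexity = perplexity, pca = FALSE)
  coords <- tsne$Y  # one row per cell, two tSNE coordinates
}
```

For the thousands of cells typical of a cytometry dataset, a perplexity of 30 is far below this limit, so the check mainly matters for heavily downsampled data.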

Diffusion Map

The calculation of the DM is based on the DiffusionMap function from the destiny package. The number of nearest neighbors to be considered can be specified with k. Here as well, the user can select the number of PCs to be used for the calculation. For more details see: Philipp Angerer et al. (2015). “destiny: diffusion maps for large-scale single-cell data in R.” Helmholtz-Zentrum München. http://bioinformatics.oxfordjournals.org/content/32/8/1241.

condor <- runDM(fcd = subset_fcd(condor, 5000), 
                input_type = "pca", 
                data_slot = "orig", 
                k = 10, 
                seed = 91)

The output data frame of the DM can be accessed with condor$diffmap$pca_orig.

Session Info

info <- sessionInfo()

info

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cyCONDOR_0.3.0
## 
## loaded via a namespace (and not attached):
##   [1] IRanges_2.40.1              Rmisc_1.5.1                
##   [3] urlchecker_1.0.1            nnet_7.3-20                
##   [5] CytoNorm_2.0.1              TH.data_1.1-3              
##   [7] vctrs_0.6.5                 digest_0.6.37              
##   [9] png_0.1-8                   shape_1.4.6.1              
##  [11] proxy_0.4-27                slingshot_2.14.0           
##  [13] ggrepel_0.9.6               corrplot_0.95              
##  [15] parallelly_1.45.0           MASS_7.3-65                
##  [17] pkgdown_2.1.3               reshape2_1.4.4             
##  [19] httpuv_1.6.16               foreach_1.5.2              
##  [21] BiocGenerics_0.52.0         withr_3.0.2                
##  [23] ggrastr_1.0.2               xfun_0.52                  
##  [25] ggpubr_0.6.1                ellipsis_0.3.2             
##  [27] survival_3.8-3              memoise_2.0.1              
##  [29] hexbin_1.28.5               ggbeeswarm_0.7.2           
##  [31] RProtoBufLib_2.18.0         princurve_2.1.6            
##  [33] profvis_0.4.0               ggsci_3.2.0                
##  [35] systemfonts_1.2.3           ragg_1.4.0                 
##  [37] zoo_1.8-14                  GlobalOptions_0.1.2        
##  [39] DEoptimR_1.1-3-1            Formula_1.2-5              
##  [41] promises_1.3.3              scatterplot3d_0.3-44       
##  [43] httr_1.4.7                  rstatix_0.7.2              
##  [45] globals_0.18.0              rstudioapi_0.17.1          
##  [47] UCSC.utils_1.2.0            miniUI_0.1.2               
##  [49] generics_0.1.4              ggcyto_1.34.0              
##  [51] base64enc_0.1-3             curl_6.4.0                 
##  [53] S4Vectors_0.44.0            zlibbioc_1.52.0            
##  [55] flowWorkspace_4.18.1        polyclip_1.10-7            
##  [57] randomForest_4.7-1.2        GenomeInfoDbData_1.2.13    
##  [59] SparseArray_1.6.2           RBGL_1.82.0                
##  [61] ncdfFlow_2.52.1             RcppEigen_0.3.4.0.2        
##  [63] xtable_1.8-4                stringr_1.5.1              
##  [65] desc_1.4.3                  doParallel_1.0.17          
##  [67] evaluate_1.0.4              S4Arrays_1.6.0             
##  [69] hms_1.1.3                   glmnet_4.1-9               
##  [71] GenomicRanges_1.58.0        irlba_2.3.5.1              
##  [73] colorspace_2.1-1            harmony_1.2.3              
##  [75] reticulate_1.42.0           readxl_1.4.5               
##  [77] magrittr_2.0.3              lmtest_0.9-40              
##  [79] readr_2.1.5                 Rgraphviz_2.50.0           
##  [81] later_1.4.2                 lattice_0.22-7             
##  [83] future.apply_1.20.0         robustbase_0.99-4-1        
##  [85] XML_3.99-0.18               cowplot_1.2.0              
##  [87] matrixStats_1.5.0           xts_0.14.1                 
##  [89] class_7.3-23                Hmisc_5.2-3                
##  [91] pillar_1.11.0               nlme_3.1-168               
##  [93] iterators_1.0.14            compiler_4.4.2             
##  [95] RSpectra_0.16-2             stringi_1.8.7              
##  [97] gower_1.0.2                 minqa_1.2.8                
##  [99] SummarizedExperiment_1.36.0 lubridate_1.9.4            
## [101] devtools_2.4.5              CytoML_2.18.3              
## [103] plyr_1.8.9                  crayon_1.5.3               
## [105] abind_1.4-8                 locfit_1.5-9.12            
## [107] sp_2.2-0                    sandwich_3.1-1             
## [109] pcaMethods_1.98.0           dplyr_1.1.4                
## [111] codetools_0.2-20            multcomp_1.4-28            
## [113] textshaping_1.0.1           recipes_1.3.1              
## [115] openssl_2.3.3               Rphenograph_0.99.1         
## [117] TTR_0.24.4                  bslib_0.9.0                
## [119] e1071_1.7-16                destiny_3.20.0             
## [121] GetoptLong_1.0.5            ggplot.multistats_1.0.1    
## [123] mime_0.13                   splines_4.4.2              
## [125] circlize_0.4.16             Rcpp_1.1.0                 
## [127] sparseMatrixStats_1.18.0    cellranger_1.1.0           
## [129] knitr_1.50                  clue_0.3-66                
## [131] lme4_1.1-37                 fs_1.6.6                   
## [133] listenv_0.9.1               checkmate_2.3.2            
## [135] DelayedMatrixStats_1.28.1   Rdpack_2.6.4               
## [137] pkgbuild_1.4.8              ggsignif_0.6.4             
## [139] tibble_3.3.0                Matrix_1.7-3               
## [141] rpart.plot_3.1.2            statmod_1.5.0              
## [143] tzdb_0.5.0                  tweenr_2.0.3               
## [145] pkgconfig_2.0.3             pheatmap_1.0.13            
## [147] tools_4.4.2                 cachem_1.1.0               
## [149] rbibutils_2.3               smoother_1.3               
## [151] fastmap_1.2.0               rmarkdown_2.29             
## [153] scales_1.4.0                grid_4.4.2                 
## [155] usethis_3.1.0               broom_1.0.8                
## [157] sass_0.4.10                 graph_1.84.1               
## [159] carData_3.0-5               RANN_2.6.2                 
## [161] rpart_4.1.24                farver_2.1.2               
## [163] reformulas_0.4.1            yaml_2.3.10                
## [165] MatrixGenerics_1.18.1       foreign_0.8-90             
## [167] ggthemes_5.1.0              cli_3.6.5                  
## [169] purrr_1.0.4                 stats4_4.4.2               
## [171] lifecycle_1.0.4             uwot_0.2.3                 
## [173] askpass_1.2.1               caret_7.0-1                
## [175] Biobase_2.66.0              mvtnorm_1.3-3              
## [177] lava_1.8.1                  sessioninfo_1.2.3          
## [179] backports_1.5.0             cytolib_2.18.2             
## [181] timechange_0.3.0            gtable_0.3.6               
## [183] rjson_0.2.23                umap_0.2.10.0              
## [185] ggridges_0.5.6              parallel_4.4.2             
## [187] pROC_1.18.5                 limma_3.62.2               
## [189] jsonlite_2.0.0              edgeR_4.4.2                
## [191] RcppHNSW_0.6.0              ggplot2_3.5.2              
## [193] Rtsne_0.17                  FlowSOM_2.14.0             
## [195] ranger_0.17.0               flowCore_2.18.0            
## [197] jquerylib_0.1.4             timeDate_4041.110          
## [199] shiny_1.11.1                ConsensusClusterPlus_1.70.0
## [201] htmltools_0.5.8.1           diffcyt_1.26.1             
## [203] glue_1.8.0                  XVector_0.46.0             
## [205] VIM_6.2.2                   gridExtra_2.3              
## [207] boot_1.3-31                 TrajectoryUtils_1.14.0     
## [209] igraph_2.1.4                R6_2.6.1                   
## [211] tidyr_1.3.1                 SingleCellExperiment_1.28.1
## [213] vcd_1.4-13                  cluster_2.1.8.1            
## [215] pkgload_1.4.0               GenomeInfoDb_1.42.3        
## [217] ipred_0.9-15                nloptr_2.2.1               
## [219] DelayedArray_0.32.0         tidyselect_1.2.1           
## [221] vipor_0.4.7                 htmlTable_2.4.3            
## [223] ggforce_0.5.0               CytoDx_1.26.0              
## [225] car_3.1-3                   future_1.58.0              
## [227] ModelMetrics_1.2.2.2        laeken_0.5.3               
## [229] data.table_1.17.6           htmlwidgets_1.6.4          
## [231] ComplexHeatmap_2.22.0       RColorBrewer_1.1-3         
## [233] rlang_1.1.6                 remotes_2.5.0              
## [235] colorRamps_2.3.4            ggnewscale_0.5.2           
## [237] hardhat_1.4.1               beeswarm_0.4.0             
## [239] prodlim_2025.04.28