Dimensionality Reduction
Source: vignettes/Dimensionality_Reduction.Rmd
In this vignette we showcase the cyCONDOR functions for
dimensionality reduction. We show how to perform Principal
Component Analysis (PCA) and how to calculate Uniform Manifold Approximation
and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding
(tSNE) and Diffusion Map (DM) embeddings.
All functions need the condor object as fcd
input and the data_slot to be used for the calculation. The
runPCA function always uses the expr slot for
the calculation. For non-linear dimensionality reduction (UMAP, tSNE,
DM), the user can choose between the expr data and the
pca result via input_type.
Additionally, the user can state explicitly which
markers should be used for the calculation by listing them under
markers. The list can either be written out manually or be
extracted directly from the condor object using the
functions measured_markers and
used_markers. By default, all available markers from the
condor object are used. If the discard option
is set to TRUE, all markers except the ones listed
under markers are used for the calculation, which makes it
possible to exclude individual markers. When using pca as input, it is
possible to specify the number of PCs to be used.
By defining a prefix, which is incorporated into the
slot name of the output, each function can be run with different
settings and the results are saved under separate names.
The functions return an fcd with an additional data frame
for the chosen dimensionality reduction method, saved in
fcd$reduction_method. The name of the output consists of
the prefix (if given) and the data_slot; for the non-linear methods
the input_type is part of the name as well, as in condor$umap$pca_orig.
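The storage layout described above can be pictured as a nested list. The following is a minimal sketch using a mock object built in base R, not a real condor object. The slot names follow the access paths used in this vignette (condor$pca$orig, condor$pca$Tcell_orig, condor$umap$pca_orig), while the column names and values are invented for illustration.

```r
# Mock of the nested list layout (NOT a real condor object).
# Each reduction method is a list; each result is stored under a name
# built from the optional prefix and the data_slot.
mock_fcd <- list(
  pca = list(
    orig       = data.frame(PC1 = c(0.1, -0.3), PC2 = c(1.2, 0.4)),
    Tcell_orig = data.frame(PC1 = c(0.5, -0.2), PC2 = c(0.0, 0.9))
  ),
  umap = list(
    pca_orig = data.frame(UMAP1 = c(2.1, -1.0), UMAP2 = c(0.3, 0.7))
  )
)

# List which results are stored for each method
stored <- lapply(mock_fcd, names)
print(stored)

# Access one result by method and name, analogous to condor$pca$Tcell_orig
tcell_pca <- mock_fcd$pca$Tcell_orig
```

Because each run is stored under its own name, a run with a prefix does not replace the result of a run without one.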
Marker selection using the markers and discard variables
It is possible to specify the markers which should be the basis for
the calculation using a combination of the markers variable
and the discard flag in all dimensionality reduction
functions. markers takes a vector of marker names as an
input that should be included (positive selection) or excluded (negative
selection). The user can choose to either discard the specified markers
by setting the discard flag to TRUE (negative selection) or
to keep only the specified markers by using the default setting of the
discard flag (positive selection).
The marker names should correspond to a specific column in the
expression table and can be given manually or can be extracted from the
condor object using the cyCONDOR function
used_markers. When performing a marker selection the user
should make sure that a prefix for the output name is set to avoid
overwriting a previously calculated matrix.
The option of marker selection is implemented in all dimensionality reduction functions, but we only demonstrate it for PCA.
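The interplay of markers and discard can be sketched in base R. The helper select_markers below is hypothetical and written only for illustration; it is not part of cyCONDOR and may differ from the package's internal logic. The marker CD19 is an invented example column.

```r
# Hypothetical helper illustrating the markers/discard logic
# (an assumption for illustration, not cyCONDOR code).
select_markers <- function(all_markers, markers, discard = FALSE) {
  unknown <- setdiff(markers, all_markers)
  if (length(unknown) > 0) {
    stop("Markers not found: ", paste(unknown, collapse = ", "))
  }
  if (discard) {
    # Negative selection: keep everything except the listed markers
    setdiff(all_markers, markers)
  } else {
    # Positive selection: keep only the listed markers, in table order
    intersect(all_markers, markers)
  }
}

# Column names of a toy expression table
all_markers <- c("FSC-A", "SSC-A", "CD3", "CD4", "CD8", "CD19")

positive <- select_markers(all_markers, c("CD3", "CD4", "CD8"))
negative <- select_markers(all_markers, c("FSC-A", "SSC-A"), discard = TRUE)
print(positive)
print(negative)
```

Checking for unknown names first is a design choice of this sketch: a misspelled marker then fails loudly instead of silently changing the selection.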
Load an example dataset
condor <- readRDS("../.test_files/conodr_example_016.rds")
Principal Component Analysis (PCA)
The calculation of the principal components is based on the
prcomp function from the R stats package (https://rdocumentation.org/packages/stats/versions/3.6.2).
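To show what prcomp returns, here is a small stand-alone sketch on an invented toy matrix. It uses the prcomp defaults; which arguments cyCONDOR passes internally is not stated here, so treat the settings as assumptions. The sketch also shows how a number of PCs could be picked from the scores, similar in spirit to selecting the number of PCs for the non-linear methods.

```r
set.seed(91)

# Toy expression matrix: 100 cells (rows) x 5 markers (columns); invented data
expr <- matrix(rnorm(500), nrow = 100, ncol = 5,
               dimnames = list(NULL, paste0("marker", 1:5)))

# PCA with the prcomp defaults (center = TRUE, scale. = FALSE)
pca <- prcomp(expr)

# Scores: one row per cell, one column per principal component
scores <- pca$x

# Fraction of variance explained by each PC, and its running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))

# Keep only the first n_pc components as input for a later embedding
n_pc <- 3
pca_input <- scores[, seq_len(n_pc), drop = FALSE]
```

The cumulative variance is one common way to decide how many PCs to carry forward.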
condor <- runPCA(fcd = condor,
                 data_slot = "orig",
                 seed = 91)
The output data frame of the PCA can be accessed with
condor$pca$orig.
As a demonstration, the following code shows a positive and a negative
selection with the corresponding discard flag setting.
PCA (Positive selection: Specifying the markers to be used as basis for the calculation)
condor <- runPCA(fcd = condor,
                 data_slot = "orig",
                 seed = 91,
                 prefix = "Tcell",
                 markers = c("CD3", "CD4", "CD8"),
                 discard = FALSE)
The output data frame of the PCA with positive marker selection can
be accessed with condor$pca$Tcell_orig.
PCA (Negative selection: Excluding specific markers from the calculation)
condor <- runPCA(fcd = condor,
                 data_slot = "orig",
                 seed = 91,
                 prefix = "scatter_exclusion",
                 markers = c("FSC-A", "SSC-A"),
                 discard = TRUE)
The output data frame of the PCA with negative marker selection can
be accessed with condor$pca$scatter_exclusion_orig.
UMAP
The calculation of the UMAP is based on the umap
function from the uwot package. For more details see: Melville J (2023).
“uwot: The Uniform Manifold Approximation and Projection (UMAP) Method
for Dimensionality Reduction” https://github.com/jlmelville/uwot.
Besides the key parameters that can be set in the uwot umap function,
e.g. the number of items that define a neighborhood around each point
(nNeighbors) and the minimum distance between embedded points
(min_dist), the runUMAP function implemented
in cyCONDOR has additional parameters that can be adjusted.
In addition to the selection of markers and an output
prefix, the user can specify the number of PCs to
be used for the UMAP calculation (nPC) and can
save the UMAP model for future data projection
(ret_model).
condor <- runUMAP(fcd = condor,
                  input_type = "pca",
                  data_slot = "orig",
                  seed = 91)
The output data frame of the UMAP coordinates can be accessed with
condor$umap$pca_orig.
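To make the idea of saving the UMAP model for later projection more concrete, the following sketch calls uwot directly on invented toy data. It assumes the uwot package is installed; the parameter values are arbitrary and do not reflect the cyCONDOR defaults. In uwot the neighborhood size argument is called n_neighbors.

```r
library(uwot)

set.seed(91)

# Invented toy data standing in for PCA scores: rows are cells
train     <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
new_cells <- matrix(rnorm(20 * 5),  nrow = 20,  ncol = 5)

# Fit UMAP and keep the model so that new data can be projected later
model <- umap(train, n_neighbors = 15, min_dist = 0.2, ret_model = TRUE)

# With ret_model = TRUE the coordinates are stored in the embedding element
embedding <- model$embedding

# Project previously unseen cells into the existing embedding
projected <- umap_transform(new_cells, model)
```

Keeping the model is what allows new samples to be placed on an existing map instead of recomputing the embedding from scratch.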
tSNE
The tSNE calculation is based on the function Rtsne from
the package Rtsne. The implementation in
cyCONDOR allows the user to set the perplexity used
in the tSNE calculation. This parameter controls how many nearest
neighbors are taken into account when constructing the embedding.
As in the UMAP function, the user can select the
number of PCs to be used for the calculation. For more details
see: Jesse H. Krijthe (2015). “Rtsne: T-Distributed Stochastic Neighbor
Embedding using a Barnes-Hut Implementation” https://github.com/jkrijthe/Rtsne.
condor <- runtSNE(fcd = condor,
                  input_type = "pca",
                  data_slot = "orig",
                  seed = 91,
                  perplexity = 30)
The output data frame of the tSNE coordinates can be accessed with
condor$tSNE$pca_orig.
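The role of the perplexity can be tried out by calling Rtsne directly. This is a sketch on invented toy data, assuming the Rtsne package is installed; it is not the cyCONDOR implementation. Rtsne rejects a perplexity that is too large for the number of rows, so the sketch checks this before running.

```r
library(Rtsne)

set.seed(91)

# Invented toy data standing in for PCA scores: 200 cells x 5 components
pcs <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)

perplexity <- 30

# Conservative guard: three times the perplexity must stay below the
# number of rows minus one, otherwise Rtsne stops with an error
stopifnot(3 * perplexity < nrow(pcs) - 1)

# pca = FALSE because the input is already treated as reduced data
tsne <- Rtsne(pcs, dims = 2, perplexity = perplexity, pca = FALSE)

# Embedding coordinates: one row per cell
coords <- tsne$Y
```

For small subsets of cells the perplexity may therefore need to be lowered.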
Diffusion Map
The calculation of the DM is based on the function
DiffusionMap from the package destiny. The
number of nearest neighbors to be considered can be specified with
k. Here, too, the user can select the
number of PCs to be used for the calculation. For more details
see: Philipp Angerer et al. (2015). “destiny: diffusion maps for
large-scale single-cell data in R.” Helmholtz-Zentrum München. http://bioinformatics.oxfordjournals.org/content/32/8/1241.
condor <- runDM(fcd = subset_fcd(condor, 5000),
                input_type = "pca",
                data_slot = "orig",
                k = 10,
                seed = 91)
The output data frame of the DM can be accessed with
condor$diffmap$pca_orig.
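For orientation, a diffusion map can also be computed by calling destiny directly. The sketch below uses invented toy data and assumes the destiny package (Bioconductor) is installed. The value of n_eigs is an arbitrary choice for this example, and the call is not necessarily how cyCONDOR invokes destiny internally.

```r
library(destiny)

set.seed(91)

# Invented toy data standing in for PCA scores: 200 cells x 5 components
pcs <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)

# Diffusion map using 10 nearest neighbors, keeping 3 diffusion components
dm <- DiffusionMap(pcs, k = 10, n_eigs = 3)

# Diffusion components: one row per cell, one column per component
dcs <- eigenvectors(dm)
```

Diffusion maps are comparatively expensive to compute, which is one plausible reason to run them on a subset of cells as in the runDM call above.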
Session Info
info <- sessionInfo()
info
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.3.1
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.40.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-20
## [5] CytoNorm_2.0.1 TH.data_1.1-3
## [7] vctrs_0.6.5 digest_0.6.37
## [9] png_0.1-8 shape_1.4.6.1
## [11] proxy_0.4-27 slingshot_2.14.0
## [13] ggrepel_0.9.6 corrplot_0.95
## [15] parallelly_1.45.0 MASS_7.3-65
## [17] pkgdown_2.1.3 reshape2_1.4.4
## [19] httpuv_1.6.16 foreach_1.5.2
## [21] BiocGenerics_0.52.0 withr_3.0.2
## [23] ggrastr_1.0.2 xfun_0.52
## [25] ggpubr_0.6.1 ellipsis_0.3.2
## [27] survival_3.8-3 memoise_2.0.1
## [29] hexbin_1.28.5 ggbeeswarm_0.7.2
## [31] RProtoBufLib_2.18.0 princurve_2.1.6
## [33] profvis_0.4.0 ggsci_3.2.0
## [35] systemfonts_1.2.3 ragg_1.4.0
## [37] zoo_1.8-14 GlobalOptions_0.1.2
## [39] DEoptimR_1.1-3-1 Formula_1.2-5
## [41] promises_1.3.3 scatterplot3d_0.3-44
## [43] httr_1.4.7 rstatix_0.7.2
## [45] globals_0.18.0 rstudioapi_0.17.1
## [47] UCSC.utils_1.2.0 miniUI_0.1.2
## [49] generics_0.1.4 ggcyto_1.34.0
## [51] base64enc_0.1-3 curl_6.4.0
## [53] S4Vectors_0.44.0 zlibbioc_1.52.0
## [55] flowWorkspace_4.18.1 polyclip_1.10-7
## [57] randomForest_4.7-1.2 GenomeInfoDbData_1.2.13
## [59] SparseArray_1.6.2 RBGL_1.82.0
## [61] ncdfFlow_2.52.1 RcppEigen_0.3.4.0.2
## [63] xtable_1.8-4 stringr_1.5.1
## [65] desc_1.4.3 doParallel_1.0.17
## [67] evaluate_1.0.4 S4Arrays_1.6.0
## [69] hms_1.1.3 glmnet_4.1-9
## [71] GenomicRanges_1.58.0 irlba_2.3.5.1
## [73] colorspace_2.1-1 harmony_1.2.3
## [75] reticulate_1.42.0 readxl_1.4.5
## [77] magrittr_2.0.3 lmtest_0.9-40
## [79] readr_2.1.5 Rgraphviz_2.50.0
## [81] later_1.4.2 lattice_0.22-7
## [83] future.apply_1.20.0 robustbase_0.99-4-1
## [85] XML_3.99-0.18 cowplot_1.2.0
## [87] matrixStats_1.5.0 xts_0.14.1
## [89] class_7.3-23 Hmisc_5.2-3
## [91] pillar_1.11.0 nlme_3.1-168
## [93] iterators_1.0.14 compiler_4.4.2
## [95] RSpectra_0.16-2 stringi_1.8.7
## [97] gower_1.0.2 minqa_1.2.8
## [99] SummarizedExperiment_1.36.0 lubridate_1.9.4
## [101] devtools_2.4.5 CytoML_2.18.3
## [103] plyr_1.8.9 crayon_1.5.3
## [105] abind_1.4-8 locfit_1.5-9.12
## [107] sp_2.2-0 sandwich_3.1-1
## [109] pcaMethods_1.98.0 dplyr_1.1.4
## [111] codetools_0.2-20 multcomp_1.4-28
## [113] textshaping_1.0.1 recipes_1.3.1
## [115] openssl_2.3.3 Rphenograph_0.99.1
## [117] TTR_0.24.4 bslib_0.9.0
## [119] e1071_1.7-16 destiny_3.20.0
## [121] GetoptLong_1.0.5 ggplot.multistats_1.0.1
## [123] mime_0.13 splines_4.4.2
## [125] circlize_0.4.16 Rcpp_1.1.0
## [127] sparseMatrixStats_1.18.0 cellranger_1.1.0
## [129] knitr_1.50 clue_0.3-66
## [131] lme4_1.1-37 fs_1.6.6
## [133] listenv_0.9.1 checkmate_2.3.2
## [135] DelayedMatrixStats_1.28.1 Rdpack_2.6.4
## [137] pkgbuild_1.4.8 ggsignif_0.6.4
## [139] tibble_3.3.0 Matrix_1.7-3
## [141] rpart.plot_3.1.2 statmod_1.5.0
## [143] tzdb_0.5.0 tweenr_2.0.3
## [145] pkgconfig_2.0.3 pheatmap_1.0.13
## [147] tools_4.4.2 cachem_1.1.0
## [149] rbibutils_2.3 smoother_1.3
## [151] fastmap_1.2.0 rmarkdown_2.29
## [153] scales_1.4.0 grid_4.4.2
## [155] usethis_3.1.0 broom_1.0.8
## [157] sass_0.4.10 graph_1.84.1
## [159] carData_3.0-5 RANN_2.6.2
## [161] rpart_4.1.24 farver_2.1.2
## [163] reformulas_0.4.1 yaml_2.3.10
## [165] MatrixGenerics_1.18.1 foreign_0.8-90
## [167] ggthemes_5.1.0 cli_3.6.5
## [169] purrr_1.1.0 stats4_4.4.2
## [171] lifecycle_1.0.4 uwot_0.2.3
## [173] askpass_1.2.1 caret_7.0-1
## [175] Biobase_2.66.0 mvtnorm_1.3-3
## [177] lava_1.8.1 sessioninfo_1.2.3
## [179] backports_1.5.0 cytolib_2.18.2
## [181] timechange_0.3.0 gtable_0.3.6
## [183] rjson_0.2.23 umap_0.2.10.0
## [185] ggridges_0.5.6 parallel_4.4.2
## [187] pROC_1.18.5 limma_3.62.2
## [189] jsonlite_2.0.0 edgeR_4.4.2
## [191] RcppHNSW_0.6.0 ggplot2_3.5.2
## [193] Rtsne_0.17 FlowSOM_2.14.0
## [195] ranger_0.17.0 flowCore_2.18.0
## [197] jquerylib_0.1.4 timeDate_4041.110
## [199] shiny_1.11.1 ConsensusClusterPlus_1.70.0
## [201] htmltools_0.5.8.1 diffcyt_1.26.1
## [203] glue_1.8.0 XVector_0.46.0
## [205] VIM_6.2.2 gridExtra_2.3
## [207] boot_1.3-31 TrajectoryUtils_1.14.0
## [209] igraph_2.1.4 R6_2.6.1
## [211] tidyr_1.3.1 SingleCellExperiment_1.28.1
## [213] vcd_1.4-13 cluster_2.1.8.1
## [215] pkgload_1.4.0 GenomeInfoDb_1.42.3
## [217] ipred_0.9-15 nloptr_2.2.1
## [219] DelayedArray_0.32.0 tidyselect_1.2.1
## [221] vipor_0.4.7 htmlTable_2.4.3
## [223] ggforce_0.5.0 CytoDx_1.26.0
## [225] car_3.1-3 future_1.58.0
## [227] ModelMetrics_1.2.2.2 laeken_0.5.3
## [229] data.table_1.17.8 htmlwidgets_1.6.4
## [231] ComplexHeatmap_2.22.0 RColorBrewer_1.1-3
## [233] rlang_1.1.6 remotes_2.5.0
## [235] colorRamps_2.3.4 ggnewscale_0.5.2
## [237] hardhat_1.4.1 beeswarm_0.4.0
## [239] prodlim_2025.04.28