Getting Started
Here we describe a basic workflow to analyse high-dimensional
cytometry data with cyCONDOR. A more detailed description of
all cyCONDOR functionalities and visualization tools can be
found in the Articles section. In this section we describe
how to load data from a folder of FCS files and how to perform
dimensionality reduction and clustering, together with some basic
visualization of the results.
We start by loading the cyCONDOR package.
library(cyCONDOR)
Loading the data
With prep_fcd() you import the .fcs files into your R session. The
.fcs files should all be stored in a single folder (data_path).
Additionally, an annotation table text file (anno_table) has to be
provided, which contains a column with the file names of the .fcs
files and, optionally, additional sample information. This is an
example of an anno_table:
filename | sample_id | condition |
---|---|---|
exp_x_sample1.fcs | sample1 | treatment |
exp_x_sample2.fcs | sample2 | control |
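The annotation table described above can be prepared and sanity-checked in base R before it is passed to prep_fcd(). The following is a minimal sketch, not part of cyCONDOR: the temporary folder, the empty placeholder files and the object names (anno, anno_file, missing_in_folder, missing_in_table) are assumptions for illustration. The table is written as comma-separated text because the example call uses a metadata.csv file; the delimiter prep_fcd() expects is not stated here, so check the function documentation.

```r
# Hypothetical demo folder with two empty placeholder files standing in for real .fcs files
data_path <- file.path(tempdir(), "fcs_demo")
dir.create(data_path, showWarnings = FALSE)
invisible(file.create(file.path(data_path,
                                c("exp_x_sample1.fcs", "exp_x_sample2.fcs"))))

# Annotation table: one row per file, plus optional sample information
anno <- data.frame(
  filename  = c("exp_x_sample1.fcs", "exp_x_sample2.fcs"),
  sample_id = c("sample1", "sample2"),
  condition = c("treatment", "control"),
  stringsAsFactors = FALSE
)
anno_file <- file.path(tempdir(), "metadata.csv")
write.csv(anno, anno_file, row.names = FALSE)

# Check that the file name column and the folder content agree in both directions
fcs_files <- list.files(data_path, pattern = "\\.fcs$")
missing_in_folder <- setdiff(anno$filename, fcs_files)  # listed but not on disk
missing_in_table  <- setdiff(fcs_files, anno$filename)  # on disk but not listed
```

If either missing_in_folder or missing_in_table is non-empty, the file names in the table and the folder do not match and should be corrected before loading.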
Arguments:
data_path = Folder where the .fcs files or .csv files are stored
max_cell = Number of cells to use for each file
useCSV = Flag indicating whether the input files are .csv files rather than .fcs files
transformation = Transformation to perform ("auto_logi", "arcsinh", "clr", "none")
remove_param = Parameters to be removed from the fcd; "inTime" should be kept
anno_table = Path to the annotation table text file
filename_col = Name of the column containing the file names matching the .fcs/.csv files
condor <- prep_fcd(data_path = "../.test_files/fcs/",
                   max_cell = 1000,
                   useCSV = FALSE,
                   transformation = "auto_logi",
                   remove_param = c("FSC-H", "SSC-H", "FSC-W", "SSC-W", "Time"),
                   anno_table = "../.test_files/metadata.csv",
                   filename_col = "filename"
)
class(condor)
## [1] "flow_cytometry_dataframe"
For more details on data loading, transformation and the general
structure of the fcd, have a look at
vignette("Data_Loading_and_Transformation") and
vignette("Other_utilities").
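One of the transformation options listed for prep_fcd() is "arcsinh". As a rough illustration of what an inverse hyperbolic sine transformation does to raw intensities, here is a base R sketch. The cofactor value and the example intensities are assumptions for illustration only; the cofactor and exact implementation used by cyCONDOR are not described in this section.

```r
# Hypothetical raw intensities, including a negative and a very large value
raw <- c(-50, 0, 10, 100, 1000, 10000)

# Assumed cofactor; the appropriate value depends on the technology and instrument
cofactor <- 5

# arcsinh behaves roughly linearly near zero and logarithmically for large values
transformed <- asinh(raw / cofactor)

# The transformation is invertible, so raw values can be recovered
recovered <- sinh(transformed) * cofactor
```

Unlike a plain logarithm, this transformation is defined for zero and negative values, which commonly occur in compensated cytometry data.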
Dimensionality Reduction
To reduce the complexity of the data set, we first perform a
principal component analysis (PCA) and use those coordinates for
non-linear dimensionality reduction, applying e.g. the UMAP or tSNE
algorithm. See vignette("Dimensionality_Reduction") for
further details and alternative methods. With this approach, we can
visualize the complexity of the data set in two-dimensional
space.
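The first step of this approach, PCA on a cells-by-markers matrix, can be sketched with base R's prcomp(). This is an illustration on simulated data and not cyCONDOR's internal implementation; the matrix expr, the marker names and the centering/scaling choices are assumptions.

```r
set.seed(1)

# Simulated expression matrix: 200 cells (rows) x 5 hypothetical markers (columns)
expr <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5,
               dimnames = list(NULL, paste0("marker", 1:5)))

# PCA on centered, unscaled data (scaling choice is an assumption)
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Coordinates of every cell on the principal components
pcs <- pca$x

# Fraction of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

The matrix pcs has one row per cell, and coordinates of this kind are what a non-linear method such as UMAP or tSNE would take as input when it is run on PCA results.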
PCA
Arguments:
fcd = Flow cytometry dataset
data_slot = Data slot to use for the calculation, e.g. "orig" or batch corrected "norm"
condor <- runPCA(fcd = condor,
                 data_slot = "orig"
)
UMAP
Arguments:
fcd = Flow cytometry dataset
input_type = Data to use for the calculation of the UMAP, e.g. "expr" or "pca"
data_slot = Data slot to use for the calculation, e.g. "orig" or batch corrected "norm"
condor <- runUMAP(fcd = condor,
                  input_type = "pca",
                  data_slot = "orig"
)
UMAP Visualization
plot_dim_red(fcd = condor,
             reduction_method = "umap",
             reduction_slot = "pca_orig",
             param = "group",
             title = "UMAP colored by group"
)
tSNE
Arguments:
fcd = Flow cytometry dataset
input_type = Data to use for the calculation, e.g. "expr" or "pca"
data_slot = Data slot to use for the calculation, e.g. "orig" or batch corrected "norm"
condor <- runtSNE(fcd = condor,
                  input_type = "pca",
                  data_slot = "orig"
)
## Read the 10000 x 29 data matrix successfully!
## OpenMP is working. 1 threads.
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Building tree...
## - point 10000 of 10000
## Done in 2.93 seconds (sparsity = 0.013027)!
## Learning embedding...
## Iteration 50: error is 96.832296 (50 iterations in 1.49 seconds)
## Iteration 100: error is 84.589965 (50 iterations in 2.13 seconds)
## Iteration 150: error is 81.308503 (50 iterations in 1.35 seconds)
## Iteration 200: error is 80.502378 (50 iterations in 1.37 seconds)
## Iteration 250: error is 80.156805 (50 iterations in 1.37 seconds)
## Iteration 300: error is 3.166519 (50 iterations in 1.27 seconds)
## Iteration 350: error is 2.840701 (50 iterations in 1.22 seconds)
## Iteration 400: error is 2.653973 (50 iterations in 1.21 seconds)
## Iteration 450: error is 2.531473 (50 iterations in 1.18 seconds)
## Iteration 500: error is 2.444583 (50 iterations in 1.18 seconds)
## Iteration 550: error is 2.378906 (50 iterations in 1.18 seconds)
## Iteration 600: error is 2.328042 (50 iterations in 1.15 seconds)
## Iteration 650: error is 2.287705 (50 iterations in 1.14 seconds)
## Iteration 700: error is 2.255558 (50 iterations in 1.15 seconds)
## Iteration 750: error is 2.229873 (50 iterations in 1.21 seconds)
## Iteration 800: error is 2.210205 (50 iterations in 1.19 seconds)
## Iteration 850: error is 2.197058 (50 iterations in 1.26 seconds)
## Iteration 900: error is 2.188463 (50 iterations in 1.21 seconds)
## Iteration 950: error is 2.181792 (50 iterations in 1.23 seconds)
## Iteration 1000: error is 2.176606 (50 iterations in 1.22 seconds)
## Fitting performed in 25.73 seconds.
tSNE visualization
plot_dim_red(fcd = condor,
             reduction_method = "tSNE",
             reduction_slot = "pca_orig",
             param = "group",
             title = "tSNE colored by group"
)
Clustering
We group cells with similar marker expression by applying the Phenograph
or FlowSOM clustering algorithms. For more details see
vignette("Clustering_and_cell_annotation").
Phenograph clustering
Arguments:
fcd = Flow cytometry dataset
input_type = Data to use for the calculation, e.g. "pca"
data_slot = Data slot to use for the calculation, e.g. "orig" or "norm"
k = k value (number of nearest neighbors) used for clustering
condor <- runPhenograph(fcd = condor,
                        input_type = "pca",
                        data_slot = "orig",
                        k = 60
)
## Run Rphenograph starts:
## -Input data of 10000 rows and 29 columns
## -k is set to 60
## Finding nearest neighbors...DONE ~ 5.037 s
## Compute jaccard coefficient between nearest-neighbor sets...
## Presorting knn...
## presorting DONE ~ 0.362 s
## Start jaccard
## DONE ~ 0.528 s
## Build undirected graph from the weighted links...DONE ~ 0.204 s
## Run louvain clustering on the graph ...DONE ~ 0.843 s
## Run Rphenograph DONE, totally takes 6.61200000000001s.
## Return a community class
## -Modularity value: 0.8355073
## -Number of clusters: 13
Visualize Phenograph clustering
plot_dim_red(fcd = condor,
             reduction_method = "umap",
             reduction_slot = "pca_orig",
             cluster_slot = "phenograph_pca_orig_k_60",
             param = "Phenograph",
             title = "UMAP colored by Phenograph clustering"
)
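The cluster_slot names used in this document ("phenograph_pca_orig_k_60" and, further down, "FlowSOM_expr_orig_k_5") appear to follow the pattern method, input type, data slot and clustering parameter. The helper below is hypothetical and not part of cyCONDOR; it only reproduces that pattern as inferred from these two examples, so always confirm the actual slot names stored in your own fcd object.

```r
# Hypothetical helper: compose a cluster slot name following the pattern
# seen in this document, <method>_<input_type>_<data_slot>_k_<value>
cluster_slot_name <- function(method, input_type, data_slot, k) {
  paste(method, input_type, data_slot, "k", k, sep = "_")
}

phenograph_slot <- cluster_slot_name("phenograph", "pca", "orig", 60)
flowsom_slot    <- cluster_slot_name("FlowSOM", "expr", "orig", 5)
```

Building the name from the same values that were passed to the clustering function reduces the risk of typos when the slot is referenced again in the plotting functions.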
FlowSOM clustering
Arguments:
fcd = Flow cytometry dataset
input_type = Data to use for the calculation, e.g. "expr" or "pca"
data_slot = Data slot to use for the calculation, e.g. "orig" or "norm"
nClusters = Number of final clusters
condor <- runFlowSOM(fcd = condor,
                     input_type = "expr",
                     data_slot = "orig",
                     nClusters = 5
)
## Building SOM
## Mapping data to SOM
## Building MST
Visualize FlowSOM clustering
plot_dim_red(fcd = condor,
             reduction_method = "umap",
             reduction_slot = "pca_orig",
             cluster_slot = "FlowSOM_expr_orig_k_5",
             param = "FlowSOM",
             title = "UMAP colored by FlowSOM clustering"
)
Data visualization
We can now further visualize our data set to compare the different
experimental groups. Below are some examples; for more visualization
options, check out vignette("Data_Visualization").
Confusion Matrix
Arguments:
fcd = Flow cytometry data set
cluster_slot = String specifying which clustering slot to use to find the variable specified in cluster_var
cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)
group_var = String indicating the variable name in cell_anno that defines the grouping variable to be used (x-axis), e.g. group or sample ID
plot_confusion_HM(fcd = condor,
                  cluster_slot = "phenograph_pca_orig_k_60",
                  cluster_var = "Phenograph",
                  group_var = "group",
                  size = 30
)
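To make concrete what a cluster-by-group summary compares, the following base R sketch cross-tabulates simulated cluster labels against a grouping variable and converts the counts to within-cluster proportions. The simulated data, the object names and the choice of row-wise normalization are assumptions for illustration; how plot_confusion_HM() normalizes internally is not described in this section.

```r
set.seed(42)
n_cells <- 500

# Simulated per-cell annotation: group membership and cluster label
group   <- sample(c("control", "treatment"), n_cells, replace = TRUE)
cluster <- factor(sample(1:4, n_cells, replace = TRUE))

# Contingency table: clusters in rows, groups in columns
counts <- table(cluster = cluster, group = group)

# Within each cluster, the fraction of cells contributed by each group
prop_by_cluster <- prop.table(counts, margin = 1)
round(prop_by_cluster, 2)
```

A cluster whose row is dominated by one group is a candidate for a group-specific population, keeping in mind that unequal cell numbers per group also shift these proportions.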
Barplot of cluster frequencies
Arguments:
fcd = Flow cytometry data set
cluster_slot = String specifying which clustering slot to use to find the variable specified in cluster_var
cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)
group_var = String indicating the variable name in cell_anno that defines the grouping variable to be used (x-axis), e.g. group or sample ID
title = Title of the plot, default is "Counts"
plot_frequency_barplot(fcd = condor,
                       cluster_slot = "phenograph_pca_orig_k_60",
                       cluster_var = "Phenograph",
                       group_var = "group",
                       title = "Stacked barplot of cluster frequencies"
)
Heatmap of protein expression
Arguments:
fcd = Flow cytometry data set
expr_slot = Expression slot from which to take marker expression values, default is "orig"
cluster_slot = String specifying which clustering slot to use to find the variable specified in cluster_var
cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)
plot_marker_HM(fcd = condor,
               expr_slot = "orig",
               marker_to_exclude = c("FSC-A", "SSC-A"),
               cluster_slot = "phenograph_pca_orig_k_60",
               cluster_var = "Phenograph"
)
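A marker-by-cluster heatmap is built from one summary value per marker and cluster. The base R sketch below computes such a summary on simulated data. The marker names, the use of the mean as summary statistic and the per-marker z-scoring are assumptions for illustration; this section does not state which statistic or scaling plot_marker_HM() applies.

```r
set.seed(7)

# Simulated expression values for three hypothetical markers in 300 cells
expr <- data.frame(CD3 = rnorm(300), CD19 = rnorm(300), CD14 = rnorm(300))
cluster <- factor(sample(paste0("Cluster_", 1:3), 300, replace = TRUE))

# One mean value per marker and cluster
mean_expr <- aggregate(expr, by = list(cluster = cluster), FUN = mean)

# Convert to a clusters x markers matrix
mat <- as.matrix(mean_expr[, -1])
rownames(mat) <- as.character(mean_expr$cluster)

# z-score each marker (column) so markers with different ranges are comparable
scaled <- scale(mat)
```

The resulting matrix scaled has clusters in rows and markers in columns and could be passed to any heatmap function.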
Boxplot of cluster frequency
Arguments:
fcd = Flow cytometry data set
cluster_slot = String specifying which clustering slot to use to find the variable specified in cluster_var
cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)
sample_var = String indicating the variable name in cell_anno that defines the sample IDs to be used
group_var = String indicating the variable name in cell_anno that should be used to group the samples in sample_var
numeric = Logical; if TRUE, numeric levels in cluster_var are ordered in increasing order and "Cluster_" is pasted before the number; if FALSE, alphabetical ordering is applied
plots <- plot_frequency_boxplot(fcd = condor,
                                cluster_slot = "phenograph_pca_orig_k_60",
                                cluster_var = "Phenograph",
                                sample_var = "sample_ID",
                                group_var = "group",
                                numeric = TRUE
)
plots$Cluster_7
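The values summarized in a frequency boxplot are per-sample cluster percentages, grouped by condition. The following base R sketch derives them from simulated per-cell annotations. The sample names, the sample-to-group assignment and all object names are assumptions for illustration, and the code does not reproduce the internals of plot_frequency_boxplot().

```r
set.seed(3)

# Simulated per-cell annotation: sample of origin and cluster label
cells <- data.frame(
  sample_ID = sample(paste0("sample", 1:6), 1200, replace = TRUE),
  cluster   = sample(1:3, 1200, replace = TRUE)
)
cells$cluster <- factor(paste0("Cluster_", cells$cluster))

# Hypothetical assignment of samples to experimental groups
sample_group <- c(sample1 = "control",   sample2 = "control",   sample3 = "control",
                  sample4 = "treatment", sample5 = "treatment", sample6 = "treatment")

# Percentage of each cluster within each sample (rows sum to 100)
freq <- prop.table(table(cells$sample_ID, cells$cluster), margin = 1) * 100

# Long format: one row per sample and cluster, annotated with the group
freq_df <- as.data.frame(freq, responseName = "percent")
names(freq_df)[1:2] <- c("sample_ID", "cluster")
freq_df$group <- unname(sample_group[as.character(freq_df$sample_ID)])

# Values that one boxplot panel (here Cluster_2) would summarize, split by group
one_cluster <- subset(freq_df, cluster == "Cluster_2")
by_group <- split(one_cluster$percent, one_cluster$group)
```

Each element of by_group holds one percentage per sample, so the statistical unit in the comparison is the sample and not the individual cell.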
What is next?
Depending on your data set, cyCONDOR offers various options to continue with your analysis:
Try out more data visualization options: vignette("Data_Visualization")
Exploratory differential analysis of cell population frequencies and marker expression: vignette("Differential_Analysis")
See vignette("Batch_correction") for more details on how to handle batch effects within cyCONDOR
If you have a high number of samples recorded with the same panel, check out our data projection workflow for conveniently assigning clusters and metaclusters: vignette("Data_Projection")
Train a machine learning classifier: vignette("Machine_learning_classifier")
Calculate cell trajectories and pseudotime: vignette("Pseudotime_analysis")
Import your FlowJo gating hierarchy into your fcd: vignette("Load_a_FlowJo_workspace")
Session Info
info <- sessionInfo()
info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.2.0
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.34.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-19
## [5] CytoNorm_2.0.1 TH.data_1.1-2
## [7] vctrs_0.6.4 digest_0.6.33
## [9] png_0.1-8 shape_1.4.6
## [11] proxy_0.4-27 slingshot_2.8.0
## [13] ggrepel_0.9.4 parallelly_1.36.0
## [15] MASS_7.3-60 pkgdown_2.0.7
## [17] reshape2_1.4.4 httpuv_1.6.12
## [19] foreach_1.5.2 BiocGenerics_0.46.0
## [21] withr_2.5.1 ggrastr_1.0.2
## [23] xfun_0.40 ggpubr_0.6.0
## [25] ellipsis_0.3.2 survival_3.5-7
## [27] memoise_2.0.1 hexbin_1.28.3
## [29] ggbeeswarm_0.7.2 RProtoBufLib_2.12.1
## [31] princurve_2.1.6 profvis_0.3.8
## [33] ggsci_3.0.0 systemfonts_1.0.5
## [35] ragg_1.2.6 zoo_1.8-12
## [37] GlobalOptions_0.1.2 DEoptimR_1.1-3
## [39] Formula_1.2-5 prettyunits_1.2.0
## [41] promises_1.2.1 scatterplot3d_0.3-44
## [43] rstatix_0.7.2 globals_0.16.2
## [45] ps_1.7.5 rstudioapi_0.15.0
## [47] miniUI_0.1.1.1 generics_0.1.3
## [49] ggcyto_1.28.1 base64enc_0.1-3
## [51] processx_3.8.2 curl_5.1.0
## [53] S4Vectors_0.38.2 zlibbioc_1.46.0
## [55] flowWorkspace_4.12.2 polyclip_1.10-6
## [57] randomForest_4.7-1.1 GenomeInfoDbData_1.2.10
## [59] RBGL_1.76.0 ncdfFlow_2.46.0
## [61] RcppEigen_0.3.3.9.4 xtable_1.8-4
## [63] stringr_1.5.0 desc_1.4.2
## [65] doParallel_1.0.17 evaluate_0.22
## [67] S4Arrays_1.0.6 hms_1.1.3
## [69] glmnet_4.1-8 GenomicRanges_1.52.1
## [71] irlba_2.3.5.1 colorspace_2.1-0
## [73] harmony_1.1.0 reticulate_1.34.0
## [75] readxl_1.4.3 magrittr_2.0.3
## [77] lmtest_0.9-40 readr_2.1.4
## [79] Rgraphviz_2.44.0 later_1.3.1
## [81] lattice_0.22-5 future.apply_1.11.0
## [83] robustbase_0.99-0 XML_3.99-0.15
## [85] cowplot_1.1.1 matrixStats_1.1.0
## [87] RcppAnnoy_0.0.21 xts_0.13.1
## [89] class_7.3-22 Hmisc_5.1-1
## [91] pillar_1.9.0 nlme_3.1-163
## [93] iterators_1.0.14 compiler_4.3.1
## [95] RSpectra_0.16-1 stringi_1.7.12
## [97] gower_1.0.1 minqa_1.2.6
## [99] SummarizedExperiment_1.30.2 lubridate_1.9.3
## [101] devtools_2.4.5 CytoML_2.12.0
## [103] plyr_1.8.9 crayon_1.5.2
## [105] abind_1.4-5 locfit_1.5-9.8
## [107] sp_2.1-1 sandwich_3.0-2
## [109] pcaMethods_1.92.0 dplyr_1.1.3
## [111] codetools_0.2-19 multcomp_1.4-25
## [113] textshaping_0.3.7 recipes_1.0.8
## [115] openssl_2.1.1 Rphenograph_0.99.1
## [117] TTR_0.24.3 bslib_0.5.1
## [119] e1071_1.7-13 destiny_3.14.0
## [121] GetoptLong_1.0.5 ggplot.multistats_1.0.0
## [123] mime_0.12 splines_4.3.1
## [125] circlize_0.4.15 Rcpp_1.0.11
## [127] sparseMatrixStats_1.12.2 cellranger_1.1.0
## [129] knitr_1.44 utf8_1.2.4
## [131] clue_0.3-65 lme4_1.1-35.1
## [133] fs_1.6.3 listenv_0.9.0
## [135] checkmate_2.3.0 DelayedMatrixStats_1.22.6
## [137] pkgbuild_1.4.2 ggsignif_0.6.4
## [139] tibble_3.2.1 Matrix_1.6-1.1
## [141] rpart.plot_3.1.1 callr_3.7.3
## [143] tzdb_0.4.0 tweenr_2.0.2
## [145] pkgconfig_2.0.3 pheatmap_1.0.12
## [147] tools_4.3.1 cachem_1.0.8
## [149] smoother_1.1 fastmap_1.1.1
## [151] rmarkdown_2.25 scales_1.2.1
## [153] grid_4.3.1 usethis_2.2.2
## [155] broom_1.0.5 sass_0.4.7
## [157] graph_1.78.0 carData_3.0-5
## [159] RANN_2.6.1 rpart_4.1.21
## [161] farver_2.1.1 yaml_2.3.7
## [163] MatrixGenerics_1.12.3 foreign_0.8-85
## [165] ggthemes_4.2.4 cli_3.6.1
## [167] purrr_1.0.2 stats4_4.3.1
## [169] lifecycle_1.0.3 uwot_0.1.16
## [171] askpass_1.2.0 caret_6.0-94
## [173] Biobase_2.60.0 mvtnorm_1.2-3
## [175] lava_1.7.3 sessioninfo_1.2.2
## [177] backports_1.4.1 cytolib_2.12.1
## [179] timechange_0.2.0 gtable_0.3.4
## [181] rjson_0.2.21 umap_0.2.10.0
## [183] ggridges_0.5.4 Rphenoannoy_0.1.0
## [185] parallel_4.3.1 pROC_1.18.5
## [187] limma_3.56.2 jsonlite_1.8.7
## [189] edgeR_3.42.4 RcppHNSW_0.5.0
## [191] bitops_1.0-7 ggplot2_3.4.4
## [193] Rtsne_0.16 FlowSOM_2.8.0
## [195] ranger_0.16.0 flowCore_2.12.2
## [197] jquerylib_0.1.4 timeDate_4022.108
## [199] shiny_1.7.5.1 ConsensusClusterPlus_1.64.0
## [201] htmltools_0.5.6.1 diffcyt_1.20.0
## [203] glue_1.6.2 XVector_0.40.0
## [205] VIM_6.2.2 RCurl_1.98-1.13
## [207] rprojroot_2.0.3 gridExtra_2.3
## [209] boot_1.3-28.1 TrajectoryUtils_1.8.0
## [211] igraph_1.5.1 R6_2.5.1
## [213] tidyr_1.3.0 SingleCellExperiment_1.22.0
## [215] labeling_0.4.3 vcd_1.4-11
## [217] cluster_2.1.4 pkgload_1.3.3
## [219] GenomeInfoDb_1.36.4 ipred_0.9-14
## [221] nloptr_2.0.3 DelayedArray_0.26.7
## [223] tidyselect_1.2.0 vipor_0.4.5
## [225] htmlTable_2.4.2 ggforce_0.4.1
## [227] CytoDx_1.20.0 car_3.1-2
## [229] future_1.33.0 ModelMetrics_1.2.2.2
## [231] munsell_0.5.0 laeken_0.5.2
## [233] data.table_1.14.8 htmlwidgets_1.6.2
## [235] ComplexHeatmap_2.16.0 RColorBrewer_1.1-3
## [237] rlang_1.1.1 remotes_2.4.2.1
## [239] colorRamps_2.3.1 ggnewscale_0.4.9
## [241] fansi_1.0.5 hardhat_1.3.0
## [243] beeswarm_0.4.0 prodlim_2023.08.28