
Getting Started

Here we describe a basic workflow to analyse high-dimensional cytometry data with cyCONDOR. More detailed descriptions of all cyCONDOR functionalities and visualization tools can be found in the Articles section. In this section we describe how to load data from a folder of FCS files and how to perform dimensionality reduction and clustering, together with some basic visualization of the results.

We start by loading the cyCONDOR package:

library(cyCONDOR)

Loading the data

With prep_fcd() you import the .fcs files into your R session. The .fcs files should all be stored in a single folder, data_path. Additionally, an annotation table text file (anno_table) must be provided; it contains a column with the file names of the .fcs files and, optionally, additional sample information. Here is an example anno_table:

filename            sample_id   condition
exp_x_sample1.fcs   sample1     treatment
exp_x_sample2.fcs   sample2     control
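Before calling prep_fcd(), it can help to check that the file names in the annotation table match the files in data_path. The base-R sketch below is one way to do that; the object names are hypothetical, and the folder listing is faked with a character vector so the example runs on its own.

```r
# Hypothetical annotation table, built in memory for illustration.
# In practice you would read it from disk, e.g. with read.csv() or read.delim(),
# depending on how the file is delimited.
anno <- data.frame(
  filename  = c("exp_x_sample1.fcs", "exp_x_sample2.fcs"),
  sample_id = c("sample1", "sample2"),
  condition = c("treatment", "control"),
  stringsAsFactors = FALSE
)

# Stand-in for list.files(data_path, pattern = "\\.fcs$")
fcs_files <- c("exp_x_sample1.fcs", "exp_x_sample2.fcs", "exp_x_sample3.fcs")

# Files in the folder without an annotation row, and rows without a file
missing_in_table  <- setdiff(fcs_files, anno$filename)
missing_in_folder <- setdiff(anno$filename, fcs_files)

print(missing_in_table)
print(missing_in_folder)
```

An empty result for both vectors means every file has exactly one matching row in the filename column.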

Arguments:

data_path = Folder where the .fcs files or .csv files are stored

max_cell = Number of cells to use for each file

useCSV = Logical flag indicating whether the input files are .csv rather than .fcs

transformation = Transformation to perform (“auto_logi”, “arcsinh”, “clr”, “none”)

remove_param = Parameters to be removed from the fcd; “inTime” should be kept

anno_table = Path to the annotation table text file.

filename_col = Name of the column containing the file names that match the .fcs/.csv files

condor <- prep_fcd(data_path = "../.test_files/fcs/", 
                    max_cell = 1000, 
                    useCSV = FALSE, 
                    transformation = "auto_logi", 
                    remove_param = c("FSC-H", "SSC-H", "FSC-W", "SSC-W", "Time"), 
                    anno_table = "../.test_files/metadata.csv", 
                    filename_col = "filename"
                   )

class(condor)
## [1] "flow_cytometry_dataframe"

For more details on data loading, transformation and the general structure of the fcd, have a look at vignette("Data_Loading_and_Transformation") and vignette("Other_utilities").
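The transformation argument controls how raw intensities are rescaled before analysis. To illustrate the general idea behind the "arcsinh" option, here is the inverse hyperbolic sine transform asinh(x / cofactor) in base R. The cofactor of 5 is an assumption (a value often used for mass cytometry), not necessarily what cyCONDOR applies internally.

```r
# Illustrative arcsinh transform; the cofactor is an assumed, user-chosen value
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Raw intensities, including a negative value as can occur after compensation
raw <- c(-10, 0, 5, 100, 10000)
transformed <- arcsinh_transform(raw)
round(transformed, 3)
```

The transform is approximately linear near zero and grows like a logarithm for large values, and unlike log() it is defined for zero and negative intensities.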

Dimensionality Reduction

To reduce the complexity of the data set, we first perform a principal component analysis (PCA) and then use the principal components as input for non-linear dimensionality reduction with, e.g., the UMAP or tSNE algorithm. See vignette("Dimensionality_Reduction") for further details and alternative methods. With this approach, we can visualize the structure of the data set in two-dimensional space.

PCA

Arguments:

fcd = Flow cytometry dataset

data_slot = Data slot to use for the calculation, e.g. "orig" or batch-corrected "norm"

condor <- runPCA(fcd = condor, 
                 data_slot = "orig"
                 )
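Conceptually, runPCA() projects each cell's marker expression onto principal components. A base-R analogue with prcomp() on simulated data is sketched below; whether cyCONDOR centres or scales the data, and how many components it keeps, is not stated here, so those choices are assumptions.

```r
set.seed(1)
# Simulated expression matrix: 200 cells x 6 markers (stand-in for fcd data)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("marker", 1:6)))
# Make two markers correlated so one component captures their shared variation
expr[, 2] <- expr[, 1] + rnorm(200, sd = 0.1)

pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Cell coordinates in PC space: one row per cell, one column per component
head(pca$x[, 1:2])

# Fraction of variance explained by each component
summary(pca)$importance["Proportion of Variance", ]
```

These per-cell PC coordinates are the kind of input that input_type = "pca" refers to in the UMAP, tSNE and Phenograph steps.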

UMAP

Arguments:

fcd = Flow cytometry dataset

input_type = Data to use for the calculation of the UMAP, e.g. “expr” or “pca”

data_slot = Data slot to use for the calculation, e.g. "orig" or batch-corrected "norm"

condor <- runUMAP(fcd = condor, 
                  input_type = "pca", 
                  data_slot = "orig"
                  )

UMAP Visualization

plot_dim_red(fcd = condor,
             reduction_method = "umap", 
             reduction_slot = "pca_orig", 
             param = "group", 
             title = "UMAP colored by group"
             )

tSNE

Arguments:

fcd = Flow cytometry dataset

input_type = Data to use for the calculation, e.g. "expr" or "pca"

data_slot = Data slot to use for the calculation, e.g. "orig" or batch-corrected "norm"

condor <- runtSNE(fcd = condor, 
                  input_type = "pca", 
                  data_slot = "orig"
                  )
## Read the 10000 x 29 data matrix successfully!
## OpenMP is working. 1 threads.
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Building tree...
##  - point 10000 of 10000
## Done in 2.93 seconds (sparsity = 0.013027)!
## Learning embedding...
## Iteration 50: error is 96.832296 (50 iterations in 1.49 seconds)
## Iteration 100: error is 84.589965 (50 iterations in 2.13 seconds)
## Iteration 150: error is 81.308503 (50 iterations in 1.35 seconds)
## Iteration 200: error is 80.502378 (50 iterations in 1.37 seconds)
## Iteration 250: error is 80.156805 (50 iterations in 1.37 seconds)
## Iteration 300: error is 3.166519 (50 iterations in 1.27 seconds)
## Iteration 350: error is 2.840701 (50 iterations in 1.22 seconds)
## Iteration 400: error is 2.653973 (50 iterations in 1.21 seconds)
## Iteration 450: error is 2.531473 (50 iterations in 1.18 seconds)
## Iteration 500: error is 2.444583 (50 iterations in 1.18 seconds)
## Iteration 550: error is 2.378906 (50 iterations in 1.18 seconds)
## Iteration 600: error is 2.328042 (50 iterations in 1.15 seconds)
## Iteration 650: error is 2.287705 (50 iterations in 1.14 seconds)
## Iteration 700: error is 2.255558 (50 iterations in 1.15 seconds)
## Iteration 750: error is 2.229873 (50 iterations in 1.21 seconds)
## Iteration 800: error is 2.210205 (50 iterations in 1.19 seconds)
## Iteration 850: error is 2.197058 (50 iterations in 1.26 seconds)
## Iteration 900: error is 2.188463 (50 iterations in 1.21 seconds)
## Iteration 950: error is 2.181792 (50 iterations in 1.23 seconds)
## Iteration 1000: error is 2.176606 (50 iterations in 1.22 seconds)
## Fitting performed in 25.73 seconds.

tSNE visualization

plot_dim_red(fcd = condor,
             reduction_method = "tSNE", 
             reduction_slot = "pca_orig", 
             param = "group", 
             title = "tSNE colored by group"
             )

Clustering

We group cells with similar marker expression using the Phenograph or FlowSOM clustering algorithm. For more details, see vignette("Clustering_and_cell_annotation").

Phenograph clustering

Arguments:

fcd = Flow cytometry dataset

input_type = Data to use for clustering, e.g. "pca"

data_slot = Data slot to use for the calculation, e.g. "orig" or "norm"

k = Number of nearest neighbors used to build the Phenograph graph

condor <- runPhenograph(fcd = condor, 
                        input_type = "pca", 
                        data_slot = "orig", 
                        k = 60
                        )
## Run Rphenograph starts:
##   -Input data of 10000 rows and 29 columns
##   -k is set to 60
##   Finding nearest neighbors...DONE ~ 5.037 s
##   Compute jaccard coefficient between nearest-neighbor sets...
## Presorting knn...
## presorting DONE ~ 0.362 s
##   Start jaccard
## DONE ~ 0.528 s
##   Build undirected graph from the weighted links...DONE ~ 0.204 s
##   Run louvain clustering on the graph ...DONE ~ 0.843 s
## Run Rphenograph DONE, totally takes 6.61200000000001s.
##   Return a community class
##   -Modularity value: 0.8355073 
##   -Number of clusters: 13
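The log above reflects the main Phenograph steps: find each cell's k nearest neighbors, weight each neighbor pair by the Jaccard coefficient of their neighbor sets, and run Louvain community detection on the resulting graph. The base-R sketch below covers only the first two steps, on simulated data with a small k. It uses a brute-force distance matrix, which suits toy data only, and stops before the Louvain step (for which a graph package such as igraph would typically be used).

```r
set.seed(42)
# Simulated data: two separated groups of 20 cells in 3 dimensions
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 3),
           matrix(rnorm(60, mean = 5), ncol = 3))
k <- 5

# k nearest neighbors of every cell (Euclidean), excluding the cell itself
d <- as.matrix(dist(x))
diag(d) <- Inf
knn <- t(apply(d, 1, function(row) order(row)[1:k]))

# Jaccard weight for each edge i -> j, with j among the k neighbors of i:
# shared neighbors / size of the union of both neighbor sets (2k - shared)
edges <- do.call(rbind, lapply(seq_len(nrow(knn)), function(i) {
  js <- knn[i, ]
  shared <- vapply(js, function(j) length(intersect(knn[i, ], knn[j, ])),
                   integer(1))
  data.frame(from = i, to = js, weight = shared / (2 * k - shared))
}))

head(edges)
```

Edges between cells whose neighborhoods overlap strongly get weights close to 1, which is what lets community detection separate dense populations. This is also why k matters: a larger k yields broader neighborhoods and typically fewer, larger clusters.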

Visualize Phenograph clustering

plot_dim_red(fcd = condor,
             reduction_method = "umap", 
             reduction_slot = "pca_orig", 
             cluster_slot = "phenograph_pca_orig_k_60",
             param = "Phenograph",
             title = "UMAP colored by Phenograph clustering"
             )

FlowSOM clustering

Arguments:

fcd = Flow cytometry dataset

input_type = Data to use for the calculation, e.g. "expr" or "pca"

data_slot = Data slot to use for the calculation, e.g. "orig" or "norm"

nClusters = Number of final clusters

condor <- runFlowSOM(fcd = condor, 
                     input_type = "expr", 
                     data_slot = "orig", 
                     nClusters = 5
                     )
## Building SOM
## Mapping data to SOM
## Building MST
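FlowSOM works in two levels: cells are first assigned to the nodes of a self-organizing map (the "Building SOM" and "Mapping data to SOM" steps in the log), and those nodes are then grouped into nClusters final clusters. The sketch below illustrates only this two-level idea in base R. It plainly swaps in k-means for the SOM and average-linkage hierarchical clustering for FlowSOM's metaclustering, so it is not equivalent to runFlowSOM().

```r
set.seed(7)
# Simulated expression: three populations of 100 cells in 4 markers
expr <- rbind(matrix(rnorm(400, mean = 0), ncol = 4),
              matrix(rnorm(400, mean = 4), ncol = 4),
              matrix(rnorm(400, mean = 8), ncol = 4))

# Level 1: many small prototype clusters (k-means stands in for the SOM grid)
n_prototypes <- 12
proto <- kmeans(expr, centers = n_prototypes, iter.max = 50)

# Level 2: group prototype centroids into the final number of clusters
# (hierarchical clustering stands in for FlowSOM's metaclustering)
n_clusters <- 3
meta <- cutree(hclust(dist(proto$centers), method = "average"), k = n_clusters)

# Each cell inherits the final cluster of its prototype
cell_cluster <- meta[proto$cluster]
table(cell_cluster)
```

Clustering a small set of prototypes instead of every cell is what keeps the second level cheap, even for data sets with millions of cells.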

Visualize FlowSOM clustering

plot_dim_red(fcd = condor,
             reduction_method = "umap", 
             reduction_slot = "pca_orig", 
             cluster_slot = "FlowSOM_expr_orig_k_5",
             param = "FlowSOM",
             title = "UMAP colored by FlowSOM clustering"
             )

Data visualization

We can now further visualize our data set to compare the different experimental groups. Below are some examples; for more visualization options, see vignette("Data_Visualization").

Confusion Matrix

Arguments:

fcd = Flow cytometry data set

cluster_slot = String specifying the clustering slot that contains the variable given in cluster_var

cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)

group_var = String giving the name of the variable in cell_anno that defines the groups to be compared (x-axis), e.g. group or sample ID

plot_confusion_HM(fcd = condor,
                  cluster_slot = "phenograph_pca_orig_k_60", 
                  cluster_var = "Phenograph",
                  group_var = "group", 
                  size = 30
                  )
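A heatmap of this kind is built from a cross-tabulation of cells by cluster and group. The base-R sketch below uses hypothetical cell annotations and rescales each group to the same total before computing, per cluster, the share contributed by each group, so that unequal cell numbers per group do not dominate. This normalization is an assumption; the exact one used by plot_confusion_HM() is not described here.

```r
set.seed(3)
# Hypothetical per-cell annotations: cluster label and experimental group
cells <- data.frame(
  cluster = sample(paste0("C", 1:4), 500, replace = TRUE),
  group   = sample(c("control", "treatment"), 500, replace = TRUE,
                   prob = c(0.7, 0.3))
)

counts <- table(cells$cluster, cells$group)

# Rescale each group (column) to the same total (assumed normalization)
per_group <- sweep(counts, 2, colSums(counts), "/")

# Within each cluster (row), percentage contributed by each group
comp <- 100 * per_group / rowSums(per_group)
round(comp, 1)
```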

Barplot of cluster frequencies

Arguments:

fcd = Flow cytometry data set

cluster_slot = String specifying the clustering slot that contains the variable given in cluster_var

cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)

group_var = String giving the name of the variable in cell_anno that defines the groups to be compared (x-axis), e.g. group or sample ID

title = Title of the plot, default is “Counts”

plot_frequency_barplot(fcd = condor,
                    cluster_slot = "phenograph_pca_orig_k_60",
                    cluster_var = "Phenograph",
                    group_var = "group",
                    title = "Stacked barplot of cluster frequencies" 
                    )
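The quantity behind a stacked barplot of cluster frequencies is the percentage of each cluster within each group. A base-R sketch with hypothetical annotations:

```r
set.seed(4)
# Hypothetical per-cell annotations
cells <- data.frame(
  cluster = sample(paste0("C", 1:4), 500, replace = TRUE),
  group   = sample(c("control", "treatment"), 500, replace = TRUE)
)

# Percentage of each cluster within each group (each row sums to 100)
freq <- 100 * prop.table(table(cells$group, cells$cluster), margin = 1)
round(freq, 1)
```

Passing the transposed table to base R's barplot(), as in barplot(t(freq)), draws one stacked bar per group.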

Heatmap of protein expression

Arguments:

fcd = Flow cytometry data set

expr_slot = Expression slot from which to take marker expression values, default is "orig"

marker_to_exclude = Markers to leave out of the heatmap, e.g. scatter parameters

cluster_slot = String specifying the clustering slot that contains the variable given in cluster_var

cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)

plot_marker_HM(fcd = condor,
               expr_slot = "orig",
               marker_to_exclude = c("FSC-A","SSC-A"),
               cluster_slot = "phenograph_pca_orig_k_60",
               cluster_var = "Phenograph"
               )
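A marker heatmap summarises each cluster by its average expression of each marker. The base-R sketch below computes that summary for a hypothetical matrix with placeholder marker names, excluding one marker in the spirit of marker_to_exclude. Using the mean, and not rescaling per marker, are assumptions; the exact summary used by plot_marker_HM() is not described here.

```r
set.seed(11)
# Hypothetical expression matrix (cells x markers) and cluster labels
expr <- matrix(rnorm(300 * 4), ncol = 4,
               dimnames = list(NULL, paste0("marker", 1:4)))
cluster <- sample(paste0("C", 1:3), 300, replace = TRUE)

# Drop markers that should not appear in the heatmap
keep <- setdiff(colnames(expr), "marker4")

# Per-cluster sums with rowsum(), then divide each row by the cluster size
sums   <- rowsum(expr[, keep], group = cluster)
counts <- as.vector(table(cluster)[rownames(sums)])
means  <- sums / counts

round(means, 2)
```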

Boxplot of cluster frequency

Arguments:

fcd = Flow cytometry data set

cluster_slot = String specifying the clustering slot that contains the variable given in cluster_var

cluster_var = String specifying the variable name in cluster_slot that identifies the cell population labels to be used (e.g. clusters, metaclusters or predicted labels)

sample_var = String giving the name of the variable in cell_anno that contains the sample IDs

group_var = String giving the name of the variable in cell_anno used to group the samples in sample_var

numeric = Logical; if TRUE, numeric levels in cluster_var are sorted in increasing order and “Cluster_” is prepended to each number; if FALSE, alphabetical ordering is applied.

plots <- plot_frequency_boxplot(fcd = condor,
                                cluster_slot = "phenograph_pca_orig_k_60", 
                                cluster_var = "Phenograph",
                                sample_var = "sample_ID", 
                                group_var = "group", 
                                numeric = TRUE
                                )

# Display the boxplot for cluster 7
plots$Cluster_7
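In a frequency boxplot each point is one sample: the percentage of a cluster among that sample's cells, grouped by group_var. The base-R sketch below computes those per-sample values from hypothetical annotations and reshapes them into one row per sample and cluster. How cyCONDOR computes these frequencies internally is not described here.

```r
set.seed(5)
# Hypothetical per-cell annotations: 4 samples (2 per group), 3 clusters
cells <- data.frame(
  sample_ID = rep(paste0("s", 1:4), each = 100),
  group     = rep(c("control", "treatment"), each = 200),
  cluster   = sample(paste0("Cluster_", 1:3), 400, replace = TRUE)
)

# Percentage of each cluster within each sample
freq <- 100 * prop.table(table(cells$sample_ID, cells$cluster), margin = 1)

# Long format: one row per sample and cluster, with the sample's group attached
long <- as.data.frame(freq, responseName = "percent")
names(long)[1:2] <- c("sample_ID", "cluster")
long$group <- cells$group[match(long$sample_ID, cells$sample_ID)]

head(long)
```

A single panel can then be drawn with base R, e.g. boxplot(percent ~ group, data = subset(long, cluster == "Cluster_2")).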

What is next?

Depending on your data set, cyCONDOR offers various options for continuing your analysis:

  • Try out more data visualization options: vignette("Data_Visualization")

  • Exploratory differential analysis of cell population frequencies and marker expression: vignette("Differential_Analysis")

  • See vignette("Batch_correction") for more details on how to handle batch effects within cyCONDOR

  • If you have a large number of samples recorded with the same panel, check out our data projection workflow for conveniently assigning clusters and metaclusters: vignette("Data_Projection")

  • Train a machine learning classifier: vignette("Machine_learning_classifier")

  • Calculate cell trajectories and pseudotime: vignette("Pseudotime_analysis")

  • Import your FlowJo gating hierarchy into your fcd: vignette("Load_a_FlowJo_workspace")

Session Info

info <- sessionInfo()

info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cyCONDOR_0.2.0
## 
## loaded via a namespace (and not attached):
##   [1] IRanges_2.34.1              Rmisc_1.5.1                
##   [3] urlchecker_1.0.1            nnet_7.3-19                
##   [5] CytoNorm_2.0.1              TH.data_1.1-2              
##   [7] vctrs_0.6.4                 digest_0.6.33              
##   [9] png_0.1-8                   shape_1.4.6                
##  [11] proxy_0.4-27                slingshot_2.8.0            
##  [13] ggrepel_0.9.4               parallelly_1.36.0          
##  [15] MASS_7.3-60                 pkgdown_2.0.7              
##  [17] reshape2_1.4.4              httpuv_1.6.12              
##  [19] foreach_1.5.2               BiocGenerics_0.46.0        
##  [21] withr_2.5.1                 ggrastr_1.0.2              
##  [23] xfun_0.40                   ggpubr_0.6.0               
##  [25] ellipsis_0.3.2              survival_3.5-7             
##  [27] memoise_2.0.1               hexbin_1.28.3              
##  [29] ggbeeswarm_0.7.2            RProtoBufLib_2.12.1        
##  [31] princurve_2.1.6             profvis_0.3.8              
##  [33] ggsci_3.0.0                 systemfonts_1.0.5          
##  [35] ragg_1.2.6                  zoo_1.8-12                 
##  [37] GlobalOptions_0.1.2         DEoptimR_1.1-3             
##  [39] Formula_1.2-5               prettyunits_1.2.0          
##  [41] promises_1.2.1              scatterplot3d_0.3-44       
##  [43] rstatix_0.7.2               globals_0.16.2             
##  [45] ps_1.7.5                    rstudioapi_0.15.0          
##  [47] miniUI_0.1.1.1              generics_0.1.3             
##  [49] ggcyto_1.28.1               base64enc_0.1-3            
##  [51] processx_3.8.2              curl_5.1.0                 
##  [53] S4Vectors_0.38.2            zlibbioc_1.46.0            
##  [55] flowWorkspace_4.12.2        polyclip_1.10-6            
##  [57] randomForest_4.7-1.1        GenomeInfoDbData_1.2.10    
##  [59] RBGL_1.76.0                 ncdfFlow_2.46.0            
##  [61] RcppEigen_0.3.3.9.4         xtable_1.8-4               
##  [63] stringr_1.5.0               desc_1.4.2                 
##  [65] doParallel_1.0.17           evaluate_0.22              
##  [67] S4Arrays_1.0.6              hms_1.1.3                  
##  [69] glmnet_4.1-8                GenomicRanges_1.52.1       
##  [71] irlba_2.3.5.1               colorspace_2.1-0           
##  [73] harmony_1.1.0               reticulate_1.34.0          
##  [75] readxl_1.4.3                magrittr_2.0.3             
##  [77] lmtest_0.9-40               readr_2.1.4                
##  [79] Rgraphviz_2.44.0            later_1.3.1                
##  [81] lattice_0.22-5              future.apply_1.11.0        
##  [83] robustbase_0.99-0           XML_3.99-0.15              
##  [85] cowplot_1.1.1               matrixStats_1.1.0          
##  [87] RcppAnnoy_0.0.21            xts_0.13.1                 
##  [89] class_7.3-22                Hmisc_5.1-1                
##  [91] pillar_1.9.0                nlme_3.1-163               
##  [93] iterators_1.0.14            compiler_4.3.1             
##  [95] RSpectra_0.16-1             stringi_1.7.12             
##  [97] gower_1.0.1                 minqa_1.2.6                
##  [99] SummarizedExperiment_1.30.2 lubridate_1.9.3            
## [101] devtools_2.4.5              CytoML_2.12.0              
## [103] plyr_1.8.9                  crayon_1.5.2               
## [105] abind_1.4-5                 locfit_1.5-9.8             
## [107] sp_2.1-1                    sandwich_3.0-2             
## [109] pcaMethods_1.92.0           dplyr_1.1.3                
## [111] codetools_0.2-19            multcomp_1.4-25            
## [113] textshaping_0.3.7           recipes_1.0.8              
## [115] openssl_2.1.1               Rphenograph_0.99.1         
## [117] TTR_0.24.3                  bslib_0.5.1                
## [119] e1071_1.7-13                destiny_3.14.0             
## [121] GetoptLong_1.0.5            ggplot.multistats_1.0.0    
## [123] mime_0.12                   splines_4.3.1              
## [125] circlize_0.4.15             Rcpp_1.0.11                
## [127] sparseMatrixStats_1.12.2    cellranger_1.1.0           
## [129] knitr_1.44                  utf8_1.2.4                 
## [131] clue_0.3-65                 lme4_1.1-35.1              
## [133] fs_1.6.3                    listenv_0.9.0              
## [135] checkmate_2.3.0             DelayedMatrixStats_1.22.6  
## [137] pkgbuild_1.4.2              ggsignif_0.6.4             
## [139] tibble_3.2.1                Matrix_1.6-1.1             
## [141] rpart.plot_3.1.1            callr_3.7.3                
## [143] tzdb_0.4.0                  tweenr_2.0.2               
## [145] pkgconfig_2.0.3             pheatmap_1.0.12            
## [147] tools_4.3.1                 cachem_1.0.8               
## [149] smoother_1.1                fastmap_1.1.1              
## [151] rmarkdown_2.25              scales_1.2.1               
## [153] grid_4.3.1                  usethis_2.2.2              
## [155] broom_1.0.5                 sass_0.4.7                 
## [157] graph_1.78.0                carData_3.0-5              
## [159] RANN_2.6.1                  rpart_4.1.21               
## [161] farver_2.1.1                yaml_2.3.7                 
## [163] MatrixGenerics_1.12.3       foreign_0.8-85             
## [165] ggthemes_4.2.4              cli_3.6.1                  
## [167] purrr_1.0.2                 stats4_4.3.1               
## [169] lifecycle_1.0.3             uwot_0.1.16                
## [171] askpass_1.2.0               caret_6.0-94               
## [173] Biobase_2.60.0              mvtnorm_1.2-3              
## [175] lava_1.7.3                  sessioninfo_1.2.2          
## [177] backports_1.4.1             cytolib_2.12.1             
## [179] timechange_0.2.0            gtable_0.3.4               
## [181] rjson_0.2.21                umap_0.2.10.0              
## [183] ggridges_0.5.4              Rphenoannoy_0.1.0          
## [185] parallel_4.3.1              pROC_1.18.5                
## [187] limma_3.56.2                jsonlite_1.8.7             
## [189] edgeR_3.42.4                RcppHNSW_0.5.0             
## [191] bitops_1.0-7                ggplot2_3.4.4              
## [193] Rtsne_0.16                  FlowSOM_2.8.0              
## [195] ranger_0.16.0               flowCore_2.12.2            
## [197] jquerylib_0.1.4             timeDate_4022.108          
## [199] shiny_1.7.5.1               ConsensusClusterPlus_1.64.0
## [201] htmltools_0.5.6.1           diffcyt_1.20.0             
## [203] glue_1.6.2                  XVector_0.40.0             
## [205] VIM_6.2.2                   RCurl_1.98-1.13            
## [207] rprojroot_2.0.3             gridExtra_2.3              
## [209] boot_1.3-28.1               TrajectoryUtils_1.8.0      
## [211] igraph_1.5.1                R6_2.5.1                   
## [213] tidyr_1.3.0                 SingleCellExperiment_1.22.0
## [215] labeling_0.4.3              vcd_1.4-11                 
## [217] cluster_2.1.4               pkgload_1.3.3              
## [219] GenomeInfoDb_1.36.4         ipred_0.9-14               
## [221] nloptr_2.0.3                DelayedArray_0.26.7        
## [223] tidyselect_1.2.0            vipor_0.4.5                
## [225] htmlTable_2.4.2             ggforce_0.4.1              
## [227] CytoDx_1.20.0               car_3.1-2                  
## [229] future_1.33.0               ModelMetrics_1.2.2.2       
## [231] munsell_0.5.0               laeken_0.5.2               
## [233] data.table_1.14.8           htmlwidgets_1.6.2          
## [235] ComplexHeatmap_2.16.0       RColorBrewer_1.1-3         
## [237] rlang_1.1.1                 remotes_2.4.2.1            
## [239] colorRamps_2.3.1            ggnewscale_0.4.9           
## [241] fansi_1.0.5                 hardhat_1.3.0              
## [243] beeswarm_0.4.0              prodlim_2023.08.28