
In this vignette we introduce the structure of the condor object and showcase some useful cyCONDOR functions to interact with it.

Load an example dataset

condor <- readRDS("../.test_files/condor_example_016_misc.rds")

Structure of the condor object

Knowing the structure of a data object makes bioinformatic analysis tools considerably easier to use. Thanks to its straightforward composition, the structure of the condor object is easy to grasp. It follows a hierarchical structure with 3 levels (data type/method -> data slot -> variable) and can be separated into 5 major sections, each representing one step of data acquisition or analysis (expression, cell metadata, dimensionality reduction, clustering and extras).



Graphic of the condor object structure. The hierarchical levels are depicted as columns and the major sections are colored in.

Hierarchical structure

The 1st level describes the data types and methods present in the object followed by the 2nd level specifying separate data slots for the actual data stored as data frames (df). The 3rd level contains the variables (column names) of the respective df.
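The three levels map onto nested R lists that are accessed with `$`. The following is a minimal sketch using a hand-built toy object (`toy`, with made-up values and cell IDs), not a condor object created by cyCONDOR; it only mirrors the layout described above.

```r
# Toy stand-in for a condor object (hypothetical values, not produced by cyCONDOR)
cell_ids <- paste0("cell_", 1:4)

toy <- list(
  expr = list(
    orig = data.frame(CD3  = c(1.2, 0.1, 2.3, 0.4),
                      CD19 = c(0.0, 1.8, 0.2, 2.1),
                      row.names = cell_ids)
  ),
  anno = list(
    cell_anno = data.frame(sample_ID = c("s1", "s1", "s2", "s2"),
                           row.names = cell_ids)
  )
)

names(toy)               # level 1: data types / methods
names(toy$expr)          # level 2: data slots
colnames(toy$expr$orig)  # level 3: variables (column names)
toy$expr$orig$CD3        # values of a single variable
```

The same pattern of `object$type$slot$variable` applies to a real condor object, with the names described in the sections below.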

Overview of the 5 sections of a condor object

The data types $expr and $anno are created during data loading and transformation and serve as the basis for further data analysis.

Expression

The original, transformed expression values are saved in $expr under the data slot $orig, containing the cell markers as column names (variables) and unique cell IDs as row names. If batch normalization is performed on the expression values, the output is saved as a df under a new data slot ($norm).

Metadata

The metadata is saved under data type anno and data slot cell_anno. The variables of this df correspond to the provided cell annotation and can be used as the argument group_var in many visualization functions.

Dimensionality reductions

The output of each dimensionality reduction or clustering function is saved as a df under the corresponding method (e.g. $pca, $umap, $clustering) and data slot (e.g. $orig, $pca_orig, $phenograph_pca_orig_k30). cyCONDOR automatically uses the variables of a dimensionality reduction (e.g. $PC1, $PC2) as coordinates for visualization when the method and data slot are specified (arguments reduction_method and reduction_slot).
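As a rough illustration of how a method name and a data slot name together identify one data frame of coordinates, here is a sketch on a toy object. The strings `reduction_method` and `reduction_slot` are used for plain base R list indexing; how cyCONDOR resolves these arguments internally is an assumption and is not confirmed by this vignette.

```r
# Toy object with one PCA slot and a cell annotation (hypothetical values)
cell_ids <- paste0("cell_", 1:3)
toy <- list(
  pca = list(
    orig = data.frame(PC1 = c(-1.0, 0.2, 0.8),
                      PC2 = c(0.5, -0.3, -0.2),
                      row.names = cell_ids)
  ),
  anno = list(
    cell_anno = data.frame(group = c("ctrl", "ctrl", "pat"),
                           row.names = cell_ids)
  )
)

reduction_method <- "pca"
reduction_slot   <- "orig"

# Look up the coordinates by name, then attach the annotation by matching cell IDs
coords  <- toy[[reduction_method]][[reduction_slot]]
plot_df <- cbind(coords, toy$anno$cell_anno[rownames(coords), , drop = FALSE])
plot_df
```

Matching by row name rather than by position keeps coordinates and annotation aligned even if the two tables were ordered differently.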

Clustering

After clustering, a data slot is created under the $clustering method, named by combining the relevant parameters used for the calculation (e.g. phenograph_pca_orig_k30). The available variables (e.g. $Phenograph) serve as the basis for cell labeling; the labels are later saved under the variable metaclusters.

Extras

$extra contains all additional data to be stored (e.g. parameters of data loading, lists of markers, dimensionality reduction or clustering models for future data projection).

Extract or change marker names

Get measured markers

The function measured_markers takes the condor object as fcd input and returns the number of markers that are included in the condor object and a list of their names. By directing the output to a variable it is possible to save the list of the marker names for future use.

expr_markers <- measured_markers(fcd = condor)
## [1] "number of measured markers: 28"
##  [1] "FSC-A"         "SSC-A"         "CD38"          "CD8"          
##  [5] "CD195 (CCR5)"  "CD94 (KLRD1)"  "CD45RA"        "HLA-DR"       
##  [9] "CD56"          "CD127 (IL7RA)" "CD14"          "CD64"         
## [13] "CD4"           "IgD"           "CD19"          "CD16"         
## [17] "CD32"          "CD197 (CCR7)"  "CD20"          "CD27"         
## [21] "CD15"          "PD-1"          "CD3"           "CD57"         
## [25] "CD25"          "CD123 (IL3RA)" "CD13"          "CD11c"

Change parameter names

The function change_param_name allows for the quick and easy changing of single or multiple parameter names. It needs the condor object as fcd input and vectors for the old and new parameter names (old_names and new_names, respectively). In the first example we change only the name of the PD-1 marker to PD1.

condor <- change_param_name(fcd = condor, 
                            old_names = "PD-1", 
                            new_names = "PD1")
## [1] "Changed parameter 'PD-1' to 'PD1' in orig."

It is also possible to modify multiple names at the same time. The vector NewNames can either be written manually or computed using vector manipulations. In the second example below we remove the protein names in parentheses from the selected markers. It is important that the old and new marker names are listed in the same order.

OldNames <- c("CD195 (CCR5)", "CD94 (KLRD1)", "CD127 (IL7RA)", "CD197 (CCR7)", "CD123 (IL3RA)")
NewNames <- unlist(strsplit(OldNames, " "))[2*(1:length(OldNames))-1] 

condor <- change_param_name(fcd = condor, 
                            old_names = OldNames, 
                            new_names = NewNames)
## [1] "Changed parameter 'CD195 (CCR5)' to 'CD195' in orig."
## [1] "Changed parameter 'CD94 (KLRD1)' to 'CD94' in orig."
## [1] "Changed parameter 'CD127 (IL7RA)' to 'CD127' in orig."
## [1] "Changed parameter 'CD197 (CCR7)' to 'CD197' in orig."
## [1] "Changed parameter 'CD123 (IL3RA)' to 'CD123' in orig."
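The index-based expression used for NewNames relies on every old name containing exactly one space. As an alternative sketch in base R, `sub()` with a regular expression removes a trailing parenthesized part directly and leaves names without parentheses untouched. The variable names `old_names_example` and `new_names_example` are illustrative only.

```r
old_names_example <- c("CD195 (CCR5)", "CD94 (KLRD1)", "CD127 (IL7RA)",
                       "CD197 (CCR7)", "CD123 (IL3RA)", "HLA-DR")

# Drop a space followed by "(...)" at the end of each name
new_names_example <- sub(" \\(.*\\)$", "", old_names_example)
new_names_example

# Element i of the new vector corresponds to element i of the old vector
stopifnot(length(old_names_example) == length(new_names_example))
```

Because `sub()` works element by element, the order of old and new names is preserved automatically.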

Get used markers

To keep track of which markers were used as the basis for dimensionality reduction or clustering, the respective markers are saved in the extra slot of the condor object. The used_markers function can be used to extract those markers.

It takes as input

  • the fcd object (e.g. condor),
  • the input_type (pca, umap, tSNE, diffmap, phenograph or FlowSOM),
  • the data_slot (orig or norm),
  • the prefix (if specified before, see dimensionality reduction or clustering)

and returns, similar to the measured_markers function, the number and names of the markers used for the specific analysis step.

pca_orig_markers <- used_markers(fcd = condor, 
                                 input_type = "pca", 
                                 data_slot = "orig",
                                 prefix = NULL)
## [1] "number of used markers in pca_orig : 28"
##  [1] "FSC-A"  "SSC-A"  "CD38"   "CD8"    "CD195"  "CD94"   "CD45RA" "HLA-DR"
##  [9] "CD56"   "CD127"  "CD14"   "CD64"   "CD4"    "IgD"    "CD19"   "CD16"  
## [17] "CD32"   "CD197"  "CD20"   "CD27"   "CD15"   "PD1"    "CD3"    "CD57"  
## [25] "CD25"   "CD123"  "CD13"   "CD11c"

Below we show an example of the markers used for a PCA calculation that excluded the scatter markers FSC-A and SSC-A. The prefix used in this PCA calculation was defined as scatter_exclusion.

pca_scatter_exclusion_orig_markers <- used_markers(fcd = condor,
                                                   input_type = "pca",
                                                   data_slot = "orig",
                                                   prefix = "scatter_exclusion")
## [1] "number of used markers in pca_scatter_exclusion_orig : 26"
##  [1] "CD38"   "CD8"    "CD195"  "CD94"   "CD45RA" "HLA-DR" "CD56"   "CD127" 
##  [9] "CD14"   "CD64"   "CD4"    "IgD"    "CD19"   "CD16"   "CD32"   "CD197" 
## [17] "CD20"   "CD27"   "CD15"   "PD1"    "CD3"    "CD57"   "CD25"   "CD123" 
## [25] "CD13"   "CD11c"

Check the integrity of the condor object

The check_IDs function can be useful to make sure the condor object has the right structure for all downstream analysis. It checks the cell IDs at each level and compares them to the fcd$expr$orig data frame. If a discrepancy appears at any point, a warning will be returned.

check_IDs(condor)
## [1] "Everything looks fine"
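To make the idea behind such a check concrete, here is a minimal sketch on a toy object. The helper `ids_consistent` is hypothetical and is not the implementation of check_IDs; it only compares the row names of every data frame with those of `expr$orig`.

```r
# Toy object with three data frames sharing the same cell IDs (hypothetical values)
cell_ids <- paste0("cell_", 1:3)
toy <- list(
  expr = list(orig = data.frame(CD3 = c(1, 2, 3), row.names = cell_ids)),
  anno = list(cell_anno = data.frame(group = c("a", "a", "b"), row.names = cell_ids)),
  pca  = list(orig = data.frame(PC1 = c(0.1, 0.2, 0.3), row.names = cell_ids))
)

# Hypothetical helper: TRUE if every data frame has the same row names,
# in the same order, as expr$orig; otherwise warns and returns FALSE
ids_consistent <- function(obj) {
  ref <- rownames(obj$expr$orig)
  for (method in names(obj)) {
    for (slot in names(obj[[method]])) {
      x <- obj[[method]][[slot]]
      if (is.data.frame(x) && !identical(rownames(x), ref)) {
        warning("cell IDs differ in ", method, "$", slot)
        return(FALSE)
      }
    }
  }
  TRUE
}

ids_consistent(toy)

# Reordering one data frame breaks the correspondence
broken <- toy
broken$pca$orig <- broken$pca$orig[c(2, 1, 3), , drop = FALSE]
suppressWarnings(ids_consistent(broken))
```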

Merge or subset the condor object

Merge two condor objects

The merge_condor function combines two condor objects that contain the same parameters (markers). This function merges only the expression table and the annotation, as all downstream analysis needs to be repeated. If cell IDs are duplicated between the two objects, the merge cannot be performed.

condor_merged <- merge_condor(data1 = condor, 
                              data2 = condor)
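The two requirements named above (same markers, no shared cell IDs) can be illustrated on plain data frames. This is a sketch under the assumption that merging amounts to row-binding tables; the helper `merge_tables` is hypothetical and does not reproduce the internals of merge_condor.

```r
# Two toy expression tables with the same markers and distinct cell IDs
expr1 <- data.frame(CD3 = c(1.0, 2.0), CD19 = c(0.1, 0.2),
                    row.names = c("A_1", "A_2"))
expr2 <- data.frame(CD3 = c(3.0, 4.0), CD19 = c(0.3, 0.4),
                    row.names = c("B_1", "B_2"))

# Hypothetical helper: row-bind two tables after checking markers and cell IDs
merge_tables <- function(x, y) {
  if (!setequal(colnames(x), colnames(y))) {
    stop("both tables must contain the same markers")
  }
  if (length(intersect(rownames(x), rownames(y))) > 0) {
    stop("cell IDs must be unique across both tables")
  }
  # Reorder the columns of y so that markers line up with x
  rbind(x, y[, colnames(x), drop = FALSE])
}

merged <- merge_tables(expr1, expr2)
dim(merged)

# In this sketch, tables that share cell IDs are rejected
result <- tryCatch(merge_tables(expr1, expr1),
                   error = function(e) conditionMessage(e))
result
```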

Subset a condor object

The subset_fcd function subsets the condor object to a specific number of randomly selected cells specified with the size parameter. A seed can be set for reproducibility.

condor_subset <- subset_fcd(fcd = condor, 
                            size = 5000,
                            seed = 91)

Subset a condor object equally for a variable

The subset_fcd_byparam function subsets the condor object to a specific number of randomly selected cells, specified with the size parameter, for each level of the variable given as param. A seed can be set for reproducibility.

condor_subset_sample <- subset_fcd_byparam(fcd = condor, 
                                           param = "sample_ID", 
                                           size = 500, 
                                           seed = 91)
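The principle of drawing the same number of cells per group can be sketched with base R on toy data frames. This is an illustration only and not the code of subset_fcd_byparam; in particular, taking all cells of a group that has fewer than `size` cells is a choice made for this sketch.

```r
# Toy annotation and expression tables: 6 cells in s1, 4 cells in s2
cell_ids <- paste0("cell_", 1:10)
anno <- data.frame(sample_ID = rep(c("s1", "s2"), times = c(6, 4)),
                   row.names = cell_ids)
expr <- data.frame(CD3 = (1:10) / 10, row.names = cell_ids)

size <- 3
set.seed(91)  # fixed seed so the random draw is reproducible

# Draw `size` cell IDs per sample (or all cells if a sample has fewer)
ids_by_sample <- split(rownames(anno), anno$sample_ID)
keep <- unlist(lapply(ids_by_sample, function(ids) {
  ids[sample.int(length(ids), min(size, length(ids)))]
}), use.names = FALSE)

# Apply the same selection to every table so cell IDs stay aligned
expr_sub <- expr[keep, , drop = FALSE]
anno_sub <- anno[keep, , drop = FALSE]
table(anno_sub$sample_ID)
```

Selecting the cell IDs once and reusing them for each table keeps expression and annotation in step.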

Filter a condor object to create a specific subset

The filter_fcd function can be useful to create a specific subset of a condor object. It takes the row names of the cells to be filtered as cell_ids input.

condor_filter <- filter_fcd(fcd = condor, 
                            cell_ids = rownames(condor$expr$orig)[condor$clustering$phenograph_pca_orig_k_60$metaclusters == "Classical Monocytes"])
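The call above combines row names taken from the expression table with a logical vector taken from the clustering table, which assumes that both tables list the cells in the same order. A sketch on toy data frames (hypothetical values) shows a variant that takes the IDs from the table holding the labels and then indexes by ID:

```r
# Toy expression table and cluster labels sharing the same cell IDs
cell_ids <- paste0("cell_", 1:5)
expr <- data.frame(CD14 = c(2.1, 0.1, 1.9, 0.2, 2.4), row.names = cell_ids)
clustering <- data.frame(
  metaclusters = c("Classical Monocytes", "T cells", "Classical Monocytes",
                   "B cells", "Classical Monocytes"),
  row.names = cell_ids
)

# Take the IDs from the table that holds the labels, then index by ID
monocyte_ids <- rownames(clustering)[clustering$metaclusters == "Classical Monocytes"]
expr_monocytes <- expr[monocyte_ids, , drop = FALSE]
rownames(expr_monocytes)
```

A character vector built this way has the form expected for a cell_ids argument as described above.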

Session Info

info <- sessionInfo()

info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cyCONDOR_0.2.0
## 
## loaded via a namespace (and not attached):
##   [1] IRanges_2.34.1              Rmisc_1.5.1                
##   [3] urlchecker_1.0.1            nnet_7.3-19                
##   [5] CytoNorm_2.0.1              TH.data_1.1-2              
##   [7] vctrs_0.6.4                 digest_0.6.33              
##   [9] png_0.1-8                   shape_1.4.6                
##  [11] proxy_0.4-27                slingshot_2.8.0            
##  [13] ggrepel_0.9.4               parallelly_1.36.0          
##  [15] MASS_7.3-60                 pkgdown_2.0.7              
##  [17] reshape2_1.4.4              httpuv_1.6.12              
##  [19] foreach_1.5.2               BiocGenerics_0.46.0        
##  [21] withr_2.5.1                 ggrastr_1.0.2              
##  [23] xfun_0.40                   ggpubr_0.6.0               
##  [25] ellipsis_0.3.2              survival_3.5-7             
##  [27] memoise_2.0.1               hexbin_1.28.3              
##  [29] ggbeeswarm_0.7.2            RProtoBufLib_2.12.1        
##  [31] princurve_2.1.6             profvis_0.3.8              
##  [33] ggsci_3.0.0                 systemfonts_1.0.5          
##  [35] ragg_1.2.6                  zoo_1.8-12                 
##  [37] GlobalOptions_0.1.2         DEoptimR_1.1-3             
##  [39] Formula_1.2-5               prettyunits_1.2.0          
##  [41] promises_1.2.1              scatterplot3d_0.3-44       
##  [43] rstatix_0.7.2               globals_0.16.2             
##  [45] ps_1.7.5                    rstudioapi_0.15.0          
##  [47] miniUI_0.1.1.1              generics_0.1.3             
##  [49] ggcyto_1.28.1               base64enc_0.1-3            
##  [51] processx_3.8.2              curl_5.1.0                 
##  [53] S4Vectors_0.38.2            zlibbioc_1.46.0            
##  [55] flowWorkspace_4.12.2        polyclip_1.10-6            
##  [57] randomForest_4.7-1.1        GenomeInfoDbData_1.2.10    
##  [59] RBGL_1.76.0                 ncdfFlow_2.46.0            
##  [61] RcppEigen_0.3.3.9.4         xtable_1.8-4               
##  [63] stringr_1.5.0               desc_1.4.2                 
##  [65] doParallel_1.0.17           evaluate_0.22              
##  [67] S4Arrays_1.0.6              hms_1.1.3                  
##  [69] glmnet_4.1-8                GenomicRanges_1.52.1       
##  [71] irlba_2.3.5.1               colorspace_2.1-0           
##  [73] harmony_1.1.0               reticulate_1.34.0          
##  [75] readxl_1.4.3                magrittr_2.0.3             
##  [77] lmtest_0.9-40               readr_2.1.4                
##  [79] Rgraphviz_2.44.0            later_1.3.1                
##  [81] lattice_0.22-5              future.apply_1.11.0        
##  [83] robustbase_0.99-0           XML_3.99-0.15              
##  [85] cowplot_1.1.1               matrixStats_1.1.0          
##  [87] xts_0.13.1                  class_7.3-22               
##  [89] Hmisc_5.1-1                 pillar_1.9.0               
##  [91] nlme_3.1-163                iterators_1.0.14           
##  [93] compiler_4.3.1              RSpectra_0.16-1            
##  [95] stringi_1.7.12              gower_1.0.1                
##  [97] minqa_1.2.6                 SummarizedExperiment_1.30.2
##  [99] lubridate_1.9.3             devtools_2.4.5             
## [101] CytoML_2.12.0               plyr_1.8.9                 
## [103] crayon_1.5.2                abind_1.4-5                
## [105] locfit_1.5-9.8              sp_2.1-1                   
## [107] sandwich_3.0-2              pcaMethods_1.92.0          
## [109] dplyr_1.1.3                 codetools_0.2-19           
## [111] multcomp_1.4-25             textshaping_0.3.7          
## [113] recipes_1.0.8               openssl_2.1.1              
## [115] Rphenograph_0.99.1          TTR_0.24.3                 
## [117] bslib_0.5.1                 e1071_1.7-13               
## [119] destiny_3.14.0              GetoptLong_1.0.5           
## [121] ggplot.multistats_1.0.0     mime_0.12                  
## [123] splines_4.3.1               circlize_0.4.15            
## [125] Rcpp_1.0.11                 sparseMatrixStats_1.12.2   
## [127] cellranger_1.1.0            knitr_1.44                 
## [129] utf8_1.2.4                  clue_0.3-65                
## [131] lme4_1.1-35.1               fs_1.6.3                   
## [133] listenv_0.9.0               checkmate_2.3.0            
## [135] DelayedMatrixStats_1.22.6   pkgbuild_1.4.2             
## [137] ggsignif_0.6.4              tibble_3.2.1               
## [139] Matrix_1.6-1.1              rpart.plot_3.1.1           
## [141] callr_3.7.3                 tzdb_0.4.0                 
## [143] tweenr_2.0.2                pkgconfig_2.0.3            
## [145] pheatmap_1.0.12             tools_4.3.1                
## [147] cachem_1.0.8                smoother_1.1               
## [149] fastmap_1.1.1               rmarkdown_2.25             
## [151] scales_1.2.1                grid_4.3.1                 
## [153] usethis_2.2.2               broom_1.0.5                
## [155] sass_0.4.7                  graph_1.78.0               
## [157] carData_3.0-5               RANN_2.6.1                 
## [159] rpart_4.1.21                farver_2.1.1               
## [161] yaml_2.3.7                  MatrixGenerics_1.12.3      
## [163] foreign_0.8-85              ggthemes_4.2.4             
## [165] cli_3.6.1                   purrr_1.0.2                
## [167] stats4_4.3.1                lifecycle_1.0.3            
## [169] uwot_0.1.16                 askpass_1.2.0              
## [171] caret_6.0-94                Biobase_2.60.0             
## [173] mvtnorm_1.2-3               lava_1.7.3                 
## [175] sessioninfo_1.2.2           backports_1.4.1            
## [177] cytolib_2.12.1              timechange_0.2.0           
## [179] gtable_0.3.4                rjson_0.2.21               
## [181] umap_0.2.10.0               ggridges_0.5.4             
## [183] parallel_4.3.1              pROC_1.18.5                
## [185] limma_3.56.2                jsonlite_1.8.7             
## [187] edgeR_3.42.4                RcppHNSW_0.5.0             
## [189] bitops_1.0-7                ggplot2_3.4.4              
## [191] Rtsne_0.16                  FlowSOM_2.8.0              
## [193] ranger_0.16.0               flowCore_2.12.2            
## [195] jquerylib_0.1.4             timeDate_4022.108          
## [197] shiny_1.7.5.1               ConsensusClusterPlus_1.64.0
## [199] htmltools_0.5.6.1           diffcyt_1.20.0             
## [201] glue_1.6.2                  XVector_0.40.0             
## [203] VIM_6.2.2                   RCurl_1.98-1.13            
## [205] rprojroot_2.0.3             gridExtra_2.3              
## [207] boot_1.3-28.1               TrajectoryUtils_1.8.0      
## [209] igraph_1.5.1                R6_2.5.1                   
## [211] tidyr_1.3.0                 SingleCellExperiment_1.22.0
## [213] vcd_1.4-11                  cluster_2.1.4              
## [215] pkgload_1.3.3               GenomeInfoDb_1.36.4        
## [217] ipred_0.9-14                nloptr_2.0.3               
## [219] DelayedArray_0.26.7         tidyselect_1.2.0           
## [221] vipor_0.4.5                 htmlTable_2.4.2            
## [223] ggforce_0.4.1               CytoDx_1.20.0              
## [225] car_3.1-2                   future_1.33.0              
## [227] ModelMetrics_1.2.2.2        munsell_0.5.0              
## [229] laeken_0.5.2                data.table_1.14.8          
## [231] htmlwidgets_1.6.2           ComplexHeatmap_2.16.0      
## [233] RColorBrewer_1.1-3          rlang_1.1.1                
## [235] remotes_2.4.2.1             colorRamps_2.3.1           
## [237] ggnewscale_0.4.9            fansi_1.0.5                
## [239] hardhat_1.3.0               beeswarm_0.4.0             
## [241] prodlim_2023.08.28