Introduction to the condor object and other utilities
In this vignette we introduce the structure of the condor object and
showcase some useful cyCONDOR functions to interact with it.
Load an example dataset
condor <- readRDS("../.test_files/condor_example_016_misc.rds")
Structure of the condor object
Knowing the structure of your data object makes bioinformatic analysis
tools much easier to use. The structure of the condor object is
straightforward. It follows a hierarchical structure with 3
levels (data type/method -> data slot -> variable) and can be
separated into 5 major sections, each representing one step of data
acquisition or analysis (expression, cell metadata, dimensionality
reduction, clustering and extras).
Graphic of the condor object structure. The hierarchical levels are depicted as columns and the major sections are colored in.
Hierarchical structure
The 1st level describes the data types and methods present in the object followed by the 2nd level specifying separate data slots for the actual data stored as data frames (df). The 3rd level contains the variables (column names) of the respective df.
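The three levels can be illustrated with a toy nested list in base R. This is only a sketch: the object `toy`, its cell IDs and all values are made up for illustration and are not produced by cyCONDOR.

```r
# Toy stand-in for a condor object: a nested list of data frames.
# All names and values here are hypothetical.
cell_ids <- paste0("cell_", 1:4)

toy <- list(
  expr = list(
    orig = data.frame(CD3  = c(1.2, 0.1, 2.3, 0.4),
                      CD19 = c(0.2, 1.9, 0.1, 2.2),
                      row.names = cell_ids)
  ),
  anno = list(
    cell_anno = data.frame(sample_ID = c("s1", "s1", "s2", "s2"),
                           row.names = cell_ids)
  )
)

names(toy)               # level 1: data types / methods
names(toy$expr)          # level 2: data slots
colnames(toy$expr$orig)  # level 3: variables
str(toy, max.level = 2)  # compact overview of the first two levels
```

The same three calls (`names()` on the object, `names()` on a data type, `colnames()` on a data slot) should work on any object that follows the structure described above.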
Overview of the 5 sections of a condor object
The data types $expr and $anno are created during data loading and
transformation and serve as the basis for further data analysis.
Expression
The original (transformed but not batch-normalized) expression values
are saved in $expr under the data slot $orig, which contains
the cell markers as column names (variables) and unique cell IDs as row
names. If batch normalization is performed on the expression values, the
output is saved as a df under a new data slot ($norm).
Metadata
The metadata is saved under data type anno and data slot cell_anno.
The variables of this df correspond to the provided cell annotation
and can be used as the argument group_var in many visualization functions.
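Because the expression table and the cell annotation share the same cell IDs as row names, an annotation column can be used to group expression values. The following base R sketch shows the idea on two made-up data frames; `expr_orig`, `cell_anno` and the column `group` are hypothetical stand-ins, not objects created by cyCONDOR.

```r
# Hypothetical expression and annotation tables with matching row names
cell_ids  <- paste0("cell_", 1:4)
expr_orig <- data.frame(CD3 = c(1, 3, 5, 7), row.names = cell_ids)
cell_anno <- data.frame(group = c("ctrl", "ctrl", "pat", "pat"),
                        row.names = cell_ids)

# Grouping is only meaningful if both tables describe the same cells
# in the same order
stopifnot(identical(rownames(expr_orig), rownames(cell_anno)))

# Mean CD3 expression per level of the grouping variable
mean_by_group <- tapply(expr_orig$CD3, cell_anno$group, mean)
mean_by_group
```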
Dimensionality reductions
The output of each dimensionality reduction or clustering function is
saved as a df under its method (e.g. $pca, $umap, $clustering) and
data slot (e.g. $orig, $pca_orig, $phenograph_pca_orig_k30). The
variables of the dimensionality reductions (e.g. $PC1, $PC2) are used
by cyCONDOR automatically as embedding coordinates for visualization
when the method and data slot are specified (arguments
reduction_method and reduction_slot).
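The lookup by method and data slot can be mimicked with plain list indexing. This sketch assumes a made-up object `toy` with a single PCA slot; it illustrates how two character strings are enough to locate the coordinates, not how cyCONDOR implements its plotting functions.

```r
# Hypothetical object holding one dimensionality reduction
cell_ids <- paste0("cell_", 1:3)
toy <- list(
  pca = list(
    orig = data.frame(PC1 = c(-1, 0, 1),
                      PC2 = c(0.5, -0.5, 0),
                      row.names = cell_ids)
  )
)

# Method and slot given as strings, as they would be passed to
# arguments such as reduction_method and reduction_slot
reduction_method <- "pca"
reduction_slot   <- "orig"

# Double-bracket indexing resolves the strings to the stored data frame
coords <- toy[[reduction_method]][[reduction_slot]]
coords[, c("PC1", "PC2")]
```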
Clustering
After clustering, a data slot is created under the $clustering method,
named with a combination of the relevant parameters used for the
calculation (e.g. phenograph_pca_orig_k30). The available variables
(e.g. $Phenograph) are used as the basis for cell labeling, which is
later saved under the variable metaclusters.
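The relation between a cluster variable and the metaclusters variable can be sketched with a named lookup vector. The data frame `clust` and the labels below are invented for illustration; this is a base R sketch of the idea and not cyCONDOR's own labeling function.

```r
# Hypothetical clustering slot with one cluster assignment per cell
clust <- data.frame(Phenograph = factor(c(1, 2, 2, 3)),
                    row.names = paste0("cell_", 1:4))

# Made-up mapping from cluster number to cell type label;
# several clusters may share one label
labels <- c("1" = "T cells", "2" = "B cells", "3" = "T cells")

# Look up the label of each cell's cluster and store it as a new variable
clust$metaclusters <- factor(unname(labels[as.character(clust$Phenograph)]))

table(clust$metaclusters)
```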
Extract or change marker names
Get measured markers
The function measured_markers takes the condor object as fcd input
and returns the number of markers included in the condor object and
a list of their names. By assigning the output to a variable, the list
of marker names can be saved for future use.
expr_markers <- measured_markers(fcd = condor)
## [1] "number of measured markers: 28"
## [1] "FSC-A" "SSC-A" "CD38" "CD8"
## [5] "CD195 (CCR5)" "CD94 (KLRD1)" "CD45RA" "HLA-DR"
## [9] "CD56" "CD127 (IL7RA)" "CD14" "CD64"
## [13] "CD4" "IgD" "CD19" "CD16"
## [17] "CD32" "CD197 (CCR7)" "CD20" "CD27"
## [21] "CD15" "PD-1" "CD3" "CD57"
## [25] "CD25" "CD123 (IL3RA)" "CD13" "CD11c"
Change parameter names
The function change_param_name allows quick and easy renaming of
single or multiple parameters. It needs the condor object as fcd
input and vectors of the old and new parameter names (old_names and
new_names, respectively). In the first example we change only the
name of the PD-1 marker to PD1.
condor <- change_param_name(fcd = condor,
                            old_names = "PD-1",
                            new_names = "PD1")
## [1] "Changed parameter 'PD-1' to 'PD1' in orig."
It is also possible to modify multiple names at the same time. The vector NewNames can either be written manually or computed using vector manipulations. In the second example below we remove the protein names in parentheses from the respective markers. It is important that the old and new marker names are in the same order.
OldNames <- c("CD195 (CCR5)", "CD94 (KLRD1)", "CD127 (IL7RA)", "CD197 (CCR7)", "CD123 (IL3RA)")
NewNames <- unlist(strsplit(OldNames, " "))[2*(1:length(OldNames))-1]
condor <- change_param_name(fcd = condor,
                            old_names = OldNames,
                            new_names = NewNames)
## [1] "Changed parameter 'CD195 (CCR5)' to 'CD195' in orig."
## [1] "Changed parameter 'CD94 (KLRD1)' to 'CD94' in orig."
## [1] "Changed parameter 'CD127 (IL7RA)' to 'CD127' in orig."
## [1] "Changed parameter 'CD197 (CCR7)' to 'CD197' in orig."
## [1] "Changed parameter 'CD123 (IL3RA)' to 'CD123' in orig."
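The strsplit approach above relies on every old name splitting into exactly two pieces at the space. As an alternative sketch, a regular expression can strip a trailing parenthesised part and leaves names without parentheses unchanged, so the order and length of the two vectors always match. Only base R is used here.

```r
OldNames <- c("CD195 (CCR5)", "CD94 (KLRD1)", "CD127 (IL7RA)",
              "CD197 (CCR7)", "CD123 (IL3RA)")

# Remove a space followed by "(...)" at the end of each name
NewNames <- sub(" \\(.*\\)$", "", OldNames)

# Both vectors must have the same length and order before renaming
stopifnot(length(OldNames) == length(NewNames))

# Side-by-side view to check the mapping by eye
data.frame(old = OldNames, new = NewNames)
```

A name such as "PD1" passes through `sub()` unchanged, which is why this variant is safer when only some markers carry a protein name in parentheses.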
Get used markers
To keep track of which markers were used as the basis for a
dimensionality reduction or clustering, the respective markers are
saved in the extras section of the condor object. The used_markers
function can be used to extract those markers.
It takes as input
- the fcd object (e.g. condor),
- the input_type (pca, umap, tSNE, diffmap, phenograph or FlowSOM),
- the data_slot (orig or norm),
- the prefix (if specified before, see dimensionality reduction or clustering)

and returns, similar to the measured_markers function, the number and
names of the markers used for the specific analysis step.
pca_orig_markers <- used_markers(fcd = condor,
                                 input_type = "pca",
                                 data_slot = "orig",
                                 prefix = NULL)
## [1] "number of used markers in pca_orig : 28"
## [1] "FSC-A" "SSC-A" "CD38" "CD8" "CD195" "CD94" "CD45RA" "HLA-DR"
## [9] "CD56" "CD127" "CD14" "CD64" "CD4" "IgD" "CD19" "CD16"
## [17] "CD32" "CD197" "CD20" "CD27" "CD15" "PD1" "CD3" "CD57"
## [25] "CD25" "CD123" "CD13" "CD11c"
Below we show an example of the markers used for a PCA calculation
that excluded the scatter markers FSC-A and SSC-A. The prefix used in
this PCA calculation was defined as scatter_exclusion.
pca_scatter_exclusion_orig_markers <- used_markers(fcd = condor,
                                                   input_type = "pca",
                                                   data_slot = "orig",
                                                   prefix = "scatter_exclusion")
## [1] "number of used markers in pca_scatter_exclusion_orig : 26"
## [1] "CD38" "CD8" "CD195" "CD94" "CD45RA" "HLA-DR" "CD56" "CD127"
## [9] "CD14" "CD64" "CD4" "IgD" "CD19" "CD16" "CD32" "CD197"
## [17] "CD20" "CD27" "CD15" "PD1" "CD3" "CD57" "CD25" "CD123"
## [25] "CD13" "CD11c"
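Once two marker sets are stored as character vectors, they can be compared with base R set operations. The sketch below uses shortened, hand-typed vectors (the first five names from the outputs above) rather than the real return values, and it assumes that the saved outputs are plain character vectors.

```r
# Shortened stand-ins for the two marker sets shown above
all_markers <- c("FSC-A", "SSC-A", "CD38", "CD8", "CD195")
no_scatter  <- c("CD38", "CD8", "CD195")

# Markers used in the first analysis but not in the second
excluded <- setdiff(all_markers, no_scatter)
excluded

# Check that the second set is a true subset of the first
all(no_scatter %in% all_markers)
```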
Check the integrity of the condor object
The check_IDs
function can be useful to make sure the
condor object has the right structure for all downstream analysis. It
checks the cell IDs at each level and compares them to the
fcd$expr$orig
data frame. If a discrepancy appears at any
point, a warning will be returned.
check_IDs(condor)
## [1] "Everything looks fine"
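The kind of comparison described here can be sketched in base R: take the row names of the expression table as reference and compare every other data frame against them. The object `toy` and the helper `ids_match` are hypothetical, and the sketch assumes that every slot is a data frame; it does not reproduce the internals of check_IDs.

```r
# Hypothetical object in which the PCA slot has its cells in a different order
cell_ids <- paste0("cell_", 1:3)
toy <- list(
  expr = list(orig = data.frame(CD3 = 1:3, row.names = cell_ids)),
  anno = list(cell_anno = data.frame(group = c("a", "b", "a"),
                                     row.names = cell_ids)),
  pca  = list(orig = data.frame(PC1 = c(0.1, 0.2, 0.3),
                                row.names = rev(cell_ids)))
)

# Reference cell IDs taken from the expression table
ref_ids <- rownames(toy$expr$orig)

# For each method and slot, test whether the row names equal the reference
ids_match <- function(obj, ref) {
  unlist(lapply(obj, function(method) {
    vapply(method, function(df) identical(rownames(df), ref), logical(1))
  }))
}

res <- ids_match(toy, ref_ids)
res

# Names of the slots whose cell IDs differ from the reference
names(res)[!res]
```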
Merge or subset the condor object
Merge two condor objects
The merge_condor function combines two condor objects comprising the
same parameters (markers). This function merges only the expression
table and the annotation, as all downstream analysis needs to be
repeated. If cell IDs are duplicated between the two objects, the
merge cannot be performed.
condor_merged <- merge_condor(data1 = condor,
                              data2 = condor)
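The two preconditions named above (identical markers, no shared cell IDs) can be checked by hand before merging. The sketch below does this for two made-up expression tables `a` and `b` and then stacks them with `rbind()`; it illustrates the checks only and makes no claim about how merge_condor works internally.

```r
# Two hypothetical expression tables from different samples
a <- data.frame(CD3 = c(1, 2), row.names = c("s1_cell_1", "s1_cell_2"))
b <- data.frame(CD3 = c(3, 4), row.names = c("s2_cell_1", "s2_cell_2"))

# Cell IDs present in both tables would make the merged IDs ambiguous
shared_ids <- intersect(rownames(a), rownames(b))

# Both tables need the same markers and must not share any cell ID
stopifnot(identical(colnames(a), colnames(b)),
          length(shared_ids) == 0)

# Stack the rows; the unique row names are preserved
merged <- rbind(a, b)
merged
```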
Subset a condor object
The subset_fcd
function subsets the condor
object to a specific number of randomly selected cells specified with
the size
parameter. A seed can be set for
reproducibility.
condor_subset <- subset_fcd(fcd = condor,
                            size = 5000,
                            seed = 91)
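The effect of the seed can be demonstrated with base R alone: drawing the same number of cell IDs twice after setting the same seed gives the same cells. The table `expr` is a made-up stand-in for an expression table, and the sketch is not meant to reproduce the exact cells that subset_fcd would select.

```r
# Hypothetical expression table with 100 cells
expr <- data.frame(CD3 = seq_len(100),
                   row.names = paste0("cell_", seq_len(100)))

# Draw 10 cell IDs at random, without replacement
set.seed(91)
keep <- sample(rownames(expr), size = 10)
expr_sub <- expr[keep, , drop = FALSE]

# Resetting the seed reproduces exactly the same draw
set.seed(91)
keep_again <- sample(rownames(expr), size = 10)
identical(keep, keep_again)
```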
Subset a condor object equally for a variable
The subset_fcd_byparam function subsets the condor object to a
specific number of randomly selected cells, specified with the size
parameter, in each level of the specified param. A seed can be set
for reproducibility.
condor_subset_sample <- subset_fcd_byparam(fcd = condor,
                                           param = "sample_ID",
                                           size = 500,
                                           seed = 91)
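Sampling the same number of cells from every level of a variable is a stratified draw. The base R sketch below splits made-up cell IDs by a hypothetical `sample_ID` column and samples within each group; it assumes every group contains at least `size` cells, since `sample()` stops with an error otherwise.

```r
# Hypothetical annotation with unequal numbers of cells per sample
anno <- data.frame(sample_ID = rep(c("s1", "s2", "s3"), times = c(10, 20, 30)),
                   row.names = paste0("cell_", 1:60))

set.seed(91)
size <- 5

# Split the cell IDs by sample and draw `size` IDs from each group
keep <- unlist(lapply(split(rownames(anno), anno$sample_ID),
                      sample, size = size),
               use.names = FALSE)

# Every sample now contributes the same number of cells
table(anno[keep, "sample_ID"])
```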
Filter a condor object to create a specific subset
The filter_fcd function can be useful to create a specific subset of
a condor object. It takes the row names of the cells to be filtered
as cell_ids input.
# Collect the cell IDs of all cells labelled "Classical Monocytes"
monocyte_ids <- rownames(condor$expr$orig)[condor$clustering$phenograph_pca_orig_k_60$metaclusters == "Classical Monocytes"]
condor_filter <- filter_fcd(fcd = condor,
                            cell_ids = monocyte_ids)
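Selecting cell IDs by a label only works if the expression table and the clustering table list the cells in the same order. The sketch below shows the selection on two made-up tables, `expr` and `clust`, and wraps the comparison in `which()` so that cells with a missing label are dropped instead of producing NA row names.

```r
# Hypothetical expression and clustering tables with matching row names
cell_ids <- paste0("cell_", 1:5)
expr  <- data.frame(CD14 = c(2.1, 0.1, 1.8, 0.2, 2.5), row.names = cell_ids)
clust <- data.frame(metaclusters = c("Classical Monocytes", "T cells",
                                     "Classical Monocytes", NA,
                                     "Classical Monocytes"),
                    row.names = cell_ids)

# The logical mask is only valid if both tables are aligned
stopifnot(identical(rownames(expr), rownames(clust)))

# which() keeps TRUE positions only, so the NA label is ignored
is_mono  <- clust$metaclusters == "Classical Monocytes"
mono_ids <- rownames(expr)[which(is_mono)]
mono_ids
```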
Session Info
info <- sessionInfo()
info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.2.1
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.34.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-19
## [5] CytoNorm_2.0.1 TH.data_1.1-2
## [7] vctrs_0.6.4 digest_0.6.33
## [9] png_0.1-8 shape_1.4.6
## [11] proxy_0.4-27 slingshot_2.8.0
## [13] ggrepel_0.9.4 parallelly_1.36.0
## [15] MASS_7.3-60 pkgdown_2.0.7
## [17] reshape2_1.4.4 httpuv_1.6.12
## [19] foreach_1.5.2 BiocGenerics_0.46.0
## [21] withr_2.5.1 ggrastr_1.0.2
## [23] xfun_0.40 ggpubr_0.6.0
## [25] ellipsis_0.3.2 survival_3.5-7
## [27] memoise_2.0.1 hexbin_1.28.3
## [29] ggbeeswarm_0.7.2 RProtoBufLib_2.12.1
## [31] princurve_2.1.6 profvis_0.3.8
## [33] ggsci_3.0.0 systemfonts_1.0.5
## [35] ragg_1.2.6 zoo_1.8-12
## [37] GlobalOptions_0.1.2 DEoptimR_1.1-3
## [39] Formula_1.2-5 prettyunits_1.2.0
## [41] promises_1.2.1 scatterplot3d_0.3-44
## [43] rstatix_0.7.2 globals_0.16.2
## [45] ps_1.7.5 rstudioapi_0.15.0
## [47] miniUI_0.1.1.1 generics_0.1.3
## [49] ggcyto_1.28.1 base64enc_0.1-3
## [51] processx_3.8.2 curl_5.1.0
## [53] S4Vectors_0.38.2 zlibbioc_1.46.0
## [55] flowWorkspace_4.12.2 polyclip_1.10-6
## [57] randomForest_4.7-1.1 GenomeInfoDbData_1.2.10
## [59] RBGL_1.76.0 ncdfFlow_2.46.0
## [61] RcppEigen_0.3.3.9.4 xtable_1.8-4
## [63] stringr_1.5.0 desc_1.4.2
## [65] doParallel_1.0.17 evaluate_0.22
## [67] S4Arrays_1.0.6 hms_1.1.3
## [69] glmnet_4.1-8 GenomicRanges_1.52.1
## [71] irlba_2.3.5.1 colorspace_2.1-0
## [73] harmony_1.1.0 reticulate_1.34.0
## [75] readxl_1.4.3 magrittr_2.0.3
## [77] lmtest_0.9-40 readr_2.1.4
## [79] Rgraphviz_2.44.0 later_1.3.1
## [81] lattice_0.22-5 future.apply_1.11.0
## [83] robustbase_0.99-0 XML_3.99-0.15
## [85] cowplot_1.1.1 matrixStats_1.1.0
## [87] xts_0.13.1 class_7.3-22
## [89] Hmisc_5.1-1 pillar_1.9.0
## [91] nlme_3.1-163 iterators_1.0.14
## [93] compiler_4.3.1 RSpectra_0.16-1
## [95] stringi_1.7.12 gower_1.0.1
## [97] minqa_1.2.6 SummarizedExperiment_1.30.2
## [99] lubridate_1.9.3 devtools_2.4.5
## [101] CytoML_2.12.0 plyr_1.8.9
## [103] crayon_1.5.2 abind_1.4-5
## [105] locfit_1.5-9.8 sp_2.1-1
## [107] sandwich_3.0-2 pcaMethods_1.92.0
## [109] dplyr_1.1.3 codetools_0.2-19
## [111] multcomp_1.4-25 textshaping_0.3.7
## [113] recipes_1.0.8 openssl_2.1.1
## [115] Rphenograph_0.99.1 TTR_0.24.3
## [117] bslib_0.5.1 e1071_1.7-13
## [119] destiny_3.14.0 GetoptLong_1.0.5
## [121] ggplot.multistats_1.0.0 mime_0.12
## [123] splines_4.3.1 circlize_0.4.15
## [125] Rcpp_1.0.11 sparseMatrixStats_1.12.2
## [127] cellranger_1.1.0 knitr_1.44
## [129] utf8_1.2.4 clue_0.3-65
## [131] lme4_1.1-35.1 fs_1.6.3
## [133] listenv_0.9.0 checkmate_2.3.0
## [135] DelayedMatrixStats_1.22.6 pkgbuild_1.4.2
## [137] ggsignif_0.6.4 tibble_3.2.1
## [139] Matrix_1.6-1.1 rpart.plot_3.1.1
## [141] callr_3.7.3 tzdb_0.4.0
## [143] tweenr_2.0.2 pkgconfig_2.0.3
## [145] pheatmap_1.0.12 tools_4.3.1
## [147] cachem_1.0.8 smoother_1.1
## [149] fastmap_1.1.1 rmarkdown_2.25
## [151] scales_1.2.1 grid_4.3.1
## [153] usethis_2.2.2 broom_1.0.5
## [155] sass_0.4.7 graph_1.78.0
## [157] carData_3.0-5 RANN_2.6.1
## [159] rpart_4.1.21 farver_2.1.1
## [161] yaml_2.3.7 MatrixGenerics_1.12.3
## [163] foreign_0.8-85 ggthemes_4.2.4
## [165] cli_3.6.1 purrr_1.0.2
## [167] stats4_4.3.1 lifecycle_1.0.3
## [169] uwot_0.1.16 askpass_1.2.0
## [171] caret_6.0-94 Biobase_2.60.0
## [173] mvtnorm_1.2-3 lava_1.7.3
## [175] sessioninfo_1.2.2 backports_1.4.1
## [177] cytolib_2.12.1 timechange_0.2.0
## [179] gtable_0.3.4 rjson_0.2.21
## [181] umap_0.2.10.0 ggridges_0.5.4
## [183] parallel_4.3.1 pROC_1.18.5
## [185] limma_3.56.2 jsonlite_1.8.7
## [187] edgeR_3.42.4 RcppHNSW_0.5.0
## [189] bitops_1.0-7 ggplot2_3.4.4
## [191] Rtsne_0.16 FlowSOM_2.8.0
## [193] ranger_0.16.0 flowCore_2.12.2
## [195] jquerylib_0.1.4 timeDate_4022.108
## [197] shiny_1.7.5.1 ConsensusClusterPlus_1.64.0
## [199] htmltools_0.5.6.1 diffcyt_1.20.0
## [201] glue_1.6.2 XVector_0.40.0
## [203] VIM_6.2.2 RCurl_1.98-1.13
## [205] rprojroot_2.0.3 gridExtra_2.3
## [207] boot_1.3-28.1 TrajectoryUtils_1.8.0
## [209] igraph_1.5.1 R6_2.5.1
## [211] tidyr_1.3.0 SingleCellExperiment_1.22.0
## [213] vcd_1.4-11 cluster_2.1.4
## [215] pkgload_1.3.3 GenomeInfoDb_1.36.4
## [217] ipred_0.9-14 nloptr_2.0.3
## [219] DelayedArray_0.26.7 tidyselect_1.2.0
## [221] vipor_0.4.5 htmlTable_2.4.2
## [223] ggforce_0.4.1 CytoDx_1.20.0
## [225] car_3.1-2 future_1.33.0
## [227] ModelMetrics_1.2.2.2 munsell_0.5.0
## [229] laeken_0.5.2 data.table_1.14.8
## [231] htmlwidgets_1.6.2 ComplexHeatmap_2.16.0
## [233] RColorBrewer_1.1-3 rlang_1.1.1
## [235] remotes_2.4.2.1 colorRamps_2.3.1
## [237] ggnewscale_0.4.9 fansi_1.0.5
## [239] hardhat_1.3.0 beeswarm_0.4.0
## [241] prodlim_2023.08.28