Data Loading and Transformation
Data_Loading_and_Transformation.Rmd
How to load data
cyCONDOR provides an integrated function, prep_fcd, to prepare a condor object (a flow cytometry dataset) from input files in either .fcs or .csv format. All files should be saved in a single directory, whose path is given in data_path. The number of cells to process from each file can be set with max_cell. If the input files are in .csv format, useCSV should be set to TRUE. Keep in mind that currently all files in the data_path folder are loaded, regardless of whether they are also listed in the annotation table. Because this can introduce slight differences in the auto-logicle transformation, include in data_path only the files you plan to analyse.
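Because every file in data_path is loaded, it can help to check the folder against the annotation table before calling prep_fcd. The sketch below is a hypothetical helper written in base R (check_data_path is not part of cyCONDOR); it lists files that would be loaded without being annotated, and annotated files that are missing. For .csv input, the pattern argument would be changed to "\\.csv$".

```r
# Hypothetical helper (not part of cyCONDOR): compare the files in a data
# directory with the file names listed in an annotation table.
check_data_path <- function(data_path, anno_table, filename_col = "filename",
                            pattern = "\\.fcs$") {
  on_disk   <- list.files(data_path, pattern = pattern)
  annotated <- as.character(anno_table[[filename_col]])
  list(
    not_annotated = setdiff(on_disk, annotated),  # would still be loaded
    missing       = setdiff(annotated, on_disk)   # listed but not on disk
  )
}

# Toy example in a fresh temporary directory
data_path <- tempfile("fcs_demo")
dir.create(data_path)
invisible(file.create(file.path(data_path, c("ID1.fcs", "ID2.fcs", "old_run.fcs"))))
anno <- data.frame(filename  = c("ID1.fcs", "ID2.fcs"),
                   sample_ID = c("ID1", "ID2"))

res <- check_data_path(data_path, anno)
res$not_annotated  # files to move out of data_path before loading
```

Here old_run.fcs is reported as not annotated, so it would be moved elsewhere before building the condor object.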
For data transformation, cyCONDOR provides different options:
- auto_logi: auto-logicle transformation, recommended for HDFC and spectral flow data. This transformation also gives good results with CyTOF data, especially if arcsinh produces a lot of noise due to negative values. The auto-logicle transformation is inherited from the Cytofkit package (Chen et al. 2016).
- clr: centered log ratio transformation, recommended for CITE-seq data.
- arcsinh: arcsinh transformation with a cofactor of 5, a common transformation for CyTOF data.
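To make two of these options concrete, the base-R sketch below shows the arithmetic behind arcsinh with a cofactor of 5 and one common CLR formulation (the variant used by Seurat's NormalizeData with method "CLR"). These functions are illustrative only: they are not cyCONDOR's implementation, and applying CLR per cell (row) is an assumption.

```r
# Illustrative sketches only; not cyCONDOR's internal code.

# arcsinh with cofactor 5, applied element-wise
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Centered log ratio (Seurat-style variant): divide each value by the
# geometric mean of (1 + positive counts), then apply log1p.
# Applied per cell (row) here; the margin is an assumption.
clr_transform <- function(x) {
  t(apply(x, 1, function(v) {
    log1p(v / exp(sum(log1p(v[v > 0]), na.rm = TRUE) / length(v)))
  }))
}

expr <- matrix(c(0, 5, 50,
                 10, 0, 100), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("CD3", "CD4", "CD8")))
arcsinh_transform(expr)
clr_transform(expr)
```

The cofactor controls where arcsinh switches from roughly linear behaviour near zero to roughly logarithmic behaviour for large values.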
The last important piece needed to build a condor object is the annotation table. It should contain all metadata used for the analysis as well as a column with the names of the input files, and it should be supplied as a .csv file. The name of the column containing the file names is given in filename_col. An example metadata table is shown below.
read.csv("../.test_files/metadata.csv")
## filename sample_ID group batch
## 1 ID1.fcs ID1 ctrl Day1
## 2 ID2.fcs ID2 pat Day1
## 3 ID3.fcs ID3 ctrl Day2
## 4 ID4.fcs ID4 pat Day2
## 5 ID5.fcs ID5 ctrl Day2
## 6 ID6.fcs ID6 pat Day2
## 7 ID7.fcs ID7 ctrl Day3
## 8 ID8.fcs ID8 pat Day3
## 9 ID9.fcs ID9 ctrl Day3
## 10 ID10.fcs ID10 pat Day3
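A table like the one above can be created and saved directly from R. The sketch below rebuilds the example with base R and writes it with row.names = FALSE, which keeps write.csv from adding an unnamed index column that would otherwise appear as an extra column (X) when the file is read back. The column names simply mirror the example; the temporary file path is a stand-in for your own project folder.

```r
# Rebuild the example annotation table and save it as .csv
ids <- paste0("ID", 1:10)
metadata <- data.frame(
  filename  = paste0(ids, ".fcs"),                  # column passed as filename_col
  sample_ID = ids,
  group     = rep(c("ctrl", "pat"), times = 5),
  batch     = c(rep("Day1", 2), rep("Day2", 4), rep("Day3", 4))
)

anno_file <- tempfile(fileext = ".csv")
write.csv(metadata, anno_file, row.names = FALSE)  # no extra index column

head(read.csv(anno_file), 3)
```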
Parameters that are not needed for the downstream analysis (e.g. Time) and should be removed can be listed in remove_param. In prep_fcd we also set a seed for reproducibility, since the subsetting to max_cell is otherwise random.
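Why the seed matters can be seen with a small base-R illustration. The function below is hypothetical (it is not how prep_fcd is implemented): it randomly keeps at most max_cell events from one file and then drops the columns listed in remove_param. With the same seed, two runs select exactly the same events; the seed value 42 is arbitrary.

```r
# Hypothetical illustration, not cyCONDOR's code.
# Randomly keep at most max_cell events, then drop unwanted parameters.
# Note: set.seed() changes the global random number generator state.
subsample_cells <- function(events, max_cell, remove_param = character(0),
                            seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (nrow(events) > max_cell) {
    events <- events[sort(sample(nrow(events), max_cell)), , drop = FALSE]
  }
  events[, setdiff(colnames(events), remove_param), drop = FALSE]
}

events <- matrix(rnorm(1000 * 3), ncol = 3,
                 dimnames = list(NULL, c("CD3", "CD4", "Time")))

a <- subsample_cells(events, max_cell = 100, remove_param = "Time", seed = 42)
b <- subsample_cells(events, max_cell = 100, remove_param = "Time", seed = 42)
identical(a, b)  # same seed, same selected events
```

Without the seed argument, a and b would generally contain different events, which is the randomness the seed in prep_fcd guards against.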
Session Info
info <- sessionInfo()
info
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.3.0
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.40.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-20
## [5] CytoNorm_2.0.1 TH.data_1.1-3
## [7] vctrs_0.6.5 digest_0.6.37
## [9] png_0.1-8 shape_1.4.6.1
## [11] proxy_0.4-27 slingshot_2.14.0
## [13] ggrepel_0.9.6 corrplot_0.95
## [15] parallelly_1.45.0 MASS_7.3-65
## [17] pkgdown_2.1.3 reshape2_1.4.4
## [19] httpuv_1.6.16 foreach_1.5.2
## [21] BiocGenerics_0.52.0 withr_3.0.2
## [23] ggrastr_1.0.2 xfun_0.52
## [25] ggpubr_0.6.1 ellipsis_0.3.2
## [27] survival_3.8-3 memoise_2.0.1
## [29] hexbin_1.28.5 ggbeeswarm_0.7.2
## [31] RProtoBufLib_2.18.0 princurve_2.1.6
## [33] profvis_0.4.0 ggsci_3.2.0
## [35] systemfonts_1.2.3 ragg_1.4.0
## [37] zoo_1.8-14 GlobalOptions_0.1.2
## [39] DEoptimR_1.1-3-1 Formula_1.2-5
## [41] promises_1.3.3 scatterplot3d_0.3-44
## [43] httr_1.4.7 rstatix_0.7.2
## [45] globals_0.18.0 rstudioapi_0.17.1
## [47] UCSC.utils_1.2.0 miniUI_0.1.2
## [49] generics_0.1.4 ggcyto_1.34.0
## [51] base64enc_0.1-3 curl_6.4.0
## [53] S4Vectors_0.44.0 zlibbioc_1.52.0
## [55] flowWorkspace_4.18.1 polyclip_1.10-7
## [57] randomForest_4.7-1.2 GenomeInfoDbData_1.2.13
## [59] SparseArray_1.6.2 RBGL_1.82.0
## [61] ncdfFlow_2.52.1 RcppEigen_0.3.4.0.2
## [63] xtable_1.8-4 stringr_1.5.1
## [65] desc_1.4.3 doParallel_1.0.17
## [67] evaluate_1.0.4 S4Arrays_1.6.0
## [69] hms_1.1.3 glmnet_4.1-9
## [71] GenomicRanges_1.58.0 irlba_2.3.5.1
## [73] colorspace_2.1-1 harmony_1.2.3
## [75] reticulate_1.42.0 readxl_1.4.5
## [77] magrittr_2.0.3 lmtest_0.9-40
## [79] readr_2.1.5 Rgraphviz_2.50.0
## [81] later_1.4.2 lattice_0.22-7
## [83] future.apply_1.20.0 robustbase_0.99-4-1
## [85] XML_3.99-0.18 cowplot_1.2.0
## [87] matrixStats_1.5.0 xts_0.14.1
## [89] class_7.3-23 Hmisc_5.2-3
## [91] pillar_1.11.0 nlme_3.1-168
## [93] iterators_1.0.14 compiler_4.4.2
## [95] RSpectra_0.16-2 stringi_1.8.7
## [97] gower_1.0.2 minqa_1.2.8
## [99] SummarizedExperiment_1.36.0 lubridate_1.9.4
## [101] devtools_2.4.5 CytoML_2.18.3
## [103] plyr_1.8.9 crayon_1.5.3
## [105] abind_1.4-8 locfit_1.5-9.12
## [107] sp_2.2-0 sandwich_3.1-1
## [109] pcaMethods_1.98.0 dplyr_1.1.4
## [111] codetools_0.2-20 multcomp_1.4-28
## [113] textshaping_1.0.1 recipes_1.3.1
## [115] openssl_2.3.3 Rphenograph_0.99.1
## [117] TTR_0.24.4 bslib_0.9.0
## [119] e1071_1.7-16 destiny_3.20.0
## [121] GetoptLong_1.0.5 ggplot.multistats_1.0.1
## [123] mime_0.13 splines_4.4.2
## [125] circlize_0.4.16 Rcpp_1.1.0
## [127] sparseMatrixStats_1.18.0 cellranger_1.1.0
## [129] knitr_1.50 clue_0.3-66
## [131] lme4_1.1-37 fs_1.6.6
## [133] listenv_0.9.1 checkmate_2.3.2
## [135] DelayedMatrixStats_1.28.1 Rdpack_2.6.4
## [137] pkgbuild_1.4.8 ggsignif_0.6.4
## [139] tibble_3.3.0 Matrix_1.7-3
## [141] rpart.plot_3.1.2 statmod_1.5.0
## [143] tzdb_0.5.0 tweenr_2.0.3
## [145] pkgconfig_2.0.3 pheatmap_1.0.13
## [147] tools_4.4.2 cachem_1.1.0
## [149] rbibutils_2.3 smoother_1.3
## [151] fastmap_1.2.0 rmarkdown_2.29
## [153] scales_1.4.0 grid_4.4.2
## [155] usethis_3.1.0 broom_1.0.8
## [157] sass_0.4.10 graph_1.84.1
## [159] carData_3.0-5 RANN_2.6.2
## [161] rpart_4.1.24 farver_2.1.2
## [163] reformulas_0.4.1 yaml_2.3.10
## [165] MatrixGenerics_1.18.1 foreign_0.8-90
## [167] ggthemes_5.1.0 cli_3.6.5
## [169] purrr_1.0.4 stats4_4.4.2
## [171] lifecycle_1.0.4 uwot_0.2.3
## [173] askpass_1.2.1 caret_7.0-1
## [175] Biobase_2.66.0 mvtnorm_1.3-3
## [177] lava_1.8.1 sessioninfo_1.2.3
## [179] backports_1.5.0 cytolib_2.18.2
## [181] timechange_0.3.0 gtable_0.3.6
## [183] rjson_0.2.23 umap_0.2.10.0
## [185] ggridges_0.5.6 parallel_4.4.2
## [187] pROC_1.18.5 limma_3.62.2
## [189] jsonlite_2.0.0 edgeR_4.4.2
## [191] RcppHNSW_0.6.0 ggplot2_3.5.2
## [193] Rtsne_0.17 FlowSOM_2.14.0
## [195] ranger_0.17.0 flowCore_2.18.0
## [197] jquerylib_0.1.4 timeDate_4041.110
## [199] shiny_1.11.1 ConsensusClusterPlus_1.70.0
## [201] htmltools_0.5.8.1 diffcyt_1.26.1
## [203] glue_1.8.0 XVector_0.46.0
## [205] VIM_6.2.2 gridExtra_2.3
## [207] boot_1.3-31 TrajectoryUtils_1.14.0
## [209] igraph_2.1.4 R6_2.6.1
## [211] tidyr_1.3.1 SingleCellExperiment_1.28.1
## [213] vcd_1.4-13 cluster_2.1.8.1
## [215] pkgload_1.4.0 GenomeInfoDb_1.42.3
## [217] ipred_0.9-15 nloptr_2.2.1
## [219] DelayedArray_0.32.0 tidyselect_1.2.1
## [221] vipor_0.4.7 htmlTable_2.4.3
## [223] ggforce_0.5.0 CytoDx_1.26.0
## [225] car_3.1-3 future_1.58.0
## [227] ModelMetrics_1.2.2.2 laeken_0.5.3
## [229] data.table_1.17.6 htmlwidgets_1.6.4
## [231] ComplexHeatmap_2.22.0 RColorBrewer_1.1-3
## [233] rlang_1.1.6 remotes_2.5.0
## [235] colorRamps_2.3.4 ggnewscale_0.5.2
## [237] hardhat_1.4.1 beeswarm_0.4.0
## [239] prodlim_2025.04.28