Data Loading and Transformation
Source:vignettes/Data_Loading_and_Transformation.Rmd
How to load data
cyCONDOR provides an integrated function, prep_fcd, to prepare a condor
object (flow cytometry dataset) from input files in either .fcs or .csv
format. All files should be saved in a single directory, whose path is
given in data_path. The number of cells to process from each file is set
with max_cell. If the input files are in .csv format, useCSV should be
set to TRUE. Keep in mind that currently all files in the data_path
folder are loaded, regardless of whether they are listed in the
annotation table. This can cause slight differences in the auto-logicle
transformation; to avoid this, keep only the files you plan to analyse
in data_path.
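Because every file in the folder is loaded, it can help to compare the folder contents with the annotation table before building the object. The following is a minimal base R sketch of such a check, not part of cyCONDOR; the temporary folder, the file names, the anno data frame and the variable names are all made up for illustration.

```r
# Hypothetical setup: a temporary folder standing in for data_path,
# holding three empty placeholder files.
data_path <- tempfile("fcs_check_")
dir.create(data_path)
invisible(file.create(file.path(data_path,
                                c("ID1.fcs", "ID2.fcs", "old_run.fcs"))))

# Hypothetical annotation table listing only the files to analyse.
anno <- data.frame(filename = c("ID1.fcs", "ID2.fcs"),
                   group = c("ctrl", "pat"))

# Files present in the folder (use the pattern "\\.csv$" for .csv input).
found_files <- list.files(data_path, pattern = "\\.fcs$")

# Files that would be loaded although they are not annotated, and
# annotated files that are missing from the folder.
extra_files <- setdiff(found_files, anno$filename)
missing_files <- setdiff(anno$filename, found_files)

print(extra_files)
print(missing_files)
```

Any file reported in extra_files would be a candidate to move out of the folder before loading.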
For data transformation, cyCONDOR provides different options:
- auto_logi: auto-logicle transformation, recommended for HDFC and spectral flow data. This transformation also gives good results with CyTOF data, especially if you are experiencing a lot of noise with arcsinh due to negative values. The auto-logicle transformation is inherited from the Cytofkit package (Chen et al. 2016).
- clr: centered log ratio transformation, recommended for CITE-seq data.
- arcsinh: arcsinh transformation with cofactor 5, a common transformation for CyTOF data.
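To make two of these options concrete, here is a minimal base R sketch of an arcsinh transformation with cofactor 5 and of a textbook centered log ratio. It is an illustration only and not necessarily how cyCONDOR implements them: the toy values, the clr_transform helper, the pseudocount of 1 and the choice to centre per cell across markers are all assumptions. The auto-logicle transformation is more involved and is not sketched here.

```r
# Toy intensity values, including a negative value.
x <- c(-10, 0, 5, 100, 10000)

# arcsinh transformation with cofactor 5.
cofactor <- 5
x_arcsinh <- asinh(x / cofactor)

# Centered log ratio for one cell's counts across markers:
# log of each count minus the mean of the log counts.
# Assumption: a pseudocount of 1 is added to avoid log(0).
clr_transform <- function(counts, pseudocount = 1) {
  log_counts <- log(counts + pseudocount)
  log_counts - mean(log_counts)
}

# Hypothetical antibody-derived counts for a single cell.
counts <- c(CD3 = 120, CD4 = 0, CD8 = 35, CD19 = 2)
x_clr <- clr_transform(counts)

print(round(x_arcsinh, 3))
print(round(x_clr, 3))
```

Note that asinh is defined for negative inputs, and that the centered log ratio values of one cell sum to zero by construction.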
The last important piece needed to build a condor object is the
annotation table. The annotation table should contain all metadata
needed for the analysis, as well as a column containing the names of the
input files, and should be supplied as a .csv file. The name of the
column containing the file names should be given in filename_col. An
example metadata table is shown below.
read.csv("../.test_files/metadata.csv")
##    filename sample_ID group batch
## 1   ID1.fcs       ID1  ctrl  Day1
## 2   ID2.fcs       ID2   pat  Day1
## 3   ID3.fcs       ID3  ctrl  Day2
## 4   ID4.fcs       ID4   pat  Day2
## 5   ID5.fcs       ID5  ctrl  Day2
## 6   ID6.fcs       ID6   pat  Day2
## 7   ID7.fcs       ID7  ctrl  Day3
## 8   ID8.fcs       ID8   pat  Day3
## 9   ID9.fcs       ID9  ctrl  Day3
## 10 ID10.fcs      ID10   pat  Day3
Parameters that are not needed for the downstream analysis (e.g. Time)
can be listed in remove_param to exclude them. In the prep_fcd function
we also set a seed for reproducibility, since the subsetting to max_cell
is otherwise random.
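The effect of the seed and of removing parameters can be illustrated with a small base R sketch. This is not the internal code of prep_fcd; the toy expr table, the subsample helper and the seed value 91 are assumptions chosen for the example.

```r
# Toy expression table: 1000 cells, three markers plus a Time channel.
set.seed(1)
expr <- data.frame(CD3 = rnorm(1000), CD4 = rnorm(1000),
                   CD8 = rnorm(1000), Time = seq_len(1000))

remove_param <- c("Time")  # parameters to drop
max_cell <- 200            # number of cells to keep
seed <- 91                 # arbitrary value chosen for this example

# Hypothetical helper: drop unwanted columns, then draw a random
# subset of at most max_cell rows.
subsample <- function(data, max_cell, remove_param, seed) {
  data <- data[, !(colnames(data) %in% remove_param), drop = FALSE]
  # Fixing the seed makes the random draw repeatable.
  set.seed(seed)
  keep <- sample(nrow(data), min(max_cell, nrow(data)))
  data[keep, , drop = FALSE]
}

run1 <- subsample(expr, max_cell, remove_param, seed)
run2 <- subsample(expr, max_cell, remove_param, seed)

print(dim(run1))
print(identical(run1, run2))
```

With the same seed both runs select the same cells; with different seeds the selected cells would generally differ.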
Session Info
info <- sessionInfo()
info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cyCONDOR_0.2.1
##
## loaded via a namespace (and not attached):
## [1] IRanges_2.34.1 Rmisc_1.5.1
## [3] urlchecker_1.0.1 nnet_7.3-19
## [5] CytoNorm_2.0.1 TH.data_1.1-2
## [7] vctrs_0.6.4 digest_0.6.33
## [9] png_0.1-8 shape_1.4.6
## [11] proxy_0.4-27 slingshot_2.8.0
## [13] ggrepel_0.9.4 parallelly_1.36.0
## [15] MASS_7.3-60 pkgdown_2.0.7
## [17] reshape2_1.4.4 httpuv_1.6.12
## [19] foreach_1.5.2 BiocGenerics_0.46.0
## [21] withr_2.5.1 ggrastr_1.0.2
## [23] xfun_0.40 ggpubr_0.6.0
## [25] ellipsis_0.3.2 survival_3.5-7
## [27] memoise_2.0.1 hexbin_1.28.3
## [29] ggbeeswarm_0.7.2 RProtoBufLib_2.12.1
## [31] princurve_2.1.6 profvis_0.3.8
## [33] ggsci_3.0.0 systemfonts_1.0.5
## [35] ragg_1.2.6 zoo_1.8-12
## [37] GlobalOptions_0.1.2 DEoptimR_1.1-3
## [39] Formula_1.2-5 prettyunits_1.2.0
## [41] promises_1.2.1 scatterplot3d_0.3-44
## [43] rstatix_0.7.2 globals_0.16.2
## [45] ps_1.7.5 rstudioapi_0.15.0
## [47] miniUI_0.1.1.1 generics_0.1.3
## [49] ggcyto_1.28.1 base64enc_0.1-3
## [51] processx_3.8.2 curl_5.1.0
## [53] S4Vectors_0.38.2 zlibbioc_1.46.0
## [55] flowWorkspace_4.12.2 polyclip_1.10-6
## [57] randomForest_4.7-1.1 GenomeInfoDbData_1.2.10
## [59] RBGL_1.76.0 ncdfFlow_2.46.0
## [61] RcppEigen_0.3.3.9.4 xtable_1.8-4
## [63] stringr_1.5.0 desc_1.4.2
## [65] doParallel_1.0.17 evaluate_0.22
## [67] S4Arrays_1.0.6 hms_1.1.3
## [69] glmnet_4.1-8 GenomicRanges_1.52.1
## [71] irlba_2.3.5.1 colorspace_2.1-0
## [73] harmony_1.1.0 reticulate_1.34.0
## [75] readxl_1.4.3 magrittr_2.0.3
## [77] lmtest_0.9-40 readr_2.1.4
## [79] Rgraphviz_2.44.0 later_1.3.1
## [81] lattice_0.22-5 future.apply_1.11.0
## [83] robustbase_0.99-0 XML_3.99-0.15
## [85] cowplot_1.1.1 matrixStats_1.1.0
## [87] xts_0.13.1 class_7.3-22
## [89] Hmisc_5.1-1 pillar_1.9.0
## [91] nlme_3.1-163 iterators_1.0.14
## [93] compiler_4.3.1 RSpectra_0.16-1
## [95] stringi_1.7.12 gower_1.0.1
## [97] minqa_1.2.6 SummarizedExperiment_1.30.2
## [99] lubridate_1.9.3 devtools_2.4.5
## [101] CytoML_2.12.0 plyr_1.8.9
## [103] crayon_1.5.2 abind_1.4-5
## [105] locfit_1.5-9.8 sp_2.1-1
## [107] sandwich_3.0-2 pcaMethods_1.92.0
## [109] dplyr_1.1.3 codetools_0.2-19
## [111] multcomp_1.4-25 textshaping_0.3.7
## [113] recipes_1.0.8 openssl_2.1.1
## [115] Rphenograph_0.99.1 TTR_0.24.3
## [117] bslib_0.5.1 e1071_1.7-13
## [119] destiny_3.14.0 GetoptLong_1.0.5
## [121] ggplot.multistats_1.0.0 mime_0.12
## [123] splines_4.3.1 circlize_0.4.15
## [125] Rcpp_1.0.11 sparseMatrixStats_1.12.2
## [127] cellranger_1.1.0 knitr_1.44
## [129] utf8_1.2.4 clue_0.3-65
## [131] lme4_1.1-35.1 fs_1.6.3
## [133] listenv_0.9.0 checkmate_2.3.0
## [135] DelayedMatrixStats_1.22.6 pkgbuild_1.4.2
## [137] ggsignif_0.6.4 tibble_3.2.1
## [139] Matrix_1.6-1.1 rpart.plot_3.1.1
## [141] callr_3.7.3 tzdb_0.4.0
## [143] tweenr_2.0.2 pkgconfig_2.0.3
## [145] pheatmap_1.0.12 tools_4.3.1
## [147] cachem_1.0.8 smoother_1.1
## [149] fastmap_1.1.1 rmarkdown_2.25
## [151] scales_1.2.1 grid_4.3.1
## [153] usethis_2.2.2 broom_1.0.5
## [155] sass_0.4.7 graph_1.78.0
## [157] carData_3.0-5 RANN_2.6.1
## [159] rpart_4.1.21 farver_2.1.1
## [161] yaml_2.3.7 MatrixGenerics_1.12.3
## [163] foreign_0.8-85 ggthemes_4.2.4
## [165] cli_3.6.1 purrr_1.0.2
## [167] stats4_4.3.1 lifecycle_1.0.3
## [169] uwot_0.1.16 askpass_1.2.0
## [171] caret_6.0-94 Biobase_2.60.0
## [173] mvtnorm_1.2-3 lava_1.7.3
## [175] sessioninfo_1.2.2 backports_1.4.1
## [177] cytolib_2.12.1 timechange_0.2.0
## [179] gtable_0.3.4 rjson_0.2.21
## [181] umap_0.2.10.0 ggridges_0.5.4
## [183] parallel_4.3.1 pROC_1.18.5
## [185] limma_3.56.2 jsonlite_1.8.7
## [187] edgeR_3.42.4 RcppHNSW_0.5.0
## [189] bitops_1.0-7 ggplot2_3.4.4
## [191] Rtsne_0.16 FlowSOM_2.8.0
## [193] ranger_0.16.0 flowCore_2.12.2
## [195] jquerylib_0.1.4 timeDate_4022.108
## [197] shiny_1.7.5.1 ConsensusClusterPlus_1.64.0
## [199] htmltools_0.5.6.1 diffcyt_1.20.0
## [201] glue_1.6.2 XVector_0.40.0
## [203] VIM_6.2.2 RCurl_1.98-1.13
## [205] rprojroot_2.0.3 gridExtra_2.3
## [207] boot_1.3-28.1 TrajectoryUtils_1.8.0
## [209] igraph_1.5.1 R6_2.5.1
## [211] tidyr_1.3.0 SingleCellExperiment_1.22.0
## [213] vcd_1.4-11 cluster_2.1.4
## [215] pkgload_1.3.3 GenomeInfoDb_1.36.4
## [217] ipred_0.9-14 nloptr_2.0.3
## [219] DelayedArray_0.26.7 tidyselect_1.2.0
## [221] vipor_0.4.5 htmlTable_2.4.2
## [223] ggforce_0.4.1 CytoDx_1.20.0
## [225] car_3.1-2 future_1.33.0
## [227] ModelMetrics_1.2.2.2 munsell_0.5.0
## [229] laeken_0.5.2 data.table_1.14.8
## [231] htmlwidgets_1.6.2 ComplexHeatmap_2.16.0
## [233] RColorBrewer_1.1-3 rlang_1.1.1
## [235] remotes_2.4.2.1 colorRamps_2.3.1
## [237] ggnewscale_0.4.9 fansi_1.0.5
## [239] hardhat_1.3.0 beeswarm_0.4.0
## [241] prodlim_2023.08.28