Lesson 1 Installation and Environment
1.1 Learning outcomes
By the end of this lesson, you will be able to:
- Understand why environment setup matters for RNA-Seq analysis
- Use a centralized CDI script to install required R packages safely
- Verify that core packages load correctly
- Understand the CDI project folder structure
- Record session information for reproducibility
1.2 Why the environment matters
RNA-Seq analysis is sensitive not only to your data and design choices, but also to the computational environment. Different R versions, package versions, or system libraries can change results, trigger warnings, or break installs.
In CDI, environment setup is treated as part of responsible analysis:
- Install only what is missing
- Avoid forced upgrades
- Record what you used
- Keep installation logic centralized
1.3 What you need
You will need:
- R (≥ 4.2 recommended)
- A terminal (macOS/Linux; Windows via WSL is acceptable)
- A working internet connection for package installation
This guide does not rely on RStudio or IRkernel. Notebooks are executed using a Python kernel, with R code fenced as R Markdown chunks.
1.4 Verify your R installation
Confirm that R is available and record the version:
[1] "R version 4.4.1 (2024-06-14)"
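The line above is what base R prints for `R.version.string`. The check below compares the running version against the minimum recommended in 1.3. It is a minimal sketch: the 4.2.0 threshold simply mirrors that recommendation, and nothing here is CDI-specific.

```r
# Print the running R version, e.g. "R version 4.4.1 (2024-06-14)"
print(R.version.string)

# getRversion() returns a version object that compares correctly
# against a version string (so "4.10.0" > "4.2.0" as expected)
meets_minimum <- getRversion() >= "4.2.0"
if (!meets_minimum) {
  warning("R >= 4.2 is recommended; this session runs ", getRversion())
}
```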
1.5 Install required R packages (CDI pattern)
Rather than installing packages inline in every lesson, CDI uses a single, reusable setup script. This script installs packages only if they are missing, making it safe to re-run.
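This lesson does not reproduce the contents of scripts/setup-r-packages.R. The sketch below illustrates the install-only-if-missing pattern it describes. The helper names `missing_packages()` and `install_missing()` are hypothetical. The sketch also assumes that CRAN packages are installed with `install.packages()` and Bioconductor packages with `BiocManager::install()`. The real CDI script may be organised differently.

```r
# Hypothetical sketch of an install-only-if-missing helper (not the CDI script itself)

# Return the subset of `pkgs` that is not installed in any library on .libPaths();
# system.file() returns "" for packages it cannot find, and nothing is loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

install_missing <- function(cran = character(), bioc = character()) {
  todo_cran <- missing_packages(cran)
  if (length(todo_cran) > 0) {
    install.packages(todo_cran)
  }

  todo_bioc <- missing_packages(bioc)
  if (length(todo_bioc) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    # update = FALSE: do not upgrade packages that are already installed
    BiocManager::install(todo_bioc, update = FALSE, ask = FALSE)
  }

  # Return what was (attempted to be) installed, without printing it
  invisible(c(todo_cran, todo_bioc))
}
```

For example, `install_missing(cran = c("tidyverse", "pheatmap"), bioc = c("DESeq2", "tximport"))` would install only the packages in those lists that are absent. Re-running it installs nothing once everything is present. Passing `update = FALSE` keeps BiocManager from upgrading existing packages, which matches the "avoid forced upgrades" principle in 1.2.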
The setup script lives at:
scripts/setup-r-packages.R
Run the setup script once to make sure all required packages are available. Because it skips packages that are already installed, re-running it later is safe.
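Assuming the working directory is the project root (check with `getwd()`), the script can be run with base R's `source()`. The existence check is an addition for illustration. It gives a clearer message when the relative path does not resolve.

```r
setup_script <- file.path("scripts", "setup-r-packages.R")

if (file.exists(setup_script)) {
  # Runs the centralized setup; packages already installed are skipped
  source(setup_script)
} else {
  warning("Setup script not found at ", setup_script,
          "; check the working directory with getwd()")
}
```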
1.6 Verify package loading
After running the setup script, confirm that the core packages (for example DESeq2, tximport, pheatmap, and the tidyverse) load without errors.
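The sketch below tries each package and reports any failures together instead of stopping at the first error. The package list is taken from the attached packages in the session information in 1.8. It is an assumption about which packages count as "core". `check_packages()` is a hypothetical helper name.

```r
core_pkgs <- c("tidyverse", "DESeq2", "tximport", "SummarizedExperiment", "pheatmap")

# Attach each package; return a named logical vector (TRUE = loaded successfully).
# Warnings and startup messages are suppressed so the summary stays readable.
check_packages <- function(pkgs) {
  vapply(pkgs, function(p) {
    suppressWarnings(suppressPackageStartupMessages(
      require(p, character.only = TRUE, quietly = TRUE)
    ))
  }, logical(1))
}

status <- check_packages(core_pkgs)
print(status)

if (!all(status)) {
  warning("Could not load: ", paste(names(status)[!status], collapse = ", "),
          ". Re-run the setup script.")
}
```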
1.7 CDI project structure
CDI projects follow a simple, explicit folder layout to keep inputs, outputs, and figures clearly separated.
rnaseq-project/
├─ data/ # demo and input data
├─ results/ # analysis tables (QC summaries, DE results)
├─ figures/ # saved figures (auto-managed via CDI visualization tools)
├─ notebooks/ # lesson notebooks (.ipynb)
└─ scripts/ # reusable setup and helper scripts
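If you are starting a new project rather than working in this repository, the following sketch creates the same top-level folders. `create_cdi_layout()` is a hypothetical helper written for this example, not part of the CDI scripts.

```r
create_cdi_layout <- function(root = ".") {
  dirs <- file.path(root, c("data", "results", "figures", "notebooks", "scripts"))
  # recursive = TRUE also creates `root`; showWarnings = FALSE makes re-runs quiet
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}
```

Because existing folders are left untouched, calling `create_cdi_layout("rnaseq-project")` repeatedly is harmless.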
This repository already includes demo inputs in data/ that will be reused throughout the guide:
- data/demo_counts.csv
- data/demo_metadata.csv
- data/rlog_matrix.csv
- data/deseq2_results.csv
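Before moving on, you can confirm that these files are present. This sketch only checks that the paths exist relative to the project root. It assumes nothing about the files' columns or contents.

```r
demo_files <- file.path("data", c(
  "demo_counts.csv", "demo_metadata.csv",
  "rlog_matrix.csv", "deseq2_results.csv"
))

# file.exists() is vectorised, so all four paths are checked at once
present <- file.exists(demo_files)
names(present) <- demo_files
print(present)

if (!all(present)) {
  warning("Missing demo inputs: ", paste(demo_files[!present], collapse = ", "))
}
```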
1.8 Record session information
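To support reproducibility and debugging, record session information at key points with sessionInfo(). It reports the R version, platform, locale, and the versions of attached and loaded packages. Example output: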
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Africa/Dar_es_Salaam
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] pheatmap_1.0.13 tximport_1.34.0
[3] DESeq2_1.46.0 SummarizedExperiment_1.36.0
[5] Biobase_2.66.0 GenomicRanges_1.58.0
[7] GenomeInfoDb_1.42.3 IRanges_2.40.1
[9] S4Vectors_0.44.0 BiocGenerics_0.52.0
[11] MatrixGenerics_1.18.1 matrixStats_1.5.0
[13] lubridate_1.9.4 forcats_1.0.1
[15] stringr_1.6.0 dplyr_1.1.4
[17] purrr_1.2.1 readr_2.1.6
[19] tidyr_1.3.2 tibble_3.3.1
[21] ggplot2_4.0.1 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 xfun_0.56 bslib_0.9.0
[4] lattice_0.22-7 tzdb_0.5.0 vctrs_0.7.0
[7] tools_4.4.1 generics_0.1.4 parallel_4.4.1
[10] pkgconfig_2.0.3 Matrix_1.7-4 RColorBrewer_1.1-3
[13] S7_0.2.1 lifecycle_1.0.5 GenomeInfoDbData_1.2.13
[16] compiler_4.4.1 farver_2.1.2 codetools_0.2-20
[19] htmltools_0.5.9 sass_0.4.10 yaml_2.3.12
[22] pillar_1.11.1 crayon_1.5.3 jquerylib_0.1.4
[25] BiocParallel_1.40.2 DelayedArray_0.32.0 cachem_1.1.0
[28] abind_1.4-8 tidyselect_1.2.1 locfit_1.5-9.12
[31] digest_0.6.39 stringi_1.8.7 bookdown_0.46
[34] fastmap_1.2.0 grid_4.4.1 colorspace_2.1-2
[37] cli_3.6.5 SparseArray_1.6.2 magrittr_2.0.4
[40] S4Arrays_1.6.0 withr_3.0.2 scales_1.4.0
[43] UCSC.utils_1.2.0 timechange_0.3.0 rmarkdown_2.30
[46] XVector_0.46.0 httr_1.4.7 otel_0.2.0
[49] hms_1.1.4 evaluate_1.0.5 knitr_1.51
[52] rlang_1.1.7 Rcpp_1.1.1 glue_1.8.0
[55] BiocManager_1.30.27 renv_1.1.6 rstudioapi_0.18.0
[58] jsonlite_2.0.0 R6_2.6.1 zlibbioc_1.52.0
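Printing sessionInfo() in the notebook is the simplest record. To keep a copy alongside your results, you can also write it to a text file. `save_session_info()` and its default path results/session_info.txt are hypothetical choices for this sketch, not a CDI convention.

```r
save_session_info <- function(path = file.path("results", "session_info.txt")) {
  # Create the target folder if needed (silently, if it already exists)
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  # capture.output() collects the printed sessionInfo() as character lines
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}
```

Calling `save_session_info()` from the project root writes the file into results/. Calling it again after installing or updating packages keeps the record current.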
1.9 Common issues and fixes
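Package installation fails
Ensure R is up to date and that you have write access to the library R installs into (by default, the first entry of .libPaths()).
Bioconductor version warnings
These usually indicate an outdated R version. Each Bioconductor release supports one specific R version, so an older R cannot use the current release. BiocManager::valid() lists installed packages that are out of date or too new for your Bioconductor version.
Repeated install prompts
The CDI setup script avoids reinstalling packages unless necessary, so re-run it rather than installing packages by hand.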
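When an install fails, checking where R installs packages and which Bioconductor release is active often narrows down the cause. The diagnostic sketch below uses base R, plus BiocManager only if it is already installed.

```r
# install.packages() writes to the first library on the search path by default
lib <- .libPaths()[1]

# file.access() returns 0 when access is granted; mode = 2 tests write permission
can_write <- unname(file.access(lib, mode = 2) == 0)
cat("Install library:", lib, "\nWritable:", can_write, "\n")

# Report the Bioconductor release in use, if BiocManager is available
if (requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::version())
}
```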
1.10 Takeaway
Centralizing package installation makes RNA-Seq projects easier to maintain, reproduce, and share. Once your environment is set up, you are ready to focus on study design and metadata, not tooling issues.
Proceed to Lesson 02: RNA-Seq Study Design and Metadata