Lesson 1 Installation and Environment

CDI goal: set up a clean, reproducible environment for Applied RNA-Seq analysis and verify that core tools work end-to-end.

1.1 Learning outcomes

By the end of this lesson, you will be able to:

Understand why environment setup matters for RNA-Seq analysis
Use a centralized CDI script to install required R packages safely
Verify that core packages load correctly
Understand the CDI project folder structure
Record session information for reproducibility

1.2 Why the environment matters

RNA-Seq analysis is sensitive not only to your data and design choices, but also to the computational environment. Different R versions, package versions, or system libraries can change results, trigger warnings, or break installs.

In CDI, environment setup is treated as part of responsible analysis:

Install only what is missing
Avoid forced upgrades
Record what you used
Keep installation logic centralized

1.3 What you need

You will need:

R (≥ 4.2 recommended)
A terminal (macOS/Linux; Windows via WSL is acceptable)
A working internet connection for package installation

This guide does not rely on RStudio or IRkernel. Notebooks run on a Python kernel, and the R code appears in fenced R Markdown chunks.

1.4 Verify your R installation

Confirm that R is available and record the version:

R.version.string

[1] "R version 4.4.1 (2024-06-14)"
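The recommended minimum from section 1.3 can also be checked programmatically. The following is a minimal sketch, assuming 4.2.0 is the threshold you want to enforce; the name min_r_version is illustrative and not part of any CDI script.

```r
# Minimum R version recommended in section 1.3 (adjust for your project)
min_r_version <- "4.2.0"

# getRversion() returns a version object that can be compared to a string
r_ok <- getRversion() >= min_r_version

if (r_ok) {
  message("R ", as.character(getRversion()),
          " meets the recommended minimum (", min_r_version, ").")
} else {
  warning("R ", as.character(getRversion()),
          " is older than the recommended ", min_r_version, ".")
}
```

Comparing version objects rather than strings avoids mistakes such as treating "4.10" as older than "4.2".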

1.5 Install required R packages (CDI pattern)

Rather than installing packages inline in every lesson, CDI uses a single, reusable setup script. This script installs packages only if they are missing, making it safe to re-run.

The setup script lives at:

scripts/setup-r-packages.R

Run the following once to ensure all required packages are available:

source("scripts/setup-r-packages.R")
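The contents of the setup script are not reproduced in this lesson. As an illustration of the install-only-what-is-missing pattern, here is a minimal sketch. It assumes the package list matches the core packages used in this guide and that installs go through BiocManager; the names required_pkgs, find_missing, and install_missing are hypothetical, and the real CDI script may differ.

```r
# Hypothetical package list; the actual CDI script may use a different one
required_pkgs <- c("tidyverse", "SummarizedExperiment", "DESeq2",
                   "tximport", "pheatmap")

# Return the packages in `pkgs` that are not installed in any library path
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Install only the missing packages, leaving installed ones untouched
install_missing <- function(pkgs) {
  to_install <- find_missing(pkgs)
  if (length(to_install) == 0) {
    message("All required packages are already installed.")
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    utils::install.packages("BiocManager")
  }
  # update = FALSE avoids forced upgrades; ask = FALSE avoids prompts
  BiocManager::install(to_install, update = FALSE, ask = FALSE)
  invisible(to_install)
}

# Report what is missing without installing anything
print(find_missing(required_pkgs))
```

Calling install_missing(required_pkgs) would perform the install step. Because it first filters out installed packages, running it a second time does nothing.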

1.6 Verify package loading

After running the setup script, confirm that the core packages load without errors:

library(tidyverse)
library(SummarizedExperiment)
library(DESeq2)
library(tximport)
library(pheatmap)
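If one of these calls fails, it helps to see which packages are available and at what version. The sketch below is one way to summarize that; check_packages is a hypothetical helper, not a CDI function.

```r
# Core packages loaded in this lesson
core_pkgs <- c("tidyverse", "SummarizedExperiment", "DESeq2",
               "tximport", "pheatmap")

# Build a small table: package name, whether it loads, and its version
check_packages <- function(pkgs) {
  version <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package is missing or its namespace fails to load
    }
  }, character(1), USE.NAMES = FALSE)

  data.frame(package = pkgs,
             available = !is.na(version),
             version = version,
             stringsAsFactors = FALSE)
}

status <- check_packages(core_pkgs)
print(status)
```

Any row with available set to FALSE points to a package that should be installed through the setup script.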

1.7 CDI project structure

CDI projects follow a simple, explicit folder layout to keep inputs, outputs, and figures clearly separated.

rnaseq-project/
├─ data/            # demo and input data
├─ results/         # analysis tables (QC summaries, DE results)
├─ figures/         # saved figures (auto-managed via CDI visualization tools)
├─ notebooks/       # lesson notebooks (.ipynb)
└─ scripts/         # reusable setup and helper scripts

This repository already includes demo inputs in data/ that will be reused throughout the guide:

data/demo_counts.csv
data/demo_metadata.csv
data/rlog_matrix.csv
data/deseq2_results.csv
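A quick check can confirm that the layout and demo files are in place before starting the later lessons. This is a minimal sketch, assuming your working directory is the project root; check_layout is a hypothetical helper name.

```r
# Folders and demo files listed in this section
expected_dirs <- c("data", "results", "figures", "notebooks", "scripts")
expected_files <- file.path("data", c("demo_counts.csv",
                                      "demo_metadata.csv",
                                      "rlog_matrix.csv",
                                      "deseq2_results.csv"))

# Return the expected folders and files that are absent under `root`
check_layout <- function(root = ".") {
  list(
    missing_dirs  = expected_dirs[!dir.exists(file.path(root, expected_dirs))],
    missing_files = expected_files[!file.exists(file.path(root, expected_files))]
  )
}

layout <- check_layout(".")
if (length(layout$missing_dirs) + length(layout$missing_files) == 0) {
  message("Project layout looks complete.")
} else {
  message("Missing: ",
          paste(c(layout$missing_dirs, layout$missing_files), collapse = ", "))
}
```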

1.8 Record session information

To support reproducibility and debugging, record session information at key points:

sessionInfo()

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Africa/Dar_es_Salaam
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] pheatmap_1.0.13             tximport_1.34.0            
 [3] DESeq2_1.46.0               SummarizedExperiment_1.36.0
 [5] Biobase_2.66.0              GenomicRanges_1.58.0       
 [7] GenomeInfoDb_1.42.3         IRanges_2.40.1             
 [9] S4Vectors_0.44.0            BiocGenerics_0.52.0        
[11] MatrixGenerics_1.18.1       matrixStats_1.5.0          
[13] lubridate_1.9.4             forcats_1.0.1              
[15] stringr_1.6.0               dplyr_1.1.4                
[17] purrr_1.2.1                 readr_2.1.6                
[19] tidyr_1.3.2                 tibble_3.3.1               
[21] ggplot2_4.0.1               tidyverse_2.0.0            

loaded via a namespace (and not attached):
 [1] gtable_0.3.6            xfun_0.56               bslib_0.9.0            
 [4] lattice_0.22-7          tzdb_0.5.0              vctrs_0.7.0            
 [7] tools_4.4.1             generics_0.1.4          parallel_4.4.1         
[10] pkgconfig_2.0.3         Matrix_1.7-4            RColorBrewer_1.1-3     
[13] S7_0.2.1                lifecycle_1.0.5         GenomeInfoDbData_1.2.13
[16] compiler_4.4.1          farver_2.1.2            codetools_0.2-20       
[19] htmltools_0.5.9         sass_0.4.10             yaml_2.3.12            
[22] pillar_1.11.1           crayon_1.5.3            jquerylib_0.1.4        
[25] BiocParallel_1.40.2     DelayedArray_0.32.0     cachem_1.1.0           
[28] abind_1.4-8             tidyselect_1.2.1        locfit_1.5-9.12        
[31] digest_0.6.39           stringi_1.8.7           bookdown_0.46          
[34] fastmap_1.2.0           grid_4.4.1              colorspace_2.1-2       
[37] cli_3.6.5               SparseArray_1.6.2       magrittr_2.0.4         
[40] S4Arrays_1.6.0          withr_3.0.2             scales_1.4.0           
[43] UCSC.utils_1.2.0        timechange_0.3.0        rmarkdown_2.30         
[46] XVector_0.46.0          httr_1.4.7              otel_0.2.0             
[49] hms_1.1.4               evaluate_1.0.5          knitr_1.51             
[52] rlang_1.1.7             Rcpp_1.1.1              glue_1.8.0             
[55] BiocManager_1.30.27     renv_1.1.6              rstudioapi_0.18.0      
[58] jsonlite_2.0.0          R6_2.6.1                zlibbioc_1.52.0
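Printed output is easy to lose, so it can be useful to write the session information to a file alongside your results. The following is a minimal sketch; the helper save_session_info and the file name under results/ are assumptions, not CDI conventions.

```r
# Write the output of sessionInfo() to a text file, creating the folder if needed
save_session_info <- function(path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(utils::capture.output(utils::sessionInfo()), path)
  invisible(path)
}

# Hypothetical file name; a dated name keeps one record per day
out_file <- file.path("results", paste0("session-info-", Sys.Date(), ".txt"))
save_session_info(out_file)
message("Session information written to ", out_file)
```

Keeping dated files makes it possible to compare environments when a result changes between runs.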

1.9 Common issues and fixes

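Package installation fails: make sure R is up to date and that you have write access to the project library.
Bioconductor version warnings: these usually mean your R version is too old for the Bioconductor release in use, since each Bioconductor release is tied to a specific R version.
Repeated install prompts: the CDI setup script installs only missing packages, so re-running it should not trigger reinstalls.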
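When an install fails, a few quick diagnostics can narrow down the cause. This is a minimal sketch, assuming the first entry of .libPaths() is the library that installs are written to; the variable names are illustrative.

```r
# The first library path is where new packages are installed by default
lib_path <- .libPaths()[1]

# file.access() returns 0 when the requested access (mode 2 = write) is allowed
lib_writable <- unname(file.access(lib_path, mode = 2) == 0)

message("Library path: ", lib_path)
message("Writable: ", lib_writable)
message("R version: ", as.character(getRversion()))

# Report the Bioconductor release only if BiocManager is installed
if (requireNamespace("BiocManager", quietly = TRUE)) {
  bioc <- tryCatch(as.character(BiocManager::version()),
                   error = function(e) NA_character_)
  message("Bioconductor release: ", bioc)
}
```

If the library is not writable, installation will fail regardless of which packages are requested, so this is worth checking first.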

1.10 Takeaway

Centralizing package installation makes RNA-Seq projects easier to maintain, reproduce, and share. Once your environment is set up, you are ready to focus on study design and metadata, not tooling issues.

Proceed to Lesson 02: RNA-Seq Study Design and Metadata