Lesson 1 Installation and Environment

CDI goal: set up a clean, reproducible environment for Applied RNA-Seq analysis and verify that core tools work end-to-end.

1.1 Learning outcomes

By the end of this lesson, you will be able to:

  • Understand why environment setup matters for RNA-Seq analysis
  • Use a centralized CDI script to install required R packages safely
  • Verify that core packages load correctly
  • Understand the CDI project folder structure
  • Record session information for reproducibility

1.2 Why the environment matters

RNA-Seq analysis is sensitive not only to your data and design choices, but also to the computational environment. Different R versions, package versions, or system libraries can change results, trigger warnings, or break installs.

In CDI, environment setup is treated as part of responsible analysis:

  • Install only what is missing
  • Avoid forced upgrades
  • Record what you used
  • Keep installation logic centralized

1.3 What you need

You will need:

  • R (≥ 4.2 recommended)
  • A terminal (macOS/Linux; Windows via WSL is acceptable)
  • A working internet connection for package installation

This guide does not rely on RStudio or IRkernel. Notebooks are executed using a Python kernel, with R code fenced as R Markdown chunks.

1.4 Verify your R installation

Confirm that R is available and record the version:

R.version.string
[1] "R version 4.4.1 (2024-06-14)"
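
A quick programmatic check can complement reading the version string by eye. The sketch below compares the running R version against the 4.2 recommendation from section 1.3. Treating that recommendation as a floor worth a warning is an assumption made for illustration, and the variable names are hypothetical.

```r
# Recommended minimum from section 1.3 (assumption: we warn, not stop, below it)
min_r_version <- "4.2.0"

# getRversion() returns a version object that compares correctly with strings
r_ok <- getRversion() >= min_r_version

if (r_ok) {
  message("R ", as.character(getRversion()),
          " meets the recommended minimum (", min_r_version, ").")
} else {
  warning("R ", as.character(getRversion()),
          " is older than the recommended ", min_r_version, ".")
}
```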

1.5 Install required R packages (CDI pattern)

Rather than installing packages inline in every lesson, CDI uses a single, reusable setup script. This script installs packages only if they are missing, making it safe to re-run.

The setup script lives at:

scripts/setup-r-packages.R

Run the following to ensure all required packages are available. The path is relative, so run it with the project root as your working directory:

source("scripts/setup-r-packages.R")
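
The contents of scripts/setup-r-packages.R are not reproduced in this lesson. As a rough illustration of the "install only what is missing" pattern, a script of this kind might look like the sketch below. The package lists, the function names find_missing() and install_missing(), and the dry_run flag are all hypothetical and may differ from the actual CDI script.

```r
# Hypothetical package lists, based on the packages loaded in section 1.6
cran_pkgs <- c("tidyverse", "pheatmap")
bioc_pkgs <- c("SummarizedExperiment", "DESeq2", "tximport")

# Return the subset of `pkgs` that is not installed.
# system.file() returns "" for a missing package and does not load anything.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Install only missing packages. With dry_run = TRUE nothing is installed;
# the function just reports what it would do.
install_missing <- function(cran = character(), bioc = character(),
                            dry_run = TRUE) {
  missing_cran <- find_missing(cran)
  missing_bioc <- find_missing(bioc)
  plan <- list(cran = missing_cran, bioc = missing_bioc)

  if (dry_run) {
    message("Missing CRAN packages: ", paste(missing_cran, collapse = ", "))
    message("Missing Bioconductor packages: ",
            paste(missing_bioc, collapse = ", "))
    return(invisible(plan))
  }

  if (length(missing_cran) > 0) {
    install.packages(missing_cran)
  }
  if (length(missing_bioc) > 0) {
    if (!nzchar(system.file(package = "BiocManager"))) {
      install.packages("BiocManager")
    }
    # update = FALSE avoids forced upgrades of packages already installed
    BiocManager::install(missing_bioc, update = FALSE, ask = FALSE)
  }
  invisible(plan)
}

# Report what is missing without changing the library
plan <- install_missing(cran_pkgs, bioc_pkgs, dry_run = TRUE)
```

Because the check runs before every install call, re-running a script like this on a complete environment does nothing, which is what makes the pattern safe to repeat.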

1.6 Verify package loading

After running the setup script, confirm that the core packages load without errors:

library(tidyverse)
library(SummarizedExperiment)
library(DESeq2)
library(tximport)
library(pheatmap)
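
If one of the library() calls fails, the remaining calls never run in a script, so it can help to test every package in one pass. The following is a minimal sketch using requireNamespace(), which returns FALSE instead of stopping when a package cannot be loaded. The variable names are illustrative.

```r
# Core packages from this section
core_pkgs <- c("tidyverse", "SummarizedExperiment", "DESeq2",
               "tximport", "pheatmap")

# TRUE if the package namespace loads, FALSE otherwise (no error is raised)
loads_ok <- vapply(
  core_pkgs,
  function(p) suppressWarnings(requireNamespace(p, quietly = TRUE)),
  logical(1)
)

print(loads_ok)

if (!all(loads_ok)) {
  message("Packages that failed to load: ",
          paste(core_pkgs[!loads_ok], collapse = ", "))
}
```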

1.7 CDI project structure

CDI projects follow a simple, explicit folder layout to keep inputs, outputs, and figures clearly separated.

rnaseq-project/
├─ data/            # demo and input data
├─ results/         # analysis tables (QC summaries, DE results)
├─ figures/         # saved figures (auto-managed via CDI visualization tools)
├─ notebooks/       # lesson notebooks (.ipynb)
└─ scripts/         # reusable setup and helper scripts

This repository already includes demo inputs in data/ that will be reused throughout the guide:

  • data/demo_counts.csv
  • data/demo_metadata.csv
  • data/rlog_matrix.csv
  • data/deseq2_results.csv
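
Before starting the later lessons, you may want to confirm that the layout and demo files are in place. The sketch below checks for the folders and files listed above. The helper name check_project() is hypothetical, and it assumes you pass the project root (by default the current working directory).

```r
# Folders and demo files named in this section
expected_dirs  <- c("data", "results", "figures", "notebooks", "scripts")
expected_files <- file.path(
  "data",
  c("demo_counts.csv", "demo_metadata.csv",
    "rlog_matrix.csv", "deseq2_results.csv")
)

# Return a data frame with one row per expected path and whether it exists
check_project <- function(root = ".") {
  data.frame(
    path   = c(expected_dirs, expected_files),
    exists = c(dir.exists(file.path(root, expected_dirs)),
               file.exists(file.path(root, expected_files))),
    stringsAsFactors = FALSE
  )
}

status <- check_project(".")
print(status)
```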

1.8 Record session information

To support reproducibility and debugging, record session information at key points:

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Africa/Dar_es_Salaam
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] pheatmap_1.0.13             tximport_1.34.0            
 [3] DESeq2_1.46.0               SummarizedExperiment_1.36.0
 [5] Biobase_2.66.0              GenomicRanges_1.58.0       
 [7] GenomeInfoDb_1.42.3         IRanges_2.40.1             
 [9] S4Vectors_0.44.0            BiocGenerics_0.52.0        
[11] MatrixGenerics_1.18.1       matrixStats_1.5.0          
[13] lubridate_1.9.4             forcats_1.0.1              
[15] stringr_1.6.0               dplyr_1.1.4                
[17] purrr_1.2.1                 readr_2.1.6                
[19] tidyr_1.3.2                 tibble_3.3.1               
[21] ggplot2_4.0.1               tidyverse_2.0.0            

loaded via a namespace (and not attached):
 [1] gtable_0.3.6            xfun_0.56               bslib_0.9.0            
 [4] lattice_0.22-7          tzdb_0.5.0              vctrs_0.7.0            
 [7] tools_4.4.1             generics_0.1.4          parallel_4.4.1         
[10] pkgconfig_2.0.3         Matrix_1.7-4            RColorBrewer_1.1-3     
[13] S7_0.2.1                lifecycle_1.0.5         GenomeInfoDbData_1.2.13
[16] compiler_4.4.1          farver_2.1.2            codetools_0.2-20       
[19] htmltools_0.5.9         sass_0.4.10             yaml_2.3.12            
[22] pillar_1.11.1           crayon_1.5.3            jquerylib_0.1.4        
[25] BiocParallel_1.40.2     DelayedArray_0.32.0     cachem_1.1.0           
[28] abind_1.4-8             tidyselect_1.2.1        locfit_1.5-9.12        
[31] digest_0.6.39           stringi_1.8.7           bookdown_0.46          
[34] fastmap_1.2.0           grid_4.4.1              colorspace_2.1-2       
[37] cli_3.6.5               SparseArray_1.6.2       magrittr_2.0.4         
[40] S4Arrays_1.6.0          withr_3.0.2             scales_1.4.0           
[43] UCSC.utils_1.2.0        timechange_0.3.0        rmarkdown_2.30         
[46] XVector_0.46.0          httr_1.4.7              otel_0.2.0             
[49] hms_1.1.4               evaluate_1.0.5          knitr_1.51             
[52] rlang_1.1.7             Rcpp_1.1.1              glue_1.8.0             
[55] BiocManager_1.30.27     renv_1.1.6              rstudioapi_0.18.0      
[58] jsonlite_2.0.0          R6_2.6.1                zlibbioc_1.52.0        
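
Printed output is easy to lose, so it can be useful to write the session information to a file alongside your results. Here is one possible approach. The function name save_session_info(), the results/ destination, and the dated file name are assumptions for illustration, not part of the CDI tooling.

```r
# Write sessionInfo() to a dated text file and return the file path invisibly
save_session_info <- function(out_dir = "results", prefix = "session-info") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(
    out_dir,
    paste0(prefix, "-", format(Sys.Date(), "%Y-%m-%d"), ".txt")
  )
  # capture.output() collects the printed report as a character vector
  writeLines(capture.output(sessionInfo()), out_file)
  invisible(out_file)
}
```

Calling save_session_info() at the end of a notebook would then leave a dated record in results/ that can be compared across machines or runs.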

1.9 Common issues and fixes

  • Package installation fails
    Ensure R is up to date and that you have write access to the project library.

  • Bioconductor version warnings
    These usually indicate a mismatch between your R version and the Bioconductor release in use, most often because R is outdated.

  • Repeated install prompts
    The CDI setup script installs only missing packages, so run it instead of calling installers inline in each lesson.
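
When troubleshooting the first two issues, a few diagnostic calls can narrow down the cause. The sketch below lists the library paths R is using and whether each is writable, then reports the R and Bioconductor versions if BiocManager is available. It is a starting point under those assumptions, not an exhaustive check.

```r
# Library locations R installs into, and whether the current user can write there
lib_paths <- .libPaths()
writable  <- file.access(lib_paths, mode = 2) == 0  # 0 means access is granted
print(data.frame(library = lib_paths, writable = writable))

# Report the R / Bioconductor pairing when BiocManager is installed
if (nzchar(system.file(package = "BiocManager"))) {
  bioc_version <- tryCatch(
    as.character(BiocManager::version()),
    error = function(e) NA_character_
  )
  message("R ", as.character(getRversion()),
          " with Bioconductor ", bioc_version)
} else {
  message("BiocManager is not installed; run the setup script first.")
}
```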

1.10 Takeaway

Centralizing package installation makes RNA-Seq projects easier to maintain, reproduce, and share. Once your environment is set up, you are ready to focus on study design and metadata, not tooling issues.