Preface and Setup

  • ID: RNASEQ-L01
  • Type: Lesson
  • Audience: Public
  • Theme: Preface and reproducible setup

Why this guide exists

RNA-seq results are easy to generate.

Interpreting them responsibly is harder.

This guide focuses on bulk RNA-seq analysis and builds interpretive clarity before introducing advanced modeling techniques.

The goal is not just to run commands.
The goal is to understand why each step exists and how decisions propagate through the workflow.


What this guide emphasizes

In this guide, you will learn how to:

  • Evaluate experimental design and metadata integrity
  • Inspect and validate a count matrix
  • Understand the mean–variance structure of RNA-seq data
  • Apply simple normalization for exploratory analysis
  • Use PCA and clustering to assess global structure
  • Separate statistical output from biological claims

The emphasis is on reasoning, workflow discipline, and calibrated interpretation.

This guide does not replace formal count-based differential expression modeling.


Reproducibility philosophy

We follow simple reproducibility principles:

  • Inputs remain unchanged
  • Transformations are performed in code
  • Outputs are traceable back to inputs
  • Software versions are recorded (for example, with sessionInfo())

You are encouraged to rerun, modify parameters, and compare outcomes.
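One way to keep outputs traceable to unchanged inputs is to log a checksum of every input file next to the results. The sketch below is an assumption about how you might do this, not part of this repository: the helper name `record_inputs()` and the log layout are hypothetical, and it relies only on base R's `tools::md5sum()` and `utils::write.csv()`.

```r
# Hypothetical helper: log an MD5 checksum for each input file so any output
# can be traced back to the exact input files that produced it.
record_inputs <- function(paths, log_file) {
  stopifnot(all(file.exists(paths)))  # inputs are only read, never modified
  sums <- tools::md5sum(paths)        # named character vector, one hash per file
  log <- data.frame(
    file     = paths,
    md5      = unname(sums),
    recorded = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    stringsAsFactors = FALSE
  )
  utils::write.csv(log, log_file, row.names = FALSE)
  invisible(log)
}
```

For example, after generating the demo data, `record_inputs(c("data/demo-counts.csv", "data/demo-metadata.csv"), "input-checksums.csv")` writes one row per input. If a checksum differs on a later run, an input file was changed.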


Setup

What you need

  • R (4.2 or newer recommended)
  • Internet access for package installation
  • Any editor you prefer

This guide is R-centric. All analysis and visualization are performed in R.


Verify R

R.version.string
[1] "R version 4.4.1 (2024-06-14)"
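The guide recommends R 4.2 or newer. Instead of reading the version string by eye, a script can check it programmatically. This is a small sketch using base R's `getRversion()`; the variable name `min_r` and the message wording are illustrative.

```r
# Minimum version recommended by this guide.
min_r <- "4.2.0"

# getRversion() returns a numeric_version, so the comparison is numeric
# (4.10 counts as newer than 4.2), not alphabetical.
if (getRversion() < min_r) {
  warning("This guide recommends R >= ", min_r,
          "; you are running R ", as.character(getRversion()), ".")
} else {
  message("R ", as.character(getRversion()),
          " meets the recommended minimum (", min_r, ").")
}
```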

Global CDI Plot Theme

This guide uses a global plotting theme so visuals stay consistent across domains.

The theme lives here:

  • scripts/R/cdi-plot-theme.R

You will source it at the top of lessons that generate plots.

source("scripts/R/cdi-plot-theme.R")
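The relative path passed to `source()` is resolved against the current working directory, so the call fails with a generic "cannot open file" error when R was not started from the project root. A minimal guard, assuming you want a clearer message in that case; `source_project_file()` is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper: source a file given its path relative to the project
# root, failing with an explicit message instead of a bare "cannot open file".
source_project_file <- function(rel_path, root = getwd()) {
  path <- file.path(root, rel_path)
  if (!file.exists(path)) {
    stop("Cannot find '", rel_path, "' under '", root,
         "'. Is the working directory set to the project root?", call. = FALSE)
  }
  source(path)  # evaluates in the global environment, like a plain source() call
  invisible(normalizePath(path))
}

# Usage at the top of a plotting lesson:
# source_project_file("scripts/R/cdi-plot-theme.R")
```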

Install core packages

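pkgs <- c(
  "ggplot2",  # plotting
  "dplyr",    # data manipulation
  "tidyr",    # reshaping between wide and long formats
  "readr",    # reading CSV files
  "tibble"    # modern data frames
)

# Install only the packages that are not already present.
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Attach every package; character.only = TRUE lets library() take a string.
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}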
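Once the packages are installed, their exact versions can be captured in a small table and saved with a run's outputs. A sketch, assuming a per-package table is useful to you; `pkg_versions()` is a hypothetical helper built on `utils::packageVersion()`.

```r
# Hypothetical helper: tabulate the installed version of each package so the
# versions used for a run can be saved next to its outputs.
pkg_versions <- function(pkgs) {
  data.frame(
    package = pkgs,
    version = vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
                     character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage with the core packages installed above:
# pkg_versions(pkgs)
```

Calling `pkg_versions(pkgs)` returns one row per core package; writing that table out with `write.csv()` is one lightweight way to keep a record of software versions without a full environment manager.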

Optional: renv

renv is optional. If you use it, activating it for this project records the exact package versions in a lockfile, which gives stricter reproducibility than the install step above.
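A typical renv cycle has three steps: initialize once, snapshot after package changes, and restore on another machine. In practice you call `renv::init()`, `renv::snapshot()`, and `renv::restore()` directly; the hypothetical `renv_step()` wrapper below only makes each step and its purpose explicit. It assumes the renv package is installed.

```r
# Hypothetical helper that names the three steps of a typical renv workflow.
# Assumes renv is installed: install.packages("renv")
renv_step <- function(step = c("init", "snapshot", "restore")) {
  step <- match.arg(step)  # rejects anything other than the three steps
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages(\"renv\") first.", call. = FALSE)
  }
  switch(step,
    init     = renv::init(),     # once per project: project library + renv.lock
    snapshot = renv::snapshot(), # after adding or updating packages: record versions
    restore  = renv::restore()   # on a new machine or fresh clone: reinstall recorded versions
  )
}
```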


Demo dataset used throughout

This repository includes a small synthetic dataset reused across lessons.

Generate it from the project root:

Rscript scripts/R/generate-demo-data.R

Files created in data/:

  • demo-counts.csv
  • demo-metadata.csv
  • demo-truth.csv

Verify the demo dataset loads

counts <- readr::read_csv("data/demo-counts.csv", show_col_types = FALSE)
meta   <- readr::read_csv("data/demo-metadata.csv", show_col_types = FALSE)

dim(counts)
[1] 500  13
table(meta$condition)

  Control Treatment 
        6         6 
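Loading the files is only the first check; the sample columns of the count matrix should also match the metadata rows, and raw counts should be non-negative whole numbers. A sketch under two assumptions: the first column of `demo-counts.csv` holds gene identifiers (consistent with 13 columns for 12 samples), and the metadata names its samples in a column called `sample_id`. Both are assumptions, so adjust `id_col` to the real column name. `check_counts_meta()` is a hypothetical helper.

```r
# Hypothetical helper: cross-check a count table against its sample metadata.
# Assumes column 1 of `counts` holds gene IDs and every other column is a sample.
check_counts_meta <- function(counts, meta, id_col = "sample_id") {
  if (!id_col %in% names(meta)) stop("metadata has no column '", id_col, "'")
  sample_cols <- names(counts)[-1]
  ids <- as.character(meta[[id_col]])
  mat <- as.matrix(counts[, -1, drop = FALSE])
  list(
    n_genes           = nrow(counts),
    n_samples         = length(sample_cols),
    missing_in_counts = setdiff(ids, sample_cols),  # in metadata, absent from counts
    missing_in_meta   = setdiff(sample_cols, ids),  # in counts, absent from metadata
    same_order        = identical(sample_cols, ids),
    # Raw counts should be numeric, complete, non-negative whole numbers.
    valid_counts      = is.numeric(mat) && !anyNA(mat) &&
                        all(mat >= 0) && all(mat == round(mat))
  )
}

# Usage with the objects loaded above:
# check_counts_meta(counts, meta)
```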

Record session information

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Africa/Dar_es_Salaam
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tibble_3.3.0  readr_2.1.5   tidyr_1.3.1   dplyr_1.1.4   ggplot2_4.0.0

loaded via a namespace (and not attached):
 [1] bit_4.6.0          gtable_0.3.6       jsonlite_2.0.0     crayon_1.5.3      
 [5] compiler_4.4.1     tidyselect_1.2.1   parallel_4.4.1     scales_1.4.0      
 [9] yaml_2.3.10        fastmap_1.2.0      R6_2.6.1           generics_0.1.4    
[13] knitr_1.50         htmlwidgets_1.6.4  pillar_1.11.1      RColorBrewer_1.1-3
[17] tzdb_0.5.0         rlang_1.1.6        xfun_0.54          S7_0.2.0          
[21] bit64_4.6.0-1      cli_3.6.5          withr_3.0.2        magrittr_2.0.4    
[25] digest_0.6.37      grid_4.4.1         vroom_1.6.6        hms_1.1.4         
[29] lifecycle_1.0.4    vctrs_0.6.5        evaluate_1.0.5     glue_1.8.0        
[33] farver_2.1.2       rmarkdown_2.30     purrr_1.2.0        tools_4.4.1       
[37] pkgconfig_2.0.3    htmltools_0.5.8.1 
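Session information printed to the console is easy to lose. A sketch of saving it to a text file alongside a run's outputs, using base R's `capture.output()`; the helper name `save_session_info()` and the file name in the usage line are hypothetical.

```r
# Save the session information next to the outputs of a run so results can be
# traced to the exact R and package versions that produced them.
save_session_info <- function(path) {
  writeLines(utils::capture.output(utils::sessionInfo()), path)
  invisible(path)
}

# Usage (the file name is an example, not a repository convention):
# save_session_info("session-info.txt")
```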

Note on advanced modeling

Advanced count-based modeling workflows, including dispersion estimation and shrinkage, are addressed separately.