RNA-seq results are easy to generate.
Interpreting them responsibly is harder.
This guide focuses on bulk RNA-seq analysis and builds interpretive clarity before introducing advanced modeling techniques.
The goal is not just to run commands.
The goal is to understand why each step exists and how decisions propagate through the workflow.
In this guide, you will learn how to:
The emphasis is on reasoning, workflow discipline, and calibrated interpretation.
This guide does not replace formal count-based differential expression modeling.
We follow simple reproducibility principles:
You are encouraged to rerun, modify parameters, and compare outcomes.
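One way to make "rerun, modify, compare" concrete is to fix a random seed, rerun, and then change a single parameter. The sketch below is illustrative only. The helper `simulate_counts()` and its parameters are hypothetical and are not part of this repository.

```r
# Hypothetical helper: draws toy counts from a negative binomial distribution.
# The seed is set inside the function so each call is reproducible on its own.
simulate_counts <- function(seed, n_genes = 5, mu = 50, size = 2) {
  set.seed(seed)
  rnbinom(n_genes, mu = mu, size = size)
}

run_a <- simulate_counts(seed = 1)             # baseline run
run_b <- simulate_counts(seed = 1)             # exact rerun, same seed
run_c <- simulate_counts(seed = 1, size = 20)  # one parameter changed

# A rerun with the same seed and parameters should match exactly.
identical(run_a, run_b)

# Compare the baseline with the modified run side by side.
data.frame(baseline = run_a, modified = run_c)
```

Changing one parameter at a time keeps the comparison interpretable: any difference between the two columns traces back to that single change.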
This guide is R-centric. All analysis and visualization are performed in R.
R.version.string
[1] "R version 4.4.1 (2024-06-14)"
This guide uses a global plotting theme so visuals stay consistent across domains.
The theme lives here:
scripts/R/cdi-plot-theme.R
You will source it at the top of lessons that generate plots.

source("scripts/R/cdi-plot-theme.R")

pkgs <- c(
  "ggplot2",
  "dplyr",
  "tidyr",
  "readr",
  "tibble"
)
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}

If you use renv, you may activate it for stricter reproducibility. It is optional.
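If you do opt in to renv, a guarded call keeps the lesson runnable for readers who have not installed it. This is a minimal sketch. The helper name `use_renv_if_available()` is hypothetical, and the presence of a `renv.lock` file in the project is an assumption, not something this guide confirms.

```r
# Hypothetical helper: restores the renv library only when renv is installed
# and a lockfile exists. Otherwise it leaves the default library untouched.
use_renv_if_available <- function(project = ".") {
  lockfile <- file.path(project, "renv.lock")  # assumed lockfile location
  if (!requireNamespace("renv", quietly = TRUE)) {
    message("renv is not installed; using the default library.")
    return(invisible(FALSE))
  }
  if (!file.exists(lockfile)) {
    message("No renv.lock found; run renv::init() once to create one.")
    return(invisible(FALSE))
  }
  # Reinstall the package versions recorded in the lockfile.
  renv::restore(project = project, prompt = FALSE)
  invisible(TRUE)
}

# Demonstration in an empty temporary directory: nothing is restored.
demo_dir <- tempfile("renv-demo-")
dir.create(demo_dir)
used_renv <- use_renv_if_available(demo_dir)
```

In a real project you would call `use_renv_if_available()` from the project root before loading packages.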
This repository includes a small synthetic dataset reused across lessons.
Generate it from the project root:
Rscript scripts/R/generate-demo-data.R

Files created in data/:
demo-counts.csv
demo-metadata.csv
demo-truth.csv

counts <- readr::read_csv("data/demo-counts.csv", show_col_types = FALSE)
meta <- readr::read_csv("data/demo-metadata.csv", show_col_types = FALSE)
dim(counts)
[1] 500 13
table(meta$condition)
  Control Treatment
        6         6
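The dimensions above (13 columns in the counts table, 12 samples in the metadata) suggest one identifier column plus one column per sample, but that layout is an inference. A quick alignment check makes it explicit before any analysis. The sketch below runs on toy data. The column names `gene_id` and `sample_id` and the helper `check_alignment()` are assumptions for illustration, so adjust them to match the actual files.

```r
# Toy stand-ins for the demo files (assumed layout, not the real data).
toy_counts <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  S1 = c(10L, 0L, 5L),
  S2 = c(12L, 1L, 7L),
  S3 = c(30L, 0L, 2L),
  S4 = c(28L, 2L, 3L)
)
toy_meta <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  condition = c("Control", "Control", "Treatment", "Treatment")
)

# Hypothetical helper: compares sample columns in the counts table
# against the sample identifiers listed in the metadata.
check_alignment <- function(counts, meta,
                            id_col = "gene_id", sample_col = "sample_id") {
  sample_cols <- setdiff(names(counts), id_col)
  meta_ids <- as.character(meta[[sample_col]])
  list(
    n_genes      = nrow(counts),
    n_samples    = length(sample_cols),
    same_samples = setequal(sample_cols, meta_ids),   # same set of IDs
    same_order   = identical(sample_cols, meta_ids),  # same ordering too
    any_missing  = anyNA(counts[sample_cols]),
    any_negative = any(as.matrix(counts[sample_cols]) < 0, na.rm = TRUE)
  )
}

alignment <- check_alignment(toy_counts, toy_meta)
str(alignment)
```

Checking order as well as membership matters because many downstream steps match samples to conditions by position rather than by name.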
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Africa/Dar_es_Salaam
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.3.0 readr_2.1.5 tidyr_1.3.1 dplyr_1.1.4 ggplot2_4.0.0
loaded via a namespace (and not attached):
[1] bit_4.6.0 gtable_0.3.6 jsonlite_2.0.0 crayon_1.5.3
[5] compiler_4.4.1 tidyselect_1.2.1 parallel_4.4.1 scales_1.4.0
[9] yaml_2.3.10 fastmap_1.2.0 R6_2.6.1 generics_0.1.4
[13] knitr_1.50 htmlwidgets_1.6.4 pillar_1.11.1 RColorBrewer_1.1-3
[17] tzdb_0.5.0 rlang_1.1.6 xfun_0.54 S7_0.2.0
[21] bit64_4.6.0-1 cli_3.6.5 withr_3.0.2 magrittr_2.0.4
[25] digest_0.6.37 grid_4.4.1 vroom_1.6.6 hms_1.1.4
[29] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.5 glue_1.8.0
[33] farver_2.1.2 rmarkdown_2.30 purrr_1.2.0 tools_4.4.1
[37] pkgconfig_2.0.3 htmltools_0.5.8.1
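Printing the session information is useful, and saving it next to your results makes a run easier to audit later. A minimal sketch using base R only follows. The output file name is an assumption, and the example writes to a temporary directory so it does not touch the project.

```r
# Capture the printed session information as a character vector.
session_lines <- capture.output(sessionInfo())

# Assumed output location; in a project you might write to a results folder.
out_file <- file.path(tempdir(), "session-info.txt")
writeLines(session_lines, out_file)

# Read it back to confirm the record was written.
saved <- readLines(out_file)
length(saved)
```

Your recorded versions will differ from the listing above if your R installation or packages differ.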
Advanced count-based modeling workflows, including dispersion estimation and shrinkage, are addressed separately.