Q&A 4 How do you perform differential gene expression analysis using DESeq2 in R?

4.1 Explanation

With a validated count matrix and metadata, we can now run DESeq2 to identify differentially expressed genes between two conditions.

DESeq2 performs:

  1. ๐Ÿ“Š Size factor estimation (normalization)
  2. ๐Ÿ” Dispersion estimation
  3. โš™๏ธ Negative binomial model fitting
  4. ๐Ÿงช Wald test for significance
  5. ๐Ÿ“‰ Adjusted p-values via FDR

This produces a results table with log2 fold changes, p-values, and adjusted p-values.

4.2 R Code

library(tidyverse)
library(DESeq2)

set.seed(42)  # For reproducibility

# ๐Ÿ”„ Load data
count_df <- read_csv("data/demo_counts.csv")
metadata <- read_csv("data/demo_metadata.csv")

# ๐Ÿงฌ Prepare DESeq2 inputs
counts <- count_df |>
  column_to_rownames("gene") |>
  as.matrix()

dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData = metadata,
  design = ~ condition
)

# โš™๏ธ Run differential expression analysis
dds <- DESeq(dds)

# ๐Ÿ“‹ Extract results and clean
res_df <- as.data.frame(results(dds)) |>
  rownames_to_column("gene") |>
  drop_na(log2FoldChange, padj) |>
  arrange(padj)

# ๐Ÿ’พ Save for downstream steps
if (!dir.exists("data")) dir.create("data", recursive = TRUE)
write_csv(res_df, "data/deseq2_results.csv")

โœ… Takeaway: DESeq2 provides a robust statistical framework to identify significantly regulated genes. Once saved, these results can be used for downstream visualizations like volcano plots and MA plots.