Q&A 4 How do you perform differential gene expression analysis using DESeq2 in R?
4.1 Explanation
With a validated count matrix and metadata, we can now run DESeq2 to identify differentially expressed genes between two conditions.
DESeq2 performs:
- ๐ Size factor estimation (normalization)
- ๐ Dispersion estimation
- โ๏ธ Negative binomial model fitting
- ๐งช Wald test for significance
- ๐ Adjusted p-values via FDR
This produces a results table with log2 fold changes, p-values, and adjusted p-values.
4.2 R Code
library(tidyverse)
library(DESeq2)
set.seed(42) # For reproducibility
# ๐ Load data
count_df <- read_csv("data/demo_counts.csv")
metadata <- read_csv("data/demo_metadata.csv")
# ๐งฌ Prepare DESeq2 inputs
counts <- count_df |>
column_to_rownames("gene") |>
as.matrix()
dds <- DESeqDataSetFromMatrix(
countData = counts,
colData = metadata,
design = ~ condition
)
# โ๏ธ Run differential expression analysis
dds <- DESeq(dds)
# ๐ Extract results and clean
res_df <- as.data.frame(results(dds)) |>
rownames_to_column("gene") |>
drop_na(log2FoldChange, padj) |>
arrange(padj)
# ๐พ Save for downstream steps
if (!dir.exists("data")) dir.create("data", recursive = TRUE)
write_csv(res_df, "data/deseq2_results.csv")โ Takeaway: DESeq2 provides a robust statistical framework to identify significantly regulated genes. Once saved, these results can be used for downstream visualizations like volcano plots and MA plots.