Lesson 6 Differential Expression Modeling Concepts
6.1 Learning outcomes
By the end of this lesson, you will be able to:
- Explain what differential expression means in RNA-Seq analysis
- Distinguish modeling goals from visualization and exploration
- Understand the role of the statistical model in DE analysis
- Identify key assumptions behind count-based models
- Interpret contrasts conceptually (without fitting a model yet)
6.2 What is differential expression?
Differential expression (DE) analysis asks a focused statistical question:
Which genes show evidence of systematic expression differences between conditions, beyond random variability?
In RNA-Seq, this question is answered using count-based statistical models that explicitly account for biological and technical variability.
6.3 Differential expression is a modeling problem
It is tempting to think of DE as:
- “genes that look different in a plot”
- “genes with large fold changes”
But DE is fundamentally about probabilistic modeling, not visual separation.
Visualization supports intuition; models support inference.
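To make the distinction concrete, here is a minimal sketch with made-up counts for two hypothetical genes. Both show the same fold change, but only one shows a difference that stands out from its within-group variability. The Welch t-statistic is used purely as a simple stand-in for "difference relative to noise"; it is not the test that count-based DE tools apply.

```python
import math
import statistics

def log2_fold_change(group_a, group_b):
    """log2 ratio of group means (group_b relative to group_a)."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))

def welch_t(group_a, group_b):
    """Difference in means divided by its standard error (Welch form)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

# Hypothetical counts, three replicates per condition (illustration only)
stable_neg, stable_pos = [98, 100, 102], [196, 200, 204]
noisy_neg, noisy_pos = [20, 100, 180], [40, 200, 360]

lfc_stable = log2_fold_change(stable_neg, stable_pos)
lfc_noisy = log2_fold_change(noisy_neg, noisy_pos)
t_stable = welch_t(stable_neg, stable_pos)
t_noisy = welch_t(noisy_neg, noisy_pos)

print(f"stable gene: log2FC={lfc_stable:.2f}, t={t_stable:.1f}")
print(f"noisy gene:  log2FC={lfc_noisy:.2f}, t={t_noisy:.1f}")
```

The two genes are indistinguishable by fold change alone; only a measure that accounts for variability separates them.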
6.4 Why raw counts are modeled (not transformed values)
RNA-Seq DE methods (e.g. DESeq2) operate on raw counts, not rlog- or log-transformed values.
Why?
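- Raw counts retain the mean–variance relationship that the model needs to estimate
- In count data, variance depends on expression level: highly expressed genes have larger variance
- Transformations such as log or rlog change the shape of the distribution, so the count model's assumptions no longer apply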
Exploration uses transformed data; modeling uses raw counts.
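The mean-variance relationship can be illustrated with the negative binomial variance formula, variance = mean + dispersion × mean². The sketch below assumes a single dispersion value of 0.1 shared by all genes, which is a simplification chosen for illustration; real tools estimate dispersion per gene.

```python
import math

def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2

ALPHA = 0.1  # hypothetical dispersion, assumed identical for every gene

for mu in (10, 100, 1000):
    var = nb_variance(mu, ALPHA)
    cv = math.sqrt(var) / mu  # coefficient of variation
    print(f"mean={mu:>5}  variance={var:>9.1f}  CV={cv:.3f}")
```

With a dispersion of zero the formula reduces to the Poisson case, where the variance equals the mean.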
6.5 The basic ingredients of a DE model
A differential expression model requires:
- A count matrix (genes × samples)
- Sample metadata describing experimental variables
- A design formula specifying which effects to model
The model links counts to experimental conditions through statistical assumptions.
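As a rough sketch of how these three ingredients fit together, the snippet below represents them as plain Python objects and checks that they are consistent. The gene names, sample IDs, counts, and the `check_inputs` helper are all hypothetical; a real analysis would pass the equivalent objects to a DE package.

```python
# Count matrix: one row per gene, one column per sample (raw integer counts)
sample_ids = ["s1", "s2", "s3", "s4"]
counts = {
    "geneA": [12, 15, 30, 28],
    "geneB": [200, 180, 210, 190],
}

# Sample metadata: one record per sample describing the experimental variables
metadata = {
    "s1": {"condition": "negative"},
    "s2": {"condition": "negative"},
    "s3": {"condition": "positive"},
    "s4": {"condition": "positive"},
}

# Design formula: which metadata variables the model should include
design = "~ condition"

def check_inputs(counts, sample_ids, metadata):
    """Verify that counts are non-negative integers and that samples line up."""
    for gene, row in counts.items():
        if len(row) != len(sample_ids):
            raise ValueError(f"{gene}: expected {len(sample_ids)} counts")
        if any(not isinstance(c, int) or c < 0 for c in row):
            raise ValueError(f"{gene}: counts must be non-negative integers")
    missing = [s for s in sample_ids if s not in metadata]
    if missing:
        raise ValueError(f"samples without metadata: {missing}")
    return True

print(check_inputs(counts, sample_ids, metadata))
```

Checking that the count columns and the metadata rows describe the same samples is worth doing before any model is fitted, because a mismatch silently assigns samples to the wrong condition.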
6.6 Experimental conditions and contrasts
In the demo dataset, samples belong to two conditions:
- positive
- negative
A DE analysis typically asks questions like:
- How does gene expression differ between positive and negative samples?
This comparison is encoded as a contrast within the model: a specified comparison between levels of a design variable, here positive versus negative.
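The idea of a contrast can be sketched without fitting anything. The snippet below assumes a design of the form `~ condition` with negative as the reference level, and uses made-up coefficient values on the log2 scale. The helper names `design_row` and `linear_combination` are illustrative and do not belong to any DE package.

```python
def design_row(condition, reference="negative"):
    """Intercept column plus an indicator for the non-reference level."""
    return [1, 0 if condition == reference else 1]

def linear_combination(weights, coefficients):
    """Weighted sum of model coefficients."""
    return sum(w * c for w, c in zip(weights, coefficients))

conditions = ["negative", "negative", "positive", "positive"]
design_matrix = [design_row(c) for c in conditions]

# Hypothetical coefficients for one gene, on the log2 scale:
# [baseline expression in negative, positive-vs-negative effect]
coefficients = [6.0, 1.5]

fitted_negative = linear_combination(design_row("negative"), coefficients)
fitted_positive = linear_combination(design_row("positive"), coefficients)

# The contrast vector selects the comparison of interest
contrast = [0, 1]
log2_fc = linear_combination(contrast, coefficients)

print(design_matrix)
print(fitted_negative, fitted_positive, log2_fc)
```

In this two-level design the contrast simply picks out the second coefficient, which equals the difference between the fitted values for positive and negative samples.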
6.7 What a DE result represents
For each gene, a DE method estimates:
- A log2 fold change between conditions
- An estimate of uncertainty (standard error)
- A p-value: the probability, under the null hypothesis of no difference between conditions, of observing a difference at least as extreme as the one seen
These quantities are always interpreted in the context of the model assumptions.
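The relationship between these three quantities can be sketched as a Wald-style calculation: divide the log2 fold change by its standard error and compare the result to a standard normal distribution. The numbers below are hypothetical, and the function name `wald_p_value` is an assumption made for this illustration.

```python
import math

def wald_p_value(log2_fc, standard_error):
    """Return the z statistic and its two-sided p-value under a standard normal."""
    z = log2_fc / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p

# Same estimated fold change, different uncertainty (made-up values)
z_precise, p_precise = wald_p_value(1.0, 0.25)
z_uncertain, p_uncertain = wald_p_value(1.0, 0.8)

print(f"SE=0.25: z={z_precise:.2f}, p={p_precise:.2g}")
print(f"SE=0.80: z={z_uncertain:.2f}, p={p_uncertain:.2g}")
```

The same fold change yields very different evidence depending on the standard error, which is why the estimate of uncertainty matters as much as the effect size.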
6.8 Common assumptions in RNA-Seq DE models
Most count-based DE methods assume:
- Counts follow a negative binomial distribution
- Samples are independent
- Most genes are not differentially expressed
- Technical effects (for example batch or library size differences) have been controlled or accounted for in the design
Violations of these assumptions can lead to misleading results.
6.9 Why exploratory analysis comes first
Exploratory data analysis (EDA, Lesson 05) helps you assess whether modeling assumptions are plausible:
- Are samples clustering by condition?
- Are there strong batch effects?
- Are there obvious outliers?
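Modeling without this context is risky: an unmodeled batch effect or an outlier sample can inflate variability estimates, hiding real differences or creating spurious ones.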
6.10 What we are not doing yet
In this lesson, we deliberately avoid:
- Fitting a DESeq2 model
- Choosing thresholds
- Interpreting volcano plots
Those steps come after the modeling framework is understood.
6.11 Takeaway
Differential expression analysis is not a visualization task or a filtering exercise. It is a statistical modeling problem grounded in assumptions about RNA-Seq data.
Congratulations!
You have completed all lessons in the Free Applied RNA-Seq Analysis Track.