Lesson 6: Differential Expression Modeling Concepts

CDI goal: understand what differential expression (DE) modeling is, what questions it answers, and what assumptions underlie statistical tests before running any DESeq2 code.

6.1 Learning outcomes

By the end of this lesson, you will be able to:

Explain what differential expression means in RNA-Seq analysis
Distinguish modeling goals from visualization and exploration
Understand the role of the statistical model in DE analysis
Identify key assumptions behind count-based models
Interpret contrasts conceptually (without fitting a model yet)

6.2 What is differential expression?

Differential expression (DE) analysis asks a focused statistical question:

Which genes show evidence of systematic expression differences between conditions, beyond random variability?

In RNA-Seq, this question is answered using count-based statistical models that explicitly account for biological and technical variability.

6.3 Differential expression is a modeling problem

It is tempting to think of DE as:

“genes that look different in a plot”
“genes with large fold changes”

But DE is fundamentally about statistical inference under a probabilistic model, not visual separation or fold-change size alone.

Visualization supports intuition; models support inference.
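To see why fold change alone is not evidence, consider two hypothetical genes. The plain-Python sketch below uses invented gene names and counts. It computes a naive log2 fold change of group means and checks whether the replicate ranges of the two conditions overlap. It is not a DE test. It only shows that a large fold change can come from noisy, low counts while a modest one can be highly consistent.

```python
import math
from statistics import mean

# Hypothetical raw counts for two genes, three replicates per condition
genes = {
    "gene_A": {"negative": [2, 0, 1], "positive": [6, 1, 8]},
    "gene_B": {"negative": [1000, 1010, 990], "positive": [1400, 1390, 1410]},
}

def log2_fold_change(neg, pos):
    """Naive log2 ratio of group means (positive over negative).
    No normalization, no dispersion model: for illustration only."""
    return math.log2(mean(pos) / mean(neg))

def replicates_overlap(neg, pos):
    """True if the range of one group's replicates overlaps the other's."""
    return max(neg) >= min(pos) and max(pos) >= min(neg)

for name, g in genes.items():
    lfc = log2_fold_change(g["negative"], g["positive"])
    overlap = replicates_overlap(g["negative"], g["positive"])
    print(f"{name}: log2FC = {lfc:.2f}, replicate ranges overlap: {overlap}")
```

Here gene_A has the larger fold change (about 2.32 on the log2 scale), but its replicates overlap. gene_B changes only about 0.49 on the log2 scale, yet its replicates separate cleanly. A count-based model weighs both the size of a difference and the variability around it.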

6.4 Why raw counts are modeled (not transformed values)

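RNA-Seq DE methods (e.g., DESeq2) operate on raw, unnormalized counts, not rlog- or log-transformed values. Differences in sequencing depth are handled inside the model through size factors rather than by pre-normalizing the data.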

Why?

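Raw counts retain the mean–variance relationship that the model relies on
In count data, variance depends on expression level (highly expressed genes vary more in absolute terms)
Transformed values no longer follow the count distribution the model assumes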

Exploration uses transformed data; modeling uses raw counts.
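The mean–variance relationship can be illustrated by simulation. The sketch below assumes NumPy and a hypothetical dispersion of 0.1 shared by all genes. It draws negative binomial counts at three mean levels, using the common parameterization Var = μ + αμ² (α is the dispersion), and compares the variance of raw counts with the variance after a log2(count + 1) transform.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.1  # hypothetical dispersion, shared by all simulated genes

def simulate_nb(mean, dispersion, size, rng):
    """Draw negative binomial counts with Var = mean + dispersion * mean**2.

    NumPy parameterizes the NB by (n, p); choosing n = 1/dispersion and
    p = n / (n + mean) gives the requested mean and variance."""
    n = 1.0 / dispersion
    p = n / (n + mean)
    return rng.negative_binomial(n, p, size=size)

results = {}
for mu in (10, 100, 1000):
    counts = simulate_nb(mu, alpha, 100_000, rng)
    log_counts = np.log2(counts + 1)
    results[mu] = (counts.var(), log_counts.var())
    print(f"mean {mu:>4}: count variance {counts.var():>9.1f} "
          f"(theory {mu + alpha * mu**2:>8.1f}), "
          f"log2(count + 1) variance {log_counts.var():.3f}")
```

Raw-count variance grows roughly with the square of the mean, matching the formula, while the log-scale variances are far more similar across means. That flattening is useful for exploration, but it hides the mean–variance structure that a count model estimates directly.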

6.5 The basic ingredients of a DE model

A differential expression model requires:

A count matrix (genes × samples)
Sample metadata describing experimental variables
A design formula specifying which effects to model

The model links counts to experimental conditions through statistical assumptions.
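The three ingredients can be sketched with toy data. In this hypothetical example (NumPy; sample IDs, gene IDs, and counts are invented), the count matrix has genes as rows and samples as columns, the metadata assigns each sample to a condition, and a design of the form ~ condition (DESeq2's R formula notation) is expanded into a model matrix with an intercept and a positive-vs-negative indicator. It only builds the inputs; no model is fitted.

```python
import numpy as np

# Hypothetical toy count matrix: rows are genes, columns are samples
gene_ids = ["gene1", "gene2", "gene3"]
sample_ids = ["s1", "s2", "s3", "s4"]
counts = np.array([
    [10, 12, 30, 28],
    [500, 480, 510, 495],
    [0, 3, 1, 2],
])

# Hypothetical sample metadata, one entry per sample
metadata = {"s1": "negative", "s2": "negative", "s3": "positive", "s4": "positive"}

# Sanity checks: raw counts are non-negative integers, and the metadata
# describes exactly the count-matrix columns, in the same order
assert np.issubdtype(counts.dtype, np.integer) and (counts >= 0).all()
assert counts.shape == (len(gene_ids), len(sample_ids))
assert list(metadata) == sample_ids

# Expand the design ~ condition into a model matrix with "negative" as reference:
# column 0 is the intercept, column 1 flags positive samples
condition = np.array([metadata[s] for s in sample_ids])
design = np.column_stack([
    np.ones(len(sample_ids)),
    (condition == "positive").astype(float),
])
print(design)
```

The two checks mirror what DESeq2 expects in practice: non-negative integer counts, and sample metadata whose rows correspond, in order, to the columns of the count matrix.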

6.6 Experimental conditions and contrasts

In the demo dataset, samples belong to two conditions:

positive
negative

A DE analysis typically asks questions like:

How does gene expression differ between positive and negative samples?

This comparison is encoded as a contrast within the model.
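Conceptually, a contrast is a set of weights on the model's coefficients. The NumPy sketch below assumes the two-group design with "negative" as the reference level and uses hypothetical coefficient values on the log2 scale; nothing is estimated from data. (In R, factor levels default to alphabetical order, so "negative" would also be the reference there unless the factor is releveled.)

```python
import numpy as np

# Model matrix for four samples under the design ~ condition,
# with "negative" as the reference level (column 1 flags positive samples)
design = np.array([
    [1.0, 0.0],  # negative
    [1.0, 0.0],  # negative
    [1.0, 1.0],  # positive
    [1.0, 1.0],  # positive
])

# Hypothetical coefficients for one gene, on the log2 scale:
#   beta[0] = log2 expression level in the reference (negative) group
#   beta[1] = log2 fold change, positive vs negative
beta = np.array([6.0, 1.5])

# log2 of each sample's expected (normalized) mean implied by the coefficients
expected_log2 = design @ beta

# A contrast is a weight vector on the coefficients
positive_vs_negative = np.array([0.0, 1.0])
negative_vs_positive = -positive_vs_negative

print("expected log2 expression per sample:", expected_log2)
print("log2FC positive vs negative:", positive_vs_negative @ beta)
print("log2FC negative vs positive:", negative_vs_positive @ beta)
```

A log2 fold change of 1.5 corresponds to about a 2.83-fold increase (2^1.5). Asking the reverse question, negative vs positive, flips the sign of the result but not its magnitude.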

6.7 What a DE result represents

For each gene, a DE method estimates:

A log2 fold change between conditions
An estimate of uncertainty (standard error)
A p-value for the null hypothesis that the true log2 fold change is zero, i.e., how surprising the observed difference would be if it arose from random variability alone

These quantities are always interpreted in the context of the model assumptions.
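The three quantities are linked. DESeq2's default test is a Wald test, which divides the log2 fold change by its standard error and compares the result with a standard normal distribution. The plain-Python sketch below reproduces that arithmetic for hypothetical estimates; it does not estimate the fold change or standard error itself.

```python
import math

def wald_test(log2fc, se):
    """Wald statistic and two-sided p-value for H0: true log2 fold change = 0,
    using a standard normal reference distribution."""
    z = log2fc / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical estimates: same log2 fold change, different standard errors
estimates = {"gene_X": (1.0, 0.4), "gene_Y": (1.0, 1.2)}
for gene, (lfc, se) in estimates.items():
    z, p = wald_test(lfc, se)
    print(f"{gene}: log2FC = {lfc}, SE = {se}, z = {z:.2f}, p = {p:.3g}")
```

Both hypothetical genes have a log2 fold change of 1.0, but with a standard error of 0.4 the two-sided p-value is about 0.012, while with a standard error of 1.2 it is about 0.40. The uncertainty estimate, not the fold change alone, determines the strength of the evidence.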

6.8 Common assumptions in RNA-Seq DE models

Most count-based DE methods assume:

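Counts for each gene follow a negative binomial distribution
Samples are independent
Most genes are not differentially expressed (normalization methods such as DESeq2's median-of-ratios size factors rely on this)
Technical effects (e.g., batch) are either controlled by the experimental design or included in the design formula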

Violations of these assumptions can lead to misleading results.
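The assumption that most genes are not differentially expressed is what makes robust normalization work. The NumPy sketch below implements a simplified version of the median-of-ratios idea behind DESeq2's default size factors. It runs on hypothetical counts where samples 3 and 4 were sequenced twice as deeply and only one gene is truly up-regulated, then compares the result with simple total-count scaling.

```python
import numpy as np

# Hypothetical counts (genes x samples). Samples 3-4 have twice the sequencing
# depth of samples 1-2; only the last gene is truly up-regulated (4x) in samples 3-4.
counts = np.array([
    [100, 100, 200, 200],
    [200, 200, 400, 400],
    [ 50,  50, 100, 100],
    [400, 400, 800, 800],
    [ 80,  80, 640, 640],
], dtype=float)

def median_of_ratios(counts):
    """Simplified median-of-ratios size factors: divide each count by its gene's
    geometric mean across samples, then take each sample's median ratio.
    Genes with a zero in any sample are skipped (their geometric mean is zero)."""
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

size_factors = median_of_ratios(counts)
library_sizes = counts.sum(axis=0)

print("median-of-ratios, sample 3 vs 1:", size_factors[2] / size_factors[0])
print("total-count scaling, sample 3 vs 1:", library_sizes[2] / library_sizes[0])
```

Because the median ignores the single DE gene, the size-factor ratio recovers the true two-fold depth difference (up to floating-point rounding). Total-count scaling gives about 2.58 because the DE gene inflates the library total. If many genes changed in the same direction, the median itself would shift, which is why this assumption matters.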

6.9 Why exploratory analysis comes first

Exploratory data analysis (EDA, Lesson 5) helps you assess whether these modeling assumptions are plausible:

Are samples clustering by condition?
Are there strong batch effects?
Are there obvious outliers?

Fitting a model without this context risks reporting artifacts, such as outlier- or batch-driven differences, as biology.
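One of those checks, whether samples cluster by condition, can be sketched with principal component analysis (PCA). The example below assumes NumPy and simulates hypothetical negative binomial counts in which 50 of 500 genes are up-regulated in positive samples. It applies log2(count + 1) as a simple stand-in for DESeq2's vst() or rlog() transforms and computes principal components with an SVD. All simulation settings are invented for illustration, and library sizes are equal by construction, so normalization is skipped.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical simulation: 500 genes, 3 negative and 3 positive samples,
# with the first 50 genes up-regulated 4x in the positive samples
n_genes = 500
condition = np.array(["negative"] * 3 + ["positive"] * 3)
base_means = rng.uniform(50, 500, size=n_genes)
means = np.tile(base_means[:, None], (1, condition.size))
means[:50, condition == "positive"] *= 4.0

# Negative binomial draws with hypothetical dispersion alpha:
# NumPy's (n, p) with n = 1/alpha, p = n / (n + mean) gives Var = mean + alpha * mean**2
alpha = 0.05
n = 1.0 / alpha
counts = rng.negative_binomial(n, n / (n + means))

# Exploration uses transformed data; log2(count + 1) is a simple stand-in here
log_counts = np.log2(counts + 1)

# PCA via SVD: samples are observations, genes are variables (centered per gene)
centered = log_counts - log_counts.mean(axis=1, keepdims=True)
u, s, vt = np.linalg.svd(centered.T, full_matrices=False)
pc1_scores = u[:, 0] * s[0]
explained = s**2 / np.sum(s**2)

for sample_condition, score in zip(condition, pc1_scores):
    print(f"{sample_condition:>8}: PC1 = {score:+.2f}")
print(f"PC1 explains {explained[0]:.0%} of the variance")
```

With a real condition effect, the negative and positive samples should fall on opposite sides of PC1 (the sign of a principal component is arbitrary). A sample sitting with the wrong group, or a split that tracks processing batch instead of condition, is the kind of warning sign to resolve before fitting a model.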

6.10 What we are not doing yet

In this lesson, we deliberately avoid:

Fitting a DESeq2 model
Choosing thresholds
Interpreting volcano plots

Those steps come after the modeling framework is understood.

6.11 Takeaway

Differential expression analysis is not a visualization task or a filtering exercise. It is a statistical modeling problem grounded in assumptions about RNA-Seq data.

You’ve already demonstrated careful thinking, patience, and attention to detail — exactly what RNA-Seq analysis demands.

At this point, you understand what differential expression is, why it requires modeling, and which assumptions must hold for results to be meaningful.

As you move forward, treat every downstream result as something you can explain, justify, and reproduce — not just generate.

That mindset is what separates routine analysis from defensible science.

Congratulations!

You have completed all lessons in the Free Applied RNA-Seq Analysis Track.

Continue to the closing chapter:
🎉 Congratulations on Completing the Free Track

If you’re reading the Premium Track:
Continue to Welcome to the Premium CDI Track →