Welcome to CDI Applied RNA-Seq Analysis
Complex Data Insights (CDI) is a practical learning pathway that teaches applied bioinformatics skills through clear explanations, structured lessons, and hands-on analysis.
This guide focuses on bulk RNA-Seq analysis, a foundational technique for studying gene expression in modern biological and biomedical research.
Whether you are new to RNA-Seq analysis or returning to transcriptomics after some time, this guide will help you build skills progressively, from raw sequencing data and quality control through to differential expression analysis and biological interpretation.
What this guide is
This guide is designed to teach applied RNA-Seq analysis as it is practiced in real research and analytical settings.
You will learn how to:
- Understand RNA-Seq experimental design and metadata
- Work with raw sequencing data and quality control outputs
- Construct and assess count matrices
- Perform exploratory data analysis and normalization
- Carry out differential expression analysis
- Interpret results in a biological context
- Report analyses in a reproducible, transparent way
The emphasis throughout is on reasoning, workflow, and interpretation, not just running commands.
What this guide is not
This is not:
- A tool-specific reference manual
- A shortcut-driven “click-through” tutorial
- A collection of disconnected scripts
Instead, this guide focuses on why each step exists, how decisions propagate through the pipeline, and how to recognize problems before they affect results.
Who this guide is for
This guide is suitable for:
- Graduate students and researchers working with RNA-Seq data
- Computational biologists and bioinformaticians building reproducible pipelines
- Data scientists transitioning into genomics
- Analysts who want a deeper understanding of RNA-Seq workflows beyond software defaults
Basic familiarity with R and the command line is assumed; advanced bioinformatics experience is not required.
How the guide is structured
The guide is organized into two tracks:
Free track — Foundations
The free lessons focus on building a correct and trustworthy RNA-Seq dataset.
By the end of the free track, you will be able to:
- Reason about RNA-Seq study design
- Interpret quality control results
- Choose appropriate quantification strategies
- Build and assess a count matrix
- Perform normalization and exploratory analysis
- Identify common technical and biological pitfalls
These lessons form a complete and valuable learning path on their own.
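To give a flavor of what "assess a count matrix" and "perform normalization" can look like in practice, here is a minimal sketch in plain Python. The gene names, sample names, and counts are invented for illustration, and counts-per-million (CPM) scaling is used only as a simple, well-known example; it is not necessarily the normalization method the lessons adopt.

```python
# Toy count matrix: rows are genes, columns are samples.
# All names and values here are invented for illustration.
samples = ["s1", "s2"]
counts = {
    "geneA": [10, 20],
    "geneB": [90, 180],
    "geneC": [0, 0],
}

# Library size = total reads counted per sample (column sums).
lib_sizes = [sum(row[j] for row in counts.values()) for j in range(len(samples))]


def cpm(counts, lib_sizes):
    """Scale each count to counts per million reads in its sample."""
    return {
        gene: [1e6 * c / n for c, n in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


cpm_values = cpm(counts, lib_sizes)

# A basic assessment step: flag genes with zero counts in every sample.
all_zero = [gene for gene, row in counts.items() if sum(row) == 0]

print("library sizes:", dict(zip(samples, lib_sizes)))
print("CPM for geneA:", cpm_values["geneA"])
print("all-zero genes:", all_zero)
```

In this toy example sample s2 was sequenced twice as deeply as s1, so the raw counts differ while the CPM values agree. That is the kind of technical effect normalization is meant to account for.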
Reproducibility philosophy (CDI standard)
This guide follows the Complex Data Insights (CDI) reproducibility principles:
- Raw data remain raw
- All transformations are performed inside lessons
- Code and interpretation appear together
- Results are traceable and explainable
- Session information and versions are recorded
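As one illustration of the last principle, the sketch below records interpreter, platform, and package versions alongside an analysis. It is written in Python under stated assumptions: the helper name `session_record` and the package list are hypothetical, and this is not presented as the CDI implementation.

```python
import json
import platform
import sys
from importlib import metadata


def session_record(packages):
    """Collect interpreter, platform, and package versions into a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names below are examples only; list whatever the analysis imports.
record = session_record(["numpy", "pandas"])
print(json.dumps(record, indent=2))
```

Saving this output next to the results makes it possible to tell later which software versions produced them. In R, the built-in `sessionInfo()` function serves the same purpose.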
You are encouraged to rebuild analyses yourself, modify parameters, and explore alternatives.
How to use this guide
- Read each lesson sequentially, especially in the free track
- Run the code blocks and inspect intermediate outputs
- Focus on interpretation, not just successful execution
- Use the appendix for reference and reproducibility details
- Treat this guide as a working notebook, not a static textbook
A note on scope and evolution
RNA-Seq analysis is a broad and evolving field.
This guide is intentionally designed to grow over time, with future lessons and extensions added without breaking the core structure.
Versioned updates will expand depth while preserving clarity.
Begin with Lesson 1: Installation and Environment