Chronic lymphocytic leukemia (CLL) is a B-cell malignancy with highly variable clinical outcome, where survival after diagnosis ranges from months to decades. CLL has continuously been at the forefront of genomic discovery, but despite previous advances, we lacked a comprehensive molecular landscape of the disease. CLL has two major subtypes, based on the mutational status of the IGHV gene locus, reflecting different cells of origin: U-CLLs resemble naive-like B-cells and have unmutated IGHV (>98% identity to germline), whereas M-CLLs resemble memory B-cells and have heavily mutated IGHV genes, introduced by somatic hypermutation during B-cell development. DNA methylation analysis has provided another layer to the CLL classification system (epigenetic ‘epitypes’), where in addition to the naive-like (n-CLL) and memory-like (m-CLL) subtypes, an intermediate subtype (i-CLL) is defined. M-CLLs have significantly better clinical outcome than U-CLLs, and i-CLL patients have intermediate outcomes. Previous analyses were underpowered to fully characterize the genetic driver landscapes of CLL and its subtypes, and large-scale multiomic CLL datasets were unavailable, impeding our ability to integrate different data-types into a complete molecular map that would enhance prognostics and enable precision medicine.
Therefore, we established the CLL-map project, an international study conducted by researchers and physicians from the US, Spain and Germany to assemble a multiomic cohort of >1100 CLL samples. Aggregating existing and new data, we assembled, harmonized and analyzed a dataset of ~1100 whole genome or exome sequences, ~700 RNA-seqs and ~1000 methylome profiles (490 Illumina 450K arrays and 509 reduced representation bisulfite-sequencing or ‘RRBS’). The cohort has clinical outcomes of overall survival (OS) and failure-free survival (FFS, which ends at last follow-up, death, treatment or progression), with a median follow-up greater than 6 years.
This dataset, which doubled the size of previous CLL cohorts, enabled novel discoveries. We identified ~200 candidate genetic drivers of CLL (109 novel) and refined the characterization of the IGHV subtypes, which were distinct in their genomic landscapes and leukemogenic trajectories. Using the RNA-seq data, we discovered new gene expression subtypes (n=8) that further subcategorized CLL and proved to be an independent prognostic factor. Building unified prognostic models showed that clinical outcomes are associated with a combination of genetic, epigenetic, and gene expression features, highlighting the value of integrating multiple data-types for prognostication. Altogether, this project presents a substantially more complete clinico-biological map of CLL, which can serve as a reference for future CLL research and precision medicine. We hope that this endeavor will motivate the establishment of larger-scale multiomic projects for other tumor types as well in the current post-TCGA era of cancer genomics.