Genomic characterisation of lncRNAs in health and disease
Another strategy for understanding the functions of thousands of newly identified lncRNA genes is to study their properties as a group. We use bioinformatic analysis of large genomic datasets to characterise lncRNAs, divide them into classes, and predict their functions. We are currently pursuing several approaches.
First, we aim to predict lncRNAs that are involved in cancer progression. This work is carried out within the PanCancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genome Consortium (ICGC). Our software tool, ExInAtor, uses maps of somatic mutations from thousands of tumours to identify candidate cancer driver lncRNAs, which can then be tested experimentally.
A second approach is driven by the idea that an lncRNA's function is strongly reflected in its localisation within the cell. For example, lncRNAs that regulate chromatin are presumably located in the nucleus, while those that regulate mRNA translation should reside in the cytosol. We are creating genome-wide maps of lncRNA subcellular localisation to predict lncRNA functions and to identify the factors that control localisation.
ExInAtor: Cancer Driver LncRNA Discovery
ExInAtor is a pipeline for discovering cancer driver lncRNAs with an enrichment of somatic mutations in their exons, compared to the background mutation rate.
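The core idea, an excess of exonic mutations relative to a background rate, can be sketched as a one-sided test. This is a minimal illustration rather than ExInAtor's implementation: it assumes that, under the null hypothesis, mutations fall uniformly across a gene's exons and its background region, and it uses a binomial test, which may differ from the statistic ExInAtor actually applies. The function name and its arguments are hypothetical.

```python
from math import comb


def exonic_enrichment_pvalue(ex_mut: int, ex_len: int, bg_mut: int, bg_len: int) -> float:
    """One-sided binomial test for excess exonic mutations (illustrative sketch).

    Assumption: under the null hypothesis every mutation observed in the gene's
    exons plus its background region is equally likely to hit any base, so a
    mutation lands in an exon with probability ex_len / (ex_len + bg_len).
    """
    n = ex_mut + bg_mut                 # all mutations in exons + background
    p = ex_len / (ex_len + bg_len)      # null probability of an exonic hit
    # P(X >= ex_mut) for X ~ Binomial(n, p): the chance of seeing at least
    # this many exonic mutations if exons mutate at the background rate.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(ex_mut, n + 1))


if __name__ == "__main__":
    # Hypothetical gene: 10 exonic mutations in 1 kb of exons versus
    # 10 background mutations in 10 kb of background sequence.
    print(exonic_enrichment_pvalue(10, 1000, 10, 10000))
```

A gene whose exons mutate at the background rate gets a p-value near 1, while a strongly enriched gene like the hypothetical one above gets a very small p-value.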
Our latest version is accessible from GitHub.
In our paper, we applied ExInAtor to around 6,000 lncRNAs from GENCODE v19, using more than 20 million somatic variants from 1,112 whole genomes. Below is a summary of the top drivers discovered at a false discovery rate (FDR) cutoff of 0.1.
Legend: Pc - PanCancer; Ex mut - number of exonic mutations; Ex len - total exonic length of the gene (bp); Intr mut - number of background mutations; Intr length - background region length for the gene (bp); Pval - uncorrected p-value; Qval - q-value, equivalent to the false discovery rate; Ex mut rate - exonic mutation rate (mutations per kb); Intr mut rate - background mutation rate (mutations per kb); Ratio - ratio of the exonic to the background mutation rate; Super_pc - Super PanCancer, all cancers combined.
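The derived columns in the legend can be reproduced from the raw counts. The sketch below is hypothetical (the function names are ours), and it assumes the q-values come from the Benjamini-Hochberg procedure, which the legend does not state.

```python
import math


def mutation_rates(ex_mut: int, ex_len: int, intr_mut: int, intr_len: int) -> tuple[float, float, float]:
    """Return (Ex mut rate, Intr mut rate, Ratio); rates are mutations per kb."""
    ex_rate = 1000 * ex_mut / ex_len
    intr_rate = 1000 * intr_mut / intr_len
    if intr_rate > 0:
        ratio = ex_rate / intr_rate
    else:
        # No background mutations: infinite ratio if exons are mutated, undefined otherwise.
        ratio = math.inf if ex_rate > 0 else math.nan
    return ex_rate, intr_rate, ratio


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Assumed q-value method: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def significant_drivers(pvals_by_gene: dict[str, float], fdr: float = 0.1) -> list[str]:
    """Keep genes whose q-value is below the FDR cutoff (0.1 in the summary table)."""
    names = list(pvals_by_gene)
    qvals = benjamini_hochberg([pvals_by_gene[name] for name in names])
    return [name for name, q in zip(names, qvals) if q < fdr]
```

For example, `mutation_rates(3, 1500, 10, 20000)` gives an exonic rate of 2.0 and a background rate of 0.5 mutations per kb, so the Ratio column would read 4.0.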
The Cancer LncRNA Census (CLC) is an ongoing effort to identify and catalogue lncRNA genes that have been causally implicated in cancer. The original census, including its criteria and analysis, was published on bioRxiv (https://www.biorxiv.org/content/early/2017/08/25/152769).
CLC can be used for two main analyses:
a) Analysis of the genomic and functional properties that distinguish cancer-related lncRNAs from other lncRNAs.
b) Sensitivity and precision analysis of lncRNA cancer driver prediction methods such as ExInAtor (https://www.nature.com/articles/srep41544), to estimate the percentage of predicted candidates that have prior evidence of a role in cancer.
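Analysis (b) amounts to comparing a set of predicted drivers with the census. A minimal sketch, assuming gene identifiers are plain strings and that sensitivity should count only census genes the method actually tested; the function and the gene names (`lnc1`, `lnc2`, ...) are hypothetical placeholders, not CLC entries.

```python
def benchmark_against_census(
    predicted: set[str], census: set[str], tested: set[str]
) -> tuple[float, float]:
    """Return (precision, sensitivity) of predicted drivers against the census.

    precision   = fraction of predicted drivers that are census genes
    sensitivity = fraction of tested census genes that were predicted
    """
    hits = predicted & census
    census_tested = census & tested  # census genes the method had a chance to find
    precision = len(hits) / len(predicted) if predicted else 0.0
    sensitivity = len(hits & census_tested) / len(census_tested) if census_tested else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    tested = {f"lnc{i}" for i in range(1, 11)}
    predicted = {"lnc1", "lnc2", "lnc3", "lnc4"}
    census = {"lnc1", "lnc2", "lnc9"}
    print(benchmark_against_census(predicted, census, tested))
```

Restricting sensitivity to tested census genes avoids penalising a method for missing genes that were absent from its input annotation.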
Please send suggestions for new cancer-related lncRNAs, or any other comments, to Andrés Lanzós (email@example.com).
You can download the table here
Cancer LncRNA Census List