Projects.old | goldlab

Transposable elements: address codes for lncRNAs?

The sequence domains underlying long noncoding RNA (lncRNA) activities remain largely unknown. We have proposed that these domains can originate from neofunctionalised fragments of transposable elements (TEs), otherwise known as RIDLs (Repeat Insertion Domains of Long Noncoding RNA). A small but growing number of RIDLs have been identified.

We are interested in identifying RIDLs and understanding how they contribute to lncRNAs' biological activity. An important challenge here is to distinguish true RIDLs from a far more numerous background of neutrally evolving, non-functional TEs. We have recently found evidence that a subset of TE types experiences evolutionary selection in the context of lncRNA exons, and that their host lncRNAs tend to be functionally validated and associated with disease.

A separate, emerging question in the lncRNA field is how the specific subcellular locations of lncRNAs are encoded in their primary sequence. We hypothesised that RIDLs may play a role in directing lncRNAs to subcellular compartments, and found evidence to support this in a recent study. Using global localisation data from human cell lines, we identified a role for evolutionarily conserved L2b, MIRb and MIRc elements in regulating the nuclear/cytoplasmic distribution of lncRNAs. These findings point to important roles for repetitive sequences in lncRNA localisation, and raise the question of what other activities they may encode.

For more information see these papers:

The RIDL hypothesis; Johnson and Guigó (2014)

TEs and Localisation; Carlevaro-Fita et al. (2019)
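
As a rough illustration of the nuclear/cytoplasmic comparison described in this section, the sketch below computes a log2 cytoplasmic/nuclear expression ratio per transcript and compares the median between transcripts with and without a given repeat. It is a minimal sketch under stated assumptions, not the analysis from the study: the function names, the pseudocount and all expression values are invented for illustration.

```python
import math
from statistics import median


def log2_cn_ratio(cyto, nuc, pseudocount=0.01):
    """log2 of cytoplasmic over nuclear expression.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    Negative values mean the transcript is relatively more nuclear.
    """
    return math.log2((cyto + pseudocount) / (nuc + pseudocount))


# Invented toy data: (name, cytoplasmic expression, nuclear expression, has_repeat)
transcripts = [
    ("lnc1", 2.0, 8.0, True),
    ("lnc2", 4.0, 1.0, False),
    ("lnc3", 3.0, 6.0, True),
    ("lnc4", 2.0, 2.0, False),
]


def median_ratio_by_group(rows):
    """Median log2 ratio for transcripts with (True) and without (False) the repeat."""
    groups = {True: [], False: []}
    for _name, cyto, nuc, has_repeat in rows:
        groups[has_repeat].append(log2_cn_ratio(cyto, nuc))
    return {flag: median(vals) for flag, vals in groups.items() if vals}


result = median_ratio_by_group(transcripts)
print("with repeat:", result[True])
print("without repeat:", result[False])
```

A real analysis would use many transcripts per group and a statistical test rather than a bare comparison of medians.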

LncRNAs that drive cancer

Tumours develop through the acquisition of driver mutations that enable cells to replicate uncontrollably and invade other sites. Genes containing driver mutations are attractive therapeutic targets, but it remains unknown whether these can include long noncoding RNAs. As part of our work in the International Cancer Genome Consortium's PCAWG (PanCancer Analysis of Whole Genomes) project, we have developed ExInAtor, a bioinformatic pipeline to identify driver lncRNAs using somatic mutations from whole tumour genomes.

ExInAtor, a pipeline for the discovery of cancer driver lncRNAs using somatic mutations. Lanzós A. et al, Sci Rep (2017).
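
To give a flavour of how somatic mutations can point to candidate driver genes, the sketch below tests whether a gene's exons carry more mutations than expected from a local background region, using a one-sided binomial test. This is a simplified illustration under our own assumptions and should not be read as ExInAtor's implementation: the function names and the example counts are hypothetical, and the test ignores sequence context and other covariates.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exon_enrichment_pvalue(exon_mut, exon_len, bg_mut, bg_len):
    """One-sided p-value for an excess of mutations in exons.

    Null hypothesis: mutations fall uniformly over exons plus background,
    so each of the observed mutations lands in an exon with probability
    exon_len / (exon_len + bg_len).
    """
    total = exon_mut + bg_mut
    if total == 0:
        return 1.0  # no mutations observed, so no evidence of enrichment
    p_exon = exon_len / (exon_len + bg_len)
    return binomial_upper_tail(exon_mut, total, p_exon)


# Hypothetical gene: 1 kb of exons with 5 mutations, 9 kb of background with 5.
print(exon_enrichment_pvalue(exon_mut=5, exon_len=1000, bg_mut=5, bg_len=9000))
```

Direct summation is adequate for the small counts typical of a single gene; for very large counts a library implementation of the binomial tail would be more robust.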

CRISPR-Cas9 tools for lncRNAs

CRISPR-Cas9 is a powerful and versatile genome-editing tool that has revolutionised the field of long noncoding RNA research. We have developed a suite of tools for single-gene and high-throughput functional analysis of lncRNAs. Our DECKO (Dual Excision CRISPR Knockout) vector system delivers paired sgRNAs for deletion of lncRNAs or other genomic elements. These sgRNA pairs can be designed using CRISPETa (CRISPR Paired Excision Tool) by both non-specialists and bioinformaticians. Both DECKO and CRISPETa are scalable from single-gene to genome-wide experiments. Most recently, we have developed CASPR (CRISPR Analysis for Single and Paired RNA-guides), an end-to-end tool for CRISPR screen analysis.

DECKO, a simple and scalable vector system for CRISPR-based deletion experiments. Aparicio-Prat E. et al. BMC Genomics (2015).

CRISPETa, a tool for the design of CRISPR deletion experiments. Pulido-Quetglas C. et al. PLOS Comput. Biol. (2017).
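
The idea of paired sgRNAs flanking a deletion target can be sketched as follows: take candidate guides on either side of the target region and rank every upstream/downstream pair by a combined score. This is only an illustration of the pairing concept, not CRISPETa's algorithm. The `Guide` type, the scores, the coordinates and the `flank` window are all hypothetical.

```python
from itertools import product
from typing import NamedTuple


class Guide(NamedTuple):
    name: str
    position: int  # cut-site coordinate (hypothetical)
    score: float   # hypothetical quality score, higher is better


def rank_deletion_pairs(guides, target_start, target_end, flank=1000, top_n=3):
    """Return the top_n (upstream, downstream) guide pairs flanking the target.

    Guides inside the target region are ignored; pairs are ranked by the
    sum of the two guide scores.
    """
    upstream = [g for g in guides if target_start - flank <= g.position < target_start]
    downstream = [g for g in guides if target_end < g.position <= target_end + flank]
    pairs = product(upstream, downstream)
    ranked = sorted(pairs, key=lambda pair: pair[0].score + pair[1].score, reverse=True)
    return ranked[:top_n]


# Invented candidate guides around a target spanning positions 1000 to 2000.
candidates = [
    Guide("g1", 900, 0.9),
    Guide("g2", 950, 0.5),
    Guide("g3", 2100, 0.8),
    Guide("g4", 2500, 0.6),
    Guide("g5", 1500, 0.99),  # lies inside the target, so it is never paired
]

best = rank_deletion_pairs(candidates, target_start=1000, target_end=2000)
for up, down in best:
    print(up.name, down.name, up.score + down.score)
```

A practical design tool would also weigh off-target potential and the resulting deletion size, which this sketch leaves out.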

LncRNA & Heart Regeneration

Heart regeneration is a biological process of utmost importance to medicine. LncRNAs are known to play critical roles in the molecular networks mediating the response of cardiac cells to damage, such as that caused by infarction. We use a variety of strategies to identify lncRNAs involved in this process.

Mending broken hearts: cardiac development as a basis for adult heart regeneration and repair. Mei Xin et al. Nature Reviews Molecular Cell Biology volume 14, pages 529–541 (2013)

Project in collaboration with Thierry Pedrazzini (CHUV, Lausanne), Raffaela Santoro (Univ. of Zurich) and Mauro Giacca (King's College, London). Funded by the Swiss National Science Foundation through the Sinergia programme.

LncRNA Characterisation

To date, only 2% of lncRNAs have been functionally characterised, and little is known about their molecular mechanisms and transcript properties. Bioinformatic tools developed for protein-coding genes are generally ineffective for lncRNAs, complicating their classification and the prediction of their functions. A major bottleneck is the lack of a framework for categorising lncRNAs and understanding how molecular functions are encoded in their sequence.

In this context, we are especially focused on the characterisation of lncRNAs through the study of their subcellular localisation and embedded repeat elements. See more information here.

The RIDL hypothesis. Johnson and Guigó (2014)

Repeats and subcellular localisation. Carlevaro-Fita et al. (2019)

LncRNA & Diagnostics

We are excited about the possibility of developing rapid and cheap diagnostics for early-stage diseases based on RNA profiles. The RNA content, or transcriptome, of an organ presents a dynamic and data-rich snapshot of its physiological state, so RNA sequencing could make such diagnostics possible. In a collaborative project with Josep Comin-Colet, Director of the Cardiac Insufficiency Program at the Hospital del Mar (Barcelona), and funded by La Marató de TV3, we are developing new RNA biomarkers in cardiac disease.

GENCODE Annotation

Genomics depends on the availability of high-quality annotations, or curated collections of genes. The worldwide reference annotation of lncRNAs in human and mouse is managed by GENCODE, a collaboration led by the Wellcome Trust Sanger Institute and funded by the National Human Genome Research Institute (NHGRI). Since 2010, RJ has contributed to this project, first through the landmark paper (Derrien, Johnson et al.), and more recently through the development of Capture Long Read Sequencing (CLS) to improve and expand lncRNA annotations (Lagarde et al.).

Find the raw CLS data here

Find the publication here