At GOLD Lab we strive to make all our data and tools freely available.
Below you will find descriptions of and links to some resources we have created. We'd love to hear your feedback about them.
Plasmids: We deposit our most useful plasmids on Addgene.
Sequencing data: Our high-throughput sequencing datasets are available at Gene Expression Omnibus.
Software: Our software and pipelines can be accessed through GitHub.
ExInAtor: Cancer Driver LncRNA Discovery
ExInAtor is a pipeline for discovering cancer driver lncRNAs with elevated mutational burden. It requires as input a gene annotation and sets of somatic mutations.
The latest version is accessible from GitHub.
In the original study (Lanzos et al.), we tested around 6000 lncRNAs from GENCODE V19 with more than 20 million somatic variants from 1112 whole genomes. Below is a list of candidate lncRNA drivers discovered at a false discovery rate (FDR) cutoff of <0.1.
Legend: Pc - PanCancer; Ex mut - number of exonic mutations; Ex len - total exonic length of gene (bp); Intr mut - number of background mutations; Intr length - background region length for gene (bp); Pval - uncorrected p-value; Qval - q-value, equivalent to False Discovery Rate; Ex mut rate - exonic mutation rate (mutations per kb); Intr mut rate - background mutation rate (mutations per kb); Ratio - ratio of mutation rates in exonic and background regions; Super_pc - Super pancancer, all cancers combined.
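The rate and ratio columns in the legend can be reproduced from the count and length columns. The following is a minimal sketch under stated assumptions, not the ExInAtor implementation: the `GeneCounts` container, the function names and the example numbers are hypothetical and chosen only to mirror the legend.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Per-gene inputs named after the legend columns (hypothetical container)."""
    ex_mut: int    # Ex mut: number of exonic mutations
    ex_len: int    # Ex len: total exonic length (bp)
    intr_mut: int  # Intr mut: number of background mutations
    intr_len: int  # Intr length: background region length (bp)


def rate_per_kb(mutations: int, length_bp: int) -> float:
    """Mutation rate expressed as mutations per kilobase."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return mutations * 1000 / length_bp


def exonic_to_background_ratio(gene: GeneCounts) -> float:
    """Ratio of the exonic mutation rate to the background mutation rate."""
    ex_rate = rate_per_kb(gene.ex_mut, gene.ex_len)
    intr_rate = rate_per_kb(gene.intr_mut, gene.intr_len)
    if intr_rate == 0:
        # No background mutations: the ratio is undefined, reported here as infinity.
        return float("inf")
    return ex_rate / intr_rate


# Made-up example values, not taken from the published table.
example = GeneCounts(ex_mut=12, ex_len=2000, intr_mut=30, intr_len=20000)
example_ratio = exonic_to_background_ratio(example)
```

With these made-up numbers the exonic rate is 6 mutations per kb, the background rate is 1.5 mutations per kb, and the ratio is 4. A ratio above 1 only indicates an elevated exonic burden; the Pval and Qval columns come from the pipeline's own statistical test, which this sketch does not attempt to reproduce.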
Cancer LncRNA Census (CLC)
The Cancer LncRNA Census is a continuous effort to identify and catalogue lncRNA genes that have been implicated in cancer. In contrast to other catalogues, to be included in CLC a lncRNA must be (1) causally implicated in cancer, and (2) annotated by GENCODE. The updated CLC2 includes not only lncRNAs from literature resources but also candidate cancer lncRNAs from mutagenesis and CRISPRi screens.
The analysis pipeline and inclusion criteria are documented in the preprint article. The original CLC and the CLC2 can be used for two main analyses:
a) Analysis of the genomic and functional properties of cancer-lncRNAs.
b) Benchmarking of lncRNA cancer-driver prediction methods, such as ExInAtor.
You can download the table here.
Double Excision CRISPR Knockout (DECKO)
Effective and practical loss-of-function technologies have been a major challenge in long non-coding RNA (lncRNA) research. CRISPR/Cas9 genome-editing technology now allows permanent or temporal perturbation of lncRNAs, at both low and high throughput.
We developed a vector system called DECKO (Double Excision CRISPR Knockout) for deletion of lncRNAs or any other genomic element. This system applies a simple two-step cloning to generate lentiviral vectors expressing two single guide RNAs (sgRNAs) simultaneously.
Thanks to its design, DECKO can be used for pooled library screens.
The main characteristics of the system are:
The pair of sgRNAs is driven by two different promoters, to avoid interference.
The vector contains two selectable markers: Puromycin and mCherry fluorescent protein for FACS applications.
It can be used for transfection or lentiviral infection.
Adaptable for single-gene targeting and large-scale pooled libraries.
See the original paper for further details.
Vectors available here: https://www.addgene.org/Rory_Johnson/
The latest DECKO protocol can be found here.
CRISPETa is a pipeline for paired sgRNA design, such as that required by DECKO. It is available in both user-friendly webserver and standalone versions. CRISPETa can be run on any number of targets, from one to thousands. At present, designs can be performed for five genomes: human, mouse, zebrafish, fruit fly and worm.
Access it here: http://crispeta.crg.eu/
Because molecular functions depend on physical interactions, and physical interactions in turn depend on co-localization, we expect the subcellular localization of lncRNAs to be crucial for understanding their roles and regulation in cells. Hence, we approach the categorization of lncRNAs by creating subcellular maps of lncRNAs in human cell lines. These maps provide the basis for studying the gene features, sequence domains and post-transcriptional regulation of lncRNAs that share the same subcellular fate.
An important source of confusion over RNA localisation arises from how we define nuclear / cytoplasmic enrichment for polyA+ RNA. Conventionally, relative localisation is defined as the ratio of concentrations of RNA between two compartments. By contrast, absolute localisation is defined as the ratio of the number of molecules between two compartments. We recently developed a method to infer absolute localisation from RNA-seq data, and compared the results to relative quantification from the same cells. More information can be found in our recent article (Carlevaro-Fita and Johnson, manuscript submitted). The entire set of quantifications can be downloaded here:
We have also made relative quantification maps available to the research community through a webserver: lncATLAS (see below).
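The difference between the relative and absolute definitions given above can be illustrated with a small calculation. This is only a sketch of the two definitions, not the inference method from the article: it assumes, hypothetically, that the number of molecules in a compartment equals the transcript's concentration in that compartment multiplied by the total amount of RNA the compartment holds. All names and numbers are illustrative.

```python
def relative_localisation(conc_cyto: float, conc_nuc: float) -> float:
    """Ratio of RNA concentrations between cytoplasm and nucleus."""
    return conc_cyto / conc_nuc


def absolute_localisation(
    conc_cyto: float,
    conc_nuc: float,
    total_cyto: float,
    total_nuc: float,
) -> float:
    """Ratio of molecule numbers between cytoplasm and nucleus.

    Assumption (not from the source): molecules = concentration x total
    amount of RNA in the compartment, in matching arbitrary units.
    """
    molecules_cyto = conc_cyto * total_cyto
    molecules_nuc = conc_nuc * total_nuc
    return molecules_cyto / molecules_nuc


# Illustrative values: the transcript is twice as concentrated in the nucleus,
# but the cytoplasm holds four times as much RNA overall.
relative = relative_localisation(conc_cyto=2.0, conc_nuc=4.0)
absolute = absolute_localisation(
    conc_cyto=2.0, conc_nuc=4.0, total_cyto=8.0, total_nuc=2.0
)
```

In this example the relative measure is 0.5, which reads as nuclear enrichment, while the absolute measure is 2.0, meaning most molecules are in the cytoplasm. The same transcript can therefore appear nuclear by one definition and cytoplasmic by the other.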
LncATLAS is a user-friendly web-based visualization tool to view and download the relative localization of lncRNAs in human cells based on RNA-seq. The website is available at: http://lncatlas.crg.eu
For a given lncRNA of interest, using lncATLAS you can:
1) Inspect the cytoplasmic-nuclear localisation of your gene of interest (GOI) in 15 different cell lines.
2) Inspect the cytoplasmic-nuclear localisation of your GOI with respect to the distribution of mRNAs and lncRNA genes.
3) Inspect the localisation of your GOI at sub-compartment level.
Usage and data sources are described here: Mas-Ponte et al. (2017).