V. Single-cell transcriptomics: a powerful tool to analyze cell composition of tissues with high
V.1 Challenges of scRNA-seq
Despite the many improvements already introduced in the experimental protocols of single-cell transcriptomics, cell dissociation probably remains one of the most important and challenging steps of the whole process, even though it essentially relies on protocols that were established long ago, when cell culture methods were being developed. From one sample to another, cell dissociation can be easy or difficult, and there is no guarantee that the chosen protocol will quantitatively extract all the cells to be analysed.
Various factors can contribute to poor cell recovery: cell-cell interactions, when cells form strong connections with their neighbours, such as tight junctions, as in human adult kidney or lung, which makes enzymatic digestion particularly inefficient (dissociation can be less challenging in developmental studies performed on fetal or infant tissue); cellular morphology, as in adipocytes, which become very fragile after dissociation; cellular ultrastructure, as in cardiac or muscular tissues, where myoblasts fuse into myotubes; and cell size, as in neurons, where isolating even a single cell can be very tedious. Each of these situations requires an accurate and efficient dissociation protocol, designed for the tissue and cell type of interest and taking into account the specificities of the sample to be analysed. It is important to avoid cell death and transcriptomic modifications caused by the dissociation process. Controls have to be performed in order to identify the changes in single-cell profiles that can result from dissociation. For example, comparison with in situ measurements can help to determine which changes were introduced during dissociation.
When cell dissociation is ineffective, or too damaging to the cells, an alternative is to isolate nuclei and perform single-nucleus RNA-seq (Habib et al., 2017). This technique, in which nuclei can be isolated directly from frozen tissue, quantifies nuclear RNA. Surprisingly, there is a good correlation between this type of measurement and the more classical quantification of cytoplasmic RNA (or, more likely, of the pooled cytoplasmic and nuclear RNA). A major advantage of this technique is that it allows the profiling of archived material.
V.1.2 Cell isolation
The main challenge of the technique is, in fact, the quantitative isolation of all cell types in a “healthy” state, so that they can be analysed. The situation is very different for immune cells, which are non-adherent, and for most other cell types, which must be dissociated from their tissue. The challenge is to dissociate the cells of the tissue of interest using enzymatic or mechanical techniques, while avoiding cell death and transcriptome modifications due to dissociation.
Once a single cell suspension is obtained, there are different methods to isolate individual cells for their analysis.
a. The first protocols used fluorescence-activated cell sorting (FACS). A major advantage of this approach is the ability to select the cells that will be analysed, using for instance immunofluorescent markers. This makes it possible to isolate rare cell populations. Several protocols have been adapted to combine FACS with scRNA-seq, and the use of micro-well plates allows cells to be loaded into minuscule wells, with a corresponding reduction in reagent volume. Nevertheless, a main disadvantage of the approach remains the relatively large volumes of reagents required at the different steps, which makes it quite expensive despite a relatively low throughput (a few hundred cells).
b. Other systems developed for single-cell isolation are based on microfluidics. One of them, the C1 Single-cell Autoprep from the company Fluidigm, isolates 96 or 800 cells in individual chambers inside a chip (IFC, Integrated Fluidic Circuit). Inside these chambers, the cells are lysed and their mRNA is processed in very small volumes (nanoliters), which greatly reduces reagent consumption, and hence the cost of each experiment, while improving efficiency. Another advantage is the possibility to image the cells inside their chambers, in order to check for doublets. The C1 machine also makes it possible to implement and develop novel protocols. It is with this device that our group developed a cost- and labor-effective 5’ selective single-cell transcriptomic profiling approach suitable for Ion Torrent and Illumina sequencers (Arguel et al., 2017). The disadvantages of this system are: (1) a low capture efficiency: capturing only 96 cells requires a suspension of thousands of cells, which is not always available; (2) a restriction on the size of the cells loaded in the IFC: different IFCs exist for different chamber sizes, so each cell suspension has to be uniform in size; (3) a high rate of doublets, which can only be detected by careful imaging of each chamber, which is not possible with an 800-cell IFC.
c. Droplet-based microfluidic methods, such as inDrop (Klein et al., 2015), Drop-seq (Macosko et al., 2015) or Chromium from 10X Genomics (Zheng et al., 2016). These methods load single cells into aqueous droplets emulsified in an oil phase, in which they are combined with barcoded beads and the reagents necessary for cell lysis, reverse transcription and cDNA amplification. These methods are now widely used. They allow thousands of single cells to be analysed in a single experiment, and work well with limited amounts of isolated cells.
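The balance between throughput and doublets in droplet-based methods can be illustrated with a simple loading model. The sketch below assumes that the number of cells per droplet follows a Poisson distribution with mean `lam`; this model, the function names and the example value are illustrative assumptions and are not taken from the protocols cited above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k cells in a droplet (Poisson assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def droplet_occupancy(lam: float) -> dict:
    """Fractions of empty, single-cell and multi-cell droplets.

    lam is the assumed mean number of cells per droplet (hypothetical parameter).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of cell-containing droplets that hold more than one cell.
        "multiplet_rate_among_occupied": p_multiple / (1.0 - p_empty),
    }


if __name__ == "__main__":
    for name, value in droplet_occupancy(0.1).items():
        print(f"{name}: {value:.4f}")
```

Under this assumption, diluting the cell suspension lowers the share of droplets containing several cells, at the price of a larger share of empty droplets.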
V.1.3 Cell barcoding.
Adding a barcode to all the cDNAs generated from each single cell allows multiplexing: at the end of sequencing, the signal can be deconvoluted and each transcript attributed to an identified cell in a single step. The approach substantially reduces technical biases, since most of the steps are performed in a single tube for all the cells. Fewer reagents and less processing are required, and reproducibility is increased. Islam et al. published the first scRNA-seq protocol using cell barcoding (Islam et al., 2011).
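The deconvolution step can be sketched as follows. This is a minimal illustration, assuming that each sequenced fragment is available as a pair of strings (a tag read starting with the cell barcode, and the cDNA read); the read layout, the barcode length and the optional list of expected barcodes are hypothetical and do not correspond to a specific protocol.

```python
from collections import defaultdict


def demultiplex(reads, barcode_len, known_barcodes=None):
    """Group cDNA reads by the cell barcode found at the start of the tag read.

    reads: iterable of (tag_read, cdna_read) string pairs (assumed layout).
    barcode_len: number of bases of the tag read that encode the cell barcode.
    known_barcodes: optional set of expected barcodes; other reads are dropped.
    """
    per_cell = defaultdict(list)
    for tag_read, cdna_read in reads:
        barcode = tag_read[:barcode_len]
        if known_barcodes is not None and barcode not in known_barcodes:
            continue  # barcode not expected (for example a sequencing error)
        per_cell[barcode].append(cdna_read)
    return dict(per_cell)


# Toy example with 4-base barcodes (made-up sequences).
toy_reads = [
    ("ACGTTTTT", "GGCATC"),
    ("TTAGTTTT", "CCGTAA"),
    ("ACGTTTTT", "ATTGCA"),
]
cells = demultiplex(toy_reads, barcode_len=4)
```

In this toy example, the two reads tagged `ACGT` are attributed to one cell and the read tagged `TTAG` to another.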
Cells captured in droplets or in micro-well plates are barcoded using beads. These beads carry the RT primers (generally a poly(T)) containing the cell barcode sequence, so that every cDNA from a single cell is barcoded within its compartment. The drawback of multiplexing is the large number of barcodes needed (as many as the number of cells to be processed), which can make this step very costly when analysing large cell populations. There are different ways to avoid this. One is to combine short barcodes into longer ones (H. C. Fan, Fu and Fodor, 2015; Klein et al., 2015). With this method, it is possible to reach 147 456 barcodes. In a second approach, very long random barcodes are synthesized; they have to be long to avoid the same barcode being produced twice during synthesis (Macosko et al., 2015; Gierahn et al., 2017). This allows the synthesis of 16.7 million barcodes.
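The two figures quoted above can be recovered with simple arithmetic. The sketch below assumes that 147 456 corresponds to the combination of two sub-barcodes, each drawn from a pool of 384 sequences, and that 16.7 million corresponds to random barcodes of 12 nucleotides; these decompositions are consistent with the numbers but are stated here as assumptions. The last function is an additional illustration, assuming barcodes are assigned to cells uniformly at random.

```python
def combinatorial_barcodes(pool_size: int, parts: int) -> int:
    """Number of barcodes obtained by joining `parts` sub-barcodes of `pool_size` choices."""
    return pool_size ** parts


def random_barcodes(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct random barcodes of a given length (A, C, G, T by default)."""
    return alphabet_size ** length


def shared_barcode_probability(n_cells: int, n_barcodes: int) -> float:
    """Probability that a given cell shares its barcode with at least one other cell,
    assuming barcodes are drawn uniformly at random (illustrative assumption)."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


print(combinatorial_barcodes(384, 2))  # 147456
print(random_barcodes(12))             # 16777216
print(shared_barcode_probability(1000, combinatorial_barcodes(384, 2)))
```

The last line shows why the barcode space has to be much larger than the number of cells processed: the chance of two cells receiving the same barcode grows with the number of cells.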
V.1.4 Untargeted amplification of transcriptomes.
As mentioned before, standard bulk RNA-seq requires nanograms to micrograms of RNA. A single cell contains in the range of 1 to 50 pg of RNA. A solution to obtain sufficient material from so little RNA is the reverse transcription of the mRNA into cDNA, followed by amplification.
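The gap between these two scales gives an order of magnitude for the amplification required. The sketch below is a rough illustration only: it assumes an idealized PCR in which the amount of material is multiplied by (1 + efficiency) at each cycle, and it ignores losses during reverse transcription as well as the fact that only part of the cellular RNA is mRNA. The function name and the example amounts are hypothetical.

```python
import math


def min_pcr_cycles(input_pg: float, target_pg: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles needed to reach target_pg from input_pg.

    efficiency = 1.0 means perfect doubling at each cycle (idealized assumption).
    """
    if input_pg <= 0 or target_pg <= 0:
        raise ValueError("amounts must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    fold = target_pg / input_pg
    if fold <= 1:
        return 0
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


# From 10 pg (single-cell scale) to 1 ng = 1000 pg (bulk scale): 100-fold.
cycles = min_pcr_cycles(10, 1000)
```

With perfect doubling, a 100-fold increase needs 7 cycles (2^7 = 128); any efficiency below 1 increases this number.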
For untargeted reverse transcription (RT) of the mRNA, most techniques use the poly(A) tail of the mRNA to prime the RT with a poly(T) oligonucleotide that contains an adaptor sequence. Once the first strand of cDNA has been generated, one possibility is that a terminal transferase adds a poly(A) tail to the 3’ end of the first strand, so that a second poly(T) oligonucleotide, carrying a different adaptor, can prime the synthesis of the second strand (Tang et al., 2009).
Fig. 27. Schematic of the single-cell whole-transcriptome analysis. After cell lysis, the mRNAs are reverse-transcribed into cDNAs using a poly(T) primer with anchor sequence (UP1) and unused primers are digested. Poly(A) tails are added to the first-strand cDNAs at the 3’ end, and second-strand cDNAs are synthesized using poly(T) primers with another anchor sequence (UP2). Then cDNAs are evenly amplified by PCR using UP1 and UP2 primers, fragmented, and P1 and P2 adaptors are ligated to the ends. From (Tang et al., 2009).
The most widely used method to ensure full-length reverse transcription of the mRNA relies on a Template-Switching Oligo (TSO). This method exploits the capacity of reverse transcriptases from the Moloney murine leukemia virus to add a few nucleotides, mostly cytosines, when the RT reaches the end of the first strand. The TSO can anneal to this short added sequence, which allows the reverse transcriptase to switch templates and copy the TSO, so that an adaptor sequence is incorporated at the end of the cDNA and can be used to synthesize the second strand. This method was used by Islam and colleagues (Islam et al., 2011) in their single-cell tagged reverse transcription (STRT) protocol.
V.1.5 Unique Molecular Identifiers
The group of Fu developed Unique Molecular Identifiers (UMIs) (Fu et al., 2011), a powerful strategy to control the biases introduced by the multiple cycles of polymerase chain reaction (PCR) that are necessary to amplify the cDNA. To count single molecules, the technique, called “stochastic labelling”, consists in attaching a random set of labels to one end of the cDNA. With this technique, they converted a population of identical DNA molecules into a population of distinct DNA molecules, differing by a random sequence located at one end. UMIs are used as an internal validation control. During the RT, each cDNA molecule is tagged with a random sequence acting as a UMI. Counting UMIs therefore reflects the number of initial, unamplified cDNA molecules, and is much less biased than using the number of reads per kilobase per million reads (RPKM). This approach was then implemented in scRNA-seq by Islam (Islam et al., 2014) and Jaitin (Jaitin et al., 2014).
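The difference between counting reads and counting UMIs can be sketched as follows. This is a minimal illustration, assuming that each read has already been assigned a cell barcode, a gene and a UMI sequence; the tuple layout and the toy values are hypothetical, and UMI sequencing errors are ignored.

```python
from collections import defaultdict


def count_umis(reads):
    """Return {(cell, gene): (read_count, umi_count)}.

    reads: iterable of (cell_barcode, gene, umi) tuples (assumed layout).
    Reads sharing the same cell, gene and UMI are treated as PCR copies
    of a single original cDNA molecule and are counted once.
    """
    read_counts = defaultdict(int)
    umi_sets = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1
        umi_sets[(cell, gene)].add(umi)
    return {key: (read_counts[key], len(umi_sets[key])) for key in read_counts}


# Toy example: three reads for one gene in cell "c1", two of them PCR duplicates.
toy_reads = [
    ("c1", "geneA", "AAGT"),
    ("c1", "geneA", "AAGT"),
    ("c1", "geneA", "CCTA"),
    ("c2", "geneA", "AAGT"),
]
counts = count_umis(toy_reads)
```

In this toy example, cell `c1` has 3 reads but only 2 distinct UMIs for `geneA`, so the molecule count is 2 regardless of how strongly each molecule was amplified.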
Fig. 28. A schematic representation of the labeling process. An example showing four identical target molecules in solution. Each DNA molecule randomly captures and joins with a label by choosing from a large, nondepleting reservoir of m labels. Each resulting labeled DNA molecule takes on a new identity and is amplified to detect the number of k distinct labels. From (Fu et al., 2011).
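The relation between the number of molecules and the number k of distinct labels observed can be sketched with a standard occupancy calculation. The code below assumes that each of n molecules picks one of m labels independently and uniformly at random from a nondepleting reservoir, so that the expected number of distinct labels is m(1 - (1 - 1/m)^n); the function names and the example values are illustrative and are not taken from Fu et al.

```python
import math


def expected_distinct_labels(n: int, m: int) -> float:
    """Expected number of distinct labels when n molecules each pick one of m labels."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def estimate_molecules(k: float, m: int) -> float:
    """Invert the expectation above: estimate n from k observed distinct labels."""
    if not 0 <= k < m:
        raise ValueError("k must satisfy 0 <= k < m")
    return math.log(1.0 - k / m) / math.log(1.0 - 1.0 / m)


# With few molecules and many labels, nearly every molecule gets its own label.
few = expected_distinct_labels(4, 1000)
# With many molecules, labels start to be reused and k underestimates n.
many = expected_distinct_labels(500, 1000)
corrected = estimate_molecules(many, 1000)
```

When n is small compared with m, k is close to n and can be used directly as a molecule count; when n approaches m, the inversion gives a corrected estimate under the same uniform-labelling assumption.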