scRNA-seq

scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data

Like all other supervised and unsupervised alignment methods, scAlign makes an underlying assumption that the two or more conditions used as input make sense biologically to align. That is, alignment methods assume that there are at least some common cell types between conditions that share some functional origin or similarity and that should be matched across conditions, even if they differ in state (e.g., expression) due to condition or stimulus. To the best of our knowledge, there is no procedure or strategy for identifying datasets that should not be aligned due to a lack of matching cell types. As a result, any alignment method applied to datasets that contain unrelated or dissimilar cell types can potentially produce false positive matches. This limitation is not specific to alignment methods; scRNA-seq analysis tools designed for other purposes, such as trajectory inference, assume that a trajectory exists in the input data in the first place, and will return a trajectory regardless of whether it makes sense to do so. scRNA-seq tools in general are useful for generating hypotheses (in the case of alignment, hypotheses about which cell types match across conditions and how they differ), but downstream users need to apply them cautiously.

scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation

Hypothermic preservation of intact tissues, as used during organ transplant procedures, has been optimized to reduce the effects of ischemia (lack of blood supply) and hypoxia (oxygen deficiency) during storage at 4 °C [29]. Clinically, kidneys are transplanted after a median cold ischemic time of 13 h and a maximum of around 35 h; lungs after a median of 6.4 h and a maximum of 14 h. However, the human kidney and pancreas maintain their function even after 72 h of storage in University of Wisconsin solution, and the liver for up to 30 h [30]. Wang et al. [31] demonstrated that intact mouse kidneys could be stored in HypoThermosol FRS medium for up to 72 h before dissociation and scRNA-seq without altering the transcriptomic profile or cellular heterogeneity of kidney-resident immune cells. For human tissue research, this method has two major advantages. First, it requires no processing of the sample at the collection site: the clinician can immerse an intact piece of tissue in cold HypoThermosol FRS solution and store or ship it on ice to the receiving laboratory, where all other tissue processing takes place. This can be done in a standardized and reproducible way. Second, it uses a commercially available, chemically defined, non-toxic, and ready-to-use hypothermic preservation solution designed to mimic clinical organ preservation.

scBFA: modeling detection patterns to mitigate technical noise in large-scale single-cell genomics data

A surprising finding was that selecting highly expressed genes (HEGs) led to systematically better cell type identification for every tested method in almost all datasets, compared with selecting highly variable genes (HVGs) (Fig. 3c). HVG selection is, anecdotally, the standard criterion by which variable genes are selected during preprocessing [42]; this suggests that, at least for cell type identification, HEG selection may improve performance regardless of the method used. Although single-cell genomic data from other modalities, such as scATAC-seq, have a data structure similar to that of scRNA-seq data, the analysis tools and pipelines developed to date for the two technologies are largely mutually exclusive. Here, we show that scBFA generalizes to other single-cell genomic modalities and outperforms existing methods for cell type identification on scATAC-seq datasets as well, even those that take advantage of auxiliary data such as transcription factor motifs and distance to transcription start sites [35]. We expect our results to generalize to other single-cell genomic modalities, such as single-cell methylation or histone modification data.
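To make the two gene-selection criteria concrete, here is a minimal Python sketch that ranks genes by mean count (HEG) and by variance (HVG), plus the binarization of counts into detected/not-detected calls that detection-pattern models such as scBFA take as input. It is an illustration under stated assumptions, not scBFA's implementation: the function names and toy counts are hypothetical, and plain variance stands in for the mean-adjusted dispersion measures many pipelines use for HVG selection.

```python
from statistics import mean, pvariance


def top_k_genes(counts, k, score):
    """Return the k gene names with the highest score (ties broken by name).

    counts: dict mapping gene name -> list of per-cell counts.
    score:  function mapping one gene's list of counts to a number.
    """
    ranked = sorted(counts, key=lambda gene: (-score(counts[gene]), gene))
    return ranked[:k]


def select_heg(counts, k):
    # Highly expressed genes: rank by mean count across cells.
    return top_k_genes(counts, k, mean)


def select_hvg(counts, k):
    # Highly variable genes: rank by population variance across cells
    # (a simplification; real pipelines usually correct for the mean).
    return top_k_genes(counts, k, pvariance)


def detection_matrix(counts):
    # Binarize counts: 1 if the gene was detected in the cell, else 0.
    return {gene: [1 if c > 0 else 0 for c in values]
            for gene, values in counts.items()}


counts = {
    "GeneA": [50, 52, 48, 51],  # high and stable expression
    "GeneB": [0, 20, 0, 20],    # moderate but highly variable expression
    "GeneC": [1, 0, 1, 0],      # low expression
}
print(select_heg(counts, 1))  # ['GeneA']
print(select_hvg(counts, 1))  # ['GeneB']
```

On this toy matrix the two criteria disagree: the stable, highly expressed GeneA tops the HEG ranking, while GeneB tops the HVG ranking.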

Human bone marrow assessment by single-cell RNA sequencing, mass cytometry, and flow cytometry

New technologies for characterizing cell populations are being implemented to more deeply describe the cell surface receptor phenotype and gene transcriptional signature at the single-cell level (1, 2). Benefits of single-cell approaches include examination of heterogeneity within the sample, and the most recent advances permit use of samples with very limited cell numbers for high-dimensional characterization of cell surface phenotype or transcriptome. Single-cell RNA sequencing (scRNA-Seq) has been used to elucidate hematopoietic differentiation (3–5) and immune cell subsets (6), including dendritic cells and monocytes (7), and innate lymphoid cells (8). Mass cytometry has been applied to the study of tissue-infiltrating immune cells (e.g., melanoma, ref. 9; renal cell carcinoma, ref. 10; lung cancer, ref. 11; and breast cancer, ref. 12).

A validated single-cell-based strategy to identify diagnostic and therapeutic targets in complex diseases

network science have been applied to analyze genome-wide data from different diseases [25, 26]. We and others have used such methods to identify biomarkers and therapeutic targets based on bulk expression profiling data of individual cell types [12, 27, 28], as well as to develop a mathematical framework to rank network nodes [29]. A core concept is that the most interconnected nodes in a network tend to be the most important. Indeed, a large body of evidence supports that such analyses can be formalized and used to find crucial nodes in a wide range of systems, from proteins essential for cell survival to relevant web pages in a Google search [30, 31]. Because many cell types are not accessible from patients, we started with a mouse disease model. We focused on a mouse model of antigen-induced arthritis (AIA) because it allows potential analysis of all cells in the target organ (the joints) and adjacent lymph nodes. We used our recently developed method for translational scRNA-seq [32]. The resulting multicellular disease models (MCDMs) and complementary analyses of patients with rheumatoid arthritis (RA) and 174 other diseases supported a highly complex multicellular pathogenesis. Our analyses indicate that network analyses of the MCDMs can help to prioritize cell types and genes for diagnostics and therapeutics. The general applicability of our strategy was supported by prospective diagnostic studies of 151 patients with 13 autoimmune, allergic, infectious, malignant, endocrine, metabolic, and cardiovascular diseases, as well as 53 age- and sex-matched controls. The therapeutic potential of the strategy was supported by network-based analyses of these diseases, as well as a study of the mouse model of arthritis. Taken together, our results indicate that our strategy may have the potential to prioritize therapeutic and diagnostic targets in complex diseases.
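The excerpt does not describe the node-ranking framework of [29]. As a hedged illustration of the core idea that highly interconnected nodes tend to be important, the Python sketch below ranks the nodes of an undirected network by degree centrality, the simplest connectivity measure; the function name and node names are hypothetical.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Rank nodes of an undirected network by degree (number of distinct neighbors).

    edges: iterable of (u, v) pairs; self-loops and duplicate edges are ignored.
    Returns (node, degree) pairs sorted by decreasing degree, then by node name.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a self-loop connects a node to nothing else
        neighbors[u].add(v)
        neighbors[v].add(u)
    return sorted(((node, len(nbrs)) for node, nbrs in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D"), ("C", "C"), ("B", "A")]
print(degree_ranking(edges))  # [('A', 3), ('C', 2), ('D', 2), ('B', 1)]
```

Degree is only one choice; eigenvector-based scores such as PageRank, the idea behind the web-search example, also credit a node for being connected to other well-connected nodes.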

Going the Distance: Optimizing RNA-Seq Strategies for Transcriptomic Analysis of Complex Viral Genomes

Nowadays, many research labs have either direct (integrated into the research group) or indirect (as a collaboration or core facility) access to bioinformaticians who can turn raw sequencing data into lists of regulated genes supported by P values corrected for multiple testing. To ensure success, it is crucial to involve these individuals in the planning stages so that subsequent analyses can be tailored to the viral genome of choice and to avoid the pitfalls of applying “one size (does not) fit all” approaches. Planning discussions should address reproducibility (biological replicates), batch effects, and the availability and quality of gene annotations for the organism(s) of interest. Establishing and optimizing analytical pipelines using test data sets before generating the final experimental data sets can also help to identify and preempt critical issues that might otherwise necessitate experimental redesign and resequencing, an expensive proposition in both time and money. It is also critical that bioinformaticians move away from standard RNA-Seq analysis pathways when dealing with viruses and, with guidance, become aware of the biological characteristics and genome structure of the virus being studied. For instance, the existence of polycistronic arrays in herpesviruses limits scRNA-Seq gene expression analyses, because all transcripts generated across a polycistronic unit share the same 3′ end. This can significantly affect the alignment of viral reads to a transcriptome, because many scRNA-Seq software packages by default discard reads that map equally well to the 3′ ends of multiple transcripts. Thus, polycistronic genes that share the same 3′ end must be represented as a single transcription unit, which diminishes the yield of biologically relevant information.
Naturally, it is also critical that the 3′ ends of these transcription units are accurately mapped before embarking on scRNA-Seq projects; otherwise, meaningful biological data will likely end up being discarded. This is a frequent problem with viral annotations that specify only the boundaries of the coding sequences (ORFs) rather than the transcript as a whole. The reads obtained during scRNA-Seq are typically limited to the 3′ untranslated region (UTR) and may not be correctly assigned to a recognized gene. Another potential confounder is that many viral genomes contain duplicated regions, which can cause short or long sequence reads to be automatically discarded (no single mapping location) or their distribution to be distorted (reads not allocated correctly between duplicated units), skewing the resulting TPM values.
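The collapsing step described above can be sketched in Python: transcripts on the same sequence and strand whose 3′ ends coincide are grouped into one transcription unit. This is a minimal sketch, assuming 1-based inclusive coordinates and exact 3′-end matches; the function names, transcript IDs, and coordinates are hypothetical, and real annotations may need a tolerance window around each 3′ end.

```python
from collections import defaultdict


def three_prime_end(start, end, strand):
    # With 1-based inclusive coordinates, a plus-strand transcript ends at its
    # highest coordinate and a minus-strand transcript at its lowest.
    return end if strand == "+" else start


def collapse_by_three_prime_end(transcripts):
    """Group transcripts sharing a 3' end into single transcription units.

    transcripts: iterable of (transcript_id, seq_name, start, end, strand).
    Returns a dict mapping (seq_name, strand, three_prime_coordinate) to the
    sorted list of transcript IDs in that unit.
    """
    units = defaultdict(list)
    for tid, seq_name, start, end, strand in transcripts:
        units[(seq_name, strand, three_prime_end(start, end, strand))].append(tid)
    return {key: sorted(ids) for key, ids in units.items()}


# Hypothetical polycistronic array: tx1-tx3 share one 3' end on the plus strand.
transcripts = [
    ("tx1", "virus", 100, 2000, "+"),
    ("tx2", "virus", 900, 2000, "+"),
    ("tx3", "virus", 1500, 2000, "+"),
    ("tx4", "virus", 2100, 3000, "-"),
]
for unit, members in collapse_by_three_prime_end(transcripts).items():
    print(unit, members)
```

Counting reads per unit rather than per transcript keeps 3′-biased reads from being discarded as multi-mapping, at the cost described in the text: expression can no longer be attributed to individual genes within the array.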

Discovering myeloid cell heterogeneity in the lung by means of next generation sequencing

“secretion”, “regulation of cell migration” and “extracellular matrix organization”. Using scRNA-seq on a bleomycin-induced lung fibrosis mouse model, Aran et al. [63] identified a profibrotic macrophage subpopulation expressing the specific markers CX3CR1 and SiglecF, which localized at sites of fibrotic scarring where Pdgfra+ and Pdgfrb+ fibroblasts accumulated. This finding suggests that this macrophage subpopulation plays an important role in regulating fibroblasts. Regarding the source of the profibrotic macrophages, the study showed that they partially shared a gene expression profile with both alveolar and interstitial macrophages, suggesting a transitional state of resident lung macrophages that is initiated following injury. In addition to macrophages, distinct monocytes characterized as Ceacam1+Msr1+Ly6C−F4/80−Mac1+ and termed segregated-nucleus-containing atypical monocytes (SatMs) were found in the bleomycin-induced fibrosis mouse model, suggesting a role for these cells in the progression of fibrosis. Notably, the differentiation of SatMs was dependent on CCAAT/enhancer binding protein β (C/EBPβ), which usually plays a crucial role in the maturation and differentiation of granulocytes [37]. These results indicate that targeting myeloid cells is a potential novel strategy for the prevention and therapy of lung fibrosis.

The small intestine, an underestimated site of SARS-CoV-2 infection: from Red Queen effect to probiotics

The public scRNA-seq data of human ileum immune cells (GSE134809) were downloaded. The data were generated from Crohn's disease lesions; only the uninflamed samples were included in our analysis. To remove batch effects, the data were integrated following the Seurat integration procedure. The first 15 principal components were used for two-dimensional t-distributed stochastic neighbor embedding (t-SNE). A k.param of 10 was used in the function FindNeighbors, and a resolution of 1.5 was used in the function FindClusters. The identity of each cluster was defined by the marker genes for each cell type.
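For readers unfamiliar with the k.param setting, the sketch below shows the kind of graph FindNeighbors builds: a k-nearest-neighbor search on PCA coordinates, followed by a shared-nearest-neighbor (SNN) graph weighted by the Jaccard overlap of neighbor sets, with weak edges pruned. It is a simplified, brute-force Python sketch, not Seurat's implementation, whose details differ; the function names, toy coordinates, and small k are hypothetical, and the 1/15 pruning threshold is our assumption of Seurat's prune.SNN default. FindClusters then partitions such a graph by modularity optimization, with the resolution parameter controlling how many clusters result.

```python
import math


def knn_sets(points, k):
    """For each point, the indices of its k nearest points by Euclidean distance.

    A point is at distance 0 from itself, so it normally counts as one of its
    own k neighbors. Ties are broken by index.
    """
    sets = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: (math.dist(p, points[j]), j))
        sets.append(set(order[:k]))
    return sets


def snn_graph(points, k, prune=1 / 15):
    """Shared-nearest-neighbor graph weighted by the Jaccard index of kNN sets.

    Returns a dict mapping (i, j) with i < j to the edge weight; edges with
    weight below `prune` are dropped.
    """
    nn = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nn[i] & nn[j])
            if shared == 0:
                continue
            weight = shared / len(nn[i] | nn[j])
            if weight >= prune:
                edges[(i, j)] = weight
    return edges


# Two well-separated toy groups in a 2-D "PCA" space (hypothetical values).
pcs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(snn_graph(pcs, k=3))
```

Cells in the same group share all of their neighbors and get weight 1.0, while no edges cross between groups, which is the structure a modularity-based clustering step then picks up.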

DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data

Dropout values in scRNA-seq experiments are a serious issue for bioinformatic analyses, as most bioinformatics tools have difficulty handling sparse matrices. In this study, we present DeepImpute, a new algorithm that uses deep neural networks to impute dropout values in scRNA-seq data. We show that DeepImpute not only has the highest overall accuracy across various metrics and a wide range of validation approaches, but also offers faster computation with lower memory demand. In both simulated and experimental datasets, DeepImpute improves clustering results and the identification of significantly differentially expressed genes, even when other imputation methods are not desirable. Furthermore, it is a very

Huh_unc_0153D_18545.pdf

A couple of parameters need to be specified by the user for better results. The resolution parameter strongly influences the clustering results because it sets the 'granularity' of the clustering. Larger values lead to a greater number of clusters, and the Seurat developers find that setting this parameter between 0.6 and 1.2 typically returns good clustering results. Another parameter that needs to be specified is the number of PCs to use in the clustering. Users can examine the plot of the standard deviation of the principal components and choose a cutoff at the elbow of the plot. This decision can be rather arbitrary when there is no clear elbow. Another option is to use the built-in JackStrawPlot function in the Seurat R package to identify significant PCs. Overall, Seurat includes all the important features of scRNA-seq clustering; however, the ambiguous choices required for the resolution parameter and the number of PCs may pose a challenge for potential users. Aside from clustering, Seurat provides many convenient downstream analysis features, including finding differentially expressed genes and t-SNE visualization.
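When the elbow is not obvious, one common way to make the choice reproducible is the maximum-distance heuristic: pick the point on the standard-deviation curve farthest from the straight line joining its first and last points. The Python sketch below implements that heuristic; it is not part of Seurat, the function name is hypothetical, and the standard deviations in the example are made up.

```python
import math


def elbow_index(values):
    """Return the 1-based position of the elbow of a decreasing curve.

    The elbow is taken to be the point farthest (perpendicular distance) from
    the line through the first and last points. Curves with fewer than three
    points have no interior elbow, so the last position is returned.
    """
    n = len(values)
    if n < 3:
        return n
    x1, y1, x2, y2 = 1, values[0], n, values[-1]
    line_length = math.hypot(x2 - x1, y2 - y1)
    best_pos, best_dist = 1, -1.0
    for pos, y in enumerate(values, start=1):
        # Perpendicular distance from (pos, y) to the line through the endpoints.
        dist = abs((y2 - y1) * pos - (x2 - x1) * y + x2 * y1 - y2 * x1) / line_length
        if dist > best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


pc_sd = [10.0, 6.0, 3.0, 1.2, 1.1, 1.0, 0.9, 0.8]  # hypothetical PC standard deviations
print(elbow_index(pc_sd))  # 4
```

The result depends on how many PCs are plotted, because the endpoints define the reference line, so it complements rather than replaces inspection of the plot or JackStraw-based significance testing.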

Comparative analysis of sequencing technologies for single cell transcriptomics

fragments. In cPAS-based sequencing, the template cDNA is first fragmented and size selected (200–500 bp). The template undergoes four sequential rounds of adaptor ligation, circularization, and cleavage, generating a final circularized template with four unique adaptors. The circular templates undergo rolling circle amplification (RCA) to produce DNA nanoballs (DNBs) made of concatemers, which are then immobilized and sequenced on a flow cell using combinatorial probe-anchor synthesis (cPAS). Across the flow cell, the DNBs bind to an anchor and a fluorescent probe (complementary to the adaptors). The probes are degenerate (apart from the first position) and capture the first base at either end of the anchor. Each sequencing cycle consists of removing the previous probe, re-ligating to the same anchor with different fluorescent probes, and determining the sequence. This cycle is repeated for each of the remaining three adaptor sequences to generate paired-end reads (reviewed in [8]). The BGISEQ-500 platform has previously been applied to the detection of small noncoding RNAs [9], human genome re-sequencing [10], and palaeogenomic ancient DNA sequencing [11], but not to scRNA-seq.

SCRABBLE: single-cell RNA-seq imputation constrained by bulk RNA-seq data

All imputation methods above recover dropout values using scRNA-seq data only. Here, we describe the SCRABBLE algorithm for imputing scRNA-seq data using bulk RNA-seq data as a constraint. SCRABBLE only requires that the single-cell and bulk data come from a consistent cell population. The bulk data represent the unfractionated composite mixture of all cell types, without sorting them into individual types. For many scRNA-seq datasets, bulk data on the same cells or tissue already exist, and it is becoming increasingly common to collect matched bulk data when a new scRNA-seq experiment is performed. Bulk RNA-seq data allow SCRABBLE to estimate gene expression distributions across cells more accurately than single-cell data alone. SCRABBLE is based on a matrix regularization framework that does not assume specific statistical distributions for gene expression levels or dropout probabilities. It also does not force the
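The role of the bulk constraint can be illustrated with a small Python sketch that measures how far the average (pseudo-bulk) profile of an imputed cells-by-genes matrix is from the bulk profile, after scaling both to relative expression. This is only an illustration of the idea under our own assumptions: SCRABBLE's actual matrix-regularization objective and optimizer are not reproduced here, and the function names and numbers are hypothetical.

```python
def relative_profile(values):
    # Scale a non-negative expression vector so that it sums to 1.
    total = sum(values)
    if total == 0:
        raise ValueError("profile has zero total expression")
    return [v / total for v in values]


def bulk_consistency_penalty(cells, bulk):
    """Squared distance between relative pseudo-bulk and relative bulk profiles.

    cells: list of per-cell expression vectors (same gene order as bulk).
    bulk:  bulk RNA-seq expression vector for the same genes.
    """
    n_genes = len(bulk)
    if any(len(cell) != n_genes for cell in cells):
        raise ValueError("every cell must have one value per bulk gene")
    pseudo_bulk = [sum(cell[g] for cell in cells) / len(cells) for g in range(n_genes)]
    p, q = relative_profile(pseudo_bulk), relative_profile(bulk)
    return sum((a - b) ** 2 for a, b in zip(p, q))


bulk = [30, 10]              # gene 1 is three times as abundant as gene 2
observed = [[0, 1], [3, 1]]  # gene 1 dropped out in the first cell
imputed = [[3, 1], [3, 1]]   # a candidate imputation
print(bulk_consistency_penalty(observed, bulk))  # about 0.045
print(bulk_consistency_penalty(imputed, bulk))   # 0.0
```

A penalty of this kind rewards imputations whose average profile agrees with the bulk measurement, which is how matched bulk data can steer dropout recovery.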

EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data

Here, we propose a new method for detecting empty droplets in droplet-based single-cell RNA sequencing (scRNA-seq) data. We estimate the profile of the ambient RNA pool and test each barcode for deviations from this profile using a Dirichlet-multinomial model of UMI count sampling. Barcodes with significant deviations are considered to be genuine cells, thus allowing recovery of cells with low total RNA content and small total counts. We combine our approach with a knee point filter to ensure that barcodes with large total counts are always retained. Using a variety of simulations, we demonstrate that our method outperforms methods based on a simple threshold on the total UMI count. We also apply our method to several real datasets where we are able to recover more cells from both existing and new cell types.
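The testing idea can be sketched in Python: estimate an ambient profile from barcodes with low total counts, then ask how unusual each barcode's counts are under that profile via a Monte Carlo p-value computed against simulated barcodes with the same total. This sketch makes simplifying assumptions: it uses a plain multinomial instead of the Dirichlet-multinomial, a pseudocount instead of the Good-Turing estimate, and omits the knee-point filter and multiple-testing correction. The function names and counts are hypothetical; the reference implementation is the emptyDrops function in the Bioconductor package DropletUtils.

```python
import math
import random
from collections import Counter


def ambient_profile(barcodes, lower, pseudocount=1.0):
    """Estimate ambient RNA proportions from barcodes with total count <= lower.

    barcodes: dict mapping barcode -> list of per-gene UMI counts.
    The pseudocount keeps every proportion above zero.
    """
    n_genes = len(next(iter(barcodes.values())))
    pooled = [0.0] * n_genes
    for counts in barcodes.values():
        if sum(counts) <= lower:
            for g, c in enumerate(counts):
                pooled[g] += c
    total = sum(pooled) + pseudocount * n_genes
    return [(x + pseudocount) / total for x in pooled]


def multinomial_log_prob(counts, probs):
    # log P(counts) under a multinomial with the given proportions.
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return log_p


def monte_carlo_p_value(counts, probs, n_sim=1000, seed=0):
    """Fraction of simulated ambient barcodes (same total) at least as unlikely."""
    rng = random.Random(seed)
    observed = multinomial_log_prob(counts, probs)
    genes = range(len(probs))
    hits = 0
    for _ in range(n_sim):
        draw = Counter(rng.choices(genes, weights=probs, k=sum(counts)))
        if multinomial_log_prob([draw[g] for g in genes], probs) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


barcodes = {  # hypothetical UMI counts for three genes
    "empty1": [8, 1, 1], "empty2": [9, 1, 0], "empty3": [7, 2, 1],
    "cell1": [2, 1, 47], "ambientlike": [38, 7, 5],
}
ambient = ambient_profile(barcodes, lower=10)
for bc in ("cell1", "ambientlike"):
    print(bc, monte_carlo_p_value(barcodes[bc], ambient, n_sim=200))
```

The cell-like barcode, dominated by a gene that is rare in the ambient pool, gets the smallest attainable p-value, whereas the ambient-like barcode does not stand out.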

Genotype-free demultiplexing of pooled single-cell RNA-seq

Here we introduce a simple, accurate, and efficient tool, mainly for droplet-based scRNA-seq, called "scSplit", which uses a hidden state model approach to demultiplex individual samples from mixed scRNA-seq data with high accuracy. Our approach does not require genotype information from the individual samples to demultiplex them, which also makes it suitable for applications where genotypes are unavailable or difficult to obtain. scSplit uses existing bioinformatics tools to identify putative variant sites from scRNA-seq data, then models the allelic counts to assign cells to clusters using an expectation-maximisation framework.
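The expectation-maximisation step can be illustrated with a generic Python sketch: each sample (cluster) has an alternative-allele frequency at each variant site, each cell's reference/alternative counts are modeled as binomial draws, and EM alternates between soft cell-to-sample assignments and frequency updates. This is a minimal sketch under those assumptions, not scSplit's model or code; it ignores doublets, and the function names, farthest-point initialization, smoothing constants, and toy counts are choices made for illustration.

```python
import math


def _alt_fraction(ref, alt, c, s):
    # Alt-allele fraction of cell c at site s, or None if the site is not covered.
    depth = ref[c][s] + alt[c][s]
    return None if depth == 0 else alt[c][s] / depth


def _cell_distance(ref, alt, c1, c2):
    # Mean absolute difference in alt-allele fraction over sites covered in both cells.
    diffs = []
    for s in range(len(ref[c1])):
        f1, f2 = _alt_fraction(ref, alt, c1, s), _alt_fraction(ref, alt, c2, s)
        if f1 is not None and f2 is not None:
            diffs.append(abs(f1 - f2))
    return sum(diffs) / len(diffs) if diffs else 0.0


def em_demultiplex(ref, alt, n_samples, n_iter=50, eps=1e-3):
    """Assign cells to n_samples groups from per-site allele counts with EM.

    ref, alt: one list per cell of reference / alternative allele counts per site
    (0 and 0 where a site is not covered). n_iter must be at least 1.
    Returns (assignments, theta); theta[k][s] is group k's alt frequency at site s.
    """
    n_cells, n_sites = len(ref), len(ref[0])
    # Deterministic farthest-point initialization from dissimilar seed cells.
    seeds = [0]
    while len(seeds) < n_samples:
        candidates = [c for c in range(n_cells) if c not in seeds]
        seeds.append(max(candidates, key=lambda c: (
            min(_cell_distance(ref, alt, c, seed) for seed in seeds), -c)))
    theta = [[(alt[c][s] + 0.5) / (ref[c][s] + alt[c][s] + 1) for s in range(n_sites)]
             for c in seeds]
    weights = [1.0 / n_samples] * n_samples
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each cell (binomial
        # coefficients are omitted because they do not depend on the group).
        resp = []
        for c in range(n_cells):
            logs = []
            for k in range(n_samples):
                ll = math.log(weights[k])
                for s in range(n_sites):
                    ll += alt[c][s] * math.log(theta[k][s]) + ref[c][s] * math.log(1 - theta[k][s])
                logs.append(ll)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate group weights and smoothed alt-allele frequencies.
        for k in range(n_samples):
            r = [resp[c][k] for c in range(n_cells)]
            weights[k] = max(sum(r) / n_cells, 1e-12)
            for s in range(n_sites):
                a = sum(r[c] * alt[c][s] for c in range(n_cells))
                d = sum(r[c] * (alt[c][s] + ref[c][s]) for c in range(n_cells))
                theta[k][s] = min(max((a + eps) / (d + 2 * eps), eps), 1 - eps)
    assignments = [max(range(n_samples), key=lambda k: resp[c][k]) for c in range(n_cells)]
    return assignments, theta


# Hypothetical counts at 4 sites: cells 0-3 from one donor, cells 4-7 from another.
ref = [[5, 0, 5, 0], [4, 0, 6, 1], [5, 0, 5, 0], [5, 0, 0, 0],
       [0, 5, 0, 5], [0, 5, 1, 5], [0, 5, 0, 5], [0, 0, 0, 5]]
alt = [[0, 5, 0, 5], [0, 6, 0, 4], [0, 5, 0, 5], [0, 5, 0, 5],
       [5, 0, 5, 0], [6, 0, 4, 0], [5, 0, 5, 0], [5, 0, 5, 0]]
assignments, theta = em_demultiplex(ref, alt, n_samples=2)
print(assignments)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Group labels are arbitrary: the sketch recovers which cells belong together, not which donor each group comes from, which is why no reference genotypes are needed.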

Comprehensive single cell mRNA profiling reveals a detailed roadmap for pancreatic endocrinogenesis

co-activator Eya2, regulator of G-protein signaling 16 (Rgs16), semaphorin 3G (Sema3g), the cancer-associated cell-surface antigen six-transmembrane epithelial antigen of the prostate 1 (Steap1), and the uncharacterized gene Gm8773 (Fig. 5C; Fig. S2B). Notably, for most of these genes no function associated with pancreas development has been reported so far. However, genes such as Neurod2, Sult2b1 and Lynx1 are involved in neuronal development and function, which highlights the similarities of the developmental programs required for the formation of EPs and neuronal cells. We also found comparable expression levels of several of these signature genes in E15.5 pancreatic epithelial cells from WT and homozygous NVF pancreata, which further supports proper functioning of the fusion protein (Fig. S2C). Furthermore, scRNA-seq analysis of human fetal pancreatic cells has indicated the expression of some of these genes, including EYA2 and RGS16 in human endocrine progenitors (Ramond et al., 2018).

Single-cell transcriptome analysis of lineage diversity in high-grade glioma

PJ017 and PJ032 are notable not just because of their unbranched structure, strong immunological gene expression, and loss of neural lineage identity, but also because they are the only two tumors in our data set in which transformed glioma cells are a minority of the profiled cells. Based on scRNA-Seq, PJ017 is 48% myeloid cells, 5% T cells, and 45% transformed glioma cells; PJ032 is 57% myeloid cells and 43% transformed glioma cells. While the observation of extensive myeloid infiltration in tumors that express high levels of inflammatory markers is intriguing, it also raises the possibility that our observation arises from experimental cross-contamination during either mRNA capture or library construction. We compared the transformed cells in PJ017/PJ032 to those in the remaining tumors after stringent filtering of the differentially expressed genes to remove any genes that are more highly expressed in the myeloid compartment of these tumors and could therefore reflect cross-contamination (see “Methods”). Figure 5b shows that, despite our stringent filter, the transformed cells in PJ017/PJ032 express high levels of immune genes compared with the remaining tumors (Additional file 1: Table S3), suggesting that tumor cells expressing an immune-like signature may recruit myeloid cells. Interestingly, we were able to validate this finding in an independent patient cohort (Additional file 1: Figure S15) by re-analyzing an earlier, smaller-scale GBM data set from Patel et al., in which one of the five tumors profiled expressed this same signature at high levels among transformed cells [4].

Single-cell transcriptomics–based MacSpectrum reveals macrophage activation signatures in diseases

Next, we applied MacSpectrum to scRNA-seq profiles of adipose tissue macrophages (ATMs) to dissect their functional contribution to tissue homeostasis and their actions under stress (Figure 5, B–D). ATMs are the most abundant immune cells in white visceral fat (10) and play crucial roles in maintaining adipose tissue function and immunological homeostasis (11). ATMs increase significantly in quantity and proportion in obese visceral stroma, and the whole ATM population has been shown to shift from a less inflammatory M2-like to a proinflammatory M1-like activation state (13). Given such a distinct phenotypic shift and proportional change between M1-/M2-like states in lean and obese ATMs, single-cell transcriptome analysis would be expected to display clear clusters of M1 or M2 cells. In practice, however, lean and obese ATMs analyzed using the t-SNE algorithm overlapped with each other in a single cluster that was separate from the polarized M1 or M2 BMDM clusters (Figure 1D and Supplemental Table 1). Of note, classic M1/M2 signature genes were rarely or weakly expressed in ATMs and/or showed no obvious enrichment in either lean or obese populations (Supplemental Figure 1C), despite a report that obese ATMs contain more M1-like subsets (17).

Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data

Here, we systematically analyzed and compared seven SNV-calling methods on scRNA-seq data. We found that the detection performance of these tools depends strongly on read depth, genomic context, functional region, and variant allele frequency (VAF). When using SMART-seq2, the median sensitivities are above 90% for most tools for homozygous SNVs in high-confidence exons with sufficient read depth (more than 10). However, sensitivity decreases for all analyzed tools when detecting SNVs in regions with high GC content, high identity, or low mappability. In addition, few supporting reads and low variant ratios can also reduce sensitivity. Low read depth can result from biologically low expression or from technical bias such as dropout events in scRNA-seq. Our results suggest that improving sequencing methods to eliminate dropout events may greatly improve variant detection. The FDRs were generally low (< 1%) and were less affected by read depth or VAF than sensitivity was. Notably, SAMtools, FreeBayes, and Strelka2 achieved the best performance in most situations; among them, SAMtools exhibited higher sensitivity but lower specificity, especially when detecting SNVs located in high-identity regions or introns. FreeBayes showed high sensitivity at high VAFs, but its sensitivity decreased at low VAFs, and its specificity was not stable across datasets. Strelka2 showed stable TPRs and FDRs across genomic regions and datasets, while its sensitivity at low read depths was inferior to that of SAMtools and FreeBayes. In contrast, MuTect2 did not perform well in most cases, possibly because of the lack of matched normal samples. VarScan2 showed the highest specificity, but it needed more supporting reads to generate confident results.
Overall, our results highlight the importance of stratification, for example by genomic context or functional region, in variant calling for scRNA-seq data, which should be taken into account in future benchmarking studies and variant-calling applications.
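The stratified evaluation recommended above can be sketched in Python: compare a caller's SNVs with a truth set and report sensitivity (TPR) and FDR separately per stratum. In this sketch the function names, variant keys, read depths, and the depth cutoff of 10 (borrowed from the text) are illustrative assumptions; any labeling function, such as genomic context, functional region, or VAF bin, can replace the depth bins.

```python
from collections import defaultdict


def stratified_metrics(truth, calls, stratum_of):
    """Sensitivity (TPR) and FDR per stratum.

    truth, calls: sets of variant keys such as (chrom, pos, ref, alt).
    stratum_of:   function mapping a variant key to a stratum label.
    Sensitivity = TP / (TP + FN); FDR = FP / (TP + FP); None if undefined.
    """
    tp, fn, fp = defaultdict(int), defaultdict(int), defaultdict(int)
    for v in truth:
        (tp if v in calls else fn)[stratum_of(v)] += 1
    for v in calls - truth:
        fp[stratum_of(v)] += 1
    metrics = {}
    for s in set(tp) | set(fn) | set(fp):
        sensitivity = tp[s] / (tp[s] + fn[s]) if tp[s] + fn[s] else None
        fdr = fp[s] / (tp[s] + fp[s]) if tp[s] + fp[s] else None
        metrics[s] = (sensitivity, fdr)
    return metrics


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 80, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}
depth = {("chr1", 100): 30, ("chr1", 200): 4, ("chr2", 50): 25,
         ("chr2", 80): 3, ("chr3", 10): 5}


def depth_bin(v):
    # Stratify by read depth at the variant position (hypothetical depths above).
    return "depth>10" if depth[v[:2]] > 10 else "depth<=10"


for stratum, (sens, fdr) in sorted(stratified_metrics(truth, calls, depth_bin).items()):
    print(stratum, sens, fdr)
```

Pooled over all variants this toy caller would look mediocre (sensitivity 0.5, FDR 1/3); stratifying shows that it is perfect at high depth and fails entirely at low depth.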

CellFishing.jl: an ultrafast and scalable cell search method for single-cell RNA sequencing

The DEG detection method introduced in this paper assumes that the database contains enough cells to retrieve a small group of homogeneous neighbors with no biological differences, and that each UMI count follows a Poisson distribution. The former assumption is justified by the high-throughput nature of recent scRNA-seq experiments, and we have demonstrated its feasibility using the Tabula Muris data set; the latter has been experimentally verified by several works [8, 43, 44]. However, some highly expressed genes, such as Malat1, seem to be exceptions to these model assumptions; as a consequence, although Malat1 is unlikely to be related to biological differences between cells, it was falsely detected as a DEG in many cells in our experiment. We predict that this problem can be partially mitigated by replacing point estimation of the mean expression with interval estimation, such as Bayesian inference.
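The Poisson assumption can be made concrete with a Python sketch that tests one gene in one cell: the expected count is the mean over the cell's retrieved neighbors, and the observed count is compared against Poisson tail probabilities. The function names and counts are hypothetical and this is not CellFishing.jl's code; note that it uses exactly the point estimate of the mean that the text identifies as a source of false positives such as Malat1.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term.

    Suitable for moderate lam; exp(-lam) underflows to 0 for very large lam.
    """
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def poisson_de_pvalues(cell_count, neighbor_counts):
    """One-sided p-values for one gene in one cell versus its neighbors.

    Returns (p_up, p_down) = (P(X >= observed), P(X <= observed)) with
    X ~ Poisson(mean of the neighbors' counts), a point estimate.
    """
    lam = sum(neighbor_counts) / len(neighbor_counts)
    p_down = poisson_cdf(cell_count, lam)
    p_up = 1.0 - poisson_cdf(cell_count - 1, lam)
    return p_up, p_down


neighbors = [2, 2, 3, 1]  # hypothetical UMI counts of one gene in 4 neighbor cells
print(poisson_de_pvalues(10, neighbors))  # tiny p_up: candidate up-regulated gene
print(poisson_de_pvalues(2, neighbors))   # unremarkable
```

Replacing the single value of `lam` with an interval or a posterior distribution, as the authors suggest, would make such a test less overconfident when the neighbor mean is itself uncertain.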

Systematic expression analysis of ligand-receptor pairs reveals important cell-to-cell interactions inside glioma

revealed that IDH-A gliomas were highly infiltrated by microglia/macrophage cells, but they did not explore the interactions between tumor cells and macrophages in gliomas [9]. Cancer stem cells (CSCs) are also a small subpopulation of tumor cells that can self-renew and differentiate and are responsible for drug resistance and cancer recurrence [10–12]. CSCs and tumor-associated macrophages (TAMs) are enriched around blood vessels [13, 14], and both promote tumor growth through intercellular signaling that supports diverse biological processes [15, 16]; however, the interactions between TAMs and CSCs are less explored. Cell-to-cell communication between diverse cell types is mediated by specific pairs of secreted ligands and cell-surface receptors. Chakrabarti et al. found that macrophages may nourish stem cells and that the stem cells could secrete the ligand DLL1 to activate the Notch pathway, increasing the expression of Wnt ligands in macrophages and thereby promoting the function and survival of the stem cells [17]. Glioma CSCs were also found to recruit TAMs by secreting POSTN to support the growth of glioblastoma (GBM) [18]. Nevertheless, these findings came from functional experiments, which are time-consuming and limited to a single interaction at a time. scRNA-seq data provide great opportunities for interrogating the genome-wide crosstalk between glioma CSCs and TAMs.
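A common first pass over scRNA-seq data for such crosstalk is to score each known ligand-receptor pair by combining ligand expression in the sending cell type with receptor expression in the receiving cell type. The Python sketch below uses the product of the two mean expressions; this is one simple scoring choice among several used by ligand-receptor tools, not necessarily the method of this study, and the function name, gene names, and values are hypothetical.

```python
from statistics import mean


def lr_scores(expr, cell_types, pairs, sender, receiver):
    """Score ligand-receptor pairs from a sender to a receiver cell type.

    expr:       dict gene -> list of per-cell expression values (cell order
                matches cell_types).
    cell_types: list with one cell-type label per cell.
    pairs:      list of (ligand, receptor) gene names.
    Score = mean ligand expression in sender cells
            * mean receptor expression in receiver cells.
    Returns ((ligand, receptor), score) pairs sorted by decreasing score.
    """
    send_idx = [i for i, t in enumerate(cell_types) if t == sender]
    recv_idx = [i for i, t in enumerate(cell_types) if t == receiver]
    scores = {}
    for ligand, receptor in pairs:
        ligand_mean = mean(expr[ligand][i] for i in send_idx)
        receptor_mean = mean(expr[receptor][i] for i in recv_idx)
        scores[(ligand, receptor)] = ligand_mean * receptor_mean
    return sorted(scores.items(), key=lambda item: -item[1])


cell_types = ["CSC", "CSC", "TAM", "TAM"]
expr = {"L1": [4, 2, 0, 0], "R1": [0, 0, 3, 5],
        "L2": [1, 1, 0, 0], "R2": [0, 0, 1, 1]}
pairs = [("L1", "R1"), ("L2", "R2")]
print(lr_scores(expr, cell_types, pairs, sender="CSC", receiver="TAM"))
```

In practice, significance is often assessed by permuting cell-type labels, so that a high score is compared with what random groupings of cells would produce.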