Skip to contents

Description

View on DamageDetective website

Damaged cells are an artifact of single-cell RNA sequencing (scRNA-seq) data formed when cells succumb to stress before, or in the process of, being sequenced. As a result, the gene expression data captured for damaged cells fails to reflect viable cell states. Continuing an analysis with damage compromises its reliability as it is not possible to differentiate true from technical variability. Filtering damaged cells is therefore an essential step in scRNA-seq quality control (QC).

Current approaches detect damage according to deviations in cell-level QC metrics. This outlier-based detection assumes viable cells follow a unimodal distribution where being deviant is equivalent to being damaged. This assumption does not hold in heterogeneous scRNA-seq data and often introduces a filtering bias relative to cell type abundance. More recent approaches address this by detecting damage at a finer scale, computing deviations in isolation for populations with distinct distributions. However, this assumes all distinct distributions are associated with viable cell populations, meaning abundant damage is at risk of misclassification. Ultimately, the filtering decisions of current approaches are controlled by statistical definitions of deviation that do not always align with biological definitions of damage.

DamageDetective takes a different approach, rather than detecting damage by measuring the extent to which cells deviate from one another, it measures the extent to which cells deviate from artificially damaged profiles of themselves. This is inspired by the approach of DoubletFinder—a high-performing QC tool for filtering doublets. Using principal component analysis, it computes the proximity of true cells to sets of artificial cells with known levels of damage. The damage level of the set to which a true cell shows the highest proximity is assigned to the cell as a score ranging from 0 to 1. This score provides an intuitive scale for filtering damage that is standardised across cell types, sample origin, and experimental design.

DamageDetective requires only a count matrix to run and accepts input from the popularised R data types, Seurat and SingleCellExperiment objects, as well as direclty from alignment output (package tutorials). However, by accepting platform-agnostic input and providing output that is the same, it is designed to integrate seamlessly into any single-cell analysis workflow, including those outside of R.


Installation

Install DamageDetective from CRAN,

install.packages('DamageDetective')

Or through GitHub,

library(devtools)
devtools::install_github("AlicenJoyHenning/DamageDetective", build_vignettes = TRUE)

To verify installation, run the following to see if you can view the package vignette and the function help pages,

library(DamageDetective)
help(package = "DamageDetective")



Quick start

The demonstrations below can be followed immediately after loading the package and serve as a test to ensure all is running smoothly. For more detailed examples and explanations, please refer to the package articles available on our website.

Prepare input

Damage detection is carried out by the detect_damage function and requires a count matrix to run. We will use the dummy count matrix provided by DamageDetective, test_counts, a subset of the (kotliarov-pbmc-2020) PBMC dataset provided in the scRNAseq2^2 package.

library(DamageDetective)
library(Matrix)
data("test_counts", package = "DamageDetective")
dim(test_counts)
# [1] 32738   500


Select parameters for damage detection

ribosome_penalty

While detect_damage requires only a count matrix as input, there are optional parameters that control the implementation of the function. Of these, we recommend ribosome_penalty be adjusted for each dataset. This can be done automatically with select_penalty using the count matrix as input.

penalty <- select_penalty(
  count_matrix = test_counts,
  seed = 7
)
# Testing penalty of 1e-05...
# Testing penalty of 0.00501...
# Testing penalty of 0.01001...
# Testing penalty of 0.01501...
# Stopping early: dTNN is no longer improving.

penalty
# [1] 0.01001

filter_threshold

DamageDetective performs filtering using the damage level scores it computes and a threshold specifying the score above which cells will be filtered. This score reflects the estimated extent of damage in the cell and is taken directly from the proportion of cytoplasmic RNA loss simulated in the set of artificial cells that a true cell shows the greatest proximity to.

By default, DamageDetective offers the threshold of 0.7 where values greater than 0.7 reflect more permissive filtering and values closer to 0 reflect more stringent filtering. We recommend the default but suggest that if adjustments are made, they are informed by the output detect_damage plots, generate_plot = TRUE.


Run damage detection

Damage detection is run using the count matrix and ribosomal penalty as inputs. Below we have specified filter_counts parameter to be TRUE. This will use the default filter_threshold, 0.7, to detect cells for removal and returns the filtered count matrix.

# Perform damage detection
detection_results <- detect_damage(
  count_matrix = test_counts,
  ribosome_penalty = penalty,
  filter_counts = TRUE,
  seed = 7
)
# Simulating 1e-05 and 0.08 RNA loss...
# Simulating 0.1 and 0.3 RNA loss...
# Simulating 0.3 and 0.5 RNA loss...
# Simulating 0.5 and 0.7 RNA loss...
# Simulating 0.7 and 0.9 RNA loss...
# Computing pANN...

# View the resulting count matrix
dim(detection_results$output)
# [1] 32738   458

Output of test_counts with cells coloured according to the estimated level of damage


Alternatively, if filter_counts is set to FALSE, a data frame will be given as output containing the damage scores for each barcode. This is provided for the user if they wish to interact with the DamageDetective results directly. From here, a user can filter their data manually, as is done by filter_counts=TRUE automatically.

# Perform damage detection
detection_results <- detect_damage(
  count_matrix = test_counts,
  ribosome_penalty = penalty,
  filter_counts = FALSE,
  seed = 7
)
# Simulating cells between 1e-05 and 0.08 RNA loss...
# Simulating cells between 0.1 and 0.3 RNA loss...
# Simulating cells between 0.3 and 0.5 RNA loss...
# Simulating cells between 0.5 and 0.7 RNA loss...
# Simulating cells between 0.7 and 0.9 RNA loss...
# Computing pANN...

# View output
print(head(detection_results$output), row.names = FALSE)
#                     Cells DamageDetective DamageDetective_filter
#  TCTGGAAAGCCCAACC_H1B2ln6      0.03826609                   cell
#  CCGTTCATCGTGGGAA_H1B2ln2      0.23600000                   cell
#  CTTCTCTTCAGCCTAA_H1B2ln1      0.05565522                   cell
#  GGATTACAGGGATGGG_H1B2ln1      0.01044348                   cell
#  TCTATTGTCTGGTATG_H1B2ln2      0.02435478                   cell
#  ACGGGTCAGACAAGCC_H1B2ln6      0.27600000                   cell

# Filter matrix 
undamaged_cells <- subset(detection_results$output, DamageDetective < 0.7)
filtered_matrix <- test_counts[, undamaged_cells$Cells]
dim(filtered_matrix)
# [1] 32738   458

License

DamageDetective is made available for public use through the GNU AGPL-3.0

Authors

Alicen Henning
Stellenbosch University, Cape Town, South Africa
Bioinformatics and Computational Biology

This work was done under the supervision of Prof Marlo Möller, Prof Gian van der Spuy, and Prof André Loxton.

References

  1. McGinnis, C. S., Murrow, L. M., & Gartner, Z. J. (2019). DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems, 8(4), 329-337.e4. https://doi.org/10.1016/j.cels.2019.03.003

  2. Risso D, Cole M (2024). scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets. doi:10.18129/B9.bioc.scRNAseq https://doi.org/10.18129/B9.bioc.scRNAseq, R package version 2.20.0, https://bioconductor.org/packages/scRNAseq.