Skip to contents

Tutorial overview

The goal of DamageDetective is to simplify the process of making informed and reproducible damaged cell filtering decisions. This tutorial provides an outline of the steps needed to achieve this:

  • Data preparation

  • Parameter selection

  • Damaged cell detection

Ensure the following packages are installed and made available in your R environment to follow along:

  • DamageDetective, ggplot2, Seurat, SeuratData, scRNAseq, SingleCellExperiment, and patchwork.


Workflow overview

The DamageDetective workflow is made up of two stages, first, creating artificial profiles of damage using the cells of an input data set, and second, comparing the input cells to the artificial profiles.

This approach is inspired by DoubletFinder—a high-performing tool for filtering doublets, another prominent scRNA-seq artifact. Here, artificial profiles of doublets are created through the random combination of existing cells. All cells are processed before being compared in a principal component analysis (PCA) from which proximity measures are taken and used to determine the proportion of a true cell’s neighbours that are of artificial origin (proportion of artificial nearest neighbours, or pANN). Cells that show the highest pANN, i.e., have the highest proximity to artificial profiles, are ultimately labelled as doublets in accordance to anticipated doublet formation rate.

In DamageDetective, a score is similarly calculated for each cell based on PCA comparisons where scores closer to one indicate a greater proximity to artificially profiles and a higher likelihood of damage. This score is used to filter cells, either automatically using the default threshold or manually according to a user-defined threshold.

How these scores are calculated and what exactly they describe will be explored briefly before heading into the demonstrations.

Creating artificial damage

The loss of plasma membrane integrity is a characteristic of cellular damage that has long since been exploited to assess cell viability. Compromised membranes allow cellular content, including RNA, to escape, leading to shifts in quality control metrics such as a reduction in library size and complexity.

An additional and fundamental consequence of membrane integrity loss is a shift in library composition. This occurs because not all RNA within a cell is equally vulnerable to loss. RNA enclosed by additional membranes, such as within mitochondria or nuclear speckles, is more protected than RNA freely present in the cytoplasm. As a result, viable cells have a higher proportion of cytoplasmic RNA, whereas damaged cells—where RNA has escaped—appear to have an elevated proportion of protected RNA. This forms the basis for simulating damage in DamageDetective.

Artificial damage profiles are generated by simulating the loss of cytoplasmic RNA to reflect varying levels of cellular damage. Each artificial cell is assigned a target damage level between 0 and 1, drawn from a normal distribution centered between 0.65 and 1. This range is based on the assumption that losing at least 65 % of cytoplasmic RNA represents a substantial level of damage warranting exclusion from downstream analysis. To simulate this, transcripts are sampled from each cell without replacement, according to gene abundance within the cell, until the desired proportion of RNA loss is achieved.

Simulations are grouped by cluster from the input data to generate cell type–specific profiles of damage. The goal is to create a pool of artificially damaged cells that reflects all cell types present in a heterogeneous sample. Depending on the target damage level, these artificial cells retain some of the original gene expression signatures of their cell type while exhibiting features typical of damage—such as reduced complexity, smaller library sizes, and increased mitochondrial RNA—resulting from the simulated loss of cytoplasmic RNA. This provides a realistic basis for comparison and, ultimately, a means of estimating true damage.

Estimating true damage by comparison to artificial damage

DamageDetective estimates the damage level of true cells by measuring their similarity to artificially damaged cells. Like DoubletFinder, this measurement is rooted in PCA. However, instead of using full gene expression values, it uses cell quality control (QC) metrics, including library size, complexity, and mitochondrial RNA proportion—a biologically informative and computationally efficient alternative.

Using Euclidean distances between the PCA embeddings of cells, a set of kk nearest neighbours is identified for each cell via the HNSW (Hierarchical Navigable Small World) algorithm. This is optimized for high-dimensional data and implemented in R using C++ through the RcppHNSW package.

The proportions of a cell’s nearest neighbours that originate from each artificial profile (pANN) are calculated. In other words, if there are five clusters in a dataset, there will be five sets of artificial profiles and five pANN scores for each cell. After scaling according to the size of each artificial profile, the maximum pANN is retrieved for each cell. This score naturally lies between 0 and 1 and indicates the similarity of true cells to damaged profiles.

The proximity score forms the main output of DamageDetective and can be thought of as answering the question: on a scale of 0 to 1, how likely is it that this cell has lost a considerable amount of cytoplasmic RNA? Or, equivalently, how likely is it that this cell is damaged and should be filtered?



Data preparation

DamageDetective operates with single cell data in the form of a compressed, column-oriented sparse matrix (dgCMatrix) in R. This format efficiently handles the sparse nature of single cell data by indexing only non-zero elements within each column, i.e., only the rows containing actual expression values. We will begin by showing how three common single cell data storage types can be converted to this form to act as the starting point for damage detection.

These include,

  1. Alignment output files
  2. Seurat object
  3. SingleCellExperiment (sce) object

Convert data to sparse matrix format

1. Alignment output

Alignment output comes in the form of three files,

  • features.tsv, containing the gene names
  • barcodes.tsv, containing the cell identifiers
  • matrix.mtx, containing the input values (gene expression values).

These can be compiled using ReadMtx offered by Seurat which simplifies the count matrix compilation process into one function. This includes the processes of decompressing zipped files, mapping feature names to HGNC gene symbols, and converting the matrix to sparse format. The ReadMtx() output can be used directly as input for DamageDetective.

We will use publicly available alignment data from the 10X Genomics website for demonstration, specifically the ‘1k PBMCs from a healthy donor (v3)’ dataset, which can be downloaded here following the link named “Feature / cell matrix (filtered)”. This should begin the download of a 9.6 MB directory containing ‘filtered_feature_bc_matrix’ sub-directory housing the alignment files.

# Set the file paths relative to location on your device
matrix_file <- "~/Projects/filtered_feature_bc_matrix/matrix.mtx.gz"     
barcodes_file <- "~/Projects/filtered_feature_bc_matrix/barcodes.tsv.gz" 
features_file <- "~/Projects/filtered_feature_bc_matrix/features.tsv.gz" 

# Construct the sparse matrix
alignment_counts <- Seurat::ReadMtx(
  mtx = matrix_file,
  cells = barcodes_file,
  features = features_file
)

Note: Please adjust the location, ~/Projects, according to where the files are saved on your device.

2. Seurat object

Seurat is one of the most popular packages for single cell analysis in R. Along with an extensive suite of functions, it offers its own S4 class of data storage for housing the count matrix, aptly named the Seurat object. By default, the count matrix is already in compressed column-oriented sparse form in the Seurat object and can be used as input for DamageDetective after it is extracted from the assay slot.

You can use the SeuratData package to retrieve a publicly available Seurat object to test this. This object stores the data from peripheral blood mononuclear cells (PBMCs) sequenced using the 10X Genomics platform.

devtools::install_github('satijalab/seurat-data')
library(SeuratData)

# Retrieve the dataset of interest
SeuratData::InstallData("pbmc3k")
data("pbmc3k")

# Extract the count matrix
pbmc3k_counts <- pbmc3k[["RNA"]]$counts

3. sce object

Bioconductor offers its own suite of packages for single cell analysis in R with their own single cell data storage type known as the SingleCellExperiment (sce) object. Here, the count matrix is stored as a delayed matrix. This format enables memory-efficient operations by delaying computations until explicitly requested, rather than storing all values in memory upfront. However, DamageDetective computations require repeated access to the matrix and delays at each request greatly accumulate to slow down this process. We therefore convert the delayed matrix to a sparse matrix via a dense matrix intermediate.

You can use the scRNAseq package to retrieve a publicly available sce object to test this. This object stores PBMC data of multiple samples from a study investigating influenza and yellow fever vaccine responsiveness. We will focus on the “234_d0” sample coming from an individual with a high vaccine responsiveness.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scRNAseq")
library(scRNAseq)

# Retrieve multisample dataset
pbmc_sce <- scRNAseq::fetchDataset("kotliarov-pbmc-2020", "2024-04-18")

# Extract sample of interest
metadata <- SummarizedExperiment::colData(pbmc_sce)
sample_sce <- subset(metadata, sample == "234_d0")
sample_sce <- rownames(sample_sce)

# Subset and convert to sparse format
pbmc_counts <- SummarizedExperiment::assay(pbmc_sce, "counts")
sample_counts  <- pbmc_counts[, sample_sce]
sample_counts <- as.matrix(sample_counts)
sample_counts <- as(sample_counts, "dgCMatrix")

Note: Like many single cell analysis packages, DamageDetective can only operate with one single cell dataset at a time. Hence why we work with only the “234_d0” sample above.

Parameter selection

Once the count matrix is in the correct format, the detect_damage function of DamageDetective can be run immediately. However, detect_damage accepts additional parameters. These parameters can be divided into two categories, those that alter the computation and result in a different output, computational parameters, and those that adjust how the user receives the output but cannot change the output itself, aesthetic parameters. Both will be explored briefly.

Computational parameters

Dataset-defined computational parameters

Of the parameters that affect the output of the detection algorithm, some have only one possible input based on the data being investigated. In many cases, these require no alteration from the defaults. These include,

Parameter Default Description
organism "Hsap" Specifies the organism the data was derived from. Supports "Hsap" (human) and "Mmus" (mouse).
annotated_celltypes FALSE Indicates whether cell types are already annotated. If TRUE, ensures simulations are distributed evenly across known cell types.

To use a non-default organism, specify the strings used to identify mitochondrial and ribosomal genes, and optionally, a set of nuclear-enriched genes.
For example, the format for human data is:

organism <- list(mito_pattern = "^MT-",
                 ribo_pattern = "^(RPS|RPL)",
                 nuclear = c("NEAT1", "XIST", "MALAT1"))

User-defined computational parameters

The remaining computational parameters have a wider range of input possibilities. While the user is free to adjust these as they see fit, we recommend using the defaults provided as they resulted in consistent and intended performance during package development. This being said, the ribosome_penalty parameter must be adjusted according to the input data and is explored in more detail below.

Parameter Default Description
seed User-defined Ensures reproducibility of the algorithm. While the seed affects random sampling, its impact is minimal unless using cells with unusually high expression and low sparsity.
kN One third of the dataset size Number of nearest neighbours considered when calculating the proportion of nearby damaged cells. Higher values increase chance of false detection due to unrelated artificial neighbours.
target_damage c(0.65, 1) Range of cytoplasmic RNA loss used to simulate damage. Higher values simulate more severely damaged cells, forming the basis of the comparison with true cells.
pca_columns c("log.features", "log.counts", "mt.prop", "rb.prop") Variables used for principal component analysis during damage detection. Includes traditional and novel quality control metrics and transformations.
filter_threshold 0.5 Score threshold (0 to 1) used to determine which cells are removed. Lower values enforce stricter filtering, while higher values are more permissive.
ribosome_penalty 0.5 Adjusts the likelihood of ribosomal RNA loss during simulation, correcting for observed discrepancies where ribosomal RNA is retained more than expected.
Computing the ideal ribosome_penalty

ribosome_penalty is a value between 0 and 1 that is multiplied by the probability scores of ribosomal RNA loss, reducing the scores and making it less likely for the ribosomal transcripts to be sampled for loss.

The impact of changing ribosome_penalty can be explored using the plots generated from simulate_counts below. The idea here is to see how well the artificial cells generated with a selected penalty describe the true cells. In other words, how well the coloured dots superimpose the grey dots. You will see that as you increase the penalty, i.e. go from values closer to 1 to values closer to 0, the coloured dots shift from an extreme position on the left-hand side of the plot to a more central position. At what point along this range true cells exist is dataset-dependent but generally lies closer to 0. Where,

  • count_matrix is the data in a sparse matrix format as prepared above,
  • ribosome_penalty is a numeric between 0 and 1, and
  • damage_proportion is a number between 0 and 1 specifying the amount of artificial cells to create relative to the input data (setting this to a lower value makes the computation faster).


no_penalty <- DamageDetective::simulate_counts(
  count_matrix = alignment_counts,
  ribosome_penalty = 1,  
  damage_proportion = 0.25,  
  target_damage = c(0.6, 1), 
  plot_ribosomal_penalty = TRUE,
  seed = 7
)

no_penalty_plot <- no_penalty$plot + ggtitle("No penalty") + 
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

medium_penalty <- DamageDetective::simulate_counts(
  count_matrix = alignment_counts,
  ribosome_penalty = 0.2,  
  damage_proportion = 0.25,  
  target_damage = c(0.6, 1),
  plot_ribosomal_penalty = TRUE,
  seed = 7
)

med_penalty_plot <- medium_penalty$plot + 
  ggtitle("Average penalty (0.2)") + 
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

high_penalty <- DamageDetective::simulate_counts(
  count_matrix = alignment_counts,
  ribosome_penalty = 0.01,  
  damage_proportion = 0.25,  
  target_damage = c(0.6, 1), 
  plot_ribosomal_penalty = TRUE,
  seed = 7
)

high_penalty_plot <- high_penalty$plot + 
  ggtitle("Heavy penalty (0.01)") + 
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

no_penalty_plot | med_penalty_plot | high_penalty_plot 

Scatter plot showing the distribution of quality control  metrics in altered and unaltered counts.

*^* ribosome_penalty is a multiplicative reduction factor meaning a value of 1 is the same as introducing no penalty while values increasingly closer to zero introduce increasingly greater reductions.


From the above plots you can see how selecting an unideal ribosome_penalty can generate damaged profiles that do not describe the data well and, as a result, generate estimations of damage that do not describe the data well.

Selecting a ribosome_penalty that simulates RNA loss in a way that is relevant to the input data can be done through trial and error, similar to the plotting exercise above, or in an automated fashion using the select_penalty function,

selected_penalty <- DamageDetective::select_penalty(
  count_matrix = alignment_counts,
  max_penalty_trials = 5, 
  seed = 7, 
  verbose = TRUE
)
#> Testing penalty of 0.1...
#> Testing penalty of 0.15...
#> Testing penalty of 0.2...
#> Testing penalty of 0.25...
#> Stopping early: dTNN is no longer improving.
selected_penalty
#> [1] 0.2

As seen above, the select_penalty function generates artificial cells using different ribosome_penalty estimates and evaluates the similarity of the resulting artificial cells to true cells.

Here, ‘similarity’ refers to the shortest distance from each artificial cell to a true cell in PC space. The mean shortest distances are taken for each level of damage simulated. The penalty that generates the smallest mean distance across damage levels is selected as the ideal ribosome_penalty. In other words, a penalty that generates artificial cells that are the most similar to true cells over a wide range of damage levels is ideal. This can be viewed in greater detail by setting the return_output parameter to "full".

selected_penalty <- DamageDetective::select_penalty(
  count_matrix = alignment_counts,
  max_penalty_trials = 5, 
  seed = 7, 
  return_output = "full",
  verbose = TRUE
)

# View full results 
selected_penalty$penalty_results

# View selected penalty (lowest global mean)
selected_penalty$selected_penalty
#>   Penalty Global_mean     Mean_3     Mean_4     Mean_5     Mean_6
#> 1    0.10  0.07205541 0.04395754 0.04783755 0.06005843 0.13636814
#> 2    0.15  0.04990311 0.03003056 0.04784675 0.05462734 0.06710781
#> 3    0.20  0.05339491 0.03922892 0.04730901 0.05555464 0.07148706
#> 4    0.25  0.05346742 0.03445152 0.05333923 0.06202791 0.06405101
#> [1] 0.15

Though not strictly necessary, the select_penalty function can be adjusted to give a user more control over the testing process. This includes,

  • penalty_range setting a specific range of ribosome_penalty estimates
  • max_penalty_trials set a maximum number of attempts allowed
  • penalty_step setting the difference between each estimate
  • return_output returns a table showing the mean distances for each estimate giving a better idea of how the estimates differ.

select_penalty inherits additional parameters from the simulate_counts function that control other aspects of the damage simulation.


Aesthetic parameters

The remaining parameters control how the user receives the final output.

Parameter Default Description
filter_counts TRUE If TRUE, returns only the filtered count matrix based on damage classifications using filter_threshold. If FALSE, returns a data frame of damage scores per barcode for manual filtering or inspection.
palette User-defined Allows customization of the color palette used in damage score plots. Must be a vector of three colors indicating low to high damage severity.

penalty_plot <- DamageDetective::simulate_counts(
  count_matrix = alignment_counts,
  ribosome_penalty = selected_penalty$selected_penalty,  
  damage_proportion = 0.2,  
  target_damage = c(0.4, 1),
  plot_ribosomal_penalty = TRUE,
  palette = c("grey", "#BCEEFB", "#325EF7"),
  seed = 7
)


Damaged cell detection

Once the data is in sparse matrix form and an appropriate ribosome_penalty is selected, the detect_damage function can be run. Below, we have selected the default filter_counts=FALSE option to view the output directly.

# Run detection
detection_output <- DamageDetective::detect_damage(
  count_matrix = alignment_counts, 
  ribosome_penalty = selected_penalty$selected_penalty,
  seed = 7
)
#> Clustering cells...
#> For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
#> (default method for FindMarkers) please install the presto package
#> --------------------------------------------
#> install.packages('devtools')
#> devtools::install_github('immunogenomics/presto')
#> --------------------------------------------
#> After installation of presto, Seurat will automatically use the more 
#> efficient implementation (no further action necessary).
#> This message will be shown once per session
#> Simulating damage...
#> Computing pANN...

Scatter plot showing the outcome damage detection where cells of the input data are coloured according to the estimated level of damage with grey being no damage,  purple being light damage, and red being heavy damage.


# View output
head(detection_output$output)
#>                                 Cells DamageDetective
#> AAACCCAAGGAGAGTA-1 AAACCCAAGGAGAGTA-1               0
#> AAACGCTTCAGCCCAG-1 AAACGCTTCAGCCCAG-1               0
#> AAAGAACAGACGACTG-1 AAAGAACAGACGACTG-1               0
#> AAAGAACCAATGGCAG-1 AAAGAACCAATGGCAG-1               0
#> AAAGAACGTCTGCAAT-1 AAAGAACGTCTGCAAT-1               0
#> AAAGGATAGTAGACAT-1 AAAGGATAGTAGACAT-1               0

From here, the matrix can be filtered according to a threshold damage value before continuing with the remainder of pre-processing steps required for a single cell analysis.

undamaged_cells <- subset(detection_output$output, DamageDetective <= 0.5)
filtered_matrix <- alignment_counts[, undamaged_cells$Cells]
dim(alignment_counts)
#> [1] 33538  1222
dim(filtered_matrix)
#> [1] 33538  1184


This could be done automatically if filter_counts=TRUE.


# Run detection
filtered_output <- DamageDetective::detect_damage(
  count_matrix = alignment_counts, 
  ribosome_penalty = selected_penalty$selected_penalty,
  filter_counts = TRUE, 
  generate_plot = FALSE,
  seed = 7
)

dim(filtered_output)
#> [1] 33538  1184

Session Information

#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] future_1.49.0          patchwork_1.3.0        ggplot2_3.5.2         
#> [4] stringr_1.5.1          Seurat_5.3.0           SeuratObject_5.1.0    
#> [7] sp_2.2-0               DamageDetective_2.0.15
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3     jsonlite_2.0.0         magrittr_2.0.3        
#>   [4] spatstat.utils_3.1-4   farver_2.1.2           rmarkdown_2.29        
#>   [7] fs_1.6.6               ragg_1.4.0             vctrs_0.6.5           
#>  [10] ROCR_1.0-11            spatstat.explore_3.4-2 rstatix_0.7.2         
#>  [13] htmltools_0.5.8.1      broom_1.0.8            Formula_1.2-5         
#>  [16] sass_0.4.10            sctransform_0.4.2      parallelly_1.44.0     
#>  [19] KernSmooth_2.23-26     bslib_0.9.0            htmlwidgets_1.6.4     
#>  [22] desc_1.4.3             ica_1.0-3              plyr_1.8.9            
#>  [25] plotly_4.10.4          zoo_1.8-14             cachem_1.1.0          
#>  [28] igraph_2.1.4           mime_0.13              lifecycle_1.0.4       
#>  [31] pkgconfig_2.0.3        Matrix_1.7-3           R6_2.6.1              
#>  [34] fastmap_1.2.0          fitdistrplus_1.2-2     shiny_1.10.0          
#>  [37] digest_0.6.37          colorspace_2.1-1       tensor_1.5            
#>  [40] RSpectra_0.16-2        irlba_2.3.5.1          textshaping_1.0.1     
#>  [43] ggpubr_0.6.0           labeling_0.4.3         progressr_0.15.1      
#>  [46] spatstat.sparse_3.1-0  httr_1.4.7             polyclip_1.10-7       
#>  [49] abind_1.4-8            compiler_4.5.0         proxy_0.4-27          
#>  [52] withr_3.0.2            backports_1.5.0        carData_3.0-5         
#>  [55] fastDummies_1.7.5      ggsignif_0.6.4         MASS_7.3-65           
#>  [58] tools_4.5.0            lmtest_0.9-40          httpuv_1.6.16         
#>  [61] future.apply_1.11.3    goftest_1.2-3          glue_1.8.0            
#>  [64] nlme_3.1-168           promises_1.3.2         grid_4.5.0            
#>  [67] Rtsne_0.17             cluster_2.1.8.1        reshape2_1.4.4        
#>  [70] generics_0.1.4         gtable_0.3.6           spatstat.data_3.1-6   
#>  [73] class_7.3-23           tidyr_1.3.1            data.table_1.17.2     
#>  [76] car_3.1-3              spatstat.geom_3.3-6    RcppAnnoy_0.0.22      
#>  [79] ggrepel_0.9.6          RANN_2.6.2             pillar_1.10.2         
#>  [82] spam_2.11-1            RcppHNSW_0.6.0         later_1.4.2           
#>  [85] splines_4.5.0          dplyr_1.1.4            lattice_0.22-6        
#>  [88] survival_3.8-3         deldir_2.0-4           tidyselect_1.2.1      
#>  [91] miniUI_0.1.2           pbapply_1.7-2          knitr_1.50            
#>  [94] gridExtra_2.3          scattermore_1.2        xfun_0.52             
#>  [97] matrixStats_1.5.0      stringi_1.8.7          lazyeval_0.2.2        
#> [100] yaml_2.3.10            evaluate_1.0.3         codetools_0.2-20      
#> [103] tibble_3.2.1           cli_3.6.5              uwot_0.2.3            
#> [106] xtable_1.8-4           reticulate_1.42.0      systemfonts_1.2.3     
#> [109] jquerylib_0.1.4        Rcpp_1.0.14            globals_0.18.0        
#> [112] spatstat.random_3.3-3  png_0.1-8              spatstat.univar_3.1-3 
#> [115] parallel_4.5.0         pkgdown_2.1.2          dotCall64_1.2         
#> [118] listenv_0.9.1          viridisLite_0.4.2      scales_1.4.0          
#> [121] ggridges_0.5.6         e1071_1.7-16           purrr_1.0.4           
#> [124] rlang_1.1.6            cowplot_1.1.3

References

  • Amezquita R, et al. (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137-145.
  • Hao et al. (2023). Seurat V5. Nature Biotechnology.
  • McGinnis, C. S., Murrow, L. M., & Gartner, Z. J. (2019). “DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors”. Cell Systems, 8(4), 329-337.e4.
  • Pedersen T (2024). patchwork: The Composer of Plots. R package version 1.3.0.
  • Satija R, et al. (2025). SeuratData: Install and Manage Seurat Datasets. R package version 0.2.2.9002.
  • Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.