scdrs.pp.compute_stats

scdrs.pp.compute_stats(adata, implicit_cov_corr=False, cell_weight=None, n_mean_bin=20, n_var_bin=20, n_chunk=20)

Compute gene-level and cell-level statistics used for scDRS analysis. adata should be log-scale. The function has two modes. In the normal mode, it computes statistics for adata.X. In the implicit covariate correction mode, covariate correction has not been applied to adata.X; the information needed for the correction is instead stored in adata.uns["SCDRS_PARAM"]. In this case, it computes statistics for the covariate-corrected data:

transformed_X = adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN
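As a rough illustration of the implicit mode, the sketch below computes the per-gene mean and variance of the corrected matrix one gene chunk at a time, so the full corrected matrix is never materialized at once. It is not the scDRS implementation. The function name mean_var_implicit, the array shapes (COV_MAT as (n_cell, n_cov), COV_BETA as (n_gene, n_cov), COV_GENE_MEAN as (n_gene,)), the reading of COV_MAT * COV_BETA as a matrix product, and the use of population variance are all assumptions made for the example.

```python
import numpy as np


def mean_var_implicit(X, cov_mat, cov_beta, cov_gene_mean, n_chunk=20):
    """Per-gene mean/variance of X + cov_mat @ cov_beta.T + cov_gene_mean.

    Hypothetical helper: shapes and variance convention (ddof=0) are assumed.
    """
    n_gene = X.shape[1]
    mean = np.empty(n_gene)
    var = np.empty(n_gene)
    # Process genes in chunks so only one corrected block is in memory at a time.
    for idx in np.array_split(np.arange(n_gene), n_chunk):
        if idx.size == 0:
            continue
        block = X[:, idx] + cov_mat @ cov_beta[idx].T + cov_gene_mean[idx]
        mean[idx] = block.mean(axis=0)
        var[idx] = block.var(axis=0)
    return mean, var


# Toy data: 50 cells, 30 genes, 2 covariates (all values are synthetic).
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(1.0, size=(50, 30)).astype(float))
cov_mat = rng.normal(size=(50, 2))
cov_beta = rng.normal(size=(30, 2))
cov_gene_mean = rng.normal(size=30)

mean, var = mean_var_implicit(X, cov_mat, cov_beta, cov_gene_mean, n_chunk=4)
```

Chunking over genes trades a little loop overhead for a bounded memory footprint, which is presumably the motivation for the n_chunk parameter.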

Parameters:
adata : anndata.AnnData

Single-cell data of shape (n_cell, n_gene). Assumed to be log-scale.

implicit_cov_corr : bool, default=False

If True, compute statistics for the implicitly corrected data adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN. Otherwise, compute statistics for the original data adata.X.

cell_weight : array_like, default=None

Cell weights of length adata.shape[0] for cells in adata, used for computing weighted gene-level statistics.

n_mean_bin : int, default=20

Number of mean-expression bins for matching control genes.

n_var_bin : int, default=20

Number of expression-variance bins for matching control genes.

n_chunk : int, default=20

Number of chunks to split the data into when computing mean and variance using _get_mean_var_implicit_cov_corr.
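To make the role of cell_weight concrete, here is a minimal sketch of weighted gene-level mean and variance. It only illustrates the general idea of weighting cells. The helper name weighted_gene_stats and the normalization (weights scaled to sum to one, no small-sample correction) are assumptions, and the actual scDRS computation may differ.

```python
import numpy as np


def weighted_gene_stats(X, cell_weight=None):
    """Weighted per-gene mean and variance over cells (hypothetical helper)."""
    n_cell = X.shape[0]
    if cell_weight is None:
        w = np.ones(n_cell)  # uniform weights reduce to the ordinary statistics
    else:
        w = np.asarray(cell_weight, dtype=float)
    if w.shape != (n_cell,):
        raise ValueError("cell_weight must have length adata.shape[0]")
    # np.average normalizes by the sum of weights along the cell axis.
    mean = np.average(X, axis=0, weights=w)
    var = np.average((X - mean) ** 2, axis=0, weights=w)
    return mean, var


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))  # synthetic log-scale matrix, 40 cells x 6 genes

mean_u, var_u = weighted_gene_stats(X)              # unweighted
mean_w, var_w = weighted_gene_stats(X, rng.uniform(0.5, 2.0, size=40))
```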

Returns:
df_gene : pandas.DataFrame

Gene-level statistics of shape (n_gene, 7):

  • "mean" : mean expression in log scale.

  • "var" : expression variance in log scale.

  • "var_tech" : technical variance in log scale.

  • "ct_mean" : mean expression in the original non-log scale.

  • "ct_var" : expression variance in the original non-log scale.

  • "ct_var_tech" : technical variance in the original non-log scale.

  • "mean_var" : mean-variance bin of the gene, one of n_mean_bin * n_var_bin bins.

df_cell : pandas.DataFrame

Cell-level statistics of shape (n_cell, 2):

  • "mean" : mean expression in log scale.

  • "var" : expression variance in log scale.
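The mean-variance bins used for matching control genes can be pictured with the following sketch. It assumes one plausible scheme: genes are first split into n_mean_bin quantile bins by mean, then each mean bin is split into n_var_bin quantile bins by variance, and the two indices are joined into a label. The helper name mean_var_bins, the quantile-based binning, and the "i.j" label format are assumptions for illustration, not a statement of how scDRS builds the "mean_var" column.

```python
import numpy as np
import pandas as pd


def mean_var_bins(mean, var, n_mean_bin=20, n_var_bin=20):
    """Assign each gene a 'meanbin.varbin' label (hypothetical scheme)."""
    df = pd.DataFrame({"mean": mean, "var": var})
    # Quantile bins on mean expression; integer codes 0..n_mean_bin-1.
    mean_bin = pd.qcut(df["mean"], q=n_mean_bin, labels=False, duplicates="drop")
    labels = pd.Series("", index=df.index, dtype=object)
    for b in np.unique(mean_bin):
        in_bin = mean_bin == b
        # Within each mean bin, quantile bins on variance.
        var_bin = pd.qcut(
            df.loc[in_bin, "var"], q=n_var_bin, labels=False, duplicates="drop"
        )
        labels.loc[in_bin] = [f"{b}.{v}" for v in var_bin]
    return labels


rng = np.random.default_rng(2)
gene_mean = rng.gamma(2.0, 1.0, size=200)  # synthetic per-gene means
gene_var = rng.gamma(2.0, 1.0, size=200)   # synthetic per-gene variances

labels = mean_var_bins(gene_mean, gene_var, n_mean_bin=4, n_var_bin=5)
```

Under this scheme, genes sharing a label have similar mean and variance, so control genes for a given gene can be drawn from the genes with the same label.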