scdrs.pp.compute_stats
- scdrs.pp.compute_stats(adata, implicit_cov_corr=False, cell_weight=None, n_mean_bin=20, n_var_bin=20, n_chunk=20)
Compute gene-level and cell-level statistics used for scDRS analysis. The expression values in adata.X should be on a log scale. The function has two modes. In the normal mode, it computes statistics for adata.X. In the implicit covariate correction mode, covariate correction has not been applied to adata.X, but the information needed to apply it is stored in adata.uns["SCDRS_PARAM"]. In this case, it computes statistics for the covariate-corrected data:
transformed_X = adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN
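The formula above can be illustrated with a small NumPy sketch. The array shapes used here (COV_MAT as cells by covariates, COV_BETA as covariates by genes, COV_GENE_MEAN as one value per gene) and the variable names are assumptions made for illustration, not the layout scDRS is confirmed to use; the product is read as a matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cell, n_gene, n_cov = 6, 4, 2

# Toy stand-ins; shapes are assumptions, not taken from the scDRS source.
X = rng.normal(size=(n_cell, n_gene))        # plays the role of adata.X
cov_mat = rng.normal(size=(n_cell, n_cov))   # COV_MAT
cov_beta = rng.normal(size=(n_cov, n_gene))  # COV_BETA
cov_gene_mean = rng.normal(size=n_gene)      # COV_GENE_MEAN

# Explicit form of the covariate-corrected data.
transformed_X = X + cov_mat @ cov_beta + cov_gene_mean

# The per-gene mean can be obtained without building transformed_X,
# because the mean is linear in each of the three terms.
gene_mean = X.mean(axis=0) + cov_mat.mean(axis=0) @ cov_beta + cov_gene_mean
```

Under these assumptions, gene_mean matches transformed_X.mean(axis=0), which hints at why the corrected matrix does not have to be stored explicitly.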
- Parameters:
- adata : anndata.AnnData
Single-cell data of shape (n_cell, n_gene). Assumed to be log-scale.
- implicit_cov_corr : bool, default=False
If True, compute statistics for the implicitly corrected data adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN. Otherwise, compute them for the original data adata.X.
- cell_weight : array_like, default=None
Cell weights of length adata.shape[0] for cells in adata, used for computing weighted gene-level statistics.
- n_mean_bin : int, default=20
Number of mean-expression bins for matching control genes.
- n_var_bin : int, default=20
Number of expression-variance bins for matching control genes.
- n_chunk : int, default=20
Number of chunks to split the data into when computing mean and variance using _get_mean_var_implicit_cov_corr.
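To illustrate what n_chunk controls, here is a minimal sketch of chunked mean and variance computation for the implicitly corrected data. The helper chunked_gene_mean_var is hypothetical and is not the private scDRS function _get_mean_var_implicit_cov_corr; chunking over genes, the array shapes, and the use of population variance (ddof=0) are all assumptions.

```python
import numpy as np

def chunked_gene_mean_var(X, cov_mat, cov_beta, cov_gene_mean, n_chunk=20):
    """Per-gene mean and variance of X + cov_mat @ cov_beta + cov_gene_mean.

    Hypothetical helper: only one block of genes is densified at a time,
    so peak memory scales with n_gene / n_chunk rather than n_gene.
    """
    n_gene = X.shape[1]
    mean = np.empty(n_gene)
    var = np.empty(n_gene)
    for idx in np.array_split(np.arange(n_gene), n_chunk):
        if idx.size == 0:  # happens when n_chunk > n_gene
            continue
        block = X[:, idx] + cov_mat @ cov_beta[:, idx] + cov_gene_mean[idx]
        mean[idx] = block.mean(axis=0)
        var[idx] = block.var(axis=0)  # population variance (assumption)
    return mean, var
```

A larger n_chunk lowers peak memory at the cost of more loop iterations; the result should not depend on the number of chunks.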
- Returns:
- df_gene : pandas.DataFrame
Gene-level statistics of shape (n_gene, 7):
"mean" : mean expression on the log scale.
"var" : expression variance on the log scale.
"var_tech" : technical variance on the log scale.
"ct_mean" : mean expression on the original non-log scale.
"ct_var" : expression variance on the original non-log scale.
"ct_var_tech" : technical variance on the original non-log scale.
"mean_var" : mean-variance bin assignment, one of the n_mean_bin * n_var_bin mean-variance bins.
- df_cell : pandas.DataFrame
Cell-level statistics of shape (n_cell, 2):
"mean" : mean expression on the log scale.
"var" : expression variance on the log scale.
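As a rough illustration of how a "mean_var" label could be built from n_mean_bin and n_var_bin, the sketch below bins toy genes by mean and then, within each mean bin, by variance. Quantile binning with pandas.qcut and the "i.j" label format are assumptions chosen for this example, not a confirmed description of what scDRS does internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_gene = 200
# Toy gene-level statistics; real values would come from the expression data.
df_gene = pd.DataFrame(
    {"mean": rng.gamma(2.0, size=n_gene), "var": rng.gamma(2.0, size=n_gene)},
    index=[f"gene{i}" for i in range(n_gene)],
)
n_mean_bin, n_var_bin = 4, 5

# Step 1: quantile bins on the mean.
mean_bin = pd.qcut(df_gene["mean"], q=n_mean_bin, labels=False, duplicates="drop")

# Step 2: within each mean bin, quantile bins on the variance.
df_gene["mean_var"] = ""
for b in sorted(mean_bin.unique()):
    in_bin = mean_bin == b
    var_bin = pd.qcut(
        df_gene.loc[in_bin, "var"], q=n_var_bin, labels=False, duplicates="drop"
    )
    df_gene.loc[in_bin, "mean_var"] = [f"{b}.{v}" for v in var_bin]
```

Genes that share a "mean_var" label have similar mean and variance, which is what makes the label usable for matching control genes.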