scdrs.score_cell

scdrs.score_cell(data, gene_list, gene_weight=None, ctrl_match_key='mean_var', n_ctrl=1000, n_genebin=200, weight_opt='vs', copy=False, return_ctrl_raw_score=False, return_ctrl_norm_score=False, random_seed=0, verbose=False, save_intermediate=None)

Score cells based on the disease gene set.

Preprocessing information data.uns["SCDRS_PARAM"] is required (run scdrs.pp.preprocess first).

It operates in implicit-covariate-correction mode if both FLAG_SPARSE and FLAG_COV (set in data.uns["SCDRS_PARAM"] by preprocessing) are True. In this mode, computations are based on the implicit covariate-corrected data:

CORRECTED_X = data.X + COV_MAT * COV_BETA + COV_GENE_MEAN.

Otherwise it operates in normal mode, where computations are based on data.X.
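A minimal usage sketch of the two-step workflow described above. It assumes `adata` is an AnnData object that is already size-factor-normalized and log1p-transformed, and that `gene_list` holds gene names present in `adata.var_names`. The helper name `score_disease_cells` is hypothetical and not part of scDRS; the keyword arguments are taken from the signature above.

```python
def score_disease_cells(adata, gene_list, gene_weight=None, n_ctrl=1000):
    """Hypothetical helper: preprocess the data, then score cells for one gene set."""
    # Imported lazily so the sketch can be read and loaded without scDRS installed.
    import scdrs

    # Writes data.uns["SCDRS_PARAM"], which score_cell requires.
    scdrs.pp.preprocess(adata)

    # Keyword names follow the score_cell signature documented above.
    df_res = scdrs.score_cell(
        adata,
        gene_list,
        gene_weight=gene_weight,
        ctrl_match_key="mean_var",
        n_ctrl=n_ctrl,
        weight_opt="vs",
        return_ctrl_norm_score=True,
        random_seed=0,
    )
    return df_res
```

Passing a smaller `n_ctrl` should make a trial run cheaper, at the cost of coarser Monte Carlo p-values.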

Parameters:
data : anndata.AnnData

Single-cell data of shape (n_cell, n_gene). Assumed to be size-factor-normalized and log1p-transformed.

gene_list : list

Disease gene list of length n_disease_gene.

gene_weight : array_like, default=None

Gene weights of length n_disease_gene for the genes in gene_list. If gene_weight=None, all weights are set to one.

ctrl_match_key : str, default="mean_var"

Gene-level statistic used for matching control and disease genes; should be in data.uns["SCDRS_PARAM"]["GENE_STATS"].

n_ctrl : int, default=1000

Number of control gene sets.

n_genebin : int, default=200

Number of bins for dividing genes by ctrl_match_key if data.uns["SCDRS_PARAM"]["GENE_STATS"][ctrl_match_key] is a continuous variable.
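The idea of binning genes by a continuous statistic and matching control genes within bins can be sketched as follows. This is an illustration only, not the scDRS implementation: the function names, the rank-based equal-size binning, and the independent draw per disease gene (so a control gene may repeat) are all assumptions made for the example.

```python
import random


def assign_quantile_bins(stat_by_gene, n_bin):
    """Assign each gene to one of n_bin roughly equal-sized bins by rank."""
    ranked = sorted(stat_by_gene, key=stat_by_gene.get)
    n_gene = len(ranked)
    return {gene: rank * n_bin // n_gene for rank, gene in enumerate(ranked)}


def sample_matched_controls(disease_genes, bin_by_gene, rng):
    """Draw one control gene per disease gene from that gene's own bin."""
    genes_in_bin = {}
    for gene, b in bin_by_gene.items():
        genes_in_bin.setdefault(b, []).append(gene)
    # Simplification: each draw is independent, so repeats are possible.
    return [rng.choice(genes_in_bin[bin_by_gene[g]]) for g in disease_genes]


# Toy statistic (hypothetical values): 12 genes with increasing mean expression.
stat = {f"gene{i}": 0.1 * i for i in range(12)}
bins = assign_quantile_bins(stat, n_bin=4)  # 3 genes per bin

rng = random.Random(0)  # fixed seed for reproducible control sets
disease = ["gene1", "gene7", "gene10"]
control = sample_matched_controls(disease, bins, rng)
```

Repeating the draw n_ctrl times with the same seeded generator would give a reproducible collection of matched control gene sets.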

weight_opt : str, default="vs"

Option for computing the raw score:

  • 'uniform': average over the genes in the gene_list.

  • 'vs': weighted average with weights equal to 1/sqrt(technical_variance_of_logct).

  • 'inv_std': weighted average with weights equal to 1/std.

  • 'od': overdispersion score.
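The difference between a uniform and a weighted average can be illustrated with a small sketch for a single cell. It assumes the per-gene weights are simply normalized by their sum; the function name `weighted_raw_score` and the toy numbers are hypothetical, this is not the scDRS implementation, and the 'od' option is not covered.

```python
def weighted_raw_score(expr_row, weights):
    """Weighted average of one cell's expression over the disease genes."""
    total = sum(weights)
    return sum(x * w for x, w in zip(expr_row, weights)) / total


# One cell, three disease genes (toy log-expression values).
expr = [1.0, 2.0, 4.0]
# Hypothetical per-gene standard deviations across cells.
std = [0.5, 1.0, 2.0]

# 'uniform'-style score: every gene contributes equally.
uniform_score = weighted_raw_score(expr, [1.0, 1.0, 1.0])

# 'inv_std'-style score: genes with larger spread are down-weighted.
inv_std_score = weighted_raw_score(expr, [1.0 / s for s in std])
```

In this toy case the highly variable third gene dominates the uniform average but contributes less once weights of 1/std are applied.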

copy : bool, default=False

Whether to copy the AnnData object to avoid modifying the original data.

return_ctrl_raw_score : bool, default=False

Whether to return raw control scores.

return_ctrl_norm_score : bool, default=False

Whether to return normalized control scores.

random_seed : int, default=0

Random seed.

verbose : bool, default=False

Whether to print messages.

save_intermediate : str, default=None

File path prefix for saving intermediate results.

Returns:
df_res : pandas.DataFrame (dtype=np.float32)

scDRS results of shape (n_cell, n_key) with columns:

  • raw_score: raw disease scores.

  • norm_score: normalized disease scores.

  • mc_pval: Monte Carlo p-values based on the normalized control scores of the same cell.

  • pval: scDRS individual cell-level disease-association p-values.

  • nlog10_pval: -log10(pval). Useful when single precision (np.float32) cannot represent very small p-values accurately.

  • zscore: one-sided z-score converted from pval.

  • ctrl_raw_score_*: raw control scores (when return_ctrl_raw_score=True).

  • ctrl_norm_score_*: normalized control scores (when return_ctrl_norm_score=True).
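How several of these columns relate to one another can be sketched with the standard library alone. The Monte Carlo p-value below uses the common "(1 + count) / (1 + n_ctrl)" estimator, and the z-score uses the upper tail of a standard normal; both are assumptions about the general technique rather than a copy of the scDRS code, and the pooled `pval` computation is not reproduced. All function names are hypothetical.

```python
import math
import struct
from statistics import NormalDist


def mc_pvalue(score, ctrl_scores):
    """Monte Carlo p-value: share of control scores >= score, with +1 correction."""
    n_ge = sum(1 for c in ctrl_scores if c >= score)
    return (1 + n_ge) / (1 + len(ctrl_scores))


def one_sided_zscore(pval):
    """Upper-tail z-score z such that P(Z >= z) = pval for a standard normal Z."""
    return -NormalDist().inv_cdf(pval)


def to_float32(x):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Ten toy control scores for one cell: -0.5, -0.4, ..., 0.4.
ctrl = [0.1 * i for i in range(-5, 5)]
p_mc = mc_pvalue(0.35, ctrl)  # only one control score is >= 0.35

z_half = one_sided_zscore(0.5)     # a p-value of 0.5 maps to z = 0
z_small = one_sided_zscore(0.025)  # roughly 1.96

# A very small p-value underflows in single precision,
# while its -log10 is stored without trouble.
tiny_p = 1e-50
p_as_f32 = to_float32(tiny_p)
nlog10_as_f32 = to_float32(-math.log10(tiny_p))
```

The last two lines show why a separate nlog10_pval column is helpful in a float32 table: the p-value itself collapses to zero, but its negative log10 remains informative.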