CLI

The scDRS CLI supports three main functions:

- Munge gene sets (scdrs munge-gs), producing a .gs file.
- Compute scores (scdrs compute-score), producing <trait>.score.gz and <trait>.full_score.gz files.
- Perform downstream analyses (scdrs perform-downstream), producing <trait>.scdrs_group.<annot>, <trait>.scdrs_cell_corr, and <trait>.scdrs_gene files.

See scDRS file formats for a description of each file. Use --help for more details on each command, e.g., scdrs compute-score --help.

The scDRS CLI does not distinguish between dashes (“-”) and underscores (“_”). For example, the following commands are equivalent:

# All dashes
scdrs compute-score --h5ad-file <h5ad_file> --out-folder <out_folder> ...
# All underscores
scdrs compute_score --h5ad_file <h5ad_file> --out_folder <out_folder> ...
# Mixed
scdrs compute_score --h5ad-file <h5ad_file> --out_folder <out_folder> ...
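One way to picture this equivalence is to canonicalize every command and flag name before matching it. The sketch below is hypothetical: normalize_cli_name is not part of scDRS and only illustrates the rule, not how the scDRS CLI implements it.

```python
def normalize_cli_name(name: str) -> str:
    """Canonicalize a command or flag name by treating "-" and "_" alike.

    Leading dashes that mark a flag (e.g., "--") are kept; every other
    dash is turned into an underscore. (Hypothetical helper.)
    """
    body = name.lstrip("-")
    prefix = name[: len(name) - len(body)]
    return prefix + body.replace("-", "_")


# The three spellings from the example above collapse to the same tokens.
dashes = ["compute-score", "--h5ad-file", "--out-folder"]
underscores = ["compute_score", "--h5ad_file", "--out_folder"]
mixed = ["compute_score", "--h5ad-file", "--out_folder"]
canonical = {
    tuple(normalize_cli_name(t) for t in v) for v in (dashes, underscores, mixed)
}
print(canonical)  # prints {('compute_score', '--h5ad_file', '--out_folder')}
```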

munge-gs

Convert a .tsv GWAS gene statistics file to an scDRS .gs file.

1. Read results from a .tsv p-value or z-score file.
2. Select a subset of genes for each trait:
   - If both fdr and fwer are None, select the top n_max genes.
   - If fdr is not None, select genes based on FDR (across all genes of a given trait) and cap the number of genes between n_min and n_max.
   - If fwer is not None, select genes based on FWER (across all genes of a given trait) and cap the number of genes between n_min and n_max.
3. Assign gene weights based on weight.
4. Write the .gs file to out_file.
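The selection step can be sketched for a single trait, starting from p-values. This is a minimal sketch that rests on three assumptions the text above does not confirm: FDR is computed with the Benjamini-Hochberg procedure, FWER with a Bonferroni correction, and fdr takes precedence if both thresholds are given. The names bh_adjust and select_genes are hypothetical and are not part of scDRS.

```python
from typing import Optional


def bh_adjust(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(
    pvals: dict[str, float],
    fdr: Optional[float] = None,
    fwer: Optional[float] = None,
    n_min: int = 100,
    n_max: int = 1000,
) -> list[str]:
    """Pick genes for one trait, most significant first (hypothetical helper)."""
    genes = sorted(pvals, key=lambda g: pvals[g])
    p_sorted = [pvals[g] for g in genes]
    m = len(genes)
    if fdr is None and fwer is None:
        n_gene = n_max
    elif fdr is not None:
        # Assumption: FDR via Benjamini-Hochberg; fdr wins if both are set.
        n_gene = sum(q < fdr for q in bh_adjust(p_sorted))
        n_gene = min(max(n_gene, n_min), n_max)
    else:
        # Assumption: FWER via Bonferroni, i.e., p * m < fwer.
        n_gene = sum(p * m < fwer for p in p_sorted)
        n_gene = min(max(n_gene, n_min), n_max)
    return genes[:n_gene]
```

For example, select_genes(pvals, fdr=0.05, n_min=100, n_max=1000) keeps every gene that passes the FDR threshold but never fewer than 100 or more than 1,000 genes. The cap is what makes small or underpowered GWAS still yield a usable gene set.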

# Select top 1,000 genes and use z-score weights
scdrs munge-gs \
    --out-file <out_file> \
    --zscore-file <zscore_file> \
    --weight zscore \
    --n-max 1000

out_file (str): Output scDRS .gs file.
pval_file (str, optional): P-value file. A .tsv file whose first column contains genes and whose other columns contain p-values of traits (one trait per column). Either --pval-file or --zscore-file is expected. Default is None.
zscore_file (str, optional): Z-score file. A .tsv file whose first column contains genes and whose other columns contain z-scores of traits (one trait per column). Either --pval-file or --zscore-file is expected. Default is None.
weight (str, optional): Gene weight option. One of zscore or uniform. Default is zscore.
fdr (float, optional): FDR threshold. Default is None. E.g., --fdr 0.05
fwer (float, optional): FWER threshold. Default is None. E.g., --fwer 0.05
n_min (int, optional): Minimum number of genes for each gene set. Default is 100. E.g., --n-min 100
n_max (int, optional): Maximum number of genes for each gene set. Default is 1000. E.g., --n-max 1000

Example p-value file:

GENE    BMI    HEIGHT
OR4F5   0.001  0.01
DAZ3    0.01   0.001
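The sketch below shows how the example p-value file above could be turned into .gs-style text. It makes several assumptions that the text above does not confirm. P-values are converted to one-sided z-scores with z = Φ⁻¹(1 − p). The zscore weights are used as-is, although munge-gs may clip or transform them. The output layout (a TRAIT/GENESET header, then gene:weight pairs joined by commas) should be checked against the scDRS file formats page. All function names are hypothetical.

```python
import csv
import io
from statistics import NormalDist

# The example p-value file above, with tabs as separators.
EXAMPLE_TSV = "GENE\tBMI\tHEIGHT\nOR4F5\t0.001\t0.01\nDAZ3\t0.01\t0.001\n"


def read_pval_table(text: str) -> dict[str, dict[str, float]]:
    """Return {trait: {gene: p-value}} from a gene-by-trait .tsv table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    traits = next(reader)[1:]
    table: dict[str, dict[str, float]] = {t: {} for t in traits}
    for row in reader:
        gene = row[0]
        for trait, value in zip(traits, row[1:]):
            table[trait][gene] = float(value)
    return table


def pval_to_zscore(p: float) -> float:
    """One-sided conversion (assumed): small p-values map to large z-scores."""
    return NormalDist().inv_cdf(1.0 - p)


def gene_weights(zscores: dict[str, float], weight: str = "zscore") -> dict[str, float]:
    """Mirror the two --weight options: raw z-scores or uniform weights."""
    if weight == "zscore":
        return dict(zscores)
    if weight == "uniform":
        return {g: 1.0 for g in zscores}
    raise ValueError(f"unknown weight option: {weight}")


def to_gs_text(gene_sets: dict[str, dict[str, float]]) -> str:
    """Format gene sets as TRAIT<TAB>gene:weight,... lines (assumed layout)."""
    lines = ["TRAIT\tGENESET"]
    for trait, weights in gene_sets.items():
        geneset = ",".join(f"{g}:{w:.4g}" for g, w in weights.items())
        lines.append(f"{trait}\t{geneset}")
    return "\n".join(lines) + "\n"


table = read_pval_table(EXAMPLE_TSV)
gene_sets = {}
for trait, pvals in table.items():
    ranked = sorted(pvals, key=pvals.get)  # most significant gene first
    zscores = {g: pval_to_zscore(pvals[g]) for g in ranked}
    gene_sets[trait] = gene_weights(zscores, weight="zscore")
print(to_gs_text(gene_sets), end="")
```

With a z-score file, the conversion step would simply be skipped.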

compute-score

Compute scDRS scores. Generate .score.gz and .full_score.gz files for each trait.

scdrs compute-score \
    --h5ad-file <h5ad_file> \
    --h5ad-species mouse \
    --gs-file <gs_file> \
    --gs-species human \
    --out-folder <out_folder> \
    --cov-file <cov_file> \
    --flag-filter-data True \
    --flag-raw-count True \
    --n-ctrl 1000 \
    --flag-return-ctrl-raw-score False \
    --flag-return-ctrl-norm-score True

h5ad_file (str): Single-cell .h5ad file.
h5ad_species (str): Species of h5ad_file. One of hsapiens, human, mmusculus, or mouse.
gs_file (str): scDRS gene set .gs file.
gs_species (str): Species of gs_file. One of hsapiens, human, mmusculus, or mouse.
out_folder (str): Output folder. scDRS score files are saved as <out_folder>/<trait>.score.gz and scDRS full score files as <out_folder>/<trait>.full_score.gz, where the trait identifier <trait> is taken from gs_file.
cov_file (str, optional): scDRS covariate .cov file. Default is None.
weight_opt (str, optional): Option for single-cell data-based weights (separate from the MAGMA z-score weights in gs_file). One of vs (variance-stabilization weights) or uniform (uniform weights). Default is vs.
adj_prop (str, optional): Cell group annotation (e.g., cell type) in adata.obs.columns used to adjust for cell group proportions. Each cell is weighted inversely to the size of its group. Default is None.
flag_filter_data (bool, optional): Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.
flag_raw_count (bool, optional): Whether to apply size-factor normalization and log1p transformation to h5ad_file. Default is True.
n_ctrl (int, optional): Number of control gene sets. Default is 1000.
min_genes (int, optional): Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.
min_cells (int, optional): Minimum number of cells expressing a gene required for the gene to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.
flag_return_ctrl_raw_score (bool, optional): Whether to return raw control scores. Default is False.
flag_return_ctrl_norm_score (bool, optional): Whether to return normalized control scores. Default is True.
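The adj_prop option weights each cell inversely to the size of its group. The sketch below shows what that weighting does to a toy annotation. The rescaling so that weights sum to the number of cells is an assumption made for readability. The function name is hypothetical, and how scDRS consumes these weights internally is not shown here.

```python
from collections import Counter


def inverse_group_weights(groups: list[str]) -> list[float]:
    """Weight each cell by 1 / (size of its group), rescaled to sum to n_cells.

    The rescaling is an illustrative assumption; only the relative weights
    matter for balancing group proportions.
    """
    sizes = Counter(groups)
    raw = [1.0 / sizes[g] for g in groups]
    scale = len(groups) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical annotation: three T cells and one B cell.
cells = ["T", "T", "T", "B"]
weights = inverse_group_weights(cells)
# Each group now carries the same total weight, regardless of its size.
```

Without this adjustment, an abundant cell type would dominate any statistic computed across all cells.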

perform-downstream

Perform scDRS downstream analyses based on precomputed scDRS .full_score.gz files. The number of Monte Carlo (MC) samples in the MC tests is determined by the number of control scores in the .full_score.gz file. To increase it, specify a larger --n-ctrl when running scdrs compute-score in the previous step.
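The number of control scores matters because a control-score-based MC test cannot report a p-value smaller than 1 / (n_ctrl + 1). The sketch below uses the standard MC p-value with a +1 correction. The exact formula used by scDRS is an assumption here, and mc_pvalue is a hypothetical name.

```python
def mc_pvalue(observed: float, ctrl_stats: list[float]) -> float:
    """Standard MC p-value: (1 + #controls at least as extreme) / (1 + #controls)."""
    n_extreme = sum(c >= observed for c in ctrl_stats)
    return (1 + n_extreme) / (1 + len(ctrl_stats))


# Even an extreme observed statistic cannot go below 1 / (n_ctrl + 1).
for n_ctrl in (1000, 10000):
    print(n_ctrl, mc_pvalue(float("inf"), [0.0] * n_ctrl))
```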

--group-analysis: For a given cell group-level annotation (e.g., tissue or cell type), assess cell group-disease association (control-score-based MC tests using 5% quantile) and within-cell group disease-association heterogeneity (control-score-based MC tests using Geary’s C).
--corr-analysis: For a given individual cell-level variable (e.g., T cell effectorness gradient), assess association between disease and the individual cell-level variable (control-score-based MC tests using Pearson’s correlation).
--gene-analysis: Compute Pearson’s correlation between expression of each gene and the scDRS disease score.
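Geary's C measures whether neighboring cells have similar values. The sketch below uses the textbook definition, C = (N − 1) Σᵢⱼ wᵢⱼ (xᵢ − xⱼ)² / (2 W Σᵢ (xᵢ − x̄)²), where W = Σᵢⱼ wᵢⱼ. Values below 1 indicate that neighbors are similar, and values above 1 indicate that neighbors differ. In scDRS the graph would come from sc.pp.neighbors (see knn_n_neighbors and knn_n_pcs). Here a small hand-built graph stands in for it. The assumption is that the heterogeneity test compares C for the disease score within a group against C for the control scores. The function name is hypothetical.

```python
def gearys_c(values: list[float], weights: dict[tuple[int, int], float]) -> float:
    """Geary's C for per-cell values on a graph given as {(i, j): w_ij}."""
    n = len(values)
    mean = sum(values) / n
    total_w = sum(weights.values())
    num = sum(w * (values[i] - values[j]) ** 2 for (i, j), w in weights.items())
    den = sum((v - mean) ** 2 for v in values)
    return (n - 1) * num / (2 * total_w * den)


# Hypothetical toy graph: two tight pairs of cells (edges in both directions).
pairs = {(0, 1): 1.0, (1, 0): 1.0, (2, 3): 1.0, (3, 2): 1.0}
# Scores that agree within each pair give C = 0 (strong local structure).
print(gearys_c([1.0, 1.0, 5.0, 5.0], pairs))
```

The --gene-analysis step needs no graph. It is a plain Pearson correlation per gene, which statistics.correlation in the Python standard library (3.10+) also computes.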

scdrs perform-downstream \
    --h5ad-file <h5ad_file> \
    --score-file <score_file> \
    --out-folder <out_folder> \
    --group-analysis cell_type \
    --corr-analysis causal_variable,non_causal_variable,covariate \
    --gene-analysis \
    --flag-filter-data True \
    --flag-raw-count True

h5ad_file (str): Single-cell .h5ad file.
score_file (str): scDRS .full_score.gz file. Use “@” to specify multiple file names, e.g., <score_folder>/@.full_score.gz. However, <score_folder> itself should not contain “@”.
out_folder (str): Output folder.
group_analysis (str, optional): Comma-separated column names for cell group annotations in adata.obs.columns, e.g., cell types or tissues. Results are saved as <out_folder>/<trait>.scdrs_group.<annot>, one file per annotation. Default is None.
corr_analysis (str, optional): Comma-separated column names for continuous annotations in adata.obs.columns, e.g., T cell effectorness gradient. Results for all variables are saved in <out_folder>/<trait>.scdrs_cell_corr. Default is None.
gene_analysis (str, optional): Flag to perform gene prioritization by correlating gene expression with scDRS scores. Specify --gene-analysis without any arguments. Results for all genes are saved in <out_folder>/<trait>.scdrs_gene. Default is None.
flag_filter_data (bool, optional): Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.
flag_raw_count (bool, optional): Whether to apply size-factor normalization and log1p transformation to h5ad_file. Default is True.
min_genes (int, optional): Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.
min_cells (int, optional): Minimum number of cells expressing a gene required for the gene to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.
knn_n_neighbors (int, optional): n_neighbors parameter for computing the KNN graph using sc.pp.neighbors. Default is 15 (consistent with the TMS pipeline).
knn_n_pcs (int, optional): n_pcs parameter for computing the KNN graph using sc.pp.neighbors. Default is 20 (consistent with the TMS pipeline).
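The “@” placeholder in score_file works like a wildcard over trait names. The sketch below expands such a pattern with glob and recovers each trait from the matched file name. It is a hypothetical helper, not the scDRS implementation, and it handles only patterns that contain exactly one “@” in the file-name part. The check on the folder mirrors the rule that <score_folder> must not contain “@”.

```python
import glob
import os


def expand_score_files(pattern: str) -> dict[str, str]:
    """Map trait -> path for a pattern such as '<score_folder>/@.full_score.gz'.

    The '@' in the file-name part acts as a wildcard, and the text it matches
    is taken as the trait name. (Hypothetical helper.)
    """
    folder, name = os.path.split(pattern)
    if "@" in folder:
        raise ValueError("the folder part of the pattern must not contain '@'")
    if name.count("@") != 1:
        raise ValueError("the file-name part must contain exactly one '@'")
    prefix, suffix = name.split("@")
    traits = {}
    for path in sorted(glob.glob(os.path.join(folder, name.replace("@", "*")))):
        base = os.path.basename(path)
        traits[base[len(prefix) : len(base) - len(suffix)]] = path
    return traits
```

For instance, a folder holding BMI.full_score.gz and HEIGHT.full_score.gz would expand to the traits BMI and HEIGHT.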