CLI

The scDRS CLI supports 3 main functions:

  1. Munge gene sets with scdrs munge-gs, producing a .gs file.

  2. Compute scores with scdrs compute-score, producing <trait>.score.gz and <trait>.full_score.gz files.

  3. Perform downstream analyses with scdrs perform-downstream, producing <trait>.scdrs_group.<annot>, <trait>.scdrs_cell_corr, and <trait>.scdrs_gene files.

See scDRS file formats. Use --help for more details on each command, e.g., scdrs compute-score --help.

The scDRS CLI does not distinguish between dash “-” and underscore “_”. For example, the following commands are equivalent:

# All dashes
scdrs compute-score --h5ad-file <h5ad_file> --out-folder <out_folder> ...
# All underscores
scdrs compute_score --h5ad_file <h5ad_file> --out_folder <out_folder> ...
# Mixed
scdrs compute_score --h5ad-file <h5ad_file> --out_folder <out_folder> ...
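One way to picture this equivalence is as a normalization step applied to the subcommand and option names before parsing. The sketch below is illustrative only: `normalize_cli_tokens` is a hypothetical helper, not part of scDRS, and it assumes that only the subcommand and tokens starting with `--` are normalized, while option values such as file paths are left untouched.

```python
def normalize_cli_tokens(tokens):
    """Replace dashes with underscores in the subcommand and in option names.

    Hypothetical helper for illustration; not the scDRS implementation.
    """
    normalized = []
    for i, tok in enumerate(tokens):
        if tok.startswith("--"):
            # Option name: keep the leading "--" and unify the rest.
            normalized.append("--" + tok[2:].replace("-", "_"))
        elif i == 0:
            # Subcommand, e.g. "compute-score" becomes "compute_score".
            normalized.append(tok.replace("-", "_"))
        else:
            # Option value (e.g. a file path): leave unchanged.
            normalized.append(tok)
    return normalized


all_dashes = ["compute-score", "--h5ad-file", "my-data.h5ad", "--out-folder", "out"]
mixed = ["compute_score", "--h5ad-file", "my-data.h5ad", "--out_folder", "out"]

# Both spellings reduce to the same token list.
assert normalize_cli_tokens(all_dashes) == normalize_cli_tokens(mixed)
```

Note that the dash inside the file name `my-data.h5ad` is preserved, since only names are normalized in this sketch.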

munge-gs

Convert a .tsv GWAS gene statistics file to an scDRS .gs file.

  1. Read result from a .tsv p-value or z-score file.

  2. Select a subset of genes for each trait:

    • If both fdr and fwer are None, select the top n_max genes.

    • If fdr is not None, select genes based on FDR (across all genes of a given trait) and cap between n_min and n_max.

    • If fwer is not None, select genes based on FWER (across all genes of a given trait) and cap between n_min and n_max.

  3. Assign gene weights according to the weight option.

  4. Write the .gs file to out_file.
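The gene-selection rule in step 2 can be sketched as follows. This is a minimal illustration, not the scDRS implementation: `select_genes` is a hypothetical function, and it assumes Benjamini-Hochberg for the FDR threshold, Bonferroni for the FWER threshold, and that fdr takes precedence when both are given. None of these details are stated above.

```python
def select_genes(pvals, fdr=None, fwer=None, n_min=100, n_max=1000):
    """Return the genes selected for one trait, most significant first.

    pvals: dict mapping gene name -> p-value for a single trait.
    Assumptions (not confirmed by the docs): Benjamini-Hochberg for FDR,
    Bonferroni for FWER, and fdr checked before fwer.
    """
    genes = sorted(pvals, key=pvals.get)  # ascending p-value
    m = len(genes)
    if fdr is not None:
        # Benjamini-Hochberg: largest rank k with p_(k) <= fdr * k / m.
        n_sig = 0
        for k, gene in enumerate(genes, start=1):
            if pvals[gene] <= fdr * k / m:
                n_sig = k
        n_select = min(max(n_sig, n_min), n_max)  # cap between n_min and n_max
    elif fwer is not None:
        # Bonferroni: count genes with p <= fwer / m.
        n_sig = sum(pvals[gene] <= fwer / m for gene in genes)
        n_select = min(max(n_sig, n_min), n_max)
    else:
        # Neither threshold given: take the top n_max genes.
        n_select = n_max
    return genes[:n_select]


# Toy p-values for ten hypothetical genes.
toy_p = [0.001, 0.002, 0.003, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_pvals = {f"gene{i}": p for i, p in enumerate(toy_p)}

by_fdr = select_genes(toy_pvals, fdr=0.05, n_min=2, n_max=5)
by_fdr_floor = select_genes(toy_pvals, fdr=0.05, n_min=4, n_max=5)
top_n = select_genes(toy_pvals, n_max=5)
```

In the toy data, three genes pass the FDR threshold, so `by_fdr` has three genes, `by_fdr_floor` is padded up to `n_min=4`, and `top_n` simply takes the five smallest p-values.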

# Select top 1,000 genes and use z-score weights
scdrs munge-gs \
    --out-file <out_file> \
    --zscore-file <zscore_file> \
    --weight zscore \
    --n-max 1000

out_file : str

Output scDRS .gs file.

pval_file : str, optional

P-value file. A .tsv file with first column corresponding to genes and other columns corresponding to p-values of traits (one trait per column). One of pval-file and zscore-file is expected. Default is None.

zscore_file : str, optional

Z-score file. A .tsv file with first column corresponding to genes and other columns corresponding to z-scores of traits (one trait per column). One of pval-file and zscore-file is expected. Default is None.

weight : str, optional

Gene weight options. One of zscore or uniform. Default is zscore.

fdr : float, optional

FDR threshold. Default is None. E.g., --fdr 0.05

fwer : float, optional

FWER threshold. Default is None. E.g., --fwer 0.05

n_min : int, optional

Minimum number of genes for each gene set. Default is 100. E.g., --n-min 100

n_max : int, optional

Maximum number of genes for each gene set. Default is 1000. E.g., --n-max 1000

Example p-value file:

GENE    BMI    HEIGHT
OR4F5   0.001  0.01
DAZ3    0.01   0.001
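As a sketch of how such a file could be produced and parsed, the snippet below writes the example above as a tab-separated file and reads it back, treating the first column as genes and each remaining column as one trait. The file name `example_pval.tsv` and the temporary folder are placeholders chosen for illustration; only the Python standard library is used.

```python
import csv
import os
import tempfile

# Rows of the example p-value file: gene column first, one trait per column.
rows = [
    ["GENE", "BMI", "HEIGHT"],
    ["OR4F5", "0.001", "0.01"],
    ["DAZ3", "0.01", "0.001"],
]

# Placeholder output location; any writable path works.
pval_path = os.path.join(tempfile.mkdtemp(), "example_pval.tsv")
with open(pval_path, "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

# Read the file back into {trait: {gene: p-value}}.
with open(pval_path, newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    header = next(reader)
    traits = header[1:]
    pvals = {trait: {} for trait in traits}
    for gene, *values in reader:
        for trait, value in zip(traits, values):
            pvals[trait][gene] = float(value)
```

The resulting path could then be passed to `--pval-file`.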

compute-score

Compute scDRS scores. Generate .score.gz and .full_score.gz files for each trait.

scdrs compute-score \
    --h5ad-file <h5ad_file> \
    --h5ad-species mouse \
    --gs-file <gs_file> \
    --gs-species human \
    --out-folder <out_folder> \
    --cov-file <cov_file> \
    --flag-filter-data True \
    --flag-raw-count True \
    --n-ctrl 1000 \
    --flag-return-ctrl-raw-score False \
    --flag-return-ctrl-norm-score True

h5ad_file : str

Single-cell .h5ad file.

h5ad_species : str

Species of h5ad_file. One of hsapiens, human, mmusculus, or mouse.

gs_file : str

scDRS gene set .gs file.

gs_species : str

Species of gs_file. One of hsapiens, human, mmusculus, or mouse.

out_folder : str

Output folder. scDRS score files are saved as <out_folder>/<trait>.score.gz and scDRS full score files as <out_folder>/<trait>.full_score.gz, where the trait identifier <trait> is taken from gs_file.

cov_file : str, optional

scDRS covariate .cov file. Default is None.

weight_opt : str, optional

Option for single-cell data-based weights (separate from the MAGMA z-score weights in the gs_file). One of vs (variance-stabilization weights) and uniform (uniform weights). Default is vs.

adj_prop : str, optional

Cell group annotation (e.g., cell type) in adata.obs.columns used for adjusting for cell group proportions. Cells are inversely weighted by the corresponding group size. Default is None.

flag_filter_data : bool, optional

Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.

flag_raw_count : bool, optional

Whether to apply size-factor normalization and log1p-transformation to h5ad_file. Default is True.

n_ctrl : int, optional

Number of control gene sets. Default is 1000.

min_genes : int, optional

Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.

min_cells : int, optional

Minimum number of cells in which a gene must be expressed for the gene to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.

flag_return_ctrl_raw_score : bool, optional

Whether to return raw control scores. Default is False.

flag_return_ctrl_norm_score : bool, optional

Whether to return normalized control scores. Default is True.
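To illustrate the adj_prop option, where cells are inversely weighted by the size of their group, the sketch below computes one weight per cell as 1 / (group size). The function name `inverse_group_size_weights` and the rescaling of the weights to sum to 1 are assumptions made for illustration; how scDRS uses these weights internally is not described here.

```python
from collections import Counter


def inverse_group_size_weights(groups):
    """Weight each cell by 1 / (size of its group), rescaled to sum to 1.

    Hypothetical helper; the rescaling step is an assumption.
    """
    sizes = Counter(groups)                    # number of cells per group
    raw = [1.0 / sizes[g] for g in groups]     # inverse group size per cell
    total = sum(raw)
    return [w / total for w in raw]


# Toy annotation: three cells of one type and one cell of another.
cell_types = ["T cell", "T cell", "T cell", "B cell"]
weights = inverse_group_size_weights(cell_types)

# Total weight carried by each group.
group_totals = {}
for cell_type, w in zip(cell_types, weights):
    group_totals[cell_type] = group_totals.get(cell_type, 0.0) + w
```

With this weighting, each group contributes the same total weight regardless of how many cells it contains, so a large cell type does not dominate a small one.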

perform-downstream

Perform scDRS downstream analyses based on precomputed scDRS .full_score.gz files. The number of Monte Carlo (MC) samples used in the MC tests depends on the number of control scores in the .full_score.gz file; to increase it, specify a larger --n-ctrl when calling scdrs compute-score in the previous step.
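The reason the number of control scores matters can be seen from how a Monte Carlo p-value is typically formed. The sketch below uses the common one-sided estimator (1 + number of control statistics at least as large as the observed one) / (1 + number of controls). Whether scDRS uses exactly this form is an assumption, and `mc_pvalue` is a hypothetical name; the point is that the smallest attainable p-value shrinks as the number of controls grows.

```python
def mc_pvalue(observed, control_stats):
    """One-sided Monte Carlo p-value with the usual +1 correction.

    Illustrative only; assumed form, not taken from the scDRS source.
    """
    n_ge = sum(c >= observed for c in control_stats)
    return (1 + n_ge) / (1 + len(control_stats))


# An observed statistic larger than every control statistic gives the
# smallest p-value the test can report for that number of controls.
min_p_1000 = mc_pvalue(1e9, list(range(1000)))     # 1 / 1001
min_p_5000 = mc_pvalue(1e9, list(range(5000)))     # 1 / 5001
```

Under this estimator, 1,000 control scores cannot yield a p-value below 1/1001, which may be too coarse when many tests are corrected for multiplicity.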

--group-analysis

For a given cell group-level annotation (e.g., tissue or cell type), assess cell group-disease association (control-score-based MC tests using 5% quantile) and within-cell group disease-association heterogeneity (control-score-based MC tests using Geary’s C).

--corr-analysis

For a given individual cell-level variable (e.g., T cell effectorness gradient), assess association between disease and the individual cell-level variable (control-score-based MC tests using Pearson’s correlation).

--gene-analysis

Compute Pearson’s correlation between expression of each gene and the scDRS disease score.
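As a small illustration of the gene-level analysis, the sketch below ranks genes by the Pearson correlation between their expression and a per-cell disease score. The gene names, expression values, and scores are made-up toy data, and `pearson_corr` is a plain-Python helper written for this example rather than the routine scDRS uses.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data: expression of three hypothetical genes across four cells.
expression = {
    "GENE_A": [0, 1, 2, 3],
    "GENE_B": [3, 2, 1, 0],
    "GENE_C": [1, 0, 1, 0],
}
disease_score = [0.1, 0.9, 2.1, 2.9]  # toy per-cell scores

gene_corr = {g: pearson_corr(x, disease_score) for g, x in expression.items()}
ranked = sorted(gene_corr, key=gene_corr.get, reverse=True)
```

Genes at the top of `ranked` are those whose expression tracks the score most closely across cells.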

scdrs perform-downstream \
    --h5ad-file <h5ad_file> \
    --score-file <score_file> \
    --out-folder <out_folder> \
    --group-analysis cell_type \
    --corr-analysis causal_variable,non_causal_variable,covariate \
    --gene-analysis \
    --flag-filter-data True \
    --flag-raw-count True

h5ad_file : str

Single-cell .h5ad file.

score_file : str

scDRS .full_score.gz file. Use “@” to specify multiple file names, e.g., <score_folder>/@.full_score.gz. However, <score_folder> should not contain “@”.

out_folder : str

Output folder.

group_analysis : str, optional

Comma-separated column names for cell group annotations in adata.obs.columns, e.g., cell types or tissues. Results are saved as <out_folder>/<trait>.scdrs_group.<annot>, one file per annotation. Default is None.

corr_analysis : str, optional

Comma-separated column names for continuous annotations in adata.obs.columns, e.g., T cell effectorness gradient. Results are saved as <out_folder>/<trait>.scdrs_cell_corr for all variables. Default is None.

gene_analysis : str, optional

Flag to perform gene prioritization by correlating gene expression with scDRS scores. Specify --gene-analysis without any arguments. Results are saved as <out_folder>/<trait>.scdrs_gene for all genes. Default is None.

flag_filter_data : bool, optional

Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.

flag_raw_count : bool, optional

Whether to apply size-factor normalization and log1p-transformation to h5ad_file. Default is True.

min_genes : int, optional

Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.

min_cells : int, optional

Minimum number of cells in which a gene must be expressed for the gene to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.

knn_n_neighbors : int, optional

n_neighbors parameter for computing KNN graph using sc.pp.neighbors. Default is 15 (consistent with the TMS pipeline).

knn_n_pcs : int, optional

n_pcs parameter for computing KNN graph using sc.pp.neighbors. Default is 20 (consistent with the TMS pipeline).
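To make the “@” convention of score_file more concrete, the sketch below expands a pattern such as `<score_folder>/@.full_score.gz` into one file per trait. It is a guess at the semantics rather than the scDRS implementation: `expand_score_files` is a hypothetical helper, and it assumes that “@” appears once in the file name and stands for the trait identifier. The temporary folder and file names exist only for the demonstration.

```python
import os
import tempfile


def expand_score_files(pattern):
    """Expand a score-file pattern containing one "@" in its file name.

    Returns {trait: path}. Hypothetical helper; assumes "@" stands for
    the trait identifier and that the folder part contains no "@".
    """
    folder, name = os.path.split(pattern)
    if "@" in folder or name.count("@") != 1:
        raise ValueError("expected exactly one '@', in the file name only")
    prefix, suffix = name.split("@")
    matches = {}
    for fname in sorted(os.listdir(folder or ".")):
        long_enough = len(fname) > len(prefix) + len(suffix)
        if long_enough and fname.startswith(prefix) and fname.endswith(suffix):
            trait = fname[len(prefix):len(fname) - len(suffix)]
            matches[trait] = os.path.join(folder, fname)
    return matches


# Demonstration with empty placeholder files in a temporary folder.
score_folder = tempfile.mkdtemp()
for fname in ["BMI.full_score.gz", "HEIGHT.full_score.gz", "BMI.score.gz"]:
    open(os.path.join(score_folder, fname), "w").close()

found = expand_score_files(os.path.join(score_folder, "@.full_score.gz"))
```

Here `found` maps `BMI` and `HEIGHT` to their `.full_score.gz` files, while `BMI.score.gz` is ignored because it does not match the suffix.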