CLI
scDRS CLI supports 3 main functions:
- Munge gene sets: scdrs munge-gs, producing a .gs file.
- Compute scores: scdrs compute-score, producing <trait>.score.gz and <trait>.full_score.gz files.
- Perform downstream analyses: scdrs perform-downstream, producing <trait>.scdrs_group.<annot>, <trait>.scdrs_cell_corr, and <trait>.scdrs_gene files.
See scDRS file formats. Use --help to check out more CLI details, e.g., scdrs compute-score --help.
The scDRS CLI does not distinguish between dash “-” and underscore “_”. For example, the following commands are equivalent:
# All dashes
scdrs compute-score --h5ad-file <h5ad_file> --out-folder <out_folder> ...
# All underscores
scdrs compute_score --h5ad_file <h5ad_file> --out_folder <out_folder> ...
# Mixed
scdrs compute_score --h5ad-file <h5ad_file> --out_folder <out_folder> ...
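To make the equivalence concrete, here is a minimal sketch of how such dash/underscore normalization could work. It is not taken from the scDRS source; the helper names `normalize_token` and `normalize_argv` are hypothetical, and the sketch assumes that only the subcommand and option names are rewritten while option values (for example file names containing underscores) are left untouched.

```python
def normalize_token(token):
    """Rewrite underscores as dashes in an option name; leave values alone."""
    if token.startswith("--"):
        # Handle the "--name=value" form: only the name part is rewritten.
        name, sep, value = token.partition("=")
        return name.replace("_", "-") + sep + value
    return token


def normalize_argv(argv):
    """Normalize a list such as ["compute_score", "--h5ad_file", "x.h5ad"].

    ASSUMPTION: the first token is the subcommand; everything else is either
    an option name (starts with "--") or an option value.
    """
    if not argv:
        return []
    subcommand = argv[0].replace("_", "-")
    return [subcommand] + [normalize_token(tok) for tok in argv[1:]]
```

Under these assumptions, all three command variants shown above map to the same normalized argument list.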
munge-gs
Convert a .tsv GWAS gene statistics file to an scDRS .gs file.
- Read results from a .tsv p-value or z-score file.
- Select a subset of genes for each trait:
  - If both fdr and fwer are None, select the top n_max genes.
  - If fdr is not None, select genes based on FDR (across all genes of a given trait) and cap the number of genes between n_min and n_max.
  - If fwer is not None, select genes based on FWER (across all genes of a given trait) and cap the number of genes between n_min and n_max.
- Assign gene weights based on weight.
- Write the .gs file to out_file.
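The gene selection step above can be sketched as follows for a single trait. This is an illustration under stated assumptions, not the scDRS implementation: the function name `select_genes` is hypothetical, FDR control is assumed to use the Benjamini-Hochberg step-up rule, FWER control is assumed to use a Bonferroni threshold, and `fdr` is assumed to take precedence when both thresholds are given. The sketch works on p-values only.

```python
def select_genes(pvals, fdr=None, fwer=None, n_min=100, n_max=1000):
    """Return the selected genes for one trait, most significant first.

    pvals: dict mapping gene name -> p-value for a single trait.
    """
    ranked = sorted(pvals, key=pvals.get)  # smallest p-value first
    m = len(ranked)

    # No threshold given: simply take the top n_max genes.
    if fdr is None and fwer is None:
        return ranked[:n_max]

    if fdr is not None:
        # ASSUMPTION: Benjamini-Hochberg. Keep every gene up to the largest
        # rank k whose p-value satisfies p_(k) <= fdr * k / m.
        n_sig = 0
        for rank, gene in enumerate(ranked, start=1):
            if pvals[gene] <= fdr * rank / m:
                n_sig = rank
    else:
        # ASSUMPTION: Bonferroni. Keep genes with p <= fwer / m.
        n_sig = sum(1 for gene in ranked if pvals[gene] <= fwer / m)

    # Cap the number of selected genes between n_min and n_max.
    n_keep = min(max(n_sig, n_min), n_max)
    return ranked[:n_keep]
```

For example, with four genes and `fdr=0.05`, a trait with only two significant genes still yields three genes if `n_min=3`, because the cap pads the set with the next most significant gene.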
# Select top 1,000 genes and use z-score weights
scdrs munge-gs \
--out-file <out_file> \
--zscore-file <zscore_file> \
--weight zscore \
--n-max 1000
- out_file : str
Output scDRS .gs file.
- pval_file : str, optional
P-value file. A .tsv file with the first column corresponding to genes and the other columns corresponding to p-values of traits (one trait per column). One of pval-file and zscore-file is expected. Default is None.
- zscore_file : str, optional
Z-score file. A .tsv file with the first column corresponding to genes and the other columns corresponding to z-scores of traits (one trait per column). One of pval-file and zscore-file is expected. Default is None.
- weight : str, optional
Gene weight options. One of zscore or uniform. Default is zscore.
- fdr : float, optional
FDR threshold. Default is None. E.g., --fdr 0.05
- fwer : float, optional
FWER threshold. Default is None. E.g., --fwer 0.05
- n_min : int, optional
Minimum number of genes for each gene set. Default is 100. E.g., --n-min 100
- n_max : int, optional
Maximum number of genes for each gene set. Default is 1000. E.g., --n-max 1000
Example p-value file:
GENE BMI HEIGHT
OR4F5 0.001 0.01
DAZ3 0.01 0.001
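A file in this layout can be produced with the Python standard library. The sketch below is only an illustration: the helper name `format_pval_table` is hypothetical, and it assumes a tab-separated file whose header row starts with GENE followed by one column per trait, as in the example above.

```python
import csv
import io


def format_pval_table(traits, rows):
    """Return tab-separated text: a header line, then one line per gene.

    traits: list of trait names, e.g. ["BMI", "HEIGHT"].
    rows:   dict mapping gene name -> list of p-values, one per trait.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["GENE"] + list(traits))
    for gene, values in rows.items():
        if len(values) != len(traits):
            raise ValueError(f"{gene}: expected {len(traits)} values")
        writer.writerow([gene] + list(values))
    return buffer.getvalue()


table = format_pval_table(
    ["BMI", "HEIGHT"],
    {"OR4F5": [0.001, 0.01], "DAZ3": [0.01, 0.001]},
)
```

The resulting text can be written to disk with `open(path, "w").write(table)` and passed to `scdrs munge-gs` through `--pval-file`.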
compute-score
Compute scDRS scores. Generate .score.gz and .full_score.gz files for each trait.
scdrs compute-score \
--h5ad-file <h5ad_file> \
--h5ad-species mouse \
--gs-file <gs_file> \
--gs-species human \
--out-folder <out_folder> \
--cov-file <cov_file> \
--flag-filter-data True \
--flag-raw-count True \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score True
- h5ad_file : str
Single-cell .h5ad file.
- h5ad_species : str
Species of h5ad_file. One of hsapiens, human, mmusculus, or mouse.
- gs_file : str
scDRS gene set .gs file.
- gs_species : str
Species of gs_file. One of hsapiens, human, mmusculus, or mouse.
- out_folder : str
Output folder. Save scDRS score files as <out_folder>/<trait>.score.gz and scDRS full score files as <out_folder>/<trait>.full_score.gz, where the trait identifier <trait> is taken from gs_file.
- cov_file : str, optional
scDRS covariate .cov file. Default is None.
- weight_opt : str, optional
Option for single-cell data-based weights (separate from the MAGMA z-score weights in the gs_file). One of vs (variance-stabilization weights) and uniform (uniform weights). Default is vs.
- adj_prop : str, optional
Cell group annotation (e.g., cell type) in adata.obs.columns used for adjusting for cell group proportions. Cells are inversely weighted by the corresponding group size. Default is None.
- flag_filter_data : bool, optional
Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.
- flag_raw_count : bool, optional
Whether to apply size-factor normalization and log1p-transformation to h5ad_file. Default is True.
- n_ctrl : int, optional
Number of control gene sets. Default is 1000.
- min_genes : int, optional
Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.
- min_cells : int, optional
Minimum number of cells expressing a gene required for the gene to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.
- flag_return_ctrl_raw_score : bool, optional
Whether to return raw control scores. Default is False.
- flag_return_ctrl_norm_score : bool, optional
Whether to return normalized control scores. Default is True.
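When scoring is driven from a Python script, the command line can be assembled from a dictionary of options. The following is a minimal sketch rather than an scDRS API: the helper names `build_compute_score_cmd` and `expected_outputs` are hypothetical, and the file names in the usage lines are placeholders. Only options documented above are used, and the output paths follow the `<out_folder>/<trait>.score.gz` and `<out_folder>/<trait>.full_score.gz` naming described for out_folder.

```python
import os


def build_compute_score_cmd(options):
    """Turn {"h5ad_file": "x.h5ad", ...} into an argument list for scdrs."""
    argv = ["scdrs", "compute-score"]
    for name, value in options.items():
        # The CLI accepts dashes or underscores; dashes are used here.
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def expected_outputs(out_folder, traits):
    """Map each trait to its (score file, full score file) paths."""
    return {
        trait: (
            os.path.join(out_folder, trait + ".score.gz"),
            os.path.join(out_folder, trait + ".full_score.gz"),
        )
        for trait in traits
    }


# Placeholder inputs, purely for illustration.
cmd = build_compute_score_cmd(
    {
        "h5ad_file": "data.h5ad",
        "h5ad_species": "mouse",
        "gs_file": "traits.gs",
        "gs_species": "human",
        "out_folder": "out",
        "n_ctrl": 1000,
    }
)
```

The list can then be executed with `subprocess.run(cmd, check=True)`, after which `expected_outputs("out", ["BMI"])` tells the script where to look for the results.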
perform-downstream
Perform scDRS downstream analyses based on precomputed scDRS .full_score.gz files. The number of Monte Carlo (MC) samples in the MC tests equals the number of control scores in the .full_score.gz file; to increase this number, specify a larger --n-ctrl when calling scdrs compute-score in the previous step.
- --group-analysis
For a given cell group-level annotation (e.g., tissue or cell type), assess cell group-disease association (control-score-based MC tests using 5% quantile) and within-cell group disease-association heterogeneity (control-score-based MC tests using Geary’s C).
- --corr-analysis
For a given individual cell-level variable (e.g., T cell effectorness gradient), assess association between disease and the individual cell-level variable (control-score-based MC tests using Pearson’s correlation).
- --gene-analysis
Compute Pearson’s correlation between expression of each gene and the scDRS disease score.
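To illustrate how a control-score-based MC test works and why the number of control scores matters, here is a minimal sketch of the correlation test. It is an illustration, not the scDRS code: the function names are hypothetical, the test is assumed to be one-sided, and the p-value is assumed to use the common MC estimator (1 + number of control statistics at least as large as the observed one) / (1 + number of controls).

```python
import math


def pearson(x, y):
    """Pearson's correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mc_pvalue(observed, controls):
    """ASSUMED one-sided MC p-value: (1 + #controls >= observed) / (1 + n)."""
    n_exceed = sum(1 for c in controls if c >= observed)
    return (1 + n_exceed) / (1 + len(controls))


def corr_mc_test(variable, disease_score, control_scores):
    """Compare the observed correlation against one correlation per control.

    variable:       per-cell values of the variable of interest.
    disease_score:  per-cell disease scores.
    control_scores: list of per-cell control score vectors.
    """
    observed = pearson(variable, disease_score)
    controls = [pearson(variable, ctrl) for ctrl in control_scores]
    return observed, mc_pvalue(observed, controls)
```

Under this estimator the smallest attainable p-value is 1 / (1 + number of controls), for example 1/1001 with 1,000 control scores, which is why a larger number of control gene sets gives finer resolution.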
scdrs perform-downstream \
--h5ad-file <h5ad_file> \
--score-file <score_file> \
--out-folder <out_folder> \
--group-analysis cell_type \
--corr-analysis causal_variable,non_causal_variable,covariate \
--gene-analysis \
--flag-filter-data True \
--flag-raw-count True
- h5ad_file : str
Single-cell .h5ad file.
- score_file : str
scDRS .full_score.gz file. Use “@” to specify multiple file names, e.g., <score_folder>/@.full_score.gz. However, <score_folder> should not contain “@”.
- out_folder : str
Output folder.
- group_analysis : str, optional
Comma-separated column names for cell group annotations in adata.obs.columns, e.g., cell types or tissues. Results are saved as <out_folder>/<trait>.scdrs_group.<annot>, one file per annotation. Default is None.
- corr_analysis : str, optional
Comma-separated column names for continuous annotations in adata.obs.columns, e.g., T cell effectorness gradient. Results are saved as <out_folder>/<trait>.scdrs_cell_corr for all variables. Default is None.
- gene_analysis : str, optional
Flag to perform gene prioritization by correlating gene expression with scDRS scores. Specify --gene-analysis without any arguments. Results are saved as <out_folder>/<trait>.scdrs_gene for all genes. Default is None.
- flag_filter_data : bool, optional
Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.
- flag_raw_count : bool, optional
Whether to apply size-factor normalization and log1p-transformation to h5ad_file. Default is True.
- min_genes : int, optional
Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.
- min_cells : int, optional
Minimum number of cells expressing a gene required for the gene to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.
- knn_n_neighbors : int, optional
n_neighbors parameter for computing the KNN graph using sc.pp.neighbors. Default is 15 (consistent with the TMS pipeline).
- knn_n_pcs : int, optional
n_pcs parameter for computing the KNN graph using sc.pp.neighbors. Default is 20 (consistent with the TMS pipeline).
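The “@” placeholder in score_file can be pictured as a wildcard for the trait name. The sketch below shows one way such a pattern could be expanded; it is an assumption about the behavior, not the scDRS implementation, and the function name `expand_score_files` is hypothetical. It also shows why the folder part must not contain “@”: the placeholder would otherwise be ambiguous.

```python
import os


def expand_score_files(score_file):
    """Return a sorted list of (trait, path) pairs matching the pattern.

    ASSUMPTION: "@" stands for the trait name and may appear only in the
    file name part, e.g. "<score_folder>/@.full_score.gz".
    """
    folder, name = os.path.split(score_file)
    if "@" in folder:
        raise ValueError("the score folder must not contain '@'")

    suffix_default = ".full_score.gz"
    if "@" not in name:
        # A single file: derive the trait by dropping the standard suffix.
        trait = name[: -len(suffix_default)] if name.endswith(suffix_default) else name
        return [(trait, score_file)]

    prefix, suffix = name.split("@", 1)
    matches = []
    for fname in sorted(os.listdir(folder or ".")):
        long_enough = len(fname) > len(prefix) + len(suffix)
        if long_enough and fname.startswith(prefix) and fname.endswith(suffix):
            trait = fname[len(prefix): len(fname) - len(suffix)]
            matches.append((trait, os.path.join(folder, fname)))
    return matches
```

With a folder containing BMI.full_score.gz and HEIGHT.full_score.gz, the pattern `<score_folder>/@.full_score.gz` would yield the traits BMI and HEIGHT under these assumptions, while unrelated files such as BMI.score.gz are ignored.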