CLI
scDRS CLI supports 3 main functions:
- Munge gene sets: scdrs munge-gs, producing a .gs file.
- Compute scores: scdrs compute-score, producing <trait>.score.gz and <trait>.full_score.gz files.
- Perform downstream analyses: scdrs perform-downstream, producing <trait>.scdrs_group.<annot>, <trait>.scdrs_cell_corr, and <trait>.scdrs_gene files.

See scDRS file formats. Use --help to check out more CLI details, e.g., scdrs compute-score --help.
The scDRS CLI does not distinguish between dashes “-” and underscores “_” in command and option names. For example, the following are equivalent:
# All dashes
scdrs compute-score --h5ad-file <h5ad_file> --out-folder <out_folder> ...
# All underscores
scdrs compute_score --h5ad_file <h5ad_file> --out_folder <out_folder> ...
# Mixed
scdrs compute_score --h5ad-file <h5ad_file> --out_folder <out_folder> ...
munge-gs
Convert a .tsv GWAS gene statistics file to an scDRS .gs file.
- Read results from a .tsv p-value or z-score file.
- Select a subset of genes for each trait:
  - If both fdr and fwer are None, select the top n_max genes.
  - If fdr is not None, select genes based on FDR (across all genes of a given trait) and cap the number of genes between n_min and n_max.
  - If fwer is not None, select genes based on FWER (across all genes of a given trait) and cap the number of genes between n_min and n_max.
- Assign gene weights based on weight.
- Write the .gs file to out_file.
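The gene-selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the scDRS implementation: the text does not say which multiple-testing procedures are used, so the sketch assumes Benjamini-Hochberg for FDR and Bonferroni for FWER, and it rejects calls that set both thresholds. The names bh_adjust and select_genes are hypothetical.

```python
from typing import Dict, List, Optional


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(
    pvals: Dict[str, float],
    fdr: Optional[float] = None,
    fwer: Optional[float] = None,
    n_min: int = 100,
    n_max: int = 1000,
) -> List[str]:
    """Return the selected genes of one trait, most significant first."""
    if fdr is not None and fwer is not None:
        # Choice made for this sketch only; the text does not cover this case.
        raise ValueError("set at most one of fdr and fwer")
    genes = sorted(pvals, key=lambda g: pvals[g])  # ascending p-value
    m = len(genes)
    if fdr is None and fwer is None:
        n_select = n_max  # no threshold: take the top n_max genes
    else:
        if fdr is not None:
            adjusted = bh_adjust([pvals[g] for g in genes])
            n_pass = sum(a < fdr for a in adjusted)
        else:
            # Bonferroni correction (assumed FWER procedure).
            n_pass = sum(pvals[g] * m < fwer for g in genes)
        # Cap the number of selected genes between n_min and n_max.
        n_select = max(n_min, min(n_pass, n_max))
    return genes[:n_select]


# Toy p-values for a single trait (made-up numbers).
toy = {"A": 0.001, "B": 0.01, "C": 0.2, "D": 0.5, "E": 0.9}
print(select_genes(toy, n_max=2))
print(select_genes(toy, fdr=0.05, n_min=1, n_max=5))
```

With the defaults n_min=100 and n_max=1000, a threshold that passes only a handful of genes still yields 100 genes, which is the effect of the lower cap.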
# Select top 1,000 genes and use z-score weights
scdrs munge-gs \
--out-file <out_file> \
--zscore-file <zscore_file> \
--weight zscore \
--n-max 1000
- out_file : str
    Output scDRS .gs file.
- pval_file : str, optional
    P-value file. A .tsv file with first column corresponding to genes and other columns corresponding to p-values of traits (one trait per column). One of pval-file and zscore-file is expected. Default is None.
- zscore_file : str, optional
    Z-score file. A .tsv file with first column corresponding to genes and other columns corresponding to z-scores of traits (one trait per column). One of pval-file and zscore-file is expected. Default is None.
- weight : str, optional
    Gene weight options. One of zscore or uniform. Default is zscore.
- fdr : float, optional
    FDR threshold. Default is None. E.g., --fdr 0.05
- fwer : float, optional
    FWER threshold. Default is None. E.g., --fwer 0.05
- n_min : int, optional
    Minimum number of genes for each gene set. Default is 100. E.g., --n-min 100
- n_max : int, optional
    Maximum number of genes for each gene set. Default is 1000. E.g., --n-max 1000
Example p-value file:
GENE BMI HEIGHT
OR4F5 0.001 0.01
DAZ3 0.01 0.001
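As a small illustration of the layout, the example file can be written as tab-separated text with Python's standard csv module. The helper names and the output file name are hypothetical; the rows are the ones shown above.

```python
import csv
import os
import tempfile

# Rows of the example p-value file: gene column first, then one column per trait.
EXAMPLE_ROWS = [
    ["GENE", "BMI", "HEIGHT"],
    ["OR4F5", "0.001", "0.01"],
    ["DAZ3", "0.01", "0.001"],
]


def write_pval_file(path, rows):
    """Write rows as a tab-separated (.tsv) file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def read_traits(path):
    """Return the trait names, i.e. every header column after the first."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return header[1:]


# Write to a temporary folder so the sketch leaves the working directory untouched.
out_path = os.path.join(tempfile.mkdtemp(), "example_pval.tsv")
write_pval_file(out_path, EXAMPLE_ROWS)
print(read_traits(out_path))
```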
compute-score
Compute scDRS scores. Generate .score.gz and .full_score.gz files for each trait.
scdrs compute-score \
--h5ad-file <h5ad_file> \
--h5ad-species mouse \
--gs-file <gs_file> \
--gs-species human \
--out-folder <out_folder> \
--cov-file <cov_file> \
--flag-filter-data True \
--flag-raw-count True \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score True
- h5ad_file : str
    Single-cell .h5ad file.
- h5ad_species : str
    Species of h5ad_file. One of hsapiens, human, mmusculus, or mouse.
- gs_file : str
    scDRS gene set .gs file.
- gs_species : str
    Species of gs_file. One of hsapiens, human, mmusculus, or mouse.
- out_folder : str
    Output folder. Save scDRS score files as <out_folder>/<trait>.score.gz and scDRS full score files as <out_folder>/<trait>.full_score.gz, where the trait identifier <trait> is taken from gs_file.
- cov_file : str, optional
    scDRS covariate .cov file. Default is None.
- weight_opt : str, optional
    Option for single-cell data-based weights (separate from the MAGMA z-score weights in the gs_file). One of vs (variance-stabilization weights) and uniform (uniform weights). Default is vs.
- adj_prop : str, optional
    Cell group annotation (e.g., cell type) in adata.obs.columns used for adjusting for cell group proportions. Cells are inversely weighted by the corresponding group size. Default is None.
- flag_filter_data : bool, optional
    Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.
- flag_raw_count : bool, optional
    Whether to apply size-factor normalization and log1p-transformation to h5ad_file. Default is True.
- n_ctrl : int, optional
    Number of control gene sets. Default is 1000.
- min_genes : int, optional
    Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.
- min_cells : int, optional
    Minimum number of cells in which a gene must be expressed to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.
- flag_return_ctrl_raw_score : bool, optional
    Whether to return raw control scores. Default is False.
- flag_return_ctrl_norm_score : bool, optional
    Whether to return normalized control scores. Default is True.
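To make the adj_prop description concrete, the sketch below weights each cell by the inverse of its group size, so that every group contributes the same total weight. It only illustrates the stated idea; how scDRS normalizes or applies these weights internally is not described here, and the function name is hypothetical.

```python
from collections import Counter
from typing import List


def inverse_group_size_weights(groups: List[str]) -> List[float]:
    """Weight each cell by 1 / (number of cells in its group)."""
    counts = Counter(groups)
    return [1.0 / counts[g] for g in groups]


# Toy annotation: three T cells and one B cell (made-up labels).
cell_types = ["T", "T", "T", "B"]
weights = inverse_group_size_weights(cell_types)

# Total weight per group; each group sums to 1 regardless of its size.
totals = Counter()
for group, weight in zip(cell_types, weights):
    totals[group] += weight
print(dict(totals))
```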
perform-downstream
Perform scDRS downstream analyses based on precomputed scDRS .full_score.gz files. The number of Monte Carlo (MC) samples in MC tests depends on the number of control scores in the .full_score.gz file; to increase this number, specify a larger --n_ctrl when calling scdrs compute-score in the previous step.
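The link between the number of control scores and test resolution can be illustrated with a conventional Monte Carlo p-value. The add-one formula below is a common convention and an assumption of this sketch; it is not confirmed to be the exact formula used by scDRS, and the function names are hypothetical.

```python
from typing import Sequence


def mc_pvalue(observed: float, control: Sequence[float]) -> float:
    """One-sided MC p-value: share of control statistics >= the observed one.

    The "+1" terms count the observed statistic as one of the samples,
    which keeps the p-value strictly positive.
    """
    n_exceed = sum(c >= observed for c in control)
    return (1 + n_exceed) / (1 + len(control))


def smallest_pvalue(n_ctrl: int) -> float:
    """Smallest p-value attainable with n_ctrl control statistics."""
    return 1 / (1 + n_ctrl)


# Toy statistics (made-up numbers): one of four controls reaches the observed value.
print(mc_pvalue(3.0, [0.0, 1.0, 2.0, 4.0]))

# More control gene sets allow smaller p-values to be resolved.
for n_ctrl in (100, 1000, 10000):
    print(n_ctrl, smallest_pvalue(n_ctrl))
```

Under this convention the smallest reportable p-value is 1 / (n_ctrl + 1), which is why a larger number of control gene sets is needed to resolve very small p-values.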
- --group-analysis
    For a given cell group-level annotation (e.g., tissue or cell type), assess cell group-disease association (control-score-based MC tests using 5% quantile) and within-cell group disease-association heterogeneity (control-score-based MC tests using Geary’s C).
- --corr-analysis
    For a given individual cell-level variable (e.g., T cell effectorness gradient), assess association between disease and the individual cell-level variable (control-score-based MC tests using Pearson’s correlation).
- --gene-analysis
    Compute Pearson’s correlation between expression of each gene and the scDRS disease score.
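As an illustration of the gene analysis described above, the sketch below computes Pearson's correlation between each gene's expression and a per-cell score, then ranks genes by that correlation. It uses plain Python lists and made-up numbers; the function names and data layout are assumptions of this sketch, not the scDRS API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_genes(
    expression: Dict[str, Sequence[float]], score: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (gene, correlation) pairs sorted from highest to lowest correlation."""
    corr = {gene: pearson_r(values, score) for gene, values in expression.items()}
    return sorted(corr.items(), key=lambda item: item[1], reverse=True)


# Toy data: expression of three genes across four cells, plus a per-cell score.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [4.0, 3.0, 2.0, 1.0],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
toy_score = [0.1, 0.2, 0.3, 0.4]
for gene, r in rank_genes(toy_expression, toy_score):
    print(gene, round(r, 3))
```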
scdrs perform-downstream \
--h5ad-file <h5ad_file> \
--score-file <score_file> \
--out-folder <out_folder> \
--group-analysis cell_type \
--corr-analysis causal_variable,non_causal_variable,covariate \
--gene-analysis \
--flag-filter-data True \
--flag-raw-count True
- h5ad_file : str
    Single-cell .h5ad file.
- score_file : str
    scDRS .full_score.gz file. Use “@” to specify multiple file names, e.g., <score_folder>/@.full_score.gz. However, <score_folder> should not contain “@”.
- out_folder : str
    Output folder.
- group_analysis : str, optional
    Comma-separated column names for cell group annotations in adata.obs.columns, e.g., cell types or tissues. Results are saved as <out_folder>/<trait>.scdrs_group.<annot>, one file per annotation. Default is None.
- corr_analysis : str, optional
    Comma-separated column names for continuous annotations in adata.obs.columns, e.g., T cell effectorness gradient. Results are saved as <out_folder>/<trait>.scdrs_cell_corr for all variables. Default is None.
- gene_analysis : str, optional
    Flag to perform gene prioritization by correlating gene expression with scDRS scores. Specify --gene-analysis without any arguments. Results are saved as <out_folder>/<trait>.scdrs_gene for all genes. Default is None.
- flag_filter_data : bool, optional
    Whether to apply minimal cell and gene filtering to h5ad_file. Default is True.
- flag_raw_count : bool, optional
    Whether to apply size-factor normalization and log1p-transformation to h5ad_file. Default is True.
- min_genes : int, optional
    Minimum number of expressed genes required for a cell to pass filtering. Used in scanpy.pp.filter_cells. Default is 250.
- min_cells : int, optional
    Minimum number of cells in which a gene must be expressed to pass filtering. Used in scanpy.pp.filter_genes. Default is 50.
- knn_n_neighbors : int, optional
    n_neighbors parameter for computing the KNN graph using sc.pp.neighbors. Default is 15 (consistent with the TMS pipeline).
- knn_n_pcs : int, optional
    n_pcs parameter for computing the KNN graph using sc.pp.neighbors. Default is 20 (consistent with the TMS pipeline).
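The “@” placeholder in score_file can be read as a wildcard for the trait name. The sketch below shows one plausible way to expand such a pattern into per-trait files; it is an assumption about the behavior, not the scDRS implementation, and the function name expand_score_files is hypothetical.

```python
import glob
import os
import tempfile
from typing import Dict


def expand_score_files(pattern: str) -> Dict[str, str]:
    """Map trait name -> file path for a pattern such as <folder>/@.full_score.gz."""
    folder, name = os.path.split(pattern)
    if "@" in folder:
        raise ValueError("the folder part must not contain '@'")
    if name.count("@") != 1:
        raise ValueError("this sketch expects exactly one '@' in the file name")
    prefix, suffix = name.split("@")
    # Treat "@" as a wildcard; escape everything else so it is matched literally.
    query = os.path.join(
        glob.escape(folder), glob.escape(prefix) + "*" + glob.escape(suffix)
    )
    matches = {}
    for path in sorted(glob.glob(query)):
        base = os.path.basename(path)
        # The part of the file name that "@" stood for is taken as the trait.
        trait = base[len(prefix): len(base) - len(suffix)]
        matches[trait] = path
    return matches


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as score_folder:
    for file_name in ("BMI.full_score.gz", "HEIGHT.full_score.gz", "notes.txt"):
        open(os.path.join(score_folder, file_name), "w").close()
    found = expand_score_files(os.path.join(score_folder, "@.full_score.gz"))
    print(sorted(found))
```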