CLI
===

scDRS CLI supports 3 main functions:

1. Munge gene sets with :code:`scdrs munge-gs`, producing a :code:`.gs` file.
2. Compute scores with :code:`scdrs compute-score`, producing :code:`.score.gz` and :code:`.full_score.gz` files.
3. Perform downstream analyses with :code:`scdrs perform-downstream`, producing :code:`.scdrs_group.`, :code:`.scdrs_cell_corr`, and :code:`.scdrs_gene` files.

See `scDRS file formats `_.

Use :code:`--help` to check out more CLI details, e.g., :code:`scdrs compute-score --help`.

scDRS CLI does not distinguish between dash "-" and underscore "_". For example, the following are equivalent:

.. code-block:: bash

    # All dashes
    scdrs compute-score --h5ad-file --out-folder ...
    # All underscores
    scdrs compute_score --h5ad_file --out_folder ...
    # Mixed
    scdrs compute_score --h5ad-file --out_folder ...

munge-gs
~~~~~~~~

Convert a :code:`.tsv` GWAS gene statistics file to an scDRS :code:`.gs` file.

1. Read results from a :code:`.tsv` p-value or z-score file.
2. Select a subset of genes for each trait:

   - If both :code:`fdr` and :code:`fwer` are :code:`None`, select the top :code:`n_max` genes.
   - If :code:`fdr` is not :code:`None`, select genes based on FDR (across all genes of a given trait) and cap the number of selected genes between :code:`n_min` and :code:`n_max`.
   - If :code:`fwer` is not :code:`None`, select genes based on FWER (across all genes of a given trait) and cap the number of selected genes between :code:`n_min` and :code:`n_max`.

3. Assign gene weights based on :code:`weight`.
4. Write the :code:`.gs` file to :code:`out_file`.

.. code-block:: bash

    # Select top 1,000 genes and use z-score weights
    scdrs munge-gs \
        --out-file \
        --zscore-file \
        --weight zscore \
        --n-max 1000

out_file : str
    Output scDRS :code:`.gs` file.
pval_file : str, optional
    P-value file. A .tsv file with the first column corresponding to genes and the other columns corresponding to p-values of traits (one trait per column). One of :code:`pval-file` and :code:`zscore-file` is expected. Default is :code:`None`.
zscore_file : str, optional
    Z-score file. A .tsv file with the first column corresponding to genes and the other columns corresponding to z-scores of traits (one trait per column). One of :code:`pval-file` and :code:`zscore-file` is expected. Default is :code:`None`.
weight : str, optional
    Gene weight options. One of :code:`zscore` or :code:`uniform`. Default is :code:`zscore`.
fdr : float, optional
    FDR threshold. Default is :code:`None`. E.g., :code:`--fdr 0.05`
fwer : float, optional
    FWER threshold. Default is :code:`None`. E.g., :code:`--fwer 0.05`
n_min : int, optional
    Minimum number of genes for each gene set. Default is :code:`100`. E.g., :code:`--n-min 100`
n_max : int, optional
    Maximum number of genes for each gene set. Default is :code:`1000`. E.g., :code:`--n-max 1000`

Example p-value file::

    GENE    BMI     HEIGHT
    OR4F5   0.001   0.01
    DAZ3    0.01    0.001

compute-score
~~~~~~~~~~~~~

Compute scDRS scores. Generate :code:`.score.gz` and :code:`.full_score.gz` files for each trait.

.. code-block:: bash

    scdrs compute-score \
        --h5ad-file \
        --h5ad-species mouse \
        --gs-file \
        --gs-species human \
        --out-folder \
        --cov-file \
        --flag-filter-data True \
        --flag-raw-count True \
        --n-ctrl 1000 \
        --flag-return-ctrl-raw-score False \
        --flag-return-ctrl-norm-score True

h5ad_file : str
    Single-cell :code:`.h5ad` file.
h5ad_species : str
    Species of :code:`h5ad_file`. One of :code:`hsapiens`, :code:`human`, :code:`mmusculus`, or :code:`mouse`.
gs_file : str
    scDRS gene set :code:`.gs` file.
gs_species : str
    Species of :code:`gs_file`. One of :code:`hsapiens`, :code:`human`, :code:`mmusculus`, or :code:`mouse`.
out_folder : str
    Output folder. Save scDRS score files as :code:`/.score.gz` and scDRS full score files as :code:`/.full_score.gz`, where the trait identifier is taken from :code:`gs_file`.
cov_file : str, optional
    scDRS covariate :code:`.cov` file. Default is :code:`None`.
weight_opt : str, optional
    Option for single-cell data-based weights (separate from the MAGMA z-score weights in the :code:`gs_file`). One of :code:`vs` (variance-stabilization weights) and :code:`uniform` (uniform weights). Default is :code:`vs`.
adj_prop : str, optional
    Cell group annotation (e.g., cell type) in :code:`adata.obs.columns` used for adjusting for cell group proportions. Cells are inversely weighted by the corresponding group size. Default is :code:`None`.
flag_filter_data : bool, optional
    Whether to apply minimal cell and gene filtering to :code:`h5ad_file`. Default is :code:`True`.
flag_raw_count : bool, optional
    Whether to apply size-factor normalization and log1p-transformation to :code:`h5ad_file`. Default is :code:`True`.
n_ctrl : int, optional
    Number of control gene sets. Default is :code:`1000`.
min_genes : int, optional
    Minimum number of expressed genes required for a cell to pass filtering. Used in :code:`scanpy.pp.filter_cells`. Default is :code:`250`.
min_cells : int, optional
    Minimum number of expressing cells required for a gene to pass filtering. Used in :code:`scanpy.pp.filter_genes`. Default is :code:`50`.
flag_return_ctrl_raw_score : bool, optional
    Whether to return raw control scores. Default is :code:`False`.
flag_return_ctrl_norm_score : bool, optional
    Whether to return normalized control scores. Default is :code:`True`.

perform-downstream
~~~~~~~~~~~~~~~~~~

Perform scDRS downstream analyses based on precomputed scDRS :code:`.full_score.gz` files.

The number of Monte Carlo (MC) samples in MC tests depends on the number of control scores in the :code:`.full_score.gz` file; to increase this number, specify a larger :code:`--n_ctrl` when calling :code:`scdrs compute-score` in the previous step.

--group-analysis
    For a given cell group-level annotation (e.g., tissue or cell type), assess cell group-disease association (control-score-based MC tests using 5% quantile) and within-cell group disease-association heterogeneity (control-score-based MC tests using Geary's C).
--corr-analysis
    For a given individual cell-level variable (e.g., T cell effectorness gradient), assess association between disease and the individual cell-level variable (control-score-based MC tests using Pearson's correlation).
--gene-analysis
    Compute Pearson's correlation between expression of each gene and the scDRS disease score.

.. code-block:: bash

    scdrs perform-downstream \
        --h5ad-file \
        --score-file \
        --out-folder \
        --group-analysis cell_type \
        --corr-analysis causal_variable,non_causal_variable,covariate \
        --gene-analysis \
        --flag-filter-data True \
        --flag-raw-count True

h5ad_file : str
    Single-cell :code:`.h5ad` file.
score_file : str
    scDRS :code:`.full_score.gz` file. Use "@" to specify multiple file names, e.g., :code:`/@.full_score.gz`. However, the rest of the path should not contain "@".
out_folder : str
    Output folder.
group_analysis : str, optional
    Comma-separated column names for cell group annotations in :code:`adata.obs.columns`, e.g., cell types or tissues. Results are saved as :code:`/.scdrs_group.`, one file per annotation. Default is :code:`None`.
corr_analysis : str, optional
    Comma-separated column names for continuous annotations in :code:`adata.obs.columns`, e.g., T cell effectorness gradient. Results are saved as :code:`/.scdrs_cell_corr` for all variables. Default is :code:`None`.
gene_analysis : str, optional
    Flag to perform gene prioritization by correlating gene expression with scDRS scores. Specify :code:`--gene-analysis` without any arguments. Results are saved as :code:`/.scdrs_gene` for all genes. Default is :code:`None`.
flag_filter_data : bool, optional
    Whether to apply minimal cell and gene filtering to :code:`h5ad_file`. Default is :code:`True`.
flag_raw_count : bool, optional
    Whether to apply size-factor normalization and log1p-transformation to :code:`h5ad_file`. Default is :code:`True`.
min_genes : int, optional
    Minimum number of expressed genes required for a cell to pass filtering. Used in :code:`scanpy.pp.filter_cells`. Default is :code:`250`.
min_cells : int, optional
    Minimum number of expressing cells required for a gene to pass filtering. Used in :code:`scanpy.pp.filter_genes`. Default is :code:`50`.
knn_n_neighbors : int, optional
    :code:`n_neighbors` parameter for computing the KNN graph using :code:`sc.pp.neighbors`. Default is :code:`15` (consistent with the TMS pipeline).
knn_n_pcs : int, optional
    :code:`n_pcs` parameter for computing the KNN graph using :code:`sc.pp.neighbors`. Default is :code:`20` (consistent with the TMS pipeline).
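
The three commands are typically run in sequence, with each step consuming the previous step's output. The sketch below is one possible way to chain them. It uses only flags documented on this page. All file paths, the :code:`cell_type` column, the species choices, and the :code:`run` / :code:`DRY_RUN` helper are hypothetical placeholders rather than part of scDRS. By default the script only prints the commands, so it can be inspected without any data in place.

.. code-block:: bash

    #!/usr/bin/env bash
    set -eu

    # Hypothetical input and output locations; replace with your own.
    h5ad_file="data/expr.h5ad"
    zscore_file="data/gwas_zscore.tsv"
    gs_file="results/traits.gs"
    score_dir="results/scores"
    downstream_dir="results/downstream"

    # Hypothetical helper: set DRY_RUN=0 to execute instead of only printing.
    DRY_RUN="${DRY_RUN:-1}"
    planned_cmds=()

    run() {
        # Record and print the command, then execute it unless in dry-run mode.
        planned_cmds+=("$*")
        printf '+ %s\n' "$*"
        if [ "${DRY_RUN}" = "0" ]; then
            "$@"
        fi
    }

    # Step 1: GWAS gene statistics (.tsv) -> scDRS gene set (.gs) file.
    run scdrs munge-gs \
        --zscore-file "${zscore_file}" \
        --out-file "${gs_file}" \
        --weight zscore \
        --n-max 1000

    # Step 2: .gs file -> .score.gz and .full_score.gz files in score_dir.
    run scdrs compute-score \
        --h5ad-file "${h5ad_file}" \
        --h5ad-species mouse \
        --gs-file "${gs_file}" \
        --gs-species human \
        --out-folder "${score_dir}" \
        --n-ctrl 1000

    # Step 3: "@" in --score-file selects multiple .full_score.gz files from step 2.
    run scdrs perform-downstream \
        --h5ad-file "${h5ad_file}" \
        --score-file "${score_dir}/@.full_score.gz" \
        --out-folder "${downstream_dir}" \
        --group-analysis cell_type

Running the script with :code:`DRY_RUN=0` assumes that :code:`scdrs` is on the :code:`PATH` and that the input files and output folders already exist.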