File formats

.sumstats

GWAS summary statistics following the LDSC format.

Example .sumstats file
SNP	A1	A2	N	CHISQ	Z
rs7899632	A	G	59957	3.4299	-1.852
rs3750595	A	C	59957	3.3124	1.82
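A minimal sketch of loading this file with pandas, assuming the tab-separated layout shown in the example (pd.read_csv also accepts a file path and infers gzip compression from a .gz suffix). The check reflects that, for a 1-degree-of-freedom test, CHISQ is the square of Z, which both example rows satisfy.

```python
import io

import numpy as np
import pandas as pd

# Rows copied from the .sumstats example (tab-separated, header on the first line).
SUMSTATS_TEXT = (
    "SNP\tA1\tA2\tN\tCHISQ\tZ\n"
    "rs7899632\tA\tG\t59957\t3.4299\t-1.852\n"
    "rs3750595\tA\tC\t59957\t3.3124\t1.82\n"
)


def read_sumstats(handle) -> pd.DataFrame:
    """Read an LDSC-style .sumstats table; `handle` may be a path or a file-like object."""
    return pd.read_csv(handle, sep="\t")


df = read_sumstats(io.StringIO(SUMSTATS_TEXT))
# For a 1-df chi-square test, CHISQ should equal Z squared (up to rounding).
chisq_matches_z = bool(np.allclose(df["Z"] ** 2, df["CHISQ"], atol=1e-3))
print(df.shape, chisq_matches_z)  # (2, 6) True
```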

.h5ad

Single-cell data in the .h5ad format, as defined in AnnData and Scanpy.

pval_file, zscore_file

GWAS gene-level p-values or z-scores for different traits. A .tsv file whose first column contains gene names and whose remaining columns contain the p-values (or z-scores) of the traits, one trait per column.

Example pval_file
GENE	BMI	HEIGHT
OR4F5	0.001	0.01
DAZ3	0.01	0.001
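A minimal sketch of reading a pval_file (or zscore_file) with pandas, assuming the tab-separated layout of the example: the first column becomes the index, so each trait is one DataFrame column. The helper top_genes is hypothetical and only illustrates per-trait access.

```python
import io

import pandas as pd

# Rows copied from the pval_file example (tab-separated, genes in the first column).
PVAL_TEXT = (
    "GENE\tBMI\tHEIGHT\n"
    "OR4F5\t0.001\t0.01\n"
    "DAZ3\t0.01\t0.001\n"
)


def read_gene_table(handle) -> pd.DataFrame:
    """Read a pval_file or zscore_file: one row per gene, one column per trait."""
    return pd.read_csv(handle, sep="\t", index_col=0)


def top_genes(df_pval: pd.DataFrame, trait: str, n: int) -> list:
    """Hypothetical helper: the n genes with the smallest p-values for one trait."""
    return df_pval[trait].nsmallest(n).index.tolist()


df_pval = read_gene_table(io.StringIO(PVAL_TEXT))
print(df_pval.columns.tolist())         # ['BMI', 'HEIGHT']
print(top_genes(df_pval, "HEIGHT", 1))  # ['DAZ3']
```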

.gs

scDRS gene set file. A .tsv file with two columns, ["TRAIT", "GENESET"], and one line per trait. It can be generated with custom code, or from a p-value or z-score file using the scDRS CLI command scdrs munge-gs.

TRAIT: Trait (gene set) identifier.
GENESET: Comma-separated list of gene-weight pairs in the form “gene1:weight1,gene2:weight2,…”, or a comma-separated list of genes “gene1,gene2,…”, in which case all weights are 1.
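A minimal sketch of parsing one GENESET field according to the description above. The function name parse_geneset is hypothetical and this is not scDRS's own loader; it also accepts a mix of weighted and unweighted entries, which the format description does not mention.

```python
def parse_geneset(geneset: str) -> dict:
    """Parse a GENESET field into {gene: weight} (hypothetical helper).

    Accepts "gene1:weight1,gene2:weight2,..." or "gene1,gene2,..." (weights default to 1.0).
    """
    weights = {}
    for item in geneset.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        gene, sep, weight = item.partition(":")
        weights[gene] = float(weight) if sep else 1.0
    return weights


print(parse_geneset("FN3KRP:1.2,FN3K:2.3,HK1:4.7,GCK:5.2"))
# {'FN3KRP': 1.2, 'FN3K': 2.3, 'HK1': 4.7, 'GCK': 5.2}
print(parse_geneset("FN3KRP,FN3K,HK1,GCK"))
# {'FN3KRP': 1.0, 'FN3K': 1.0, 'HK1': 1.0, 'GCK': 1.0}
```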

Example weighted .gs file
TRAIT	GENESET
PASS_HbA1C	FN3KRP:1.2,FN3K:2.3,HK1:4.7,GCK:5.2
PASS_MedicationUse_Wu2019	FTO:3,SEC16B:0.6,ADCY3:1.5,DNAJC27:1.3

Example unweighted .gs file
TRAIT	GENESET
PASS_HbA1C	FN3KRP,FN3K,HK1,GCK
PASS_MedicationUse_Wu2019	FTO,SEC16B,ADCY3,DNAJC27
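If a .gs file is written with custom code instead of scdrs munge-gs, a sketch like the one below may help. The selection rule (the n_top smallest p-values per trait) and the weighting (-log10 p) are illustrative assumptions and are not claimed to match what munge-gs does; pval_to_gs is a hypothetical helper.

```python
import io
import math

import pandas as pd


def pval_to_gs(df_pval: pd.DataFrame, n_top: int) -> pd.DataFrame:
    """Hypothetical rule: keep the n_top smallest p-values per trait, weight = -log10(p).

    p-values of exactly 0 would give infinite weights; clip them before calling.
    """
    rows = []
    for trait in df_pval.columns:
        top = df_pval[trait].dropna().nsmallest(n_top)
        geneset = ",".join(f"{gene}:{-math.log10(p):.3g}" for gene, p in top.items())
        rows.append({"TRAIT": trait, "GENESET": geneset})
    return pd.DataFrame(rows, columns=["TRAIT", "GENESET"])


# The pval_file example, read as a DataFrame indexed by gene.
df_pval = pd.read_csv(
    io.StringIO("GENE\tBMI\tHEIGHT\nOR4F5\t0.001\t0.01\nDAZ3\t0.01\t0.001\n"),
    sep="\t",
    index_col=0,
)
df_gs = pval_to_gs(df_pval, n_top=2)
print(df_gs["GENESET"].tolist())  # ['OR4F5:3,DAZ3:2', 'DAZ3:3,OR4F5:2']
# To write the file: df_gs.to_csv("my_traits.gs", sep="\t", index=False)  # hypothetical path
```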

.cov

scDRS covariate file for the .h5ad single-cell data. A .tsv file.

First column: cell names, consistent with adata.obs_names.
Other columns: covariates with numerical values.

Example .cov file
index	const	n_genes	sex_male	age
A10_B000497_B009023_S10	1	2706	1	18
A10_B000497_B009023_S10	1	2501	0	24
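A minimal sketch of building such a table from per-cell metadata. The input column names (n_genes, sex, age) and the helper build_cov are hypothetical; in practice the metadata would come from adata.obs and the index from adata.obs_names. Because all covariates must be numerical, a categorical variable such as sex is encoded as a 0/1 indicator, and the const column of ones is treated here as an intercept term.

```python
import pandas as pd


def build_cov(meta: pd.DataFrame) -> pd.DataFrame:
    """Build a covariate table shaped like the example (const, n_genes, sex_male, age).

    `meta` is hypothetical per-cell metadata indexed by cell name, with columns
    n_genes, sex ("male"/"female") and age.
    """
    cov = pd.DataFrame(index=meta.index)
    cov["const"] = 1                                       # column of ones (intercept)
    cov["n_genes"] = meta["n_genes"]
    cov["sex_male"] = (meta["sex"] == "male").astype(int)  # categorical -> 0/1 indicator
    cov["age"] = meta["age"]
    return cov


meta = pd.DataFrame(
    {"n_genes": [2706, 2501], "sex": ["male", "female"], "age": [18, 24]},
    index=pd.Index(["cell_1", "cell_2"], name="index"),  # hypothetical cell names
)
cov = build_cov(meta)
# Every covariate column must be numerical.
assert all(pd.api.types.is_numeric_dtype(dtype) for dtype in cov.dtypes)
print(cov.to_csv(sep="\t"), end="")  # tab-separated, first column = cell names
```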

<trait>.score.gz

scDRS score file for a given trait. A .tsv.gz file.

First column: cell names, which should be the same as adata.obs_names.
raw_score: raw disease score.
norm_score: normalized disease score.
mc_pval: cell-level Monte Carlo (MC) p-value. Raw p-value without multiple testing adjustment.
pval: cell-level scDRS p-value. Raw p-value without multiple testing adjustment.
nlog10_pval: -log10(pval).
zscore: z-score converted from pval.
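The two derived columns can be recomputed from pval. The sketch below assumes zscore is the one-sided conversion z = inverse standard normal CDF of (1 - pval), which reproduces the first example row (pval 0.00166 gives 2.78 and 2.94); that scDRS uses exactly this convention is an assumption. It uses only the Python standard library.

```python
import math
from statistics import NormalDist


def pval_to_nlog10(pval: float) -> float:
    """nlog10_pval column: -log10(pval)."""
    return -math.log10(pval)


def pval_to_zscore(pval: float) -> float:
    """Assumed one-sided conversion: z such that P(Z > z) = pval for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - pval)


# Values from the first example row (pval = 0.00166).
print(round(pval_to_nlog10(0.00166), 2))  # 2.78
print(round(pval_to_zscore(0.00166), 2))  # 2.94
```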

Example <trait>.score.gz file
index	raw_score	norm_score	mc_pval	pval	nlog10_pval	zscore
A10_B000497_B009023_S10	0.730	7.04	0.0476	0.00166	2.78	2.94
A10_B000756_B007446_S10	0.725	7.30	0.0476	0.00166	2.78	2.94

<trait>.full_score.gz

scDRS full score file for a given trait. A .tsv.gz file.

All columns of the <trait>.score.gz file.
ctrl_raw_score_<i_ctrl> : raw control scores, included when --flag_return_ctrl_raw_score True is specified.
ctrl_norm_score_<i_ctrl> : normalized control scores, included when --flag_return_ctrl_norm_score True is specified.
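A minimal sketch of pulling out the control score columns with pandas. A real file would be read with pd.read_csv(path, sep="\t", index_col=0), which infers gzip compression from the .gz suffix; here a small hypothetical table is built in memory instead, with control indices chosen only to show the ordering. Sorting on the integer suffix keeps ctrl_norm_score_10 after ctrl_norm_score_2.

```python
import numpy as np
import pandas as pd


def control_scores(df_full: pd.DataFrame, kind: str = "norm") -> pd.DataFrame:
    """Return the ctrl_<kind>_score_<i_ctrl> columns (kind: "raw" or "norm"), ordered by i_ctrl."""
    cols = df_full.filter(regex=rf"^ctrl_{kind}_score_\d+$").columns
    # Sort on the integer suffix so that ..._10 comes after ..._2.
    cols = sorted(cols, key=lambda c: int(c.rsplit("_", 1)[1]))
    return df_full[cols]


# Hypothetical two-cell table with control columns in a deliberately shuffled order.
rng = np.random.default_rng(0)
df_full = pd.DataFrame({"norm_score": [7.04, 7.30]}, index=["cell_1", "cell_2"])
for i in [10, 2, 0]:
    df_full[f"ctrl_raw_score_{i}"] = rng.normal(size=2)
    df_full[f"ctrl_norm_score_{i}"] = rng.normal(size=2)

ctrl_norm = control_scores(df_full, "norm")
print(ctrl_norm.columns.tolist())  # ['ctrl_norm_score_0', 'ctrl_norm_score_2', 'ctrl_norm_score_10']
```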

<trait>.scdrs_group.<annot>

Results of the scDRS group-level analysis for a given trait and a given cell-group annotation (e.g., cell type). A .tsv file.

<trait> : trait name, consistent with the <trait>.full_score.gz file.
<annot> : cell annotation in adata.obs.columns, specified by group_analysis in the CLI.
First column: the cell groups in adata.obs[<annot>].
n_cell: number of cells in the cell group.
n_ctrl: number of control gene sets.
assoc_mcp: MC p-value for the cell group-disease association. Raw p-value without multiple testing adjustment.
assoc_mcz: MC z-score for the cell group-disease association.
hetero_mcp: MC p-value for within-cell-group heterogeneity in association with disease. Raw p-value without multiple testing adjustment.
hetero_mcz: MC z-score for within-cell-group heterogeneity in association with disease.
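MC p-values and z-scores of this kind compare a statistic computed with the trait's gene set against the same statistic computed with each control gene set. The sketch below uses the standard Monte Carlo estimates p = (1 + #{ctrl >= obs}) / (1 + n_ctrl) and z = (obs - mean(ctrl)) / sd(ctrl); that these are the exact formulas scDRS uses is an assumption. With n_ctrl = 20, the smallest attainable p-value is 1/21 ≈ 0.0476, consistent with assoc_mcp for causal_cell in the example. Which group-level statistic is tested is not described here, so the numbers in the usage are made up, and mc_pval_zscore is a hypothetical helper.

```python
import numpy as np


def mc_pval_zscore(stat_obs: float, stat_ctrl) -> tuple:
    """MC p-value and z-score of an observed statistic against control statistics.

    p = (1 + #{ctrl >= obs}) / (1 + n_ctrl)   (one-sided, larger statistic = more significant)
    z = (obs - mean(ctrl)) / std(ctrl)
    """
    stat_ctrl = np.asarray(stat_ctrl, dtype=float)
    n_ctrl = stat_ctrl.size
    mcp = (1 + np.sum(stat_ctrl >= stat_obs)) / (1 + n_ctrl)
    mcz = (stat_obs - stat_ctrl.mean()) / stat_ctrl.std()
    return float(mcp), float(mcz)


# Made-up statistics: one observed value and 20 control values.
stat_ctrl = np.linspace(-1.0, 1.0, 20)
mcp, mcz = mc_pval_zscore(5.0, stat_ctrl)
print(round(mcp, 4))  # 0.0476, i.e. 1/21, the smallest value possible with 20 controls
```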

Example <trait>.scdrs_group.<annot> file
	n_cell	n_ctrl	assoc_mcp	assoc_mcz	hetero_mcp	hetero_mcz
causal_cell	10.0	20.0	0.04761905	12.297529	1.0	1.0
non_causal_cell	20.0	20.0	0.9047619	-1.1364214	1.0	1.0
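A minimal sketch of loading this file with pandas. The header line starts with a tab, so the index column has no name, and index_col=0 handles that; n_cell and n_ctrl are written as floats and are cast back to integers. The 0.05 cutoff is only illustrative, and the p-values remain unadjusted for multiple testing.

```python
import io

import pandas as pd

# Rows copied from the group-level example; the header starts with a tab.
GROUP_TEXT = (
    "\tn_cell\tn_ctrl\tassoc_mcp\tassoc_mcz\thetero_mcp\thetero_mcz\n"
    "causal_cell\t10.0\t20.0\t0.04761905\t12.297529\t1.0\t1.0\n"
    "non_causal_cell\t20.0\t20.0\t0.9047619\t-1.1364214\t1.0\t1.0\n"
)

df_group = pd.read_csv(io.StringIO(GROUP_TEXT), sep="\t", index_col=0)
# Counts are stored as floats in the file; cast them back to integers.
df_group[["n_cell", "n_ctrl"]] = df_group[["n_cell", "n_ctrl"]].astype(int)
# Illustrative cutoff on the raw (unadjusted) association p-value.
print(df_group.index[df_group["assoc_mcp"] < 0.05].tolist())  # ['causal_cell']
```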

<trait>.scdrs_cell_corr

Results of the scDRS cell-level correlation analysis for a given trait. A .tsv file.

<trait> : trait name, consistent with the <trait>.full_score.gz file.
First column: all cell-level variables, specified by corr_analysis in the CLI.
n_ctrl: number of control gene sets.
corr_mcp: MC p-value for the association between the cell-level variable and disease. Raw p-value without multiple testing adjustment.
corr_mcz: MC z-score for the association between the cell-level variable and disease.
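A minimal sketch of how such a test could work, assuming Pearson correlation between the cell-level variable and the disease score, the same statistic computed against each set of control scores, and the MC formulas p = (1 + #{ctrl >= obs}) / (1 + n_ctrl) and z = (obs - mean) / sd. These choices, the helper corr_mc and the simulated data are assumptions, not a description of scDRS internals.

```python
import numpy as np


def corr_mc(variable, norm_score, ctrl_norm_scores):
    """Hypothetical cell-level correlation test.

    variable:          (n_cell,) cell-level variable
    norm_score:        (n_cell,) disease score
    ctrl_norm_scores:  (n_cell, n_ctrl) control scores
    Returns (n_ctrl, corr_mcp, corr_mcz).
    """
    variable = np.asarray(variable, dtype=float)
    ctrl = np.asarray(ctrl_norm_scores, dtype=float)
    corr_obs = np.corrcoef(variable, norm_score)[0, 1]
    corr_ctrl = np.array([np.corrcoef(variable, ctrl[:, j])[0, 1] for j in range(ctrl.shape[1])])
    n_ctrl = corr_ctrl.size
    corr_mcp = (1 + np.sum(corr_ctrl >= corr_obs)) / (1 + n_ctrl)
    corr_mcz = (corr_obs - corr_ctrl.mean()) / corr_ctrl.std()
    return n_ctrl, float(corr_mcp), float(corr_mcz)


rng = np.random.default_rng(0)
n_cell, n_ctrl = 200, 20
variable = rng.normal(size=n_cell)
norm_score = variable + 0.5 * rng.normal(size=n_cell)  # score tracks the variable
ctrl = rng.normal(size=(n_cell, n_ctrl))               # controls are unrelated noise
print(corr_mc(variable, norm_score, ctrl))
```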

Example <trait>.scdrs_cell_corr file
	n_ctrl	corr_mcp	corr_mcz
causal_variable	20.0	0.04761905	3.4574268
non_causal_variable	20.0	0.23809524	0.8974108
covariate	20.0	0.1904762	1.1220891

<trait>.scdrs_gene

Results of the scDRS gene-level correlation analysis for a given trait. A .tsv file.

<trait> : trait name consistent with <trait>.full_score.gz file.
First column: genes in adata.var_names.
CORR: correlation with scDRS disease score across all cells in adata.
RANK: rank of correlation across genes (starting from 0).
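A minimal sketch of the CORR and RANK columns, assuming Pearson correlation between each gene's expression and the disease score, and RANK 0 for the highest correlation (as in the example, where Serping1 has the largest CORR). The helper gene_corr_rank, the dense input matrix and the gene names are hypothetical; genes with zero variance would produce NaN correlations.

```python
import numpy as np


def gene_corr_rank(expr, score, genes):
    """Correlate each gene with the disease score across cells and rank genes.

    expr:  (n_cell, n_gene) dense expression matrix (hypothetical input)
    score: (n_cell,) disease score per cell
    Returns a list of (gene, CORR, RANK), with RANK 0 = highest correlation.
    """
    expr = np.asarray(expr, dtype=float)
    score = np.asarray(score, dtype=float)
    # Pearson correlation per gene: center both, then divide by the norms.
    expr_c = expr - expr.mean(axis=0)
    score_c = score - score.mean()
    corr = expr_c.T @ score_c / (np.linalg.norm(expr_c, axis=0) * np.linalg.norm(score_c))
    order = np.argsort(-corr, kind="stable")  # gene indices, highest correlation first
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)       # invert the permutation to get ranks
    return [(g, float(c), int(r)) for g, c, r in zip(genes, corr, rank)]


genes = ["GeneA", "GeneB", "GeneC"]                   # hypothetical gene names
expr = [[1, 0, 2], [2, 1, 2], [3, 0, 1], [4, 1, 1]]  # 4 cells x 3 genes
score = [1, 2, 3, 4]
for gene, corr, rank in gene_corr_rank(expr, score, genes):
    print(f"{gene} {corr:.3f} {rank}")
# GeneA 1.000 0
# GeneB 0.447 1
# GeneC -0.894 2
```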

Example <trait>.scdrs_gene file
index	CORR	RANK
Serping1	0.314	0
Lmna	0.278	1