This locus containing APOE is well known to be associated with Alzheimer’s disease and cognitive impairment (Seshadri et al.). Note that GSA-SNP2 removes such highly correlated adjacent genes when calculating pathway scores, to further reduce false positives (Supplementary Data). The aSPUset and aSPUset-Score tests identified gene APOE with P-values of 0.019 and 0.024, respectively, both of which passed the significance threshold of 0.05/2 = 0.025 (Table 3); gene AMOTL1, however, was not significant by any test.
The parameter estimates in an MLM are as follows: a vector of intercepts; a matrix of SNP effects, whose (j, t) element represents the effect size of SNP j on trait t; a matrix of covariate effects, whose (q, t) element stands for the qth covariate effect on the tth trait; and Σ, the covariance estimate for the multivariate error term. The researchers confirmed the score’s predictive power through studies in cell culture and organoid tissue, and by using existing patient genomic data.
Suppose that for each individual we observe k traits, q covariates, and a set of SNPs; here I denotes an identity matrix, and ⊗ represents the Kronecker product. These 15 height-related GS categories, their supporting literature, and all corresponding GS pathways from the MSigDB C5 gene ontology terms (v6.0) (28,33) are listed in the Supplementary Data.
(A) Global network from significant gene-sets (FDR < 0.25; gene score < 0.01).
2014), it may still suffer from power loss in the presence of sparser association patterns (i.e., when fewer SNP–trait pairs are associated). Pathway P-values for sARTP and GSA-SNP2 are also shown. Table 2 lists the SNPs included in the significant genes. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and nonprofit organizations as a $60 million, 5-year public–private partnership.

Recall that \(U\) is the generalized score vector in GEE, with one element per SNP–trait pair, and \(V\) is the covariance matrix of the score vector; each element of the score, \(U_{jt}\), quantifies the association between SNP j and trait t. In practice, the true association patterns across multiple SNPs and multiple traits are unknown and complex: some SNPs may be associated with some traits but not with others, and different SNPs may be associated with different subsets of the traits, with varying association strengths and directions. Two factors were considered: association effect size (setup 1) and sparsity of association patterns (setup 2).

Comparison of false discovery controls.

The test statistic is given by Equation (B2). When a single SNP is tested, the test statistic (B2) reduces to a form equivalent to the SPU(2,2) statistic. Hence, SPU(2,2) and MDMR are equivalent for a single SNP and multiple traits, as established by Zhang et al.

The phenotype is given by \begin{equation*}Y = \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon\end{equation*}
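The phenotype model above can be sketched in Python with NumPy. This is a minimal illustration, not the simulation code used in this study: the genotype coding (0/1/2 allele counts), the allele frequency of 0.3, the effect sizes, and the Gaussian error with unit standard deviation are all hypothetical choices, and the helper name `simulate_phenotype` is likewise hypothetical.

```python
import numpy as np


def simulate_phenotype(genotypes, betas, noise_sd=1.0, rng=None):
    """Return Y = beta_1 X_1 + ... + beta_k X_k + epsilon (hypothetical helper).

    genotypes: (n, k) array of causal-SNP genotypes coded as allele counts.
    betas: length-k effect sizes; noise_sd: SD of the assumed Gaussian error.
    """
    rng = np.random.default_rng() if rng is None else rng
    genotypes = np.asarray(genotypes, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if genotypes.shape[1] != betas.shape[0]:
        raise ValueError("need exactly one effect size per causal SNP")
    eps = rng.normal(0.0, noise_sd, size=genotypes.shape[0])
    return genotypes @ betas + eps


# Hypothetical example: 1,000 individuals, 3 causal SNPs, allele frequency 0.3.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(1000, 3))
Y = simulate_phenotype(X, [0.2, -0.1, 0.15], rng=rng)
```

Mixing positive and negative entries in `betas` mimics the varying association directions discussed above.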
2014), which is powerful in the presence of a dense association pattern, in which most SNP–trait pairs are associated with similar effect sizes in the same direction; otherwise, e.g., when some SNP–trait pairs are associated in opposite directions, it does not perform well (as is well known in the analysis of rare variants). The critical improvement over the previous version (11) comes from using gene scores adjusted for the number of SNPs in each gene via a monotone cubic spline trend (23).
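The idea of removing a SNP-count trend from gene scores can be sketched with NumPy alone. This is not GSA-SNP2's procedure: it assumes the raw gene score tends to rise with the number of SNPs in a gene, and it substitutes binned means joined by piecewise-linear interpolation (forced to be non-decreasing) for the monotone cubic spline described in the text. The function name and the bin count are hypothetical.

```python
import numpy as np


def adjust_gene_scores(scores, snp_counts, n_bins=10):
    """Subtract a monotone SNP-count trend from raw gene scores (hypothetical sketch).

    The trend is estimated from per-bin means on the log SNP-count scale,
    forced to be non-decreasing, and linearly interpolated between bin centres
    (a stand-in for a monotone cubic spline).
    """
    scores = np.asarray(scores, dtype=float)
    x = np.log(np.asarray(snp_counts, dtype=float))
    # Quantile bins, so each bin holds roughly the same number of genes.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bin_id = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    centres, means = [], []
    for b in range(n_bins):
        in_bin = bin_id == b
        if in_bin.any():  # bins can be empty when many genes share a SNP count
            centres.append(x[in_bin].mean())
            means.append(scores[in_bin].mean())
    centres, means = np.array(centres), np.array(means)
    # Enforce a non-decreasing trend in SNP count.
    knots = np.maximum.accumulate(means)
    trend = np.interp(x, centres, knots)  # held flat beyond the outermost centres
    return scores - trend


# Hypothetical data: raw scores that drift upward with log SNP count.
rng = np.random.default_rng(0)
counts = rng.integers(1, 500, size=5000)
raw = 0.5 * np.log(counts) + rng.normal(size=5000)
adjusted = adjust_gene_scores(raw, counts)
```

After adjustment, the residual dependence of the scores on SNP count should be much weaker than in the raw scores.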
The authors are grateful to the reviewers for constructive comments. None declared.

We compare the proposed test with several existing gene-based tests for multiple traits, including multivariate analysis of variance (MANOVA), MDMR with the Euclidean distance (McArdle and Anderson 2001), multivariate KMR under the linear kernel (Maity et al.). A fundamental challenge in multivariate analysis is the lack of a uniformly most powerful test: a nonadaptive test may be powerful in some situations but not in others. Because the power of a test critically depends on several unknown factors, such as the proportions of associated SNPs and of associated traits, we propose a test that is highly adaptive at both the SNP and trait levels, giving higher weights to likely associated SNPs and traits, to yield high power across a wide spectrum of situations. Gowinda's power varied depending on the SNP P-value cutoff. With its SNP-level power parameter set to 1, the SPU test weights each SNP equally and is most powerful when all the SNPs are associated with trait t with similar effect sizes and the same association direction (i.e., all positive or all negative). Some genes in a pathway may be closely located on the genome or be highly overlapping members of the same gene family, and some of those genes may also belong to the same linkage disequilibrium (LD) block. Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.186502/-/DC1. They found that the PRS was valid in tests involving more than a dozen medications, including cyclosporine, bosentan, troglitazone, diclofenac, flutamide, ketoconazole, carbamazepine, amoxicillin-clavulanate, methapyrilene, tacrine, acetaminophen, and tolcapone. Hence, to take advantage of both tests, we combine them by taking their minimum P-value to form a new test statistic (5). Its P-value can be calculated using simulations or permutations, as for aSPUset.
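The adaptive minimum-P idea can be sketched with NumPy under a Monte Carlo null in which the score vector is drawn from N(0, V). The exact SPU formula is not reproduced in this section, so the form used here, \(\sum_t |\sum_j U_{jt}^{\gamma_1}|^{\gamma_2/\gamma_1}\), the candidate exponents (1, 2, 3), the number of simulations, and the function names are all assumptions for illustration. The same step (take the minimum P-value, then recalibrate it against simulated minima) is what combining two tests by their minimum P-value, as in (5), requires.

```python
import numpy as np
from itertools import product


def spu_stat(U, g1, g2):
    """Hypothetical SPU(g1, g2) form; U has shape (..., p SNPs, k traits)."""
    per_trait = np.abs(np.sum(U ** g1, axis=-2))   # aggregate over SNPs
    return np.sum(per_trait ** (g2 / g1), axis=-1)  # aggregate over traits


def aspu_pvalue(U, V, gammas=(1, 2, 3), n_sim=2000, rng=None):
    """Adaptive min-P over SPU(g1, g2) tests, calibrated by simulating U ~ N(0, V).

    U: (p, k) observed score matrix; V: (p*k, p*k) covariance of U.ravel().
    """
    rng = np.random.default_rng() if rng is None else rng
    p, k = U.shape
    null = rng.multivariate_normal(np.zeros(p * k), V, size=n_sim).reshape(n_sim, p, k)
    pairs = list(product(gammas, gammas))
    obs = np.array([spu_stat(U, g1, g2) for g1, g2 in pairs])              # (m,)
    sims = np.stack([spu_stat(null, g1, g2) for g1, g2 in pairs], axis=1)  # (n_sim, m)
    # P-value of each observed SPU statistic against its simulated null.
    p_obs = (1 + np.sum(sims >= obs, axis=0)) / (n_sim + 1)
    # P-value of each simulated statistic within the null sample (rank 0 = largest).
    ranks = np.argsort(np.argsort(-sims, axis=0), axis=0)
    p_null = (ranks + 1) / n_sim
    # The adaptive statistic is the minimum P-value; calibrate it with the null minima.
    min_null = p_null.min(axis=1)
    return (1 + np.sum(min_null <= p_obs.min())) / (n_sim + 1)


# Hypothetical example: 5 SNPs, 3 traits, independent unit-variance scores.
rng = np.random.default_rng(42)
p, k = 5, 3
V = np.eye(p * k)
p_null_case = aspu_pvalue(rng.standard_normal((p, k)), V, rng=rng)
p_signal_case = aspu_pvalue(np.full((p, k), 4.0), V, rng=rng)
```

Recalibrating the minimum P-value against its own null distribution matters because the raw minimum over several correlated tests is anti-conservative.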
Among the SPU tests applied, SPU(3,2) showed the minimum P-value, suggesting that weak effects were aggregated into an overall association. 2008; Meda et al. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2.
In the following, we briefly review GEE and an existing method before introducing the new test in Materials and Methods. Compared with the previous version (11), GSA-SNP2 provides greatly improved type I error control by using SNP-count-adjusted gene scores, while still preserving high statistical power. We note that competitive methods are also affected by such outlier genes, and that self-contained methods can also detect pathways composed only of moderately associated genes; nevertheless, this example demonstrates the difference between the two GWAS pathway analysis approaches.
Some methods, such as that in Van der Sluis et al.
The new polygenic risk score would make it possible to produce liver organoids that carry key risk variants, to determine whether a drug is harmful before people ever take it. Although MAGENTA had relatively low power, it exhibited the highest density of GS pathways (25 GS pathways among the top 41 pathways: 61.0%), demonstrating its strict false positive control. Permutation-based MANOVA sometimes suffered from rank deficiency when constructing the test statistic and could not be applied to ∼600 genes; MFLM also failed for some genes owing to rank deficiency.
Throughout the simulations, 10,000 replicates were used for each setup, and the tests were conducted at the significance level. For MFLM, we used β-smooth basis functions with the Pillai–Bartlett trace as a representative statistic. For example, the competitive MAGMA methods detected the largest number of ‘skeletal system development’ pathways such as ‘cartilage and chondrocyte development’ (e.g. (2013) is built on a generalized Kendall’s τ, which quantifies the pairwise association between a single SNP and a single trait. 2013), which is more robust to association density/sparsity and varying association directions. To validate the results, each method was applied to the two genes AMOTL1 and APOE, using the ADNI-GO/2 data as the validation sample. In addition, the pathway analysis methods clearly differed in their preference for GS categories.
Using the ADNI-1 data as the discovery sample, our GWAS identified two loci associated with the DMN.
In the genome-wide association study (GWAS), the new test found genes AMOTL1 on chromosome 11 and APOE on chromosome 19 to be significantly associated with the DMN. LocusZoom plots for the two loci identified by aSPUset and MDMR: (A) AMOTL1 and (B) APOE.
INRICH, MAGMA and MAGENTA exhibited strict FDR control (almost zero false discoveries), whereas Gowinda's FDR control varied considerably with the SNP P-value cutoff. The methods were applied to structural magnetic resonance imaging (MRI) data from the ADNI to identify genes associated with the DMN. We included all rare variants within each gene region; the number of variants per region ranged from 3 to 750. MAGMA's power varied depending on the gene scoring method and the simulation parameters.
For example, the global network (extracted from the HIPPIE network) of the significant pathways (FDR < 25%, gene score < 0.01) obtained by analyzing the DIAGRAM data contained a sub-network composed of eight genes: TNF, RAB5A, CHUK, LTA, CARS, IGF2BP2, HSPA1L and HSPA1A (Figure 6).
2012; Wang et al. (2014) showed that when a single SNP is tested, the SPU(2,2) test under the GEE working independence model is equivalent to MDMR with the Euclidean distance. The graphs corresponding to a stricter cut-off (q-value < 0.05) are also shown in Supplementary Figure S2. The file out/test.b.score.epacts.mh.pdf will be generated for chr20 only. Published by Oxford University Press on behalf of Nucleic Acids Research. We define each covariance estimate as follows. INRICH and MAGENTA exhibited low power compared with the other methods; the best powers of INRICH, MAGENTA (75%) and MAGENTA (95%) were only 16.3, 12.2 and 14.3%, respectively. Currently there is no cure for AD, and most cases are diagnosed at a late stage of the disease.
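The single-SNP equivalence between SPU(2,2) and MDMR with the Euclidean distance can be checked numerically. The sketch assumes no covariates, a GEE working-independence score of the form \(U_t = \sum_i x_i (y_{it} - \bar{y}_t)\) with the scale parameter ignored, and the McArdle–Anderson pseudo-F built from Gower's centred matrix; all function names are hypothetical. For one SNP, the explained trace tr(HGH) equals \(\sum_t U_t^2 / \sum_i (x_i - \bar{x})^2\). Because tr(G) and \(\sum_i (x_i - \bar{x})^2\) do not change when genotypes are permuted, the pseudo-F is a monotone function of SPU(2,2) in a permutation test.

```python
import numpy as np


def gower_centred(D):
    """Gower's centred matrix G = -1/2 J (D*D) J, with J the centring matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J


def mdmr_pseudo_f(X, D):
    """McArdle-Anderson pseudo-F for predictors X (n, m) and distances D (n, n).

    Returns (F, tr(HGH)). H projects onto the centred predictors, which gives the
    same traces as including an intercept because G is already double-centred.
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    H = Xc @ np.linalg.pinv(Xc)
    G = gower_centred(D)
    R = np.eye(n) - H
    explained = np.trace(H @ G @ H)
    residual = np.trace(R @ G @ R)
    return (explained / m) / (residual / (n - m - 1)), explained


# Hypothetical data: one SNP, three traits, Euclidean distances between subjects.
rng = np.random.default_rng(7)
n, k = 60, 3
x = rng.binomial(2, 0.4, size=n).astype(float)
Y = rng.standard_normal((n, k)) + 0.3 * x[:, None]
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
F, explained = mdmr_pseudo_f(x[:, None], D)
U = x @ (Y - Y.mean(axis=0))  # assumed working-independence score, one per trait
spu22 = np.sum(U ** 2)
```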
Protein interaction networks are visualized for the significant pathways. The P-values of the permutation-based and simulation-based aSPUset tests agreed well (Pearson correlation of 0.98), so we reported only the permutation-based results.
An F statistic can be constructed to test the hypothesis that the p regressor variables have no relationship to the variation in distance or dissimilarity among the n subjects, as reflected in the distance/dissimilarity matrix. Such genes have positively correlated P-values and may inflate the number of false positive pathways. GSA-SNP2 employs the Z-statistic of the random set model (22) for evaluating gene-sets (pathways).
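The random-set Z-statistic mentioned above compares the mean score of the m genes in a gene-set with the mean over all N genes, scaled by the standard deviation of the mean of m genes drawn at random without replacement. The minimal sketch below assumes the population variance (divisor N) and the finite-population factor (N − m)/(N − 1); the function name is hypothetical and this is not GSA-SNP2's own code.

```python
import numpy as np


def random_set_z(all_scores, set_mask):
    """Z-statistic of a gene-set under a random-set model (hypothetical sketch)."""
    s = np.asarray(all_scores, dtype=float)
    mask = np.asarray(set_mask, dtype=bool)
    N, m = s.size, int(mask.sum())
    if not 0 < m < N:
        raise ValueError("gene-set must be a non-empty proper subset of all genes")
    mu = s.mean()
    sigma2 = s.var()  # population variance (divisor N)
    # Variance of the mean of m genes drawn without replacement from N genes.
    var_mean = sigma2 / m * (N - m) / (N - 1)
    return (s[mask].mean() - mu) / np.sqrt(var_mean)


# Hypothetical check: 5,000 random gene-sets of size 20 from 200 gene scores.
rng = np.random.default_rng(3)
pool = rng.normal(size=200)
z_random = np.array([
    random_set_z(pool, np.isin(np.arange(200), rng.choice(200, size=20, replace=False)))
    for _ in range(5000)
])
```

Under this model, Z computed for randomly drawn gene-sets has mean 0 and variance 1, which is what makes it comparable across gene-sets of different sizes.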
Each diagonal element of the covariance matrix V corresponds to the variance of an individual score element; denote the variance of the score element \(U_{jt}\) for SNP j and trait t by \(v_{jt}\). The SPUw test is defined with a statistic based on the standardized score elements \(U_{jt}/\sqrt{v_{jt}}\). The aSPUw-set test statistic is defined as the minimum P-value of the multiple SPUw tests, in the same way as for aSPUset. The SPUw and aSPUw-set tests are invariant to the scale of each trait and hence may be useful when it is unclear how to standardize multiple traits measured on different scales. GSA-SNP2 is available at https://sourceforge.net/projects/gsasnp2. This illustrates the flexibility of our proposed test under GEE, in contrast to the stronger modeling assumption in KMR.
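The scale invariance of SPUw can be illustrated under two assumptions: that SPUw applies the SPU statistic to standardized scores \(U_{jt}/\sqrt{v_{jt}}\), and that the SPU statistic takes the hypothetical form \(\sum_t |\sum_j U_{jt}^{\gamma_1}|^{\gamma_2/\gamma_1}\). Rescaling trait t multiplies every score element for that trait by a common constant and their variances by its square, so the standardized scores, and hence SPUw, are unchanged, whereas an unweighted sum of squared scores is not. The function name is hypothetical.

```python
import numpy as np


def spuw_stat(U, v, g1, g2):
    """Hypothetical SPUw(g1, g2): SPU applied to standardized scores U_jt / sqrt(v_jt).

    U, v: (p SNPs, k traits) arrays of scores and their variances (diagonal of V).
    """
    Z = np.asarray(U, dtype=float) / np.sqrt(np.asarray(v, dtype=float))
    per_trait = np.abs(np.sum(Z ** g1, axis=0))  # aggregate over SNPs
    return np.sum(per_trait ** (g2 / g1))        # aggregate over traits


rng = np.random.default_rng(11)
U = rng.standard_normal((4, 2))
v = rng.uniform(0.5, 2.0, size=(4, 2))
# Rescale trait 1 by c = 10: its scores change by c and their variances by c**2.
U_scaled, v_scaled = U.copy(), v.copy()
U_scaled[:, 1] *= 10.0
v_scaled[:, 1] *= 100.0
```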