National Science Foundation

Award Abstract #1309960

Optimal tests for weak, sparse, and complex signals with application to genetic association studies

NSF Org: DMS
Division Of Mathematical Sciences
Initial Amendment Date: August 1, 2013
Latest Amendment Date: August 1, 2013
Award Number: 1309960
Award Instrument: Standard Grant
Program Manager: Gabor J. Szekely
DMS Division Of Mathematical Sciences
MPS Directorate for Mathematical & Physical Sciences
Start Date: August 15, 2013
End Date: July 31, 2017 (Estimated)
Awarded Amount to Date: $109,999.00
Investigator(s): Zheyang Wu zheyangwu@wpi.edu (Principal Investigator)
Sponsor: Worcester Polytechnic Institute
100 INSTITUTE RD
WORCESTER, MA 01609-2247 (508)831-5000
NSF Program(s): STATISTICS
Program Reference Code(s):
Program Element Code(s): 1269

ABSTRACT

Detecting sparse and weak signals is key to analyzing big data in many fields. Recent statistical research has made celebrated theoretical progress in revealing the detectability boundaries under the Gaussian means model and an idealized linear regression model. The detectability boundary is the curve in the two-dimensional phase space of signal sparsity and signal weakness below which signals are asymptotically too weak and too sparse to be detected reliably by any statistical method. Certain statistics are optimal for these models in the sense that they attain this boundary, i.e., they succeed under the least stringent conditions that permit reliable detection. However, there are significant gaps between these theoretical models and practically meaningful ones. In this project, the investigators extend the statistical theory to handle weak, sparse, correlated, and interacting signals within the framework of generalized linear models. The investigators develop optimal testing procedures that address the realistic features of data from genome-wide association studies and next-generation sequencing studies.
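The abstract refers to detectability boundaries and to statistics that attain them without naming either. As a hedged illustration, the sketch below uses the classical results for the sparse Gaussian means model (Ingster; Donoho and Jin, 2004): with n means, a fraction eps = n^(-beta) of which are shifted by mu = sqrt(2 r log n), the boundary is rho*(beta) = beta - 1/2 for 1/2 < beta <= 3/4 and (1 - sqrt(1 - beta))^2 for 3/4 < beta < 1, and the Higher Criticism statistic (named in the titles of publications listed below) is known to attain it. This covers only the idealized Gaussian case, not the project's generalized linear model extensions; the function names, the use of one-sided p-values, and the cutoff alpha0 = 0.5 are assumptions made for illustration.

```python
import math
import random


def detection_boundary(beta: float) -> float:
    """Classical detection boundary rho*(beta) for the sparse Gaussian means
    model (illustrative helper): n means, a fraction eps = n**(-beta) of them
    shifted by mu = sqrt(2 * r * log(n)). Asymptotically, detection is
    possible when r > rho*(beta) and impossible when r < rho*(beta)."""
    if not 0.5 < beta < 1:
        raise ValueError("beta must lie in the sparse regime (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def upper_tail_pvalue(z: float) -> float:
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def higher_criticism(z_scores, alpha0: float = 0.5) -> float:
    """Higher Criticism statistic from one-sided p-values (assumed variant).

    HC = max over the smallest alpha0 * n sorted p-values p_(i) of
         sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    n = len(z_scores)
    pvals = sorted(upper_tail_pvalue(z) for z in z_scores)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i, p in enumerate(pvals[:k_max], start=1):
        if not 0.0 < p < 1.0:  # skip p-values that underflow to 0 or round to 1
            continue
        hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc)
    return best


if __name__ == "__main__":
    rng = random.Random(2013)
    n, beta, r = 10_000, 0.6, 0.5  # r = 0.5 lies above rho*(0.6) = 0.1
    eps, mu = n ** (-beta), math.sqrt(2 * r * math.log(n))
    null = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Same noise, with a sparse subset of coordinates shifted by mu.
    alt = [z + mu if rng.random() < eps else z for z in null]
    print("rho*(0.6) =", detection_boundary(beta))
    print("HC under null:", round(higher_criticism(null), 2))
    print("HC under sparse alternative:", round(higher_criticism(alt), 2))
```

Because r = 0.5 sits well above the boundary value 0.1 at beta = 0.6, the HC value under the sparse alternative is typically much larger than under the null; the exact numbers depend on the seed and on n.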

Developing statistical theory and methodology for detecting weak and sparse signals is foundational to analyzing big data. The goal of this project is to extend the theoretical study to complex signals that are correlated and that influence quantitative or categorical responses through interactions. This study is of great interest in data science and is critical to many applications. For example, one perplexing problem in current genetic studies is the missing heritability of complex traits: much of their heritability remains unexplained even after many genetic factors have been identified. The proposed work specifically targets the features of the disease genes that are yet to be discovered. Unlike some genetic studies based on heuristic arguments, this research combines rigorous statistical theory, first-hand experience in the field, and cutting-edge data from genome-wide association studies and next-generation sequencing studies. The proposed project is highly promising for the hunt for the missing heritability. Substantially improved gene-detection techniques will help identify more causative genes of complex human diseases, which will lead to the elucidation of disease pathogenesis and the design of targeted therapeutics, and thus have a far-reaching impact on quality of life.


PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Li Yang, Jing Xuan, Zheyang Wu. "A Goodness-of-Fit Association Test for Whole Genome Sequencing Data," BMC Proceedings, v.8, 2014.

Jing Xuan, Li Yang, Zheyang Wu. "Higher Criticism Approach to Detect Rare Variants Using Whole Genome Sequencing Data," BMC Proceedings, v.8, 2014.

Zheyang Wu, Yiming Sun, Shiquan He, Judy Cho, Hongyu Zhao, and Jiashun Jin. "Detection Boundary and Higher Criticism for Weak and Sparse Genetic Effects," The Annals of Applied Statistics, v.8, 2014, p. 824.

Elizabeth T. Cirulli et al. "Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways," Science, v.347, 2015, p. 1436.

Bradley T. Smith et al. "Exome-wide rare variant analysis identifies TUBA4A mutations associated with familial ALS," Neuron, v.84, 2014, p. 324.

Please report errors in award information by writing to: awardsearch@nsf.gov.