Opportunities for the Mathematical Sciences

Summary Article

Mathematics -- The Science of Patterns and Algorithms

Computing with Large Data Sets

Breakthroughs in sensor technology are generating unprecedented volumes of high-dimensional data. The continued growth in the raw power of computers, together with algorithmic advances, offers the promise of meeting the tremendous challenge of analyzing such data. Huge data sets, often terabytes in size and lying in very high-dimensional spaces, are now routinely collected in almost all sciences [MS]. Examples include images from diverse sources, dynamics of the Internet, neural recordings and, in the discrete domain, gene expression arrays. Whereas data sets in 1, 2, or 3 dimensions are easily visualized and analyzed, data sets in 1000 dimensions are much harder to understand. Even 10,000 points form a very sparse set in 1000-dimensional space, so it is easy to "overfit" the data with a model that detects spurious "patterns" that disappear when more data are acquired. A major challenge is to find methods that analyze the structure of such sets, fit models robustly, and identify and validate patterns. Techniques from statistics, harmonic analysis, graph theory and computer science are only beginning to shed light on this problem.
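The two technical claims in the paragraph above, that 10,000 points are sparse in 1000 dimensions and that spurious patterns are easy to find, can be checked with a short simulation. The following is a minimal sketch in Python using only the standard library. The function names, the uniform and Gaussian test data, and all parameter values are illustrative assumptions, not taken from the source. Three standard illustrations are shown: counting grid cells (cutting each of 1000 axes in half already gives 2^1000, roughly 10^301, cells), distance concentration (all points become roughly equally far from a query), and a selection effect (the best of many pure-noise features looks predictive on a small sample but not on fresh data).

```python
import math
import random


def occupied_cell_fraction_bound(n_points: int, dim: int, bins_per_axis: int = 2) -> float:
    """Upper bound on the fraction of grid cells that n_points can occupy when
    every axis of the unit cube [0, 1]^dim is cut into bins_per_axis bins."""
    # Work in log10 space: bins_per_axis**dim overflows a float for large dim.
    log10_cells = dim * math.log10(bins_per_axis)
    return min(1.0, 10.0 ** (math.log10(n_points) - log10_cells))


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of the Euclidean distances from one random query point
    to n_points uniform random points in [0, 1]^dim. Values near 0 mean every
    point is about equally far away, so 'nearest neighbor' loses meaning."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


def pearson(xs, ys) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spurious_pattern(n_train: int, n_features: int, n_fresh: int, seed: int = 0):
    """Features and labels are pure, independent noise. Select the feature that
    best 'explains' the labels on a small training set, then re-measure that
    feature's correlation on a larger fresh sample. Returns (|r_train|, |r_fresh|)."""
    rng = random.Random(seed)
    labels = [rng.gauss(0, 1) for _ in range(n_train)]
    features = [[rng.gauss(0, 1) for _ in range(n_train)] for _ in range(n_features)]
    best = max(range(n_features), key=lambda j: abs(pearson(features[j], labels)))
    r_train = pearson(features[best], labels)
    # Acquiring more data: by construction every feature, including the chosen
    # one, is drawn independently of the labels, so fresh draws of both suffice.
    fresh_x = [rng.gauss(0, 1) for _ in range(n_fresh)]
    fresh_y = [rng.gauss(0, 1) for _ in range(n_fresh)]
    r_fresh = pearson(fresh_x, fresh_y)
    return abs(r_train), abs(r_fresh)
```

For example, `occupied_cell_fraction_bound(10_000, 1000)` is on the order of 10^-297, `relative_contrast(500, 1000)` is much smaller than `relative_contrast(500, 2)`, and `spurious_pattern(20, 1000, 2000)` typically reports a large training correlation that nearly vanishes on the fresh sample. Working in log space and using seeded `random.Random` instances keeps the sketch overflow-free and reproducible.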
