National Science Foundation



Award Abstract #1247750

BIGDATA: Mid-Scale: DCM: Collaborative Research: Eliminating the Data Ingestion Bottleneck in Big-Data Applications

NSF Org: IIS
Division of Information & Intelligent Systems
Initial Amendment Date: September 20, 2012
Latest Amendment Date: July 31, 2015
Award Number: 1247750
Award Instrument: Standard Grant
Program Manager: Sylvia J. Spengler
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer & Information Science & Engineering
Start Date: February 1, 2013
End Date: January 31, 2017 (Estimated)
Awarded Amount to Date: $406,000.00
Investigator(s): Martin Farach-Colton farach@cs.rutgers.edu (Principal Investigator)
Sponsor: Rutgers University New Brunswick
33 Knightsbridge Road
Piscataway, NJ 08854-3925 (848)932-0150
NSF Program(s): ALGORITHMIC FOUNDATIONS,
Big Data Science & Engineering
Program Reference Code(s): 7433, 7924, 7926, 8083, 9251
Program Element Code(s): 7796, 8083

ABSTRACT

Big-data practice suggests that there is a tradeoff between the speed of data ingestion, the ability to answer queries quickly (e.g., via indexing), and the freshness of data. This perceived tradeoff lies, for example, at the heart of the historic division between OLTP (online transaction processing) and OLAP (online analytical processing). In an OLTP database, data gets ingested quickly and the data available for querying is fresh, but analytical queries run prohibitively slowly. In an OLAP data warehouse, data is buffered for off-line indexing so that analytical queries run quickly, but by the time the data gets indexed, it is stale.

This tradeoff has manifestations in the design of all types of storage systems. For example, some file-systems are optimized for reads and others for writes, but workloads generally involve a mixture of reads and writes.

In this project the PIs show that this is not a fundamental tradeoff, but rather a tradeoff imposed by the choice of data structure. The PIs use write-optimized structures, an alternative to traditional indexing methodologies, to build storage systems in which this tradeoff is significantly mitigated or eliminated altogether. The performance promise of such indexing schemes follows from the PIs' previous work establishing that write-optimized data structures can speed up both inserts and queries.
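
To make the idea of buffering writes concrete, here is a minimal Python sketch of a toy log-structured index. It is an illustration only, not the PIs' data structure: the abstract does not name a specific design, and the class name BufferedIndex, the buffer_limit parameter, and the flush policy are all hypothetical. Inserts land in a small in-memory buffer and are written out in sorted batches, so ingestion never performs one random update per key, while queries see fresh data immediately because they check the buffer first.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class BufferedIndex:
    """Toy write-optimized index (hypothetical, for illustration only).

    Writes go to an in-memory buffer; a full buffer is flushed as one
    sorted, immutable run. Reads check the buffer, then runs newest-first.
    """

    def __init__(self, buffer_limit: int = 4) -> None:
        self.buffer_limit = buffer_limit
        self.buffer: Dict[str, int] = {}
        # Each run is a pair (sorted keys, values aligned with those keys).
        self.runs: List[Tuple[List[str], List[int]]] = []

    def insert(self, key: str, value: int) -> None:
        # Cheap in-memory update; the batched flush amortizes the write cost.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the whole buffer out as one sorted run, then start fresh.
        items = sorted(self.buffer.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.runs.append((keys, values))
        self.buffer = {}

    def get(self, key: str) -> Optional[int]:
        # The freshest value lives in the buffer, if the key is there.
        if key in self.buffer:
            return self.buffer[key]
        # Otherwise binary-search each run, newest first.
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


index = BufferedIndex(buffer_limit=2)
index.insert("a", 1)
index.insert("b", 2)   # buffer is full, so it is flushed as a sorted run
index.insert("a", 3)   # newer value for "a" stays in the buffer
print(index.get("a"), index.get("b"), index.get("z"))
```

In this toy version a lookup may have to search every run, so queries slow down as runs accumulate. It shows only the write-batching half of the picture; it does not demonstrate the abstract's claim that both inserts and queries can be sped up.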

This project addresses the remaining obstacles in the deployment of write-optimized indexes within big-data file-systems and databases. Big data imposes a new set of constraints on any storage system, and the PIs will show how write-optimized indexing can yield order-of-magnitude performance improvements at scale. In particular, this project will show that such techniques are not only applicable today but that they will scale with hardware trends, including the widespread adoption of solid-state disks (SSDs).


PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH



(Showing: 1 - 10 of 17)

Michael A. Bender, Ritwik Bose, Rezaul Chowdhury, and Samuel McCauley. "The Kissing Problem: How to End a Gathering When Everyone Kisses Everyone Else Goodbye," Theory of Computing Systems, Special Issue on FUN12, 2014.

Martin Farach-Colton, Antonio Fernández Anta, and Miguel A. Mosteiro. "Optimal Memory-Aware Sensor Network Gossiping (or How to Break the Broadcast Lower Bound)," Theoretical Computer Science, v.472, 2013.

Bender, Michael A and Bose, Ritwik and Chowdhury, Rezaul and McCauley, Samuel. "The kissing problem: how to end a gathering when everyone kisses everyone else goodbye," Theory of Computing Systems Special Issue on Fun With Algorithms, 2013, p. 1-16.

Martin Farach-Colton and Antonio Fernández-Anta and Miguel A. Mosteiro. "Optimal memory-aware Sensor Network Gossiping (or how to break the Broadcast lower bound)," Theoretical Computer Science, v.472, 2013, p. 60-80.

Martin Farach-Colton and Miguel A. Mosteiro. "Initializing Sensor Networks of Non-uniform Density in the Weak Sensor Model," Algorithmica, 2014.

Meng, Jie and McCauley, Samuel and Kaplan, Fulya and Leung, Vitus and Coskun, Ayse K. "Simulation and Optimization of HPC Job Allocation for Reducing Communication and Cooling Costs," Sustainable Computing (SUSCOM) Special Issue for the International Green Computing Conference, 2014.

Bender, Michael A. and Farach-Colton, Martín and Fekete, Sándor P. and Fineman, Jeremy T. and Gilbert, Seth. "Reallocation Problems in Scheduling," Algorithmica, v.73, 2015, p. 389-409.

Martin Farach-Colton and Meng-Tsung Tsai. "Exact Sublinear Binomial Sampling," Algorithmica, v.73, 2015, p. 637-651.

Martin Farach-Colton and Miguel A. Mosteiro. "Initializing Sensor Networks of Non-uniform Density in the Weak Sensor Model," Algorithmica, v.73, 2015, p. 87-114.






 
