Data Travels Six Times Faster in the Clouds
Cloud computing enables faster, less expensive processing across geographically distributed data centers
The National Center for Data Mining (NCDM) at the University of Illinois at Chicago has established a cloud computing system that can quickly compile data from geographically distributed data centers connected by high-performance networks. NCDM used the Open Cloud Testbed, managed by the Open Cloud Consortium, to demonstrate the "Sector System" at the annual meeting of the American Association for the Advancement of Science earlier this month in Chicago.
"We demonstrated that our system is six times faster than competing technology," said Robert Grossman, NCDM director and Open Data Group managing partner. "Without the requirement of costly and cumbersome data transfer from various locations to one central location, this opens the way to exciting collaborative scientific discovery."
Grossman and his team demonstrated the system using a common benchmark called Terasort. They found a performance penalty of less than 5 percent when Terasort was run across the four data centers distributed across the country, compared with running the entire computation within one data center. Prior to the Sector System, such computations were rarely done, as performance penalties were as high as 30 percent.
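As a rough illustration of how such a penalty figure can be computed, the sketch below assumes the penalty is the relative increase in run time of the multi-site run over the single-site run. The release does not state the exact definition the team used. The function name and the timings are hypothetical, chosen only for illustration, and are not measured results.

```python
def performance_penalty(single_site_seconds: float, multi_site_seconds: float) -> float:
    """Return the slowdown of the multi-site run, in percent of the single-site time."""
    if single_site_seconds <= 0:
        raise ValueError("single-site time must be positive")
    return (multi_site_seconds - single_site_seconds) / single_site_seconds * 100.0


# Hypothetical timings (seconds), for illustration only; not measured results.
local_run = 1000.0       # benchmark run inside one data center
wide_area_run = 1040.0   # same benchmark spread across several data centers

penalty = performance_penalty(local_run, wide_area_run)
print(f"{penalty:.1f}% penalty")
```

Under this definition, a wide-area run that takes 1,040 seconds against a 1,000-second single-site run has a penalty of 4 percent, which falls inside the "less than 5 percent" range reported for the demonstration.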
"With the Sector System, data intensive computing can scale not only to a data center, but for the first time, across data centers," said Grossman. "This enables locating data centers in areas in which power and cooling is cost-effective."
The Open Cloud Testbed consists of racks of computers located at the University of Illinois at Chicago, the StarLight facility in Chicago, Johns Hopkins University in Baltimore, Maryland, and the University of California at San Diego, all connected by a wide area 10 Gb/s network and all running a variety of cloud services, including cloud storage and cloud computing services. The technology that makes this possible uses an open architecture design, specifically the open source Sector system developed by the NCDM (sector.sf.net).
Although cloud computing is becoming common, cloud-based data processing today is almost always done within a single data center. Generally, data intensive computing across geographically distributed data centers is avoided because of the difficulty and cost of moving large amounts of data over long distances. Sector employs an alternative network protocol called UDT, designed to transfer data swiftly and smoothly.
According to Joe Mambretti, director of the International Center of Advanced Internet Research at Northwestern University and co-director of the Open Cloud Testbed, "These innovative technologies provide unique capabilities that will enable new generations of applications that can make discoveries involving large volumes of highly distributed data."
The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2016, its budget is $7.5 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives more than 48,000 competitive proposals for funding and makes about 12,000 new funding awards. NSF also awards about $626 million in professional and service contracts yearly.