Algorithm Makes a More Efficient Cloud
March 29, 2013
Researchers at MIT are hard at work revealing how cloud computing can perform to its greatest potential while helping businesses gain a firm understanding of the pricing of cloud services and how to diagnose any potential hiccups in the system.
More companies are moving applications to the cloud because of its cost savings, economies of scale, extensive technical support and easy accommodation of demand fluctuations, according to an article on MIT's Website.
Cloud hosting, however, can bring with it a host of questions. Cloud services often partition their servers into "virtual machines," each of which is allotted a fixed number of operations per second on a server's central processing unit and a fixed amount of memory. That partitioning makes cloud servers easier to manage, but for database-intensive applications, it can result in the allocation of about 20 times as much hardware as should be necessary.
MIT researchers are developing a new system called DBSeer that should help address this problem, as well as others such as the pricing of cloud services and the diagnosis of application slowdowns.
Teradata, a large database company, has assigned engineers the task of incorporating the MIT researchers' new algorithm, which has been released under an open-source license, into its own software.
“We’re really fascinated and thrilled that someone is doing this work,” said Doug Brown, a database software architect at Teradata, according to MIT’s Website. “We’ve already taken the code and are prototyping right now.”
Initially, Teradata will use the MIT researchers’ prediction algorithm to determine customers’ resource requirements.
“The really big question for our customers is, ‘How are we going to scale?’” Brown said.
Brown hopes, however, that the algorithm will ultimately help allocate server resources on the fly, as database requests come in. If servers can assess the demands imposed by individual requests and budget accordingly, they can ensure that transaction times stay within the bounds set by customers’ service agreements. For instance, “if you have two big, big resource consumers, you can calculate ahead of time that we’re only going to run two of these in parallel,” Brown said. “There’s all kinds of games you can play in workload management.”
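Brown's "only run two of these in parallel" idea can be sketched as a simple admission gate: a predicted cost for each incoming query decides whether it counts as a big resource consumer, and a semaphore caps how many big consumers run at once. The class, parameter names and threshold below are illustrative assumptions, not Teradata's actual workload-management interface.

```python
# Illustrative sketch of capping concurrent "big" resource consumers.
# WorkloadGate, max_big and big_threshold are hypothetical names.
import threading

class WorkloadGate:
    """Admit cheap queries immediately; cap concurrent expensive ones."""

    def __init__(self, max_big=2, big_threshold=0.5):
        self._big_slots = threading.Semaphore(max_big)
        self._big_threshold = big_threshold  # predicted fraction of server capacity

    def run(self, predicted_cost, query_fn):
        if predicted_cost < self._big_threshold:
            return query_fn()      # small queries never wait
        with self._big_slots:      # big queries wait for one of max_big slots
            return query_fn()
```

Pairing a gate like this with a cost-prediction model would let a server budget each request before it runs, which is one way transaction times could be kept within the bounds of customers' service agreements.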
Barzan Mozafari, the lead author of the research at MIT, said that with virtual machines, server resources must be allocated according to an application's peak demand.
“You’re not going to hit your peak load all the time,” Mozafari said. “So that means that these resources are going to be underutilized most of the time.”
He said provisioning for peak demand is largely guesswork.
“It’s very counterintuitive, but you might take on certain types of extra load that might help your overall performance,” he said.
Increased demand means that a database server will store more of its frequently used data in its high-speed memory, which can help it process requests more quickly. On the other hand, a slight increase in demand could cause the system to slow down precipitously — if, for instance, too many requests require modification of the same pieces of data, which need to be updated on multiple servers.
“It’s extremely nonlinear,” Mozafari said.
DBSeer simply monitors fluctuations in both the number and type of user requests and system performance and uses machine-learning techniques to correlate the two. This approach is good at predicting the consequences of fluctuations that don’t fall too far outside the range of the training data.
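As a rough illustration of that correlation step, the sketch below fits a least-squares model mapping the per-interval mix of transaction types to observed CPU use, then uses the fitted weights to predict the cost of a future workload. This is an assumption about the general approach, not DBSeer's actual model; the transaction types, feature set and monitoring data are all hypothetical.

```python
# Toy DBSeer-style performance model: correlate observed workload mix
# with observed resource use, then predict future cost. Deliberately
# simplified to two features and plain least squares.

def fit_two_feature_ls(samples):
    """Solve the 2x2 normal equations for y ~ w1*x1 + w2*x2 (no intercept)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x1, x2, y in samples:
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    w1 = (b1 * s22 - b2 * s12) / det
    w2 = (b2 * s11 - b1 * s12) / det
    return w1, w2

# (reads/sec, writes/sec, observed CPU %) from a hypothetical monitoring log
history = [
    (100, 10, 14.0),
    (200, 20, 28.0),
    (150, 40, 31.0),
    (300, 15, 36.0),
]

w_read, w_write = fit_two_feature_ls(history)

def predict_cpu(reads_per_sec, writes_per_sec):
    """Predict CPU % for a proposed future workload mix."""
    return w_read * reads_per_sec + w_write * writes_per_sec
```

As the article notes, a fit like this is only trustworthy for workloads that resemble the training data; the strongly nonlinear regimes Mozafari describes would require richer models than a linear one.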
The researchers tested their prediction algorithm against both a set of benchmark data called TPC-C, which is commonly used in database research, and real-world data on modifications to the Wikipedia database. On average, the model was about 80 percent accurate in predicting CPU use and 99 percent accurate in predicting the bandwidth consumed by disk operations.
Patrick Burke is a writer and editor based in the greater New York area and occasionally blogs for Rackspace Hosting.