Don’t Let a (Java) Leak Ruin Your Day

September 28, 2012

By: Charles Rich, Vice President of Product Management and Marketing at Nastel

Your day is going great. Traffic was light, the coffee is good this morning, and the application you’ve been working on forever is being released today. One more acceptance test run and it’s a go! Life is good… until the call from server support. During final testing, the application started hoarding resources; in other words, it had a leak. Resource usage kept rising abnormally, and support had no idea there was a problem. Then the server failed. Everyone from the CIO to the guy in the mailroom is in a panic. This has happened before, more times than anyone wants to admit.

Most companies try to prepare for these situations by having someone sit and stare at a screen all day in the hope that maybe, just maybe, he will see something that indicates a problem. Or they have a support group that doesn’t react until the phone rings and someone says the server failed; then members of the support group waste a lot of time meeting with the rest of operations, trying to figure out what went wrong and hoping it isn’t their fault. Neither of these approaches is productive or effective. If only there were some way to be alerted that a failure is just waiting to happen.

One of the biggest causes of these problems is unchecked resource allocation, otherwise referred to as a leak. Leaks are typically caused by programming errors. Some examples of such leaks are:

**Unchecked growth of arrays, lists and HashMaps – Applications add data to these collections and, for some reason, never clean it up or remove the items again.
**Not closing Java Database Connectivity (JDBC) prepared statements, sockets and file handles – Even though the objects are being “garbage collected,” it doesn’t necessarily mean that the resources allocated by the underlying database or operating system are being de-allocated. Typically, you will find that the application developers are not properly closing the file handles, sockets or prepared statements.
**Thread leaks, handle leaks – Applications spawn threads that never shut down properly, so the threads and the resources they hold accumulate.
**ClassLoader leaks – ClassLoader leaks are very difficult to spot, but they do happen, especially when redeploying applications or when working with applications that use custom class loaders.
**Resources allocated outside the JVM – Especially for applications that call third-party native libraries through the Java Native Interface (JNI).
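
To make the first two items in the list above concrete, here is a minimal Java sketch of two common remedies: capping the size of a collection so it cannot grow without limit, and using try-with-resources so that a handle is closed even when an exception is thrown. The class name `LeakFixSketch`, the method names and the limit of 100 entries are all hypothetical, chosen only for illustration. A plain `BufferedReader` stands in for the resource so the example runs without a database; JDBC's `Connection` and `PreparedStatement` also implement `AutoCloseable`, so the same try-with-resources pattern should apply to them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the names here are illustrative only.
final class LeakFixSketch {

    /** Returns a size-bounded LRU map: once maxEntries is exceeded, the eldest entry is evicted. */
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        // accessOrder = true orders entries from least to most recently accessed.
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Reads the first line; try-with-resources closes the reader even if readLine() throws. */
    static String firstLine(BufferedReader source) throws IOException {
        try (BufferedReader reader = source) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> cache = boundedCache(100);
        for (int i = 0; i < 10_000; i++) {
            cache.put(i, "value-" + i); // older entries are evicted as new ones arrive
        }
        System.out.println("cache size = " + cache.size());
        System.out.println(firstLine(new BufferedReader(new StringReader("hello\nworld"))));
    }
}
```

An eviction policy is only one option; the point is that every collection that keeps growing needs some rule for when entries leave it.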

Regardless of the reasons for these leaks, dealing with composite applications and multiple moving parts can be a real challenge in application development and production support. With that much complexity, it is virtually impossible to spot by eye which resources or components are exhibiting abnormal resource utilization.

One option for spotting leak behavior is to have someone constantly watch charts and analytics and try to figure out what they mean. In today’s economy, where CIOs pursue location-based strategies such as outsourcing or offshoring, there aren’t enough trained eyes available to stare at these screens, so this is neither practical nor desirable. By the time the problem is identified, it is usually too late to apply a fix and prevent a server failure.

A more appropriate way to handle JVM leak detection would be to automatically detect memory leak patterns behind the scenes without being dependent on people looking at charts.
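
As a rough illustration of what such automatic detection could look like, the sketch below samples heap usage through the standard `MemoryMXBean` and fits a least-squares line to a sliding window of samples; a window whose slope stays above a threshold is flagged as a possible leak. This is an assumption about one simple way to do it, not a description of any particular product. The class name `HeapTrendDetector`, the window size and the threshold are hypothetical. Raw heap usage rises and falls with garbage collection, so a real tool would need longer windows or post-collection measurements to avoid false alarms.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical detector; names, window size and threshold are illustrative only.
final class HeapTrendDetector {
    private final int windowSize;
    private final double minSlopeBytesPerSample;
    private final Deque<Long> samples = new ArrayDeque<>();

    HeapTrendDetector(int windowSize, double minSlopeBytesPerSample) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("windowSize must be >= 2");
        }
        this.windowSize = windowSize;
        this.minSlopeBytesPerSample = minSlopeBytesPerSample;
    }

    /** Records one sample, keeping only the most recent windowSize values. */
    void addSample(long usedBytes) {
        if (samples.size() == windowSize) {
            samples.removeFirst();
        }
        samples.addLast(usedBytes);
    }

    /** Least-squares slope of the current window, in bytes per sample. */
    double slope() {
        int n = samples.size();
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (long y : samples) {
            meanY += y;
        }
        meanY /= n;
        double num = 0.0;
        double den = 0.0;
        int x = 0;
        for (long y : samples) {
            num += (x - meanX) * (y - meanY);
            den += (x - meanX) * (x - meanX);
            x++;
        }
        return num / den;
    }

    /** True only when the window is full and usage is rising faster than the threshold. */
    boolean looksLikeLeak() {
        return samples.size() == windowSize && slope() > minSlopeBytesPerSample;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        HeapTrendDetector detector = new HeapTrendDetector(12, 1_000_000.0);
        for (int i = 0; i < 12; i++) {
            detector.addSample(memory.getHeapMemoryUsage().getUsed());
            Thread.sleep(100); // a real monitor would sample far less often
        }
        System.out.println("slope (bytes/sample) = " + detector.slope());
        System.out.println("possible leak = " + detector.looksLikeLeak());
    }
}
```

The same trend check could be fed with thread counts or open-handle counts instead of heap bytes, since it only looks at the shape of the series rather than any single reading.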

Application leaks must always be either prevented or fixed. The server is always busy, but by monitoring behavior over time rather than individual metrics, you can be in better control of what your server is doing. Is it going to handle the thousands of transactions required, or is it going to choke? Another thing to consider is the scalability of your monitoring tooling: whether it can handle as many resources as you need monitored, regardless of the underlying platform or technology.

You can’t afford downtime. By effectively monitoring behavior, you will be able to head off a potentially catastrophic server failure.

About Charles Rich
Charles Rich, vice president of product management and marketing at Nastel, is a software product management professional who brings over 27 years of technical hands-on experience working with large-scale customers to meet their application and systems management requirements.