Sunday, October 28, 2012

Why is the ActiveCount PMI statistic on a WebContainer thread pool non-zero even if no HTTP requests are processed?

The ActiveCount statistic on a thread pool in WebSphere is defined as "the number of concurrently active threads" managed by that thread pool. This metric is particularly useful on the WebContainer thread pool because it gives an indication of the number of HTTP requests processed concurrently. If that metric approaches its maximum (which is determined by the maximum pool size), then you know that either the pool is simply too small or that there is a bottleneck that blocks the processing of some of the requests.
One would expect that the ActiveCount drops to zero when no HTTP requests are received. If you have ever used that statistic, then you may have noticed that this is not the case, and you may have wondered why this is so. There are actually two reasons for that:
  1. WebSphere always initializes the metric with a zero value and then increments and decrements the value as tasks begin and finish executing on the thread pool. That works fine if the statistic is enabled in the WebSphere configuration because the metric will be initialized when the thread pool is started. On the other hand, if the statistic is switched on at runtime, then some threads may already be active and the values reported by PMI will be incorrect. More precisely, there will be a non-zero offset between the reported metric and the actual number of concurrently active threads. In that case, the metric may even become negative once those threads finish their tasks and no new HTTP requests are received.
  2. In WebSphere, thread pools are instances of a generic component and that component is used for different types of thread pools. A thread pool provides a facility to execute tasks on a set of threads that it manages, but it is completely agnostic to the nature of the tasks being executed. In the case of the WebContainer thread pool, these tasks are scheduled by the HTTP request processor. Most of these tasks actually represent HTTP requests being processed by a servlet. However, there is also a small number of constantly running tasks waiting for new HTTP requests to come in. In a thread dump, they are identified by the following stack trace:
    "WebContainer : 3" - Thread t@86
      java.lang.Thread.State: RUNNABLE
        at com.ibm.io.async.AsyncLibrary.aio_getioev3(Native Method)
        at com.ibm.io.async.AsyncLibrary.getCompletionData3(AsyncLibrary.java:625)
        at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:530)
        at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
        at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1783)
    
    Since a thread pool is a generic component which doesn't know anything about the HTTP request processor, it simply counts these tasks as active threads, even if they are actually idle waiting for new HTTP requests. That is why the ActiveCount never becomes zero even on a completely idle system. It should be noted that the HTTP request processor may execute more than one of these tasks and that their number is not necessarily constant over time. A plot of the ActiveCount metric on one particular system illustrates this: there, the ActiveCount never drops below 3.
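
Both effects can be reproduced with a small toy model. The sketch below is not WebSphere code: the `CountingPool` class and its method names are hypothetical, and the only assumption carried over from the explanation above is that the statistic starts at zero when it is enabled and is then incremented and decremented as tasks begin and finish. The count of 3 listener tasks is chosen only to mirror the system described above.

```python
class CountingPool:
    """Toy model (hypothetical, not WebSphere's implementation) of a thread
    pool whose ActiveCount-like statistic is a plain up/down counter."""

    def __init__(self, stat_enabled):
        self.stat_enabled = stat_enabled
        self.reported_active = 0  # what the statistic would report
        self.actual_active = 0    # ground truth, kept only for comparison

    def enable_stat(self):
        # The counter starts at zero, regardless of tasks already running.
        self.stat_enabled = True
        self.reported_active = 0

    def task_started(self):
        self.actual_active += 1
        if self.stat_enabled:
            self.reported_active += 1

    def task_finished(self):
        self.actual_active -= 1
        if self.stat_enabled:
            self.reported_active -= 1


def idle_listeners_scenario():
    """Statistic enabled from the start; listener tasks never finish."""
    pool = CountingPool(stat_enabled=True)
    for _ in range(3):
        pool.task_started()   # long-running tasks waiting for new requests
    for _ in range(2):
        pool.task_started()   # two HTTP requests arrive ...
    for _ in range(2):
        pool.task_finished()  # ... and complete; the system is idle again
    return pool.reported_active, pool.actual_active


def late_enable_scenario():
    """Statistic switched on while two tasks are already running."""
    pool = CountingPool(stat_enabled=False)
    pool.task_started()
    pool.task_started()
    pool.enable_stat()        # counter is reset to 0 although 2 tasks run
    pool.task_finished()
    pool.task_finished()
    return pool.reported_active, pool.actual_active


if __name__ == "__main__":
    print(idle_listeners_scenario())
    print(late_enable_scenario())
```

In the first scenario the reported and the actual value agree, but neither returns to zero on an idle system because the listener tasks are counted as active. In the second scenario the reported value ends up below the actual one by exactly the number of tasks that were running when the statistic was switched on.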

Sunday, October 7, 2012

RHQ WebSphere plug-in released!

I've been working for quite some time now on a WebSphere plug-in for RHQ. RHQ is an Open Source enterprise management and monitoring solution written in Java and is part of the JBoss universe. It already has support for numerous server-side products, but is missing integration with proprietary application server platforms such as WebSphere and WebLogic. My plug-in attempts to close this gap for WebSphere, and to provide an Open Source alternative to commercial products such as IBM Tivoli Monitoring.

It has already been running successfully for more than a year in a production environment with several dozen WebSphere Application Server instances, but until recently it lacked some operability features and the necessary documentation to allow it to be used by a larger public. I've been working on these issues over the last couple of weeks and I'm proud to announce that the first official release of the plug-in is now available. You can find the binary packages and documentation here.

The RHQ WebSphere plug-in primarily focuses on monitoring, and to some extent on managing the runtime state of the monitored WebSphere servers. It doesn't provide any features to manage the WebSphere configuration. The reason is that WebSphere already has outstanding capabilities in that area (both for manual and scripted configuration management) and that the configuration model used by WebSphere doesn't fit naturally into RHQ's world view. The plug-in collects many of the metrics available through WebSphere's PMI (Performance Monitoring Infrastructure) API. In addition to that, it has some advanced monitoring capabilities that are not readily available with other solutions:

  • The plug-in can connect to DB2 to collect agent (i.e. per-connection) statistics. These metrics are then aggregated per data source configured in WebSphere. This allows you, for example, to determine the CPU time consumed on the DB2 instance by applications using a given data source.
  • The plug-in can measure the number of leaked application class loaders. The data is provided per application/module and as a global (per WebSphere instance) metric. This makes it easier to investigate out-of-memory conditions and to decide when it's time to restart a WebSphere instance because of too many application restarts or redeployments.
  • The plug-in can be configured to remotely collect log events from the monitored WebSphere instances and to correlate these events with the component, module or application that triggered them (which is something that is not possible to do by inspecting SystemOut.log).

Note that the last two features are only available in conjunction with XM4WAS, another project of mine.
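
To make the per-data-source aggregation from the first bullet more concrete, here is a minimal sketch. It is not taken from the plug-in: the function name, the record fields (`data_source`, `cpu_time_ms`) and the sample values are all hypothetical, and it simply assumes that each per-connection record has already been attributed to a data source. How the plug-in performs that attribution is not described here.

```python
from collections import defaultdict


def cpu_time_per_data_source(agent_stats):
    """Sum per-connection CPU time for each data source.

    agent_stats is an iterable of dicts with the hypothetical keys
    'data_source' and 'cpu_time_ms'.
    """
    totals = defaultdict(int)
    for record in agent_stats:
        totals[record["data_source"]] += record["cpu_time_ms"]
    return dict(totals)


# Made-up sample data: three connections spread over two data sources.
sample_stats = [
    {"data_source": "jdbc/orders", "cpu_time_ms": 120},
    {"data_source": "jdbc/orders", "cpu_time_ms": 80},
    {"data_source": "jdbc/billing", "cpu_time_ms": 50},
]

if __name__ == "__main__":
    print(cpu_time_per_data_source(sample_stats))
```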