Thursday, December 13, 2012

Retrieving custom user attributes from LDAP in WebSphere

WebSphere can be configured to use LDAP as its authentication mechanism. The implementation is fairly complete and has support for SSL, connection reuse, multiple LDAP servers with failover as well as mapping of client certificates to LDAP users. However, in many use cases applications also need access to additional LDAP attributes (such as the email address or employee ID of the authenticated user).

An obvious approach to get access to these attributes would be to make use of the javax.naming.ldap API (or an equivalent API) inside the application or to write a custom JAAS login module that performs the lookup using that API and adds the information to the Subject (from where it is retrieved by the application). However, this approach would have several drawbacks:
  • It leads to duplicate configuration because LDAP needs to be configured in WebSphere and the same configuration information also needs to be provided to the custom code.
  • The LDAP support in WebSphere maintains a pool of LDAP connections and correctly performs failover if it detects that the primary LDAP server becomes unavailable. The custom code cannot take advantage of these features and needs to manage its own set of LDAP connections.
Ideally, instead of interacting with LDAP directly, the custom code should perform the lookup via some API exposed by WebSphere, so that it can reuse the existing LDAP client infrastructure (including connection pooling, SSL support, failover, etc.). Unfortunately this is not feasible if WebSphere is configured with a standalone LDAP registry as user registry. The reason is that although the user registry (of type com.ibm.ws.security.registry.ldap.LdapRegistryImpl in that case) can be looked up via JNDI, there is no public API that gives access to additional LDAP attributes.

However, LDAP can also be configured as a backend of a federated user repository. "Federated repositories" is one of the four user registry types supported by WebSphere, the other three being "Local operating system", "Standalone LDAP registry" and "Standalone custom registry". The "Federated repositories" implementation originally comes from WebSphere Portal Server and is also called VMM (Virtual Member Manager) or WIM (WebSphere Identity Manager). The main feature of this registry type is its ability to map entries from multiple individual user repositories into a single virtual repository. It also exposes an API that gives access to additional user attributes. This is of course the feature we are looking for.

Therefore a prerequisite to access custom LDAP attributes is to configure WebSphere security to use VMM instead of the standalone LDAP registry. All features (SSL, pooling, failover, etc.) supported by the standalone LDAP registry are also supported by VMM, and it is relatively straightforward to create a VMM configuration that is equivalent to an existing standalone LDAP registry configuration. In the following we will assume that this has been done and that VMM has been configured as the user registry implementation.

We can now examine how to use the Virtual Member Manager API to get access to custom LDAP attributes. We assume that the code will be integrated into a custom JAAS login module, but the ingredients are the same if you want to integrate the code into your applications.

The first step is to get access to the Virtual Member Manager API. That API is defined by the Service interface. To get a reference to the VMM service in the local JVM, simply instantiate LocalServiceProvider with the default constructor:

Service service = new LocalServiceProvider();

The WebSphere infocenter document "Getting the properties of an entity" describes how to use the Service API to look up the attributes of a user. As you can see in that documentation, this operation requires as input the unique security name of the user, which looks as follows (Note that this is not necessarily identical to the DN of the user in LDAP):

uid=SalesManager,cn=users,dc=yourco,dc=com

This information can be retrieved from the WSCredential object which is put by one of the WebSphere login modules into the shared state (i.e. the Map that is passed to the initialize method of the LoginModule). The key to get the object from the map is defined in Constants. The unique security name is returned by the getUniqueSecurityName method.

The WebSphere infocenter document mentioned above shows how to specify the list of attributes to be retrieved. It is important to note that these are not LDAP attribute names but names of properties of the PersonAccount entity defined by VMM. By default, if a property is defined in the PersonAccount, then it is mapped to the LDAP attribute with the same name. This also means that in order to access an LDAP attribute, a corresponding property must be defined in the PersonAccount entity. The WebSphere admin console doesn’t allow you to inspect or edit the properties of an entity. Therefore this must be done with the help of wsadmin. To inspect the list of existing properties, use the following command:

$AdminTask getIdMgrPropertySchema { -entityTypeName PersonAccount }

You will see that the PersonAccount already defines properties for many of the attributes typically used in LDAP. If you use custom attributes not defined in PersonAccount, you need to add them using the addIdMgrPropertyToEntityTypes admin task. For example:

$AdminTask addIdMgrPropertyToEntityTypes { -name ssn -dataType string -entityTypeNames PersonAccount }

Note that the addIdMgrPropertyToEntityTypes operation has parameters (nsURI and nsPrefix) to specify a custom namespace for the property (to be used instead of the default http://www.ibm.com/websphere/wim namespace). While it may seem a good idea to define custom properties in a different namespace, the available documentation is not clear about how to query such properties (they are not returned by the code shown in the infocenter document).

Also note that AdminTask doesn’t define any operation to modify or remove properties. However, this can be achieved by manipulating the cells/{cell_name}/wim/model/wimxmlextension.xml document in the configuration repository.

The infocenter document mentioned above shows how to invoke the Service#get method. However, that invocation will only work if the caller has sufficient privileges to access the user information. If the code is executed inside an application, then the user has already been authenticated and the call should succeed (because VMM grants each user access to his own information). On the other hand, if the code is executed inside a login module, authentication is not yet complete and the call will fail with a CWWIM2008E error. To avoid this, it is necessary to execute the code with additional privileges. To do this, execute the code with the identity of the server subject:

ContextManagerFactory.getInstance().runAsSystem(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        ...
        return null;
    }
});

The infocenter document doesn't show how to programmatically extract the properties from the result of the Service#get method. This is actually fairly easy, as shown in the following example:

DataObject response = service.get(root);
DataObject entity = (DataObject)response.get("entities[1]");
String ssn = entity.getString("ssn");

Note that the getString method throws an IllegalArgumentException if the attribute is not present.

You can now use the retrieved attributes to enrich the Subject built by the chain of login modules to make the information available to your applications.
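To make the mechanics of this last step more concrete, here is a minimal sketch of a login module that reads a value from the shared state and adds a map of attributes to the Subject. It only uses the standard JAAS API so that it runs outside WebSphere. The key NAME_KEY, the class name and the lookupAttributes placeholder are assumptions made for illustration: in WebSphere the shared state holds a WSCredential under the key defined in Constants, and the lookup would go through the VMM Service (inside runAsSystem). Publishing the attributes as a public credential is just one possible choice.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

/** Sketch of a login module that copies user attributes into the Subject. */
class AttributeLoginModule implements LoginModule {
    // Hypothetical key: in WebSphere the real key is defined in the Constants
    // class mentioned above and the value is a WSCredential, not a String.
    static final String NAME_KEY = "example.uniqueSecurityName";

    private Subject subject;
    private Map<String, ?> sharedState;
    private Map<String, String> attributes;

    @Override
    public void initialize(Subject subject, CallbackHandler callbackHandler,
            Map<String, ?> sharedState, Map<String, ?> options) {
        this.subject = subject;
        this.sharedState = sharedState;
    }

    @Override
    public boolean login() throws LoginException {
        Object name = sharedState.get(NAME_KEY);
        if (name == null) {
            throw new LoginException("No user information in shared state");
        }
        attributes = lookupAttributes(name.toString());
        return true;
    }

    /** Placeholder for the VMM lookup; it simply echoes the name it was given. */
    Map<String, String> lookupAttributes(String uniqueSecurityName) {
        Map<String, String> result = new HashMap<>();
        result.put("uniqueSecurityName", uniqueSecurityName);
        return result;
    }

    @Override
    public boolean commit() {
        if (attributes == null) {
            return false; // login() did not succeed, nothing to publish
        }
        // Expose the attributes to applications as a public credential.
        subject.getPublicCredentials().add(attributes);
        return true;
    }

    @Override
    public boolean abort() {
        attributes = null;
        return true;
    }

    @Override
    public boolean logout() {
        if (attributes != null) {
            subject.getPublicCredentials().remove(attributes);
            attributes = null;
        }
        return true;
    }
}
```

An application could then retrieve the map with subject.getPublicCredentials(Map.class). A dedicated credential class would probably be cleaner than a plain Map in real code.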

Further reading:


  • To get an overview of the Virtual Member Manager:
  • To get more information about JAAS login modules in WebSphere: http://www.ibm.com/developerworks/websphere/techjournal/0508_benantar/0508_benantar.html

    Quote of the day

    Using Scuds to target tanks or military bases is one thing. Using them to target rebels hiding in playgrounds at schools is something else.
    New York Times, "Syria Uses Scud Missiles in New Effort to Push Back Rebels", quoting a senior U.S. defense official

    So, using Scuds is not OK, but using children as human shields (what else would "rebels" do hiding in playgrounds at schools?!?) is OK? WTF!!

    Note: The original quote in the NYT article has actually been censored, although it still appeared in the excerpt shown by Google:


    The quote has also been reproduced by several other newspapers. See e.g. the story from the Boston Globe.

    Friday, November 30, 2012

    Islamism and obscurantism: when an April Fools' joke counts for more than science

    The other day at the office, someone was explaining that pig organs can be used for transplants in humans, to which a Muslim colleague replied that this is only natural, since genetically the pig is closer to humans than the ape is. This is obviously false. One can convince oneself of this by looking at the number of chromosomes of each of these three species. The ape has 48, whereas humans have 46, one pair of which results from the fusion of two pairs found in the ape. As for the pig, it has only 38 (this applies only to the domestic pig; the wild boar has 36 chromosomes).

    In fact, the claim made by some Muslims is that the genetic similarity between pigs and humans is 99.5% and therefore greater than that between humans and apes. It is interesting to trace where this false information comes from and to analyze why it is so important to some of our Muslim fellow citizens. Since the theory of evolution states that the ape is closer to humans, one might think that this is simply a fabrication put forward by Muslim creationism. However, there is also a theological element behind it. Indeed, our colleague also explained to us that certain religions claim that their god punished peoples by turning them into pigs. This is confirmed by reading verse 5.60 of the Koran:
    Say: "Shall I inform you of what is worse, as retribution from Allah? He whom Allah has cursed, he who has incurred His wrath, and those of whom He made apes and pigs, and likewise he who worshipped the Tagut: those have the worst place and have strayed furthest from the right path"
    I therefore expected to be able to trace this claim back to a creationist or concordist* site. But it is neither. The "information" seems to come from a post published on a personal blog. Quote:
    “We had the idea of integrating the pig genome after discovering that man did not descend from the ape, because of its great genetic proximity to man (99.5%)”, says François Gurtoin, of the genetealogy laboratory in Nantes, who assisted Professor Derrefer's team.
    However, the post is titled L’homme descendrait du porc - Vague de suicides à la Mecque ("Man is said to descend from the pig - Wave of suicides in Mecca") and was published on... April 1, 2007, leaving no doubt as to the nature of the article.

    What is worrying is the reaction, in a forum, of a Muslim user confronted with the fact that the claim comes from an April Fools' joke:
    This site in no way shows that this percentage is a joke.
    Indeed.

    It remains to be hoped that this way of thinking is representative of only a minority of our Muslim fellow citizens.

    * Concordism is a system of exegesis that aims to establish agreement between biblical or Koranic texts and scientific data.

    How to build custom WebSphere plug-ins using Maven and Tycho

    Introduction

    Beginning with WebSphere 6.1, the application server runtime is actually packaged as a set of OSGi bundles running on Eclipse Equinox. This makes it possible to write your own custom plug-ins to extend the server runtime. I used that possibility in my XM4WAS project to enhance WebSphere's monitoring capabilities.

    While the Eclipse IDE is the natural choice to develop this kind of plug-in, you may still want to automate the build process using Maven. The easiest way to set up the Maven build is using Tycho because it allows Maven to use the metadata of the Eclipse project (primarily the bundle manifest). This keeps the amount of configuration required in the POM files small and ensures that the artifacts produced by Maven are identical to the ones produced by Eclipse.

    However, there is an important difference between Maven/Tycho and Eclipse in the way dependencies are resolved:
    • To allow Eclipse to resolve dependencies to other WebSphere bundles, you will typically define a target platform that points to the WebSphere installation directory. Eclipse then automatically configures the project dependencies based on the bundle manifest.
    • Although Tycho also supports target platform definitions, it has an important limitation: The location types "Directory", "Installation", and "Features" are not supported. That means that only software sites (i.e. P2 repositories) are supported.

    Since there is no public P2 repository containing the WebSphere bundles, there is no simple way to use a common configuration for Eclipse and Maven/Tycho. In the following I will discuss two possible solutions for this problem.

    Importing the WebSphere bundles into the Maven repository

    Starting with version 0.6.0, Tycho is able to use OSGi bundles deployed to Maven repositories. Therefore one way to let Maven/Tycho resolve WebSphere dependencies is to deploy the bundles to the local (or a private/company) Maven repository. Since WebSphere is built on top of an Eclipse runtime, this can be easily achieved using the to-maven goal of the maven-eclipse-plugin. E.g. the following command will deploy the WebSphere bundles to the local Maven repository:

    mvn eclipse:to-maven -DeclipseDir=/opt/IBM/WebSphere/AppServer

    Unfortunately this is not enough. To resolve dependencies, Tycho uses the information from the project's manifest file. The manifest specifies dependencies using bundle symbolic names (Require-Bundle) or package names (Import-Package). On the other hand, Maven needs the artifact coordinates (group ID, artifact ID and version) to locate an artifact in the repository. The problem is that the eclipse:to-maven goal doesn't produce the necessary metadata that would allow Tycho to locate a Maven artifact by exported package.
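To illustrate what Tycho has to work with, the following sketch reads the two dependency headers from a bundle manifest using the JDK's java.util.jar.Manifest class. The manifest content and the com.example names are placeholders, not actual WebSphere bundles. The point is that the headers only carry bundle symbolic names and package names; nothing in them resembles a Maven group ID, artifact ID or version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

class ManifestDependencies {
    /** Parses manifest text; the text must end with a line break. */
    static Manifest parse(String text) {
        try {
            return new Manifest(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the raw value of a main-section header, or null if absent. */
    static String header(Manifest manifest, String name) {
        return manifest.getMainAttributes().getValue(name);
    }

    public static void main(String[] args) {
        // Hypothetical manifest; bundle and package names are placeholders.
        Manifest manifest = parse(
                "Manifest-Version: 1.0\n"
                + "Bundle-SymbolicName: com.example.myplugin\n"
                + "Require-Bundle: com.example.somebundle\n"
                + "Import-Package: com.example.somepackage\n");
        System.out.println("Require-Bundle: " + header(manifest, "Require-Bundle"));
        System.out.println("Import-Package: " + header(manifest, "Import-Package"));
    }
}
```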

    To solve this problem, one has to declare the WebSphere bundles as Maven dependencies in the POM and configure Tycho to consider these POM dependencies during calculation of the target platform (by setting the pomDependencies property to "consider" in the configuration of the target-platform-configuration plug-in).

    While this approach looks rather simple at first, it has several important drawbacks:
    • Tycho not only resolves the dependencies needed to build the project, but needs to calculate the entire target platform, i.e. the set of bundles required at runtime. This set includes transitive dependencies and is much larger. Since all of these bundles must be declared in the POM, one typically ends up declaring all WebSphere bundles in the POM and letting Tycho choose the ones it really needs. The problem is that the WebSphere runtime has more than 100 bundles...
    • The content of the WebSphere bundles may vary between fix packs. A package exported by some bundle in a given fix pack may be exported by a different bundle (typically a new one) in a later fix pack. If one uses Import-Package to specify dependencies, this is not a problem for the Eclipse project. However, for the Maven/Tycho build, all these bundles must also be declared as dependencies in the POM. This implies that the Maven build will only work with a certain range of fix packs and may break if the wrong fix pack is used.
    • Before it can calculate the target platform, Tycho needs to scan the POM dependencies in order to extract the necessary metadata. Since the WebSphere runtime has more than 100 bundles, some of which are quite large, this has a significant impact on build time. In practice, the impact is so high that the dependency resolution takes more time than the actual build.

    Creating a P2 repository from the WebSphere bundles

    Another option is to create a P2 repository from the WebSphere bundles and configure that repository in Maven. Since P2 repositories contain OSGi specific metadata, Tycho will be able to calculate the target platform without the need to declare additional POM dependencies. The Eclipse platform provides a tool that can be used to create the P2 repository (Note that the tool is not included in WebSphere; you need to run the one that comes with the Eclipse IDE). The command looks as follows:

    java -jar plugins/org.eclipse.equinox.launcher_*.jar -application org.eclipse.equinox.p2.publisher.FeaturesAndBundlesPublisher -metadataRepository file:/was_repo -artifactRepository file:/was_repo -source /opt/IBM/WebSphere/AppServer -compress -publishArtifacts

    Once this is done, you can set up the repository in Maven:

    <repository>
        <id>p2</id>
        <layout>p2</layout>
        <url>file:/was_repo</url>
    </repository>

    If each developer is expected to set up his own (local) P2 repository (this would e.g. be the case in an Open Source project), then the repository should be configured in settings.xml (because the repository URL will not be the same for everyone). On the other hand, if you make the repository accessible over HTTP (e.g. on a company-wide repository), then you can configure it in the POM.

    Although the setup is more complicated, the P2 based approach eliminates all the drawbacks encountered with the first approach. Nevertheless you need to take into account the following aspects:
    • Most packages exported by the WebSphere bundles are not versioned. This means that dependency resolution is only predictable if the P2 repository contains artifacts from a single WebSphere version. This contrasts with the first approach where the WebSphere version is specified in the POM dependencies.
    • If all your Maven modules have packaging "eclipse-plugin", then you don't need to declare any POM dependencies. However, you may still have some modules that have packaging "jar", such as modules that execute unit tests (outside of an OSGi container). For these modules, you again need POM dependencies. By convention, Maven artifacts loaded from a P2 repository have "p2.osgi.bundle" as group ID and the bundle symbolic name as artifact ID.
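The naming convention from the last point can be written down as a small helper. This is only a sketch: the class and method names are made up, the bundle name is a placeholder, and the version part of the coordinates is deliberately left out because the convention described above says nothing about it. The helper also strips parameters such as ";singleton:=true", which OSGi allows after the symbolic name.

```java
class P2Coordinates {
    static final String P2_GROUP_ID = "p2.osgi.bundle";

    /** Builds "groupId:artifactId" from the value of a Bundle-SymbolicName header. */
    static String fromSymbolicName(String headerValue) {
        // Drop parameters such as ";singleton:=true" that may follow the name.
        int semicolon = headerValue.indexOf(';');
        String name = semicolon < 0 ? headerValue : headerValue.substring(0, semicolon);
        return P2_GROUP_ID + ":" + name.trim();
    }

    public static void main(String[] args) {
        // Placeholder bundle name, not an actual WebSphere bundle.
        System.out.println(fromSymbolicName("com.example.somebundle;singleton:=true"));
    }
}
```

For the placeholder input this prints p2.osgi.bundle:com.example.somebundle.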


    Monday, November 26, 2012

    Idealism vs. materialism

    The world is divided into two types of people. On one side are those who take their desires, fears, religious beliefs and other fantasies for reality, and who defend their positions using pure rhetoric. On the other side are those who approach the world through science and reason and who base their arguments on the observation of reality.
    The following video provides an excellent illustration of this:


    This opposition between the two modes of thought is well summarized by the following exchange between Jean-Luc Mélenchon and a young man who claims that 99% of the unemployed are "slackers":

    Mélenchon: Why not, but are you going to prove it to us?

    Young man: But how can you prove that?

    Mélenchon: Your impression is of no interest, and neither is mine. The impression you give me is of no interest. What matters is what we base our reasoning on.

    Monday, November 19, 2012

    Installing VisualVM plug-ins into the shared directory

    This article describes how to install VisualVM plug-ins into the shared installation directory instead of the user's home directory. This is useful if the VisualVM installation is used by multiple users on the same system or if you want to create a custom VisualVM distribution with a set of preinstalled plug-ins. Actually the "Force install into shared directories" option in the plug-in installation dialog (see the "Settings" tab) should enable that, but the option doesn't seem to work in VisualVM 1.3.4.

    The following procedure can be used as a workaround:
    • Start with a clean VisualVM configuration, i.e. remove (or back up) the ${HOME}/.visualvm/x.y.z folder (Note that on Windows, ${HOME} points to the user's Application Data directory).
    • Launch VisualVM and install the relevant plug-ins. They will be placed into ${HOME}/.visualvm/x.y.z/modules.
    • Create a new directory called custom (you may of course choose a different name if you want) under the VisualVM installation directory (i.e. at the same level as the platform and visualvm directories).
    • Copy the following folder structures from ${HOME}/.visualvm/x.y.z to the custom directory (so that the resulting folder structure matches the one in platform and visualvm):
      • config/Modules
      • modules
      • update_tracking
    • Edit the etc/visualvm.clusters file and add the custom folder to the list.
    • Clear the user configuration, i.e. reexecute the first step.
    If you start VisualVM now, the plug-ins you have copied to the custom folder should be available immediately.

    Sunday, October 28, 2012

    Why is the ActiveCount PMI statistic on a WebContainer thread pool non zero even if no HTTP requests are processed?

    The ActiveCount statistic on a thread pool in WebSphere is defined as "the number of concurrently active threads" managed by that thread pool. This metric is particularly useful on the WebContainer thread pool because it gives an indication of the number of HTTP requests processed concurrently. If that metric approaches its maximum (which is determined by the maximum pool size), then you know that either the pool is simply too small or that there is a bottleneck that blocks the processing of some of the requests.
    One would expect that the ActiveCount drops to zero when no HTTP requests are received. If you have ever used that statistic, then you may have noticed that this is not the case, and you may have wondered why this is so. There are actually two reasons for that:
    1. WebSphere always initializes the metric with a zero value and then increments and decrements the value as tasks begin and finish executing on the thread pool. That works fine if the statistic is enabled in the WebSphere configuration because the metric will be initialized when the thread pool is started. On the other hand, if the statistic is switched on at runtime, then some threads may already be active and the values reported by PMI will be incorrect. More precisely, there will be a non zero offset between the reported metric and the actual number of concurrently active threads. In that case, the value of the metric may drop to a negative value if no HTTP requests are received.
    2. In WebSphere, thread pools are instances of a generic component and that component is used for different types of thread pools. A thread pool provides a facility to execute tasks on a set of threads that it manages, but it is completely agnostic to the nature of the tasks being executed. In the case of the WebContainer thread pool, these tasks are scheduled by the HTTP request processor. Most of these tasks actually represent HTTP requests being processed by a servlet. However, there is also a small number of constantly running tasks waiting for new HTTP requests to come in. In a thread dump, they are identified by the following stack trace:
      "WebContainer : 3" - Thread t@86
        java.lang.Thread.State: RUNNABLE
          at com.ibm.io.async.AsyncLibrary.aio_getioev3(Native Method)
          at com.ibm.io.async.AsyncLibrary.getCompletionData3(AsyncLibrary.java:625)
          at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:530)
          at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
          at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1783)
      
      Since a thread pool is a generic component which doesn't know anything about the HTTP request processor, it simply counts these tasks as active threads, even if they are actually idle waiting for new HTTP requests. That is why the ActiveCount never becomes zero even on a completely idle system. It should be noted that the HTTP request processor may execute more than one of these tasks and that their number is not necessarily constant over time. This is illustrated by the following plot of the ActiveCount metric, where one can see that (on that particular system) the ActiveCount never drops below 3:
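Both effects described in the list above can be reproduced outside WebSphere with plain java.util.concurrent. The sketch below is not WebSphere code: the class and method names are made up, and a standard ThreadPoolExecutor stands in for WebSphere's own thread pool. The first method mimics a counter that is initialized to zero while tasks are already running; the second starts a few tasks that only block waiting for "requests" and then asks the pool how many threads it considers active.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ActiveCountDemo {
    /**
     * Reason 1: a counter that starts at zero while tasks are already running
     * ends up below zero once those tasks finish.
     */
    static int reportedAfterLateEnable(int tasksAlreadyRunning) {
        int reported = 0;                      // statistic switched on at runtime
        for (int i = 0; i < tasksAlreadyRunning; i++) {
            reported--;                        // a task started earlier completes
        }
        return reported;
    }

    /**
     * Reason 2: tasks that merely wait for work still occupy pool threads,
     * so a generic pool reports them as active.
     */
    static int activeCountOfIdlePool(int waitingTasks) {
        int size = waitingTasks + 2;           // leave room for "real" work
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                size, size, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        BlockingQueue<String> requests = new LinkedBlockingQueue<>();
        CountDownLatch started = new CountDownLatch(waitingTasks);
        try {
            for (int i = 0; i < waitingTasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        requests.take();       // blocks: no request ever arrives
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();                   // all waiting tasks are now running
            return pool.getActiveCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();                // interrupts the blocked tasks
        }
    }

    public static void main(String[] args) {
        System.out.println(reportedAfterLateEnable(2));
        System.out.println(activeCountOfIdlePool(3));
    }
}
```

With three waiting tasks, the pool is expected to report three active threads although no request is being processed, which mirrors the behavior of the WebContainer pool on an idle system.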

    Sunday, October 7, 2012

    RHQ WebSphere plug-in released!

    I've been working for quite some time now on a WebSphere plug-in for RHQ. RHQ is an Open Source enterprise management and monitoring solution written in Java and is part of the JBoss universe. It already has support for numerous server-side products, but is missing integration with proprietary application server platforms such as WebSphere and WebLogic. My plug-in attempts to close this gap for WebSphere, and to provide an Open Source alternative to commercial products such as IBM Tivoli Monitoring.

    It has already been running successfully for more than a year in a production environment with several dozen WebSphere Application Server instances, but until recently it lacked some operability features and the necessary documentation to allow it to be used by a larger public. I've been working on these issues over the last couple of weeks and I'm proud to announce that the first official release of the plug-in is now available. You can find the binary packages and documentation here.

    The RHQ WebSphere plug-in primarily focuses on monitoring, and to some extent on managing the runtime state of the monitored WebSphere servers. It doesn't provide any features to manage the WebSphere configuration. The reason is that WebSphere already has outstanding capabilities in that area (both for manual and scripted configuration management) and that the configuration model used by WebSphere doesn't fit naturally into RHQ's world view. The plug-in collects many of the metrics available through WebSphere's PMI (Performance Monitoring Infrastructure) API. In addition to that, it has some advanced monitoring capabilities that are not readily available with other solutions:

    • The plug-in can connect to DB2 to collect agent (i.e. per connection) statistics. These metrics are then aggregated per data source configured in WebSphere. This allows you for example to determine the CPU time consumed on the DB2 instance by applications using a given data source.
    • The plug-in can measure the number of leaked application class loaders. The data is provided per application/module and as a global (per WebSphere instance) metric. This makes it easier to investigate out of memory conditions and to decide when it's time to restart a WebSphere instance because of too many application restarts or redeployments.
    • The plug-in can be configured to remotely collect log events from the monitored WebSphere instances and to correlate these events with the component, module or application that triggered them (which is something that is not possible to do by inspecting SystemOut.log).

    Note that the last two features are only available in conjunction with XM4WAS, another project of mine.

    Wednesday, August 15, 2012

    Code coverage analysis in multi-module Maven projects

    When doing refactorings and other maintenance work on mature projects, code coverage analysis is an invaluable tool to help ensure that the changes have no unexpected side effects. Usually I do that using Cobertura because it's very easy to use in Maven projects (just type mvn cobertura:cobertura and open the report generated under target/site/cobertura). One of the shortcomings of Cobertura is its lack of support for multi-module Maven builds. That is really annoying because in many large projects, a significant amount of code coverage for a given Maven module is actually generated by tests run in other modules. E.g. there are many projects that have a dedicated Maven module for integration tests.

    Recently I learnt about JaCoCo which is another code coverage analysis tool. Unfortunately it has the same shortcoming as Cobertura: out of the box, it doesn't support multi-module Maven builds. There is a bug report that contains a patch that solves this issue (at least to some degree; see below). The patch has not been applied yet; in order to use it you will have to build a patched version of JaCoCo manually. The patch can be applied cleanly to revision 1674 of JaCoCo trunk.

    Here is the complete set of instructions to build a patched version of JaCoCo and deploy it to the local Maven repository:

    svn co -r 1674 https://eclemma.svn.sourceforge.net/svnroot/eclemma/jacoco/trunk/jacoco
    cd jacoco
    curl http://sourceforge.net/apps/trac/eclemma/raw-attachment/ticket/186/jacoco-maven-aggregate.diff | patch -p0
    cd org.jacoco.build
    mvn clean install
    

    After that you can use JaCoCo on a multi-module Maven project with the following commands:

    mvn org.jacoco:jacoco-maven-plugin:0.5.8-SNAPSHOT:prepare-agent -Daggregate=true clean install
    mvn org.jacoco:jacoco-maven-plugin:0.5.8-SNAPSHOT:report -Daggregate=true

    There is however one important limitation: because of an issue in code added by the patch, it only works on multi-module projects where the root POM is also the parent POM. If this is not the case, then the plugin will fail with a NullPointerException.

    Tuesday, July 24, 2012

    How to deal with HeuristicMixedException in WebSphere?


    During an incident such as a database issue, applications deployed on WebSphere may sometimes get exceptions of type HeuristicMixedException. The meaning of this exception is defined by the JTA specification:

    Thrown to indicate that a heuristic decision was made and that some relevant updates have been committed while others have been rolled back.

    The XA Specification describes the concept of a "heuristic decision" as follows:

    Some RMs [Resource Managers] may employ heuristic decision-making: an RM that has prepared to commit a transaction branch may decide to commit or roll back its work independently of the TM [Transaction Manager]. It could then unlock shared resources. This may leave them in an inconsistent state. When the TM ultimately directs an RM to complete the branch, the RM may respond that it has already done so. The RM reports whether it committed the branch, rolled it back, or completed it with mixed results (committed some work and rolled back other work).

    This means that a transaction with a heuristic outcome may lead to data integrity problems because some resources have been rolled back while others have been committed, i.e. the transaction is no longer atomic. However, a HeuristicMixedException doesn't necessarily mean that this actually occurred, and in many cases, the transaction is actually rolled back successfully.

    One interesting case where HeuristicMixedException is often seen in WebSphere is a distributed transaction in which one of the participating resources is a SIBus messaging engine, and which cannot be completed because of an issue that affects the message store.

    It is important to know that a messaging engine typically doesn't persist messages immediately, but only when the transaction is committed. If there is a problem with the message store, then the transaction manager will get an exception from the SIBus resource adapter during the prepare phase. This will generate log messages of type J2CA0027E (An exception occurred while invoking prepare on an XA Resource Adapter) and WTRN0046E (An attempt by the transaction manager to call prepare on a transactional resource has resulted in an error. The error code was XAER_RMFAIL).

    When the transaction manager gets the exception from the resource adapter, it will decide to roll back the transaction. However, it doesn't know whether the resource that produced the exception has actually completed the prepare phase or not. From the point of view of the transaction manager, it could be that the prepare phase completed successfully and that the exception was caused by a communication failure just afterwards. Therefore the transaction manager needs to query the resource manager to check the status of the transaction branch and to instruct it to roll back the prepared transaction if necessary. WebSphere will attempt this periodically until the resource manager is available again. Each unsuccessful attempt will result in a WTRN0049W (An attempt by the transaction manager to call rollback on a transactional resource has resulted in an XAER_RMFAIL error) message being logged. While WebSphere is attempting to complete the rollback, the transaction will also appear in the list of retry transactions in the admin console.



    If the error is not transient, then completing the transaction may take a significant amount of time. For obvious reasons, WebSphere cannot simply block the application until the status of the transaction is resolved; at some point it has to return control to the application. The problem is that it cannot report the transaction as rolled back (by throwing a HeuristicRollbackException or a RollbackException) because from the point of view of the transaction manager, part of the transaction may have been prepared. Reporting the transaction as rolled back would be incorrect because it may cause the application to attempt to reexecute the transaction, although reexecuting a transaction that has been partially prepared is likely to fail.
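
The retry behaviour described above can be sketched in a few lines of Java. This is only an illustration of the idea, not WebSphere's actual implementation: the Branch interface, the Outcome enum and the failingBranch helper are hypothetical names introduced for this example, and only XAException and its XAER_RMFAIL constant come from the standard javax.transaction.xa package. A real transaction manager would also wait between attempts, which is left out here.

```java
import javax.transaction.xa.XAException;

public class RollbackRetrySketch {

    /** Hypothetical stand-in for the single XAResource operation this sketch needs. */
    public interface Branch {
        void rollback() throws XAException;
    }

    /** Hypothetical outcome labels; these are not WebSphere's internal status values. */
    public enum Outcome { ROLLED_BACK, UNRESOLVED }

    /**
     * Tries to roll back the branch up to maxAttempts times. XAER_RMFAIL means the
     * resource manager is unavailable, so the attempt is simply repeated. If the
     * limit is reached, control returns to the caller with the branch unresolved.
     */
    public static Outcome rollbackWithRetry(Branch branch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                branch.rollback();
                return Outcome.ROLLED_BACK;
            } catch (XAException e) {
                if (e.errorCode != XAException.XAER_RMFAIL) {
                    // Any other XA error is outside the scope of this sketch.
                    throw new IllegalStateException("unexpected XA error " + e.errorCode, e);
                }
                // Roughly the point at which a WTRN0049W warning would be logged.
                System.out.println("rollback attempt " + attempt + " failed with XAER_RMFAIL");
            }
        }
        return Outcome.UNRESOLVED;
    }

    /** Hypothetical fake branch that is unavailable for the first 'failures' calls. */
    public static Branch failingBranch(int failures) {
        int[] remaining = {failures};
        return () -> {
            if (remaining[0]-- > 0) {
                throw new XAException(XAException.XAER_RMFAIL);
            }
        };
    }
}
```

With a branch that recovers after two failures, rollbackWithRetry(failingBranch(2), 5) ends in ROLLED_BACK; with a branch that stays unavailable, the method gives up and returns UNRESOLVED, which corresponds to the situation where the application gets control back before the status of the transaction is known.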

    WebSphere internally puts this kind of transaction into status 11, which is the numeric value for HEURISTIC_HAZARD (see this documentation).



    The HEURISTIC_HAZARD status means that "The transaction branch may have been heuristically completed". Unfortunately, JTA defines no exception corresponding to HEURISTIC_HAZARD that could be thrown by the commit method in UserTransaction. Therefore WebSphere uses the closest match, which in this case is HeuristicMixedException.
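
The "closest match" reasoning can be made explicit with a small lookup. The following is a sketch under stated assumptions: the InternalOutcome enum and the reportedException method are hypothetical names invented for this example and do not correspond to WebSphere classes. The strings are the simple names of exceptions that UserTransaction.commit is declared to throw in JTA; they are returned as strings so that the example compiles without the JTA API on the class path.

```java
public class CommitOutcomeSketch {

    /** Hypothetical internal outcomes; the names are illustrative only. */
    public enum InternalOutcome {
        COMMITTED, ROLLED_BACK, HEURISTIC_ROLLBACK, HEURISTIC_MIXED, HEURISTIC_HAZARD
    }

    /**
     * Returns the simple name of the JTA exception that commit() would surface
     * for the given outcome, or null if commit() returns normally.
     */
    public static String reportedException(InternalOutcome outcome) {
        switch (outcome) {
            case COMMITTED:
                return null;
            case ROLLED_BACK:
                return "RollbackException";
            case HEURISTIC_ROLLBACK:
                return "HeuristicRollbackException";
            case HEURISTIC_MIXED:
                return "HeuristicMixedException";
            case HEURISTIC_HAZARD:
                // JTA has no dedicated exception for this outcome on commit(),
                // so the closest available one is used.
                return "HeuristicMixedException";
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }
}
```

The consequence for application code is that a HeuristicMixedException caught around commit() covers two different situations: a genuinely mixed outcome and an outcome that is simply not yet known.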

    Tuesday, May 15, 2012

    Using custom PMI modules in a Network Deployment cell

    The WebSphere documentation has a section that explains how to implement a custom PMI module. This works well on a stand-alone application server, but not in a Network Deployment cell: the custom PMI module shows up in the admin console, but not the individual statistics defined by the module. The reason is that the deployment manager doesn't have access to the stats template (XML file) and the resource bundle (properties file). To solve this issue, these two files need to be added to the class path of the deployment manager.

    When looking up the stats template for a PMI module that is registered in an application server, the deployment manager derives the resource name of the stats template from the module ID (which corresponds to the value of the type attribute of the Stats element in the template) by replacing all dots with slashes and appending .xml. For the example shown in the WebSphere documentation this would be com/stats/MyStats.xml. Note that the application code that registers the PMI module on the application server is not required to use this resource name (because the template location is passed as a parameter to the corresponding StatsFactory methods). Therefore it is in general not enough to simply extract the stats template and resource bundle from the application: the stats template may need to be renamed or placed in a different package when installing it on the deployment manager.
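
The naming rule described above is easy to reproduce when checking where the template has to be placed on the deployment manager's class path. The class and method names below are hypothetical and only restate the rule (dots replaced with slashes, .xml appended); they are not part of any WebSphere API.

```java
public class StatsTemplateName {

    /**
     * Derives the class path resource name that the deployment manager is
     * described as using for a given PMI module ID.
     */
    public static String templateResourceName(String moduleId) {
        return moduleId.replace('.', '/') + ".xml";
    }
}
```

For the module ID com.stats.MyStats this yields com/stats/MyStats.xml, so that is the path under which the template must be found on the deployment manager, regardless of where the application itself keeps it.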

    Thursday, April 12, 2012

    The real meaning of the ActiveTime PMI statistic in WebSphere

    For thread pools, WebSphere has a PMI statistic called ActiveTime. According to the documentation it is defined as the average time in milliseconds the threads are in active state. There is also a statistic called ActiveCount that measures the number of concurrently active threads. For example, on the WebContainer thread pool this statistic makes it possible to determine the number of servlet requests being processed simultaneously at a given point in time.

    One would expect that, correspondingly, ActiveTime measures the average time it takes to execute a task on the thread pool and that on the WebContainer thread pool this would be the average time to process a servlet request. However, this is not the case at all. Although the documentation for both metrics refers to "active threads", they measure two completely different things. In fact, ActiveTime measures the average time that threads declared hanging have been active. (For readers not familiar with how WebSphere thread pools work: a thread is flagged as hanging after it has been active for longer than a configurable amount of time, 10 minutes by default.) Obviously this definition renders the ActiveTime statistic pretty much useless in most cases.
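
The difference between the expected and the actual definition can be illustrated with a small calculation. This is a sketch of the two definitions as described above, not of how WebSphere computes the statistic internally: the class and method names are hypothetical, and returning 0 when no thread has exceeded the threshold is an assumption made for this example.

```java
public class ActiveTimeSketch {

    /** Average over all tasks: what one might expect ActiveTime to report. */
    public static double averageOfAll(long[] activeMillis) {
        if (activeMillis.length == 0) {
            return 0.0;
        }
        long sum = 0;
        for (long d : activeMillis) {
            sum += d;
        }
        return (double) sum / activeMillis.length;
    }

    /**
     * Average over only those threads that have been active for longer than the
     * hang threshold. Returns 0 if there is no such thread (an assumption).
     */
    public static double averageOfHung(long[] activeMillis, long hangThresholdMillis) {
        long sum = 0;
        int count = 0;
        for (long d : activeMillis) {
            if (d > hangThresholdMillis) {
                sum += d;
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

With three requests taking 100, 200 and 300 ms and one thread that has been active for 15 minutes (900000 ms), the average over all tasks is 225150 ms, while the average over threads above a 10 minute threshold (600000 ms) is 900000 ms. Without the long-running thread, the second value drops to 0 no matter how the normal requests behave, which is why the statistic says nothing about typical request processing time.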