Wednesday, June 15, 2011

ITIL Service Support with RHQ guides you to better management of your JBoss Applications

If you're unfamiliar with the RHQ project, it's an active open source project that provides an extensible framework for systems management (including discovery and monitoring) across platforms for large and small deployments alike. ITIL (IT Infrastructure Library) is a set of definitions and best practices for better managing deployed services in a controlled manner. Initially it is easy to get lost in some of the ITIL terminology, but the structured approach provides a cohesive view of most aspects of delivering robust IT services. As you will see, the combination of RHQ and ITIL can be quite powerful in better managing your JBoss application deployments. It is also helpful to remember that these best practices are applied in varying degrees depending upon the size and amount of process control present in your organization.

For the purpose of discussion, I'll define a 'service' to mean any web application (e.g. a public website, rich internet application or company intranet application) that serves non-trivial content and must remain accessible and robust. While the full breadth of ITIL practices is rarely used for any single web application, the terminology and concepts around 'Service Support' are particularly useful for describing the challenges that large-scale enterprises face in maintaining a web application and its dependent services.

Service Support: focus areas.

  • Configuration Management
  • Incident Management
  • Problem Management
  • Change Management
  • Release Management
  • Service Desk

These activities can best be summarized as the processes involved in the classification and orderly resolution of events that affect the availability/reliability of a given service. Under the umbrella of Service Support, the following related processes coordinate to define an approach for maintaining continued operations:

Configuration Management:
Every complicated system, such as a Java application server, has various items to be identified, configured and tuned. ITIL labels these items Configuration Items (CIs) and keeps them in a master Configuration Management Database (CMDB), which is often a headache to populate and maintain. When a web application is hosted, the specific operating system, available memory, JVM version, database, Session Beans, etc. are just a few of the many parameters that matter for maintaining a healthy service.
RHQ: With an RHQ installation, discovery and determination of the relevant CI details for operating systems in the management domain are automatic and continually synchronized. Detailed information on a CI (and its health) anywhere within the management domain, including across groups of servers/operating systems, can be quickly determined through the RHQ web console. (See image for example CIs)


Incident Management:
During normal operation of a web application, especially under load, service disruptions caused by memory, disk space, usage or other factors can occur; ITIL classifies these as Incidents.
RHQ: As long as RHQ is monitoring a platform (a.k.a. operating system; soon cloud deployments as well) and its related JBoss AS instances, the management server continually monitors availability and data for the various services and servers. By configuring alert conditions for given CIs, along with notification events, the RHQ server facilitates a responsive and customizable Incident Management process. (See image for examples of Incident Management)
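To make the idea of an alert condition concrete, here is a minimal Python sketch of threshold-based conditions evaluated against the latest metric samples to produce Incidents. The `AlertCondition`, `Incident` and `evaluate` names, the metric names and the sample data are hypothetical illustrations; this is not RHQ's alerting API or data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical comparator table for illustration; not RHQ's alerting API.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": lambda value, limit: value > limit,
    "<": lambda value, limit: value < limit,
}


@dataclass
class AlertCondition:
    resource: str    # the CI being watched, e.g. a JBoss AS instance
    metric: str      # hypothetical metric name, e.g. "heap_used_pct"
    comparator: str  # a key of COMPARATORS
    threshold: float


@dataclass
class Incident:
    resource: str
    metric: str
    value: float


def evaluate(conditions: List[AlertCondition],
             samples: Dict[Tuple[str, str], float]) -> List[Incident]:
    """Return one Incident per condition whose latest sample breaches its threshold."""
    incidents = []
    for cond in conditions:
        value = samples.get((cond.resource, cond.metric))
        if value is None:
            continue  # no data collected yet for this CI/metric pair
        if COMPARATORS[cond.comparator](value, cond.threshold):
            incidents.append(Incident(cond.resource, cond.metric, value))
    return incidents
```

For example, a condition of `heap_used_pct > 90` on one server yields an Incident only when the latest heap sample for that server exceeds 90.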

Problem Management:
The process of consolidating related Incidents into Problems is called Problem Management. For example, repeated out-of-memory Incidents could be consolidated into an 'insufficient memory' Problem report, which would remain a known error until addressed by a CI change.
RHQ: RHQ does not currently offer functionality that can be configured to regularly mine incident data and formally recognize and abstract out Problem conditions. However, because the RHQ server continually monitors and retains a lot of data for all CIs, the RHQ system administrator can mine that incident data manually. The Monitoring and Summary>Timeline tabs for each resource are particularly helpful for aggregating Incident data for root-cause analysis. (See images for examples of Problem Management)
A Monitoring snapshot:

A Summary>Timeline snapshot:
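Since this mining is a manual step today, it may help to see what an automated version could look like. The following Python sketch groups Incidents by CI and incident kind and promotes any group that recurs at least a chosen number of times to a Problem. The record types, the `consolidate` function and the default threshold of three occurrences are assumptions for illustration only; RHQ does not export incident data in this form.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


# Hypothetical incident record; RHQ does not expose incidents in this shape.
@dataclass(frozen=True)
class Incident:
    resource: str  # the CI that raised the incident
    kind: str      # e.g. "OutOfMemoryError"


@dataclass(frozen=True)
class Problem:
    resource: str
    kind: str
    occurrences: int


def consolidate(incidents: List[Incident], min_occurrences: int = 3) -> List[Problem]:
    """Group incidents by (CI, kind); any group that recurs at least
    min_occurrences times (an assumed threshold) becomes a Problem."""
    counts = Counter((i.resource, i.kind) for i in incidents)
    return [Problem(resource, kind, n)
            for (resource, kind), n in sorted(counts.items())
            if n >= min_occurrences]
```

Three out-of-memory Incidents on the same server would become one 'insufficient memory' Problem, while a single disk-full Incident elsewhere would stay an isolated Incident.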

Change & Release Management:

*In traditional ITIL implementations, Change Management focuses more on hardware changes, while Release Management focuses more on software changes. Because we've narrowed the 'service' definition to the support of web application processes, the distinction between these two classifications is not very relevant here, as hardware updates are not typically tolerated.*
Any dynamic system requires component (Configuration Item) updates as bug fixes or scheduled version updates of software applications (e.g. database or application servers) occur. Change Management is the planned and controlled update of CIs to minimize service disruptions. In some large deployments, a formal Change Request is created and passed through an approval process before the requested change is allowed to proceed.
RHQ: RHQ does not provide a formal Change Management process at this time. The closest available feature is RHQ's authorization model, which can be configured to restrict create/update/edit/remove access to a subset of authorized users. Until there is more demand for a formal approval process within RHQ, your existing change authorization procedures will have to suffice. Additionally, a new feature, Drift Management, is under development to detect unauthorized or unintentional configuration modifications that differ from an agreed-upon set of Configuration Items. Stay tuned for more updates on this feature.
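As a rough illustration of the idea behind drift detection (not of how RHQ's Drift Management feature is or will be implemented), the following Python sketch compares a current configuration snapshot with an agreed-upon baseline and reports added, removed and changed entries. Representing configuration as a flat key/value dictionary, and the `detect_drift` name, are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical flat snapshot: configuration key -> value
# (e.g. a JVM option or a datasource setting). Not RHQ's drift model.
Snapshot = Dict[str, str]


def detect_drift(baseline: Snapshot, current: Snapshot) -> List[Tuple[str, str, str]]:
    """Compare a current snapshot against an agreed-upon baseline.

    Returns (key, change, detail) tuples sorted by key, where change is
    "added", "removed" or "changed".
    """
    drift = []
    for key in sorted(baseline.keys() | current.keys()):
        if key not in current:
            drift.append((key, "removed", baseline[key]))
        elif key not in baseline:
            drift.append((key, "added", current[key]))
        elif baseline[key] != current[key]:
            drift.append((key, "changed", f"{baseline[key]} -> {current[key]}"))
    return drift
```

An empty result means the CI still matches its baseline; anything else is a candidate unauthorized or unintentional change to review.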

To address deploying specific software versions to JBoss application servers, the RHQ server supports deploying and versioning EARs, WARs, connection factories and datasources on individual JBoss AS instances and, via the RHQ CLI client, on farms of JBoss AS instances. Controlled deployment requires, first, being able to identify the CIs to be updated and, second, automating the installation of application revisions in an orderly fashion. (See the application deployment examples of Change/Release Management)
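The 'orderly fashion' part can be sketched as a rolling deployment: update servers one at a time and stop at the first failure, so the rest of the farm keeps serving the previous revision. This Python sketch is generic; `deploy` is a hypothetical hook standing in for whatever actually performs the installation (for example, an RHQ CLI script), and `rolling_deploy` is not an RHQ API.

```python
from typing import Callable, List, Tuple


def rolling_deploy(servers: List[str], artifact: str,
                   deploy: Callable[[str, str], bool]) -> Tuple[List[str], List[str]]:
    """Deploy artifact to each server in order, stopping at the first failure.

    deploy(server, artifact) is a hypothetical hook returning True on success.
    Returns (updated, remaining); remaining starts with the server whose
    deployment failed and lists every server not yet updated.
    """
    updated: List[str] = []
    for index, server in enumerate(servers):
        if not deploy(server, artifact):
            # Halt so the failed server and the rest of the farm can be investigated.
            return updated, servers[index:]
        updated.append(server)
    return updated, []
```

Stopping at the first failure trades deployment speed for a smaller blast radius, which matches the goal of minimizing service disruptions.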

Service Desk:
Typically, the ITIL Service Desk serves as the single interface that lists and hosts all of the ITIL components and processes while coordinating all activity related to service maintenance. For web application support, those details are primarily around Service Support, but larger ITIL implementations can span much wider activities such as 'Service Delivery' or 'Service Design'.
RHQ: The RHQ UI does not provide an explicit Service Desk interface. The closest approximation is the RHQ UI itself, which lists the CIs, Incidents and Changes that have recently occurred for Resources and Groups of Resources.


While a typical Service Desk UI is not currently available in the RHQ web console, it is interesting how many of the parallel Service Support focus areas the RHQ UI already covers in some form. The most conspicuous missing feature is a formal authorization and approval process for Change/Release Management. This is an interesting feature request that would allow RHQ to support more traditional ITIL customers.

Larger enterprises typically run farms of application servers behind the web applications that support their critical business functions. In many cases, home-grown best practices have evolved to address some of the same concerns that ITIL formalizes. By responding to requests from the RHQ community, RHQ has evolved into an enterprise-class framework for monitoring and addressing many of the concerns central to Service Support.

Perhaps the greatest takeaways from this discussion are the following:

  • ITIL is only a set of best practices. In reality, a fair amount of work and integration is required to apply those best practices to your specific use case. RHQ is a great start.
  • RHQ is an extensible management framework that already has many tools to address web application support. Much of RHQ's functionality was born out of enterprises attempting to address web application support, so a wide array of tools is already available.
  • This was only a taste of the functionality available via RHQ. RHQ has a CLI integration interface, and greater Change Management support is evolving with RHQ Bundles, which were not discussed here.
  • Mix and match as it applies to your use case. Most Service Desk implementations struggle to integrate seamlessly with existing application maintenance components or to quietly get out of the way when not needed. Feel free to use what works for you.
  • Good Service Support is hard to do. Use industry best practices where applicable to avoid reinventing the wheel.

Further Reading:
  1. ITIL (official site) http://www.itil-officialsite.com/AboutITIL/WhatisITIL.aspx
  2. ITIL (Wikipedia) http://en.wikipedia.org/wiki/Information_Technology_Infrastructure_Library
  3. ITSM (Wikipedia) http://en.wikipedia.org/wiki/IT_Service_Management
  4. The RHQ project http://www.rhq-project.org/display/RHQ/Home
  5. RHQ CLI http://www.rhq-project.org/display/JOPR2/Running+the+RHQ+CLI
  6. RHQ CLI Group Deployments http://community.jboss.org/wiki/JON23ScriptedGroupDeploymentsUsingTheCLIAPI
  7. RHQ Drift Management http://rhq-project.org/display/RHQ/Drift+Management

Wednesday, June 1, 2011

RHQ Summary>Overview pages rewritten as more customizable portlets

The old JSF Resource Summary>Overview pages offered a decent snapshot of the current state of the resource, summarizing recent data for:
Metrics, Configuration Updates, Alerts, Operations, Out of Bound Metrics, Event Counts, and Package History.


This approach had a few known restrictions:

  • Fixed time frames. In each case, reasonable default time frames were hard-coded as a best guess at what was relevant to the current state of your Resource. For example, event counts were listed for the past 24 hours only.
  • Static screen real estate. All of the Overview regions were always shown. Even if you never defined alerts for a given resource, a 'No recent alerts' message was always displayed, and you simply had to give up that screen real estate.
  • All-or-nothing refreshes. If you were watching your Resource>Summary page and were only interested in updates to the Recent Measurements region, you had to set a refresh interval for the whole page, which then re-requested the data for all the other Recent* regions just to show timely changes to the one region you cared about.

With the RHQ 4.0.0 release, most of the UI has been completely rewritten using SmartGWT. In addition:

  • The entire Activity page is now composed of portlets, one for each recent data area, that can be individually removed, resized, repositioned or refreshed depending on what information you consider important for that resource or group.
  • Each Activity portlet also has its own configuration section where you can set i) the time frame for data collection, ii) the number of elements to display, and iii) various other data filtering options (e.g. show only High priority alerts).
  • The RHQ 3.0 Resource Summary>Overview pages have been renamed Resource Summary>Activity.
  • Matching Group Summary>Activity pages have been added to show Recent* data as it applies to a group.
  • For Platform resources and groups, there is also a 'Bundle Deployments' portlet showing relevant bundle updates.
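To illustrate what per-portlet settings buy you, here is a minimal Python sketch of one alerts portlet applying its own time frame, item limit and priority filter. The `Alert`, `PortletConfig` and `render_alerts` names and the Low/Medium/High priority ordering are assumptions for illustration; this is not the SmartGWT portlet code used by RHQ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumed priority ordering, used only for this illustration.
PRIORITY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class Alert:
    fired_at: datetime
    priority: str  # "Low", "Medium" or "High"
    name: str


@dataclass
class PortletConfig:
    time_frame: timedelta       # i) how far back to look
    max_items: int              # ii) how many rows to display
    min_priority: str = "Low"   # iii) e.g. "High" shows only High priority alerts


def render_alerts(alerts: List[Alert], config: PortletConfig, now: datetime) -> List[Alert]:
    """Apply one portlet's own settings; other portlets are unaffected."""
    cutoff = now - config.time_frame
    floor = PRIORITY_ORDER[config.min_priority]
    visible = [a for a in alerts
               if a.fired_at >= cutoff and PRIORITY_ORDER[a.priority] >= floor]
    visible.sort(key=lambda a: a.fired_at, reverse=True)  # newest first
    return visible[:config.max_items]
```

Because each portlet holds its own configuration, changing one portlet's time frame or refreshing it re-filters only that portlet's data, which is exactly what the old all-or-nothing Overview page could not do.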

Because these summary regions have been reimplemented as portlets, you can customize whether each region appears, which types and how much data it displays, and how often that data is refreshed. These are cool enhancements that I think users will enjoy. As data continues to pile up for everything we need to manage, it's nice to be able to customize your 'dashboard' pages to show only what you know is relevant to your use case.