Session and Clustered Java Web Apps
Session can be a headache to work with in Java web applications. For that reason, most developers now use MVC frameworks, such as JavaServer Faces, that hide the use of session and let you work with simple Java beans and configuration instead. That's not always an option, though, especially where you have plain servlets to maintain. With servlets, you have the power to put objects into and take objects out of session yourself. On its own, this presents thread-safety issues, and manually managing session in a clustered application presents even more problems.
There is a jump that has to be made when moving from a single-server environment to a clustered one. In a clustered environment, session management becomes complex because session has to be shared by all the servers in the cluster. This sharing of session becomes an object distribution strategy, which is an advanced topic you should familiarize yourself with before clustering your application. A high-end application server (e.g., IBM WebSphere) will handle most of this complexity for you, but you have to educate yourself on the myriad configuration options and their effect on your application.
In my recent excursion into running an existing Java web app in a clustered environment, I found that there was one major stumbling block from my programmer's point of view: understanding the mechanism for distributing session objects between servers in a cluster. If you're about to do the same for your application, the first thing to understand is that distribution of session objects is done through Java object serialization. In fact, that method is prescribed in the Servlet specification, in section "SRV.7.7.2 Distributed Environments". How application servers actually manage that so session stays consistent is left up to the vendors and is one way in which they can compete.
It is important to understand that object serialization is the mechanism for sharing session. To start, it means that every single object that gets put into session has to do two things: 1.) it has to implement the interface java.io.Serializable and 2.) it has to actually be serializable. An object is not really serializable unless its entire object graph is also serializable. So think about this: if you didn't design your application with this requirement in mind, you could be in serious trouble when moving to a cluster. At the very least, you will have to identify all objects you put into session in your servlets and modify them to implement java.io.Serializable (a marker interface, so there are no methods you have to implement). At worst, you will have to write custom serialization code for objects that don't serialize naturally. The transient keyword can help with this by marking fields that should be skipped during serialization.
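To make the two requirements concrete, here is a minimal sketch that pushes objects through plain Java serialization, which is roughly what I assume a container does when it ships session to another server. All of the class names (UserProfile, BrokenProfile, ReportCache) are made up for illustration, and the in-memory round trip is only a stand-in for whatever replication your app server really performs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SessionSerializationSketch {

    /** Hypothetical helper that is NOT Serializable (think: a cache or a connection). */
    public static class ReportCache {
        public final StringBuilder buffer = new StringBuilder("cached report");
    }

    /** Serializes cleanly because the non-serializable field is marked transient. */
    public static class UserProfile implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String userName;
        private transient ReportCache cache = new ReportCache();

        public UserProfile(String userName) { this.userName = userName; }
        public String getUserName() { return userName; }
        public boolean isCacheLoaded() { return cache != null; }

        /** Rebuilds the transient field on demand, since it is null after deserialization. */
        public ReportCache getCache() {
            if (cache == null) {
                cache = new ReportCache();
            }
            return cache;
        }
    }

    /** Implements Serializable, but its object graph contains a ReportCache, so it fails. */
    public static class BrokenProfile implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ReportCache cache = new ReportCache();
    }

    /** Writes the object to a byte array using standard Java serialization. */
    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    /** Serializes and then deserializes, simulating a hop to another server. */
    public static Object roundTrip(Object o) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialize(o)))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns false if any object in the graph is not serializable. */
    public static boolean isActuallySerializable(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UserProfile copy = (UserProfile) roundTrip(new UserProfile("alice"));
        System.out.println("userName survived: " + copy.getUserName());
        System.out.println("transient cache restored: " + copy.isCacheLoaded());
        System.out.println("BrokenProfile serializable: "
                + isActuallySerializable(new BrokenProfile()));
    }
}
```

The copy keeps its userName, but the transient cache comes back as null because field initializers are not re-run during deserialization, which is why getCache() rebuilds it lazily. BrokenProfile compiles fine and only fails at runtime with a NotSerializableException, so a check like isActuallySerializable in a unit test is one way to find these objects before the cluster does.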
The bright side to this discussion is that there is ample documentation out there on the topic, although it is spread out. Below I have annotated a few sources I found very helpful when I started researching this topic. I am working with an IBM WebSphere cluster, so most of the documentation is from IBM, but most of the advice should apply to any application server. Moving to a clustered environment is probably harder work than you realized, so you'll be well served (and appreciated by your boss) if you check out these resources before making the jump.
I’m still actively researching this topic myself. As such, I’ll keep updating this post as I learn more (read: I might be wrong on a detail or two :)).
Best practices for using HTTP sessions
This is IBM’s advice on the best way to handle various aspects of session. It’s broadly applicable advice that, honestly, I wish I had seen about 5 years ago when I first started working with web apps 😦
Java Theory and Practice: State Replication in the Web Tier
This is an older article by Brian Goetz that contains similar knowledge to the previous link.
WebSphere Application Server V7 Administration and Configuration Guide
Overall, this book is very WebSphere specific. However, Chapter 12 does describe how WebSphere handles distributed session, and it’s very enlightening to read. At a guess, I’d say that other app servers do things similarly. Chapter 12 actually has a good bit of general discussion, much more than it has explanation of which buttons to click to set things up, so it’s worth the time to skim.
Designing and Coding Applications for Performance and Scalability in WebSphere Application Server
This book is far less specific to WebSphere than it sounds. Chapter 3 “General coding considerations” is a fantastic overview of how to deal with topics such as garbage collection, synchronization and database access. Most of the other chapters deal with issues that are similarly applicable to any application server, not just WebSphere.
The big bonus for downloading this Redbook is that it has a section called “Cluster considerations”. This section walks you through all of the things you should consider when you want to run your application on a cluster of servers.
Serializing Access to Session
This link explains how to serialize access to objects in session for IBM WebSphere application server 7.0. I threw this link in to point out that session is basically a shared memory that multiple threads have access to concurrently. This is true whether you are running on a single server or a cluster of servers. This document is specific to WebSphere, but I’m guessing the other commercial app servers have similar options.
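To illustrate the shared-memory point, here is a small sketch of two ways to bump a counter kept in session. FakeSession is a made-up stand-in for HttpSession so the example runs without a servlet container, and the "hits" attribute is likewise hypothetical. Locking on the session object works here because there is exactly one FakeSession instance; which object to lock on in a real container is a separate question, and the container-level setting described in the link above is another way to get the same effect.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SessionConcurrencySketch {

    /** Hypothetical stand-in for HttpSession: just a thread-safe attribute map. */
    public static final class FakeSession {
        private final Map<String, Object> attributes = new ConcurrentHashMap<>();
        public Object getAttribute(String name) { return attributes.get(name); }
        public void setAttribute(String name, Object value) { attributes.put(name, value); }
    }

    /** Unsafe: two threads can read the same value and one update gets lost. */
    public static void unsafeIncrement(FakeSession session) {
        Integer hits = (Integer) session.getAttribute("hits");
        session.setAttribute("hits", hits == null ? 1 : hits + 1);
    }

    /** Safer: the read-modify-write happens as one unit under a lock. */
    public static void safeIncrement(FakeSession session) {
        synchronized (session) {
            Integer hits = (Integer) session.getAttribute("hits");
            session.setAttribute("hits", hits == null ? 1 : hits + 1);
        }
    }

    /** Simulates concurrent requests hitting the same session; returns the final count. */
    public static int runConcurrently(FakeSession session, int threads,
                                      int perThread, boolean safe) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (safe) {
                        safeIncrement(session);
                    } else {
                        unsafeIncrement(session);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Integer hits = (Integer) session.getAttribute("hits");
        return hits == null ? 0 : hits;
    }

    public static void main(String[] args) {
        System.out.println("unsafe total: "
                + runConcurrently(new FakeSession(), 8, 10_000, false));
        System.out.println("safe total:   "
                + runConcurrently(new FakeSession(), 8, 10_000, true));
    }
}
```

The safe run always ends at threads times perThread. The unsafe run may come up short by a different amount each time, even though the underlying map is itself thread-safe, because the problem is the gap between getAttribute and setAttribute rather than either call on its own.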