Narayana team blog: February 2009

Sunday, February 15, 2009

Lessons learned and magic numbers.

One of the things you do when you start developing software, especially in a new or cutting-edge area, is believe you know what's best for your users and try to hide any complexities from them. That's one of the reasons we pushed RPC as the best way in which to develop distributed applications in the 1980's. Well back when we started developing Arjuna we took hiding complexity to heart, since simplifying the development of transactional applications was core to all of our PhDs. And I think we did a very good job with the initial releases.

However, once we gave the system to people (the source was made freely available by FTP in 1991, but various industry sponsors were using it before then) the feedback we got was very interesting: users will always do things you didn't expect and demand flexibility in what you produce.

In Arjuna this didn't impact the interfaces users saw but many of the internal development parameters that we had used, thinking that they would never need to be configurable (statically or dynamically). For example, when making a remote invocation on an object failures can occur (the network could partition, the machine hosting the object could crash, etc.) In the absence of a perfect failure detector you use failure suspicion and timeouts. Basically if a response does not come back from the object in time T then you assume the object has failed and act accordingly. But if you get the timeout value wrong then you can make the wrong decision with associated consequences. The timeout value really needs to take into account how long work might take to execute, how overloaded a machine may be, network congestion etc. So one value is rarely right for every application. But we didn't take that into account initially and hard-coded a magic number into the system.

There were other examples of magic numbers, including: the number of RPC retransmissions to use before assuming that a request (or response) cannot get through to the endpoint (why should 5 be any better than 2 or worse than 10?); the number of clustered name server instances; the object store location; the orphan detection period; the lock-conflict detection timeout etc. These were all things we believed had the best (sensible) values, but with hindsight it was clear that it was all based on limited deployment knowledge.

What this all lead to quite quickly was a methodology of expect the unexpected and develop accordingly as far as your users are concerned. We made the system extremely flexible and configurable, where many of the old magic numbers could be overridden either as the system ran or during deployment, with sensible defaults (trying to hit Pareto's 80/20 principle). Those magic numbers we didn't make configurable were clearly documented (both so we could remember as much as users could determine why something was behaving the way it was).

From the feedback we received over the past 20 years I think we managed to come close to the right set of compromises. It's true that most users are happy with the default values we set (which were/are revised based on feedback). But those smaller users who really do need the ability to change things now (or since 20 years ago) have the ability to modify them without rebuilding the system. This has been important in the systems adoption and it's a lesson that we continue to apply. So if you're developing software, beware of using too many magic numbers and if you don't make them configurable you need to understand (and believe) why that's the case and, importantly, document them just in case!

Monday, February 9, 2009

JBossTS and Blacktie: the only combination you'll need!

[Cue advertizing jingle.]

Are you a major transaction user, or do you know someone who is? Are you worried about your legacy transaction infrastructure because the vendor may not be reliable? Do you have sleepless nights wondering how to drag your transactional applications into the modern era? Or are just concerned about the ever escalating costs for the transaction system you're currently tied into? Well worry no more: what you need is Blacktie from JBoss, the company that brought you those other useful gadgets like JBossAS, the JBoss T-shirt, the JBoss thong (picture no longer available!), and the JBoss World Virtual Conference.

Unlike offerings from other vendors, Blacktie is entirely open source. It comes in a convenient source or binary bundle for easy deployment to tackle those stubborn legacy applications. Blacktie also builds upon other mature products from your favourite JBoss vendor so it fits nicely into your existing investments from them. And if this is your first entry into open source and JBoss, then it's a great way to start. Blacktie will be the only legacy transaction implementation you'll ever need to solve those annoying problems at a price you'll love!

And coming soon iBlacktie: because everything needs a little i magic!

[Cue advertizing jingle.]

Go on. Check it out. You know you want to!

Tuesday, February 3, 2009

Stay away from pseudo-transactions!

I tried to stay clear of commenting on this article, but while participating in tonight's WS-RA meeting I let my defenses down! In short, the article can be summarized by: "How to use non-transactional resources in a transaction when you can't be bothered to do it right in the first place." Or "How I learned to break transactional semantics and put my data consistency on the line."

I'm fairly sure the authors are trying to help their audience, but they really aren't. They ignore critical problems with their proposal, either deliberately or because they simply haven't given the problem space enough thought. What they are trying to do is emulate the last-resource commit optimization within the application, but whereas a transaction manager will do this and provide support for crash failures, the article ignores that completely. There is no reference to failure recovery at all.

The article also appears to assume that because a datasource is managed by an XAResource it will always honor the business logic agreement when the transaction commits, e.g., if the table update succeeded through the session instance then prepare/commit will work later. I hate to break it to the authors, but that isn't always the case. This works in a transaction manager because we manage the resource ordering and durability very carefully to cope with failures, i.e., there are good reasons we don't require the application programmer to do this!

Oh and the way in which multiple non-transactional resources are managed in the transaction just scares the %$&* out of me! Look, these are resources that do their work when told to and you can't undo that (if you could, then wrap the &%*& things in an XAResource!) So if you crash part way through the normal flow of execution, or part way through the "rollback", what is the state of the application? How do you find out what happened to whom and when? Where's the log?! (I would hate to be a systems administrator in this situation when the sh*t hits the fan!)

The authors say that "Although this is not as robust and comprehensive as a truly transactional interface, it can provide excellent coverage at a fraction of the development cost of a JTA compliant interface." Unfortunately it does not. That's a bit like saying a car with worn brakes is roadworthy in all situations, when it clearly isn't! Ask yourself this: what happens to my data in the cases that aren't covered by this approach and can I really afford to lose it or spend the new hours/days/weeks repairing it manually? If the answer is yes, then you probably don't want to use transactions at all. If the answer is no, then stay away from this approach and go with a transaction system and transactional resources.

In conclusion: use transactions correctly and if you can't make your data items transactional then try using compensating transactions. At least you get logging and recovery!

Building transactional applications

I've been thinking about how we develop transactional applications for a very long time. That thought process got interrupted for a bit, but I came back to it over Christmas and recent events and articles have pushed this further. Plus we're working on transactional aspects for Drools 5 which has covered writing XAResources. Back in HP we had a tutorial on how to do this, which I think we're going to dust off and maybe Jonathan and the team will write some articles around that topic. (I know it's the subject of an update to the book, whenever that happens!)

But it got me thinking that as soon as people start asking about how they can write XAResources to make their data transactional then we really haven't progressed as an industry. XA is a good standard. But it's not perfect. (No standard is!) Plus it was designed with very specific database use cases in mind, which were fine 30 years ago but aren't always a perfect fit for the 21st century. The question should be "how do I make my data transactional?" and leave the implementation specifics to the engine or container.

We spent a great deal of time over the last 20 years making the development of transactional applications easier (HP was using Arjuna 20 years ago and liking it). Some PhD students wrote an entire database system based on it too. Simplicity, flexibility and power were keys to its success (nested transactions rule!) Back then there was little differentiation between the transaction engine and the way in which you developed applications. Subsequently we emphasized that they were different (Arjuna became known as the engine, while Transactional Objects for Java was the framework for developing applications).

However, we've not concentrated on the latter in recent years, which is a shame and something that we're going to remedy, because it's a proven way of developing flexible transactional applications. I think we'll throw in annotations and try to make it slightly less invasive than it was (back in the 1980's there was no such thing as Java let alone annotations). Hopefully we'll then be able to offer this to developers building on JBossTS and they can move away from asking "How do I write an XAResource?" to "What do my users want from my application/framework/service/..?" Leave the transaction complexity to JBossTS.

Side-note: while writing this I was reminded that we have a student project on some of these ideas, though not necessarily tied to TOJ. So if you are a student and looking for something to do, now may be the time to give it a go.