Like many application developers, we use RoboGuice to make a lot of things nicer and easier to deal with under Android.  One of the nice bits we use is SafeAsyncTask, which gives you quite a nice way of running a background task with convenient hooks for dealing with success, failure, pre-execution and finally/post-execution.

SafeAsyncTask in turn wants to run your onSuccess, onException, and onFinally via a Handler you provide.  This is done for you under the hood, but the gist of it is that it posts a Callable to that Handler, wrapped so that a CountDownLatch is decremented in a finally block; meanwhile, the calling method waits on the latch (so it blocks until the callback has completed on the Handler's thread).
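The post-and-wait pattern described above can be sketched in plain Java. This is a minimal sketch, not RoboGuice's actual code: a single-thread ExecutorService stands in for the Handler (an assumption, since android.os.Handler isn't available off-device), and LatchPostDemo and postAndWait are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class LatchPostDemo {
    // Hypothetical helper: hands 'work' to another thread (standing in for a Handler)
    // and blocks the caller until that work has finished.
    static <T> T postAndWait(ExecutorService handlerStandIn, Callable<T> work) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();
        handlerStandIn.execute(() -> {
            try {
                result.set(work.call());
            } catch (Exception e) {
                failure.set(e);
            } finally {
                latch.countDown(); // released even if the work throws
            }
        });
        latch.await(); // the calling method waits here
        if (failure.get() != null) {
            throw failure.get();
        }
        return result.get();
    }

    // Reports whether the posted work ran on the caller's thread or on the stand-in handler thread.
    public static String demo() {
        ExecutorService handlerThread = Executors.newSingleThreadExecutor();
        try {
            String caller = Thread.currentThread().getName();
            String worker = postAndWait(handlerThread, () -> Thread.currentThread().getName());
            return caller.equals(worker) ? "same thread" : "handler thread";
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            handlerThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // handler thread
    }
}
```

Because the countDown() sits in a finally block, the caller is released even when the posted work throws, which is what makes blocking on the latch safe.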

In our Activity, we have a polling task that we want to run every 2 seconds, and we achieve this by starting a HandlerThread, attaching a Handler to it, and using sendMessageDelayed with a simple Message carrying a constant int that indicates what we want done.  Then in our handleMessage method, we fire off a subclass of SafeAsyncTask to do the poll, passing in the polling Handler and the Message we got.  Inside the SafeAsyncTask's onFinally(), we recycle our Message by calling Message.obtain on the old message, and then we invoke handler.sendMessageDelayed with the new (recycled) message instance.

But this did not give us in any way the behavior we expected or wanted.  Instead of seeing this sequence of events:

onFinally() [two second pause]
handleMessage()
onFinally() [two second pause]
handleMessage()

we in fact saw THIS sequence of events:

onFinally() [two second pause]
onFinally() [from the handler thread this time, two second pause]
handleMessage()

It appears that although Message.obtain should have left us with a clean Message that was, as far as we were concerned, indistinguishable from the original, it in fact gave us a Message carrying a callback that runs the current onFinally() method.  That callback seems to have come from the same Message that SafeAsyncTask had used to run onFinally() on the Handler, when it really should have been nulled.

I'm somewhat inclined to call this a defect in Android, but I would need to research the semantics of how Message.obtain(message) is expected to work, look at the Android source in a little more detail, and see precisely what went wrong inside the obtained message to fire onFinally() instead of handleMessage() via normal message delivery before I would be willing to make that claim strongly.

Our workaround was to instead call handler.obtainMessage(message.what), which does give us a nice, clean message that doesn't attempt to call back to onFinally() again, and it results in ordinary message delivery as we expected.
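As far as we can tell from the Android source, two behaviors are enough to produce the symptom we saw: Message.obtain(Message) copies every field of the original, callback included, and Handler.dispatchMessage() runs a message's callback instead of calling handleMessage() whenever one is set. The plain-Java model below is only a sketch: Msg and its helper methods are hypothetical stand-ins for android.os.Message and Handler, and it simply assumes (as our symptoms suggest) that the Message we held had picked up SafeAsyncTask's callback.

```java
import java.util.ArrayList;
import java.util.List;

public class MessageCopyDemo {
    // Hypothetical stand-in for android.os.Message, reduced to the two relevant fields.
    static final class Msg {
        int what;
        Runnable callback;
    }

    // Models Message.obtain(Message orig): every field is copied, callback included.
    static Msg obtainCopy(Msg orig) {
        Msg m = new Msg();
        m.what = orig.what;
        m.callback = orig.callback;
        return m;
    }

    // Models Handler.obtainMessage(int what): a fresh message with no callback.
    static Msg obtainMessage(int what) {
        Msg m = new Msg();
        m.what = what;
        return m;
    }

    // Models Handler.dispatchMessage(): a non-null callback wins over handleMessage().
    static void dispatch(Msg m, List<String> log) {
        if (m.callback != null) {
            m.callback.run();
        } else {
            log.add("handleMessage(" + m.what + ")");
        }
    }

    // Records which path the next delivery takes: copy of the held message, or a fresh one.
    public static List<String> deliver(boolean freshMessage) {
        List<String> log = new ArrayList<>();
        Msg held = obtainMessage(1); // 1 is a hypothetical POLL constant
        held.callback = () -> log.add("onFinally"); // assumption: the held instance carries this callback
        Msg next = freshMessage ? obtainMessage(held.what) : obtainCopy(held);
        dispatch(next, log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(deliver(false)); // [onFinally]
        System.out.println(deliver(true));  // [handleMessage(1)]
    }
}
```

Building the next message from just the int, as handler.obtainMessage(message.what) does, sidesteps whatever else the old instance may be carrying.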

The Perils of Legacy Android Development

One of the least awesome things to do is to make last-minute changes to an ecosystem that is known to be working when installing on a customer site.  Nonetheless, sometimes you're stuck and you have to scramble.  This was the case for us yesterday when we needed an inexpensive Android tablet to use at a customer site.  So we acquired a Samsung Galaxy Tab 3 "Lite".  This is a fairly non-spectacular, dated tablet, which runs Android 4.2.2, and probably always will.  Samsung appears to have abandoned support for this device, so it is not likely to graduate from Jelly Bean to KitKat without rooting the device and installing a custom ROM.

Still, our app isn't horribly fancy, so we anticipated that things would probably work fine.  We were mistaken.

Even installation from the Play Store failed.  We traced this, most likely, to our having specified a signing algorithm of SHA256withRSA.  This works on later versions of Android (or maybe just on other devices) but not on the tablet we picked up; the whole app downloads, and then the installation fails with an error about the package file not being signed correctly.  The fix here is to specify -sigalg SHA1withRSA instead.

The next problem we encountered (after manually installing the app) was in annotation processing from within Jackson.  We share our model objects between our cloud REST service and our Android application.  Android makes using the XML annotations that Jackson handles somewhat of a pain; these annotations are in the javax.xml.bind.annotation package, which Android itself does not supply.  Further, dexing refuses to include this package unless you specify the --core-library switch... which Google itself cautions against in the strongest possible terms.

The curious thing is that this never seemed to matter on Android 4.4 and later; although we did nothing to strip out annotations, our app always ran fine, apart from an occasional exception the first time Jackson scanned for annotations.  On 4.2, we got the exception every time, which prevented deserialization from working at all.  It isn't clear why it works on 4.4 and later but not on 4.3 and earlier, though there are some indications that earlier versions of Android were a little more aggressive about loading annotation classes.

Fortunately, Jackson > 1.8 provides the means to disable annotation scanning entirely, sidestepping this whole JAXB binding issue.  For Jackson 2.4, it's just a matter of calling objectMapper.disable(MapperFeature.USE_ANNOTATIONS).  Of course this means that Jackson ignores annotations altogether, so your JAXB annotations (and Jackson's own) won't do anything, but in our case that was acceptable, because we are only deserializing, and Jackson is fairly intelligent about its default deserialization behavior in the absence of annotations.

Speaking of annotations, the next problem was a curious error coming out of Joda Time, again caused by an annotation, org.joda.convert.ToString.  This appears to be an annotation that Joda uses internally on its AbstractInstant, and again, it was something that we never had any problems handling on 4.4 and above.  We got a NoClassDefFoundError for the annotation class when deserializing an object using a DateTime type, giving some credence to our weak hypothesis about Android 4.2 being somewhat more aggressive about loading annotation classes than later versions of Android (and the Sun and OpenJDK JVMs as well).  The fix in this case was to add the dependency on joda-convert, which contains the classes needed to resolve those annotations.

Once we got past these three problems, we hit the dreaded NetworkOnMainThreadException.  In response to certain events, we post messages to RoboGuice's event bus, and it turns out that these events may be received on the UI thread, at least in Android 4.2.2.  (It's also possible that they can be received on the UI thread in later versions of Android as well, but performing a network task from the UI thread is not a fatal exception that kills your app in later Android releases.)  In our case, when Preferences change, we want to refresh some data from our server with a REST GET request, and on 4.2.2 that request killed our app.  The easiest workaround for us is to publish the event to the eventBus from inside a RoboGuice SafeAsyncTask, thereby ensuring it is never received on the UI thread.  (A possibly better solution would be for the entity that subscribes to the event to perform its work inside a SafeAsyncTask, but this proved to be far more work than we wanted to do in the near term.)
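The workaround relies on the bus delivering events on the thread that fires them. Here's a minimal plain-Java sketch of that idea; SyncEventBus is a hypothetical stand-in rather than RoboGuice's event manager, the caller's thread plays the role of the UI thread, and we're assuming synchronous, same-thread delivery, which is consistent with what we observed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class BackgroundPublishDemo {
    // Hypothetical synchronous bus: observers run on whichever thread calls fire().
    static final class SyncEventBus {
        private final List<Consumer<String>> observers = new CopyOnWriteArrayList<>();

        void register(Consumer<String> observer) {
            observers.add(observer);
        }

        void fire(String event) {
            for (Consumer<String> observer : observers) {
                observer.accept(event);
            }
        }
    }

    // Reports whether the observer ran on the caller's thread (our stand-in UI thread).
    public static boolean observerRanOnCallerThread(boolean publishInBackground) {
        SyncEventBus bus = new SyncEventBus();
        String[] observedOn = new String[1];
        bus.register(event -> observedOn[0] = Thread.currentThread().getName());
        String caller = Thread.currentThread().getName();
        if (!publishInBackground) {
            bus.fire("preferencesChanged");
        } else {
            ExecutorService background = Executors.newSingleThreadExecutor();
            try {
                // get() waits only so this demo can inspect the result afterwards.
                background.submit(() -> bus.fire("preferencesChanged")).get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                background.shutdown();
            }
        }
        return caller.equals(observedOn[0]);
    }

    public static void main(String[] args) {
        System.out.println(observerRanOnCallerThread(false)); // true
        System.out.println(observerRanOnCallerThread(true));  // false
    }
}
```

The subscriber code doesn't change at all; moving the fire() call onto a background thread is what keeps its network work off the caller's thread.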

Then, just when everything appeared to finally be working, one last problem snuck up on us.  It seems that Android 4.2.2 has some issues with HTTP(S) Keep-Alive management.  Our app makes frequent GET requests, polling the server for new data (eventually we'll switch to a more event-driven model).  It seems that Jelly Bean isn't terribly inclined to close sockets when it is done with them if keep-alives are turned on, even if those connections are never reused.  The end result is a lot of open sockets that aren't doing anything and aren't needed, which eventually become so numerous that new requests cannot be created, causing an exception to be thrown about too many open files (EMFILE).  We use Jersey to handle our web resources, and working around this problem is a matter of adding webResource.header("Connection","close"), which asks the server NOT to keep the connection alive so the sockets end up getting closed in a timely manner.
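For comparison, here is the same request header set on a plain java.net.HttpURLConnection instead of a Jersey WebResource. This is a swapped-in API, not our Jersey code; ConnectionCloseDemo and the URL are placeholders. The connection is only configured, never opened, so the sketch runs without a network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ConnectionCloseDemo {
    // Prepares (but does not send) a GET that asks the server to close the socket
    // after responding, instead of keeping it alive for reuse.
    public static HttpURLConnection prepareNonPersistentGet(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Connection", "close");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepareNonPersistentGet("http://example.com/poll");
        System.out.println(conn.getRequestProperty("Connection"));
        // In real code: read conn.getInputStream() fully, then close it.
    }
}
```

Either way, the client still has to read and close each response; the header only ensures the server won't hold the socket open waiting for a reuse that never comes.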

After a lot of head-scratching, visits to StackOverflow, and many changes, we have a functional app on Android 4.2.2.  In the end, it was a useful exercise, because it uncovered some abnormalities... so there is probably something to be said for keeping a weird old piece of hardware around for testing; chances are you'll eventually encounter at least one customer who tries to deploy on something less than optimal.

Quite often when multiple components of a system are combined, one can experience unexpected side-effects and failures whose cause seems completely unrelated to the consequence.  I recently experienced a situation in which a bit of configuration in jDeb (a Maven plug-in that assists in the creation of Debian .deb packages) caused an application to stall out completely after about 4 minutes of run-time.

The Backstory

Our app is a Java program that runs from a .jar file, and in production we run it wrapped inside YAJSW (Yet Another Java Service Wrapper).  YAJSW lets us run as a daemon and as a different, non-root user, among other nice features.  We log using slf4j with log4j, and YAJSW also intercepts System.out and System.err to log those messages to its own log file.  The whole thing gets wrapped up in a .deb package using jDeb, which lets us install smoothly, setting up the daemon and updating the rc.d files.

The Failure

After installation, our program started up fine and began doing its customary work.  Success! Almost!  After about four minutes, it stopped working, no longer making any of the API calls we expected, and generally behaving in an unresponsive manner.

The first thing I did was run jstack against the running Java process to see what all the threads were doing.  Almost all the threads in the application were in a BLOCKED state on org.apache.log4j.Category.callAppenders. A quick Google search revealed a lot of complaints about similar behavior, even finding one comment that described exactly what we were seeing: one thread seemed to be stuck in java.io.FileOutputStream.writeBytes forever, and all the other threads waiting for a lock so they could write to the appender too.

So what could be causing a thread to block forever trying to perform a simple write?  Cursory checks for excessive garbage collection or impending disk failure revealed no problems.

Things ran fine when we launched the jar file from the command line as our effective user, simply using sudo -u and java -jar, so the problem seemed to be related in some way to running our application inside YAJSW.

A little more Google searching turned up a YAJSW thread explaining that the wrapper creates several memory-mapped files to which the wrapped process writes instead of writing to stdout and stderr; the wrapping process then 'gobbles' from these memory-mapped files for the purposes of logging.

Aha! Another thing I had noticed was that we didn't seem to get any console messages from our app in the YAJSW log like we would have expected, and a quick look revealed that in fact, no memory-mapped files for stdout and stderr were created in tmp/ for the wrapped app to write to.

So at this point it was starting to look like log4j was trying to write to its redirected System.err or System.out (for the console appender).  Once some buffer filled, further writes blocked while the writing thread still held the appender lock, so every other thread that tried to log blocked too, and none of them could make any further progress.
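The suspected failure mode is easy to reproduce in isolation. This minimal sketch uses a hypothetical class, not log4j's code, and a PipedInputStream that nobody ever reads as a stand-in for the undrained stdout/stderr buffer: the first thread fills the pipe and waits while still holding the appender's lock, and the second thread ends up BLOCKED, just like the threads in our jstack output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BlockedAppenderDemo {
    private final OutputStream out;

    BlockedAppenderDemo(OutputStream out) {
        this.out = out;
    }

    // Like the appender described above, this holds its lock for the whole write.
    synchronized void append(String line) {
        try {
            out.write(line.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the state of a second logging thread once the first one is stuck mid-write.
    public static Thread.State secondLoggerState() {
        try {
            PipedInputStream neverDrained = new PipedInputStream(64); // tiny buffer, no reader
            BlockedAppenderDemo appender = new BlockedAppenderDemo(new PipedOutputStream(neverDrained));

            char[] big = new char[128]; // more than the pipe can hold
            Arrays.fill(big, 'x');
            Thread first = new Thread(() -> appender.append(new String(big)));
            first.setDaemon(true); // let the JVM exit despite the stuck thread
            first.start();
            Thread.sleep(300); // first thread fills the pipe and waits inside append()

            Thread second = new Thread(() -> appender.append("hello"));
            second.setDaemon(true);
            second.start();
            Thread.sleep(300);
            return second.getState(); // BLOCKED on the appender's monitor
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondLoggerState());
    }
}
```

The stuck writer never throws; it simply waits for a reader that will never arrive, which is why nothing ever showed up in the logs.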

Closer examination of the permissions on the tmp/ directory revealed that dpkg had created it with mode 600 instead of mode 700, and creating a new file is not possible on a POSIX system without execute (search) permission on the directory.  Because of these inadequate permissions, the memory-mapped files for System.out and System.err redirection couldn't be created, a failure which YAJSW couldn't log, because those files were needed to set up logging in the first place.
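To make the mode arithmetic concrete, here is a small sketch using java.nio's PosixFilePermission. DirModeCheck and its methods are hypothetical names, and the check only encodes the rule above (the owner needs both write and execute on a directory to create entries in it), ignoring root, ACLs, and the group/other bits.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class DirModeCheck {
    // Permission bits from most significant (owner read, 0400) to least (others execute, 0001).
    private static final PosixFilePermission[] ORDER = {
        PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE, PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE, PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
    };

    // Converts an octal mode such as 0600 into a set of PosixFilePermission values.
    static Set<PosixFilePermission> fromOctal(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < ORDER.length; i++) {
            if ((mode & (1 << (8 - i))) != 0) {
                perms.add(ORDER[i]);
            }
        }
        return perms;
    }

    // Creating a file in a directory needs both write and execute (search) permission on it.
    public static boolean ownerCanCreateFilesIn(int dirMode) {
        Set<PosixFilePermission> p = fromOctal(dirMode);
        return p.contains(PosixFilePermission.OWNER_WRITE) && p.contains(PosixFilePermission.OWNER_EXECUTE);
    }

    // Symbolic form, e.g. "rw-------" for 0600.
    public static String symbolic(int mode) {
        return PosixFilePermissions.toString(fromOctal(mode));
    }

    public static void main(String[] args) {
        System.out.println(symbolic(0600) + " " + ownerCanCreateFilesIn(0600)); // rw------- false
        System.out.println(symbolic(0700) + " " + ownerCanCreateFilesIn(0700)); // rwx------ true
    }
}
```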

The culprit turned out to be my own misunderstanding of how jDeb's permissions work when applying a template data type in a dataSet. The perm mapper allows you to specify a user, group, filemode and dirmode for use when creating directories and files, but when using the template data type, the mode that gets used for creating the paths is not the dirmode, but the filemode. Consequently, the filemode needed to be 700 rather than 600 for the mapper for the template paths.

Conclusion

The chain of failure ended up looking like this:

  1. I misunderstood how jDeb/Debian package permission mappings worked when creating new, empty directories in a .deb package. Intuitively it seemed like dirmode permissions would be used when creating an empty directory, but in fact filemode permissions get used.
  2. This caused dpkg to create the new tmp/ directory with mode 600, excluding execute permission on the directory.
  3. Because of the exclusion of the execute bit, the YAJSW wrapper could not create the memory-mapped files it needed for writing using its overridden System.out and System.err.
  4. Because the files didn't get created, the YAJSW wrapper process could not consume the output of System.out and System.err, and the wrapped process had nowhere to write once whatever internal buffering existed was full.
  5. Because the buffer filled up and writes blocked, one log4j appender blocked waiting to write while inside a synchronized block.
  6. Because the write inside the synchronized block was blocked forever, all of the other threads that wanted to log anything then blocked waiting to enter the synchronized block.
  7. The whole process stopped working because eventually everything was waiting on a lock or waiting for IO to complete.
And that is how a permission mistake in a configuration file can lead to an application locking up after about 4 minutes (the time it took to fill up some buffer somewhere).

PostgreSQL + ActiveMQ = (auto)Vacuum Hell

This is one of those "I had this weird problem that others are likely to encounter as well, so here's a detailed explanation that will hopefully come up in a search engine for those people when they are trying to figure it out" posts.

As most PostgreSQL administrators know, vacuuming is a necessity for the long-term survival of your database.  This is because the PostgreSQL Multi-Version Concurrency Control strategy replaces changed/deleted rows with their new versions, but keeps the old tuples around for as long as some other existing transaction might need to see them.  Eventually these old rows are no longer visible to any transaction, and they can be cleaned up for re-use or discarded completely once an entire page is empty.

This works out really well, provided you have fairly straightforward database access going on: transactions start, stuff happens, transactions complete; repeat.  Things can go radically wrong if the "transactions complete" part doesn't happen, however.

Because PostgreSQL has no idea about what any given transaction might do next, it is not able to vacuum out any dead rows newer than the oldest transaction still in progress.  If you think about it, that makes sense.  Any given transaction could look at an older version of any row that was around at the time that transaction began, in any table in the database.

The symptom of this is that vacuum tells you about a large number of dead tuples, but says that it removed 0 / none.  Doing a "vacuum verbose" tells you there was a large number of "dead row versions" which "cannot be removed yet".  If you turn on autovacuum logging, you'll see messages like this:

Feb 27 02:57:41 slurm postgres[12345]: [3-1] LOG:  automatic vacuum of table "activemq.public.activemq_msgs": index scans: 0
Feb 27 02:57:41 slurm postgres[12345]: [3-2] #011pages: 0 removed, 14002 remain
Feb 27 02:57:41 slurm postgres[12345]: [3-3] #011tuples: 0 removed, 58897 remain

The number remaining will continue increasing forever, and the number removed is always 0.  Like me, you might naively check for transactions running against activemq_msgs, and find none, or find only ones which are short-lived.  And that would be your mistake.  While autovacuum runs per-table, running transactions are per-database.  You may well have a long-running transaction running statements against some other table preventing rows from being removed from the table you're watching.  Again, this is because PostgreSQL cannot predict the future; that long-running transaction might run a query against the table you're watching two seconds from now, and as long as that could happen, those old tuples cannot be removed.
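A deliberately simplified model of that rule, in Java: transaction IDs are plain increasing longs (ignoring wraparound, snapshots, and everything else PostgreSQL actually tracks), and VacuumHorizonModel is a hypothetical name. The key point is that the second argument covers every open transaction in the database, not just those touching the table being vacuumed.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class VacuumHorizonModel {
    // Simplified rule: a dead tuple is removable only if the transaction that deleted it
    // is older than the oldest transaction still open anywhere in the database.
    public static long removable(List<Long> deletingXids, Collection<Long> openXidsInDatabase) {
        long horizon = openXidsInDatabase.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MAX_VALUE); // no open transactions: every dead tuple is removable
        return deletingXids.stream().filter(xid -> xid < horizon).count();
    }

    public static void main(String[] args) {
        List<Long> deletes = Arrays.asList(100L, 200L, 300L); // deletes in activemq_msgs
        System.out.println(removable(deletes, Collections.emptyList()));        // 3
        System.out.println(removable(deletes, Collections.singletonList(50L))); // 0: old lock transaction
        System.out.println(removable(deletes, Arrays.asList(250L, 400L)));      // 2
    }
}
```

The middle case is the ActiveMQ situation: one transaction that began before all the deletes, on a completely different table, pins every dead tuple in place.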

How does this relate to ActiveMQ, you ask?  If you're running ActiveMQ and using PostgreSQL as your backing/persistent store (and you may well have reasons to do this), and you don't do anything to change it, the default failover locking strategy is for the master to acquire a JDBC lock at startup, and hold onto it forever.  This translates into a transaction that starts when ActiveMQ starts, and never completes until ActiveMQ exits.  You can see this in progress from the command line:

activemq=# select xact_start, query from pg_stat_activity where xact_start is not null and datname='activemq';
          xact_start           |                                                query                                                
-------------------------------+-----------------------------------------------------------------------------------------------------
 2014-02-27 01:24:13.677693+00 | UPDATE ACTIVEMQ_LOCK SET TIME = $1 WHERE ID = 1

If you look at the xact_start timestamp, you'll see that this transaction has been open since ActiveMQ started.  You can also see the locks it holds:


activemq=# select n_live_tup, n_dead_tup, relname, relid from pg_stat_user_tables order by n_dead_tup desc;
 n_live_tup | n_dead_tup |    relname    | relid 
------------+------------+---------------+-------
        628 |      58903 | activemq_msgs | 16387
          4 |          0 | activemq_acks | 16398
          1 |          0 | activemq_lock | 16419
activemq=# select locktype, mode from pg_locks where relation = 16419;
 locktype |       mode       
----------+------------------
 relation | RowShareLock
 relation | RowExclusiveLock


Again, as long as this transaction stays open holding the ActiveMQ lock, (auto)vacuum cannot reclaim any tuples that died after it began, in any table in this database.

Fortunately, ActiveMQ has a workable solution for this problem in version 5.7 and later in the form of the Lease Database Locker.  Instead of starting a transaction and blocking forever, the master creates a short-lived transaction and holds its lock just long enough to try to get a leased lock, which it periodically renews (with timing that you specify in the configuration; see the ActiveMQ documentation for an example).  So long as the lock keeps renewing, the slave won't try to take over.  Your failover time, then, depends on the duration of the lease; it won't be nearly instantaneous as it would be with a lock released when ActiveMQ exits (though it could be faster than waiting for a transaction to end after a socket times out due to an unclean exit).
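The lease idea itself can be modeled in a few lines. This is a hypothetical sketch of lease semantics with an explicit clock, not ActiveMQ's Lease Database Locker implementation or configuration; the broker names and the 30-second lease are made up. It shows why failover waits for the lease to lapse.

```java
public class LeaseModel {
    private String holder;
    private long expiresAtMillis;

    // Grants or renews the lease if it is free, already held by 'broker', or has expired.
    public synchronized boolean tryAcquire(String broker, long nowMillis, long leaseMillis) {
        boolean available = holder == null || holder.equals(broker) || nowMillis >= expiresAtMillis;
        if (available) {
            holder = broker;
            expiresAtMillis = nowMillis + leaseMillis;
        }
        return available;
    }

    public static void main(String[] args) {
        LeaseModel lease = new LeaseModel();
        long leaseMillis = 30_000;
        System.out.println(lease.tryAcquire("master", 0, leaseMillis));      // true
        System.out.println(lease.tryAcquire("slave", 10_000, leaseMillis));  // false: lease still valid
        System.out.println(lease.tryAcquire("master", 20_000, leaseMillis)); // true: renewed until 50_000
        // Suppose the master dies now and stops renewing.
        System.out.println(lease.tryAcquire("slave", 40_000, leaseMillis));  // false
        System.out.println(lease.tryAcquire("slave", 50_000, leaseMillis));  // true: takeover after expiry
    }
}
```

Each acquire or renewal is a brief operation rather than a transaction that stays open, which is exactly what lets the vacuum horizon keep moving forward.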

Because the locking transactions come and go, rather than persisting forever, the autovacuum process is able to reap your dead tuples.


So the moral of the story is this: if you're using PostgreSQL as the persistent store for ActiveMQ, make sure you configure the Lease Database Locker in your persistenceAdapter configuration.  Otherwise, PostgreSQL will never be able to vacuum out old tuples and you may suffer performance degradation and a database that bloats in size forever (or until you stop ActiveMQ, run a vacuum, and restart it).

Five More Bloody Signs You Aren't Bloody Agile

This is getting old, isn't it?  I still think I might try to turn this into a book of some kind, though, so we press on.

5. You can't touch some code because someone "owns" it.  Or there's some piece of code that only one person can work on.  Sometimes this can happen because what a unit of code does is so complicated and so specialized that there's only one person on your team capable of understanding it.  But most of the time what is happening is some combination of a person's ego becoming involved, a lack of sufficient unit testing, and insufficient cross-pollination of code modules among the members of the team.  Consider this situation very carefully, because it means your code base now has a single point of potential failure.  If only one person understands it, what happens if that person gets sick or wants to take a vacation?  If someone's ego gets involved, it's almost always to the detriment of the rest of the team. Or are people just afraid to touch it because something might break?  That's the easiest case to deal with: keep adding unit tests until people feel comfortable with that safety net.  Pair Programming can help with the cross-pollination, as can inflating an estimate, biting the bullet, and giving the work to anyone BUT the person who "owns" that code.  Collective code ownership may be a little painful and a little slow in the short term, but in the long run, you'll be glad you did it.

4. Your team lives from one crisis to the next crisis.  Management is freaking out regularly.  Flailing, weeping, wailing, and gnashing of teeth.  What could this possibly have to do with Agile, you ask? Quite a lot: it's an indication that your Agile process has completely broken down or your stakeholders are not playing the game.  If Agile were working properly, why would Management need to freak out?  They'd have a predictable set of deliverables for each iteration and each release.  They'd have a good idea of the team's velocity.  They'd have a decent idea of when things would be "done".  They'd be seeing progress in the form of demos regularly.  If they're completely disengaged and not participating, you're going to need a heart-to-heart.  If they are flailing because your team has given them timelines, and they just don't like what you had to say, then you have a trust issue between your stakeholders and your team, and again, you need to have a heart-to-heart.  If Agile is new to your organization, you need to set expectations accordingly, and make sure everyone understands how the game is played.  Remember, though, out of crisis can arise opportunity: if you can hold your team together and execute cleanly, you can demonstrate how Agile provides exactly the predictability and responsiveness to change whose absence has your reporting chain in an uproar.

3. Your team has more than 6-8 developers.  Agile performs best with small teams who can meet regularly and communicate easily.  If your product is really a product of products, consider breaking teams apart along product boundaries.  You may be tempted to break teams apart across some other boundaries, but try not to do that.  Remember, you're trying to satisfy a product owner; the key word is "product."  How can you do end-to-end iterative development if you've aligned "vertically" instead of "horizontally"?  Give it some thought, because there's a decent amount of overhead in bootstrapping a new team.  Reflect on the values of Agile, and form your teams in such a way that you can satisfy the various roles most effectively and play the game by the rules.

2. Chickens are being pigs.  Pigs are being chickens.  Or to put it another way, someone is regularly stepping outside the boundaries of their role in your meetings.  This could be a product owner trying to change the rules of your Scrum, a Scrum Master trying to influence the implementation of your code, a developer trying to inject a feature into the product that wasn't requested, etc.  It can also take the form of "Some Guy" trying to do anything at all in your meetings.  When this happens it is the responsibility of the Scrum Master to call it out, explain why it isn't allowed, and remind everyone what their roles are. If you're playing soccer out on a pitch, you can't have a handful of players decide they want to play rugby, nor can you have your goalie decide he wants to play forward.  For the game to work, the players must first agree to the rules, and then play by those rules.

1. The rules of your game keep changing, and nobody asked your team.  Someone from outside the team is issuing edicts to the team and expecting them to be obeyed.  Agile teams should be self-organizing, with the aid of the Scrum Master as a servant-leader.  The team negotiates among its members to set things like meeting times and iteration length, and consults with the Product Owner to set things like the number of iterations to a release, release dates, demo agendas, etc.  If someone from outside the team is issuing directives (like "Some Guy"), they are cheating.  This, sadly, is an organizational problem and often an indicator of a lack of trust in, or lack of respect for, the team.  Figuring out why this lack of trust or respect persists is a tough nut to crack, and remedying the problem is even more difficult.

I seriously hope I'm done this time, and that anything else I think of will be a straightforward repetition of one of my previous articles.



Yet Another 5 Signs You Are Not Agile

Boy, these just never get old, do they?  Given the span of time it's started taking me to collect another 5 signs, though, I'm hopeful that I'm running out of material.  Or maybe I'm just being optimistic.

5. Your iteration runs Monday through Friday.  Everyone checks in on Friday.  Test is always one week behind.  Acceptance is always delayed. I think I've seen this particular behavior in every Agile project to which I've ever contributed, and the cause is very simple: your stories (and probably your tasks) are too big.  If you're working on the same story through the entire iteration, either you're making no progress day-to-day, or your story was so big it took the entire week to accomplish.  The former problem can be addressed in daily stand-ups by asking what tasks each person completed on the previous day.  If the answer is "nothing" then you have a blockage or some other problem that needs to be addressed immediately.  The latter problem means you need to spend more time in story decomposition and task breakdown.

4. Something changes. Chaos ensues as everyone tries to hide the change.  Agile, when properly applied, excels at coping with change.  If you've kept your iterations short, decomposed your stories, broken down your tasks, considered your velocity in planning, etc., you can estimate the impact of whatever changed and adjust accordingly.  We've all had a key team member fall ill, business priorities change, last-minute items arrive from security auditors, etc.  These things will impact your velocity. And that's okay!  What you don't want to do is to try to hide the impact of the change, treating your velocity like it's a stick with which to beat people, pulling out all the stops to try to "meet your numbers" for the iteration.  That's not what velocity is for; rather, the impact of many small changes over time will reduce your average velocity slightly, and this will ultimately aid you in planning.

3. Rather than decomposing stories, your team lengthens its iterations.  Key to being Agile is having frequent opportunities for adjustment and collecting frequent data points about team velocity. Consider this: if you need four data points to start to get a good feel for your team velocity, with one-week iterations, that takes you a month.  With two-week iterations, it takes two months.  With three-week iterations, it takes three months just to figure out what your velocity is.  It's like trying to learn to pilot a jet ski, fishing boat, or cruise ship.  Moreover, when adjustments are needed, you're trying to steer a jet ski, fishing boat, or cruise ship.  Granted, a jet ski isn't the appropriate watercraft for every situation, but if you're planning to change to a fishing boat or a cruise ship just because you're unwilling to leave some of your gear on the shore between trips, you're not making a change for a good reason.

2. Related to #3, you survive many planning meetings without ever decomposing a single user story. Maybe it's possible that you entered all your planning meetings with all your user stories perfectly decomposed into chunks that can be delivered end-to-end inside of one iteration, but it's not particularly likely to happen iteration after iteration (unless your product owner is exceedingly well-trained).  Odds are your stories are going in too big, and you're about to experience #5 from this list.  You need to have a discussion about the right size for a user story, the difference between user stories and tasks, and how to write user stories correctly.  You may need to allow extra time for planning for several iterations until the right way becomes a healthy habit, and the wrong way starts to feel weird.

1. Your team throws out Agile practices willy-nilly whenever they become more than mildly inconvenient, yet still expects to reap the benefits of Agile.  I'm the last person to say that teams must absolutely adhere to Agile dogma (though one could get that impression based on my blog postings), but when you're making tweaks and changes to your process, you really must think carefully about what you're changing, why you're changing it, and what the potential unintended (and direct) consequences of your change may be.  If you choose three-week iterations, it's going to take months to establish a velocity.  If you don't track or use your velocity in planning, your estimates aren't going to be very useful.  Be very cautious here: it's easy to fall into the trap of the "compressed waterfall."  And don't complain that Agile failed when you've thrown the baby out with the bathwater, adopting a process that looks nothing like Agile (but still calling it "Agile").

Bonus: Nobody from the team ever acts as a dissenting voice in your commitments, retrospectives, or estimation exercises.  Nothing says "just get me out of this meeting" like everyone giving a "five" when you call for a "fist of five" to commit for the iteration.  Similarly, nothing says "I don't really care about the size of this story" like a handful of people giving a different estimate, yet automatically giving in to the more-common estimate without any discussion.  Either your team is getting burned out, they don't understand the process, or they have no incentive to care.  Take the time to really understand the underlying problem here, rather than just having a knee-jerk reaction to it.  You may find valuable insight.

Like most things in life, what you get out of Agile is directly proportional to what you put into it.  Know your Agile practices, but also know why the practices are what they are, and understand the consequences of abandoning, changing, or tweaking the practices in your organization.  Make sure these consequences are clearly communicated to your stakeholders, too!

Travelocity, Why Are You Terrible?

Quite often when I say I will never do business with a particular company ever again, it's because they have engaged in some socially egregious behaviour rather than directly affecting me personally.  This is not the case with Travelocity.  I hope I never have to deal with them again, because they are awful.  Calling them "incompetent" would be an insult to the incompetent.

My partner and I spent hours planning out and booking a fairly complicated itinerary for what is effectively a honeymoon in Asia for us.  It is a plan that involves stays in three cities, and transport using three different airlines.  We booked the majority of this via Travelocity's site.  It went reasonably well once we had figured out our flight choices (which we did using Google Flight Search, not Travelocity).

But he decided that he'd like to spend one more week with his family before coming home.  I had to return to work.  So he wanted to change his ticket home to leave one week later.  My ticket would be unchanged.

The way Travelocity handled this, you would think we were the first couple in the history of powered flight to ever attempt such a change.

First, the outsourced Travelocity customer service rep changed MY ticket instead of his.  This happened even after my partner told him at least THREE times that my ticket was to be left alone, and only his should be changed.  Then the rep wanted another $40 fee to change it back, for his own mistake!  To their credit, this fee was waived.  But this behaviour demonstrates a fundamental lack of comprehension on the part of the outsourced service drones.

After fixing the booking and performing the requested change, we found that all of the legs of my flights were now no longer visible on Travelocity.com.  The total price was still correct, and the number of passengers was correct, but none of my details were present.

He called back.  This time you would think we were trying to explain to someone how to assemble a nuclear reactor over the phone.  The customer service rep assured us that all would be well, and we ended the call.  Of course, all was not well.  The rep didn't actually DO anything.  My flight legs were still not visible on the site, and the confirmation email we got also contained only his flights.

So I called back, and this time I was informed by the outsourced customer service drone that this is a natural consequence of "splitting" the ticket so that one could be changed and not the other.  Yes, because booking airfare and changing tickets are a completely new thing in 2013, referential integrity is completely lost if you change anything.  The rep assured me that I could still get to my itinerary by visiting each airline's website individually and using different reservation codes with each site.  She also promised to send me emails containing all the reservation details.

Of course, that makes using Travelocity sort of pointless, doesn't it?

About 4 hours later, the emails finally arrived, and I discovered that at the very bottom of the emails was a code that could be used to pull up all the flight legs.  This is denoted by the string "***".  Because when you see "***" you think, "Oh, here is a single reservation code I could use on either airline's web site to see all the legs of my flight."  It's obvious, you see.

But of course, Travelocity's own web site, which I used to book these tickets, lacks this capability. Because, again, nobody has ever changed a ticket before, ever, so that's not really a feature anyone ever thought might be nice to have.

Travelocity, why are you terrible?  Are you trying to be terrible deliberately?

Five More Signs You Are Not Agile

In a previous blog posting I wrote about my top 5 signs that an organization is not really agile.  That posting was more about the mechanics of Agile.  In this posting, I present 5 signs that are more social or cultural indicators that agile either is not working in your organization, or is doomed to fail.  Or flail. 

Honorable mention, because I thought of this after I already finished the top 5: the expectation of heroic effort to meet unreasonable expectations.  Sometimes this is coming from Some Guy (see #5, next) or it is indicative of a disengagement in your planning or demo process between your development team and your product owner; perhaps your product owner is expecting more than is reasonable to expect based on your velocity, or there's an expectation that change can happen without any consequences.  We embrace change, and we can try to predict how much change is going to affect the schedule, quickly and with reasonable certainty, but we can't ever predict that the cost is 0.  Also see item #2 in my previous post.

5. There is someone in your meetings who can't tell you why he or she is there (or what Agile role he or she is playing).  For a proper game of Agile Development, the roles need to be clearly defined, just as in any well-organized game.  Games of American Football are not played with a Center, Quarterback, Guards, Tackles, Running Backs, Wide Receivers, Tight End, Linebackers, Referees, Coaches, and Some Guy.  There's no place for "Some Guy" in this process.  Similarly, in Agile, there are well-defined roles, and no role called "Some Guy who shows up to your meetings and injects noise."  And that's what happens when someone who is not properly involved becomes involved: noise in your signal.  It's a distraction and an interference with your team's ability to self-organize to meet its goals.  If Some Guy is trying to control, manipulate, or micromanage your process from the outside, it could be a sign of serious trust issues that I can't help you with.  (Though you might want to skip to my last paragraph below.)

4. Your team frequently demonstrates misconceptions about Agile.  Part of implementing Agile correctly is that the players have to understand their roles, and they also need to understand something about the game they're playing.  I could do an entire blog posting on Agile Misconceptions, but for the sake of brevity I'll cite just a few common ones here: the idea that "agile" is synonymous with "fast", the idea that planning is minimal to nonexistent, and the idea that somehow Agile has no or low overhead.  The remedy to this is education: you must consistently reinforce with people the reasons for being agile: delivering the right product with decent estimates and responding well to change.

3. You are a slave to your tools.  Usually this means being a slave to Rally or Pivotal or Jira or even Excel or your favorite Gantt chart nightmare.  You spend a lot of time in your planning sessions fighting with or updating these tools.  You find yourself changing your own processes that work well for your team so that they fit into the model provided by the tool, or you spend a lot of time faking out the tool in weird ways to make your processes work.  In the worst case, someone from #5 above sees pretty charts in your tool and starts using the tool as a mechanism of bean counting or performance tracking.  If you must use a tool, at least kick it out of your meetings.  Sticky notes (if everyone is present) or your favorite text editor shared in an online session (if you have remote people) will speed your process along, and then you can go update your tool later when you don't have a room full of expensive engineers sitting around wasting their time.

2. You are not doing Retrospectives.  Or worse, you are doing retrospectives, and the team is offering few, if any, "starts, stops and keeps."  (Worse because their time is expensive, and it's a waste of money to have them sitting around doing nothing.)  This is a sign that your team is not engaged or invested in the process.  Take a team out to lunch, and they can think of 20 different ways the restaurant could have improved its performance.  It's hard to believe, then, that after a week of meetings and development time, they can't think of anything they'd change.  To fix this, you have to figure out why the team is disengaged.  Is it something you said?  Are they afraid to speak up?  Do they think their previous retrospective items have been ignored or gone unimplemented?  Are there other problems going on within your organization that have impacted your team's morale?  This is by no means an exhaustive list of questions, but rather a place to start.  Lack of engagement will kill your process, so figure it out and fix it.

1. You are not doing demos.  Your product owner should want to see progress.  Your team should want to show off their achievements.  Moreover, demos offer a unique opportunity to "force" any integration work that might otherwise be put off until it's far too late.  (Integration should be happening more frequently than your demos as well, but nothing forces the issue like having to give a demo to your product owner.)  A lack of demos can have many causes.  You may have a disengaged team (see #2).  You may not be truly performing iterative development.  You may be putting off integration work that you shouldn't be putting off.  You may have planning problems with stories that are badly written or far too big or you may have no proper product owner (see #5 in my previous post).  Whatever the case may be, if you're not doing demos, find out why, quick!  A failure to demonstrate work achieved on a regular basis (ideally every iteration) is a tremendous red flag that something has gone wildly off the rails.  If you can change or fix only one thing, fix this one, because this will tend to expose and correct many other, smaller problems under the hood.

Remember: Changing code is easy.  Changing behavior is hard.  Changing culture is almost impossible.  If you're having these problems, the fix may not be trivial.  It might be very difficult.  Start with education and get the support of your management chain if the onus is on you to correct the problem.  (And if you're a self-organizing team, the onus IS on you.)

And I hate to say it, but I would be remiss if I did not mention Martin Fowler's well-known quote: "if you can't change your organization, change your organization."  In other words, if nothing you can ever say or do will make your organization "get well", and you want to be someone involved in an agile development process, you might be better off making a career adjustment.  I also take this statement in a way that Fowler probably didn't intend: if you have people in your organization who are holding it back and sabotaging your agile development efforts, maybe you need to have their careers adjusted so that you can get on with the business at hand.

The "Privilege" Fallacy

Let me be very clear: I have never disputed that some people in Western culture have it easier than others.  To deny that heterosexual, Christian white males in the United States have the deck stacked in their favor would be to deny reality.  Though things have gotten better in the last few decades, and we're on a trajectory toward a more inclusive culture overall, we still have much progress to make.  I don't deny any of these things, and in fact I am one of the people working to even the odds a little.  Being a gay atheist has also given me some insight into what it means to have to work a little harder because you're not automatically included.

Having said all of that, I have seen and experienced firsthand a disturbing trend among the more sanctimonious hipster crowd in which this unfortunate facet of current social norms is not used as a reminder that we must do more to level the playing field, but rather as a mechanism to attack others, insist upon their guilt, and, perhaps most egregiously of all, to exclude them from the conversation entirely.  If you should dare to attempt to express an opinion, play the devil's advocate, or otherwise question the conventional wisdom or party line, you can almost set your watch by how long it'll take before someone hisses "privilege" at you, should you happen to be any combination of white, male, heterosexual, or Christian (in the US at least).

The implication, of course, is that you cannot possibly have anything to add to the discussion, nor is it even possible that you might have a point of value to make or something thought-provoking to say, because, simply by virtue of your genitalia, you are such a sociopathic monster that you are utterly without empathy for anyone else and incapable of seeing the world from another person's perspective.  By any standard, this is an ad hominem, a logical fallacy of the most trivial order.  Your idea cannot be discussed, because you are de facto "bad" and probably not even a human being.  You belong to the "privileged" few, like it or not, and so nothing you could possibly say (unless it toes the party line) could possibly have any value whatsoever.

You could be volunteering in women's shelters, fighting AIDS in South Africa, feeding the homeless in soup kitchens, voting for candidates who support equality for everyone, contributing money to help open schools for underprivileged girls in far corners of the world while yodeling and drinking a glass of water at the same time, and none of that would count for anything.  You're "privileged" so you're not invited to the discussion (if you could call it a discussion; most real discussions have more than one side or at least consider multiple points of view) and nothing you could ever say could possibly be considered for any reason. You are a monster by birth.

"Privilege" isn't even the right word to use, but those who spit it at others enjoy it for the negative connotations it has.  The word conjures up ideas of being fed with a silver spoon, having everything handed to you with little or nothing asked in return, living a life of luxury and ease.  And while it's true to say that heterosexual white male Christians may be playing the game of life on the easiest setting, it's not correct to say that anyone growing up in middle-class or poor America is really living a life of "privilege."  It's not easy, even on the easiest setting.  Sure, it's definitely harder if you're, say, a dark-skinned woman, but that doesn't mean it's trivial otherwise.

Of course people puking out this sort of attack know that, but it wouldn't be as much fun to say something like, "I question whether you're able to empathize about this situation due to your background."  Why?  Because that's an affirmative statement that someone can actually be asked to defend.  Oh really?  Why do you think I'm incapable of experiencing empathy, exactly?  Conversely, the projectile-diarrhea of "privilege" just means that you have light skin or male genitals, to which the accused (presumed guilty) can't really mount a defense.  That is, of course, the intention.  It's a cheat, not a thoughtful response.

(Using a word like "privilege" in this manner also allows its users to avoid being accused of any impropriety themselves.  Imagine if, instead of shouting "privilege," someone shouted, "you're only saying that because you're white" or "you just think that because you're a man."  When you put it like that, it sounds like the mean-spirited, shallow, sweeping generalization that it is.)

It is among the most intellectually lazy of fallacious arguments, but more egregiously, it is used to prevent a conversation from happening, rather than having a conversation with someone who has already expressed an interest in hearing different points of view.  It is saying, "I cannot provide (or can't be bothered to provide) an intelligent or reasonable response to what you've said, therefore let's just assume you're horrible and you lose."

In effect, this blanket, targeted exclusion of certain people from even being invited to participate in the discourse demonstrates the very height of hypocrisy in many cases: in response to people being treated unfairly because of their gender, skin color, sexual orientation, etc., those who assert "privilege" as a mechanism of exclusion are, in fact, targeting others to treat them unfairly based solely on their gender, skin color, sexual orientation, etc.

Now some might argue that this is only fair; after all, isn't it just a question of the tables being turned?  That depends on whether you want a discussion or you want revenge.  It depends on whether adding more wrongs to the pile makes the world more fair.

So if you want revenge, and if you think the best way to resolve egregious behavior by others is more egregious behavior on your part, then by all means, accuse someone of having "privilege". It'll make you feel better, because launching personal attacks on others always does.

But if you don't want to be intellectually lazy, avoiding questions to which you may not have good answers, perhaps you could try listening to what the other person has to say, evaluating it to see whether they might have said something worth thinking about, and providing an adequate and thoughtful response.  Yes, even if you feel like you've said it a thousand times before, maybe you haven't said it to this person before, or maybe there's a better way to say it.  This is how civil discourse is supposed to work.  It's supposed to be civil (maybe even free from personal attacks) and to involve discourse (multiple people being permitted to contribute to the discussion).

Yes, it's more effort to explain to another person your different perspective and to elucidate how your own experiences led you to have a different point of view.  Sorry.

Think about it carefully: when you already have someone's ear, and they're receptive to communicating with you on a topic, why would you deliberately shut them out with such a thoughtless accusation?  You have a teaching and a learning opportunity when you engage with another person.  Don't throw it away out of spite.


Postscript: Let's open a betting pool on how long it'll be before someone says, "well this is exactly what I would expect a privileged person to say!"  (rather than actually offer any counterpoint to anything I've said, of course).

Top 5 Signs You Are Not Agile

Many organizations believe they are applying an agile methodology.  Many organizations give lip service to supporting agile development.  But despite this, what actually happens is anything but agile.  Here are some of the major signs I've seen over the years that indicate what is happening in your organization is not agile at all.

5. You have no Product Owner.  Having a Product Owner as an integrated member of the team is critical to success.  Your Product Owner is the person telling you what's the most important thing to work on next, and he or she is also the person providing you feedback during your iteration and during your demos about whether what was produced was what was envisioned.  You cannot alter course if necessary without someone navigating, and that navigator is your Product Owner.  Similarly, if you only saw your Product Owner once, at the beginning of your release cycle, and he or she just dropped off a giant pile of requirements, you are doing Waterfall, not Agile.


4. You have exactly one Release on the horizon.  Nothing is a bigger indicator that you're in a waterfall model, even if it has short bursts of activity, than having a plan for only one Release.  If you haven't built anything until you've built everything, this is a clear sign that your user stories haven't been written properly.  Ideally, you should be delivering some kind of functioning system with every release, even if the functionality is very limited in the first release, but becomes more elaborate with every release that follows.


3. Your release date is fixed.  Your features are also fixed. Everything is the top priority.  If you're in this situation, then your planning meetings have no room for negotiation.  Your velocity, if you know it, doesn't matter, because you have to do everything, and you have to do it by a certain date.  There is no ability to prioritize.  This kind of behavior is often indicative of trust issues coming from the Product Owner or his or her superiors.  They do not trust the development team to work hard, plan judiciously, and to deliver incremental value with each Release, or they do not want to be involved enough to evaluate multiple incremental releases.  (See #3.)


2. Someone tells you your velocity, you don't know your velocity, you don't use your velocity in planning, you measure velocity in "real hours," or you use velocity to measure performance.  These are typically symptoms of a team or of a management hierarchy that does not understand the concept of velocity.  Velocity is a measure of team throughput, and it does not necessarily correlate 1:1 with time.  This is because the team will take into account mitigating factors like risk and uncertainty in addition to effort in producing an estimated size for a story or task.  Because we work as a team, not as individuals, we don't adjust expected velocity up or down by a percentage as team members take vacation, are out sick, or leave or join the team; rather, we wait for the end of the iteration and look at what the velocity actually did, expecting it to average out over time.  It's a measure of reality, not a measure of expectation, and we use that measure to estimate how much we think we can take on for the next iteration.


1. Your iterations are more than two weeks long or you have many tasks with very large estimates.  Fundamental to Agile development is the ability to react to changes and to react to unexpected disparities between estimated effort and actual effort, and these reactions typically will take place on iteration boundaries when planning occurs.  If you only have an opportunity to make a course correction every 3 or 4 weeks, your agility is severely hampered.  Moreover, if you're trying to establish an average velocity (something that normally requires at least 3-4 data points to establish a trend), the longer your iterations, the longer it takes in real time before you can begin to make meaningful estimates.  If your iterations are one week long, that means three weeks for a velocity trend.  If your iterations are three weeks long, that means nine weeks before you know your velocity... and many entire releases won't be that long.  (See #3.)


So what do you do if you live in a world of one or more of these signs?  (After you're done wallowing in despair, I mean.)

First you need to have a serious discussion with your management chain.  Feel free to bring this document as a visual aid. I've come to believe that a team can only be successful implementing agile if both the team and its management chain really buy into the idea and support it.  It's easy to give lip service to Agile methodology, but unless actual behavior changes on an organizational level, success will be limited.  You need organizational support.  You also need your own team to self-organize and support the process.  Without it, people will consciously and unconsciously sabotage your efforts.

Secondly, and I suspect somewhat more commonly, you can try to fake out the system.  This is more common when the team wants to implement an Agile methodology, but management doesn't really believe in it.  You can start working on your release early, before a release date is committed, and try to strongly suggest a likely release date based on the velocity that you've already figured out, before the organization itself has actually committed to doing any work.  You can create your own internal releases that will never see the light of day, but which give your team milestones closer to the horizon to work toward.  You can have someone playing "proxy" Product Owner if necessary.

The problem with the second approach is that ultimately your team, your Scrum Master, your proxy Product Owner, and your management chain will feel dissatisfied with the approach.  Your team feels that its input isn't valued and that it isn't trusted by management.  Your Scrum Master feels he or she isn't supported by management or appreciated by the team.  Your proxy Product Owner feels accountable for the outcome without having any real direction in making decisions.  And your management chain gets the impression that Agile is one gigantic uncoordinated mess, but of course, what you did wasn't really particularly Agile in the first place.

Good luck.

---Nick