Quite often when multiple components of a system are combined, you can experience unexpected side effects and failures whose causes seem completely unrelated to their consequences.  I recently experienced a situation in which a bit of configuration for jDeb (a Maven plug-in that assists in the creation of Debian .deb packages) caused an application to stall out completely after about 4 minutes of run-time.

The Backstory

Our app is a Java program that runs from a .jar file, and we run it in production wrapped inside YAJSW (Yet Another Java Service Wrapper).  YAJSW allows us to do a few nice things like run as a daemon and run as a different, non-root user, along with some other nice features.  We log using slf4j with log4j, and YAJSW also intercepts System.out and System.err to log those messages to its own log file.  The whole thing gets wrapped up in a .deb package using jDeb, which lets us install smoothly, registering the daemon and updating the rc.d files.

The Failure

After installation, our program started up fine and began doing its customary work.  Success! Almost!  After about four minutes, it stopped working, no longer making any of the API calls we expected, and generally behaving in an unresponsive manner.

The first thing I did was run jstack against the running Java process to see what all the threads were doing.  Almost all the threads in the application were in a BLOCKED state on org.apache.log4j.Category.callAppenders. A quick Google search revealed a lot of complaints about similar behavior, even finding one comment that described exactly what we were seeing: one thread seemed to be stuck in java.io.FileOutputStream.writeBytes forever, and all the other threads waiting for a lock so they could write to the appender too.

So what could cause a thread to block forever trying to perform a simple write? Cursory checks for excessive garbage collection and impending disk failure revealed no problems.

Things ran fine when launching the jar directly from the command line as our effective user (via sudo -u and java -jar), so the problem seemed in some way related to running our application inside YAJSW.

A little more Google searching turned up a YAJSW thread which mentioned that the wrapper creates several memory-mapped files to which the wrapped process writes instead of writing to stdout and stderr; the wrapping process then 'gobbles' from these memory-mapped files for the purposes of logging.

Aha! Another thing I had noticed was that we didn't seem to get any console messages from our app in the YAJSW log like we would have expected, and a quick look revealed that in fact, no memory-mapped files for stdout and stderr were created in tmp/ for the wrapped app to write to.

So at this point it was starting to look like log4j was trying to write to its redirected System.err or System.out (via the console appender), and once some buffer filled, further writes blocked while holding the appender lock, leaving every other thread blocked trying to log and unable to make any further progress.

Closer examination of the permissions on the tmp/ directory revealed that dpkg had created it with mode 600 instead of mode 700, and creating a new file is not possible on a POSIX system without execute (search) permission on the directory. Because of the inadequate permissions, the memory-mapped files for System.out and System.err redirection couldn't be created, a failure which YAJSW itself couldn't log, because those very files were needed to set up logging.
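The permission failure is easy to reproduce in isolation. A minimal sketch (the directory and file names are arbitrary; note that root bypasses these permission checks, so run it as a normal user):

```shell
demo=$(mktemp -d)        # mktemp creates the directory with mode 700
chmod 600 "$demo"        # rw------- : no execute (search) permission
touch "$demo/out.mmap"   # fails with "Permission denied" for non-root users

chmod 700 "$demo"        # rwx------ : execute bit restored
touch "$demo/out.mmap"   # now succeeds
rm -rf "$demo"
```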

The culprit turned out to be my own misunderstanding of how jDeb's permissions work when applying the template data type in a dataSet. The perm mapper lets you specify a user, group, filemode and dirmode for use when creating directories and files, but with the template data type, the mode used to create the paths is the filemode, not the dirmode. Consequently, the filemode for the template paths' mapper needed to be 700 rather than 600.
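The fix in the jDeb configuration looked roughly like this (a sketch; the user and group names are placeholders, and your surrounding plug-in configuration will differ):

```xml
<!-- jDeb <data> entry creating the empty tmp/ directory in the package. -->
<data>
  <type>template</type>
  <paths>
    <path>tmp</path>
  </paths>
  <mapper>
    <type>perm</type>
    <user>appuser</user>
    <group>appgroup</group>
    <!-- Counter-intuitively, template paths are created using filemode,
         not dirmode, so the execute (search) bit must be set here. -->
    <filemode>700</filemode>
  </mapper>
</data>
```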


The chain of failure ended up looking like this:

  1. I misunderstood how jDeb/Debian package permission mappings worked when creating new, empty directories in a .deb package. Intuitively it seemed like dirmode permissions would be used when creating an empty directory, but in fact filemode permissions get used.
  2. This caused dpkg to create the new tmp/ directory with mode 600, excluding execute permission on the directory.
  3. Because of the exclusion of the execute bit, the YAJSW wrapper could not create the memory-mapped files it needed for writing using its overridden System.out and System.err.
  4. Because the files didn't get created, the YAJSW wrapper process could not consume the output of System.out and System.err, and the wrapped process had nowhere to write once whatever internal buffering existed was full.
  5. Because the buffer filled up and writes blocked, one log4j appender blocked waiting to write while inside a synchronized block.
  6. Because the write inside the synchronized block was blocked forever, all of the other threads that wanted to log anything then blocked waiting to enter the synchronized block.
  7. The whole process stopped working because eventually everything was waiting on a lock or waiting for IO to complete.

And that is how a permission mistake in a configuration file can lead to an application locking up after about 4 minutes (the time it took to fill up some buffer somewhere).

PostgreSQL + ActiveMQ = (auto)Vacuum Hell

This is one of those "I had this weird problem that others are likely to encounter as well, so here's a detailed explanation that will hopefully come up in a search engine for those people when they are trying to figure it out" posts.

As most PostgreSQL administrators know, vacuuming is a necessity for the long-term survival of your database.  This is because the PostgreSQL Multi-Version Concurrency Control strategy replaces changed/deleted rows with their new versions, but keeps the old tuples around for as long as some other existing transaction might need to see them.  Eventually these old rows are no longer visible to any transaction, and they can be cleaned up for re-use or discarded completely once an entire page is empty.

This works out really well, provided you have fairly straightforward database access going on: transactions start, stuff happens, transactions complete; repeat.  Things can go radically wrong if the "transactions complete" part doesn't happen, however.

Because PostgreSQL has no idea about what any given transaction might do next, it is not able to vacuum out any dead rows newer than the oldest transaction still in progress.  If you think about it, that makes sense.  Any given transaction could look at an older version of any row that was around at the time that transaction began, in any table in the database.

The symptom of this is that vacuum tells you about a large number of dead tuples, but reports that it removed none of them.  Doing a "vacuum verbose" tells you there are a large number of "dead row versions" which "cannot be removed yet".  If you turn on autovacuum logging, you'll see messages like this:

Feb 27 02:57:41 slurm postgres[12345]: [3-1] LOG:  automatic vacuum of table "activemq.public.activemq_msgs": index scans: 0
Feb 27 02:57:41 slurm postgres[12345]: [3-2] #011pages: 0 removed, 14002 remain
Feb 27 02:57:41 slurm postgres[12345]: [3-3] #011tuples: 0 removed, 58897 remain

The number that remain will continue increasing forever, and the number removed is always 0.  Like me, you might naively check for transactions running against activemq_msgs, and find none, or find only ones which are short-lived.  And that would be your mistake.  While autovacuum runs per-table, running transactions are per-database.  You may well have a long-running transaction running statements against some other table that prevents rows from being removed from the table you're watching.  Again, this is because PostgreSQL cannot predict the future; that long-running transaction might run a query against the table you're watching two seconds from now, and as long as that could happen, those old tuples cannot be removed.
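So when hunting for the culprit, look for the oldest open transaction anywhere in the database rather than only those touching the table you're watching. A query along these lines (using the same pg_stat_activity columns as the transcript below) will surface it:

```sql
-- Oldest open transactions in the cluster; dead tuples newer than the
-- oldest xact_start cannot be reclaimed by (auto)vacuum.
SELECT pid, datname, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```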

How does this relate to ActiveMQ, you ask?  If you're running ActiveMQ and using PostgreSQL as your backing/persistent store (and you may well have reasons to do this), and you don't do anything to change it, the default failover locking strategy is for the master to acquire a JDBC lock at startup, and hold onto it forever.  This translates into a transaction that starts when ActiveMQ starts, and never completes until ActiveMQ exits.  You can see this in progress from the command line:

activemq=# select xact_start, query from pg_stat_activity where xact_start is not null and datname='activemq';
          xact_start           |                                                query                                                
 2014-02-27 01:24:13.677693+00 | UPDATE ACTIVEMQ_LOCK SET TIME = $1 WHERE ID = 1

If you look at the xact_start timestamp, you'll see that this query has been running since ActiveMQ started.  You can also see the locks it creates:

activemq=# select n_live_tup, n_dead_tup, relname, relid from pg_stat_user_tables order by n_dead_tup desc;
 n_live_tup | n_dead_tup |    relname    | relid 
        628 |      58903 | activemq_msgs | 16387
          4 |          0 | activemq_acks | 16398
          1 |          0 | activemq_lock | 16419
activemq=# select locktype, mode from pg_locks where relation = 16419;
 locktype |       mode       
 relation | RowShareLock
 relation | RowExclusiveLock

Again, as long as this transaction is running holding the ActiveMQ lock, (auto)vacuum cannot reclaim any dead tuples for this entire database.

Fortunately, ActiveMQ has a workable solution to this problem in version 5.7 and later in the form of the Lease Database Locker.  Instead of starting a transaction and holding it forever, the master creates a transaction just long enough to try to acquire a leased lock, which it then periodically renews (with timing that you specify in the configuration; see the ActiveMQ documentation for an example).  So long as the lock keeps being renewed, the slave won't try to take over.  Your failover time, then, depends on the duration of the lease; it won't be nearly instantaneous, as it would be with a lock released when ActiveMQ exits cleanly (though it could be faster than waiting for a transaction to end after a socket timeout following an unclean exit).
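The configuration is a locker element inside your persistenceAdapter; this is a sketch, with the dataSource bean id and interval as placeholders you'd adjust to your setup:

```xml
<persistenceAdapter>
  <jdbcPersistenceAdapter dataDirectory="${activemq.data}"
                          dataSource="#postgres-ds">
    <locker>
      <!-- The master renews its lease periodically; the slave takes over
           only if the lease expires, so the interval bounds failover time. -->
      <lease-database-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
```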

Because the locking transactions come and go, rather than persisting forever, the autovacuum process is able to reap your dead tuples.

So the moral of the story is this: if you're using PostgreSQL as the persistent store for ActiveMQ, make sure you configure the Lease Database Locker in your persistenceAdapter configuration.  Otherwise, PostgreSQL will never be able to vacuum out old tuples and you may suffer performance degradation and a database that bloats in size forever (or until you stop ActiveMQ, run a vacuum, and restart it).

Five More Bloody Signs You Aren't Bloody Agile

This is getting old, isn't it?  I still think I might try to turn this into a book of some kind, though, so we press on.

5. You can't touch some code because someone "owns" it.  Or there's some piece of code that only one person can work on.  Sometimes this can happen because what a unit of code does is so complicated and so specialized that there's only one person on your team capable of understanding it.  But most of the time what is happening is some combination of a person's ego becoming involved, a lack of sufficient unit testing, and insufficient cross-pollination of code modules among the members of the team.  Consider this situation very carefully, because it means your code base now has a single point of potential failure.  If only one person understands it, what happens if that person gets sick or wants to take a vacation?  If someone's ego gets involved, it's almost always to the detriment of the rest of the team. Or are people just afraid to touch it because something might break?  That's the easiest case to deal with: keep adding unit tests until people feel comfortable with that safety net.  Pair Programming can help with the cross-pollination, as can inflating an estimate, biting the bullet, and giving the work to anyone BUT the person who "owns" that code.  Collective code ownership may be a little painful and a little slow in the short term, but in the long run, you'll be glad you did it.

4. Your team lives from one crisis to the next crisis.  Management is freaking out regularly.  Flailing, weeping, wailing, and gnashing of teeth.  What could this possibly have to do with Agile, you ask? Quite a lot: it's an indication that your Agile process has completely broken down or your stakeholders are not playing the game.  If Agile were working properly, why would Management need to freak out?  They'd have a predictable set of deliverables for each iteration and each release.  They'd have a good idea of the team's velocity.  They'd have a decent idea of when things would be "done".  They'd be seeing progress in the form of demos regularly.  If they're completely disengaged and not participating, you're going to need a heart-to-heart.  If they are flailing because your team has given them timelines, and they just don't like what you had to say, then you have a trust issue between your stakeholders and your team, and again, you need to have a heart-to-heart.  If Agile is new to your organization, you need to set expectations accordingly, and make sure everyone understands how the game is played.  Remember, though, out of crisis can arise opportunity: if you can hold your team together and execute cleanly, you can demonstrate how Agile can provide the predictability and response to change that has your reporting chain in an uproar.

3. Your team has more than 6-8 developers.  Agile performs best with small teams who can meet regularly and communicate easily.  If your product is really a product of products, consider breaking teams apart along product boundaries.  You may be tempted to break teams apart across some other boundaries, but try not to do that.  Remember, you're trying to satisfy a product owner; the key word is "product."  How can you do end-to-end iterative development if you've aligned "vertically" instead of "horizontally"?  Give it some thought, because there's a decent amount of overhead in bootstrapping a new team.  Reflect on the values of Agile, and form your teams in such a way that you can satisfy the various roles most effectively and play the game by the rules.

2. Chickens are being pigs.  Pigs are being chickens.  Or to put it another way, someone is regularly stepping outside the boundaries of their role in your meetings.  This could be a product owner trying to change the rules of your Scrum, a Scrum Master trying to influence the implementation of your code, a developer trying to inject a feature into the product that wasn't requested, etc.  It can also take the form of "Some Guy" trying to do anything at all in your meetings.  When this happens it is the responsibility of the Scrum Master to call it out, explain why it isn't allowed, and remind everyone what their roles are. If you're playing soccer out on a pitch, you can't have a handful of players decide they want to play rugby, nor can you have your goalie decide he wants to play forward.  For the game to work, the players must first agree to the rules, and then play by those rules.

1. The rules of your game keep changing, and nobody asked your team.  Someone from outside the team is issuing edicts to the team and expecting them to be obeyed.  Agile teams should be self-organizing, with the aid of the Scrum Master as a servant-leader.  It's a negotiation within the team to set things like meeting times and iteration length, and a consultation with the Product Owner to set things like the number of iterations to a release, release dates, demo agendas, etc.  If someone from outside the team is issuing directives (like "Some Guy"), they are cheating.  This, sadly, is an organizational problem and often an indicator of a lack of trust in or respect for the team.  It's a tough nut to crack to figure out why this lack of trust or respect persists, and even more difficult to remedy the problem.

I seriously hope I'm done this time, and that anything else I think of will be a straightforward repetition of one of my previous articles.

Yet Another 5 Signs You Are Not Agile

Boy, these just never get old, do they?  Given the span of time it's started taking me to collect another 5 signs, though, I'm hopeful that I'm running out of material.  Or maybe I'm just being optimistic.

5. Your iteration runs Monday through Friday.  Everyone checks in on Friday.  Test is always one week behind.  Acceptance is always delayed. I think I've seen this particular behavior in every Agile project to which I've ever contributed, and the cause is very simple: your stories (and probably your tasks) are too big.  If you're working on the same story through the entire iteration, either you're making no progress day-to-day, or your story was so big it took the entire week to accomplish.  The former problem can be addressed in daily stand-ups by asking what tasks each person completed on the previous day.  If the answer is "nothing" then you have a blockage or some other problem that needs to be addressed immediately.  The latter problem means you need to spend more time in story decomposition and task breakdown.

4. Something changes. Chaos ensues trying to hide the change.  Agile, when properly applied, excels at coping with change.  If you've kept your iterations short, decomposed your stories, broken down your tasks, considered your velocity in planning, etc., you can estimate the impact of whatever changed and adjust accordingly.  We've all had the key team member falling ill, business priorities changing, last-minute items coming from security auditors, etc.  These things will impact your velocity. And that's okay!  What you don't want to do is to try to hide the impact of the change, treating your velocity like it's a stick with which to beat people, pulling out all the stops to try to "meet your numbers" for the iteration.  That's not what velocity is for; rather, the impact of many small changes over time will reduce your average velocity slightly, and this will ultimately aid you in planning.

3. Rather than decomposing stories, your team lengthens its iterations.  Key to being Agile is having frequent opportunities for adjustment and collecting frequent data points about team velocity. Consider this: if you need four data points to start to get a good feel for your team velocity, with one-week iterations, that takes you a month.  With two-week iterations, it takes two months.  With three-week iterations, it takes three months just to figure out what your velocity is.  It's like trying to learn to pilot a jet ski, fishing boat, or cruise ship.  Moreover, when adjustments are needed, you're trying to steer a jet ski, fishing boat, or cruise ship.  Granted, a jet ski isn't the appropriate watercraft for every situation, but if you're planning to change to a fishing boat or a cruise ship just because you're unwilling to leave some of your gear on the shore between trips, you're not making a change for a good reason.

2. Related to #3, you survive many planning meetings without ever decomposing a single user story. Maybe it's possible that you entered all your planning meetings with all your user stories perfectly decomposed into chunks that can be delivered end-to-end inside of one iteration, but it's not particularly likely to happen iteration after iteration (unless your product owner is exceedingly well-trained).  Odds are your stories are going in too big, and you're about to experience #5 from this list.  You need to have a discussion about the right size for a user story, the difference between user stories and tasks, and how to write user stories correctly.  You may need to allow extra time for planning for several iterations until the right way becomes a healthy habit, and the wrong way starts to feel weird.

1. Your team throws out Agile practices willy-nilly whenever they become more than mildly inconvenient, yet still expects to reap the benefits of Agile.  I'm the last person to say that teams must absolutely adhere to Agile dogma (though one could get that impression based on my blog postings), but when you're making tweaks and changes to your process, you really must think carefully about what you're changing, why you're changing it, and what the potential unintended (and direct) consequences of your change may be.  If you choose three-week iterations, it's going to take months to establish a velocity.  If you don't track or use your velocity in planning, your estimates aren't going to be very useful.  Be very cautious here: it's easy to fall into the trap of the "compressed waterfall."  And don't complain that Agile failed when you've thrown the baby out with the bathwater, adopting a process that looks nothing like Agile (but still calling it "Agile").

Bonus: Nobody from the team ever acts as a dissenting voice in your commits, retrospectives, or estimation exercises.  Nothing says "just get me out of this meeting" like everyone giving a "five" when you call for a "fist of five" to commit for the iteration.  Similarly, nothing says "I don't really care about the size of this story" like a handful of people giving a different estimate, then automatically giving in to the more common estimate without any discussion.  Your team is getting burned out, they don't understand the process, or they have no incentive to care.  Take the time to really understand the underlying problem here, rather than just having a knee-jerk reaction to it.  You may find valuable insight.

Like most things in life, what you get out of Agile is directly proportional to what you put into it.  Know your Agile practices, but also know why the practices are what they are, and understand the consequences of abandoning, changing, or tweaking the practices in your organization.  Make sure these consequences are clearly communicated to your stakeholders, too!

Travelocity, Why Are You Terrible?

Quite often when I say I will never do business with a particular company ever again, it's because they have engaged in some socially egregious behaviour rather than directly affecting me personally.  This is not the case with Travelocity.  I hope I never have to deal with them again, because they are awful.  Calling them "incompetent" would be an insult to the incompetent.

My partner and I spent hours planning out and booking a fairly complicated itinerary for what is effectively a honeymoon in Asia for us.  It is a plan that involves stays in three cities, and transport using three different airlines.  We booked the majority of this via Travelocity's site.  It went reasonably well once we had figured out our flight choices (which we did using Google Flight Search, not Travelocity).

But he decided that he'd like to spend one more week with his family before coming home.  I had to return to work.  So he wanted to change his return ticket to depart one week later.  My ticket would be unchanged.

The way Travelocity handled this, you would think we were the first couple in the history of powered flight to ever attempt such a change.

First the outsourced Travelocity customer service rep changed MY ticket instead of his.  This happened even after my partner told him at least THREE times that my ticket was to be left alone, and only his should be changed.  Then the rep wanted another $40 fee to change it back, for his own mistake!  To their credit, this fee was waived.  But this behaviour demonstrates a fundamental lack of comprehension on the part of the outsourced service drones.

After fixing the booking and performing the requested change, we found that all of the legs of my flights were now no longer visible on Travelocity.com.  The total price was still correct, and the number of passengers was correct, but none of my details were present.

He called back.  This time you would think we were trying to explain to someone how to assemble a nuclear reactor over the phone.  The customer service rep assured us that all would be well, and we ended the call.  Of course, all was not well.  The rep didn't actually DO anything.  My flight legs were still not visible on the site, and the confirmation email we got also contained only his flights.

So I called back, and this time I was informed by the outsourced customer service drone that this is a natural consequence of "splitting" the ticket so that one could be changed and not the other.  Yes, because booking airfare and changing tickets is a completely new thing in 2013, referential integrity is completely lost if you change anything.  The rep assured me that I could still get to my itinerary by visiting each airline's website individually and using different reservation codes with each site.  She also promised to send me emails containing all the reservation details.

Of course, that makes using Travelocity sort of pointless, doesn't it?

About 4 hours later, the emails finally arrived, and I discovered that at the very bottom of the emails was a code that could be used to pull up all the flight legs.  This is denoted by the string "***".  Because when you see "***" you naturally think "oh, here is a single reservation code I could use on either airline's web site to see all the legs of my flight."  It's obvious, you see.

But of course, Travelocity's own web site, which I used to book these tickets, lacks this capability. Because, again, nobody has ever changed a ticket before, ever, so that's not really a feature anyone ever thought might be nice to have.

Travelocity, why are you terrible?  Are you trying to be terrible deliberately?

Five More Signs You Are Not Agile

In a previous blog posting I wrote about my top 5 signs that an organization is not really agile.  That posting was more about the mechanics of Agile.  In this posting, I present 5 signs that are more social or cultural indicators that agile either is not working in your organization, or is doomed to fail.  Or flail. 

Honorable mention, because I thought of this after I had already finished the top 5: the expectation of heroic effort to meet unreasonable expectations.  Sometimes this comes from Some Guy (see #5, below), or it indicates a disconnect in your planning or demo process between your development team and your product owner; perhaps your product owner is expecting more than is reasonable based on your velocity, or there's an expectation that change can happen without any consequences.  We embrace change, and we can try to predict, quickly and with reasonable certainty, how much a change is going to affect the schedule, but we can't ever predict that the cost is zero.  Also see item #2 in my previous post.

5. There is someone in your meetings who can't tell you why he or she is there (or what Agile role he or she is playing).  For a proper game of Agile Development, the roles need to be clearly defined, just as in any well-organized game.  Games of American Football are not played with a Center, Quarterback, Guards, Tackles, Running Backs, Wide Receivers, Tight End, Linebackers, Referees, Coaches, and Some Guy.  There's no place for "Some Guy" in this process.  Similarly, in Agile, there are well-defined roles, and no role called "Some Guy who shows up to your meetings and injects noise."  And that's what happens when someone who is not properly involved becomes involved: noise in your signal.  It's a distraction and an interference with your team's ability to self-organize to meet its goals.  If Some Guy is trying to control, manipulate, or micromanage your process from the outside, it could be a sign of serious trust issues that I can't help you with.  (Though you might want to skip to my last paragraph below.)

4. Your team frequently demonstrates misconceptions about Agile.  Part of implementing Agile correctly is that the players have to understand their roles, and they also need to understand something about the game they're playing.  I could do an entire blog posting on Agile Misconceptions, but for the sake of brevity I'll cite just a few common ones here: the idea that "agile" is synonymous with "fast", the idea that planning is minimal to nonexistent, and the idea that somehow Agile has no or low overhead.  The remedy to this is education: you must consistently reinforce with people the reasons for being agile: delivering the right product with decent estimates and responding well to change.

3. You are a slave to your tools.  Usually this means being a slave to Rally or Pivotal or Jira or even Excel or your favorite Gantt chart nightmare.  You spend a lot of time in your planning sessions fighting with or updating these tools.  You find yourself changing your own processes that work well for your team so that they fit into the model provided by the tool, or you spend a lot of time faking-out the tool in weird ways to make your processes work.  In the worst case, someone from #5 above sees pretty charts in your tool and starts using the tool as a mechanism of bean counting or performance tracking.  If you must use a tool, at least kick it out of your meetings.  Sticky notes if everyone is present, or just your favorite text editor shared in an online session if you have remote people will speed your process along, and then you can go update your tool later when you don't have a room full of expensive engineers sitting around wasting their time.

2. You are not doing Retrospectives. Or worse, you are doing retrospectives, and the team is offering few, if any "starts, stops and keeps."  (Worse because their time is expensive, and it's a waste of money to have them sitting around doing nothing.)  This is a sign that your team is not engaged or invested in the process.  Take a team out to lunch, and they can think of 20 different ways the restaurant could have improved its performance.  It's hard to believe, then, that after a week of meetings and development time, they can't think of anything they'd change.  To fix this, you have to figure out why the team is disengaged.  Is it something you said?  Are they afraid to speak up?  Do they think their previous retrospective items have been ignored or gone unimplemented?  Are there other problems going on within your organization that have impacted your team's morale?  This is by no means an exhaustive list of questions, but rather a place to start.  Lack of engagement will kill your process, so figure it out and fix it.

1. You are not doing demos.  Your product owner should want to see progress.  Your team should want to show off their achievements.  Moreover, demos offer a unique opportunity to "force" any integration work that might otherwise be put off until it's far too late.  (Integration should be happening more frequently than your demos as well, but nothing forces the issue like having to give a demo to your product owner.)  A lack of demos can have many causes.  You may have a disengaged team (see #2).  You may not be truly performing iterative development.  You may be putting off integration work that you shouldn't be putting off.  You may have planning problems with stories that are badly written or far too big or you may have no proper product owner (see #5 in my previous post).  Whatever the case may be, if you're not doing demos, find out why, quick!  A failure to demonstrate work achieved on a regular basis (ideally every iteration) is a tremendous red flag that something has gone wildly off the rails.  If you can change or fix only one thing, fix this one, because this will tend to expose and correct many other, smaller problems under the hood.

Remember: Changing code is easy.  Changing behavior is hard.  Changing culture is almost impossible.  If you're having these problems, the fix may not be trivial.  It might be very difficult.  Start with education and get the support of your management chain if the onus is on you to correct the problem.  (And if you're a self-organizing team, the onus IS on you.)

And I hate to say it, but I would be remiss if I did not mention Martin Fowler's well-known quote: "if you can't change your organization, change your organization."  In other words, if nothing you can ever say or do will make your organization "get well", and you want to be someone involved in an agile development process, you might be better off making a career adjustment.  I also take this statement in a way that Fowler probably didn't intend: if you have people in your organization who are holding it back and sabotaging your agile development efforts, maybe you need to have their careers adjusted so that you can get on with the business at hand.

The "Privilege" Fallacy

Let me be very clear: I have never disagreed that some people in western culture have it easier than others.  To deny that heterosexual, Christian white males in the United States have the deck stacked in their favor would be to deny reality. Though things have gotten better in the last few decades, and we're on a trajectory toward a more inclusive culture overall, there is still much progress to be made.  I don't deny any of these things, and in fact I am one of the people working to even the odds a little.  Being a gay atheist has also given me some insight into what it means to have to work a little harder because you're not automatically included.

Having said all of that, I have seen and experienced firsthand a disturbing trend among the more sanctimonious hipster crowd in which this unfortunate facet of current social norms is not used as a reminder that we must do more to level the playing field, but rather as a mechanism to attack others, insist upon their guilt, and, perhaps most egregiously of all, to exclude them from the conversation entirely.  If you should dare to attempt to express an opinion, play the devil's advocate, or otherwise question the conventional wisdom or party line, you can almost set your watch by how long it'll take before someone hisses "privilege" at you, should you happen to be any combination of white, male, heterosexual, or Christian (in the US at least).

The implication, of course, is that you cannot possibly have anything to add to the discussion, nor is it even possible that you might have a point of value to make or something thought-provoking to say, because, simply by virtue of your genitalia, you are such a sociopathic monster that you are utterly without empathy for anyone else and incapable of seeing the world from another person's perspective.  By any standard, this is an ad-hominem, a logical fallacy of the most trivial order.  Your idea cannot be discussed, because you are de-facto "bad" and probably not even a human being.  You belong to the "privileged" few, like it or not, and so nothing you could possibly say (unless it toes the party line) could possibly have any value whatsoever.

You could be volunteering in women's shelters, fighting AIDS in South Africa, feeding the homeless in soup kitchens, voting for candidates who support equality for everyone, contributing money to help open schools for underprivileged girls in far corners of the world while yodeling and drinking a glass of water at the same time, and none of that would count for anything.  You're "privileged" so you're not invited to the discussion (if you could call it a discussion; most real discussions have more than one side or at least consider multiple points of view) and nothing you could ever say could possibly be considered for any reason. You are a monster by birth.

"Privilege" isn't even the right word to use, but those who spit it at others enjoy it for the negative connotations it has.  The word conjures up ideas of being fed with a silver spoon, having everything handed to you with little or nothing asked in return, living a life of luxury and ease.  And while it's true to say that heterosexual white male Christians may be playing the game of life on the easiest setting, it's not correct to say that anyone growing up in middle-class or poor America is really living a life of "privilege."  It's not easy, even on the easiest setting.  Sure, it's definitely harder if you're, say, a dark-skinned woman, but that doesn't mean it's trivial otherwise.

Of course people puking out this sort of attack know that, but it wouldn't be as much fun to say something like, "I question whether you're able to empathize about this situation due to your background."  Why?  Because that's an affirmative statement that someone can actually be asked to defend.  Oh really?  Why do you think I'm incapable of experiencing empathy, exactly?  Conversely, the projectile-diarrhea of "privilege" just means that you have light skin or male genitals, to which the accused (presumed guilty) can't really mount a defense.  That is, of course, the intention.  It's a cheat, not a thoughtful response.

(Using a word like "privilege" in this manner also allows its users to avoid being accused of any impropriety themselves.  Imagine if, instead of shouting "privilege," someone shouted, "you're only saying that because you're white" or "you just think that because you're a man."  When you put it like that, it sounds like the mean-spirited, shallow, sweeping generalization that it is.)

It is among the most intellectually lazy of fallacious arguments, but more egregiously, it is used to prevent a conversation from happening, rather than having a conversation with someone who has already expressed an interest in hearing different points of view.  It is saying, "I cannot provide (or can't be bothered to provide) an intelligent or reasonable response to what you've said, therefore let's just assume you're horrible and you lose."

In effect, this blanket, targeted exclusion of certain people from even being invited to participate in the discourse demonstrates the very height of hypocrisy in many cases: in response to people being treated unfairly because of their gender, skin color, sexual orientation, etc., those who assert "privilege" as a mechanism of exclusion are, in fact, targeting others to treat them unfairly based solely on their gender, skin color, sexual orientation, etc.

Now some might argue that this is only fair; after all, isn't it just a question of the tables being turned?  That depends on whether you want a discussion or you want revenge.  It depends on whether adding more wrongs to the pile makes the world more fair.

So if you want revenge, and if you think the best way to resolve egregious behavior by others is more egregious behavior on your part, then by all means, accuse someone of having "privilege". It'll make you feel better, because launching personal attacks on others always does.

But if you don't want to be intellectually lazy, avoiding questions to which you may not have good answers, perhaps you could try listening to what the other person has to say, evaluating whether they might actually have something worth thinking about, and providing an adequate and thoughtful response.  Yes, even if you feel like you've said it a thousand times before, maybe you haven't said it to this person before, or maybe there's a better way to say it. This is how civil discourse is supposed to work.  It's supposed to be civil (maybe even free from personal attacks) and to involve discourse (multiple people being permitted to contribute to the discussion).

Yes, it's more effort to explain to another person your different perspective and to elucidate how your own experiences led you to have a different point of view.  Sorry.

Think about it carefully: when you already have someone's ear, and they're receptive to communicating with you on a topic, why would you deliberately shut them out with such a thoughtless accusation?  You have a teaching and a learning opportunity when you engage with another person.  Don't throw it away out of spite.

Postscript: Let's open a betting pool on how long it'll be before someone says, "well this is exactly what I would expect a privileged person to say!"  (rather than actually offer any counterpoint to anything I've said, of course).

Top 5 Signs You Are Not Agile

Many organizations believe they are applying an agile methodology.  Many organizations give lip service to supporting agile development.  But despite this, what actually happens is anything but agile.  Here are some of the major signs I've seen over the years that indicate what is happening in your organization is not agile at all.

5. You have no Product Owner.  Having a Product Owner as an integrated member of the team is critical to success.  Your Product Owner is the person telling you what's the most important thing to work on next, and he or she is also the person providing you feedback during your iteration and during your demos about whether what was produced was what was envisioned.  You cannot alter course if necessary without someone navigating, and that navigator is your Product Owner.  Similarly, if you only saw your Product Owner once, at the beginning of your release cycle, and he or she just dropped off a giant pile of requirements, you are doing Waterfall, not Agile.

4. You have exactly one Release on the horizon.  Nothing is a bigger indicator that you're in a waterfall model, even if it has short bursts of activity, than having a plan for only one Release.  If you haven't built anything until you've built everything, this is a clear sign that your user stories haven't been written properly.  Ideally, you should be delivering some kind of functioning system with every release, even if the functionality is very limited in the first release, but becomes more elaborate with every release that follows.

3. Your release date is fixed.  Your features are also fixed. Everything is the top priority.  If you're in this situation, then your planning meetings have no room for negotiation.  Your velocity, if you know it, doesn't matter, because you have to do everything, and you have to do it by a certain date.  There is no ability to prioritize.  This kind of behavior is often indicative of trust issues coming from the Product Owner or his or her superiors.  They do not trust the development team to work hard, plan judiciously, and deliver incremental value with each Release, or they do not want to be involved enough to evaluate multiple incremental releases.  (See #5.)

2. Someone tells you your velocity, you don't know your velocity, you don't use your velocity in planning, you measure velocity in "real hours," or you use velocity to measure performance.  These are typically symptoms of a team or a management hierarchy that does not understand the concept of velocity.  Velocity is a measure of team throughput, and it does not correlate 1:1 with time, because the team takes into account mitigating factors like risk and uncertainty in addition to effort when producing an estimated size for a story or task.  Because we work as a team, not as individuals, we don't adjust expected velocity up or down by a percentage as team members take vacation, are out sick, or leave or join the team; rather, we wait for the end of the iteration and look at what the velocity actually did, expecting it to average out over time.  It's a measure of reality, not a measure of expectation, and we use that measure to estimate how much we think we can take on in the next iteration.

1. Your iterations are more than two weeks long or you have many tasks with very large estimates.  Fundamental to Agile development is the ability to react to changes and to react to unexpected disparities between estimated effort and actual effort, and these reactions typically will take place on iteration boundaries when planning occurs.  If you only have an opportunity to make a course correction every 3 or 4 weeks, your agility is severely hampered.  Moreover, if you're trying to establish an average velocity-- something that normally requires at least 3-4 data points to establish a trend-- the longer your iterations, the longer it takes in real time before you can begin to make meaningful estimates.  If your iterations are one week long, that means three weeks for a velocity trend.  If your iterations are three weeks long, that means nine weeks before you know your velocity... and many entire releases won't be that long.  (See #3.)

So what do you do if you live in a world of one or more of these signs?  (After you're done wallowing in despair, I mean.)

First you need to have a serious discussion with your management chain.  Feel free to bring this document as a visual aid. I've come to believe that a team can only be successful implementing agile if both the team and its management chain really buy into the idea and support it.  It's easy to give lip service to Agile methodology, but unless actual behavior changes on an organizational level, success will be limited.  You need organizational support.  You also need your own team to self-organize and support the process.  Without it, people will consciously and unconsciously sabotage your efforts.

Secondly, and I suspect somewhat more commonly, you can try to fake out the system.  This is more common when the team wants to implement an Agile methodology, but management doesn't really believe in it.  You can start working on your release early, before a release date is committed, and try to strongly suggest a likely release date based on the velocity that you've already figured out, before the organization itself has actually committed to doing any work.  You can create your own internal releases that will never see the light of day, but which give your team milestones closer to the horizon to work toward.  You can have someone play "proxy" Product Owner if necessary.

The problem with the second approach is that ultimately your team, your Scrum Master, your proxy Product Owner, and your management chain will all feel dissatisfied with it.  Your team feels that its input isn't valued and that it isn't trusted by management.  Your Scrum Master feels he or she isn't supported by management or appreciated by the team.  Your proxy Product Owner feels accountable for the outcome without any real authority to make decisions.  And your management chain gets the impression that Agile is one gigantic uncoordinated mess, but of course, what you did wasn't really particularly Agile in the first place.

Good luck.


Fixing random ssh disconnects on Linux

For a while now I've been combatting persistent, obnoxious, random disconnects on a few Linux hosts at the office.  The main symptom was suddenly getting "Write failed: broken pipe" on the client side, with no indications of anything abnormal about the disconnect on the server side.  It seemed to be happening whether my session was active or not; keeping things busy with "top" or otherwise didn't make any difference.

I made the usual inquiries to see if somehow the machine was exhausting its resources, but it was well below its limits for file descriptors, memory, and TCP memory.  There was nothing abnormal in netstat, interface counters (no ierrs or oerrs), or netstat -s.  The disconnects happened at random times, not correlated with any spikes in CPU or memory consumption.  I went so far as to watch cron, just to see if some misbehaving script could be doing a kill -9 on my shell or my sshd process.  No dice.

So I cranked up the LogLevel in sshd_config to "DEBUG3", and found that for reasons unknown, sshd won't report particularly useful bits of information like "Read error from remote host" and "Connection timed out" unless you have debugging cranked up.  That might be a bit important, OpenSSH guys... maybe "info" would be a better level for that message?

This seemed fairly strange because it was happening while plenty of interaction was in progress.  Certainly my connections were not timing out in any amount of time that seemed normal.

Running tcpdump on both ends of the connection soon revealed something interesting, however: every 5 seconds, there was a TCP packet exchanged between server and client, and the last activity just before the disconnect was several packets at 1-second intervals.

Yes, it's good old TCP KeepAlive, which someone had set up quite aggressively on this particular host.  Our Linux installation has three variables controlling KeepAlives: net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl, and net.ipv4.tcp_keepalive_probes.

If a connection has been idle for at least tcp_keepalive_time seconds, the network stack will send up to tcp_keepalive_probes probes, spaced tcp_keepalive_intvl seconds apart.  If none of those probes is acknowledged, the connection is broken with the error -- you guessed it -- Connection timed out.

On this machine, the keepalive_time was set to 5 seconds, probes was set to 2, and keepalive_intvl was set to 1.  So every 5 seconds of inactivity, the stack would send out up to 2 probes spaced 1 second apart.  And if it didn't hear back from either of those 2 probes within 1 second, then the connection would get killed. You can imagine it wouldn't take very much network congestion to lose 2 packets in 2 seconds, especially if WiFi got involved anywhere along the line.  I suspect being on VPN and connecting to a VM only make it more error-prone.

In the case of this particular host, a quick disconnect is actually desirable, but I think we were a little too aggressive in how quick.  Changing the number of probes from 2 to 5 has allowed the machine to be much more tolerant of short-term, transient network glitches while still failing fast in the event that something really has gone wrong with the network.
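For reference, a sketch of how the inspection and the fix look with sysctl (the probe value of 5 matches what we settled on; where you persist the setting varies by distribution):

```shell
# Inspect the current keepalive settings
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# Raise the probe count so a couple of lost packets don't kill the connection
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5

# Persist the change across reboots
echo 'net.ipv4.tcp_keepalive_probes = 5' | sudo tee -a /etc/sysctl.conf
```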

Modulating and Demodulating Signals in Java

I've been studying up to take my Amateur Radio License exam, and was kind of intrigued to read about the different ways ham operators send digital data.  It got me to thinking that it might be fun to try some modulation/demodulation trickery on my own, just to see what I could do with very little effort.  In the spirit of ham radio, and because the FCC doesn't allow coded transmissions on Amateur Radio (you can modulate data, but you have to make it public how you're doing it), I present a strategy I'm calling Harmonic Relative Amplitude Modulation 8.  I suppose you could call it HRAM8 if you wanted.  I haven't researched to see if anyone else is modulating signals this way; it's entirely possible this isn't even my idea.  But it's a nice way to learn about generating complex waves and demodulating them with Fast Fourier Transforms in Java.

First a few design principles.  I wanted to do something very simple that anyone could implement easily.  I also wanted it to sound pleasing to the ear, having lived through far too many years of annoying fax and modem communications.  I wanted it to be reasonably robust, though it needn't be perfect.  And I wanted to keep the bandwidth under 1kHz, while maximizing throughput.

I came up with a strategy in which I assign a base frequency and base amplitude, and associate the base frequency and its harmonics with bits 0-7 of a byte.  If a bit is set to 1, its amplitude is 4 times the base amplitude, otherwise its amplitude is the base amplitude.
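As a minimal sketch of that mapping (the 110 Hz base frequency, 4095 base amplitude, and 4x multiplier match the constants in the listing below; the class name is my own):

```java
public class HarmonicMap {
	public static void main(String[] args) {
		int baseFrequency = 110;      // Hz; bit n rides on harmonic n+1 of this
		double baseAmplitude = 4095;  // amplitude of a 0 bit
		int bitSetMultiplier = 4;     // a 1 bit is 4x louder
		byte b = (byte) 'A';          // 0b01000001: bits 0 and 6 set

		for (int bit = 0; bit < 8; bit++) {
			boolean set = ((b >> bit) & 1) != 0;
			double amplitude = set ? baseAmplitude * bitSetMultiplier : baseAmplitude;
			System.out.printf("bit %d -> %4d Hz, amplitude %.0f%n",
					bit, baseFrequency * (bit + 1), amplitude);
		}
	}
}
```

So 'A' comes out as loud tones at 110 Hz and 770 Hz riding over quiet tones at the other harmonics, keeping the whole signal inside 1 kHz.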

I opted not to use a clock of any kind, deciding instead to simply observe when bit patterns change.  This proved to be problematic, given words like "mucopolysaccharides" which have repeating characters.  On the first repeat, the "clock" is lost.

In keeping with my design principles, I opted for a simple and easy solution: I added a parity bit.  The parity mode switches between even parity and odd parity with every character; even numbered characters get even parity and odd numbered characters get odd parity.  So even if characters repeat, the bit pattern will change by at least 1 bit every time.  I'm not currently testing the parity bit to report errors; that is yet to come.
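The alternation can be sketched as follows (the helper name is mine; the logic mirrors the setParity expression in the listing below):

```java
public class ParityDemo {
	// Even-indexed characters are padded to even parity, odd-indexed to odd
	// parity, so two identical characters in a row still differ by one bit.
	static boolean parityBit(byte b, boolean evenIndex) {
		boolean onesAreEven = Integer.bitCount(b & 0xFF) % 2 == 0;
		return evenIndex ? !onesAreEven : onesAreEven;
	}

	public static void main(String[] args) {
		byte a = (byte) 'a'; // 0b01100001: three one-bits
		System.out.println(parityBit(a, true));  // true: parity bit set for the first 'a'
		System.out.println(parityBit(a, false)); // false: cleared for the second 'a'
	}
}
```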

Each byte is modulated long enough to ensure proper transformation with FFT, which turns out to be two full periods of the base frequency.  In my present naive demodulation strategy, I simply scroll through the sample data at half the width of a single character, looking for bit patterns to change.  When receiving a random signal off the air, this probably would not be adequate, and a better strategy would need to be devised to be sure demodulation began at the start of a new character.  I'm also thinking of having a "sync character" after every 32-byte frame, so if things go out of sync, they won't stay that way for more than 32 bytes.

I'm using the excellent JTransforms library for Java to do the FFT along with a fairly slow sample rate and small FFT window size to make the FFT computations very quick.  Fortunately the harmonics are spaced out well enough that this relatively low fidelity is adequate.
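The bin arithmetic the demodulator relies on can be sketched as follows (constants from the listing below; the printed interpretation is my own check of the spacing):

```java
public class BinSpacing {
	public static void main(String[] args) {
		int sampleRate = 8000;
		int fftSize = 4096;
		int baseFrequency = 110;

		double hzPerBin = (double) sampleRate / fftSize;   // ~1.95 Hz of resolution per FFT bin
		int multiplier = (int) (baseFrequency / hzPerBin); // bin index of the base frequency

		// Harmonic n+1 lands near bin (n+1) * multiplier, i.e. roughly 56 bins
		// apart -- far more separation than the ~2 Hz resolution requires.
		System.out.println("Hz per bin: " + hzPerBin);
		System.out.println("base frequency bin: " + multiplier);
	}
}
```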

Without further ado, how about some code?

package net.spatula.sandbox;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ShortBuffer;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

import edu.emory.mathcs.jtransforms.fft.FloatFFT_1D;

public class Sandbox {
	private static final int NUMBER_OF_CHANNELS = 1;
	private static final int BITS_PER_SAMPLE = 16;
	private static final int BIT_SET_MULTIPLIER = 4;
	private static final int BITS_PER_WORD = 8;
	private static final int SAMPLE_RATE = 8000; // 44100 if you want a really nice, clean sine wave, but then you must change FFT_SIZE to at least 16384 too
	private static final int BYTES_PER_SAMPLE = 2;
	private static final int BASE_FREQUENCY = 110;
	private static final int FRAME_SIZE = 32;
	private static final double BASE_AMPLITUDE = 4095;
	private static final int FFT_SIZE = 4096; // 16384 if you use 44100 as the sample rate. FFT happens faster with smaller sizes.
	// Reconstructed: this constant was missing from the original listing.  Two
	// full periods of the base frequency per character is my reading of the text.
	private static final int SAMPLES_PER_CHARACTER = 2 * SAMPLE_RATE / BASE_FREQUENCY;

	public void modulate(String string) throws LineUnavailableException, IOException {
		// Reconstructed: the buffer declaration was missing from the original listing.
		byte[] buffer = new byte[FRAME_SIZE * SAMPLES_PER_CHARACTER * BYTES_PER_SAMPLE];
		ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
		ShortBuffer shortBuffer = byteBuffer.asShortBuffer();

		for (int i = 0; i < string.length() && i < FRAME_SIZE; i++) {
			byte byteToModulate = (byte) string.charAt(i);
			modulateByte(shortBuffer, byteToModulate, i % 2 == 0);
		}

		// Reconstructed: play the modulated audio, then demodulate the same
		// buffer as a loopback test.
		playByteArray(buffer);
		demodulateSampleBuffer(shortBuffer);
	}

	private void modulateByte(ShortBuffer shortBuffer, byte byteToModulate, boolean even) {
		for (int sampleCount = 0; sampleCount < SAMPLES_PER_CHARACTER; sampleCount++) {
			double time = (double) sampleCount / (double) SAMPLE_RATE;
			double sampleValue = 0;
			int oneBits = 0;

			// sum the signals for each bit
			for (int bitNumber = 0; bitNumber < BITS_PER_WORD; bitNumber++) {
				boolean bitSet = ((byteToModulate & (byte) (Math.pow(2, bitNumber))) != 0);
				if (bitSet)
					oneBits++;
				sampleValue += Math.sin(2 * Math.PI * BASE_FREQUENCY * (bitNumber + 1) * time) * BASE_AMPLITUDE * (bitSet ? BIT_SET_MULTIPLIER : 1);
			}

			// add in the parity bit
			boolean setParity = ((even && (oneBits % 2 != 0)) || (!even && (oneBits % 2 == 0)));
			sampleValue += Math.sin(2 * Math.PI * BASE_FREQUENCY * (BITS_PER_WORD + 1) * time) * BASE_AMPLITUDE * (setParity ? BIT_SET_MULTIPLIER : 1);

			// average the signals
			sampleValue /= (BITS_PER_WORD + 1);
			shortBuffer.put((short) sampleValue);
		}
	}

	private void playByteArray(byte[] buffer) throws LineUnavailableException, IOException {
		InputStream is = new ByteArrayInputStream(buffer);
		AudioFormat audioFormat = new AudioFormat(SAMPLE_RATE, BITS_PER_SAMPLE, NUMBER_OF_CHANNELS, true, true);
		AudioInputStream ais = new AudioInputStream(is, audioFormat, buffer.length / audioFormat.getFrameSize());
		DataLine.Info dataLineInfo = new DataLine.Info(SourceDataLine.class, audioFormat);
		SourceDataLine sourceDataLine = (SourceDataLine) AudioSystem.getLine(dataLineInfo);

		// Reconstructed: the open/start calls were missing from the original listing.
		sourceDataLine.open(audioFormat);
		sourceDataLine.start();

		byte[] playBuffer = new byte[buffer.length];
		int bytesRead;
		while ((bytesRead = ais.read(playBuffer, 0, playBuffer.length)) != -1) {
			sourceDataLine.write(playBuffer, 0, bytesRead);
		}

		sourceDataLine.drain();
		sourceDataLine.close();
	}

	private void demodulateSampleBuffer(ShortBuffer shortBuffer) {
		DemodulatedCharacter lastChar = null;
		for (int i = 0; i < shortBuffer.capacity(); i += SAMPLES_PER_CHARACTER / 2) {
			DemodulatedCharacter nextChar = demodulateCharacter(shortBuffer, i, SAMPLES_PER_CHARACTER / 2);
			if (!nextChar.equals(lastChar)) {
				System.out.print(nextChar); // Reconstructed: print each newly detected character
				lastChar = nextChar;
			}
		}
	}

	private DemodulatedCharacter demodulateCharacter(ShortBuffer shortBuffer, int offset, int length) {
		float[] floatArray = new float[FFT_SIZE * 2];
		for (int i = offset; i < shortBuffer.capacity() && i < offset + length; i++) {
			floatArray[i - offset] = shortBuffer.get(i);
		}

		FloatFFT_1D fft = new FloatFFT_1D(FFT_SIZE);
		// Reconstructed: the transform call was missing from the original listing.
		fft.realForwardFull(floatArray);

		int multiplier = (int) (BASE_FREQUENCY / ((float) SAMPLE_RATE / (float) FFT_SIZE));

		long maxPower = findMaxPower(floatArray, multiplier);

		int value = 0;
		for (int i = 0; i < BITS_PER_WORD; i++) {
			int index = (i + 1) * multiplier;
			long power = computePowerAtIndex(floatArray, index);
			if (power > (maxPower / (BIT_SET_MULTIPLIER / 2)))
				value += Math.pow(2, i);
		}

		DemodulatedCharacter character = new DemodulatedCharacter();
		character.setData((char) value);

		long parityPower = computePowerAtIndex(floatArray, (BITS_PER_WORD + 1) * multiplier);
		if (parityPower > (maxPower / (BIT_SET_MULTIPLIER / 2)))
			character.setParity(true); // Reconstructed: the parity assignment was missing

		return character;
	}

	private long findMaxPower(float[] floatArray, int multiplier) {
		long maxPower = 0;
		for (int i = 0; i < BITS_PER_WORD; i++) {
			int index = (i + 1) * multiplier;
			long power = computePowerAtIndex(floatArray, index);
			if (power > maxPower)
				maxPower = power;
		}
		return maxPower;
	}

	private long computePowerAtIndex(float[] floatArray, int index) {
		return (long) Math.sqrt(Math.pow(floatArray[index * 2], 2) + Math.pow(floatArray[index * 2 + 1], 2));
	}

	private static class DemodulatedCharacter {
		private char data;
		private boolean parity;

		public char getData() {
			return data;
		}

		public void setData(char data) {
			this.data = data;
		}

		public boolean isParity() {
			return parity;
		}

		public void setParity(boolean parity) {
			this.parity = parity;
		}

		@Override
		public int hashCode() {
			final int prime = 31;
			int result = 1;
			result = prime * result + data;
			result = prime * result + (parity ? 1231 : 1237);
			return result;
		}

		@Override
		public boolean equals(Object obj) {
			if (this == obj)
				return true;
			if (obj == null)
				return false;
			if (getClass() != obj.getClass())
				return false;
			DemodulatedCharacter other = (DemodulatedCharacter) obj;
			if (data != other.data)
				return false;
			if (parity != other.parity)
				return false;
			return true;
		}

		@Override
		public String toString() {
			return String.valueOf(getData());
		}
	}

	/**
	 * @param args
	 * @throws LineUnavailableException
	 * @throws IOException
	 */
	public static void main(String[] args) throws LineUnavailableException, IOException {
		new Sandbox().modulate(args[0]);
	}
}