Random Problem in Your Cloud-hosted App? Try a New Instance!

Still in for this? Good!! Let’s chalk this one up under ‘Time Sunk’.

My Ambience for the Masses app is a Spring / Hibernate / JSP stack, with a couple of other sweet components. I run it on an AWS instance. It’s been purring along just wonderfully for months now. Then, about ten days ago, it just stopped working.

Normally Java apps don’t die without throwing some sort of Exception. But that’s just what was happening. So I stripped out various components – thank you, Dependency Injection pattern! – and found that it would sometimes die instantly (if I was lucky) but usually it took a couple of hours. I don’t have a lot of free time to track down random intermittent bullshit like this, so it took me about a week to boil it down.

It was somewhere in the Current Listener Map – my Shoutcast-listener-tracking geo-positioning statistics-gathering data sculpture back-end engine. I hear that the kids call those things mash-ups. The geo-lookup APIs were the most delicate part, and it seemed to work with them omitted. That red herring aside, it turned out to be the IP address resolution.

    try {
        if (getLog().isTraceEnabled())
            getLog().trace("lookup : " + hostname);

        // oh, lookup!
        InetAddress ipAddress = InetAddress.getByName(hostname);

        // NOTE: this is where it went bad on the AWS image
        //   *sigh*
        return ipAddress.getHostAddress();
    }
    catch (UnknownHostException e) {
    }
    catch (Exception e) {
        getLog().warn("failed to lookup " + hostname, e);
    }

That block was just a couple of lines until I’d added all the logging and desperate Exception handling. The app launches a lot of threads, so tracking down the issue was annoying … but eventually there it was in the traces. The stack just terminated when the app tried to getHostAddress (not during getByName though … must be a lazy-loading thing).
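One thing worth noting about a block like that: `catch (Exception e)` does not see an `Error` (say, an `UnsatisfiedLinkError` or a `StackOverflowError`), so a thread can die without anything reaching the log. Here is a minimal sketch of a more paranoid lookup, assuming plain `System.err` in place of the app's logger. `SafeLookup` and `resolve` are hypothetical names for illustration, not the code from my app.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class SafeLookup {

    /**
     * Resolves a hostname to its IP address, or returns null on any failure.
     * Catches Throwable rather than Exception so that Errors get reported too.
     */
    public static String resolve(String hostname) {
        try {
            InetAddress ipAddress = InetAddress.getByName(hostname);
            return ipAddress.getHostAddress();
        } catch (UnknownHostException e) {
            // an unresolvable name is an expected outcome, not a failure
            return null;
        } catch (Throwable t) {
            System.err.println("failed to lookup " + hostname + " : " + t);
            return null;
        }
    }

    public static void main(String[] args) {
        // last-resort reporting for anything that escapes a thread's run()
        Thread.setDefaultUncaughtExceptionHandler((thread, t) ->
            System.err.println("thread " + thread.getName() + " died : " + t));

        System.out.println(resolve(args.length > 0 ? args[0] : "localhost"));
    }
}
```

None of this helps if the JVM itself falls over inside native code. In that case the place to look is the fatal error log that HotSpot normally writes (`hs_err_pid<pid>.log`), not the application log.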

So I nearly had it all tracked down to that. Then Tomcat inexplicably became unable to find basic JARs in /usr/share/java – I was using Fedora 8’s RPM version vs. raw Apache, and it’s organized real funny-like.

So I threw up my hands and started up a fresh instance of my webapp AWS image. I’d rebooted the existing one, and that hadn’t helped at all. Of course, the issue magically disappeared on the new instance. Did anyone see that coming from a distance? Ya probably did. Cuz it’s ironic. And it’s the title line of the damn blog post.

The hostname resolution is surely a low-level OS thing. Both Linux JavaSE 6 and IcedTea 7 just shat the bed when they got to that point, which is unlikely unless they both leveraged the same lib call. Something must have gone wonk in the virtualization, and apparently a key part of the solution was running it inside of a different farm. I wasted a helluva lot of time to find that out.
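A quick way to test the "same lib call" theory without Tomcat, Spring or anything else in the way is a tiny standalone probe, run once under each JVM on the suspect instance. This is only a sketch under that assumption. `LookupProbe` and its methods are hypothetical names, and the hostnames are whatever you pass on the command line.

```java
import java.net.InetAddress;

public class LookupProbe {

    /** Describes the running JVM, so output from different runtimes can be told apart. */
    public static String describeJvm() {
        return System.getProperty("java.vm.name") + " " + System.getProperty("java.version");
    }

    /** Resolves the hostname and reports the outcome as a single line of text. */
    public static String probe(String hostname) {
        try {
            String address = InetAddress.getByName(hostname).getHostAddress();
            return hostname + " -> " + address;
        } catch (Throwable t) {
            return hostname + " -> FAILED : " + t;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeJvm());
        for (String hostname : args) {
            System.out.println(probe(hostname));
        }
    }
}
```

If the probe dies the same way under both runtimes, the app is off the hook and the instance is the suspect. If it behaves, the problem is more likely somewhere in the container or the app's own threads.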

Today’s Lesson

If weird inexplicable freaky-ass things start happening to your cloud-hosted app, load it up on a new VM sooner rather than later. I’d taken a late-stage backup of the failed instance and assumed it would be corrupted with the Mystery Bug (read as: a waste of time to attempt). But oh no, it worked just great. Next time it’ll be a cinch to just bundle the instance up, image it, and use it to launch a new one.

And ultimately … it wasn’t a bug in my code!!!
