Over the past several months, in my copious free time, I have been working on a project to virtualize an old Windows XP box with a rather curious partition and boot configuration. This grand project involved
Vagrant and chef-solo are a great pair of tools for automating your cloud deployments. Vagrantbox.es provides several Ubuntu images which come pre-baked with Ruby + Chef, and they can save you even more time. However, Vagrant images won’t help you when it comes time to deploy onto a cloud provider such as AWS or DigitalOcean. I am a curious fellow, so instead I chose to go with a bare version of the OS and a bootstrapping script.
Now, I haven’t migrated all of my legacy projects to Ruby 2.x yet, so I begin by installing Ruby 1.9.3. And Ruby versioning is what the rvm toolkit is all about.
Ruby Built from Source
With the rvm toolkit in place, installing your Ruby is as easy as:
Hooray! The only drawback is that it compiles from source.
If this is your intention, I highly recommend the two --auto* flags above; rvm does all the hard dependency work for you. But let’s say that after your Nth from-scratch deployment, you get sick of waiting for Ruby to compile. And, like me, you are as stubborn as you are curious, and you choose to make your bootstrap efficient (instead of going with a pre-baked Chef image).
Wouldn’t it be nice to install a pre-built binary version? Joy of joys, rvm can do that! In my case, Ubuntu consistently comes back with a uname -m of “i686”, and I couldn’t find any ready-to-use binaries via rvm list remote, etc. So I exported my own built-from-source binary by following these helpful directions.
I’ve provided some ENV vars below which are convenient for a shell script like mine which can either
install from source and then export the binary
install the binary after MD5 and SHA-512 checksum validation
What I have omitted are
the target directory for the exported binary ($YOUR_BINARY_DIR)
the checksums from exported binary ($YOUR_MD5, $YOUR_SHA512)
your S3 Bucket name ($YOUR_BUCKET)
manual steps like updating the shell script with the latest checksums, uploading the binary to S3, etc.
Ruby Installed from Binary
With these ENV vars in place,
# convenience ENV vars
# allow rvm to validate your checksum
# (append each checksum entry only if the stamp is not already listed)
grep -q "$RUBY_STAMP" "$RUBY_MD5" || \
echo "$RUBY_BINARY=$YOUR_MD5" >> "$RUBY_MD5"
grep -q "$RUBY_STAMP" "$RUBY_SHA512" || \
echo "$RUBY_BINARY=$YOUR_SHA512" >> "$RUBY_SHA512"
# download it from S3
rvm mount -r $RUBY_BINARY --verify-downloads 1
Voila! So the next step in my bootstrap script is to install Chef in its own rvm gemset.
And if the following hadn’t happened, I’m not sure I could have justified a blog post on the subject:
Building native extensions. This could take a while...
ERROR: Error installing ffi:
ERROR: Failed to build gem native extension.
You see, when we stopped installing Ruby from source, we also stopped installing the dev dependency packages, as posts like this one will happily remind you. Therein lies the magic of --autolibs=enable; and you must now provide your own magic.
As a side note, rvm can also download the Ruby source for you — without the build step — via rvm fetch $RUBY_STAMP. But that isn’t the real problem here.
Filling in the Gaps
Fortunately, you can do all the --autolibs steps manually — once you identify everything it does for you.
By watching the rvm install output, you can identify the package dependencies:
# install dependencies, by hand
apt-get -y install build-essential
apt-get -y install libreadline6-dev zlib1g-dev libssl-dev libyaml-dev # ... and plenty more
Or — even better — you can offer up thanks to the maintainers of rvm and simply run
# install dependencies, per rvm
Tremendous! So everything’s unicorns and rainbows now, right?
Well, apparently not. There’s one more thing that --autolibs did for you; it specified the ENV variables for your C++ compiler(s). If those vars aren’t available, you’ll get a failed command execution that looks something like this:
-E -I. -I/usr/lib64/ruby/1.8/i686-linux -I. # ... and a lot more
“What the hell is this ‘-E’ thing my box is trying to run?”, you ask yourself, loudly, as you pull out your hair in ever-increasing clumps. That, my friends, is the result of a missing ENV var. I’ll spare you all the madness of delving into ruby extconf.rb & mkmf.rb & Makefile. Here’s how simple the fix is:
# compiler vars
Excelsior! Once those vars are in place, you’re off to the races.
And here’s hoping that you don’t run into additional roadblocks of your own. Because technical obstacles can be some of the most soul-sucking time-sinks in the world.
November 9th, 2014 in Development|
Comments Off on Filling In the Gaps after Installing a Binary Ruby with rvm
I’ve been spending a lot of time these days in Node.js working with Promises. Our team has adopted them as our pattern for writing clean & comprehensible asynchronous code. Our library of choice is bluebird, which is — as it claims — a highly performant implementation of the Promises/A+ standard.
Personally, I find code crafted in Promise style to be much more legible and easy to follow than the callback-based equivalent. Those of us who’ve drunk the Kool-Aid can become kinda passionate about it.
I’ll be presenting some flow patterns below which I keep coming back to, for better or for worse, to illustrate the kind of well-intentioned wackiness you may encounter in Promise-influenced projects. I would add that, to date, these patterns have survived the PR review scrutiny of my peers on multiple occasions. Also, they are heavily influenced by the rich feature-set of the bluebird library.
Yes, Promises do introduce a performance hit into your codebase. I don’t have concrete metrics here to quote to you, but our team has decided that it is an acceptable loss considering the benefits that Promises offer. A few of the more esoteric benefits are:
Old-school arguments.length-based optional-arg and vararg call signatures become easier to implement cleanly without the trailing callback Function.
return suddenly starts meaning something again in asynchronous Functions, rather than merely providing control flow.
Your code stops releasing Zalgo all the time — provided that your framework of choice is built with restraining Zalgo in mind (as bluebird is). Also, Zalgo.
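To make the first two points concrete, here is a minimal sketch. The readConfigCb and readConfig functions and their options argument are hypothetical, and the sketch uses the native Promise rather than bluebird so that it runs standalone.

```javascript
// Callback style: the trailing callback forces argument juggling
// whenever the optional `options` argument is left out.
function readConfigCb(path, options, callback) {
  if (typeof options === 'function') {
    callback = options;
    options = {};
  }
  // Stay asynchronous, so the caller never sees a same-tick callback.
  process.nextTick(function () {
    callback(null, { path: path, encoding: options.encoding || 'utf8' });
  });
}

// Promise style: an optional argument is just an optional argument,
// and `return` hands the result back to the caller.
function readConfig(path, options) {
  options = options || {};
  return Promise.resolve({
    path: path,
    encoding: options.encoding || 'utf8'
  });
}
```

Both readConfigCb('a.json', cb) and readConfig('a.json') work without the options argument, but only the callback version needs a typeof check to get there.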
How Many LOCs Do You Want?
Which of the following styles do you find more legible? I’ve gotten into the habit of writing the latter, even though it inflates your line count and visual whitespace. Frankly, I like whitespace.
It even makes the typical first-line indentation quirkiness read pretty well. Especially with a large argument count.
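As a rough illustration of the kind of contrast meant here, the sketch below writes the same chain twice: once compactly, once spread out over more lines. The layouts are an assumption about the two styles, the fetchUser and fetchPosts helpers are hypothetical, and native Promise stands in for bluebird.

```javascript
// Hypothetical helpers that resolve immediately.
function fetchUser(id) {
  return Promise.resolve({ id: id, name: 'user-' + id });
}
function fetchPosts(user) {
  return Promise.resolve([user.name + ': hello']);
}

// Compact: few lines, but each line carries a lot.
function compact(id) {
  return fetchUser(id).then(function (user) { return fetchPosts(user); })
    .then(function (posts) { return posts.length; });
}

// Spread out: one step per block, more whitespace, more lines.
function spacious(id) {
  return fetchUser(id)
  .then(function (user) {
    return fetchPosts(user);
  })
  .then(function (posts) {
    return posts.length;
  });
}
```

Both functions resolve to the same value; the only difference is how the chain is laid out on the page.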
Throw Or Reject
I’ve found it’s a good rule to stick with either a throw or Promise.reject coding style, rather than hopping back & forth between them.
Of course, rules are meant to be broken; don’t compromise your code’s legibility just to accommodate a particular style.
If you’re going to be taking the throw route, it’s best to only do so once you’re wrapped within a Promise. Otherwise, your caller might get a nasty surprise — as they would if your Function suddenly returned null or something equally non-Promise-y.
Ensuring that Errors stay inside the Promise chain allows for much more graceful handling and fewer process.on('uncaughtException', ...);s. This is another place where the overhead of constructing a Promise wrapper to gain code stability is well worth the performance hit.
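Here is a minimal sketch of the difference, assuming a hypothetical port-parsing function and using the native Promise constructor as the wrapper (bluebird is left out so that the sketch runs standalone). A throw outside the wrapper escapes synchronously; the same throw inside it becomes a rejection that the caller can handle in the chain.

```javascript
// Throws before any Promise exists: the caller gets a synchronous
// exception instead of the Promise they were expecting.
function parsePortUnsafe(input) {
  const port = Number(input);
  if (!Number.isInteger(port)) {
    throw new TypeError('port must be an integer: ' + input);
  }
  return Promise.resolve(port);
}

// Same logic, but the throw happens inside the Promise wrapper,
// so it is converted into a rejection of the returned Promise.
function parsePortSafe(input) {
  return new Promise(function (resolve) {
    const port = Number(input);
    if (!Number.isInteger(port)) {
      throw new TypeError('port must be an integer: ' + input);
    }
    resolve(port);
  });
}
```

With parsePortSafe, a caller can write parsePortSafe(value).then(use).catch(report) and never needs a surrounding try / catch.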
Deferred vs. new Promise
The bluebird library provides a very simple convenience method to create a ‘deferred’ Promise utility Object.
I find code that uses a ‘deferred’ Object to read better than code that uses the Promise constructor equivalent. I feel the same way when writing un-contrived code which actually needs to respect Error conditions.
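For comparison, here is a sketch of the two shapes side by side. The defer() helper below is a hand-rolled stand-in built on the native Promise, not bluebird's own method, and the delay functions are hypothetical.

```javascript
// Hand-rolled 'deferred': exposes the Promise together with its
// resolve / reject functions on one plain Object.
function defer() {
  const deferred = {};
  deferred.promise = new Promise(function (resolve, reject) {
    deferred.resolve = resolve;
    deferred.reject = reject;
  });
  return deferred;
}

// 'deferred' style: flat, and reads top to bottom.
function delayDeferred(ms, value) {
  const deferred = defer();
  setTimeout(function () {
    deferred.resolve(value);
  }, ms);
  return deferred.promise;
}

// Constructor style: the same behavior, nested one level deeper.
function delayConstructed(ms, value) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(value);
    }, ms);
  });
}
```

One trade-off worth noting: a throw inside the constructor callback is turned into a rejection automatically, whereas the 'deferred' style needs an explicit deferred.reject(err) call.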
Finally … here’s a lovely little pattern — or anti-pattern? — made possible by Object identity comparison.
The objective is to short-circuit the Promise execution chain. Early return methods often go hand-in-hand with carrying something forward from earlier in the Promise chain, hence the scopedResult.
There’s obtuseness sneaking in here, even with a simplified example using informative variable names. Admittedly, an early return Function is easier to write as pure CPS, or using an async support library. It’s also possible to omit the EARLY_RETURN by using Promise sub-chains, but you can end up with indentation hell all over again.
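A minimal sketch of the idea, under some assumptions: EARLY_RETURN and scopedResult are the names used above, while the lookup function, its cache and its slowFetch argument are hypothetical, and native Promise stands in for bluebird. The sentinel is thrown to skip the rest of the chain, then recognized by identity so that genuine Errors still propagate.

```javascript
// Sentinel Object; it is only ever compared by identity (===).
const EARLY_RETURN = new Error('EARLY_RETURN');

function lookup(cache, key, slowFetch) {
  let scopedResult; // carried forward across the steps of the chain

  return Promise.resolve()
    .then(function () {
      scopedResult = cache.get(key);
      if (scopedResult !== undefined) {
        throw EARLY_RETURN; // cache hit: skip the slow path entirely
      }
      return slowFetch(key);
    })
    .then(function (fetched) {
      cache.set(key, fetched);
      scopedResult = fetched;
    })
    .catch(function (err) {
      if (err !== EARLY_RETURN) {
        throw err; // a real failure, so let it propagate
      }
    })
    .then(function () {
      return scopedResult;
    });
}
```

Calling lookup(cache, 'a', slowFetch) with a Map that already holds 'a' resolves to the cached value without ever invoking slowFetch.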
I’d Say That’s Plenty
No more. I promise.
November 9th, 2014 in Development|
Comments Off on Life in the Promise Land
At some point in the afternoon of Tue 2013-Dec-31 (as I remember it), one of my site health checks started firing off. WordPress was responding with an HTTP 500 and partial document body, ending with:
SSL connect error
I use the embed-github-gist plugin to display my code-snippet gists inline in my blog posts. It fetches them on the server side — thank goddess I cache my blog posts :) — and apparently at some point on Tuesday, GitHub had switched something over in its SSL termination. Until I fix this, my blog is kinda toast.
I had to add a One Line Fix to WordPress’ WP_Http component so that it would use the proper SSL Cipher. Scroll down towards the bottom of this post to see the code-snippet.
The rest of this post recounts the twists & turns I took in coming to that final solution, a process which may or may not be of use to others. It certainly was a welcome catharsis to write it all down.
What’s up with WordPress?
My /etc/php.ini has log_errors = On, and the errors end up in my nginx error log. I was seeing:
2014/01/04 20:12:00 [error] 935#0: *95058410 FastCGI sent in stderr: "PHP Fatal error: Cannot use object of type WP_Error as array in ~/wordpress/wp-content/plugins/embed-github-gist/embed-github-gist.php on line 86" while reading upstream
Cool, that’s useful. So, I read the source code and mock up a URL to try out in curl. It works fine here on my laptop, but on my AWS instance, I get:
% curl -vvv https://api.github.com/gists/2166671?sslverify=false
* About to connect() to api.github.com port 443 (#0)
* Trying 126.96.36.199... connected
* Connected to api.github.com (188.8.131.52) port 443 (#0)
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* NSS error -12286
Well, that sslverify=false isn’t helping. Nor does using curl with -k/--insecure. So, what’s up with ‘NSS error -12286’? This post states that it’s SSL_ERROR_NO_CYPHER_OVERLAP.
I figured that I need to get some better tooling to identify my core issue with GitHub and TLS. Via this post I track down a tool called gnutls-cli:
% yum install gnutls gnutls-utils
% gnutls-cli -p 443 api.github.com
- Certificate info:
# The hostname in the certificate matches 'api.github.com'.
# valid since: Sun Apr 29 20:00:00 EDT 2012
# expires at: Wed Jul 9 08:00:00 EDT 2014
# fingerprint: 55:D8:B2:AC:FA:96:DF:AF:85:32:1C:0F:B2:5A:96:1D
# Subject's DN: C=US,ST=California,L=San Francisco,O=GitHub\, Inc.,CN=*.github.com
# Issuer's DN: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert High Assurance CA-3
- Certificate info:
# valid since: Wed Apr 2 08:00:00 EDT 2008
# expires at: Sat Apr 2 20:00:00 EDT 2022
# fingerprint: C6:8B:99:30:C8:57:8D:41:6F:8C:09:4E:6A:DB:0C:90
# Subject's DN: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert High Assurance CA-3
# Issuer's DN: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert High Assurance EV Root CA
- Peer's certificate issuer is unknown
- Peer's certificate is NOT trusted
- Version: TLS 1.1
- Key Exchange: RSA
- Cipher: AES 128 CBC
- MAC: SHA
Now, it turns out that the key word here is “cypher”. But since I saw “certificate is NOT trusted”, I went down my first Rabbit Hole instead.
Rabbit Hole: New Certificate Authority
Obviously I needed to update my CA Bundle within OpenSSL, right? The bundle is installed at /etc/pki/tls/certs/ca-bundle.crt per the curl response above. I’m guessing it’s probably quite old, since I’m still running Fedora 8 (as justified later). I’ll probably need to find an alternate source.
Looking at the directory hierarchy, /etc/pki/tls/openssl.cnf gives me a clue that it’s an OpenSSL thing. Moreover, this post assures me that “The ca-bundle.crt is in the openssl rpm.” Sweet.
However, I already have the latest (0.9.8b-17) and this RPM listing doesn’t give me anything more to go on. I suspect I could build a 1.x version of OpenSSL, but that may cause interoperability issues. Crap.
I find a Fedora 19 version of the OpenSSL RPM and pull it down. This post informs me that I need to use rpm2cpio to unpack an RPM as if it were a normal archive. Ultimately though, the resulting ca-bundle.crt does not help. Crap.
So, I guess it’s time to re-build the CA Bundle from scratch, right? Well, let’s see. Curl’s Details on Server SSL Certificates suggests “Get a better/different/newer CA cert bundle!” Rather than do it through a source download of Firefox, I went with the CA Extract approach, and downloaded a new cacert.pem. That too didn’t help. Crap.
I check with the OpenSSL FAQ: How can I set up a bundle of commercial root CA certificates? I can generate them from Mozilla’s current master list using this convenient little script. That’ll provide me with the latest full CA Certificate List, so that’ll totally work! Pull down the file, modify the script a bit, and voila, right?
No, of course it won’t. Because this is a Rabbit Hole. I even pulled down the offending certificate from DigiCert and tried to use it in isolation. gnutls-cli offers a way for me to extract the public key into a PEM, but that didn’t work, so I viewed the GitHub SSL certificate info in Google Chrome, then did some manual typing (since URL cut-and-paste was prevented for some absurd reason):
Somehow, no matter what I do with the CA Bundle, it’s not helping. Also, something seems fishy, because the configuration in /etc/pki/tls/openssl.cnf doesn’t seem to match reality.
So it turns out that this is not an OpenSSL issue — it’s an NSS issue.
Perhaps I should have paid more attention to the prefixing of the curl error response:
* NSS error -12286
Also, when looking around for clues about CA Bundles, root certificates and the like, I came across this Mozilla wiki which states: “Fedora: nss-tools”. More hints. Also, waaay down on the curl man page, it suggests “If curl is built against the NSS SSL library …” So there’s that.
A quick review of Fedora 8’s curl package reveals that (okay, fine) yes, it’s an NSS problem.
Now, at this point I’m still chasing the wild CA Bundles goose, and this Fedora wiki describes the ‘Shared System Certificates’ Feature which promises to “Make NSS, GnuTLS, OpenSSL and Java share a default source for retrieving system certificate anchors and black list information.” However, it is unimplemented.
Fortunately, by now it’s dawning on me that this is probably not a CA Bundle issue, but rather an SSL Cipher issue. Because, you know, “SSL_ERROR_NO_CYPHER_OVERLAP”.
Rabbit Hole: NSS-Tools
So, obviously what I need here is nss-tools, right? They’re already installed as a standard package, so let’s see what they offer me.
certutil? Nope; again, this is not a CA Bundle issue.
Well, I can’t find any command-line tool to help me manage SSL Ciphers. It looks like there’s just this pre-baked-in set available to me. Under Available ciphers are:, I see “TLS_RSA_WITH_AES_128_CBC_SHA”, otherwise known as rsa_aes_128_sha. And gnutls-cli told me:
Yet even after I create ~nginx/.curlrc for my nginx daemon user, I’m still having issues. This post on libcurl bursts my balloon when it states “the entire .curlrc parser and all associated logic is only present in the command line tool code.” Crap.
Still. I’m clearly closing in on a solution, wouldn’t you think?
Perhaps WP_Http has a built-in configuration mechanism to provide curl-related options? A scan through the source code in wordpress/wp-includes/class-http.php doesn’t show any generalized uses of curl_setopt, so it appears not. Since I can’t patch embed-github-gist, it seems I’ll have to patch WordPress itself.
A look through PHP’s libcurl integration reveals that I want the “CURLOPT_SSL_CIPHER_LIST” flag. So here it is, the One Line Fix, with 3x as many comment lines as code:
No generalization, no resulting pull request, no nothin’. Just the minimum I need to fix the problem, isolated as close to the broken component itself as possible. Yay!
Ultimately, I deployed the fix on the night of Sat 2014-Jan-04, after throwing back all of the red herrings that I netted along the way.
Hell no, I am not proud of how much time & effort I sank into this so-called trivial change. Granted, I didn’t start the process until after New Year’s Day, and I only had time to spelunk at night after I got home from ‘real’ work. But still … yow!
I’ll freely admit that my deduction techniques were flawed. The Rabbit Hole you’ve been in is something that you only see after coming up from missing that left turn at Albuquerque. This blog post documents the adversity, not the victory.
The OS on my AWS instance is kind of ancient: Fedora 8. Back in 2007 that was reasonable, but it has long since been end-of-lifed. Even so, my preference was to stay with pre-ordained packages and not completely upgrade my server at this point in time, which left me working with ancient code. This was a choice that perhaps I should have re-considered, but it seemed a prudent choice in-the-moment.
Ultimately, this ordeal seemed to end up following the typical ‘One Line Fix’ pattern that Developers run into when shit breaks: a lot of research, trial & error, leading to a tiny little patch in just the right place. A ton of thanks goes out to the search-indexed legacy posts and wikis of others, which lit the way even in my darkest hours.
Of course, in retrospect, the One Line Fix seems like the easiest little thing, doesn’t it? But as we all know, getting there is 95% of the battle. And a battle it was. A struggle was had, adversity overcome, and a victory wrested away from the jaws of defeat!
January 5th, 2014 in Uncategorized|
Comments Off on one line fix: WordPress and GitHub’s SSL Cipher
this weekend turned out to be a rather odd mix of side-projects and technical chaos. and just to preface it — this is not a boastful blog entry. everything i did in the technical realm was either (a) a simple fix or (b) being helpful — nothing to brag about. it’s the circumstances that make it something i’d like to put down on record :)
so, Friday night i was stitching together the last parts of my Burning Man coat. it’s made of fur, and ridiculous by design. i’m adding some needed collar reinforcement, when suddenly i start getting Prowl notifications. my health checks are failing. “ah, crap, not again,” says the guy who’s used to running a totally-non-critical app platform in the AWS cloud, “i’ll get to it after i’ve finished sewing buffalo teeth into the collar.” so i did. my instance’s CPU appeared to be spiked — i could hit it with ssh, but the connection would time out. a reboot signal resolved the issue (after an overnight wait). and it was thus that i fell victim, like so many others, to Amazon’s ThunderCloudPocalypse 2012. and the secret bonus was that one of my EBS volumes was stuck in attaching state. “ah, crap, not again,” says the guy who’s gonna lose some data (because he has backup scripts for Capistrano but no automation for them yet), and i’m forced to spin up a new volume from a month-old snapshot. no worries – it wasn’t my MySQL / MongoDB volume, just the one for my blog & wiki & logs. i got that up and running on Saturday in-between rehearsing scenes for The Princess Bride (coming to The Dark Room in August 2012 !!)
then i was immediately off to rehearsal for my Dinner Detective show that night. yeah, it was one of those kind of Saturdays. so, i was sitting there waiting for my cue, when at about 5pm PDT, failure txts suddenly start raining down from work. and from multiple servers that have no reason to have load problems. i log into our Engineering channel via the HipChat iPhone app, and our DevOps hero is already on the case. ElasticSearch has pegged the CPU on its server, and JIRA & Confluence are going nuts as well. something’s suddenly up with our Java-based services. i ask him to check on Jenkins, and sure enough, it’s pegged too. and no one’s pushed anything to build. he goes off to reboot services and experiment, and i go off to check Twitter to see if we’re the only ones experiencing it. sudden JVM failures distributed across independent servers? that’s unlikely. he guesses it’s a problem with date calculation, and he was absolutely right. hello leap-second, the one added at midnight GMT July 1st 2012. i RT:d a few good informative posts to get the word out — what else can i do, i’m at rehearsal and on my phone! — and then let DevOps know. we’re able to bring down search for a while, and it turns out rebooting the servers solves the problem (even without disabling ntpd, as other folks recommended). so, disaster averted thanks to Nagios alerts, a bit of heroic effort, and our architect’s choice of a heavily Ruby-based platform stack
again, as i prefaced; nothing impressive. no Rockstar Ninja moves. no brilliant deductions or deep insightful inspections. neither lives nor fortunes were saved. and i got to wake up on Sunday, do laundry, pay my bills, and go out dancing to Silent Frisco for the later hours of the afternoon. but it was fun to have been caught up in two different reminders of how fragile our amazing modern software is, and how the simplest unexpected things (storms in Virginia, and Earth’s pesky orbital rotation) can have such sudden, pervasive, quake-like impacts on it
July 1st, 2012 in Development|
Comments Off on a weekend of craft, theatre, and technical meltdowns