Archive for the ‘Development’ Category

Filling In the Gaps after Installing a Binary Ruby with rvm

Sunday, November 9th, 2014

Vagrant and chef-solo are a great pair of tools for automating your cloud deployments. Vagrantbox.es provides several Ubuntu images which come pre-baked with Ruby + Chef, and they can save you even more time. However, Vagrant images won’t help you when it comes time to deploy onto a cloud provider such as AWS or DigitalOcean. I am a curious fellow, so instead I chose to go with a bare version of the OS and a bootstrapping script.

Now, I haven’t migrated all of my legacy projects to Ruby 2.x yet, so I begin by installing Ruby 1.9.3. And Ruby versioning is what the rvm toolkit is all about.

Ruby Built from Source

With the rvm toolkit in place, installing your Ruby is as easy as:

rvm install ruby-1.9.3 --autolibs=enable --auto-dotfiles

Hooray! The only drawback is that it compiles from source.

If this is your intention, I highly recommend the two --auto* flags above; rvm does all the hard dependency work for you. But let’s say that after your Nth from-scratch deployment, you get sick of waiting for Ruby to compile. And, like me, you are as stubborn as you are curious, and you choose to make your bootstrap efficient (instead of going with a pre-baked Chef image).

Wouldn’t it be nice to install a pre-built binary version? Joy of joys, rvm can do that! In my case, Ubuntu consistently comes back with a uname -m of “i686”, and I couldn’t find any ready-to-use binaries via rvm list remote, etc. So I exported my own built-from-source binary by following these helpful directions.

I’ve provided some ENV vars below which are convenient for a shell script like mine which can either

  • install from source and then export the binary
  • install the binary after MD5 and SHA-512 checksum validation

What I have omitted are

  • the target directory for the exported binary ($YOUR_BINARY_DIR)
  • the checksums from exported binary ($YOUR_MD5, $YOUR_SHA512)
  • your S3 Bucket name ($YOUR_BUCKET)
  • manual steps like updating the shell script with the latest checksums, uploading the binary to S3, etc.

Ruby Installed from Binary

With these ENV vars in place,

# convenience ENV vars
RUBY_VERSION=1.9.3
RUBY_STAMP=ruby-$RUBY_VERSION-p547
RUBY_BINARY=https://s3.amazonaws.com/$YOUR_BUCKET/$RUBY_STAMP.tar.bz2
RUBY_MD5=/usr/local/rvm/user/md5
RUBY_SHA512=/usr/local/rvm/user/sha512

The build-from-source steps go like this

# build from source
rvm install ruby-1.9.3 --autolibs=enable --auto-dotfiles

# export $RUBY_STAMP.tar.bz2
pushd .
cd $YOUR_BINARY_DIR
rvm prepare $RUBY_STAMP
md5sum $RUBY_STAMP.tar.bz2 > $RUBY_STAMP.checksums
sha512sum $RUBY_STAMP.tar.bz2 >> $RUBY_STAMP.checksums
popd

And the install-from-binary steps go like this

# allow rvm to validate your checksum
grep -qs "$RUBY_STAMP" "$RUBY_MD5" || \
    echo "$RUBY_BINARY=$YOUR_MD5" >> "$RUBY_MD5"
grep -qs "$RUBY_STAMP" "$RUBY_SHA512" || \
    echo "$RUBY_BINARY=$YOUR_SHA512" >> "$RUBY_SHA512"

# download it from S3
rvm mount -r $RUBY_BINARY --verify-downloads 1

Voila! So the next step in my bootstrap script is to install Chef in its own rvm gemset.

And if the following hadn’t happened, I’m not sure I could have justified a blog post on the subject:

Building native extensions.  This could take a while...
ERROR:  Error installing ffi:
ERROR: Failed to build gem native extension.

You see, when we stopped installing Ruby from source, we also stopped installing the dev dependency packages, as posts like this one will happily remind you. Therein lies the magic of --autolibs=enable; and you must now provide your own magic.

As a side note, rvm can also download the Ruby source for you — without the build step — via rvm fetch $RUBY_STAMP. But that isn’t the real problem here.

Filling in the Gaps

Fortunately, you can do all the --autolibs steps manually — once you identify everything it does for you.

By watching the rvm install output, you can identify the package dependencies:

# install dependencies, by hand
apt-get -y install build-essential
apt-get -y install libreadline6-dev zlib1g-dev libssl-dev libyaml-dev # ... and plenty more

Or — even better — you can offer up thanks to the maintainers of rvm and simply run

# install dependencies, per rvm
rvm requirements

Tremendous! So everything’s unicorns and rainbows now, right?

Well, apparently not. There’s one more thing that --autolibs did for you: it specified the ENV variables for your C and C++ compilers. If those vars aren’t available, you’ll get a failed command execution that looks something like this:

-E -I. -I/usr/lib64/ruby/1.8/i686-linux -I. # ... and a lot more

“What the hell is this ‘-E’ thing my box is trying to run?”, you ask yourself, loudly, as you pull out your hair in ever-increasing clumps. That, my friends, is the result of a missing ENV var. I’ll spare you all the madness of delving into ruby extconf.rb & mkmf.rb & Makefile. Here’s how simple the fix is:

# compiler vars
export CC=gcc
export CXX=g++

Excelsior! Once those vars are in place, you’re off to the races.

And here’s hoping that you don’t run into additional roadblocks of your own. Because technical obstacles can be some of the most soul-sucking time-sinks in the world.

Life in the Promise Land

Sunday, November 9th, 2014

I’ve been spending a lot of time these days in Node.js working with Promises. Our team has adopted them as our pattern for writing clean & comprehensible asynchronous code. Our library of choice is bluebird, which is — as it claims — a highly performant implementation of the Promises/A+ standard.

There’s plenty of debate in the JavaScript community as to the value of Promises. Continuation Passing Style is the de-facto standard, but it leads to strange coding artifacts — indentation pyramids, named Function chaining, etc. This can lead to some rather unfortunate and hard-to-read code. Producing grokable code is critical to honoring team-members and long-term module maintainability.

Personally, I find code crafted in Promise style to be much more legible and easy to follow than the callback-based equivalent. Those of us who’ve drunk the Kool-Aid can become kinda passionate about it.

I’ll be presenting some flow patterns below which I keep coming back to (for better or for worse) to illustrate the kind of well-intentioned wackiness you may encounter in Promise-influenced projects. I would add that, to date, these patterns have survived the PR review scrutiny of my peers on multiple occasions. Also, they are heavily influenced by the rich feature-set of the bluebird library.

Performance Anxiety

Yes, Promises do introduce a performance hit into your codebase. I don’t have concrete metrics here to quote to you, but our team has decided that it is an acceptable loss considering the benefits that Promises offer. A few of the more esoteric benefits are:

  • Old-school arguments.length-based optional-arg and vararg call signatures become easier to implement cleanly without the trailing callback Function.
  • return suddenly starts meaning something again in asynchronous Functions, rather than merely providing control flow.
  • Your code stops releasing Zalgo all the time — provided that your framework of choice is built with restraining Zalgo in mind (as bluebird is). Also, Zalgo.
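To make the Zalgo point concrete, here is a minimal sketch using native Promises rather than bluebird. The names `cache`, `fetchWithCallback` and `fetchWithPromise` are hypothetical, invented for this illustration. The callback version calls back synchronously on a cache hit and asynchronously on a miss, so the caller cannot predict the ordering; the Promise version always defers its `.then` handlers.

```javascript
// Hypothetical in-memory cache, for illustration only.
var cache = { answer: 42 };

// Zalgo-prone: calls back synchronously on a cache hit,
// asynchronously on a miss.
function fetchWithCallback(key, callback) {
    if (key in cache) {
        return callback(null, cache[key]);      // synchronous
    }
    setTimeout(function() {
        callback(null, undefined);              // asynchronous
    }, 0);
}

// Promise version: `.then` handlers never run synchronously,
// whether or not the value was already available.
function fetchWithPromise(key) {
    return Promise.resolve(cache[key]);
}

var order = [];

fetchWithCallback('answer', function() {
    order.push('callback');
});
order.push('after callback call');

var done = fetchWithPromise('answer').then(function() {
    order.push('then');
});
order.push('after promise call');
```

On a cache hit the callback runs before the statement that follows the call, while the `.then` handler runs only after the current stack has unwound.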

How Many LOCs Do You Want?

Which of the following styles do you find more legible? I’ve gotten into the habit of writing the latter, even though it inflates your line count and visual whitespace. Frankly, I like whitespace.

/**
 * fewer lines, sure ...
 */
function _sameLineStyle() {
    return example.firstMethod().then(function() {
        return example.secondMethod();
    }).then(function() {
        return example.thirdMethod();
    });
}

/**
 * yet i prefer this
 *   what a subtle change a little \n can make
 */
function _newLineStyle() {
    return example.firstMethod()
    .then(function() {
        return example.secondMethod();
    })
    .then(function() {
        return example.thirdMethod();
    });
}

It even makes the typical first-line indentation quirkiness read pretty well. Especially with a large argument count.

Throw Or Reject

I’ve found it’s a good rule to stick with either a throw or Promise.reject coding style, rather than hopping back & forth between them.

/**
 * be careful to return a Promise from every exiting branch
 */
function mayReject(may) {
    if (may || (may === undefined)) {
        return Promise.reject(new Error('I REJECT'));
    }
    return Promise.resolve(true);
}

/**
 * or add a candy-coated Promise shell,
 *   and write it more like you're used to
 */
var mayThrow = Promise.method(function(may) {
    if (may || (may === undefined)) {
        throw new Error('I REJECT');
    }
    return true;
});

Of course, rules are meant to be broken; don’t compromise your code’s legibility just to accommodate a particular style.

If you’re going to be taking the throw route, it’s best to only do so once you’re wrapped within a Promise. Otherwise, your caller might get a nasty surprise — as they would if your Function suddenly returned null or something equally non-Promise-y.

Ensuring that Errors stay inside the Promise chain allows for much more graceful handling and fewer process.on('uncaughtException', ...);s. This is another place where the overhead of constructing a Promise wrapper to gain code stability is well worth the performance hit.
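As a rough illustration of what a wrapper in the spirit of Promise.method buys you, here is a sketch built on native Promises. The helper name `method` is an assumption made for this example and is not bluebird’s implementation; it only shows a synchronous throw being converted into a rejection.

```javascript
// Wrap a function so that it always returns a Promise:
// a returned value resolves, a synchronous throw rejects.
function method(fn) {
    return function() {
        var self = this;
        var args = arguments;
        return new Promise(function(resolve) {
            // a throw inside the executor rejects the Promise
            resolve(fn.apply(self, args));
        });
    };
}

var mayThrow = method(function(may) {
    if (may || (may === undefined)) {
        throw new Error('I REJECT');
    }
    return true;
});

// the caller never sees a synchronous exception
var rejected = mayThrow(true).catch(function(err) {
    return err.message;     // handled within the chain
});
var resolved = mayThrow(false);
```

Because the throw happens inside the Promise executor, the caller always gets a Promise back and can handle the failure with `.catch`.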

Deferred vs. new Promise

The bluebird library provides a very simple convenience method to create a ‘deferred’ Promise utility Object.

function fooAndBarInParallel() {
    // boiler-plate.  meh
    var constructed = new Promise(function(resolve, reject) {
        emitter.once('foo', function(food) {
            resolve(food);
        });
    });

    // such clean.  so code
    var deferred = Promise.defer();
    emitter.once('bar', function(barred) {
        deferred.resolve(barred);
    });

    return Promise.all([
        constructed,
        deferred.promise,
    ]);
}

I find code that uses a ‘deferred’ Object to read better than code that uses the Promise constructor equivalent. I feel the same way when writing un-contrived code which actually needs to respect Error conditions.
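For readers on a Promise library without such a helper, a ‘deferred’ can be sketched in a few lines on top of the Promise constructor. The `defer` helper and the emitter wiring below are illustrative assumptions, not bluebird’s implementation.

```javascript
var EventEmitter = require('events').EventEmitter;

// Expose resolve / reject alongside the Promise they control.
function defer() {
    var deferred = {};
    deferred.promise = new Promise(function(resolve, reject) {
        deferred.resolve = resolve;
        deferred.reject = reject;
    });
    return deferred;
}

var emitter = new EventEmitter();
var deferred = defer();

emitter.once('bar', function(barred) {
    deferred.resolve(barred);
});
// respect the Error condition as well
emitter.once('error', function(err) {
    deferred.reject(err);
});

emitter.emit('bar', 'barred!');
```

The executor passed to the constructor runs synchronously, so `resolve` and `reject` are already attached by the time `defer()` returns.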

Early Return

Finally … here’s a lovely little pattern — or anti-pattern? — made possible by Object identity comparison.

// const ...
var EARLY_RETURN = {};  // any unique Object will do; only its identity matters

function _getSomething(key, data) {
    var scopedResult;

    return cache.fetchSomething(key)
    .then(function(_resolved) {
        scopedResult = _resolved;
        if (scopedResult) {
            // no need to create what we can fetch
            throw EARLY_RETURN;
        }

        return _createSomething(data);
    })
    // ...
    .then(function(_resolved) {
        scopedResult = _resolved;

        return cache.putSomething(key, _resolved);
    })
    .catch(function(err) {
        if (err !== EARLY_RETURN) {
            // oh, that's an Error fo real
            throw err;
        }
    })
    .then(function() {
        return scopedResult;
    });
}

The objective is to short-circuit the Promise execution chain. Early return methods often go hand-in-hand with carrying something forward from earlier in the Promise chain, hence the scopedResult.

There’s obtuseness sneaking in here, even with a simplified example using informative variable names. Admittedly, an early return Function is easier to write as pure CPS, or using an async support library. It’s also possible to omit the EARLY_RETURN by using Promise sub-chains, but you can end up with indentation hell all over again.
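The sub-chain alternative might look like the following sketch. The in-memory `cache` and the `_createSomething` stub are hypothetical stand-ins so that the example runs on its own with native Promises.

```javascript
// Hypothetical stand-ins so the sketch is runnable.
var store = {};
var cache = {
    fetchSomething: function(key) {
        return Promise.resolve(store[key]);
    },
    putSomething: function(key, value) {
        store[key] = value;
        return Promise.resolve();
    },
};
function _createSomething(data) {
    return Promise.resolve({ created: data });
}

function _getSomething(key, data) {
    return cache.fetchSomething(key)
    .then(function(fetched) {
        if (fetched) {
            // early return: nothing further in this branch
            return fetched;
        }

        // the sub-chain, nested one level deeper
        return _createSomething(data)
        .then(function(created) {
            return cache.putSomething(key, created)
            .then(function() {
                return created;
            });
        });
    });
}

var first = _getSomething('k', 'payload');
var second = first.then(function() {
    // already cached, so the data argument is ignored
    return _getSomething('k', 'other');
});
```

No sentinel and no scoped variable are needed, but even this small example nests three levels deep.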

Clearly, YMMV.

I’d Say That’s Plenty

No more. I promise.

a weekend of craft, theatre, and technical meltdowns

Sunday, July 1st, 2012

this weekend turned out to be a rather odd mix of side-projects and technical chaos.  and just to preface it — this is not a boastful blog entry.  everything i did in the technical realm was either (a) a simple fix or (b) being helpful — nothing to brag about.  it’s the circumstances that make it something i’d like to put down on record :)

so, Friday night i was stitching together the last parts of my Burning Man coat.  it’s made of fur, and ridiculous by design.  i’m adding some needed collar reinforcement, when suddenly i start getting Prowl notifications.  my health checks are failing.  “ah, crap, not again,” says the guy who’s used to running a totally-non-critical app platform in the AWS cloud, “i’ll get to it after i’ve finished sewing buffalo teeth into the collar.”  so i did.  my instance’s CPU appeared to be spiked; i could hit it with ssh, but the connection would time out.  a reboot signal resolved the issue (after an overnight wait).  and it was thus that i fell victim, like so many others, to Amazon’s ThunderCloudPocalypse 2012.  and the secret bonus was that one of my EBS volumes was stuck in attaching state.  “ah, crap, not again,” says the guy who’s gonna lose some data (because he has backup scripts for Capistrano but no automation for them yet), and i’m forced to spin up a new volume from a month-old snapshot.  no worries – it wasn’t my MySQL / MongoDB volume, just the one for my blog & wiki & logs.  i got that up and running on Saturday in-between rehearsing scenes for The Princess Bride (coming to The Dark Room in August 2012 !!)

then i was immediately off to rehearsal for my Dinner Detective show that night.  yeah, it was one of those kind of Saturdays.  so, i was sitting there waiting for my cue, when at about 5pm PDT, failure txts suddenly start raining down from work.  and from multiple servers that have no reason to have load problems.  i log into our Engineering channel via the HipChat iPhone app, and our DevOps hero is already on the case.  ElasticSearch has pegged the CPU on its server, and JIRA & Confluence are going nuts as well.  something’s suddenly up with our Java-based services.  i ask him to check on Jenkins, and sure enough, it’s pegged too.  and no one’s pushed anything to build.  he goes off to reboot services and experiment, and i go off to check Twitter to see if we’re the only ones experiencing it.  sudden JVM failures distributed across independent servers?  that’s unlikely.  he guesses it’s a problem with date calculation, and he was absolutely right.  hello leap-second, the one added at midnight GMT July 1st 2012.  i RT:d a few good informative posts to get the word out — what else can i do, i’m at rehearsal and on my phone! — and then let DevOps know.  we’re able to bring down search for a while, and it turns out rebooting the servers solves the problem (even without disabling ntpd, as other folks recommended).  so, disaster averted thanks to Nagios alerts, a bit of heroic effort, and our architect’s choice of a heavily Ruby-based platform stack

again, as i prefaced; nothing impressive.  no Rockstar Ninja moves.  no brilliant deductions or deep insightful inspections.  neither lives nor fortunes were saved.  and i got to wake up on Sunday, do laundry, pay my bills, and go out dancing to Silent Frisco for the later hours of the afternoon.  but it was fun to have been caught up in two different reminders of how fragile our amazing modern software is, and how the simplest unexpected things (storms in Virginia, and Earth’s pesky rotation) can have such sudden, pervasive, quake-like impacts on it

delayedCallback(function(){ … }, delay);

Thursday, March 22nd, 2012

hola, Amigos! it’s been a long time since I rapped at ya!

so, i’ve been doing plenty, i’m just not chatty about it. i built a Ruby migration framework using bundler, pry, spreadsheet, sequel and mp3info to build a JSON document version of my SEB Broadcast database. next up is some node.js to serve it up, then some RequireJS, mustache (?) & jQuery goodness to spiff up the SEB site

but in the meanwhile, i wrote this little gem at work:

// returns a function that will invoke the callback 'delay' ms after it is last called
//   eg. invoke the callback 500ms after the last time a key is pressed
// based on http://stackoverflow.com/questions/2219924/idiomatic-jquery-delayed-event-only-after-a-short-pause-in-typing-e-g-timew
//   but fixes the following:
//   (a) returns a unique timer scope on every call
//   (b) propagates context and arguments from the last call to the returned closure
//   and adds .cancel() and .fire() for additional callback control
// then of course, you could use underscore's _.debounce
//   it doesn't have the .cancel and .fire, but you may not care :)
function delayedCallback(callback, delay) {
    return (function(callback, delay) {                          // (2) receives the values in scope
        var timer = 0;                                           // (3a) a scoped timer
        var context, args;                                       // (3b) scoped copies from the last invocation of the returned closure
        var cb = function() { callback.apply(context, args); }   // (3c) called with proper context + arguments
        var dcb = function() {                                   // (4) this closure is what gets returned from .delayedCallback
            context = this;
            args = arguments;
            window.clearTimeout(timer);
            timer = window.setTimeout(cb, delay);                // (5) only fires after this many ms of not re-invoking
        };
        dcb.cancel = function() { window.clearTimeout(timer); }; // (6a) you can cancel the delayed firing
        dcb.fire  = function() { this.cancel(); cb(); };         // (6b) or force it to fire immediately
        return dcb;
    })(callback, delay);                                         // (1) capture these values in scope
}

yes, i know. so it turns out that i didn’t know about underscore’s _.debounce() when i wrote it. eh. so much for DRY :)

still — i’m glad i thought it through. to me, this implementation captures the most powerful aspects of ECMAScript itself:

  • scope-capturing closures
  • specifiable function context
  • freestyle properties on Object instances
  • single-threading (look ma, no synchronize { ... } !)
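here is a condensed sketch of how the returned closure behaves, restated so that it runs under Node. as assumptions for this example, it uses the global `setTimeout` instead of `window.setTimeout`, omits the inner wrapper function for brevity, and has `.fire()` call `dcb.cancel()` directly; the `onKey` callback and `calls` array are illustrative.

```javascript
// Condensed restatement of delayedCallback for Node.
function delayedCallback(callback, delay) {
    var timer, context, args;
    var cb = function() { callback.apply(context, args); };
    var dcb = function() {
        context = this;
        args = arguments;
        clearTimeout(timer);
        timer = setTimeout(cb, delay);
    };
    dcb.cancel = function() { clearTimeout(timer); };
    dcb.fire = function() { dcb.cancel(); cb(); };
    return dcb;
}

var calls = [];
var onKey = delayedCallback(function(key) {
    calls.push(key);
}, 500);

onKey('a');
onKey('b');     // resets the timer; 'a' will never be reported
onKey.fire();   // fires now, with the latest arguments ('b')

onKey('c');
onKey.cancel(); // pending call dropped; nothing more fires
```

only the arguments from the most recent invocation survive, which is what (b) in the comments above is about.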

anyway. bla dee blah. this post also gave me the incentive to start embedding gists in my blog. nice helper widget, dflydev !

peace out

Upgrading your Rails Development Mac to Snow Leopard

Saturday, February 6th, 2010

Oh, there is such joy in the process of upgrading to Mac OS X Snow Leopard for us developer folks. Me, I chose the in-place upgrade path … my co-worker, who chose the from-scratch path, was deprived of some of these pleasures. Then again, he had to reconstruct everything from scratch, so he had his own barrel of monkeys to contend with.

Here are all the bumps I ran into, pretty much in reverse order as I tried (unsuccessfully) to do the minimum amount of work possible :) I had to go through pretty much this entire process twice (once on my work Macbook Pro, once on my identical home version), so I figured I might as well document all of this crap in the hope that it may reduce the shock and awe of future migrators. Of course you may run into a mess of fun issues not described here … And pardon me if the instructions aren’t perfect, because I’ve tried to boil a lot of this down to the it-finally-worked steps extracted from the frenzied back-and-forth of my real-time upgrade experience :)

Backups

Yes, this is just common-sense developer stuff (as are a number of the other obvious things that I call out in this post).

You’ll probably want to do a full mysqldump before upgrading. You can dump your port installed and gem list --local listings up front as well, or wait ’til you get to those respective steps below.

MacPorts

If you also chose the MacPorts library system, you’ll need to re-install it from scratch. You’ll need X11 from the Snow Leopard install disks, and you’ll need to download the latest version of Xcode. Follow the migration steps as outlined on their Wiki; it does the trick.

Save off your port installed list as a reference. Now, your MacPorts install will be completely toast, so that command won’t work until you re-install. No problem though; all of your packages will still be listed even after you upgrade.

The port clean step in the Wiki will crap out in the http* range, but that’s fine … you can probably skip that step anyway. Re-install your core packages and you’re good to go. I suggest installing readline if you haven’t, because it’s very useful in irb or any Ruby console.

MySQL

It was not necessary for me to build MySQL from source. Instead, I just installed the x86_64 version of MySQL; the latest was mysql-5.1.42-osx10.6-x86_64.dmg at the time of this writing.

If this is a 5.x version upgrade for you as well, the install will just re-symlink /usr/local/mysql, so your old data will still be in your previous install dir.

I didn’t make mysqldumps before I did the upgrade (handpalm) so I had to copy over my data/ files raw and hope that the new version would make friends with them. Initially I had problems with InnoDB. It wasn’t showing up under show engines;, and when I tried to manually install the plug-in — per this bug report, which explains the whole thing — it would fail on the ‘Plugin initialization function’. Turns out you need to do two things when you bring over raw data:

  • Whack your /var/log/mysql/*binary* / binary log files in order to get past mysqld startup errors.
  • Whack your ib_logfile* files too. Once you do that, there’s a good chance MySQL will regen them in recovery mode. Me, I had no choice (except rolling back with Time Machine). Miracle of miracles … it works!

Don’t try this at home kids. Make your backups. Note: here’s the correct link to the manual page on InnoDB tablespace fun.

x86_64 ARCHFLAGS

Snow Leopard is a lot more native 64-bit than previous OS X versions, and when you do your manual builds & makes, you may want to set the following environment variable:

export ARCHFLAGS="-Os -arch x86_64 -fno-common"

You’ll see a set of similar (though mixed) recommendations in the blogs I reference below; this particular flagset worked for me.

Ruby

I built Ruby 1.9 at work, and 1.8.7 on my personal machine. Either path is fine, just pick up the latest source of your choosing. Chris Cruft’s blog post goes into some of the details I’m describing here as well. Basically, the README boils down to:

autoconf
./configure --with-readline-dir=/usr/local
make clean && make
make install

Though there’s no reason in the world that you’d want to (it has been superseded), do not install ruby 1.9.1p243. If you do, you’ll never get the mysql gem to work. Or wait, or was it mongrel? Well, it was one or the other … just trust me, it’s bad.

Gem

I re-built gem from source from scratch as well, just to be sure. Save off your gem env and gem list --local as a reference. And before you start installing gems, you’ll also probably want to make sure you’re fully up-to-date with gem update --system, though that’s probably redundant.

Uninstall and re-install all of your gems; if some won’t uninstall even though they’re listed, it may be an install path issue. Use gem list -d GEMNAME to find where your gem was installed, and then use gem uninstall -i DIRNAME GEMNAME to finish the job.

With the ARCHFLAGS in place, the vast majority of the native source gem builds will go smoothly, but there are some notable exceptions …

The mysql Gem

Uh huh, this is the one gem that gets me every time. And again, you won’t have needed to have built MySQL from source.

For starters, you may want to glance over this very useful iCoreTech blog post to see if it works for you. But if you run into a lot of issues like I did, you may need to do it in two steps:

Fetch and Install the Gem

At the time of this writing, either mysql gem version 2.7 or 2.8.1 will do the trick.

gem fetch mysql --version VERSION
gem install mysql -- --with-mysql-dir=/usr/local --with-mysql-config=/usr/local/mysql/bin/mysql_config
gem unpack mysql

Sadly, it may fail, either during the build or when you try to test it. I was able to successfully run the (included?) test.rb at my workplace, but as simple as that sounds, I swear I don’t remember how I did it! The second time, at home, I only found the problems retroactively when I tried to get my Rails projects to boot. If you do find and run the test.rb, you’ll need to make sure that the standard MySQL test database exists.

Both times, one of the big blockers that I — and many other people — ran into was:

NameError: uninitialized constant MysqlCompat::MysqlRes

If so, try this:

Manually Re-build the Binary

Go into ext/mysql_api, make sure your ARCHFLAGS are exported as described above, and …

ruby extconf.rb --with-mysql-config=/usr/local/mysql/bin/mysql_config
make clean && make
make install

Hopefully your newly built & installed binaries will resolve the issue.

Mongrel

It took me a little effort to build mongrel on Ruby 1.9 with the x86_64 architecture. My memory is a little hazy — since my 1.8.7 build at home worked perfectly through standard gem install — but buried deep in this Is It Ruby contributory blog post are probably all the answers you’ll need.

Under Ruby 1.9, I did have to modify the source, which (to paraphrase) involved some global code replacements:

  • RSTRING(foo)->len with RSTRING_LEN(foo)
  • RSTRING(foo)->ptr with RSTRING_PTR(foo)
  • change the one-line case ... when statements from Java-esque : delimiters to then’s.
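The first two replacements are mechanical enough to script. Here is a minimal sketch, assuming the macro argument contains no nested parentheses; the `upgradeRstring` helper and the sample line are illustrative and not taken from the mongrel source.

```javascript
// Rewrite Ruby 1.8-style RSTRING struct access to the 1.9 macros.
// Assumes the argument contains no nested parentheses.
function upgradeRstring(source) {
    return source
        .replace(/RSTRING\(([^()]+)\)->len/g, 'RSTRING_LEN($1)')
        .replace(/RSTRING\(([^()]+)\)->ptr/g, 'RSTRING_PTR($1)');
}

var before = 'memcpy(buf, RSTRING(data)->ptr, RSTRING(data)->len);';
var after = upgradeRstring(before);
// after: 'memcpy(buf, RSTRING_PTR(data), RSTRING_LEN(data));'
```

Anything with nested parentheses inside the RSTRING(...) call would still need a fix by hand.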

And then the re-build:

ruby extconf.rb install mongrel
make
make install
cd ../..
ruby setup.rb
gem build mongrel.gemspec
gem install mongrel.gem

Conclusion

And that’s as far as I had to go. Whew! I certainly hope that my post has been of some assistance to you (and with a minimum of unintended mis-direction). Of course, I learned everything that I reiterated above by searching the ‘net and plugging away. And there are plenty of other folks who’ve gone down this insane path as well. Good luck, brave soul!

What Would a Wookie Do?

Saturday, September 12th, 2009

Yes, I had to ask myself that question a lot recently, at least from the perspective of how he would use the Twitter service. As much as this is a theoretical question, I believe I came up with some answers, and they are made manifest in the @cr_wookie Personality Engine.

Base-line Aesthetics

The first step in bringing a Wookie to life was to establish a basic phonetic dialect. So I came up with a set of candidate ‘words’ along the lines of auhrn, rghrn, gahrn, hraur, urau, ehruh, and nrauh. Hey, they sounded good. There are about 60 of them in total, composed of the letters A E G H N R and U. After having watched The Empire Strikes Back, where Chewbacca seems to get most of his good lines, I expanded into a few W words as well (wahr seems to work particularly well). Many folks postulate that he was capable of speaking Os and Ks, yet I myself do not subscribe to that opinion.

I then used MadderLib to construct simple word generators which allowed the phonemes to stretch appropriately for different word lengths. Long vowels, double Rs and Hs, whatever looked good. And with a little compositing logic, I came up with some sentence patterns that were quite fun to read out loud:

rauhr nruuuhh raghr uhr rghrrnnauhrnaauuuuuurhhh
nrauuh euu gauuhrr ruhr
urhn ehrraaah rhhreuhraahhrrrrnn gaurrh
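The stretching idea can be sketched as follows. This is not MadderLib (which is a Ruby library); it is an illustrative guess at how a base word might be padded to a target length by doubling vowels, Rs and Hs. The `stretch` helper and the `STRETCHABLE` letter set are assumptions made for this example.

```javascript
// Letters that may be doubled: vowels, plus R and H.
var STRETCHABLE = 'aeuhr';

// Stretch a base 'word' to a target length by doubling
// stretchable letters, cycling through the word left to right.
function stretch(word, targetLength) {
    var letters = word.split('');
    if (!letters.length) {
        return word;
    }
    var i = 0;
    var guard = 0;  // bail out if nothing in the word can stretch
    while (letters.length < targetLength && guard++ < 1000) {
        if (STRETCHABLE.indexOf(letters[i]) !== -1) {
            letters.splice(i, 0, letters[i]);   // double this letter
            i++;                                // step over the copy
        }
        i = (i + 1) % letters.length;
    }
    return letters.join('');
}

var stretched = stretch('nrauh', 8);
// stretched: 'nrraauuh'
```

A real generator would presumably add randomness so that the same base word stretches differently each time.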

Those are rather plain-looking though. In order to make the Wookie’s statements seem more like txt argot, some variety was needed. Punctuation was obvious, both terminating and delimiting (commas, semicolons). Plus there’s the proper use of txt idioms, the LOLs and WTFs that are so popular with the youngsters these days (AFAIK). Throw in a nice little collection of emoticons, and behold; the Wookie has charm:

LOLZ! uhrrn euueuuhhaur, eruh hraur nrauururhnehrrraaah auhrn harn aurreruhaaarruuuhh rraaghrrr ^_^
hraur rghneruh nuh waahhr???
rhhhnnghn uhrrrnn ehrah. euu urau ehuuurrr urrn aurheuuuh haarn uhrrrn k?… haauurrr nruharuhuhhrrh hruaaaauuh ghrn rghn nrruuhh

Well, yeah, they still look sorta flat. Real people quote and capitalize portions of what they type, and there are other non-verbal components to the average sentence. So, the Wookie was taught to inject numbers, times, abbreviations, and even Star Wars calendar years into his sentences:

gahrn rghhr ghrrnehur rghhrrn hruauh raghrehurauuuuuuhrn gahrn?? hrraaauu: hrrraaauu rauhr ;) _ehruuh_ ‘Rahr Ehrrraaaah’
raauh hrau Ehrahurrrrh *rauhr* gruh
auhrneuuhr. harnuhrrnuhrrr uhr – raauhneuuu 1:30 gahrn raauuughrrr *nuuh* aahhrruuuunn uhr. wuurh harn rhr?

Once the Wookie was at this point, he could talk for quite some time and produce diverse aesthetic results. Reading them out loud is a hoot! Thus was born the first Personality Engine bot (the Wookie is comprised of four of them). But he still wasn’t really tweeting until he could follow some of the core Twitter memes.

Twitter Memes

Hash tags were the first obvious choice, since they were easy to fake. To this day, the Wookie can simply prepend a # to any word or composite that he speaks. But to make this feature really zing, I added support for Twitter’s trending searches. This allowed him to use real-world relevant tags, injecting them into his sentences, or appending them to his tweet (as is common convention). It turns out that one of the joys of a nonsense grammar is that anything which isn’t nonsense magically becomes the ‘meaning’ of the sentence:

Harn ehureruheuuu & ahrn rrghhhn! AAAHHRRNNAUHRN EHUUR! #itsnotgonnawork
WAHR NUUUH RAUHRR!!! euu ‘aauurr nraur’ haaauurrurau! #fact
GRRUUUH! uhr hraheuuuuhhr aauurh #ChargersSuck rrhnn aur! gauhr aaurh haurerruuuhh! !! #aurh

Of course, no Twitter user can resist posting shortened URLs. They’ve been a cornerstone to the explosive growth of the service, maybe because there’s just so much interesting fast-moving crap out there on teh Internets. The Wookie follows several aggregation services (Digg, Technorati) and a smattering of other popular blogs (TMZ, The Onion Daily, LOLcats), etc. He pulls out links to recent content and shortens them with the bit.ly API. Again, since the Wookie is totally faking it, the results just cannot be accounted for. The best he can hope for is that the emotional texture of his tweet sometimes supports the referenced source:

nrauh rrhn hrauuur ehrruhauhrnurrh. nrraauuuhh IMHO. hraur grraauhurhnauhrn rauh http://bit.ly/mjOD7
Ghrn euuhrr? haarneruuuhh urrn aruh rghrnn aaur, uhr hruh urrr :) http://bit.ly/1a9j1K

And no tweeter lives in a vacuum; their posts are replete with the user handles of friends, comrades and mentors. The Wookie wasn’t about to make up handles, so his likely choices were his followees and followers. Rather than take the name-dropping approach here (more on that later), the Wookie chooses to occasionally reference his most recent followers:

OMG! _rrrhnnneuh_ euh: urr wuuurrh rghnurrnh urhn hruhn @sleepbotzz rhagn ghrn rrhn waaahhr hruauhehuraghrrrrrnnn rhagn harrnaurh

After these features were implemented, the Wookie’s posts started to look almost real-ish. And whenever he tweets on his own, that is his range of capabilities. But he wouldn’t be a real member of the Twitter community until he could play some other tricks. Thus began a completely separate effort; how to translate English into Wookie.

Mocking

Did I say ‘translate’? What I meant to say was ‘mock’. After all, what can you really do with a nonsense grammar except make it look like it has meaning?

So, the Wookie was taught to mock existing sentences into his own dialect. He simply matches the initial letter (vowel / not) and preserves the word’s length and non-alpha characters (for contractions and the like). Special mappings were also added to deal with short words (the dialect only generates words 4-letters and above). And within a given tweet, he re-uses the same fake word for each instance of the real one. It’s an obscure feature, but it makes a helluva difference in some specific cases.
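A minimal sketch of those mocking rules, under stated assumptions: the letter pools and the alternating vowel / consonant scheme are invented for illustration, and capitalization and the special short-word mappings are left out. It only demonstrates matching the initial letter type, preserving length and non-alpha characters, and re-using the same fake word within a tweet.

```javascript
// Illustrative letter pools; the real dialect works from base words.
var VOWELS = 'aeu';
var CONSONANTS = 'ghnr';

function isVowel(ch) {
    return 'aeiou'.indexOf(ch.toLowerCase()) !== -1;
}

// Mock one word: keep its length, its non-alpha characters,
// and whether it starts with a vowel.
function mockWord(word) {
    var useVowel = isVowel(word.charAt(0));
    var n = 0;
    return word.replace(/[a-z]/gi, function() {
        var pool = useVowel ? VOWELS : CONSONANTS;
        var ch = pool.charAt(n % pool.length);
        useVowel = !useVowel;   // alternate vowel / consonant
        n++;
        return ch;
    });
}

// Mock a sentence, re-using the same fake word for repeats.
function mockSentence(sentence) {
    var seen = Object.create(null);
    return sentence.split(' ').map(function(word) {
        var key = word.toLowerCase();
        if (!(key in seen)) {
            seen[key] = mockWord(word);
        }
        return seen[key];
    }).join(' ');
}

var mocked = mockSentence("don't stop, don't");
// mocked: "gen'a gena, gen'a"
```

Both instances of “don't” come out identical, and the apostrophe and comma stay where they were.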

The totally awesome part of the effort is identifying the words that don’t get mocked. There was no way I wanted to deal with semantic grammar detection, since tweets are often wildly non-grammatical. So as per usual, the Wookie fakes it. It mainly comes down to a weak analysis of quoting and capitalization patterns. He also keeps hash tags, links, handles, many acronyms, and argot, to the best of his ability.

And just for fun, he also recognizes a rather large lexicon of terms from the Star Wars universe. Well, except for the term ‘Star Wars’ that is. He doesn’t know what that means.
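Here is a rough idea of what that kind of "weak analysis" could look like. Again, this is only a sketch in Python under my own assumptions: the tiny STAR_WARS_TERMS set, the punctuation list and the mid-sentence capitalization rule are illustrative stand-ins, and quoting patterns and argot are left out entirely.

```python
import re

# Hypothetical stand-in for the much larger Star Wars lexicon.
STAR_WARS_TERMS = {"wookiee", "kashyyyk", "jedi", "tatooine"}

PUNCT = ".,!?:;\"'()"


def is_preserved(core):
    """Decide from the bare token alone whether it should be kept as-is."""
    if not core:
        return True  # pure punctuation, nothing to mock
    if core.startswith(("#", "@")):
        return True  # hash tags and handles
    if re.match(r"https?://", core):
        return True  # links
    if len(core) >= 2 and core.isalpha() and core.isupper():
        return True  # probable acronym
    return core.lower() in STAR_WARS_TERMS


def classify(tweet):
    """Return (token, keep) pairs; keep=True means the token is not mocked."""
    result = []
    sentence_start = True
    for token in tweet.split():
        core = token.strip(PUNCT)
        keep = is_preserved(core)
        # A capitalized word in mid-sentence is probably a proper noun.
        if not keep and len(core) > 1 and core[0].isupper() and not sentence_start:
            keep = True
        result.append((token, keep))
        sentence_start = token.endswith((".", "!", "?"))
    return result
```

It will certainly guess wrong now and then, which fits the spirit of faking it.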

It took a lot of experimenting to get it right, and he still makes mistakes, but he’s getting smarter all the time. One of the interesting things I learned during development was how staccato the English language is, as compared to the long smooth yawls of Wookie. Reading back a mocked sentence out loud is a sublime experience.

You may ask, how can this awesome power best be used to serve the Good?

Re-Tweeting

Darn right the Wookie re-tweets! He simply selects a few users that he follows, fetches their recent tweets, and mocks one of them up. There are some users (@darthvader, for instance) whom he will always re-tweet if the user has posted anything fresh. Otherwise it’s a simple random selection, after avoiding repeats (there’s extensive repeat-avoiding code all throughout the Wookie implementation). There is the slight hint of name-dropping here, since he tends to follow a lot of popular accounts, but that’s just the nature of this beast.

RT @warrenellis Rauuh’r @neilhimself ehr #neilfail ar hruun rraaauuuuhrrr? Au. #warrenfail. http://bit.ly/16IiUE
RT @KurzweilAINews: First Close Look At Stimulated Brain: Aghhrrrrnn gaauuuhhrrr ar hrrauur aauuuhrrn u gahrrn … http://bit.ly/17dg34
RT @cnnbrk Hruh. Urrnn nrrauuuhh Ted Kennedy ur “rrrhhrrr ghr rauhn hru rau wahr; wahr au Democratic Party; … http://bit.ly/3FmLyq
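The selection logic described above (favourites first, then a random pick that avoids repeats) might be sketched like this. The data shapes and the name pick_retweet are assumptions made for the example; fetching timelines from Twitter is left out.

```python
import random


def pick_retweet(timelines, priority_users, already_retweeted, rng=None):
    """Pick (handle, (tweet_id, text)) to re-tweet, or None if nothing is fresh.

    timelines maps a handle to a list of (tweet_id, text), newest first.
    """
    rng = rng or random.Random()

    def fresh(handle):
        return [t for t in timelines.get(handle, []) if t[0] not in already_retweeted]

    # Favourite accounts always win if they have posted anything new.
    for handle in priority_users:
        candidates = fresh(handle)
        if candidates:
            return handle, candidates[0]

    # Otherwise a simple random selection among tweets not used before.
    pool = [(handle, tweet) for handle in timelines for tweet in fresh(handle)]
    if not pool:
        return None
    return rng.choice(pool)
```

The caller would add the chosen tweet's id to already_retweeted after posting, so the same tweet never comes around twice.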

It turns out that injecting the re-tweet ‘header’ will often push longer tweets past the 140-character barrier. He will attempt to preserve as much of the original tweet as possible, focusing on trailing URLs and hash tags. And if the tweet is short enough, he posts a shortened link to the original post, primarily to show off his mad skills. He is much inclined towards tweets which have a good blend of mockable and preservable words, again, to show off his mad skills.
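One way to do that kind of trimming is to peel the trailing links and hash tags off first, then drop words from the end of the body until everything fits. The sketch below is Python and purely illustrative: the function name fit_retweet and the exact trimming order are my assumptions here, not the Wookie's real rules.

```python
import re

LIMIT = 140


def fit_retweet(handle, mocked_text, limit=LIMIT):
    """Build 'RT @handle ...' within limit, keeping trailing links and hash tags."""
    header = f"RT @{handle} "
    body = mocked_text.split()

    # Peel off the trailing run of URLs and hash tags so they survive trimming.
    tail = []
    while body and (body[-1].startswith("#") or re.match(r"https?://", body[-1])):
        tail.insert(0, body.pop())

    def build(words, truncated):
        parts = words + (["…"] if truncated else []) + tail
        return header + " ".join(parts)

    tweet = build(body, False)
    # Drop words from the end of the body until the whole thing fits.
    while len(tweet) > limit and body:
        body = body[:-1]
        tweet = build(body, True)

    # Last resort if the header and tail alone are too long: a hard cut.
    return tweet[:limit]
```

For example, fit_retweet("cnnbrk", "hruh #x") comes back untouched as "RT @cnnbrk hruh #x", while a long tweet loses words from the middle and keeps its trailing link.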

This became the second Personality Engine bot. Yet still, re-tweeting is a one-way street, and interaction is the real key to user engagement.

Playing Well With Others

The third Personality Engine bot was born of the need to perpetuate the following cycle of fun. On a regular basis, the Wookie will search for references to relevant words (wookiee, kashyyyk, etc.) and will respond to the user with a generated tweet. This is much less invasive and cruel than auto-following, a botting practice which I find to be quite gauche. I can only imagine the surprise on these users’ faces:

@amynicole21 WTF! aruh nrauuhehruhaaauuurr rraaahhhrr nruuh urrrrn ghrn wurh: euu haarrn nuuh grruhuhrnuhhrrn erruuuuhh :)
@DZ1641 gauuuhrr??
@vfigueroa1 rghrurr waahr! rrhneuuuhr urr nruuh hrauh – *ehrraaah* nrauh ^_^ nraur hruun rrrghhnn

However, before he goes searching, he first looks at his recent mention history, specifically at tweets starting with @cr_wookie. If one is found, he will mock and publicly respond to it, linking back to the original post when length permits. So if you talk to the Wookie, there’s a reasonable chance that he’ll republish you. To minimize abuse of this feature, he doesn’t follow quite the same word preservation rules as he does for follower re-tweeting. But he’ll keep Star Wars words, and that opens up a vast realm of potential amusement.

. @kindadodgy Nurh U hraaauuuu rghn Wookie rauuhrr wau’r hruh nrh ghrn hrun ‘rhngn ruhrn uhr wurh rn nuuh gh, ahrn au nrruuuuhh gauurh.
> @adamlampert Ar. U hru nrh raaghrr gh HR’N! Rghn ruuhr au! http://bit.ly/591qW
.@Lillput Nrh, rhag’n rauuhr nurh ghr haurr au a rghhnn ur a rghn.
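The order of operations for this bot (mentions first, keyword search second) could be sketched as below. This is a simplified Python illustration: the dictionary shape of a tweet, the action labels and the name choose_reply_target are hypothetical, and the actual Twitter calls are not shown.

```python
def choose_reply_target(mentions, search_results, bot_handle, already_answered):
    """Return (action, tweet) for the next reply, or None if there is nothing to do.

    Each tweet is a dict with at least "id" and "text" keys.
    """
    prefix = "@" + bot_handle.lower()

    # First choice: someone talking directly to the bot gets mocked back.
    for tweet in mentions:
        words = tweet["text"].lower().split()
        if tweet["id"] not in already_answered and words and words[0] == prefix:
            return ("mock_reply", tweet)

    # Otherwise: answer a keyword search hit with a generated tweet.
    for tweet in search_results:
        if tweet["id"] not in already_answered:
            return ("generated_reply", tweet)

    return None
```

A mention that merely includes the handle somewhere in the middle is ignored by the first loop, which matches the "starting with @cr_wookie" rule.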

Greeting New Followers

The fourth and final Personality Engine bot is the greeter. When you follow him, he’ll DM you. Short and to the point.

Summary

Whew! All in all, the project required about 6 weeks of spare time. My only hope is that much hilarity will ensue from these efforts.

If you want to read a bit more about the Wookie — and who wouldn’t, right? — you can check out his Wiki page.