Yes, I’ve had to ask myself that question a lot recently. At least from the perspective of how he would use Twitter.
Base-line Aesthetics
The first step in bringing a Wookie to life was to establish a basic phonetic dialect. So I came up with a set of candidate ‘words’ along the lines of: auhrn, rghrn, gahrn, hraur, urau, ehruh, and nrauh. Hey, they sounded good. There are about 60 of them in total, composed of the letters A E G H N R and U. After having watched The Empire Strikes Back, where Chewbacca seems to get most of his good lines, I expanded into a few W words as well (wahr seems to work particularly well). Many folks postulate that he was capable of speaking Os and Ks, yet I myself do not subscribe to that opinion.
I then used MadderLib to construct simple word generators which allowed the phonemes to stretch appropriately for different word lengths. Long vowels, double Rs and Hs, whatever looked good. And with a little compositing logic, I came up with some sentence patterns that were quite fun to read out loud:
rauhr nruuuhh raghr uhr rghrrnnauhrnaauuuuuurhhh
nrauuh euu gauuhrr ruhr
urhn ehrraaah rhhreuhraahhrrrrnn gaurrh
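For the curious, here is a rough idea of what such a stretching generator could look like. This is a minimal Python sketch, not the MadderLib implementation: the seed list is just the handful of words quoted above, and the names `SEEDS`, `STRETCHABLE`, `stretch` and `sentence` are invented for illustration. It assumes that only vowels, Rs and Hs get doubled.

```python
import random

# Seed words quoted in the post; the real list has about 60 entries.
SEEDS = ["auhrn", "rghrn", "gahrn", "hraur", "urau", "ehruh", "nrauh", "wahr"]
# Assumption: only vowels, R and H are allowed to repeat.
STRETCHABLE = set("aeurh")


def stretch(seed, length, rng):
    """Lengthen `seed` to `length` letters by doubling stretchable letters."""
    letters = list(seed)
    while len(letters) < length:
        spots = [i for i, ch in enumerate(letters) if ch in STRETCHABLE]
        if not spots:
            break  # nothing in this seed can be stretched
        i = rng.choice(spots)
        letters.insert(i, letters[i])  # repeat the letter next to itself
    return "".join(letters)


def sentence(word_count, rng):
    """Build a sentence from randomly chosen, randomly stretched seeds."""
    words = []
    for _ in range(word_count):
        seed = rng.choice(SEEDS)
        words.append(stretch(seed, len(seed) + rng.randint(0, 4), rng))
    return " ".join(words)


if __name__ == "__main__":
    print(sentence(5, random.Random()))
```

Because a letter is only ever repeated next to itself, a stretched word still collapses back to its seed, which is what keeps the result pronounceable.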
Those are rather plain-looking though. In order to make the Wookie’s statements seem more like txt argot, some variety was needed. Punctuation was obvious, both terminating and delimiting (commas, semicolons). Plus there’s the proper use of txt idioms, the LOLs and WTFs that are so popular with the youngsters these days (AFAIK). Throw in a nice little collection of emoticons, and behold, the Wookie has charm:
LOLZ! uhrrn euueuuhhaur, eruh hraur nrauururhnehrrraaah auhrn harn aurreruhaaarruuuhh rraaghrrr ^_^
hraur rghneruh nuh waahhr???
rhhhnnghn uhrrrnn ehrah. euu urau ehuuurrr urrn aurheuuuh haarn uhrrrn k?… haauurrr nruharuhuhhrrh hruaaaauuh ghrn rghn nrruuhh
Well, yeah, they still look sorta flat. Real people quote and capitalize portions of what they type, and there are other non-verbal components to the average sentence. So, the Wookie was taught to inject numbers, times, abbreviations, and even Star Wars calendar years into his sentences:
gahrn rghhr ghrrnehur rghhrrn hruauh raghrehurauuuuuuhrn gahrn?? hrraaauu: hrrraaauu rauhr ;) ehruuh ‘Rahr Ehrrraaaah’
raauh hrau Ehrahurrrrh *rauhr* gruh
auhrneuuhr. harnuhrrnuhrrr uhr - raauhneuuu 1:30 gahrn raauuughrrr *nuuh* aahhrruuuunn uhr. wuurh harn rhr?
Once the Wookie was at this point, he could talk for quite some time and produce diverse aesthetic results. Reading them out loud is a hoot! Thus was born the first Personality Engine bot (the Wookie is comprised of four of them). But he still wasn’t really tweeting until he could follow some of the core Twitter memes.
Twitter Memes
Hash tags were the first obvious choice, since they were easy to fake. To this day, the Wookie can simply prepend a # to any word or composite that he speaks. But to make this feature really zing, I added support for Twitter’s trending searches. This allowed him to use relevant real-world tags, injecting them into his sentences, or appending them to his tweet (as is common convention). It turns out that one of the joys of a nonsense grammar is that anything which isn’t nonsense magically becomes the ‘meaning’ of the sentence:
Harn ehureruheuuu & ahrn rrghhhn! AAAHHRRNNAUHRN EHUUR! #itsnotgonnawork
WAHR NUUUH RAUHRR!!! euu ‘aauurr nraur’ haaauurrurau! #fact
GRRUUUH! uhr hraheuuuuhhr aauurh #ChargersSuck rrhnn aur! gauhr aaurh haurerruuuhh! !! #aurh
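The tagging behaviour is simple enough to sketch. The Python below is an illustration under assumptions rather than the Wookie's actual code: the trending tags are passed in as a plain list (fetching them from Twitter is left out), the 50/50 odds are made up, and `add_hashtag` is a hypothetical name.

```python
import random


def add_hashtag(words, trending, rng):
    """Return a copy of `words` with exactly one hash tag added.

    `trending` is a list of real-world tags, assumed to be fetched
    elsewhere; it may be empty.
    """
    words = list(words)
    if not words:
        return words
    if trending and rng.random() < 0.5:
        tag = "#" + rng.choice(trending).lstrip("#")
        if rng.random() < 0.5:
            words.append(tag)  # append to the tweet, per convention
        else:
            words.insert(rng.randrange(len(words) + 1), tag)  # inject mid-sentence
    else:
        i = rng.randrange(len(words))
        words[i] = "#" + words[i]  # fake it: tag one of his own words
    return words


if __name__ == "__main__":
    rng = random.Random()
    print(" ".join(add_hashtag(["wahr", "nuuuh", "rauhrr"], ["fact"], rng)))
```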
Of course, no Twitter user can resist posting shortened URLs. They’ve been a cornerstone of the explosive growth of the service, maybe because there’s just so much interesting fast-moving crap out there on teh Internets. The Wookie follows several aggregation services (Digg, Technorati) and a smattering of other popular blogs (TMZ, The Onion Daily, LOLcats, etc.). He pulls out links to recent content and shortens them with the bit.ly API. Again, since the Wookie is totally faking it, the results just cannot be accounted for. The best he can hope for is that the emotional texture of his tweet sometimes supports the referenced source:
nrauh rrhn hrauuur ehrruhauhrnurrh. nrraauuuhh IMHO. hraur grraauhurhnauhrn rauh http://bit.ly/mjOD7
Ghrn euuhrr? haarneruuuhh urrn aruh rghrnn aaur, uhr hruh urrr :) http://bit.ly/1a9j1K
And no tweeter lives in a vacuum; their posts are replete with the user handles of friends, comrades and mentors. The Wookie wasn’t about to make up handles, so his likely choices were his followees and followers. Rather than take the name-dropping approach here – more on that later – the Wookie chooses to occasionally reference his most recent followers:
OMG! rrrhnnneuh euh: urr wuuurrh rghnurrnh urhn hruhn @cantremember rhagn ghrn rrhn waaahhr hruauhehuraghrrrrrnnn rhagn harrnaurh
After these features were implemented, the Wookie’s posts started to look almost real-ish. And whenever he tweets on his own, that is his range of capabilities. But he still wasn’t a real member of the Twitter community until he could play some other tricks. Thus began a completely separate effort: how to translate English into Wookie.
Mocking
Did I say ‘translate’? What I meant to say was ‘mock’. After all, what can you really do with a nonsense grammar except make it look like it has meaning?
So, the Wookie was taught to mock existing sentences into his own dialect. He simply matches the initial letter (vowel / not) and preserves the word’s length and non-alpha characters (for contractions and the like). Special mappings were also added to deal with short words (the dialect only generates words of four letters and above). And within a given tweet, he re-uses the same fake word for each instance of the real one. It’s an obscure feature, but it makes a helluva difference in some specific cases.
The totally awesome part of the effort is identifying the words that don’t get mocked. There was no way I wanted to deal with semantic grammar detection, since tweets are often wildly non-grammatical. So as per usual, the Wookie fakes it. It mainly comes down to a weak analysis of quoting and capitalization patterns. He also keeps hash tags, links, handles, many acronyms, and argot – to the best of his ability.
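Here is one way those mocking rules could fit together, as a minimal Python sketch. Everything specific in it is an assumption made for illustration: the seed lists, the `SHORT` table of stand-ins for short words, the helper names, and the choice to carry over the capitalization of each letter. The real bot's tables and stretching rules are certainly richer.

```python
import random

VOWELS = set("aeiou")
# Hypothetical seeds, grouped by whether they start with a vowel.
VOWEL_SEEDS = ["auhrn", "urau", "ehruh", "aurh"]
CONSONANT_SEEDS = ["rghrn", "gahrn", "hraur", "nrauh", "wahr"]
# Hypothetical 'special mappings' for words under four letters,
# keyed by (starts_with_vowel, letter_count).
SHORT = {(True, 1): "u", (True, 2): "au", (True, 3): "uhr",
         (False, 1): "r", (False, 2): "gh", (False, 3): "nrh"}


def fake_letters(first, count, rng):
    """Pick `count` dialect letters whose first matches vowel / not."""
    starts_with_vowel = first.lower() in VOWELS
    if count < 4:
        return SHORT[(starts_with_vowel, count)]
    pool = VOWEL_SEEDS if starts_with_vowel else CONSONANT_SEEDS
    letters = list(rng.choice([s for s in pool if len(s) <= count]))
    while len(letters) < count:
        i = rng.randrange(len(letters))
        letters.insert(i, letters[i])  # stretch by repeating a letter
    return "".join(letters)


def overlay(word, fake):
    """Replace the letters of `word`, keeping case and non-alpha characters."""
    source = iter(fake)
    out = []
    for ch in word:
        if ch.isalpha():
            letter = next(source)
            out.append(letter.upper() if ch.isupper() else letter)
        else:
            out.append(ch)  # apostrophes, commas and the like stay put
    return "".join(out)


def mock_tweet(text, rng):
    """Mock every word, re-using one fake word per distinct real word."""
    cache = {}
    mocked = []
    for word in text.split():
        key = "".join(ch.lower() for ch in word if ch.isalpha())
        if not key:
            mocked.append(word)  # nothing alphabetic to mock
            continue
        if key not in cache:
            count = sum(ch.isalpha() for ch in word)
            cache[key] = fake_letters(key[0], count, rng)
        mocked.append(overlay(word, cache[key]))
    return " ".join(mocked)


if __name__ == "__main__":
    print(mock_tweet("Don't panic, the Wookie won't panic!", random.Random()))
```

The per-tweet cache is what makes a repeated word come out the same both times, which is the obscure feature mentioned above.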
And just for fun, he also recognizes a rather large lexicon of terms from the Star Wars universe. Well, except for the term ‘Star Wars’ that is. He doesn’t know what that means.
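To give a feel for how weak that analysis can be and still work, here is a guess at a keep-or-mock test in Python. It is a sketch only: the `ARGOT` and `LEXICON` sets are tiny hypothetical samples, `should_preserve` is an invented name, and the capitalization rule (keep capitalized words unless they open a sentence) is my assumption about what such a heuristic might do. Quote detection is left out.

```python
import re

# Tiny hypothetical samples; the real lists are much larger.
ARGOT = {"lol", "omg", "wtf", "imho", "afaik"}
LEXICON = {"wookiee", "kashyyyk", "chewbacca"}
# Punctuation stripped from both ends before testing a token.
EDGE_PUNCTUATION = ".,;:!?\"'()‘’“”…"


def should_preserve(token, first_in_sentence=False):
    """Guess whether a token should survive mocking untouched."""
    core = token.strip(EDGE_PUNCTUATION)
    if not core:
        return True  # pure punctuation, nothing to mock
    if core[0] in "#@":
        return True  # hash tags and handles
    if re.match(r"https?://", core):
        return True  # links
    lower = core.lower()
    if lower in ARGOT or lower in LEXICON:
        return True  # txt idioms and Star Wars terms
    if len(core) > 1 and core.isalpha() and core.isupper():
        return True  # looks like an acronym
    if len(core) > 1 and core[0].isupper() and not first_in_sentence:
        return True  # capitalized mid-sentence: probably a name
    return False


if __name__ == "__main__":
    for token in ["#fact", "Kennedy", "panic,", "IMHO."]:
        print(token, should_preserve(token))
```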
It took a lot of experimenting to get it right, and he still makes mistakes, but he’s getting smarter all the time. One of the interesting things I learned during development was how staccato the English language is, as compared to the long smooth yowls of Wookie. Reading a mocked sentence back out loud is a sublime experience.
You may ask, how can this awesome power best be used to serve the Good?
Re-Tweeting
Darn right the Wookie re-tweets! He simply selects a few users that he follows, derives their recent tweets, and mocks one of them up. There are some users – @darthvader for instance – whom he will always re-tweet if they have posted anything fresh. Otherwise it’s a simple random selection, after avoiding repeats (there’s extensive repeat-avoiding code all throughout the Wookie implementation). There is the slight hint of name-dropping here, since he tends to follow a lot of popular accounts, but that’s just the nature of this beast.
RT @warrenellis Rauuh’r @neilhimself ehr #neilfail ar hruun rraaauuuuhrrr? Au. #warrenfail. http://bit.ly/16IiUE
RT @KurzweilAINews: First Close Look At Stimulated Brain: Aghhrrrrnn gaauuuhhrrr ar hrrauur aauuuhrrn u gahrrn … http://bit.ly/17dg34
RT @cnnbrk Hruh. Urrnn nrrauuuhh Ted Kennedy ur “rrrhhrrr ghr rauhn hru rau wahr; wahr au Democratic Party; … http://bit.ly/3FmLyq
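The selection logic described above (favourites first, otherwise random, never a repeat) might look something like this Python sketch. The data shapes are assumptions: recent tweets arrive as a dict of handle to `(tweet_id, text)` pairs, and repeats are tracked as a set of ids. `pick_retweet` is a hypothetical name.

```python
import random


def pick_retweet(fresh_by_user, priority_users, already_done, rng):
    """Choose one (handle, (tweet_id, text)) pair to re-tweet, or None.

    fresh_by_user:  dict of handle -> list of (tweet_id, text), newest first
    priority_users: handles that always win when they have something fresh
    already_done:   set of tweet ids that were re-tweeted before
    """
    def unseen(user):
        return [t for t in fresh_by_user.get(user, []) if t[0] not in already_done]

    # Favourites such as @darthvader get first pick.
    for user in priority_users:
        tweets = unseen(user)
        if tweets:
            return user, tweets[0]

    # Otherwise a simple random selection over everything not yet used.
    candidates = [(user, t) for user in fresh_by_user for t in unseen(user)]
    if not candidates:
        return None
    return rng.choice(candidates)


if __name__ == "__main__":
    fresh = {"darthvader": [(1, "I find your lack of faith disturbing")],
             "cnnbrk": [(2, "Breaking news")]}
    print(pick_retweet(fresh, ["darthvader"], set(), random.Random()))
```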
It turns out that injecting the re-tweet ‘header’ will often push longer tweets past the 140-character barrier. When that happens, he will attempt to preserve as much of the original tweet as possible, focusing on trailing URLs and hash tags. And if the tweet is short enough, he posts a shortened link to the original post, primarily to show off his mad skills. He is much inclined towards tweets which have a good blend of mockable and preservable words, again, to show off his mad skills.
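Trimming a re-tweet while protecting its tail can be sketched in a few lines of Python. This is an illustration under assumptions, not the bot's code: `fit_retweet` is an invented name, the header format is taken from the examples above, and the strategy (peel off trailing links and hash tags, then drop body words from the end and mark the cut with an ellipsis) is just one plausible reading of the description.

```python
LIMIT = 140  # Twitter's character limit at the time


def fit_retweet(handle, text, limit=LIMIT):
    """Prefix `text` with an RT header and trim it to fit within `limit`."""
    header = f"RT @{handle} "
    words = text.split()

    # Peel off trailing links and hash tags; these are kept if at all possible.
    tail = []
    while words and words[-1].startswith(("#", "http")):
        tail.insert(0, words.pop())

    def build(body, was_cut):
        parts = body + (["…"] if was_cut else []) + tail
        return header + " ".join(parts)

    result = build(words, False)
    # Drop body words from the end until the whole thing fits.
    while len(result) > limit and words:
        words.pop()
        result = build(words, True)

    # Last resort if the header and tail alone are too long: a hard cut.
    return result[:limit]


if __name__ == "__main__":
    long_text = "rauhr " * 40 + "#fact http://bit.ly/mjOD7"
    print(fit_retweet("cantremember", long_text))
```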
This became the second Personality Engine bot. Yet still, re-tweeting is a one-way street, and interaction is the real key to user engagement.
Playing Well With Others
The third Personality Engine bot was born of the need to perpetuate the following cycle of fun. On a regular basis, the Wookie will search for references to relevant words – wookiee, kashyyyk, etc. – and will respond to the user with a generated tweet. This is much less invasive and cruel than auto-following, a botting practice which I find to be quite gauche. I can only imagine the surprise on these users’ faces:
@amynicole21 WTF! aruh nrauuhehruhaaauuurr rraaahhhrr nruuh urrrrn ghrn wurh: euu haarrn nuuh grruhuhrnuhhrrn erruuuuhh :)
@DZ1641 gauuuhrr??
@vfigueroa1 rghrurr waahr! rrhneuuuhr urr nruuh hrauh - *ehrraaah* nrauh ^_^ nraur hruun rrrghhnn
However, before he goes searching, he first looks at his recent mention history, specifically at tweets starting with @cr_wookie. If one is found, he will mock and publicly respond to it, linking back to the original post when length permits. So if you talk to the Wookie, there’s a reasonable chance that he’ll republish you. To minimize abuse of this feature, he doesn’t follow quite the same word preservation rules as he does for follower re-tweeting. But he’ll keep Star Wars words, and that opens up a vast realm of potential amusement.
. @kindadodgy Nurh U hraaauuuu rghn Wookie rauuhrr wau’r hruh nrh ghrn hrun ‘rhngn ruhrn uhr wurh rn nuuh gh, ahrn au nrruuuuhh gauurh.
> @adamlampert Ar. U hru nrh raaghrr gh HR’N! Rghn ruuhr au! http://bit.ly/591qW
.@Lillput Nrh, rhag’n rauuhr nurh ghr haurr au a rghhnn ur a rghn.
Greeting New Followers
The fourth and final Personality Engine bot is the greeter. When you follow him, he’ll DM you. Short and to the point.
In Summary
Whew! All in all, the project required about 6 weeks of spare time. My only hope is that much hilarity will ensue from these efforts.
If you want to read a bit more about the Wookie – and who wouldn’t, right? – you can check out his Wiki page.