Posts Tagged ‘bot’

What Would a Wookie Do?

Saturday, September 12th, 2009

Yes, I had to ask myself that question a lot recently, at least from the perspective of how he would use the Twitter service. As much as this is a theoretical question, I believe I came up with some answers, and they are made manifest in the @cr_wookie Personality Engine.

Base-line Aesthetics

The first step in bringing a Wookie to life was to establish a basic phonetic dialect. So I came up with a set of candidate ‘words’ along the lines of: auhrn, rghrn, gahrn, hraur, urau, ehruh, and nrauh. Hey, they sounded good. There are about 60 of them in total, composed of the letters A E G H N R and U. After having watched The Empire Strikes Back, where Chewbacca seems to get most of his good lines, I expanded into a few W words as well (wahr seems to work particularly well). Many folks postulate that he was capable of speaking Os and Ks, yet I myself do not subscribe to that opinion.

I then used MadderLib to construct simple word generators which allowed the phonemes to stretch appropriately for different word lengths. Long vowels, double Rs and Hs, whatever looked good. And with a little compositing logic, I came up with some sentence patterns that were quite fun to read out loud:

rauhr nruuuhh raghr uhr rghrrnnauhrnaauuuuuurhhh
nrauuh euu gauuhrr ruhr
urhn ehrraaah rhhreuhraahhrrrrnn gaurrh
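For the curious, the stretching idea can be sketched in a few lines of plain Ruby. This is not MadderLib's API and not the Wookie's actual generator; the `stretch` helper, the `STRETCHY` set and the injected `rng` are all names I made up for illustration, assuming that only vowels, Rs and Hs are allowed to repeat.

```ruby
# Hypothetical sketch: lengthen a base word by doubling its vowels, Rs and Hs.
# STRETCHY and stretch are illustrative names, not part of MadderLib.
STRETCHY = /[aeurh]/

def stretch(word, target_length, rng: Random.new)
  chars = word.chars
  while chars.length < target_length
    # Positions whose letter is allowed to repeat.
    spots = chars.each_index.select { |i| STRETCHY.match?(chars[i]) }
    break if spots.empty?
    i = spots.sample(random: rng)
    # Insert a copy right next to the original, so runs stay contiguous.
    chars.insert(i, chars[i])
  end
  chars.join
end

puts stretch("hraur", 12)
```

Because every copy lands beside its original, collapsing the repeated letters of a stretched word gives back the base word.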

Those are rather plain-looking though. In order to make the Wookie’s statements seem more like txt argot, some variety was needed. Punctuation was obvious, both terminating and delimiting (commas, semicolons). Plus there’s the proper use of txt idioms, the LOLs and WTFs that are so popular with the youngsters these days (AFAIK). Throw in a nice little collection of emoticons, and behold; the Wookie has charm:

LOLZ! uhrrn euueuuhhaur, eruh hraur nrauururhnehrrraaah auhrn harn aurreruhaaarruuuhh rraaghrrr ^_^
hraur rghneruh nuh waahhr???
rhhhnnghn uhrrrnn ehrah. euu urau ehuuurrr urrn aurheuuuh haarn uhrrrn k?… haauurrr nruharuhuhhrrh hruaaaauuh ghrn rghn nrruuhh

Well, yeah, they still look sorta flat. Real people quote and capitalize portions of what they type, and there are other non-verbal components to the average sentence. So, the Wookie was taught to inject numbers, times, abbreviations, and even Star Wars calendar years into his sentences:

gahrn rghhr ghrrnehur rghhrrn hruauh raghrehurauuuuuuhrn gahrn?? hrraaauu: hrrraaauu rauhr ;) _ehruuh_ ‘Rahr Ehrrraaaah’
raauh hrau Ehrahurrrrh *rauhr* gruh
auhrneuuhr. harnuhrrnuhrrr uhr – raauhneuuu 1:30 gahrn raauuughrrr *nuuh* aahhrruuuunn uhr. wuurh harn rhr?

Once the Wookie was at this point, he could talk for quite some time and produce diverse aesthetic results. Reading them out loud is a hoot! Thus was born the first Personality Engine bot (the Wookie is composed of four of them). But he wasn’t really tweeting until he could follow some of the core Twitter memes.

Twitter Memes

Hash tags were the first obvious choice, since they were easy to fake. To this day, the Wookie can simply prepend a # to any word or composite that he speaks. But to make this feature really zing, I added support for Twitter’s trending searches. This allowed him to use real-world relevant tags, injecting them into his sentences, or appending them to his tweet (as is common convention). It turns out that one of the joys of a nonsense grammar is that anything which isn’t nonsense magically becomes the ‘meaning’ of the sentence:

Harn ehureruheuuu & ahrn rrghhhn! AAAHHRRNNAUHRN EHUUR! #itsnotgonnawork
WAHR NUUUH RAUHRR!!! euu ‘aauurr nraur’ haaauurrurau! #fact
GRRUUUH! uhr hraheuuuuhhr aauurh #ChargersSuck rrhnn aur! gauhr aaurh haurerruuuhh! !! #aurh
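Both flavours of hash tag are simple enough to sketch. In the Ruby below, `fake_tag` and `append_trend` are hypothetical helpers, and the list of trending tags is assumed to have been fetched elsewhere and passed in as a plain array; nothing here touches Twitter's API.

```ruby
# Hypothetical sketch of the two hash-tag tricks.
# The trends array is assumed to be supplied by the caller.

# Fake a tag by prepending # to any generated word (punctuation stripped).
def fake_tag(word)
  "#" + word.delete("^A-Za-z0-9_")
end

# Append one trending tag to the tweet, but only if it still fits.
def append_trend(tweet, trends, limit: 140, rng: Random.new)
  tag = trends.sample(random: rng)
  return tweet if tag.nil?
  tag = "#" + tag unless tag.start_with?("#")
  candidate = "#{tweet} #{tag}"
  candidate.length <= limit ? candidate : tweet
end

puts append_trend("WAHR NUUUH RAUHRR!!!", ["#fact"])
```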

Of course, no Twitter user can resist posting shortened URLs. They’ve been a cornerstone of the explosive growth of the service, maybe because there’s just so much interesting fast-moving crap out there on teh Internets. The Wookie follows several aggregation services (Digg, Technorati) and a smattering of other popular blogs (TMZ, The Onion Daily, LOLcats, etc.). He pulls out links to recent content and shortens them with the bit.ly API. Again, since the Wookie is totally faking it, the results just cannot be accounted for. The best he can hope for is that the emotional texture of his tweet sometimes supports the referenced source:

nrauh rrhn hrauuur ehrruhauhrnurrh. nrraauuuhh IMHO. hraur grraauhurhnauhrn rauh http://bit.ly/mjOD7
Ghrn euuhrr? haarneruuuhh urrn aruh rghrnn aaur, uhr hruh urrr :) http://bit.ly/1a9j1K

And no tweeter lives in a vacuum; their posts are replete with the user handles of friends, comrades and mentors.
The Wookie wasn’t about to make up handles, so his likely choices were his followees and followers.
Rather than take the name-dropping approach here — more on that later — the Wookie chooses to occasionally reference his most recent followers:

OMG! _rrrhnnneuh_ euh: urr wuuurrh rghnurrnh urhn hruhn @sleepbotzz rhagn ghrn rrhn waaahhr hruauhehuraghrrrrrnnn rhagn harrnaurh

After these features were implemented, the Wookie’s posts started to look almost real-ish. And whenever he tweets on his own, that is his range of capabilities. But he still wasn’t a real member of the Twitter community until he could play some other tricks. Thus began a completely separate effort: how to translate English into Wookie.

Mocking

Did I say ‘translate’? What I meant to say was ‘mock’. After all, what can you really do with a nonsense grammar except make it look like it has meaning?

So, the Wookie was taught to mock existing sentences into his own dialect. He simply matches the initial letter (vowel / not) and preserves the word’s length and non-alpha characters (for contractions and the like). Special mappings were also added to deal with short words (the dialect only generates words of four letters and above). And within a given tweet, he re-uses the same fake word for each instance of the real one. It’s an obscure feature, but it makes a helluva difference in some specific cases.
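Here is a minimal Ruby sketch of those mocking rules, under some loud assumptions: the word pools are a tiny sample of the dialect, the `SHORT_WORDS` table is invented (the real short-word mappings aren't listed here), and letter case is simply ignored. All of the names are hypothetical.

```ruby
# Hypothetical sketch of the mocking rules. The pools and SHORT_WORDS
# table are illustrative stand-ins, not the Wookie's real data.
VOWELS          = "aeiou"
VOWEL_WORDS     = %w[auhrn urau ehruh eruh aruh uhrn].freeze
CONSONANT_WORDS = %w[rghrn gahrn hraur nrauh wahr harn].freeze
SHORT_WORDS = {
  [true, 1]  => "u", [true, 2]  => "ur", [true, 3]  => "uhr",
  [false, 1] => "h", [false, 2] => "gh", [false, 3] => "hru"
}.freeze

# Build fake letters of the given length with a matching vowel/non-vowel start.
def fake_letters(vowel_first, length, rng)
  return SHORT_WORDS.fetch([vowel_first, length]) if length < 4
  pool  = vowel_first ? VOWEL_WORDS : CONSONANT_WORDS
  chars = pool.select { |w| w.length <= length }.sample(random: rng).chars
  while chars.length < length
    spots = chars.each_index.select { |i| "aeurh".include?(chars[i]) }
    i = spots.sample(random: rng)
    chars.insert(i, chars[i]) # stretch by doubling a vowel, R or H
  end
  chars.join
end

# Mock one token, keeping non-alpha characters where they were.
def mock_word(word, memo, rng)
  letters = word.scan(/[A-Za-z]/).join.downcase
  return word if letters.empty?
  # Re-use the same fake word for every instance within this tweet.
  memo[letters] ||= fake_letters(VOWELS.include?(letters[0]), letters.length, rng)
  fake = memo[letters].chars
  word.chars.map { |c| c.match?(/[A-Za-z]/) ? fake.shift : c }.join
end

def mock_tweet(text, rng: Random.new)
  memo = {}
  text.split(/(\s+)/).map { |tok| tok.strip.empty? ? tok : mock_word(tok, memo, rng) }.join
end

puts mock_tweet("the cat's hat and the dog")
```

The memo lives only for the duration of one `mock_tweet` call, which is what gives each repeated word a consistent fake within a tweet without making every tweet identical.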

The totally awesome part of the effort is identifying the words that don’t get mocked. There was no way I wanted to deal with semantic grammar detection, since tweets are often wildly non-grammatical. So as per usual, the Wookie fakes it. It mainly comes down to a weak analysis of quoting and capitalization patterns. He also keeps hash tags, links, handles, many acronyms, and argot, to the best of his ability.
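A weak analysis like that might look something like the Ruby below. It is a guess at the flavour of the heuristics rather than the real rules: the regular expressions, the `preserve?` and `kept_flags` names, and the notion that a capitalized word in mid-sentence is probably a proper noun are all my assumptions for the sake of the sketch. Quoting is left out entirely.

```ruby
# Hypothetical sketch of "which words survive mocking". Deliberately weak.
def preserve?(token)
  case token
  when /\A[@#]\w+/             then true  # handles and hash tags
  when %r{\Ahttps?://}         then true  # links
  when /\A[A-Z]{2,5}[!?.,]*\z/ then true  # acronyms and argot: LOL, WTF, IMHO
  else false
  end
end

# Returns [token, keep?] pairs. A capitalized word that does not open a
# sentence is treated as a proper noun and kept.
def kept_flags(text)
  sentence_start = true
  text.split.map do |tok|
    keep = preserve?(tok) || (!sentence_start && tok.match?(/\A[A-Z][a-z]+/))
    sentence_start = tok.match?(/[.!?]\z/)
    [tok, keep]
  end
end

kept_flags("Today Ted Kennedy said LOL at #fact").each { |tok, keep| puts "#{tok}: #{keep}" }
```

Note that an all-caps shout of five letters or fewer is indistinguishable from an acronym here, which is exactly the kind of mistake a faking Wookie makes.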

And just for fun, he also recognizes a rather large lexicon of terms from the Star Wars universe. Well, except for the term ‘Star Wars’ that is. He doesn’t know what that means.

It took a lot of experimenting to get it right, and he still makes mistakes, but he’s getting smarter all the time. One of the interesting things I learned during development was how staccato the English language is, as compared to the long smooth yawls of Wookie. Reading back a mocked sentence out loud is a sublime experience.

You may ask, how can this awesome power best be used to serve the Good?

Re-Tweeting

Darn right the Wookie re-tweets! He simply selects a few users that he follows, derives their recent tweets, and mocks one of them up. There are some users (@darthvader, for instance) whom he will always re-tweet if the user has posted anything fresh. Otherwise it’s a simple random selection, after avoiding repeats (there’s extensive repeat-avoiding code all throughout the Wookie implementation). There is the slight hint of name-dropping here, since he tends to follow a lot of popular accounts, but that’s just the nature of this beast.

RT @warrenellis Rauuh’r @neilhimself ehr #neilfail ar hruun rraaauuuuhrrr? Au. #warrenfail. http://bit.ly/16IiUE
RT @KurzweilAINews: First Close Look At Stimulated Brain: Aghhrrrrnn gaauuuhhrrr ar hrrauur aauuuhrrn u gahrrn … http://bit.ly/17dg34
RT @cnnbrk Hruh. Urrnn nrrauuuhh Ted Kennedy ur “rrrhhrrr ghr rauhn hru rau wahr; wahr au Democratic Party; … http://bit.ly/3FmLyq
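The selection logic described above boils down to very little code. In this Ruby sketch the tweets are assumed to arrive as a hash of user name to an array of `{ id:, text: }` hashes (newest first), and `seen_ids` is whatever store remembers what has already been re-tweeted; the structure and the `pick_retweet` name are hypothetical.

```ruby
require "set"

# Hypothetical sketch of re-tweet selection: favourites first, then random,
# never repeating a tweet that has already been used.
ALWAYS_RETWEET = %w[darthvader].freeze

def pick_retweet(recent_by_user, seen_ids, rng: Random.new)
  # Drop anything already re-tweeted, then drop users with nothing fresh.
  fresh = recent_by_user
          .transform_values { |tweets| tweets.reject { |t| seen_ids.include?(t[:id]) } }
          .reject { |_user, tweets| tweets.empty? }
  return nil if fresh.empty?

  user  = ALWAYS_RETWEET.find { |u| fresh.key?(u) } || fresh.keys.sample(random: rng)
  tweet = fresh[user].first
  seen_ids << tweet[:id] # remember it so it is never picked again
  [user, tweet]
end

recent = {
  "darthvader" => [{ id: 1, text: "I find your lack of faith disturbing." }],
  "cnnbrk"     => [{ id: 2, text: "Breaking news." }]
}
seen = Set.new
p pick_retweet(recent, seen)
```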

It turns out that injecting the re-tweet ‘header’ will often push longer ones past the 140ch barrier. He will attempt to preserve as much of the original tweet as possible, focusing on trailing URLs and hash tags. And if the tweet is short enough, he posts a shortened link to the original post, primarily to show off his mad skills. He is much inclined towards tweets which have a good blend of mockable and preservable words, again, to show off his mad skills.
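One way to do that trimming, sketched in Ruby: peel any trailing URLs and hash tags off the end, cut the body at a word boundary, then glue the tail back on after an ellipsis. The `fit_retweet` helper is hypothetical, and the sketch assumes the header and tail on their own always fit inside the limit.

```ruby
# Hypothetical sketch: fit "RT @user ..." into 140 characters while
# keeping trailing URLs and hash tags intact.
RT_LIMIT = 140

def fit_retweet(user, mocked, limit: RT_LIMIT)
  header = "RT @#{user} "
  full   = header + mocked
  return full if full.length <= limit

  words = mocked.split
  tail  = []
  # Peel trailing links and tags off the end so they survive truncation.
  tail.unshift(words.pop) while words.any? && words.last.match?(%r{\A(#\w+|https?://\S+)\z})
  suffix = tail.empty? ? " …" : " … #{tail.join(' ')}"

  # Add body words one at a time until the next one would overflow.
  room = limit - header.length - suffix.length
  body = ""
  words.each do |w|
    candidate = body.empty? ? w : "#{body} #{w}"
    break if candidate.length > room
    body = candidate
  end
  header + body + suffix
end

puts fit_retweet("cnnbrk", (["wahr"] * 40).join(" ") + " #fact http://bit.ly/x")
```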

This became the second Personality Engine bot. Yet still, re-tweeting is a one-way street, and interaction is the real key to user engagement.

Playing Well With Others

The third Personality Engine bot was born of the need to perpetuate the following cycle of fun. On a regular basis, the Wookie will search for references to relevant words (wookiee, kashyyyk, etc.) and will respond to the user with a generated tweet. This is much less invasive and cruel than auto-following, a botting practice which I find to be quite gauche. I can only imagine the surprise on these users’ faces:

@amynicole21 WTF! aruh nrauuhehruhaaauuurr rraaahhhrr nruuh urrrrn ghrn wurh: euu haarrn nuuh grruhuhrnuhhrrn erruuuuhh :)
@DZ1641 gauuuhrr??
@vfigueroa1 rghrurr waahr! rrhneuuuhr urr nruuh hrauh – *ehrraaah* nrauh ^_^ nraur hruun rrrghhnn

However, before he goes searching, he first looks at his recent mention history, specifically at tweets starting with @cr_wookie. If one is found, he will mock and publicly respond to it, linking back to the original post when length permits. So if you talk to the Wookie, there’s a reasonable chance that he’ll republish you. To minimize abuse of this feature, he doesn’t follow quite the same word preservation rules as he does for follower re-tweeting. But he’ll keep Star Wars words, and that opens up a vast realm of potential amusement.

. @kindadodgy Nurh U hraaauuuu rghn Wookie rauuhrr wau’r hruh nrh ghrn hrun ‘rhngn ruhrn uhr wurh rn nuuh gh, ahrn au nrruuuuhh gauurh.
> @adamlampert Ar. U hru nrh raaghrr gh HR’N! Rghn ruuhr au! http://bit.ly/591qW
.@Lillput Nrh, rhag’n rauuhr nurh ghr haurr au a rghhnn ur a rghn.

Greeting New Followers

The fourth and final Personality Engine bot is the greeter. When you follow him, he’ll DM you. Short and to the point.

Summary

Whew! All in all, the project required about 6 weeks of spare time. My only hope is that much hilarity will ensue from these efforts.

If you want to read a bit more about the Wookie — and who wouldn’t, right? — you can check out his Wiki page.

Snake ‘n’ Bacon in The DDOS Caper!

Friday, August 7th, 2009

ah, come in! we’re so glad you’ve come Snake ‘n’ Bacon!
i’m crisp delicious bacon
sssss

glad you asked. it seems there’s a group of hackers, and we want you to go in under-cover
i go great on a sandwich
sssss


When Twitter came back online yesterday afternoon after their networking attacks, I got a torrent of @cr_snake_bacon tweets. Wasn’t sure why, but it seemed suspicious. Twitter’s API had flopped around for most of the day, so the logs were full of Exceptions and … oops! … re-connect attempts!

Of course I’d built the bots to re-tweet on an Exception. They’re all configured to wait 60 seconds, then try again. But of course, until I fixed the configuration overnight, they did exactly what a bot would do … conspicuous.
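For anyone else with a try-again loop in their bot, one common remedy is to cap the number of attempts and lengthen the wait each time. The Ruby below is a sketch of that general technique, not the change I actually made to my configuration; `with_retries`, its parameters and the injectable `sleeper` are hypothetical names.

```ruby
# Hypothetical sketch: bounded retries with a doubling wait, instead of
# hammering a struggling API every 60 seconds forever.
def with_retries(max_attempts: 3, wait: 60, sleeper: ->(secs) { sleep(secs) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts       # give up and surface the error
    sleeper.call(wait * (2**(attempts - 1))) # 60s, 120s, 240s, ...
    retry
  end
end

# Usage: the block stands in for whatever posts the tweet.
calls = 0
result = with_retries(sleeper: ->(_secs) {}) do
  calls += 1
  raise "fail whale" if calls < 3
  :posted
end
puts "#{result} after #{calls} attempts"
```

Passing the sleeper in makes the behaviour testable without actually waiting, which matters if you want to exercise the failure path before a launch.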

The service attacks on Twitter continued through today, and I’m sure that the birdy techs are furiously building black ice fortresses in Scala even now. Again, I saw a burst this afternoon from all of my bots. Pokey the Penguin, Conet Project, and Chewbacca all had several things to say, all at once. Obviously I had fucked something else up, so I hurriedly checked the logs. And nope … actually, my change had worked … Twitter had just un-blocked my IP.

*whew*

I’m not exactly sure how many bots are out there … here’s a nice wiki being kept of them. But I can imagine I’m not the only one who made that try-again coding mis-calculation. What’s sweet is that it’s un-done now, and my toys can continue prattling on.

Thanks, guys. Sorry we looked like vicious automata for a while there. Glad to be back.

When Broken Toys Impact your Friends

Friday, February 6th, 2009

This sure was an interesting morning! I woke up to find that I’d unintentionally sent direct messages to all of the followers on my personal Twitter account. And I’d sent them out at 1am PST, which means that anyone who (a) uses SMS capabilities, and (b) has some text message notification sound set up would have been rudely interrupted in the middle of the night.

Fortunately, I haven’t lost any followers (yet). But this was a perfect case of how mixing business with pleasure can have unintended consequences.

What Have I Learned

Or rather, what have I re-learned?

Soft-disable features in Production at Launch Time
 
My Twitter engines are built with both an :enable_tweet and :enable_greeting config setting. In the git repo, they’re both true. When I did my local testing, I’d disabled them correctly. When I launched in Production, I neglected to make the quick-and-dirty changes; after all, everything worked great. And once started, my scripts correctly responded to the no-initial-state condition, and greeted everybody.

Launch preparation is critical, even for little projects. The start-up mentality is to move fast and lean, but there’s such a thing as too fast, and probably too lean as well. Gradual uptake migration is a wise strategy even for the ‘little things’.
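In hindsight, two small habits would have saved my friends' sleep: default the dangerous switches to off, and treat a missing state file as "adopt everyone silently" rather than "everyone is new". Here is a Ruby sketch of both. The `:enable_tweet` and `:enable_greeting` keys are the real setting names; `SAFE_DEFAULTS`, `load_config` and `followers_to_greet` are hypothetical, and this is not how my engines were actually structured at the time.

```ruby
# Hypothetical sketch: safe-by-default config plus a first-run guard.
SAFE_DEFAULTS = { enable_tweet: false, enable_greeting: false }.freeze

# Anything not explicitly switched on stays off.
def load_config(overrides = {})
  SAFE_DEFAULTS.merge(overrides)
end

# known_followers is nil when there is no saved state yet (first launch).
def followers_to_greet(current_followers, known_followers)
  # First run: remember everyone, greet no one.
  return [] if known_followers.nil?
  current_followers - known_followers
end

config = load_config(enable_tweet: true)
if config[:enable_greeting]
  followers_to_greet(%w[alice bob], nil).each { |name| puts "DM to #{name}" }
end
```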

Mock and Integration Testing Only Gets You So Far
 
I used rspec to mock out the full capabilities of the engine. Found some real-world issues, resolved them. I also wrote some core integration tests, ran them locally. Immediate failures. I had mocked documented features that didn’t actually exist. Fixed, re-mocked, re-tested, fixed again, etc.

Another great reminder that you can only mock something you trust, and how can you trust something you haven’t actually run under integration conditions to start with? Re-tested integration, and everything passed with flying colors. Sure, the features worked great now! And when I launched them, they did exactly what I asked.

So, as if we haven’t heard it enough times, be careful what you ask for!

All of this is familiar to anyone who has made a mistake in the software industry. It’s not like I haven’t successfully executed dozens of critical launches in the past, and most with virtually no issues at all. But what’s interesting is what happens when these mistakes happen in a public forum, and whom you expose them to — say, your friends :)

And who can say when two ounces of caution is more deserving than one … without the benefit of hindsight.

Just ask anyone who has a stringent backup policy how much time & effort they invest to avoid an event that may never actually happen. That stringency usually comes from that one unforgettable experience, and from there is born an extra layer of caution, and an additional time-sink (e.g. mock & integration testing).

Heh. KISS. So, what exactly is simple? DRY. Isn’t that supposed to be a time-saver? Well, it depends on what you’re not trying to repeat. Strange how these cuddly and liberating acronyms can have more than one interpretation.

Experience taints everything.