Twitter Needs an Offline Mode and an Open Client
Dave Winer gave me an idea that clicked a bunch of pieces into place in my thinking about Twitter’s need to scale. I want to put forth an idea that comes from my background in enterprise IT, where we had an application that wrote hundreds of thousands of short records a second to a database, and where we had processes in place for when the platform went down. Here are the problems, and then how I’d solve them with my enterprise IT hat on:
1.) Twitter, when it goes down, has nowhere to pass the traffic. This frustrates the customer base badly.
In my former wireless telecom world, we made a function that would permit calls to process while our database was offline. Call detail records would store up in a smaller database run by the same front end and middleware, and then when the databases were back up, we’d insert the records, and process everything accordingly.
2.) Twitter is essentially an app writing into a back end.
We need one layer of abstraction. One way to do this would be to use a client like Twhirl, and give it the ability to write tweets to two places: first to an RSS feed (so we could do more with the data, and I want that feature anyway, Loic. Okay?), and second to an intermediate database somewhere on the Amazon S3 cloud.
When Twitter’s down, we run from Twhirl’s second pointer. When it comes back up, the database of new tweets gets reinserted.
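Here is a minimal sketch of that store-and-forward idea in Python. Everything in it is hypothetical: `PrimaryStore` stands in for Twitter's back end, the `standby` buffer stands in for the intermediate database, and none of the names reflect Twitter's or Twhirl's real code.

```python
from collections import deque


class ServiceDown(Exception):
    """Raised by the hypothetical primary store when it is offline."""


class PrimaryStore:
    """Stand-in for Twitter's back end (an assumption, not the real API)."""

    def __init__(self):
        self.up = True
        self.tweets = []

    def post(self, tweet):
        if not self.up:
            raise ServiceDown("primary is offline")
        self.tweets.append(tweet)


class FailoverClient:
    """Writes to the primary, falling back to a standby buffer when it is down."""

    def __init__(self, primary):
        self.primary = primary
        self.standby = deque()  # stand-in for the intermediate database

    def post(self, tweet):
        try:
            self.primary.post(tweet)
            return "primary"
        except ServiceDown:
            self.standby.append(tweet)
            return "standby"

    def replay(self):
        """Reinsert buffered tweets, oldest first, once the primary is back."""
        replayed = 0
        while self.standby:
            try:
                self.primary.post(self.standby[0])
            except ServiceDown:
                break  # still down; keep the rest for later
            self.standby.popleft()
            replayed += 1
        return replayed
```

A real client would probably call `replay()` before accepting new posts after an outage, so that buffered tweets land ahead of fresh ones.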
If this is best accomplished by an XML feed, think about it: how much storage does it take to hold 140 characters (okay, plus the metadata) per tweet for everyone you’re following? Make it a Twhirl-only feature for all I care.
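To put a rough number on that, here is a back-of-the-envelope calculation in Python. The inputs are all assumptions made up for illustration: 500 accounts followed, 10 tweets a day each, 140 bytes of text per tweet (one byte per character, which only holds for plain ASCII), and a guessed 200 bytes of metadata.

```python
def daily_feed_bytes(following, tweets_per_day, text_bytes=140, metadata_bytes=200):
    """Rough size of one day's worth of tweets from everyone you follow."""
    return following * tweets_per_day * (text_bytes + metadata_bytes)


# Assumed figures, not measurements: 500 accounts, 10 tweets each per day.
size = daily_feed_bytes(following=500, tweets_per_day=10)
print(f"{size / 1_000_000:.1f} MB per day")  # prints "1.7 MB per day"
```

Under those assumptions, the feed for a fairly busy account comes to a couple of megabytes a day before any XML overhead.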
Or, if Not Twhirl
I was thinking that we need SOME kind of front end, like a Firefox for this new kind of app. Something that resides in open source so that we can fork it, adjust it, adapt it, and work on the same code core, with the same baseline features, but with our own bells and whistles.
Enterprise Thinking
It’s strange that reading one comment from Dave gave me a whole refresher course in how I used to work with fast-moving, enterprise-grade data, which certainly has some parallels with what Twitter’s trying to do.
Take all these thoughts with love, Twitter and Twhirl. And please try and help us keep the flow alive.
Comments
Assetbar, seeing the same issues, brought up this topic back in February, arguing they could act as proxy, and submit Twitter data around the outages.
See more here…
Twitter-proxy: Any Interest?
http://assetbar.wordpress.com/2008/02/08/twitter-proxy-any-interest/
I’m not sure I see how this would help without fundamentally fixing the back end anyway. Twitter has trouble keeping up with volume during normal operations. What room do they have to quickly process the backlog of messages that pile up while it’s down, along with the normal traffic they’d already be struggling with?
The idea of having a redundant system is definitely necessary to handle down times, but it does nothing for the problems they’re suffering now unfortunately.
There has been a lot of thinking in this direction (I guess Twitter only has themselves to blame) and I’m hopeful that something like you describe will come out of it. An open client can only be a good thing too.
(Barely related sidenote: I miss using Twhirl! Adobe AIR doesn’t really work properly on Linux yet. Someone should make the point to them that cross-platform means cross-platform!)
They don’t have to quickly process the backlog. They can thread it in over time.
When we’re mad about Twitter being down, it’s usually because we want the point-in-time conversation. We’re at a show, or we’re watching the election, or we’re looking for people at the Tweetup, or the conference.
The real premise here is that we have something that can handle the crash, keep us typing, and then move us forward until the next opening.
When Twitter is down, I write down messages on a piece of paper until the urge passes. Most of the time the paper can be discarded.
@Edward Vielmetti If you didn’t get that comment from http://whentwitterisdown.com/ you should submit it to @lonelysandwich for inclusion! Brilliant!
But the problems Twitter is having are related to sending out all the updates to all the necessary people. It’s not as simple as just dropping it into a database and things magically happening when Twitter itself is down.
You’ll miss real time conversation if it’s just dropping into an extra database somewhere. And you’ll miss the relevancy if you’re just storing the updates for later processing by Twitter itself.
Optimization of message handling is ultimately the issue Twitter is having - it’s not about throwing a few more boxes at it.
You make a good point about Twitter and give me things to think about for my own application design. I’ve never handled more than 80k requests a day. I’d love to deal with this level of traffic–wouldn’t we all–and work on Twitter-sized problems.
Yes, all great comments. And as an enterprise UNIX engineer myself, I agree they are all necessary to ensure Twitter’s longterm scalability.
But they are all expensive. Twitter is going to pay for all of this extra hardware, and engineering labs, and $100k+ people to design, test, QA and deploy it how?
Your idea is fuzzy and has a lot of potential problems, and to be honest, makes very little sense. Maybe you could give us a more detailed idea of the architecture you envision.
For example, what exactly do you mean by “publishing to an RSS feed”? Are you seriously suggesting writing tweets directly into a flat text file rather than into a database?
You can use MessageDance to make tweets when Twitter is down (and they are accessible publicly through RSS). When Twitter comes back, we push them through. We also cache all of your tweets (and those you follow) in our MyTweets section.
Joe Cascio will be talking about the idea of “Distributed Twitter” at next Thursday’s Boston Ignite Night. So if anyone is in the Boston area next week stop by and discuss.
Does anyone know how the outages relate to new Twitter subscribers, to events, or to the influx of new twitterfeeds? It would be interesting to see this graphed.
Everyone uses twitter for his or her own reason, but for me, I’m a news junkie. I follow many news sources. I notice these outlets use twitterfeed. Sometimes they have so much tweet output that it is necessary for me to choose to unfollow them just to see my friends.
I too have thought about a solution to the Twitter outages and wonder if the Twitter engineers could come up with a tiered level of service for some types of high-volume media outlets. As a Twitter subscriber, it would be great if I could toggle these groups on/off without having to unfollow each one individually. I think this would also allow the Twitter folks to throttle resources during peak times, much like data centers throttle bandwidth.
Any thoughts?
I’m not too techwise, so most of the post went right over my head, but the part about one tweet getting the mind rolling is what happens to me. One thought generates a thunderstorm of thought, which sometimes, either causes reflection or moves me in a totally different direction than I’d previously thought of.
For me, this is one of the beauties of Twitter, getting new ideas from unlikely places.
BTW, Chris, I tagged you for a meme, http://www.iowaavenue.com/profiles/blog/show?id=774881%3ABlogPost%3A27844. If you’re interested, I’d love to hear your response… :)
Having a database stored on Amazon S3 doesn’t really make much sense. But using Amazon SQS would.
I could imagine a client that does something like:
* When you tweet, it gets added to the queue
* Every so often (whatever the API throttling is capped at), pull a tweet off the queue, and try to post it
* If the post fails, i.e. Twitter is down, make note of it, stick the tweet back on the queue, and wait a while before trying again
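The three steps above could be sketched roughly like this in Python. To keep it self-contained, an in-memory `deque` stands in for Amazon SQS, and `send` is a hypothetical callable that returns True when the post succeeds; neither is a real Twitter or SQS call. A failed tweet goes back to the front of the queue so the order is preserved, which is a design choice of this sketch rather than something SQS would do for you.

```python
import time
from collections import deque


class RetryingOutbox:
    """Client-side queue: post one tweet per tick, back off when posting fails."""

    def __init__(self, send, base_delay=1.0, max_delay=60.0):
        self.send = send              # hypothetical callable; returns True on success
        self.queue = deque()          # in-memory stand-in for a real message queue
        self.base_delay = base_delay  # would match the API throttling interval
        self.max_delay = max_delay
        self.delay = base_delay
        self.failures = 0

    def add(self, tweet):
        self.queue.append(tweet)

    def tick(self):
        """Try to post the oldest tweet; return seconds to wait before the next tick."""
        if not self.queue:
            return self.base_delay
        tweet = self.queue.popleft()
        if self.send(tweet):
            self.delay = self.base_delay
        else:
            self.failures += 1                 # make note of the failure
            self.queue.appendleft(tweet)       # put it back, keeping the order
            self.delay = min(self.delay * 2, self.max_delay)  # wait longer each time
        return self.delay

    def run(self, sleep=time.sleep):
        """Drain the queue, sleeping between attempts."""
        while self.queue:
            sleep(self.tick())
```

Doubling the wait after each failure keeps a crowd of clients from hammering the service the moment it comes back up.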
It’s moments like this that I wish I knew something about desktop application development.
Chris,
I appreciate the mature and well thought out point-of-view of your post.
I am a blog and Twitter peep of yours (you were recommended), and now I read why.
:-)
Chris-
Twitter needs help. I think the idea of tweets having a second home is nice. Maybe somewhere near Maui so we can all go visit.
Ed V. Your message cracks me up.
Okay, for those who wanted more detail, let me lay it out:
1.) Let’s say that Twitter has three primary pieces to its architecture right now (not counting maintenance stuff).
– a.) Front end interface (that which connects to the web, to SMS, to Jabber, etc)
– b.) Message gateway - the actual spot where the data gets processed, marked, and stored.
– c.) Database servers and storage.
2.) In the current situation, part c (servers/storage) goes down, and we’re offline.
3.) In my proposal, we do a few things:
– a.) Add a function to the message gateway to shut down writes to the main Twitter database during downtime, and throw a flag to alert that we’re on the standby.
– b.) Add a function to the message gateway to write to a separate database.
– c.) Add a function to the message gateway *and* a new database to the primary servers to write/store an XML/RSS feed of our twitter stream.
– d.) Add a function to Twhirl (or an open source Twitter Front End client) that allows Twhirl to detect when Twitter prime is down.
– e.) Add a function to Twhirl (or similar) that allows Twhirl to write to the secondary database.
– f.) Add a function to Twhirl to access the XML copy of our stream and the stream of our friends.
– g.) Add a function to the message gateway that trickle-inserts our “out of service” tweets back into the primary copy of the database.
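A rough Python sketch of the gateway side of this list (steps 3a, 3b and 3g) might look like the following. It is only an illustration: plain lists stand in for the primary and standby databases, and the class and method names are made up for this example.

```python
class MessageGateway:
    """Hypothetical gateway covering steps 3a, 3b and 3g of the plan above."""

    def __init__(self, primary_db, standby_db):
        self.primary_db = primary_db   # plain lists stand in for real databases
        self.standby_db = standby_db
        self.on_standby = False        # the "flag" from step 3a

    def primary_failed(self):
        """Step 3a: stop writing to the primary and raise the standby flag."""
        self.on_standby = True

    def primary_recovered(self):
        self.on_standby = False

    def write(self, tweet):
        """Step 3b: route writes to the standby while the flag is up."""
        target = self.standby_db if self.on_standby else self.primary_db
        target.append(tweet)

    def trickle(self, batch_size=2):
        """Step 3g: move at most batch_size backlog tweets into the primary per call."""
        if self.on_standby:
            return 0
        batch = self.standby_db[:batch_size]
        del self.standby_db[:batch_size]
        self.primary_db.extend(batch)
        return len(batch)
```

Calling `trickle()` on a timer with a small batch size spreads the backlog out over time instead of dumping it on the primary all at once. In this sketch the backlog lands after any newer tweets, so a real system would need to order by timestamp when reading.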
The outcomes are:
* Twitter functionality runs, even when offline.
* Data retention and integrity.
* Enhanced usability (the RSS feed).
The benefits to Twitter are:
* Continued operation.
* Less bitching and moaning by us.
The benefits to Twhirl are:
* A hands-down reason why we’d use this app as our #1 Twitter interface.
That’s the plan, roughly.
Sometimes on the internet we are able to find some ideas that are very useful to us. They can be useful to us in respect of our business or our personal lives. These ideas help us a lot in our decision making process.
Chris,
You’re still missing the point of what the problem is. The problem is, from my understanding, in b) Message Processing. The storage is trivial - anyone with a rudimentary understanding of servers can solve that part.
Logically processing the messages in the way twitter is structured is the problem. Now my initial thought is that this suggests they built twitter from the wrong direction when it comes to message processing and that it’s a relatively easy fix overall but they started out in the wrong direction architecturally. I could be completely off on this and oversimplifying the architecture and the problem they’re facing. But regardless, it’s not a hardware problem, it’s a software problem either way.
You suggest it’s as easy having Twhirl read an XML copy, but think about how many different XML copies it has to make and what decisions have to go into those XML feeds. When Twitter is up and running the API call is simple - Here’s a couple of variables, give me the 20 most recent messages along those values. Unfortunately when Twitter isn’t up, I don’t believe it’s anywhere near that simple - at least not given what I think their approach has been so far.
It’s not so much that I don’t see how your idea would work, it’s just that it doesn’t factor in a lot about Twitter. For example, Twitter allows you to block people from seeing your updates - that doesn’t work if you’re throwing up a public feed to everyone regardless (sure there are ways around it but not inherent to twitter itself per se). @ replies are also broken in this model which for many are a crucial function of Twitter.
Ultimately you end up with a band-aid fix that ignores the fundamental problems Twitter is having and simply drags the problem out further. Why not fix part b, the Message Processing, where the problem actually is, and move on, instead of finding backwards ways to cover up the issue?
@James - some really good points in your reply. You’re right that I didn’t think much about the block feature, for instance. Same with @replies.
And yet, I think the core of what I’m trying to do is get it to run when the back end is down. It’s not that I want to re-architect to clear the flaws. I want the patch to fix the bad stuff.
Still, great points that I wasn’t considering much.
-
yes, this is a good idea .. but won't the service then become like SMTP?
-
this is a fascinating demonstration of the organic nature of this medium AND THUS why it's so IMPORTANT that we be able to depend on it!
-
See also: Twitter-proxy: Any Interest? http://assetbar.wordpress.com/2008/02/08/twitter-proxy-any-interest/
-
That's funny, now we have Rights to a reliable Twitter.
-
My first thought was "Jabber"
-
The proxy idea is interesting -- based on the assumption that the proxies can handle the load as well
-
Store and forward will just kill Twitter when it comes back up. If there is a TwitterThingy that jabbers your tweets directly to someone else's TwitterThingy (without storing them) that's fine. But it only works if all your followers and anyone you want to direct message has the Thingy. It's an all or nothing deal and therefore unlikely to happen. It's too late for this idea as far as Twitter is concerned. Twitter is too popular now and will have to fix the problem on its own.
-
What about RSS?
-
I think XMPP relays have a lot of potential and could help with enterprise use as well.
-
Down again . .







I take back all that stuff I said about you before :)
awesome post… finally someone asks twitter - where is the queue?