Musings of Geekdom by Eric Newton

tail /var/log/thoughts
posts - 88 , comments - 41 , trackbacks - 68

Reducing bandwidth required by blogs

I've always wanted to touch on this, after reading a few blog posts about the problem.

I see the problem, at a high level, as sending too much information out to all of your consumers (RSS aggregator readers).

Considering that aggregators form the bulk of blog post requests, I still haven't seen any effort to streamline THAT traffic... I've only really seen efforts to scale back the “web interface” for the blogs... which in a way is good, but not the answer, again assuming that the bulk of the traffic goes through the aggregators.

Consider how most aggregators are set up: they check the “Main Feed” every 60 minutes to determine what's new. If an active “Main Feed” gets more than 25 posts within those 60 minutes, the aggregator is losing posts.  Now consider the other extreme: a blog's Main Feed has maybe 2 new posts within 60 minutes... that's 23 posts (plus their bandwidth) that the aggregator downloads and throws out.  That's wasted bandwidth!
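To put rough numbers on that waste (the 25-post feed size, 2 new posts per hour, and 2 KB per entry are all illustrative assumptions, not measurements):

```python
# Back-of-the-envelope numbers for one aggregator polling one feed hourly.
# All figures are illustrative assumptions.
FEED_SIZE = 25        # posts served per fetch of the "Main Feed"
NEW_POSTS = 2         # posts actually published during one polling interval
POST_BYTES = 2048     # rough size of one post's RSS entry

redundant = FEED_SIZE - NEW_POSTS       # 23 posts re-downloaded for nothing
wasted_bytes = redundant * POST_BYTES   # wasted per client, per poll
waste_ratio = redundant / FEED_SIZE     # fraction of the transfer wasted

print(redundant, wasted_bytes, f"{waste_ratio:.0%}")  # 23 47104 92%
```

Multiply that per-poll waste by thousands of subscribers polling every hour and the numbers add up quickly.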

Here's my answer:

The server should generate GUIDs for all blog posts (standard issue, I would assume), then simply have the RSS aggregator say “give me new blog posts after this GUID.”  Sure, it's potentially more DB intensive than the easily cached “last 25 posts”... but are we trying to make it easier on the SQL server (what's the point of that?) or are we trying to reduce bandwidth?  Since bandwidth directly costs money, I'm gonna make that SQL server do a little more work to save money/time/bandwidth.
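A minimal sketch of that idea (the schema and function names here are mine, not any particular blog engine's): the aggregator remembers the last GUID it saw, and the server returns only what came after it. Keeping an ordered surrogate key underneath the GUID means the server never has to compare GUIDs themselves:

```python
import sqlite3

# Toy schema: GUIDs identify posts; a monotonically increasing INTEGER key
# gives us "after this GUID" ordering without comparing GUID strings.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, guid TEXT UNIQUE, title TEXT)")
db.executemany("INSERT INTO posts (guid, title) VALUES (?, ?)",
               [(f"guid-{n}", f"Post {n}") for n in range(1, 6)])

def posts_after(guid):
    """Return only the posts published after the given GUID (the 'delta post set')."""
    row = db.execute("SELECT id FROM posts WHERE guid = ?", (guid,)).fetchone()
    since = row[0] if row else 0   # unknown GUID -> client gets everything
    return [g for (g,) in db.execute(
        "SELECT guid FROM posts WHERE id > ? ORDER BY id", (since,))]

print(posts_after("guid-3"))  # ['guid-4', 'guid-5']
```

The trade-off is exactly the one described above: each client's request hits the database instead of a single cached feed document, in exchange for never re-sending posts the client already has.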

Given the above scenario, a heavily posted-to MainFeed will still have to serve out each client's “delta post set” (meaning, the posts that the client hasn't even downloaded yet). 

Now, are there ways to reduce that traffic still further?  A most definite “YES”... I've seen a project in the works to download RSS feeds via torrents... a great idea, considering the P2P technology that torrents represent... the server hands out these posts to consumers that are also distributors...

There are also simple redirects that send aggregators to “cache” servers based on how many posts need to be downloaded...

I'm somewhat off on a tangent, but the point is that more effort should be made on the aggregator side to reduce bandwidth.

Print | posted on Monday, June 6, 2005 3:57 PM |



# re: Reducing bandwidth required by blogs

Unfortunately, hanging parameters off the query creates a new problem: It foils caching proxies. If you have a lot of clients behind caching proxies you can see your bandwidth consumption go *up* instead of down.
6/6/2005 5:04 PM | Raymond Chen

# re: Reducing bandwidth required by blogs

Hmmm, not sure why changing query parameters would foil caching proxies, but I agree that would cause a bigger problem.

The simplest answer is to leverage the proxies' caching abilities, I would guess via header parameters of some sort?
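For what it's worth, the standard HTTP mechanism for this is the conditional GET: the server sends an ETag (or Last-Modified) header with the feed, the client echoes it back on the next poll, and an unchanged feed costs only a 304 response with no body. A minimal sketch (the handler and its tuple-based interface are hypothetical, not any real aggregator's or server's code):

```python
import hashlib

def serve_feed(feed_xml: str, request_headers: dict) -> tuple:
    """Hypothetical feed handler: answer 304 Not Modified when the client's
    cached copy (identified by its ETag) is still current."""
    etag = '"' + hashlib.md5(feed_xml.encode()).hexdigest() + '"'
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, ""        # client/proxy reuses its cache
    return 200, {"ETag": etag}, feed_xml      # full body only when changed

# First fetch pays for the full feed; the revalidation costs only headers.
status, headers, body = serve_feed("<rss>...</rss>", {})
status2, _, body2 = serve_feed("<rss>...</rss>", {"If-None-Match": headers["ETag"]})
print(status, status2, len(body2))  # 200 304 0
```

Because the validation lives in headers rather than in the query string, the URL stays the same for every client, which is exactly what lets caching proxies do their job.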
6/7/2005 5:44 PM | Eric Newton
Comments have been closed on this topic.
