www.Level3.com http://blog.level3.com

CDNs and DNS: A Flawed Study?

Health warning – what follows is for geeks ;)

Some weeks ago a very long and scientific-looking blog post appeared that analysed some CDNs’ use of DNS to determine the location of an end-user. At the time I read it and dismissed it because it was so obviously flawed. But several people have asked me about it, so I thought I’d try and unpick some of the assertions in that post. I’ll try and do it as simply as possible.

The Domain Name System (DNS) infrastructure is a fundamental part of the Internet and its job is to translate human-readable (and memorable) web site names like www.level3.com into network addresses like 4.68.90.77, which end-users could never remember, but that network equipment needs to route end-users to those web sites.

That DNS infrastructure is largely split into two types: open (public) and closed (private). Open DNS is provided by companies like OpenDNS, Google and Level 3. You can use it wherever you are on the Internet with no restrictions or authentication required. Closed DNS is provided by an ISP, like CableVision, and is only accessible by the ISP’s own authenticated broadband consumers. The vast majority of all DNS requests (in excess of 90%) take place in a closed DNS environment. This study appears to have used only data from open DNS infrastructure (or maybe some poorly configured closed systems), so its statistical validity is already compromised at the outset.

The author does admit that his list of CDN server locations is not complete because of his methodology (the only one open to him), which would never find all the unique addresses of a CDN. This is the fundamental flaw in this study. He then persists in using those locations as the basis for his other assertions anyway! But the commentary also seems to miss a vital part of a CDN’s architecture: a host name (one of those web site names) that leverages a CDN for delivery is actually “bound” to an optimized sub-set of the entire CDN infrastructure. These “bindings” are optimized for customer content with specific characteristics and share a common infrastructure.

Mixing content with different characteristics (large objects and small objects, for instance) on a single binding set may degrade performance or lead to less efficient use of the deployed computer infrastructure. For the customer example that was chosen to represent Level 3, a large number of the locations from which that customer was served were completely missed, and this severely limited data set effectively invalidates any of the assertions that followed.

The post goes on to compare the use of DNS as a tool to select the best computer to serve from, with Anycast. Sure, both have pros and cons. DNS alone assumes that the location of the end-user’s DNS resolver is a proxy for that end-user’s actual geographic location … in a very small percentage of cases, it is not. Anycast, however, suffers from the inherent linkage to the Internet’s routing connectivity (who has peered with whom and who has bought transit from whom) to determine “closeness”.
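
To make the resolver-as-proxy assumption concrete, here is a minimal sketch in Python. It is not how any real CDN picks a server; the cluster names, coordinates and the pick_cluster helper are all made up for illustration. It simply picks the cluster geographically nearest to the DNS resolver, and shows what happens in the rare case where the resolver is nowhere near the end-user.

```python
import math

# Hypothetical cluster locations as (latitude, longitude); not real deployment data.
CLUSTERS = {
    "new-york": (40.71, -74.01),
    "london": (51.51, -0.13),
    "singapore": (1.35, 103.82),
}

def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pick_cluster(resolver_location):
    """Pick the cluster nearest to the resolver, which stands in for the end-user."""
    return min(CLUSTERS,
               key=lambda name: great_circle_km(CLUSTERS[name], resolver_location))

# Common case: a user in Paris whose ISP resolver is also in Paris.
print(pick_cluster((48.86, 2.35)))    # london

# Rare case: the same user in Paris, but configured to use a resolver in Singapore.
print(pick_cluster((1.30, 103.80)))   # singapore
```

In the second call the answer is "correct" for the resolver and wrong for the user, which is exactly the small-percentage case described above.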

While a performance-based view of proximity may seem inherently better, in a widely distributed CDN this approach could toggle an end-user back and forth between several geographically close CDN clusters, with disastrous results for jitter-sensitive applications. Level 3’s DNS rendezvous system dynamically allocates the best content source based on both DNS location and a real-time view of performance – the system makes appropriate use of the DNS infrastructure, but augments it with a very intelligent real-time “weather map” of Internet latency measurements … trust with verification. This, in fact, is a large part of our “secret sauce”. If I tell you how it works I will have to shoot you :)
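
The toggling problem can be illustrated without giving away anything secret. The sketch below is emphatically not Level 3’s rendezvous system; it is a generic, assumed technique (hysteresis) with made-up names (choose_cluster, switch_margin_ms) and made-up latency numbers. It only moves an end-user to another nearby cluster when that cluster is better by a clear margin, rather than on every small fluctuation.

```python
def choose_cluster(current, candidates, latency_ms, switch_margin_ms=10.0):
    """Pick a cluster from `candidates` (e.g. those near the DNS resolver).

    `latency_ms` maps cluster name -> latest measured latency (hypothetical data).
    The current cluster is kept unless another one beats it by at least
    `switch_margin_ms`, which damps back-and-forth switching.
    """
    measured = [c for c in candidates if c in latency_ms]
    if not measured:
        return current          # no measurements: leave things as they are
    best = min(measured, key=lambda c: latency_ms[c])
    if current not in measured:
        return best             # first assignment, or current cluster unmeasured
    if latency_ms[current] - latency_ms[best] >= switch_margin_ms:
        return best             # clearly better: worth switching
    return current              # difference is noise: stay put

candidates = ["cluster-a", "cluster-b"]
samples = [
    {"cluster-a": 20, "cluster-b": 22},
    {"cluster-a": 23, "cluster-b": 21},   # b is 2 ms better: not enough to switch
    {"cluster-a": 45, "cluster-b": 21},   # b is 24 ms better: switch
]
serving = None
for sample in samples:
    serving = choose_cluster(serving, candidates, sample)
    print(serving)              # cluster-a, cluster-a, cluster-b
```

A purely performance-based selector with no margin would have flipped to cluster-b on the second sample and possibly back again on the next one, which is the jitter problem described above.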

The poster’s “study” is analogous to a book that begins “I know that we can’t travel faster than light, but if we could …” All of us would recognize that anything following that statement, no matter how well written it was, would be science-fiction. A study based on a flawed premise and incomplete data is not so easily recognized as speculative fiction.

But all this actually misses the point. All that matters is the end-user experience, which can be measured by independent testing services like Conviva or Gomez. In test after independent test, as I’ve said before, Level 3’s DNS-based, dual-tiered approach outperforms our competitors.

About Mark Taylor

I work as VP of Content and Media here at Level 3. English expat and passionate new tech energy evangelist.

Comments

  1. Mark, I enjoyed reading your post. As a “cloud performance guy”, I must say that we at Cedexis have a slightly different perspective. I welcome a healthy conversation with you and your audience.

    As my post below discusses, we measure Level 3 (and all major ASs worldwide) from the end-user’s perspective. The disruptive part is that this data is ACTIONABLE.

    http://www.cedexis.com/cloud-performance-measure-it-and-take-action/

    I’d be happy to collaborate with you on future posts and share much more data for your perusal.

    Ed Sarausad
    ed@cedexis.com

  2. I have read the original DNS article, not very deeply, but enough to see a lot of different flaws: some are discussed above, while others, like the content and the end-user expectation, are not. The article is thought-provoking though.

    Mark has put forward the Level3 view, using a single CDN to provide all, Ed has given the Cedexis view.

    The problem I have with a single vendor CDN is that they may provide a fantastic service for a geographical location/type of content, but if something central goes wrong, then the end user can be faced with the Red X/lack of media. The internal measurement of both availability and performance is also hard to achieve.

    Cedexis’ approach of using the end user to measure the speed (Radar) is clever, but it does add page weight. As it is called asynchronously the page appears to have loaded properly to the end user, and provides useful data for Cedexis users.

    The analogy I see here is that of a load balancer: in Cedexis’ model a performant CDN will receive more traffic than a slow CDN. If the good CDN starts to suffer, Cedexis will move traffic away. The problem for the company is getting the agreements in place with a number of CDN providers to allow the company to use a service like Cedexis’.

    The most important thing is that you understand what you are trying to achieve as a company. Are you trying to reduce your bandwidth costs? Or are you trying to speed up delivering millions of small images, or a large live sports event stream, or both? You need to test each scenario, and have a continuity plan to cover the loss of a single CDN. Remember you may have to provide the content to the CDN in the first place, so companies which offer a shield (single IP which acts as a CDN cache) are useful if you need to upload many images, less so if you are providing a live sports feed.

    Remember that the end user experience is everything, so test, and test again!

    • Luke, thanks for your great insights. You point out the most important part of our client engagements, which is to understand the business priorities prior to adopting a multi-platform cloud strategy. We’ve deliberately chosen to focus on empowering our clients with the RIGHT data to adopt the RIGHT strategy for their business.

      For some clients, it means choosing a more performant cloud. For others, it means choosing the “good enough” cloud (when additional cost isn’t worth a small performance gain). At the end of the day, our clients decide what data is most important for them (performance, cost, server load, even carbon footprint!). The Cedexis platform does the rest. Sorry to sound like a sales pitch, but you teed up the best part about the company.

      Ed

      P.S. Radar loads by default 2 seconds after page load completes, so the nominal page weight does not impact the user experience, which I agree is everything!

  3. Thanks for the great article!

    The possible flaw in your article is “I know that we can’t travel faster than light, but if we could …”.
    As recent – not unquestionable – research has shown, we might be able to travel faster than light, at least between Cern and Gran Sasso… :-)

  4. Hello Mark,

    I bumped into Cotendo a few weeks ago. The sales guy explained that their own anycast DNS platform is fully integrated with their own CDN platform, meaning that every PoP (30 or so) contains DNS servers and proxy delivery servers interconnected in the same datacenter rack. Two major benefits were that both lookup and delivery are done close to the end-users, and that they eliminate a whole round trip due to the seamless integration. In addition, their proprietary software also insulates the DNS from the known and potential vulnerabilities of open-source solutions, their Site Assure service offers a monitoring and failover system with automated DNS balancing capabilities, and DNS queries are handled flexibly based on rules and real-time conditions.

    This sounds great, but for one reason or another it sounds too good to be true. Am I missing anything here? What are the cons?

    • Mark Taylor says:

      The con is that most consumers simply use the DNS infrastructure provided by their broadband provider. In fact they do so because 99% of people would have no idea how to change their settings to use some other provider; whether that’s Cotendo, Google, Level 3 or anyone else. The Cotendo system will not get used in most cases. And of course the combined distribution of DNS servers across all broadband network operators is a lot more than 30. So is the performance improvement associated with resolution and distribution being in the same place better than resolution being even closer to the requestor?
