Our new DNS: under the hood

At Enom, we know that every DNS performance issue we can solve proactively saves you from countless support headaches down the line. Striving to ensure that it “just works” is a given, but we’re also constantly searching for ways to go above and beyond, pushing the bounds of what’s possible on a global scale. Every facet of our operation is dedicated to that cause, and we’ve been working hard on some major changes to our DNS to deliver on it. This is why we wanted to peel back the curtain a bit, talk with some of the experts close to these changes, and give you an idea of what it all means for you.

Three billion DNS queries. That’s what Enom’s platform handles every day to service our customers. In some ways, it’s a daunting number that reinforces how many portfolio owners, resellers, and end-users rely on our DNS, but it also presents an opportunity. That’s because each improvement made to the system sends ripples through every level of the experience. Thanks to the Rightside and Enom engineering team, we are now happy to introduce our new DNS platform, one that is faster, more reliable, and scalable to your growing needs.

We’ve undertaken this massive overhaul because those three billion DNS queries will only continue to grow. In addition to total volume, the standards and requirements for DNS have changed dramatically since our last upgrade. Yet, if we’ve done our job, most won’t realize a change even occurred. “The new DNS infrastructure will go largely unnoticed,” says Rightside CTO Wayne MacLaurin. “Infrastructure is like plumbing; nobody really notices it unless it breaks.”

New platform

The new DNS platform, which has been live since November, is based on BIND, one of the most widely used open-source DNS server implementations. The decision to change from a PowerDNS infrastructure to BIND 9.10 came about due to a confluence of factors: ageing hardware, rapidly expanding data volume, a need for more standardized, secure solutions, and the constant pursuit of general performance improvements. Newer servers optimized for BIND were the antidote to a highly customized and increasingly unwieldy architecture.

“The previous infrastructure relied heavily on MS SQL and replication to distribute data to our various DNS PoPs,” MacLaurin says. “It also had a large amount of programmatic logic built into the DNS software itself to handle various customized features. As time passed and DNS query volumes increased, the infrastructure started showing its age. It was hard to update, hard to debug, and it could only handle a fifth of the volume of our new modern DNS server.”

Additionally, Enom’s servers now employ Kafka to provide high-throughput messaging between our central system and distributed DNS nodes. A series of complex transformations handle the translation of older data into standard DNS types, managing the emulation of non-standard features such as CNAME at apex. To improve monitoring, an ELK (Elasticsearch, Logstash, Kibana) stack now manages statistics gathering and logging.

What’s different?

Again, many of the changes may go unnoticed, but the actual differences are remarkable for those with a keen eye for DNS performance. In addition to the new architecture, Enom made significant investments in SSD-based storage, which was then deployed throughout our production environments. Together, the software and hardware changes greatly reduced response times, with updates (such as host changes) now being delivered to the world in under 3 seconds. The everyday end-user may not be aware of these faster response times, but cumulatively they add up to vastly improved interactions.

“CNAME at apex is possibly the most significant change,” says Ron West, a Senior Software Architect at Enom. Currently, Enom’s DNS answers with what’s known as a CNAME record at the apex of a zone. “This isn’t allowed in DNS, and has undefined behaviour to end-users. It often works, but sometimes fails badly.” Standard DNS software doesn’t support these records, creating uncertainty as to how they would behave in the wild. A workaround was deployed to eliminate that uncertainty.

“We now take the target of the CNAME record, look up the records it points to, and add them to our own DNS instead of the CNAME record. These look-ups are refreshed continuously, and allow all affected domains to keep working.” These previously undefined behaviours—affecting more than 65,000 domains in Enom’s system—represented many headaches for support at the reseller level, an effort that is no longer necessary thanks to these new enhancements.
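The workaround West describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Enom's actual code: the function name `flatten_apex_cname`, the `(owner, type, value)` record tuples, and the `lookup` callable are all hypothetical, and a canned table of documentation-range addresses stands in for a real resolver so the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape for this sketch: (owner name, record type, value).
Record = Tuple[str, str, str]
Lookup = Callable[[str, str], List[str]]


def flatten_apex_cname(zone_apex: str, cname_target: str, lookup: Lookup) -> List[Record]:
    """Return address records to publish at the apex in place of a CNAME.

    `lookup(name, rtype)` is an assumed resolver hook that returns the values
    currently published for `name` and `rtype`.
    """
    flattened: List[Record] = []
    for rtype in ("A", "AAAA"):
        for value in lookup(cname_target, rtype):
            # Publish the target's addresses under the zone's own apex name.
            flattened.append((zone_apex, rtype, value))
    return flattened


# Canned upstream data standing in for a live resolver (illustrative only).
FAKE_UPSTREAM: Dict[Tuple[str, str], List[str]] = {
    ("target.example.net.", "A"): ["192.0.2.10", "192.0.2.11"],
    ("target.example.net.", "AAAA"): ["2001:db8::10"],
}


def fake_lookup(name: str, rtype: str) -> List[str]:
    return FAKE_UPSTREAM.get((name, rtype), [])


if __name__ == "__main__":
    for record in flatten_apex_cname("example.com.", "target.example.net.", fake_lookup):
        print(record)
```

To keep the published records in step with the target, as the continuous refresh in the quote implies, a function like this would be re-run on a schedule and its output compared with what is currently being served.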

The new DNS also supports deletion holds to benefit customers moving their domains to another registrar or DNS host. “Instead of suddenly refusing to answer queries, or answering with parking,” West says, “we can now facilitate transitions by continuing to answer with the last known DNS records for 4 days. After deletion, we either decline to answer queries (if no longer aimed at our nameservers) or answer with parking.”

The only exception is multiply hosted domains (multiple DomainNameIDs with the same domain name) where deletion is immediate, leading to the next-highest-priority domain’s records being activated immediately.
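The deletion-hold rules above amount to a small decision table, sketched here in Python. The function name, its parameters, and the returned labels are hypothetical and chosen only for illustration; the 4-day window comes from the quote, while treating the hold as ending exactly at the 4-day mark is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hold window taken from the article; the exact boundary handling is assumed.
HOLD_PERIOD = timedelta(days=4)


def choose_response(
    deleted_at: Optional[datetime],
    now: datetime,
    still_delegated_to_us: bool,
    multiply_hosted: bool = False,
) -> str:
    """Pick how to answer a query for a domain that may have been deleted."""
    if deleted_at is None:
        return "live_records"            # not deleted: answer normally
    if multiply_hosted:
        return "next_priority_records"   # exception: deletion is immediate
    if now - deleted_at < HOLD_PERIOD:
        return "last_known_records"      # inside the deletion hold
    # Hold expired: park if queries still reach our nameservers, else decline.
    return "parking" if still_delegated_to_us else "refuse"


if __name__ == "__main__":
    deleted = datetime(2024, 1, 1)
    print(choose_response(deleted, deleted + timedelta(days=1), True))
    print(choose_response(deleted, deleted + timedelta(days=5), False))
```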

Philosophy shift

Ultimately, all these changes have come about due to a philosophy of looking creatively at different technologies to solve challenges more quickly than traditional approaches. “Doing so is crucial because the requirements and limitations of one registry aren’t necessarily the same as another,” MacLaurin says. “It’s something that we embrace wholeheartedly, especially if we can reduce complexity, improve reliability, and move faster.”

The change to Enom’s DNS infrastructure is just one step in a continuous improvement process. “We try not to make technology choices the limiting factor in designing great product architectures. For instance, Kafka was used heavily in the DNS project, but we are also looking at RabbitMQ because each messaging platform has different strengths and weaknesses, depending on the engineering requirements. Elasticsearch is only one of the technologies we are looking at to redefine how we manage our data.”

Looking ahead

The DNS improvement project isn’t just a way to squeeze more speed out of the system; it’s an outcome of our goals in using technology to advance the domain name industry and make it easier and less complex to manage for you and your customers. The standards-based nature of BIND, for instance, means we will be able to protect our users better by pushing ahead with support for DNSSEC and other new extensions such as DANE.

System-wide stability and security are also always in mind. DDoS mitigation is now a major component of any modern DNS infrastructure, both in terms of the scale of the DDoS attacks we see and in how we handle and mitigate them in general. We want to reduce or eliminate every instance of downtime that we can for you and your users, and won’t stop until we have.

“Nobody really notices DNS unless it breaks.” That observation highlights the reactive nature of most DNS support efforts. But it’s also why our engineering team has bucked the trend by taking a creative, proactive approach to fixing major DNS issues, some of which are only just now appearing on others’ radars. We’re always busy applying modern technology solutions to an ageing global network. It’s the only way to ensure that our industry keeps prospering, and we are thrilled to keep you updated about all the exciting innovations we’re developing (and surprising things we’re learning) as the Internet continues to grow and adapt.

If you’d like to learn more about DNS, check out this article from ICANN.