Problems with Hosted PBX and Google DNS 8.8.8.8

It is common for Hosted PBX resellers to configure Google’s public DNS servers (8.8.8.8 and 8.8.4.4) for name resolution on SIP devices. In most cases Google is a good alternative to the local ISP's DNS, since Google has gone to great lengths to engineer high performance into its service. Without a doubt, it is fast and reliable.

In some cases, however, using Google DNS can cause problems because of the rate-limiting measures Google has put in place to thwart denial-of-service (DoS) attacks.

As described in Google's Public DNS documentation, Google Public DNS implements two kinds of rate limits:

  • Rate control of outgoing requests to other nameservers. To protect other DNS nameservers against DoS attacks that could be launched from our resolver servers, Google Public DNS enforces per-nameserver QPS limits on outgoing requests from each serving cluster.
  • Rate control of outgoing responses to clients. To protect any other systems against amplification and traditional distributed DoS (botnet) attacks that could be launched from our resolver servers, Google Public DNS performs two types of rate limiting on client queries:
    • To protect against traditional volume-based attacks, each server imposes per-client-IP QPS and average bandwidth limits.
    • To guard against amplification attacks, in which large responses to small queries are exploited, each server enforces a per-client-IP maximum average amplification factor. The average amplification factor is a configurable ratio of response-to-query size, determined from historical traffic patterns observed in our server logs.

    If DNS queries from one source IP address exceed the maximum QPS rate, excess queries will be dropped. If DNS queries over UDP from one source IP address exceed the average bandwidth or amplification limit consistently (the occasional large response will pass), queries may be dropped or only a small response may be sent. Small responses may be an error response or an empty response with the truncation bit set (so that most legitimate queries will be retried via TCP and succeed). Not all systems or programs will retry via TCP, and DNS over TCP may be blocked by firewalls on the client side, so some applications may not operate correctly when replies are truncated. Nonetheless, truncation allows RFC-compliant clients to work properly in most cases.
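Google does not publish the algorithm or thresholds behind its per-client-IP QPS limit. The sketch below uses a token bucket, a common technique for this kind of limit, only to illustrate the behavior quoted above: queries beyond the allowed rate are dropped. It is not Google's implementation. The `rate` and `capacity` values, the class names, and the 203.0.113.x addresses (a documentation range) are all hypothetical. The key point it shows is that limits are applied per source IP, so every device behind one NAT router shares a single budget.

```python
class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so a short burst is allowed
        self.last = 0.0         # time of the previous query, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous query, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the query is dropped


class PerClientLimiter:
    """Hypothetical per-source-IP limiter: each client IP gets its own bucket."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict = {}  # client IP -> TokenBucket

    def allow(self, client_ip: str, now: float) -> bool:
        if client_ip not in self.buckets:
            self.buckets[client_ip] = TokenBucket(self.rate, self.capacity)
        return self.buckets[client_ip].allow(now)


if __name__ == "__main__":
    times = [i / 100 for i in range(100)]  # 100 queries spread over one second

    # Case 1: every phone sits behind one NAT router, so all queries share one IP.
    shared = PerClientLimiter(rate=20.0, capacity=20.0)
    behind_nat = sum(shared.allow("203.0.113.10", t) for t in times)

    # Case 2: the same 100 queries arrive from 10 different public IPs.
    separate = PerClientLimiter(rate=20.0, capacity=20.0)
    spread = sum(separate.allow(f"203.0.113.{i % 10}", t) for i, t in enumerate(times))

    print(behind_nat, spread)
```

With these made-up limits, the ten-IP case passes every query, while the single-IP case drops more than half of the identical load.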

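When SIP devices are configured with multiple BLF keys, the SIP SUBSCRIBE and NOTIFY messages used to communicate presence can generate many thousands of DNS queries on an active network, because a device that does not cache DNS results may look up the SIP server's domain for each message it sends.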
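A rough back-of-envelope estimate shows how quickly this adds up. The sketch below counts only periodic SUBSCRIBE refreshes from devices that do no DNS caching. Every input is a hypothetical assumption: the phone and key counts, the 60-second refresh interval, and the figure of three lookups per request (loosely modeled on the NAPTR, SRV, then A resolution chain that RFC 3263 describes). Real devices and servers vary widely.

```python
from dataclasses import dataclass


@dataclass
class BlfSite:
    """Hypothetical site profile; every number here is an illustrative assumption."""
    phones: int                    # SIP devices behind one router
    blf_keys_per_phone: int        # monitored extensions per device
    refresh_seconds: float         # assumed SUBSCRIBE refresh interval
    lookups_per_request: int = 1   # assumed DNS queries per outbound SIP request


def uncached_dns_qps(site: BlfSite) -> float:
    """DNS queries per second from SUBSCRIBE refreshes alone, with no DNS caching.

    Any lookups tied to NOTIFY traffic or registrations would add to this,
    so treat the result as a floor under these assumptions.
    """
    requests_per_second = site.phones * site.blf_keys_per_phone / site.refresh_seconds
    return requests_per_second * site.lookups_per_request


if __name__ == "__main__":
    site = BlfSite(phones=40, blf_keys_per_phone=30, refresh_seconds=60.0, lookups_per_request=3)
    qps = uncached_dns_qps(site)
    # Behind a single NAT router, all of this arrives from one public source IP.
    print(f"{qps:.0f} queries/s, {qps * 86400:,.0f} queries/day")
```

With these made-up inputs the estimate is 60 queries per second, or about 5.2 million per day, all from one public IP if the phones share a router.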

Google does not publish the specific number of queries per second (QPS) that will trigger these safeguards. However, because all of these messages will likely resolve to a single SIP server (and therefore a single nameserver), the aggregate traffic will likely trigger the outbound per-nameserver QPS limit. And if all of the SIP devices are on the same LAN (behind a single router), the DNS traffic could also trigger the per-client-IP QPS limit. The resulting dropped queries can greatly slow presence updates to the phones, leading to sluggish status changes or even dropped calls.

For these reasons, it’s important to understand the limits of Google DNS and to check the SIP device’s error logs when these types of problems occur (one good way to do this is to point the SIP device to a syslog server). In these situations, DNS caching on the SIP device or the router is often the only way to work around the problem.
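Caching helps because repeated lookups of the same SIP server name are answered locally until the record's TTL expires, so only a trickle of queries ever reaches 8.8.8.8. On a router this is usually handled by a caching forwarder such as dnsmasq. The sketch below is a minimal illustration of the idea, not a production resolver: the `TtlDnsCache` class, the stand-in resolver, the name pbx.example.com, the 192.0.2.10 address, and the 300-second TTL are all hypothetical, and it ignores details such as negative caching and minimum TTLs.

```python
import time
from typing import Callable, Dict, List, Tuple

# A resolver returns (addresses, ttl_seconds). In this sketch it stands in
# for a real upstream query to 8.8.8.8 or the ISP's DNS server.
Resolver = Callable[[str], Tuple[List[str], int]]


class TtlDnsCache:
    """Answers repeat lookups locally until the cached record's TTL expires."""

    def __init__(self, resolve: Resolver, clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        self._entries: Dict[str, Tuple[List[str], float]] = {}  # name -> (addresses, expiry)
        self.upstream_queries = 0

    def lookup(self, name: str) -> List[str]:
        key = name.lower()  # DNS names are case-insensitive
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh cached answer: nothing sent upstream
        addresses, ttl = self._resolve(name)
        self.upstream_queries += 1
        self._entries[key] = (addresses, now + ttl)
        return addresses


if __name__ == "__main__":
    fake_clock = [0.0]

    def fake_resolve(name: str) -> Tuple[List[str], int]:
        return ["192.0.2.10"], 300  # hypothetical answer with a 5-minute TTL

    cache = TtlDnsCache(fake_resolve, clock=lambda: fake_clock[0])
    for i in range(1000):  # 1,000 SIP messages over 100 seconds
        fake_clock[0] = i * 0.1
        cache.lookup("pbx.example.com")
    print(cache.upstream_queries)
```

In this run, 1,000 lookups over 100 seconds reach the upstream server only once, because the 300-second TTL never lapses.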