DNSOP Working Group Paul Vixie, ISC
INTERNET-DRAFT Akira Kato, WIDE
<draft-ietf-dnsop-respsize-06.txt> August 2006
DNS Referral Response Size Issues
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2006). All Rights Reserved.
Abstract
With a mandated default minimum maximum message size of 512 octets,
the DNS protocol presents some special problems for zones wishing to
expose a moderate or high number of authority servers (NS RRs). This
document explains the operational issues caused by, or related to
this response size limit, and suggests ways to optimize the use of
this limited space. Guidance is offered to DNS server implementors
and to DNS zone operators.
Expires January 2007 [Page 1]
INTERNET-DRAFT August 2006 RESPSIZE
1 - Introduction and Overview
1.1. The DNS standard (see [RFC1035 4.2.1]) limits message size to 512
octets. Even though this limitation was due to the required minimum IP
reassembly limit for IPv4, it became a hard DNS protocol limit and is
not implicitly relaxed by changes in transport, for example to IPv6.
1.2. The EDNS0 protocol extension (see [RFC2671 2.3, 4.5]) permits
larger responses by mutual agreement of the requester and responder.
The 512 octet message size limit will remain in practical effect until
there is widespread deployment of EDNS0 in DNS resolvers on the
Internet.
1.3. Since DNS responses include a copy of the request, the space
available for response data is somewhat less than the full 512 octets.
Negative responses are quite small, but for positive and delegation
responses, every octet must be carefully and sparingly allocated. This
document specifically addresses delegation response sizes.
2 - Delegation Details
2.1. RELEVANT PROTOCOL ELEMENTS
2.1.1. A delegation response will include the following elements:
Header Section: fixed length (12 octets)
Question Section: original query (name, class, type)
Answer Section: empty, or a CNAME/DNAME chain
Authority Section: NS RRset (nameserver names)
Additional Section: A and AAAA RRsets (nameserver addresses)
2.1.2. If the total response size exceeds 512 octets, and if the data
that does not fit was "required", then the TC bit will be set
(indicating truncation). This will usually cause the requester to retry
using TCP, depending on what information was desired and what
information was omitted. For example, truncation in the authority
section is of no interest to a stub resolver who only plans to consume
the answer section. If a retry using TCP is needed, the total cost of
the transaction is much higher. See [RFC1123 6.1.3.2] for details on
the requirement that UDP be attempted before falling back to TCP.
2.1.3. RRsets are never sent partially unless TC bit set to indicate
truncation. When TC bit is set, the final apparent RRset in the final
non-empty section must be considered "possibly damaged" (see [RFC1035
6.2], [RFC2181 9]).
Expires January 2007 [Page 2]
INTERNET-DRAFT August 2006 RESPSIZE
2.1.4. With or without truncation, the glue present in the additional
data section should be considered "possibly incomplete", and requesters
should be prepared to re-query for any damaged or missing RRsets. Note
that truncation of the additional data section might not be signalled
via the TC bit since additional data is often optional (see discussion
in [RFC4472 B]).
2.1.5. DNS label compression allows a domain name to be instantiated
only once per DNS message, and then referenced with a two-octet
"pointer" from other locations in that same DNS message (see [RFC1035
4.1.4]). If all nameserver names in a message share a common parent
(for example, all ending in ".ROOT-SERVERS.NET"), then more space will
be available for incompressable data (such as nameserver addresses).
2.1.6. The query name can be as long as 255 octets of network data. In
this worst case scenario, the question section will be 259 octets in
size, which would leave only 240 octets for the authority and additional
sections (after deducting 12 octets for the fixed length header.)
2.2. ADVICE TO ZONE OWNERS
2.2.1. Average and maximum question section sizes can be predicted by
the zone owner, since they will know what names actually exist, and can
measure which ones are queried for most often. Note that if the zone
contains any wildcards, it is possible for maximum length queries to
require positive responses, but that it is reasonable to expect
truncation and TCP retry in that case. For cost and performance
reasons, the majority of requests should be satisfied without truncation
or TCP retry.
2.2.2. Some queries to non-existing names can be large, but this is not
a problem because negative responses need not contain any answer,
authority or additional records. See [RFC2308 2.1] for more information
about the format of negative responses.
2.2.3. The minimum useful number of name servers is two, for redundancy
(see [RFC1034 4.1]). A zone's name servers should be reachable by all
IP transport protocols (e.g., IPv4 and IPv6) in common use.
2.2.4. The best case is no truncation at all. This is because many
requesters will retry using TCP immediately, or will automatically re-
query for RRsets that are possibly truncated, without considering
whether the omitted data was actually necessary.
Expires January 2007 [Page 3]
INTERNET-DRAFT August 2006 RESPSIZE
2.3. ADVICE TO SERVER IMPLEMENTORS
2.3.1. In case of multi-homed name servers, it is advantageous to
include an address record from each of several name servers before
including several address records for any one name server. If address
records for more than one transport (for example, A and AAAA) are
available, then it is advantageous to include records of both types
early on, before the message is full.
2.3.2. Each added NS RR for a zone will add 12 fixed octets (name, type,
class, ttl, and rdlen) plus 2 to 255 variable octets (for the NSDNAME).
Each A RR will require 16 octets, and each AAAA RR will require 28
octets.
2.3.3. While DNS distinguishes between necessary and optional resource
records, this distinction is according to protocol elements necessary to
signify facts, and takes no official notice of protocol content
necessary to ensure correct operation. For example, a nameserver name
that is in or below the zone cut being described by a delegation is
"necessary content," since there is no way to reach that zone unless the
parent zone's delegation includes "glue records" describing that name
server's addresses.
2.3.4. It is also necessary to distinguish between "explicit truncation"
where a message could not contain enough records to convey its intended
meaning, and so the TC bit has been set, and "silent truncation", where
the message was not large enough to contain some records which were "not
required", and so the TC bit was not set.
2.3.5. A delegation response should prioritize glue records as follows.
first
All glue RRsets for one name server whose name is in or below the
zone being delegated, or which has multiple address RRsets (currently
A and AAAA), or preferably both;
second
Alternate between adding all glue RRsets for any name servers whose
names are in or below the zone being delegated, and all glue RRsets
for any name servers who have multiple address RRsets (currently A
and AAAA);
thence
All other glue RRsets, in any order.
Expires January 2007 [Page 4]
INTERNET-DRAFT August 2006 RESPSIZE
Whenever there are multiple candidates for a position in this priority
scheme, one should be chosen on a round-robin or fully random basis.
The goal of this priority scheme is to offer "necessary" glue first,
avoiding silent truncation for this glue if possible.
2.3.6. If any "necessary content" is silently truncated, then it is
advisable that the TC bit be set in order to force a TCP retry, rather
than have the zone be unreachable. Note that a parent server's proper
response to a query for in-child glue or below-child glue is a referral
rather than an answer, and that this referral MUST be able to contain
the in-child or below-child glue, and that in outlying cases, only EDNS
or TCP will be large enough to contain that data.
3 - Analysis
3.1. An instrumented protocol trace of a best case delegation response
follows. Note that 13 servers are named, and 13 addresses are given.
This query was artificially designed to exactly reach the 512 octet
limit.
;; flags: qr rd; QUERY: 1, ANS: 0, AUTH: 13, ADDIT: 13
;; QUERY SECTION:
;; [23456789.123456789.123456789.\
123456789.123456789.123456789.com A IN] ;; @80
;; AUTHORITY SECTION:
com. 86400 NS E.GTLD-SERVERS.NET. ;; @112
com. 86400 NS F.GTLD-SERVERS.NET. ;; @128
com. 86400 NS G.GTLD-SERVERS.NET. ;; @144
com. 86400 NS H.GTLD-SERVERS.NET. ;; @160
com. 86400 NS I.GTLD-SERVERS.NET. ;; @176
com. 86400 NS J.GTLD-SERVERS.NET. ;; @192
com. 86400 NS K.GTLD-SERVERS.NET. ;; @208
com. 86400 NS L.GTLD-SERVERS.NET. ;; @224
com. 86400 NS M.GTLD-SERVERS.NET. ;; @240
com. 86400 NS A.GTLD-SERVERS.NET. ;; @256
com. 86400 NS B.GTLD-SERVERS.NET. ;; @272
com. 86400 NS C.GTLD-SERVERS.NET. ;; @288
com. 86400 NS D.GTLD-SERVERS.NET. ;; @304
Expires January 2007 [Page 5]
INTERNET-DRAFT August 2006 RESPSIZE
;; ADDITIONAL SECTION:
A.GTLD-SERVERS.NET. 86400 A 192.5.6.30 ;; @320
B.GTLD-SERVERS.NET. 86400 A 192.33.14.30 ;; @336
C.GTLD-SERVERS.NET. 86400 A 192.26.92.30 ;; @352
D.GTLD-SERVERS.NET. 86400 A 192.31.80.30 ;; @368
E.GTLD-SERVERS.NET. 86400 A 192.12.94.30 ;; @384
F.GTLD-SERVERS.NET. 86400 A 192.35.51.30 ;; @400
G.GTLD-SERVERS.NET. 86400 A 192.42.93.30 ;; @416
H.GTLD-SERVERS.NET. 86400 A 192.54.112.30 ;; @432
I.GTLD-SERVERS.NET. 86400 A 192.43.172.30 ;; @448
J.GTLD-SERVERS.NET. 86400 A 192.48.79.30 ;; @464
K.GTLD-SERVERS.NET. 86400 A 192.52.178.30 ;; @480
L.GTLD-SERVERS.NET. 86400 A 192.41.162.30 ;; @496
M.GTLD-SERVERS.NET. 86400 A 192.55.83.30 ;; @512
;; MSG SIZE sent: 80 rcvd: 512
3.2. For longer query names, the number of address records supplied will
be lower. Furthermore, it is only by using a common parent name (which
is GTLD-SERVERS.NET in this example) that all 13 addresses are able to
fit, due to the use of DNS compression pointers in the last 12
occurances of the parent domain name. The following output from a
response simulator demonstrates these properties.
% perl respsize.pl a.dns.br b.dns.br c.dns.br d.dns.br
a.dns.br requires 10 bytes
b.dns.br requires 4 bytes
c.dns.br requires 4 bytes
d.dns.br requires 4 bytes
# of NS: 4
For maximum size query (255 byte):
only A is considered: # of A is 4 (green)
A and AAAA are considered: # of A+AAAA is 3 (yellow)
preferred-glue A is assumed: # of A is 4, # of AAAA is 3 (yellow)
For average size query (64 byte):
only A is considered: # of A is 4 (green)
A and AAAA are considered: # of A+AAAA is 4 (green)
preferred-glue A is assumed: # of A is 4, # of AAAA is 4 (green)
Expires January 2007 [Page 6]
INTERNET-DRAFT August 2006 RESPSIZE
% perl respsize.pl ns-ext.isc.org ns.psg.com ns.ripe.net ns.eu.int
ns-ext.isc.org requires 16 bytes
ns.psg.com requires 12 bytes
ns.ripe.net requires 13 bytes
ns.eu.int requires 11 bytes
# of NS: 4
For maximum size query (255 byte):
only A is considered: # of A is 4 (green)
A and AAAA are considered: # of A+AAAA is 3 (yellow)
preferred-glue A is assumed: # of A is 4, # of AAAA is 2 (yellow)
For average size query (64 byte):
only A is considered: # of A is 4 (green)
A and AAAA are considered: # of A+AAAA is 4 (green)
preferred-glue A is assumed: # of A is 4, # of AAAA is 4 (green)
(Note: The response simulator program is shown in Section 5.)
Here we use the term "green" if all address records could fit, or
"yellow" if two or more could fit, or "orange" if only one could fit, or
"red" if no address record could fit. It's clear that without a common
parent for nameserver names, much space would be lost. For these
examples we use an average/common name size of 15 octets, befitting our
assumption of GTLD-SERVERS.NET as our common parent name.
We're assuming a medium query name size of 64 since that is the typical
size seen in trace data at the time of this writing. If
Internationalized Domain Name (IDN) or any other technology which
results in larger query names be deployed significantly in advance of
EDNS, then new measurements and new estimates will have to be made.
4 - Conclusions
4.1. The current practice of giving all nameserver names a common parent
(such as GTLD-SERVERS.NET or ROOT-SERVERS.NET) saves space in DNS
responses and allows for more nameservers to be enumerated than would
otherwise be possible, since the common parent domain name only appears
once in a DNS message and is referred to via "compression pointers"
thereafter.
4.2. If all nameserver names for a zone share a common parent, then it
is operationally advisable to make all servers for the zone thus served
also be authoritative for the zone of that common parent. For example,
the root name servers (?.ROOT-SERVERS.NET) can answer authoritatively
for the ROOT-SERVERS.NET. This is to ensure that the zone's servers
always have the zone's nameservers' glue available when delegating, and
Expires January 2007 [Page 7]
INTERNET-DRAFT August 2006 RESPSIZE
will be able to respond with answers rather than referrals if a
requester who wants that glue comes back asking for it. In this case
the name server will likely be a "stealth server" -- authoritative but
unadvertised in the glue zone's NS RRset. See [RFC1996 2] for more
information about stealth servers.
4.3. Thirteen (13) is the effective maximum number of nameserver names
usable traditional (non-extended) DNS, assuming a common parent domain
name, and given that implicit referral response truncation is
undesirable in the average case.
4.4. Multi-homing of name servers within a protocol family is
inadvisable since the necessary glue RRsets (A or AAAA) are atomically
indivisible, and will be larger than a single resource record. Larger
RRsets are more likely to lead to or encounter truncation.
4.5. Multi-homing of name servers across protocol families is less
likely to lead to or encounter truncation, partly because multiprotocol
clients are more likely to speak EDNS which can use a larger response
size limit, and partly because the resource records (A and AAAA) are in
different RRsets and are therefore divisible from each other.
4.6. Name server names which are at or below the zone they serve are
more sensitive to referral response truncation, and glue records for
them should be considered "less optional" than other glue records, in
the assembly of referral responses.
4.7. If a zone is served by thirteen (13) name servers having a common
parent name (such as ?.ROOT-SERVERS.NET) and each such name server has a
single address record in some protocol family (e.g., an A RR), then all
thirteen name servers or any subset thereof could multi-home in a second
protocol family by adding a second address record (e.g., an AAAA RR)
without reducing the reachability of the zone thus served.
5 - Source Code
#!/usr/bin/perl
#
# SYNOPSIS
# repsize.pl [ -z zone ] fqdn_ns1 fqdn_ns2 ...
# if all queries are assumed to have a same zone suffix,
# such as "jp" in JP TLD servers, specify it in -z option
#
use strict;
use Getopt::Std;
Expires January 2007 [Page 8]
INTERNET-DRAFT August 2006 RESPSIZE
my ($sz_msg) = (512);
my ($sz_header, $sz_ptr, $sz_rr_a, $sz_rr_aaaa) = (12, 2, 16, 28);
my ($sz_type, $sz_class, $sz_ttl, $sz_rdlen) = (2, 2, 4, 2);
my (%namedb, $name, $nssect, %opts, $optz);
my $n_ns = 0;
getopt('z', %opts);
if (defined($opts{'z'})) {
server_name_len($opts{'z'}); # just register it
}
foreach $name (@ARGV) {
my $len;
$n_ns++;
$len = server_name_len($name);
print "$name requires $len bytes\n";
$nssect += $sz_ptr + $sz_type + $sz_class + $sz_ttl
+ $sz_rdlen + $len;
}
print "# of NS: $n_ns\n";
arsect(255, $nssect, $n_ns, "maximum");
arsect(64, $nssect, $n_ns, "average");
sub server_name_len {
my ($name) = @_;
my (@labels, $len, $n, $suffix);
$name =~ tr/A-Z/a-z/;
@labels = split(/\./, $name);
$len = length(join('.', @labels)) + 2;
for ($n = 0; $#labels >= 0; $n++, shift @labels) {
$suffix = join('.', @labels);
return length($name) - length($suffix) + $sz_ptr
if (defined($namedb{$suffix}));
$namedb{$suffix} = 1;
}
return $len;
}
sub arsect {
my ($sz_query, $nssect, $n_ns, $cond) = @_;
my ($space, $n_a, $n_a_aaaa, $n_p_aaaa, $ansect);
$ansect = $sz_query + 1 + $sz_type + $sz_class;
$space = $sz_msg - $sz_header - $ansect - $nssect;
$n_a = atmost(int($space / $sz_rr_a), $n_ns);
Expires January 2007 [Page 9]
INTERNET-DRAFT August 2006 RESPSIZE
$n_a_aaaa = atmost(int($space
/ ($sz_rr_a + $sz_rr_aaaa)), $n_ns);
$n_p_aaaa = atmost(int(($space - $sz_rr_a * $n_ns)
/ $sz_rr_aaaa), $n_ns);
printf "For %s size query (%d byte):\n", $cond, $sz_query;
printf " only A is considered: ";
printf "# of A is %d (%s)\n", $n_a, &judge($n_a, $n_ns);
printf " A and AAAA are considered: ";
printf "# of A+AAAA is %d (%s)\n",
$n_a_aaaa, &judge($n_a_aaaa, $n_ns);
printf " preferred-glue A is assumed: ";
printf "# of A is %d, # of AAAA is %d (%s)\n",
$n_a, $n_p_aaaa, &judge($n_p_aaaa, $n_ns);
}
sub judge {
my ($n, $n_ns) = @_;
return "green" if ($n >= $n_ns);
return "yellow" if ($n >= 2);
return "orange" if ($n == 1);
return "red";
}
sub atmost {
my ($a, $b) = @_;
return 0 if ($a < 0);
return $b if ($a > $b);
return $a;
}
6 - Security Considerations
The recommendations contained in this document have no known security
implications.
7 - IANA Considerations
This document does not call for changes or additions to any IANA
registry.
8 - Acknowledgement
The authors thank Peter Koch, Rob Austein, Joe Abley, and Mark Andrews
for their valuable comments and suggestions.
Expires January 2007 [Page 10]
INTERNET-DRAFT August 2006 RESPSIZE
This work was supported by the US National Science Foundation (research
grant SCI-0427144) and DNS-OARC.
9 - References
[RFC1034] Mockapetris, P.V., "Domain names - Concepts and Facilities",
RFC1034, November 1987.
[RFC1035] Mockapetris, P.V., "Domain names - Implementation and
Specification", RFC1035, November 1987.
[RFC1123] Braden, R., Ed., "Requirements for Internet Hosts -
Application and Support", RFC1123, October 1989.
[RFC1996] Vixie, P., "A Mechanism for Prompt Notification of Zone
Changes (DNS NOTIFY)", RFC1996, August 1996.
[RFC2181] Elz, R., Bush, R., "Clarifications to the DNS Specification",
RFC2181, July 1997.
[RFC2308] Andrews, M., "Negative Caching of DNS Queries (DNS NCACHE)",
RFC2308, March 1998.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", RFC2671,
August 1999.
[RFC4472] Durand, A., Ihren, J., Savola, P., "Operational Consideration
and Issues with IPV6 DNS", April 2006.
10 - Authors' Addresses
Paul Vixie
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA 94063
+1 650 423 1301
vixie@isc.org
Akira Kato
University of Tokyo, Information Technology Center
2-11-16 Yayoi Bunkyo
Tokyo 113-8658, JAPAN
+81 3 5841 2750
kato@wide.ad.jp
Expires January 2007 [Page 11]
INTERNET-DRAFT August 2006 RESPSIZE
Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors retain
all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR
IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in this
document or the extent to which any license under such rights might or
might not be available; nor does it represent that it has made any
independent effort to identify any such rights. Information on the
procedures with respect to rights in RFC documents can be found in BCP
78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an attempt
made to obtain a general license or permission for the use of such
proprietary rights by implementers or users of this specification can be
obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary rights
that may cover technology that may be required to implement this
standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).
Expires January 2007 [Page 12]
Copyright 2K16 - 2K18 Indonesian Hacker Rulez