A common problem you might face soon (World IPv6 Day is 5 days away) is failing to reach IPv6 sites because of MTU issues. IPv6 has a nice mechanism, built on ICMPv6, that is supposed to help applications overcome these issues, but, as in the IPv4 world, not everything is perfect.
Let's suppose that an IPv6 subscriber is using a DSL router and is connected through PPPoE to a BRAS.
TARGET <=> BRAS <=> DSL-ROUTER <=> HOST
The usual MTU for PPPoE connections is 1492 bytes, as shown below.
1500 bytes = Ethernet Payload
- 6 bytes = PPPoE header
- 2 bytes = PPP Protocol ID
---------------------------------
1492 bytes = IPv6 Packet that can be carried over a PPPoE connection
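Here is the same subtraction as a small Python sketch, in case you want to redo it for other encapsulations. The constant and function names are only illustrative.

```python
# Per-frame overhead of PPPoE inside an Ethernet frame (values from the breakdown above).
ETHERNET_PAYLOAD = 1500  # standard Ethernet MTU
PPPOE_HEADER = 6         # PPPoE session header
PPP_PROTOCOL_ID = 2      # PPP protocol field

def pppoe_ipv6_mtu(ethernet_payload: int = ETHERNET_PAYLOAD) -> int:
    """Largest IPv6 packet that fits in a single PPPoE frame."""
    return ethernet_payload - PPPOE_HEADER - PPP_PROTOCOL_ID

print(pppoe_ipv6_mtu())  # 1492
```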
If your host is configured with an MTU of 1492 (or lower) on its LAN interface, the OS running on it will automatically take care of "fragmentation", so you don't need to worry about anything. Unfortunately, this isn't the default in most cases. You either have to configure it manually on the host, or, if you are lucky enough that the DSL router advertises the MTU on its LAN interface through RA messages (and your host accepts them), it will happen automatically.
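For reference, the MTU a router advertises in an RA is carried in the Neighbor Discovery MTU option (RFC 4861). This option consists of type 5, a length of 1 (in units of 8 bytes), 2 reserved bytes and a 32-bit MTU. The Python sketch below decodes such an option. It assumes you have already extracted the 8 option bytes from the RA. `parse_ra_mtu_option` is a hypothetical helper, not part of any library.

```python
import struct

ND_OPT_MTU = 5  # Neighbor Discovery option type for MTU (RFC 4861)

def parse_ra_mtu_option(option: bytes) -> int:
    """Return the MTU from an 8-byte ND MTU option taken out of a Router Advertisement."""
    if len(option) != 8:
        raise ValueError("MTU option must be 8 bytes long")
    # Network byte order: type (1), length (1), reserved (2), MTU (4).
    opt_type, opt_len, _reserved, mtu = struct.unpack("!BBHI", option)
    # The length field counts units of 8 bytes, so it must be 1 for this option.
    if opt_type != ND_OPT_MTU or opt_len != 1:
        raise ValueError("not an MTU option")
    return mtu

# What a DSL router advertising the PPPoE MTU on its LAN side would send (hypothetical example):
opt = struct.pack("!BBHI", ND_OPT_MTU, 1, 0, 1492)
print(parse_ra_mtu_option(opt))  # 1492
```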
If your host is configured with anything larger than 1492 on its LAN interface (in most cases it's the default of 1500), problems might arise.
Users with hosts running Windows can ping an IPv6 address (e.g. the next hop after the DSL router) to find possible MTU issues. The closer the target is, the easier the problem will be to troubleshoot. Then you move towards the real target, hop by hop, until you hit the issue.
First, here are some numbers you will need regarding the various headers:
1492 bytes = IPv6 Packet
- 40 bytes = IPv6 Header
- 8 bytes = ICMPv6 Header
-------------------------------
1444 bytes = ICMPv6 payload data
Since the Windows ping -l option sets the size of the ICMPv6 payload data (not of the whole packet), if you want to send a total of 1492 bytes, you have to use 1492-40-8=1444 bytes of payload data. Anything larger will lead either to a problem or to fragmentation.
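Here is the same calculation as a quick Python sketch, which is useful when you test other MTUs. It assumes a plain ICMPv6 Echo Request with no extension headers. The helper names are made up for this example.

```python
IPV6_HEADER = 40
ICMPV6_ECHO_HEADER = 8

def max_ping_payload(path_mtu: int) -> int:
    """Largest 'ping -l' value whose packet still fits in path_mtu unfragmented."""
    return path_mtu - IPV6_HEADER - ICMPV6_ECHO_HEADER

def packet_size(payload: int) -> int:
    """Total IPv6 packet size produced by 'ping -l payload'."""
    return payload + ICMPV6_ECHO_HEADER + IPV6_HEADER

print(max_ping_payload(1492))  # 1444
print(packet_size(1446))       # 1494, i.e. 2 bytes more than a PPPoE link can carry
```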
Windows>ping -l 1444 x:x::x
Pinging target [x:x:xx] with 1444 bytes of data:
Reply from x:x:xx: time=53ms
Reply from x:x:xx: time=51ms
Reply from x:x:xx: time=54ms
Reply from x:x:xx: time=53ms
Ping statistics for x:x:xx:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 51ms, Maximum = 54ms, Average = 52ms
These are the relevant Wireshark captures.
[Capture: The ICMP conversation between all involved devices]
[Capture: 1444 bytes ICMP request from HOST to TARGET]
If you increase the above number, you should start looking for ICMPv6 "Packet Too Big" messages received from any hop towards the target. If you don't receive any, you are in trouble.
For example, if you ping with 1446 bytes of data, you get the following:
Windows>ping -l 1446 x:x:xx
Pinging target [x:x:xx] with 1446 bytes of data:
Packet needs to be fragmented but DF set.
Reply from x:x:xx: time=53ms
Reply from x:x:xx: time=55ms
Reply from x:x:xx: time=57ms
Ping statistics for x:x:xx:
    Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),
Approximate round trip times in milli-seconds:
    Minimum = 53ms, Maximum = 57ms, Average = 55ms
These are the relevant Wireshark captures.
[Capture: The ICMP conversation between all involved devices (fragmentation included)]
[Capture: 1446 bytes ICMP request from HOST to TARGET]
[Capture: ICMP reply ("Too Big") from DSL-ROUTER to HOST (original truncated message included)]
As you can see, DSL-ROUTER replies to the first packet by sending a "Packet Too Big" message to the HOST. This message informs the HOST of the MTU (1492) supported on the next-hop link (see RFC 4443 for ICMPv6 details), which is the WAN link towards the BRAS where PPPoE runs.
If you are unfortunate enough not to receive any such incoming messages, you can safely assume (if everything else is fine) that someone in the path is blocking ICMPv6 messages.
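Here is the message format in concrete terms. Per RFC 4443, a Packet Too Big message is ICMPv6 type 2, code 0, followed by a 16-bit checksum, the 32-bit MTU of the next-hop link, and then as much of the invoking packet as fits. The Python sketch below extracts the MTU from such a message. It assumes the outer IPv6 header has already been stripped and skips checksum verification. `parse_packet_too_big` is a hypothetical name.

```python
import struct
from typing import Tuple

ICMPV6_PACKET_TOO_BIG = 2  # ICMPv6 type for Packet Too Big (RFC 4443)

def parse_packet_too_big(icmpv6: bytes) -> Tuple[int, bytes]:
    """Return (next-hop MTU, excerpt of the invoking packet).

    `icmpv6` must start at the ICMPv6 header; the checksum is not verified here.
    """
    if len(icmpv6) < 8:
        raise ValueError("truncated ICMPv6 message")
    # type (1), code (1), checksum (2), MTU (4), all in network byte order
    msg_type, code, _checksum, mtu = struct.unpack("!BBHI", icmpv6[:8])
    if msg_type != ICMPV6_PACKET_TOO_BIG or code != 0:
        raise ValueError("not a Packet Too Big message")
    return mtu, icmpv6[8:]

# Hand-built example: MTU 1492 plus a zero-filled 1232-byte excerpt, 1240 bytes in total.
too_big = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, 1492) + bytes(1232)
mtu, excerpt = parse_packet_too_big(too_big)
print(mtu, len(excerpt))  # 1492 1232
```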
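The reply message is exactly 1280 bytes, which is the IPv6 minimum MTU. RFC 4443 requires an ICMPv6 error message to carry as much of the invoking packet as possible without exceeding that size. As a result, the original message is truncated in the reply to 1280-40=1240 bytes for the ICMPv6 packet. Subtracting the 8-byte Packet Too Big header and then the original 40-byte IPv6 and 8-byte ICMPv6 headers leaves 1240-8-40-8=1184 bytes for the actual payload data. So you lose 1446-1184=262 bytes of payload data in the reply message.
Subsequent packets get a successful reply from the target, because they are sent fragmented (1432+14 bytes of payload data).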
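Here is the truncation arithmetic above written out as a Python sketch. The helper is hypothetical and assumes there are no extension headers in either packet.

```python
from typing import Tuple

IPV6_MIN_MTU = 1280
IPV6_HEADER = 40
ICMPV6_HEADER = 8  # the Packet Too Big header and the Echo Request header are both 8 bytes

def echo_payload_quoted(sent_payload: int) -> Tuple[int, int]:
    """Return (payload bytes quoted, payload bytes lost) in a Packet Too Big reply."""
    icmpv6_room = IPV6_MIN_MTU - IPV6_HEADER                    # 1240: whole ICMPv6 error message
    invoking_room = icmpv6_room - ICMPV6_HEADER                 # 1232: copy of the original packet
    payload_room = invoking_room - IPV6_HEADER - ICMPV6_HEADER  # 1184: original echo data
    quoted = min(sent_payload, payload_room)
    return quoted, sent_payload - quoted

print(echo_payload_quoted(1446))  # (1184, 262)
```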
[Capture: 1432 bytes ICMP request from HOST to TARGET]
[Capture: 14 bytes ICMP request from HOST to TARGET]
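Here is where 1432 and 14 come from. With a 1492-byte MTU, each fragment has 1492-40-8=1444 bytes of room after the IPv6 and Fragment headers. This is rounded down to 1440, because every fragment except the last must carry a multiple of 8 bytes. The fragmentable part is the 8-byte ICMPv6 header plus 1446 bytes of data (1454 bytes), so it splits into 1440+14. The first fragment also carries the ICMPv6 header, which leaves 1432 bytes of ping data. The Python sketch below performs that split, assuming no extension headers other than the Fragment header. The function name is hypothetical.

```python
from typing import List

IPV6_HEADER = 40
FRAGMENT_HEADER = 8
ICMPV6_ECHO_HEADER = 8

def fragment_sizes(fragmentable_len: int, mtu: int) -> List[int]:
    """Split the fragmentable part of an IPv6 packet into per-fragment data lengths."""
    # Room left after the IPv6 and Fragment headers, rounded down to a multiple
    # of 8 (required for every fragment except the last one).
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    sizes = []
    remaining = fragmentable_len
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes

# Fragmentable part of the 1446-byte ping: ICMPv6 header + payload data.
frags = fragment_sizes(ICMPV6_ECHO_HEADER + 1446, 1492)
print(frags)                          # [1440, 14]
print(frags[0] - ICMPV6_ECHO_HEADER)  # 1432 bytes of ping data in the first fragment
```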
Windows is "smart" enough to keep track of this state for some minutes (in the so-called destination or route cache). The next time you send large packets, the first packet is not lost, because fragmentation happens right away.
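The Python sketch below shows a per-destination PMTU cache with aging. This is not how Windows implements its cache. The 10-minute timeout is an assumption borrowed from the value RFC 1981 recommends before a node tries a larger PMTU again. The class name is hypothetical, and the clock can be injected so that the aging can be tested.

```python
import time

IPV6_MIN_MTU = 1280
PMTU_AGING_SECONDS = 600  # assumption: the 10-minute timer recommended by RFC 1981

class PmtuCache:
    """Hypothetical per-destination PMTU cache, loosely modelled on a destination cache."""

    def __init__(self, link_mtu, clock=time.monotonic):
        self.link_mtu = link_mtu
        self.clock = clock
        self._entries = {}  # destination -> (pmtu, time it was learned)

    def on_packet_too_big(self, dest, reported_mtu):
        # Simplified: never go below the IPv6 minimum or above the local link MTU.
        pmtu = max(IPV6_MIN_MTU, min(reported_mtu, self.link_mtu))
        self._entries[dest] = (pmtu, self.clock())

    def pmtu(self, dest):
        entry = self._entries.get(dest)
        if entry is None:
            return self.link_mtu
        pmtu, learned_at = entry
        if self.clock() - learned_at >= PMTU_AGING_SECONDS:
            # Entry expired: forget it and try the full link MTU again.
            del self._entries[dest]
            return self.link_mtu
        return pmtu

now = [0.0]
cache = PmtuCache(1500, clock=lambda: now[0])
cache.on_packet_too_big("2001:db8::1", 1492)  # documentation address
print(cache.pmtu("2001:db8::1"))  # 1492
now[0] += 601
print(cache.pmtu("2001:db8::1"))  # 1500
```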
Windows>ping -l 1446 x:x:xx
Pinging target [x:x:xx] with 1446 bytes of data:
Reply from x:x:xx: time=54ms
Reply from x:x:xx: time=53ms
Reply from x:x:xx: time=55ms
Reply from x:x:xx: time=52ms
Ping statistics for x:x:xx:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 52ms, Maximum = 55ms, Average = 53ms
IMHO, it's better to make your host use the appropriate MTU from the beginning (i.e. hardcode 1492 or use the value advertised in RAs) and not depend on ICMPv6 messages to trigger fragmentation. Some people (Geoff Huston, Tore Anderson) have proposed always using the minimum of 1280, in order to be safe in every possible case (including tunnels). I generally prefer to use the maximum possible, hoping that nobody in the middle will mess with ICMPv6 messages. I know that this is not currently the case (so stick with something lower, like 1280, for now), but this will probably change as native IPv6 gets deployed. Unless we start filtering ICMPv6 messages indiscriminately... as many do with IPv4. Does "Internet Control Message Protocol" say anything to you?
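If you let the host pick up the RA value, the check it should apply is simple. The Python sketch below uses a hypothetical helper that accepts an advertised MTU only if it lies between 1280 and the link's own MTU. This mirrors the sanity check RFC 4861 asks hosts to perform, with the interface MTU standing in for the link-type maximum. Otherwise, the helper falls back to the link MTU. Setting `conservative=True` models the "always use 1280" advice.

```python
from typing import Optional

IPV6_MIN_MTU = 1280

def choose_host_mtu(link_mtu: int, ra_mtu: Optional[int] = None, conservative: bool = False) -> int:
    """Pick the MTU a host uses on its LAN interface (hypothetical helper)."""
    if conservative:
        # "Always 1280": safe even across tunnels, at the cost of efficiency.
        return IPV6_MIN_MTU
    if ra_mtu is not None and IPV6_MIN_MTU <= ra_mtu <= link_mtu:
        # Accept the advertised value only if it is within sane bounds.
        return ra_mtu
    return link_mtu

print(choose_host_mtu(1500, ra_mtu=1492))                     # 1492
print(choose_host_mtu(1500))                                  # 1500
print(choose_host_mtu(1500, ra_mtu=1492, conservative=True))  # 1280
```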
Notes
1) RFC 1981 describes Path MTU Discovery (PMTUD) for IPv6.
2) RFC 4821 (Packetization Layer Path MTU Discovery) will help a lot with PMTUD, when and if all vendors start implementing it.
3) To see the fragmented IPv6 packets clearly in Wireshark, you have to disable IPv6 reassembly in the preferences.
4) You can use the commands "ipv6 rc" and "ipv6 rcf" to view and clear the destination/route cache in Windows XP.
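The idea behind RFC 4821 is to probe the path with packets of different sizes and treat the loss of a probe as the signal, instead of relying on ICMPv6 messages that may be filtered. The Python sketch below shows one possible search strategy, a binary search over total packet size. RFC 4821 does not mandate this exact algorithm. `probe` is a hypothetical callable that returns True when a packet of the given size gets through. You could perform the same search by hand with "ping -l", remembering to subtract the 48 bytes of headers.

```python
from typing import Callable

def search_pmtu(probe: Callable[[int], bool], low: int = 1280, high: int = 1500) -> int:
    """Largest packet size in [low, high] for which probe(size) succeeds.

    Assumes probe(low) succeeds (1280 is the IPv6 minimum) and that if a size
    gets through, every smaller size does too.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid was lost: the answer is smaller
    return low

def fake_path(path_mtu: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for real probes: a path that drops anything above path_mtu."""
    return lambda size: size <= path_mtu

print(search_pmtu(fake_path(1492)))  # 1492
```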
Hello friend!
I have been following your blog since you passed the CCIE. I was really amazed by your achievement in such a short time.
Today I work as a Technology Coordinator and am far away from switches and routers. At the company where I work, the focus is security and operating systems.
My contact with Cisco lasted exactly 5 months: 3 months when I bought 4 devices and built a lab for the CCNA, and two months at a company I worked for after getting the CCNA. But I did not adapt to that company. Nothing professional, just pressure and a complete lack of resources to work with. It's the kind of company that works in "turns or you'll be unemployed." I left after 2 months. I'm back to studying networking. I will renew my CCNA and do the CCNP.
My dream is to work with it. Do you believe that I can pass the CCIE lab without actually working in the area?
I am now working hard with management. I am sure in PMP, ITIL, ISO 27002 and COBIT.
Which path do you think I should follow in order to get the CCIE?
Thanks, Pedro Jr
Pedro, hands-on experience is a must for the CCIE. You might not need to actually work in the area (although that helped me a lot), but you surely need to be fluent in everything Cisco-related.
Hey there. Do you mind changing the link to my blog from kpjungle.wordpress.com to www.packet-forwarding.net?
Thanks! :)
Kim, I changed it.
And this is where the fun stuff starts. I'm now working on a deployment of the new Riverbed RiOS 7 version (with IPv6 support). The branches are connected over a normal MPLS L3 VPN, but we have some GRE tunnels on it. Tomorrow we want to test the first application over IPv6 (with acceleration). Time will tell what happens :)
Thanks for the article. Mirek.