...or why a shut interface is not always shut...
Copying from Wikipedia:
Address Resolution Protocol (ARP) spoofing, also known as ARP poisoning or ARP Poison Routing (APR), is a technique used to attack an Ethernet wired or wireless network which may allow an attacker to sniff data frames on a local area network (LAN), modify the traffic, or stop the traffic altogether (known as a denial of service attack). The attack can obviously only happen on networks that indeed make use of ARP and not another method.
The principle of ARP spoofing is to send fake, or "spoofed", ARP messages to an Ethernet LAN. Generally, the aim is to associate the attacker's MAC address with the IP address of another node (such as the default gateway). Any traffic meant for that IP address would be mistakenly sent to the attacker instead. The attacker could then choose to forward the traffic to the actual default gateway (passive sniffing) or modify the data before forwarding it (man-in-the-middle attack). The attacker could also launch a denial-of-service attack against a victim by associating a nonexistent MAC address to the IP address of the victim's default gateway.
ARP spoofing attacks can be run from a compromised host, a jack box, or a hacker's machine that is connected directly onto the target Ethernet segment.
"...or a backup 7200 with shut interfaces." i should add :-P
Here is the complete story...
Some months ago a fellow engineer had to replace a 7200/G1 with a 7200/G2. He had prepared the 7200/G2 with the same configuration as the 7200/G1, shut all the interfaces and made sure that the new ethernet/optical cables were working ok.
Both the 7200s were connected to a 6500 switch. 7200/G1 was the production router, 7200/G2 was the backup router to be put in production during a maintenance window in 2 days. One day later, we had a power failure on the 7200/G2 which caused the router to reload. Nothing to worry about, since its power supplies were connected to a test facility and during the maintenance window they would be moved to the normal facilities. But here comes the fun part that showed up during the reload:
6500
Sep 24 13:29:12: %LINK-3-UPDOWN: Interface GigabitEthernet3/21, changed state to up
Sep 24 13:29:12: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/21, changed state to up
Sep 24 13:29:14: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/21, changed state to down
Sep 24 13:29:14: %LINK-3-UPDOWN: Interface GigabitEthernet3/21, changed state to down
7200/G1
Sep 24 13:29:12: %IP-4-DUPADDR: Duplicate address x.x.x.x on GigabitEthernet0/1, sourced by xxxx.xxxx.xxxx
During the reload, the ethernet interfaces of the 7200/G2 (which were shut in the configuration) moved to the up state for some seconds. And during those seconds, their IP functionality was activated! So we had 2 identical IP addresses on the same LAN, both sending ARP messages for the same address. This is more than enough to cause havoc in the production router (something that happened in our case), until the ARP cache times out or the ARP table is updated/cleared.
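Just for reference (and not something we had prepared in advance): the stale entry can be flushed by hand with "clear arp-cache", or aged out faster by lowering the ARP timeout (4 hours by default) on the LAN interface. Something like this, with the interface taken from the logs above:
7200/G1
interface GigabitEthernet0/1
 arp timeout 300
!
! and from exec mode, to flush the dynamic ARP entries immediately:
clear arp-cache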
For everyone interested, the bug opened is CSCsv69222 and it's still being investigated by the developers, who think that it's a very corner case and fixing it might break something else.
It applies to 7200s with either NPE-G1 or NPE-G2 using a Marvell ethernet controller, under these conditions:
GBIC & no autonegotiation (enabled through "no negotiation auto")
RJ-45 & autonegotiation (enabled through "speed/duplex auto")
RJ-45 & no autonegotiation (enabled through "speed/duplex 1000/full")
In the case of
GBIC & autonegotiation (enabled through "negotiation auto")
it doesn't seem to apply.
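For clarity, the shorthand above expands to these interface commands (just a sketch, the interface numbers are arbitrary):
7200
! GBIC & no autonegotiation
interface GigabitEthernet0/1
 no negotiation auto
!
! RJ-45 & autonegotiation
interface GigabitEthernet0/2
 speed auto
 duplex auto
!
! RJ-45 & no autonegotiation
interface GigabitEthernet0/3
 speed 1000
 duplex full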
What worries me most is the fact that a part of the configuration ("ip address") seems to be activated before everything else, even before the shut command. Or, even better, according to Cisco's answer, the shut command is ignored:
resolution to this problem is having "no ip addr" or to change the IP address statement in the configuration, since this is common to all interfaces and the problem is due to the IOS init code implementation. The actual implementation is that we should have some different "ip addr" related statement in the configuration. Otherwise during reload time, it will consider them as new interface, so to put them in shutdown mode we have to include the "no ip addr".
Conclusion (startup configuration => actual behavior during reload):
shut + ip addr => no shut + ip addr
shut + no ip addr => shut + no ip addr
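In other words, if you want the standby box to stay quiet after an unexpected reload, its LAN interfaces need to look something like this in the startup configuration (just a sketch, the interface name is an example):
7200/G2
interface GigabitEthernet0/1
 no ip address
 shutdown
!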
Surely there are quite a few workarounds that can be implemented (shut the 6500 ports instead, remove portfast from the switch, enable carrier-delay on the 7200, etc.), but why should I prefer a workaround instead of a bug fix?
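For reference, the workarounds mentioned above would look roughly like this (untested sketches, interface names taken from the logs), even though I'd still rather have the bug fixed:
6500
interface GigabitEthernet3/21
 no spanning-tree portfast
! (or simply "shutdown" the port until the maintenance window)
!
7200/G2
interface GigabitEthernet0/1
 carrier-delay 60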
Hi Tassos,
Nice to know. The G2 seems to have a lot of bugs.
Best regards
You should prefer the workaround instead of a bug fix when the bug fix is simply not available or the fix has chances of breaking something else :-) Irrespective of the existence of documented issues such as the one you described, the best practice for such maintenance procedures is to not have the same LAN address in the new router. Even if power does not fail, someone might issue a "no shut" on the LAN interface by mistake, or anything else weird might happen.
The steps and speed of the router replacement overall will depend on many factors, but if we consider only the LAN interface in a somewhat stub design, the procedure that worked best for me is (a rough CLI sketch follows the steps):
1) new router: no shut
2) old router: shut + no ip addr
3) new router: ip addr
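Something like this on the CLI (the interface name and mask are made up):
! step 1 - new router
interface GigabitEthernet0/1
 no shutdown
!
! step 2 - old router
interface GigabitEthernet0/1
 shutdown
 no ip address
!
! step 3 - new router
interface GigabitEthernet0/1
 ip address x.x.x.x 255.255.255.0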
At step 1 you check if the new router is indeed ready (yes, you checked that the fiber was ok at some point the previous day or so, but who knows what happened since then?). You can do this last check only if a duplicate IP address is not getting in your way. By the way, at this point you can check if the switch has put the corresponding port in forwarding state before moving to execute the next 2 steps rapidly. This is more important when you do not have portfast for that port. The OSPF on the LAN interface of the new router will converge faster if the switch port is already forwarding (you might even see the other neighbors reporting that the "old" adjacency just went from loading to full, i.e. they might not declare that the neighbor went down).
I have done this procedure many times without portfast and it was amazingly fast. It might be even faster than doing "shut, no shut" in succession on the 2 interfaces with duplicate addresses, even with portfast in use (because my new LAN interface is ready to forward when I go to steps 2 and 3). I wonder if anyone has suggestions on how to improve this procedure :-)
@Maria, when the backup router is accessible only through its console, I consider it (keeping its interfaces shut) the best (and possibly fastest) solution. Otherwise someone has to keep notes of IPs when doing the replacement. And if the production router dies, you'll have to dig out its config from somewhere else.
But this is not the problem that worries me. I can surely change the router replacement procedure. What worries me is that Cisco considers the above issue as expected behavior (the usual motto: "it's a feature, not a bug").
You are right, I assumed people keep copies of configuration in multiple places like I do :-) You could say I sympathize too much with the developers of embedded systems. This is probably because I have had such a job in the past and I know how hard it is. When I had to fix a bug, no amount of money could make me find it and fix it faster. It is also particularly hard to find people to write initialization code that runs smoothly.
It seems like the sort of bug Cisco should really be ashamed of. They should also be even more ashamed of themselves for suggesting workarounds instead of correctly addressing it.
Their software is very expensive. You have a reasonable expectation that it should be bug-free, or that they should be trying their d**** best to fix issues like that one, rather than suggesting obvious workarounds.