Saturday, March 8, 2008

The Good, the Bad and the Ugly

C. Eastwood: "In this world there's two kinds of people, my friend: those with loaded guns and those who dig." Guess who's digging ;)

A few days ago Cisco introduced the ASR1000 Router Series. According to Cisco it is a technology breakthrough, something that will probably change the networking industry.

What does it have to do with the title?

The Good (QFP), the Bad (PXF) and the Ugly (SW)!!!

QFP is the QuantumFlow Processor, on which the ASR 1000 Series Embedded Service Processors (ESPs) are based.
PXF is Parallel Express Forwarding, on which the 10000's parallel multiprocessor architecture is based.
SW is just software (CPU) switching, on which the 7200/NPE-G1 (and most other routers) are based.

So let's start with the Bad...the PXF and the 10000:

When Cisco announced its 10000 Edge Services Router in 2000, it said:

The Cisco 10000 ESR is based on Cisco's revolutionary Parallel eXpress Forwarding (PXF) architecture. PXF utilizes Cisco's latest parallel-pipelined network processor to deliver wire speed performance for a broad set of Cisco IOS advanced IP services. The combination of a rich IOS feature set and PXF performance enables ISPs to increase revenues by deploying value-added services such as Premium Internet Access to customers in volume. The highly-scalable router delivers maximum expandability to handle growing customer populations and expanded service offerings. In addition, the Cisco 10000 ESR and its breakthrough architecture have been designed for maximum reliability and availability to support customers' ever-increasing dependence on network-based services.

The Cisco 10000 ESR makes it possible for service providers to be able to turn on QoS features without degrading performance, for the first time.

But we all know how marketing is. When you're about to buy something, the answer to everything is "YES". But when it comes to the technical deployment and you find out that something cannot be done, you're told "this is a known limitation".

I've been working with 10k routers for quite a few years. At the time of buying, their characteristics seemed better than what the rest of the relevant market offered, and they were Cisco, just like many of our other products. If you want to introduce a new product into your network and you're pressed for time, you look for something that will fit into your existing infrastructure as easily as possible. If your account team reassures you that everything a 7200 can do can also be done (much faster) by the 10k router, then you have another good reason for choosing it.

After roughly 20 TAC cases opened over a period of 4 years, about things that should be possible but aren't because of the PXF, or things that don't work as intended because of the PXF, I must say that PXF is a very bad thing. If my memory serves me right, there isn't a single IOS release I have tried on the 10k router (from the XI and SB series) in which I haven't hit a PXF issue. And the worst part comes when you find out that the issue is caused by the PXF (you can disable PXF manually, although it's not recommended because the CPU will run high even with very little traffic). You start wishing that someone else (preferably a bigger company) has hit the same issue before you, so that Cisco has already started working on a fix; otherwise you'll wait many months (years?) for a solution. It was a little secret among 10k developers that PXF is not easy to program and that many things need BU approval before they can be done.

But I want to be honest. We started with XI2 and ended up at XI9, where most things worked fine. Afterwards we started with SB2 and ended up at SB11, where most things have worked fine until now. In the meantime we changed 10% of our systems infrastructure in order to keep up with the 10k's quirks.

As it seems, everyone, even a 10k router, needs time... You just have to learn to accept "NO" as an answer.

Then we have the Ugly...the SW and the 7200:

The 7200 is a humble but respectable router which uses its CPU for everything. I have been using various 7200 routers for all kinds of jobs, and the things the router cannot do can be counted on the fingers of one hand. Of course, the router cannot do many things simultaneously without affecting its CPU. That's its biggest drawback. But you won't get an answer from TAC saying that "this cannot be done due to XXX limitations".

Just for your reference: 64k sessions are officially supported on the 10k, but our 10k routers actually reach 14k sessions (at 75% CPU, with many things disabled). 16k sessions are officially supported on the 7200, but our 7200s actually reach 3k sessions (at 75% CPU, with everything enabled). It's all a matter of traffic and extra features.
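To put those session numbers side by side, here is a small sketch that computes how much of the officially supported figure we actually reach on each platform. The numbers are the ones quoted above; the names `platforms` and `observed_share` are mine and purely illustrative, and the comparison ignores the fact that the two platforms ran with different feature sets.

```python
# Session counts as quoted in the post (k = thousand).
# "observed" is what we saw at roughly 75% CPU.
platforms = {
    "10k": {"official": 64_000, "observed": 14_000},
    "7200": {"official": 16_000, "observed": 3_000},
}


def observed_share(official, observed):
    """Return the observed session count as a percentage of the official one."""
    return 100.0 * observed / official


for name, sessions in platforms.items():
    share = observed_share(sessions["official"], sessions["observed"])
    print(f"{name}: {share:.1f}% of the officially supported sessions")
```

Both platforms land at roughly a fifth of their official figure, which is why the official number alone says little without the traffic and feature mix behind it.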

And finally we have the Good...the QFP and the ASR1000:

Looking at the specifications, you'll see the numbers decrease as more features are added.
For example, using the ASR1000-ESP5 and looking at the performance:

Up to 7 Mpps: Forwarding performance will vary depending on features configured
4 Mpps: For the combination of the following commonly-used features: IPv4 forwarding, IP Multicast, ACL, QoS, Reverse Path Forwarding (RPF), load balancing, and Sampled NetFlow
1 Mpps: For the combination of commonly-used features above + Firewall and Network Address Translation (FW/NAT); for the combination of commonly-used features above + IPsec hardware-assisted encryption
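As a rough illustration of how steep that drop is, the sketch below expresses each ESP5 figure as a share of the 7 Mpps peak and converts packets per second into bits per second for a given packet size. The Mpps values are the ones from the specification quoted above; the 64-byte packet size is my own assumption (the post does not say which size the figures refer to), the conversion ignores any layer-2 overhead, and the function names are hypothetical.

```python
# ESP5 forwarding figures from the quoted specification, in Mpps.
esp5_mpps = {
    "forwarding only (peak)": 7.0,
    "commonly-used features": 4.0,
    "commonly-used features + FW/NAT or IPsec": 1.0,
}

# Assumption: 64-byte packets. The specification excerpt does not state a size.
ASSUMED_PACKET_BYTES = 64


def share_of_peak(mpps, peak_mpps):
    """Return mpps as a percentage of the peak figure."""
    return 100.0 * mpps / peak_mpps


def mpps_to_gbps(mpps, packet_bytes):
    """Convert millions of packets per second to gigabits per second,
    counting only the packet bytes themselves (no layer-2 overhead)."""
    return mpps * 1e6 * packet_bytes * 8 / 1e9


peak = max(esp5_mpps.values())
for label, mpps in esp5_mpps.items():
    print(
        f"{label}: {mpps:g} Mpps, "
        f"{share_of_peak(mpps, peak):.0f}% of peak, "
        f"about {mpps_to_gbps(mpps, ASSUMED_PACKET_BYTES):.2f} Gbps "
        f"at {ASSUMED_PACKET_BYTES}-byte packets"
    )
```

Changing `ASSUMED_PACKET_BYTES` shows how strongly the bits-per-second picture depends on the packet size, which the marketing numbers leave open.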

Now, looking at the introduction page, we see the following:

Cisco ASR 1000 Series routers offer service providers and enterprises industry-leading performance, service capabilities, reliability, and efficiencies in a compact form factor. Using an innovative new Cisco QuantumFlow processor, current and future services can be instantly turned on to operate at line rate without compromising network performance or availability.

I already know the answer from our account team: "Yes, you can do whatever you like with the ASR1000". But I also know the answer from TAC: "Sorry, this cannot be done due to QFP". So why am I calling it "Good"? Because I'm hoping (at least) for quicker fixes:

...the Cisco QuantumFlow Processor uses a software architecture based on a full ANSI-C development environment implemented in a true parallel processing environment. Some traditional network processors rely upon difficult-to-implement microcode, making it difficult and time-consuming to add new capabilities. Other network processors offer higher-level language development but into a feature pipelined architecture. With the Cisco QuantumFlow Processor, new features can be added quickly as customer requirements evolve by taking advantage of industry-standard tools and languages built upon a powerful parallel processing architecture. This architecture represents a paradigm shift and evolution in the software architectures associated with network processing today...

And this is the part I liked most:

The Cisco IOS Software has no direct access to the hardware components in the system and is largely isolated from the platform architecture. This concept allows for different types of redundancy and modularity in the system. Even if the Cisco IOS Software is down (or has crashed), router administration personnel can still access the console and auxiliary console, and they can even perform Telnet, Secure Shell (SSH) Protocol, and Secure Sockets Layer (SSL) in the system and restart the Cisco IOS Software or perform Trivial File Transfer Protocol (TFTP) out the core dumps and other relevant information through the route-processor management port.

I've also read the Isocore report. But after reading all these test reports (still waiting for someone to come out with a negative one) I'm a little bit skeptical about the difference between their results and the results of real network traffic.

BTW, reading all the redundancy stuff, an old question of mine came back to my mind: why doesn't Cisco make the standby processor/supervisor/whatever active too, so the whole system can "double" its power? Just as we can choose the dual power-supply operation mode, we should be able to choose the redundancy mode: standby or cooperation.

PS: Am I the only one worried about the future of Service Modules? Until now, Cisco was pushing people to buy extra modules for each one of their services (application networking, security, wireless, etc.) for better performance and wider features. Now Cisco has integrated some of them into a single card and is planning to continue doing so. Are we going round and round just to make Cisco richer?

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License.