With many enterprises and cloud providers preparing to migrate their data center networks to OpenFlow, Nick McKeown, an OpenFlow and SDN icon and professor in the Electrical Engineering and Computer Science Departments at Stanford University, shares his thoughts about OpenFlow protocol challenges, controller scalability issues, switch hardware challenges, and efforts by some vendors to hold back OpenFlow.
Do you see any significant OpenFlow protocol challenges in cloud provider data centers?
Nick McKeown: OpenFlow provides a simple application programming interface for a remote control plane to directly program the forwarding state in lots of switches. In many ways, OpenFlow is so simple that it's not really affected by scalability. Once the control plane has decided how it wants packets to be forwarded by each switch, it just uses OpenFlow to program the switches. The task of programming the switches is quite scalable, and there are many protocol choices that would be just fine; OpenFlow is one design choice among many. Programming the switches isn't the difficult part of the problem. The control plane has a much more difficult job of calculating and then deciding what forwarding state to put into the switches.
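To illustrate the split McKeown describes, here is a minimal sketch in which a central controller does the hard part (computing forwarding state for every switch) and the easy part (pushing that state out) is reduced to writing table entries. The four-switch topology, the port numbers, and the function names `compute_forwarding_state` and `program_switches` are all hypothetical, and the dictionary write in `program_switches` only stands in for an OpenFlow flow-mod message; this is not the OpenFlow wire protocol.

```python
from collections import deque

# Hypothetical topology: switch -> {neighbor_switch: local port facing that neighbor}
TOPOLOGY = {
    "s1": {"s2": 1, "s3": 2},
    "s2": {"s1": 1, "s4": 2},
    "s3": {"s1": 1, "s4": 2},
    "s4": {"s2": 1, "s3": 2},
}


def compute_forwarding_state(topology, destination):
    """The hard part: decide, for every switch, which port leads toward
    `destination` along a fewest-hop path (BFS outward from the destination)."""
    out_port = {}
    visited = {destination}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                # `neighbor` reaches the destination by forwarding to `node`.
                out_port[neighbor] = topology[neighbor][node]
                queue.append(neighbor)
    return out_port


def program_switches(flow_tables, destination, state):
    """The easy part: install one entry per switch. A stand-in for sending
    an OpenFlow flow-mod to each switch; here it just updates a dict."""
    for switch, port in state.items():
        flow_tables.setdefault(switch, {})[destination] = port


flow_tables = {}
for dst in TOPOLOGY:
    program_switches(flow_tables, dst, compute_forwarding_state(TOPOLOGY, dst))
```

Almost all of the logic sits in the computation; the "programming" step is a trivial loop, which is the point McKeown makes about where the difficulty lies.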
How about controller scalability issues?
McKeown: It's of course challenging to build any control plane for a large multitenant data center, regardless of the technology. The amount of state, the number of virtual machines (VMs), the number of tenant policies, the number of service-level agreements, the number of flows … all create a challenge for the control plane -- SDN or not, virtualization or not -- particularly when VMs and workloads are moving around. But there are now existence proofs of control planes that scale to very large data centers. The Onix paper from three years ago explains quite well how to build such a scalable control plane. Now that it has been done at scale, no doubt many others will build them too.
Are there any switch hardware challenges?
McKeown: Current switch chips -- from folks like Broadcom, Intel, Marvell, Mellanox -- are all pretty good. They have high capacity and ample features for most data centers. Really, all they need is the ability to do line-rate forwarding at 10Gbps, which they all do, with reasonably sized forwarding tables that are mostly okay in this generation and will be even better in the next. They also need features like equal-cost multipath (ECMP) routing, which is well supported these days. Newer chips do things like VXLAN as well. But because network virtualization is much easier using an overlay, you don't need hardware support for virtualization. These features were added by companies like Cisco [to protect] their turf, making the switching happen in hardware rather than in the hypervisor switch (which is out of their control) and thereby forcing customers to churn their hardware.
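The reason an overlay removes the need for virtualization support in the switch chip is that tenant separation lives in a header the hypervisor adds, while the physical switches only forward the outer IP/UDP packet. The sketch below, assuming the 8-byte VXLAN header layout from RFC 7348 (an 8-bit flags field with the "I" bit set, 24 reserved bits, a 24-bit VXLAN Network Identifier, and 8 more reserved bits), shows only that header step; the outer Ethernet/IP/UDP headers a hypervisor would also add are omitted, and the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VALID_VNI = 0x08  # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prefix a tenant's Ethernet frame with a VXLAN header carrying its VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) | reserved (24 bits)
    # Word 2: VNI (24 bits)  | reserved (8 bits)
    header = struct.pack("!II", VXLAN_FLAG_VALID_VNI << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload: bytes):
    """Recover (vni, inner_frame) from a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VALID_VNI:
        raise ValueError("VXLAN I flag not set")
    return vni_word >> 8, payload[8:]


packet = vxlan_encapsulate(5000, b"tenant-frame")
vni, frame = vxlan_decapsulate(packet)
```

Nothing here requires the physical switch to understand the VNI. A chip that can parse VXLAN can offer extra features, but the overlay works without that support.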
But luckily for customers, over time we'll see the switching chips get simpler and simpler, and therefore more streamlined. That's how the computer industry evolved in the 1980s (which is where the networking industry is today) as reduced instruction set computing (RISC) came about: a clean separation of the roles of hardware and software, and the creation of simple, minimal pipelined hardware that can be made faster and faster with Moore's Law.
The story isn't quite so simple for the switch boxes. For a while the box vendors will continue to sell boxes that are mostly far too complex, with far too much old software inside. This is partly to maintain a high margin, and partly to stay in control of, and therefore limit, what the customer can do. But this is ultimately a losing proposition -- the writing is on the wall for the boxes that simply "add an OpenFlow interface and declare they have SDN." The real winners will be those who rethink the entire box design, reduce the complexity and move all the control functionality up and out of the box into the central control plane. It's starting to happen, and we'll see it happen more and more during the next two to three years as customers demand it. Customers will want networks with a forwarding plane that's simple, low power, low cost and reliable. Most boxes sold today are quite far from this ideal.
Will there need to be a "bridge" to OpenFlow/SDN? We're being bombarded with "hybrid" approaches, which are causing confusion. Is there a simple way to explain what needs to happen?
McKeown: Most data centers are built from a clean start with one vendor for the network. This makes the transition quite easy: In one generation there might be legacy equipment, and then in the next build there is simpler hardware -- for example, controlled by OpenFlow/SDN -- with virtualization running transparently over the top.
In enterprise and service provider networks, the migration path is more complex because of the need to interoperate with legacy equipment. To start with, we can expect islands of SDN/OpenFlow networks inside legacy networks, with the islands gradually growing from the middle out to become the entire network. The Open Networking Foundation just created a new Migration Working Group to help users with this transition.
To me, most of the so-called hybrid boxes are the result of incumbent vendors trying to contain and constrain the change by limiting how much control is exposed to the customer's external control plane. This might work for a while, but it makes the network more complex -- there are now control planes both in the boxes and outside them -- and less reliable.
In the end, SDN is simply the physical separation of the control plane from the switches, where one control plane controls multiple switches. Many folks will try to redefine SDN to fit the products they planned to build anyway. But in the end, the separation is so natural that we'll get there despite some road bumps along the way.