There has been an awful lot of talk about network fabrics lately, but the message is muddled and there is ample confusion about what fabrics can or should do for the industry.
One thing is clear: Moving to fabric is one of the more significant events in data center networking. But it is important to understand exactly what we should expect from network fabrics -- and what is actually necessary. After all, in a world of edge software overlays, it's possible that network fabrics don't need to be so feature rich, but should instead focus on providing lots of raw bandwidth.
Network fabrics make compute, storage and network generic pools of resources
Let's first tackle the questions: “Why fabric?” and “Why now?” The short answer is that traditional network architecture was not designed for modern data center workloads.
The longer answer is that data center design has evolved to treat all aspects of the infrastructure -- compute, storage and network -- as generic pools of resources. This means that any workload should be able to run anywhere. However, traditional data center network design does not make this easy.
The classic three-tier architecture -- top of rack (ToR), aggregation, core -- has non-uniform access to bandwidth and latency depending on the traffic matrix. For example, hosts connected to the same ToR switch will have more bandwidth (and lower latency) than hosts connected through an aggregation switch, which in turn will have access to more total bandwidth than hosts communicating through the core. The net result? It matters where you decide to put a workload. Allocating workloads to ports becomes a constant bin-packing problem, and in dynamic environments the result is very likely sub-optimal allocation of bandwidth to workloads, or sub-optimal utilization of compute due to placement constraints.
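To make the placement sensitivity concrete, here is a minimal sketch of worst-case bandwidth between two hosts in a three-tier design. The link speeds and oversubscription ratios are hypothetical, purely for illustration; real designs vary widely.

```python
# Sketch: why placement matters in a three-tier network.
# A host is identified as (agg_block, tor, port). Traffic that stays on one
# ToR gets full link speed; crossing the aggregation or core tier divides
# bandwidth by that tier's (assumed) oversubscription ratio.

LINK_GBPS = 10      # host uplink speed (assumed)
AGG_OVERSUB = 4     # 4:1 oversubscription at the aggregation tier (assumed)
CORE_OVERSUB = 16   # 16:1 oversubscription at the core (assumed)

def worst_case_gbps(a, b):
    """Worst-case bandwidth between hosts a and b, each (agg_block, tor, port)."""
    if a[:2] == b[:2]:                    # same ToR switch
        return LINK_GBPS
    if a[0] == b[0]:                      # same aggregation block, different ToR
        return LINK_GBPS / AGG_OVERSUB
    return LINK_GBPS / CORE_OVERSUB       # must traverse the core

print(worst_case_gbps((0, 0, 0), (0, 0, 1)))  # same ToR: 10
print(worst_case_gbps((0, 0, 0), (0, 1, 0)))  # via aggregation: 2.5
print(worst_case_gbps((0, 0, 0), (1, 0, 0)))  # via core: 0.625
```

The same pair of workloads sees a 16x swing in available bandwidth depending purely on where they land -- which is exactly the bin-packing pressure described above.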
Enter fabric. While there is ample disagreement on what fabric is, in Nicira's vernacular a fabric is a physical network that doesn't constrain workload placement. Minimally, this means that communication between any two ports should have the same latency, and the bandwidth between any disjoint subsets of ports is non-oversubscribed. More simply, the physical network operates much as a backplane does within a network chassis.
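One common way to check the non-oversubscription property in a leaf-spine design is per-leaf: total uplink capacity to the spines should be at least the total downlink capacity to hosts. A small sketch, with hypothetical port counts and speeds:

```python
# Sketch: oversubscription ratio at a single leaf switch.
# 1.0 means non-oversubscribed (fabric-like); > 1.0 means hosts can
# collectively offer more traffic than the uplinks can carry.

def oversubscription_ratio(host_ports, host_gbps, uplinks, uplink_gbps):
    """Downlink capacity divided by uplink capacity for one leaf."""
    return (host_ports * host_gbps) / (uplinks * uplink_gbps)

# 48 x 10G host ports with 12 x 40G uplinks: 480 / 480
print(oversubscription_ratio(48, 10, 12, 40))   # 1.0 -- non-oversubscribed
# 48 x 10G host ports with only 4 x 40G uplinks: 480 / 160
print(oversubscription_ratio(48, 10, 4, 40))    # 3.0 -- oversubscribed
```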
What features does a network fabric really need to offer?
The big question is, in addition to dumb -- but unified! -- bandwidth, what should a fabric offer? Let's get the obvious out of the way. In order to offer multicast, the fabric should support packet replication in hardware, as well as a way to manage multicast groups. Also, the fabric
should probably offer some QoS support in which packet markings indicate the relative priority to aid drop decisions during congestion.
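The marking-based drop behavior described above can be sketched as a bounded buffer that, when full, evicts the lowest-priority packet instead of tail-dropping the arrival. The priority values here are stand-ins for DSCP-style markings; this is an illustration of the policy, not any particular switch's queueing implementation.

```python
# Sketch: drop decisions during congestion driven by packet markings.
import heapq

class MarkedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []   # min-heap ordered by priority (lowest dropped first)
        self.seq = 0     # tie-breaker preserving arrival order within a priority

    def enqueue(self, priority, packet):
        """Admit a packet; returns the packet dropped to make room, or None."""
        heapq.heappush(self.heap, (priority, self.seq, packet))
        self.seq += 1
        if len(self.heap) > self.capacity:
            return heapq.heappop(self.heap)[2]   # evict lowest-priority packet
        return None

buf = MarkedBuffer(capacity=2)
buf.enqueue(5, "voice")               # high priority
buf.enqueue(1, "bulk")                # low priority
dropped = buf.enqueue(3, "web")       # buffer full: "bulk" is dropped
```

The point is that the fabric need only honor a relative ordering among markings -- it does not need to understand what the traffic is.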
But going further, most vendor fabrics on the market tout a wide array of additional capabilities. A few examples of these include isolation primitives (VLAN and otherwise), security primitives, support for end-host mobility and support for programmability.
Fabric in a world of edge software overlays
Clearly these features add value in a classic enterprise or campus network. However, the modern data center hosts very different types of workloads. So data center system design often employs overlays at the end hosts, which duplicate most of these functions. Take for example a large Web service. It isn't uncommon for load balancing, mobility, failover, isolation and security to be implemented within the load balancer, the back-end application logic or a distributed compute platform. In big data platforms, similar properties are often implemented within the distribution harness rather than relying on the fabric. Even virtualized hosting environments -- such as IaaS -- are starting to use overlays to implement these features within the vSwitch (see, for example, NVGRE or VXLAN).
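To give a flavor of what "implementing in the vSwitch" means, here is a minimal sketch of the encapsulation step an overlay performs. It builds a VXLAN header per RFC 7348 (an 8-bit flags field, 24 reserved bits, a 24-bit VXLAN Network Identifier, and 8 reserved bits) around an inner Ethernet frame; the outer UDP/IP transport headers are omitted for brevity.

```python
# Sketch: VXLAN-style encapsulation as performed by an edge vSwitch.
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: VNI field is valid (RFC 7348)

def vxlan_encap(vni, inner_frame):
    """Prepend an 8-byte VXLAN header carrying the 24-bit VNI to an L2 frame."""
    assert 0 <= vni < 2**24, "VNI is a 24-bit value"
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame

# Hypothetical 14-byte inner Ethernet header, virtual network 5001.
pkt = vxlan_encap(5001, b"\x00" * 14)
```

Because the VNI travels with every packet, tenant isolation is decided entirely at the edge -- the fabric underneath only ever forwards ordinary IP/UDP traffic.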
There is good reason to implement these functions as overlays at the edge. Minimally, it allows compatibility with any fabric design. But much more importantly, the edge has extremely rich semantics with regard to true end-to-end addressing, security contexts, sessions, mobility events and so on. Implementing at the edge allows the system builders to evolve these features without having to change the fabric.
In such environments, the primary purpose of the fabric is to provide raw bandwidth. That makes price/performance -- not features -- king. This is probably why many of the data center networks we are familiar with -- both in “big data” and hosting -- are, in fact, IP fabrics. These fabrics are simple, cheap and effective. That is also why many next-generation fabric companies are focused on providing low-cost IP fabrics.
If existing deployments of the most advanced data centers in the world are any indication, edge software is going to consume a lot of functionality that has traditionally been in the network. It is a
non-disruptive disruption whose benefits are obvious and simple to articulate. Yet the implications it could have on the traditional network supply chain are profound.
About the authors: Martin Casado is the founder and CTO of Nicira Networks, a startup in the network virtualization space. He is also a consulting professor at Stanford University, where his research led to SDN and the OpenFlow standard now being shepherded by the Open Networking Foundation. He also writes the blog Network Heresy.
Andrew Lambeth has been virtualizing networking for long enough to have coined the term "vSwitch", and he led the vDS distributed switching project at VMware. He currently works at Nicira.