In the research paper "Knowledge-Defined Networking," the authors describe three experiments that take telemetry data gathered from all the nodes in a network, process it using machine learning, and then use the results to manage network configuration and forwarding through an SDN controller.
They call this process of using elements of SDN and machine learning knowledge-defined networking.
In this article, let's look at the first of these three experiments discussed in the research paper -- telemetry data gathered from network nodes -- in light of potential applications and challenges in real-world networks. The experimental setup consists of an underlay network with 19 underlay elements and 12 overlay nodes.
The figure below illustrates a simplified network similar to the one described by the authors.
In this network, the system described in the paper collects information at the C, D, K and N routers, including the delay across the underlay network of E, F, G and H. Neither the SDN controller (Q) nor the overlay network has any access to the routing information or even the available paths in the underlay network. Using the information gathered at the four overlay routers, however, the SDN controller uses machine learning to infer the paths available in the underlay network.
Using this information, the SDN controller can try placing different combinations of different flows on each available path to determine which sets of flows provide the best performance against a given set of factors. In the research paper, the shortest delay through the network is chosen as the network optimization of interest.
For instance, if there are two flows from M toward B and one from P toward A, the controller can try placing the two flows from M toward B on the two available paths. The controller can work through the available paths at K, placing the flows as follows:
- the first flow on the path through H and the second flow on the path through G;
- the first flow on the path through G and the second flow on the path through H;
- both flows on the path through G; or
- both flows on the path through H.
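The four placements above can be enumerated mechanically. The following is a minimal sketch, not the paper's method: the delay model, the per-path numbers and the names `measure_delay` and `best_placement` are all hypothetical stand-ins for the measurements a real controller would take at the overlay routers.

```python
from itertools import product

# Hypothetical delay model (not from the paper): each path has a base delay
# in milliseconds, plus a penalty for every extra flow sharing that path.
BASE_DELAY_MS = {"G": 10.0, "H": 12.0}
PENALTY_MS_PER_EXTRA_FLOW = 5.0


def measure_delay(placement):
    """Stand-in for real telemetry: worst per-flow delay for one placement.

    `placement` is a tuple of path names, one entry per flow.
    """
    load = {}
    for path in placement:
        load[path] = load.get(path, 0) + 1
    return max(
        BASE_DELAY_MS[path] + PENALTY_MS_PER_EXTRA_FLOW * (load[path] - 1)
        for path in placement
    )


def best_placement(flows, paths):
    """Try every assignment of flows to paths and keep the lowest-delay one."""
    candidates = product(paths, repeat=len(flows))
    best = min(candidates, key=measure_delay)
    return dict(zip(flows, best)), measure_delay(best)


# Two flows from M toward B, two candidate paths at K: four combinations.
placement, delay = best_placement(["flow1", "flow2"], ["G", "H"])
print(placement, delay)
```

Exhaustive search is only workable here because two flows on two paths give four combinations; the count grows as paths to the power of flows, which is one reason the approach is harder at scale.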
In moving through the combinations, the SDN controller can use the instrumentation at the four overlay routers to determine which of these possible combinations provides a traffic pattern with the lowest delay through the network. This work provides what is perhaps one of the best available use cases for SDN and machine learning in network operations. The use case is solvable at least in a trivial case, yet it is possible to imagine a real-world deployment of this kind of technology to solve specific problems.
No work of this kind, however, is without its problems; if you haven't found the tradeoff, then you haven't looked hard enough. It's important to look at the challenges this kind of work is going to face before it can be deployed in a meaningful way in large-scale fabrics.
SDN and machine learning tradeoffs
First, there is an underlying assumption about the system's ability to measure at every edge of the overlay -- to consume all of this data and derive meaning from it. But at scale, this is not as easy a problem to solve as it might initially seem. For instance, many hyperscale fabrics carry terabits of data every day. Collecting information on this number of flows will be challenging. In fact, it's likely such a system would require a high-speed management network just to carry the network telemetry. Processing this amount of information in near-real time so it is useful for adjusting the flow of traffic through the network will be difficult.
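To get a feel for why a dedicated management network might be needed, a back-of-the-envelope estimate helps. The sketch below is illustrative only: the flow rate, records per flow and record size are assumed inputs chosen for the arithmetic, not figures from the paper or from any real fabric.

```python
def telemetry_bits_per_second(flows_per_second, records_per_flow, bytes_per_record):
    """Estimate the telemetry load generated by exporting per-flow records."""
    return flows_per_second * records_per_flow * bytes_per_record * 8


# Assumed inputs (hypothetical): one million new flows per second,
# two exported records per flow, 64 bytes per record.
rate = telemetry_bits_per_second(
    flows_per_second=1_000_000,
    records_per_flow=2,
    bytes_per_record=64,
)
print(f"{rate / 1e9:.2f} Gbit/s of telemetry")
```

Under these assumptions the telemetry alone is on the order of a gigabit per second, before any processing; different inputs will move the number substantially in either direction.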
Second, many of the flows in a large-scale network are mouse flows, or microflows. Many applications will generate mouse and elephant flows with different characteristics. Mouse flows are almost always going to be too short-lived to usefully characterize on a per-application basis using any form of machine learning. It is tempting to classify all mouse flows as a single thing, but each application may perform in a completely different way and treat its mouse flows differently. This will probably be a difficult problem to solve.
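One way to avoid treating all mouse flows as a single thing is to keep them bucketed by application even when each flow is too short to characterize individually. This is a minimal sketch under stated assumptions: the `Flow` record, the `split_flows` helper and the one-megabyte threshold are hypothetical, and real classifiers typically use more than byte counts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    app: str          # application label, assumed to be known for each flow
    bytes_sent: int   # total bytes observed for the flow


# Hypothetical cutoff between mouse and elephant flows.
ELEPHANT_THRESHOLD_BYTES = 1_000_000


def split_flows(flows):
    """Separate elephants from mice, keeping mice grouped per application."""
    mice_by_app = defaultdict(list)
    elephants = []
    for flow in flows:
        if flow.bytes_sent >= ELEPHANT_THRESHOLD_BYTES:
            elephants.append(flow)
        else:
            mice_by_app[flow.app].append(flow)
    return dict(mice_by_app), elephants


sample = [
    Flow("web", 4_000),
    Flow("web", 9_500),
    Flow("backup", 50_000_000),
    Flow("rpc", 1_200),
]
mice, elephants = split_flows(sample)
print(sorted(mice), [f.app for f in elephants])
```

Grouping this way only gives aggregate statistics per application; it does not solve the underlying problem that any single mouse flow ends before much can be learned from it.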
Third, thousands of applications can run across a single fabric, each with different requirements. These requirements must be expressed in a way the SDN controller will understand. The process of collating and interpreting this information will be enormous in its own right.
Fourth, the characteristics of any application are going to change over time. Machine learning must constantly be trained on new data sets, but these data sets are modified by the application of older sets of rules. This interaction may set up a situation where it is difficult to truly understand the native flow of the traffic through the network and hence to continue the learning process.
Fifth, it is difficult to see how network failures can easily be accounted for in this kind of system.
Overall, this is interesting work, but SDN and machine learning -- and artificial intelligence, in general -- have many large hurdles to jump over before they will be useful for solving broad problems in the network engineering world.