I'll admit it: I'm a space geek. Worse, I'm also an occasional pilot and a licensed dispatcher, and I started my IT career with an airline. I watched the 1970s Apollo flights as a kid and, like more than a few engineers, have followed the technology shift from Wernher von Braun's marvelous, hand-welded, pyro-mechanical giants to the smaller, digital, far more complex space systems of today.
Having recently watched the first flight of the new Orion Multi-Purpose Crew Vehicle, after seeing its first scheduled launch get scrubbed, I realized that SDN and programmable networks will face the same challenges as NASA's launch system. We'll depend on network automation, but we'll sometimes need human intervention. The big difference is that enterprise IT must provide constant uptime, with very little opportunity to scrub an effort.
Orion's primary communication system is Ethernet
The central on-board data network of Orion is Ethernet. Let that sink in for a minute. Life support, navigation, thruster firing, communications and inter-module coordination -- everything -- is interconnected with time-triggered Gigabit Ethernet. True, the network is a proprietary implementation of IEEE 802.3, but it is compatible(ish), and it adds a guaranteed-delivery message class. It's also a thousand times faster than what's on the International Space Station, and it includes radiation-hardened ASICs, which seems like something every core switch should have anyway.
Most interestingly, just like modern enterprise networks, the Orion network concentrates multiple, once-disparate services onto a single circuit. It mixes critical command traffic, best-effort file copy, and intermediate -- though latency- and jitter-sensitive -- audio and video. Unlike enterprise networks, the engineers in charge of the Orion network are a little more careful to maintain plenty of bandwidth, with prioritization in silicon rather than depending on router QoS policy maps.
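To make the prioritization idea concrete, here is a minimal strict-priority queuing sketch across the three kinds of traffic the paragraph names. It is a generic illustration under stated assumptions, not Orion's actual time-triggered scheduler; the class names and priority numbers are hypothetical.

```python
from collections import deque

# Hypothetical traffic classes; a lower number means a higher priority.
PRIORITIES = {"command": 0, "av": 1, "best_effort": 2}


class StrictPriorityScheduler:
    """Always transmit from the highest-priority queue that has a frame waiting."""

    def __init__(self):
        self.queues = {cls: deque() for cls in PRIORITIES}

    def enqueue(self, traffic_class, frame):
        self.queues[traffic_class].append(frame)

    def dequeue(self):
        # Walk the queues in priority order; return None when every queue is empty.
        for cls in sorted(PRIORITIES, key=PRIORITIES.get):
            if self.queues[cls]:
                return cls, self.queues[cls].popleft()
        return None


sched = StrictPriorityScheduler()
sched.enqueue("best_effort", "file chunk 1")
sched.enqueue("av", "audio frame 1")
sched.enqueue("command", "fire thruster 3")
order = [sched.dequeue()[0] for _ in range(3)]
print(order)  # ['command', 'av', 'best_effort']
```

One design consequence shows up quickly: under sustained high-priority load, strict priority can starve the best-effort queue entirely, which is one reason generous bandwidth headroom matters as much as the scheduler itself.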
Autonomous, software-defined decision-making
During the first Orion launch attempt, momentary wind gusts automatically halted the initial launch sequence twice, in the middle of the countdown. This prompted the flight director to call the dreaded "Abort! Abort! Abort!" The team then made a telltale decision -- it disabled the automated wind gust sensor and reverted to an Apollo-era control: a human watching wind gauges for velocity and azimuth.
When we look at SDN and network automation, we expect similar problems to arise. We hope the future will deliver autonomous decision-making systems, but we also know we'll require human intervention under certain circumstances.
IT, however, faces a challenge that NASA doesn't share. If NASA doesn't have a backup system, it can simply postpone a launch. Enterprise IT can't do that; we're always under pressure to put uptime above all else. Unfortunately, security is usually the network quality that suffers most when decisions are made quickly under pressure.
In the case of Orion's first launch attempt, a seasoned flight director stepped in to monitor a single go/no-go countdown item, bringing years of experience in watching a single critical value for SLA excursions. But with SDN, we won't have enough flight directors to handle 100, 1,000 or 10,000 potential access policy conflicts per minute in highly virtualized, converged networks. We'll define service delivery requirements as part of our applications, place security access rules in our SDN controllers, and then step back and let the self-configuring network sort itself out.
Houston, we've had a network management problem for some time
The challenge for us is that our current approach to management and troubleshooting won't work if the network operations center (NOC) console glows red with hundreds of policy collisions that pop in and out of existence like top quarks. Imagine today's applications vMotioning not just from ESX host to ESX host within a cluster, but packaged for transport as Docker containers and moving autonomously from Amazon, to your onsite datacenter, to Azure, day in and day out, by the hundreds. These instances will regularly collide with defined access policies as our controllers dynamically seek ever more efficient and cost-effective configurations. To cope, we're going to have to learn to teach the network itself.
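A toy version of the collision check a controller would have to run every time a workload moves might look like the sketch below. It assumes a hypothetical, simplified policy model (per-location allow/block rules keyed only on destination port, with default deny); real SDN controllers use far richer policy languages, and all names here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    location: str  # hypothetical location labels, e.g. "aws", "onsite", "azure"
    dst_port: int
    action: str    # "allow" or "block"


@dataclass
class Workload:
    name: str
    location: str
    required_ports: set


def collisions(workload, target, policies):
    """Return the required ports that would be denied if the workload moved to target."""
    allowed = {p.dst_port for p in policies if p.location == target and p.action == "allow"}
    blocked = {p.dst_port for p in policies if p.location == target and p.action == "block"}
    # Default deny: a port collides if it is explicitly blocked (block wins over
    # allow) or simply not allowed at the target location.
    return {port for port in workload.required_ports
            if port in blocked or port not in allowed}


policies = [
    Policy("onsite", 443, "allow"),
    Policy("onsite", 5432, "allow"),
    Policy("azure", 443, "allow"),
    Policy("azure", 5432, "block"),
]
app = Workload("billing-api", "onsite", {443, 5432})
print(sorted(collisions(app, "azure", policies)))   # [5432]
print(sorted(collisions(app, "onsite", policies)))  # []
```

Multiply that check by hundreds of moving instances and constantly shifting policies, and the NOC console problem above becomes easy to picture.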
There's a lesson here from NASA, too. Whether it's a mission to Mars, roughly 20 light-minutes away, or a probe four light-hours out in the Kuiper Belt, spacecraft have had to execute autonomously for decades. Thirty years ago, that mostly came down to timers and well-informed guessing; now probes use a little machine learning and a surprising number of pre-programmed policy definitions to make weighted what-if decisions without help from the ground.
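The "weighted what-if" idea can be sketched as scoring each candidate action against pre-programmed criteria and falling back to a human when no option scores well enough. The criteria, weights, option names and threshold below are hypothetical placeholders, not anything NASA or a specific SDN controller actually uses.

```python
# Hypothetical pre-programmed weights; each criterion is scored 0..1 (higher is better).
WEIGHTS = {"latency": 0.5, "cost": 0.3, "risk": 0.2}


def score(option: dict) -> float:
    """Weighted sum of an option's normalized criteria."""
    return sum(weight * option[criterion] for criterion, weight in WEIGHTS.items())


def decide(options: dict, min_score: float = 0.6) -> str:
    """Pick the best-scoring option, or escalate if even the best isn't good enough."""
    best = max(options, key=lambda name: score(options[name]))
    if score(options[best]) < min_score:
        return "escalate_to_human"
    return best


options = {
    "stay_onsite":   {"latency": 0.9, "cost": 0.4, "risk": 0.9},
    "move_to_azure": {"latency": 0.6, "cost": 0.8, "risk": 0.5},
}
print(decide(options))  # stay_onsite (scores roughly 0.75 vs. 0.64)
```

The escalation branch is the software equivalent of the flight director taking over the wind-gust call: automation handles the routine cases and hands off the ambiguous ones.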
As administrators, we're going to have to learn how to convert our deep knowledge of networking, and our expertise with individual networks, into intelligent rules, not just static configurations. Allow-and-block must become allow-if and block-when. QoS classes will need to become aware of time and congestion. Our network performance monitoring systems will need to do more than watch an IP range; they'll need to discover devices automatically based on the traffic they emit.
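Here's one way "allow-if and block-when" rules and a congestion-aware QoS class might be expressed. It's a minimal sketch under assumptions: the Context fields, device types, business-hours window and 80% congestion threshold are all hypothetical, and the device_type field stands in for a traffic-based device classifier that isn't shown.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List


@dataclass
class Context:
    """Conditions at the moment a flow is evaluated (all fields are hypothetical)."""
    now: time                # local time of day
    link_utilization: float  # 0.0 (idle) to 1.0 (saturated)
    device_type: str         # assumed to be inferred from the traffic a device emits


@dataclass
class Rule:
    name: str
    action: str                         # "allow" or "block"
    applies: Callable[[Context], bool]  # the "if" / "when" part of the rule


def business_hours(c: Context) -> bool:
    return time(8, 0) <= c.now < time(18, 0)


RULES: List[Rule] = [
    # block-when: bulk backup traffic is blocked during business hours.
    Rule("block-backup-daytime", "block",
         lambda c: c.device_type == "backup-server" and business_hours(c)),
    # allow-if: reached only outside business hours, because the rule above matches first.
    Rule("allow-backup-night", "allow", lambda c: c.device_type == "backup-server"),
    # allow-if: IP phones are always allowed.
    Rule("allow-phones", "allow", lambda c: c.device_type == "ip-phone"),
]


def evaluate(ctx: Context, rules: List[Rule] = RULES, default: str = "block") -> str:
    """First matching rule wins; unmatched traffic falls through to the default."""
    for rule in rules:
        if rule.applies(ctx):
            return rule.action
    return default


def qos_class(ctx: Context) -> str:
    """Time- and congestion-aware QoS: demote bulk traffic when the link is busy."""
    if ctx.device_type == "ip-phone":
        return "voice"
    if ctx.link_utilization > 0.8 or business_hours(ctx):
        return "scavenger"
    return "best-effort"


day = Context(now=time(10, 30), link_utilization=0.5, device_type="backup-server")
night = Context(now=time(2, 0), link_utilization=0.9, device_type="backup-server")
print(evaluate(day), evaluate(night))  # block allow
print(qos_class(night))                # scavenger: the link is 90% utilized
```

Keeping rule order explicit (first match wins) makes the allow-if/block-when interplay easy to reason about, at the cost of requiring careful ordering as the rule set grows.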
And if NASA can do it...
NASA's bureaucracy should give us hope. NASA is staffed by some of the smartest engineers in the world, but they use their passion for discovery to push through more red tape than any of us deal with at the office. And still, they manage to pull off spectacular feats of engineering that are remarkably free of defects. With less overhead, occasionally enlightened management and the right tools, SDN will allow admins to give previously dumb, config-based networks real intelligence. That said, we might need at least a little radiation-hardened Ethernet here and there.