Is NFV OpenStack’s Last Line of Defense?

In the last few months, OpenStack has been under pressure on the enterprise side. VMware is proving to still be the preferred choice for “cloud-hosted” apps (legacy apps that are simply hosted in a cloud), and public clouds are winning big on “cloud-optimized” apps (somewhat optimized for the cloud) and “cloud-native” apps (built for the cloud, i.e. microservices etc.). Whatever action there was for OpenStack is also under attack from native container frameworks such as Kubernetes and Mesos. I'm not saying OpenStack won't have any place in the enterprise, simply that it's going to be quite a bit smaller than initially expected.

However, one use case where OpenStack is thriving is NFV (Network Functions Virtualization). Open source projects such as OPNFV use OpenStack as the exclusive VIM+NFVI component (VIM = virtualized infrastructure manager, aka cloud software, and NFVI = NFV infrastructure, i.e. the hypervisor + virtual storage + virtual networking layers) along with a few SDN controllers. (Strictly speaking, OpenStack is a VIM, but it's often packaged with NFVI components, making it a unified stack.)

Despite the seeming inevitability of OpenStack in NFV, I think the situation is more fragile than people think. Just as in the enterprise, VMware has locked up “cloud-hosted” VNFs (virtual network functions, i.e. the virtual versions of physical networking boxes). OpenStack has a really good shot at winning “cloud-optimized” and “cloud-native” VNFs if, and only if, the community and the involved vendors solve a few key problems:

Performance
Improvements to the scheduler
NFV-centric functionality
Ease-of-use

Performance

A lot of people forget that NFV is first and foremost a networking workload meant to process packets, so performance is a primary concern. For instance, NFV is of no use if it takes 10 industry-standard servers running virtual machines to match the performance of 1 physical networking box. Performance affects many layers of the stack. While there has been a concerted focus on technologies such as DPDK and real-time KVM, I think this topic can go a lot broader: hardware offload of OVS/vRouter, security functions and network probing, and the use of shared memory instead of virtual switching for service-function chaining. Hardware offload could be implemented in either smartNICs or FPGAs. Net-net, OpenStack could place a lot more emphasis on performance.

Improvements to the scheduler

OpenStack Nova, arguably the essence of what makes OpenStack OpenStack, will be a decade-old technology by the time NFV takes off. In itself there's nothing wrong with that, but Nova needs significant investment in functionality (or needs to be replaced entirely by a different scheduler), with features in the following areas:

Support for containers in addition to VMs. This is different from the enterprise use case, where you could have two availability zones, one for VMs and one for containers. Here, you could literally have one VNF packaged as a container and another as a VM, with both part of the same service chain. So the container and the VM may have to be in the same availability zone or even on the same node, and share the same SDN and other infrastructure.
Support for alternative scheduling methods, e.g. event-driven (also called serverless), will be required over time. AWS Lambda has unleashed a powerful new way of maximizing hardware utilization by running a piece of code only when required, triggered by an event. A scheduler such as this could improve hardware utilization by 10x for NFV use cases such as vCPE.
Support for distributed NFV (also called fog or edge computing). vCPE with thin clients already requires distributed NFV. With 5G, distributed computing will become pervasive and will need to be supported by the scheduler.
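
To make the event-driven utilization argument concrete, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not a measurement: the function names are made up, the 1,000 vCPE functions and the 10% duty cycle are hypothetical inputs, and the model ignores cold-start latency and other real-world overheads. It simply compares the number of compute slots needed when every function gets a dedicated, always-on slot against the number needed when functions occupy a slot only while handling an event.

```python
import math


def always_on_slots(num_functions: int) -> int:
    """Slots needed when each function holds a dedicated slot, busy or idle."""
    return num_functions


def event_driven_slots(num_functions: int, duty_cycle: float,
                       headroom: float = 1.0) -> int:
    """Slots needed when functions run only while handling events.

    duty_cycle: assumed fraction of time each function is actually busy.
    headroom:   assumed multiplier to absorb bursts (1.0 means none).
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    if headroom < 1.0:
        raise ValueError("headroom must be >= 1.0")
    # Round first so floating-point noise cannot push ceil() up by one.
    needed = round(num_functions * duty_cycle * headroom, 9)
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    n = 1000  # hypothetical number of vCPE functions
    dedicated = always_on_slots(n)
    shared = event_driven_slots(n, duty_cycle=0.1)
    print(f"always-on: {dedicated} slots, event-driven: {shared} slots, "
          f"ratio: {dedicated / shared:.1f}x")
```

Under these assumptions the ratio is simply the inverse of the duty cycle, which is where a figure like 10x can come from. Adding burst headroom (say, headroom=1.5) shrinks the gain accordingly.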

NFV-centric functionality

By virtue of being different from the enterprise use case, NFV has some unique requirements. These include service chaining, service assurance, distributed NFV, a smaller footprint (each cluster could be just a few nodes), higher levels of availability etc. Unless the community embraces relevant NFV projects (e.g. Vitrage, Gluon, Mistral, Congress, Blazar, Kingbird, Tricircle), rather than considering them extraneous, it will have a tough time convincing NFV users that it really cares about this use case.

Ease-of-use

It has been almost 7 years since OpenStack was kicked off (June 2010), and “day-2” tasks such as post-deployment configuration changes, monitoring and troubleshooting, and updates and upgrades continue to be extremely challenging. OpenStack vendors have not adequately invested in these areas, making the technology difficult to use. With new cloud-native approaches, where OpenStack will be containerized and orchestrated by Kubernetes (see approaches by Mirantis, Kolla-Kubernetes and CoreOS Stackanetes), these “day-2” challenges should, one hopes, get a lot simpler. OpenStack expertise has also been hard to come by, making OpenStack hard to consume. Vendors do offer short courses on OpenStack, but these courses are simply inadequate to create a true OpenStack IT/OPS expert. A 2-month bootcamp instead of a 5-day class might yield better results!

Net-net, NFV is OpenStack’s last line of defense. The sooner the community realizes this and throws its entire weight behind this use case, the better OpenStack’s prospects will be.
