Adam Dunstan

System, Platform & Infrastructure Engineering

Adopting White Box Switches - 3 key considerations

May 31, 2018

White Box Switching (wbS) is radically changing the network equipment ecosystem, providing the opportunity to reduce port costs by two orders of magnitude. At my previous employer, I managed a project where we went “in deep”: sourcing hardware from Original Design Manufacturers (ODMs), working with the switch silicon vendors and developing our own in-house Network Operating System. It’s a significant undertaking that, in the right circumstances, can yield huge benefits, but you don’t need to jump into the deep end to benefit from wbS.

The foundation of wbS is volume merchant switch silicon. Traditional vendors have used a combination of merchant silicon and their own custom ASICs in their switching product portfolios, with merchant silicon usually appearing in lower-cost, less complex products. However, the development of merchant silicon has continued to narrow the performance and functionality gap between custom and merchant silicon, a fact noticed by webscale and cloud services companies as well as new switching vendors.

The release of standardized open source hardware designs by the Open Compute Project, initiated by Facebook, was the catalyst that mainstreamed wbS. These designs, based upon the most common merchant silicon and the most common form factors, provided readily available high-performance hardware platforms separated from software. They also gave ODMs a vehicle to begin marketing their hardware directly and allowed new unbundled Network Operating Systems to emerge.

Before adopting wbS, there are three questions to answer.

1. Why am I considering wbS? Either I can get the functionality I need from an OEM but I’m looking to reduce costs, or I cannot get the functionality I need from an OEM and I want to reduce costs.

2. What skills do I have? Where do I fit on this scale: we run a system but do little development; we develop web and business applications, but mostly with high-level programming; or we have some low-level developers competent in C and hardware.

3. What do the numbers look like? Develop a baseline by taking the “Present Mode of Operation”, including hardware and all services, over the equipment amortization period that supports your business needs (not a period based upon the expense of the hardware). With the baseline in hand, calculate the equipment, software, services and staff costs of the wbS alternative over the same amortization period and compare the results.

The answers to these three questions should determine how shallow or deep you should dive into the wbS ecosystem.
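The comparison in the third question can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the cost categories follow the text, while the class name, function name and every figure are placeholders I have made up, not real prices.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Costs summed over one amortization period (all values hypothetical)."""
    hardware: float
    software: float
    services: float
    staff: float

    def total(self) -> float:
        # Total cost of this option over the amortization period.
        return self.hardware + self.software + self.services + self.staff


def savings(baseline: CostModel, candidate: CostModel) -> float:
    """Positive when the candidate costs less than the baseline."""
    return baseline.total() - candidate.total()


# Placeholder figures only. With a packaged OEM product the software cost
# is bundled into the hardware price, so it is shown here as zero.
present_mode = CostModel(hardware=1_000_000, software=0,
                         services=300_000, staff=500_000)
wbs_option = CostModel(hardware=400_000, software=200_000,
                       services=250_000, staff=650_000)

print(savings(present_mode, wbs_option))
```

Note that the staff line goes up in the wbS option in this made-up example. Capital savings on hardware can be partly offset by the skills you need to add, which is why both sides must be measured over the same period.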

White box switches consist of a number of key components: switch silicon, ODMs, software and ongoing services.

Switch Silicon

Switch silicon is the foundation, and it falls into two broad categories: programmed and programmable. Programmed silicon contains a structure of forwarding functions defined in silicon, while programmable silicon, historically called Network Processors, executes specialist programs developed to provide forwarding functions. Both are manipulated via a software hardware control layer. The programmed vs programmable distinction is important as it drives functionality, flexibility and, importantly, costs. While programmable silicon sounds like the obvious choice, programming it is a specialist task even with newer, more accessible mechanisms such as P4 and DPDK, and as most functionality is available in programmed silicon, programmed silicon is more commonly used.

In many cases programmed silicon is targeted at a specific use case and therefore often has more integrated components. The switch silicon determines the port counts and speeds of the switch, and is combined with a varying number of other components necessary to build the switch; more components generally result in higher costs. While the switching silicon provides the primary function of packet forwarding, a number of other mundane functions are necessary for an operational switch: power control, fans, physical layer controls and LEDs. These require control via a hardware API and usually differ between ODMs even when the switch silicon is identical. The other important component is the control board, which is somewhat more straightforward: control boards are derivatives of general-purpose compute designs, and their connectivity to the underlying switching components is largely defined by the switch silicon provider. Importantly, volume drives form factors, and a fixed-function 1U switch is less expensive than a chassis-based one.

Original Design Manufacturers

ODMs assemble the switch for traditional vendors (OEMs), working from the OEMs’ own designs, derivatives of the silicon vendors’ reference designs and other sources such as the Open Compute Project. ODMs have in-house hardware design expertise that is used by OEMs and that the ODMs also leverage to create their own switches. Purchasing directly from an ODM is different from purchasing from an OEM. ODMs primarily manage materials and production lines, so volume expectations and lead times are quite different, as are the commercial terms and services offered. Some ODMs also operate an OEM business, usually under different branding, providing a purchasing alternative for smaller volumes and access to warranty services similar to those of OEMs. However, as they do not package the complete solution, software is still required from another source.

Switch Software

Often incorrectly referred to as a Network Operating System, switch software is the final component needed to create an operating network switch and is more accurately described as a set of switch applications. Decomposed, the software consists of an operating system (often Linux based), a silicon/hardware control layer, and control plane applications combined with a management interface/API for configuring and controlling the device. In the case of programmed silicon the switch software runs on the control board; with programmable silicon, code runs on both the switch silicon and the control board. While older OEM switch software often used VxWorks, today Linux dominates as the control board operating system, in most cases with little modification.

Arguably the most important software layer is the control layer. Using SDKs provided by the hardware vendors, both switch silicon and ODM, it provides the hardware abstraction used to program the switch silicon and other hardware functions. The process is similar with both programmed and programmable silicon; however, with programmable silicon the contents of the API are defined by the programs running on the silicon. With programmed silicon the APIs are a function of the silicon, but they are not necessarily simpler. The programmed silicon used in common wbS has a significant amount of functionality and flexibility as to how that functionality is employed. When adding new functionality, it is not uncommon for programmed silicon to offer multiple alternatives for implementation, and to need to consult the silicon supplier’s developers to find the best method. Therefore, while the switch silicon determines functionality, even with programmed silicon the functionality and capabilities enabled by switch software implementations from different software sources can vary greatly. It is also important not to forget the need to control other switch functions using the API provided by the ODM. This includes setting PHY controls and mundane functions such as monitoring temperature to drive power supply and fan controls. Most importantly, this software layer determines which ODM products are supported by the software: using the same switch silicon does not make ODM hardware compatible. The bottom line is that switch software is not portable across ODM vendors.
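To illustrate why the same switch software cannot simply be moved between ODM platforms, here is a minimal Python sketch of a platform abstraction. Everything in it is hypothetical: the interface, the two "ODM" classes, the register scaling and the temperature thresholds are invented for illustration and do not correspond to any real ODM or silicon vendor SDK.

```python
from abc import ABC, abstractmethod


class PlatformDriver(ABC):
    """Hypothetical abstraction over an ODM's hardware API."""

    @abstractmethod
    def read_temperature_c(self) -> float: ...

    @abstractmethod
    def set_fan_duty(self, percent: int) -> None: ...


class OdmAPlatform(PlatformDriver):
    """Imaginary ODM: millidegree sensor, 8-bit fan register."""

    def __init__(self) -> None:
        self.sensor_millidegrees = 45_000
        self.fan_register = 0

    def read_temperature_c(self) -> float:
        return self.sensor_millidegrees / 1000

    def set_fan_duty(self, percent: int) -> None:
        # Scale 0-100 percent onto a 0-255 register value.
        self.fan_register = round(percent * 255 / 100)


class OdmBPlatform(PlatformDriver):
    """Imaginary ODM: Celsius sensor, fan set directly in percent."""

    def __init__(self) -> None:
        self.sensor_celsius = 45.0
        self.fan_percent = 0

    def read_temperature_c(self) -> float:
        return self.sensor_celsius

    def set_fan_duty(self, percent: int) -> None:
        self.fan_percent = percent


def fan_duty_for(temp_c: float) -> int:
    """Illustrative thresholds only, not taken from any real platform."""
    if temp_c < 40:
        return 30
    if temp_c < 60:
        return 60
    return 100


def control_fans(platform: PlatformDriver) -> int:
    """Platform-independent policy; the driver hides the ODM differences."""
    duty = fan_duty_for(platform.read_temperature_c())
    platform.set_fan_duty(duty)
    return duty
```

The fan policy is written once, but each ODM platform still needs its own driver behind the interface. A switch software vendor supports a given ODM product only once that driver exists and has been tested, regardless of which switch silicon is on the board.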

The switch control layer has historically been hidden from end users, with most of the focus on control plane protocols and management tools. These control protocols are available from many sources, both proprietary and open source, and as the protocols are mature most sources are good. When evaluating source code, there is one specific area worth focusing on to gain insight into the code, because it’s an area where there is no perfect or standardized solution: the code that reduces the Routing Information Base (RIB) to the Forwarding Information Base (FIB). Examine how it operates when large numbers of routes are being added and removed, such as when route sources are flapping. This also provides an insight into the structure of the code. Code that was originally developed to be sold as source code usually looks very different from code that was developed to be part of a packaged platform and never seen by customers. They all start the same, but over time the packaged platform code often becomes “impenetrable”.
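As a rough picture of what the RIB-to-FIB reduction does, the Python sketch below keeps every candidate route per prefix in the RIB and writes to the FIB only when the best next hop actually changes. It is a deliberately simplified illustration, not how any particular routing stack implements it: the class and field names are my own, the preference rule (lowest distance, then lowest metric) is an assumption, and real implementations also deal with ECMP, recursive next hops, batching and hardware table limits.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    source: str      # e.g. "static" or "bgp"
    distance: int    # preference between sources, lower wins (assumed rule)
    metric: int      # tie-breaker, lower wins
    next_hop: str


class Rib:
    def __init__(self) -> None:
        self.candidates = defaultdict(dict)  # prefix -> {source: Route}
        self.fib = {}                        # prefix -> chosen next hop
        self.fib_writes = 0                  # number of FIB updates made

    def _best(self, prefix: str) -> Optional[Route]:
        routes = self.candidates.get(prefix)
        if not routes:
            return None
        return min(routes.values(), key=lambda r: (r.distance, r.metric))

    def _sync(self, prefix: str) -> None:
        best = self._best(prefix)
        new_hop = best.next_hop if best else None
        if self.fib.get(prefix) == new_hop:
            return  # best path unchanged, so the FIB is not touched
        if new_hop is None:
            del self.fib[prefix]
        else:
            self.fib[prefix] = new_hop
        self.fib_writes += 1

    def add(self, route: Route) -> None:
        self.candidates[route.prefix][route.source] = route
        self._sync(route.prefix)

    def withdraw(self, prefix: str, source: str) -> None:
        self.candidates[prefix].pop(source, None)
        self._sync(prefix)


rib = Rib()
rib.add(Route("10.1.0.0/16", "static", 1, 0, "192.0.2.1"))
rib.add(Route("10.1.0.0/16", "bgp", 20, 0, "192.0.2.2"))
for _ in range(100):  # a flapping, less-preferred source
    rib.withdraw("10.1.0.0/16", "bgp")
    rib.add(Route("10.1.0.0/16", "bgp", 20, 0, "192.0.2.2"))
```

In this toy case the flapping source never becomes the best path, so the FIB is written once no matter how often it flaps. When reading real code, the interesting question is what happens when the flapping source is the best path and thousands of prefixes change at once.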

When you consider switch control functions as applications running on standard hardware, a different model of management emerges. You can choose to manage the switch using the traditional tools used in the past, or manage it using server management tools, integrating switch management into overall server/infrastructure management. In cases where operating system access is unrestricted, your standard suite of server management tools can be installed. This level of flexibility will increasingly be supported by OEMs and switch software vendors, as it provides one of the simplest and most powerful ways to benefit from a switch upgrade.

So which end of the pool is right for you?

Shallow. Getting the “lion’s share” of the first order of magnitude is relatively easy and would be the prudent choice if your answer to question one was simply a cost saving. Because existing OEMs use the same switch silicon as ODMs, you can figure out which switches should be the least expensive by enquiring as to the switch silicon used. Compare against a similar wbS from one of the hardware-only wbS vendors, and the value of hardware vs software can be derived, providing an appropriate negotiating position. By focusing only on switches using volume merchant silicon, capital savings can be achieved.

Middle ground. Your infrastructure would benefit from greater integration of networking with the overall operational model, you’re willing to do some development, you would like to save capital along the way and you’re willing to engage a new vendor/platform. This is a great place to be. There are a number of new and emerging vendors with solutions that range from OEM products with modern configuration and management APIs through to solutions where software and hardware are procured from different sources, with the software provider taking care of switch hardware interoperability. In most cases, the “decoupled solutions” offer access to the underlying operating system, providing the ability to use server-style management tools to integrate switching with server infrastructure. This level of integration does not require low-level network equipment development skill, but can provide significant operational savings. Further, if a decoupled solution is selected, your volume exceeds ~1000 switches annually and you can warehouse and/or plan your deployment, you should be able to get the attention of the ODMs and negotiate directly to get increased capital savings.

In Deep. The deep end means putting the switch together yourself. This requires specialist, low-level skills: irrespective of whether you purchase source code, as many OEMs do, or use publicly available hardware control layer code such as SONiC, you will need to understand the silicon, and silicon programming is difficult. It is a complex undertaking requiring specialist technical and business skill. However, at this end of the spectrum, any modification to the system that can improve operations or provide competitive advantage is possible. Your network infrastructure can have capabilities unique to your business requirements. However, unless your required switch volume exceeds ~1000 switches annually and you have the attendant skills, this is usually impractical.

Ongoing services

Irrespective of your chosen strategy, there is a need for ongoing services. These range from hardware RMA to handling outages and bugs. Selecting a strategy that keeps the current OEM context, where a single packaged product is procured, allows the present operating model to remain. However, when the switch components are disaggregated, the service obligations are disaggregated too. It is not commonly known, but ODMs often provide RMA services on the products they build for OEMs, and in many cases the OEM never touches the product; a wbS provider can therefore often handle basic warranty and returns. Where disaggregated packaged software is used, that vendor provides services based upon the use of an approved wbS and also handles the interaction with the silicon vendor, creating a straightforward separation of responsibilities. It gets significantly more complex when developing your own software: you will need to interact with the silicon vendor and provide support yourself. Add the complexity of procuring commercial source code to help the development and a complex structure of commercial terms and responsibilities emerges, one that will most likely require new contracts and that only makes sense at volume.

So how much ROI is enough to make this worthwhile? I took the position that anything more than break-even was sufficient, because inherent in the adoption of new switching technology is a move to a new, more automated operating model. A new automated operating model can remove significant operating costs in a large-scale network while reducing service delivery times and reducing configuration errors by removing human intervention. Ultimately, productivity and removing impediments to business growth should be the goal of a wbS program; the capital savings from wbS hardware can help pay for it.

Note to reader. I could have named example vendors throughout, but positioning vendors when I am not party to their strategy is something I would rather avoid. Instead, I thought it would be useful to provide a list, in alphabetical order, of the vendors I was thinking about as I wrote this, as a place to start; it is by no means complete. They include: Agema/Delta Electronics, Arista, Barefoot Networks, Broadcom, Cisco Systems, Cumulus Networks, Edgecore/Accton, IP Infusion, Juniper, Mellanox, Metaswitch. There are a couple of open source projects as well: Free Range Routing (frrouting.org), P4 (p4.org) and OCP Networking (www.opencompute.org).