A general rant on Qfabric and other related technologies …
“Keep It Simple Stupid”: how many times have you used that phrase? How many times have you applied that principle? As we step into the world of the next-generation data center I try to keep this simple principle at the forefront of my thought process. With that said, can Juniper Qfabric untangle current data center quandaries and simplify our lives?
Before we begin I am going to assume that you understand the basics of data center fabric technologies, specifically Juniper Qfabric. If you are not familiar with Qfabric, I suggest you first check out Packet Pushers podcast show 51:
When I first heard Juniper’s Qfabric marchitecture pitch I immediately thought to myself: why would I want to totally flatten my network to begin with? Within a given module it certainly could improve performance across very large environments. And yes, there are certain potential advantages for large multi-client provider environments as well.
The simple fact is that it looks like you will need something in the ballpark of 300 10GbE-attached blade center/hypervisor nodes requiring direct connectivity to make effective use of a full Qfabric deployment and make the ROI numbers work out to something reasonable. We still do not have hard numbers on the total cost of the solution, so this is speculative at best. If this doesn’t sound like a fit for your environment you can quit reading here; however, there is much more to the story, so feel free to read on.
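To make the ROI intuition concrete, here is a back-of-the-envelope sketch. Every number below is an invented placeholder (as noted, we have no hard pricing), chosen only to show the shape of the trade-off: a fabric swaps a large fixed cost for a lower incremental cost per attached node, so it only pays off past some node count.

```python
# Hypothetical break-even sketch. All dollar figures are made-up
# placeholders, NOT real Juniper or competitor prices; they are tuned
# only to illustrate the ~300-node ballpark discussed above.

TRAD_COST_PER_NODE = 3_000    # hypothetical all-in cost per 10GbE port, two-tier build
FABRIC_FIXED_COST = 600_000   # hypothetical director + interconnect cost
FABRIC_COST_PER_NODE = 1_000  # hypothetical incremental edge-node cost

def cost_traditional(nodes: int) -> int:
    # Two-tier build scales roughly linearly with attached nodes.
    return nodes * TRAD_COST_PER_NODE

def cost_fabric(nodes: int) -> int:
    # Fabric build: big fixed cost up front, cheaper per added node.
    return FABRIC_FIXED_COST + nodes * FABRIC_COST_PER_NODE

# First node count at which the fabric's fixed cost is amortized away.
break_even = next(n for n in range(1, 10_000)
                  if cost_fabric(n) <= cost_traditional(n))
print(break_even)  # 300 with these placeholder numbers
```

Change any of the placeholder inputs and the break-even point moves accordingly, which is exactly why hard pricing matters before drawing conclusions.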
What is the ‘fabric’ buzz anyway?
Virtual cloud fabric unicorn tears for everyone! As a general definition, when I say ‘fabric extension’ I am speaking of extending the internal switch fabric/backplane switching technology outside of a single physical chassis. Different vendors have different mechanisms for doing this, but the general concept is the same. There are obvious benefits to fabric extension technologies in the data center.
In conjunction with these technologies is a desire to replace the traditional STP (Spanning Tree Protocol) with a multi-path loop-prevention mechanism. Some vendors have developed their own proprietary methods, while the standards bodies make their case for two emerging multi-path, loop-prevention protocols: TRILL and SPB. It seems to me that TRILL will win the enterprise while SPB may take the provider space.
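The motivation for replacing STP is easy to show with a little counting. On any bridged topology, STP prunes forwarding down to a loop-free tree, so with N switches only N-1 links carry traffic and every redundant link sits blocked. The sketch below runs the numbers on a small hypothetical 2-spine / 4-leaf topology:

```python
# Counting active vs. blocked links on a small, hypothetical topology:
# 4 leaf switches, each dual-homed to 2 spines. STP keeps a spanning
# tree (N-1 forwarding links for N switches); TRILL/SPB-style multipath
# forwarding can keep every link active.

spines = ["s1", "s2"]
leaves = ["l1", "l2", "l3", "l4"]
links = [(leaf, spine) for leaf in leaves for spine in spines]  # every leaf dual-homed

switches = len(spines) + len(leaves)   # 6 switches
total_links = len(links)               # 8 physical links
stp_forwarding = switches - 1          # spanning tree: 5 forwarding links
stp_blocked = total_links - stp_forwarding

print(f"links: {total_links}, forwarding under STP: {stp_forwarding}, "
      f"blocked: {stp_blocked}")
# links: 8, forwarding under STP: 5, blocked: 3
```

Almost 40% of the purchased capacity in this toy example is standby-only under STP, and the ratio gets worse as you add redundant paths, which is the capacity that multipath protocols are chasing.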
I will not further discuss the merits of these protocols in this blog post. Suffice it to say, as an architect I will watch the development of the proprietary mechanisms as well as the maturation of the standards. A small aside: it is always beneficial to wait for and adopt open standard protocols when they become available. However, as necessity breeds invention, technologists are often forced to adopt a proprietary solution until open standards arrive. With that said, I will watch with keen interest to see which vendors provide the most open-protocol-friendly path AWAY from their proprietary solutions.
Do we need a completely new model?
The first question I am grappling with is whether we need to throw away the traditional two/three-tier hierarchy and move to a flat, single-tier network. In short, the conclusion I keep coming to is that it isn’t a necessity. Designing to a hierarchy is something I generally preach, but why? Well, I have heard some say that Ethernet networks were originally formed in a hierarchy because of spanning tree: since spanning tree needed a root switch, and triangles are better than squares, we built hierarchies.
There is some truth to this, but there is much more to the story. My view of the hierarchy is somewhat independent of network protocol behavior; rather, it is a structure to which we can apply appropriate segmentation of the individual components of the data center. For example, we can have dedicated ‘modules’ whose traffic flows up the hierarchy to be connected to other ‘modules’. This allows for simplification within the given modules.
The use of hierarchical structures in technology is well documented and established. Generally speaking, building in a hierarchy provides a simplified structure that benefits humans primarily. We work, live and thrive in hierarchical systems. Straight tree or pyramid hierarchies are not always the most efficient, however; take relational database systems, for example. In theory a relational database isn’t built in true hierarchical fashion (this is debatable, depending on your definition of hierarchy), but in practice the human application of relational databases generally follows a hybrid horizontal hierarchy.
This is not to say that I believe in staying with the layered approach currently in use today, either. The predominantly referenced three-tier structure is rarely deployed in the traditional fashion. Most practical applications collapse the logical layers into a single physical device; the collapsed core/distribution in the enterprise is the norm today. It seems to me that a two-tier structure will work well and provide suitable scale. We will shelve further discussion on this point for now. Suffice it to say the traditional model could use a re-working, but that isn’t to say it needs to be discarded completely.
Major outstanding Qfabric design questions
What of purposeful segmentation and security inspection of traffic flowing between logical domains? For example, in the traditional tiered model I snap firewalls, IPS and other security devices onto the distribution module. As the traffic flows up the tree between domains it naturally traverses this layer, where I can provide some level of traffic inspection, validation, logging and visibility. This was the first major question and concern that I posed to the Qfabric sales and development team. Their response was that they are working on a solution, but in the meantime you will need to hairpin security devices off of one of the edge switches.
Also, what of traffic visibility? How do I gain visibility into the network traffic traversing the fabric? Visibility is very important to enterprises. As I understand it, they have a solution available to mirror traffic, but I am concerned about its viability given the oversubscription levels we would be dealing with in an any-to-any connectivity model.
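The oversubscription concern is simple arithmetic: mirrored traffic has to ride the same fabric links toward the monitoring tool, and the edge can offer far more traffic than any mirror destination can absorb. The numbers below are hypothetical, chosen only to show the scale of the problem:

```python
# Rough mirroring oversubscription sketch. Port counts and speeds are
# hypothetical illustration values, not a real Qfabric configuration.

EDGE_PORTS = 128            # hypothetical 10GbE edge ports in scope for mirroring
EDGE_SPEED_GBPS = 10
MIRROR_UPLINK_GBPS = 40     # hypothetical capacity toward the analyzer

worst_case_offered = EDGE_PORTS * EDGE_SPEED_GBPS     # 1280 Gb/s
oversub_ratio = worst_case_offered / MIRROR_UPLINK_GBPS

print(f"worst case offered: {worst_case_offered} Gb/s, "
      f"mirror oversubscription {oversub_ratio:.0f}:1")
# worst case offered: 1280 Gb/s, mirror oversubscription 32:1
```

At 32:1 in this example, fabric-wide mirroring can only ever be sampling, not full capture, which is the heart of my viability concern.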
Proprietary is bad, open standards are good (more or less)
Since the Qfabric solution is rather ‘creative’, it is also very proprietary. This can be somewhat of a double-edged sword: creativity is a key component in driving technology forward, but boundless creativity can create more issues than it solves. This is particularly true for production enterprise networks. It’s one thing to deploy creative and inventive solutions in a test sandbox; it’s quite another to do so in a production network. The inevitable truth is that the enterprise architect’s job is really to keep one eye out for new whiz-bang next-generation technologies bathed in unicorn tears and the other focused on industry-standard, best-practice guidance. At the end of the day, what we deploy now is tried, true, tested technology, and we generally deploy it as close to best practice as possible given the particulars of our environments.
As I heard recently stated on a podcast, there is almost a mathematical regularity between the evolution of new technologies and their maturation toward adoption. Generally speaking, from the inception of a protocol to its deployment in the real world is roughly 10 years, with wide-scale adoption happening another 3-5 years after that. Many major protocols have followed this timeline closely, including IP, Ethernet, MPLS, IPv6 and the like. Back to our fabric discussion: extension of the switching fabric has actually been discussed for a few years now. Greg Ferro did a nice piece on the evolution of switching technologies and an explanation of the switch fabric:
The wrench in the cogs
Google has published several very interesting pieces on data center efficiency (see the article from Wired magazine below for an introduction). They have done some real number crunching and have found what I have suspected and many have feared to be true: using tried-and-true methods on slower but more widely scaled hardware is more efficient and less expensive. The proof is in the pudding, and Google has a pile of empirical data to back these claims. What does this mean for network technologies? Well, simply put, vendors will always try to convince us that their new whiz-bang products are going to ship with magic jellybeans that will sprout money trees, when this is in fact often far from the truth.
We generally need to get back to the roots of any given problem and identify the true requirements we are designing to. This is where I get back to the question of necessity. Does my environment currently have a need for faster, flatter, multi-path access on a scale that justifies the cost? What is really driving these requirements upward? The answer is generally virtual environment flexibility. The what-ifs come into play: what if I want to move this VM from point A in the network to point Z? Are these really necessities or niceties?
Another interesting point I picked up from digging into some of Google’s more recent data center findings is that they are building the network as it should be. They handle much of the scale and flexibility at the application layer rather than depending on elegant network and systems solutions. Really, many of the issues being introduced into the data center environment are artificially created at the application layer. All too often the network team takes the requirements given to them at face value and starts building solutions. In reality we need to focus on working together better as a team to develop solutions that are simple, realistic and able to scale. This may be one of the most difficult to achieve, as well as one of the most underappreciated, aspects of the networking field; perhaps a topic for another blog post unto itself.
The question that has to be asked is: is the risk worth the potential benefits? In the case of Qfabric in the enterprise I have my doubts. I appreciate the forward thinking and initiative being shown, and I do not doubt that Qfabric will find adoption in certain provider circles. There is a reason KISS (Keep It Simple Stupid) has remained viable for so long … It seems to me that, given the information we have now, the downsides far outweigh the benefits for enterprise environments. Stay tuned.