A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.
Max Planck, Scientific Autobiography
As I became more involved in workshops and training, I observed many people from old-school sysadmin backgrounds struggle with the modern tools and methodologies that the SRE and DevOps movements are promoting. I often wonder why it is so hard for some of them to learn the “new ways”. I’ve heard many people comment (often disrespectfully) that these difficulties are due to a lack of “technical skills” - frequently citing poor programming abilities. While I agree this may be a real problem for some people, this explanation is far from sufficient. Sysadmins of old have traditionally used tools of great complexity and power, so much so that ordinary developers feared to wield them. I have met many developers who still refuse to touch “Ops tools” or production systems because they feel the tools would run amok. Surely the people who tamed such power possess significant technical skills. We must look for another explanation.
I believe the deeper explanation lies in the tools and work processes of the sysadmin profession. We must understand the world in which they live and work, and how they think about systems. In this sense, Bash is neither a scripting language nor a shell; it is a thought pattern that we need to understand.
The Bash world
In the “programming” world, we are used to thinking in terms of basic constructs out of which we compose our programs. These constructs can be either intrinsic statements or expressions made of variables and values. Every statement has one input - a set of arguments; every expression has an input (again, a set of arguments) and an output that we can assign to a variable. From these Lego parts, we construct functions which are used in statements and expressions. A function has an input (a set of parameters) and an output - its return value. Sometimes, we add another form of output known as an exception - which cannot be understood without a clear concept of scope. We divide our toolchain into “runtime”, “libraries” and “modules”. The runtime is often regarded as magic - we simply believe it works and don’t look too deeply into its guts, but libraries and modules are ours to write, read and use frequently. They are often written in the same language we use and follow the same principles. In the programming world, we have APIs - sets of functions and data structures that together create an interface.
In the Bash world, on the other hand, functions and commands do not have a single input and output - instead they have several. A Bash function can have a set of arguments but can also have an input channel known as stdin. On the output side, Bash functions have three outputs - the return value, stdout and stderr. In Bash we do not have exceptions because the concept of scope is not well defined. Again, we have several intertwined pseudo scopes: environment variables, Bash variables, subshells, blocks, local and global variables and so on. Instead of dividing our world into libraries and modules, we have Bash built-ins and program binaries, both of which are regarded as magic and written in a non-Bash language. Bash does have libraries but they are not used very often. In the Bash world, we don’t have APIs; instead we have filesystems - a set of files and various file syntaxes that together create an interface through which we can manipulate the world.
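To make the several channels concrete, here is a minimal sketch of a shell function that uses all of them at once. The name filter_lines and the sample input are invented for this illustration; nothing here comes from a real tool.

```shell
# Hypothetical function, for illustration only.
# Inputs:  one argument (a substring) and a stream of lines on stdin.
# Outputs: matching lines on stdout, a diagnostic on stderr, a return status.
filter_lines() {
    local pattern="$1"
    local matched=0
    local line
    while IFS= read -r line; do
        case $line in
            *"$pattern"*)
                printf '%s\n' "$line"              # output 1: stdout
                matched=$((matched + 1))
                ;;
        esac
    done
    printf 'matched %d line(s)\n' "$matched" >&2   # output 2: stderr
    [ "$matched" -gt 0 ]                           # output 3: return status
}

# The caller has to collect each channel separately.
result=$(printf 'error: disk full\nok\n' | filter_lines 'error' 2>/dev/null)
status=$?
printf 'stdout=%s status=%s\n' "$result" "$status"
# prints: stdout=error: disk full status=0
```

Note that the caller decides which channels to keep: here stderr is thrown away, and forgetting to check the status would go unnoticed.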
Clearly then, a Bash programmer thinks very differently from a Python or Ruby programmer.
The production world
Sysadmins do not write tests. This may seem reckless but is the result of their unique experience with complex production systems. Experience has taught sysadmins that the value of tests in those systems is very small and is most often not worth the effort. Why is the value of tests small? In order to write meaningful tests, the system must be deterministic and we must be able to control all its inputs. In addition, we must also be able to capture all parts of the system’s output and be able to compare it with the “expected” output of the test. In programming parlance, we say that we are either testing “pure functions” or that we mock and stub whatever components the functions being tested interact with. This is simply not possible in complex production systems. Just as “pure functions” exist only in theory (every function has at least the side effect of consuming CPU time), so do “stateless services” exist only as a figment of imagination - there is always some state somewhere. But unlike “pure functions” whose side effects can be neglected for our purposes, the state and interactions of “stateless” services cannot be ignored so simply. And to make matters worse, we can’t stub and mock the components used by the process under test - either because we don’t even know what it’s using or because mocking would cause the test to be meaningless. Think about a database backup process - if we mock the database, what is the point of our test?
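The backup example can be sketched in a few lines of shell. All names here (backup_filename, run_backup, dump_database) are hypothetical and stand in for whatever a real backup script would call; the sketch only illustrates the argument, it is not a recommended backup procedure.

```shell
# Pure part: the output depends only on the arguments, so it is easy to test.
backup_filename() {
    printf '%s-%s.sql\n' "$1" "$2"
}

# Side-effecting part: depends on a live database and on the filesystem.
# dump_database is a hypothetical command standing in for a real dump tool.
run_backup() {
    local db="$1" day="$2" dir="$3"
    dump_database "$db" > "$dir/$(backup_filename "$db" "$day")"
}

# A stub makes run_backup "testable", but the test now only proves that
# bytes from the stub reach a file, not that a real database can be restored.
dump_database() {
    printf 'fake dump of %s\n' "$1"
}

[ "$(backup_filename orders 2024-01-31)" = "orders-2024-01-31.sql" ] \
    && echo "pure function: ok"

tmp=$(mktemp -d)
run_backup orders 2024-01-31 "$tmp"
cat "$tmp/orders-2024-01-31.sql"
rm -rf "$tmp"
```

The first check is meaningful; the second passes just as happily when the real database is down, which is exactly the situation a backup test should catch.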
In recent years, programmers have encountered this problem more often, as the systems they work on become more and more complex and stateful. Most programmers, however, spend the majority of their time working with systems that are reasonably testable.
The manual labor world
The sysadmin profession is characterized by “manual labor” - tasks that were designed to be carried out by human operators. Often, a sysadmin would not agree with the ideas of system designers on what should be automated and what should be left for a human operator, and would automate parts of the system using scripts. In the absence of proper APIs (mostly due to the designers’ notion that this doesn’t need to be automated) sysadmins used “virtual human” tools for automation:
yes |, text parsing and the notorious
What differentiates programming from “automation” is this notion of a “virtual human” manipulating the world. The absence of APIs and the focus on side effects (as opposed to data transformations) make the automation code hard (if not impossible) to check and test. The wide variety of output signals makes it impossible to capture all error conditions and outputs, so error handling is often ignored completely.
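A small sketch shows what a “virtual human” looks like in practice. The interactive tool legacy_cleanup and its messages are made up for this example; only yes and sed are real commands.

```shell
# Hypothetical interactive tool, written for a human sitting at a terminal.
legacy_cleanup() {
    local answer
    printf 'Found 3 stale files. Delete them? [y/n] '
    read -r answer
    if [ "$answer" = "y" ]; then
        printf '\nDeleted 3 files.\n'
    else
        printf '\nAborted.\n'
    fi
}

# The "virtual human": yes types "y" for us, and sed scrapes the number
# out of text that was only ever meant to be read by a person.
deleted=$(yes | legacy_cleanup | sed -n 's/^Deleted \([0-9][0-9]*\) files\.$/\1/p')
echo "deleted=${deleted:-unknown}"
# prints: deleted=3
```

If the tool ever changes its wording (say, “Deleted 1 file.”), the pattern silently matches nothing and the script carries on with no error, which is how error handling ends up being ignored.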
The obvious solution would be to stick APIs everywhere and write proper software to manipulate our systems. Unfortunately, the difference between APIs and UIs is bigger than just the medium of communication - the semantics and even the primitive concepts are very different - because computers and humans don’t behave in a similar manner. For example, it makes sense to have a notification API fire 100 messages per minute, but no human can cope with such a notification rate. We are forced to build yet another layer of human control software on top and push the operator interface/automation problem upwards in the stack.
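One possible shape for such a human control layer, sketched under the assumption that notifications arrive as plain lines of text: collapse the stream into a short digest. The names fire_notifications and digest and the two messages are invented for the illustration.

```shell
# Hypothetical notification source: 100 messages, as an API might fire them.
fire_notifications() {
    local i=0
    while [ "$i" -lt 100 ]; do
        if [ $((i % 10)) -eq 0 ]; then
            echo "disk full on db1"
        else
            echo "high latency on api"
        fi
        i=$((i + 1))
    done
}

# Human-facing layer: count identical messages and show the noisiest first.
digest() {
    awk '{ count[$0]++ } END { for (msg in count) printf "%d x %s\n", count[msg], msg }' \
        | sort -rn
}

fire_notifications | digest
# prints:
# 90 x high latency on api
# 10 x disk full on db1
```

The machine-facing side still sees 100 events; only the operator sees two lines. The digest is itself new software that needs an interface, so the problem has moved up rather than gone away.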
The black box
Sysadmins (and operators in general) primarily deal with “black boxes” - systems whose internal structure and state are obscure. This can be the result of the system being designed and built by another group (perhaps a vendor) or simply due to not having the time needed to learn the internals of the system. Sysadmins often deal with so many systems that it’s infeasible to learn all of them to an expert level. It’s not uncommon to have a single person in charge of MongoDB, MySQL, Linux, ElasticSearch, Apache, HAProxy, Consul, Tomcat and RabbitMQ - and you could spend months if not years mastering even one of these.
This “black box” view, when carried to its extreme (actually quite common in enterprises) assumes that we cannot change the box at all. If there are bugs, we will work around them. If the interface is hard to work with, we will wrap it. If it crashes, we will reboot it.
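The “wrap it, reboot it” reflex can be sketched as a small wrapper that never looks inside the box. Both restart_until_ok and black_box are hypothetical names, and the fake service is rigged to fail twice purely so the example has something to recover from.

```shell
# Wrapper: run a command, and simply start it again if it fails.
# It knows nothing about why the command failed.
restart_until_ok() {
    local max="$1"
    shift
    local try=1
    while [ "$try" -le "$max" ]; do
        if "$@"; then
            echo "ok after $try attempt(s)"
            return 0
        fi
        echo "attempt $try failed, restarting" >&2
        try=$((try + 1))
    done
    return 1
}

# Hypothetical opaque service: crashes on its first two runs.
state=$(mktemp)
echo 0 > "$state"
black_box() {
    local n
    n=$(( $(cat "$state") + 1 ))
    echo "$n" > "$state"
    [ "$n" -ge 3 ]
}

restart_until_ok 5 black_box
# prints on stdout: ok after 3 attempt(s)
rm -f "$state"
```

The wrapper works, and that is the trap: the underlying bug is still there, now hidden behind a script that someone else will eventually treat as yet another black box.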
An unsure future
Can sysadmins learn the programmer’s way of looking at the world? Can they master the concepts that power the software engineering discipline? Can they learn to think in terms of logic and APIs, of artifacts and libraries, of well defined and (relatively) predictable and testable components? I hope so.
Can programmers learn the sysadmin way of thinking? Can they master the concepts that describe a world of complexity and constant flux? Can they adapt to a world in which the dynamics of the system are stronger than the individual behavior of components, where nothing is well defined and flexibility is everything? Again, I hope so.
I believe we should learn from both disciplines and let ourselves be convinced by one another. In an industry that invents new technologies at a staggering rate, we simply don’t have the time to wait for the old generation to die and for another to grow up.