allenc allencheung

Warnings as Errors in Production Environments

I listen to Accidental Tech Podcast[1], and episode 54 had a hearty discussion on whether enabling warnings as errors in production – or in some cases, development – is a good idea. The context was whether that kind of strict checking could have prevented the #gotofail SSL bug that affected iOS and OS X in late February, and whether turning on something like Clang’s -Weverything[2] in production would have caught it sooner.

One side of the argument says that warnings are exactly what the name implies: things to be aware of, but not system critical, and over time they would rightfully be glossed over or ignored. The other side says that all warnings are meaningful, and that forcing them to surface as errors in both development and production gives the developer a strong incentive to fix potential issues, with the further implication that failing to heed these warnings is a development smell (i.e., the methodology equivalent of a code smell). It’s a pragmatist versus idealist debate at heart.

If you’ve read my writing you can pretty quickly guess which side I lean towards.

Using warnings-as-a-hammer is a very hard line to draw in the sand of production systems. In development, this adherence implies a reverence for maintaining a certain style of code and level of systems integration, which by itself is an insufficient indicator of code quality, i.e., compilers can’t teach and enforce good coding by themselves. While in the best case this can draw attention to potential problems, I find it much more likely that the mediocre engineers on whom this would have the most impact would simply find the fastest way around the restriction. If the C++ compiler warns of a bare (int*) cast, then the path of least resistance is to rewrite it as a reinterpret_cast<int*>.
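To make the cast example concrete, here is a minimal sketch. The helper names are hypothetical and not from any real codebase, and it assumes the warning in question is -Wold-style-cast, which both Clang and GCC offer for C++. The point is that the rewrite silences the compiler without changing what the code does:

```cpp
// Hypothetical helper written with a C-style cast.
// Clang and GCC flag this line when -Wold-style-cast is enabled.
inline int* view_c_style(float* p) {
    return (int*)p;
}

// The "fixed" version: the warning is gone, but the conversion is identical.
inline int* view_reinterpret(float* p) {
    return reinterpret_cast<int*>(p);
}

// Returns true when both spellings yield the same pointer,
// i.e. the rewrite changed nothing about the program's behavior.
inline bool casts_agree() {
    float value = 1.0f;
    // Only the addresses are compared. Reading the float through
    // either int* would still be undefined behavior in both versions.
    return view_c_style(&value) == view_reinterpret(&value);
}
```

Whatever made the original cast suspicious is still there after the rewrite; the only thing that changed is that the compiler stopped mentioning it.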

In production, warnings-as-errors is even worse, as it places engineering sanctity above business realities. Code development and execution are means to an end, and businesses would be paralyzed if every library, framework, or point-release upgrade had the potential to throw a warning that would grind the entire system to a halt. Much like how Google builds its infrastructure by assuming component failure and spending its effort on making recovery smooth (e.g., a Google hard drive dies and needs replacement every few seconds), the right response to recoverable problems in production is to log, triage, and prioritize, not to explode in the most spectacular way possible.
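As a rough illustration of "log, triage, and prioritize", the sketch below records recoverable problems and orders them by urgency instead of aborting. Everything in it (the Severity scale, Issue, IssueLog) is a made-up name for this post, not an API from any real logging library:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical severity scale; a lower value means more urgent.
enum class Severity { Critical = 0, Degraded = 1, Cosmetic = 2 };

struct Issue {
    std::string message;
    Severity severity;
};

// Collects recoverable problems instead of halting the program.
class IssueLog {
public:
    // Log: record the problem and let the caller carry on.
    void report(std::string message, Severity severity) {
        issues_.push_back(Issue{std::move(message), severity});
    }

    // Triage and prioritize: most urgent first. stable_sort keeps
    // arrival order among issues of the same severity.
    std::vector<Issue> triaged() const {
        std::vector<Issue> sorted = issues_;
        std::stable_sort(sorted.begin(), sorted.end(),
                         [](const Issue& a, const Issue& b) {
                             return static_cast<int>(a.severity) <
                                    static_cast<int>(b.severity);
                         });
        return sorted;
    }

    std::size_t size() const { return issues_.size(); }

private:
    std::vector<Issue> issues_;
};
```

A caller would report() a deprecation notice as Cosmetic and a payment timeout as Critical, keep serving requests, and let someone work through triaged() later, starting from the top.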

There are also many systems where this rule simply would not work, either because the production environment is not completely under your control, or because the cost of patching others’ warnings and errors becomes prohibitive.

Any type of code that runs in the browser is a victim of an unstable production environment; different browsers, and the different plugins and extensions on top of your users’ browsers, make reporting even errors a completely lost cause. I remember turning on error logging on an older version of Square’s Dashboard, and quickly flooding the system with so many Internet-Explorer-specific errors (with minified stack traces, if they existed at all) that it pretty much killed the logging server, along with the email server that was dutifully trying to notify us of what we thought we wanted to know. It wasn’t just the volume that made this worthless: many of these errors were non-actionable, and even acknowledging them became a waste of the team’s time.

As for worrying about library and framework warnings, fretting over others’ minor inadequacies is a particularly insidious form of Not Invented Here syndrome. Even assuming the libraries are open source (i.e., they can easily be forked and modified), the opportunity cost of tracking down and fixing minor integration issues – which surface as warnings – is tremendous, and that is if they can be patched at all. Warnings about deprecation with no clear successor, runtime issues that are already gracefully handled, or just plain old mislabeling of INFO-level messages as WARNING: there are plenty of false positives among warnings that seasoned engineers have learned to rightfully ignore in favor of doing more important things.

Tools can help us write better code up to a point. Most of the time, the code exists to serve a purpose that is only somewhat correlated with its quality.

Footnotes

  1. Though the general “tech” label is a bit of a misnomer: the hosts and topics are focused on Macs and iPhones and the areas of tech that interoperate with Apple’s systems.
  2. Interestingly, GCC doesn’t appear to have an equivalent, and perhaps for good reason.
By allen