The cobbler's children are the last to be shod.
So it is too for software developers these days. I'm reminded of this hoary trope when I think about the state of twenty-first century programming languages and technology. Because its is still very much mired / anchored in the 95-character ASCII/EBCDIC punchcard-era practices of the fifties and sixties.
This reverie was triggered by a chance Twitter sighting of a 'blog post by Canadian Computer Scientist Greg Wilson on why Donald Knuth's vision of Literate Programming had failed.
Greg's piece resonated with me because it addresses two themes I've trotted out myself on a couple of occasions. Such is the nature of this sort of triggering.
The first is the enduring irony that, despite having delivered all manner of post-ASCII visual GUI bliss and razzle-dazzle to countless users of all descriptions, programmers themselves must still trade in primitive text-based program representations that would be familiar to time-traveling hackers from the Nixon Administration. With a handful of exceptions, today's production code would be quite at home on a keypunch (sans case). These textual accounts of programmer intent remain the canonical, authoritative representations of code, from which any supporting executable binary accounts must be derived. For most, the vast majority, in fact, programs are ASCII characters that live in ASCII files, in directories, which these days, might be under some sort of source control, but still...
Now, one of Greg's points in his "ASCII" passage, was to remind us that Knuth's vision of integrating complex typeset annotations like equations and tables with code, and binding them to the codebase, is even now only clumsily possible using embedded HTML in "doc" comments.
And, it's true, we don't do this well. In general, tying diagrams, equations, and commentary together with code in a way that presents it conveniently along with the code so as to make it even remotely possible to maintain it and keep it consistent with changes to the code, is just not something one sees well supported. And given the polyglot nature of so many applications these days, this ASCII common denominator too often places important cross-language artifacts beyond the reach of refactoring tools that might, given a richer representation, have had a better shot at keeping such support material in sync. Think XML configuration, definition, and data files, even schema.
The cobbler's children lament, then, is borne of the envy that one feels over the fact that so many our customers all get to enjoy much richer representations of the important objects in their domains than we programmers do.
This theme has been showing up of late when I talk about the ubiquitous Ball of Mud phenomenon. These talks have come to have several "DVD" (select your own) endings, one of which explores the idea that one source of our perception of muddiness is the primitive tools and representations we currently employ to navigate and visualize our codebases.
This in turn, is a partial result of working with refactoring tools developers at UIUC a few years back, where I came to believe that program representation is one of the most interesting and important unsolved problems in program language research. By this, I meant getting beyond 95-character ASCII into a richer, round-trip AST-level representation amenable to analysis, refactoring, and annotation. Such representations would need to be cast somewhere beneath text, but above the level of say, Java .class files.
This is neither a new nor an original idea, but it remains, alas, unfinished business. Ever wondered why refactoring tools for C and C++ are so slow in coming? This is part of the problem.
The second reverie that Greg triggered, which was partially reinforced by another Twitter citation, this Redis Road Map from Paul Smith, was this one:
One of the underlying premises of the now discredited waterfall division of labor between skilled architect/designers and rote programmer/coders was the implicit, presumably self-evident assumption that the code was unreadable, and that design must therefore be conveyed by "higher-level" descriptions like documents and cartoons.
Nowadays, a fundamental article of faith underlying the craftmanship, clean code, and iterative, emergent design communities is that the code itself can be wrought to carry the dual burden of conveying to the compiler a comprehensive set execution instructions, while at the same time communicated to at a suitable level of literacy and abstraction the details and intent of the artifacts design. A lot of stock has been placed in this conceit, so one can only hope it holds up.
It is clear that code can do so far better than the woefully minimal degree of adequacy that many shops tolerate. But the question of whether, even at our best, meticulously tended source code will be enough remains an open one.