Thursday, November 14, 2013

Ramblings on Haskell's 'cognitive overhead'

Does Haskell have 'excessive' cognitive overhead?

The question of whether Haskell (or a very similar language) might one day become a popular language for solving the increasingly tough programming problems ahead of us (concurrency, distributed computing etc.) seems to come up quite often.

It's an interesting question.  

One of the great things about Haskell is that it does seem to have answers to big problems, and those answers can in many cases be traced directly back to the defining fundamentals of the language (universal purity, composability and, to some extent, laziness).  Pure functional programming tackles many of the ills that bedevil software today.  That ideas from the functional programming world are highly valuable is less and less controversial, as features like lambda functions/closures and even monads are being imported into other languages and their libraries with gusto.  

However, those features alone aren't the sum total of Haskell.   Indeed, Haskell is characterized as much by its powerful type system as it is by its exclusive use of functions for expressing computation.  Haskell's type system also differs from that of most other languages in that, while static, it is also inferred - so for the most part the programmer never has to write type annotations, even though doing so is considered good practice.  This property is very hard for imperative languages to achieve in general (although languages like C# have introduced limited forms of it).  

I have watched many developers, who have spent most of their careers trained and practised in imperative programming, attempt to take to Haskell.  Admittedly, most of these developers have not spent much time thinking about and using functions as first-class objects, even as these have been added to imperative languages.  That's because most developers I know have spent years with Java, which unfortunately has been a laggard in terms of first-class functions.  Nevertheless, most programmers do believe they understand functions when first asked to compare their experience with what they expect to find in Haskell.  What they may be unaccustomed to, however, is thinking exclusively in terms of functional composition.  

Invariably, there's a lot to learn when approaching a new language, and even more when meeting a new programming paradigm for the first time.   
Haskell presents a number of potentially avoidable initial challenges to those seeking to learn how to use it effectively, for example:
  • Tutorials are getting much better, but traditionally have been couched in mathematical terms of reference.  Even today, a lot of discussion of Haskell language features and programming issues is conducted in terms that the average software developer will find completely foreign.  Unfortunately, it seems that the established Haskell community enjoys a certain amount of this mystique and even some outright ‘occultism' ;-)  
  • Similarly, Haskell's library documentation, especially beyond the standard libraries, is somewhat sparse.  Nor does it help that Haddock generates type signatures for functions but does not include argument names.  Argument names are often incredibly terse in many Haskell libraries and programs anyway, but where they do convey useful information it does not appear in the resulting Haddock docs.  This can make browsing and discovery of library functions more difficult than it should be.  There are plenty of examples of functions with signatures like drawRectangle :: Widget -> Bool -> Int -> Int -> Int -> Int, where the meaning of most of the arguments is completely opaque.  
  • Developers used to IDE support have been somewhat challenged by the state of Haskell tooling, although again this is improving nicely with EclipseFP and Leksah.  Even so, it is remarkably hard to search for functions when you know some of the arguments or a type that you want.  Hoogle is one tool that purports to do this, but I still find it very awkward and I've been playing with Haskell for years.
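As an aside on the documentation point above, a common remedy for opaque signatures is to wrap raw types in newtypes so the signature documents itself. A minimal sketch (all names here, including drawRectangle, are invented for illustration):

```haskell
-- Hypothetical illustration: lightweight newtype wrappers make an
-- otherwise opaque signature self-documenting in the Haddock output.
newtype X      = X Int
newtype Y      = Y Int
newtype Width  = Width Int
newtype Height = Height Int
data Fill      = Filled | Outline deriving Show

-- Compare: drawRectangle :: Widget -> Bool -> Int -> Int -> Int -> Int
-- The wrapped version tells the reader which Int is which:
drawRectangle :: Fill -> X -> Y -> Width -> Height -> String
drawRectangle f (X x) (Y y) (Width w) (Height h) =
  unwords [show f, "rect at", show (x, y), "size", show (w, h)]

main :: IO ()
main = putStrLn (drawRectangle Filled (X 10) (Y 20) (Width 30) (Height 40))
```

The wrappers cost nothing at runtime, and a call site like drawRectangle Filled (X 10) (Y 20) (Width 30) (Height 40) can no longer mix up its arguments.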

Notwithstanding these peripheral issues however, for imperative programmers new to the Haskell language the real unavoidable conceptual displacement surely comes with:
  • Learning how to think in terms of composing functions
  • Learning how Haskell's type system works
  • Getting comfortable with common language features and patterns like type classes and monads
  • Learning a basic 'vocabulary' of functions in standard libraries

The first two of these seem to present the largest overhead for newcomers.  The plain fact is that most people just have not built the mental apparatus to think naturally about function composition and polymorphism with the degree of expressiveness that Haskell's type system offers (although thankfully "generics" have been a feature of OO languages like Java and C# for years now).  
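To give a flavour of what "thinking in composition" means, here is a small sketch using only Prelude functions: a pipeline built entirely by gluing existing functions together, with no loops or intermediate variables.

```haskell
-- Count the words longer than four characters in a string,
-- expressed purely as a composition of library functions.
countLongWords :: String -> Int
countLongWords = length . filter ((> 4) . length) . words

main :: IO ()
main = print (countLongWords "composition is a powerful habit")  -- prints 3
```

Reading right to left: split into words, keep the long ones, count them. Each stage is an ordinary function, so each stage is independently reusable.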

New tutorials like the excellent Learn you a Haskell finally provide an accessible and well-paced route through the material that seems to work for most developers converting to Haskell from the imperative world.  This has really helped with some of the 'unavoidable' learning curve. 

Once developers have genuinely grasped the fundamentals of Haskell, they are typically aware of the following dynamics:
  • It can take more front-loaded reasoning to get to an initial shape of a solution
  • The type system is an ever-present reality.  
  • Once the program compiles, it usually works solidly
  • However, it may not work efficiently or perform as desired (and then exploring and debugging a running program is quite a different experience)
  • It may also begin to dawn on them just how flexible and powerful functions are as a raw material - due to every function having its own 'interface' and being directly composable with other functions.  

Some developers, when initially confronted with the ideas of Haskell, or perhaps as they are learning the ropes, start to complain that the type system is an impediment.  This reaction seems to be far more pervasive when newbies connect with Haskell than when they pick up something like Java or C#.  Adding to this is perhaps the increasing exposure that programmers get to scripting and dynamic languages (such as Python, Clojure or Ruby), which can condition them toward code-test cycles that do not involve any form of compile-time type checking. 

This gets us squarely into the discussion about static vs dynamic languages.  While this is a dichotomy that lies outside the scope of Haskell's advantages compared to imperative languages in general, it is nonetheless an important overall factor in how Haskell is perceived these days within the community of people likely to express an interest.  So, it's worth a bit of discussion on the topic.

People who support statically typed languages argue along these lines:
  1. Correctness in software is clearly critical
  2. Statically typing code is a way to get early validation and warning about a fairly broad category of problems that can exist in code (though runtime testing is essential for validating algorithms and workflows)
  3. Computers are extremely good at managing large quantities of detail.  Humans are good at planning and design, but can only focus on a handful of things at once.  Ergo, it's advantageous to have the computer play a role as a correctness tool for details as software is being developed.  Types provide low-level coverage of general details, where runtime testing provides higher-level validation of behaviour (but only for things you remember to test). 
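To make point 2 concrete, here is a small sketch (the Money type and names are invented for illustration) of the kind of category error a static checker rules out before the program ever runs:

```haskell
-- A toy money type: the checker guarantees you can never pass a bare
-- number where Money is expected.
data Currency = GBP | USD deriving (Eq, Show)
data Money = Money Currency Int deriving Show

-- Same-currency addition; mixing currencies is handled explicitly.
add :: Money -> Money -> Maybe Money
add (Money c1 a) (Money c2 b)
  | c1 == c2  = Just (Money c1 (a + b))
  | otherwise = Nothing

main :: IO ()
main = do
  print (add (Money GBP 3) (Money GBP 4))
  -- print (add (Money GBP 3) 4)  -- rejected at compile time: 4 is not Money
```

The commented-out line is exactly the "broad category of problems" the static camp means: it never becomes a runtime bug because it never compiles.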

People who support dynamic languages argue along these lines:
  1. All that really matters is whether some code eventually behaves correctly compared to its requirements.  
  2. Therefore runtime testing is the only thing you need if you are disciplined enough to write appropriate tests as you write your program logic.
  3. The cost of writing and managing the annotations required to feed a static type checker outweighs the benefit - you're better off just coding and writing tests.

Given that the use of dynamic languages in large, complex software projects is a relatively new wave, I think the jury is still out on how the economics play out over the full lifetime of a large software system.  I'm certainly very curious to hear and read of experiences with this software practice.  However, I am already well aware of a number of projects that have gone very much off the rails, specifically with the use of Ruby at large scales.  

One aspect of all this that seems common sense (to me!) is that software development is an extremely human activity.  As we scale up the number of people actively engaged on a software project, we naturally have to be concerned with:
  • Division of labour and responsibility
  • Practical and effective coordination
  • Good communication (intent, architecture, design and implementation choices/constraints)

Personally, I think this social aspect of development is under-discussed as a determinant in the appropriate choice of programming language.  

The concept of contracts is essential in any activity that involves multiple actors.  These may be formal or informal, technical, legal or social contracts.  
What's pertinent to this discussion is that groups of developers have to establish and maintain many agreements between parties if they are to be efficient (which includes reaching an acceptable implementation quickly and finding problems with quality as early as possible).  Many agreements are not formalized and checkable by computer, e.g. who works on what, who accepts work as complete, what libraries should be used.   However, wherever we choose to divide up the design of software (for problem decomposition or assignment of responsibility) we create the need to agree contracts.  Software contracts are typically called interfaces, of course, and these are something that can be formally expressed and checked by a computer.  Preferably, the checking should cover both types and behaviour - but type checking alone is nonetheless highly valuable as a computed proof of some very important properties of an interface.
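In Haskell, the natural vehicle for such a machine-checked contract is a type class. A minimal sketch (the Store class and names are invented, not from a real library): one team publishes the interface, another supplies an implementation, and the compiler verifies the agreement.

```haskell
-- The "contract": any key-value store a team provides must supply
-- these two operations with exactly these types.
class Store s where
  put :: String -> String -> s -> s
  get :: String -> s -> Maybe String

-- One party's implementation, checked against the contract by the compiler.
newtype MemStore = MemStore [(String, String)]

instance Store MemStore where
  put k v (MemStore kvs) = MemStore ((k, v) : kvs)
  get k (MemStore kvs)   = lookup k kvs

main :: IO ()
main = print (get "user" (put "user" "alice" (MemStore [])))
```

If the implementing team gets a signature wrong, the instance simply does not compile - the contract is enforced by the machine, not by a meeting.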

It seems to me, while pondering some of the disasters I'm aware of, that the dynamic-vs-static issue deserves rather more nuance than it is often played with.  I suppose people find it easier to cleave to black-and-white positions, as much as many people also like to flock together when taking a line on technology (usually motivated by their own prior investment in skills).  In truth, the value of formalizing contracts and interfaces does change with scale.  I might ask the neighbour's son to rake some leaves on the lawn in exchange for a small fee, but if I'm engaging a company to paint my house then invariably we'll want to ascertain more precisely what gets done and for what price.  Similarly, a dynamic language (we used to call many such things scripting languages) can be a very lightweight way to express solutions to small problems.  By small I probably mean a few tens of pages of code, or perhaps a few 'modules', but in fact that is proportional to the collective 'detail capacity' of the actors involved.  This is a very interesting part of the problem.  

It's far more likely that a single author can construct and memorize details of code they've personally created.  Of course, the passage of time is well known to disrupt this, but still.  
I have seen high-quality teams of half a dozen people informally maintain the social contracts, discipline and communication necessary to keep everyone well-coordinated on software projects.  This works well when there is a strong sense of leadership or shared values in the team, making it a 'high context' environment in which people mostly just know the right things to do.  
However, there comes a point where, whether by pure scale or by heterogeneity (dislocation, politics, disagreement, competition), a team will simply not be able to function with the requisite cohesion for informality, culture and values to constrain possibilities enough for efficient progress of the group as a whole.  At that point more formal contracts become highly valuable, and arguably therefore so do well-typed interfaces.

Given the foregoing discussion, I would characterise Haskell as being an appropriate language for high-complexity, large-scale projects.  

However, as discussed, it is also differentiated from most statically typed languages by virtue of having almost universal type inference.  This means that theoretically you don't have to write a single type declaration in your program, but Haskell will still have the power to check and reject bad programs.  Of course, that's different to having no type checker at all, but it goes some way to addressing a complaint that you often hear from proponents of dynamic languages: that they'd like to write code without having to spend time writing out and managing a bunch of types as they add and modify things.  In that sense Haskell makes a great prototyping language, but I would add another aspect too.  In many languages, there's a lot of boiler-plate code to capture essential structural elements that are required by the language design.  In languages like Java and C# for instance, you will be typing quite a lot just to express the minimum wrapping of code in a class or two, or code to express accessor methods on properties.  That's different in something like Python or Ruby where there is implied encapsulation for 'naked' code, and this is also true to a great extent in Haskell.  In Haskell, every function has its own interface, so you certainly get to experiment with combining them much more freely before you ever need to start thinking about the distillation of design structure (e.g. custom data types, type classes, etc.).  
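A quick sketch of what annotation-free prototyping looks like in practice: the definitions below (invented for illustration) carry no type signatures at all, yet GHC still infers general types - e.g. something like average :: Fractional a => [a] -> a - and rejects any misuse at compile time.

```haskell
-- No type signatures anywhere: the compiler infers and checks everything.
average xs = sum xs / fromIntegral (length xs)

-- Scale a list so its maximum becomes 1.
normalize xs = map (/ m) xs
  where m = maximum xs

main :: IO ()
main = do
  print (average [1, 2, 3, 4])   -- prints 2.5
  print (normalize [2, 4, 8])    -- prints [0.25,0.5,1.0]
```

This is roughly the dynamic-language experience of "just writing code" - except that passing, say, a String to average is still a compile-time error.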

Ultimately, you will be dealing with types in Haskell however, no matter what might happen at the small scales of prototyping and trivially small projects and experiments.  So you will certainly be thinking about types you're consuming and probably types you're producing.  If you consider this an overhead, then you certainly have to place it into the context of the cost:benefits of contracts in general.  As projects get larger, the nature of where the costs and benefits actually are changes.  With more actors and separation (of code and people) there are more interstitial elements.  Are these challenges going to be managed purely socially, or are they going to be at least partly formalized and checkable by a computer?

Given that the type system is one of the more distinctive things about Haskell, at least as differentiated from most other languages people will be using in commercial software development, I suspect that this is where most of the "cognitive overhead" lies.  Gaining intuition about functional composition is a close second, followed by learning common libraries (which isn't a conceptual issue, and is a common problem when converting to any new language).

In very practical terms, the biggest stumbling blocks I've seen with newbies are:
  • Understanding, in terms of type expressions, how functions work and what application does to a type. 
    That includes seeing a function type as a superposition of possible results (other functions).  Most imperative developers don't naturally understand application and partial application.  They tend to think of a function as having to consume all its formal arguments and produce a simple value. 
  • Interpreting type expressions and trusting them
    You can trust Haskell's type expressions in a much deeper way than you would an OO interface or method signature.  Imperative programmers tend to take a long while before they start trusting type expressions; that trust is the path to seeing Haskell's type system as your friend.
  • Understanding 'dispatch' on types, including the return type. 
    Functions like 'read' and 'return' can be completely mysterious to most OO programmers. 
  • Unnecessary confusion about monads.
    Too much fuss is made about monads.  They are called out as some strange new thing, whereas they are just a pattern in types (a wrapper around some value) with a way to produce new wrapped values from other ones. 
  • The relationship between first-order and higher-order patterns is something that's probably net new to an imperative programmer, and Haskell doesn't help with its current naming schemes.  I've always thought that "fmap" should just be called "map".  If you were going to qualify anything, it should be "lmap".  History is what it is, I suppose.   This also touches on the mathematical nomenclature issue.  To most people, "Functor" is an awfully scary name that suggests a hard-to-grasp new concept.  When people actually have it explained (or learn to read type expressions as they should) it's just such a no-brainer!
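A few of these stumbling blocks fit in one small sketch, using only the Prelude: partial application, 'read' being dispatched on its return type, and fmap as just "map for any wrapper".

```haskell
-- 1. Partial application: supplying some arguments yields a new function.
add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

addTen :: Int -> Int -> Int
addTen = add3 10          -- a function 'waiting' for two more arguments

main :: IO ()
main = do
  print (addTen 1 2)                 -- prints 13

  -- 2. Dispatch on the *return* type: the same 'read' call site picks
  -- an instance based on what the context asks for.
  print (read "42" :: Int)           -- prints 42
  print (read "[1,2,3]" :: [Int])    -- prints [1,2,3]

  -- 3. fmap is just "map" generalized over any Functor (wrapper).
  print (fmap (+ 1) (Just 5))        -- prints Just 6
  print (fmap (+ 1) [1, 2, 3])       -- prints [2,3,4]
```

The last two lines are the whole Functor "mystery": the same mapping idea applied to different wrappers.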

So far, we haven't considered features of Haskell that set it apart as having real benefits compared to most alternatives.
Amongst these are its fitness for concurrency and the value that laziness has in separating producers and consumers.

Concurrency (and to a lesser extent even wider-area distributed evaluation) is such a big and growing problem that every significant language and application framework vendor is scrabbling to find answers to offer to their developer communities.  Apple has Grand Central Dispatch, a framework for scheduling closures on serial queues and one system managed parallel queue.  Microsoft has its async library and C# language extensions for continuation passing and futures/promises.  

Haskell gets so many of the fundamentals right here, in a way that is very hard to retrofit to other languages.  It is difficult to see how it won’t continue to be held in ever higher regard as its fitter DNA continues to differentiate it in solving the hardest problems.   Moreover, it achieves its benefits quietly and cleanly through not having sold its soul to the devil, even when it was by no means certain how some of the ‘simpler’ issues such as side-effects could ever be accommodated satisfactorily.  Meanwhile the popular imperative languages like Java, C#, C++ and Python are adopting features so rapidly that they are becoming a multi-paradigm mess.  It will be interesting to watch them continue to evolve, but also to see how the folklore develops in terms of how to wield their many features appropriately.   The naturalness of expressing various forms of concurrency in Haskell is of course a different thing to automatic concurrency (still no more than a pipe-dream), yet with Haskell you can freely integrate concurrency of various types with libraries without distorting any of the general idioms of the language or forcing heavy-weight design patterns.  Still, concurrency is unlikely to be an issue that a newbie would find essential to learn anyway, so while Haskell has many advantages here, this is also not an ‘overhead’ thing for learning the language.
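To illustrate that "naturalness", here is a minimal sketch using the standard Control.Concurrent library: a worker thread and a result handoff, with no special framework or design pattern imposed on the surrounding code.

```haskell
-- Two threads communicating over an MVar from the base library.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  box <- newEmptyMVar
  _ <- forkIO (putMVar box (sum [1 .. 100 :: Int]))  -- worker thread
  result <- takeMVar box                             -- blocks until ready
  print result                                       -- prints 5050
```

forkIO threads are extremely lightweight, and the rest of the program remains ordinary Haskell - nothing about the language's idioms had to bend.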

In Haskell, lazy evaluation allows generators to be described at any time.  If you have a function that produces a thing, it's trivial to create a generator for a stream of those things.  This generator will be as easily consumable as any data structure, e.g. a list of things.  You don't have to create a specific interface, e.g. "getNextThing", or worry about the states of enumerators and iterators.  This is a very powerful feature in a language that is essentially about the transformation of values, and it is also representationally powerful in that it allows a fully algebraic style in the code.  The benefits of laziness, then, are that it defers details of usage and conserves generality and reuse.  

Laziness does present some challenges for newbies in several ways.  First, developers are not generally conditioned to think about whether processing occurs ‘in the provider’ or ‘in the consumer’.  This is often unimportant, but at scale it can become important for efficiency reasons (e.g. generating lots of suspensions/lazy thunks in memory).  One of the first places this is likely to be exposed is in discussion around the family of fold functions and which fold is correct in which circumstance.  A second difficulty arises with the subject of infinite data structures, quintessentially lists.  With nothing similar in OO/procedural languages, Haskell can seem a little mysterious here.  One of the biggest challenges of laziness-by-default, however, is the difference it makes to reasoning about time and space.  This is not likely to be an issue that newbies encounter in their pure learning phase, but later, as they tackle real projects with some scale, they are likely to bump into performance issues that require additional knowledge of these subtleties.  In this sense, laziness adds some cognitive overhead in the ‘medium term’. 
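The generator idea above can be sketched in two lines with an infinite list, using only the Prelude: the producer describes the whole stream, and the consumer decides how much of it to force.

```haskell
-- A conceptually infinite stream of squares; nothing is computed
-- until a consumer demands it.
squares :: [Integer]
squares = map (^ 2) [1 ..]

main :: IO ()
main = do
  print (take 5 squares)           -- prints [1,4,9,16,25]
  print (takeWhile (< 50) squares) -- prints [1,4,9,16,25,36,49]
```

No iterator state, no "getNextThing" interface: the stream is consumed exactly like any ordinary list.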

In summary, I think it's probably true that Haskell has more overheads compared to something like Python if you're just tackling small projects, unless these overheads have shrunk to near zero through accrued experience of Haskell and its libraries.  Haskell arguably has features that scale much better for large projects/teams, however.  It will be interesting to hear more reports from groups using Haskell at ‘industrial scale’ in the coming years, assuming the language continues to accrete followers.  With the continued and growing challenges in software engineering arising from scale, parallelism and distribution of data and processes, it will also be interesting to see if Haskell is indeed “doomed to succeed”, or whether the current popular engineering languages can evolve to be ‘good enough’ - and therefore maintain their ascendancy.  I really hope that a handful of little impediments to using Haskell more broadly in general engineering will get addressed.  Improved tooling is a major theme here, including better Haddock, fixing cabal’s DLL-hell issues, and a full-featured and stable IDE.  Removing more of these unnecessary hurdles still leaves a real learning exercise and some real conceptual displacement for OO developers, but as the benefits become more evident the motivation should override the learning-curve issue.