Everyone seems to eventually do an "introduction to monads" in their Haskell ramblings. FWIW, here's my take on it.
A good way to grok monads is to look at the history of the advent of monads in programming - specifically, why and how they were adopted into Haskell.
Haskell is a pure functional language. That means it is fundamentally about using mathematical functions to build up computation. Mathematical (pure) functions cannot produce results that depend on anything other than their formal arguments. Therefore there can be no side-effects, no referencing of state outside of the function, and a function that is passed the same argument values on several invocations must always return the same results.
Pure functions as a basis for programming have huge advantages (though we won't go into these here), but one big disadvantage is this: the world changes around the program and these changes appear to the program as values that change spontaneously. Therefore if you want to "readCurrentTemperature" or "getLineOfTextFromConsole" or "readMousePosition", then you would apparently need a function that returned values independent of any inputs it was given. Taking the console reading example, in Haskell's type language, it would seem that you would want a function:
getLine :: String -- a function that when called, gets a typed line of text from the console as a String value
It would seem that no arguments would be needed simply to receive text from the console, and you would expect to receive the text that the user typed on the line.
It's easy to see that in a pure functional world, this is a big problem. A function with no arguments is a constant - it can return a value sure, but that value cannot change because there are no arguments that could be used to affect the computation of the result. If we allowed this function to return different results on each call, then it would no longer be a pure function. This is anathema in Haskell because the impurity would propagate through the whole program and suddenly Haskell would lose all the advantages it gets from guaranteeing that everything in the program definition is pure.
Another example of this is of course writing to the console. Here, we might expect to provide a String, the text, which will get written out to stdout or somewhere. There's nothing particularly interesting that this function might return (other than some success/failure flag, but it's reasonable to assume success in this case). In Haskell, such a function might be typed:
putStr :: String -> ()
This type expression says that putStr is a function that takes a String and returns a ()… which is called 'unit' and is a value that can carry no useful information. It's fine to think of unit being the same thing as "void" in the C languages. In other words, once the text has been written out, there's nothing useful to return from the function.
On the face of it this function isn't as bad as getLine… you could argue that it just happens that all text maps to the same (unit) value - so in isolation it isn't as intrinsically evil as getLine. Nevertheless it is required to perform a side-effect, and this is completely invisible from the type as we currently have it. Worse though is the fact that Haskell reserves the right to optimise away some/any calls to putStr, because it knows that the result must be of a type with exactly one value (unit) - i.e. putStr looks like a function that ignores its input and always returns a constant value.
Aside from the obvious problem with getLine in isolation, another issue comes with combining actions with side-effects like getLine and putStr. If we want to read a line of text from the console and then output it to the console, we might expect to combine these two actions together into an 'echo' like this:
echo = putStr (getLine) -- get a line of text and then output the string that is obtained
which would then have the type:
echo :: ()
This type says absolutely nothing about the actions that are implied by its definition. It just says that it results in no useful information!
Moreover, there's a deeper problem. One of the important things that Haskell is able to do because of its insistence on purity is to reorder function evaluation, as well as sharing and remembering any constant values in the program. There is nothing in the echo definition that would tell Haskell to always (re)run getLine before running the putStr action (especially if getLine had already been run once and produced a string that looks like it is the only constant output possible from that function!).
By all accounts Haskell languished with the incompatibility of stateful, action-based aspects of programs with its essential, existential requirement for purity for quite a long time in its early development. Eventually however, Philip Wadler (after reading some discourse on monads in a paper by Eugenio Moggi) made a connection between a mathematical object called a "monad" and the problem of encapsulating stateful effects in Haskell. So what is this "monad" thing and why did it appear to solve that problem?
A "monad" (and many would say that it's a shame that we have to use the mathematical name) is simply a object that wraps a value and that allows functions that operate on that inner value to be composed together - BUT those functions must always put the result back into the same kind of wrapper. This is what creates the encapsulation because:
• There's an outer shell around the value, with a distinct type
• You can only get at the inner value by using a monadic combinator to convert a function that operates on the inner value into one that operates on the whole monad (and must return a monadic value)
• It's possible to completely deny raw access to the inner value by refusing to create a function that simply unwraps the value
• It's possible for a given monad to build sequencing of evaluation into the definition of that monad's composition function. This means you can force the order in which IO operations must be performed, for instance.
Any monad has two main defining functions - these are the functions that any type must declare for itself in order to be a monad:
- return - a function to wrap an inner value with the monad type 'wrapper'. This has the type: a -> m a. You could also say that return lifts a value into the monad.
- bind - a function to compose some transformation on the inner value with access and re-wrapping in a monadic type. This has the type: m a -> (a -> m b) -> m b
With these two, you can define a monadic type and provide the means to make new monadic values of this type (wrapping other values 'inside') and the ability to apply functions to the inner value - but only so that the result is another monadic value of the same type.
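As a sketch of that shape, here is a hypothetical single-value wrapper (I'm calling it Box; it isn't part of any standard library) declared as a monad. Note that modern GHC also requires Functor and Applicative instances, and 'return' defaults to Applicative's 'pure':

```haskell
-- A hypothetical single-value wrapper, just to show the shape of a Monad instance.
newtype Box a = Box a deriving (Show, Eq)

instance Functor Box where
  fmap f (Box x) = Box (f x)

instance Applicative Box where
  pure x = Box x                 -- this is 'return': wrap a value in the monad
  Box f <*> Box x = Box (f x)

instance Monad Box where
  Box x >>= f = f x              -- 'bind': unwrap, apply f, and f must re-wrap

-- Example: Box 2 >>= (\x -> Box (x * 10)) evaluates to Box 20
```

Box adds no interesting behaviour of its own, but it shows the contract: bind is the only sanctioned route to the inner value, and the function it applies must hand back another Box.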
What does it mean to have an 'inner' value and a wrapper? Well, this happens all the time. Consider Haskell's Maybe type, for instance. This is Haskell's type for adding a type-safe null value to any other type. The general type is Maybe a, so for a String value, this would be Maybe String. The inner type (payload) here is String, while the wrapper is the Maybe. The outer type provides some extra functionality - in this case Maybe adds the one extra value "Nothing" that the value might take if it doesn't actually hold a value of type String. The Maybe type predates monads in Haskell, but it illustrates the structural wrapping of a value in a simple way.
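Maybe also happens to be a monad in Haskell's standard library, and it shows what the wrapper buys you: bind chains operations on the inner value while the Maybe wrapper silently handles the failure case. (The 'half' function here is invented for illustration.)

```haskell
-- 'half' succeeds only on even numbers, wrapping its result in Maybe.
half :: Int -> Maybe Int
half n = if even n then Just (n `div` 2) else Nothing

-- Chaining with bind: any Nothing along the way short-circuits the rest.
-- Just 20 >>= half >>= half          evaluates to Just 5
-- Just 20 >>= half >>= half >>= half evaluates to Nothing (5 is odd)
```

No step in the chain has to check for Nothing itself; that plumbing lives in Maybe's definition of bind.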
So, back to the initial motivating problem of IO operations...
Wadler realised that these features of a monad were completely appropriate and transformative for the "IO problem".
All IO (reading, writing, input, output, etc.) can be encapsulated by the IO monad. This monad's values describe IO actions that can be performed at some time in the future. These IO actions can be composed together with the IO monad's bind operator in order to create more complex IO actions from more basic ones. However, the result is always an IO action value. Not only does this mean that any code that performs IO is 'honest' about this in its type, but it also keeps all IO bound up inside the IO monad and clearly separated from any pure code. The IO value itself remains completely pure. Furthermore, the IO monad forces IO operations to be performed in the strict sequence that is intended.
So, looking at the problematic getLine and putStr functions, getLine would now have the type:
getLine :: IO String -- a function whose return value represents an IO action that will produce a String
and putStr will have the type:
putStr :: String -> IO () -- a function that creates an IO action (returning nothing of interest) when provided with a String to output
How do you combine these together to make our little echo program?
Well, now you can use the IO monad's bind function. Recollect this has the type m a -> (a -> m b) -> m b
If the first argument of bind is getLine (with type IO String) the "m" will be IO and the "a" will be String. This specializes the bind to be: IO String -> (String -> IO b) -> IO b
The second argument fits the type signature of putStr, making the "b" into (). So we can write the echo program as:
echo = bind getLine putStr -- treating bind as a prefix function for the moment
… or with Haskell's operator form of bind (>>=) we can write:
echo = getLine >>= putStr
… where bind has been specialized to the type: IO String -> (String -> IO ()) -> IO ()
… and the type of echo (which is the result of bind, the last component of the type above) is therefore just IO ()
We are free to continue binding more IO operations onto echo to chain together even more IO operations. For instance:
twoEchos = echo >> echo
will just perform one echo operation (getting a line and writing it out) and then another. By the way >> is like >>= (bind) except that it doesn't use a result from the first monadic value in the production of the resulting monadic value. That's needed here because echo doesn't take an argument and only produces an inner value of (), which we don't need for anything as it represents 'void'.
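To see >>= and >> working together in one pipeline, here is a small sketch (the prompt and greeting strings are invented for illustration): the prompt's result is discarded with >>, while the line read by getLine is passed on with >>=.

```haskell
-- Prompt for a name, then greet: >> discards a result, >>= passes one along.
greet :: IO ()
greet = putStr "Name: " >> getLine >>= \name -> putStr ("Hello, " ++ name)

main :: IO ()
main = greet >> greet   -- run the whole interaction twice, in order
```

Nothing here performs any IO by itself - greet and main are just descriptions of IO actions, composed with the IO monad's combinators.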
So, if we create a big conglomerate IO action by combining lots of smaller actions, when does the IO actually get done? Well, the performing of the IO only starts when the Haskell program is run. The top level of a Haskell program is its "main" function which is typed as "IO ()" - so the entry point of a Haskell program is an IO action itself. Haskell will start performing the contents of any IO actions that are composed up to the main function and the specific definition of the bind operator defined for the IO monad will ensure that the actions are executed precisely in the order that was intended by the program.
The IO monad was a huge success for separating IO operations in Haskell from pure code, but other monads are also used to separate stateful effects and/or enforce sequential execution. For example, the State monad allows arbitrary state to be collected and maintained by a program, the Writer monad allows log-like output to be emitted by code at various points, the STM monad defines transactions, and the list monad (whose interface is extended by the MonadPlus class) describes branching computations that generate multiple values.
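As a hedged sketch of one of these, here is the State monad threading a counter through a computation; this assumes the Control.Monad.State API from the mtl/transformers packages, and the 'labelTwo'/'next' names are invented for the example:

```haskell
import Control.Monad.State

-- Generate two labels, threading an Int counter through via the State monad.
labelTwo :: State Int (String, String)
labelTwo = do
  a <- next
  b <- next
  return (a, b)
  where
    next = do
      n <- get            -- read the current counter
      put (n + 1)         -- advance it for the next caller
      return ("item" ++ show n)

-- runState labelTwo 1  evaluates to (("item1","item2"), 3)
```

The code never mentions the counter explicitly when composing labelTwo; the State monad's bind passes it along behind the scenes, just as the IO monad's bind threads the "real world" along.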
As a programming idiom, monads not only offer an encapsulation of values, but also a way to create embedded domain languages right in Haskell. Any functions that can operate on values inside a specific monad must declare the monad they operate over in their types and only functions 'in' the same monad will compose directly together. Of course, the inner value can remain polymorphic so monads can work like an interface.
Monads are so successful and prevalent in Haskell that the language has grown special syntax to make monadic composition (e.g. bind/>>= and anonymousBind/>>) more convenient. This is the so-called "do notation" and consists of the keyword "do" followed by a block of 'actions' that will be combined with bind or anonymousBind. The echo function could be written this way with do notation:
echo = do
    lineText <- getLine
    putStr lineText
The "do notation" is sometimes called Haskell's embedded imperative language, as it resembles the 'top to bottom' style of programming of other languages.
So, that's my quick introduction to monads. For some reason monad 'introductions' often seem very convoluted. Some people want to talk about the category theory from which the concept was derived and imported into Haskell, but that's an irrelevancy for most people. Most people just need to know that a monad is an encapsulation that preserves the outer wrapping type and allows operations on the inner value to be chained together. Put like that, you are quite likely to have seen monads as a pattern before, used for different purposes in different languages… you just didn't know there was an awkward mathematical name for them :-)
Nevertheless, the real issues arguably lie with the problems of shared mutable state in software and the features that languages may or may not have to limit side-effects and handle change in a rational way. Concurrency and distribution of computing are both looming issues that we're struggling to deal with beyond creating ever more options that developers must learn to wield effectively. In transitioning common languages from a state of being extremely unfit for these new challenges to at least offering some solutions, we are possibly creating quite a mess of patterns, idioms, libraries and folklore which we expect teams of developers to apply with the necessary correctness and consistency.
While Haskell may not be the answer to all of this in the end, it does have some fundamental properties that address these issues head on. Indeed purity and laziness (with purity being the most important) look increasingly essential when code has to be analysed, decomposed, transmitted and evaluated in multiple cores or locations; especially as the complexity of sequence and coordination has to go somewhere. Patterns like monads were essential in Haskell to make the language pragmatic when it already had some of the fundamentals right, whereas we see other languages adopting feature after feature (e.g. lambdas/closures, monads, actors etc) as cool ideas when they still haven't put their houses in order in other more fundamental ways. In other words, monads (or a similar thing like arrows) are essential to Haskell, whereas they are perhaps merely a fun pattern and some 'good practice' in other languages.
One reason that's cited for why the world doesn't just adopt Haskell, given that it seems otherwise to be such a good fit for the emerging new challenges, is that it has some cognitive overhead. It's fascinating, to me at least, to ponder whether this overhead comes from developers having been trained away from Haskell's concept space while learning the ropes elsewhere, or whether there are some essential features that are just genuinely harder to grok (when you strip away the boring issues of positioning and nomenclature) than the current popular stock of languages. I may ponder on that some more in a future blog post.