Variable assignment in Haskell

Haskell for all

Sunday, July 16, 2017: Demystifying Haskell assignment

This post clarifies the distinction between <- and = in Haskell, which sometimes mystifies newcomers to the language. For example, consider the following contrived code snippet:
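The snippet itself did not survive in this copy. A minimal sketch consistent with the description that follows (the names input and output come from the text; the exact layout is an assumption, not the author's verbatim code):

```haskell
main :: IO ()
main = do
    input <- getLine            -- run getLine and bind its result to input
    let output = input ++ "!"   -- define output as a synonym for input ++ "!"
    putStrLn output
    putStrLn output
```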

The above program reads one line of input, and then prints that line twice with an exclamation mark at the end, like this:

Why does the first line use the <- symbol to assign a value to input while the second line uses the = symbol to define output ? Most languages use only one symbol to assign values (such as = or := ), so why does Haskell use two?

Haskell bucks the trend because the = symbol does not mean assignment and instead means something stronger than in most programming languages. Whenever you see an equality sign in a Haskell program that means that the two sides are truly equal. You can substitute either side of the equality for the other side and this substitution works in both directions.

For example, we define output to be equal (i.e. synonymous) with the expression input ++ "!" in our original program. This means that anywhere we see output in our program we can replace output with input ++ "!" instead, like this:

Vice versa, anywhere we see input ++ "!" in our program we can reverse the substitution and replace the expression with output instead, like this:

The language enforces that these sorts of substitutions do not change the behavior of our program (with caveats, but this is mostly true). All three of the above programs have the same behavior because we always replace one expression with another equal expression. In Haskell, the equality symbol denotes true mathematical equality.

Once we understand equality we can better understand why Haskell uses a separate symbol for assignment: <- . For example, let's revisit this assignment in our original program:

input and getLine are not equal in any sense of the word. They don't even have the same type!

The type of input is String :

... whereas the type of getLine is IO String :

... which you can think of as "a subroutine whose return value is a String ". We can't substitute either one for the other because we would get a type error. For example, if we substitute all occurrences of input with getLine we would get an invalid program which does not type check:

However, suppose we gloss over the type error and accept values of type IO String where the program expected just a String . Even then this substitution would still be wrong because our new program appears to request user input twice:

Contrast this with our original program, which only asks for a single line of input and reuses the line twice:

We cannot substitute the left-hand side of an assignment for the right-hand side of the assignment without changing the meaning of our program. This is why Haskell uses a separate symbol for assignment, because assignment does not denote equality.

Also, getLine and input are not even morally equal. getLine is a subroutine whose result may change every time, and to equate getLine with the result of any particular run doesn't make intuitive sense. That would be like calling the Unix ls command "a list of files".

Haskell has two separate symbols, <- and = , because assignment and equality are not the same thing. Haskell just happens to be the first mainstream language that supports mathematical equality, which is why the language requires this symbolic distinction.

Language support for mathematical equality unlocks another useful language feature: equational reasoning . You can use more sophisticated equalities to formally reason about the behavior of larger programs, the same way you would reason about algebraic expressions in math.
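As a small illustration of equational reasoning (the definitions double, viaTwoMaps and viaOneMap are hypothetical and not from the post), the well-known law map f . map g = map (f . g) lets us fuse two list traversals into one. Each step in the comment below only substitutes equals for equals:

```haskell
double :: Int -> Int
double x = x * 2

viaTwoMaps :: [Int] -> [Int]
viaTwoMaps xs = map (+ 1) (map double xs)

--   map (+ 1) (map double xs)
-- = (map (+ 1) . map double) xs     -- definition of (.)
-- = map ((+ 1) . double) xs         -- map f . map g = map (f . g)
viaOneMap :: [Int] -> [Int]
viaOneMap xs = map ((+ 1) . double) xs

main :: IO ()
main = print (viaTwoMaps [1, 2, 3] == viaOneMap [1, 2, 3])
```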

Can we think of <- as an andThen operator, applied in reverse?


Getting started with Haskell

Install the Haskell Platform or cabal + ghc.

ghc is the official Haskell compiler.

Hello World

Put this in a file ( hello_world.hs ). Compile it with ghc hello_world.hs , and run the executable.
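The program itself is missing from these notes. A minimal sketch of what hello_world.hs could contain (the name greeting is an assumption added for illustration):

```haskell
-- hello_world.hs
greeting :: String
greeting = "Hello, world!"

main :: IO ()
main = putStrLn greeting
```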

GHCi is an interpreter for Haskell. It is not quite a read-execute loop like those of other languages, but it's useful.

The = sign declares bindings.
Local bindings with let
Haskell will auto-insert semicolons by a layout rule.
You can bind functions.
Tokens on the line are function arguments
Associativity - use parentheses for compound expressions
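The points above can be sketched as follows. All names here (x, add, fifteen, nested) are hypothetical examples, not taken from the slides:

```haskell
-- Top-level binding: = declares that x is 2.
x :: Integer
x = 2

-- Binding a function: the tokens after the name are its arguments.
add :: Integer -> Integer -> Integer
add arg1 arg2 = arg1 + arg2

-- Local bindings with let; the layout rule separates them by indentation,
-- as if semicolons had been written between them.
fifteen :: Integer
fifteen =
    let y = x + 1
        z = 5
    in  y * z

-- Function application associates to the left,
-- so compound arguments need parentheses.
nested :: Integer
nested = add (add x 3) 4

main :: IO ()
main = print (fifteen, nested)
```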

Haskell is a pure functional language.

No side effects
Deterministic - same result every time it is run with an input
x = 5; x = 6 is an error, since x cannot be changed.
order-independent
This means you can divide by 0, create infinite lists... etc. so long as you're careful that those don't get evaluated.
recursive - bound symbol is in scope within its own definition.

This program will cause an infinite loop (the program "diverges"), because the variable x in main is defined in terms of itself, not in terms of the declaration x = 5 :
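The program referred to is missing here. A sketch under the assumption that it had roughly this shape, rearranged so that the diverging binding is defined but never evaluated (diverges and next are hypothetical names):

```haskell
x :: Integer
x = 5

-- The inner x shadows the top-level x and is defined in terms of itself,
-- so forcing diverges never produces a value.
diverges :: Integer
diverges = let x = x + 1 in x

-- A fresh name refers to the top-level x and terminates.
next :: Integer
next = let y = x + 1 in y

main :: IO ()
main = print next   -- diverges is never evaluated, so the program terminates
```

Because evaluation is lazy, merely defining diverges is harmless; only printing it would hang or abort.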

How can you program without mutable variables?

In C, you use mutable variables to create loops (like a for loop).
Problem : The example recursive factorial implementation in Haskell uses function calls to loop, but those function calls will create stack frames, which will cause Haskell to consume memory.
Solution : Haskell supports optimized tail recursion . Use an accumulator argument to make the factorial call tail recursive.
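A sketch of both versions, assuming the usual textbook definitions (the slides' exact code is not reproduced here):

```haskell
-- Naive version: each call waits on the multiplication, so frames pile up.
factorial :: Integer -> Integer
factorial n = if n > 1 then n * factorial (n - 1) else 1

-- Tail-recursive version: the recursive call is the last thing loop does,
-- and the running product is carried in the accumulator acc.
factorial' :: Integer -> Integer
factorial' n0 = loop 1 n0
  where
    loop acc n
      | n > 1     = loop (acc * n) (n - 1)
      | otherwise = acc

main :: IO ()
main = print (factorial 5, factorial' 5)
```

Without optimization, GHC may still build a chain of unevaluated multiplications in acc; forcing the accumulator with seq or a bang pattern avoids that.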

Guards and where clauses

The pipe ("|") symbol introduces a guard. Guards are evaluated top to bottom; the first True guard wins.
otherwise, defined in the Prelude, evaluates to True.
Where clauses can scope over multiple guards
Convenient for binding variables to use in guards
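A hypothetical example (classify and its thresholds are invented for illustration) in which one where clause scopes over several guards:

```haskell
classify :: Double -> Double -> String
classify weight height
  | bmi < underweight = "under"
  | bmi < normal      = "normal"
  | otherwise         = "over"     -- otherwise is just True
  where
    -- These bindings are visible in every guard above.
    bmi         = weight / height ^ (2 :: Int)
    underweight = 18.5
    normal      = 25.0

main :: IO ()
main = putStrLn (classify 70 1.75)
```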

Variable names

It's conventional in Haskell to denote variants of variables and functions with apostrophes ('). But David Mazieres finds that this can cause difficult-to-find bugs, so he suggests that you use the longer symbol name for the larger scope.

Every expression and binding has a type (it is strongly typed )

The :: operator has the lowest precedence, so you need to parenthesize.

Haskell uses function currying .

Functions are called one argument at a time.
This is equivalent to (add 2) 3
(add 2) returns a function which takes one parameter - the second parameter in adding something.
It's a good idea to declare types of top-level bindings.
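A short sketch of currying, assuming add is defined as a two-argument addition (addTwo is a hypothetical name):

```haskell
add :: Integer -> Integer -> Integer
add x y = x + y

-- Partially applying add yields a function awaiting the second argument.
addTwo :: Integer -> Integer
addTwo = add 2

main :: IO ()
main = do
    print (add 2 3)               -- parsed as (add 2) 3
    print ((add 2) 3)
    print (addTwo 3)
    print (add 2 3 :: Integer)    -- :: binds loosest, hence the parentheses
```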

Defining data types

Types start with capital letters.

Give it a name
Give it a set of constructors
State which type classes it derives (deriving Show lets values of your type be printed, for example).
But, you can have multiple constructors by declaring them with different names.
Constructors additionally don't need to take arguments
Constructors act like functions producing values of their types.
Example in slides
Some useful, parameterized types: Maybe and Either .
You can deconstruct types and bind variables within guards. Example in slides.
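The examples from the slides are not included in these notes. A sketch of the same ideas, with all type and function names (Point, Color, getX, describe, addMaybes) chosen for illustration:

```haskell
-- One constructor taking two arguments.
data Point = Point Double Double
    deriving Show

-- Several constructors, none of which takes arguments.
data Color = Red | Green | Blue
    deriving (Show, Eq)

-- Constructors are deconstructed by pattern matching.
getX :: Point -> Double
getX (Point x _) = x

-- Maybe is a parameterized type: Nothing, or Just a value.
describe :: Maybe Color -> String
describe Nothing  = "no color"
describe (Just c) = "color " ++ show c

-- Pattern guards deconstruct values and bind variables inside a guard.
addMaybes :: Maybe Int -> Maybe Int -> Maybe Int
addMaybes mx my
  | Just x <- mx, Just y <- my = Just (x + y)
  | otherwise                  = Nothing

main :: IO ()
main = do
    print (Point 1 2)
    putStrLn (describe (Just Red))
    print (addMaybes (Just 1) (Just 2))
```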

So common that Haskell has Lists as a predefined type with syntactic sugar. Strings are just lists of Char s.

Bullets from slides

Constructors

Two constructors: x:rest and [] .

[] is the empty list
x:rest is an infix constructor of a variable to be prepended to the head of the rest of the list.

Note on error code:

error is a function whose result can be used at any type; calling it throws an exception. It is intended for programming errors that should never occur.

Other methods

The ++ infix operator is used for concatenation of lists: list1 ++ list2
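A sketch of the list constructors, error, and ++ together (isEmpty and firstOf are hypothetical names):

```haskell
-- Pattern matching on the two list constructors.
isEmpty :: [a] -> Bool
isEmpty []      = True
isEmpty (_ : _) = False

-- A head-like function; an empty argument is treated as a programming error.
firstOf :: [a] -> a
firstOf (x : _) = x
firstOf []      = error "firstOf: empty list"

main :: IO ()
main = do
    print (isEmpty ([] :: [Int]))
    print (firstOf [1, 2, 3 :: Int])
    print (1 : 2 : [] == [1, 2 :: Int])   -- list syntax is sugar for : and []
    print ([1, 2] ++ [3 :: Int])
```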

Parsing with deriving Read and reads

Unfortunately, parsing is more complicated than printing, since the string for an object may not parse correctly or may even be ambiguous.
reads parses and returns a parsed object, along with the rest of the string.
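A sketch of reads with a derived Read instance. The Color type and the parseColor helper are assumptions made for illustration:

```haskell
data Color = Red | Green | Blue
    deriving (Show, Read, Eq)

-- reads returns every possible parse paired with the unconsumed input,
-- so an empty list means failure and several entries mean ambiguity.
parseColor :: String -> Maybe Color
parseColor s = case reads s of
    [(c, rest)] | all (== ' ') rest -> Just c
    _                               -> Nothing

main :: IO ()
main = do
    print (parseColor "Red")
    print (parseColor "Purple")
    print (reads "42 apples" :: [(Int, String)])
```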

Useful tool: Hoogle

A search engine for Haskell and the Haskell libraries. David Mazieres recommends that you make this a keyword in your search bar! Haskell may have a steep learning curve, but it is relatively easy to understand what code is doing (with tools like this).

Example: counting letters

Due to thunks you don't actually have to keep an intermediate list in memory at any point in time (see example in slides)

Function composition

The . infix operator provides function composition: (f . g) x = f (g x) .
The new version doesn't name the argument, which is called point-free programming.
This allows you to apply arguments kind of like right-to-left Unix piping.
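A sketch of the letter-counting example in both styles. The name countLowerCase is an assumption about what the slides used:

```haskell
import Data.Char (isLower)

-- Pointed version: the argument str is named.
countLowerCase :: String -> Int
countLowerCase str = length (filter isLower str)

-- Point-free version: read right to left, like a reversed Unix pipeline.
countLowerCase' :: String -> Int
countLowerCase' = length . filter isLower

main :: IO ()
main = print (countLowerCase "Hello, World", countLowerCase' "Hello, World")
```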

Lambda Extraction

Sometimes you want to name arguments but not the function, which you can through lambdas .
Use a backslash ("\") to introduce a lambda.
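For instance, a lambda can name the argument of a predicate without naming the predicate itself (countLowerAndDigits is a hypothetical example):

```haskell
import Data.Char (isDigit, isLower)

-- The lambda names its argument c; the function it defines stays anonymous.
countLowerAndDigits :: String -> Int
countLowerAndDigits = length . filter (\c -> isLower c || isDigit c)

main :: IO ()
main = print (countLowerAndDigits "abc123XYZ")
```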

Infix vs Prefix notation

If it starts with a lowercase letter, it's a prefix invocation
If it is surrounded by backticks ("`"), it's infix.
Example: add 1 2 == 1 `add` 2.
If you leave out an argument, you create a function that is missing that argument (which can then be applied to a new "first argument").
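A sketch of these notations, assuming add is an ordinary two-argument function (checks is a hypothetical name):

```haskell
add :: Integer -> Integer -> Integer
add x y = x + y

checks :: [Bool]
checks =
    [ add 1 2 == 1 `add` 2             -- backticks make a named function infix
    , (+) 1 2 == (3 :: Integer)        -- parentheses make an operator prefix
    , (`div` 2) 10 == (5 :: Integer)   -- section missing its first argument
    , (10 `div`) 2 == (5 :: Integer)   -- section missing its second argument
    ]

main :: IO ()
main = print (and checks)
```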


How do I modify a variable in Haskell?

Michael Burge

Haskell programmers seem to get by without variable mutation, which is odd if you’re used to C, Javascript, or another imperative language. It seems like you need them for loops, caches, and other state.

This article will go over many of the techniques I use for these purposes.

The Problem

Suppose you want to print an identity matrix:

In an imperative language like Perl, you’ll likely split this into 4 steps:

In Haskell, we run into trouble when we realize there’s no built-in mutating assignment operation:

The expression writeArray array i j k is supposed to be equivalent to Perl’s

Each call to sequence_ executes a list of actions. The do notation in List context is a way of stretching the list generator notation out to multiple lines: Step 2 above is equivalent to

Notice that the indices are one-based ([1..10]) and not zero-based ([0..9]), because I declared the boundaries of the array to be (1,1) and (10,10) here:

The code above compiles but fails to run because writeArray is not defined. There’s no way to define writeArray , because values are immutable in Haskell. We use writeArray in 2 places: When we initialize the array, and when we modify the diagonal. We can initialize the array when we declare it, so that removes the first issue:
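The listing is missing from this copy. A sketch of what such a program could look like, assuming the array package that ships with GHC (the constant size is introduced here for brevity and is not from the article):

```haskell
import Data.Array (Array, array, (!))

size :: Int
size = 10

-- The do block is in List context: it generates every ((i, j), 0) pair.
arr :: Array (Int, Int) Integer
arr = array ((1, 1), (size, size)) $ do
    i <- [1 .. size]
    j <- [1 .. size]
    return ((i, j), 0)

-- The do block is in IO context: it chains dependent actions.
printElement :: (Int, Int) -> IO ()
printElement (i, j) = do
    putStr (show (arr ! (i, j)))
    putStr (if j == size then "\n" else " ")

main :: IO ()
main = sequence_ $ do
    i <- [1 .. size]
    j <- [1 .. size]
    return (printElement (i, j))
```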

In arr and main , the do is in List context because sequence_ and array expect a List. In printElement , the do is in IO context, which allows us to define an action that chains together multiple dependent actions.

If you run this sample, you should see a 10x10 matrix with all zeroes:

Steps 1, 2, and 4 are easy. Our problem is Step 3: We have no way to define writeArray , since the array is immutable.

Mutable Arrays

If arrays are immutable, could we somehow make them mutable?

For arrays specifically, there is a mutable variant IOArray that lets you allocate, read, and write mutable arrays in IO context. It even has a writeArray function that is very similar to the one we need. Here’s how you would use it:

The line let _ = arr :: IOArray (Int,Int) Integer is a way of giving a type signature to arr in do notation.

People have written immutable, mutable, contiguous, automatically-parallelized, GPU-accelerated, and other types of arrays. It’s always a good idea to choose the right data structure when writing a program in any language.

In general, not every useful data structure automatically comes with a mutable variant. You can always write one yourself, but for the rest of this article we’ll assume the array is immutable so that you can use the techniques here more generally.

Mutable References

Data.IORef allows us to create and modify mutable garbage-collected values in IO context. This makes programming in IO context similar to a language like C# or Javascript, if a bit more verbose.

References are normally transparent in imperative languages: C++ has the concept of an lvalue and rvalue because reads and writes are not explicitly written out. Also, stack-allocated declarations look similar to assignments, so it’s not clear that allocation is happening:

Pointers make the allocation and indirect access a little more explicit. However, *x can involve either a read or write depending on the context:

Using IORef , that is equivalent to:

Like pointers, a reference to an integer is different from an integer. I’ve left commented-out type signatures above to show this.

newIORef creates a mutable reference.
readIORef reads the reference
writeIORef changes it to point to a new value. (Old values are automatically garbage-collected.)

All of these need to be in IO context, because reads and writes can’t be reordered.

modifyIORef' will read the value using a reference, apply a function to it, and write the value back. You almost always want to use the strict version modifyIORef' rather than the lazy version modifyIORef , because it has more predictable performance.

The expression arr // [(index, value)] is a new array with an updated value at index. It does not read or write to memory - modifyIORef will do the reading and writing - it simply is the new array. [(index, value)] is a list because (//) allows us to update multiple values at once, though I only update a single value each time here.

One shortcoming of this technique is that it requires your code to be in IO context: main , printElement , and writeArray all use IO . If you want to add mutable references throughout your code, you’ll have to change everything to be in IO context. If you want to actually mutate global variables, this is necessary; but see the ST monad below to avoid this if you only need mutation within a single function.

While this is the most direct way to do what we want, Haskellers usually get by without any mutation. How would that work?

The Haskell Way

We know how to create new arrays, but updating them is tricky. So one way around this is to create a new array, and have later code use it. This is what I would likely write in real code:

Recall that (//) takes a list of updates, so we can calculate all the places that we need to update and pass them in all at once. arr is the original zero matrix, while arr2 is the updated identity matrix. We make sure that printElement takes the array as a parameter, and call it with arr2 .
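The paragraph above can be sketched as follows. This is a reconstruction under the assumption that the array package is used; only arr, arr2 and printElement are names from the text:

```haskell
import Data.Array (Array, listArray, (!), (//))

size :: Int
size = 10

-- The original zero matrix.
arr :: Array (Int, Int) Integer
arr = listArray ((1, 1), (size, size)) (replicate (size * size) 0)

-- (//) returns a new array with the diagonal set; arr itself is unchanged.
arr2 :: Array (Int, Int) Integer
arr2 = arr // [((i, i), 1) | i <- [1 .. size]]

-- The array to print is passed in as a parameter.
printElement :: Array (Int, Int) Integer -> (Int, Int) -> IO ()
printElement a (i, j) = do
    putStr (show (a ! (i, j)))
    putStr (if j == size then "\n" else " ")

main :: IO ()
main = mapM_ (printElement arr2) [(i, j) | i <- [1 .. size], j <- [1 .. size]]
```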

In subsequent sections, we’ll mostly show various ways to implement updateArray using mutation-like constructs. You can assume the rest of the code remains the same.

In Haskell, later variables with the same name “shadow” earlier variables, so we can pretend that we’ve updated the array in-place by giving the new array the same name:

updateArray in the above code is supposed to be similar to this Perl:

Another way of writing it in Haskell that avoids the deep nesting is:

The do notation above is in Identity context, which is a way of chaining together ordinary Haskell values and functions using the do syntax. The compiler should convert it to deeply-nested case statements, or something equivalent.

I needed to use do notation or case because pattern matches are non-recursive. This will produce an infinite loop, because let bindings are recursive by default; so arr will be defined in terms of itself, not the previous array.

There are two shortcomings with using shadowing as a replacement for mutation:

It’s a little verbose to repeat the variable name twice when “reading” and “writing”.
It works when there are a fixed number of updates (10), but not in e.g. a while loop that repeatedly updates a variable.

Function Composition

Look back at this example from Shadowing :

Since we don’t really do anything with the variable other than feed it into the next binding, some Haskellers would remove the variables and compose the functions together:

Or (from Data.Monoid):

This works okay when we have a chain updating 1 variable. You can extend it to more by using a tuple:

Here, we make the indices i and j part of the state, and increment them after each application of updateNext . At the end, we would get the new array, i , and j ; but we only care about the array, so getArray discards i and j .

One shortcoming of this technique is that you have to ensure that the state parameter is always last, to compose the functions. See the State monad for a way of hiding the state parameter.

Tail-recursion

The most powerful way to write high-performance Haskell is to make your state explicit, and call an inner function written in tail-recursive style. I’ll add the variables to the previous example:

One problem with this is that we’re duplicating the same statement 10 times. It’s a little silly to have 18 lines when the Perl is only 3:

The for loop consists of 4 components:

An initial state: my $idx = 0 , and $array
A condition that reads the state: $idx < $size
Statements advancing the state: $idx++ , and $array->[$idx]->[$idx] = 1;
(Optional) A way to convert the state into a return value: $array after the loop.

Here is the most general way to convert a loop into a tail-recursive loop function:
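The function is missing from this copy. A sketch of one possible shape, mapping the four loop components listed above onto four arguments (the signature of loop and the helper identity are assumptions, not the author's code):

```haskell
import Data.Array (Array, listArray, (!), (//))

-- initial state, condition, state advance, conversion to a result
loop :: s -> (s -> Bool) -> (s -> s) -> (s -> r) -> r
loop start continue step finish = go start
  where
    go s
      | continue s = go (step s)     -- tail call: nothing is left to do after it
      | otherwise  = finish s

type Matrix = Array (Int, Int) Integer

-- The state is the pair (index, array); indices are one-based here.
identity :: Int -> Matrix
identity size = loop (1, zeros) continue step snd
  where
    zeros           = listArray ((1, 1), (size, size)) (replicate (size * size) 0)
    continue (i, _) = i <= size
    step (i, arr)   = (i + 1, arr // [((i, i), 1)])

main :: IO ()
main = print (identity 3 ! (2, 2), identity 3 ! (1, 2))
```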

With this, it’s mechanical to replace any simple for loops with a tail-recursive function. The only loops that can’t be converted are those that do something other than pure computation, such as write to a logfile or connect to a database. For those, see Recursive IO below.

Here’s a shorter function that I would be more likely to use in real code:

State Monad

In the Function Composition section, we saw that we can emulate mutable variables by making the state the last parameter in a sequence of functions and composing them.

The State monad is a way of automatically threading this state parameter to all code that uses it. I’ll add the variables back to the previous example:

A value of type State s a is a value of type a that implicitly reads and writes a state parameter of type s . You almost always want the strict version. Here is the above code written to use State :

The pattern match (i, arr) <- get pattern matches the state that is implicitly being passed around. put (i+1, arr') passes a new state to the next statement.
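A sketch of the get/put pattern just described. It assumes the strict State from the transformers package that ships with GHC (the article may have used mtl instead); updateNext is named in the text, while Matrix and identity are hypothetical:

```haskell
import Control.Monad (replicateM_)
import Control.Monad.Trans.State.Strict (State, execState, get, put)
import Data.Array (Array, listArray, (!), (//))

type Matrix = Array (Int, Int) Integer

-- Read the implicit state, set one diagonal entry, pass the new state along.
updateNext :: State (Int, Matrix) ()
updateNext = do
    (i, arr) <- get
    let arr' = arr // [((i, i), 1)]
    put (i + 1, arr')

-- Run updateNext size times from the zero matrix and keep only the array.
identity :: Int -> Matrix
identity size = snd (execState (replicateM_ size updateNext) (1, zeros))
  where
    zeros = listArray ((1, 1), (size, size)) (replicate (size * size) 0)

main :: IO ()
main = print (identity 3 ! (3, 3), identity 3 ! (1, 3))
```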

Take a moment to see how the tuple is being passed along. It’s entirely equivalent to the let binding example.

This can also be written much shorter:

You can also mix tail recursive style to emulate loops that change mutable variables:

The IORef example had the disadvantage that it required code to be in IO context to read or write references. Code in ST context also allows mutable variables, and can be embedded anywhere. Instead of an IORef , you’ll want an STRef .

The difference between ST and IO is that ST only allows references to be local to the runST call. Mutable local variables have no external side effects. They are completely safe to use, which is why we can embed them anywhere. The compiler ensures that the references never escape.
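A minimal sketch of a mutable local variable in ST. The function sumST is a hypothetical example; the STRef functions mirror the IORef ones described earlier:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- From the outside this is an ordinary pure function;
-- the reference cannot escape the runST call.
sumST :: [Integer] -> Integer
sumST xs = runST $ do
    ref <- newSTRef 0
    mapM_ (\x -> modifySTRef' ref (+ x)) xs
    readSTRef ref

main :: IO ()
main = print (sumST [1 .. 10])
```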

IO lets you use global variables, write files, connect to databases, and all sorts of other things.

Implicit Parameters

In Shadowing , we learned that you can sometimes simulate mutation by creating a new value with the same name, rather than updating the existing value. In State Monad , we learned that you can make this more composable by hiding the parameters and implicitly threading them along. ImplicitParams is a language extension that allows you to thread hidden parameters anywhere in your code.

Their chief advantage is that you can add them to any existing implementation code and it will still compile. You may need to update type signatures, but in well-structured code you’ll often only have a single program-specific type alias to change.

I’ve used them for global configuration before: Imagine connecting to a database on program startup to retrieve configuration information, which is accessible throughout most of the program. You usually wouldn’t use them for inner loops or computation, so I’ve changed the example for this section.

Here’s the current code:

The length arr > 10 is slowing down the calculation. Add the implicit parameter:

The important thing above is that we added the implicit parameter without needing to change any of the code in the calculation: We only changed the type signature, and set ?length once near the beginning of our program. Of course, now that it’s available, we can temporarily fix the performance problem by using it:

Recursive IO

You can mix recursive style with IORef to write code that feels like C in Haskell. Here’s the plain version:

And here are a few variations with some helper functions:

Use when instead of having an empty else clause:

Use replicateM_ to repeat the action rather than hand-coding a recursive loop:

Use modifyIORef' rather than reading and writing:

All-in-all, it’s not the most elegant, but it gets the job done.

You’ve seen a lot of different ways to do the same thing. Hopefully some of them are new, and help you port your own imperative thinking and code over to Haskell. Send me a message on Twitter if this helped at all.

The best one-liner to create the array is probably this:

Thanks to Tome Jaguar for the concept and code used in the Mutable Arrays section!

Created at: Aug 15, 2017


About the Author

Michael Burge has years of experience developing complex software, including custom compilers, data stores for "big data", and machine learning models.


A brief introduction to Haskell

Haskell is:

A language developed by the programming languages research community.
A lazy, purely functional language (that also has imperative features such as side effects and mutable state, along with strict evaluation)
Born as an open source vehicle for programming language research
One of the youngest children of ML and Lisp
Particularly useful for programs that manipulate data structures (such as compilers and interpreters ), and for concurrent/parallel programming

Inspired by the article Introduction to OCaml , and translated from the OCaml by Don Stewart.

1990: Haskell 1.0
1991: Haskell 1.1
1993: Haskell 1.2
1996: Haskell 1.3
1997: Haskell 1.4
1998: Haskell 98
2000-2006: Period of rapid language and community growth
~2007: Haskell Prime
2009: Haskell 2010

Implementations :

Haskell features

Has some novel features relative to Java (and C++).

Immutable variables by default (mutable state programmed via monads)
Pure by default (side effects are programmed via monads)
Lazy evaluation : results are only computed if they're required (strictness optional)
Everything is an expression
First-class functions: functions can be defined anywhere, passed as arguments, and returned as values.
Both compiled and interpreted implementations available
Full type inference -- type declarations optional
Pattern matching on data structures -- data structures are first class!

Parametric polymorphism

Bounded parametric polymorphism

These are all conceptually more advanced ideas .

Compared to similar functional languages, Haskell differs in that it has support for:

Lazy evaluation
Pure functions by default
Monadic side effects
Type classes
Syntax based on layout

The GHC Haskell compiler, in particular, provides some interesting extensions:

Generalised algebraic data types
Impredicative type system
Software transactional memory
Parallel, SMP runtime system

Read the language definition to supplement these notes. For more depth and examples see the Haskell wiki .

Interacting with the language

Haskell is both compiled and interpreted. For exploration purposes, we'll consider interacting with Haskell via the GHCi interpreter:

expressions are entered at the prompt
newline signals end of input

Here is a GHCi session, starting from a UNIX prompt.

Here the incredibly simple Haskell program let x = 3 + 4 is compiled and loaded, and available via the variable x .

We can ask the system what type it automatically inferred for our variable. x :: Integer means that the variable x "has type" Integer , the type of unbounded integer values.

A variable evaluates to its value.

The variable x is in scope, so we can reuse it in later expressions.

Local variables may be bound using let , which declares a new binding for a variable with local scope.

Alternatively, declarations typed in at the top level are like an open-ended let:

Notice that type inference infers the correct type for all the expressions, without us having to ever specify the type explicitly.

Basic types

There is a range of basic types, defined in the language Prelude .

For example:

These types have all the usual operations on them, in the standard libraries .

The Prelude contains the core operations on basic types. It is imported by default into every Haskell module. For example:

Learn the Prelude well. Less basic functions, such as those for data structures like List, Array and finite maps, are found in the standard libraries.

To use functions from these modules you have to import them, or in GHCi, refer to the qualified name, for example to use the toUpper function on Chars:

In a source file, you have to import the module explicitly:

Overloading

Haskell uses typeclasses to methodically allow overloading. A typeclass describes a set of functions, and any type which provides those functions can be made an instance of that class. This avoids the syntactic redundancy of languages like OCaml.

For example, the function * is a method of the typeclass Num , as we can see from its type:

Which can be read as "* is a polymorphic function, taking two values of some type 'a', and returning a result of the same type, where the type 'a' is a member of the class Num".

This means that it will operate on any type in the Num class, of which the following types are members: Double , Float , Int , Integer . Thus:

or on integers:

The type of the arguments determines which instance of * is used. Haskell also never performs implicit coercions; all coercions must be explicit. For example, if we try to multiply values of two different types, then the type check against * :: Num a => a -> a -> a will fail.

To convert 5 to a Double we'd write:
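The session output is missing here. A sketch of an explicit conversion using fromIntegral from the Prelude (n and scaled are hypothetical names):

```haskell
n :: Int
n = 5

-- n * 2.5 alone would be rejected: Int and Double are different types.
scaled :: Double
scaled = fromIntegral n * 2.5

main :: IO ()
main = print scaled
```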

Why bother -- why not allow the system to implicitly coerce types? Implicit type conversions by the system are the source of innumerable hard-to-find bugs in languages that support them, and make reasoning about a program harder, since you must apply not just the language's semantics, but an extra set of coercion rules.

Note that if we leave off the type signatures, however, Haskell will helpfully infer the most general type:

Expressions

In Haskell, expressions are everything. There are no pure "statements" like in Java/C++. For instance, in Haskell, if - then - else is a kind of expression, and results in a value based on the condition part.

Local bindings

In Haskell let allows local declarations to be made in the context of a single expression.

is analogous to:

in C, but the Haskell variable x is given a value that is immutable (can never change).

When you declare a new variable, Haskell automatically allocates that value for you -- no need to explicitly manage memory. The garbage collector will then collect any unreachable values once they go out of scope.

Advanced users can also manage memory by hand using the foreign function interface.

Lists are ... lists of Haskell values. Defining a new list is trivial, easier than in Java.

This automatically allocates space for the list and puts in the elements. Haskell is garbage-collected like Java so no explicit de-allocation is needed. The type of the list is inferred automatically. All elements of a list must be of the same type.

Notice how the function call concat [ "f" , "g" ] does not require parentheses to delimit the function's arguments. Haskell separates arguments with whitespace rather than commas:

You don't need parentheses for function application in Haskell: sin 0.3
Multiple arguments can be passed in one at a time (curried) which means they can be separated by spaces: max 3 4 .

Lists must be uniform in their type ("homogeneous").

Here we tried to build a list containing a Char and a Boolean, but the list constructor , [] , has type:

meaning that all elements must be of the same type, a . (For those wondering how to build a list of heterogeneous values, you would use a sum type ):
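A sketch of such a sum type (Item and its constructors C and B are hypothetical names):

```haskell
-- Each constructor wraps one of the types we want to mix.
data Item = C Char | B Bool
    deriving (Show, Eq)

-- Every element has type Item, so the list is homogeneous again.
items :: [Item]
items = [C 'x', B True]

main :: IO ()
main = print items
```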

List operations are numerous, as can be seen in the Data.List library .

Pattern matching

Haskell supports pattern matching on data structures. This is a powerful language feature, making code that manipulates data structures incredibly simple. The core language feature for pattern matching is the case expression:

The case forces x (the scrutinee) to match pattern h : t , a list with head and tail, and then we extract the head, h . Tail is similar, and we can use a wildcard pattern to ignore the part of the pattern we don't care about:
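The session snippets are missing here. A sketch of head and tail written with case (firstElem and restElems are hypothetical names, and the explicit empty-list branches are an addition so that the match is exhaustive):

```haskell
firstElem :: [a] -> a
firstElem x = case x of
    h : _ -> h                               -- bind the head, ignore the tail
    []    -> error "firstElem: empty list"

restElems :: [a] -> [a]
restElems x = case x of
    _ : t -> t                               -- ignore the head, bind the tail
    []    -> error "restElems: empty list"

main :: IO ()
main = print (firstElem [1, 2, 3 :: Int], restElems [1, 2, 3 :: Int])
```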

Tuples are fixed length structures, whose fields may be of differing types ("heterogeneous"). They are known as product types in programming language theory.

Unlike the ML family of languages, Haskell uses the same syntax for the value level as on the type level. So the type of the above tuple is:

All the data mentioned so far are immutable - it is impossible to change an entry in an existing list, tuple, or record without implicitly copying the data! Also, all variables are immutable. By default Haskell is a pure language. Side effects, such as mutation, are discussed later.

Here is a simple recursive factorial function definition.

The function name is fac , and the argument is n . This function is recursive (and there is no need to specially tag it as such, as you would in the ML family of languages).

When you apply (or invoke) the fac function, you don't need any special parentheses around the argument. Note that there is no return statement; instead, the value of the whole body-expression is implicitly what gets returned.

Functions of more than one argument may be defined:

Other important aspects of Haskell functions:

Functions can be defined anywhere in the code via lambda abstractions :

Or, identical to let f x = x + 1 :

Anonymous functions like this can be very useful. Also, functions can be passed to and returned from functions. For example, the higher order function map , applies its function argument to each element of a list (like a for-loop):

In Haskell, we can use section syntax for more concise anonymous functions:

Here map takes two arguments, the function ( ^ 2 ) :: Integer -> Integer , and a list of numbers.
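A sketch of the lambda and section forms side by side (the binding names are hypothetical):

```haskell
-- A lambda bound to a name; identical to: increment x = x + 1
increment :: Integer -> Integer
increment = \x -> x + 1

-- map with an explicit lambda ...
squaresLambda :: [Integer]
squaresLambda = map (\x -> x ^ 2) [1 .. 5]

-- ... and with the more concise section syntax.
squaresSection :: [Integer]
squaresSection = map (^ 2) [1 .. 5]

main :: IO ()
main = print (increment 1, squaresLambda, squaresSection)
```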

Currying is a method by which function arguments may be passed one at a time to a function, rather than passing all arguments in one go in a structure:

The type of comb, Num a => a -> a -> a , can be rewritten as Num a => a -> ( a -> a ) . That is, it takes a single argument of some numeric type a , and returns a function that takes another argument of that type!

Indeed, we can give comb only one argument, in which case it returns a function that we can later use:
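The definition of comb is missing from this copy. A sketch under the assumption that it computes binomial coefficients recursively (comb10 is a hypothetical name; current GHC versions also require an Eq constraint, because Num no longer implies Eq):

```haskell
comb :: (Eq a, Num a) => a -> a -> a
comb n m
  | m == 0 || m == n = 1
  | otherwise        = comb (n - 1) m + comb (n - 1) (m - 1)

-- Supplying only the first argument returns a function of the second.
comb10 :: Integer -> Integer
comb10 = comb 10

main :: IO ()
main = print (comb (4 :: Integer) 2, comb10 3)
```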

Mutually recursive functions may be defined in the same way as normal functions:
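The example is missing here. A sketch of two mutually recursive functions over lists, each with an empty and a non-empty case (evens and odds are hypothetical names):

```haskell
-- Elements at positions 0, 2, 4, ...
evens :: [a] -> [a]
evens []       = []
evens (x : xs) = x : odds xs

-- Elements at positions 1, 3, 5, ...
odds :: [a] -> [a]
odds []       = []
odds (_ : xs) = evens xs

main :: IO ()
main = putStrLn (evens "abcde" ++ " " ++ odds "abcde")
```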

This example also shows a pattern match with multiple cases, either empty list or nonempty list. More on patterns now.

Patterns make function definitions much more succinct, as we just saw.

In this function definition, [] and ( x : xs ) are patterns against which the value passed to the function is matched. The match occurs on the structure of the data -- that is, on its constructors .

Lists are defined as:

That is, a list of some type a has type [ a ] , and it can be built two ways:

either the empty list, []
or an element consed onto a list, such as 1 : [] or 1 : 2 : 3 : [] .
For the special case of lists, Haskell provides the syntax sugar: [ 1 , 2 , 3 ] to build the same data.

Thus, [] matches against the empty list constructor, and ( x : xs ) matches against the cons constructor, binding variables x and xs to the head and tail components of the list.
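To make the two list patterns concrete, here is a small sketch; len and firstOr are hypothetical helpers written for this illustration, not functions from the text.

```haskell
-- Length of a list, defined by matching on the two list constructors.
len :: [a] -> Int
len []       = 0           -- matches the empty-list constructor
len (_ : xs) = 1 + len xs  -- matches cons; xs is bound to the tail

-- Return the head of a list, or a default if the list is empty.
firstOr :: a -> [a] -> a
firstOr def []      = def
firstOr _   (x : _) = x    -- x is bound to the head
```

Because [ 1 , 2 , 3 ] is sugar for 1 : 2 : 3 : [] , the call len [1, 2, 3] goes through the cons case three times before reaching [] .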

Remember that case is the syntactic primitive for performing pattern matching (pattern matching in let bindings is sugar for case ). Also, the first successful match is taken if more than one pattern matches:

Warnings will be generated at compile time if patterns don't cover all possibilities, or contain redundant branches.

An exception will be thrown at runtime if a pattern match fails:

As we have seen, patterns may be used in function definitions. For example, this looks like a function of two arguments, but it's a function of one argument which matches a pair pattern.

Immutable declarations


An important feature of let-defined variables in Haskell (and some other functional languages) is that they cannot change their value later.
This greatly helps in reasoning about programs: we know the variable's value is fixed.
Smalltalk also forces method arguments to be immutable; C++'s const and Java's final on fields have a similar effect.

Here's the one that will mess with your mind: the same thing as above but with the declarations typed into GHCi. (The GHCi environment is conceptually an open-ended series of lets which never close.)

Higher order functions

Haskell, like ML, makes wide use of higher-order functions: functions that either take other functions as argument or return functions as results, or both. Higher-order functions are an important component of a programmer's toolkit.

They allow "pluggable" programming by passing in and out chunks of code.
Many new programming design patterns are possible.
It greatly increases the reusability of code.
Higher-order + Polymorphic = Reusable

The classic example of a function that takes another function as argument is the map function on lists. It takes a list and a function and applies the function to every element of the list.

The lower case variables in the type declaration of map are type variables , meaning that the function is polymorphic in that argument (can take any type).

Perhaps the simplest higher-order function is the composer, in mathematics expressed as f ∘ g . It takes two functions and returns a new function which is their composition:

This function takes three arguments: two functions, f and g , and a value, x . It then applies g to x , before applying f to the result. For example:
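A sketch of the composer as described (g is applied first, then f) could look like this; addOne, double and example are hypothetical names introduced only for the illustration.

```haskell
-- The composer: apply g to x, then apply f to the result.
compose :: (b -> c) -> (a -> b) -> a -> c
compose f g x = f (g x)

-- Hypothetical helpers for the example.
addOne :: Int -> Int
addOne x = x + 1

double :: Int -> Int
double x = x * 2

-- double runs first (giving 10), then addOne (giving 11).
example :: Int
example = compose addOne double 5
```

The Prelude operator (.) behaves the same way, so (addOne . double) 5 is also 11.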

As we have seen before, functions are just expressions so can also be immediately applied after being defined:

Note how Haskell allows the infix function . to be used in prefix form, when wrapped in parentheses.

Currying is an important concept of functional programming; it is named after the logician Haskell Curry , after whom the languages Haskell and Curry are named! Multi-argument functions as defined thus far are curried; let's look at what is really happening.

Here is a two-argument function defined in our usual manner.

Here is another completely equivalent way to define the same function:

The main observation is myadd is a function returning a function, so the way we supply two arguments is

Invoke the function, get a function back
Then invoke the returned function passing the second argument.
Our final value is returned, victory.
( myadd 3 ) 4 is an inlined version of this where the function returned by myadd 3 is not put in any variable

Here is a third equivalent way to define myadd, as an anonymous function returning another anonymous function.

With currying, all functions "really" take exactly one argument. Currying also naturally arises when functions return functions, as the map application above showed. Multiple-argument functions should always be written in curried form; all the library functions are curried.
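The three equivalent definitions discussed above might be sketched as follows. The name myadd comes from the text; myadd2, myadd3 and addThree are assumed names used to keep the variants apart.

```haskell
-- The usual two-argument definition.
myadd :: Int -> Int -> Int
myadd x y = x + y

-- The same function, written as one argument returning a function.
myadd2 :: Int -> (Int -> Int)
myadd2 x = \y -> x + y

-- An anonymous function returning another anonymous function.
myadd3 :: Int -> Int -> Int
myadd3 = \x -> \y -> x + y

-- Supplying one argument gives back a function we can keep and use later.
addThree :: Int -> Int
addThree = myadd 3
```

Here addThree 4, (myadd 3) 4 and myadd 3 4 all evaluate to 7.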

Note thus far we have curried only two-argument functions; in general, n-argument currying is possible. Functions can also take pairs as arguments to achieve the effect of a two-argument function:

So, either we can curry or we can pass a pair. We can also write higher-order functions to switch back and forth between the two forms.

Look at the types: these mappings in both directions in some sense "implement" the well-known isomorphism on sets: A * B -> C = A -> B -> C
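As a sketch of the two conversions, here are hand-written versions under the assumed names myCurry and myUncurry (the Prelude already provides curry and uncurry with the same types); addPair is a hypothetical pair-taking function.

```haskell
-- Turn a function on pairs into a curried function.
myCurry :: ((a, b) -> c) -> a -> b -> c
myCurry f x y = f (x, y)

-- Turn a curried function into a function on pairs.
myUncurry :: (a -> b -> c) -> (a, b) -> c
myUncurry f (x, y) = f x y

-- A function of one argument that matches a pair pattern.
addPair :: (Int, Int) -> Int
addPair (x, y) = x + y
```

Going one way and then the other gives back a function that behaves like the original: myUncurry (myCurry addPair) (1, 2) is 3, just like addPair (1, 2).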

A bigger example

Here is a more high-powered example of the use of currying.

Here is an analysis of this recursive function, for the arbitrary 2-element list [x1,x2], the call

reduces to (by inlining the body of fold):

which in turn reduces to:

From this we can assert that the general result returned from foldr f [ x1 , x2 , ... , xn ] y is f x1 ( f x2 ( ... ( f xn y ) ... )) . Currying allows us to specialize foldr to a particular function f, as with prod above.
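The fold analysed here takes its arguments in the order function, list, seed, which differs from the Prelude's foldr. A sketch with that argument order, under the assumed name foldRight, together with one plausible definition of prod (the original is not shown):

```haskell
-- A right fold with the argument order used in this section:
-- function first, then the list, then the seed value.
foldRight :: (a -> b -> b) -> [a] -> b -> b
foldRight _ []       y = y
foldRight f (x : xs) y = f x (foldRight f xs y)

-- Currying lets us fix f now and supply the list and seed later.
-- Assumption: prod multiplies the elements together.
prod :: [Integer] -> Integer -> Integer
prod = foldRight (*)
```

For a two-element list, foldRight (-) [10, 3] 1 unfolds to 10 - (3 - 1), which is 8, matching the nesting f x1 (f x2 y). Similarly, prod [2, 3, 4] 1 is 24.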

Proving program properties by induction

We should in fact be able to prove this property by induction. It's easier if we reverse the numbering of the list.

Lemma . foldr f [ xn , ... , x1 ] y evaluates to f xn ( f xn-1 ( ... ( f x1 y ) ... )) for n greater than 0.

Proof . Proceed by induction on the length of the list [ xn , .. , x1 ] .

Base Case n=1, i.e. the list is [x1]. The function reduces to f x1 ( foldr f [] y ) which reduces to f x1 y as hypothesized.

Induction Step . Assume

it matches the pattern with x being xn+1 and xs being [ xn , ... , x1 ] . Thus the recursive call is

which by our inductive assumption reduces to

And, given this result for the recursive call, the whole function then returns

which is what we needed to show. QED

The above implementation is inefficient in that f is explicitly passed to every recursive call. Here is a more efficient version with identical functionality.

This function also illustrates how functions may be defined in a local scope, using where . Observe 'go' is defined locally but then exported since it is the return value of f.

Question: How does the return value 'go' know where to look for k when it's called?

summate is just go but somehow it "knows" that k is ( + ) , even though k is undefined at the top level:

go in fact knew the right k to call, so it must have been kept somewhere: in a closure . At function definition point, the current values of variables not local to the function definition are remembered in a structure called a closure. Function values in Haskell are thus really a pair consisting of the function (pointer) and the local environment, in a closure.

Without making a closure, higher-order functions will do awkward things (such as binding to whatever 'k' happens to be in scope). C can pass and return function pointers, but all of its functions are defined at the top level, so they carry no closures; Java and C++ had the same limitation before they gained lambda expressions.
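To see a closure at work, consider this sketch. The names go, k and summate follow the discussion above; foldWith and the seed name z are assumptions, since the original listing is not shown.

```haskell
-- go is defined locally and refers to k and z, which are not its own
-- arguments. The function value returned by foldWith therefore carries
-- k and z along with it in a closure.
foldWith :: (a -> b -> b) -> b -> [a] -> b
foldWith k z = go
  where
    go []       = z
    go (y : ys) = y `k` go ys

-- summate is just go, with k remembered as (+) and z as 0.
summate :: [Integer] -> Integer
summate = foldWith (+) 0
```

No k exists at the top level, yet summate [1, 2, 3] still evaluates to 6.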

Loading source from a file

You should never type large amounts of code directly into GHCi! It's impossible to fix errors. Instead, you should edit in a file. Using any editor, save each group of interlinked functions in a separate file, for example "A.hs". Then, from GHCi type:

This will compile everything in the file.

Haskell has the show function.

It simply returns a string representation for its arguments.

We have generally been ignoring the type system of Haskell up to now. It's time to focus on typing in more detail.

Type Declarations

Haskell infers types for you, but you can add explicit type declarations if you like.

You can in fact put type assertions on any variable in an expression to clarify what type the variable has:

Type synonyms

You can also make up your own name for any type. To do this, you must work in a separate file and load it into GHCi using the ":load A.hs" command.

Working from GHCi:

Polymorphic Types and Type Inference

Since id was not used as any type in particular, the type of the function is polymorphic ("many forms").

t is a type variable, meaning it stands for some arbitrary type.
Polymorphism is really needed with type inference -- inferring Int -> Int would not be completely general.

The form of polymorphism in Haskell is, to be precise, parametric polymorphism. The type above is parametric in t : what comes out is the same type as what came in. Generics is another term for parametric polymorphism used in some communities.

Before version 5, Java had no parametric polymorphism, only object polymorphism (unfortunately this is often just called polymorphism by some writers), in which a subclass object can fit into a superclass-declared variable.
To get the effect of parametric polymorphism there, you declared the variable to be of type Object, but you had to cast when you got the value out, which requires a run-time check.
Since JDK 1.5 (Java 5), Java has had parametrically polymorphic types, called generics.

The general intuition to have about the type inference algorithm is everything starts out as having arbitrary types, t, u, etc, but then the use of functions on values generates constraints that "this thing has the same type as that thing".

Use of type-specific operators obviously restricts polymorphism:

When a function is defined via let to have polymorphic type, every use can be at a different type:
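A minimal sketch of this, using the hypothetical names identity and pairUp:

```haskell
-- identity is bound by let and gets the polymorphic type t -> t,
-- so the two uses below can pick different types for t.
pairUp :: (Int, Bool)
pairUp =
  let identity x = x
  in (identity 3, identity True)
```

The first use instantiates t to Int and the second to Bool; pairUp is (3, True).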

Algebraic Data Types

Algebraic data types in Haskell are the analogue of union/variant types in C/Pascal. Following in the Haskell tradition of lists and tuples, they are not mutable. Haskell data types must be declared. Here is a really simple algebraic data type declaration to get warmed up. Remember to write this in a separate file and load it into GHCi:

Three constructors have been defined. These are now official constants. Constructors must be capitalized, and variables must be lower-case in Haskell.

So we can type check them, but can't show them yet. Let's derive the typeclass Show for our data type, which generates a 'show' function for our data type, which GHCi can then use to display the value.

The previous type is only an enumerated type. Much more interesting data types can be defined. Remember the (recursive) list type:

This form of type has several new features:

As in C/Pascal, the data types can have values and they can be recursively defined, plus,
Polymorphic data types can be defined; a here is a type argument.
Note how there is no need to use pointers in defining recursive variant types. The compiler does all that mucking around for you.
Also note how ( : ) , the constructor, can be used as a function.

We can define trees rather simply:

Patterns automatically work for new data types.
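The tree declaration is not reproduced here, so the following is one possible shape, offered as a sketch; Tree, Leaf, Node, size, inOrder and sample are all assumed names.

```haskell
-- A polymorphic, recursive binary tree. No pointers are needed.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq)

-- Patterns on the new constructors work just like list patterns.
size :: Tree a -> Int
size Leaf         = 0
size (Node l _ r) = size l + 1 + size r

-- Flatten a tree, left subtree first.
inOrder :: Tree a -> [a]
inOrder Leaf         = []
inOrder (Node l x r) = inOrder l ++ [x] ++ inOrder r

sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)
```

With this sample, size sample is 3 and inOrder sample is [1,2,3].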

Record Declarations

Records are data types with labels on fields. They are very similar to structs of C/C++. Their types are declared just like normal data types, and can be used in pattern matches.
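As an illustration, here is a sketch of a record type; Person, its fields, and the helper functions are invented for this example.

```haskell
-- A record: a data type whose fields carry labels.
data Person = Person
  { name :: String
  , age  :: Int
  } deriving (Show, Eq)

alice :: Person
alice = Person { name = "Alice", age = 30 }

-- Field labels can be used in pattern matches.
greeting :: Person -> String
greeting Person { name = n } = "Hello, " ++ n

-- Record update syntax builds a new value; the old one is unchanged.
birthday :: Person -> Person
birthday p = p { age = age p + 1 }
```

Each label also acts as an accessor function, so age (birthday alice) is 31 while age alice is still 30.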

Imperative Programming

Haskell and OCaml differ on imperative programming: OCaml mixes pure and impure code, while Haskell separates them statically.

The expressions and functions for I/O, mutable states, and other side effects have a special type. They enjoy a distinguished status: they are I/O instructions, and the entry point of each complete program must be one of them. The following honours this distinction by using the word command for them (another popular name is action ), though they are also expressions, values, functions.

Commands have types of the form IO a , which says it takes no parameter and it gives an answer of type a . (I will carefully avoid saying it “returns” type a , since “return” is too overloaded.) There are also functions of type b -> IO a , and I will abuse terminology and call this a command needing a parameter of type b , even though the correct description should be: a function from b to commands.

The command for writing a line to Standard Output is

It outputs the string parameter, plus linebreak. And since there is no answer to give, the answer type is the most boring () .

At first, using output commands at the prompt is as easy as using expressions.

You can also write a compound command with the >> operator.

The fun begins when you also take input. The command for reading one line from Standard Input is:

Note that the type is not String or () -> String . In a purely functional language, such types promise that all calls using the same parameter yield exactly the same string. This is of course what an input command cannot promise. If you read two lines, they are very likely to be different. The type IO String does not promise to give the same string all the time. (It only promises to be the same command all the time, a rather "duh" one.) But this poses a question: how do we get at the line it reads?

A non-solution is to expect an operation stripIO :: IO a -> a . What's wrong with this strip-that-IO mentality is that it asks to convert a command, which gives different answers at different calls, into a pure function, which gives the same answer at different calls. Contradiction!

But you can ask for a weaker operation: how to pass the answer on to subsequent commands (e.g., output commands) so they can use it. A moment of thought reveals that this is all you ever need. The operator sought is

It builds a compound command from two commands, the first one of which takes no parameter and gives an answer of type a , and the second of which needs a parameter of type a . You guessed it: this operator extracts the answer from the first command and passes it to the second. Now you have some way to use the answer!

Here is the first example. Why don't we read a line and immediately display it? getLine answers a string, and putStrLn wants a string. Perfect match!

But more often you want to output something derived from the input, rather than the input itself verbatim. To do this, you customize the second command to smuggle in the derivation. The trick of anonymous functions is very useful for this:

You will also want to give derived answers, especially if you write subroutines to be called from other code. This is facilitated by the command that takes a parameter and simply gives it as the answer (it is curiously named return ):

For example here is a routine that reads a line and answers a derived string, with a sample usage:
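The routine itself is not reproduced here. A sketch of the idea, with the hypothetical names shout, readShout and echoShout, keeps the derivation in a pure function and uses return to give the derived string as the answer:

```haskell
-- The pure derivation: no I/O involved.
shout :: String -> String
shout s = s ++ "!"

-- Read a line and answer a derived string.
readShout :: IO String
readShout = getLine >>= \line -> return (shout line)

-- Sample usage: pass the derived answer on to an output command.
echoShout :: IO ()
echoShout = readShout >>= putStrLn
```

Running echoShout at the prompt waits for a line of input and then prints it with an exclamation mark appended.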

Some programmers never use Standard Input. Reading files is more common. One command for this is:

The parameter specifies the file path. Let us read a file and print out its first 10 characters (wherever available). Of course please change the filename to refer to some file you actually possess.

Do not worry about slurping up the whole file into memory; readFile performs a magic of pay-as-you-go reading.

A while ago I showed the >> operator for compound commands without elaboration. I can now elaborate it: it merely uses >>= in a way that throws away the first answer:
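A sketch of that definition, under the assumed name andThen and written for any monad so that it can also be tried on pure values (for commands, read m as IO):

```haskell
-- A hand-written version of (>>): run the first command, ignore its
-- answer, then run the second.
andThen :: Monad m => m a -> m b -> m b
andThen first second = first >>= \_ -> second
```

For example, putStrLn "one" `andThen` putStrLn "two" prints the two lines in order, exactly as >> would.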

do-Notation

To give imperative code a more imperative look, Haskell provides the do-notation , which hides the >>= operator and the anonymous functions. An example illustrates this notation well, and it should be easy to generalize:

( cmd1 , cmd2 , and cmd3 may use x as a parameter; similarly, cmd3 may use z as a parameter. At the end, between cmd3 and } , you may choose to insert or omit semicolons; similarly right after { at the beginning.)

Below we re-express examples in the previous section in the do-notation.

At the prompt it is necessary to write one-liners. In a source code file it is more common to use multiple lines, one line for one command, as per tradition. In this case, layout rules allow omitting {;} in favour of indentation. Thus, here are two valid ways of writing the same do-block in a source code file, one with {;} and the other with layout.
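The two styles can be sketched side by side. The functions pairBraces and pairLayout are hypothetical; they are written for any monad so they can be tried on pure values, and they work unchanged on IO commands.

```haskell
-- Explicit braces and semicolons: layout plays no part.
pairBraces :: Monad m => m a -> m b -> m (a, b)
pairBraces cmd1 cmd2 = do { x <- cmd1; y <- cmd2; return (x, y) }

-- The same block using layout: one line per command, equally indented.
pairLayout :: Monad m => m a -> m b -> m (a, b)
pairLayout cmd1 cmd2 = do
  x <- cmd1
  y <- cmd2
  return (x, y)
```

For instance, pairLayout getLine getLine is a command that reads two lines and answers them as a pair.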

Mutable variables

Data.IORef Data.Array.MArray Data.Array.IO

Control.Exception

Concurrency

Control.Concurrent Control.Concurrent.MVar

Data.STRef Data.Array.MArray Data.Array.ST

The State monad

Monad transformers

Compilation

You can easily compile your Haskell modules to standalone executables. For example, write this in a file "A.hs":

In general, main is the entry point, and you must define it to be whatever you want run. (TODO: once the monad/IO section is done, this place should also say more about main and IO.)

The compiler, on unix systems, is ghc . For example "A.hs" can be compiled and run as:

For multiple modules, use the --make flag to GHC. Example: write these two modules:

To compile and run (this will automatically look for M1.hs):

In general, one and only one file must define main . In general, for all other files, the filename must match the module name.

Haskell/do notation

Using do blocks as an alternative monad syntax was first introduced way back in the Simple input and output chapter. There, we used do to sequence input/output operations, but we hadn't introduced monads yet. Now, we can see that IO is yet another monad.

Since the following examples all involve IO , we will refer to the computations/monadic values as actions (as we did in the earlier parts of the book). Of course, do works with any monad; there is nothing specific about IO in how it works.

Translating the then operator

The (>>) ( then ) operator works almost identically in do notation and in unsugared code. For example, suppose we have a chain of actions like the following one:

We can rewrite that in do notation as follows:

(using the optional braces and semicolons explicitly, for clarity). This sequence of instructions nearly matches that in any imperative language. In Haskell, we can chain any actions as long as all of them are in the same monad. In the context of the IO monad, the actions include writing to a file, opening a network connection, or asking the user for an input.

Here's the step-by-step translation of do notation to unsugared Haskell code:

and so on, until the do block is empty.

Translating the bind operator

The bind operator (>>=) is a bit more difficult to translate to and from the do notation. (>>=) passes a value, namely the result of an action or function, downstream in the binding sequence. do notation assigns a variable name to the passed value using the <- .

The curly braces and the semicolons are optional if every line of code is indented to line up equally (NB: beware the mixing of tabs and spaces in that case; with the explicit curly braces and semicolons indentation plays no part and there's no danger).

x1 and x2 are the results of action1 and action2 . If, for instance, action1 is an IO Integer then x1 will be bound to an Integer value. The two bound values in this example are passed as arguments to mk_action3 , which creates a third action. The do block is broadly equivalent to the following vanilla Haskell snippet:

The second argument of the first (leftmost) bind operator ( >>= ) is a function (lambda expression) specifying what to do with the result of the action passed as the bind's first argument. Thus, chains of lambdas pass the results downstream. The parentheses could be omitted, because a lambda expression extends as far as possible. x1 is still in scope at the point we call the final action maker mk_action3 . We can rewrite the chain of lambdas more legibly by using separate lines and indentation:

That shows the scope of each lambda function clearly. To group things more like the do notation, we could show it like this:

These presentation differences are only a matter of assisting readability. [1]
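To make the translation concrete, here is a sketch in which the third action simply returns the sum of the two bound values; that choice, and the names sumDo and sumBind, are assumptions standing in for mk_action3. Both functions are written for any monad, so they apply to IO actions as well.

```haskell
-- The do block: each <- names the result of an action.
sumDo :: Monad m => m Int -> m Int -> m Int
sumDo action1 action2 = do
  x1 <- action1
  x2 <- action2
  return (x1 + x2)

-- The same computation with explicit binds and lambdas.
-- x1 is still in scope inside the inner lambda.
sumBind :: Monad m => m Int -> m Int -> m Int
sumBind action1 action2 =
  action1 >>= \x1 ->
  action2 >>= \x2 ->
  return (x1 + x2)
```

With Maybe as the monad, both sumDo (Just 1) (Just 2) and sumBind (Just 1) (Just 2) give Just 3.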

The fail method

Above, we said the snippet with lambdas was "broadly equivalent" to the do block. The translation is not exact because the do notation adds special handling of pattern match failures. When placed at the left of either <- or -> , x1 and x2 are patterns being matched. Therefore, if action1 returned a Maybe Integer we could write a do block like this...

...and x1 will be an Integer . In such a case, what happens if action1 returns Nothing ? Ordinarily, the program would crash with a non-exhaustive patterns error, just like the one we get when calling head on an empty list. With do notation, however, failures are handled with the fail method for the relevant monad. The do block above translates to:

What fail actually does depends on the monad instance. Though it will often rethrow the pattern matching error, monads that incorporate some sort of error handling may deal with the failure in their own specific ways. For instance, Maybe has fail _ = Nothing ; analogously, for the list monad fail _ = [] . [2]
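The following sketch shows the behaviour for the list and Maybe monads; justs and headOf are hypothetical names. Note that in current GHC releases fail is a method of the MonadFail class rather than of Monad itself, and both of these monads have instances of it.

```haskell
-- List monad: elements that do not match the pattern are skipped,
-- because fail _ = [] contributes nothing to the result.
justs :: [Maybe Int] -> [Int]
justs ms = do
  Just x <- ms
  return x

-- Maybe monad: a failed match turns the whole block into Nothing.
headOf :: Maybe [Int] -> Maybe Int
headOf m = do
  (x : _) <- m
  return x
```

So justs [Just 1, Nothing, Just 3] is [1,3], and headOf (Just []) is Nothing rather than a crash.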

The fail method is an artifact of do notation. Rather than calling fail directly, you should rely on automatic handling of pattern match failures whenever you are sure that fail will do something sensible for the monad you are using.

Example: user-interactive program

We are going to interact with the user, so we will use putStr and getLine alternately. To avoid unexpected results in the output, we must disable output buffering when importing System.IO . To do this, put hSetBuffering stdout NoBuffering at the top of your do block. To handle this otherwise, you would explicitly flush the output buffer before each interaction with the user (namely a getLine ) using hFlush stdout . If you are testing this code with ghci, you don't have such problems.

Consider this simple program that asks the user for their first and last names:

A possible translation into vanilla monadic code:

In cases like this, where we just want to chain several actions, the imperative style of do notation feels natural and convenient. In comparison, monadic code with explicit binds and lambdas is something of an acquired taste.

Notice that the first example above includes a let statement in the do block. The de-sugared version is simply a regular let expression where the in part is whatever follows from the do syntax.

Returning values

The last statement in do notation is the overall result of the do block. In the previous example, the result was of the type IO () , i.e. an empty value in the IO monad.

Suppose that we want to rewrite the example but return an IO String with the acquired name. All we need to do is add a return :

This example will "return" the full name as a string inside the IO monad, which can then be utilized downstream elsewhere:

Here, nameReturn will be run and the returned result (called "full" in the nameReturn function) will be assigned to the variable "name" in our new function. The greeting part of nameReturn will be printed to the screen because that is part of the calculation process. Then, the additional "see you" message will print as well, and the final returned value is back to being IO () .

If you know imperative languages like C, you might think return in Haskell matches return elsewhere. A small variation on the example will dispel that impression:

The string in the extra line will be printed out because return is not a final statement interrupting the flow (as it would be in C and other languages). Indeed, the type of nameReturnAndCarryOn is IO () , — the type of the final putStrLn action. After the function is called, the IO String created by the return full will disappear without a trace.
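The same point can be seen in miniature with this sketch, where returnAndCarryOn is a hypothetical name and the block is written for any monad so it can be checked on pure values:

```haskell
-- return builds a value inside the monad; it does not leave the block.
returnAndCarryOn :: Monad m => m String
returnAndCarryOn = do
  _ <- return "discarded"  -- this result disappears without a trace
  return "final"           -- the last statement gives the overall result
```

Used at type Maybe String, this evaluates to Just "final"; the first return has no effect on the outcome.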

Just sugar

As a syntactical convenience, do notation does not add anything essential, but it is often preferable for clarity and style. However, do is not needed for a single action, at all. The Haskell "Hello world" is simply:

Snippets like this one are totally redundant:

Thanks to the monad laws , we can write it simply as

A subtle but crucial point relates to function composition: As we already know, the greetAndSeeYou action in the section just above could be rewritten as:

While you might find the lambda a little unsightly, suppose we had a printSeeYou function defined elsewhere:

Now, we can have a clean function definition with neither lambdas nor do :

Or, if we have a non-monadic seeYou function:

Then we can write:

Keep this last example with fmap in mind; we will soon return to using non-monadic functions in monadic code, and fmap will be useful there.
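As a sketch of this use of fmap, assume a non-monadic seeYou like the one below; its exact wording, and the names seeYouFrom and readAndSeeYou, are invented for the illustration.

```haskell
-- A non-monadic function: it knows nothing about IO.
seeYou :: String -> String
seeYou name = "See you, " ++ name ++ "!"

-- fmap applies seeYou to the result of any action producing a String.
seeYouFrom :: Functor f => f String -> f String
seeYouFrom = fmap seeYou

-- In IO: read a name, transform it purely, then print it.
readAndSeeYou :: IO ()
readAndSeeYou = seeYouFrom getLine >>= putStrLn
```

Because fmap works for every functor, seeYouFrom (Just "Ann") is Just "See you, Ann!".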

Of course, we could use even more indentation if we wanted. Here's an extreme example:

While that indentation is certainly overkill, it could be worse:

That is valid Haskell but is baffling to read; so please don't ever write like that. Write your code with consistent and meaningful groupings.

↑ This explains why, as we pointed out in the "Pattern matching" chapter , pattern matching failures in list comprehensions are silently ignored.


Input and Output

We've mentioned that Haskell is a purely functional language. Whereas in imperative languages you usually get things done by giving the computer a series of steps to execute, functional programming is more of defining what stuff is. In Haskell, a function can't change some state, like changing the contents of a variable (when a function changes state, we say that the function has side-effects ). The only thing a function can do in Haskell is give us back some result based on the parameters we gave it. If a function is called two times with the same parameters, it has to return the same result. While this may seem a bit limiting when you're coming from an imperative world, we've seen that it's actually really cool. In an imperative language, you have no guarantee that a simple function that should just crunch some numbers won't burn down your house, kidnap your dog and scratch your car with a potato while crunching those numbers. For instance, when we were making a binary search tree, we didn't insert an element into a tree by modifying some tree in place. Our function for inserting into a binary search tree actually returned a new tree, because it can't change the old one.

While functions being unable to change state is good because it helps us reason about our programs, there's one problem with that. If a function can't change anything in the world, how is it supposed to tell us what it calculated? In order to tell us what it calculated, it has to change the state of an output device (usually the state of the screen), which then emits photons that travel to our brain and change the state of our mind, man.

Do not despair, all is not lost. It turns out that Haskell actually has a really clever system for dealing with functions that have side-effects that neatly separates the part of our program that is pure and the part of our program that is impure, which does all the dirty work like talking to the keyboard and the screen. With those two parts separated, we can still reason about our pure program and take advantage of all the things that purity offers, like laziness, robustness and modularity while efficiently communicating with the outside world.

Hello, world!

Up until now, we've always loaded our functions into GHCI to test them out and play with them. We've also explored the standard library functions that way. But now, after eight or so chapters, we're finally going to write our first real Haskell program! Yay! And sure enough, we're going to do the good old "hello, world" schtick.

So, for starters, punch in the following in your favorite text editor:

We just defined a name called main and in it we call a function called putStrLn with the parameter "hello, world" . Looks pretty much run of the mill, but it isn't, as we'll see in just a few moments. Save that file as helloworld.hs .

And now, we're going to do something we've never done before. We're actually going to compile our program! I'm so excited! Open up your terminal and navigate to the directory where helloworld.hs is located and do the following:

Okay! With any luck, you got something like this and now you can run your program by doing ./helloworld .

And there we go, our first compiled program that printed out something to the terminal. How extraordinarily boring!

Let's examine what we wrote. First, let's look at the type of the function putStrLn .

We can read the type of putStrLn like this: putStrLn takes a string and returns an I/O action that has a result type of () (i.e. the empty tuple, also known as unit). An I/O action is something that, when performed, will carry out an action with a side-effect (that's usually either reading from the input or printing stuff to the screen) and will also contain some kind of return value inside it. Printing a string to the terminal doesn't really have any kind of meaningful return value, so a dummy value of () is used.

So, when will an I/O action be performed? Well, this is where main comes in. An I/O action will be performed when we give it a name of main and then run our program.

Having your whole program be just one I/O action seems kind of limiting. That's why we can use do syntax to glue together several I/O actions into one. Take a look at the following example:

Ah, interesting, new syntax! And this reads pretty much like an imperative program. If you compile it and try it out, it will probably behave just like you expect it to. Notice that we said do and then we laid out a series of steps, like we would in an imperative program. Each of these steps is an I/O action. By putting them together with do syntax, we glued them into one I/O action. The action that we got has a type of IO () , because that's the type of the last I/O action inside.

Because of that, main always has a type signature of main :: IO something , where something is some concrete type. By convention, we don't usually specify a type declaration for main .

An interesting thing that we haven't met before is the third line, which states name <- getLine . It looks like it reads a line from the input and stores it into a variable called name . Does it really? Well, let's examine the type of getLine .

Aha, o-kay. getLine is an I/O action that contains a result type of String . That makes sense, because it will wait for the user to input something at the terminal and then that something will be represented as a string. So what's up with name <- getLine then? You can read that piece of code like this: perform the I/O action getLine and then bind its result value to name . getLine has a type of IO String , so name will have a type of String . You can think of an I/O action as a box with little feet that will go out into the real world and do something there (like write some graffiti on a wall) and maybe bring back some data. Once it's fetched that data for you, the only way to open the box and get the data inside it is to use the <- construct. And if we're taking data out of an I/O action, we can only take it out when we're inside another I/O action. This is how Haskell manages to neatly separate the pure and impure parts of our code. getLine is in a sense impure because its result value is not guaranteed to be the same when performed twice. That's why it's sort of tainted with the IO type constructor and we can only get that data out in I/O code. And because I/O code is tainted too, any computation that depends on tainted I/O data will have a tainted result.

When I say tainted , I don't mean tainted in such a way that we can never use the result contained in an I/O action ever again in pure code. No, we temporarily un-taint the data inside an I/O action when we bind it to a name. When we do name <- getLine , name is just a normal string, because it represents what's inside the box. We can have a really complicated function that, say, takes your name (a normal string) as a parameter and tells you your fortune and your whole life's future based on your name. We can do this:

and tellFortune (or any of the functions it passes name to) doesn't have to know anything about I/O, it's just a normal String -> String function!

Take a look at this piece of code. Is it valid?

If you said no, go eat a cookie. If you said yes, drink a bowl of molten lava. Just kidding, don't! The reason that this doesn't work is that ++ requires both its parameters to be lists over the same type. The left parameter has a type of String (or [Char] if you will), whilst getLine has a type of IO String . You can't concatenate a string and an I/O action. We first have to get the result out of the I/O action to get a value of type String and the only way to do that is to say something like name <- getLine inside some other I/O action. If we want to deal with impure data, we have to do it in an impure environment. So the taint of impurity spreads around much like the undead scourge and it's in our best interest to keep the I/O parts of our code as small as possible.

Every I/O action that gets performed has a result encapsulated within it. That's why our previous example program could also have been written like this:

However, foo would just have a value of () , so doing that would be kind of moot. Notice that we didn't bind the last putStrLn to anything. That's because in a do block, the last action cannot be bound to a name like the first two were. We'll see exactly why that is so a bit later when we venture off into the world of monads. For now, you can think of it in the way that the do block automatically extracts the value from the last action and binds it to its own result.

Except for the last line, every line in a do block that doesn't bind can also be written with a bind. So putStrLn "BLAH" can be written as _ <- putStrLn "BLAH" . But that's useless, so we leave out the <- for I/O actions that don't contain an important result, like putStrLn something .

Beginners sometimes think that doing name = getLine

will read from the input and then bind the value of that to name . Well, it won't, all this does is give the getLine I/O action a different name called, well, name . Remember, to get the value out of an I/O action, you have to perform it inside another I/O action by binding it to a name with <- .

I/O actions will only be performed when they are given a name of main or when they're inside a bigger I/O action that we composed with a do block. We can also use a do block to glue together a few I/O actions and then we can use that I/O action in another do block and so on. Either way, they'll be performed only if they eventually fall into main .

Oh, right, there's also one more case when I/O actions will be performed. When we type out an I/O action in GHCI and press return, it will be performed.

Even when we just punch out a number or call a function in GHCI and press return, it will evaluate it (as much as it needs) and then call show on it and then it will print that string to the terminal using putStrLn implicitly.

Remember let bindings? If you don't, refresh your memory on them by reading this section . They have to be in the form of let bindings in expression , where bindings are names to be given to expressions and expression is the expression that is to be evaluated that sees them. We also said that in list comprehensions, the in part isn't needed. Well, you can use them in do blocks pretty much like you use them in list comprehensions. Check this out:
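A sketch of such a program might look like this. The prompt wording and the names lastName , bigFirstName and bigLastName are assumptions made for illustration.

```haskell
import Data.Char (toUpper)

main :: IO ()
main = do
    putStrLn "What's your first name?"
    firstName <- getLine
    putStrLn "What's your last name?"
    lastName <- getLine
    -- let binds pure expressions; inside a do block no "in" part is needed
    let bigFirstName = map toUpper firstName
        bigLastName  = map toUpper lastName
    putStrLn $ "hey " ++ bigFirstName ++ " " ++ bigLastName ++ ", how are you?"
```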

See how the I/O actions in the do block are lined up? Also notice how the let is lined up with the I/O actions and the names of the let are lined up with each other? That's good practice, because indentation is important in Haskell. Now, we did map toUpper firstName , which turns something like "John" into a much cooler string like "JOHN" . We bound that uppercased string to a name and then used it in a string later on that we printed to the terminal.

You may be wondering when to use <- and when to use let bindings? Well, remember, <- is (for now) for performing I/O actions and binding their results to names. map toUpper firstName , however, isn't an I/O action. It's a pure expression in Haskell. So use <- when you want to bind results of I/O actions to names and you can use let bindings to bind pure expressions to names. Had we done something like let firstName = getLine , we would have just called the getLine I/O action a different name and we'd still have to run it through a <- to perform it.

Now we're going to make a program that continuously reads a line and prints out the same line with the words reversed. The program's execution will stop when we input a blank line. This is the program:

To get a feel of what it does, you can run it before we go over the code.
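A sketch that is consistent with the description here and with the walkthrough that follows (the exact layout is an assumption):

```haskell
main :: IO ()
main = do
    line <- getLine
    if null line
        then return ()      -- blank line: an I/O action that does nothing
        else do             -- otherwise glue two I/O actions into one
            putStrLn $ reverseWords line
            main            -- start over

-- Pure helper: reverse every word, keep the word order.
reverseWords :: String -> String
reverseWords = unwords . map reverse . words
```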

First, let's take a look at the reverseWords function. It's just a normal function that takes a string like "hey there man" and then calls words with it to produce a list of words like ["hey","there","man"] . Then we map reverse on the list, getting ["yeh","ereht","nam"] and then we put that back into one string by using unwords and the final result is "yeh ereht nam" . See how we used function composition here. Without function composition, we'd have to write something like reverseWords st = unwords (map reverse (words st)) .

Let's first take a look at what happens under the else clause. Because we have to have exactly one I/O action after the else , we use a do block to glue together two I/O actions into one. You could also write that part out as:

This makes it more explicit that the do block can be viewed as one I/O action, but it's uglier. Anyway, inside the do block, we call reverseWords on the line that we got from getLine and then print that out to the terminal. After that, we just perform main . It's called recursively and that's okay, because main is itself an I/O action. So in a sense, we go back to the start of the program.

Now what happens when null line holds true? What's after the then is performed in that case. If we look up, we'll see that it says then return () . If you've done imperative languages like C, Java or Python, you're probably thinking that you know what this return does and chances are you've already skipped this really long paragraph. Well, here's the thing: the return in Haskell is really nothing like the return in most other languages! It has the same name, which confuses a lot of people, but in reality it's quite different. In imperative languages, return usually ends the execution of a method or subroutine and makes it report some sort of value to whoever called it. In Haskell (in I/O actions specifically), it makes an I/O action out of a pure value. If you think about the box analogy from before, it takes a value and wraps it up in a box. The resulting I/O action doesn't actually do anything, it just has that value encapsulated as its result. So in an I/O context, return "haha" will have a type of IO String . What's the point of just transforming a pure value into an I/O action that doesn't do anything? Why taint our program with IO more than it has to be? Well, we needed some I/O action to carry out in the case of an empty input line. That's why we just made a bogus I/O action that doesn't do anything by writing return () .

Using return doesn't cause the I/O do block to end in execution or anything like that. For instance, this program will quite happily carry out all the way to the last line:

All these returns do is make I/O actions that don't really do anything except have an encapsulated result, and that result is thrown away because it isn't bound to a name. We can use return in combination with <- to bind stuff to names.

So you see, return is sort of the opposite to <- . While return takes a value and wraps it up in a box, <- takes a box (and performs it) and takes the value out of it, binding it to a name. But doing this is kind of redundant, especially since you can use let bindings in do blocks to bind to names, like so:
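The sketch below puts those pieces side by side: stray returns that don't end the block, a return that is immediately unwrapped with <- , and the let equivalent. joinWords is a hypothetical helper added only for illustration, and the strings are made up.

```haskell
-- Hypothetical pure helper, here only so there is something to print.
joinWords :: String -> String -> String
joinWords a b = a ++ " " ++ b

main :: IO ()
main = do
    return ()            -- makes an action that does nothing; the block goes on
    return "HAHAHA"      -- result is not bound to a name, so it is thrown away
    a <- return "hell"   -- wrap a value in a box, then take it right back out
    let b = "yeah!"      -- same effect as the line above, without the box
    putStrLn (joinWords a b)
```

Running it should print a single line, because the first two returns neither print anything nor stop the block.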

When dealing with I/O do blocks, we mostly use return either because we need to create an I/O action that doesn't do anything or because we don't want the I/O action that's made up from a do block to have the result value of its last action, but we want it to have a different result value, so we use return to make an I/O action that always has our desired result contained and we put it at the end.

Before we move on to files, let's take a look at some functions that are useful when dealing with I/O.

putStr is much like putStrLn in that it takes a string as a parameter and returns an I/O action that will print that string to the terminal, only putStr doesn't jump into a new line after printing out the string while putStrLn does.

Its type signature is putStr :: String -> IO () , so the result encapsulated within the resulting I/O action is the unit. A dud value, so it doesn't make sense to bind it.

putChar takes a character and returns an I/O action that will print it out to the terminal.

putStr is actually defined recursively with the help of putChar . The edge condition of putStr is the empty string, so if we're printing an empty string, just return an I/O action that does nothing by using return () . If it's not empty, then print the first character of the string by doing putChar and then print the rest of them using putStr .

See how we can use recursion in I/O, just like we can use it in pure code. Just like in pure code, we define the edge case and then think what the result actually is. It's an action that first outputs the first character and then outputs the rest of the string.

print takes a value of any type that's an instance of Show (meaning that we know how to represent it as a string), calls show with that value to stringify it and then outputs that string to the terminal. Basically, it's just putStrLn . show . It first runs show on a value and then feeds that to putStrLn , which returns an I/O action that will print out our value.

As you can see, it's a very handy function. Remember how we talked about how I/O actions are performed only when they fall into main or when we try to evaluate them in the GHCI prompt? When we type out a value (like 3 or [1,2,3] ) and press the return key, GHCI actually uses print on that value to display it on our terminal!

When we want to print out strings, we usually use putStrLn because we don't want the quotes around them, but for printing out values of other types to the terminal, print is used the most.

getChar is an I/O action that reads a character from the input. Thus, its type signature is getChar :: IO Char , because the result contained within the I/O action is a Char . Note that due to buffering, reading of the characters won't actually happen until the user mashes the return key.

This program looks like it should read a character and then check if it's a space. If it is, halt execution and if it isn't, print it to the terminal and then do the same thing all over again. Well, it kind of does, only not in the way you might expect. Check this out:

The second line is the input. We input hello sir and then press return. Due to buffering, the execution of the program will begin only after we've hit return and not after every inputted character. But once we press return, it acts on what we've been putting in so far. Try playing with this program to get a feel for it!

The when function is found in Control.Monad (to get access to it, do import Control.Monad ). It's interesting because in a do block it looks like a control flow statement, but it's actually a normal function. It takes a boolean value and an I/O action. If that boolean value is True , it returns the same I/O action that we supplied to it. However, if it's False , it returns the return () action, that is, an I/O action that doesn't do anything. Here's how we could rewrite the previous piece of code with which we demonstrated getChar by using when :

So as you can see, it's useful for encapsulating the if something then do some I/O action else return () pattern.

sequence takes a list of I/O actions and returns an I/O action that will perform those actions one after the other. The result contained in that I/O action will be a list of the results of all the I/O actions that were performed. Its type signature is sequence :: [IO a] -> IO [a] . Doing this:

Is exactly the same as doing this:

So sequence [getLine, getLine, getLine] makes an I/O action that will perform getLine three times. If we bind that action to a name, the result is a list of all the results, so in our case, a list of three things that the user entered at the prompt.

A common pattern with sequence is when we map functions like print or putStrLn over lists. Doing map print [1,2,3,4] won't create an I/O action. It will create a list of I/O actions, because that's like writing [print 1, print 2, print 3, print 4] . If we want to transform that list of I/O actions into an I/O action, we have to sequence it.

What's with the [(),(),(),(),()] at the end? Well, when we evaluate an I/O action in GHCI, it's performed and then its result is printed out, unless that result is () , in which case it's not printed out. That's why evaluating putStrLn "hehe" in GHCI just prints out hehe (because the contained result in putStrLn "hehe" is () ). But when we do getLine in GHCI, the result of that I/O action is printed out, because getLine has a type of IO String .

Because mapping a function that returns an I/O action over a list and then sequencing it is so common, the utility functions mapM and mapM_ were introduced. mapM takes a function and a list, maps the function over the list and then sequences it. mapM_ does the same, only it throws away the result later. We usually use mapM_ when we don't care what result our sequenced I/O actions have.

forever takes an I/O action and returns an I/O action that just repeats the I/O action it got forever. It's located in Control.Monad . This little program will indefinitely ask the user for some input and spit it back to him, CAPSLOCKED:
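A sketch of such a program follows. The prompt text and the variable name l are assumptions.

```haskell
import Control.Monad (forever)
import Data.Char (toUpper)

main :: IO ()
main = forever $ do
    putStr "Give me some input: "
    l <- getLine
    putStrLn $ map toUpper l    -- echo the line back, capslocked
```

Since forever never stops on its own, the program only ends when it is interrupted or when getLine runs out of input.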

forM (located in Control.Monad ) is like mapM , only that it has its parameters switched around. The first parameter is the list and the second one is the function to map over that list, which is then sequenced. Why is that useful? Well, with some creative use of lambdas and do notation, we can do stuff like this:
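One possible shape for it is sketched here. The question helper and the exact wording of the messages are assumptions made for illustration.

```haskell
import Control.Monad (forM)

-- Hypothetical helper: builds the question asked for each number.
question :: Int -> String
question a = "Which color do you associate with the number " ++ show a ++ "?"

main = do
    colors <- forM [1,2,3,4] (\a -> do
        putStrLn (question a)
        color <- getLine
        return color)
    putStrLn "The colors that you associate with 1, 2, 3 and 4 are: "
    mapM putStrLn colors    -- mapM_ would do the same and discard the list of ()
```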

The (\a -> do ... ) is a function that takes a number and returns an I/O action. We have to surround it with parentheses, otherwise the lambda thinks the last two I/O actions belong to it. Notice that we do return color in the inside do block. We do that so that the I/O action which the do block defines has the result of our color contained within it. We actually didn't have to do that, because getLine already has that contained within it. Doing color <- getLine and then return color is just unpacking the result from getLine and then repackaging it again, so it's the same as just doing getLine . The forM (called with its two parameters) produces an I/O action, whose result we bind to colors . colors is just a normal list that holds strings. At the end, we print out all those colors by doing mapM putStrLn colors .

You can think of forM as meaning: make an I/O action for every element in this list. What each I/O action will do can depend on the element that was used to make the action. Finally, perform those actions and bind their results to something. We don't have to bind it, we can also just throw it away.

We could have actually done that without forM , only with forM it's more readable. Normally we write forM when we want to map and sequence some actions that we define there on the spot using do notation. In the same vein, we could have replaced the last line with forM colors putStrLn .

In this section, we learned the basics of input and output. We also found out what I/O actions are, how they enable us to do input and output and when they are actually performed. To reiterate, I/O actions are values much like any other value in Haskell. We can pass them as parameters to functions and functions can return I/O actions as results. What's special about them is that if they fall into the main function (or are the result in a GHCI line), they are performed. And that's when they get to write stuff on your screen or play Yakety Sax through your speakers. Each I/O action can also encapsulate a result with which it tells you what it got from the real world.

Don't think of a function like putStrLn as a function that takes a string and prints it to the screen. Think of it as a function that takes a string and returns an I/O action. That I/O action will, when performed, print beautiful poetry to your terminal.

Files and streams

getChar is an I/O action that reads a single character from the terminal. getLine is an I/O action that reads a line from the terminal. These two are pretty straightforward and most programming languages have some functions or statements that are parallel to them. But now, let's meet getContents . getContents is an I/O action that reads everything from the standard input until it encounters an end-of-file character. Its type is getContents :: IO String . What's cool about getContents is that it does lazy I/O. When we do foo <- getContents , it doesn't read all of the input at once, store it in memory and then bind it to foo . No, it's lazy! It'll say: "Yeah yeah, I'll read the input from the terminal later as we go along, when you really need it!" .

getContents is really useful when we're piping the output from one program into the input of our program. In case you don't know how piping works in unix-y systems, here's a quick primer. Let's make a text file that contains the following little haiku:

Yeah, the haiku sucks, what of it? If anyone knows of any good haiku tutorials, let me know.

Now, recall the little program we wrote when we were introducing the forever function. It prompted the user for a line, returned it to him in CAPSLOCK and then did that all over again, indefinitely. Just so you don't have to scroll all the way back, here it is again:

We'll save that program as capslocker.hs or something and compile it. And then, we're going to use a unix pipe to feed our text file directly to our little program. We're going to use the help of the GNU cat program, which prints out a file that's given to it as an argument. Check it out, booyaka!

As you can see, piping the output of one program (in our case that was cat ) to the input of another ( capslocker ) is done with the | character. What we've done is pretty much equivalent to just running capslocker , typing our haiku at the terminal and then issuing an end-of-file character (that's usually done by pressing Ctrl-D). It's like running cat haiku.txt and saying: “Wait, don't print this out to the terminal, tell it to capslocker instead!”.

So what we're essentially doing with that use of forever is taking the input and transforming it into some output. That's why we can use getContents to make our program even shorter and better:
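A minimal sketch of that shorter version:

```haskell
import Data.Char (toUpper)

main :: IO ()
main = do
    contents <- getContents          -- lazy: nothing is read yet
    putStr (map toUpper contents)    -- input is pulled in as output is demanded
```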

We run the getContents I/O action and name the string it produces contents . Then, we map toUpper over that string and print that to the terminal. Keep in mind that because strings are basically lists, which are lazy, and getContents is I/O lazy, it won't try to read the whole content at once and store it into memory before printing out the capslocked version. Rather, it will print out the capslocked version as it reads it, because it will only read a line from the input when it really needs to.

Cool, it works. What if we just run capslocker and try to type in the lines ourselves?

We got out of that by pressing Ctrl-D. Pretty nice! As you can see, it prints out our capslocked input back to us line by line. When the result of getContents is bound to contents , it's not represented in memory as a real string, but more like a promise that it will produce the string eventually. When we map toUpper over contents , that's also a promise to map that function over the eventual contents. And finally when putStr happens, it says to the previous promise: "Hey, I need a capslocked line!" . It doesn't have any lines yet, so it says to contents : "Hey, how about actually getting a line from the terminal?" . So that's when getContents actually reads from the terminal and gives a line to the code that asked it to produce something tangible. That code then maps toUpper over that line and gives it to putStr , which prints it. And then, putStr says: "Hey, I need the next line, come on!" and this repeats until there's no more input, which is signified by an end-of-file character.

Let's make a program that takes some input and prints out only those lines that are shorter than 10 characters. Observe:
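A sketch of such a program, with the I/O part kept tiny and the real work done in a pure function (the names shortLines and result are assumptions):

```haskell
main :: IO ()
main = do
    contents <- getContents
    putStr (shortLinesOnly contents)

-- Pure part: keep only the lines shorter than 10 characters.
shortLinesOnly :: String -> String
shortLinesOnly input =
    let allLines   = lines input
        shortLines = filter (\line -> length line < 10) allLines
        result     = unlines shortLines
    in  result
```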

We've made our I/O part of the program as short as possible. Because our program is supposed to take some input and print out some output based on the input, we can implement it by reading the input contents, running a function on them and then printing out what the function gave back.

The shortLinesOnly function works like this: it takes a string, like "short\nlooooooooooooooong\nshort too" . That string has three lines, two of them are short and the middle one is long. It runs the lines function on that string, which converts it to ["short", "looooooooooooooong", "short too"] , which we then bind to the name allLines . That list of strings is then filtered so that only those lines that are shorter than 10 characters remain in the list, producing ["short", "short too"] . And finally, unlines joins that list into a single newline delimited string, giving "short\nshort too\n" . Let's give it a go.

We pipe the contents of shortlines.txt into the input of shortlinesonly and as the output, we only get the short lines.

This pattern of getting some string from the input, transforming it with a function and then outputting that is so common that there exists a function which makes that even easier, called interact . interact takes a function of type String -> String as a parameter and returns an I/O action that will take some input, run that function on it and then print out the function's result. Let's modify our program to use that.

Just to show that this can be achieved in much less code (even though it will be less readable) and to demonstrate our function composition skill, we're going to rework that a bit further.
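As a sketch, the reworked version could look like the following. The function is kept under its own name here so it can be tried out on its own in GHCI. That naming is a choice made for this example.

```haskell
-- Point-free: split into lines, keep the short ones, join them back up.
shortLinesOnly :: String -> String
shortLinesOnly = unlines . filter ((< 10) . length) . lines

main :: IO ()
main = interact shortLinesOnly
-- Inlined, the whole program fits on one line:
-- main = interact $ unlines . filter ((< 10) . length) . lines
```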

Wow, we actually reduced that to just one line, which is pretty cool!

interact can be used to make programs that are piped some contents into them and then dump some result out or it can be used to make programs that appear to take a line of input from the user, give back some result based on that line and then take another line and so on. There isn't actually a real distinction between the two, it just depends on how the user is supposed to use them.

Let's make a program that continuously reads a line and then tells us if the line is a palindrome or not. We could just use getLine to read a line, tell the user if it's a palindrome and then run main all over again. But it's simpler if we use interact . When using interact , think about what you need to do to transform some input into the desired output. In our case, we have to replace each line of the input with either "palindrome" or "not a palindrome" . So we have to write a function that transforms something like "elephant\nABCBA\nwhatever" into "not a palindrome\npalindrome\nnot a palindrome" . Let's do this!

Let's write this in point-free.

Pretty straightforward. First it turns something like "elephant\nABCBA\nwhatever" into ["elephant", "ABCBA", "whatever"] and then it maps that lambda over it, giving ["not a palindrome", "palindrome", "not a palindrome"] and then unlines joins that list into a single, newline delimited string. Now we can do
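Put together as a sketch, assuming the function is called respondPalindromes and uses a local helper isPalindrome (both names are assumptions):

```haskell
-- Replace every input line with a verdict about that line.
respondPalindromes :: String -> String
respondPalindromes =
    unlines . map (\xs -> if isPalindrome xs then "palindrome" else "not a palindrome") . lines
  where
    isPalindrome xs = xs == reverse xs

main :: IO ()
main = interact respondPalindromes
```

Note that unlines puts a newline after every line, including the last one, so the result string ends with "\n" .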

Let's test this out:

Even though we made a program that transforms one big string of input into another, it acts like we made a program that does it line by line. That's because Haskell is lazy and it wants to print the first line of the result string, but it can't because it doesn't have the first line of the input yet. So as soon as we give it the first line of input, it prints the first line of the output. We get out of the program by issuing an end-of-file character.

We can also use this program by just piping a file into it. Let's say we have this file:

and we save it as words.txt . This is what we get by piping it into our program:

Again, we get the same output as if we had run our program and put in the words ourselves at the standard input. We just don't see the input that palindromes.hs received, because the input came from the file and not from us typing the words in.

So now you probably see how lazy I/O works and how we can use it to our advantage. You can just think in terms of what the output is supposed to be for some given input and write a function to do that transformation. In lazy I/O, nothing is eaten from the input until it absolutely has to be because what we want to print right now depends on that input.

So far, we've worked with I/O by printing out stuff to the terminal and reading from it. But what about reading and writing files? Well, in a way, we've already been doing that. One way to think about reading from the terminal is to imagine that it's like reading from a (somewhat special) file. Same goes for writing to the terminal, it's kind of like writing to a file. We can call these two files stdout and stdin , meaning standard output and standard input , respectively. Keeping that in mind, we'll see that writing to and reading from files is very much like writing to the standard output and reading from the standard input.

We'll start off with a really simple program that opens a file called girlfriend.txt , which contains a verse from Avril Lavigne's #1 hit Girlfriend , and just prints it out to the terminal. Here's girlfriend.txt :

And here's our program:

Running it, we get the expected result:

Let's go over this line by line. The first line is just four exclamations, to get our attention. In the second line, Avril tells us that she doesn't like our current romantic partner. The third line serves to emphasize that disapproval, whereas the fourth line suggests we should seek out a new girlfriend.

Let's also go over the program line by line! Our program is several I/O actions glued together with a do block. In the first line of the do block, we notice a new function called openFile . This is its type signature: openFile :: FilePath -> IOMode -> IO Handle . If you read that out loud, it states: openFile takes a file path and an IOMode and returns an I/O action that will open a file and have the file's associated handle encapsulated as its result.

FilePath is just a type synonym for String , simply defined as type FilePath = String .

IOMode is a type that's defined like this: data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode .

Just like our type that represents the seven possible values for the days of the week, this type is an enumeration that represents what we want to do with our opened file. Very simple. Just note that this type is IOMode and not IO Mode . IO Mode would be the type of an I/O action that has a value of some type Mode as its result, but IOMode is just a simple enumeration.

Finally, it returns an I/O action that will open the specified file in the specified mode. If we bind that action to something we get a Handle . A value of type Handle represents where our file is. We'll use that handle so we know which file to read from. It would be stupid to read a file but not bind that read to a handle because we wouldn't be able to do anything with the file. So in our case, we bound the handle to handle .

In the next line, we see a function called hGetContents . It takes a Handle , so it knows which file to get the contents from and returns an IO String — an I/O action that holds as its result the contents of the file. This function is pretty much like getContents . The only difference is that getContents will automatically read from the standard input (that is from the terminal), whereas hGetContents takes a file handle which tells it which file to read from. In all other respects, they work the same. And just like getContents , hGetContents won't attempt to read the file at once and store it in memory, but it will read it as needed. That's really cool because we can treat contents as the whole contents of the file, but it's not really loaded in memory. So if this were a really huge file, doing hGetContents wouldn't choke up our memory, but it would read only what it needed to from the file, when it needed to.

Note the difference between the handle used to identify a file and the contents of the file, bound in our program to handle and contents . The handle is just something by which we know what our file is. If you imagine your whole file system to be a really big book and each file is a chapter in the book, the handle is a bookmark that shows where you're currently reading (or writing) a chapter, whereas the contents are the actual chapter.

With putStr contents we just print the contents out to the standard output and then we do hClose , which takes a handle and returns an I/O action that closes the file. You have to close the file yourself after opening it with openFile !

Another way of doing what we just did is to use the withFile function, which has a type signature of withFile :: FilePath -> IOMode -> (Handle -> IO a) -> IO a . It takes a path to a file, an IOMode and then it takes a function that takes a handle and returns some I/O action. What it returns is an I/O action that will open that file, do something we want with the file and then close it. The result encapsulated in the final I/O action that's returned is the same as the result of the I/O action that the function we give it returns. This might sound a bit complicated, but it's really simple, especially with lambdas, here's our previous example rewritten to use withFile :

As you can see, it's very similar to the previous piece of code. (\handle -> ... ) is the function that takes a handle and returns an I/O action and it's usually done like this, with a lambda. The reason it has to take a function that returns an I/O action instead of just taking an I/O action to do and then close the file is because the I/O action that we'd pass to it wouldn't know on which file to operate. This way, withFile opens the file and then passes the handle to the function we gave it. It gets an I/O action back from that function and then makes an I/O action that's just like it, only it closes the file afterwards. Here's how we can make our own withFile function:
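A sketch of such a function, here called withFile' , together with a small usage example. The countLines helper and the use of girlfriend.txt in main are assumptions made for illustration, and the program expects that file to exist in the current directory.

```haskell
import System.IO

-- Hypothetical pure helper, so the action has a result worth passing along.
countLines :: String -> Int
countLines = length . lines

withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
    handle <- openFile path mode   -- open the file and get its handle
    result <- f handle             -- run the caller's action on that handle
    hClose handle                  -- close the file afterwards
    return result                  -- pass the caller's result along

main :: IO ()
main = do
    n <- withFile' "girlfriend.txt" ReadMode (\handle -> do
        contents <- hGetContents handle
        putStr contents            -- printing forces the whole file to be read
        return (countLines contents))
    putStrLn ("lines: " ++ show n)
```

This sketch doesn't close the handle if the action fails with an exception. The withFile in System.IO does take care of that.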

We know the result will be an I/O action so we can just start off with a do . First we open the file and get a handle from it. Then, we apply our function to handle to get back the I/O action that does all the work. We bind the result of that action to result , close the handle and then do return result . By returning the result encapsulated in the I/O action that we got from f , we make it so that our I/O action encapsulates the same result as the one we got from f handle . So if f handle returns an action that will read a number of lines from the standard input and write them to a file and have as its result encapsulated the number of lines it read, if we used that with withFile' , the resulting I/O action would also have as its result the number of lines read.

Just like we have hGetContents that works like getContents but for a specific file, there's also hGetLine , hPutStr , hPutStrLn , hGetChar , etc. They work just like their counterparts without the h , only they take a handle as a parameter and operate on that specific file instead of operating on standard input or standard output. Example: putStrLn is a function that takes a string and returns an I/O action that will print out that string to the terminal and a newline after it. hPutStrLn takes a handle and a string and returns an I/O action that will write that string to the file associated with the handle and then put a newline after it. In the same vein, hGetLine takes a handle and returns an I/O action that reads a line from its file.

Loading files and then treating their contents as strings is so common that we have these three nice little functions to make our work even easier:

readFile has a type signature of readFile :: FilePath -> IO String . Remember, FilePath is just a fancy name for String . readFile takes a path to a file and returns an I/O action that will read that file (lazily, of course) and bind its contents to something as a string. It's usually more handy than doing openFile and binding it to a handle and then doing hGetContents . Here's how we could have written our previous example with readFile :

Because we don't get a handle with which to identify our file, we can't close it manually, so Haskell does that for us when we use readFile .

writeFile has a type of writeFile :: FilePath -> String -> IO () . It takes a path to a file and a string to write to that file and returns an I/O action that will do the writing. If such a file already exists, it will be stomped down to zero length before being written on. Here's how to turn girlfriend.txt into a CAPSLOCKED version and write it to girlfriendcaps.txt :
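A minimal sketch, assuming girlfriend.txt sits in the current directory:

```haskell
import Data.Char (toUpper)

main :: IO ()
main = do
    contents <- readFile "girlfriend.txt"
    -- writeFile creates girlfriendcaps.txt, or truncates it if it already exists
    writeFile "girlfriendcaps.txt" (map toUpper contents)
```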

appendFile has a type signature that's just like writeFile , only appendFile doesn't truncate the file to zero length if it already exists but it appends stuff to it.

Let's say we have a file todo.txt that has one task per line that we have to do. Now let's make a program that takes a line from the standard input and adds that to our to-do list.
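
A small sketch of that program, assuming todo.txt lives in the current directory (appendFile creates it if it isn't there yet):

```haskell
main :: IO ()
main = do
    todoItem <- getLine
    -- getLine strips the newline, so we put one back before appending
    appendFile "todo.txt" (todoItem ++ "\n")
```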

We needed to add the "\n" to the end of each line because getLine doesn't give us a newline character at the end.

Ooh, one more thing. We talked about how doing contents <- hGetContents handle doesn't cause the whole file to be read at once and stored in-memory. It's I/O lazy, so binding the contents like that and then printing them with putStr contents is actually like connecting a pipe from the file to the output. Just like you can think of lists as streams, you can also think of files as streams. This will read one line at a time and print it out to the terminal as it goes along. So you may be asking, how wide is this pipe then? How often will the disk be accessed? Well, for text files, the default buffering is usually line-buffering. That means that the smallest part of the file to be read at once is one line. That's why in this case it actually reads a line, prints it to the output, reads the next line, prints it, etc. For binary files, the default buffering is usually block-buffering. That means that it will read the file chunk by chunk. The chunk size is some size that your operating system thinks is cool.

You can control how exactly buffering is done by using the hSetBuffering function. It takes a handle and a BufferMode and returns an I/O action that sets the buffering. BufferMode is a simple enumeration data type and the possible values it can hold are: NoBuffering , LineBuffering or BlockBuffering (Maybe Int) . The Maybe Int is for how big the chunk should be, in bytes. If it's Nothing , then the operating system determines the chunk size. NoBuffering means that it will be read one character at a time. NoBuffering usually sucks as a buffering mode because it has to access the disk so much.

Here's our previous piece of code, only it doesn't read it line by line but reads the whole file in chunks of 2048 bytes.
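
A sketch of what that could look like. The file name something.txt is just a placeholder assumed for the example:

```haskell
import System.IO

main :: IO ()
main =
    withFile "something.txt" ReadMode $ \handle -> do
        -- ask for block-buffering with chunks of 2048 bytes
        hSetBuffering handle $ BlockBuffering (Just 2048)
        contents <- hGetContents handle
        putStr contents
```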

Reading files in bigger chunks can help if we want to minimize disk access or when our file is actually a slow network resource.

We can also use hFlush , which is a function that takes a handle and returns an I/O action that will flush the buffer of the file associated with the handle. When we're doing line-buffering, the buffer is flushed after every line. That means that when we've reached a newline character, the reading (or writing) mechanism reports all the data so far. When we're doing block-buffering, it's flushed after we've read a chunk. It's also flushed after closing a handle. But we can use hFlush to force that reporting of the data buffered so far. After flushing, the data is available to other programs that are running at the same time.

Think of reading a block-buffered file like this: your toilet bowl is set to flush itself after it has one gallon of water inside it. So you start pouring in water and once the gallon mark is reached, that water is automatically flushed and the data in the water that you've poured in so far is read. But you can flush the toilet manually too by pressing the button on the toilet. This makes the toilet flush and all the water (data) inside the toilet is read. In case you haven't noticed, flushing the toilet manually is a metaphor for hFlush . This is not a very great analogy by programming analogy standards, but I wanted a real world object that can be flushed for the punchline.

We already made a program to add a new item to our to-do list in todo.txt , now let's make a program to remove an item. I'll just paste the code and then we'll go over the program together so you see that it's really easy. We'll be using a few new functions from System.Directory and one new function from System.IO , but they'll all be explained.

Anyway, here's the program for removing an item from todo.txt :
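
What follows is a sketch reconstructed from the walkthrough below, so treat it as one plausible version rather than the definitive listing. The prompt strings are assumptions. It starts by opening todo.txt in read mode and binding its handle to handle .

```haskell
import System.IO
import System.Directory (removeFile, renameFile)
import Data.List (delete)

main :: IO ()
main = do
    handle <- openFile "todo.txt" ReadMode
    (tempName, tempHandle) <- openTempFile "." "temp"
    contents <- hGetContents handle
    let todoTasks = lines contents
        -- pair every task with its index: "0 - Iron the dishes", ...
        numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [0..] todoTasks
    putStrLn "These are your TO-DO items:"
    putStr $ unlines numberedTasks
    putStrLn "Which one do you want to delete?"
    numberString <- getLine
    let number = read numberString :: Int
        newTodoItems = delete (todoTasks !! number) todoTasks
    hPutStr tempHandle $ unlines newTodoItems
    hClose handle
    hClose tempHandle
    -- swap the temporary file in for the original
    removeFile "todo.txt"
    renameFile tempName "todo.txt"
```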

Next up, we use a function that we haven't met before which is from System.IO — openTempFile . Its name is pretty self-explanatory. It takes a path to a temporary directory and a template name for a file and opens a temporary file. We used "." for the temporary directory, because . denotes the current directory on just about any OS. We used "temp" as the template name for the temporary file, which means that the temporary file will be named temp plus some random characters. It returns an I/O action that makes the temporary file and the result in that I/O action is a pair of values: the name of the temporary file and a handle. We could just open a normal file called todo2.txt or something like that but it's better practice to use openTempFile so you know you're probably not overwriting anything.

Next up, we bind the contents of todo.txt to contents . Then we split that string into a list of strings, each string one line. So todoTasks is now something like ["Iron the dishes", "Dust the dog", "Take salad out of the oven"] . We zip the numbers from 0 onwards and that list with a function that takes a number, like 3, and a string, like "hey" and returns "3 - hey" , so numberedTasks is ["0 - Iron the dishes", "1 - Dust the dog" ... . We join that list of strings into a single newline delimited string with unlines and print that string out to the terminal. Note that instead of doing that, we could have also done mapM putStrLn numberedTasks .

We ask the user which one they want to delete and wait for them to enter a number. Let's say they want to delete number 1, which is Dust the dog , so they punch in 1 . numberString is now "1" and because we want a number, not a string, we run read on that to get 1 and bind that to number .

Remember the delete and !! functions from Data.List . !! returns an element from a list with some index and delete deletes the first occurrence of an element in a list and returns a new list without that occurrence. (todoTasks !! number) (number is now 1 ) returns "Dust the dog" . We bind todoTasks without the first occurrence of "Dust the dog" to newTodoItems and then join that into a single string with unlines before writing it to the temporary file that we opened. The old file is still unchanged and the temporary file contains all the lines that the old one does, except the one we deleted.

After that we close both the original and the temporary files and then we remove the original one with removeFile , which, as you can see, takes a path to a file and deletes it. After deleting the old todo.txt , we use renameFile to rename the temporary file to todo.txt . Be careful, removeFile and renameFile (which are both in System.Directory by the way) take file paths as their parameters, not handles.

And that's that! We could have done this in even fewer lines, but we were very careful not to overwrite any existing files and politely asked the operating system to tell us where we can put our temporary file. Let's give this a go!

Command line arguments

Dealing with command line arguments is pretty much a necessity if you want to make a script or application that runs on a terminal. Luckily, Haskell's standard library has a nice way of getting command line arguments of a program.

In the previous section, we made one program for adding a to-do item to our to-do list and one program for removing an item. There are two problems with the approach we took. The first one is that we just hardcoded the name of our to-do file in our code. We just decided that the file will be named todo.txt and that the user will never have a need for managing several to-do lists.

One way to solve that is to always ask the user which file they want to use as their to-do list. We used that approach when we wanted to know which item the user wants to delete. It works, but it's not so good, because it requires the user to run the program, wait for the program to ask something and then tell that to the program. That's called an interactive program and the difficult bit with interactive command line programs is this — what if you want to automate the execution of that program, like with a batch script? It's harder to make a batch script that interacts with a program than a batch script that just calls one program or several of them.

That's why it's sometimes better to have the user tell the program what they want when they run the program, instead of having the program ask the user once it's run. And what better way to have the user tell the program what they want it to do when they run it than via command line arguments!

The System.Environment module has two cool I/O actions. One is getArgs , which has a type of getArgs :: IO [String] and is an I/O action that will get the arguments that the program was run with and have as its contained result a list with the arguments. getProgName has a type of getProgName :: IO String and is an I/O action that contains the program name.

Here's a small program that demonstrates how these two work:
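
A minimal sketch of such a program (the exact wording of the messages is an assumption):

```haskell
import System.Environment (getArgs, getProgName)

main :: IO ()
main = do
    args <- getArgs
    progName <- getProgName
    putStrLn "The arguments are:"
    mapM_ putStrLn args          -- one argument per line
    putStrLn "The program name is:"
    putStrLn progName
```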

We bind the results of getArgs and getProgName to args and progName . We say The arguments are: and then for every argument in args , we do putStrLn . Finally, we also print out the program name. Let's compile this as arg-test .

Nice. Armed with this knowledge you could create some cool command line apps. In fact, let's go ahead and make one. In the previous section, we made a separate program for adding tasks and a separate program for deleting them. Now, we're going to join that into one program, and what it does will depend on the command line arguments. We're also going to make it so it can operate on different files, not just todo.txt .

We'll call it simply todo and it'll be able to do (haha!) three different things:

View tasks

Add tasks

Delete tasks

We're not going to concern ourselves with possible bad input too much right now.

Our program will be made so that if we want to add the task Find the magic sword of power to the file todo.txt , we have to punch in todo add todo.txt "Find the magic sword of power" in our terminal. To view the tasks we'll just do todo view todo.txt and to remove the task with the index of 2, we'll do todo remove todo.txt 2 .

We'll start by making a dispatch association list. It's going to be a simple association list that has command line arguments as keys and functions as their corresponding values. All these functions will be of type [String] -> IO () . They're going to take the argument list as a parameter and return an I/O action that does the viewing, adding, deleting, etc.

We have yet to define main , add , view and remove , so let's start with main :

First, we get the arguments and bind them to (command:args) . If you remember your pattern matching, this means that the first argument will get bound to command and the rest of them will get bound to args . If we call our program like todo add todo.txt "Spank the monkey" , command will be "add" and args will be ["todo.txt", "Spank the monkey"] .

In the next line, we look up our command in the dispatch list. Because "add" points to add , we get Just add as a result. We use pattern matching again to extract our function out of the Maybe . What happens if our command isn't in the dispatch list? Well then the lookup will return Nothing , but we said we won't concern ourselves with failing gracefully too much, so the pattern matching will fail and our program will throw a fit.

Finally, we call our action function with the rest of the argument list. That will return an I/O action that either adds an item, displays a list of items or deletes an item and because that action is part of the main do block, it will get performed. If we follow our concrete example so far and our action function is add , it will get called with args (so ["todo.txt", "Spank the monkey"] ) and return an I/O action that adds Spank the monkey to todo.txt .
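
To see the dispatch mechanics on their own, here's a sketch with stub handlers. The stubs are hypothetical stand-ins that only report what they were called with. The real add , view and remove come later.

```haskell
import System.Environment (getArgs)

-- Hypothetical stubs, just so the dispatch list has something to point at.
add, view, remove :: [String] -> IO ()
add args    = putStrLn ("would add with " ++ show args)
view args   = putStrLn ("would view with " ++ show args)
remove args = putStrLn ("would remove with " ++ show args)

-- Command names as keys, handler functions as values.
dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("add", add)
           , ("view", view)
           , ("remove", remove)
           ]

main :: IO ()
main = do
    (command:args) <- getArgs
    -- No graceful failure: an unknown command makes this pattern match blow up.
    let (Just action) = lookup command dispatch
    action args
```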

Great! All that's left now is to implement add , view and remove . Let's start with add :

If we call our program like todo add todo.txt "Spank the monkey" , the "add" will get bound to command in the first pattern match in the main block, whereas ["todo.txt", "Spank the monkey"] will get passed to the function that we get from the dispatch list. So, because we're not dealing with bad input right now, we just pattern match against a list with those two elements right away and return an I/O action that appends that line to the end of the file, along with a newline character.

Next, let's implement the list viewing functionality. If we want to view the items in a file, we do todo view todo.txt . So in the first pattern match, command will be "view" and args will be ["todo.txt"] .

We already did pretty much the same thing in the program that only deleted tasks when we were displaying the tasks so that the user can choose one for deletion, only here we just display the tasks.

And finally, we're going to implement remove . It's going to be very similar to the program that only deleted the tasks, so if you don't understand how deleting an item here works, check out the explanation under that program. The main difference is that we're not hardcoding todo.txt but getting it as an argument. We're also not prompting the user for the task number to delete, we're getting it as an argument.

We opened up the file based on fileName and opened a temporary file, deleted the line with the index that the user wants to delete, wrote that to the temporary file, removed the original file and renamed the temporary file back to fileName .

Here's the whole program at once, in all its glory!
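
This listing is a sketch assembled from the descriptions above, so small details may differ from what you'd write yourself.

```haskell
import System.Environment (getArgs)
import System.Directory (removeFile, renameFile)
import System.IO
import Data.List (delete)

dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("add", add)
           , ("view", view)
           , ("remove", remove)
           ]

main :: IO ()
main = do
    (command:args) <- getArgs
    let (Just action) = lookup command dispatch
    action args

-- todo add <file> <task>
add :: [String] -> IO ()
add [fileName, todoItem] = appendFile fileName (todoItem ++ "\n")

-- todo view <file>
view :: [String] -> IO ()
view [fileName] = do
    contents <- readFile fileName
    let todoTasks = lines contents
        numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [0..] todoTasks
    putStr $ unlines numberedTasks

-- todo remove <file> <index>
remove :: [String] -> IO ()
remove [fileName, numberString] = do
    handle <- openFile fileName ReadMode
    (tempName, tempHandle) <- openTempFile "." "temp"
    contents <- hGetContents handle
    let number = read numberString :: Int
        todoTasks = lines contents
        newTodoItems = delete (todoTasks !! number) todoTasks
    hPutStr tempHandle $ unlines newTodoItems
    hClose handle
    hClose tempHandle
    removeFile fileName
    renameFile tempName fileName
```

One thing to keep in mind with this approach: delete removes the first task whose text equals the chosen one, so if the list contains the same task twice, the earlier copy is the one that goes.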

To summarize our solution: we made a dispatch association list that maps from commands to functions that take some command line arguments and return an I/O action. We see what the command is and based on that we get the appropriate function from the dispatch list. We call that function with the rest of the command line arguments to get back an I/O action that will do the appropriate thing and then just perform that action!

In other languages, we might have implemented this with a big switch case statement or whatever, but using higher order functions allows us to just tell the dispatch list to give us the appropriate function and then tell that function to give us an I/O action for some command line arguments.

Let's try our app out!

Another cool thing about this is that it's easy to add extra functionality. Just add an entry in the dispatch association list and implement the corresponding function and you're laughing! As an exercise, you can try implementing a bump function that will take a file and a task number and return an I/O action that bumps that task to the top of the to-do list.

You could make this program fail a bit more gracefully in case of bad input (for example, if someone runs todo UP YOURS HAHAHAHA ) by making an I/O action that just reports there has been an error (say, errorExit :: IO () ) and then check for possible erroneous input and if there is erroneous input, perform the error reporting I/O action. Another way is to use exceptions, which we will meet soon.

Randomness

Many times while programming, you need to get some random data. Maybe you're making a game where a die needs to be thrown or you need to generate some test data to test out your program. There are a lot of uses for random data when programming. Well, actually, pseudo-random, because we all know that the only true source of randomness is a monkey on a unicycle with a cheese in one hand and its butt in the other. In this section, we'll take a look at how to make Haskell generate seemingly random data.

In most other programming languages, you have functions that give you back some random number. Each time you call that function, you get back a (hopefully) different random number. How about Haskell? Well, remember, Haskell is a pure functional language. What that means is that it has referential transparency. What THAT means is that a function, if given the same parameters twice, must produce the same result twice. That's really cool because it allows us to reason differently about programs and it enables us to defer evaluation until we really need it. If I call a function, I can be sure that it won't do any funny stuff before giving me the results. All that matters are its results. However, this makes it a bit tricky for getting random numbers. If I have a function like this:
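
Such a function could be as simple as this sketch (the name randomNumber is an assumption made for the example):

```haskell
-- A "random" number that is the same every single time.
randomNumber :: (Num a) => a
randomNumber = 4
```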

It's not very useful as a random number function because it will always return 4 , even though I can assure you that the 4 is completely random, because I used a die to determine it.

How do other languages make seemingly random numbers? Well, they take various info from your computer, like the current time, how much and where you moved your mouse and what kind of noises you made behind your computer and based on that, give a number that looks really random. The combination of those factors (that randomness) is probably different in any given moment in time, so you get a different random number.

Enter the System.Random module. It has all the functions that satisfy our need for randomness. Let's just dive into one of the functions it exports then, namely random . Here's its type: random :: (RandomGen g, Random a) => g -> (a, g) . Whoa! Some new typeclasses in this type declaration up in here! The RandomGen typeclass is for types that can act as sources of randomness. The Random typeclass is for things that can take on random values. A boolean value can take on a random value, namely True or False . A number can also take up a plethora of different random values. Can a function take on a random value? I don't think so, probably not! If we try to translate the type declaration of random to English, we get something like: it takes a random generator (that's our source of randomness) and returns a random value and a new random generator. Why does it also return a new generator as well as a random value? Well, we'll see in a moment.

To use our random function, we have to get our hands on one of those random generators. The System.Random module exports a cool type, namely StdGen that is an instance of the RandomGen typeclass. We can either make a StdGen manually or we can tell the system to give us one based on a multitude of sort of random stuff.

To manually make a random generator, use the mkStdGen function. It has a type of mkStdGen :: Int -> StdGen . It takes an integer and based on that, gives us a random generator. Okay then, let's try using random and mkStdGen in tandem to get a (hardly random) number.

What's this? Ah, right, the random function can return a value of any type that's part of the Random typeclass, so we have to inform Haskell what kind of type we want. Also let's not forget that it returns a random value and a random generator in a pair.
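
Here's a small sketch of the annotated call, assuming the random package that provides System.Random is installed. The seed 100 is arbitrary, and the number you get depends on the version of that package, so no output is shown.

```haskell
import System.Random (StdGen, mkStdGen, random)

-- The annotation tells Haskell which Random instance we're after.
firstRandom :: (Int, StdGen)
firstRandom = random (mkStdGen 100)

main :: IO ()
main = print (fst firstRandom)
```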

Finally! A number that looks kind of random! The first component of the tuple is our number whereas the second component is a textual representation of our new random generator. What happens if we call random with the same random generator again?

Of course. The same result for the same parameters. So let's try giving it a different random generator as a parameter.

Alright, cool, great, a different number. We can use the type annotation to get different types back from that function.

Let's make a function that simulates tossing a coin three times. If random didn't return a new generator along with a random value, we'd have to make this function take three random generators as a parameter and then return coin tosses for each of them. But that sounds wrong because if one generator can make a random value of type Int (which can take on a load of different values), it should be able to make three coin tosses (which can take on precisely eight combinations). So this is where random returning a new generator along with a value really comes in handy.

We'll represent a coin with a simple Bool . True is tails, False is heads.
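
A sketch of such a function, assuming System.Random from the random package. The name threeCoins and the variable names are choices made for this example.

```haskell
import System.Random (StdGen, mkStdGen, random)

threeCoins :: StdGen -> (Bool, Bool, Bool)
threeCoins gen =
    let (firstCoin, newGen)   = random gen      -- first toss, plus a fresh generator
        (secondCoin, newGen') = random newGen   -- second toss uses the fresh one
        (thirdCoin, _)        = random newGen'  -- third toss; last generator unused
    in  (firstCoin, secondCoin, thirdCoin)
```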

We call random with the generator we got as a parameter to get a coin and a new generator. Then we call it again, only this time with our new generator, to get the second coin. We do the same for the third coin. Had we called it with the same generator every time, all the coins would have had the same value and we'd only be able to get (False, False, False) or (True, True, True) as a result.

Notice that we didn't have to do random gen :: (Bool, StdGen) . That's because we already specified that we want booleans in the type declaration of the function. That's why Haskell can infer that we want a boolean value in this case.

So what if we want to flip four coins? Or five? Well, there's a function called randoms that takes a generator and returns an infinite sequence of values based on that generator.

Why doesn't randoms return a new generator as well as a list? We could implement the randoms function very easily like this:
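
A sketch of that idea, named randoms' here so it doesn't clash with the real randoms from System.Random (random package assumed):

```haskell
import System.Random (Random, RandomGen, mkStdGen, random)

randoms' :: (RandomGen g, Random a) => g -> [a]
randoms' gen =
    let (value, newGen) = random gen
    in  value : randoms' newGen   -- lazily keeps going forever
```

Because the list is lazy, something like take 5 (randoms' (mkStdGen 11)) :: [Int] only ever computes five values.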

A recursive definition. We get a random value and a new generator from the current generator and then make a list that has the value as its head and random numbers based on the new generator as its tail. Because we have to be able to potentially generate an infinite amount of numbers, we can't give the new random generator back.

We could make a function that generates a finite stream of numbers and a new generator like this:
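
One possible version, with finiteRandoms as an assumed name and the random package assumed to be available. The Eq n constraint is there because we pattern match on the literal 0.

```haskell
import System.Random (Random, RandomGen, StdGen, mkStdGen, random)

finiteRandoms :: (RandomGen g, Random a, Num n, Eq n) => n -> g -> ([a], g)
finiteRandoms 0 gen = ([], gen)
finiteRandoms n gen =
    let (value, newGen)        = random gen
        (restOfList, finalGen) = finiteRandoms (n - 1) newGen
    in  (value : restOfList, finalGen)
```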

Again, a recursive definition. We say that if we want 0 numbers, we just return an empty list and the generator that was given to us. For any other number of random values, we first get one random number and a new generator. That will be the head. Then we say that the tail will be n - 1 numbers generated with the new generator. Then we return the head and the rest of the list joined and the final generator that we got from getting the n - 1 random numbers.

There's also randomRs , which produces a stream of random values within our defined ranges. Check this out:
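
A small sketch, with the range, the length and the seed all picked arbitrarily for the example (random package assumed):

```haskell
import System.Random (mkStdGen, randomRs)

-- Ten characters, each somewhere between 'a' and 'z'.
secretLooking :: String
secretLooking = take 10 $ randomRs ('a', 'z') (mkStdGen 3)

main :: IO ()
main = putStrLn secretLooking
```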

Nice, looks like a super secret password or something.

You may be asking yourself, what does this section have to do with I/O anyway? We haven't done anything concerning I/O so far. Well, so far we've always made our random number generator manually by making it with some arbitrary integer. The problem is, if we do that in our real programs, they will always return the same random numbers, which is no good for us. That's why System.Random offers the getStdGen I/O action, which has a type of IO StdGen . When your program starts, it asks the system for a good random number generator and stores that in a so called global generator. getStdGen fetches you that global random generator when you bind it to something.

Here's a simple program that generates a random string.
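
A sketch of such a program, where the range 'a' to 'z' and the length 20 are assumptions for the example:

```haskell
import System.Random (getStdGen, randomRs)

main :: IO ()
main = do
    gen <- getStdGen   -- the global generator, set up when the program starts
    putStrLn $ take 20 (randomRs ('a', 'z') gen)
```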

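Be careful though, just performing getStdGen twice will ask the system for the same global generator twice. If you bind getStdGen to a name, print 20 random characters from it, then bind getStdGen again to another name and print 20 random characters from that one,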

you will get the same string printed out twice! One way to get two different strings of length 20 is to set up an infinite stream and then take the first 20 characters and print them out in one line and then take the second set of 20 characters and print them out in the second line. For this, we can use the splitAt function from Data.List , which splits a list at some index and returns a tuple that has the first part as the first component and the second part as the second component.
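
The splitAt approach might look like this sketch (variable names are assumptions):

```haskell
import System.Random (getStdGen, randomRs)
import Data.List (splitAt)

main :: IO ()
main = do
    gen <- getStdGen
    let randomChars      = randomRs ('a', 'z') gen   -- one infinite stream
        (first20, rest)  = splitAt 20 randomChars
        (second20, _)    = splitAt 20 rest
    putStrLn first20
    putStrLn second20
```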

Another way is to use the newStdGen action, which splits our current random generator into two generators. It updates the global random generator with one of them and encapsulates the other as its result.

Not only do we get a new random generator when we bind newStdGen to something, the global one gets updated as well, so if we do getStdGen again and bind it to something, we'll get a generator that's not the same as gen .
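
Here's a sketch of the newStdGen approach, with the same assumed range and length as before:

```haskell
import System.Random (getStdGen, newStdGen, randomRs)

main :: IO ()
main = do
    gen <- getStdGen
    putStrLn $ take 20 (randomRs ('a', 'z') gen)
    gen' <- newStdGen   -- a fresh generator; the global one gets updated too
    putStrLn $ take 20 (randomRs ('a', 'z') gen')
```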

Here's a little program that will make the user guess which number it's thinking of.
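
This is a sketch that follows the walkthrough below. The range from 1 to 10, the message texts and the hFlush stdout call (added so the prompt shows up before we wait for input) are assumptions of the example.

```haskell
import System.Random (StdGen, getStdGen, randomR)
import System.IO (hFlush, stdout)
import Control.Monad (when)

main :: IO ()
main = do
    gen <- getStdGen
    askForNumber gen

askForNumber :: StdGen -> IO ()
askForNumber gen = do
    let (randNumber, newGen) = randomR (1, 10) gen :: (Int, StdGen)
    putStr "Which number in the range from 1 to 10 am I thinking of? "
    hFlush stdout
    numberString <- getLine
    -- an empty line ends the game
    when (not $ null numberString) $ do
        let number = read numberString
        if randNumber == number
            then putStrLn "You are correct!"
            else putStrLn $ "Sorry, it was " ++ show randNumber
        askForNumber newGen   -- go again with the new generator
```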

We make a function askForNumber , which takes a random number generator and returns an I/O action that will prompt the user for a number and tell him if he guessed it right. In that function, we first generate a random number and a new generator based on the generator that we got as a parameter and call them randNumber and newGen . Let's say that the number generated was 7 . Then we tell the user to guess which number we're thinking of. We perform getLine and bind its result to numberString . When the user enters 7 , numberString becomes "7" . Next, we use when to check if the string the user entered is an empty string. If it is, an empty I/O action of return () is performed, which effectively ends the program. If it isn't, the action consisting of that do block right there gets performed. We use read on numberString to convert it to a number, so number is now 7 .

We check if the number that we entered is equal to the one generated randomly and give the user the appropriate message. And then we call askForNumber recursively, only this time with the new generator that we got, which gives us an I/O action that's just like the one we performed, only it depends on a different generator and we perform it.

main consists of just getting a random generator from the system and calling askForNumber with it to get the initial action.

Here's our program in action!

Another way to make this same program is like this:

It's very similar to the previous version, only instead of making a function that takes a generator and then calls itself recursively with the new updated generator, we do all the work in main . After telling the user whether they were correct in their guess or not, we update the global generator and then call main again. Both approaches are valid but I like the first one more since it does less stuff in main and also provides us with a function that we can reuse easily.

Bytestrings

Lists are a cool and useful data structure. So far, we've used them pretty much everywhere. There are a multitude of functions that operate on them and Haskell's laziness allows us to exchange the for and while loops of other languages for filtering and mapping over lists, because evaluation will only happen once it really needs to, so things like infinite lists (and even infinite lists of infinite lists!) are no problem for us. That's why lists can also be used to represent streams, either when reading from the standard input or when reading from files. We can just open a file and read it as a string, even though it will only be accessed when the need arises.

However, processing files as strings has one drawback: it tends to be slow. As you know, String is a type synonym for [Char] . A Char takes up more than one byte, because it has to be able to represent any character from, say, Unicode. Furthermore, lists are really lazy. If you have a list like [1,2,3,4] , it will be evaluated only when completely necessary. So the whole list is sort of a promise of a list. Remember that [1,2,3,4] is syntactic sugar for 1:2:3:4:[] . When the first element of the list is forcibly evaluated (say by printing it), the rest of the list 2:3:4:[] is still just a promise of a list, and so on. So you can think of lists as promises that the next element will be delivered once it really has to and along with it, the promise of the element after it. It doesn't take a big mental leap to conclude that processing a simple list of numbers as a series of promises might not be the most efficient thing in the world.

That overhead doesn't bother us so much most of the time, but it turns out to be a liability when reading big files and manipulating them. That's why Haskell has bytestrings . Bytestrings are sort of like lists, only each element is one byte (or 8 bits) in size. The way they handle laziness is also different.

Bytestrings come in two flavors: strict and lazy ones. Strict bytestrings reside in Data.ByteString and they do away with the laziness completely. There are no promises involved; a strict bytestring represents a series of bytes in an array. You can't have things like infinite strict bytestrings. If you evaluate the first byte of a strict bytestring, you have to evaluate it whole. The upside is that there's less overhead because there are no thunks (the technical term for promise ) involved. The downside is that they're likely to fill your memory up faster because they're read into memory at once.

The other variety of bytestrings resides in Data.ByteString.Lazy . They're lazy, but not quite as lazy as lists. Like we said before, there are as many thunks in a list as there are elements. That's what makes them kind of slow for some purposes. Lazy bytestrings take a different approach — they are stored in chunks (not to be confused with thunks!), each chunk has a size of 64K. So if you evaluate a byte in a lazy bytestring (by printing it or something), the first 64K will be evaluated. After that, it's just a promise for the rest of the chunks. Lazy bytestrings are kind of like lists of strict bytestrings with a size of 64K. When you process a file with lazy bytestrings, it will be read chunk by chunk. This is cool because it won't cause the memory usage to skyrocket and the 64K probably fits neatly into your CPU's L2 cache.

If you look through the documentation for Data.ByteString.Lazy , you'll see that it has a lot of functions that have the same names as the ones from Data.List , only the type signatures have ByteString instead of [a] and Word8 instead of a in them. The functions with the same names mostly act the same as the ones that work on lists. Because the names are the same, we're going to do a qualified import in a script and then load that script into GHCI to play with bytestrings.

B has lazy bytestring types and functions, whereas S has strict ones. We'll mostly be using the lazy version.

The function pack has the type signature pack :: [Word8] -> ByteString . What that means is that it takes a list of bytes of type Word8 and returns a ByteString . You can think of it as taking a list, which is lazy, and making it less lazy, so that it's lazy only at 64K intervals.

What's the deal with that Word8 type? Well, it's like Int , only that it has a much smaller range, namely 0-255. It represents an 8-bit number. And just like Int , it's in the Num typeclass. For instance, we know that the value 5 is polymorphic in that it can act like any numeric type. Well, it can also take the type of Word8 .

As you can see, you usually don't have to worry about the Word8 too much, because the type system can make the numbers choose that type. If you try to use a big number, like 336 as a Word8 , it will just wrap around to 80 .

We packed only a handful of values into a ByteString , so they fit inside one chunk. The Empty is like the [] for lists.

unpack is the inverse function of pack . It takes a bytestring and turns it into a list of bytes.
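
Here's a short sketch that puts pack , unpack and the Word8 wrap-around together. The names lazyCan , strictCan and wrapped are made up for the example, and fromIntegral is used for the wrap-around so the compiler doesn't complain about an out-of-range literal.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString as S
import Data.Word (Word8)

lazyCan :: B.ByteString
lazyCan = B.pack [99, 97, 110]        -- the bytes of "can"

strictCan :: S.ByteString
strictCan = S.pack [99, 97, 110]

wrapped :: Word8
wrapped = fromIntegral (336 :: Int)   -- 336 `mod` 256 is 80

main :: IO ()
main = do
    print (B.unpack lazyCan)     -- [99,97,110]
    print (S.unpack strictCan)   -- [99,97,110]
    print wrapped                -- 80
```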

fromChunks takes a list of strict bytestrings and converts it to a lazy bytestring. toChunks takes a lazy bytestring and converts it to a list of strict ones.

This is good if you have a lot of small strict bytestrings and you want to process them efficiently without joining them into one big strict bytestring in memory first.
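
A small sketch of going back and forth, with byte values picked arbitrarily:

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString as S

-- Three small strict bytestrings become the three chunks of a lazy one.
lazyFromStrict :: B.ByteString
lazyFromStrict =
    B.fromChunks [S.pack [40, 41, 42], S.pack [43, 44, 45], S.pack [46, 47, 48]]

main :: IO ()
main = do
    print (B.unpack lazyFromStrict)              -- [40,41,42,43,44,45,46,47,48]
    print (length (B.toChunks lazyFromStrict))   -- 3
```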

The bytestring version of : is called cons . It takes a byte and a bytestring and puts the byte at the beginning. It's lazy though, so it will make a new chunk even if the first chunk in the bytestring isn't full. That's why it's better to use the strict version of cons , cons' if you're going to be inserting a lot of bytes at the beginning of a bytestring.

As you can see empty makes an empty bytestring. See the difference between cons and cons' ? With the foldr , we started with an empty bytestring and then went over the list of numbers from the right, adding each number to the beginning of the bytestring. When we used cons , we ended up with one chunk for every byte, which kind of defeats the purpose.
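
To compare the two yourself, you could run a sketch like this one. The range [50..60] is arbitrary, and the exact chunk counts depend on the bytestring library's internals, so the program prints them instead of promising specific numbers.

```haskell
import qualified Data.ByteString.Lazy as B

lazyBuilt, strictBuilt :: B.ByteString
lazyBuilt   = foldr B.cons  B.empty [50 .. 60]   -- lazy cons
strictBuilt = foldr B.cons' B.empty [50 .. 60]   -- strict cons'

main :: IO ()
main = do
    -- Same bytes either way ...
    print (B.unpack lazyBuilt == B.unpack strictBuilt)
    -- ... but split into chunks differently.
    print (length (B.toChunks lazyBuilt))
    print (length (B.toChunks strictBuilt))
```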

Otherwise, the bytestring modules have a load of functions that are analogous to those in Data.List , including, but not limited to, head , tail , init , null , length , map , reverse , foldl , foldr , concat , takeWhile , filter , etc.

It also has functions that have the same name and behave the same as some functions found in System.IO , only String s are replaced with ByteString s. For instance, the readFile function in System.IO has a type of readFile :: FilePath -> IO String , while the readFile from the bytestring modules has a type of readFile :: FilePath -> IO ByteString . Watch out, if you're using strict bytestrings and you attempt to read a file, it will read it into memory at once! With lazy bytestrings, it will read it into neat chunks.

Let's make a simple program that takes two filenames as command-line arguments and copies the first file into the second file. Note that System.Directory already has a function called copyFile , but we're going to implement our own file copying function and program anyway.

We make our own function that takes two FilePath s (remember, FilePath is just a synonym for String ) and returns an I/O action that will copy one file into another using bytestring. In the main function, we just get the arguments and call our function with them to get the I/O action, which is then performed.
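A sketch of such a program, assuming lazy bytestrings imported qualified as B, could look like this:

```haskell
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main :: IO ()
main = do
    -- Take the first two command-line arguments as source and destination.
    (source : dest : _) <- getArgs
    copyFile source dest

-- Copy one file into another by reading and writing a lazy bytestring.
copyFile :: FilePath -> FilePath -> IO ()
copyFile source dest = do
    contents <- B.readFile source
    B.writeFile dest contents
```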

Notice that a program that doesn't use bytestrings could look just like this, the only difference is that we used B.readFile and B.writeFile instead of readFile and writeFile . Many times, you can convert a program that uses normal strings to a program that uses bytestrings by just doing the necessary imports and then putting the qualified module names in front of some functions. Sometimes, you have to convert functions that you wrote to work on strings so that they work on bytestrings, but that's not hard.

Whenever you need better performance in a program that reads a lot of data into strings, give bytestrings a try, chances are you'll get some good performance boosts with very little effort on your part. I usually write programs by using normal strings and then convert them to use bytestrings if the performance is not satisfactory.

All languages have procedures, functions, and pieces of code that might fail in some way. That's just a fact of life. Different languages have different ways of handling those failures. In C, we usually use some abnormal return value (like -1 or a null pointer) to indicate that what a function returned shouldn't be treated like a normal value. Java and C#, on the other hand, tend to use exceptions to handle failure. When an exception is thrown, the control flow jumps to some code that we've defined that does some cleanup and then maybe re-throws the exception so that some other error handling code can take care of some other stuff.

Haskell has a very good type system. Algebraic data types allow for types like Maybe and Either and we can use values of those types to represent results that may be there or not. In C, returning, say, -1 on failure is completely a matter of convention. It only has special meaning to humans. If we're not careful, we might treat these abnormal values as ordinary ones and then they can cause havoc and dismay in our code. Haskell's type system gives us some much-needed safety in that aspect. A function a -> Maybe b clearly indicates that it may produce a b wrapped in Just or that it may return Nothing . The type is different from just plain a -> b and if we try to use those two functions interchangeably, the compiler will complain at us.
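To make this concrete, here is a small sketch of two functions that report failure through Maybe. The names safeDiv , safeHead and describe are hypothetical helpers written for this example; they are not part of the Prelude.

```haskell
-- Division that returns Nothing instead of failing on a zero divisor.
safeDiv :: Integral a => a -> a -> Maybe a
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- First element of a list, or Nothing for the empty list.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- The caller has to handle both cases; the compiler will not let a
-- Maybe Int be used where a plain Int is expected.
describe :: Maybe Int -> String
describe Nothing  = "no result"
describe (Just n) = "result: " ++ show n

main :: IO ()
main = do
    putStrLn (describe (safeDiv 10 2))
    putStrLn (describe (safeDiv 10 0))
    putStrLn (describe (safeHead []))
```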

Despite having expressive types that support failed computations, Haskell still has support for exceptions, because they make more sense in I/O contexts. A lot of things can go wrong when dealing with the outside world because it is so unreliable. For instance, when opening a file, a bunch of things can go wrong. The file might be locked, it might not be there at all or the hard disk drive or something might not be there at all. So it's good to be able to jump to some error handling part of our code when such an error occurs.

Okay, so I/O code (i.e. impure code) can throw exceptions. It makes sense. But what about pure code? Well, it can throw exceptions too. Think about the div and head functions. They have types of (Integral a) => a -> a -> a and [a] -> a , respectively. No Maybe or Either in their return type and yet they can both fail! div explodes in your face if you try to divide by zero and head throws a tantrum when you give it an empty list.

Pure code can throw exceptions, but they can only be caught in the I/O part of our code (when we're inside a do block that goes into main ). That's because you don't know when (or if) anything will be evaluated in pure code, because it is lazy and doesn't have a well-defined order of execution, whereas I/O code does.
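As an illustration only, the sketch below catches the exception thrown by a pure division inside I/O. It uses evaluate and try from Control.Exception , which are not discussed in this text, and tryDiv is a name invented for the example.

```haskell
import Control.Exception (ArithException, evaluate, try)

-- evaluate forces the pure expression inside IO, so we know when the
-- exception (if any) is thrown; try turns it into an Either value.
tryDiv :: Int -> Int -> IO (Either ArithException Int)
tryDiv x y = try (evaluate (x `div` y))

main :: IO ()
main = do
    ok  <- tryDiv 10 2
    bad <- tryDiv 10 0
    print ok
    print bad
```

Without evaluate , laziness could delay the division until after try has already returned, which is exactly the ordering problem described above.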

Earlier, we talked about how we should spend as little time as possible in the I/O part of our program. The logic of our program should reside mostly within our pure functions, because their results are dependent only on the parameters that the functions are called with. When dealing with pure functions, you only have to think about what a function returns, because it can't do anything else. This makes your life easier. Even though doing some logic in I/O is necessary (like opening files and the like), it should preferably be kept to a minimum. Pure functions are lazy by default, which means that we don't know when they will be evaluated and that it really shouldn't matter. However, once pure functions start throwing exceptions, it matters when they are evaluated. That's why we can only catch exceptions thrown from pure functions in the I/O part of our code. And that's bad, because we want to keep the I/O part as small as possible. However, if we don't catch them in the I/O part of our code, our program crashes. The solution? Don't mix exceptions and pure code. Take advantage of Haskell's powerful type system and use types like Either and Maybe to represent results that may have failed.

That's why we'll just be looking at how to use I/O exceptions for now. I/O exceptions are exceptions that are caused when something goes wrong while we are communicating with the outside world in an I/O action that's part of main . For example, we can try opening a file and then it turns out that the file has been deleted or something. Take a look at this program that opens a file whose name is given to it as a command line argument and tells us how many lines the file has.
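A sketch of such a program follows. The helper countLines is an assumption added here to keep the pure part separate; it simply applies lines and then length .

```haskell
import System.Environment (getArgs)

-- Pure part: split the contents into lines and count them.
countLines :: String -> Int
countLines = length . lines

main :: IO ()
main = do
    -- Bind the first command-line argument to fileName.
    (fileName : _) <- getArgs
    contents <- readFile fileName
    putStrLn $ "The file has " ++ show (countLines contents) ++ " lines!"
```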

A very simple program. We perform the getArgs I/O action and bind the first string in the list that it yields to fileName . Then we call the contents of the file with that name contents . Lastly, we apply lines to those contents to get a list of lines and then we get the length of that list and give it to show to get a string representation of that number. It works as expected, but what happens when we give it the name of a file that doesn't exist?

Aha, we get an error from GHC, telling us that the file does not exist. Our program crashes. What if we wanted to print out a nicer message if the file doesn't exist? One way to do that is to check if the file exists before trying to open it by using the doesFileExist function from System.Directory .
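One way this check could be written is sketched below; the wording of the messages is an assumption.

```haskell
import System.Directory (doesFileExist)
import System.Environment (getArgs)

main :: IO ()
main = do
    (fileName : _) <- getArgs
    -- doesFileExist is an I/O action, so its Bool result is bound with <-
    fileExists <- doesFileExist fileName
    if fileExists
        then do
            contents <- readFile fileName
            putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
        else putStrLn "The file doesn't exist!"
```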

We did fileExists <- doesFileExist fileName because doesFileExist has a type of doesFileExist :: FilePath -> IO Bool , which means that it returns an I/O action that has as its result a boolean value which tells us if the file exists. We can't just use doesFileExist in an if expression directly.

Another solution here would be to use exceptions. It's perfectly acceptable to use them in this context. A file not existing is an exception that arises from I/O, so catching it in I/O is fine and dandy.

To deal with this by using exceptions, we're going to take advantage of the catch function from System.IO.Error . Its type is catch :: IO a -> (IOError -> IO a) -> IO a . (In recent versions of base, System.IO.Error exports the function with this type under the name catchIOError , and a more general catch lives in Control.Exception .) It takes two parameters. The first one is an I/O action. For instance, it could be an I/O action that tries to open a file. The second one is the so-called handler. If the first I/O action passed to catch throws an I/O exception, that exception gets passed to the handler, which then decides what to do. So the final result is an I/O action that will either act the same as the first parameter or it will do what the handler tells it if the first I/O action throws an exception.

If you're familiar with try-catch blocks in languages like Java or Python, the catch function is similar to them. The first parameter is the thing to try, kind of like the stuff in the try block in other, imperative languages. The second parameter is the handler that takes an exception, just like most catch blocks take exceptions that you can then examine to see what happened. The handler is invoked if an exception is thrown.

The handler takes a value of type IOError , which is a value that signifies that an I/O exception occurred. It also carries information regarding the type of the exception that was thrown. How this type is implemented depends on the implementation of the language itself, which means that we can't inspect values of the type IOError by pattern matching against them, just like we can't pattern match against values of type IO something . We can use a bunch of useful predicates to find out stuff about values of type IOError as we'll learn in a second.

So let's put our new friend catch to use!
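Here is a sketch of the line-counting program rewritten with catch . It assumes a current GHC, where catch is imported from Control.Exception and the IOError type of the handler picks out I/O exceptions.

```haskell
import Control.Exception (catch)
import System.Environment (getArgs)

main :: IO ()
main = toTry `catch` handler

-- The I/O action we attempt to carry out.
toTry :: IO ()
toTry = do
    (fileName : _) <- getArgs
    contents <- readFile fileName
    putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"

-- Runs only if toTry throws an I/O exception.
handler :: IOError -> IO ()
handler _ = putStrLn "Whoops, had some trouble!"
```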

First of all, you'll see that we put backticks around it so that we can use it as an infix function, because it takes two parameters. Using it as an infix function makes it more readable. So toTry `catch` handler is the same as catch toTry handler , which fits well with its type. toTry is the I/O action that we try to carry out and handler is the function that takes an IOError and returns an action to be carried out in case of an exception.

Let's give this a go:

In the handler, we didn't check to see what kind of IOError we got. We just say "Whoops, had some trouble!" for any kind of error. Just catching all types of exceptions in one handler is bad practice in Haskell just like it is in most other languages. What if some other exception happens that we don't want to catch, like us interrupting the program or something? That's why we're going to do the same thing that's usually done in other languages as well: we'll check to see what kind of exception we got. If it's the kind of exception we're waiting to catch, we do our stuff. If it's not, we throw that exception back into the wild. Let's modify our program to catch only the exceptions caused by a file not existing.
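A sketch of the modified program, again assuming catch from Control.Exception :

```haskell
import Control.Exception (catch)
import System.Environment (getArgs)
import System.IO.Error (isDoesNotExistError)

main :: IO ()
main = toTry `catch` handler

toTry :: IO ()
toTry = do
    (fileName : _) <- getArgs
    contents <- readFile fileName
    putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"

handler :: IOError -> IO ()
handler e
    -- Handle only the "file does not exist" case ourselves.
    | isDoesNotExistError e = putStrLn "The file doesn't exist!"
    -- Re-throw everything else.
    | otherwise             = ioError e
```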

Everything stays the same except the handler, which we modified to only catch a certain group of I/O exceptions. Here we used two new functions from System.IO.Error : isDoesNotExistError and ioError . isDoesNotExistError is a predicate over IOError s, which means that it's a function that takes an IOError and returns a True or False , meaning it has a type of isDoesNotExistError :: IOError -> Bool . We use it on the exception that gets passed to our handler to see if it's an error caused by a file not existing. We use guard syntax here, but we could have also used an if else . If it's not caused by a file not existing, we re-throw the exception that was passed to the handler with the ioError function. It has a type of ioError :: IOException -> IO a ( IOError is a synonym for IOException ), so it takes an IOError and produces an I/O action that will throw it. The I/O action has a type of IO a , because it never actually yields a result, so it can act as IO anything .

So if the exception thrown in the toTry I/O action that we glued together with a do block isn't caused by a file not existing, toTry `catch` handler will catch that and then re-throw it. Pretty cool, huh?

There are several predicates that act on IOError and if a guard doesn't evaluate to True , evaluation falls through to the next guard. The predicates that act on IOError are:

isAlreadyExistsError
isDoesNotExistError
isAlreadyInUseError
isFullError
isIllegalOperation
isPermissionError
isUserError

Most of these are pretty self-explanatory. isUserError evaluates to True when we use the function userError to make the exception, which is used for making exceptions from our code and equipping them with a string. For instance, you can do ioError $ userError "remote computer unplugged!" , although it's preferred that you use types like Either and Maybe to express possible failure instead of throwing exceptions yourself with userError .

So you could have a handler that looks something like this:
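The sketch below shows one possible shape for such a handler. The bodies of notifyCops and freeSomeSpace are placeholders invented here, and missingPath is a hypothetical file name that is assumed not to exist.

```haskell
import Control.Exception (catch)
import System.IO.Error (isDoesNotExistError, isFullError, isIllegalOperation)

handler :: IOError -> IO ()
handler e
    | isDoesNotExistError e = putStrLn "The file doesn't exist!"
    | isFullError e         = freeSomeSpace
    | isIllegalOperation e  = notifyCops
    -- Anything we don't recognise is thrown again.
    | otherwise             = ioError e

-- Placeholder actions; real programs would define something useful here.
freeSomeSpace :: IO ()
freeSomeSpace = putStrLn "Freeing some space..."

notifyCops :: IO ()
notifyCops = putStrLn "Notifying the cops..."

-- Hypothetical path, assumed to be missing, to trigger the first guard.
missingPath :: FilePath
missingPath = "no-such-file.txt"

main :: IO ()
main = (readFile missingPath >>= putStr) `catch` handler
```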

Where notifyCops and freeSomeSpace are some I/O actions that you define. Be sure to re-throw exceptions if they don't match any of your criteria, otherwise you're causing your program to fail silently in some cases where it shouldn't.

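System.IO.Error also exports functions for asking an exception about its attributes, such as ioeGetFileName :: IOError -> Maybe FilePath , which tells us which file path, if any, caused the error. In a guard where isDoesNotExistError is True , we can use a case expression to call ioeGetFileName with e and then pattern match against the Maybe value that it returns. Case expressions are commonly used when you want to pattern match against something without bringing in a new function.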
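A handler that uses ioeGetFileName in this way might look like the following sketch. The message texts and the hypothetical missingPath are assumptions made for the example.

```haskell
import Control.Exception (catch)
import System.IO.Error (ioeGetFileName, isDoesNotExistError)

handler :: IOError -> IO ()
handler e
    | isDoesNotExistError e =
        -- ioeGetFileName yields Maybe FilePath, so we match on both cases.
        case ioeGetFileName e of
            Just path -> putStrLn $ "Whoops! File does not exist at: " ++ path
            Nothing   -> putStrLn "Whoops! File does not exist at unknown location!"
    | otherwise = ioError e

-- Hypothetical path, assumed to be missing.
missingPath :: FilePath
missingPath = "no-such-file.txt"

main :: IO ()
main = (readFile missingPath >>= putStr) `catch` handler
```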

You don't have to use one handler to catch exceptions in your whole I/O part. You can just cover certain parts of your I/O code with catch or you can cover several of them with catch and use different handlers for them, like so:
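In the sketch below, the bodies of toTry , thenTryThis , launchRockets and the two handlers are stand-ins invented for illustration. The handlers catch every IOError only to keep the example short.

```haskell
import Control.Exception (catch)

main :: IO ()
main = do
    toTry `catch` handler1
    thenTryThis `catch` handler2
    -- Not covered by any handler in this block.
    launchRockets

-- Stand-in actions that fail on purpose so the handlers run.
toTry :: IO ()
toTry = ioError (userError "first step failed")

thenTryThis :: IO ()
thenTryThis = ioError (userError "second step failed")

launchRockets :: IO ()
launchRockets = putStrLn "Rockets launched!"

-- Stand-in handlers that just report what they caught.
handler1 :: IOError -> IO ()
handler1 e = putStrLn ("handler1 caught: " ++ show e)

handler2 :: IOError -> IO ()
handler2 e = putStrLn ("handler2 caught: " ++ show e)
```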

Here, toTry uses handler1 as the handler and thenTryThis uses handler2 . launchRockets isn't a parameter to catch , so whichever exceptions it might throw will likely crash our program, unless launchRockets uses catch internally to handle its own exceptions. Of course toTry , thenTryThis and launchRockets are I/O actions that have been glued together using do syntax and hypothetically defined somewhere else. This is kind of similar to try-catch blocks of other languages, where you can surround your whole program in a single try-catch or you can use a more fine-grained approach and use different ones in different parts of your code to control what kind of error handling happens where.

Now you know how to deal with I/O exceptions! Throwing exceptions from pure code and dealing with them hasn't been covered here, mainly because, like we said, Haskell offers much better ways to indicate errors than reverting to I/O to catch them. Even when glueing together I/O actions that might fail, I prefer to have their type be something like IO (Either a b) , meaning that they're normal I/O actions but the result that they yield when performed is of type Either a b , meaning it's either Left a or Right b .
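As a closing sketch of that last idea, tryIOError from System.IO.Error turns an I/O action that might throw into one that yields an Either . The helper name readFileSafe and the file name are assumptions for this example.

```haskell
import System.IO.Error (tryIOError)

-- Report a failure to open the file as a Left value instead of an exception.
readFileSafe :: FilePath -> IO (Either IOError String)
readFileSafe path = tryIOError (readFile path)

main :: IO ()
main = do
    result <- readFileSafe "no-such-file.txt"
    case result of
        Left err       -> putStrLn ("Could not read the file: " ++ show err)
        Right contents -> putStrLn ("Read " ++ show (length (lines contents)) ++ " lines")
```

Because readFile is lazy, this only captures errors raised while opening the file, not ones that occur later while the contents are being consumed.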

Using Project Execution Management

Variable Hours Assignment for Resource Requests

If you don't need the resource to work on the project full-time, you can request or assign one or more resources for variable hours on a project.

You can assign the resource to work various hours on some or all days of the week (including non-working days), or request a number of hours per week with flexible days to meet the schedule requirements. This helps you schedule resources better.

Specify variable hours in the requested hours when you create or update a resource request, or assign resources to a weekly pattern that's repeated for the duration of the assignment or the request. You can also adjust the resource assignment schedule to add variable hours or weekly hours.

If you request a particular number of hours per week, the resource managers can review available capacity based on the resource's total weekly hours available. For example, if a request is for 25 hours each week, they can assign any resource that has 25 hours of remaining capacity for each week to fulfill the request.

Available capacity calculation is based on the requested hours. If the request is for weekly hours, the available capacity is based on the total remaining available hours a resource has each week of the requested duration. If the requested hours specifies using the project calendar hours, hours per day, or variable hours, the available capacity is based on the resource's available hours each day of the requested duration.

If you want to update the requests or assignments as a group, you can adjust the hours by calendar days or by working days in case the variable hours fall on non-working days.

Consider these examples:

Within a single assignment, a resource may work Mondays and Tuesdays for 8 hours each day, Thursdays and Fridays for 4 hours each day and not work any hours on Wednesdays. A project manager can request such a resource assignment.

A project needs a system administrator to work once each week on Tuesday or Thursday to handle weekly builds. The project manager requests a resource for Thursdays 4 hours per day for the duration of the project. The resource manager adjusts the daily hours to Tuesday and assigns the resource.

A project manager raises a request for a DBA analyst for 5 hours every week. The resource manager knows that the days are flexible and assigns a resource whose availability is at least 5 hours every week.

A resource manager created several assignments for a project where the work is scheduled to start on a Saturday, which is a non-working day on the project calendar. The project gets delayed by two weeks. To adjust these assignments, she can select all the impacted assignments on the Change Assignment Schedule page and adjust the start and finish dates by 14 calendar days to ensure the assignments start on the Saturday again but 2 weeks later than originally planned.

COMMENTS

Does "<-" mean assigning a variable in Haskell?
Assignment in Haskell works exclusively like the second example—declaration with initialization: You declare a variable; Haskell doesn't allow uninitialized variables, so you are required to supply a value in the declaration; There's no mutation, so the value given in the declaration will be the only value for that variable throughout its scope.
Haskell/Variables and functions
Of course, that works just fine. We can change r in the one place where it is defined, and that will automatically update the value of all the rest of the code that uses the r variable.. Real-world Haskell programs work by leaving some variables unspecified in the code. The values then get defined when the program gets data from an external file, a database, or user input.
Does Haskell have variables?
Yes, Haskell has variables. Consider the (essentially equivalent) definitions . inc n = n + 1 inc = \n -> n + 1 In both these cases, n is a variable; it will take on different values at different times. The Haskell Report, in Section 3 refers to these explicitly as variables.. That n here is a variable may be easier to see if we consider the following complete program:
Variable
The operator = is used to assign a value to a variable, e.g. phi = 1.618. The scope of such variables can be controlled by using let or where clauses. Another sense in which "variable" is used in Haskell is as formal parameters to a function. For example: add x y = x + y. x and y are formal parameters for the add function.
PDF Haskell Cheat Sheet
variables. Technically, Haskell 98 only allows one type variable, but most implementations of Haskell support so-called multi-parameter type classes, which allow more than one type variable. We can deﬁne a class which supplies a ﬂavor for a given type: class Flavor a where flavor :: a -> String Notice that the declaration only gives the ...
Demystifying Haskell assignment
Demystifying Haskell assignment. This post clarifies the distinction between <- and = in Haskell, which sometimes mystifies newcomers to the language. For example, consider the following contrived code snippet: main = do. input <- getLine. let output = input ++ "!" putStrLn output.
PDF basics
We ﬁrst note that variables in Haskell can only be assigned once, unlike in many imperative programming languages, where a variable can be overwritten with diﬀerent values arbitrarily ... While the above variable assignments are still fairly regular, patternmatching allows you not only to assign multiple variables at once, but also to ...
Variable
In Haskell, a variable is a name for some valid expression. The word "variable" as applied to Haskell variables is misleading, since a given variable's value never varies during a program's runtime. ... The operator = is used to assign a value to a variable, e.g. phi = 1.618.
Variables
Welcome to part 5 of this series on Haskell for Beginners.In this video we'll learn about variables in Haskell.Why does this course exist?It's the course I w...
Getting started with Haskell
Haskell is a pure functional language. By functions, we mean mathematical functions. Variables are immutable. x = 5; x = 6 is an error, since x cannot be changed. lazy - definitions of symbols are evaluated only when needed. If you divide two variables, for instance, it will not be evaluated until you read the result.
A Gentle Introduction to Haskell: Functions
3 Functions. 3. Functions. Since Haskell is a functional language, one would expect functions to play a major role, and indeed they do. In this section, we look at several aspects of functions in Haskell. First, consider this definition of a function which adds its two arguments: add :: Integer -> Integer -> Integer. add x y = x + y.
Write Yourself a Scheme in 48 Hours/Adding Variables and Assignment
A variable lets us save the result of an expression and refer to it later. In Scheme, a variable can also be reset to new values, so that its value changes as the program executes. This presents a complication for Haskell, because the execution model is built upon functions that return values, but never change them.
How do I modify a variable in Haskell?
In Haskell, we run into trouble when we realize there's no built-in mutating assignment operation: import Data.Array writeArray = undefined size = 10-- 1. Declare the array arr:: Array ... In Haskell, later variables with the same name "shadow" earlier variables, so we can pretend that we've updated the array in-place by giving the new ...
A brief introduction to Haskell
A brief introduction to Haskell. Haskell is: A language developed by the programming languages research community. Is a lazy, purely functional language (that also has imperative features such as side effects and mutable state, along with strict evaluation) Born as an open source vehicle for programming language research.
Haskell/do notation
In Haskell, we can chain any actions as long as all of them are in the same monad. In the context of the IO monad, the actions include writing to a file, opening a network connection, or asking the user for an input. Here's the step-by-step translation of do notation to unsugared Haskell code: do { action1 -- by monad laws equivalent to: do ...
Input and Output
Input and Output. We've mentioned that Haskell is a purely functional language. Whereas in imperative languages you usually get things done by giving the computer a series of steps to execute, functional programming is more of defining what stuff is. In Haskell, a function can't change some state, like changing the contents of a variable (when ...
How can I re-assign a variable in a function in Haskell?
let elephant = 2. print elephant. The code in a do block is executed in order, so you can effectively reassign variables the way you can in most programming languages. Note that this code really just creates a new binding for elephant. The old value still exists: main = do. let elephant = 1. print elephant.
Assignment (computer science)
In Haskell, there is no variable assignment; but operations similar to assignment (like assigning to a field of an array or a field of a mutable data structure) usually evaluate to the unit type, which is represented as (). This type has only one possible value, therefore containing no information. It is typically the type of an expression that ...
conditional statements
Because reassigning is fundamentally impossible in Haskell. All variable are immutable. That is rather logical since Haskell is a lazy language and hence if objects were mutable, then the behavior would be completely unpredictable. ... In fact you never even assign, you declare (that is something related, but different).
Variable Hours Assignment for Resource Requests
You can also adjust resource assignment schedule to add variable hours or weekly hours. Note: You can update a resource request in Open status only. If you request for a particular number of hours per week, the resource managers can review available capacity based on the resource's total weekly hours available. For example, if a request is for ...
Assign multiple variables Haskell
5. Prelude>. You need replicate to "duplicate" a value. In this case, I duplicating 5 twice. [x, y] mean get x and y from a List. That list is [5, 5]. So, you got x = 5 and y = 5. Well, I never did such behavior in the real world Haskell but you get what you want. EDIT: We could use repeat function and the feature of lazy evaluation in the Haskell.