Notebook

University of Zagreb
Faculty of Electrical Engineering and Computing

Programming in Haskell

Academic Year 2014/2015

Arrays and Imperative Programming in Haskell¶

Version 1.0

We will be discussing arrays, a fundementally imperative structure, as well as programming in the imperative style in general. While these might not be very idiomatic to the language, Haskell implements them and makes (some versions of) them quite usable.

Arrays¶

We are all familiar with arrays from other, imperative, languages. They are data structures - containers - the elements of which we can access in $\mathcal{O}(1)$ (actually $\theta(1)$). Unlike lists, however, they are not recursive structures. The best we can get with recursion are Sequences, based on finger trees, that have a time complexity ranging from $\theta(1)$ to $\theta(\log n)$. We need to look elsewhere.

We can divide arrays offered by Haskell into four broad categories. They are given here, with an array belonging to each pair.

	Immutable
Boxed	`Array`	`IOArray`
Unboxed	`UArray`	`IOUArray`

These are hardly the only representatives of these classes. There also exist:

DiffArray (Immutable Boxed)
DiffUArray (Immutable Unboxed)
StorableArray (Mutable Unboxed)
STArray (Mutable Boxed)
STUArray (Mutable Unboxed)

The difference between immutable and mutable arrays in quite simple - one cannot change an immutable operator, but can only create a new instance. This does not hold for mutable arrays - we can freely set values. You might, however, notice that our representative mutable array is wrapped in IO. This is because mutating values is clearly impure and has to be wrapped. A similar thing occurs with the STArray family, but they are not wrapped in IO, but in something called ST.

The difference between boxed and unboxed arrays is a bit more subtle. The easiest way to explain them is probably by comparing them to C arrays. We can compare Boxed arrays to C arrays of pointers. The "elements" have a constant size - the size of a pointer - no matter their actual size, like void*[]. However, there is also the overhead of boxing and unboxing them. The Unboxed arrays can be compared to C arrays of primitives like char[] or int[]. We don't have to bother with boxing and unboxing at all and that makes them a lot more efficient.

So, why would we ever use boxed arrays?

Well, quite simply, we can only put some types in unboxed arrays. Specifically, only plain values of fixed size like Int, Char, Bool, ...

Furthermore, boxed arrays do have some advantages - lazy evaluation, for example.

Both Array and UArray implement the same ~~horrible~~ "interface".

Data.Array.IArray¶

-- | Constructors
array :: (IArray a e, Ix i) => (i, i) -> [(i, e)] -> a i e

listArray :: (IArray a e, Ix i) => (i, i) -> [e] -> a i e

accumArray :: (IArray a d, Ix i) => (d -> e -> d) -> d -> (i, i) -> [(i, e)] -> a i d

-- | Accessors
(!) :: (IArray a e, Ix i) => a e -> i -> e

indices :: (IArray a e, Ix i) => a i e -> [i]

elems :: (IArray a e, Ix i) => a i e -> [e]

assocs :: (IArray a e, Ix i) => a i e -> [(i, e)]

-- | Incremental updates
(//) :: (IArray a e, Ix i) => a i e -> [(i, e)] -> a i e

accum :: (IArray a e, Ix i) => (e -> e' -> e) -> a i e -> [(i, e')] -> a i e

-- | Mapping
amap :: (IArray a e', IArray a e, Ix i) => (e' -> e) -> a i e' -> a i e

ixmap :: (IArray a e, Ix i, Ix j) -> (i, i) -> (i -> j) -> a j e -> a i e

-- | Other
bounds :: (IArray a e, Ix i) => a i e -> (i, i)

Let's give it a go:

In [1]:

import Data.Array

a = array (0, 2) [(0, id), (1, (+1)), (2, (^2))]
b = listArray (0, 4) [5..1]
c = accumArray (*) 2 (0, 1) [(0, 1), (1, 2)]

e = c ! 1

e

(a ! 2) e

In [2]:

indices b

elems c

assocs c

bounds a

[0,1,2,3,4]

[2,4]

[(0,2),(1,4)]

(0,2)

In [3]:

d = c // [(0, 7)]

elems d

f = accum (-) d (assocs d)

assocs f

[7,4]

[(0,0),(1,0)]

It works exactly the same for UArray, as it implements the same typeclass (the "interface" mentioned before) except that its types are more limited and it's actually more efficient under the hood.

Please never use these.

There is a far more usable implementation of arrays under a different name and you've already seen it in one of the homework assignments. Any ideas?

Data.Vector¶

Data.Vector and Data.Vector.Unboxed offer a richer, more list-like interface. However, there is one crucial difference between instances of IArray and Vector - Vectors are unbounded arrays. If you want them to be bounded, you have to write a wrapper around them. What they are, however, is usable and quite intuitive.

Let's see what the unboxed version can do:

In [4]:

import Data.Vector.Unboxed
import Prelude hiding (init, sum, elem, map)
import Data.Array hiding ((!), (//))

v = empty :: (Vector Int)
w = 0 `cons` v `snoc` 1

w

v' = fromList [1..10]

init v'

sum v'

3 `elem` v

3 `elem` v'

w' = map (`div` 2) v' :: (Vector Int)

w' ! 2

slice 3 5 w'

fromList [0,1]

fromList [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0]

55.0

False

True

fromList [2,2,3,3,4]

You will probably agree Data.Vector trumps Data.Array as far as usability goes. Unless you have a really good reason to do otherwise, Data.Vector should be your preferred immutable array of choice.

So, we have a (somewhat) $\mathcal{O}(1)$ data structure. Why would we every use lists? Silly list users.

Well, it turns out there is a big donwside to using such a non-functional data structure in functional code.

What are the time complexities of the following functions?

(:) :: a -> [a] -> [a]

cons :: a -> Vector a -> Vector a

The answers are $\mathcal{O}(1)$ and $\mathcal{O}(n)$. Why?

Naturally, we want to avoid copying the array every time. Luckily, we can do that. That's where mutable arrays come into play. But let us first introduce the basics of imperative programming in Haskell.

Writing imperative code in Haskell¶

Some problems are pretty tricky or inefficient when solved functionally. Besides, none of your friends understand your Haskell code. The obvious solution is to just give up and write imperative code. There are, however, two very big problems with that.

Side effects
Code needs to be executed in order

Haskell doesn't do well with either of those things. There is more than one way to solve these problems, but the prevalent one has to do with monads. Don't be intimidated however, you've already worked with a bunch of monads without knowing it. Let's start out by using IO for this.

In [5]:

import Data.IORef

fooImp :: IO ()
fooImp = do
    x <- newIORef 0
    writeIORef x 1
    modifyIORef x (+1) -- Lazy!
    modifyIORef' x (+1) -- Strict!
    readIORef x >>= print
    
fooImp

There are more possible operations with IORef, but most have to deal with multithreading and such. You can look up the Data.IORef package yourselves.

IORefs enable us to use variables in the imperative context. The same goes for arrays! We can use Data.Array.IO. Mutable arrays implement a Data.Arrays.MArray interface, which is similar to the IArray interface.

Again, we deal with boxed and unboxed arrays. The same constraints as for immutable arrays apply. Let us give it a go:

In [6]:

import Data.Array.IO

fooIOArray :: IO ()
fooIOArray = do
    arr <- newArray (1, 10) 1 :: IO (IOArray Int Int) -- Bounds and initial values
    
    e <- readArray arr 1
    print e
    
    writeArray arr 2 2
    writeArray arr 3 3
    
    a <- readArray arr 2
    b <- readArray arr 3
    
    print $ a * b
    
fooIOArray

1
6

There are a few functions not used here, namely:

newListArray :: (MArray a e m, Ix i) => (i, i) -> [e] -> m (a i e)

mapArray :: (MArray a e' m, MArray a e m, Ix i) => (e' -> e) -> a i e' -> m (a i e)

mapIndices :: (MArray a e m, Ix i, Ix j) => (i, i) -> (i -> j) -> a j e -> m (a i e)

and a few others.

We can combine these to actually do something imperative. But first, let us introduce a few new impure functions. You are familiar with the map function by now. However, if we try do use map for impure actions, it will not work! The types will be wrong.

In [10]:

import Data.Vector hiding (map)
import Data.Vector.Unboxed hiding (map)
import Prelude (map)

:type putStrLn
:type map
f1 = map putStrLn ["one", "two", "three"]
f1

putStrLn :: String -> IO ()

map :: forall a b. (a -> b) -> [a] -> [b]

Unshowable:IO ()No instance for (Show (IO ())) arising from a use of `print'
Possible fix: add an instance declaration for (Show (IO ()))
In a stmt of an interactive GHCi command: print it

There are, however, two variants of map that work on impure functions. Namely, mapM and mapM_. The difference can be seen from their respective types, as shown below. Which one is suitable for the task above, instead of the mundane map?

In [12]:

import Data.Vector hiding (mapM, mapM_)
import Data.Vector.Unboxed hiding (mapM, mapM_)

:type mapM
:type mapM_

arr = mapM newIORef [1,2,3,4]
:type arr

prem = mapM_ print [1,2,3,4]
:type prem
prem

mapM :: forall a (m :: * -> *) b. Monad m => (a -> m b) -> [a] -> m [b]

mapM_ :: forall a (m :: * -> *) b. Monad m => (a -> m b) -> [a] -> m ()

arr :: forall a. Num a => IO [IORef a]

prem :: IO ()

In our case, we can simplify the type signatures. For example, for mapM it would be (only with respect to IO):

mapM :: (a -> IO b) -> [a] -> IO [b]

The two also have their variants, namely forM and forM_. The only difference being that their arguments are flipped - they take a list first and a an impure function second.

Let us use what we have learned for a simple task - implementing Bubble Sort.

In [14]:

import Control.Monad
import Data.Vector hiding (forM_, length)
import Data.Vector.Unboxed hiding (length, forM_)

bubbleSort :: [Int] -> IO [Int]
bubbleSort xs = do
    let iMax = length xs - 1
    arr <- newListArray (0, iMax) xs :: IO (IOArray Int Int)
    
    forM_ [0..iMax] $ \ _ -> 
        forM_ [0..iMax - 1] $ \ j -> do
            
            x <- readArray arr j
            y <- readArray arr $ j + 1
            
            when (x > y) $ do
                writeArray arr (j + 1) x
                writeArray arr j y
    
    getElems arr
                
bubbleSort [3,5,-1,2,8,1,0,1] >>= print

[-1,0,1,1,2,3,5,8]

Again, the functionality offered by the IOArray is quite limited. And again, Vector comes to the rescue, this time in the shape of Data.Vector.Mutable. Again, it offers quite a few functionalities and more than one instance. In particular, we are interested in the IOVector instance. Let us just see a few things we can do with it.

In [15]:

import Data.Vector.Mutable
import Prelude hiding (read)

fvec :: IO ()
fvec = do
    v <- new 10
    set v 3

    write v 1 5
    swap v 0 1
    
    a <- read v 0
    b <- read v 1
    
    print a
    print b
    
fvec

5
3

As can be seen, it's not actually much better in this case, like Vector is compared to Array. But it is slightly more intuitive. There are functionalities not mentioned here at all, but you can look it up.

It can be seen that using arrays makes sense if you are already doing something wrapped in IO and want it to be efficient. However, look at the bubble sort implementation. We don't have a single line of IO in it, yet we had to wrap it into IO to use our mutable arrays. That means that any code that uses that also has to be wrapped in IO (let us ignore unsafePerformIO). This is a serious downside. We generally want to keep most of our code pure.

However, there is a way out of that, as well.

ST Monad¶

ST is another instance of a monad. However, this is not of particular interest to us at this point, we never even said what monads are, even if we have already encountered quite a few without knowing it. What is interesting, however, is what we can do with it. In particular, we can deal with state and arrays and then go back to pure code! We just have to use the "magic" runST function. Let us use it to implement a space-efficient Fibbonaci calculator (and show the basics of working with it).

In [17]:

import Control.Monad.ST
import Data.STRef
import Data.Vector hiding (map)
import Data.Vector.Unboxed hiding (map)

fib :: Int -> Int
fib n
    | n < 2     = n
    | otherwise = runST $ do
            x <- newSTRef 0
            y <- newSTRef 1
            fibST n x y
        
    where fibST 0 x _ = readSTRef x
          fibST n x y = do
              x' <- readSTRef x
              y' <- readSTRef y
              writeSTRef x y'
              writeSTRef y $! x' + y'
              fibST (n - 1) x y


map fib [1..10]

[1,1,2,3,5,8,13,21,34,55]

Let us not delve much deeper into the arcane secrets of ST, but just show that arrays can be used in the ST monad as well. We have the STArray, mentioned at the beginning, which is very similar to the IOArray. However, mutable Vectors can be used in ST as well, we just call them STVector instead of IOVector. Let us just implement re-implement our Bubble Sort using ST and Vector, which makes it possible to insert it in otherwise pure code.

In [21]:

import Data.Vector hiding (length, forM_)
import Data.Vector.Unboxed hiding (length, thaw, fromList, freeze, forM_, toList)
import Data.Array.IO hiding (thaw, freeze)
import Prelude hiding (length, read)

bubbleSort' :: [Int] -> [Int]
bubbleSort' xs = runST $ do
    arr <- thaw . fromList $ xs
    let iMax = length arr - 1
    
    forM_ [0..iMax] $ \ _ ->
        forM_ [0..iMax - 1] $ \ j -> do
            x <- read arr j
            y <- read arr $ j + 1
            
            when (x > y) $ swap arr j $ j + 1
                
    fArr <- freeze arr
    return . toList $ fArr
    
bubbleSort' [3, 5, -1, 7, 2, 9, 0, 1, 1]

[-1,0,1,1,2,3,5,7,9]