This lecture gives a brief introduction to developing software in Haskell, covering packaging, versioning, continuous integration, and package repositories.

Basic structure of a Haskell project

Running example: a search tree library

As a running example, we consider a toy implementation of sets as binary search trees (without balancing). We aim to package and publish this as library binary-search-tree.

The code is structured as follows:

src/Data/BST.hs: interface, hiding implementation

type BST a

-- Construction
empty     :: BST a
singleton :: a -> BST a
insert    :: a -> BST a -> BST a
fromList  :: [a] -> BST a
union     :: BST a -> BST a -> BST a

instance Semigroup (BST a)
instance Monoid    (BST a)

-- Deletion
delete :: a -> BST a -> BST a
split  :: a -> BST a -> (BST a, BST a)

-- Query
instance Foldable BST

src/Data/BST/Internal.hs: implementation, exporting everything
```
data BST a
  = Leaf
  | Node (BST a) a (BST a)
```

It is good practice to also publish the implementation so that users can implement missing functionality efficiently. However, only the interface is officially public and thus binding.

Basic packaging

Haskell software is usually packaged with Cabal.

The following file binary-search-tree.cabal is our first take on a Cabal package description for our library:

cabal-version: 2.2
name: binary-search-tree
version: 0.0.0.0

common common-build-parameters
  default-language:
    Haskell2010
  default-extensions:
    LambdaCase
    InstanceSigs
  ghc-options:
    -Wall
    -Wcompat

library
  import: common-build-parameters

  build-depends:
    , base

  hs-source-dirs:
    src

  exposed-modules:
    Data.BST
  other-modules:
    Data.BST.Internal

It contains of tree mandatory fields cabal-version, name, and version and two stanzas, a common stanza and a library stanza, each of which contains again a couple of fields.

Field cabal-version

This field goes first in the file and determines version of the syntax of things to follow. The cabal-version needs to be chosen high enough to support all syntax elements in the file. We use cabal-version 2.2 because it is the first to support common stanzas. This version is a good default.

Field name

The name of the package, same as the name of the .cabal file.

Field version

The version of the package, which is a list of natural numbers separated by dots. There is no limit on the number of positions, but only the first 4 are given a meaning in the Haskell Package Versioning Policy, PVP. Versions are lexicographically ordered, so 0 < 0.0 < 0.0.0 < 0.0.0.0 < ... < 0.1 < 0.1.0 < ... < 0.1.1 etc. It is sometimes surprising that 0 is not the same as 0.0, because we are used to think in terms of decimal numbers. To prevent confusions, it is best practice to either

fix version numbers to 3 or 4 positions, or
not use trailing 0s.

We opt for the first alternative here, choosing 0.0.0.0.

The PVP and correct versioning will be discussed in section Dependencies and versioning.

Stanza common

We define here a common stanza common-build-parameters that we import in the following library stanza. Importing means basically copying the fields of the common stanza. Currently we import the common-build-parameters only once, so its rather pointless to put them into a common stanza. However, we aim to reuse the same parameters in further package components later.

Stanza library

The library stanza defines the default component of our package binary-search-trees: a library exporting the modules Data.BST and shipping the hidden module Data.BST.Internal. These modules are located in subdirectory src, as we specify in field hs-source-dirs. We could put several source directories there and divide our library code up into these source directories, but this is seldom necessary. It is best practice to not place the source files directly at the project root.

Any modules we wish to export by the library must be listed explicitly under exposed-modules. There is no automatic population of this field from the contents of the hs-source-dirs. (Tools like hpack do this, though.) Modules needed for compilation but not exported must be listed under other-modules. We placed Data.BST.Internal there to hide the implementation.

Any library our code depends on must be listed under build-depends. We currently only depend on the Haskell standard library base which is shipped with GHC. Nevertheless, it must be mentioned. We did not state yet which versions of base would work with our code. Let us put this off until section Dependencies and versioning.

Finally, we need to specify the version of the Haskell language itself we are using in our code. These are the fields in common-build-parameters:

default-language sets the base version of Haskell defined by one of the standards Haskell98, Haskell2010 or GHC2021. Our choice of Haskell2010 is a good default (not too old and not to new).
We can modify the Haskell language by turning on some LANGUAGE extensions for our project in default-extensions. Our code needs LambdaCase and InstanceSigs.
Further modification to compilation can be made by passing ghc-options. We turn on the common warning sets -Wall and -Wcompat to ensure good code quality and forward compatibility.

Lots of boring details?

It seems there is a lot of administrative fuss going on in Cabal packaging. You haven't seen nothing yet!

One reason for the painstaking detail required in .cabal files is to ensure building independent of the local environment. Our library might be used by others having other GHCs and libraries installed, and it should build in their environment out-of-the-box.

Another reason is that Haskell itself does not have the concept of a package (only that of a module). Thus, packages have to be defined externally, and Cabal provides the standard way to do this.

Finally, Haskell, even when starting out first-in-class with the Haskell 98 report, has not seen consequent standardization subsequently, with no standard produced after Haskell 2010. With Haskell 2020 stalled, only the GHC2021 language extension collection was published. This (or a successor) might well be the source of the next standard, but in the meantime, we stick to Haskell2010 and pick our favorite LANGUAGE extensions manually. (Note that GHC2021 is only available from GHC 9.0 up, and does not include my favorite extension LambdaCase.)

Interactive development with `cabal repl`

$ cabal repl
...
GHCi, version ...
[1 of 2] Compiling Data.BST.Internal ( src/Data/BST/Internal.hs, interpreted )
[2 of 2] Compiling Data.BST          ( src/Data/BST.hs, interpreted )
Ok, two modules loaded.

ghci> :m + Data.BST

ghci> :r
[1 of 2] Compiling Data.BST.Internal ( src/Data/BST/Internal.hs, interpreted ) [Source file changed]
[2 of 2] Compiling Data.BST         ( src/Data/BST.hs, interpreted ) [Source file changed]
Ok, two modules loaded.

ghci> let t :: BST Int = fromList [5,2,4,3]

:4:5: error:
    Illegal type signature: ‘BST Int’
      Type signatures are only allowed in patterns with ScopedTypeVariables

ghci> :set -XScopedTypeVariables

ghci> let t :: BST Int = fromList [5,2,4,3]

ghci> t
Node (Node Leaf 2 Leaf) 3 (Node Leaf 4 (Node Leaf 5 Leaf))

ghci> toList t

:8:1: error: [GHC-88464]
    Variable not in scope: toList :: BST Int -> t

ghci> :m + Data.Foldable

ghci> toList t
[2,3,4,5]

Testsuites

A Cabal package can and should contain test suites that run with cabal test.

The package tasty is a framework to organize and run tests.

Unit tests

These are handwritten tests to e.g. check the output of a function for a given input.

The package tasty-hunit provides an API for constructing unit tests.

E.g. UnitTests.hs:

import Data.BST.Internal

import Data.Bifunctor        (bimap)
import Data.Foldable         (toList)

import Test.Tasty            (TestTree, defaultMain)
import Test.Tasty.HUnit      (testCase, assertEqual)

main :: IO ()
main = defaultMain tests

tests :: TestTree
tests = testCase "split" $ do
  assertEqual ""
    (bimap toList toList $ split "dog" $ fromList ["cat", "dog", "tiger", "wolf"])
    (["cat"], ["tiger", "wolf"])

We use these API functions:

assertEqual :: String -> a -> a -> Assertion
testCase    :: Name -> Assertion -> TestTree
defaultMain :: TestTree -> IO ()

We would like to add this testsuite to our project so that cabal test runs it.

The problem is that the module Data.BST.Internal is not exported by our library, so we have to find ways how to import it in our test suite.

Test suite includes internal modules as source

UnitTests.hs can be added as cabal test-suite to our .cabal package file:

test-suite unittests
  import: common-build-parameters

  type: exitcode-stdio-1.0

  hs-source-dirs:
    test
    src

  main-is:
    UnitTests.hs

  other-modules:
    Data.BST.Internal

  build-depends:
    -- inherited bounds
    , base
    -- new dependencies, need bounds
    , tasty               >= 1.1.0.4 && < 1.6
    , tasty-hunit         >= 0.10    && < 0.11

Now the module Data.BST.Internal is shared between the library and the test-suite.

A drawback is that it is compiled twice: once for the library and once for the test suite.

Put internal modules into a internal library

We can move Data.BST.Internal our of the main library into an internal library bst-internal, which gets imported by the test suite.

-- Internal library containing the implementation

library bst-internal
  import: common-build-parameters

  hs-source-dirs:
    src

  exposed-modules:
    Data.BST.Internal

  build-depends:
    , base                >= 4.12    && < 5

-- Library exporting the API

library
  import: common-build-parameters

  hs-source-dirs:
    lib

  exposed-modules:
    Data.BST

  build-depends:
    , base
    , bst-internal

-- Testsuites

test-suite unittests
  import: common-build-parameters

  type: exitcode-stdio-1.0

  hs-source-dirs:
    test

  main-is:
    UnitTests.hs

  build-depends:
    -- inherited bounds
    , base
    , bst-internal
    -- new dependencies, need bounds
    , tasty               >= 1.1.0.4 && < 1.6
    , tasty-hunit         >= 0.10    && < 0.11

Now each module is only compiled once.

Drawback: internal modules still have hiccups in the tooling (cabal and stack). While stack repl works here (but not always), cabal repl complains that GHCi cannot load two libraries. We need to invoke it via cabal repl bst-internal.

Property tests

Using QuickCheck we can random-test some laws of our binary search tree operations.

Building up QuickCheckTests.hs...

Generate random data

We can use our fromList to generate well-formed trees from lists.

import Test.QuickCheck

instance (Arbitrary a, Ord a) => Arbitrary (BST a) where
  arbitrary = fromList <$> arbitrary

Properties

We formulate some simple properties about membership after insertion or deletion.

prop_member_insert x s = member x (insert x s)
prop_member_delete x s = not $ member x (delete x s)

Equational laws

We consider two trees equal if they contain the same elements:

(~~) :: Ord a => BST a -> BST a -> Bool
(~~) = (==) `on` toList

This allows us to write some equational laws, e.g., that the union of trees makes an idempotent commutative monoid:

prop_union_idem  s        = s `union` s ~~ s
prop_union_comm  s1 s2    = s1 `union` s2 ~~ s2 `union` s1
prop_union_assoc s1 s2 s3 = s1 `union` (s2 `union` s3) ~~ (s1 `union` s2) `union` s3

prop_union_unit_left  s   = empty `union` s ~~ s
prop_union_unit_right s   = s `union` empty ~~ s

Constructing a testsuite

The tasty-quickcheck package provides some convenience to build a TestTree from properties.

{-# LANGUAGE TemplateHaskell #-}

import Test.Tasty            (TestTree, defaultMain)
import Test.Tasty.QuickCheck (testProperties)

prop_...  -- Our properties start with the prefix 'prop_'.

-- Placing a Template Haskell instruction here forces GHC to finish checking the code above.
-- This makes the splice $allProperties work.
-- The instruction 'return []' inserts an empty list of declarations, thus, does nothing.
return []

tests :: TestTree
tests = testProperties "Tests" $allProperties

main :: IO ()
main = defaultMain tests

This uses:

allProperties  :: Q Exp
testProperties :: TestName -> [(String, Property)] -> TestTree

NB: Q is the quotation monad from Template Haskell.

We add QuickCheckTests.hs as a cabal testsuite:

test-suite quickcheck
  import: common-build-parameters

  type: exitcode-stdio-1.0

  hs-source-dirs:
    test

  main-is:
    QuickCheckTests.hs

  build-depends:
    -- inherited bounds
    , base
    , bst-internal
    -- new dependencies, need bounds
    , QuickCheck          >= 2.11.3  && < 2.15
    , tasty               >= 1.1.0.4 && < 1.6
    , tasty-quickcheck    >= 0.10    && < 0.11

  ghc-options:
    -Wno-orphans
    -Wno-missing-signatures

The last field turns off warnings for the orphan instance Arbitrary (BST a) and the missing signatures for the prop_... definitions. (We could also place them in the file using the {-# OPTIONS_GHC ... #-} pragma.)

Shrinking counterexamples

If we implement the method shrink :: a -> [a] in our instance Arbitrary, quickcheck will try to reduce the counterexamples it found against properties:

If x is a counterexample, try if any of shrink x is a counterexample as well.
If yes, continue like this recursively.

instance (Arbitrary a, Ord a) => Arbitrary (BST a) where
  arbitrary = ...

  shrink :: BST a -> [BST a]
  shrink = \case
    Leaf -> []
    Node l a r -> concat
      [ [ Leaf ] -- The empty tree (optional, aggressive shrinking).
      , [ l, r ] -- The immediate subtrees.
      , [ Node l' a' r' | (l', a', r') <- shrink (l, a, r) ]
                 -- Shrinking in the subtrees.
      ]

(Adapted from the shrink doc.)

Further stuff

The CoArbitrary instance would allow us to generate random functions:

instance CoArbitrary a => CoArbitrary (BST a) where
  coarbitrary = coarbitrary . toList

Doctests

Unit tests can be included in haddock comments for documentation:

-- | Insert into binary search tree.
--
-- Example:
--
-- >>> toList $ insert "baz" $ fromList ["bar","foo"]
-- ["bar","baz","foo"]
--

Such a unit test needs to be separated from the surrounding documentation by empty lines. After the prompt >>> we supply the Haskell expression, the following lines contain the Showed result.

Humans learn easily from examples, thus, this is good documentation to start with.

The doctest tool automatically extracts such unit tests and runs them, comparing the obtained result to the expected result. Benefits of checking the examples in the documentation:

Provide unit tests for the package.
Prevent documentation from rotting.
Adding doctests is less boilerplate than using e.g. tasty-hunit.

There are several ways how to run the doctests.

1. Doctests as cabal test-suite

Adding a doctest suite to the .cabal file:

test-suite doctest
  import: common-build-parameters

  type: exitcode-stdio-1.0

  hs-source-dirs:
    test

  main-is:
    DocTests.hs

  build-depends:
    -- inherited bounds
    , base
    -- new dependencies, need bounds
    , doctest >= 0.22 && < 0.23

We need a stub DocTests.hs that runs the doctest function on the files that contain tests:

import Test.DocTest    ( doctest )

main :: IO ()
main = doctest $ concat
  [ map ("-X" ++)
    -- All language extensions needed by the tests
    [ "LambdaCase"
    , "InstanceSigs"
    ]
    -- Files containing doctests
  , [ "src/Data/BST/Internal.hs"
    ]
  ]

The doctest library uses the ghc library to extract tests and expected results from the haddock comments.

Unfortunately, the information in the .cabal file is not used automatically. We have to pass the language extensions manually, as well as the list of source files. Also, the source files may need $setup code e.g. to make imports available to the doctests.

Evaluation of calling the doctest library function:

Drawback: violates DRY ("don't repeat yourself"), brittle.
Advantage: runs automatically with cabal test.

2. Doctests via the `doctest` executable

We can also run the doctests via the doctest executable. This needs an installation of the doctest executable matching exactly the GHC version (see ghc -V). To use doctest with GHC x.y.z, doctest needs to be compiled with GHC x.y.z. Basically, doctest is a modified ghci.

Invocation (from project root). cabal build cabal repl --with-compiler doctest Short: cabal repl -w doctest It often makes sense to turn off warnings: cabal repl -w doctest --repl-options=-w

Evaluation of calling the doctest executable:

Drawback: needs installation of doctest.
Advantage: uses information from .cabal file, more stable.

3. Other doctest solutions

cabal-doctest: provide information from .cabal file for the doctest function call. Drawback: needs build-type: custom and custom Setup.hs.
parallel-doctest: Fork of doctest to run doctests in parallel.

Dependencies and versioning

Each dependency of our package should come with compatibility bounds.

  build-depends:
    , base                >= 4.12    && < 5
    , QuickCheck          >= 2.11.3  && < 2.15
    , tasty               >= 1.1.0.4 && < 1.6
    , tasty-quickcheck    >= 0.10    && < 0.11

The Haskell Package Versioning Policy (PVP) specifies 4 version digits/positions: edition.major.minor.patch. E.g. base for GHC-9.8.2 has version 4.19.1.0.

Each Hackage release needs a new version number. The PVP specifies what kind of version bump is required.

position	example	bump when ...
edition	4	(big) API change
major	19	API change
minor	1	API additions only
patch	0	bug and doc fixes only

The edition (my terminology) is not present in SemVer and is rather freely chosen. Radical API changes should maybe bump the edition (e.g. aeson-1 → aeson-2), but it is up to the maintainer to decide. The pair edition.major makes the major version of a package, the triple edition.major.minor the minor version.

Contracts:

Your package will continue to build if its dependencies only bump patch.
If your package only uses qualified imports, it will even continue to build when its dependencies only bump minor.

Determine upper bounds:

Use only qualified imports.
If your package builds with the latest release x.y.z.u of a dependency, set the upper bound to < x.(y+1).

Determine lower bounds:

Check when the functions you need from a dependency entered its API. E.g. all functions were present at version x.y.z.u, set the lower bound to >= x.y.z.u or >= x.y.z.
Optionally raise the lower bound if your other deps need a newer version of this dep.

Special dependency base:

Each major GHC version includes a new major version of base.
base serves as a proxy for the GHC version.
Actual changes in base are usually minor.

Problems of upper bounds

We cannot see the future. We cannot know now whether a future major bump of a dependency will break our build. While the meaning of lower bounds is unambiguous, we can interpret upper bounds in two ways:

The "pessimistic" interpretation of an upper bound: dep < version means that we have no evidence that building with dep ≥ version will succeed.
The "optimistic" interpretation of an upper bound: dep < version means that we have evidence that building with dep ≥ version fails.

The pessimistic interpretation guarantees the PVP, but may require lots of upward revisions of upper bounds.

The optimistic interpretation might lead to broken builds which need downward revisions of upper bounds.

Pragmatically, some maintainers do not specify upper bounds or just edition level ones like base < 5. This may require quick intervention to repair build failures when new versions of dependencies are released.

Stack solves the problem of dependency compatibility by defining Hackage snapshots.

Stackage

Stackage defines snapshots of its package collection. A snapshot is named e.g. lts-21.25 or nightly-2024-02-24 and contains a single version of each package in the collection.

A nightly snapshot usually contains the latest versions from Hackage with exceptions (e.g. if a dependencies is so new that packages using it have not caught up to it). Often nightly tracks the latest GHC.

A lts (Long Term Service) snapshot tracks usually an older, established version of GHC.

A snapshot is specified in a stack.yaml file (only mandatory entry):

resolver: lts-21.25

A stack.yaml file can also add or replace dependencies in a snapshot, using extra-deps:

extra-deps:
- my-cool-pkg-0.2.0.1

stack serves as an alternative to cabal and provides a similar command set, e.g.

stack build
stack install
stack test

Continuous integration

Continuous integration (CI) can be utilized to regularly build and test a project in various environments (e.g. different OSs, different GHC version, different stackage snapshots).

E.g. https://github.com donates free CI to open source developments.

Haskell-CI wizard

haskell-ci autogenerates a multi-GHC CI script for Ubuntu machines from the tested-with field in the .cabal file.

tested-with:
  GHC == 9.8.2
  GHC == 9.6.4
  GHC == 9.4.8
  GHC == 9.2.8
  GHC == 9.0.2
  GHC == 8.10.7
  GHC == 8.8.4
  GHC == 8.6.5

Configuration can be given on the command line or in cabal.haskell-ci.

$ haskell-ci github binary-search-tree.cabal
*INFO* Generating GitHub config for testing for GHC versions:
8.4.4 8.6.5 8.8.4 8.10.7 9.0.2 9.2.8 9.4.8 9.6.4 9.8.2

This generates .github/workflows/haskell-ci.yml.

Writing a simple Stack CI

A simple CI using stack could look like this .github/workflows/simple-stack.yml:

name: Simple Stack CI

jobs:
  build:
    name: Stack Simple CI
    runs-on: ubuntu-latest

    steps:

    - name: Checkout sources from repository
      uses: actions/checkout@v4

    - name: Setup haskell for use with stack
      uses: haskell-actions/setup@v2
      with:
        ghc-version: '9.4.8'
        enable-stack: true
        cabal-update: false

    - name: Build
      run:  stack --system-ghc --stack-yaml=stack-9.4.8.yaml build

    - name: Test
      run:  stack --system-ghc --stack-yaml=stack-9.4.8.yaml test

    - name: Build docs
      run:  stack --system-ghc --stack-yaml=stack-9.4.8.yaml haddock

This defines a single job with identifier build that runs on Ubuntu Linux and performs the following steps:

Check out the default branch of the repository.
Install GHC 9.4.8 and Stack via the https://github.com/haskell-actions/setup action.
Run stack build using the installed GHC and the resolver as specified in stack-9.4.8.yaml.
Run stack test.
Build the documentation via stack haddock.

Before running the script, it makes sense to check for errors using actionlint.

We might test on several GHCs. To this end, we can use build matrices:

A CI using a build matrix

We extend the previous CI to run on different OSs and different GHCs. To this end, we utilize the matrix strategy feature:

jobs:
  build:
    name: Stack ${{ matrix.ghc }} ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        ghc: ['9.8.1', '9.6.4', '9.4.8', '8.6.5']
        include:
        - os: windows-latest
          ghc: '9.8.1'
        - os: macos-latest
          ghc: '9.8.1'

    steps:

    - name: Checkout sources from repository
      uses: actions/checkout@v4

    - name: Setup haskell for use with stack
      uses: haskell-actions/setup@v2
      with:
        ghc-version: ${{ matrix.ghc }}
        enable-stack: true
        cabal-update: false

    - name: Build
      run:  stack --system-ghc --stack-yaml=stack-${{ matrix.ghc }}.yaml build

    - name: Test
      run:  stack --system-ghc --stack-yaml=stack-${{ matrix.ghc }}.yaml test

    - name: Build docs
      run:  stack --system-ghc --stack-yaml=stack-${{ matrix.ghc }}.yaml haddock

This will run 6 copies of the build job, four on ubuntu-latest and one each on windows-latest and macos-latest. Setting fail-fast: false will run each job to completion even if some of them fail. The variables are referred to by matrix.os and matrix.ghc and their value can be used inside ${{ ... }} escapes.

In larger projects it might make sense to cache the build products, especially when building takes long.

CI with caching

We add caching of the global and local stack working directories. To this end, we give the setup step an identifier via id: setup. The setup action has outputs stack-version and ghc-version that we can use as parts of the cache key. The other components are the OS (runner.os) and the hash of the commit we are testing (github.sha).

After setting up Haskell, we attempt to restore the cache from a previous CI run using the actions/cache/restore action. Of course, since we include the commit hash in the cache key, a cache entry cannot be found, and we fall back to one of our restore-keys. In fact, we have just one backup key that is composed of OS, Stack and GHC version. We restore the global stack directory as provided by the setup action as stack-root output, and the local .stack-work directory.

After building, we cache these directories under the full key which we copy from the restore step via the output cache-primary-key (we could also have repeated the key construction, but DRY). The special condition always() makes sure the step executes even if one of the previous steps failed. This way, we can salvage partial work in building the project.


    steps:

    - name: Checkout sources from repository
      uses: actions/checkout@v4

    - name: Setup haskell for use with stack
      id:   setup
      uses: haskell-actions/setup@v2
      with:
        ghc-version: ${{ matrix.ghc }}
        enable-stack: true
        cabal-update: false

    - name: Restore cache
      id:   cache
      uses: actions/cache/restore@v4
      env:
        key: ${{ runner.os }}-stack-${{ steps.setup.outputs.stack-version }}-ghc-${{ steps.setup.outputs.ghc-version }}
      with:
        key: ${{ env.key }}-hash-${{ github.sha }}
        restore-keys: ${{ env.key }}
        path: |
          ${{ steps.setup.outputs.stack-root }}
          .stack-work

    - name: Build
      run:  stack --system-ghc --stack-yaml=stack-${{ matrix.ghc }}.yaml build

    - name: Test
      run:  stack --system-ghc --stack-yaml=stack-${{ matrix.ghc }}.yaml test

    - name: Build docs
      run:  stack --system-ghc --stack-yaml=stack-${{ matrix.ghc }}.yaml haddock

    - name: Save cache
      if:   always()
        ## Save cache regardless of whether on of the steps failed
      uses: actions/cache/save@v4
      with:
        key: ${{ steps.cache.outputs.cache-primary-key }}
        path: |
          ${{ steps.setup.outputs.stack-root }}
          .stack-work

Github workflows 101:

Each job may run on a different machine. To communicate result from one job to the next, upload artifacts!
Each step runs in a new process instance. In particular, changes to the environment are not preserved. Results can be communicated from one step to later ones:
- By outputs.
- Via the file system.
- Writing to the special files ${GITHUB_ENV} (to add environment settings) and ${GITHUB_PATH} (to add to the PATH).
Some Haskell libraries bind to system libraries you might need to install first.

Resilient and maintainable code

Software evolves. What style of coding leads to maintainable code?

Modelling data

Make custom and tight-fitting data structures!

Prefer `newtype` over type synonyms

-- | Reversed list
type SnocList a = [a]

reverseAppend :: [a] -> SoncList a -> SoncList a
reverseAppend (x:xs) ys = reverseAppend xs (x:ys)
...

Problems:

We might accidentially supply a list instead of a reverse list, the type checker will not notice.
Type synonym gets unfolded, thus, type errors might speak be about lists [...] instead of SnocList ....

{-# LANGUAGE GeneralisedNewtypeDeriving #-}

-- | Reversed list
newtype SnocList a = SnocList { theSnocList :: [a] }
  deriving (Eq, Ord, Show, Functor, Foldable, Traversable)

reverseAppend :: [a] -> SoncList a -> SoncList a
reverseAppend xs (SnocList ys) = SnocList (revApp xs ys)
  where
    revApp = ...

Safe: no mixup between [a] and SnocList a.
No performance penalty, at runtime SnocList a is just [a], the wrappers
```
SnocList :: [a] -> SnocList a
theSnocList :: SnocList a -> [a]
```
are identity functions and optimised away.
Cost: needs some boilerplate code.

Real-world example from the Agda code base: commit refactoring a type into newtype

Meaningful numeric types

Give meaning to numeric types (example: turtle graphics):

Bad: direct use of numeric type.

type Program

right   :: Double -> Program
forward :: Double -> Program

Better: type synonyms.

type Angle    = Double
type Distance = Double

right   :: Angle    -> Program
forward :: Distance -> Program

Diligent: newtype.

newtype Angle = Angle { angleToDouble :: Double }
  deriving (Num, ...)
newtype Distance = Distance { distanceToDouble :: Double }
  deriving (Num, ...)

But: Haskell's numeric classes are not the pinnacle of design.

Angle and distance are (1-dimensional) vector spaces: we have addition and scalar multiplication, but not general multiplication:

(*) :: Num a => a -> a -> a
  -- makes no sense for 'Angle' and 'Distance'

Expert: account for different angle units.

type Angle
degreeAngle :: Double -> Angle
radianAngle :: Double -> Angle

Meaningful booleans

Make your own copies of Bool.

Bad: direct use of Bool.

data TurtleState = TurtleState {
  ...
  , penState :: Bool -- ^ 'True' means drawing, 'False' not drawing.
  ...
  }

setPen :: Bool -> TurtleState -> TurtleState

Better: type synonym.

data TurtleState = TurtleState {
  ...
  , penState :: PenState
  ...
  }

-- | 'True' means drawing, 'False' not drawing.
type PenState = Bool

setPen :: PenState -> TurtleState -> TurtleState

Recommended: custom data type.

data PenState
  = PenUp   -- ^ Not drawing.
  | PenDown -- ^ Drawing.

Alternative: newtype PenState = PenState Bool.

Advantages:

code is easier to comprehend
code is searchable (e.g. for PenUp; try searching for False)

Custom `data` rather than `Maybe` / `Either` / tuples

Unreasonable:

completePersonData :: Either String (String, String) -> (String, String, String)

Reasonable: custom data structures, use Text.

import Data.Text (Text)

newtype PersonNumber = PersonNumber Text
data Name = Name
  { firstName :: Text
  , lastName  :: Text
  }
data PersonData = PersonData
  { personName   :: Name
  , personNumber :: PersonNumber
  }

completePersonData :: Either PersonNumber Name -> PersonData

Either is fine here.

But:

completePersonData :: Either PersonNumber (Either Email Name) -> PersonData

Not fine! Rather:

data PersonIdentification
  = ByPersonNumber PersonNumber
  | ByEmail        Email
  | ByName         Name

Pattern matching instead of projections

As Haskell software evolves, new constructors may be added to data types or new fields to constructors.

Using the coverage checker {-# OPTIONS_GHC -Wincomplete-patterns #-} helps detecting functions that need to be extended to handle the new cases or situation.

However, this works only if we implement functions in a certain style:

Use pattern matching instead of selectors.
Avoid catch-alls, spell out all cases.
Spell out all constructor arguments.

Example:

data BST a
  = Leaf
  | Node (BST a) a (BST a)

isNode :: BST a -> Maybe (BST a, a, BST a)
isNode = \case
  Node l a r -> Just (l,a,r)
  _ -> Nothing

insert :: a -> BST a -> BST a
insert k t
  | Just (l, p, r) <- isNode t =
      ...
  | otherwise =
      Node Leaf k Leaf

If we extend BST we are not informed that we need to update insert:

data BST a
  ...
  | LeafNode a

insert will now lose LeafNodes.

Rewriting insert to match on each case prevents such bit-rot.

E.g. https://github.com/agda/agda/commit/9702781671c621e638e905fbec2a6a2a1083613e

Documentation and style

Document:

Each data type.

The purpose of the type.
Each of its constructors.
Each of the constructor fields.

-- | Binary search trees: ordered, not necessarily balanced trees.
data BST a
    -- | Empty search tree.
  = Leaf
    -- | Binary node in search tree.
  | Node
      (BST a)  -- ^ Left subtree, with elements ≤ label.
      a        -- ^ Node label (pivot element).
      (BST a)  -- ^ Right subtree, with element ≥ label.

Each function.

The purpose of the function.
Maybe an example (can be doctested).
The role of each argument and preconditions.
The result and postconditions.

-- | Create a binary search tree from a list.
fromList :: Ord a
  => [a]   -- ^ List to convert into tree.
  -> BST a -- ^ Tree, not containing duplicates.

Style:

Haskell is very expressive. One can write very abstract code. Don't overdo it!
Pointfree style. Use modestly! Excessive point-free style makes code unreadable: flip . ((flip . (flip .)) .) vs. \f a b c d -> f b c d a.

Some advice:

https://github.com/andreasabel/haskell-style-guide (not recently updated)
https://wiki.haskell.org/Pointfree