I’ve been recently trying to learn more advanced type-level constructs in Haskell and was very happy to find this amazing talk by Prof. Stephanie Weirich about Dependent Types in haskell. This talk helped me to understand deeper a few more recent concepts introduced by some of GHC’s extensions and how to use them. In this post I want to focus a little bit in a simplified version of one of the data structures Prof. Weirich uses in her talk. She does a lot more than this, in the talk, but I’m going slowly to understand every bit of it.
The record problem is an old problem in Haskell. Succintly, Haskell’s traditional native records have lots of problems – you couldn’t reuse record names, updating record fields lead to dull boilerplate code, etc. Many of those problems are attacked by the idea of lenses (see this talk by Simon Peyton Jones to get the basics of it) and many other libraries as well as the OverloadedRecordFields extension.
Though there are many solutions attacking parts of the record problem, there’s one particular aspect of it which offers a nice opportunity to learn type level programming techniques in Haskel and are worth working out from scratch: how to create records whose type’s are aware of the fields contained in the records and their types?
That means, how to create a record type such that the when we try to access a non-existing field:
> getField "nonExistentFieldName" record
we get an actual type error in compile time. This allows us to completely rule out a whole class of bugs from our programs: we don’t need to worry about users acessing unexisting fields type of errors because this code wouldn’t even compile.
Our first attempt will be to model our records as lists of named-entries:
data Entry a = Entry String a
data Dict a = Nil | Cons (Entry a) (Dict a)
getField :: String -> Dict a -> Maybe a
getField _ Nil = Nothing
getField name (Cons (Entry name' x) dict') = case (name == name') of
True -> Just x
False -> getField name dict'
This compiles alright, but it’s not a solution to our problem. First of all, it has no information about the entry field names in the type. The type of Dict a
only carries information about the type of the values. Second, all fields must be of the same type. If you try to build something like: ''
-- this raises a type error
myRecord = Cons (Entry "name" "Rafael") (Cons (Entry "age" (35::Int)) Nil)
You’ll get an obvious type error since Cons (Entry "age" 35::Int) Nil
is a Dict Int
and Entry "name" "Rafael"
is an Entry String
, and Cons
type signature is Entry a -> Dict a -> Dict a
.
So, it seems that this is not a very useful record (:P).
Let’s try to solve the second problem first and make the type of each entry more flexible. For that we need GADTs and existential types.
The second problem is caused by the fact the we have a explicit reference to the type of the entry in the Dict
type constructor. We could try to make it more flexible like this:
data Dict a = Nil | Cons (Entry a) (Dict b)
But of course this doesn’t work because the type variable b
is not defined in this scope. There is no way for the type checker to fix it:
/.../Post.hs:5:42: error:
Not in scope: type variable ‘b’
|
5 | data Dict a = Nil | Cons (Entry a) (Dict b)
For this to work, we need to put b
in scope, without adding it as argument to the type constructor or else we’d get an infinite regress of types (I’ll get back to this later). For that we need two GHC extensions: GADTs
and Rank2Types
(or RankNTypes
, or other extension providing the forall
keyword).
GADTs
is an extension that allows us to give more generic types to the data constructors of an algebraic data type. It also allows a nicer syntax for data constructors with a long type signature. With GADTs
and RankNTypes
enabled we can do this:
{-# LANGUAGE GADTs, RankNTypes #-}
data Entry a = Entry String a
data Dict a = Nil | forall b . Cons (Entry a) (Dict b)
This compiles correctly and we can try to use it! Now our previous record is well typed:
myRecord :: Dict String
myRecord = Cons (Entry "name" "Rafael") (Cons (Entry "age" (35::Int)) Nil)
But look what happened. The information that there’s an Int
somewhere inside the structure of the record is gone! Yep. We enclosed it in a forall
and all the information Cons
have now is that its second argument is some kind of Dict b
, whatever b
is. This doesn’t look like a good sign.
Let’s try to write a getField
function. We still didn’t solve the problem of letting the type know what fields are possible, so we still need to guard ourselves against the possibility that the user will try to fetch the data from a field that doesn’t exist. So the signature of getField
still is String -> Dict a -> Maybe
… wait a minute! What’s the return type?
In the record above, if the field name is "name"
it should return a String
, but if the field name is "age"
it should return an Int
. But the compiler wouldn’t know that because there’s no information in the type of the record about the value of the fields in is tail. We only have information about the type of the head entry.
So, the return type of getField
is something like (forall b . Maybe b)
? That doesn’t look very useful. I can retrieve the value but I loose all the information about its type! This doesn’t seem to be working…
I want to get back to “infinite regress of types” I refered above. Why couldn’t we put the b
type variable above as an argument for the type constructor? Well, let’s try and see. We could create a data type Dict a b
where a
is the value of the head Entry
and b
is the type of the head of the next Dict
down the Cons
data constructor. So:
data Dict a b = Nil | Cons (Entry a) (Dict b ???)
Oops. Damn, what about the type of the entry after the next entry? Well… Let’s put it in the constructor too:
data Dict a b c = Nil | Cons (Entry a) (Dict b c ???)
You get it, right? There’s always a new type to keep track of. The type of the record must know not only the type of the head entry, but also all the types of all entries in its tail. This looks a hell like a linked list of types, doesn’t it? If we had a way to create a type level list we could have the following GADT:
data Dict (types :: (TypeLevelList Type)) where
Nil :: Dict TypeLevelEmptyList
Cons :: (Entry a) -> Dict (tail :: TypeLevelList) -> Dict (a `TypeLevelCons` tail)
Wait, what the hell is this? First of all, what are those type signatures in the wrong place? Those are kind signatures. Kind is the “type of a type constructor”. For example, type constructors that have no parameters, like Bool
or String
have kind Type
(or *
). Type constructors that take a single parameter, like Maybe
have kind Type -> Type
. Single parameter Typeclasses like Functor
or Monad
have kind Type -> Constraint
, etc.
Here I’m supposing that there exists a kind called TypeLevelList
, and that there exists two type constructors:
TypeLevelEmptyList
with kind TypeLevelList
,TypeLevelCons
with kind Type -> TypeLevelList -> TypeLevelList
.When I write data Dict (types :: TypeLevelList)
I’m declaring a type constructor Dict
with kind TypeLevelList -> Type
. This type has two data constructors:
Nil
which is just an empty record with type Dict TypeLevelEmptyList
Cons
which takes an Entry a
and a Dict TypeLevelList
and return another Dict TypeLevelList
putting a
on the head of that TypeLevelList
it received.In practice we’d have something like this:
emptyRecord :: Dict TypeLevelEmptyList
emptyRecord = Nil
agedRecord :: Dict (Int `TypeLevelCons` TypeLevelEmptyList)
agedRecord = Cons (Entry "age" 35) emptyRecord
recordWithAStringAndAnInt :: Dict (String `TypeLevelCons` Int `TypeLevelCons` TypeLevelEmptyList)
namedAndAgedRecord = Cons (Entry "name" "Rafael") agedRecord
This is sweet! We can keep track to the types of all fields! But how do we create those type level lists? :O
To create those type level lists we have to use a GHC extension called DataKinds
. To understand what DataKinds
do lets consider this simple type declaration:
data Nat = Zero | Succ Nat
What this does is to create a type constructor called Nat
, whose kind is Type
, and two data constructors:
Zero
, whose type is Nat
Succ
, whose type is Nat -> Nat
When you use the DataKinds
extension this declaration creates, besides the three objects described above, three more objects:
'Nat
(the tick is not a typo)'Zero
whose kind is 'Nat
'Succ
whose kind is 'Nat -> 'Nat
Those types constructed with those type constructors are not inhabited by values, but they are very useful for type computation. So, how do we create the “kind constructor” TypeLevelList
with type constructor TypeLevelEmptyList
and TypeLevelCons
? Exactly with the same code that we would use to create a type constructor List
with data constructor EmptyList
and Cons
, but we use the DataKinds
extension to lift those objects from the value :: type
world to the type :: kind
world. We can do:
{-# LANGUAGE GADTs, RankNTypes, DataKinds, KindSignatures #-}
module Post where
import Data.Kind (Type)
data Entry a = Entry String a
data List a = EmptyList | ListCons a (List a)
data Dict (a :: (List Type)) where
Nil :: Dict 'EmptyList
Cons :: Entry a -> Dict t -> Dict ('ListCons a t)
So, what’s happening here? First of all we have the declaration data List a = EmptyList | ListCons a (List a)
. This is a simple list type, but since we used the Data.Kinds
extension, we get a new list kind for free:
'List
is a “kind constructor” which takes a kind and return another kind (* -> *
)'EmptyList :: forall a . List a
is a type constructor'ListCons :: forall a . a -> List a -> List a
is another type constructorSo, when applied to the kind Type
, the “kind constructor” 'List
creates the kind 'List Type
which is a list of types! We can have the following types which have this kind:
'EmptyList
'ListCons Int 'EmptyList
'ListCons String (Cons Int 'EmptyList)
etc. All those types have kind 'List Type
. Those types are not inhabited (that is, we can’t construct values that have those types), but we can used them to provide compile time information that helps us to avoid bugs, because we can build type constructors that build inhabited types out of them! For example, we can build Dict
. Let’s check the kind of Dict
on GHCi:
> :k Dict
Dict :: List Type -> Type
-- the actual GHCi output is Dict :: List * -> *, but Type is a nice synonym for *
This is what’s happening with the declaration data Dict (a :: (List Type))
. We used the extension KindSignatures
to inform the compiler that the Dict
type constructor has a kind which takes an argument of kind List Type
and returns a regular Type
.
Now to the data constructors - which are the things that allows us to actually build values of type Dict a
. The simples one is Nil
which builds a value of type Dict 'EmptyList
. This is an empty record, with no values stored and thus no types stored in the type level list.
Also we have Cons
, which takes a parameter of type Entry a
and a parameter of type Dict t
(remember, here t
is a type of kind 'List Type
) and builds a value of type Dict ('ListCons a t)
. So, Cons
does two things:
Wow. If that’s too much to grasp, let’s see this in action:
namedRecord ::Dict ('ListCons String 'EmptyList)
namedRecord = (Entry "name" "Rafael") Nil
namedAndAgedRecord :: Dict ('ListCons Int ('ListCons String 'EmptyList))
namedAndAgedRecord = Cons (Entry "age" (35::Int)) namedRecord
See how the types of the fields we’re creating are concatenated in the type of the record? This allows us to know precisely the types of all the fields in a record!
We didn’t have to code our own list type, Haskell already provides one for us and fortunately Data.Kinds
works with the built-in types too. So we could have written simply:
{-# LANGUAGE GADTs, RankNTypes, DataKinds, KindSignatures, TypeInType, TypeOperators #-}
module Post where
import Data.Kind (Type)
infixr 6 :>
data Entry a = Entry String a
data Dict (a :: [Type]) where
Nil :: Dict '[]
(:>) :: Entry a -> Dict t -> Dict (a:t)
We made a few changes to make the types nicer:
We are now using Haskel’s built-in lists:
> :k Dict
Dict :: [Type] -> Type
This is completely equivalent to the previous signature List Type -> Type
the only difference is that we’re using the built-in type instead of our custom list type.
We’re using the TypeInType
extension to allow for the syntax [Type]
We’re using the TypeOperators
extension to allow for two things:
(:) :: a -> [a] -> [a]
which concatenates a type on the head of a type level list;Cons
data constructor to a nicer (:>)
infix type operator so that the expressions are nicer looking.With this modifications, instead of this ugly monster:
myRecord :: Dict ('ListCons String ('ListCons Int 'EmptyList))
myRecord = Cons (Entry "name" "Rafael") (Cons (Entry "age" 35) Nil)
we can write this:
myRecord :: Dict '[String, Int]
myRecord = Entry "name" "Rafael" :> Entry "age" 35 :> Nil
Much better, right?
Well, yep. This post is already big and we still don’t know:
getField
Dict
to allow for information about field names to be statically checked by the compiler.So it looks like a perfect point to stop and start planning to write Part 2!