Understanding Hindley–Milner
Before starting, be warned this post is heavily inspired by the video series: Type Systems: Lambda calculus to Hindley-Milner by Adam Jones. I strongly recommend you watch this if you, like me, have been struggling to get into Hindley-Milner, and type theory in general.
Having said that, Hindley-Milner is a type system for the lambda calculus. A type system is a set of rules for assigning types to expressions. They can serve to verify that annotated types are correct or to infer the types. As for lambda calculus, is a mathematical system created to reason about computation. You can think of it as a very, very small programming language.
The Hindley-Milner type system was originally described by J. Roger Hindley in 1969 for combinatory logic. Then, rediscovered by Robin Milner in 1978 for the ML programming language. Finally, Damas formalized the type system for lambda calculus in 1984.
The core value of Hindley-Milner is to support generic programming while mostly not needing type annotations. In a Hindley-Milner type system, the most general type for an expression can be inferred.
Most popular languages that use variations of this system include Haskell, OCaml, Rust, Swift, Elm and Gleam.
How this looks in practice?
The easiest way to implement a type system is to require type annotations, and then verify that inconsistencies don't exist. The other easiest way to implement a type system it's to make the language syntax directed. Syntax directed means that the type of any expressions can be derived just by looking a it. This requires less type annotations, but some expressions will still need them. For example:
list = []
list.append(1)
list.append(2)
list.append(3)
Here, list needs a type annotation to know what type of list it is, even though clearly we can see it's a list of integers.
In a Hindley-Milner type system, annotating the type of the list wouldn't be necessary. The type of list would be inferred based on how it is used. We can see that integers are appended after the declaration, so the list must be a list of integers.
The awesome things about Hindley-Milner type systems is that they make easy writing generic code and code that's easy to change while maintaining static type safety. And if you want to be explicit, you can always add type annotations later.
Formalization
Hindley-Milner consists of a set of rules about types for lambda calculus with let polymorphism. Then, an algorithm can make use of these rules to infer types. That's the case for algorithm W and algorithm M.
Lambda calculus: The language
Its syntax can be described as:
An expression can be a variable/constant, an application, an anonymous function or a let expression. Some examples of expressions are:
An application is just a function call.
On the other hand, the equivalent of a let expression in a modern programming language would be a function binding. This is what allows generic functions in a static type system.
The above expression would work well in an untyped programming language. But even if it makes sense, we could not infer the types correctly in a static type system for the previous expression. For this we must use a let expression.
If we try to infer types for the function application, we first would find that the first argument of the identity function is an , and then a , this would be a type error. On the other hand, with let expression, we found first that type of is . This means we can instantiate separately in both function applications.
Types
A type can be a type variable or a type function application.
would be a constant type, like , , , , etc. It can have any number of parameters.
Type schemes
A type scheme may consist of a type, or a a type quantified with for all. When an expression has a type with type variables, the for all () allows you to instantiate the type variables every time the expression is used.
Type context
Is where you store what type different variables have. It's a list of variable to type assignments. Is denoted generally with .
Typing rules: The type system
A typing rule consists of a premise and a conclusion. If the premise is true, then its conclusion also must be true. They are denoted like:
The typing rules for Hindley-Milner are:
Variable typing rule: If assignment is in then in context it follows that has type .
Function application typing rule: If in context it follows that expression has type and expression has type . Then, in context it follows the expression has type .
Function abstraction typing rule: If it follows that expression has type in the case we add to . Then, it follows that in the expression has type .
Let binding rule: If in it follows that has type , and it follows that if we add to typing context has type . Then, it follows that the expression has type .
Algorithm W: An inference algorithm
The following is the first algorithm proposed to infer types in Hindlye-Milner types. It is a recursive algorithm that takes as input a type context and an expression, and returns a substitution and a type. The type is what we care about.
The function instantiate takes a type scheme, and for each for all creates a new type variable and replaces it in the body of the type. The result is a type without for all quantifiers.
The function unify takes two types and returns a substitution that, when applied to both types, makes them equal.
A substitution is a list of mappings from type variables to other types.
The function generalize adds for all quantifiers to free type variables in a type. i.e. it adds a for all quantifier to all type variables in a type that are not currently quantified.
Algorithm M: Another inference algorithm
M is also a recursive algorithm, but it takes as input a type context, a expression, and a type, and returns a substitution. If we want to infer the type of an expression , we create a new type variable and call . Then we apply the resulting substitution to to know the type of .
Algorithm M is top-down instead of bottom-up. This means constraints are propagated from the top, and if there exists an error, we found it in a leaf.