Advanced Functional Programming (9) Domain Specific Embedded Languages
Bastiaan Heeren
Center for Software Technology, Universiteit Utrecht http://www.cs.uu.nl/groups/ST/
February 28, 2007
Overview
1. Domain Specific Languages
2. Domain Specific Embedded Languages
3. Growing a language
4. Template Haskell
5. Agenda
Domain Specific Languages
General purpose language: a programming language for creating various kinds of programs.
Domain specific language (DSL): a programming language for one particular problem domain.
DSLs offer appropriate notation and abstractions for one domain.
Often limited to the domain.
Examples:
PostScript – for page rendering
Structured Query Language – for accessing a database
make – macro language for declaring file dependencies
LaTeX and BibTeX – for document preparation
Why use a DSL?
Because the language is tailored to one problem domain, a DSL operates at an even higher level than a general purpose high-level language.
Ideally, a domain engineer should be able to use a DSL.
Programs in a DSL are generally easier to write, modify, and reason about.
Argument in favor of using DSLs: the initial cost to start using
a DSL may be high, but over time it should yield significant
savings.
Problems of DSL
Don’t build a DSL from scratch, but inherit the infrastructure of some other language!
This yields a domain specific embedded language (DSEL).
A pure embedding: no pre-processor, macro-expander, or generator.
What makes Haskell a good host for a DSEL?
Higher-order functions
Laziness
Polymorphism
Type classes
Syntax and semantics
Slogan: “semantics is more important than syntax”. However, syntax does matter (although it probably isn’t the designer’s biggest worry).
Many semantic details do not matter much (numbers, scoping rules, looping constructs).
Idea: borrow the design decisions made for the host language, and reuse its ideas.
Advantages:
Access to more programming features. (A common evolutionary path for a DSL is to grow into a complex general purpose language, which makes its foundations hard to find.)
Share a common base language (and its tools). Large applications may use more than one DSEL.
Example: geometric region analysis
type Region = Point → Bool
inRegion :: Point → Region → Bool
p `inRegion` r = r p
circle :: Radius → Region
outside :: Region → Region          -- logical negation
(∩) :: Region → Region → Region     -- intersection
(∪) :: Region → Region → Region     -- union
(r1 ∩ r2) p = p `inRegion` r1 ∧ p `inRegion` r2
Disbelief that this code is executable!
Simple to prove associativity of intersection: use equational reasoning (referential transparency), or QuickCheck.
(r1 ∩ r2) ∩ r3 ≡ r1 ∩ (r2 ∩ r3)
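The region DSEL above can be written down as ordinary Haskell. The following is a minimal sketch, not the original implementation: the concrete `Point` and `Radius` types, the ASCII operator names `/\` and `\/` (standing in for ∩ and ∪), and the helpers `assocHolds` and `grid` are assumptions made for illustration. Checking the associativity law on a finite grid of points is only a spot check, in the spirit of QuickCheck, and not a proof.

```haskell
-- Assumed concrete representations; the slides leave Point and Radius abstract.
type Point  = (Double, Double)
type Radius = Double
type Region = Point -> Bool

inRegion :: Point -> Region -> Bool
p `inRegion` r = r p

-- A circle around the origin (the centre is an assumption).
circle :: Radius -> Region
circle r (x, y) = x * x + y * y <= r * r

-- Logical negation of a region.
outside :: Region -> Region
outside r p = not (p `inRegion` r)

-- Intersection and union, spelled in ASCII.
(/\), (\/) :: Region -> Region -> Region
(r1 /\ r2) p = p `inRegion` r1 && p `inRegion` r2
(r1 \/ r2) p = p `inRegion` r1 || p `inRegion` r2

-- Hypothetical helper: test associativity of intersection on sample points.
assocHolds :: Region -> Region -> Region -> [Point] -> Bool
assocHolds r1 r2 r3 =
  all (\p -> ((r1 /\ r2) /\ r3) p == (r1 /\ (r2 /\ r3)) p)

-- Hypothetical sample points on a coarse grid.
grid :: [Point]
grid = [(x, y) | x <- [-3, -2.5 .. 3], y <- [-3, -2.5 .. 3]]
```

For example, `assocHolds (circle 1) (outside (circle 0.5)) (circle 2) grid` checks the law for one choice of regions. Because a region is just a function, no interpreter is needed: applying a region to a point runs the "program".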
Modular algebraic semantics
It is important to recognize layers of abstraction for a DSEL (this requires a good understanding of the domain).
Good example: two layers in the wxHaskell library.
Example: A simple graphics DSEL
Layer 1: pictures
-- Atomic objects:
circle              -- a unit circle
square              -- a unit square
bitmap "p.gif"      -- an imported bit-map
-- Composite objects:
scale v p           -- scale picture p by vector v
color c p           -- color picture p with color c
trans v p           -- translate picture p by vector v
p1 `over` p2        -- overlay p1 on p2
p1 `above` p2       -- place p1 above p2
p1 `beside` p2      -- place p1 beside p2
Nice properties can be proven about this algebra of pictures (e.g., distributive laws, associativity)
Layer 2: animations
type Behavior a = Time → a
type Animation = Behavior Picture
(lift1 f b1) t = f (b1 t)
(lift2 f b1 b2) t = f (b1 t) (b2 t)
colorB = lift2 color      -- the color may also be time varying
sinB = lift1 sin
time t = t
wiggle = sinB (pi ∗ time)
wiggleRange lo hi = lo + (hi − lo) ∗ (wiggle + 1) / 2
ball = colorB red (scaleB (wiggleRange 0.5 1) circle)
Generalization: we adopt a more generic viewpoint
Lifting could be done with type classes (for example, by making behaviors an instance of Num)
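One way to do the lifting with type classes is sketched below. It is an illustration under stated assumptions rather than the library's actual code: `Behavior` is wrapped in a newtype (the slide uses a type synonym) so that instances need no language extensions, `Time` is taken to be `Double`, and `lift0` and the field name `at` are names introduced here. With `Num` and `Fractional` instances in place, the arithmetic in `wiggleRange` works on behaviors directly, and numeric literals become constant behaviors.

```haskell
type Time = Double

-- Assumption: a newtype instead of the slide's type synonym.
newtype Behavior a = Behavior { at :: Time -> a }

-- Constant behavior (name assumed).
lift0 :: a -> Behavior a
lift0 x = Behavior (const x)

lift1 :: (a -> b) -> Behavior a -> Behavior b
lift1 f b = Behavior (\t -> f (b `at` t))

lift2 :: (a -> b -> c) -> Behavior a -> Behavior b -> Behavior c
lift2 f b1 b2 = Behavior (\t -> f (b1 `at` t) (b2 `at` t))

-- The identity behavior: its value at time t is t.
time :: Behavior Time
time = Behavior id

-- Pointwise arithmetic on behaviors.
instance Num a => Num (Behavior a) where
  (+)         = lift2 (+)
  (-)         = lift2 (-)
  (*)         = lift2 (*)
  abs         = lift1 abs
  signum      = lift1 signum
  fromInteger = lift0 . fromInteger

instance Fractional a => Fractional (Behavior a) where
  (/)          = lift2 (/)
  fromRational = lift0 . fromRational

sinB :: Behavior Double -> Behavior Double
sinB = lift1 sin

wiggle :: Behavior Double
wiggle = sinB (lift0 pi * time)

-- Oscillates between lo and hi; the literals 1 and 2 are lifted automatically.
wiggleRange :: Behavior Double -> Behavior Double -> Behavior Double
wiggleRange lo hi = lo + (hi - lo) * (wiggle + 1) / 2
```

Sampling is plain function application: ``wiggleRange 0.5 1 `at` 0`` evaluates the behavior at time 0.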
Layer 3: reactivity
A basic reactive expression has the form b1 `until` e =⇒ b2
(“behave as b1 until event e occurs, then behave as b2”).
until and (=⇒) are just Haskell functions (although they appear to be primitives!)
color (cycle red green blue) circle
  where
    cycle c1 c2 c3 =
      c1 `until` leftMouseButton =⇒
      cycle c2 c3 c1
(cycle relies on lazy evaluation)
Advanced Parsing Techniques
Parser combinators are another example of a layered DSEL:
Simple combinators (used for the Grammars and Parsing course)
Error-correcting parsers
Fast online parser (produce output as soon as possible to avoid space-leaks)
Self-analyzing parser (recent work by Arthur Baars to cope with a left recursive context-free grammar)
Although the implementation of the (basic) combinators is getting
more and more complicated, the interface is (almost) the same.
This is essential for building a successful library.
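To make the idea of a stable interface concrete, here is a minimal sketch of the simplest layer, a list-of-successes parser. It is not the course library: the representation, the `symbol` primitive, and the example grammar `parens` are assumptions for illustration. The point is that client code such as `parens` only uses `<$>`, `<*>`, `<|>` and `pure`, so a more sophisticated implementation behind the same operators would not require changing it.

```haskell
import Control.Applicative (Alternative (..))

-- List of successes: every way to parse a prefix of the input.
newtype Parser s a = Parser { runParser :: [s] -> [(a, [s])] }

instance Functor (Parser s) where
  fmap f (Parser p) = Parser $ \inp -> [(f a, rest) | (a, rest) <- p inp]

instance Applicative (Parser s) where
  pure v = Parser $ \inp -> [(v, inp)]
  Parser pf <*> Parser pa =
    Parser $ \inp -> [(f a, r2) | (f, r1) <- pf inp, (a, r2) <- pa r1]

instance Alternative (Parser s) where
  empty = Parser (const [])
  Parser p <|> Parser q = Parser $ \inp -> p inp ++ q inp

-- Recognise exactly one given symbol.
symbol :: Eq s => s -> Parser s s
symbol c = Parser $ \inp -> case inp of
  (x : xs) | x == c -> [(x, xs)]
  _                 -> []

-- Example grammar: S -> '(' S ')' S | empty, returning the nesting depth.
parens :: Parser Char Int
parens = (\_ d _ n -> max (d + 1) n)
           <$> symbol '(' <*> parens <*> symbol ')' <*> parens
     <|> pure 0
```

Keeping only the parses that consume the whole input, `[d | (d, "") <- runParser parens "(())()"]` yields the nesting depth of the string.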
Building a modular interpreter
So far, a DSEL is just about notation.
It should be possible to make fundamental changes in the interpreter, even long after the initial design.
Describe language features along with their semantics.
New features can be added without altering any previous code
Possible approaches:
Use monads and monad transformers for constructing building blocks
Use an attribute grammar (the Utrecht approach)
Template Haskell may also be used to acquire some
extensions
Growing a language
“Growing a language”: brilliant paper by Guy L. Steele Jr. from Sun Microsystems Laboratories (highly recommended reading).
Don’t try to design and build “The Right Thing”: users will not wait for it.
Plan for growth: the language must grow as the set of users grows.
Library functions should look like primitives.
Make it a community effort: cathedral versus bazaar style.
Growing a language: Haskell
Haskell helps developers to design libraries with a native look-and-feel. For instance, recall (:=) and the on function from wxHaskell.
However, not all desired extensions can be expressed within Haskell.
Some examples:
Add syntactic sugar to the language: for instance, list comprehensions cannot be introduced within the language
Change the evaluation strategy: lazy versus strict
Extend the type system: for example, to support generic programming
The static analyzer is not aware of DSELs: error messages are
still reported in terms of the host language
Syntax macros
Work by Arthur Baars: supply syntax macros to the compiler for adding syntactic sugar to the language.
[Figure: syntax macro architecture. Labels: source file, syntax macros, initial grammar (list of parsers), abstract syntax (constructors + types), macro interpreter, compiler, compiler output]
Type inference directives
Scripting the type inference process (Heeren, Hage, and Swierstra), ICFP 2003. Directives let a compiler report domain specific type error messages.
(<$>) :: (a → b) → Parser s a → Parser s b
x :: t1;  y :: t2;
x <$> y :: t3;
t1 ≡ a1 → b1       : left operand is not a function
t2 ≡ Parser s1 a2  : right operand is not a parser
t3 ≡ Parser s2 b2  : result type is not a parser
s1 ≡ s2            : parser has an incorrect symbol type
a1 ≡ a2            : function cannot be applied to result of parser
b1 ≡ b2            : parser has an incorrect result type
Use error message attributes in the specialized type error messages:
t2 ≡ Parser s1 a2 :
@expr.pos@ : The right operand of <$> should be a parser
  expression     : @expr.pp@
  right operand  : @y.pp@
  type           : @t2@
  does not match : Parser @s1@ @a2@
Example
A program with a type error:
test :: Parser Char String
test = map toUpper <$> "hello, world!"
Compiling this program results in the following type error message:
(2, 21) : The right operand of <$> should be a parser
  expression     : map toUpper <$> "hello, world!"
  right operand  : "hello, world!"
  type           : String
  does not match : Parser Char String
What is Template Haskell?
System by Tim Sheard and Simon Peyton Jones.
Extension to Haskell that supports compile-time meta-programming (i.e., algorithmic construction of programs at compile-time).
With TH, we can do:
Polytypic programming
Macro-like expansion
User-directed optimization (for instance, inlining)
Generation of supporting data structures and functions from existing data structures and functions
Implemented in GHC.
What is meta-programming?
Nice example: C-like printf function in Haskell.
printf "Error: %s on line %d." msg line
We can’t define a type-safe printf in plain Haskell in the obvious way, because its type depends (in a complicated way) on the value of its first parameter.
With Template Haskell, we can
guarantee type-safety for printf (types for msg and line)
compile template code efficiently (for instance, interpret the control string at compile-time)
make it user-definable (not a compiler extension)
Using printf in Template Haskell
$( printf "Error: %s on line %d." ) msg line
$( ... ) is special notation (“evaluate at compile-time”).
$( ... ) is called a splice (not to be confused with Haskell’s infix application operator $).
Conceptually, the splice above generates the following lambda term:
(λs0 → λn1 → "Error: " ++ s0 ++ " on line " ++ show n1 ++ ".")
Of course, this term can be assigned a type like any other “normal” function.
Defining printf in Template Haskell
printf :: String → Expr
printf s = gen (parse s)
data Format = D | S | L String
parse :: String → [Format]
gen :: [Format] → Expr
gen [D] = [| λn → show n |]
gen [S] = [| λs → s |]
gen [L s] = lift s
parse is an ordinary Haskell function
Simplified implementation of gen: only one format specifier
[| ... |] is called the quasi-quote notation
lift lifts a string to a value of type Expr
Let’s define gen for an arbitrary number of format specifiers. We use recursion!
Defining printf in Template Haskell (2)
printf :: String → Expr
printf s = gen (parse s) [| "" |]
data Format = D | S | L String
parse :: String → [Format]
gen :: [Format] → Expr → Expr
gen [ ] x = x
gen (D : xs) x = [| λn → $( gen xs [| $x ++ show n |] ) |]
gen (S : xs) x = [| λs → $( gen xs [| $x ++ s |] ) |]
gen (L s : xs) x = gen xs [| $x ++ $( lift s ) |]
gen uses an accumulating parameter
Recursive calls to gen are run at compile-time
Static scoping extends across the template mechanism (task
of quotation monad Q)
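The definition above follows the notation of the original Template Haskell design. As a sketch of how it might look with the `template-haskell` library shipped with current GHC releases, the version below uses `Q Exp` where the slides write `Expr` and `stringE` where they write `lift`. The `parse` function is not given on the slides, so the one here is an assumption: it recognises only `%d` and `%s` and treats every other character literally. The `Int` annotation for `%d` is likewise a choice made for this sketch.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH (Exp, Q, stringE)

data Format = D | S | L String
  deriving (Eq, Show)

-- Assumed parser for the control string: %d, %s, and literal text.
parse :: String -> [Format]
parse ""                 = []
parse ('%' : 'd' : rest) = D : parse rest
parse ('%' : 's' : rest) = S : parse rest
parse (c : rest)         =
  case parse rest of
    L s : fs -> L (c : s) : fs   -- extend the literal chunk that follows
    fs       -> L [c] : fs       -- start a new literal chunk

-- Code generator with an accumulating parameter holding the string so far.
gen :: [Format] -> Q Exp -> Q Exp
gen []         acc = acc
gen (D : fs)   acc = [| \n -> $(gen fs [| $acc ++ show (n :: Int) |]) |]
gen (S : fs)   acc = [| \s -> $(gen fs [| $acc ++ s |]) |]
gen (L s : fs) acc = gen fs [| $acc ++ $(stringE s) |]

printf :: String -> Q Exp
printf fmt = gen (parse fmt) [| "" |]
```

Because of GHC's stage restriction, `printf` has to live in a different module from the one that splices it. In such a client module, `$(printf "Error: %s on line %d.") "parse error" 42` should evaluate to `"Error: parse error on line 42."`, and passing arguments of the wrong type is rejected at compile time.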
Why templates?
High-level languages make programs shorter, easier to maintain, and easier to reason about.
Why? The compiler will do the job (most of the time, in a superior way).
But what if a programmer knows some particular details? Let the user teach the compiler a new trick.
A compiler manipulates programs. TH lets users manipulate their own programs.
1. Conditional compilation (for different configurations)
2. Program reification (inspect program structure, deriving)
3. Algorithmic program construction (printf)
4. Abstractions (zip1, zip2, zip3, etcetera)
5. Optimizations (for algebraic laws, in-lining opportunities)
Design issues
The advantages and disadvantages of TH’s design decisions:
Compile-time and run-time functions use the same language
  In contrast with most systems (#if, #define)
  Existing libraries and programming skills can be used
  We need explicit annotations for specifying when to execute the code
Executed at compile time
  Allows full static analysis (including type inference)
  May lead to non-terminating compilation
Syntax-construction functions
Select the first component of a triple:
case x of (a, b, c) → a
With Template Haskell:
$( sel 1 3 ) x
given that
sel :: Int → Int → Expr
sel i n = [| λx → case x of ... |]
We can’t write sel with quasi-quoting only
Syntax-construction functions (2)
sel :: Int → Int → Expr
sel i n = lam [pvar "x"] (caseE (var "x") [alt])
  where
    alt :: Match
    alt = simpleM pat rhs
    pat :: Patt
    pat = ptup (map pvar as)
    rhs :: Expr
    rhs = var (as !! (i − 1))     -- the operator (!!) is zero based
    as :: [String]
    as = ["a" ++ show i | i ← [1 .. n]]
-- Syntax for patterns
pvar  :: String → Patt             -- x
ptup  :: [Patt] → Patt             -- (x,y,z)
pcon  :: String → [Patt] → Patt    -- Fork x y
pwild :: Patt                      -- _
-- Syntax for expressions
var     :: String → Expr           -- x
tup     :: [Expr] → Expr           -- (3,y)
app     :: Expr → Expr → Expr      -- f x
lam     :: [Patt] → Expr → Expr    -- λx y → 5
caseE   :: Expr → [Match] → Expr   -- case x of ...
simpleM :: Patt → Expr → Match     -- x:xs → 2
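The combinator names above are those of the early Template Haskell library. As a hedged sketch, the same selector can be written against the current `Language.Haskell.TH` API, where `lamE`, `varP`, `tupP`, `varE`, `match` and `normalB` play the roles of `lam`, `pvar`, `ptup`, `var` and `simpleM`. Using `newName` to generate fresh variables instead of fixed strings is a choice made here, not something the slides prescribe.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- sel i n builds:  \x -> case x of (a1, ..., an) -> ai
sel :: Int -> Int -> Q Exp
sel i n = do
  x     <- newName "x"
  comps <- mapM (\k -> newName ("a" ++ show k)) [1 .. n]
  let pat = tupP (map varP comps)
      rhs = varE (comps !! (i - 1))   -- (!!) is zero based
  lamE [varP x] (caseE (varE x) [match pat (normalB rhs) []])
```

As with any splice, `sel` must be imported from another module before it can be used, for instance as `$(sel 1 3) ('a', True, "c")`.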
Mix the two styles
sel :: Int → Int → Expr
sel i n = [| λx → $( caseE (var "x") [alt] ) |]
  where
    alt = simpleM pat rhs
    pat = ptup (map pvar as)
    rhs = var (as !! (i − 1))
    as = ["a" ++ show i | i ← [1 .. n]]
Our next challenge: implement an n-ary zip function.
$( zipN 3 ) as bs cs
zipN: investigate for n = 3
To gain some insight, we first consider what zipN should do for n = 3.
zip3 :: [a ] → [b ] → [c ] → [(a, b, c)]
zip3 =
  let rec = λy1 y2 y3 →
        case (y1, y2, y3) of
          (x1 : xs1, x2 : xs2, x3 : xs3) → (x1, x2, x3) : rec xs1 xs2 xs3
          _ → [ ]
  in rec
Recursive definition
Quite a lot of pattern/expression variables
Defining zipN
zipN :: Int → Expr
zipN n = [| let zp = $(mkZip n [| zp |]) in zp |]
mkZip :: Int → Expr → Expr
mkZip n name = lam pys (caseE (tup eys) [m1, m2])
  where
    (pxs, exs)   = genPE "x" n
    (pys, eys)   = genPE "y" n
    (pxss, exss) = genPE "xs" n
    pcons x xs   = [p| $x : $xs |]
    body         = [| $(tup exs) : $(apps (name : exss)) |]
    m1           = simpleM (ptup (zipWith pcons pxs pxss)) body
    m2           = simpleM pwild (con "[]")
genPE :: String → Int → ([Pat], [ExpQ])
genPE s n = let ns = [s ++ show i | i ← [1 .. n]]
            in (map pvar ns, map var ns)
apps = foldl1 app
apps has an elegant definition
Expr is really ExpQ
lam should be lamE
Pattern quasi-quoting is not yet supported
con "[]" must be con "GHC.Base:[]"
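The remarks above refer to the Template Haskell library as it was when these slides were written. The following is a sketch of the same construction against the current `Language.Haskell.TH` API, under a few assumptions of its own: it builds the `let` with `letE` and `valD` rather than with a quotation, uses `infixP` with the quoted name `'(:)` in place of the pattern quotation, and generates fresh names with `newName`. It is meant for n ≥ 2; the degenerate cases are not considered here.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- zipN n builds:  let zp = <lambda from mkZip> in zp
zipN :: Int -> Q Exp
zipN n = do
  zp <- newName "zp"
  letE [valD (varP zp) (normalB (mkZip n (varE zp))) []] (varE zp)

-- mkZip n self builds the non-recursive body; 'self' is the recursive call.
--   \y1 .. yn -> case (y1, .., yn) of
--                  (x1 : xs1, .., xn : xsn) -> (x1, .., xn) : self xs1 .. xsn
--                  _                        -> []
mkZip :: Int -> Q Exp -> Q Exp
mkZip n self = do
  xs  <- names "x"
  ys  <- names "y"
  xss <- names "xs"
  let pcons x rest = infixP (varP x) '(:) (varP rest)
      body = [| $(tupE (map varE xs)) : $(foldl appE self (map varE xss)) |]
      m1   = match (tupP (zipWith pcons xs xss)) (normalB body) []
      m2   = match wildP (normalB [| [] |]) []
  lamE (map varP ys) (caseE (tupE (map varE ys)) [m1, m2])
  where
    -- n fresh names with a common prefix
    names s = mapM (\k -> newName (s ++ show k)) [1 .. n]
```

Spliced from another module, `$(zipN 3) [1, 2] "ab" [True, False]` should behave like the Prelude's `zip3` on the same arguments.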
Declaration splicing
We can splice in template declarations at top-level.
For instance, introduce zip0, zip1, ..., zip10.
zipDecl :: Int → DecQ
zipDecl n = valD name body [ ]
  where
    name = pvar ("zip" ++ show n)
    body = normalB (zipN n)
$(mapM zipDecl [0 .. 10])
Question: what is the type of zip0?
Reification
data Tree a = Node (Tree a) (Tree a) | Leaf a
repTree :: Decl
repTree = reifyDecl Tree
lengthType :: Type
lengthType = reifyType length
percentFixity :: Q Int
percentFixity = reifyFixity (%)
here :: Q String
here = reifyLocn
Reification: query the state of the compiler
A language construct, not a function!
Conclusion
Template Haskell provides a powerful, new way of writing programs.
Because all meta-code is executed at compile-time, we can still guarantee run-time safety.
GHC supports Template Haskell, but the library is not mature yet (see also Section 7.6 of the User’s Guide).
What will the future bring for TH?
Agenda
1. Read “QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs” (Koen Claessen and John Hughes).
2