Programming Language Design Project

During Semester 2 of Year 2 of my Computer Science BSc at Southampton I was tasked with creating a programming language from scratch that would need to carry out calculations on boundless streams of input data. This was a pair project with sseneca. We decided to call this language Gnome Compsci, which is a play on words of the name Noam Chomsky.

We were required to use Haskell as the language in which to write the Lexer, Parser and Interpreter. Therefore we chose to use the Alex and Happy libraries to make the lexing and parsing more straightforward as the Interpreter was likely to be the most complex part.

Lexer and Parser

Criteria

  • Integers
  • Lists/Arrays
  • Strings (For output)
  • Flow Control and Loops
  • Numerical/Boolean Comparisons

Grammar

We decided that it made sense to include the few simple functions we needed in the grammar rather than specifying a general way to define and call functions. This helped to simplify both the grammar and the interpreter. We spent a large amount of time perfecting this grammar and removing shift/reduce conflicts but eventually we managed to include all the criteria and in a relatively concise manor.

Prog : RootExpr Prog  { $1:$2 }
     | RootExpr       { [$1] }

RootExpr : Expr ENDLINE { $1 }
         | if '(' Expr ')' then '{' Prog '}' else '{' Prog '}' { GCIf $3 $7 $11 }
         | while '(' Expr ')' '{' Prog '}'                     { GCWhile $3 $6 }

Expr : print '(' Expr ')'            { GCPrint $3 }
     | readline '(' ')'              { GCRead }
     | split '(' Expr ')'            { GCSplit $3 }
     | head '(' Expr ')'             { GCHead $3 }
     | tail '(' Expr ')'             { GCTail $3 }
     | int                           { GCInt $1 }
     | var '=' Expr                  { GCAssign $1 $3 }
     | '-' Expr %prec NEG            { GCNeg $2 }
     | var                           { GCVar $1 }
     | str                           { GCStr $1 }
     | True                          { GCTrue }
     | False                         { GCFalse }
     | Expr '+' Expr                 { GCPlus $1 $3}
     | Expr '-' Expr                 { GCSub  $1 $3}
     | Expr '/' Expr                 { GCDiv  $1 $3}
     | Expr '*' Expr                 { GCMult $1 $3}
     | Expr EQ Expr                  { GCEq $1 $3}
     | Expr NEQ Expr                 { GCNEq $1 $3}
     | Expr OR  Expr                 { GCOr $1 $3 }
     | Expr AND Expr                 { GCAnd $1 $3 }
     | Expr '<' Expr                 { GCLess $1 $3 }
     | Expr '>' Expr                 { GCMore $1 $3 }
     | '[' ExprList ']'              { $2 }
     | Expr ':' Expr                 { GCCons $1 $3}
     | Expr APPEND Expr              { GCAppend $1 $3 }
     | '(' Expr ')'                  { $2 }

ExprList : {- empty -}               { GCEList }
         | Expr                      { GCCons $1 GCEList }
         | Expr ',' ExprList         { GCCons $1 $3 }
Code Snippet 1: The Happy Grammar for the Gnome Compsci Programming Language

The Execution Model (Interpreter)

We used concepts from a CESK machine with small-step semantics and introduced some additional run loops and monadic behaviour to deal with IO.

Here is a worked example of our execution model:

exp: (1 + 2) + 3  continuation:
exp: 1 + 2        continuation: Hole + 3
exp: 1            continuation: Hole + 2, Hole + 3
val: 1            continuation: Hole + 2, Hole + 3
exp: 2            continuation: 1 + Hole, Hole + 3
val: 2            continuation: 1 + Hole, Hole + 3
val: 3            continuation: Hole + 3
exp: 3            continuation: 3 + Hole
val: 3            continuation: 3 + Hole
val: 6            continuation:

The main problems we encountered during the implementation of this interpreter were to do with IO functions such as print. They posed a problem as they didn’t have a value in the language, they just did some action. We called these commands and created a null node for the parse tree to cheat the system and have a “value” to return when the commands were executed. We also had to convert each step in the evaluation to use the IO monad, just for the benefit of a few commands that used IO features like putStrLn.

Extensions

For additional marks we could choose to implement a few extensions. As we are both Emacs users we were interested to explore the idea of major-mode for our language. After some initial research we settled on creating a generic mode for Emacs as our language was quite simple and the syntax could be simply captured by the in-built define-generic-mode function. We also wrote a syntax highlighting lexer for Pygments which allowed us to highlight code snippets in our LaTeX report.

Results and Conclusions

Results

We scored 80% for this assignment, having passed 90/94 tests with programs written in our language.

Conclusions

When completing the programming tasks in our language it became clear that including function declaration and function calling would have made the final programs more succinct. It was also clear that the style of programming in our language was not always very intuitive, but we always got there in the end.

It was a very useful and informative insight into the programming languages we use regularly. Looking in-depth at the execution model helped me to understand why certain things are the way they are. Finally the challenge of creating the language was extremely exciting and programming language design is an area I would be more than happy to explore further in the future.