I found this post in my drafts that I forgot to publish, as a sequel to this earlier post – Scanning annotations in class path. Though now probably out-of-date, I thought I’d still record it as a part of my experience so that I can refer to it anytime I like. If it benefits you, it will be a positive side-effect. The DSL used various transformation functions which closely resembled MongoDB’s aggregation-projection framework functions, like the Arithmetic operations for add, subtract, etc… and the String operations like concat, toLower, etc…
# Custom DSL: # Transform a field by using a built-in Arithmetic Operators. # It takes in an input field and an expression containing built-in # arithmetic function and returns the result by setting it on the # output field. # # syntax: # db.things.transform("outputFieldName", "{ $add: ["$age", 1] }") ##################################################################
The user will use the above DSL to apply the transformation to the totalAmount
. Say, we are offering 20% discount on totalAmount
, then we’d expect the user to say something like –
db.orders.transform('totalAmount', '{ $subtract: ["$totalAmount", { $multiply: ["$totalAmount", 0.2] } ] }')
If you look at the second argument of the transform, the user expresses the discount on totalAmount
as series of arithmetic operations to be carried out on totalAmount
field and then save the result back into totalAmount
field. We receive the above arguments from user as Json Strings and I need to parse this string Json to domain model. This domain model would then be told to evaluate at runtime to apply the transformation to every document in the orders collection.
So, it is important to see, how the string representation can closely represent the domain model without any impedance mismatch. So, how do I do this? Well, to start out, I view this as
Subtract(Field("totalAmount"), Multiply((Field("totalAmount"), Literal(0.2)))
To generalize the above, I could represent it as
FunctionExpression(FieldExpression("someField"), FunctionExpression(FieldExpression("someField"), ConstantExpression(someValue))
So each of them is really an expression, i.e, either a terminal expression like FieldExpression
or ConstantExpression
or a non-terminal Expression like SubtractFunction
or MultiplyFunction
, in general, let’s call the non-terminal function expression as FunctionExpression
that refers other Expression
s. To be more precise, it is the Interpreter Pattern
- Literal Expression for representing Constants
- Field Expression that gets a field value from a document
- Function Expression that operates on arguments that are themselves –
Expression
s.
Below is the domain model that can represent the above concepts
+––––––––––+ |Expression|<-–––––––––+ +–-–––-––-–+ | | | | | +––––+ | +––––––+ | | | | | +–––-–––+ +––-––+ +––––-–––+ | |Literal| |Field| |Function+––+ +–––––––+ +–––––+ +––––––––+
Here is the model expressed as code.
sealed trait Expression { def evaluate(document: BSONObject): Literal } final case class Literal(val value: Any) extends Expression { def evaluate(document: BSONObject) = this } final case class Field(name: String) extends Expression { def evaluate(document: BSONObject) = document(name) match { case Some(value) => Literal(value) case None => Literal(null) } } sealed abstract class Function(expressions: Expression*) extends Expression
Furthering the discussion, I can now represent Arithmetic functions, String functions and their sub-types like
+--------+ |Function| +--------+ ________________/ | \_________________ / | \ +------------------+ +--------------+ +------------+ _________|ArithmeticFunction| |StringFunction| |DateFunction| / +------------------+ +--------------+ +------------+ / ______/ | \___ \_____ | |____ \_______ / / | \ \ | \ \ +---+ +--------+ +------+ +--------+ +---+ +-------+ +-------+ +------+ |Add| |Multiply| |Divide| |Subtract| |Mod| |ToLower| |ToUpper| |Concat| +---+ +--------+ +------+ +--------+ +---+ +-------+ +-------+ +------+
Translating the above to code.
abstract class ArithmeticFunction(expressions: Expression*) extends Function(expressions: _*) { def value(literal: Literal): Double = literal match { case Literal(null) => 0 case Literal(x) => x.toString.toDouble } } final case class Add(expressions: Expression*) extends ArithmeticFunction(expressions: _*) { def evaluate(document: BSONObject) = { val result = expressions.foldLeft(0d) { (sum, expression) => sum + value(expression.evaluate(document)) } Literal(result) } } final case class Subtract(minuend: Expression, subtrahend: Expression) extends ArithmeticFunction(minuend, subtrahend) { def evaluate(document: BSONObject) = { val left = value(minuend.evaluate(document)) val right = value(subtrahend.evaluate(document)) Literal(left - right) } }
On similar lines String aggregation functions can also be represented as.
abstract class StringFunction(expressions: Expression*) extends Function(expressions: _*) { def value(literal: Literal): String = literal match { case Literal(null) => "" case Literal(x) => x.toString } } final case class Concat(expressions: Expression*) extends StringFunction(expressions: _*) { def evaluate(document: BSONObject) = { val result = expressions.foldLeft("") { (concatenated, expression) => concatenated + value(expression.evaluate(document)) } Literal(result) } } final case class ToLower(expression: Expression) extends StringFunction(expression) { def evaluate(document: BSONObject) = { val string = value(expression.evaluate(document)) Literal(string.toLowerCase) } } final case class Length(expression: Expression) extends StringFunction(expression) { def evaluate(document: BSONObject) = { val string = value(expression.evaluate(document)) Literal(string.length) }
So, the next step for me is to parse the above and convert it to the above domain model. One of the ways to achieve this is to parse and construct Expression tree out of this manually, but Scala offers Parser Combinators which do the heavy lifting for you. This is where functional programming paradigm really shines. It attacks this problem by breaking stuff in to smaller pieces, that is, using composition to control complexity.
Lets say, if you were to do this manually, you would write a parser that has parse method that may look something like –
def parse(jsonExpression: String) : Expression = { //parses the string and returns an Expression tree. }
This in my opinion, is a monolithic parser and a very big-step to take. Instead how about breaking this parse method into number of smaller parse methods that only caters to a small bit of the input and then compose these small parse methods to form a big parse method, returning ParseResult?
def parse(input: String) : ParseResult = { //parses the string and returns an Expression tree. }
Further, each small parse method does what it can –
- It is either able to parse and return the result
- if it cannot do anything sensible with that input, it either fails or passes that input to the next parse method down the chain.
So the parser either succeeds or fails. As we are in functional programming paradigm, we do not throw exception to signal failure and return type for success, instead let us introduce abstractions that represent success and failure.
sealed abstract class ParseResult[+T] case class Success[T](result: T, in: Input) extends ParseResult[T] case class Failure(msg: String, in: Input) extends ParseResult[Nothing]
We can understand that the Success will wrap the result of parsing and failure will wrap the parse error message, but why are both taking in Input? Well, it turns out that we need to analyze a part of the input and return the remaining part in result for the next parser in chain to work upon it.
This is all good, we looked at parsing, but we have not combined parsers yet, after all Parser Combinators = Parser + Combinators.
We need to use combinators and combine these small parsers to create a complex parser. In reality this is functional composition at the rescue, we use higher order function that takes in 2 parsers and applies transformation to generate a complex parser. Lets create one such combinator
def combinator(parser1: Input => ParseResult, parser2: Input => ParseResult): Input => ParseResult = { ... }
In Scala, the class Parser
, like class Map
is both, a function and a class.
abstract class Parser[+T] extends (Input => ParseResult[T]) { def apply(in: Input): ParseResult[T] }
So the above combinator now becomes
def combinator[T, U, R](parser1: Parser[T], parser2: Parser[U]): Parser[R] = { ... }
But as the combinator method would be present on the Parser[T] itself, the correct signature would be
abstract class Parser[+T] extends (Input => ParseResult[T]) { def apply(in: Input): ParseResult[T] def combinator[U, R](parser: Parser[U]): Parser[R] = { ... } }
Well, in reality, this combinator method does not exist on Scala Parser class, instead you will find many combinators, for example you can string parsers together with sequential composition operator ~ or use the | combinator for alternative composition.
If you take a look at the Scala’s Parser class, you will find more interesting operations like flatMap and map on the parser, this tells us that, in Scala, a Parser is a Monad.
abstract class Parser[+T] extends (Input => ParseResult[T]) { def apply(in: Input): ParseResult[T] def flatMap[U](f: T => Parser[U]): Parser[U] = Parser{ in => this(in) flatMapWithNext(f)} def map[U](f: T => U): Parser[U] = Parser{ in => this(in) map(f)} def ^^ [U](f: T => U): Parser[U] = map(f).named(toString+"^^") }
Next, look at Parser Input. Best is to read Martin’s Odersky’s book – Programming in Scala pp744 section on Parser input. But just to inline some of it here –
… parser reads streams of tokens instead of a raw sequence of characters. A separate lexical analyzer is then used to convert a stream of raw characters into a stream of tokens…
type Input = Reader[Elem]The class Reader is similar to a Stream, but it also keeps track of the positions of all the elements it reads. The type Elem represents individual input elements. It is an abstract member of the Parsers trait:
type ElemThis means that subclasses and subtraits of Parsers need to instantiate class Elem to the type of input elements that are being parsed. For instance, RegexParsers and JavaTokenParsers fix Elem to be equal to Char…
So the next step is to create your own parsers from scratch or try the parsers available in JavaTokenParsers
to see how they work. I’ll leave that as an exercise for the reader.
Next, I need to now create small parsers and eventually combine them into a complex parser that parses the above expression and builds an Expression
tree. Before you create these small parsers, the approach is to express the above in the Backus–Naur Form and then use Scala’s DSL . So I ended up creating the BNF that represents the above expressions
BNF: value ::= obj | floatingPointNumber | "null" | "true" | "false" | quotedField | singleQuotedField | quotedStringLiteral | singleQuotedStringLiteral. obj ::= "{" function "}". fn ::= fnName ":" fnArgs. fnArgs ::= "[" values "]" | value. values ::= value { "," value }.
Other examples of the above BNF would be:
// 1. Concat first name and last name fields using delimiter - ", " // 2. Calculate length of the salutation field and the concatenated first and last names """{ $length: [ "$salutation", { $concat : ["$fname", ", ", "$lname" ] } ] }"""
Now, let us implement the above BNF
def value: Parser[Expression] = ( obj | floatingPointNumber ^^ (s => Literal(s.toDouble)) | "null" ^^ (s => Literal(null)) | "true" ^^ (s => Literal(true)) | "false" ^^ (s => Literal(false)) | quotedField | singleQuotedField | quotedStringLiteral | singleQuotedStringLiteral ) def field: Parser[String] = """([a-zA-Z_]\w*([\.][a-zA-Z_0-9]\w*)*)""".r def quotedField: Parser[Expression] = "\"$" ~> field <~ "\"" ^^ (Field(_)) def singleQuotedField: Parser[Expression] = "\'$" ~> field <~ "\'" ^^ (Field(_)) def quotedStringLiteral: Parser[Expression] = ("\"" + """([^"\p{Cntrl}\\\$\.]|\\[\\'"bfnrt]|\\u[a-fA-F0-9]{4})*""" + "\"").r ^^ (s => Literal(s.replaceAllLiterally("\"", ""))) def singleQuotedStringLiteral: Parser[Expression] = ("\'" + """([^'\p{Cntrl}\\\$\.]|\\[\\'"bfnrt]|\\u[a-fA-F0-9]{4})*""" + "\'").r ^^ (s => Literal(s.replaceAllLiterally("\'", ""))) def obj: Parser[Expression] = "{"~> fn <~"}" def fnArgs: Parser[List[Expression]] = "["~> repsep(value, ",") <~"]" | (value ^^ (List(_))) def fnName: Parser[String] = "$"~> """[a-zA-Z_]\w*""".r def fn: Parser[Expression] = fnName~":"~fnArgs ^^ { case "add"~":"~args => Add(args: _*) case "subtract"~":"~args => Subtract(args: _*) case "multiply"~":"~args => Multiply(args: _*) case "divide"~":"~args => Divide(args: _*) case "concat"~":"~args => Concat(args: _*) case "length"~":"~args => Length(args: _*) } def parse(input: String): Expression = parseAll(obj, input) match { case Success(value, _) => value case NoSuccess(message, _) => throw new IllegalArgumentException(s"Parsing Failed: $message") } }
Now, the above highlighted lines will keep on growing as functions grow. How do I confirm the fn
function code to OCP? For that, you may want to look at this post
Next, you may ask, is it possible to TDD this parser? Well, the answer is Yes! In fact, it turns out that is type of evolution of parsers is very much aligned with the spirit of TDD, where we build a little, get the feedback, and evolve a testable design by employing principle of composition. I find this at perfect harmony. We are using Specs2 and here are the Parser Specifications.
@RunWith(classOf[JUnitRunner]) class ParserSpecs extends Specification { trait ExpressionParser extends Parser with Scope { def Result[T](result: ParseResult[T]): T = result match { case Success(value, _) => value case NoSuccess(message, _) => throw new IllegalArgumentException(s"Parsing Failed Message: $message") } } "Parse" should { "return Literal for null" in new ExpressionParser { //Given val input = "null" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Literal(null) } "return Literal for true" in new ExpressionParser { //Given val input = "true" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Literal(true) } "return Literal for false" in new ExpressionParser { //Given val input = "false" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Literal(false) } "return Literal for Integer" in new ExpressionParser { //Given val input = "2" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Literal(2) } "return Literal for Decimal" in new ExpressionParser { //Given val input = "2.4" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Literal(2.4) } "return Field for fieldName" in new ExpressionParser { //Given val input = """"$age"""" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Field("age") } "fail to parse field that has trailing dot" in new ExpressionParser { //Given val input = """"$address."""" //When-Then Result(parseAll(value, input)) must throwA[IllegalArgumentException] } "fail to parse field that has extra dot between levels" in new ExpressionParser { //Given val input = """"$address..line"""" //When-Then Result(parseAll(value, input)) must throwA[IllegalArgumentException] } "fail to parse field begins with a dot" in new ExpressionParser { //Given val input = """"$.address"""" //When-Then Result(parseAll(value, input)) must throwA[IllegalArgumentException] } "fail to parse field that is prefixed with multiple $" in new ExpressionParser { //Given val input = """"$address"""" //When-Then Result(parseAll(value, input)) must throwA[IllegalArgumentException] } "fail to parse field without a name" in new ExpressionParser { //Given val input = "$" //When-Then Result(parseAll(value, input)) must throwA[IllegalArgumentException] } "return Literal for string" in new ExpressionParser { //Given val input = """"age"""" //When val expression = Result(parseAll(value, input)) //Then expression mustEqual Literal("age") } "return function name" in new ExpressionParser { //Given val funcName = "function" val input = "$" + funcName //When val functionName = Result(parseAll(fnName, input)) //Then funcName mustEqual functionName } "fail to parse function name containing reserved prefix $" in new ExpressionParser { //Given val funcName = "$function" val input = "$" + funcName //When-Then Result(parseAll(fnName, input)) must throwA[IllegalArgumentException] } "fail to parse function without a name" in new ExpressionParser { //Given val input = "$" //When-Then Result(parseAll(fnName, input)) must throwA[IllegalArgumentException] } "return empty args" in new ExpressionParser { //Given val input = "[]" //When val args = Result(parseAll(fnArgs, input)) //Then args must beEmpty } "return Number argument" in new ExpressionParser { //Given val input = "[1]" //When val args = Result(parseAll(fnArgs, input)) //Then args mustEqual List(Literal(1)) } "return Number and String args" in new ExpressionParser { //Given val input = """[1, "age"]""" //When val args = Result(parseAll(fnArgs, input)) //Then args mustEqual List(Literal(1), Literal("age")) } "return Number, String and Field args" in new ExpressionParser { //Given val input = """[1, "age", "$name"]""" //When val args = Result(parseAll(fnArgs, input)) //Then args mustEqual List(Literal(1), Literal("age"), Field("name")) } "return Function arg" in new ExpressionParser { //Given val input = """[{ $add: [1, "$age"]}]""" //When val args = Result(parseAll(fnArgs, input)) //Then args mustEqual List(Add(Literal(1), Field("age"))) } "fail to parse incomplete args" in new ExpressionParser { //Given val input = "[1" //When-Then Result(parseAll(fnArgs, input)) must throwA[IllegalArgumentException] } "fail to parse ill-formed args" in new ExpressionParser { //Given val input = "[1]]" //When-Then Result(parseAll(fnArgs, input)) must throwA[IllegalArgumentException] } "return Add function" in new ExpressionParser { //Given val input = """$add: [1, "$age"]""" //When val add = Result(parseAll(fn, input)) //Then add mustEqual Add(Literal(1), Field("age")) } "return empty Add function" in new ExpressionParser { //Given val input = """$add: []""" //When val add = Result(parseAll(fn, input)) //Then add mustEqual Add() } "return recursive add function" in new ExpressionParser { //Given val input = """$add: [1, { $add: [2, "$age"]}]""" //When val add = Result(parseAll(fn, input)) //Then add mustEqual Add(Literal(1), Add(Literal(2), Field("age"))) } "return Multiply function" in new ExpressionParser { //Given val input = """$multiply: [1, "$age"]""" //When val multiply = Result(parseAll(fn, input)) //Then multiply mustEqual Multiply(Literal(1), Field("age")) } "return empty multiply function" in new ExpressionParser { //Given val input = """$multiply: []""" //When val multiply = Result(parseAll(fn, input)) //Then multiply mustEqual Multiply() } "return recursive multiply function" in new ExpressionParser { //Given val input = """$multiply: [1, { $multiply: [2, "$age"]}]""" //When val multiply = Result(parseAll(fn, input)) //Then multiply mustEqual Multiply(Literal(1), Multiply(Literal(2), Field("age"))) } "return concat function" in new ExpressionParser { //Given val input = """$concat: [1, "$age"]""" //When val concat = Result(parseAll(fn, input)) //Then concat mustEqual Concat(Literal(1), Field("age")) } "return empty concat function" in new ExpressionParser { //Given val input = """$concat: []""" //When val concat = Result(parseAll(fn, input)) //Then concat mustEqual Concat() } "return recursive concat function" in new ExpressionParser { //Given val input = """$concat: [1, { $concat: [2, "$age"]}]""" //When val concat = Result(parseAll(fn, input)) //Then concat mustEqual Concat(Literal(1), Concat(Literal(2), Field("age"))) } "allow single function at top-level within obj" in new ExpressionParser { //Given val input = """{ $add: [1, { $multiply: [2, "$age"]}] }""" //When val objExpr = Result(parseAll(obj, input)) //Then objExpr mustEqual Add(Literal(1), Multiply(Literal(2), Field("age"))) } "fail when more than 1 function is defined at top-level in obj" in new ExpressionParser { //Given val input = """{ $add: [1, { $multiply: [2, "$age"]}], $multiply: [] }""" //When-Then Result(parseAll(obj, input)) must throwA[IllegalArgumentException] } "return Subtract function" in new ExpressionParser { //Given val input = """$subtract: [1, "$age"]""" //When val subtract = Result(parseAll(fn, input)) //Then subtract mustEqual Subtract(Literal(1), Field("age")) } "return empty Subtract function" in new ExpressionParser { //Given val input = """$subtract: []""" //When val subtract = Result(parseAll(fn, input)) //Then subtract mustEqual Subtract() } "return Subtract function with 1 argument" in new ExpressionParser { //Given val input = """$subtract: [1.0]""" //When val subtract = Result(parseAll(fn, input)) //Then subtract mustEqual Subtract(Literal(1d)) } "return recursive subtract function" in new ExpressionParser { //Given val input = """$subtract: [1, { $subtract: [2, "$age"]}]""" //When val subtract = Result(parseAll(fn, input)) //Then subtract mustEqual Subtract(Literal(1), Subtract(Literal(2), Field("age"))) } "return Divide function" in new ExpressionParser { //Given val input = """$divide: [1, "$age"]""" //When val divide = Result(parseAll(fn, input)) //Then divide mustEqual Divide(Literal(1), Field("age")) } "return empty Divide function" in new ExpressionParser { //Given val input = """$divide: []""" //When val divide = Result(parseAll(fn, input)) //Then divide mustEqual Divide() } "return Divide function with 1 argument" in new ExpressionParser { //Given val input = """$divide: [1.0]""" //When val divide = Result(parseAll(fn, input)) //Then divide mustEqual Divide(Literal(1d)) } "return recursive Divide function" in new ExpressionParser { //Given val input = """$divide: [1, { $divide: [2, "$age"]}]""" //When val divide = Result(parseAll(fn, input)) //Then divide mustEqual Divide(Literal(1), Divide(Literal(2), Field("age"))) } } }
I know this was a long post but I did not see it as a point to split this into multiple ones. So thank-you for your time and patience.