To give some context: we have created an embedded DSL for declaring the transformations that a user wants applied to a MongoDB document in order to arrive at the final, expected document. Here is an excerpt of such a DSL.

// Sample Delta file
use test
db.customers.add("{'city' : 'Please Set City', 'pin':  'Please Pin code' }")

//Increment age field by 1
db.customers.transform('age', '{ $add: ["$age", 1] }')

// Set context to transactions db
use transactions

// add to orders collection a nested document
db.orders.add('{"dispatch" : { "status" : "Default", "address": { "city": "someCity" }}}')

If you have used MongoDB and its shell, you will immediately recognise how closely this DSL resembles it. The tool that we are building is intended to be used by developers and DevOps folks. This particular DSL is dev-centric, so it makes sense for us to stay very close to the MongoDB lingo; this causes less friction in adoption, as there is virtually no learning curve. The objective of the DSL is to allow developers to specify the transformations they want to apply to the documents in a collection and get back the transformed documents.

The use keyword sets the context of the DB in which you issue commands. The transform and add commands specify the type of transformation to be applied to the customers and orders collections respectively. For example, transform above takes the age field and a MongoDB expression using the $add operator; it adds 1 to the age field in the document and stores the result back into the age field.
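To make that concrete, here is a hypothetical before/after for a single customer document (the field values are made up):

// a customer document before the deltas are applied
def before = [name: 'dhaval', age: 25]

// after add (defaults for the new fields) and transform ($add on age)
def after = [name: 'dhaval', age: 26, city: 'Please Set City', pin: 'Please Pin code']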

Also, the tool allows multiple files containing such transformations; we call these files delta files. Obviously, I'd need to read all these files, parse them and create a data structure that the tool itself can consume to make sense of the information stored in them. So the question before me was: how do I achieve this using Groovy? Before I describe how I approached it using Groovy's Meta-Object Protocol, let's take a detour and understand some concepts elaborated by Terence Parr in his book – Language Implementation Patterns (thanks to my friend Aslam Khan for pointing this book out to me). In it, Terence gives an excellent big picture of how to implement this. He breaks the whole process down into a multi-stage pipeline (I have modified the diagram below to suit my context, while preserving the spirit) that analyzes and manipulates the input stream. The output of this pipeline is a useful internal data structure, which he calls the Intermediate Representation (IR). It is this IR that the Generators consume to produce output.

                             T R A N S L A T O R
     +------------------------------------------------------------------+
     |                                                                  |
     +                                                                  +
        +------------+          +------------+          +------------+
        |            |          |            |          |            |
input ->|   Reader   |--> IR -->|  Analyzer  |--> IR -->| Generator  |--> output
        |            |          |            |          |            |
        +------------+          +------------+          +------------+
     +                                          +
     |                                          |
     +------------------------------------------+
                I N T E R P R E T E R

                    M U L T I S T A G E    P I P E L I N E
  • It is the responsibility of the Reader to build this data structure, the Intermediate Representation, from the input stream. It is really concerned with the Syntax of the input and ensures that it is honoured.
  • The job of the Analyzer is to perform Semantic Analysis, that is, to figure out what the input means; as Terence Parr puts it – “anything beyond Syntax is called the Semantics”.
  • The job of the Generator is to consume this IR and produce output; it does so by walking the IR.
  • The Translator is an entity composed out of the above three; it does the overall translation of the input, whether text or binary, converting it into another consumable form. Translating Markdown to HTML could be one example.
  • Finally, the Interpreter is composed out of the Reader and the Analyzer; it reads, decodes and executes instructions, much like the Java interpreter.

For me, the above was very helpful in breaking a big problem down into bite-sized problems. So, jumping straight into the implementation: the Reader reads in the delta files and hands each one to the Parser, and it is the Parser's responsibility to build the IR for consumption by the Generator. The Translator is responsible for the entire translation job and does so by delegating appropriately to the Reader and the Generator. So here is some code to help grasp this.

public class Translator<T> {
  private final Reader reader
  private final Generator<T> generator

  public Translator(Reader reader, Generator<T> generator) {
    this.reader = reader
    this.generator = generator
  }

  public T translate(final TransformType transformType, final List<File> deltaFiles) {
    def tree = reader.read(deltaFiles)
    generator.generate(transformType, tree)
  }
}

In the above code the Reader produces a Tree, the Intermediate Representation (IR), which is consumed by the Generator. The Generator is just a Strategy, so I can have a variety of them. In our case, the concrete implementation is a ScalaGenerator, which produces lambda expressions for each of the transforms specified in the delta files. So the Generator walks the Tree and produces a converted output.

public interface Generator<T> {
  T generate(TransformType transformType, Tree representation)
}
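
Since the Generator is a Strategy, any implementation that can walk the Tree will do. Here is a toy one, purely illustrative and not part of the tool, that merely counts the transforms using the eachWithVersionedMap walker shown later in this post:

class CountingGenerator implements Generator<Integer> {
  @Override
  Integer generate(TransformType transformType, Tree representation) {
    int count = 0
    // the Tree hands us (dbName, collectionName, versionedMap) for each collection
    representation.eachWithVersionedMap(transformType) { db, collection, versionedMap ->
      count += versionedMap.size()
    }
    count
  }
}

Wiring it into the Translator is then a one-liner (the deltas directory name is made up):

def total = new Translator<Integer>(new Reader(), new CountingGenerator())
    .translate(TransformType.EXPANSION, new File('deltas').listFiles().toList())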

Here is the Reader, which consumes the delta files given to it. For each delta file, I create a new GroovyShell and tell it to evaluate the code (the delta file text). The result of the shell evaluation is a closure, which is passed to my parser. By this point the parser receives nodes where GStrings are already converted to String and use is already converted to the using method name. Please see my earlier post, where I explain in depth how this is achieved.

@Slf4j
public class Reader {
  private GroovyShell createNewShell() {
    ...
    ...
    def configuration = new CompilerConfiguration()
    configuration.addCompilationCustomizers(secureCustomizer)
    new GroovyShell(configuration)
  }

  public Tree read(final List deltaFiles) {
    def parser = new Parser()
    deltaFiles.each { deltaFile ->
      def deltaFileAbsoluteName = deltaFile.absolutePath
      log.info("Reading $deltaFileAbsoluteName")
      def dsl = deltaFile.text
      def code = """{-> $dsl}"""
      //shell evaluates once, hence create new each time
      def shell = createNewShell()
      def delta = shell.evaluate(code, deltaFileAbsoluteName)
      try {
        use (FileExtension) {
          parser.parse(deltaFile.changeSet(), delta)
        }
      } catch (Throwable t) {
        throw new InvalidGrammar("$deltaFileAbsoluteName --> ${t.message}")
      }
      shell = null
    }
    parser.ast()
  }
}
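
A side note on the use (FileExtension) block above: that is a Groovy category, so changeSet() is not a method on java.io.File; it is mixed in for the duration of the block. A minimal sketch of such a category, assuming (purely hypothetically) that the change set number is encoded as leading digits in the delta file name:

class FileExtension {
  // adds changeSet() to File inside a use(FileExtension) block;
  // the file-naming convention (e.g. '001_addCityToCustomers.delta') is an assumption
  static Long changeSet(File self) {
    def matcher = self.name =~ /^(\d+)/
    matcher.find() ? matcher.group(1).toLong() : 0L
  }
}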

Now the real fun starts. I know it took us quite a while to get here, but in my view it was important to see the overall big picture and how every element contributes to the final output in the scheme of things. Let us now look at the Parser.

@Slf4j
class Parser {
  private Tree tree = new Tree()

  @CompileStatic
  def getProperty(String name) {
    log.debug("property name is: $name")
    if(name == 'db') {
      return tree.currentDB()
    }
    tree.using(name)
  }

  def using(db) {
    log.info "Changing db context $db"
  }

  public Tree parse(Long changeSet, Closure closure) {
    tree.updateCS(changeSet)
    def cloned = closure.clone()
    cloned.delegate = this
    cloned.resolveStrategy = Closure.DELEGATE_FIRST
    cloned()
    tree
  }

  def ast() {
    tree
  }
}

When the Parser's parse method starts its work, the first line it encounters is use test. This invokes the using(db) method on the Parser class. But before Groovy can invoke the using method, it needs to resolve its argument, the property test. As it does not find a test property anywhere on the Parser class, it invokes the getProperty method. getProperty is part of the Groovy MOP: by default, when a property is not found, Groovy throws a MissingPropertyException; but because we have provided a getProperty implementation, Groovy invokes it first and gives it a chance to handle the property. In the body of getProperty, I check whether the name of the property is 'db'. If it is, the Tree is told to return the current database, tree.currentDB(). If it is not 'db', I know that the user wants to either:

  • Create a new database (in our case – test)
  • Or use an existing database (test – in the example above)

The above behavior is similar to what one would see on the MongoDB shell. The Tree is told to do this by invoking tree.using(db) on it; the Tree's using(String dbName) is self-explanatory, implementing the above two bullet points.
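
To see the resolution end to end, here is a sketch (not from the tool) of driving the Parser directly, mimicking what the Reader does; note that in a real delta file you write use, which the compilation customizer from my earlier post rewrites to using:

def parser = new Parser()
def tree = parser.parse(1L) {->
  using test                              // 'test' resolves via Parser.getProperty
  db.customers.add("{'city' : 'Pune'}")   // 'db' resolves to the current Database
}

Below is the Tree object.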

@Slf4j
public class Tree {
  @Delegate
  private final Context ctx = new Context()
  private final Map<String, Database> databases = [:]

  @CompileStatic
  def using(String dbName) {
    def database = createOrGetDatabase(dbName)
    updateDB(database)
    database
  }

  @CompileStatic
  private Database createOrGetDatabase(String name) {
    if (databases.containsKey(name)) {
      log.info("Using database $name")
      databases[name]
    } else {
      log.info("Creating Database $name")
      databases[name] = new Database(name, ctx)
    }
  }

  @CompileStatic
  def eachWithVersionedMap(TransformType transformType, Closure closure) {
    databases.each { name, Database database ->
      database.eachWithVersionedMap(transformType, closure)
    }
  }
}
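
A quick illustrative sketch of the Tree's behavior:

def tree = new Tree()
def db = tree.using('test')
assert db.is(tree.using('test'))   // created once, then reused
assert tree.currentDB().is(db)     // @Delegate exposes the Context's currentDB()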

While itself acting as an Aggregate Root for the Intermediate Representation, the Tree also holds the Context. What is this Context all about? Well, the Context is just an object that holds the database on which the transformations are being applied, and hence it is DatabaseAware; it also needs to know which ChangeSet a transformation belongs to, and hence it is ChangeSetAware as well. Take a look below.

class Context implements ChangeSetAware, DatabaseAware {
  private Long cs = 0
  private Database db

  @Override
  def updateCS(Long cs) {
    this.cs = cs
  }

  @Override
  def resetCS() {
    cs = 0
  }

  @Override
  def currentCS() {
    cs
  }

  @Override
  def currentDB() {
    db
  }

  @Override
  def updateDB(Database db) {
    this.db = db
  }
}

Now, for the next bit: once the Database object is returned (either existing or fresh), the collection property, customers in our example, gets resolved on it. So we are at


   db.customers.add("{'city' : 'Please Set City', 'pin':  'Please Pin code' }")

in our journey. Again, this needs to be tackled in a fashion similar to how use was; that is, I don't know beforehand which collection the user wants the transformation applied to. To make this happen, I again resort to the Groovy MOP and provide a getProperty implementation. If the collection customers existed before, I return it; else I create a new Collection object. Take a look below.

@ToString
@Slf4j
class Database {
  final String name
  private final Map collections = [:]
  private final Context ctx

  @CompileStatic
  Database(String name, Context ctx) {
    this.name = name
    this.ctx = ctx
  }

  @CompileStatic
  def getProperty(String name) {
    if(collections.containsKey(name)) {
        log.debug("Using Collection with $name")
        collections[name]
    } else {
        log.info("Creating Collection $name")
        collections[name] = new Collection(name, ctx)
    }
  }

  @CompileStatic
  def eachWithVersionedMap(TransformType transformType, Closure closure) {
     def dbName = this.name
     collections.each { String name, Collection collection ->
        closure(dbName, name, collection.asVersionedMap(transformType))
     }
  }

  String toString() {
    "${getClass().simpleName}: $name $collections"
  }
}
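
Again, a quick illustrative sketch of that getProperty behavior:

def db = new Database('test', new Context())
def customers = db.customers        // no such property, so getProperty synthesizes the Collection
assert customers.is(db.customers)   // subsequent access reuses the same instance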

Now for the final bit: the add method that users write in the DSL. In the journey of DSL statement resolution, we are at


   db.customers.add("{'city' : 'Please Set City', 'pin':  'Please Pin code' }")

Again, the Groovy MOP comes to the rescue. No such method is present on the Collection object, and I want to keep the DSL open to future extensions, so that I can accommodate more command Verbs later. invokeMethod is available on any GroovyObject and you can do a lot of stuff with it: one use case is to implement method interception, AOP-style; another is to synthesize methods that never existed on the class before. Here I am using it neither to synthesize methods nor as AOP, but to record the series of transformations that the DSL writer specified for that collection. So, when Groovy encounters


add("{'city' : 'Please Set City', 'pin':  'Please Pin code' }")

on the customers collection, Groovy tries to locate that method. But as the method is not present, it calls invokeMethod. Please take a look at the Using invokeMethod and getProperty documentation for details.
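
To see the mechanism in isolation, here is a tiny, self-contained example (not from the tool) of invokeMethod catching calls to methods that do not exist:

class Greeter {
  def invokeMethod(String name, args) {
    // synthesize a response for any say* method instead of failing
    if (name.startsWith('say')) {
      return "${name - 'say'} ${args.join(' ')}"
    }
    throw new MissingMethodException(name, Greeter, args as Object[])
  }
}

assert new Greeter().sayHello('Groovy') == 'Hello Groovy'

And here is the real Collection: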

@Slf4j
class Collection {
  final String name
  private final Map<Double, Tuple> versionedExpansions = [:] as LinkedHashMap
  private final Map<Double, Tuple> versionedContractions = [:] as LinkedHashMap
  private Double curExpansionVersion = 1
  private Double curContractionVersion = 1
  private final Context ctx

  Collection(String name, Context ctx) {
    this.name = name
    this.ctx = ctx
  }

  def invokeMethod(String name, args) {
      log.info("${this.name} invokeMethod: Operation $name with $args")

      Verb verb = asVerb(name)
      def parameters = args? args as List<String> : []
      verb.validate(parameters)
      def changeSet = ctx.currentCS()
      if (verb.isExpansion()) {
         log.info("${this.name} Adding Expansion $verb with $args to changeSet $changeSet")
         versionedExpansions[curExpansionVersion++] = new Tuple(verb, args, changeSet)
         return
      }
      if (verb.isContraction()) {
        log.info("${this.name} Adding Contraction $verb with $args to changeSet $changeSet")
        versionedContractions[curContractionVersion++] = new Tuple(verb, args, changeSet)
        return
      }
  }

  @CompileStatic
  private Verb asVerb(String token) {
    try {
       Verb.valueOf(token)
    } catch (IllegalArgumentException iae) {
      throw new InvalidGrammar("Sorry!! Midas Compiler doesn't understand $token")
    }
  }

  @CompileStatic
  def asVersionedMap(TransformType transformType) {
    Map<Double, Tuple> versionedTransforms = null
    if(transformType == EXPANSION) {
      versionedTransforms = versionedExpansions
    }
    if(transformType == CONTRACTION) {
      versionedTransforms = versionedContractions
    }
    versionedTransforms
  }
}

In invokeMethod, I convert the method name to a Verb object and tell the verb to validate its parameters, both their count and each parameter's type. Based on how the Verb is classified, that is, either as an Expansion or a Contraction type verb, I put it into the appropriate map in the Collection object, along with the changeSet and the expansionVersion or contractionVersion, as deemed by the TransformType. It is in this invokeMethod that each Verb is analyzed for its meaning, so the Collection object also doubles up as a Semantic Analyzer.

So, the domain model for the Tree looks like the diagram below, pretty close to MongoDB's own model, wherein a server contains several databases and each database contains several collections.


 +------+      +----------+      +------------+      +------+
 | Tree |----->| Database |----->| Collection |----->| Verb |
 +------+ 1  * +----------+ 1  * +------------+ 1  * +------+

Now, to allow any client, in our case the Generator object, to walk the Tree, I have provided eachWithVersionedMap(TransformType transformType, Closure closure). It takes in a closure (allowing the client to do whatever it needs) and performs internal iteration without breaking encapsulation, that is, without leaking internal structures like the Database, Collection and Verb objects to the outside world.
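
A sketch of a client walking the IR through that method; each Tuple carries the verb, its args and the changeSet, exactly as recorded in invokeMethod above:

tree.eachWithVersionedMap(TransformType.EXPANSION) { db, collection, versionedMap ->
  versionedMap.each { version, tuple ->
    def (verb, args, changeSet) = tuple
    println "$db.$collection v$version (changeSet $changeSet): $verb $args"
  }
}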

For the curious, here is what the Verb looks like.

public enum Verb {
  @Expansion @ArgsSpecs(ArgType.JSON)
  add,

  @Expansion @ArgsSpecs({ ArgType.Identifier, ArgType.Identifier })
  copy,

  @Expansion @ArgsSpecs({ ArgType.Identifier, ArgType.String, ArgType.JSON })
  split,

  @Expansion @ArgsSpecs({ ArgType.JSON, ArgType.String, ArgType.Identifier })
  merge,

  @Expansion @ArgsSpecs({ ArgType.Identifier, ArgType.JSON })
  transform,

  @Contraction @ArgsSpecs(ArgType.JSON)
  remove;

  private Annotation getAnnotation(final Class<? extends Annotation> annotationClass) {
    try {
      return Verb.class
        .getField(name())
        .getAnnotation(annotationClass);
    } catch (NoSuchFieldException e) {
      return null;
    }
  }

  public boolean isExpansion() {
    return getAnnotation(Expansion.class) != null;
  }

  public boolean isContraction() {
    return getAnnotation(Contraction.class) != null;
  }

  public void validate(final List<String> args) {
    ArgsSpecs annotation = (ArgsSpecs) getAnnotation(ArgsSpecs.class);
    if (annotation == null) {
      throw new InvalidGrammar("You seem to have forgotten @ArgsSpecs on verb " + name());
    }
    ArgType[] types = annotation.value();
    validateArgsLength(args, types);
    validateArgsValues(args, types);
  }

  private void validateArgsValues(final List<String> args,
       final ArgType[] types) {
    for (int index = 0; index < types.length; index++) {
      types[index].validate(args.get(index));
    }
  }

  private void validateArgsLength(final List<String> args, final ArgType[] types) {
    if (types.length != args.size()) {
      final String errMsg = "Wrong number of arguments supplied for %s, Required %d, Found %d";
      throw new InvalidGrammar(String.format(errMsg,
               name(),
               types.length,
               args.size()));
    }
  }
}
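
And a final sketch of the Verb in action (assuming ArgType.JSON accepts the literal below):

def verb = Verb.valueOf('add')        // this is what Collection.asVerb does
assert verb.isExpansion()
verb.validate(["{'city' : 'Pune'}"])  // one JSON argument: passes

try {
  verb.validate([])                   // wrong arity
} catch (InvalidGrammar e) {
  assert e.message.startsWith('Wrong number of arguments')
}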

Phew, that was a long post... but I hope it was helpful.