To give some context on this problem: on my current project, I have created an embedded DSL that uses Groovy as the host language. This DSL closely resembles MongoDB lingo.

An example would be:

// Sample Delta file
use test
db.customers.add("{'city' : 'Please Set City', 'pin':  'Please Pin code' }")

//Increment age field by 1
db.customers.transform('age', "{ $add: ['$age', 1] }")

// Set context to transactions db
use transactions

// add to orders collection a nested document
db.orders.add('{"dispatch" : { "status" : "Default", "address": { "line1" : "Road", "city": "City" }}}')

Like the Mongo shell, I wanted to support command arguments wrapped in either a single-quoted or a double-quoted String, the same as in JavaScript, where you can use quotes inside a string as long as they don’t match the quotes surrounding it. When I tried to do that, I hit two problems right away:

  1. use is already taken: it is a DefaultGroovyMethod, the one Groovy Categories rely on for pimping your library, quite similar to implicit conversions in Scala and extension methods in C#. A bare use in the DSL therefore collides with it.
  2. Double-quoted strings passed as arguments to functions such as add and transform are GStrings in Groovy, which support string interpolation through the $ placeholder. As they say in the Groovy world, and you have probably heard it, “You need a $ in GString ;)”. Groovy evaluates the expression following the dollar sign and substitutes the result in its place in the output string. The embedded expressions are captured when the GString is created, and the final string is only built when toString() is called on it or when it is coerced to a String, for example when it is passed to a method that expects one. As you can see in the above example, $age will cause problems when the GString is evaluated: Groovy won’t know where to get the value of age and will throw a fit.
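
The difference between the two quoting styles can be seen in a few lines of plain Groovy. This is only a minimal sketch: the age variable is a hypothetical stand-in added so that the placeholder has something to resolve, and the last part evaluates the literal in a bare GroovyShell, which is an assumption about how the failure would show up rather than the exact behaviour inside the DSL.

```groovy
// Single-quoted literals are plain java.lang.String values; '$' has no special meaning.
def single = '{ $add: ["$age", 1] }'
assert single instanceof String
assert single.contains('$age')

// Double-quoted literals that contain an unescaped '$' become GStrings.
def age = 41   // hypothetical variable, present only so that $age can be resolved
def interpolated = "{ \$add: ['$age', 1] }"
assert interpolated instanceof GString
assert interpolated.toString() == "{ \$add: ['41', 1] }"

// With nothing to resolve the placeholders against, evaluation fails at runtime.
def failure = null
try {
  new GroovyShell().evaluate('"{ $add: [\'$age\', 1] }"')
} catch (MissingPropertyException e) {
  failure = e
}
assert failure != null
assert failure.property in ['add', 'age']
```

In other words, a DSL user who types double quotes gets interpolation whether they asked for it or not, which is why the fix has to happen before the script runs.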

Well, I could come up with a hack for the first problem. Let’s not use use and instead choose a different verb, say, using. But for the second problem, how would I stop the user from entering double-quoted strings in function arguments? Putting a caveat in the documentation would be reactive rather than proactive and would demand a disciplined user, so this one cannot be hacked. Both problems sounded like they had to be handled at the compiler level in some form or other. Here is how I solved it, much like killing two birds with one stone!

Groovy offers a way to visit the Abstract Syntax Tree (AST) and transform it. An AST is an intermediate representation that the compiler builds during compilation, and it is this AST that gets used to generate bytecode or another translation. Groovy provides a hook in the form of the ASTTransformation interface that allows us to add to or modify this tree during a specific compiler phase. A class that implements this interface must be annotated with @GroovyASTTransformation so that Groovy knows which compile phase to run it in. As I am dealing with a global AST transformation, the visit method is called once for the sourceUnit, i.e. the actual source code, and I’ll ignore the first and the second entries in the ASTNode[] array. Here is my ASTTransformation code.

import groovy.util.logging.Slf4j
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.ModuleNode
import org.codehaus.groovy.ast.expr.ArgumentListExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.GStringExpression
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.transform.ASTTransformation
import org.codehaus.groovy.transform.GroovyASTTransformation

@Slf4j
@GroovyASTTransformation
public class StatementTransformation implements ASTTransformation {
  // DSL names that clash with Groovy built-ins, mapped to their replacements
  private def transformations = ['use' : 'using']

  @Override
  void visit(ASTNode[] nodes, SourceUnit source) {
    log.info("Source name = ${source.name}")
    ModuleNode ast = source.ast
    def blockStatement = ast.statementBlock

    blockStatement.visit(new CodeVisitorSupport() {
      // Rename constants (method names included) listed in the transformations map
      void visitConstantExpression(ConstantExpression ce) {
        def name = ce.value
        if (transformations.containsKey(name)) {
          def newName = transformations[name]
          log.debug("Transform Name => $name -> $newName")
          ce.value = newName
        } else {
          log.debug("Skip Name => $name")
        }
      }

      // Replace every GString argument with a plain String constant
      public void visitArgumentlistExpression(ArgumentListExpression ale) {
        log.debug("Arg List $ale.expressions")
        def expressions = ale.expressions
        expressions.eachWithIndex { expr, idx ->
          if (expr instanceof GStringExpression) {
            log.debug("Transform GString => String ($expr.text)")
            expressions[idx] = new ConstantExpression(expr.text)
          }
        }
        log.debug("Transformed Arg List $ale.expressions")
        super.visitArgumentlistExpression(ale)
      }
    })
  }
}

In the code above:

  1. visitConstantExpression(...) gets called whenever a constant such as use, db, customers, add, transform or a function parameter is encountered. If the constant is a key in the transformations map, the transformation is applied by a simple assignment to the value field of the ConstantExpression (ce.value = newName).
  2. visitArgumentlistExpression(...) gets called when there is a function call. In my case db.customers.transform(...) and db.customers.add(...) are function calls, and the entire argument list gets passed to this method. It is here that I inspect each argument for a GStringExpression and replace it with a ConstantExpression holding the same text (expressions[idx] = new ConstantExpression(expr.text)).
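
To see the GString half of this in isolation, here is a cut-down, self-contained sketch. GStringToString is a hypothetical, reduced stand-in for the transformation above (it leaves out the renaming and the logging), and echo is a hypothetical helper defined inside the evaluated script purely to hand the argument back. It assumes a Groovy distribution where ASTTransformationCustomizer accepts a transformation instance, as used later in this post.

```groovy
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.ArgumentListExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.GStringExpression
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer
import org.codehaus.groovy.transform.ASTTransformation
import org.codehaus.groovy.transform.GroovyASTTransformation

// Hypothetical, reduced version of the transformation: GString arguments only.
@GroovyASTTransformation
class GStringToString implements ASTTransformation {
  void visit(ASTNode[] nodes, SourceUnit source) {
    source.ast.statementBlock.visit(new CodeVisitorSupport() {
      void visitArgumentlistExpression(ArgumentListExpression ale) {
        def exprs = ale.expressions
        exprs.eachWithIndex { expr, idx ->
          if (expr instanceof GStringExpression) {
            // Swap the interpolating node for a constant with the same source text
            exprs[idx] = new ConstantExpression(expr.text)
          }
        }
        super.visitArgumentlistExpression(ale)
      }
    })
  }
}

def config = new CompilerConfiguration()
config.addCompilationCustomizers(new ASTTransformationCustomizer(new GStringToString()))
def shell = new GroovyShell(config)

// echo is a hypothetical helper; the script text is single-quoted here so that
// the '$' reaches the inner compiler untouched.
def result = shell.evaluate('''
def echo(arg) { arg }
echo("{ $inc: 1 }")
''')

// No MissingPropertyException for 'inc': the argument arrived as a plain String.
assert result instanceof String
assert result.contains('$inc')
```

Without the customizer, the same script would fail while trying to resolve inc, so the transformation has to be registered on every shell that evaluates DSL text.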

Here is how you would then use the above transformation.

The Reader reads the DSL files, which we call delta files. For each delta file, I create a new GroovyShell and tell it to evaluate the code (the delta file text). This shell is configured with my custom AST transformer, StatementTransformation. The result of the shell evaluation is an object (a Closure, because the delta text is wrapped in {-> ...}) that is passed to my parser. By then, the parser gets nodes where GStrings are already converted to Strings and the ‘use’ method name is already converted to ‘using’.

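import groovy.util.logging.Slf4j
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer
import org.codehaus.groovy.control.customizers.SecureASTCustomizer

@Slf4j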
public class Reader {
  private def createNewShell() {
    def secureCustomizer = new SecureASTCustomizer()
    secureCustomizer.with {
      methodDefinitionAllowed = false // user will not be able to define methods
      importsWhitelist = [] // empty whitelist means imports are disallowed
      staticImportsWhitelist = [] // same for static imports
      staticStarImportsWhitelist = []
      ....
    }

    def astCustomizer =
      new ASTTransformationCustomizer(new StatementTransformation())
    def config = new CompilerConfiguration()
    config.addCompilationCustomizers(secureCustomizer,
                          astCustomizer)
    new GroovyShell(config)
  }

  public Tree read(final List<File> deltas) {
    def parser = new Parser()
    deltas.each { delta ->
      def deltaName = delta.name
      def dslCode = """{-> $delta.text}"""
      //shell evaluates once, hence create new each time
      def shell = createNewShell()
      def deltaObject = shell.evaluate(dslCode, deltaName)
      try {
        parser.parse(deltaObject)
      } catch (Throwable t) {
        throw new InvalidGrammar("$deltaName --> ${t.message}")
      }
      shell = null
    }
    parser.ast()
  }
}
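
The SecureASTCustomizer part of createNewShell() can be tried on its own. The following is a minimal sketch that only sets methodDefinitionAllowed; the helper method in the rejected script is hypothetical, and both exception types are caught because, as far as I know, the violation may surface either directly or wrapped in a compilation error depending on the Groovy version.

```groovy
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.MultipleCompilationErrorsException
import org.codehaus.groovy.control.customizers.SecureASTCustomizer

def secure = new SecureASTCustomizer()
secure.methodDefinitionAllowed = false   // scripts may not declare their own methods

def config = new CompilerConfiguration()
config.addCompilationCustomizers(secure)
def shell = new GroovyShell(config)

// A script that declares a method is rejected before it ever runs.
def rejected = false
try {
  shell.evaluate('def helper() { 42 }\nhelper()')   // helper is a hypothetical name
} catch (SecurityException | MultipleCompilationErrorsException e) {
  rejected = true
}
assert rejected

// A script that stays within the rules is evaluated as usual.
assert shell.evaluate('1 + 2') == 3
```

Because the check runs at compile time, a delta file that breaks the rules is rejected as a whole rather than failing halfway through.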

Here is the Parser code. It contains the using(db) method that gets called once the custom transformation has been applied. An astute reader may have noticed how I intercept property access with the getProperty method (part of Groovy’s MOP, the Meta-Object Protocol) to change the database context.

import groovy.transform.CompileStatic
import groovy.util.logging.Slf4j

@Slf4j
class Parser {
  private Tree tree = new Tree()
  private def dbContext

  @CompileStatic
  def getProperty(String name) {
    log.debug("property name is: $name")
    if(name == 'db') {
      return dbContext
    }
    tree.using(name)
  }

  def using(db) {
     log.info "Setting db context to ${db.toString()}"
     dbContext = db
  }

  public Tree parse(Closure closure) {
    def cloned = closure.clone()
    cloned.delegate = this
    cloned.resolveStrategy = Closure.DELEGATE_FIRST
    cloned()
    tree
  }

  def ast() {
    tree
  }
}
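
The interplay between getProperty, using and the closure delegate can be reproduced without the Tree class. MiniParser below is a hypothetical, stripped-down stand-in for the Parser above: it simply treats an unknown name as the database name instead of asking a tree for it, which is an assumption made to keep the sketch self-contained.

```groovy
class MiniParser {
  private String dbContext

  // Bare identifiers inside the DSL closure (test, db, ...) are routed here
  def getProperty(String name) {
    name == 'db' ? dbContext : name
  }

  def using(String db) {
    dbContext = db
  }

  def parse(Closure closure) {
    def cloned = closure.clone()
    cloned.delegate = this
    // Look names up on the parser first, then on the enclosing script
    cloned.resolveStrategy = Closure.DELEGATE_FIRST
    cloned()
  }
}

def parser = new MiniParser()
def result = parser.parse {
  using test   // test -> getProperty('test'), then using('test')
  db           // db   -> getProperty('db') returns the current context
}
assert result == 'test'
```

Setting the delegate on a clone keeps the original closure untouched, and DELEGATE_FIRST ensures that names in the delta file resolve against the parser first.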