To give some context on this problem: on my current project, I have created an embedded DSL that uses Groovy as the host language. This DSL closely resembles MongoDB lingo.
An example would be:
    // Sample Delta file
    use test
    db.customers.add("{'city' : 'Please Set City', 'pin': 'Please Pin code' }")

    //Increment age field by 1
    db.customers.transform('age', "{ $add: ["$age", 1] }")

    // Set context to transactions db
    use transactions

    // add to orders collection a nested document
    db.orders.add('{"dispatch" : { "status" : "Default", "address": { "line1" : "Road", "city": "City" }}}')
Like the Mongo Shell, I wanted to support command arguments wrapped in either single-quoted or double-quoted strings. This is the same as in JavaScript, where you can use quotes inside a string as long as they don't match the quotes surrounding it. When I tried to do that, I hit two problems right away:
- use is a DefaultGroovyMethod used by Groovy Categories for pimping your library, quite similar to implicit conversions in Scala and extension methods in C#.
- Double-quoted strings passed as arguments to functions such as add and transform are GStrings in Groovy, which support string interpolation using the $ insertion. As they say in the Groovy world, and you have probably heard it: "You need a $ in GString ;)". Groovy evaluates the expression following the dollar sign and substitutes the result in its place in the output string. The placeholder expression is evaluated when the GString is created; only the conversion to a plain String is deferred until toString() is called on it or it is passed to something that requires a String (a placeholder is truly lazy only in the closure form, ${-> expr}). As you can see in the above example, $age will cause problems when the delta file is evaluated: Groovy won't know where to get the value of age and would throw a fit.
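The difference between the two quoting styles can be checked in a few lines. This is only a minimal sketch, not part of the DSL: the variable names are made up for illustration, and the last step evaluates a one-line script in a fresh GroovyShell where nothing named age is bound.

```groovy
// Single-quoted literals are plain java.lang.String values: '$age' is kept verbatim.
def plain = '{ $add: ["$age", 1] }'
assert plain instanceof String
assert plain.contains('$age')

// A double-quoted literal containing a $ placeholder becomes a GString.
def age = 41
def interpolated = "age is $age"
assert interpolated instanceof GString
assert interpolated.toString() == 'age is 41'

// In a script where nothing named 'age' is bound, the same placeholder
// cannot be resolved and the evaluation fails.
def failure = null
try {
    new GroovyShell().evaluate('"age is $age"')
} catch (MissingPropertyException e) {
    failure = e
}
assert failure != null
assert failure.property == 'age'
```

The last case is the one a delta file author would run into when writing "$age" inside a double-quoted argument.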
Well, I could come up with a hack for the first problem: let's not use use and instead choose a different verb, say, using. But for the second problem, how would I stop the user from entering double-quoted strings in function arguments? Putting a caveat in the documentation would be reactive rather than proactive, and would demand a disciplined user. So this one cannot be hacked. Both problems sounded like they needed to be handled at the compiler level in some form or other. Here is how I solved them, killing two birds with one stone!
Groovy offers a way to visit the Abstract Syntax Tree (AST) and transform it. An AST is an intermediate representation that the compiler generates during compilation. It is this AST that gets used to generate another translation or bytecode. Groovy provides a hook in the form of ASTTransformation that allows us to add to or modify this tree during the execution of a specific compiler phase. A class that implements this interface must be annotated with @GroovyASTTransformation so that Groovy knows which compile phase to run it in. As I am dealing with a global AST transformation, the visit method is called once for the sourceUnit, i.e. the actual source code, and I'll ignore the first and second entries in the ASTNode[] array. Here is my ASTTransformation code.
    @Slf4j
    @GroovyASTTransformation
    public class StatementTransformation implements ASTTransformation {
        private def transformations = ['use' : 'using']

        @Override
        void visit(ASTNode[] nodes, SourceUnit source) {
            log.info("Source name = ${source.name}")
            ModuleNode ast = source.ast
            def blockStatement = ast.statementBlock

            blockStatement.visit(new CodeVisitorSupport() {
                void visitConstantExpression(ConstantExpression ce) {
                    def name = ce.value
                    if (transformations.containsKey(name)) {
                        def newName = transformations[name]
                        log.debug("Transform Name => $name -> $newName")
                        ce.value = newName
                    } else {
                        log.debug("Skip Name => $name")
                    }
                }

                public void visitArgumentlistExpression(ArgumentListExpression ale) {
                    log.debug("Arg List $ale.expressions")
                    def expressions = ale.expressions
                    expressions.eachWithIndex { expr, idx ->
                        if(expr.getClass() == GStringExpression) {
                            log.debug("Transform GString => String ($expr.text)")
                            expressions[idx] = new ConstantExpression(expr.text)
                        }
                    }
                    log.debug("Transformed Arg List $ale.expressions")
                    super.visitArgumentlistExpression(ale)
                }
            })
        }
    }
In the code above:
- visitConstantExpression(...) gets called whenever a constant such as use, db, customers, add, transform or a function parameter is encountered. Based on what is defined in the transformations map (Line 4), a transformation is applied by simple assignment to the value field of ConstantExpression (Line 18).
- visitArgumentlistExpression gets called when there is a function call. In my case, db.customers.transform(...) and db.customers.add(...) are function calls, and the entire argument list gets passed to this visitArgumentlistExpression method. It is here that I inspect each argument for the occurrence of a GStringExpression and convert it to a ConstantExpression (Line 30).
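To try the GString-to-constant rewrite in isolation, a cut-down version of the transformer can be wired into a GroovyShell and run against a one-statement script. This is a sketch under assumptions rather than the project code: GStringFlattener, echo and amount are hypothetical names introduced only for this example, the renaming of use is left out, and the shell is used for a single evaluation.

```groovy
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.ArgumentListExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.GStringExpression
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer
import org.codehaus.groovy.transform.ASTTransformation
import org.codehaus.groovy.transform.GroovyASTTransformation

// Hypothetical, cut-down transformer: only the GString rewrite, no renaming.
@GroovyASTTransformation
class GStringFlattener implements ASTTransformation {
    @Override
    void visit(ASTNode[] nodes, SourceUnit source) {
        source.getAST().getStatementBlock().visit(new CodeVisitorSupport() {
            @Override
            void visitArgumentlistExpression(ArgumentListExpression ale) {
                def args = ale.expressions
                args.eachWithIndex { expr, idx ->
                    if (expr instanceof GStringExpression) {
                        // Replace the GString node with its verbatim text as a constant.
                        args[idx] = new ConstantExpression(expr.text)
                    }
                }
                super.visitArgumentlistExpression(ale)
            }
        })
    }
}

def config = new CompilerConfiguration()
config.addCompilationCustomizers(new ASTTransformationCustomizer(new GStringFlattener()))

// 'amount' is deliberately never defined: without the transformation this
// script would fail while building the GString.
def script = '''
def echo(String s) { s }
echo("total: $amount")
'''
def result = new GroovyShell(config).evaluate(script)
assert result == 'total: $amount'
```

Because the argument reaches echo as a plain String constant, the $amount placeholder survives as text instead of being interpolated.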
Here is how you would then use the above transformation.
The Reader reads the DSL files, which in my case we call delta files. For each delta file, I create a new GroovyShell and tell it to evaluate the code (the delta file text). This shell is configured with my custom AST transformer, StatementTransformation. The result of the shell evaluation is an object that is passed to my parser. By that point, the nodes the parser receives already have GStrings converted to String and the 'use' method name converted to 'using'.
    import groovy.util.logging.Slf4j
    import org.codehaus.groovy.control.CompilerConfiguration
    import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer
    import org.codehaus.groovy.control.customizers.SecureASTCustomizer

    @Slf4j
    public class Reader {
        private def createNewShell() {
            def secureCustomizer = new SecureASTCustomizer()
            secureCustomizer.with {
                methodDefinitionAllowed = false // user will not be able to define methods
                importsWhitelist = [] // empty whitelist means imports are disallowed
                staticImportsWhitelist = [] // same for static imports
                staticStarImportsWhitelist = []
                ....
            }

            def astCustomizer = new ASTTransformationCustomizer(new StatementTransformation())
            def config = new CompilerConfiguration()
            config.addCompilationCustomizers(secureCustomizer, astCustomizer)
            new GroovyShell(config)
        }

        public Tree read(final List<File> deltas) {
            def parser = new Parser()
            deltas.each { delta ->
                def deltaName = delta.name
                // wrap the delta text in a closure so that evaluation returns a Closure
                def dslCode = """{-> $delta.text}"""
                //shell evaluates once, hence create new each time
                def shell = createNewShell()
                def deltaObject = shell.evaluate(dslCode, deltaName)
                try {
                    parser.parse(deltaObject)
                } catch (Throwable t) {
                    throw new InvalidGrammar("$deltaName --> ${t.message}")
                }
                shell = null
            }
            parser.ast()
        }
    }
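The effect of the SecureASTCustomizer settings can be seen with a much smaller shell. The following is a rough sketch that only exercises methodDefinitionAllowed; newRestrictedShell and helper are names invented for this example, and the exact exception type raised for a rejected script is not relied upon.

```groovy
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.customizers.SecureASTCustomizer

// Hypothetical factory: a fresh, restricted shell per evaluation.
def newRestrictedShell = {
    def secure = new SecureASTCustomizer()
    secure.methodDefinitionAllowed = false // scripts may not define methods
    def config = new CompilerConfiguration()
    config.addCompilationCustomizers(secure)
    new GroovyShell(config)
}

// An ordinary expression is still accepted.
assert newRestrictedShell().evaluate('1 + 1') == 2

// A script that defines a method is rejected at compile time.
def rejected = false
try {
    newRestrictedShell().evaluate('def helper() { 1 }\nhelper()')
} catch (Exception e) {
    rejected = true
}
assert rejected
```

Rejecting such scripts at compile time keeps delta files limited to DSL statements, which is what the customizer in the Reader is configured for.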
Here is the Parser code. It contains the using(db) method that gets called after the custom transformation is applied. An astute reader may have noticed how I intercept property access using the getProperty method (part of Groovy's MOP, the Meta-Object Protocol) to change the database context.
    import groovy.transform.CompileStatic
    import groovy.util.logging.Slf4j

    @Slf4j
    class Parser {
        private Tree tree = new Tree()
        private def dbContext

        // Intercepts every property lookup made by the delta closure
        @CompileStatic
        def getProperty(String name) {
            log.debug("property name is: $name")
            if(name == 'db') {
                return dbContext
            }
            tree.using(name)
        }

        def using(db) {
            log.info "Setting db context to ${db.toString()}"
            dbContext = db
        }

        public Tree parse(Closure closure) {
            def cloned = closure.clone()
            cloned.delegate = this
            cloned.resolveStrategy = Closure.DELEGATE_FIRST
            cloned()
            tree
        }

        def ast() {
            tree
        }
    }
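The interplay between the closure delegate and getProperty can be reproduced without the Tree class. This is a simplified sketch, not the actual Parser: ContextRecorder is a hypothetical stand-in that returns the bare property name where the real code calls tree.using(name), and sales is an arbitrary database name.

```groovy
// Hypothetical stand-in for the Parser, reduced to the context switching.
class ContextRecorder {
    private def dbContext

    // Called for every unresolved name inside the delegated closure.
    def getProperty(String name) {
        name == 'db' ? dbContext : name
    }

    def using(db) {
        dbContext = db
    }
}

def recorder = new ContextRecorder()

// 'using sales' is a command expression, i.e. using(sales); the bare name
// 'sales' is resolved through getProperty on the delegate.
def delta = { ->
    using sales
    db
}
delta.delegate = recorder
delta.resolveStrategy = Closure.DELEGATE_FIRST

assert delta() == 'sales'
```

With DELEGATE_FIRST, both the using call and the db lookup are routed to the delegate before the closure's owner is consulted, which is what lets a delta file switch the database context with a plain statement.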