Software Artisan's Blog

~ onLife >> onMusic >> onDesign >> onPatterns >> onRefactoring >> onAgile

Tag Archives: Scala

Scanning Annotations in Classpath and Sealed Factories

26 Monday May 2014

Posted by Dhaval Dalal in Scala

≈ Leave a comment

Tags

Scala


To give a bit of context on this, on my earlier project Midas, we developed transformation functions (thats what we are called it) that quite closely resemble MongoDB’s aggregation-projection framework functions for Arithmetic operations (add, subtract, …) , String operations (concat, tolower, …) etc… Here is an example.

db.orders.transform('totalAmount', '{ $subtract: ["$totalAmount", { $multiply: ["$totalAmount", 0.2] } ] }')

We parse the above using Scala’s Parser Combinators and convert them to domain model objects which are sub-types of Expression. Each expression is either a Literal, a Field expression or a Function. We have a fairly wide and sufficiently deep hierarchy of Functions and is depicted below

                           +----------+
                           |Expression|
                           +----------+
                      ______/   |    \_____
                     /          |          \
                +-------+   +-------+   +--------+
                |Literal|   | Field |   |Function|
                +-------+   +-------+   +--------+
                        ________________/    |   \_________________
                       /                     |                     \
              +------------------+        +--------------+      +------------+
     _________|ArithmeticFunction|        |StringFunction|      |DateFunction|
    /         +------------------+        +--------------+      +------------+
   /       ______/   |    \___  \_____         |   |____ \_______
  /       /          |        \       \        |        \        \
+---+ +--------+ +------+ +--------+ +---+  +-------+ +-------+ +------+
|Add| |Multiply| |Divide| |Subtract| |Mod|  |ToLower| |ToUpper| |Concat|
+---+ +--------+ +------+ +--------+ +---+  +-------+ +-------+ +------+

In our case, we need to be able to support various arithmetic functions, string functions and other data functions. We also would not know how many more functions we would need to support in the future. Obviously, adding that functionality would manifest as classes added to the hierarchy which in turn means that I have to modify the factory function each time a new class gets added to the Function hierarchy.
the factory function looks something like this.

def fn: Parser[Expression] = fnName~":"~fnArgs ^^ {
  case "add"~":"~args      =>  Add(args: _*)
  case "subtract"~":"~args =>  Subtract(args: _*)
  case "multiply"~":"~args =>  Multiply(args: _*)
  case "divide"~":"~args   =>  Divide(args: _*)
  case "concat"~":"~args   =>  Concat(args: _*)
  ...
  ...
}

So, in other words, the factory method is not sealed against these kind of changes. Goal is to get to a place where the case statements don’t exist. As pointed out earlier in one of my blog-posts Achieving Explicit Closure for a Simple Factory using Annotations… many years back, I decided to seal the factory method so that this concern is taken away from the developer.

The approach is to mark behavior-implementing sub-types of the Function, pick them up from the classpath and store them in cache so that they can then be instantiated when required. As we are on the JVM, we use a custom annotation @FunctionExpression to mark the Function sub-types.

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface FunctionExpression {
    Class value();
}

Here is how it is applied to a Function sub-type.

@FunctionExpression
final case class ToLower(expression: Expression) extends StringFunction(expression) {
  def evaluate(document: BSONObject) = {
    val string = value(expression.evaluate(document))
    Literal(string.toLowerCase)
  }
}

Next, we do need to scan the classpath for all the classes annotated with @FunctionExpression annotation. It turns out that using reflection to check for the presence of the annotation is not only expensive time-wise, but also it loads the class into memory and thus causing the heap to grow.

ASM from Object Web is a bytecode manipulation and analysis library that does not need to load the class into the JVM and is very performant in doing such an analysis. Also, there are many open-source frameworks like Scannotation or Reflections that use ASM to scan annotations in classpath. Scannotation is not under active development and I did not need many features from Reflections for such a small work that I need to do. So instead of using these frameworks, I just wrote a custom AnnotationScanner that uses ASM’s ClassVisitor

class AnnotationScanner(pkg: String, annotationClass: Class[_]) {
  private val fsSlashifiedPkg = fsSlashify(pkg)

  private val slashifiedPkg = slashify(pkg)

  private val slashifiedAnnotation = slashify(annotationClass.getName)

  private val classLoader = AnnotationScanner.this.getClass.getClassLoader

  private val pkgURI = classLoader.getResource(slashifiedPkg).toURI

  private var startDir: Path = null

  val pkgURIString = pkgURI.toString
  if(pkgURIString.startsWith("jar")) {
    val (jar, _) = pkgURIString.splitAt(pkgURIString.indexOf("!"))
    val jarUri = URI.create(jar)
    import scala.collection.JavaConverters._
    FileSystems.newFileSystem(jarUri, Map[String, AnyRef]().asJava)
  }
  startDir = Paths.get(pkgURI)

  private val dirWalker = new DirWalker(startDir, Pattern.compile(".*\.class$"))

  private def fsSlashify(string: String) = string.replaceAllLiterally(".", File.separator)

  private def slashify(string: String) = string.replaceAllLiterally(".", "/")

  private def classesInPackage: Set[String] = {
    dirWalker.walk map { file =>
      val index = if(pkgURIString.startsWith("jar"))
                    file.indexOf(slashifiedPkg)
                  else
                    file.indexOf(fsSlashifiedPkg)

      val className = file.substring(index)
      className.replaceAllLiterally(".class", "")
    }
  }

  private def hasAnnotation(annotationClass: Class[_], className: String): Boolean = {
    val slashifiedClassName = fsSlashify(className)
    var foundAnnotation = false
    val cv = new ClassVisitor(Opcodes.ASM4) {
      // Invoked when a class level annotation is encountered
      override def visitAnnotation(desc: String, visible: Boolean): AnnotationVisitor = {
        val annotation = desc.substring(1, desc.length - 1)
        if (annotation == slashifiedAnnotation)
          foundAnnotation = true
        super.visitAnnotation(desc, visible)
      }
    }
    val in = classLoader.getResourceAsStream(slashifiedClassName + ".class")
    try {
      val classReader = new ClassReader(in)
      classReader.accept(cv, 0)
    } catch {
      case _: Throwable =>
    } finally {
      in.close()
    }
    foundAnnotation
  }

  private def dotify(string: String) = string.replaceAllLiterally("/", ".").replaceAllLiterally("\", ".")

  def scan = {
    val classes = classesInPackage
    classesInPackage.filter(className => hasAnnotation(annotationClass, className)).map(dotify)
  }
}

Here is the DirWalker implemented using the Java7 Paths that walks down from a start directory down recursively.

class DirWalker (startDir: Path, collectFilesRegex: Pattern)  {
  private val files = scala.collection.mutable.Set[String]()

  private val visitor = new SimpleFileVisitor[Path] {
    override def visitFile(path: Path, mainAtts: BasicFileAttributes) = {
      val file = path.toAbsolutePath.toString
      val matcher = collectFilesRegex.matcher(file)
      if(matcher.matches) {
        files += file
      }
      FileVisitResult.CONTINUE
    }

    override def visitFileFailed(path: Path, exc: IOException) = {
      log.info(s"Continuing Scanning though visiting File has Failed for $path, Message ${exc.getMessage}")
      FileVisitResult.CONTINUE
    }
  }

  def walk = {
    files.clear
    Files.walkFileTree(startDir, visitor)
    files.toSet
  }
}

Finally, I need a place from where I can use AnnotationScanner. What better place could I have found other than placing this under Function as a Singleton factory for creating its sub-types.

sealed abstract class Function(expressions: Expression*) extends Expression {
  override def toString = s"""${getClass.getSimpleName}(${expressions mkString ", "})"""
}

object Function {
  lazy val functions = new AnnotationScanner("com.ee.midas", classOf[FunctionExpression])
    .scan
    .map { className =>
      val clazz = Class.forName(className).asInstanceOf[Class[Function]]
      clazz.getSimpleName.toLowerCase -> clazz
    }
    .toMap
    .withDefaultValue(classOf[EmptyFunction])

  def apply(fnName: String, args: Expression*): Function = {
    val fnClazz = functions(fnName.toLowerCase)
    val constructor = fnClazz.getConstructor(classOf[Seq[Expression]])
    log.debug(s"Instantiating Class $fnClazz...")
    constructor.newInstance(args)
  }
}

Eventually, the sealed parser now looks

  def fn: Parser[Expression] = fnName~":"~fnArgs ^^ { case name~":"~args => Function(name, args: _*) }
Advertisements

Share this:

  • Twitter
  • LinkedIn
  • Google
  • Reddit
  • Email
  • Facebook
  • Print

Like this:

Like Loading...

Memoization to the Rescue

17 Thursday Apr 2014

Posted by Dhaval Dalal in Scala

≈ Leave a comment

Tags

Scala


Context:

On my previous project, we wanted to transform a MongoDB document based on what user specified. Say for example, we are offering 20% discount on totalAmount, then we’d expect the user to say something like –

# Our Custom DSL:
#
# Transform a field by using a built-in Arithmetic Operators.
# It takes in an input field and an expression containing built-in
# arithmetic function and returns the result by setting it on the
# output field.
#
# syntax:
# db.things.transform("outputFieldName", "{ $add: ["$age", 1] }")
##################################################################

db.orders.transform('totalAmount', '{ $subtract: ["$totalAmount", { $multiply: ["$totalAmount", 0.2] } ] }')

Apart from the transform operation, there are many other operations that we offer to the user. For each operation we produce a corresponding lambda expression. Many such operations specified by the user result in a collection of lambda expressions that would be applied sequentially one-at-a-time to transform the existing document.

The lambda expression for the above transform operation looked something like this.

  def transform(outputField: String, expressionJson: String) : BSONObject => BSONObject = {
    ((document: BSONObject) => {
      try {
        val expression: Expression = parse(expressionJson)
        val literal = expression.evaluate(document)
        document + (outputField, literal.value)
      } catch {
        case t: Throwable => injectException(document, outputField, t)
      }
    })
  }

As you see in the above code, we return a lambda expression, that goes from document to document (BSONObject). Look at line 4, where we parse the expressionJson, a String to an Expression tree and then the tree is evaluated by consuming the document. This happens each time the lambda is called. This is a serious problem…why? Because,

  1. Each time the lambda is invoked, the parsing expression JsonString to expression tree is a expensive computation
  2. Parsing errors are not known until run-time, ideally I would be like to know them when my DSL is parsed

Solution:

I could move the parsing expression line outside of the lambda and satisfy both the above problems, that is, parsing will happen once and closure would close over the expression. This means effectively cached expression tree will be available each time and there would not be any performance penalty of parsing. It would also throw parse errors at DSL compile time, where I map user operation to lambda expression by calling the above transform function.

Forces:

  1. Preventing Force

    In order for me to move to such a solution, there were impediments – we allow users to change DSL at run-time, so that means the new transformation lambdas will be regenerated on-the-fly and the way application worked this out was to generate textual Scala file, compile it and hot-deploy the code into running JVM. So doing the parsing outside the body of the returned lambda meant losing the reference to the expression tree generated during DSL compilation. So the only option was to keep the parsing call within the body of lambda.

  2. Pushing Force

    One of the obvious pushing force is Performance, parsing again and again for every document in the collection, is an expensive computation to repeat, imagine 1000s of documents and many such transform operations, this will make it hellishly slow to respond to requests.

Which Force to tackle?:

Well, it turns out that Pushing Force was easy to nullify as compared to a long-term solution needed to solve the Preventing Force. This is where memoization comes to the rescue. Simply put, if you have a pure function (read side-effect free), then I know that calling it again and again with same set of inputs, results in the same output. This means that we can optimize the program by calling the function once, do the computation and cache its result. For future calls, return the result directly from the cache, thus avoiding that expensive computation. Thus, memoization gives us the ability compute-once, use-many times, a nice technique that trades space for time.

So below is the same function after applying memoization to it.

  def transform(outputField: String, expressionJson: String) : BSONObject => BSONObject = {
    ((document: BSONObject) => {
      try {
       val expression: Expression = Memoize(parse)(expressionJson)
        val literal = expression.evaluate(document)
        document + (outputField, literal.value)
      } catch {
        case t: Throwable => injectException(document, outputField, t)
      }
    })
  }

As Scala does not offer memoization out-of-box, you can easily hand-toss one or use libraries like Scalaz.

In the code below, I use a simple mutable map to cache the result of evaluation by using function argument as a key. So Memoize is a Function as it extends from Function1, that takes in single argument T1 and returns the result R. It is a class at the same time and its apply method first checks for the presence of the argument in the cache, if the argument is found, it returns the result of previously stored computation, else it evaluates the function, caches its result and then returns it to the caller.

import scala.collection.mutable.Map

class Memoize[-T1, +R] private (fn: T1 => R) extends (T1 => R) {
  private[this] val cache = Map[T1, R]()

  def apply(arg: T1): R = {
    if(cache.contains(arg)) {
      cache(arg)
    } else {
      val result = fn(arg)
      cache += ((arg, result))
      result
    }
  }
}

object Memoize {
  def apply[T1, R](fn: T1 => R) = new Memoize[T1, R](fn)
}

Below are the tests that were used to flush it out. We used the Specs2 framework.

@RunWith(classOf[JUnitRunner])
class MemoizeSpecs extends Specification {
  "Memoizer" should {

    def square(x: Int) = x * x

    "evaluate a function correctly" in {
      //Given
      val memoizedSquare = Memoize(square)

      //When
      val x = 2

      //Then
      memoizedSquare(x) mustEqual square(x)
    }

    "ensures it calls a function with same arguments just once and memoizes for later use" in {
      //Given
      var called = 0
      def aFunction(x: Int): (Int, Int) = {
        called += 1
        (x, called)
      }

      val memoizedFunction = Memoize(aFunction)

      //When
      val x = 1
      val resultOnce = memoizedFunction(x)
      val resultTwice = memoizedFunction(x)

      //Then
      resultOnce mustEqual (x, 1)
      resultTwice mustEqual (x, 1)
    }

    "calls a function with different arguments just once and memoizes for later use" in {
      //Given
      var called = 0
      def aFunction(x: Int): (Int, Int) = {
        called += 1
        (x, called)
      }

      val memoizedFunction = Memoize(aFunction)

      //When
      val x = 1
      val resultXOnce = memoizedFunction(x)
      val resultXTwice = memoizedFunction(x)

      val y = 2
      val resultYOnce = memoizedFunction(y)
      val resultYTwice = memoizedFunction(y)

      //Then
      resultXOnce mustEqual (x, 1)
      resultXTwice mustEqual (x, 1)

      resultYOnce mustEqual (y, 2)
      resultYTwice mustEqual (y, 2)
    }
  }
}

So, in the Memoize code above, you may have wondered, what is the use of private[this] on line 4 that is highlighted. Why can’t I just use private here?

When I used private here, I get the following 2 compile errors:

  1. contravariant type T1 occurs in invariant position in type => scala.collection.mutable.Map[T1,R] of value cache private val cache = Map[T1, R]()
  2. covariant type R occurs in invariant position in type => scala.collection.mutable.Map[T1,R] of value cache private val cache = Map[T1, R]()

To be honest with you I had to fight this one out, the compiler would not simply take in private and not much help from googling as well. Eventually, I could not find any better explanation in other than Martin Odersky’s book – Programming in Scala, Chapter 19, Section 19.7. Just inlining it here for reference

….You might wonder whether this code passes the Scala type checker. After all,
queues now contain two reassignable fields of the covariant parameter type T.
Is this not a violation of the variance rules? It would be indeed, except for
the detail that leading and trailing have a private[this] modifier and are thus
declared to be object private.

As mentioned in Section 13.5, object private members can be accessed only from within
the object in which they are defined. It turns out that accesses to variables from
the same object in which they are defined do not cause problems with variance. The
intuitive explanation is that, in order to construct a case where variance would
lead to type errors, you need to have a reference to a containing object that has a
statically weaker type than the type the object was defined with. For accesses to
object private values, however, this is impossible.

Scala’s variance checking rules contain a special case for object private definitions.
Such definitions are omitted when it is checked that a type parameter with
either a + or – annotation occurs only in positions that have the same variance
classification. Therefore, the code below compiles without error.

On the other hand, if you had left out the [this] qualifiers from the two private
modifiers, you would see two type errors:…

So, the private[this] is really needed here, because mutable Map[T1, R] here has invariant types.

Just to complete this post, eventually, we tackled the Preventing Force, where we no longer hot deploy the code by generating a text file, compiling it and hot deploying bytecode into the JVM. Now, the transform method finally looks like –

def transform(outputField: String, expressionJson: String) : BSONObject => BSONObject = {
  val expression: Expression = Try { parse(expressionJson) } match {
    case scala.util.Success(expr) => expr
    case scala.util.Failure(failure) => throw failure
  }
  ((document: BSONObject) => {
    try {
      val literal = expression.evaluate(document)
      document + (outputField, literal.value)
    } catch {
      case t: Throwable => injectException(document, outputField, t)
    }
  })
}

Thats it!

Share this:

  • Twitter
  • LinkedIn
  • Google
  • Reddit
  • Email
  • Facebook
  • Print

Like this:

Like Loading...

Dhaval Dalal

Profile

Dhaval Dalal

Slides

View DhavalDalal's profile on slideshare

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 290 other followers

Ideas

  • Code Jugalbandi

Open Source Projects

  • Java8 Exception Monad
  • Midas
  • Tayra

Recent Posts

  • Exploring Concurrency CodeJugalbandi February 6, 2019
  • Destructuring and Pattern Matching in Functional Programming December 29, 2018
  • Help yourself by making iTerm2 vivid May 30, 2018
  • How to disable failed discrete-GPU (NVIDIA GeForce GT 650M) for 15″ MacBook-Pro 10,1 (mid-2012) on High Sierra 10.13.4 May 29, 2018
  • Improvised Tasks.tmbundle for TextMate April 15, 2018

Calendar

February 2019
M T W T F S S
« Dec    
 123
45678910
11121314151617
18192021222324
25262728  

Categories

  • Agile (13)
  • APL (2)
  • Array-Oriented (1)
  • C# (14)
  • Clojure (2)
  • Code Jugalbandi (5)
  • Code Retreat (2)
  • Computer (2)
  • Education (1)
  • Erlang (3)
  • Functional Programming (10)
  • Functional Reactive (3)
  • General (34)
  • Groovy (15)
  • GWT (6)
  • Haskell (2)
  • Java (28)
  • JavaScript (10)
  • Life (2)
  • Microservices (1)
  • MongoDB (2)
  • Music (2)
  • Neo4J (1)
  • Sanskrit (4)
  • Scala (8)
  • Thoughts-Philosophy (9)
  • Yoga (2)

Archives

Follow me on Twitter

My Tweets

Blog Stats

  • 26,703 hits

Tags

Agile Code Jugalbandi Erlang Functional Programming General Groovy Homeopathy iTerm2 java Java8 Life Microservices reactjs rxjs Sanskrit Scala Textmate2 Thoughts-Philosophy

RSS

  • RSS - Posts
  • RSS - Comments
Advertisements

Create a free website or blog at WordPress.com.

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.
Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
To find out more, including how to control cookies, see here: Cookie Policy
%d bloggers like this: