This October, I had the opportunity to learn Erlang at a pre-conference workshop, Deep Dive into Erlang Ecosystem, run by Robert Virding, one of the co-creators of the language, during Functional Conference 2016 in Bengaluru. Many things surprised me, especially the philosophy of letting things fail, the absence of custom data-types (which facilitates upgrading a running system), share-nothing processes, and massive scalability, all baked into the language and all the way down to the VM. Erlang is a functional language, though it did not start out as one; it evolved into one. The same applies to the Actor model in Erlang. These features were introduced into the language to solve real-world problems, and I was struck by the simplicity with which those problems were approached.

John Hughes, in his paper Why Functional Programming Matters, points out that structuring software with complicated scope rules and separate compilation units helps only with the clerical details of modularisation. However, the two great features of functional languages, Higher-Order Functions (HOFs) and Lazy Evaluation (separating generation from selection), contribute significantly to modularity. In my view, Erlang adds one more thing to these: it extends the modularity of functions to processes. This takes modularity to runtime, where a function can become a first-class process. You don’t have to do anything special to the function; by simply calling spawn, it runs in a separate process. Thus, partitioning the application based on load becomes dynamic. This axis of modularity is independent of how the code is physically organised. This is a really big thing for me.
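
To make the "function becomes a process" idea concrete, here is a minimal sketch. The module name spawn_sketch and the functions square/1 and run/0 are hypothetical names chosen for illustration, not something from the workshop. It runs an ordinary function in a separate process and collects the result as a message.

```erlang
-module(spawn_sketch).
-export([square/1, run/0]).

%% An ordinary function; nothing about it is process-specific.
square(X) ->
    X * X.

%% Run square/1 in a separate process and wait for the result message.
run() ->
    Parent = self(),
    %% Inside the fun, self() is the new process, so the reply is tagged
    %% with the same pid that spawn/1 returns to the parent.
    Pid = spawn(fun() -> Parent ! {self(), square(7)} end),
    receive
        {Pid, Result} -> Result
    after 1000 ->
        timeout
    end.
```

Calling spawn_sketch:run(). should return 49. Note that square/1 itself was not changed in any way to make it run in its own process.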

I’ll jot down a few things that I managed to note during the workshop. Robert started with Sequential Erlang, followed by Concurrent Erlang:

Sequential Erlang

LANGUAGE:

  • No equivalent of null in Erlang.
  • No Character data-type.
  • No proper string data-type. A string is just a list of characters, and the shell pretty-prints it as a string, which is syntactic sugar. For example – [$H, $e, $l, $l, $o]. prints "Hello"
  • Atoms (a concept borrowed from Prolog) – an atom's value is itself. They are constant literals.
    • For example – foo. is an atom that evaluates to itself, foo.
    • They are unique, only one in the system with the name foo.
    • For example – all module names, function names are atoms.
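    • c(sample). returns the tuple {ok, sample}, in which ok is also an atom.
    • All atoms are stored in the atom table (default limit of about 1 million atoms).
    • You can convert an atom to a string (a list of characters) with atom_to_list(foo).
    • When to use atoms? If you create atoms dynamically, be very careful: atoms are not garbage collected, so generating them at runtime can fill the atom table.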
    • Converting list_to_atom("123"). gives '123'
  • You cannot create new data-types in Erlang.
    • A common convention to create a data-type is to create a tuple with first element as the type name. For example – {person, 'Joe', 'Armstrong'}
    • There are no constructors. Creating a data structure is simply writing the data itself.
    • So, if you want to create a type, you can use the first element in the tuple to be a type discriminator atom. For example –
               % define a square with side 10
               S = {square, 10}.
               % define a circle with radius 20
               C = {circle, 20}.
               % define a triangle with length of 3 sides
               T = {triangle, 10, 20, 30}.
               
  • Erlang is functional. All data is immutable, all the way down to the Erlang VM. Many things are implemented at the VM level because they are so fundamental.
  • Variables cannot be re-bound. Variables beginning with _ are special (called don’t-care variables).
  • There are no global variables. Erlang does not share data: there is no global data and no mutation of data is allowed.
  • For example, in the shell:
    13> A.
    * 1: variable 'A' is unbound
    14> A = 1.
    1
    15> A = 2.
    ** exception error: no match of right hand side value 2
    16> A.
    1
    

    To make the shell forget the binding of A, use f(A). (this works only in the shell, not inside a function):

    17> f(A).
    ok
    18> A.
    * 1: variable 'A' is unbound
    19> A = 2.
    2
    20> A.
    2
    

PATTERN MATCHING:

  • = is a pattern match operator. This is the second big thing. It should not be confused with assignment. So, A = 10 is pattern matching. The general structure is – Pattern = Expression.
  • Expression cannot contain unbound variables.
  • Lists: a list is either [Element | List] or []. The cons operator is |. For example –
    21> [1 | []].
    [1]
    22> [1 | [2, 3]].
    [1,2,3]
    23> [Head|Rest] = [1, 2, 3, 4].
    [1,2,3,4]
    24> Head.
    1
    25> Rest.
    [2,3,4]
    26> [H|T] = [].
    ** exception error: no match of right hand side value []
    
  • Erlang is one of the few languages with built-in pattern matching on binaries and bitstrings. This makes it easy to grab parts of a binary packet and validate them.
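
As a rough illustration of binary pattern matching, the sketch below parses a made-up packet. The layout (4-bit version, 4-bit type, 16-bit payload length, then the payload) and the names packet_sketch and parse/1 are assumptions invented for this example, not a real protocol.

```erlang
-module(packet_sketch).
-export([parse/1]).

%% Hypothetical packet layout (an assumption for illustration only):
%%   4-bit version, 4-bit type, 16-bit big-endian payload length,
%%   then exactly that many payload bytes, then any trailing bytes.
parse(<<Version:4, Type:4, Len:16, Payload:Len/binary, Rest/binary>>) ->
    {ok, {Version, Type, Payload}, Rest};
%% Anything that does not fit the layout (e.g. a too-short payload) is rejected.
parse(_Other) ->
    {error, malformed}.
```

For example, packet_sketch:parse(<<1:4, 2:4, 2:16, "hi">>). should match the first clause, while a packet whose declared length exceeds the available bytes falls through to {error, malformed}. The validation comes for free from the pattern: no manual offset arithmetic is needed.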

FUNCTIONS:

  • Always prefix function calls with the module name; don’t import an entire module. Functions are called as module:function.
  • Declaring function example –
         circumference(R) -> 
            2 * math:pi() * R.
       

    Variables start with an upper-case letter, such as R above. A name that starts with a lower-case letter is an atom and is not treated as a variable.

  • Also, product(X, Y) -> X * Y. and product(X, Y, Z) -> X * Y * Z. are different functions, because their arities differ (product/2 and product/3).
  • You can dispatch to different function clauses based on the pattern matched. For example –
          area({square, Side}) -> 
             Side * Side;
          area({circle, Radius}) -> 
             3.14 * Radius * Radius;
          area({triangle, A, B, C}) ->
             S = (A + B + C) / 2,   % semi-perimeter, for Heron's formula
             math:sqrt(S*(S-A)*(S-B)*(S-C)).
        

As the saying goes, two of the most difficult things in computer science are naming things and distributed computing. Erlang has lambdas (funs) to deal with naming, and built-in support for distributed computing.

MODULES:

A module is a construct that allows related functions to be grouped together.

  • The module name has to be the same as the file name. Store the module demo in demo.erl.
  •     -module(demo). % This is the first directive
        -export([double/1]). % exported functions
        
        double(X) ->
          times(2, X).
       
        % private function
        times(X, Y) ->
          X * Y. 
        
  • Exported functions can be called from outside the module. In the above case double is exported. The function name is followed by its arity, in our case the function double has arity 1.
  • Unexported functions have private visibility.
  • Upgrading the system while it is running is one of the key principles behind module reloading. This is one of the reasons why Erlang does not have user-defined types: they would get in the way of hot-loading modules.

NODES:

An Erlang node is composed of many Erlang Processes. An Erlang Process is not an OS-level process; the Erlang system itself, however, is an OS process. For example, to start 2 Erlang systems, run each of the following at a shell prompt (in separate terminals):

$ erl -sname foo
$ erl -sname bar
  • An Erlang System can spawn millions of Erlang Processes.
  • Things that are shared in an Erlang System across Erlang Processes are –
    • There is one atom table per Erlang System and
    • they all see the same code.
  • You send messages between Erlang Processes. Two Erlang Systems also communicate by sending messages (including PIDs), and in that case data is marshalled/unmarshalled across systems. You don’t need marshalling/unmarshalling between Erlang Processes within the same system.

EXCEPTIONS:

Erlang is big on errors. There is no defensive programming in Erlang; the system is designed to handle errors. Whenever an exception occurs, instead of handling it, Erlang lets the process crash and starts a new one in the hope that the new one can carry on: it died, so I restarted it. This is a fundamentally different way of thinking about exceptions. When you start handling exceptions, the client code gets polluted with a lot of exception-handling code. Erlang is not scared of crashing Erlang processes. If an exception occurs, only that Erlang Process is killed, not the whole Erlang System. The closest metaphor I can think of: each Erlang process is like a cell in the body; if one dies, a new one is created.
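
A small sketch of the "one process dies, the system lives" idea follows. The module name crash_sketch and function run/0 are hypothetical; the sketch uses spawn_monitor so that the parent is notified when the child dies, which is only one of several ways to observe a crash.

```erlang
-module(crash_sketch).
-export([run/0]).

%% Start a child that dies on purpose, and observe its death from the parent.
run() ->
    %% spawn_monitor/1 starts the process and monitors it in one step.
    {Pid, Ref} = spawn_monitor(fun() -> exit(boom) end),
    receive
        %% The monitor delivers a 'DOWN' message carrying the exit reason.
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The parent is still running here and is free to start a
            %% replacement process if it wants to.
            {child_died, Reason}
    after 1000 ->
        timeout
    end.
```

Calling crash_sketch:run(). should return {child_died, boom}: the child is gone, but the calling process and the rest of the Erlang System carry on.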

At any given time, you can list the processes running in an Erlang system with:

45> processes().
[<0.0.0>,<0.1.0>,<0.4.0>,<0.30.0>,<0.31.0>,<0.33.0>,
 <0.34.0>,<0.35.0>,<0.36.0>,<0.38.0>,<0.39.0>,<0.40.0>,
 <0.41.0>,<0.42.0>,<0.43.0>,<0.44.0>,<0.45.0>,<0.46.0>,
 <0.47.0>,<0.48.0>,<0.49.0>,<0.50.0>,<0.51.0>,<0.52.0>,
 <0.53.0>,<0.146.0>]

Concurrent Erlang

CREATING PROCESSES:

Everything in Erlang is a process; in fact, even IO is a process. Processes should be designed to follow the Single Responsibility Principle (SRP). You can group related processes under a supervisor. To create a process, you use spawn, which returns a process id. The syntax is – Pid = spawn(Mod, Func, Args). For example:

49> spawn(sample, add, [2, 3]).
<0.159.0>
50> Pid = spawn(sample, add, [2, 3]).
<0.161.0>
51> processes().
[<0.0.0>,<0.1.0>,<0.4.0>,<0.30.0>,<0.31.0>,<0.33.0>,
 <0.34.0>,<0.35.0>,<0.36.0>,<0.38.0>,<0.39.0>,<0.40.0>,
 <0.41.0>,<0.42.0>,<0.43.0>,<0.44.0>,<0.45.0>,<0.46.0>,
 <0.47.0>,<0.48.0>,<0.49.0>,<0.50.0>,<0.51.0>,<0.52.0>,
 <0.53.0>,<0.146.0>]
  • Message Passing is the only way in which processes can communicate.
  • Sending is asynchronous; receiving is selective and suspends the process until a matching message arrives. Messages that match a pattern are removed from the queue and handled, whereas unmatched messages are left in the queue. There is a tool called Wombat which can monitor Erlang System queues, and one can define a threshold.
  • When a process dies, its message queue and everything else associated with it also dies.
  • A process can have multiple receive...end blocks.
  • A module has many functions and each function can run in its own process. There is no way to tell from the physical structure which function runs in which process, except by reading the code.
  • spawn can create processes on other nodes, not just within the local node.
  • You don’t have to use a Pid; you can instead use a registered name, which is an alias for that Pid.
  • Return Success and Error value as a tuple. For success: {ok, Result}, for error: {error, Reason} and for timeout: {error, timeout}.
  • Don’t forget to remove messages from the queue after the timeout occurs.
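
The points above about selective receive, timeouts, and cleaning up the queue can be sketched roughly as follows. The module name mailbox_sketch and its three functions are hypothetical names for illustration.

```erlang
-module(mailbox_sketch).
-export([selective/0, call_with_timeout/1, flush/0]).

%% Selective receive: messages are picked by pattern, not by arrival order.
selective() ->
    self() ! {low, 1},
    self() ! {high, 2},
    High = receive {high, H} -> H end,  % skips {low, 1}, which stays queued
    Low = receive {low, L} -> L end,    % now picks up the queued message
    {High, Low}.

%% Send a request to a process that never replies, so the timeout is hit.
call_with_timeout(Ms) ->
    Pid = spawn(fun() -> receive _ -> ok end end),
    Ref = make_ref(),
    Pid ! {self(), Ref, ping},
    receive
        {Ref, Reply} -> {ok, Reply}
    after Ms ->
        {error, timeout}
    end.

%% Drop every message currently in the caller's queue, e.g. stale replies
%% that arrived after a timeout.
flush() ->
    receive
        _ -> flush()
    after 0 ->
        ok
    end.
```

In this sketch, selective() should return {2, 1} even though {low, 1} was sent first, and call_with_timeout(50) should return {error, timeout}. Tagging requests with make_ref() is a common way to recognise late replies; flush/0 is the blunt alternative that discards everything.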

So, the general Process Skeleton is –

  1. Initialize, Loop, Terminate.
  2. Managing state: You carry state along. Produce new state after each operation and carry that along.
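
A minimal sketch of that skeleton, assuming a simple counter as the state. The module name counter_sketch and the message shapes are hypothetical, chosen only to show initialize, loop, and terminate with state carried along as a function argument.

```erlang
-module(counter_sketch).
-export([start/0, increment/1, value/1, stop/1]).

%% Start the process; it runs init/0, then loops until told to stop.
start() ->
    spawn(fun() -> init() end).

%% Initialize: set up the starting state.
init() ->
    loop(0).

%% Loop: handle one message, then recurse with the (possibly new) state.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From, Ref} ->
            From ! {Ref, Count},
            loop(Count);
        stop ->
            terminate(Count)
    end.

%% Terminate: clean up; returning ends the process.
terminate(_Count) ->
    ok.

%% Client functions that hide the message formats.
increment(Pid) ->
    Pid ! increment,
    ok.

value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Count} -> {ok, Count}
    after 1000 ->
        {error, timeout}
    end.

stop(Pid) ->
    Pid ! stop,
    ok.
```

There is no mutable variable anywhere: the "current" count exists only as the argument of the next call to loop/1.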

For tracing, one can use observer:start(). to open a graphical interface for tracing and other details.

SUPERVISORS:

You can group related processes under a supervisor. Supervisors can be arranged in trees with a top-level supervisor. If you kill a supervisor, it kills all of its children. If you kill a child, the supervisor keeps restarting it. gen_server is a generic server implementation; it has callbacks that you plug in.
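
To show what "callbacks that you plug in" looks like, here is a minimal gen_server sketch. The module name counter_server_sketch and its client functions are hypothetical; it implements only the init/1, handle_call/3 and handle_cast/2 callbacks and is not placed under a supervisor.

```erlang
-module(counter_server_sketch).
-behaviour(gen_server).

%% Client API
-export([start_link/0, increment/1, value/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the server with an initial count of 0.
start_link() ->
    gen_server:start_link(?MODULE, 0, []).

%% Asynchronous request: returns ok immediately.
increment(Pid) ->
    gen_server:cast(Pid, increment).

%% Synchronous request: waits for the server's reply.
value(Pid) ->
    gen_server:call(Pid, value).

%% Callback: the returned term becomes the server state.
init(Initial) ->
    {ok, Initial}.

%% Callback for gen_server:call/2: reply with the count, keep the state.
handle_call(value, _From, Count) ->
    {reply, Count, Count}.

%% Callback for gen_server:cast/2: produce the new state.
handle_cast(increment, Count) ->
    {noreply, Count + 1}.
```

The generic part (the receive loop, replies, timeouts) lives in gen_server; the module only supplies how the state changes for each request.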

ERLANG ECOSYSTEM:

  • Test Tools – EUnit and Common Test
  • Erlang Build Tool – Rebar3 – It can run EUnit tests.
  • Languages – Many languages available to code in: Elixir, LFE
  • Mnesia – Distributed Database

USING ERLANG:

  • It takes time for new features to get into the language.
  • They typically take 1-3 releases to stabilize. Normally, there is 1 release per year.
  • It’s backward compatible.

Looking at all of the above, I can say that elephants can cope with Erlang, and it will make them dance too ;).

What is Erlang good at and not good at?

Its strengths are concurrency, fault tolerance, and scaling. It can very nicely be used for soft real-time systems. However, it is not efficient at numerical calculations, sharing data, etc. Use Erlang as a concurrent glue between parts of the same system, because different parts of the system have different requirements. You can use a Java program to talk to the database or do numerical calculations, and an Erlang System that front-ends the world and talks to Java behind the scenes.

Well, there is more to Erlang; you can start exploring on your own. This post was just meant to be an inspiration and a journal for me!