Thursday, February 26, 2015

ELENA 2.x: messages, types, dispatching...

In this post I would like to discuss in detail how ELENA objects interact with each other, the message structure, the concept of types, and the "compiler magic" used to improve code performance.
In a dynamic language the type of an object (its class) cannot, in most cases, be resolved at compile time. So we need a way to resolve the message mapping for every object before we call the appropriate method. The simplest way would be to create a method table containing pairs of a message hash code and a reference to the executable code (the message handler, i.e. the method). The message hash code can be calculated by the compiler, so the following code:

x add:y
will be compiled as

  <push> y
  <push> x
  <set-message> message-id-of:add
  <get> x.class
  <find-message>
  <if-found>
     <call-message-handler>
  <else>
     <throw-exception> system'MethodNotFound
  <end>
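The lookup sequence above can be modelled in a few lines of Python. This is only a rough sketch, not ELENA's actual runtime: the names VClass, VObject, send and message_id_of are hypothetical, and a plain dictionary stands in for the method table.

```python
# Hypothetical model of a per-class method table keyed by message id.
class MethodNotFound(Exception):
    """Stands in for system'MethodNotFound."""

MESSAGE_IDS = {}  # message name -> id; in ELENA the compiler would compute this


def message_id_of(name):
    # Assign ids on first use; a real compiler would do this at compile time.
    return MESSAGE_IDS.setdefault(name, len(MESSAGE_IDS))


class VClass:
    def __init__(self, name, methods):
        self.name = name
        # The method table: message id -> handler.
        self.table = {message_id_of(m): h for m, h in methods.items()}


class VObject:
    def __init__(self, cls, value):
        self.cls = cls
        self.value = value


def send(receiver, message, *args):
    msg = message_id_of(message)            # <set-message>
    handler = receiver.cls.table.get(msg)   # <get> x.class, <find-message>
    if handler is None:                     # <else> branch
        raise MethodNotFound(message)
    return handler(receiver, *args)         # <call-message-handler>


int_class = VClass(
    "IntNumber",
    {"add": lambda self, other: VObject(self.cls, self.value + other.value)},
)

x = VObject(int_class, 2)
y = VObject(int_class, 3)
result = send(x, "add", y)   # models: x add:y
```

Sending a message that the class does not declare, such as send(x, "subtract", y), raises MethodNotFound in this model.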
Resolving the message entry in the table can be done by the class itself. This gives the compiler the possibility to tune the operation; for big classes a binary search may be used, for example. So a special method, the dispatcher, can be declared in the super class, and it will always be the first entry in the method table (note that a custom dispatcher is used in many group objects: system'Variable, system'dynamic'Extension, ...). In this case our code will look like this:

  <push> y
  <push> x
  <set-message> message-id-of:add
  <get> x.class
  <select-dispatcher>
  <call-message-handler>
In the simplest case a message hash code can be an index in the global message table.
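As an illustration of a class-owned dispatcher, here is a small Python sketch in which message ids are plain integers (as if they were indices in a global message table) and the default dispatcher uses binary search over a sorted table. The names VClass, send and binary_search_dispatcher, as well as the id values, are made up for this example.

```python
import bisect


class MethodNotFound(Exception):
    """Stands in for system'MethodNotFound."""


# Hypothetical message ids, e.g. indices in a global message table.
ADD, SUBTRACT, MULTIPLY = 0, 1, 2


def binary_search_dispatcher(cls, message_id):
    # Find the handler in the sorted id list of the class.
    i = bisect.bisect_left(cls.ids, message_id)
    if i < len(cls.ids) and cls.ids[i] == message_id:
        return cls.handlers[i]
    raise MethodNotFound(message_id)


class VClass:
    def __init__(self, methods, dispatcher=binary_search_dispatcher):
        entries = sorted(methods.items())
        self.ids = [mid for mid, _ in entries]
        self.handlers = [handler for _, handler in entries]
        # Conceptually the first entry of the method table.
        self.dispatcher = dispatcher


def send(cls, receiver, message_id, *args):
    # <select-dispatcher>: the caller no longer searches the table itself.
    handler = cls.dispatcher(cls, message_id)
    # <call-message-handler>
    return handler(receiver, *args)


int_class = VClass({ADD: lambda a, b: a + b, MULTIPLY: lambda a, b: a * b})
total = send(int_class, 2, ADD, 3)
product = send(int_class, 2, MULTIPLY, 3)
```

Because the caller only ever invokes the dispatcher, a class is free to replace it, for instance with one that forwards every message to a wrapped object.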

Let's review our example again and assume that x is a system'IntNumber and y is a system'LongNumber. Because x and y are dynamic objects, it is not possible to sum them directly. So the IntNumber.add[1] method should ask the operand about its type. Alternatively we could send a new message to y, providing it with the identity of x, e.g.
   add : y = y addToInt:$self.
So LongNumber knows the operand type and can perform the operation. The same applies to the multiply operation, where multiplyByInt should be sent, and so on. As we see, the message may contain information about the operand "type" (note that it is not a real object type such as system'IntNumber, but a protocol, a convention between the objects). All this led me to the idea of introducing a message structure. The message can be split into two parts: a verb describing the operation and a subject describing the operation parameter. So in our case the message addToInt may be replaced with add&int (where "add" is the verb and "int" is the subject). If we somehow bind the object to its subject (e.g. as a field in the class header) we could dynamically dispatch the parameter:
  <push> y
  <push> x
  <set-verb> verb-id-of:add
  <add-subject> y.class.type
  <select-dispatcher>
  <call-message-handler>
Unfortunately this works only in a few cases (when a message has only one parameter). In most cases the solution becomes too complex, so after a while I had to give up this approach and take another way.
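For the single-parameter case, the verb-plus-subject scheme can be sketched in Python as follows. This is an assumption-laden model rather than ELENA code: the subject is stored as a class attribute (standing in for a field in the class header), the handlers dictionaries and the send function are hypothetical, and the result types chosen for mixed operands are only for illustration.

```python
class MethodNotFound(Exception):
    """Stands in for system'MethodNotFound."""


class VObject:
    subject = None   # bound subject, as if stored in the class header
    handlers = {}    # full message ("verb&subject") -> handler

    def __init__(self, value):
        self.value = value


class IntNumber(VObject):
    subject = "int"


class LongNumber(VObject):
    subject = "long"


# Each class lists the qualified messages it understands.
IntNumber.handlers = {
    "add&int": lambda self, y: IntNumber(self.value + y.value),
    "add&long": lambda self, y: LongNumber(self.value + y.value),
}
LongNumber.handlers = {
    "add&int": lambda self, y: LongNumber(self.value + y.value),
    "add&long": lambda self, y: LongNumber(self.value + y.value),
}


def send(receiver, verb, argument):
    # <set-verb> verb-id-of:add, then <add-subject> y.class.type
    message = verb + "&" + type(argument).subject
    handler = type(receiver).handlers.get(message)
    if handler is None:
        raise MethodNotFound(message)
    return handler(receiver, argument)


# x add:y with x an IntNumber and y a LongNumber resolves to add&long.
result = send(IntNumber(2), "add", LongNumber(3))
```

With two or more parameters the qualified message would have to combine the subjects of all arguments, which hints at why the approach does not scale.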
Alternatively the message can be split into three parts: the generic action, the signature and the parameter count. The signature can in turn be split into several subjects describing the parameters, e.g. the message insert&index&literal[2] is an insert action with the signature index&literal and two parameters. Message dispatching is possible when the signature is the operand "type" and the parameter count is 1.
This allows us to perform several operations on the message parts. We could use a signature symbol to qualify the generic message or dispatch an action with a specific signature.
For example dispatching is used in cast methods:
   cast : aVerb &to:aTarget = aTarget::aVerb short:$self.
and it will be used in the following code:
   anObject cast:%add &to:$self.
where %add is a generic verb. In the cast method we dispatch it with a particular subject, i.e. we dynamically add the subject to the generic message.
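The three-part message and the qualification of a generic verb can be modelled like this in Python. The textual form action&subject...[count] follows the notation used in this post; the Message type, parse_message and qualify are hypothetical helpers and do not reflect how the ELENA compiler actually encodes messages.

```python
import re
from typing import NamedTuple, Tuple


class Message(NamedTuple):
    action: str                 # the generic action (verb)
    signature: Tuple[str, ...]  # subjects describing the parameters
    param_count: int

    def __str__(self):
        return "&".join((self.action,) + self.signature) + f"[{self.param_count}]"


_PATTERN = re.compile(r"^([A-Za-z_]\w*)((?:&[A-Za-z_]\w*)*)\[(\d+)\]$")


def parse_message(text):
    """Split e.g. 'insert&index&literal[2]' into its three parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a message: {text!r}")
    action, subjects, count = match.groups()
    signature = tuple(s for s in subjects.split("&") if s)
    return Message(action, signature, int(count))


def qualify(generic, subject):
    """Attach a subject to a generic one-parameter verb."""
    if generic.signature or generic.param_count != 1:
        raise ValueError("only a generic verb with one parameter can be qualified")
    return Message(generic.action, (subject,), 1)


insert = parse_message("insert&index&literal[2]")
add_int = qualify(parse_message("add[1]"), "int")
```

Here str(add_int) gives add&int[1], which corresponds to what the cast method does when it dispatches the generic verb with a particular subject.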
Though the "typeless" nature of dynamic languages is a good thing, in many cases we still have situations where we need a specific class. For example, in the system'Array constructor the parameter should contain the length. In a strongly typed language this could be guaranteed at compile time; in ELENA we have to check the type at run time. So it would be convenient if we could create a special agreement between the method and the caller to guarantee the parameter role (or its type) without the need to check it every time in the method itself.
Since the message signature consists of subjects, they can be used to describe not only the parameter's role but in some cases its type as well. In most cases a subject may be used implicitly, without declaring it. But if we would like to use it as a protocol, it should be declared explicitly. For example, we could declare a new subject "enumerable", which means that the object passed under this role supports the enumerator message. There is no way to guarantee that the object actually supports this protocol; it is up to the programmer to take care of it. But in some cases we can enforce it, especially for data types. In that case we associate the subject (or type) with a class. As a result only an instance of this class can be passed. But ELENA is still a dynamic language, so how can we do this without introducing actual types? The solution I found is that every time a strongly typed parameter is required, the compiler calls a typecast message, get&<type-subject> (e.g. get&int, get&literal). So the only place where compile-time typecasting has to be performed is inside a get&<type-subject> method.
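A loose Python analogy of the typecast message: the decorator below plays the role of the compiler, sending get_<subject> (standing in for get&<type-subject>) to the argument before the method body runs. The typed decorator, the IntVariable wrapper and this simplified Array are all hypothetical and exist only for the illustration.

```python
def typed(subject):
    """Models the typecast call the compiler inserts for a typed parameter."""
    def wrap(method):
        def call(self, argument):
            # Send get&<subject> to the argument; an object that does not
            # support the message fails here, before the method body runs.
            cast = getattr(argument, "get_" + subject)
            return method(self, cast())
        return call
    return wrap


class IntNumber:
    def __init__(self, value):
        self.value = value

    def get_int(self):
        # The class bound to the "int" subject simply returns itself.
        return self


class IntVariable:
    """Hypothetical wrapper that forwards the typecast to its content."""

    def __init__(self, content):
        self.content = content

    def get_int(self):
        return self.content.get_int()


class Array:
    @typed("int")
    def __init__(self, length):
        # The body may rely on length being an IntNumber without checking.
        self.items = [None] * length.value


direct = Array(IntNumber(3))
wrapped = Array(IntVariable(IntNumber(2)))
```

Passing an object that does not answer get_int, such as a plain Python string, raises AttributeError in this model, which plays the role of a method-not-found error.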
Though subjects were initially designed only for message parameters, they can also be used to provide the "type" of a variable (local or class) and of a method result.
Strong types can give us a way to increase performance as well. If the type class is sealed or limited, the compiler can resolve the message at compile time. For example, system'Enumerator is a limited class and system'LiteralEnumerator inherits from it. Both of them are of the enumerator "type". So if we declare an enumerator variable, all operations on it will be resolved at compile time, because system'LiteralEnumerator may only override existing methods without adding new ones.
If the class is a structure (contains raw data) and is sealed, its type can be used for stack-allocated variables. For example, in the following code:
        int x := self int.
        int y := anOperand int.
        
        int z := x / y.
        z := z * y.
x, y and z are stack-allocated system'IntNumber instances. As a result the basic arithmetic operations can be resolved at compile time without the need to create a new dynamic object for every operation.
Despite the introduction of the "type" concept, ELENA is still a dynamic language. Strong types can be used in performance-critical parts of an application (such as arithmetic operations), but in all other cases ELENA is a 100% dynamic language.
