Wednesday, April 15, 2015

Tutorial: Literal as an array of characters

In this tutorial we will see how to read a literal character by character.

Let's consider a simple example: we will read each character of the literal and print it on the screen.

import extensions.

// ...

        var l := "Hello world".
        var i := 0.
        while (i < l length)
        [
            console write:(l@i).
            
            i := i + 1.
        ].

Let's make our example a little bit more difficult:

        l := "Привет Мир".
        i := 0.
        while (i < l length)
        [
            console write:(l@i).
            
            i := i + 1.
        ].

The first loop works but the second one fails.

Let's find out what breaks our code.

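Starting from 1.9.19, LiteralValue is UTF-8, so the length of a literal is counted in bytes, not in characters. Every Russian letter is encoded with two bytes, which makes the second literal almost twice as long as it looks. So why was the code not broken in the first loop? Every character of "Hello world" is ASCII and takes exactly one byte, so byte indices and character indices coincide. CharValue itself is UTF-32: it has enough room to store any Unicode character, so reading at the first byte of a character works. But when we read at the second byte of a two-byte character, CharValue raises an exception because the code is invalid.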
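The mismatch between bytes and characters can be reproduced outside ELENA. The following is a small Python sketch, used purely as an illustration; Python's str and bytes only stand in for LiteralValue, and the helper name starts_character is made up for this example. It compares character counts with UTF-8 byte counts for both literals and checks whether a given byte index can start a character.

```python
# Illustration only: Python's str/bytes stand in for ELENA's LiteralValue.
ascii_text = "Hello world"
cyrillic_text = "Привет Мир"

ascii_bytes = ascii_text.encode("utf-8")
cyrillic_bytes = cyrillic_text.encode("utf-8")

# ASCII: one byte per character, so byte and character indices coincide.
print(len(ascii_text), len(ascii_bytes))        # 11 11
# Cyrillic: two bytes per letter, one byte for the space.
print(len(cyrillic_text), len(cyrillic_bytes))  # 10 19


def starts_character(data: bytes, index: int) -> bool:
    """Return True if data[index] is not a UTF-8 continuation byte."""
    # Continuation bytes have the bit pattern 10xxxxxx and can never
    # be the first byte of an encoded character.
    return (data[index] & 0xC0) != 0x80


print(starts_character(cyrillic_bytes, 0))  # True
print(starts_character(cyrillic_bytes, 1))  # False
```

Index 1 falls in the middle of the first letter, which is the position where the second loop fails.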

Note that it would work well if "l" were a WideLiteralValue. But we would have a similar problem with Chinese symbols.

Fortunately, we can easily fix the problem by using the CharValue.length method. It returns how many bytes it takes to encode the symbol.

        i := 0.
        while (i < l length)
        [
            console write:(l@i).
            
            i := i + (l@i) length.
        ].
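The same idea of advancing the index by the encoded length of each character can be sketched in Python. This is an illustration under assumptions, not ELENA's implementation: utf8_length and characters are hypothetical helpers, and utf8_length plays the role that CharValue.length plays in the loop above.

```python
def utf8_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:           # 0xxxxxxx: ASCII
        return 1
    if lead >> 5 == 0b110:    # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:   # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:  # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def characters(data: bytes) -> list:
    """Split UTF-8 bytes into characters, stepping by each character's length."""
    result = []
    i = 0
    while i < len(data):
        n = utf8_length(data[i])
        result.append(data[i:i + n].decode("utf-8"))
        i += n  # jump to the first byte of the next character
    return result


print("".join(characters("Привет Мир".encode("utf-8"))))  # Привет Мир
```

Because the index always lands on the first byte of a character, the decoder never sees a stray continuation byte.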

Or we could use an enumerator:

        var enum := l enumerator.
        while (enum next)
        [
            console write:(enum get).
        ].

And the simplest way is to use the control helper from extensions:

        control run:l forEach: (:ch) [ console write:ch. ].
