Ginger: It's Alive!

The new best language for computing fibonacci numbers.

This post is part of a series:
Previously: The Syntax of Ginger

As a kind of Christmas present to myself I took a whole week off of work specifically to dedicate myself to working on ginger.

My concrete goal was to be able to run a ginger program to compute any Nth fibonacci number, a goal I chose because it would require the implementation of conditionals, some kind of looping or recursion, and basic addition/subtraction. In other words, it would require all the elements which comprise a Turing complete language.

And you know what? I actually succeeded!

The implementation can be found here. At this point ginger is an interpreted language running in a golang-based VM. The dream is for it to be self-hosted on LLVM (and other platforms after), but as an intermediate step to that I decided on sticking to what I know (golang) rather than having to learn two things at once.

In this post I’m going to describe the components of this VM at a high level, show a quick demo of it working, and finally talk about the roadmap going forward.

Graph

The core package of the whole project is the graph package. This package implements a generic directed graph datastructure.

The generic part is worth noting; I was able to take advantage of go’s new generics which are currently in beta. I’d read quite a bit on how the generic system would work even before the beta was announced, so I was able to hit the ground running and start using them without much issue.

Ginger’s unique graph datastructure has been discussed in previous posts in this series quite a bit, and this latest implementation doesn’t deviate much at a high level. Below are the most up-to-date core datatypes and functions which are used to construct ginger graphs:


// Value is any value which can be stored within a Graph. Values should be
// considered immutable, ie once used with the graph package their internal
// value does not change.
type Value interface {
	Equal(Value) bool
	String() string
}

// OpenEdge consists of the edge value (E) and source vertex value (V) of an
// edge in a Graph. When passed into the AddValueIn method a full edge is
// created. An OpenEdge can also be sourced from a tuple vertex, whose value is
// an ordered set of OpenEdges of this same type.
type OpenEdge[E, V Value] struct { ... }

// ValueOut creates a OpenEdge which, when used to construct a Graph, represents
// an edge (with edgeVal attached to it) coming from the vertex containing val.
func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V]

// TupleOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// comprised of the given ordered-set of input edges.
func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V]

// Graph is an immutable container of a set of vertices. The Graph keeps track
// of all Values which terminate an OpenEdge. E indicates the type of edge
// values, while V indicates the type of vertex values.
type Graph[E, V Value] struct { ... }

// AddValueIn takes a OpenEdge and connects it to the Value vertex containing
// val, returning the new Graph which reflects that connection.
func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V]

// ValueIns returns, if any, all OpenEdges which lead to the given Value in the
// Graph (ie, all those added via AddValueIn).
func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]

The current Graph implementation is incredibly inefficient, it does a lot of copying, looping, and equality checks which could be optimized out one day. That’s going to be a recurring theme of this post, as I had to perform a balancing act between actually reaching my goal for the week while not incurring too much tech debt for myself.

MapReduce

There’s a final operation I implemented as part of the graph package: MapReduce. It’s a difficult operation to describe, but I’m going to do my best in this section for those who are interested. If you don’t understand it, or don’t care, just know that MapReduce is a generic tool for transforming graphs.

For a description of MapReduce we need to present an example graph:

        +<--b---
        +       \
X <--a--+<--c----+<--f-- A
        +               /
        +      +<---g---
        +<--d--+
               +<---h---
                        \
Y <---------e----------- B

Plus signs indicate tuples, and lowercase letters are edge values while upper case letters are vertex values. The pseudo-code to construct this graph in go might look like:

    g := new(Graph)

    fA := ValueOut("f", "A")

    g = g.AddValueIn(
        "X",
        TupleOut(
            "a",
            TupleOut("b", fA),
            TupleOut("c", fA),
            TupleOut(
                "d",
                ValueOut("g", "A"),
                ValueOut("h", "B"),
            ),
        ),
    )

    g = g.AddValueIn("e", "B")

As can be seen in the code, MapReduce’s first argument is an OpenEdge, not a Graph. Fundamentally MapReduce is a reduction of the dependencies of a particular value into a new value; to reduce the dependencies of multiple values at the same time would be equivalent to looping over those values and calling MapReduce on each individually. Having MapReduce only deal with one edge at a time is more flexible.

So let’s focus on a particular OpenEdge, the one leading into X (returned by TupleOut("a", etc...). MapReduce is going to descend into this OpenEdge recursively, in order to first find all value vertices (ie the leaf vertices, those without any children of their own).

At this point MapReduce will use its second argument, the mapVal function, which accepts a value of one type and returns a value of another type. This function is called on each value from every value vertex encountered. In this case both A and B are connectable from X, so mapVal will be called on each only once. This is the case even though A is connected to multiple times (once with an edge value of f, another with an edge value of b). mapVal only gets called once per vertex, not per connection.

With all values mapped, MapReduce will begin reducing. For each edge leaving each value vertex, the reduceEdge function is called. reduceEdge accepts as arguments the edge value of the edge and the mapped value (not the original value) of the vertex, and returns a new value of the same type that mapVal returned. Like mapVal, reduceEdge will only be called once per edge. In our example, <--f--A is used twice (b and c), but reduceEdge will only be called on it once.

With each value vertex edge having been reduced, reduceEdge is called again on each edge leaving those edges, which must be tuple edges. An array of the values returned from the previous reduceEdge calls for each of the tuples’ input edges is used as the value argument in the next call. This is done until the OpenEdge is fully reduced into a single value.

To flesh out our example, let’s imagine a mapVal which returns the input string repeated twice, and a reduceEdge which returns the input values joined with the edge value, and then wrapped with the edge value (eg reduceEdge(a, [B, C]) -> aBaCa).

Calling MapReduce on the edge leading into X will then give us the following calls:

# Map the value vertices

mapVal(A) -> AA
mapVal(B) -> BB

# Reduce the value vertex edges

reduceEdge(f, [AA]) -> fAAf
reduceEdge(g, [AA]) -> gAAg
reduceEdge(h, [BB]) -> hBBh

# Reduce tuple vertex edges

reduceEdge(b, [fAAf]) -> bfAAfb
reduceEdge(c, [fAAf]) -> cfAAfc
reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd

reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda

Beautiful, exactly what we wanted.

MapReduce will prove extremely useful when it comes time for the VM to execute the graph. It enables the VM to evaluate only the values which are needed to produce an output, and to only evaluate each value once no matter how many times it’s used. MapReduce also takes care of the recursive traversal of the Graph, which simplifies the VM code significantly.

gg

With a generic graph implementation out of the way, it was then required to define a specific implementation which could be parsed from a file and later used for execution in the VM.

The file extension used for ginger code is .gg, as in “ginger graph” (of course). The package name for decoding this file format is, therefore, also called gg.

The core datatype for the gg package is the Value, since the graph package takes care of essentially everything else in the realm of graph construction and manipulation. The type definition is:

// Value represents a value which can be serialized by the gg text format.
type Value struct {

	// Only one of these fields may be set
	Name   *string
	Number *int64
	Graph  *Graph

	// Optional fields indicating the token which was used to construct this
	// Value, if any.
	LexerToken *LexerToken
}

type Graph = graph.Graph[Value, Value] // type alias for convenience

Note that it’s currently only possible to describe three different types in a gg file, and one of them is the Graph! These are the only ones needed to implement a fibonacci function, so they’re all I implemented.

The lexing/parsing of gg files is not super interesting, you can check out the package code for more details. The only other thing worth noting is that, for now, all statements are required to end with a ;. I had originally wanted to be less strict with this, and allow newlines and other tokens to indicate the end of statements, but it was complicating the code and I wanted to move on.

Another small thing worth noting is that I decided to make each entire .gg file implicitly define a graph. So you can imagine each file’s contents wrapped in curly braces.

With the gg package out of the way I was able to finally parse ginger programs! The following is the actual, real-life implementation of the fibonacci function (though at this point it didn’t actually work, because the VM was still not implemented:

out = {

    decr = { out = add < (in; -1;); };

    n = tupEl < (in; 0;);
    a = tupEl < (in; 1;);
    b = tupEl < (in; 2;);

    out = if < (
        isZero < n;
        a;
        recur < (
            decr < n;
            b;
            add < (a;b;);
        );
    );

} < (in; 0; 1;);

VM

Finally, the meat of all this. If the graph and gg packages are the sturdy, well constructed foundations of a tall building, then the vm package is the extremely long, flimsy stick someone propped up vertically so they could say they built a structure of impressive height.

In other words, it’s very likely that the current iteration of the VM will not be long for this world, and so I won’t waste time describing it in super detail.

What I will say about it is that within the vm package I’ve defined a new Value type, which extends the one defined in gg. The necessity of this was that there are types which cannot be represented syntactically in a .gg file, but which can be used as values within a program being run.

The first of these is the Operation, which is essentially a first-class function. The VM will automatically interpret a graph as an Operation when it is used as an edge value, as has been discussed in previous posts, but there are also built-in operations (like if and recur) which cannot be represented as datastructures, and so it was necessary to introduce a new in-memory type to properly represent operations.

The second is the Tuple type. This may seem strange, as ginger graphs already have a concept of a tuple. But the ginger graph tuple is a vertex type, not a value type. The distinction is small, but important. Essentially the graph tuple is a structural element which describes how to create a tuple value, but it is not yet that value. So we need a new Value type to hold the tuple once it has been created during runtime.

Another thing worth describing about the vm package, even though I think they might change drastically, are Thunks:

// Thunk is returned from the performance of an Operation. When called it will
// return the result of that Operation having been called with the particular
// arguments which were passed in.
type Thunk func() (Value, error)

The term “thunk” is borrowed from Haskell, which I don’t actually know so I’m probably using it wrong, but anyway…

A thunk is essentially a value which has yet to be evaluated; the VM knows exactly how to evaluate it, but it hasn’t done so yet. The primary reason for their existence within ginger is to account for conditionals, ie the if operation. The VM can’t evaluate each of an if’s arguments all at once, it must only evaluate the first argument (to obtain a boolean), and then based on that evaluate the second or third argument.

This is where graph.MapReduce comes in. The VM uses graph.MapReduce to reduce each edge in a graph to a Thunk, where the Thunk’s value is based on the operation (the edge’s value) and the inputs to the edge (which will themselves be Thunks). Because each Thunk represents a potential value, not an actual one, the VM is able to completely parse the program to be executed (using graph.MapReduce) while allowing conditionals to still work correctly.

EvaluateEdge is where all that happens, if you’re interested, but be warned that the code is a hot mess right now and it’s probably not worth spending a ton of time understanding it as it will change a lot.

A final thing I’ll mention is that the recur operation is, I think, broken. Or probably more accurately, the entire VM is broken in a way which prevents recur from working correctly. It does produce the correct output, so I haven’t prioritized debugging it, but for any large number of iterations it takes a very long time to run.

Demo

Finally, to show it off! I put together a super stupid eval binary which takes two arguments: a graph to be used as an operation, and a value to be used as an argument to that operation. It doesn’t even read the code from a file, you have to cat it in.

The README documents how to run the demo, so if you’d like to do so then please clone the repo and give it a shot! It should look like this when you do:

# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8
21

You can put any number you like instead of 8, but as mentioned, recur is broken so it can take a while for larger numbers.

Next Steps

The following are all the things I’d like to address the next time I work on ginger:

  • gg

    • Allow for newlines (and ) and }) to terminate statements, not just ;.

    • Allow names to have punctuation characters in them (maybe?).

    • Don’t read all tokens into memory prior to parsing.

  • vm

    • Fix recur.

    • Implement tail call optimization.

  • General

    • A bit of polish on the eval tool.

    • Expose graph creation, traversal, and transformation functions as builtins.

    • Create plan (if not actually implement it yet) for how code will be imported from one file to another. Namespacing in general will fall into this bucket.

    • Create plan (if not actually implement it yet) for how users can extend/replace the lexer/parser.

I don’t know when I’ll get to work on these next, ginger will come back up in my rotation of projects eventually. It could be a few months. In the meantime I hope you’re as excited about this progress as I am, and if you have any feedback I’d love to hear it.

Thanks for reading!

If you liked this post, consider checking out other posts in the series:
Previously: The Syntax of Ginger