Calling to the JVM from Haskell: Some benchmarks

11 June 2020 — by Facundo Domínguez

In our previous posts about inline-java, we presented it as a tool for interoperating with the huge Java ecosystem. Because the Haskell community is smaller, it is not uncommon to face problems for which the Haskell ecosystem has no library. In these situations, a multi-language solution can be a good compromise.

One trade-off is performance, since communicating across language boundaries always has a cost. In this post, I want to argue that inline-java can be a good solution for integrating Haskell and Java from a performance standpoint. I do this by benchmarking a concrete example and discussing in which situations this kind of integration becomes affordable compared to projects that go full Haskell or full Java for their implementations.

The benchmarks

FrameworkBenchmarks is a project that gathers benchmarks of HTTP servers for a large set of implementations in different languages. At the time of this writing, there are six types of tests, but a given implementation doesn’t need to implement all six of them.

I contributed an implementation that pairs a fast HTTP server written in Java with a handler written in Haskell, which the server invokes for every request. The handler, in turn, uses inline-java to interact with the HTTP server. The implementation is called wizzardo-inline and it is based on wizzardo-http, a fast full-Java HTTP server.

I implemented three of the test types:

The Plaintext test, which returns a fixed plaintext response for each request, such as "Hello, World!".
The JSON serialization test, which returns a fixed JSON response, where each request must encode the same JSON object over and over again, for example, "{ \"message\" : \"Hello, World!\" }".
And finally, the Single query test, which for every request provides a random record retrieved from a database, encoded again as a JSON object, such as "{ \"id\" : 1234, \"randomNumber\" : 6678 }".

These benchmarks measure the throughput of the server, that is, the number of requests per second that each implementation could deliver. That means that higher numbers are better.

Because the benchmarks are run multiple times with varying amounts of concurrency, the table below shows the highest throughput achieved by each implementation. Additionally, it shows the throughput of wizzardo-inline as a percentage of that achieved by wizzardo-http.

Throughput (requests/sec)	wizzardo-http	wizzardo-inline
Plaintext	669946	203347 (30%)
JSON serialization	136293	86638 (63%)
Single query	51646	38772 (75%)

To understand the trend in the table, it must be noted that inline-java is expected to add a constant overhead per request, since we need to invoke a couple of methods of the JVM to marshal the response from Haskell to Java.

In all cases, one has to convert a Haskell ByteString to a Java byte array. The code to convert the ByteString to a byte array is in the jvm package, and does the following:

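instance Reflect ByteString where
  -- Borrow the ByteString's underlying buffer (pointer and length) without copying.
  reflect bs = BS.unsafeUseAsCStringLen bs $ \(content, n) -> do
      -- Allocate a Java byte array of n elements.
      arr <- newByteArray (fromIntegral n)
      -- Ask the JVM to copy n bytes from the buffer into the array, starting at index 0.
      setByteArrayRegion arr 0 (fromIntegral n) content
      return arr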

After obtaining a C-style buffer with the bytes from the ByteString, we create a byte array object with newByteArray, and then ask the JVM to copy the bytes from our buffer with setByteArrayRegion.
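
The same allocate-then-copy pattern can be tried without a JVM at hand. The following is only a sketch under stated assumptions: copyToFreshBuffer is a hypothetical helper that is not part of the jvm package, mallocBytes stands in for newByteArray, and copyBytes stands in for setByteArrayRegion. It illustrates the shape of the operation, not its cost inside the JVM.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafePackMallocCStringLen, unsafeUseAsCStringLen)
import Foreign.Marshal.Alloc (mallocBytes)
import Foreign.Marshal.Utils (copyBytes)

-- | Hypothetical stand-in for 'reflect': copy the bytes of a ByteString
-- into a freshly allocated buffer, the way the instance above copies
-- them into a Java byte array.
copyToFreshBuffer :: BS.ByteString -> IO BS.ByteString
copyToFreshBuffer bs = unsafeUseAsCStringLen bs $ \(content, n) -> do
  -- Allocate the destination (plays the role of newByteArray).
  buf <- mallocBytes n
  -- Copy n bytes from the borrowed buffer (plays the role of setByteArrayRegion).
  copyBytes buf content n
  -- Wrap the malloc'd copy as a ByteString; it is freed when garbage collected.
  unsafePackMallocCStringLen (buf, n)

main :: IO ()
main = do
  let original = BC.pack "Hello, World!"
  copy <- copyToFreshBuffer original
  -- The copy holds the same bytes as the original.
  print (copy == original)
```

In both versions the source buffer is only borrowed for the duration of the callback, so the copy has to be finished before unsafeUseAsCStringLen returns.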

The JVM does some extra bookkeeping when copying bytes. The destination could be moved by the Java garbage collector during copying, so arrangements are necessary for the operation to complete safely. After obtaining the byte array, a Java method is invoked to feed it back to the server framework. In the case of the Single query test, an extra Java method must be called in order to kick off the database query.

The overhead of these conversions and method calls amounts to roughly 4 microseconds. This constant overhead is very noticeable for fast requests. For instance, let’s say that wizzardo-http can execute a Plaintext request in 3 microseconds. Then wizzardo-inline needs 7 microseconds, which is more than twice the time.

A more interesting test is Single query, where one has an additional database call. Supposing that wizzardo-http takes 15 microseconds to serve a request, wizzardo-inline needs 19 microseconds, which is not nearly as onerous as it was in the Plaintext test.

In general, the less time it takes to serve a request, the more weight the overhead of making calls to the JVM has on the comparisons. Fast requests like those of Plaintext have durations which are close to those of JVM calls from Haskell. The bottleneck in the requests of Single query is in the database access, and calling to the JVM becomes more affordable in that case.
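
The arithmetic behind this argument can be written down as a small sketch. It assumes, as a simplification, that throughput is inversely proportional to the time spent per request. The function name relativeThroughput is made up for illustration, and the request times are the illustrative ones used above, not measurements.

```haskell
-- | Fraction of the baseline throughput that remains when a constant
-- overhead is added to every request. Both arguments are in microseconds.
relativeThroughput :: Double -> Double -> Double
relativeThroughput baseline overhead = baseline / (baseline + overhead)

main :: IO ()
main = do
  -- Assumed constant cost of the JVM calls per request, in microseconds.
  let overhead = 4
  -- Plaintext: 3 us baseline, so 3 / 7 of the baseline throughput remains.
  putStrLn ("Plaintext:    " ++ show (relativeThroughput 3 overhead))
  -- Single query: 15 us baseline, so 15 / 19 of the baseline throughput remains.
  putStrLn ("Single query: " ++ show (relativeThroughput 15 overhead))
```

With these assumed numbers the fast request keeps roughly 43% of the baseline throughput and the slow one roughly 79%, which is the same trend the table shows: the longer the request, the less the constant overhead matters.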

Provisionally, we ran the benchmarks on an m4.2xlarge instance on Amazon’s EC2 service. We expect the FrameworkBenchmarks to provide definitive measurements when the next round of benchmarks is executed by TechEmpower¹, the software consultancy firm running the project.

Final remarks

While preparing these benchmarks, I kept in mind that the FrameworkBenchmarks include an implementation for warp, a full-Haskell HTTP server. The following table compares it with wizzardo-inline.

Throughput (requests/sec)	wizzardo-inline	warp
Plaintext	203347	205573 (101%)
JSON serialization	86638	107381 (124%)
Single query	38772	20444 (53%)

As we can see, the throughput of warp is comparable or better on the simplest tests, but worse on the Single query one. I don’t claim wizzardo-http to be superior to warp, because I haven’t analyzed the performance differences. But replace warp with whatever complex full-Haskell technology X you need to consider for your project, and inline-java can be a factor in deciding when or if to engage in a detailed analysis.

The more complex a solution is, the more expensive it is to implement and to optimize it for the relevant cases. When the budget and the time cannot afford a reimplementation, inline-java can be a cheaper alternative. Furthermore, depending on the bottlenecks of the solution at hand, the overhead of crossing language runtimes may become negligible, turning the multi-language integration path into the ideal one.

¹ Between the writing and publishing of this post, TechEmpower published a new round of benchmark results. The numbers from TechEmpower are different, but still consistent with the claims made here.

Behind the scenes

Facundo Domínguez

Facundo is a software engineer supporting development and research projects at Tweag. Prior to joining Tweag, he worked in academia and in industry, on a varied assortment of domains, with an overarching interest in programming languages.

This article is licensed under a Creative Commons Attribution 4.0 International license.