A new ecosystem for Haskell: the JVM

17 October 2016 — by Mathieu Boespflug, Alp Mestanogullari

By now, Haskell has first class support for seamlessly embedding foreign code into source files and casually call anything in C (via inline-c) or R (via inline-r), let alone that whole programs can also be compiled down to in-browser JavaScript, thanks to GHCJS. Today the interoperability story for Haskell is getting better still: we’re announcing the addition of a new set of languages into the mix. With jvm, you can call any method known to the JVM from Haskell. With inline-java, you can moreover call these methods in Java syntax, embedded in your source files.

Not that this was particularly our intention - promise!

inline-java and friends just fell out naturally from our work on sparkle…

To give you a taste of what it’s like to program this way, here’s an obligatory “Hello World!” in Haskell, but with a twist: we call the Swing GUI toolkit to display our message to the world in a graphical dialog box.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Data.Int
import Language.Java
import Language.Java.Inline

main :: IO Int32
main = withJVM [] $ do
    message <- reflect "Hello World!"
    [java| { javax.swing.JOptionPane.showMessageDialog(null, $message);
             return 0; } |]

In short, it’s now possible to write Java programs that call into Haskell with near trivial overhead, as we demonstrated previously with sparkle, or indeed Haskell programs that call into any of the hundreds of thousands of JVM packages available publicly and boundless custom enterprise developments.

How it works

The key enabler to talking to Java and all the other JVM languages from Haskell is that Haskell speaks C and the JVM speaks C. That is, both Haskell and the JVM make it possible to call C functions and have C functions call them. In both cases, this is done as part of their respective foreign function interfaces (FFI). So we have a lingua franca for both languages: to move from Haskell to Java or vice versa, go through C first. Each language has their own custom calling convention anyways, so some small amount of glue code to mediate between the two is inevitable.

In fact, in the case of the JVM, bytecode is compiled “just-in-time” or perhaps even not at all. Fortunately, that’s not something we have to worry about: the JVM’s standard interface from C, called the Java Native Interface (JNI), encapsulates all the nitty-gritty detail of invoking methods behind a simple interface. As a first step, we wrote near complete bindings to all of the JNI, using inline-c under the hood for better safety.

Calling into the JVM, the hard way

We could just expose the raw JNI API in Haskell and call it a day. Using raw JNI calls to invoke say a static Java method called foo, which takes an int and a boolean and returns some object, goes something like this:

import Foreign.JNI

callFoo = do
  klass <- findClass "some/Java/Class"    -- JNI uses '/' instead of '.'...
  method <- getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;"
  callStaticObjectMethod klass method [JInt 0, JBoolean 1]

Because the JVM allows overloaded method names, when grabbing a handle to invoke a method, you’ll need to specify a type signature to disambiguate which method you really want to call. But the JNI was purposefully designed independently of Java’s syntax, to the point where even class names are written differently. The JNI syntax for type signatures is optimized for speed of parsing and compactness, not legibility. So constructing these type signatures by hand to invoke JVM methods via raw JNI calls is rather error prone. That’s why we wrote the jvm package, a toolkit for invoking JVM methods more conveniently and robustly.

Using Haskell types for safer JVM calls

There are two downsides to the raw JNI calls we saw above:

performance: getting class and method handles is expensive. Ideally, we’d only ever lookup classes and methods by name at most once throughout the lifetime of the program, assuming that loaded classes exist for all time and are never redefined.
stringly typing: we pass signatures explicitly, but these are literally strings, typos and all. If you mistype the signature, no compiler will call that out. Ideally ill-formed signatures would be caught at compile-time, rather than at runtime when it’s far too late and your program will simply crash.

The performance issue is easily dispensed with. The trick is to write wrappers that tell Haskell that findClass and getStaticMethodID are really pure, in the sense that calling either of them multiple times and in any context always yields equivalent results. So we could in principle ascribe pure types to them. The argument goes something like the following. Compare the following snippet with the one above:

callFoo = do
  let pureStuff@(klass, method) = unsafePerformIO $ do
      (,) <$> findClass "some/Java/Class"
          <*> getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;"
  callStaticObjectMethod klass method [JInt 0, JBoolean 1]

The expression for pureStuff is a closed expression (no free variables occur). And because its type is not IO, the compiler is free to float it to top-level, effectively turning it into a CAF, which are always evaluated at most once thanks to laziness:

(klass, method) = unsafePerformIO $ do
  (,) <$> findClass "some/Java/Class"
      <*> getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;"

callFoo = do
  callStaticObjectMethod klass method [JInt 0, JBoolean 1]

As for the stringly typing problem, we’ll need some tools first. First, we need to reflect in Haskell enough type information. To that end, we’ll index the type of Java objects by their Java type:

newtype J (a :: JType) = J (Ptr (J a))

Java types can either be primitives (int,boolean, etc) or reference types (classes, arrays, interfaces, generics etc). So our definition of JType goes something like this:

data JType
  = Prim Symbol
  | Class Symbol
  | Array JType
  | ...

genSingletons ['JType]

Thus equipped, we can write types like,

the type of Swing option panes, J ('Class "javax.swing.JOptionPane")
the type of boxed Java integers, J ('Class "java.lang.Integer"),
the type of primitive integer arrays, J ('Array ('Prim "int")),
etc.

What’s more, thanks to the family of singleton types and instances created by genSingletons above, we can reflect on the type of any Java object at runtime to get a representation of the type at the value level. This is helpful to auto compute JNI type signatures from the types alone. No more stringly typing will all those typos in tow: JNI type signatures are now correct by construction.

In particular, we can define a family of variants of callStaticObjectMethod:

module Language.Java where

callStatic1
  :: (SingI ty1, SingI tyr)
  => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> IO (J tyr)
callStatic2
  :: (SingI ty1, SingI ty2, SingI tyr)
  => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> J ty2 -> IO (J tyr)
callStatic3
  :: (SingI ty1, SingI ty2, SingI ty3, SingI tyr)
  => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> J ty2 -> J ty3 -> IO (J tyr)
...

The types of these functions are expressive enough to infer a type signature for the called method. Thanks to the type reflection provided by the singletons package, we can reify types as values and produce JNI type signatures from that. Of course, a fixed number of callStatic* functions, one per arity, is rather limiting (what about arbitrary arities?), so in reality the Language.Java module provides a single such function, to whom arguments are passed packed into a homogeneous list:

callStatic
  :: SingI tyr
  => Sing (klass :: Symbol) -> JNI.String -> [JValue] -> IO (J tyr)

where JValue is defined as

data JValue
  = forall a. SingI a => JObject (J a)
  | JBoolean Word8
  | JInt Int32
  | ...

In this way, values of primitive type can be passed to Java without penalty: no need to box them into tiny objects first. It turns out we can extend the same idea to obtain unboxed return values, but the technical details get a bit more intricate, so we’ll have to defer that to the module’s documentation.

Calling a non-static method is achieved in much the same way:

call :: (SingI ty, SingI tyr)
     => J ty -> JNI.String -> [JValue] -> IO (J tyr)

JVM calls the Java way

call and callStatic are surprisingly versatile facilities for calling arbitrary JVM methods with an arbitrary number of boxed or unboxed arguments and return values, but sometimes one might still get the types wrong. For example, there’s nothing stopping us from attempting to call a java.lang.Integer constructor with a boolean typed argument. No such constructor exists, so we’ll get a method lookup exception at runtime. After all, we don’t know in Haskell what methods really do exist, and what their signatures are. But if we call the java.lang.Integer constructor using Java syntax, we can hope to get the Java compiler to perform full scope checking and type checking, thus ruling out common errors such as calling non-existent methods are supplying arguments of the wrong type.

To achieve that, we use GHC’s quasiquotation extension. This extension allows us to embed syntax from arbitrary foreign languages in Haskell source files, in between special brackets. Better yet, we are free to extend the foreign syntax to express antiquotation variables, i.e. variables that refer to the enclosing context in Haskell. Take for example our earlier “hello world” code snippet, simplified:

do message <- reflect "Hello World!"
   [java| javax.swing.JOptionPane.showMessageDialog(null, $message) |]

Using reflect, also provided by inline-java, we create a J "java.lang.String" from a Haskell String. We can then refer to this Java object, bound to a Haskell variable, from inside the Java code snippet. The $ sigil is there to disambiguate between variables bound in the Haskell context (aka antiquotation) and in the Java context.

You might have noticed a difference with inline-c: in inline-java we don’t need to annotate quasiquotations with the return type nor each antiquote with their types, which can get quite verbose. Instead, we just about manage to infer which foreign types are intended based on the types of the Haskell variables. To pull this off required a journey in compiler hacking, ghc-heap-view and a novel use of static pointers. A journey best told next time…

The road ahead

There are plenty of solutions out there for lightweight interop across languages. You can start by swapping JSON messages between separate processes and take it from there. But for a truly universal solution fit for all situations, our experience is that keeping any overheads low or perhaps even nonexistent is the key enabler to seamlessly mixing multiple languages and blithely crossing language boundaries without guilt. In this post we presented a suite of packages for high-speed Java/Haskell interop, which together ensure:

box-free foreign calls: because we infer precise JVM types from Haskell types, arguments passed to JVM methods are boxed only if they need to be. Small values of primitive type can be passed to/from the JVM with no allocation at all on the heap.
marshalling-free argument passing: Java objects can be manipulated as easily from Haskell as from Java. This means that you can stick to representing all your data as Java objects if you find yourself calling into Java very frequently, hence avoiding any marshalling costs when transferring control to/from the JVM.
type safe Java calls: when calls are made in Java syntax, this syntax is supplied to an embedded instance of javac at compile-time for scope checking and type checking. That way we have a static guarantee that the types on the Haskell side match up with the types on the Java side, without having to resort to FFI stub generators and preprocessors.

We were fortunate enough to be able to stand on excellent libraries to get here. Take parsing of Java syntax: that came straight from Niklas Broberg and Vincent Hanquez’s venerable language-java library.

What we haven’t addressed yet with inline-java is the perennial issue when interoperating two garbage collected languages of automatic memory management. Since we have two heaps (the GHC heap and the JVM heap), with two garbage collectors, neither of which able to traverse objects in the other heap, we are forced to pin in memory objects shared across the language boundary. In the case of JVM objects, the JNI does this for us implicitly, provided object references are kept thread-local. It would be nice if we could make these object references safe across threads and get both garbage collectors to agree to dispose of them safely when dead. You can get a fair amount of mileage the way things are: we managed to run topic analysis on all of Wikipedia concurrently on 16 machines and hours of machine time without tinkering with object lifetimes and GC’s.

So plenty more to do still! Make sure to check out the project’s GitHub repository to follow progress and contribute.

Behind the scenes

Mathieu Boespflug

Mathieu is the CEO and founder of Tweag.

Alp Mestanogullari

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Haskell compute PaaS with sparkle Linear types make performance more predictable →