A Case Study on Correctness and Safety Testing of Stateful Systems

26 January 2022 — by Victor Miraldo

Writing a test suite is a crucial step when developing high-assurance software. Yet, identifying meaningful properties of the system and giving it suitable inputs may be hard. This is especially challenging for code involving state — besides properties and inputs, one must also take into account the state of the system at all times.

It is inconvenient to construct states suitable for testing by hand. An alternative is to build sequences of inputs that can be iterated over the initial state, placing the system in a state we want to test. These sequences of inputs are called traces.

For these traces to be useful for testing, it must be possible to:

report traces to the user in an informative way when some post-condition fails
modify traces by changing some or all inputs sent in them

While (1) is fairly simple to accomplish, (2) requires significantly more advanced techniques.

In this post I will present a technique to do testing using traces which can can be used with arbitrary stateful systems. They were applied in practice in cooked-validators, which was developed as a tool for writing Plutus validator scripts.

Context: auditing smart contracts

During this post I will use smart contracts as a guiding example, but no specific knowledge on them is required. The unfamiliar reader can think of smart contracts as arbitrary code whose state is stored in the blockchain, and which is triggered by transactions.

While auditing Plutus smart contracts, we want to write tests using a function validateTx that pretends to run a transaction in a blockchain. Its type is:

validateTx :: (Monad m) => Tx -> StateT BlockChainSt (ExceptT BlockChainErr m) TxId

Smart contracts will live somewhere in the state, of type BlockChainSt, and their implementation gets to decide whether some transaction is valid or not. If a contract rejects a transaction, validateTx will throw an error. If all contracts involved in a transaction agree, the overall blockchain state changes and we return the transaction id of that particular transaction.

In this context, a typical test consists of generating and executing a sequence of transactions then checking some predicate on the resulting state. For example, say that we are simulating a crowdfunding-like contract. One property we expect to hold is that the owner can withdraw the funds if and only if the threshold has been reached. To check that, we can write a number of traces that reach the threshold, execute them and ensure the owner can withdraw the funds. We might also want to transform those traces that succeed with certain attacks or faults and ensure they fail.

An Appetizer

Lets start by defining a convenient type synonym which represents computations that manipulate the blockchain:

type DirectT m = StateT BlockChainSt (ExceptT BlockChainErr m)

Coming back to the crowdfunding example, we might want to express a trace that opens a crowdfunding vault with a threshold of 10000 units, then deposits 8000 units, followed by depositing an additional 5000 units, and finally attempts withdraw, which should succeed. Using monadic syntax, this could be written as example below:

example :: (Monad m) => DirectT m ()
example = void $ do
  vaultId <- validateTx (openCrowdFundingWithLimit 10000)
  validateTx (fundVault vaultId 8000)
  validateTx (fundVault vaultId 5000)
  validateTx (withdrawVault vaultId)

Here, our example uses the transaction id from the first transaction to identify a particular vault but ignores the other TxIds. To witness that example succeeds, we can run it from the initial state and witness that it does not return an error. What if it doesn’t succeed, though? As a user, we would like to see a description of all the transactions that led to the failure, so we can diagnose any potential issue.

Even better, say we want to express that if either funding transaction above happened to fund 3001 units less, the trace should fail: the transactions deposit a total of 13000 units into the vault, if we take more than 3000 units from anywhere then the threshold won’t be met and the trace should fail. In fact, we want to be able to write something like:

example' = somewhere (subtractValFromFundVault 3001) example

The somewhere modality affects specific transactions over a trace, yielding a set of traces and enabling us to inject faults and attacks. In example' above, subtractValFromFundVault has type Tx -> Maybe Tx. Say we defined it to return Just if and only if it is applied to a fundVault transaction. In that case, example' represents the following two traces:

First trace
- openCrowdFunding
- fundVault vault 4999
- fundVault 5000
- withdraw
Second trace
- openCrowdFunding
- fundVault vault 8000
- fundVault 1999
- withdraw

The original example trace is not present because somewhere applies the modifier to exactly one transaction in the trace, and the original transaction has the modifier applied nowhere.

Finally, note how the monadic structure ensures that we can write dynamic traces: we can use a generated vaultId to validate subsequent transactions. More generally, traces often depend on values that are returned from previous transactions. One such case that shows up in practice all the time is the creation of NFTs, which often requires some hash of some previously submitted transaction id.

We have just covered the essentials of cooked-validators. If you are only interested in using our library, you can continue with the tutorials and examples in the repository. In the rest of the post I will go over some of the more technical internal details.

Free Monads and Staging

So how does this work? The core idea is to have two monads:

DirectT, that executes operations directly but doesn’t support somewhere
Staged, that is an AST which can be modified, and interpreted into DirectT

In order to be able to choose which monad we want, we abstract the core functionality we need in a type class called MonadBlockChain and write all of our functions with type: (MonadBlockChain m) => ... -> m res. For simplicity, let’s assume there are two base functions needed: validating a transaction and returning some id if validation succeeds, and a function to lookup the resources that belong to a certain address. We would then define MonadBlockChain as follows:

class (Monad m) => MonadBlockChain m where
  validateTx :: Tx -> m TxId
  lkupResources :: Address -> m [Resource]

instance (Monad m) => MonadBlockChain (DirectT m) where
  ...

instance MonadBlockChain Staged where
  ...

The idea behind Staged is to reify the base operations, declared in MonadBlockChain, into a GADT:

data Op :: * -> * where
  ValidateTx :: Tx -> Op TxId
  LkupResources :: Address -> Op [Resources]

Then we want an interpretation of Op in terms of the DirectT monad, called interpretOp:

interpretOp :: (Monad m) => Op a -> DirectT m a
interpretOp (ValidateTx tx) = validateTx tx
interpretOp (LkupResources addr) = lkpResources addr

With that at hand, we need to be able to construct values of type Op and chain them together. For this, we can use something like the Free Monad to build up an AST using operations in Op. Unfortunately, Op above is not a functor, hence, we cannot just take the free monad of Op. We must construct the operational monad of Op, which is also called the Freer monad:

data Freer (op :: Type -> Type) :: Type -> Type where
  Return :: a -> Freer a
  Bind   :: op a -> (a -> Freer b) -> Freer b

Our Staged monad then is just a particular instantiation of Freer:

type Staged = Freer Op

With that setup, we can now write a first interpreter for Staged. Let’s warm up with a trivial interpreter which only interprets returns and binds in the target monad:

interpret :: (Monad m) => Staged a -> DirectT m a
interpret (Return a) = return a
interpret (Bind op cont) = interpretOp op >>= interpret . cont

Constructing the AST

All that is left for us in order to use DirectT and Staged interchangeably is the instance for MonadBlockChain Staged, which is easy to write:

instance MonadBlockChain Staged where
  validateTx tx = Bind (ValidateTx tx) Return
  lkupResource addr = Bind (LkupResource addr) Return

Probabilistic traces for property-based testing

Up until now we can build ASTs with Staged which represent traces and can be interpreted with DirectT to witness their success or failure. That is good enough for unit testing, but we want to have property-based tests. The GenT transformer, is the key to achieve this, enabling us to write probabilistic traces:

instance (MonadBlockChain m) => MonadBlockChain (GenT m) where
  validateTx = lift . validateTx
  lkupResource = lift . lkupResource

We can rewrite our original example as a polymorphic, property-based test. The test will exercise the property that for all threshold and x < threshold, if we open a vault for threshold units then issue two transactions depositing and arbitrary x and threshold - x units, we must be able to withdraw the funds.

example :: (MonadBlockChain m) => GenT m ()
example = void $ do
  threshold <- choose (5000, 15000)
  vaultId <- validateTx (openCrowdFundingWithLimit threshold)
  x <- choose (1, threshold - 1)
  validateTx (fundVault vaultId x)
  validateTx (fundVault vaultId (threshold - x))
  validateTx (withdrawVault vaultId)

Here, example can be thought of as a probability distribution of traces. Calling runGenT example :: Staged () samples that probability distribution, yielding one trace at random that picks a particular instantiation of threshold and x.

We tie everything together with a simple forAllTr combinator which samples a distribution of traces, then interprets the random trace and applies a predicate to its result:

forAllTr :: GenT Staged a -> (Either BlockChainErr (a, BlockChainSt) -> Property) -> Property
forAllTr pTr prop = forAllBlind (runGenT pTr) $ \tr -> prop (runDirect (interpret tr) st0)

A test that checks that all traces in the distribution example above succeed can then be written as:

exampleOk :: Property
exampleOk = forAllTr example (property . isRight)

Up to this point we have addressed our first requirement only: express traces for property-based testing. However, we did it in a way that will enable us to address the second and third requirements: inform the user if something goes wrong and modify traces with combinators like somewhere.

What went wrong? Returning traces

If the exampleOk test fails at any point, the programmer has no idea of what went wrong. To improve this, we must change our interpret function to inspect and record the transactions being validated. This can be done with a modified version of forAllTr:

interpret :: (Monad m) => Staged a -> DirectT (WriterT [Tx] m) a
interpret (Return a) = return a
interpret (Bind op cont) = tellIfTx op >> interpretOp op >>= interpret . cont
  where tellIfTx (ValidateTx tx) = tell [tx]
        tellIfTx _ = return ()

forAllTr :: GenT Staged a -> (Either BlockChainErr (a, BlockChainSt) -> Property) -> Property
forAllTr pTr prop = forAllBlind (runGenT pTr) $ \tr
  -> let (res, descr) = runDirect (interpret tr) st0
      in counterexample (show descr) $ prop res

The type DirectT (WriterT [Tx] m) a is isomorphic to BlockChainSt -> (Either BlockChainErr (a, BlockChainSt), [Tx]), which gives us the sequence of transactions that yielded the given result, even if the result is actually an error, which is important.

Note that we could not just write a function observe :: Staged a -> [Tx], which extracts all the transactions used in a Staged trace. This would be possible if we relied on Applicative instead of Monad. Yet, in our case Monad is necessary because we may need the result of previous computations in order to construct the next transaction (think of hashes or signatures, for instance). This happens constantly in practice.

Modifying traces: adding modalities

At this point, we can write probabilistic traces and have a description of what happened if something goes wrong, but we still cannot modify transactions within traces. To accomplish that we will take inspiration in modal logic and define two dual functions:

somewhere  :: (Tx -> Maybe Tx) -> Staged a -> Staged a
everywhere :: (Tx -> Maybe Tx) -> Staged a -> Staged a

These combinators take a partial function: this way they are able not to affect certain transactions. Some transformations might only make sense for some transactions. Additionally, the semantics of Staged will be enriched. We now need:

interpret :: Staged a -> DirectT (WriterT TraceDescr []) a

which is equivalent to the type Staged a -> BlockChainSt -> [(Either BlockChainErr (a, BlockChainSt), TraceDescr)]. A Staged AST now denotes a set of possible executions in the DirectT monad, each of which has its own description of which transactions led to that particular outcome.

The everywhere f tr combinator maps a transformation f to every transaction tx in every outcome of tr such that isJust (f tx). Its dual, somewhere f tr will fork new outcomes from tr such that in each one, f was applied to exactly one transaction tx such that isJust (f tx). To illustrate this, let a, a', b, b' and c be arbitrary, but different, transactions. Now we define a transformation:

f :: Tx -> Maybe Tx
f a = Just a'
f b = Just b'
f _ = Nothing

Interpreting everywhere f (mapM_ validateTx [a,b,c]) should be equivalent to:

\st -> [ runDirect (validateTx a' >> validateTx b' >> validateTx c) st ]

That is, it returns the same set of traces, but with all possible transactions modified by f. Those that f returned Nothing are left untouched.

Interpreting somewhere f (mapM_ validateTx [a,b,c]) should be equivalent to:

\st -> [ runDirect (validateTx a' >> validateTx b  >> validateTx c) st
       , runDirect (validateTx a  >> validateTx b' >> validateTx c) st
       ]

The final definition of interpret gets a little involved, hence we will omit it here but we invite the interested reader to go check its source. The central idea in implementing it is maintaining a set of modalities that are yet to be consumed in the current branch. For instance, if we are interpreting a trace somewhere f tr, we collect Somewhere f as a modality that needs to be consumed and continue interpreting tr. Say we are interpreting Bind (ValidateTx tx) rest with two modalities to be consumed, [Somewhere f, Everywhere g]. We can either consume the Somewhere now, applying f to tx, or later, within rest. Yet, we must consume and preserve the Everywhere g in either case.

Closing thoughts

The combination of techniques presented in this post address an important obstacle in writing good test suites over code involving a state monad. In practice, they enable us to write complex traces for Plutus contracts, and we can use somewhere and everywhere to exhaustively attempt to execute an attack within some trace.

Having combinators is preferable to generating traces with hardcoded transformations. The resulting code is easier to understand and we get to write less code. A trace somewhere f tr doesn’t need to be fixed if something in tr changes. Moreover, we only have to maintain tr, and not a family of variations of tr that applies f to its different transactions. In general, it is useful to automatically rule out certain kinds of bad traces, which can be obtained as modifications over good traces through the injection of attacks and faults.

Behind the scenes

Victor Miraldo

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.