Getting Started

Filament is a programming language for Fearless Hardware Design. It aims to enable software programmers without much hardware background to start designing performant hardware. At its heart, Filament uses a type system to encode properties important for designing efficient hardware. This guide helps you install the various tools to make Filament work.

Minimal Build

A basic build, which does not support our automatic simulation harness, is easy to install. First, clone this repository: git clone https://github.com/cucapra/filament.git

Next, we can install the dependencies for the Filament compiler:

Install Rust which will configure the cargo tool.
Install one of the two SMT solvers
- Install z3:
  - On Mac OS: brew install z3.
  - On Ubuntu: apt install z3
- Install cvc5.
Build the compiler by running: cargo build in the root of the folder.

To check that the compiler works, run the following command:

cargo run -- tests/run/add.fil

This should generate the Verilog implementing the original program.

Full Build

We'll need to install some tools from the Calyx compiler.

Calyx Docker Image

Calyx tools are provided using a docker image:

docker run -it ghcr.io/cucapra/calyx:0.4.0

If you're using the container, skip to configuring Filament tools.

Installing from Source

First, we need to configure the Calyx compiler which acts as the backend for Filament.

Clone the Calyx repository:

git clone https://github.com/cucapra/calyx.git --depth 1 --branch v0.3.0

Build the Calyx compiler:
```
cd calyx && cargo build
```

In order to simulate Filament programs, we need a couple more tools:

Install fud which manages hardware tools and makes it easy to test Filament programs.
- Install flit: python3 -m pip install flit
- Install fud: cd calyx/fud && flit install -s
- Check fud was installed: fud check. It will report some tools are missing. This is expected.
Install Icarus Verilog and configure fud to use it.
- Running fud check again should report that icarus-verilog was installed correctly.
Install runt: cargo install runt
Install jq
- On Mac OS: brew install jq
- On Ubuntu: apt install jq

Configuring Filament Tools

In the Filament repository, do the following:

Install cocotb: python3 -m pip install cocotb.
- Cocotb install can often fail. Check it was installed correctly by running python3 -c "import cocotb; print(cocotb.__version__)". If this command fails, see Debugging Cocotb Installation.
Register Filament's fud stages by running the command in the filament repository: fud register -p fud/filament.py filament
- Run fud check to make sure that the filament stages are correctly installed.

For a sanity check, run fud check. It should report that iverilog, jq, filament, futil, cocotb are correctly installed.

Once all tools are installed, running the following command should print out the test report:

runt -j 1 -o fail -d

Next Steps

Now that we have installed the Filament compiler and accompanying tools, we can start using Filament. Use the following links to learn more about Filament:

Writing your first Filament Program.
How do I integrate black-box Verilog with Filament?

Debugging Cocotb Installation

Cocotb requires the python shared library libpython.so/libpython.dylib (Mac OS) to work correctly. A common reason for a cocotb installation not working is when this library is missing.

To check whether cocotb can find the shared library, install find_libpython: python3 -m pip install find_libpython.

Next, run the following:

python3 -c "import find_libpython; print(find_libpython.find())"

If the above command does not print out anything, that means that the python library was not found and the python installation needs to be rebuilt.

If you use pyenv, the following command will install a python version with the shared library:

env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.10

Rerun the command to check that libpython was found after installing a new python version.

Hardware Design for the Curious Developer

Filament is a low-level hardware description language. This means that it does not have a lot of primitive constructs and essentially requires us to build up our hardware from scratch. However, Filament's type system helps us build small reusable components and guarantees that composing them generates efficient and correct hardware.

This tutorial does not assume familiarity with hardware design concepts.

Building an Arithmetic Logic Unit

Arithmetic Logic Units (ALUs) are a key component of most processors. In a nutshell, they perform various arithmetic operations based on a given op code. We will implement a simple ALU that performs either an addition or a multiplication based on the op boolean. At a high level, we want to build a circuit that performs the same computation as this Python program:

def alu(op, left, right):
    if op:
        out = left * right
    else:
        out = left + right
    return out

The generated hardware will look something like this:

graph TB;
    L([left])
    R([right])
    O([op])
    A[Add]
    M[Mult]
    Mx[Mux]

    L & R -->A & M;
    A & M --> Mx -->out([out]);
    O--->Mx;

We start by defining a Filament component which wraps all the hardware required to implement some computation:

comp main<'G: 3>(
    go: interface['G],
     op: ['G, 'G+1] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G, 'G+1] 32)

The <'G: 3> syntax defines the event G, which can be thought of as the "start time" of the component. We define a module that takes the inputs op, left, and right and produces the output out. Since we're working with hardware, we need to specify the bitwidth of each input and output. Unlike other hardware description languages, Filament also requires us to specify exactly when we'll use the input signals and provide the outputs. The syntax ['G, 'G+1] states that the signal must be available in the half-open interval [G, G+1).

Next, we need to perform the computations. Since we're working with hardware designs, we don't get access to primitive operations like * and +; we must build circuits to perform these computations!

Thankfully, the Filament standard library defines these operations for us, so we can simply import those definitions and instantiate an adder and a multiplier:

import "primitives/core.fil";       // Defines Add
import "./sequential.fil"; // Defines Mult

comp main<'G: 3>(...) -> (...) {
    A := new Add[32];
    M := new Mult[32];
}

We define two circuits A and M which represent a 32-bit adder and a multiplier respectively. The Add[32] syntax represents us passing the value 32 for the width parameter of the pre-defined components.

Next, we need to perform the two computations. In Filament, we have to specify the time when a particular computation occurs using an invocation:

    A := new Add[32];
    M := new Mult[32];
    a0 := A<'G>(left, right);
    m0 := M<'G>(left, right);

Here, a0 and m0 are invocations of the adder and the multiplier that are performed when the event G occurs. We provide values for the input ports of the adder and the multiplier. Finally, we can use a multiplexer to select between the output signals of the two invocations:

mx := new Mux[32]<'G>(op, a0.out, m0.out);
out = mx.out;

We make use of Filament's combined instance creation and invocation syntax to define a new multiplexer and use it when event G occurs. Finally, we forward the output from the multiplexer to the output signal of the component.

Coming from a software background, it might seem weird that we're performing both the computations first and selecting the output after the fact. However, a hardware circuit is always active¹: the multiplier and adder are always propagating signals and performing some computation even if the inputs are nonsensical. Furthermore, constructs like if-else are not compositional.²

The final program looks like this:

import "primitives/core.fil";
import "./sequential.fil";

/// ANCHOR: signature
comp main<'G: 3>(
    go: interface['G],
     op: ['G, 'G+1] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G, 'G+1] 32)
// ANCHOR_END: signature
{
    A := new Add[32];
    M := new Mult[32];
    a0 := A<'G>(left, right);
    m0 := M<'G>(left, right);
    mx := new Mux[32]<'G>(op, a0.out, m0.out);
    out = mx.out;
}

Checking Timing Behavior

Filament's prime directive is to ensure that your hardware does not violate temporal constraints. Let's see what that means exactly by trying to compile our program. Save the file as alu.fil and run the following command from the Filament repository:

cargo run -- alu.fil

Filament tells us that the program is incorrect:

error: source port does not provide value for as long as destination requires
   ┌─ examples/tut-wrong-1.fil:17:39
   │
17 │     mx := new Mux[32]<'G>(op, a0.out, m0.out);
   │                                       ^^^^^^
   │                                       │
   │                                       source is available for ['G+2, 'G+3]
   │                                       required for ['G, 'G+1]

Compilation failed with 1 errors.
Run with --show-models to generate assignments for failing constraints.

Filament is telling us that our multiplexer expects its input during the interval [G, G+1) but the multiplier's output is only available in the interval [G+2, G+3). What went wrong? We started our adder and the multiplier at the same time (when event G occurs) but the multiplier seems to take longer. This is because multipliers are fundamentally different from adders: they require a lot more hardware and a lot more time to perform their computation. This temporal constraint, that multipliers may take several cycles while adders may not, is checked by Filament to ensure that our resulting hardware only reads meaningful values.

In order to fix this, we need to execute the multiplexer when the signal from the multiplier is available. However, in that case, we won't have access to the signal from the adder which only provides its output in the interval [G, G+1). We need to somehow make the signal from the adder last longer as well.
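The check behind this error can be pictured as interval containment. The following Python sketch is an illustration only, not Filament's implementation: it models availability as half-open intervals of cycle offsets from G, and the helper name covers is hypothetical.

```python
# Illustrative model only: intervals are half-open [start, end) offsets from event G.
def covers(source, required):
    """Return True if `source` is available for all of `required`."""
    src_start, src_end = source
    req_start, req_end = required
    return src_start <= req_start and req_end <= src_end


adder_out = (0, 1)  # Add output: [G, G+1)
mult_out = (2, 3)   # Mult output: [G+2, G+3)
mux_input = (0, 1)  # A Mux invoked at G reads its inputs in [G, G+1)

print(covers(adder_out, mux_input))  # True
print(covers(mult_out, mux_input))   # False: this is the reported error
```

Moving the multiplexer to G+2 flips the situation: the multiplier's output now covers the requirement, but the adder's no longer does.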

Saving Values for the Future

Registers are the primitive stateful building block for hardware designs and can extend the availability of signals. The signature of a register is complicated but interesting:

   // A register that can extend the lifetime of a signal to any required length.
   comp Register[WIDTH]<'G: 'L-('G+1), 'L: 1>(
      clk: 1,
      reset: 1,
      write_en: interface['G],
      in: ['G, 'G+1] WIDTH,
   ) -> (
      out: ['G+1, 'L] WIDTH,
   ) where 'L > 'G+1;

Notice the availability of the out signal: it is available in the interval [G+1, L) where L is provided to the component during its invocation. This means that a register can hold onto a value for as long as needed! The additional where clause ensures that out's interval is well-formed; it would be troublesome if we could say that out is available between [G+10, G+5).
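To make the arithmetic in the register's signature concrete, here is a small Python sketch that evaluates the out interval ['G+1, 'L] and the delay 'L-('G+1) for given event bindings. It is a model of the signature as written above, with cycle offsets as plain integers; the function name register_signature is hypothetical.

```python
# Illustrative model of the Register signature; not part of Filament.
def register_signature(g, l):
    """Evaluate the register's out interval and the delay of G for bindings g and l."""
    if l <= g + 1:
        # Mirrors the `where 'L > 'G+1` clause.
        raise ValueError("ill-formed invocation: requires L > G+1")
    return {"out": (g + 1, l), "delay": l - (g + 1)}


print(register_signature(0, 3))  # {'out': (1, 3), 'delay': 2}
print(register_signature(0, 4))  # {'out': (1, 4), 'delay': 3}
```

The longer a register is asked to hold its value, the larger the delay of its event becomes.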

Let's try to fix our program by making changes:

import "primitives/core.fil";
import "./sequential.fil";

comp main<'G: 3>(
    go: interface['G],
     op: ['G, 'G+1] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> (
     out: ['G, 'G+1] 32,
) {
    A := new Add[32];
    M := new Mult[32];
    m0 := M<'G>(left, right);
    a0 := A<'G>(left, right);
    // Use register to hold the adder's value
    r0 := new Register[32]<'G, 'G+3>(a0.out);
    // Use the multiplexer when the mult's output is ready
    mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
    out = mx.out;
}

We made a couple of changes to our program:

Run the multiplexer when the output from the multiplier is available (at G+2).
Save the value from the adder in the register invocation r0
Use the value from the register instead of the adder for multiplexing.

Sadly, Filament is still angry at us:

error: source port does not provide value for as long as destination requires
   ┌─ examples/tut-wrong-2.fil:19:29
   │
19 │     mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
   │                             ^^
   │                             │
   │                             source is available for ['G, 'G+1]
   │                             required for ['G+2, 'G+3]

error: source port does not provide value for as long as destination requires
   ┌─ examples/tut-wrong-2.fil:20:11
   │
20 │     out = mx.out;
   │           ^^^^^^
   │           │
   │           source is available for ['G+2, 'G+3]
   │           required for ['G, 'G+1]

Compilation failed with 2 errors.
Run with --show-models to generate assignments for failing constraints.

The problem is that we accept the op input and produce the output out in the interval [G, G+1). However, we know that it is not possible to produce the output as soon as we get the input because the multiplier takes two cycles to produce its output!

A Correct Implementation

The fix is easy: we change the signature of the ALU to reflect this cruel reality:

import "primitives/core.fil";
import "./sequential.fil";

/// ANCHOR: sig
comp main<'G: 3>(
/// ANCHOR_END: sig
    go: interface['G],
     op: ['G, 'G+3] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G+2, 'G+3] 32)
{
    A := new Add[32];
    M := new Mult[32];
    m0 := M<'G>(left, right);
    a0 := A<'G>(left, right);
    // Use register to hold the adder's value
    r0 := new Register[32]<'G, 'G+3>(a0.out);
    // Use the multiplexer when the mult's output is ready
    mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
    out = mx.out;
}

And running the compiler again no longer generates any errors:

cargo run -- alu.fil

¹ This is not quite true since we can build circuits where the clock signal to a particular sub-circuit is disabled (or "gated") based on a particular signal. However, this kind of clock-gating is generally not recommended for fine-grained usage.
² While control operators like if and for are supported in languages like Verilog, they don't quite work the same in all contexts. for loops are compile-time constructs whereas if can only be used for combinational circuits like adders but not multipliers.

Running Filament Designs

Filament designs are compiled to Verilog using the Calyx backend and simulated using tools like Icarus Verilog. However, figuring out the right incantations to get these tools to work together and building testing harnesses can be tedious. We use fud to make the process of running Filament designs seamless: the user provides a file with the test data and runs a single command to compile, simulate, and generate outputs.

Data Format

The test runner's data format is a JSON file that contains the names of each port mentioned in a Filament program's main component. For example, for the tutorial ALU with the signature:

comp main<'G: 3>(
    go: interface['G],
     op: ['G, 'G+1] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G, 'G+1] 32)

We can have the following data file:

{
    "op": [ 0, 1, 0, 1 ],
    "left": [ 4, 5, 6, 7 ],
    "right": [ 7, 5, 11, 9 ]
}

The test harness operates on transactions, where each transaction is a set of inputs and outputs corresponding to one index into the JSON file. For example, the first transaction sends the inputs op[0], left[0], and right[0] to the Verilog design and captures the output out[0], corresponding to the output port of the ALU.

This means that the above data file will run the design with four inputs and capture four outputs. Adding another transaction is easy: just add another set of inputs to the JSON file.
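The mapping from the data file to transactions can be sketched in a few lines of Python. This is only an illustration of the indexing described above, not the harness's own code; the function name transactions is hypothetical.

```python
# Illustrative sketch: split a port -> values mapping into per-transaction inputs.
def transactions(data):
    """Return one dict of port values per transaction index."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) != 1:
        raise ValueError("every port needs the same number of values")
    count = lengths.pop()
    return [{port: values[i] for port, values in data.items()} for i in range(count)]


data = {
    "op": [0, 1, 0, 1],
    "left": [4, 5, 6, 7],
    "right": [7, 5, 11, 9],
}

for index, txn in enumerate(transactions(data)):
    print(index, txn)
```

Transaction 0 carries op=0, left=4, right=7, and appending one value to each list adds a fifth transaction.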

Running Designs

Running the design is straightforward, assuming you've configured fud already:

fud e --to cocotb-out examples/tut-seq.fil \
      -s cocotb.data examples/data.json \
      -s calyx.flags ' -d canonicalize'

This instructs fud to compile the design to Verilog, set up the test harness, and run the simulation. The output captures the values on the out port of the ALU for each transaction in the data file and tells us how many cycles it took to run the design:

{"out": {"0": [28], "1": [10], "2": [66], "3": [16]}, "cycles": 12}

In general, the command to run designs is:

fud e --to cocotb-out <filament-file> \
      -s cocotb.data <data-file> \
      -s calyx.flags ' -d canonicalize'

Under the Hood

Note: If you're following the tutorial, skip to the Pipelining with Filament section and come back here after you've finished.

Filament's test runner uses the signature of the main component to decide how long to provide inputs for, when to capture the outputs, and when to schedule the next transaction.

Providing Inputs

The test harness holds the inputs for exactly the interval specified for each input and then provides 'x values to the input. This is done to make sure that incorrect designs that read inputs outside of their specified interval will not pass the test.

Scheduling Transactions

Because Filament represents hardware pipelines, new transactions can start before the previous transaction has finished. By default, new transactions are scheduled exactly when the delay for the main event specifies. For example, if the main event has a delay of 2, then the next transaction will be scheduled two cycles after the start of the previous transaction.

However, it can be useful to change the scheduling behavior to check if there are pipelining bugs. Our fud-based harness provides a way to randomize the timing of transactions by adding a random delay:

fud e --to cocotb-out examples/tut-seq.fil \
      -s cocotb.data examples/data.json \
      -s cocotb.randomize 10 \
      -s calyx.flags ' -d canonicalize'

The -s cocotb.randomize 10 flag adds a random delay of up to 10 cycles between transactions.

Note: A well-typed Filament program produces the same output values regardless of the scheduling of transactions.
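The scheduling behavior described in this section can be modeled with a short Python sketch. It assumes that the random delay is added on top of the event's delay and is drawn uniformly from 0 up to the configured bound; the real harness may differ, and the function name start_cycles is hypothetical.

```python
import random


# Illustrative model only: compute the cycle at which each transaction starts.
def start_cycles(num_transactions, delay, randomize=0, seed=None):
    """Start cycles for transactions, optionally with extra random gaps (assumed model)."""
    rng = random.Random(seed)
    starts = []
    cycle = 0
    for _ in range(num_transactions):
        starts.append(cycle)
        extra = rng.randint(0, randomize) if randomize else 0
        cycle += delay + extra
    return starts


print(start_cycles(4, 2))  # [0, 2, 4, 6]
print(start_cycles(4, 2, randomize=10, seed=1))  # gaps vary between 2 and 12 cycles
```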

Pipelining with Filament

While we've designed a correct ALU, it is quite slow: it processes one input completely before moving on to the next. Such a hardware design is called "fully sequential". A standard technique to improve the throughput of the design is pipelining, which allows a hardware module to process multiple inputs at the same time.

Filament is designed to check that a pipeline can support the throughput specified in its interface. We'll take our sequential ALU design and use Filament's type system to guide us to a correctly pipelined design. Before that, however, let's run the design and see how it performs:

fud e --to cocotb-out examples/tut-seq.fil \
      -s cocotb.data examples/data.json \
      -s calyx.flags ' -d canonicalize'

Which generates the following output:

{"out": {"0": [28], "1": [10], "2": [66], "3": [16]}, "cycles": 12}

Note that this sequential design takes 12 cycles to process 4 inputs.

We'll start pipelining from our sequential program:

import "primitives/core.fil";
import "./sequential.fil";

/// ANCHOR: sig
comp main<'G: 3>(
/// ANCHOR_END: sig
    go: interface['G],
     op: ['G, 'G+3] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G+2, 'G+3] 32)
{
    A := new Add[32];
    M := new Mult[32];
    m0 := M<'G>(left, right);
    a0 := A<'G>(left, right);
    // Use register to hold the adder's value
    r0 := new Register[32]<'G, 'G+3>(a0.out);
    // Use the multiplexer when the mult's output is ready
    mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
    out = mx.out;
}

Delays and Throughput

Filament uses an event's delay to determine when the module can accept new inputs.

comp main<'G: 3>(

Note that the delay for the event G is 3, which indicates to Filament that the ALU processes new inputs every three cycles. We can tell Filament that we instead want a module that can process new inputs every cycle by changing the delay to 1:

comp main<'G: 1>(

And run the compiler:

cargo run -- alu.fil

Much to our dismay, Filament tells us that this program cannot be pipelined to achieve throughput 1. Being the nice type checker it is, though, it tells us exactly why the design cannot be pipelined in the form of type errors.

Let's work through each error and see how we can fix it.

Availability Intervals of Ports

The first error message points out that one of the inputs is required for three cycles, but the module may re-execute every cycle:

error: bundle's availability is greater than the delay of the event
  ┌─ examples/tut-pipe-wrong-1.fil:8:10
  │
5 │ comp main<'G: 1>(
  │               - event's delay
  ·
8 │      op: ['G, 'G+3] 1,
  │          ^^^^^^^^^^ available for 3 cycles

error: event provided to invocation triggers more often that invocation's event's delay allows

This is problematic because op represents a physical wire; it is incapable of holding multiple values. Our request to process inputs every cycle and have op last for three cycles is physically impossible. The fix is easy: looking at our original design, we see that op is only used by the multiplexer in the interval [G+2, G+3) so we can change the availability interval of op to be [G+2, G+3).

Note: If this step feels like divine insight, another way to reach the same conclusion is by changing the availability interval of op to be [G, G+1) which will cause the compiler to point out all availability intervals where op is used.
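The rule behind this first error is simple arithmetic: a port cannot be held for more cycles than the delay between consecutive invocations. A minimal Python sketch of that rule follows, again modeling intervals as half-open offsets from G; it is an illustration rather than the compiler's check, and fits_delay is a hypothetical name.

```python
# Illustrative model only: an availability interval must not outlast the event's delay.
def fits_delay(interval, delay):
    """Return True if a port held for `interval` can be re-driven every `delay` cycles."""
    start, end = interval
    return (end - start) <= delay


print(fits_delay((0, 3), 3))  # True: the sequential ALU, op held for 3 cycles, delay 3
print(fits_delay((0, 3), 1))  # False: op held for 3 cycles, but a new input every cycle
print(fits_delay((2, 3), 1))  # True: op only needed in [G+2, G+3)
```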

Delays of Subcomponents

The second error message points out that the ALU component may execute every cycle but the multiplier we used can only execute every two cycles:

error: event provided to invocation triggers more often that invocation's event's delay allows
   ┌─ examples/tut-pipe-wrong-1.fil:15:13
   │
 5 │ comp main<'G: 1>(
   │               - this event triggers every 1 cycles
   ·
15 │     m0 := M<'G>(left, right);
   │             ^^ event provided to invoke triggers too often
   │
   ┌─ examples/./sequential.fil:3:18
   │
 3 │ comp Mult[W]<'G: 2>(
   │                  - invocation's event is allowed to trigger every 2 cycles

Yet again, our request is physically impossible to satisfy: our multiplier circuit is fundamentally incapable of executing every cycle. Thankfully for us, the primitives/math.fil file provides a component called FastMult which does have delay 1:

/// Implementation of a multiplier with initiation interval 1 and latency 3.
/// Written in a way to allow Vivado to infer a DSP.
comp FastMult[W]<'G: 1>(
   left: ['G, 'G+1] W,
   right: ['G, 'G+1] W,
) -> (
   out: ['G+3, 'G+4] W,
) where W > 0

We can change our program to use this component instead:

import "primitives/core.fil";
import "./sequential.fil";

comp main<'G: 1>(
    go: interface['G],
     op: ['G+2, 'G+3] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G+2, 'G+3] 32)
{
    A := new Add[32];
    M := new FastMult[32];
    m0 := M<'G>(left, right);
    a0 := A<'G>(left, right);
    // Use register to hold the adder's value
    r0 := new Register[32]<'G, 'G+3>(a0.out);
    // Use the multiplexer when the mult's output is ready
    mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
    out = mx.out;
}

However, in making this change, we've created a new problem for ourselves:

error: source port does not provide value for as long as destination requires
   ┌─ examples/tut-pipe-wrong-2.fil:18:41
   │
18 │     mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
   │                                         ^^^^^^
   │                                         │
   │                                         source is available for ['G+3, 'G+4]
   │                                         required for ['G+2, 'G+3]

Compilation failed with 2 errors.

Filament tells us that FastMult's out port is available in the interval [G+3, G+4) instead of [G+2, G+3) for Mult, i.e., the latency of FastMult is different from the latency of Mult.

Filament catching this bug is important: it would be very easy to miss such a mistake in a Verilog program. Fixing it is quite mechanical:

import "primitives/core.fil";
import "./sequential.fil";

comp main<'G: 1>(
    go: interface['G],
     op: ['G+3, 'G+4] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G+3, 'G+4] 32)
{
    A := new Add[32];
    M := new FastMult[32];
    m0 := M<'G>(left, right);
    a0 := A<'G>(left, right);
    // Use register to hold the adder's value
    r0 := new Register[32]<'G, 'G+4>(a0.out);
    // Use the multiplexer when the mult's output is ready
    mx := new Mux[32]<'G+3>(op, r0.out, m0.out);
    out = mx.out;
}

Registers that Hold on for too Long

The final problem is quite similar to the previous one:

error: event provided to invocation triggers more often that invocation's event's delay allows
   ┌─ examples/tut-pipe-wrong-3.fil:16:28
   │
 4 │ comp main<'G: 1>(
   │               - this event triggers every 1 cycles
   ·
16 │     r0 := new Register[32]<'G, 'G+4>(a0.out);
   │                            ^^ event provided to invoke triggers too often
   │
   ┌─ ./primitives/./state.fil:6:29
   │
 6 │    comp Register[WIDTH]<'G: 'L-('G+1), 'L: 1>(
   │                             --------- invocation's event is allowed to trigger every 3 cycles

Compilation failed with 1 errors.

The compiler is telling us the register's delay is 3 cycles. However, unlike the multiplier, this is a consequence of our decision: we make the register hold on to its value for three cycles which increases its delay. The last line of the error message points to the problem: the delay of a register depends on how we use it; this means that if we make it hold onto a value for exactly one cycle, its delay is reduced to one.

However, the problem is that we need the computation from the adder to be available three cycles from when it starts. To get both the pipelining and correctness we want, we need to instantiate a chain of registers that feed values forward.

The intuition behind this is that because we want our ALU to process inputs every cycle, we need to "save" the computation in every cycle and push it forward.
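A cycle-by-cycle simulation makes this intuition easier to see. The Python sketch below is a behavioral model only (not generated hardware): it assumes each register takes its predecessor's value on every clock edge, and simulate_chain is a hypothetical helper. The input values are arbitrary example sums.

```python
# Illustrative behavioral model of a chain of registers that shifts every cycle.
def simulate_chain(inputs, depth):
    """Return the value visible at the end of the chain in each cycle."""
    regs = [None] * depth
    outputs = []
    # Feed the inputs, then run `depth` extra cycles so the last input drains out.
    for value in list(inputs) + [None] * depth:
        outputs.append(regs[-1])    # value at the end of the chain this cycle
        regs = [value] + regs[:-1]  # clock edge: each register takes its predecessor's value
    return outputs


sums = [11, 10, 17, 16]  # one new adder result enters the chain every cycle
print(simulate_chain(sums, 3))  # [None, None, None, 11, 10, 17, 16]
```

Each value reaches the end of the chain exactly three cycles after it entered, while a new value is accepted every cycle. A single register held for three cycles could not do this, because the second input would overwrite the first.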

The final program will look like this:

import "primitives/core.fil";
import "./sequential.fil";

comp main<'G: 1>(
    go: interface['G],
     op: ['G+3, 'G+4] 1,
     left: ['G, 'G+1] 32,
     right: ['G, 'G+1] 32,
) -> ( out: ['G+3, 'G+4] 32) {
    A := new Add[32];
    M := new FastMult[32];
    m0 := M<'G>(left, right);
    a0 := A<'G>(left, right);
    r0 := new Register[32]<'G, 'G+2>(a0.out);
    r1 := new Register[32]<'G+1, 'G+3>(r0.out);
    r2 := new Register[32]<'G+2, 'G+4>(r1.out);
    mx := new Mux[32]<'G+3>(op, r2.out, m0.out);
    out = mx.out;
}

Running the Pipelined Design

Now to the moment of truth: let's run the design and see how it performs:

fud e --to cocotb-out examples/tut-pipe.fil \
      -s cocotb.data examples/data.json \
      -s calyx.flags ' -d canonicalize'

We get the following output which shows that the design took only 7 cycles to process 4 inputs:

{"out": {"0": [28], "1": [10], "2": [66], "3": [16]}, "cycles": 7}

If you're still not convinced, try adding another transaction to the data file in examples/data.json and see how the cycle count for the original sequential and pipelined designs change.
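As a rough guide to what to expect, the following Python sketch gives a back-of-the-envelope cycle model. It assumes that transactions start every delay cycles and that the run ends when the last transaction's output interval closes. This model happens to reproduce both cycle counts reported above, but it is an assumption about how the harness counts, and total_cycles is a hypothetical name.

```python
# Assumed model: total = (transactions - 1) * delay + end offset of the output interval.
def total_cycles(num_transactions, delay, out_end):
    """Estimate the cycle count for a run under the assumed model."""
    return (num_transactions - 1) * delay + out_end


# Sequential ALU: delay 3, out in [G+2, G+3)
print(total_cycles(4, 3, 3))  # 12
# Pipelined ALU: delay 1, out in [G+3, G+4)
print(total_cycles(4, 1, 4))  # 7

# Under the same model, a fifth transaction costs 3 more cycles sequentially but only 1 pipelined.
print(total_cycles(5, 3, 3))  # 15
print(total_cycles(5, 1, 4))  # 8
```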

Something Remarkable

During the process of pipelining, we spent all of our time looking at type errors. Once the program type checked, it produced the correct output and was correctly pipelined. Filament supports many other features but at its heart, this is the guarantee it provides: if your program type checks, it is correctly pipelined. Fast and correct, you can have both!

Using Verilog Modules in Filament

Filament is designed to make it easy to use Verilog modules in a correct and efficient manner. In short, to use a Verilog module, a Filament program needs to:

Provide the location of the Verilog module's source file
Provide a Filament signature for the Verilog module

Using `extern` to Import Verilog Modules

Filament's extern keyword allows us to specify the signatures of all Verilog modules in a source file. For example, if we have a file modules.sv that defines several modules:

module Add(input [31:0] a, input [31:0] b, output [31:0] c);
...
endmodule

module Mult(input [31:0] a, input [31:0] b, output [31:0] c);
...
endmodule

We can use the extern block to specify the location of the file and provide Filament signatures for each module:

extern "modules.sv" {
comp Add<G: 1>(
    @[G, G+1] a: 32, @[G, G+1] b: 32
) -> (@[G, G+1] c: 32);

comp Mult<G: 1>(
    @[G, G+1] a: 32, @[G, G+1] b: 32
) -> (@[G+2, G+3] c: 32);
}

Note that unlike a Filament component, the comp definitions do not have a body; they simply define the signature of the Verilog module.

Note. The location of the Verilog file is determined relative to the location of the Filament file containing the extern block.

Once the definitions are specified, the Filament compiler will automatically link the Verilog modules into the final design.

Defining the Right Interface

The trick to using external modules in Filament is defining the "right" interface. For example, one way of defining a combinational component is as something that produces its output in the same cycle as its inputs. The following Filament signature captures this property:

extern "comb.sv" {
comp Add<G: 1>(
    @[G, G+1] left: 32, @[G, G+1] right: 32
) -> (@[G, G+1] sum: 32);
}

The signature states that the Add module accepts an input in the first cycle and immediately produces the output in the same cycle.

However, another way to define a combinational component is something that can produce an output as long as its input is available; this means that the adder can produce the output for five or ten cycles as long as the input is available for the same number of cycles.

To capture such a signature, which can hold a signal for a caller defined number of cycles, we can use multiple events:

comp Add<G: L-(G), L: 1>(
    @[G, L] left: 32,
    @[G, L] right: 32
) -> (
    @[G, L] sum: 32
) where L > G;

The above signature states that the invocation of an Add instance gets to decide how long the output is available for. We require that the event L occurs after G to ensure that the intervals are well-formed. Finally, the delay of G depends on the length of the signals; if the output is held for 10 cycles, then the adder cannot be reused for 10 cycles.

Using such a component is straightforward:

A := new Add;
a := A<G, G+10>(l, r);

Default Binding for Events

In the above example, the signature for Add is more flexible than the original one. However, specifying the common case, where we use the adder for exactly one cycle, is a bit cumbersome:

A := new Add;
a := A<G, G+1>(l, r);

Instead, we can provide a default binding for the event L that is used when the caller does not specify a value for L:

comp Add<G: L-(G), ?L=G+1: 1>(
    @[G, L] left: 32,
    @[G, L] right: 32
) -> (
    @[G, L] sum: 32
) where L > G;

The syntax ?L=G+1 tells the compiler to use the binding G+1 for L when there is no binding provided for it.

Note. Events with default bindings must occur after non-default events.
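Default event bindings behave much like default arguments in software. The Python sketch below models the two-event Add signature with an optional binding for L. It is an analogy rather than how the compiler resolves bindings, and bind_add_events is a hypothetical name.

```python
# Illustrative analogy: a default event binding modeled as a default argument.
def bind_add_events(g, l=None):
    """Resolve the events of Add<G: L-(G), ?L=G+1: 1> for the given bindings."""
    if l is None:
        l = g + 1  # default binding ?L=G+1
    if l <= g:
        # Mirrors the `where L > G` clause.
        raise ValueError("ill-formed invocation: requires L > G")
    return {"sum": (g, l), "delay": l - g}


print(bind_add_events(0, 10))  # {'sum': (0, 10), 'delay': 10}
print(bind_add_events(0))      # {'sum': (0, 1), 'delay': 1}
```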

Optimizing Verilog Modules using Filament Signatures

Filament's signatures are a powerful tool–if we know that a Verilog module is only going to be used in a certain way, we can optimize the module to be used in that way. For example, if the module's interface requires that an input signal be available for multiple cycles, we don't have to save that signal in a register.

Metaprogramming Overview

When building hardware, it is often useful to design it parametrically so that we can use the same code to generate different hardware designs. For example, it is extremely common to design modules for numerical operations, such as adders, to be parametric over the bitwidth of the operands.

module Add #(
    parameter WIDTH = 32
) (
    input [WIDTH-1:0] a,
    input [WIDTH-1:0] b,
    output [WIDTH-1:0] c
);
    assign c = a + b;
endmodule

The Verilog module above specifies a parameter WIDTH that can be used to specify the bitwidth of the operands. User code can simply instantiate the module with the desired bitwidth:

Add #(.WIDTH(32)) adder_32(...);
Add #(.WIDTH(64)) adder_64(...);

Of course, this example hides the fact that metaprogramming ends up generating hardware. This is because the + operation needs to be implemented differently for different bitwidths; a 32-bit adder needs a lot more circuitry than a 1-bit adder. HDLs like Verilog provide abstractions like generate blocks to allow users to generate hardware at compile time. Languages like Chisel go one step further to generate hardware by writing Scala programs.
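
To make the cost of a parameter concrete, the Python sketch below "elaborates" an adder for a given width and counts the full-adder cells it uses. The ripple-carry structure is an assumption chosen for illustration; a synthesis tool may pick a different adder architecture. The names `full_adder` and `ripple_add` are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two width-bit values; returns (sum mod 2**width, cells used)."""
    carry = 0
    result = 0
    cells = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
        cells += 1  # One full-adder cell per bit of width.
    return result, cells


print(ripple_add(5, 9, 32))  # (14, 32)
print(ripple_add(1, 1, 1))   # (0, 1): the carry out is dropped
```

As in the Verilog module, the result is truncated to the operand width, and the amount of generated logic grows with the parameter.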

The challenge with generative programming is ensuring that the generated code is correct. This is harder than it seems because it is not enough to ensure that one piece of code is correct; we have to ensure that every design that can be generated from a module definition is correct. Scala's strong types help ensure that code generated by Chisel is free of some bugs, such as missing port connections, but they do not capture crucial properties like correct pipelining and timing.

Filament's promise

Our goal with Filament is to provide an expressive metaprogramming model in which parametric modules that typecheck are guaranteed to generate correctly pipelined hardware. Read on to see how we do this.

Loops and Bundles

Much like the rest of Filament, generative programs in Filament need to be safe, i.e., correctly pipelined. We'll learn about two features that help us write safe and efficient hardware modules: for loops and bundles.

Our running example will be a shift register. A shift register is a linear chain of registers that forward their values every cycle:

graph LR;
    R0 --> R1 --> R2 --> R3;

Our goal is to build a parameterizable shift register which takes some value N and builds a chain of N registers. The first order of business is to define the interface of the shift register.

comp Shift[#N](
    @[G, G+1] input: 32
) -> (@[G+#N, G+#N+1] out: 32);

Notice that we accept an input in the first cycle, [G, G+1), and the output is produced N cycles later in [G+#N, G+#N+1). For the implementation, we'd like to use a for loop to build up a chain of registers. Here is some Python pseudocode for how we might do this:

# Initial output is just the input
cur_out = input
for i in range(0, N):
    # Build a new register and connect
    # its input to the previous output
    new_reg = Register(32)
    new_reg.input = cur_out
    # Update the current register and output
    cur_out = new_reg.out
# Output the final value
out = cur_out
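
The pseudocode above only describes how the chain is wired. To see the resulting behavior, the sketch below turns it into a small cycle-by-cycle simulation. The `Register` class and `simulate_shift` function are hypothetical stand-ins written for this illustration, and `None` marks cycles in which no valid value has reached the output yet.

```python
class Register:
    """A register whose output shows the value written in the previous cycle."""

    def __init__(self):
        self.out = None
        self._next = None

    def write(self, value):
        self._next = value

    def tick(self):
        # Clock edge: the pending value becomes visible on the output.
        self.out = self._next


def simulate_shift(inputs, n):
    """Feed one input per cycle through a chain of n registers."""
    regs = [Register() for _ in range(n)]
    outputs = []
    for value in inputs:
        # Wire the chain: each register reads the previous stage's output.
        cur_out = value
        for reg in regs:
            reg.write(cur_out)
            cur_out = reg.out
        outputs.append(cur_out)
        # Advance the clock for every register at once.
        for reg in regs:
            reg.tick()
    return outputs


print(simulate_shift([1, 2, 3, 4], 2))  # [None, None, 1, 2]
```

Each value appears at the output n cycles after it was supplied, which matches the interval in the interface above.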

Bundles

While straightforward, this code is hard for Filament to check: it does not understand when each register is used relative to the module's start time. We'll use a bundle to help Filament understand how the output signals from the register are used. A bundle is a sized array with a type describing when the values in the bundle are available:

bundle f[#N]: for<#i> @[G+#i, G+#i+1] 32

This defines a bundle f with N elements. The type for the bundle states that the value at index i in the bundle is available in the interval [G+i, G+i+1). For example, the value at f{0} is available at [G, G+1), f{1} is available at [G+1, G+2), and so on. Intuitively, each index in the bundle represents the input to a register in the shift register chain.
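
The availability intervals described by the bundle type can be tabulated directly. In the Python sketch below, times are offsets from the event G, and both `bundle_intervals` and `is_available` are hypothetical helpers written for this illustration.

```python
def bundle_intervals(n: int) -> list[tuple[int, int]]:
    """Half-open intervals [G+i, G+i+1) for indices 0..n-1, as offsets from G."""
    return [(i, i + 1) for i in range(n)]


def is_available(index: int, cycle: int, n: int) -> bool:
    """True if f{index} holds a valid value at the given cycle offset."""
    start, end = bundle_intervals(n)[index]
    return start <= cycle < end


print(bundle_intervals(3))    # [(0, 1), (1, 2), (2, 3)]
print(is_available(1, 1, 3))  # True: f{1} is valid during [G+1, G+2)
print(is_available(1, 2, 3))  # False: the interval excludes G+2
```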

Loops

Filament loops are nothing special: they simply allow you to iterate over a numeric range:

for #i in s..e {
    <body>
}

This defines a loop where the value of #i ranges from s (inclusive) to e (exclusive).

Implementation

Using these two features, we can implement a shift register. The following implementation is parametric both over the width of the register and the number of registers in the chain:

// A component that delays `in` by D cycles.
// Uses the Delay component under the hood.
comp Shift[W, D, ?N=1]<'G: 1>(
   in[N]: ['G, 'G+1] W
) -> (
   out[N]: ['G+D, 'G+D+1] W
) where W > 0 {
   in_concat := new ConcatBundle[W, N]<'G>(in{0..N});

   bundle f[D+1]: for<k> ['G+k, 'G+k+1] W*N;

   f{0} = in_concat.out;
   for i in 0..D {
      d := new Delay[W*N]<'G+i>(f{i});
      f{i+1} = d.out;
   }
   out_split := new SplitWire[W, N]<'G+D>(f{D});
   out{0..N} = out_split.out{0..N};
}
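
The loop body is well-timed because a Delay invoked at 'G+i reads f{i} during ['G+i, 'G+i+1) and writes f{i+1} one cycle later, exactly when the bundle type says that index is available. The Python sketch below replays this argument for a concrete D. It is not how the Filament type checker works, `check_shift` is a hypothetical helper, and the one-cycle latency of Delay is an assumption since its signature is not shown here.

```python
def check_shift(d: int, delay_latency: int = 1) -> bool:
    """Check every connection in the Shift loop against the bundle type.

    All times are cycle offsets from the event 'G.
    """

    def f(k: int) -> tuple[int, int]:
        # Availability of f{k} according to the bundle type.
        return (k, k + 1)

    for i in range(d):
        start = i  # The Delay instance is invoked at 'G+i.
        reads = (start, start + 1)
        writes = (start + delay_latency, start + delay_latency + 1)
        if reads != f(i) or writes != f(i + 1):
            return False
    return True


print(check_shift(4))                   # True: one-cycle Delay fits the bundle
print(check_shift(4, delay_latency=2))  # False: outputs arrive a cycle late
```

The second call shows the kind of mismatch the type system is meant to reject: a slower component would write each bundle index outside its declared interval.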