Getting Started
Filament is a programming language for Fearless Hardware Design. It aims to enable software programmers without much hardware background to start designing performant hardware. At its heart, Filament uses a type system to encode properties important for designing efficient hardware. This guide helps you install the various tools to make Filament work.
Minimal Build
A basic build, which does not support our automatic simulation harness, can be installed easily. First, clone this repository:
git clone https://github.com/cucapra/filament.git
Next, install the dependencies for the Filament compiler:
- Install Rust, which will configure the cargo tool.
- Install one of the two SMT solvers.
- Build the compiler by running the following in the root of the repository:
cargo build
To check that the compiler works, run the following command:
cargo run -- tests/compile/par.fil
This should generate the Verilog implementing the original program.
Full Build
We'll need to install some tools from the Calyx compiler.
Calyx Docker Image
Calyx tools are provided using a docker image:
docker run -it ghcr.io/cucapra/calyx:0.4.0
If you're using the container, skip to Configuring Filament Tools.
Installing from Source
First, we need to configure the Calyx compiler which acts as the backend for Filament.
- Clone the Calyx repository:
git clone https://github.com/cucapra/calyx.git --depth 1 --branch v0.3.0
- Build the Calyx compiler:
cd calyx && cargo build
In order to simulate Filament programs, we need a couple more tools:
- Install fud, which manages hardware tools and makes it easy to test Filament programs.
- Install Icarus Verilog and configure fud to use it.
  - Running fud check again should report that icarus-verilog was installed correctly.
- Install runt: cargo install runt
- Install jq:
  - On Mac OS: brew install jq
  - On Ubuntu: apt install jq
Configuring Filament Tools
In the Filament repository, do the following:
- Install cocotb: python3 -m pip install cocotb
  - Cocotb install can often fail. Check that it was installed correctly by running python3 -c "import cocotb; print(cocotb.__version__)". If this command fails, see Debugging Cocotb Installation.
- Register Filament's fud stages by running the following command in the Filament repository: fud register -p fud/filament.py filament
  - Run fud check to make sure that the filament stages are correctly installed.
For a sanity check, run fud check. It should report that iverilog, jq, filament, futil, and cocotb are correctly installed.
Once all tools are installed, running the following command should print out the test report:
runt -j 1 -o fail -d
Next Steps
Now that we have installed the Filament compiler and accompanying tools, we can start using Filament. Use the following links to learn more about Filament:
Debugging Cocotb Installation
Cocotb requires the python shared library libpython.so/libpython.dylib (Mac OS) to work correctly. A common reason for a cocotb installation not working is that this library is missing.
To check if cocotb is able to find the shared library, install find_libpython:
python3 -m pip install find_libpython
Next, run the following:
python3 -c "import find_libpython; print(find_libpython.find())"
If the above command does not print out anything, that means that the python library was not found and the python installation needs to be rebuilt.
If you use pyenv, the following command will install a python version with the shared library:
env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.10
Rerun the command to check that libpython was found after installing a new python version.
Hardware Design for the Curious Developer
Filament is a low-level hardware description language. This means that it does not have a lot of primitive constructs and essentially requires us to build up our hardware from scratch. However, Filament's type system helps us build small reusable components and guarantees that composing them generates efficient and correct hardware.
This tutorial does not assume familiarity with hardware design concepts.
Building an Arithmetic Logic Unit
Arithmetic Logic Units (ALUs) are a key component of most processors. In a nutshell, they perform various arithmetic operations based on a given op code. We will implement a simple ALU that either performs an addition or multiplication based on the op boolean. At a high level, we want to build a circuit that performs the same computation as this python program:
def alu(op, left, right):
    if op:
        out = left * right
    else:
        out = left + right
    return out
The generated hardware will look something like this:
graph TB; L([left]) R([right]) O([op]) A[Add] M[Mult] Mx[Mux] L & R -->A & M; A & M --> Mx -->out([out]); O--->Mx;
We start by defining a Filament component which wraps all the hardware required to implement some computation:
comp main<'G: 3>(
go: interface['G],
op: ['G, 'G+1] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G, 'G+1] 32)
The <'G: 3> syntax defines the event G, which can be thought of as the "start time" of the component, along with its delay of 3 cycles.
We define a module that takes the inputs op, left, and right and produces the output out.
Since we're working with hardware, we need to specify the bitwidth of each input and output.
Unlike other hardware description languages, Filament also requires us to specify exactly when we'll use the input signals and provide the outputs. The syntax ['G, 'G+1] states that the signal must be available in the half-open interval [G, G+1).
Next, we need to perform the computations. Since we're working with hardware designs, we don't get access to primitive operations like * and +; we must build circuits to perform these computations!
Thankfully, the Filament standard library defines these operations for us, so we can simply import those definitions and instantiate an adder and a multiplier:
import "primitives/core.fil"; // Defines Add
import "./sequential.fil"; // Defines Mult
comp main<'G: 3>(...) -> (...) {
A := new Add[32];
M := new Mult[32];
}
We define two circuits A and M which represent a 32-bit adder and a multiplier respectively. The Add[32] syntax passes the value 32 for the width parameter of the pre-defined component.
Next, we need to perform the two computations. In Filament, we have to specify the time when a particular computation occurs using an invocation:
A := new Add[32];
M := new Mult[32];
a0 := A<'G>(left, right);
m0 := M<'G>(left, right);
Here, a0 and m0 are invocations of the adder and the multiplier that are performed when the event G occurs. We provide values for the input ports of the adder and the multiplier. Finally, we can use a multiplexer to select between the output signals of the two invocations:
mx := new Mux[32]<'G>(op, a0.out, m0.out);
out = mx.out;
We make use of Filament's combined instance creation and invocation syntax to define a new multiplexer and use it when event G occurs. Finally, we forward the output from the multiplexer to the output signal of the component.
Coming from a software background, it might seem weird that we're performing both the computations first and selecting the output after the fact. However, a hardware circuit is always active [1]: the multiplier and adder are always propagating signals and performing some computation even if the inputs are nonsensical. Furthermore, constructs like if-else are not compositional [2].
The final program looks like this:
import "primitives/core.fil";
import "./sequential.fil";
/// ANCHOR: signature
comp main<'G: 3>(
go: interface['G],
op: ['G, 'G+1] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G, 'G+1] 32)
// ANCHOR_END: signature
{
A := new Add[32];
M := new Mult[32];
a0 := A<'G>(left, right);
m0 := M<'G>(left, right);
mx := new Mux[32]<'G>(op, a0.out, m0.out);
out = mx.out;
}
Checking Timing Behavior
Filament's prime directive is to ensure that your hardware does not violate temporal constraints.
Let's see what that means exactly by trying to compile our program.
Save the file as alu.fil and run the following command from the Filament repository:
cargo run -- alu.fil
Filament tells us that the program is incorrect:
error: source port does not provide value for as long as destination requires
┌─ examples/tut-wrong-1.fil:17:39
│
17 │ mx := new Mux[32]<'G>(op, a0.out, m0.out);
│ ^^^^^^
│ │
│ source is available for ['G+2, 'G+3]
│ required for ['G, 'G+1]
Compilation failed with 1 errors.
Run with --show-models to generate assignments for failing constraints.
Filament is telling us that our multiplexer expects its input during the interval [G, G+1) but the multiplier's output is only available in the interval [G+2, G+3).
What went wrong? We started our adder and the multiplier at the same time (when event G occurs) but the multiplier seems to take longer.
This is because multipliers are fundamentally different from adders: they require a lot more hardware and a lot more time to perform their computation.
This temporal constraint, that multipliers may take several cycles while adders may not, is checked by Filament to ensure that our resulting hardware only reads meaningful values.
In order to fix this, we need to execute the multiplexer when the signal from the multiplier is available. However, in that case, we won't have access to the signal from the adder which only provides its output in the interval [G, G+1). We need to somehow make the signal from the adder last longer as well.
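The check behind this error can be pictured as interval containment. The following Python sketch is only an illustration of that idea, not a description of how the Filament compiler is implemented; the names Interval and covers are hypothetical, and all offsets are cycles relative to G.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) in cycles relative to the event G."""
    start: int
    end: int


def covers(source: Interval, required: Interval) -> bool:
    """True if the source holds its value for the whole required interval."""
    return source.start <= required.start and required.end <= source.end


mult_out = Interval(2, 3)   # m0.out is available in [G+2, G+3)
add_out = Interval(0, 1)    # a0.out is available in [G, G+1)
mux_at_g = Interval(0, 1)   # a Mux invoked at G reads its inputs in [G, G+1)
mux_at_g2 = Interval(2, 3)  # a Mux invoked at G+2 reads them in [G+2, G+3)

print(covers(mult_out, mux_at_g))   # False: the error reported above
print(covers(mult_out, mux_at_g2))  # True: waiting fixes the multiplier input
print(covers(add_out, mux_at_g2))   # False: the adder's value is gone by then
```

The last line is the reason the adder's output has to be kept alive for longer.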
Saving Values for the Future
Registers are the primitive stateful building block for hardware designs and can extend the availability of signals. The signature of a register is complicated but interesting:
// A register that can extend the lifetime of a signal to any required length.
comp Register[WIDTH]<'G: 'L-('G+1), 'L: 1>(
clk: 1,
reset: 1,
write_en: interface['G],
in: ['G, 'G+1] WIDTH,
) -> (
out: ['G+1, 'L] WIDTH,
) where 'L > 'G+1;
Notice the availability of the out signal: it is available in the interval [G+1, L), where L is provided to the component during its invocation.
This means that a register can hold onto a value for as long as needed!
The additional where clause ensures that out's interval is well-formed; it would be troublesome if we could say that out is available in [G+10, G+5).
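To make the arithmetic in the register's signature concrete, here is a small Python sketch that reads the output interval and the delay off the signature for a given invocation. The function name register_out is hypothetical, and events are represented as plain cycle offsets from the start of the enclosing component.

```python
from typing import Tuple


def register_out(g: int, l: int) -> Tuple[Tuple[int, int], int]:
    """Return (output interval, delay) for a Register invoked with <G=g, L=l>."""
    if not l > g + 1:
        raise ValueError("ill-formed: the signature requires L > G+1")
    out_interval = (g + 1, l)  # out: ['G+1, 'L]
    delay = l - (g + 1)        # 'G: 'L-('G+1)
    return out_interval, delay


print(register_out(0, 3))  # ((1, 3), 2)
print(register_out(0, 2))  # ((1, 2), 1)
```

Holding a value for longer makes the output interval longer, but it also increases the delay, that is, the number of cycles before the register can accept a new value.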
Let's try to fix our program by making changes:
import "primitives/core.fil";
import "./sequential.fil";
comp main<'G: 3>(
go: interface['G],
op: ['G, 'G+1] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> (
out: ['G, 'G+1] 32,
) {
A := new Add[32];
M := new Mult[32];
m0 := M<'G>(left, right);
a0 := A<'G>(left, right);
// Use register to hold the adder's value
r0 := new Register[32]<'G, 'G+3>(a0.out);
// Use the multiplexer when the mult's output is ready
mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
out = mx.out;
}
We made a couple of changes to our program:
- Run the multiplexer when the output from the multiplier is available (at G+2).
- Save the value from the adder in the register invocation r0.
- Use the value from the register instead of the adder for multiplexing.
Sadly, Filament is still angry at us:
error: source port does not provide value for as long as destination requires
┌─ examples/tut-wrong-2.fil:19:29
│
19 │ mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
│ ^^
│ │
│ source is available for ['G, 'G+1]
│ required for ['G+2, 'G+3]
error: source port does not provide value for as long as destination requires
┌─ examples/tut-wrong-2.fil:20:11
│
20 │ out = mx.out;
│ ^^^^^^
│ │
│ source is available for ['G+2, 'G+3]
│ required for ['G, 'G+1]
Compilation failed with 2 errors.
Run with --show-models to generate assignments for failing constraints.
The problem is that we accept the op input and produce the output out in the interval [G, G+1). However, we know that it is not possible to produce the output as soon as we get the input because the multiplier takes two cycles to produce its output!
A Correct Implementation
The fix is easy: we change the signature of the ALU to reflect this cruel reality:
import "primitives/core.fil";
import "./sequential.fil";
/// ANCHOR: sig
comp main<'G: 3>(
/// ANCHOR_END: sig
go: interface['G],
op: ['G, 'G+3] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G+2, 'G+3] 32)
{
A := new Add[32];
M := new Mult[32];
m0 := M<'G>(left, right);
a0 := A<'G>(left, right);
// Use register to hold the adder's value
r0 := new Register[32]<'G, 'G+3>(a0.out);
// Use the multiplexer when the mult's output is ready
mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
out = mx.out;
}
And running the compiler again no longer generates any errors:
cargo run -- alu.fil
[1] This is not quite true since we can build circuits where the clock signal to a particular sub-circuit is disabled (or "gated") based on a particular signal. However, this kind of clock-gating is generally not recommended for fine-grained usage.
[2] While control operators like if and for are supported in languages like Verilog, they don't quite work the same in all contexts: for loops are compile-time constructs, whereas if can only be used for combinational circuits like adders but not multipliers.
Running Filament Designs
Filament designs are compiled to Verilog using the Calyx backend and simulated using tools like Icarus Verilog. However, figuring out the right incantations to get these tools to work together and building testing harnesses can be tedious. We use fud to make the process of running Filament designs seamless: the user provides a file with the test data and runs a single command to compile, simulate, and generate outputs.
Data Format
The test runner's data format is a JSON file that contains the names of each port mentioned in a Filament program's main component.
For example, for the tutorial ALU with the signature:
comp main<'G: 3>(
go: interface['G],
op: ['G, 'G+1] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G, 'G+1] 32)
We can have the following data file:
{
"op": [ 0, 1, 0, 1 ],
"left": [ 4, 5, 6, 7 ],
"right": [ 7, 5, 11, 9 ]
}
The test harness operates with the idea of transactions, where each transaction is a set of inputs and outputs corresponding to the indices into the JSON file.
For example, the first transaction sends the inputs op[0], left[0], and right[0] to the Verilog design and captures the output out[0], corresponding to the output port of the ALU.
This means that the above data file will run the design with four inputs and capture four outputs. Adding another transaction is easy: just add another set of inputs to the JSON file.
Running Designs
Running the design is straightforward, assuming you've configured fud already:
fud e --to cocotb-out examples/tut-seq.fil \
-s cocotb.data examples/data.json \
-s calyx.flags ' -d canonicalize'
This instructs fud to compile the design to Verilog, set up the test harness, and run the simulation.
The output captures the values on the out port of the ALU for each transaction in the data file and tells us how many cycles it took to run the design:
{"out": {"0": [28], "1": [10], "2": [66], "3": [16]}, "cycles": 12}
In general, the command to run designs is:
fud e --to cocotb-out <filament-file> \
-s cocotb.data <data-file> \
-s calyx.flags ' -d canonicalize'
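The output values reported above can be cross-checked with a small software model that walks the data file one transaction at a time. This is a sketch under stated assumptions, not part of the harness: alu_model and expected_outputs are hypothetical names, and the model assumes that op = 1 selects the sum and op = 0 the product (as one would expect if the Mux forwards its first data input when its select is 1). That is the opposite polarity of the Python program at the start of the tutorial; it was chosen here because it reproduces the reported output.

```python
import json

DATA = json.loads("""
{
  "op":    [0, 1, 0, 1],
  "left":  [4, 5, 6, 7],
  "right": [7, 5, 11, 9]
}
""")

MASK = (1 << 32) - 1  # the ALU's data ports are 32 bits wide


def alu_model(op: int, left: int, right: int) -> int:
    # Assumption: op = 1 picks the adder path, op = 0 the multiplier path.
    result = left + right if op else left * right
    return result & MASK


def expected_outputs(data: dict) -> dict:
    """One transaction per index, in the same shape as the "out" field."""
    count = len(data["op"])
    return {
        str(i): [alu_model(data["op"][i], data["left"][i], data["right"][i])]
        for i in range(count)
    }


print(json.dumps({"out": expected_outputs(DATA)}))
# {"out": {"0": [28], "1": [10], "2": [66], "3": [16]}}
```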
Under the Hood
Note: If you're following the tutorial, skip to the Pipelining with Filament section and come back here after you've finished.
Filament's test runner uses the signature of the main component to decide how long to provide inputs for, when to capture the outputs, and when to schedule the next transaction.
Providing Inputs
The test harness holds the inputs for exactly the interval specified for each input and then provides 'x values to the input.
This is done to make sure that incorrect designs that read inputs outside of their specified interval will not pass the test.
Scheduling Transactions
Because Filament represents hardware pipelines, new transactions can start before the previous transaction has finished.
By default, new transactions are scheduled exactly when the delay for the main event specifies.
For example, if the main event has a delay of 2, then the next transaction will be scheduled two cycles after the start of the previous transaction.
However, it can be useful to change the scheduling behavior to check if there are pipelining bugs.
Our fud-based harness provides a way to randomize the timing of transactions by adding a random delay:
fud e --to cocotb-out examples/tut-seq.fil \
-s cocotb.data examples/data.json \
-s cocotb.randomize 10 \
-s calyx.flags ' -d canonicalize'
The -s cocotb.randomize 10 flag adds a random delay of up to 10 cycles between transactions.
Note: A well-typed Filament program produces the same output values regardless of the scheduling of transactions.
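The two scheduling policies can be sketched in a few lines of Python. This is an illustration only: the function schedule and its parameters are hypothetical, and how the real harness draws its random delays (distribution, seeding) is an assumption here.

```python
import random


def schedule(num_transactions, delay, max_extra=0, seed=0):
    """Return the start cycle of each transaction.

    Consecutive transactions are at least `delay` cycles apart; up to
    `max_extra` additional idle cycles are inserted at random.
    """
    rng = random.Random(seed)
    starts, cycle = [], 0
    for _ in range(num_transactions):
        starts.append(cycle)
        cycle += delay + rng.randint(0, max_extra)
    return starts


print(schedule(4, delay=3))                # [0, 3, 6, 9]
print(schedule(4, delay=3, max_extra=10))  # gaps between 3 and 13 cycles
```

In both cases the gap between two transactions never drops below the delay of the main event, which is the only promise the signature makes to the component.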
Pipelining with Filament
While we've designed a correct ALU, it is quite slow: it processes one input completely before moving on to the next. Such a hardware design is called "fully sequential". A standard technique to improve the throughput of a design is pipelining, which allows a hardware module to process multiple inputs at the same time.
Filament is designed to check that a pipeline can support the throughput specified in its interface. We'll take our sequential ALU design and use Filament's type system to guide us to a correctly pipelined design. Before that, however, let's run the design and see how it performs:
fud e --to cocotb-out examples/tut-seq.fil \
-s cocotb.data examples/data.json \
-s calyx.flags ' -d canonicalize'
Which generates the following output:
{"out": {"0": [28], "1": [10], "2": [66], "3": [16]}, "cycles": 12}
Note that this sequential design takes 12 cycles to process 4 inputs.
We'll start with pipelining our program:
import "primitives/core.fil";
import "./sequential.fil";
/// ANCHOR: sig
comp main<'G: 3>(
/// ANCHOR_END: sig
go: interface['G],
op: ['G, 'G+3] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G+2, 'G+3] 32)
{
A := new Add[32];
M := new Mult[32];
m0 := M<'G>(left, right);
a0 := A<'G>(left, right);
// Use register to hold the adder's value
r0 := new Register[32]<'G, 'G+3>(a0.out);
// Use the multiplexer when the mult's output is ready
mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
out = mx.out;
}
Delays and Throughput
Filament uses an event's delay to determine when the module can accept new inputs.
comp main<'G: 3>(
Note that the delay for the event G is 3, which indicates to Filament that the ALU processes new inputs every three cycles.
We can tell Filament that we instead want a module that can process new inputs every cycle by changing the delay to 1:
comp main<'G: 1>(
And run the compiler:
cargo run -- alu.fil
Much to our dismay, Filament tells us that this program cannot be pipelined to achieve throughput 1. However, being the nice type checker it is, it tells us exactly why the design cannot be pipelined in the form of type errors.
Let's work through each error and see how we can fix it.
Availability Intervals of Ports
The first error message points out that one of the inputs is required for three cycles, but the module may re-execute every cycle:
error: bundle's availability is greater than the delay of the event
┌─ examples/tut-pipe-wrong-1.fil:8:10
│
5 │ comp main<'G: 1>(
│ - event's delay
·
8 │ op: ['G, 'G+3] 1,
│ ^^^^^^^^^^ available for 3 cycles
This is problematic because op represents a physical wire; it is incapable of holding multiple values.
Our request to process inputs every cycle and have op last for three cycles is physically impossible.
The fix is easy: looking at our original design, we see that op is only used by the multiplexer in the interval [G+2, G+3) so we can change the availability interval of op to be [G+2, G+3).
Note: If this step feels like divine insight, another way to reach the same conclusion is to change the availability interval of op to [G, G+1), which will cause the compiler to point out all availability intervals where op is used.
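The rule behind this first error can be stated as a one-line check: no port may be held for more cycles than the delay of its event. The Python sketch below illustrates the rule only; overlong_ports is a hypothetical helper, and intervals are half-open (start, end) pairs in cycles relative to G.

```python
def overlong_ports(delay, ports):
    """Return the names of ports whose availability is longer than the delay.

    `ports` maps a port name to its half-open interval (start, end).
    """
    return [name for name, (start, end) in ports.items() if end - start > delay]


before = {"op": (0, 3), "left": (0, 1), "right": (0, 1)}
after = {"op": (2, 3), "left": (0, 1), "right": (0, 1)}

print(overlong_ports(3, before))  # []: fine when main has delay 3
print(overlong_ports(1, before))  # ['op']: the error reported above
print(overlong_ports(1, after))   # []: fixed by shrinking op's interval
```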
Delays of Subcomponents
The second error message points out that the ALU component may execute every cycle but the multiplier we used can only execute every two cycles:
error: event provided to invocation triggers more often that invocation's event's delay allows
┌─ examples/tut-pipe-wrong-1.fil:15:13
│
5 │ comp main<'G: 1>(
│ - this event triggers every 1 cycles
·
15 │ m0 := M<'G>(left, right);
│ ^^ event provided to invoke triggers too often
│
┌─ examples/./sequential.fil:3:18
│
3 │ comp Mult[W]<'G: 2>(
│ - invocation's event is allowed to trigger every 2 cycles
Yet again, our request is physically impossible to satisfy: our multiplier circuit is fundamentally incapable of executing every cycle.
Thankfully for us, the primitives/math.fil file provides a component called FastMult which does have delay 1:
/// Implementation of a multiplier with initiation interval 1 and latency 3.
/// Written in a way to allow Vivado to infer a DSP.
comp FastMult[W]<'G: 1>(
left: ['G, 'G+1] W,
right: ['G, 'G+1] W,
) -> (
out: ['G+3, 'G+4] W,
) where W > 0
We can change our program to use this component instead:
import "primitives/core.fil";
import "./sequential.fil";
comp main<'G: 1>(
go: interface['G],
op: ['G+2, 'G+3] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G+2, 'G+3] 32)
{
A := new Add[32];
M := new FastMult[32];
m0 := M<'G>(left, right);
a0 := A<'G>(left, right);
// Use register to hold the adder's value
r0 := new Register[32]<'G, 'G+3>(a0.out);
// Use the multiplexer when the mult's output is ready
mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
out = mx.out;
}
However, in making this change, we've created a new problem for ourselves:
error: source port does not provide value for as long as destination requires
┌─ examples/tut-pipe-wrong-2.fil:18:41
│
18 │ mx := new Mux[32]<'G+2>(op, r0.out, m0.out);
│ ^^^^^^
│ │
│ source is available for ['G+3, 'G+4]
│ required for ['G+2, 'G+3]
Compilation failed with 2 errors.
Filament tells us that FastMult's out port is available in the interval [G+3, G+4) instead of [G+2, G+3) for Mult, i.e., the latency of FastMult is different from the latency of Mult.
Filament catching this bug is important: it would be very easy to miss such a mistake in a Verilog program. Fixing it is quite mechanical:
import "primitives/core.fil";
import "./sequential.fil";
comp main<'G: 1>(
go: interface['G],
op: ['G+3, 'G+4] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G+3, 'G+4] 32)
{
A := new Add[32];
M := new FastMult[32];
m0 := M<'G>(left, right);
a0 := A<'G>(left, right);
// Use register to hold the adder's value
r0 := new Register[32]<'G, 'G+4>(a0.out);
// Use the multiplexer when the mult's output is ready
mx := new Mux[32]<'G+3>(op, r0.out, m0.out);
out = mx.out;
}
Registers that Hold on for too Long
The final problem is quite similar to the previous one:
error: event provided to invocation triggers more often that invocation's event's delay allows
┌─ examples/tut-pipe-wrong-3.fil:16:28
│
4 │ comp main<'G: 1>(
│ - this event triggers every 1 cycles
·
16 │ r0 := new Register[32]<'G, 'G+4>(a0.out);
│ ^^ event provided to invoke triggers too often
│
┌─ ./primitives/./state.fil:6:29
│
6 │ comp Register[WIDTH]<'G: 'L-('G+1), 'L: 1>(
│ --------- invocation's event is allowed to trigger every 3 cycles
Compilation failed with 1 errors.
The compiler is telling us the register's delay is 3 cycles. However, unlike the multiplier, this is a consequence of our decision: we make the register hold on to its value for three cycles which increases its delay. The last line of the error message points to the problem: the delay of a register depends on how we use it; this means that if we make it hold onto a value for exactly one cycle, its delay is reduced to one.
However, the problem is that we need the computation from the adder to be available three cycles from when it starts. To get both the pipelining and correctness we want, we need to instantiate a chain of registers that feed values forward.
The intuition behind this is that because we want our ALU to process inputs every cycle, we need to "save" the computation in every cycle and push it forward.
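The behavior of such a chain can be simulated cycle by cycle. The Python sketch below is a plain software model under the assumption that each register simply takes its predecessor's value on every clock edge; run_chain is a hypothetical name, and the input stream is the sequence of sums for the four transactions in the example data file.

```python
def run_chain(inputs, depth=3):
    """Simulate `depth` one-cycle registers connected back to back.

    A new value enters every cycle. Returns (cycle, value) pairs seen at the
    output of the last register.
    """
    regs = [None] * depth                   # contents of each register
    seen = []
    stream = list(inputs) + [None] * depth  # extra cycles to drain the chain
    for cycle, value in enumerate(stream):
        # The output visible during this cycle is what was stored earlier.
        if regs[-1] is not None:
            seen.append((cycle, regs[-1]))
        # On the clock edge every register takes its predecessor's value.
        regs = [value] + regs[:-1]
    return seen


print(run_chain([11, 10, 17, 16]))
# [(3, 11), (4, 10), (5, 17), (6, 16)]
```

Each sum shows up exactly three cycles after it entered the chain, while a new sum still enters every cycle. A single register holding its value for three cycles could not do this, because it would have to store three different sums at once.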
The final program will look like this:
import "primitives/core.fil";
import "./sequential.fil";
comp main<'G: 1>(
go: interface['G],
op: ['G+3, 'G+4] 1,
left: ['G, 'G+1] 32,
right: ['G, 'G+1] 32,
) -> ( out: ['G+3, 'G+4] 32) {
A := new Add[32];
M := new FastMult[32];
m0 := M<'G>(left, right);
a0 := A<'G>(left, right);
r0 := new Register[32]<'G, 'G+2>(a0.out);
r1 := new Register[32]<'G+1, 'G+3>(r0.out);
r2 := new Register[32]<'G+2, 'G+4>(r1.out);
mx := new Mux[32]<'G+3>(op, r2.out, m0.out);
out = mx.out;
}
Running the Pipelined Design
Now to the moment of truth: let's run the design and see how it performs:
fud e --to cocotb-out examples/tut-pipe.fil \
-s cocotb.data examples/data.json \
-s calyx.flags ' -d canonicalize'
We get the following output which shows that the design took only 7 cycles to process 4 inputs:
{"out": {"0": [28], "1": [10], "2": [66], "3": [16]}, "cycles": 7}
If you're still not convinced, try adding another transaction to the data file in examples/data.json and see how the cycle counts for the original sequential and pipelined designs change.
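A back-of-the-envelope model helps explain the two cycle counts. The formula below is an assumption rather than a documented property of the harness: it supposes that transaction i starts at cycle i times the delay and that the run ends when the last output interval closes. It does reproduce both numbers reported in this tutorial; total_cycles is a hypothetical name.

```python
def total_cycles(num_transactions, delay, latency):
    """Cycles until the last output has been produced.

    Transaction i starts at cycle i * delay, and its output interval ends
    `latency` cycles after its start.
    """
    return (num_transactions - 1) * delay + latency


# Sequential ALU: delay 3, out is ['G+2, 'G+3], so the interval ends at G+3.
print(total_cycles(4, delay=3, latency=3))  # 12
# Pipelined ALU: delay 1, out is ['G+3, 'G+4], so the interval ends at G+4.
print(total_cycles(4, delay=1, latency=4))  # 7
```

Under this model each extra transaction costs the sequential design three more cycles but the pipelined design only one, even though the pipelined design has the longer latency for a single input.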
Something Remarkable
During the process of pipelining, we spent all of our time looking at type errors. Once the program type checked, it produced the correct output and was correctly pipelined. Filament supports many other features but at its heart, this is the guarantee it provides: if your program type checks, it is correctly pipelined. Fast and correct, you can have both!
Using Verilog Modules in Filament
Filament is designed to make it easy to use Verilog modules in a correct and efficient manner. In short, to use a Verilog module, a Filament program needs to:
- Provide the location of the Verilog module's source file
- Provide a Filament signature for the Verilog module
Using extern to Import Verilog Modules
Filament's extern keyword allows us to specify the signatures of all Verilog modules in a source file.
For example, if we have a file modules.sv that defines several modules:
module Add(input [31:0] a, input [31:0] b, output [31:0] c);
...
endmodule
module Mult(input [31:0] a, input [31:0] b, output [31:0] c);
...
endmodule
We can use the extern block to specify the location of the file and provide Filament signatures for each module:
extern "modules.sv" {
comp Add<G: 1>(
@[G, G+1] a: 32, @[G, G+1] b: 32
) -> (@[G, G+1] c: 32);
comp Mult<G: 1>(
@[G, G+1] a: 32, @[G, G+1] b: 32
) -> (@[G+2, G+3] c: 32);
}
Note that unlike a Filament component, the comp definitions do not have a body; they simply define the signature of the Verilog module.
Note: The location of the Verilog file is determined relative to the location of the Filament file containing the extern block.
Once the definitions are specified, the Filament compiler will automatically link the Verilog modules into the final design.
Defining the Right Interface
The trick to using external modules in Filament is defining the "right" interface. For example, one way of defining a combinational component is as something that produces its output in the same cycle as its inputs. The following Filament signature captures this property:
extern "comb.sv" {
comp Add<G: 1>(
@[G, G+1] left: 32, @[G, G+1] right: 32
) -> (@[G, G+1] sum: 32);
}
The signature states that the Add module accepts an input in the first cycle and immediately produces the output in the same cycle.
However, another way to define a combinational component is something that can produce an output as long as its input is available; this means that the adder can produce the output for five or ten cycles as long as the input is available for the same number of cycles.
To capture such a signature, which can hold a signal for a caller defined number of cycles, we can use multiple events:
comp Add<G: L-(G), L: 1>(
@[G, L] left: 32,
@[G, L] right: 32
) -> (
@[G, L] sum: 32
) where L > G;
The above signature states that the invocation of an Add instance gets to decide how long the output is available for.
We require that the event L occurs after G to ensure that the intervals are well-formed.
Finally, we also require that the delay of G is affected by the length of the signals; if the output is held for 10 cycles, then the adder cannot be reused for 10 cycles.
Using such a component is straightforward:
A := new Add;
a := A<G, G+10>(l, r);
Default Binding for Events
In the above example, the signature for Add is more flexible than the original one.
However, specifying the common case, where we use the adder for exactly one cycle, is a bit cumbersome:
A := new Add;
a := A<G, G+1>(l, r);
Instead, we can provide a default binding for the event L that is used when the caller does not specify a value for L:
comp Add<G: L-(G), ?L=G+1: 1>(
@[G, L] left: 32,
@[G, L] right: 32
) -> (
@[G, L] sum: 32
) where L > G;
The syntax ?L=G+1 tells the compiler to use the binding G+1 for L when there is no binding provided for it.
Note: Events with default bindings must occur after non-default events.
Optimizing Verilog Modules using Filament Signatures
Filament's signatures are a powerful tool–if we know that a Verilog module is only going to be used in a certain way, we can optimize the module to be used in that way. For example, if the module's interface requires that an input signal be available for multiple cycles, we don't have to save that signal in a register.
Metaprogramming Overview
When building hardware, it is often useful to design it parametrically so that we can use the same code to generate different hardware designs. For example, it is extremely common to design modules for numerical operations, such as adders, to be parametric over the bitwidth of the operands.
module Add #(
parameter WIDTH = 32
) (
input [WIDTH-1:0] a,
input [WIDTH-1:0] b,
output [WIDTH-1:0] c
);
assign c = a + b;
endmodule
The Verilog module above specifies a parameter WIDTH that can be used to specify the bitwidth of the operands.
User code can simply instantiate the module with the desired bitwidth:
Add #(.WIDTH(32)) adder_32(...);
Add #(.WIDTH(64)) adder_64(...);
Of course, this example hides the fact that metaprogramming ends up generating hardware.
This is because the + operation needs to be implemented differently for different bitwidths; a 32-bit adder needs a lot more circuitry than a 1-bit adder.
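To see why the amount of circuitry depends on the parameter, consider a ripple-carry adder, which uses one full adder per bit. The Python sketch below models that structure in software. It is only one possible implementation of addition, and synthesis tools are free to pick a different one; the name ripple_carry_add is hypothetical.

```python
def ripple_carry_add(a: int, b: int, width: int):
    """Add two `width`-bit numbers one bit at a time.

    Returns (sum modulo 2**width, number of full adders used).
    """
    carry, total, full_adders = 0, 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry                    # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))  # carry-out of a full adder
        total |= s << i
        full_adders += 1
    return total, full_adders


print(ripple_carry_add(1, 1, width=1))   # (0, 1): the carry out is dropped
print(ripple_carry_add(5, 7, width=32))  # (12, 32)
```

The loop plays the role of the generator: the same description produces 1 full adder for WIDTH = 1 and 32 for WIDTH = 32.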
HDLs like Verilog provide abstractions like generate blocks to allow users to generate hardware at compile time.
Languages like Chisel go one step further to generate hardware by writing Scala programs.
The challenge with generative programming is ensuring that the generated code is correct. This is harder than it seems because it is not enough to ensure that one piece of code is correct; we have to ensure that all code that can be generated from a module definition is correct. Scala's strong types are useful in ensuring that code generated by Chisel is free of some bugs, such as missing port connections, but they miss crucial properties like correct pipelining and timing.
Filament's promise
Our goal with Filament is to provide an expressive metaprogramming model in which parametric modules that typecheck are guaranteed to generate correctly pipelined hardware. Read on to see how we do this.
Loops and Bundles
Much like the rest of Filament, generative programs in Filament need to be safe, i.e., correctly pipelined.
We'll learn about two features in Filament that help us write safe and efficient hardware modules: for loops and bundles.
Our running example will be a shift register. A shift register is a linear chain of registers that forward their values every cycle:
graph LR; R0 --> R1 --> R2 --> R3;
Our goal is to build a parameterizable shift register which takes some value N and builds a chain of N registers.
The first order of business is to define the interface of the shift register.
comp Shift[#N]<G: 1>(
@[G, G+1] input: 32
) -> (@[G+#N, G+#N+1] out: 32);
Notice that we accept an input in the first cycle, [G, G+1), and produce the output N cycles later in [G+#N, G+#N+1).
For the implementation, we'd like to use a for loop to build up a chain of registers.
Here is some python pseudocode for how we might do this:
# Initial output is just the input
cur_out = input
for i in range(0, N):
    # Build a new register and connect
    # its input to the previous output
    new_reg = Register(32)
    new_reg.input = cur_out
    # Update the current register and output
    cur_out = new_reg.out
# Output the final value
out = cur_out
Bundles
While straightforward, this code is hard for Filament to check: it does not understand when each register is used relative to the module's start time.
We'll use a bundle to help Filament understand how the output signals from the register are used.
A bundle is a sized array with a type describing when the values in the bundle are available:
bundle f[#N]: for<#i> @[G+#i, G+#i+1] 32
This defines a bundle f with N elements.
The type for the bundle states that the value at index i in the bundle is available in the interval [G+i, G+i+1).
For example, the value at f{0} is available at [G, G+1), f{1} is available at [G+1, G+2), and so on.
Intuitively, each index in the bundle represents the input to a register in the shift register chain.
Loops
Filament loops are nothing special: they simply allow you to iterate over a numeric range:
for #i in s..e {
<body>
}
This defines a loop where the value of #i ranges from s to e (exclusive).
Implementation
Using these two operations, we can implement a shift register. The following implementation is parametric both over the width of the register and the number of registers in the chain:
// A component that delays `in` by D cycles.
// Uses the Delay component under the hood.
comp Shift[W, D, ?N=1]<'G: 1>(
in[N]: ['G, 'G+1] W
) -> (
out[N]: ['G+D, 'G+D+1] W
) where W > 0 {
in_concat := new ConcatBundle[W, N]<'G>(in{0..N});
bundle f[D+1]: for<k> ['G+k, 'G+k+1] W*N;
f{0} = in_concat.out;
for i in 0..D {
d := new Delay[W*N]<'G+i>(f{i});
f{i+1} = d.out;
}
out_split := new SplitWire[W, N]<'G+D>(f{D});
out{0..N} = out_split.out{0..N};
}
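The bundle's type is what lets every iteration of the loop be checked against the same rule. The Python sketch below enumerates the iterations for a concrete D and compares the intervals each Delay reads and writes with the bundle's type. It rests on two assumptions: that Delay reads its input in the cycle it is invoked and provides its output in the next one, and that enumerating iterations is an acceptable stand-in for a real checker, which has to reason about all values of i at once. The names check_shift and invoke_offset are hypothetical.

```python
def check_shift(depth, invoke_offset=0):
    """Check every loop iteration of Shift for a concrete D = depth.

    Intervals are half-open (start, end) pairs in cycles relative to G.
    `invoke_offset` shifts each Delay invocation to model a timing bug.
    """
    # bundle f[D+1]: for<k> ['G+k, 'G+k+1]
    f = [(k, k + 1) for k in range(depth + 1)]
    for i in range(depth):
        start = i + invoke_offset        # Delay is invoked at 'G+i (+ offset)
        reads = (start, start + 1)       # interval in which it needs f{i}
        writes = (start + 1, start + 2)  # interval in which it drives f{i+1}
        if f[i] != reads or f[i + 1] != writes:
            return False
    return True


print(check_shift(4))                   # True: the loop as written
print(check_shift(4, invoke_offset=1))  # False: invoking one cycle late
```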