Virtual Processor (VProc)
by
Simon Southwell
19th June 2010
(last updated 30th June 2021)
Abstract
A co-simulation element is presented which abstracts away the communication, over
the PLI, VPI or FLI of a simulator, between a user C interface and a verilog or
VHDL task based interface wrapped in a simple behavioural HDL module.
It acts as a building block for a processing element whose code is a normal C program
written and compiled for the host computer, controlling bus type operations in the
simulation environment. This 'virtual processor' (VProc) element has a basic
memory type interface in the HDL domain and a simple API in the C domain, allowing
read and write operations, with support for simple interrupts. The interfaces are
kept as simple as possible, but described is the means to build, on top of this
fundamental co-simulation element, a bus interface of arbitrary complexity
(e.g. AHB or PCI-Express) which can then be controlled by a user written program
compiled for the host computer. The source code is freely available on
github.
Introduction
The concept of a virtual processor is far from new, as is the concept of
hardware/software co-simulation. Tools such as V-CPU from Summit or
Seamless from Mentor Graphics give a professional way of running software
targeted for a processor or system under development early in the design cycle.
Although these tools are mainly aimed at running code for specific target
processors (ARM, MIPS, etc.), some have the ability to run in 'host' mode (such
as V-CPU), as a true 'virtual processor'.
In this context, a virtual processor is a means to run host compiled programs
as normal programs in the OS environment, with all its facilities available,
which are the control source for a bus of an associated instantiated HDL
module in a simulation environment running on the same machine. This has
the advantage of not relying on cross-compilers in order to introduce a
processing element in a simulation, either to act as a test harness stimulus or
to replace an as yet non-existent embedded processor. With the software on
the virtual processor written in C or C++ (or with suitable C style linkage)
and a suitable I/O access API layer,
early development of code in this environment feeds forward directly to the
actual target processor with minimal changes.
The 'VProc' package is a thin 'veneer' to enable the development of a 'host'
virtual processor system and provide a basic co-simulation environment.
It isn't necessarily meant to be an end in itself, but hides the
co-simulation communication complexities for easily and quickly constructing
a virtual processor environment, with the bus functional capabilities
desired. Enough has been done to set up a link between one or more virtual
processors and user written programs, compiled into the simulation
executable. A very basic API is provided to the user code to allow control of
the HDL environment. What's not provided in VProc is the actual bus functional
models (BFMs) for specific protocols and their corresponding API software
layers. This is left for the developer. But this also means that any arbitrary
protocols can be supported, including proprietary ones. Enough functionality is
provided at the C/HDL boundary to support (it is believed) any arbitrary
protocol that one may conceive.
Where the boundary is chosen between
functionality in C and BFM support in HDL is not restricted by
this model. At one extreme, the pins of an HDL component can be
mapped directly into the VProc address space and the user C code controls
the pins directly each cycle. At the other extreme, a complex HDL
model for a given bus has, say, control registers mapped into the VProc address space,
which the user code simply reads and writes to effect complex bus behaviour. The
choice is left to the implementer, and may be affected by what's already
available in terms of source code and HDL models, as much as by the
preference of the designer, or speed requirements of the model.
For example, a PCIe model might simply have its 10 bit wide, pre-serialised,
lanes directly addressable by the virtual processor.
Only serialisation is left to do in HDL (if required)—all the rest
of the PCIe protocol could be handled by the user program associated
with the VProc. On the other hand,
a Virtual AMBA Bus Processor used in a
test harness might
utilise an already existing HDL interface block to connect to a transactional bus model
which was then controlled by mapping much simpler input signals into VProc.
The above diagram shows what the set of layers of a typical virtual processor
system might be. The bottom layers are provided with this system. On top of
that a BFM in the HDL domain and an equivalent Bus API in the C domain are
constructed to initiate bus transactions for the given targetted protocol. This
system can then be used to run a user program which communicates with the
target simulation.
Currently with this system up to 64 virtual processors may be instantiated in a
single simulation. The link between the user programs and the simulator is
implemented as messages passed via semaphore protected shared memory.
The interface between the HDL domain and the C environment is via the
standard verilog PLI interface or the VHDL foreign language interface (FLI),
supporting it in both languages. It
could just as easily have been via VCS's DirectC, or some other method
provided by the simulator (e.g. direct instantiation in SystemVerilog).
Each verilog module has access to two main 'tasks' in C (still within the
simulation's process). $vinit is used to initialise a node specific message
area and initiate communication with user code for that node in a new thread.
Each VProc module calls $vinit once at time 0, passing in a node number.
Communication between the HDL and the user code is done using
calls to $vsched. The bus status is sent with the node number, and routed to
the relevant user thread's message area and a semaphore set. The
simulation code then waits for a return semaphore. The user code, waiting for
a simulation message, responds with a message in the other direction, with a
node number and a new bus state plus a tick count, clearing the incoming
semaphore and setting its outgoing semaphore. Verilog state is updated via
the PLI, and the module waits for the number of ticks to have passed, or for an
IO transaction to be acknowledged, before calling $vsched again.
Bus Structure
In the bundled VProc package (see download section) a simple
memory bus module is provided to illustrate the use of the VProc elements. This
can be used as the starting point for connection to a BFM, but the VProc verilog
tasks can be used separately in a user defined module wrapper if desired.
The verilog code is f_VProc.v in the top level VProc directory, with the software source
in code/, and the VHDL is f_vproc.vhd and f_vproc_pkg.vhd.
The bus interface of the provided VProc component is kept as simple as possible.
Any complexity required for a particular protocol must be added as a wrapper
around this simple interface, or a new wrapper created.
The IO consists of a data write and data read bus. The width of the data
busses is 32 bits. A 32 bit address is provided for the read and write
accesses, with associated write and read strobes (WE and RD).
Acknowledgement inputs for the accesses are WRAck and RDAck, which will
hold off returning to the user code until active (which could be immediately).
There is an interrupt input which causes an early call to $vsched when
active. Normally $vsched is only called when the tick count has expired,
(set when the previous $vsched call returned) or when an IO call is acknowledged. An interrupt
causes $vsched to be called, flagging the interrupt as the reason for the call, and
allowing an interrupt message to be sent to the relevant user thread. The return
message would not normally update the tick count (though it can), so that the original
scheduled call still gets invoked.
When a simulation 'finishes' (i.e. $finish is called, or the simulation is quit),
then $vinit is effectively called in the background. At this stage, the user
threads are shut down and the simulation cleanly exited.
The node number is also passed in to uniquely identify each VProc. If two
nodes have the same node number, then undefined behaviour will result. The
module definition is shown below:
module VProc (Clk, Addr, WE, RD, DataOut, DataIn,
WRAck, RDAck, Interrupt, Update, UpdateResponse, Node);
input Clk, RDAck, WRAck, UpdateResponse;
input [2:0] Node, Interrupt;
input [31:0] DataIn;
output [31:0] Addr;
output [31:0] DataOut;
output WE, RD, Update;
The Update output port is a special signal which allows multiple reads and
writes within a single delta cycle; useful when trying to construct or read
vectors wider than 32 bits. The port changes state (1 to 0 or 0 to 1) whenever
the outputs change. By waiting on this event, and updating local register
values each time it transitions, large vectors may be processed in a single delta cycle. For
example, suppose the VProc module is instantiated with a wire 'Update' on
the port, then an 'always @(Update)' can be used to instigate a decode of
the new address and strobes and update or return sub-words of a large
vector. Whether multiple Update events occur in a single delta
cycle is controlled by a parameter passed to the C access procedures (see
Writing User Code
below). The Update signal needs a response to flag that the updating is
complete, via the UpdateResponse input. This would normally be an external
register that is, like Update, toggled 1 to 0 or 0 to 1, at the end of the
update—say at the end of the Update always block.
It is not necessary to use the Update event signal if delta time
access of wide vectors is not required. A normal inspection of the strobes on
an 'always @(posedge Clk)' (say) will work just fine, but then VWrite()
and VRead() must never be called with Delta set to 1 (see below for details).
If delta updating is not required, then the UpdateResponse input should be directly
connected to the Update output as the VProc module will suspend until the response
comes back.
PLI syntax
The actual syntax of the PLI (or VPI) task calls is not important if using
the provided memory bus BFM (see above), as this is taken care of by the verilog in the VProc module. So
this section can be skipped if the memory bus BFM is to be used unmodified,
but if wishing to create one's own wrapper module, then the PLI tasks of VProc are as follows:
$vinit (NodeIn)
$vsched (NodeIn, InterruptIn, DataIn, DataOut, AddrOut, RWOut, TicksOut)
$vprocuser (NodeIn, ValueIn)
The VHDL has FLI equivalents of VInit, VSched and VProcUser,
defined in the f_vproc_pkg.vhd file.
All the arguments to the PLI tasks are of verilog integer type, for simplicity
of interfacing to C. Vectors of verilog signals can still be passed in (including
padding with 0's), but care must be taken as an x or z on even a single bit causes
the whole returned value to be 0. This can be hard to debug, and so checks should be
made in the verilog code. For VHDL, the arguments must be converted to integers first,
which is done in f_vproc.vhd.
The $vinit task is usually called within an initial block. The
Node input needs to be a unique number for the VProc instantiated. If called
with a constant or a module parameter, then this can be at time 0 within the initial block.
If it is connected to a port or wire, even if ultimately connected to a constant, then
a small delay must be introduced before calling $vinit, as the ordering of the call to
the PLI function and the wire assignment is indeterminate.
The main active task call is to $vsched. It is this call that delivers
commands and data, and has new input returned. It too has a node input and, in addition,
an interrupt and data input. The interrupt is an integer value, but the valid range is
from 0 (no interrupts) to 7. The data input is a single 32 bit value. $vsched
is called for every input update required, or expired tick. On return, updated DataOut
and AddrOut values are returned, along with a directional RWOut. These map easily to the memory
style interface of the VProc module, but can be interpreted in any way. E.g. AddrOut is
a pin number, the DataOut value is the pin value (including x or z, say), and the RWOut
determines whether the pin value is just read, or both read and updated.
The TicksOut output is crucial to the timing of the $vsched calls.
In normal usage this should update a timer counter, which then ticks down until 0,
when $vsched is called once more. So usually a value equal or greater than zero is
expected. For a usage where communication is required every cycle, then this would
be zero (i.e. no additional cycles), but it can be an enormous value
(up to 2^31 - 1) if the software wishes to go to sleep.
If $vsched was called on an interrupt, then a new value of TicksOut is
returned, where a value greater than zero can be used to override the current tick count,
or leave alone if 0. This allows a sleeping VProc to wake up on interrupt. A value
of less than 0 can be returned (for non-interrupt calls), indicating that the call is for a "delta cycle"
update, and another call is expected before advancing time in the simulation. This
allows multiple commands/updates to affect state before advancing the clock. Only
when a TicksOut value of 0 or more is returned should the code cease calling
$vsched and processing the commands.
The $vprocuser syntax is a straightforward set of a node number
and single integer value. The value is not interpreted and is passed straight
on to the registered user function (if one exists).
For those interested in following this up further with an example usage, then look at
the f_VProc.v verilog and Pli.tab (or VSched_pli.c) code provided
for the memory interface BFM, or the f_vproc.vhd and f_vproc_pkg.vhd VHDL equivalents.
Only the PLI 1.0 task/function (TF) routines are used in the verilog virtual processor to
simplify and speed up the interface. I.e. all communications between C and verilog are
only via the arguments to the PLI tasks. The same is true for the VHDL FLI calls.
Of course, the PLI tasks are only one side of the interface, and they are connected
indirectly to C API functions. These are described in the next section.
Writing User Code
The user code that's 'run' on a virtual processor has an entry point to a
function whose prototype is as follows:
void VUserMainN(void);
The N indicates the node number of the virtual processor the
user code is run on. At start up, the initialisation code in the API
will attempt to call a user function of this form. So, for instance,
if a VProc is instantiated with node number 8, then VUserMain8()
is called. There must be one such function for each instantiated
VProc, otherwise a runtime error is produced, and, of course, each
instantiated node must have a different node number from all the rest.
When in a VUserMainN() function, the user code thus has access to some
functions for communication to and from the VProc bus in the simulation,
which appear as if defined as below:
int VWrite (unsigned int addr, unsigned int data, int delta, int node);
int VRead (unsigned int addr, unsigned int *data, int delta, int node);
int VTick (unsigned int cycles, int node);
void VRegInterrupt (int level, pVUserInt_t func, int node);
void VRegUser (pVUserCB_t func, int node);
void VPrint (char *format, ...);
In order to access these functions, VUser.h must be included at the head of
the user code file.
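For example, a minimal user program for a VProc instantiated as node 0 might look
like the following sketch, where the register address used is purely illustrative
and depends entirely on the address decoding outside of VProc:

#include "VUser.h"

#define CTRL_REG_ADDR 0x10000000   /* illustrative address only */

void VUserMain0 (void)
{
    unsigned int data;

    /* Let a few simulation cycles elapse before starting */
    VTick(10, 0);

    /* Write to the (hypothetical) control register */
    VWrite(CTRL_REG_ADDR, 0x00000001, 0, 0);

    /* Read the register back and send the result to the simulation log */
    VRead(CTRL_REG_ADDR, &data, 0, 0);
    VPrint("VUserMain0: read back 0x%08x\n", data);

    /* Sleep for a long time, rather than returning */
    while (1)
        VTick(0x7fffffff, 0);
}

The details of these calls are described in the sections that follow.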
Input and Output
If the user code needs to write to the VProc's bus, then
VWrite() is called with an address and data. The function returns after the
simulation has advanced some number of cycles, depending on when the write acknowledge
is returned on the VProc module input, and a status is returned. The status is
actually the DataIn value on the $vsched parameters, and so is
implementation dependent, depending on the address decoding
arrangements outside of VProc. The Delta flag can override the advance of
simulation time or the waiting on an acknowledgement and performs the write
in the current simulation delta time. This can be employed when it is required
to write to a number of separate registers to form, say, a wide, bus specific,
transaction which may need to be issued once per clock cycle. Setting Delta
to 1 for all the writes except the last one allows words greater than 32 bits to
be constructed in a single cycle. Care must be taken that VWrite() (or
VRead() for that matter) is not always called with Delta set to 1, as this may
result in registers being overwritten without simulation time being advanced. It
is also necessary to have support for this feature in the verilog wrapper
around VProc, using the 'delta' event to instigate external register accesses
instead of the more normal clock edge
(see Verilog Bus Structure above).
Similarly a read is invoked with VRead(), which returns an arbitrary number
of cycles later since the simulation is waiting on a read acknowledge. The 32
bit read value is returned into the variable pointed to by the second argument.
A Delta input acts in the same way as writes, allowing wide data reads
without advancing simulation time, or waiting on an acknowledge.
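As an illustrative sketch, a 64 bit value might be written in a single cycle with two
accesses, assuming wrapper HDL that decodes two sub-word addresses (purely hypothetical
here) onto the low and high halves of a wide register:

/* Hypothetical sub-word addresses decoded by the wrapper HDL */
#define WIDE_REG_LO_ADDR 0x20000000
#define WIDE_REG_HI_ADDR 0x20000004

static void write64 (unsigned int hi, unsigned int lo, int node)
{
    /* First access with Delta set to 1: no advance of simulation time */
    VWrite(WIDE_REG_LO_ADDR, lo, 1, node);

    /* Final access with Delta set to 0 lets the simulation advance */
    VWrite(WIDE_REG_HI_ADDR, hi, 0, node);
}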
Advancing Time
If the user code isn't expecting to need to process any IO data for some time,
it can effectively go to sleep for a number of clock ticks by calling VTick(),
passing in a cycle count. This should be used liberally to allow the simulator to
advance time without the need for heavy message traffic polling on a register
status (say). Interrupts can be used to flag events and wake up the main
process if the wait time is arbitrary or indeterminate (see below).
Interrupts
There are (currently) seven levels of interrupt. For every cycle that a VProc's
Interrupt input is non-zero, a call is made to a previously registered function.
To register a function VRegInterrupt() is call with the interrupt level (1 to
7) and a pointer to a function. The function pointer must be of type
pVUserInt_t (i.e. a pointer to a function returning an int, with void argument
list). The user interrupt function must return an integer, which is normally 0,
but may be > 0 if the interrupt function decides it wants to override the
schedule count of the outstanding IO or Tick call. A runtime error is generated
if the interrupt level is one which has no registered interrupt function.
In this release, the Interrupt routines cannot make calls to the IO routines (as
this will break the scheduling). They are intended only to flag events, which, if
IO status is required, the main routine can service. It should still be possible to
emulate full interrupt service routines, even with this limitation. For example, if
the VRead() or VWrite() calls are wrapped inside another procedure,
interrupt event status can be inspected each time they return, and a call to a
handler made if required, based on status updated by a call to the interrupt
callback. This handler would now be part of the main thread, and can safely
make IO accesses. This implies that simulation IO calls are atomic, and won't
be interrupted (unless the interrupt function changes the pending tick count,
which effectively cancels the IO, which would then need re-issuing). By adding this
slight complication in the user code, the VProc implementation becomes
much simpler.
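As a sketch of the registration mechanism, the fragment below installs a handler for
interrupt level 1 on node 0, which simply flags the event for the main user code to
service later:

static volatile int irq_event = 0;

/* Level 1 interrupt callback: flag the event and leave the
   outstanding tick count alone by returning 0 */
static int irq1_handler (void)
{
    irq_event = 1;
    return 0;
}

    /* Somewhere early in VUserMain0() */
    VRegInterrupt(1, irq1_handler, 0);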
User Callback
As well as registering interrupt callback functions, a general purpose callback
function may be registered using VRegUser(). It is similar to VRegInterrupt(),
but requires no 'level' argument. Registering a function in this manner does
not automatically attach it to an event, but instead attaches it to a
defined verilog task '$vprocuser(node, val)'. To invoke the registered function
$vprocuser is called with the node number corresponding to the VProc
instantiation running the user code. A value is also specified and is
passed in as an integer argument to the user function. This could be used
to select between different behaviours, depending on where the
task is called from. Example uses might be to call the function at regular intervals
to dump debug state, or to call it when about to finish, to do some tidying up in the
user code etc. It should be noted that the registered function is synchronous to
the simulator thread, and not the user thread. Therefore, if the function
is to communicate with the main user thread it must do so in a thread safe manner.
The registered user function must be of type pVUserCB_t, which is to say
a function returning void, with a single integer argument. E.g.:
void VUserCB (int value);
If $vprocuser is invoked for a given node before a callback function has been
registered, then no effect is seen and the task exits. It is therefore safe
to invoke the task even if the user code never registers a function. However,
it is not safe to invoke the task for a non-existent node, and undefined
behaviour will result.
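For example, a sketch of registering a general purpose callback on node 0 might be:

/* Callback invoked whenever $vprocuser(0, val) is called in the HDL */
static void user_cb (int value)
{
    VPrint("user_cb called with value %d\n", value);
}

    /* In VUserMain0() */
    VRegUser(user_cb, 0);

As noted above, this function runs synchronously to the simulator thread, so anything
more than simple logging should communicate with the main user thread in a thread
safe manner.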
Log File Messages
The VPrint() function (actually a macro) allows normal printf type formatted
output, but sends to the simulation log output (and thus to any log file
being generated) instead of stdout. This makes correlation between verilog
log data and user code messages much easier, as they are sent to the same
output stream, and appear in the correct relative order.
Adding C functions to Simulators
Each of the simulators treats PLI code slightly differently, in the way it is linked as a
shared object/DLL. Three simulator examples are documented here.
VCS
The internal C functions called by the VProc modules' invocation of $vinit
and $vsched (VInit(), VSched() and VHalt()) are compiled into the
verilog during normal simulation compilation, along with all the user code and
any BFM support code.
Simply add references to the .c files somewhere in the
vcs compile command. When compiling VSched.c for VCS, 'VCS' must be defined,
which will usually be the case if compiled directly from the VCS command line.
If compiling to an object first, then use -DVCS. VSched.c also contains the PLI code,
about which the simulator must be informed with a PLI table file (Pli.tab is
provided), indicated on the command line with the -P option. E.g.:
vcs *.c -P Pli.tab -Xstrict=0x01
Needless to say, the user code, VSched.c and Pli.tab files need to be in
the compilation directory, or the above modified to reference the files
remotely. The -Xstrict=0x01 is a VCS requirement for using threads in PLI code,
and the pthread and rt libraries must be compiled in (e.g. use the -syslib option).
If VProc is being used as a library component in a system which has additional PLI
functionality, then the system's Pli.tab must be extended with the entries in that
supplied by VProc.
NC-Verilog
Adding C functions to NC-Verilog is slightly different to VCS. Firstly the PLI
code must be compiled as a shared object. In our case, this includes all the
user code as well, compiled into a file VSched.so (e.g. use -fpic -shared
options with gcc). The PLI table, unlike for VCS, is compiled in as an
array, so no extra table file is required. To access this code the
+ncloadpli1 command line option is used at compile time. E.g.
ncverilog <normal compile options> +ncloadpli1=VSched:bootstrap
The pthread and rt libraries must also be linked with the shared object.
If VProc is being used as a library component in a system which has additional PLI
functionality, there can only be one set of 'veriuser' tables and boot functions.
The system using VProc must extend its own veriusertfs table to include
the VProc entries. The VSched_pli.h header includes definitions for these
entries as VPROC_TF_TBL, along with the number of entries (VPROC_TF_TBL_SIZE).
The new system's table can then look something like the following:
s_tfcell veriusertfs[VPROC_TF_TBL_SIZE + MY_TBL_SIZE] =
{
VPROC_TF_TBL,
<my table entries>,
{0}
};
When building VProc, a static library of the functions is generated as libvproc.a,
for linking with the new system. This does not contain a compiled object for
veriuser.c. The new system must provide this functionality if adding
additional PLI functionality, extended as just mentioned. If no new functionality
is to be added, the VProc code can be used, and veriuser.c from
VProc compiled into a shared object, along with the code from libvproc.a.
Note: when linking library functions into a shared object (with ld)
the -whole-archive/-no-whole-archive pair must bracket the library
references in order to have the shared object contain the library functions.
When using gcc to compile and link in a single step, the options become
-Wl,-whole-archive/-Wl,-no-whole-archive.
ModelSim
ModelSim compiles the verilog or VHDL and runs the simulation separately, with the commands
vlog (or vcom) and vsim respectively. The PLI/FLI code is referenced when running
the simulation. So, assuming the PLI/FLI C code was compiled as VProc.so, the
vsim command is structured as shown below:
vsim <normal vsim options> -pli VProc.so
As with NC-Verilog, the pthread and rt libraries must be linked with the shared object.
If the verilog VProc is being used as a library component in a system which has additional PLI
functionality, there can only be one set of 'veriuser' tables and boot functions.
The system using VProc must extend its own veriusertfs table to include
the VProc entries. The VSched_pli.h header includes definitions for these
entries as VPROC_TF_TBL, along with the number of entries (VPROC_TF_TBL_SIZE).
The new system's table can then look something like that shown for NC-Verilog,
above. For VHDL, the connections to the functions are defined in the f_vproc_pkg.vhd file,
and a separate table is not necessary.
When building VProc, a static library of the functions is generated as libvproc.a,
for linking with the new system. This does not contain a compiled object for
veriuser.c. The new system must provide this functionality if adding
additional PLI functionality, extended as just mentioned. If no new functionality
is to be added, the VProc code can be used, and veriuser.c from
VProc compiled into a shared object, along with the code from libvproc.a.
Note: when linking library functions into a shared object (with ld)
the -whole-archive/-no-whole-archive pair must bracket the library
references in order to have the shared object contain the library functions.
When using gcc to compile and link in a single step, the options become
-Wl,-whole-archive/-Wl,-no-whole-archive.
Compilation Options
There are various make files for VProc to support different simulators. The default file (makefile) is for ModelSim,
but other simulators are supported. The list below shows the supported simulators for the Verilog version of VProc,
and their respective make files.
| makefile | : | ModelSim |
| makefile.nc | : | NC-Sim |
| makefile.vcs | : | VCS |
| makefile.ica | : | Icarus |
| makefile.cver | : | GPL CVer |
When using the VHDL version of VProc then makefile.vhd is used. This only supports ModelSim currently
(the other simulators being Verilog only).
By default, the verilog compilations use the PLI task/function API (PLI 1.0). VProc supports use of the VPI (PLI 2.0)
and can be compiled to use this if VPROC_PLI_VPI is defined when compiling the code. The make files can be updated
to define this internally, but a variable (USRFLAGS) may be set when calling make to set the VPROC_PLI_VPI
definition. E.g. make USRFLAGS=-DVPROC_PLI_VPI -f makefile.ica.
Delivered Files
In order to use the virtual processor, the following files are provided.
| f_VProc.v | : | Virtual processor verilog |
| f_vproc.vhd | : | Virtual processor VHDL |
| f_vproc_pkg.vhd | : | Virtual processor VHDL package defining FLI tasks |
| VSched.c | : | Simulation (server) side C code |
| VSched_pli.h | : | Common PLI definitions and prototypes |
| veriuser.c | : | PLI table (NC-Verilog/ModelSim) |
| Pli.tab | : | PLI table (VCS only) |
| VProc.h | : | VProc layer definitions |
| VUser.c | : | User (client) side C code |
| VUser.h | : | User (client) side header file |
| VUserMainT.c | : | Template for user code |
Note: If using NC-Verilog, the compiled shared object, VSched.so, must be available in the invocation
directory.
VProc is released under the GNU General Public License (version 3). See LICENSE in the
downloadable package for details (see VProc Download section to access
package).
Along with the above files are delivered example makefiles for compiling the C and verilog.
These have been tested in a specific environment and are for reference only. Adaptations will
need to be made for the local host environment, and no guarantees are given on their
validity. A simple test example is also bundled, with a basic random memory access from
one VProc, and an interrupt generation from another. Again, as for the makefiles, this
is for reference only.
VProc Download
The VProc package can be downloaded from github. This contains
all the files needed to use VProc, along with example makefiles, scripts
and test bench. Note that the only recent testing has been done with ModelSim on Linux.
Appendices
Message Passing
The figure below shows a typical exchange of messages between the HDL
simulation and one of the user threads. There can, of course, be multiple user
threads (one for each VProc) with similar interactions with the simulator.
As can be seen, there are normally two types of messages exchanged
between user thread and simulation. The simulation, at time 0, always sends
the first message. The simulation message to the user thread (a 'receive'
message) includes the value of the DataIn port value, as well as an interrupt
status flag. Once $vsched is called and a receive message sent, the
simulation is effectively paused. While the user code is running in VUserMainN(), the simulation will
not advance until VWrite(), VRead() or VTick() is invoked. Any amount
of processing can occur in the user process before this happens, but it
effectively happens in zero (delta) time as far as the simulation is concerned.
When, say, a VWrite() is eventually called a 'send' message is sent back
to the simulation with update data. In addition a Tick value is returned which is
usually 0, meaning that the simulation will not call $vsched again until the
write (or read) has been acknowledged externally. However it can be -1, in
which case $vsched is called again after the update without waiting for an
acknowledge, or waiting for the next clock edge. This allows a delta time
update, enabling vectors wider than 32 bits to be written (or read) before
allowing the simulation to act upon the updated data. This is shown in the
second exchange in the above figure. Simulation time can also be advanced
without the need to perform a read or a write. Calling VTick() from the user
process sends a message with a positive value for the Ticks. This simply
delays the recalling of $vsched by the number of cycles specified. Effectively
the user process can go to sleep for a set number of cycles; useful if waiting
for an event that is known to take a minimum, but significant, time.
The send and receive messages are always paired like the exchanges in the
above figure—one never sees two or more consecutive messages in the
same direction. This gives full synchronisation between the simulation
process and the user thread, and is controlled with send and receive
semaphores—a pair for each VProc instantiated. If an interrupt occurs, a
receive message is still delivered (see figure), but potentially earlier than
expected. The interrupt level is sent in the message, and the
appropriate registered function is called. When the function returns, a new
send message is sent back, but the update values are ignored. Only the tick
value is of significance. Normally a tick value of 0 is sent back. In this case the
original state for the outstanding access is retained, and so the expected
receive message for the original VRead/VWrite is still generated, when it
would have been, if there had been no interrupt. A tick value greater than zero can be
returned by the interrupt function, in which case the current outstanding tick
count can be overridden, and the next receive message invoked earlier or
later than originally set. This is useful for cancelling or extending a VTick()
call. Say one has set off some action in the simulation which must complete
before the user code can continue, but it is of arbitrary length in time. By
calling VTick() with a very large number, and by arranging the external verilog
to invoke an interrupt when the action is completed, the interrupt routine
called can clear the tick count to zero, and the user code will fall through the
VTick() at just the right time, without, for example, the need to continually poll
a register status, generating large amounts of message traffic and slowing the
simulation down.
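A sketch of this interrupt based wake up, using the API described earlier, is shown
below. The start register address is purely illustrative, and the interrupt level and
node number are assumed to be 1 and 0 respectively:

#define START_REG_ADDR 0x30000000      /* illustrative address only */

/* Interrupt handler: return a small positive value to override the
   large outstanding tick count, so the sleeping VTick() completes */
static int done_handler (void)
{
    return 1;
}

    /* In the main user code for node 0 */
    VRegInterrupt(1, done_handler, 0);

    /* Start the (hypothetical) long running operation in the HDL */
    VWrite(START_REG_ADDR, 1, 0, 0);

    /* Sleep for a very long time; the interrupt cuts this short */
    VTick(0x7fffffff, 0);

    /* Execution continues here once the interrupt has fired */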
Building more complex Virtual Processors
VProc, as presented above, provides a simple environment, which has
a memory style interface in the verilog domain, and a very simple read/write
style API for the user C program to use, with simple support for interrupt
handling. This would have limited scope as a useful verification tool
if that was the limit of its capabilities. The main purpose of VProc is
to hide away the verilog/C interface complexities, and allow a user
to have an environment where a more useful and realistic virtual processor
may be built. As such, two scenarios are described below, where VProc could be
used to create more practical test elements.
An AHB processing element (ARM substitute).
Suppose a test environment for an ARM based chip is created which is using
an ARM model or netlist in the simulation. The test code for the processor
needs to be cross-compiled to target the ARM, with the limitations on
embedded ROM and RAM, and the compilation setup for the different areas
of memory etc. relevant to the processor in the simulation. It may be that
for greater than 80% of the tests it is not important that the code is
running exactly as it would on the silicon implementation of the processor,
but only that valid bus transactions are generated on the AHB bus from the processor
to configure the chip, instigate operations, monitor status, and log information
to the simulation log. VProc can be the base to replace the ARM model in
these situations, giving additional facilities of computation (all host
libraries are available), checking, logging etc., that aren't possible
in the actual processor model.
In order to do this VProc needs to be wrapped in some code to turn its
memory based interface into an AHB interface, and the C API extended
to wrap up the basic VProc API. In this example, let's assume that there
is available a bus functional model (BFM) in behavioural verilog for
AHB. This BFM, say, is controlled by reading and writing internal
registers via a memory mapped interface. In that case, it is a simple
matter to connect VProc's memory style interface to the register interface
of the BFM, and wrap the whole in another module to hide away the details.
Controlling the BFM from C code is now simply a matter of sequences of
read and write calls over the VProc API to configure the registers and
instigate bus traffic. For any given bus transaction,
multiple register accesses are likely, so the normal thing to do would be
to extend the VProc API with fundamental AHB bus API operations, such that
all supported BFM transactions are a single C function call. This new AHB
API is then the interface for verification tests to be written.
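As a sketch, assuming a hypothetical BFM with address, data and control registers
memory mapped at the (illustrative) addresses below, a single AHB write function in
the extended API might look like:

/* Hypothetical register map of the AHB BFM (illustrative only) */
#define AHB_BFM_ADDR_REG  0x00000000
#define AHB_BFM_DATA_REG  0x00000004
#define AHB_BFM_CTRL_REG  0x00000008
#define AHB_BFM_GO_WRITE  0x00000001

/* One AHB write transaction, built from basic VProc accesses */
void AhbWrite (unsigned int addr, unsigned int data, int node)
{
    VWrite(AHB_BFM_ADDR_REG, addr, 0, node);
    VWrite(AHB_BFM_DATA_REG, data, 0, node);

    /* Writing the control register instigates the bus transaction */
    VWrite(AHB_BFM_CTRL_REG, AHB_BFM_GO_WRITE, 0, node);
}

A verification test then simply calls AhbWrite() (and similar read and burst
functions) without needing to know the underlying register accesses.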
As an aside, one could imagine that if an instruction set simulator (ISS)
existed for the processor being modelled (in this case an ARM processor),
then this could be interfaced to the API described above to provide
a full instruction capable model for verilog simulation. In this ARM scenario,
this defeats the object of replacing the original ARM model, but suppose
a new processing element or microcode engine is being designed, and
an ISS is available long before the RTL design is ready and verified,
then this could allow early testing of the rest of the RTL, and even
as a platform for embedded firmware test and development.
PCI Express Host Model
This scenario is for a model to drive a PCI Express (PCIe) interface. The
PCIe interface on the chip under test is the element that is under verification,
and needs a transactor to drive it. In this scenario, let's assume, unlike for
the AHB model, that there is no existing BFM model in verilog. Whatever code drives
the bus (actually link in the case of PCIe) must be written from scratch. We
want to use VProc to allow a C program to deliver transactions over the bus,
and receive and process returned data etc.
Because there is no BFM, and we are going to have to write a C API for the
PCIe transactions anyway, let's decide that an absolute minimum of verilog
is going to be written, and that we will model almost everything in C. Thus
the verilog will consist only of mapping each of the PCIe lanes (the serial
input and output ports) in to locations in the VProc memory map. In this case
each lane will be a 10 bit value, with a behavioural serialiser/deserialiser
in verilog. Thus lane 0 is mapped at address 0, lane 1 at address 1, etc. To
update all lanes (anything from 1 to 32 may be in use at once), delta writes
are done to all the lanes bar the last, which advances the clock, with
the read value for each lane returned by the access. In this scenario,
a single cycle always elapses for each update of the set of lanes, and the lanes
are updated every cycle.
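A sketch of how such a per-cycle lane update might look in the user code is shown
below, assuming 16 lanes mapped at addresses 0 to 15, with the returned status being
the received symbol for the lane just written:

#define NUM_LANES 16

/* Drive all lanes for one cycle, sampling the received symbols.
   Lane n is assumed mapped at address n, with the 10 bit symbol
   in the bottom bits of the 32 bit data word. */
static void UpdateLanes (const unsigned int tx[NUM_LANES],
                         unsigned int       rx[NUM_LANES], int node)
{
    int lane;

    /* All but the last lane are written in delta time (Delta = 1) */
    for (lane = 0; lane < NUM_LANES-1; lane++)
        rx[lane] = (unsigned int) VWrite(lane, tx[lane], 1, node);

    /* The final write, with Delta 0, advances the clock one cycle */
    rx[NUM_LANES-1] = (unsigned int) VWrite(NUM_LANES-1, tx[NUM_LANES-1], 0, node);
}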
This is about as fundamental a C to verilog mapping as is possible with
VProc. Each IO pin is simply mapped to memory and updated or sampled
every clock. The C API then has to extend the basic VProc API into
all the PCIe transaction types. This is a much more complex task than
for the AHB scenario above, but is made here simply because this
functionality must be coded somewhere (there is no BFM, remember), and
so C was the choice made. The C code would need to provide some basic conversions
for the PCIe standard, such as 8b/10b encoding/decoding, inline data pattern
generation such as ordered sets, data link transactions like flow control,
and the data transactions such as memory, IO and configuration reads and writes.
A transmit queueing system is required, and the ability to handle split completions
etc. This code should be written to hide these details and present an extended
API to a user which allows single call access to the bus for every type
of transaction. As you may appreciate by now, this is a non-trivial task,
and VProc does not, of itself, solve these problems—it just enables
the ability to do this. Under different circumstances it might be better to place more
functionality in verilog behavioural code, and simplify the C code. It will depend
on local circumstances.
Just such a model, for a 16 lane PCIe (v1.1 and 2.0), has been constructed,
and is documented separately.
It includes an example test bench and compilation for ModelSim, with two
models linked back to back, and also has bundled
some link display modules. This should serve as a sufficiently complex case study
of the power of the VProc component, and of what can be easily achieved with it.
Limitations of VProc
There are few limitations to the model, which compiles and runs on a variety
of simulators (4 have been tested), and platforms (Linux, Solaris). There is
one flaw however, regarding the use of save and restore (or checkpointing, as it
is sometimes known). Although support for checkpointing has been implemented
in the model, inconsistent results are achieved—some simulation runs have
worked, others behave differently, even by simply saving a checkpoint, and
not just after a restart. Currently there is no fix for this in the published
version of VProc, and the use of checkpointing is not yet supported.
Pending Enhancements
- Transfer of data blocks over API
- Common entry point for all nodes (VUserMain()), to allow common code on all nodes.
- Fix checkpointing
- Pointer memory accesses (i.e. not read/write function calls, but pointer references)
Conclusions
A fundamental co-simulation element, VProc, has been described which virtualises away
the C and simulation interface to give a basic processing element, controllable
by host compiled code. This basic element provides enough functionality such that
any arbitrary processing element with a given bus can be constructed, and two
scenarios given (AHB and PCIe) with two very different approaches, showing
the ultimate flexibility of the VProc element. The VProc code is available for
download and is released under the GNU GPL version 3.
This code, it is hoped, will allow engineers to construct highly flexible test
elements, where none already exist, with the bus functionality they require
combined with the power and flexibility of a full host programming environment.
The two bus scenarios described were based on real examples using VProc, but it is
hoped that VProc will be used in even more ways than these, or than originally envisaged.
Copyright © 2002-2021 Simon Southwell
simon@anita-simulators.org.uk