Reference Manual
for the LatticeMico32 soft CPU
Instruction Set Simulator
Simon Southwell
August 2016
(last updated: 19 April 2019)
abstract
A C/C++ instruction set simulator model of the LatticeMico32 soft CPU
core is presented. In addition to the core model, with both a
C++ and C
application programming interface, described are a
test platform excutable with a suite of tests,
a GDB remote debug interface, and a
case study for an embedded
Linux system based around the model, to demonstrate
its capabilities. The code for both are included, with the ISS source, in a
downloadable package on GITHUB,
and released under GPL v3.0. The
internal design is also described to allow ease of extensibility by
third parties, and for informative purposes. Finally, details are given
on how to use the ISS in a
multi-processor system model.
Contents
Introduction
The intent of presenting this model is mostly informative and (hopefully)
it tries to strip down fundamental processor
system concepts into simple, easily digestable pieces, whilst still ending up with
a sufficiently complex system that has all the main concepts contained within it, such that
an OS can be booted (µClinux), and the model can interface to third party IDEs and
debugging systems (such as Eclipse, via GDB). More complex concepts, such as caching,
pipelining and memory protection and management (for multi-threading etc.), are skipped for
clarity, as these might be considered as optimisations that have been added to processor
systems for reasons of speed and ease of use since microprocessors first appeared (which did
not have these), and are not the foundational constructs of a processor based system.
The more advanced topics will have to wait for another article.
The LatticeMico32 processor was chosen for this project as having a straightforward architecture that is more easily
understood compared to, say, a 32 bit ARM processor, but is still a modern 32 bit embedded processor
currently used and supported, with a toolchain that is relevant to other current systems developments,
and is thus transferable knowledge. The LatticeMico32 already has an ISS built-in to the toolset supplied
by Lattice Semiconductors, and this model is not meant to replace it. However, by having source code
freely available, with clarity of understanding at its heart, the model is readily modified for
non-standard systems, or extended for additional peripherals, or even simply experimented with to aid
understanding of the system as it stands.
The package includes
the source code for the instruction set simulator,
modelling the Lattice Mico32 soft-CPU. The core model is coded as a single C++ class
(lm32_cpu),
which can be, and is meant to be, integrated into other system models. It implements all the non-optional
features, and most of the optional features of that core. It has both C++ and C
compatible APIs and is extensible to include additional modelled functionality.
A test platform (cpumico32)
is bundled in the package, as is a case study of an embedded Linux system (lnxmico32), based around
the model. An interface for remote debugging via GDB (and, by extension, third party IDEs, such as Eclipse) is
also supplied. The source is free-software, released under the terms of the GNU licence (see LICENCE included in the
package, available on github).
Features
Included Features:
•
All supported core instructions
•
All h/w modelled for configurable instructions
- Multiplier
- Divider
- Sign
extender
- Barrel
shifter
•
Configurable internal memory
•
All h/w debug break- and watchpoints modelled
•
Cycle count functionality
•
Configurable 'hardware', as per the Mico32
•
Run-time and static disassembly
•
Data and Instruction caches for timing model
•
Optional Data and Instruction TLBs
•
Extensibility via callbacks
- Intercept
memory accesses
- Regular
callback with ability for external interrupt generation
- JTAG
register access callback
•
Configurable execution break points
- On
a given address
- After
a single step, or clock tick
- After
a fixed number of cycles
- On
'hardware' debug break point
•
Access to internal Memory
•
Access to internal state
•
Compatible with GNU tool chain (lm32-elf-xx)
•
Both C++ and C linkage interfaces available
•
Separate GDB remote debug interface code is included
-
Supports both virtual serial and TCP socket connections
Features Not Included:
•
JTAG interface model (but callback interface included)
•
User definable instructions
The code is a simple exercise in modelling a RISC based embedded
CPU. It comes with absolutely no warranties for accuracy, or fitness for any
given purpose, and is provided 'as-is'. Hopefully it is useful for someone, and
feel free to extend and enhance the model, and maybe let me know how it's
going.
Simon Southwell (simon@anita-simulator.org.uk)
Cambridge, August 2016
Source files
Listed and described here are those source files that make
up the Mico32 ISS library (e.g. libmico32.so).
These are the source files needed for integration into other C or C++ environments.
The source files for the example executable and test bench program, cpumico32, are not described
in this document (i.e. cpumico32.cpp,
cpumico32_c.c and lm32_get_config.cpp). These
are still freely available for use, under the terms of the GNU public license,
but do not form part of the core functionality of the simulator, and are not
documented.
The main header files comprise those listed below:
•
src/lm32_cpu.h
•
src/lm32_cpu_hdr.h
•
src/lm32_cpu_mico32.h
For integrating the model with external programs only lm32_cpu.h needs be included
in source code that references the API, but this header makes reference to lm32_cpu_hdr.h, which will
need to be in the include path when compiling. The lm32_cpu_mico32.h header is only used by the
internal source files, and includes all the definitions and types needed by
this code. The lm32_cpu.h
header has the major class definition for the model (lm32_cpu), and the other header, lm32_cpu_hdr.h, has all the
definitions for need by external programs using the API.
The following listed files define the methods that belong to
the class lm32_cpu, and
headers specific to those methods. The class methods are split over several
files, but all belong to the single lm32_cpu class.
•
src/lm32_cpu.cpp
•
src/lm32_cpu_inst.cpp
•
src/lm32_cpu_elf.h
•
src/lm32_cpu_elf.cpp
•
src/lm32_cpu_disassembler.cpp
The entry point methods and program flow methods are all
defined in lm32_cpu.cpp,
whereas the instructions themselves have methods defined in lm32_cpu_inst.cpp. There is
almost a one-to-one mapping of LM32 instructions and the instruction methods,
but a couple of methods double up for multiple instructions. The processing of
the ELF program files are handled in methods defined in lm32_cpu_elf.cpp, with its header file as lm32_cpu_elf.h. Code disassembly
is handled by methods in defined in lm32_cpu_disassembler.cpp.
A cache is implemented, for timing modelling purposes, and
is instantiated in the main class for both the data and instruction caches. It
is defined in the following files:
•
src/lm32_cache.h
•
src/lm32_cache.cpp
Basic MMU support is added with models of data and instruction TLBs
(translation lookaside buffers). These are modelled on the TLBs added to the LatticeMico32
processor for the MilkyMist project by M-Labs [7]. The functionality is defined in the following files:
•
src/lm32_tlb.h
•
src/lm32_tlb.cpp
A C linkage interface is provided, for those requiring to
integrate the model into a C environment, and this is defined in the following
files:
•
src/lm32_cpu_c.h
•
src/lm32_cpu_c.cpp
For external programs interfacing to the model over the C
interface, the lm32_cpu_c.h
header must be included in source code making reference to that API, in place
of lm32_cpu.h. The lm32_cpu_hdr.h still needs to
be in the include path, when using the C interface.
Building code
Included in the package is a makefile
to build the code under Linux or Cygwin, and support is also provided for MSVC
2010. Under the UN*X systems, by default (i.e. simply typing 'make') it will build the following:
•
cpumico32
•
libmico32.a
•
libmico32.so
The first is an executable (see man/cpumico32.1) for running simple programs,
particularly the self-test programs provided in the package-see Testing
section below. The next two are a static and dynamic library respectively, and are
the libraries an external program can use to link with the model, choosing the
appropriate one depending whether static or dynamic linking was most
appropriate for the particular application. The API for the libraries, and its
use, is described in the API section below.
The makefile also, by default, builds the code with debug
information (with g++ option
-g) and as
position-independent code (with
option -fPIC-though this
option is not needed in Cygwin). These are defined in the make variable 'COPTS', and can be overridden at the make command line. By
default, the model is 'big endian', just like the Lattice processor. For
variants that have been modified to be 'little endian', the model can be
compiled with a COPTS
value that includes the definition option "-DLM32_LITTLE_ENDIAN".
No code coverage information is included by default, but the
'COVOPTS' make variable
can be set at the command line to add, say, 'gcov' coverage information in the build (e.g. COVOPTS="-coverage").
If 'lcov' is available,
the HTML output can be generated with 'make coverage', after the tests have been executed. The
output is placed in a directory cov_html/.
Support for MSVC 2010 is provided, with a solution file (.sln) in the msvc/ directory, along with
the minimal set of project files to read in to the MSVC 2010 IDE, and compile
and run the model, but if MSBuild.exe is in the PATH under Cygwin, then the makefile has support to build from the command
line with 'make MSVC'. The
MSBuild.exe executable
is part of Microsoft.NET,
and thus can normally be found in a directory (for example, under Cygwin) such
as:
<cdrive_path>/Windows/Microsoft.NET/Framework/v4.0.30319
The <cdrive_path>
is the Cygwin path to the windows disk (most likely /cygdrive/c) and the final directory name will
depend on the particular version of Microsoft NET installed. For 64 bit
machines, a 64 bit version of the executable will be under Framework64.
By default, a make
build for MSVC builds a 'Release' executable, which is placed in the same
directory as for the other builds of the makefile. If a 'Debug' version is required, then the
default can be overridden via the MSVCCONF
make variable-i.e.:
make MSVCCONF="Debug" MSVC
Like the make for UN*X, the MSVC build produces a cpumico32.exe executable, but
only a single library, libmico32.dll.
API
The API to the model is a C++ interface (though a C
interface is provided-see C Linkage Interface section below), that consists
of a single object (of class lm32_cpu,
as defined in lm32_cpu.h)
that has a set of methods for configuring the model, setting control of program
flow, and running executable code. Definitions are provided in lm32_cpu_hdr.h needed to
communicate with some of these methods, and set their parameters. This is all
described in the sections to follow. In summary, the methods are:
lm32_cpu (int verbose,
bool disable_reset_break,
bool disable_lock_break,
bool disable_hw_break,
bool disable_int_break,
bool disassemble_run,
uint32_t num_mem_bytes,
uint32_t
mem_offset,
int mem_wait_states,
uint32_t
entry_point_addr,
uint32_t cfg_word,
FILE* ofp,
lm32_cache_config_t*
p_dcache_cfg,
lm32_cache_config_t*
p_icache_cfg,
uint32_t
disassemble_start)
int lm32_run_program (const
char* elf_fname,
int run_cycles,
int break_addr,
int exec_type,
bool
load_code)
void lm32_register_int_callback (p_lm32_intcallback_t
callback_func)
void lm32_register_ext_mem_callback
(p_lm32_memcallback_t callback_func)
void lm32_register_jtag_mem_callback
(p_lm32_jtagcallback_t callback_func)
void lm32_reset_cpu (void)
void lm32_set_verbosity_level
(int level)
lm32_time_t lm32_get_current_time (void)
void lm32_set_configuration
(uint32_t word)
uint32_t lm32_get_configuration
(void)
lm32_time_t lm32_get_num_instructions
(void)
uint32_t lm32_read_mem (uint32_t
byte_addr, int type)
void lm32_write_mem
(uint32_t byte_addr, uint32_t data, int type, bool dis_cyc_cnt)
void lm32_dump_registers (void)
lm32_state lm32_get_cpu_state (void)
void lm32_set_cpu_state (lm32_state new_state)
void lm32_set_hw_debug_reg
(uint32_t address, int type)
void lm32_set_gp_reg (uint32_t index,
uint32_t value)
Initialisation
The model object is created by instantiating a variable of
type lm32_cpu class, or creating via
'new'. The constructor,
lm32_cpu(), has a set of inputs for the initial
configuration of the model.
verbose
|
type
|
int
|
valid values
|
LM32_VERBOSITY_OFF,
LM32_VERBOSITY_LVL_1
|
default value
|
LM32_VERBOSITY_OFF
|
description
|
Controls level of verbosity. Currently this is either on
or off (i.e. only one level). When on, a disassembled output showing program
flow is sent to the output stream (see 'ofp' description below).
|
disable_reset_break
|
type
|
bool
|
valid values
|
true, false
|
default value
|
true
|
description
|
Controls whether the model will break and return on a
reset exception. If, from a callback function, lm32_reset_cpu() is invoked, emulating a pin reset, a
reset exception is flagged internally to the model, and the exception is
handled if enabled. When this parameter is false, this will cause a break and return from lm32_run_program() with a value
of LM32_RESET_BREAK.
This break occurs whether or not the internal state of the model means the
exception is handled (e.g. if the IE register is set to disable interrupts). This is
useful if the calling program wants to reconfigure the model between resets.
If the parameter is true, the model
does not break on a reset.
|
disable_lock_break
|
type
|
bool
|
valid values
|
true, false
|
default value
|
false
|
description
|
Controls whether the model will break and return on
detection of a program 'lock' condition; i.e. an instruction with a 'jump to
self' characteristic that would lock further program flow. This is useful
when running a program with a definite termination point (while(1);). However, in an
event driven environment, the main thread may have this construct, with the
system simply responding to incoming events, and this feature would be
disabled, with alterative breaking needed to return from the model.
|
disable_hw_break
|
type
|
bool
|
valid values
|
true,
false
|
default value
|
true
|
description
|
Control whether the model will break and return on
detection of a hardware break or watch points (if configured). When enabled,
after a hardware breakpoint or watchpoint is reached, the model will return
control to the calling program. On re-entry to the model, the program flow
will continue from exception point, including calling the exception vector
code, as it would have without the external break.
|
disable_int_break
|
type
|
bool
|
valid values
|
true,
false
|
default value
|
true
|
description
|
Control whether the model will break and return on
detection of an external interrupt event When enabled, after an interrupt is active,
the model will return control to the calling program. On re-entry to the
model, the program flow will continue from exception point, including calling
the exception vector code, as it would have without the external break.
|
disassemble_run
|
type
|
bool
|
valid values
|
true, false
|
default value
|
false
|
description
|
When set 'true',
'running' the program does not execute the program code, but simply runs
through the code linearly generating disassembled output (as if verbose were
set). This turns the model into straight forward disassembler.
|
disassemble_start
|
type
|
uint32_t (#include <cstdint>)
|
valid values
|
0x0 to 0xfffffffc
|
default value
|
0
|
description
|
When verbose mode set, specifies after which cycle the
verbose output will start to be displayed.
|
num_mem_bytes
|
type
|
uint32_t (#include <cstdint>)
|
valid values
|
0x0 to 0xfffffffc
|
default value
|
65536
|
description
|
Define the number of bytes of internally modelled memory. This
value is rounded up to a 4 byte boundary. The internal memory is contiguous
and is read- and writeable. Internal memory does have to be specified, but
this places a requirement on having a registered external memory callback to
handle all accesses to memory (see Callbacks section below). Internal
memory will be masked by a callback that intercepts addresses that overlap
the internal memory space.
|
mem_offset
|
type
|
uint32_t (#include <cstdint>)
|
valid values
|
0x0 to 0xfffffffc
|
default value
|
0
|
description
|
Define the byte address offset for the internal memory, if
used. This value is rounded down to the nearest 4 byte word boundary. Useful
if code to be executed is located in a region away from address 0, but it is
still desired to have it loaded into internal memory. Usually used with entry_point_addr argument
(see below).
|
mem_wait_states
|
type
|
int
|
valid values
|
0 to INT_MAX
|
default value
|
0
|
description
|
Defines the number of wait states to be associated with
read or write accesses to internal memory. This value will be added to the
cycle count for any access to the internal memory over and above any issue or
stall due to the load or store instruction.
|
entry_point_addr
|
type
|
uint32_t (#include <cstdint>)
|
valid values
|
0x0 to 0xfffffffc
|
default value
|
0
|
description
|
Defines the address for the reset PC value. The model normally starts, or resets
to address 0, which is where the default internal memory resides. If code is
compiled for a different location, then the internal memory can be relocated
with mem_offset, and
the reset address specified with this argument, to point to the relocated
internal memory (or a region handled by an external memory callback).
|
cfg_word
|
type
|
uint32_t (#include <cstdint>)
|
valid values
|
0x0 to 0xffffffff
|
default value
|
LM32_DEFAULT_CONFIG
|
description
|
At construction, the value of the CFG register can be set with this value to
enable or disable hardware features. The value is exactly compatible with the
CFG register as
defined in the Lattice Mico32 Processor Reference Manual [1], section
'Control and Status Register'. A mask is applied to the value set, so that
features not supported by the model cannot be enabled-see Introduction
section above for unsupported features.
Some definitions are defined in lm32_cpu_hdr.h that can be ORed together into
the cfg_word, to
enable features:
LM32_MULT_ENABLE
LM32_DIV_ENABLE
LM32_SHIFT_ENABLE
LM32_SEXT_ENABLE
LM32_COUNT_ENABLE
LM32_DCACHE_ENABLE
LM32_ICACHE_ENABLE
LM32_SWDEBUG_ENABLE
LM32_HWDEBUG_ENABLE
LM32_JTAG_ENABLE
LM32_NUM_BP_[0-4]
LM32_NUM_WP_[0-4]
To
configure the number of external interrupts, a value (between 0 and 32) can
be shifted and ORed into the cfg_word. E.g. (32 << LM32_CFG_INT).
|
ofp
|
type
|
FILE* (#include <cstdio>)
|
valid values
|
NULL, <valid
file pointer>
|
default value
|
stdout
|
description
|
A file pointer to direct verbose and debug output data. By
default this is to stdout,
but a valid open writeable file pointer can be specified to direct the
output.
|
p_dcache_cfg
|
type
|
lm32_cache_config_t*
|
valid values
|
NULL, <valid lm32_cache_config_t
pointer>
|
default value
|
NULL
|
description
|
A pointer to a cache configuration structure. This
structure contains values for the configurable parameters of the data cache.
This structure is defined, in lm32_cpu_hdr.h,
as follows:
typedef struct {
uint32_t cache_base_addr;
uint32_t cache_limit;
int cache_num_sets;
int cache_num_ways;
int cache_bytes_per_line;
} lm32_cache_config_t;
Valid values for these parameters are as per the
LatticeMico32 Processor Reference Manual, Chapter 3, table 17. If the
parameter is set to NULL,
then the data cache will use default values (if a data cache is configured).
|
p_icache_cfg
|
type
|
lm32_cache_config_t*
|
valid values
|
NULL, <valid lm32_cache_config_t
pointer>
|
default value
|
NULL
|
description
|
A pointer to a cache configuration structure. This
structure contains values for the configurable parameters of the instruction
cache. See p_dcache_cfg
for details
|
Execution and breakpoints
Once a model object is created, a program can be run via the
lm32_run_program() method. At
its simplest, it is called with a program file name to load and execute, and
run 'forever'. However, other features are controllable on calling to limit the
amount of execution. The call to the method can also return due to other break
events that were specified at initialisation (see Initialisation section above).
A returned value indicates why the lm32_run_program() exited. The parameters to the methods
are described below:
elf_name
|
type
|
const char*
|
valid values
|
string pointer with valid ELF program name, "" (empty string)
|
default value
|
"test.elf"
|
description
|
The name of the ELF program to load and execute. Can be an
empty string if the call is not a load type (e.g. LM32_SINGLE_STEP─see exec_type below).
|
run_cycle
|
type
|
int
|
valid values
|
1 to INT_MAX,
LM32_FOREVER or LM32_ONCE
|
default value
|
LM32_FOREVER
|
description
|
Defines the run cycle count to reach before returning. Any
positive value between 1 and INT_MAX
can be specified, or LM32_FOREVER
to not break on a cycle count, or LM32_ONCE. Note that the timing model (see Timing
Model section below) is such that it can only break on instruction
boundaries. As some instructions take multiple cycles, the cycle count on
returning may be greater than that specified, up to the amount that the last
instruction took to execute. Note also that specifying a run_cycle of a value less
than the model's current cycle count is equivalent to running with a value of
LM32_ONCE (see below
for lm32_get_current_time()
method).
|
break_addr
|
type
|
int
|
valid values
|
0 to 0xfffffffc, LM32_NO_BREAK_ADDR
|
default value
|
LM32_NO_BREAK_ADDR
|
description
|
Specifies a return point when the PC reaches a particular address, or if no
break on address is required (i.e.LM32_NO_BREAK_ADDR). Note that the lower two bits of
the specified address are ignored.
|
exec_type
|
type
|
int
|
valid values
|
LM32_RUN_FROM_RESET, LM32_RUN_CONTINUE,
LM32_RUN_SINGLE_STEP, LM32_RUN_TICK
|
default value
|
LM32_RUN_FROM_RESET
|
description
|
Specifies the action on calling the method externally.
Normally a value of LM32_RUN_FROM_RESET
is specified with an executable file given in 'elf_name', and any break point settings. The program is
loaded into memory, and the CPU reset, starting execution from address 0. For
single stepping the program, LM32_RUN_SINGLE_STEP
is used. In this case the model returns after executing only one instruction.
The run_cycle, break_addr and elf_name parameters are
ignored for this type. A similar type is LM32_RUN_TICK. This advances just a single clock, but
doesn't necessarily execute an instruction; say if the last instruction takes
multiple cycles. It advances time by one clock only, and executes an
instruction only when the tick count and the instruction cycle counts agree.
It is useful if integration with an environment that has a timing model that
advances by single clock ticks. The LM32_RUN_CONTINUE is used to continue execution from
the point at which the method last returned. The break parameters are active
in this call type, but elf_name
is ignored, and no program is loaded.
|
load_code
|
type
|
bool
|
valid values
|
true, false
|
default value
|
false
|
description
|
Specifies to load the program specified by the elf_name argument into
memory. The program will be load to memory, in all events, when exec_type is set to LM32_RUN_FROM_RESET, but
this parameter can be used to re-load, or replace a program if the lm32_run_program() call
returned on some break point.
|
Return Value
The lm32_run_program()
method returns one of several values to indicate why it returned.
·
LM32_USER_BREAK:
returned having reached the user specified break address, set at configuration
or passed as a parameter when calling lm32_run_program().
·
LM32_SINGLE_STEP_BREAK:
returned whilst single stepping (i.e. exec_type argument was set as LM32_RUN_SINGLE_STEP when lm32_run_program() called
·
LM32_TICK_BREAK:
returned whilst executing with an exec_type
of LM32_RUN_TICK
·
LM32_RESET_BREAK:
returned if the model was externally reset via lm32_reset_cpu()
(and reset breaking was not disabled)
·
LM32_LOCK_BREAK:
Reached a program 'lock' condition (and lock breaking was not disabled)
·
LM32_DISASSEMBLE_BREAK:
reached the end of the program during disassemble mode (see "Disassembled
Output" section below)
·
LM32_HW_BREAKPOINT_BREAK†:
a hardware breakpoint fired, when breaking on hardware debug events were
enabled.
·
LM32_HW_WATCHPOINT_BREAK†:
a hardware watchpoint fired, when breaking on hardware debug events were
enabled.
·
LM32_INT_BREAK:
an external interrupt was active, when breaking on interrupt events were
enabled.
·
LM32_BUS_ERROR_BREAK†:
an instruction or data bus exception fired.
·
LM32_DIV_ZERO_BREAK†:
a divide-by-zero exception fired.
†
Note: the method will return with
these values regardless of whether interrupts are enabled in the IE register. This
allows for non-intrusive debugging. The calling function must process the
returned value and decide what the appropriate action is. The state of the IE
register is available to the calling routine via the lm32_get_cpu_state() method, if required. As the external interrupts are
level sensitive, this may cause the method to return for each executed instruction
that the interrupt(s) are asserted.
Run-time configuration and status
Some methods are provided for inspecting status and setting
configuration at run-time, i.e. after lm32_run_program() has returned.
lm32_set_verbosity(int
level)
|
description
|
Allows the verbosity level to be changed. Its single
parameter has valid values as verbose of the lm32_cpu constructor. This is useful for debugging of
long programs, which would generate a large output if verbosity specified
from time 0. A break can be set up to return at a known point before the area
of interest, and verbosity increased before continuing
|
lm32_get_current_time()
|
description
|
Returns the current internal cycle count of the model.
This counts upward from 0 for all execution, and is not the CC value which can be
reset. This is useful if wanting a break point on a cycle count relative to
current time, rather than an absolute value.
|
lm32_get_num_intructions()
|
description
|
Returns the current count of instructions executed since
time 0. Useful for statistical analysis and performance measurements.
|
lm32_get_configuration()
|
description
|
Returns the value of the CFG register as a uint32_t. It allows the external inspection of
'implemented' hardware. Since any configuration value is masked (see lm32_cpu() description
above, and lm32_set_configuration()
description below), then this give a definitive value as being used by the
model.
|
lm32_set_configuration(uint32_t
word)
|
description
|
Sets the value of the CFG register, masked to allow only supported features
to be enabled. This ability to dynamically update the hardware configuration
is not really supported in the Mico32, but it is useful for testing the
model. The test suite (see Testing section below) run on cpumico32 reconfigures the
model via the memory callback function to test that removing the features
disables them as far as the program is concerned, and the proper response is
seen.
|
Reset event
lm32_reset_cpu()
|
description
|
Method used to model asserting the reset pin of the
Mico32. It generates and internal event and is processed accordingly, as for
the real processor.
|
Callbacks
The ISS is a model of a processor core, and its main usage
is as a component in a larger system level model. It has an internal memory
model for convenience and to aid stand alone testing, but it is via the
callbacks that the model can be extended or integrated into a system model of
arbitrary complexity. The model supports three types of user defined callbacks
that can be registered with the model. One is for calling at each memory access
that the CPU performs, the second is called at regular intervals (from once
each instruction boundary, to as long as specified in a wait or sleep period).
The third is to allow a JTAG interface to be implemented as an add-on, and is
invoked whenever the model accesses the JTX or JRX
registers.
The main extension is to map peripherals (including more
memory, if desired) into the memory space via the external memory callback
function, trapping accesses to addresses with memory mapped peripheral
registers and implementing the functionality.
To take a trivial example, to model a system that has a
serial output then the external memory callback mechanism would seem to be what
you need. It is there to allow modelling of hardware that is external to the
lm32 model itself. By registering a callback function with the method lm32_register_ext_mem_callback(),
then all accesses to anywhere in the memory space by the model will call this
registered function first, with the address and any write data value (assuming
a write). Taking an example, if you modelled a serial port with an output
register located at, say, 0x80000000
or wherever, then the callback function can test for this address, and print a
character to the screen of the value of the lowest byte of the data word passed
in, returning a wait-state count for the access (any value of 0 or more). For
all other addresses the callback function returns LM32_EXT_MEM_NOT_PROCESSED to allow the model to
handle the access. This concept can be extended to model any hardware with a
memory mapped register set, accessible by the processor. You can try this
mechanism for yourself by registering your own external memory callback
function and have it print out all of the memory read and write accesses it
sees (returning LM32_EXT_MEM_NOT_PROCESSED
so that the model can continue). You'll see it accessing all the program
locations and any other locations it is trying to access.
For functionality that isn't a memory mapped register, but
can generate an interrupt, or requires updating on CPU time, then the interrupt
callback can be used. This can be called up to every instruction, or at longer
intervals, and it can generate an interrupt upon returning (or not). Taking the
serial output port as an example, if it is required that the peripheral
generates an interrupt some cycles after a byte is transmitted, then, after a
sufficient number of calls to the int callback, from the register write (via
the external memory callback), the callback can return a value, indicating an
interrupt on one of the interrupt pins.
Note that these two callbacks are shared amongst all
modelled peripherals, and the callbacks will have to do initial decoding. For
the external memory accesses, this is just decoding incoming addresses, perhaps
doing a page decode to identify a particular peripheral, and then calling a
model function which does the rest of the decode. For the interrupt callback,
then it is envisaged that all peripheral models are called at every cycle,
amalgamating any interrupts returned from the peripheral model functions.
A special callback is implemented to handle JTAG accesses,
via the JTX and JRX registers. This is simply
a hook for functionality that isn't implemented in the model, should anyone
wish to add support for this.
The three callback registration functions are described
below:
lm32_register_int_callback (p_lm32_intcallback_t callback_func)
|
description
|
The caller must provide as the parameter input, a pointer
to a function of type p_lm32_intcallback_t,
e.g.:
uint32_t cb_func(lm32_time_t time,
lm32_time_t* wakeup_time);
If a function is registered, the
model will call this function at least once at time 0. The user function
receives the current time in the 'time' parameter. Before returning the function must
update the contents of the integer pointed to by 'wakeup_time' to indicate the cycle it next
wishes to be called. A value
of 0 or less than or equal to the 'time' parameter will mean it is called after the next
instruction. If the callback function wishes to delay being called to a
future time, then the wakeup_time
is specified for some value greater than 'time'. A negative value returned informs the model that
a request to terminate is requested, and a user breakpoint is generated.
The return value from the
callback is the pattern of inputs to the external interrupt pins (up to 32).
For configured input pins, the event is set in the IP register of the model, and will generate an
interrupt if the model has these enabled (IE and IM
settings). If the CFG
register is not configured for 32 interrupts, then bits set for all
non-configured inputs are ignored, and the IP register bit is not set.
|
lm32_register_ext_mem_callback (p_lm32_memcallback_t callback_func)
|
description
|
The caller must provide, as the input parameter, a pointer
to a function of type p_lm32_memcallback_t
e.g.:
int
cb_func(uint32_t byte_addr, uint32_t *data, int type,
int
cache_hit, lm32_time_t time);
If a function is registered the model will call this for every memory access
made (i.e. via load and store instructions). The address being called is
passed in as 'byte_addr', and data is exchanged via the 32 bit
word pointed to by 'data'.
On write access types this contains the data to be written. It has meaning to
update this on write accesses, as the model will ignore it, but it is safe to
do so. On read access types, the data to be returned is placed into the
integer pointed to by data. The type
parameter takes on one of several values, to indicate the direction and size
of access, as shown below.
LM32_MEM_WR_ACCESS_BYTE
LM32_MEM_WR_ACCESS_HWORD
LM32_MEM_WR_ACCESS_WORD
LM32_MEM_WR_ACCESS_INSTR
LM32_MEM_RD_ACCESS_BYTE
LM32_MEM_RD_ACCESS_HWORD
LM32_MEM_RD_ACCESS_WORD
LM32_MEM_RD_INSTR
The cache_hit
parameter is a flag which indicates whether the memory access is a true
access to memory (when zero), or whether the cache is fetching data on a
cache hit (when non-zero). The cache model does not store actual data
internally, but only keeps track of which addresses are cached, and still
fetches data from memory. This flag allows the external callback functions to
differentiate between access types if, say, it is keeping statistics on
access rate, location etc.
The callback function can intercept any of the memory
accesses to update its own internal state, and thus model memory mapped
external blocks. If the callback processed the memory access it must return a
cycle count from 0 to INT_MAX
to indicate the number of wait states that the model must add to its internal
cycle count. If the callback is modelling a single cycle access, then 0 would
be returned (no wait-states). If the callback is modelling more complex
peripherals (e.g. with resource sharing and arbitration) with wait states
generated, it would return a positive integer to indicate the number of wait state
cycles elapsed. If the supplied address did not access any portion of memory
covered by the callback function, or the access was invalid for some reason
(misaligned, say), the callback must return LM32_EXT_MEM_NOT_PROCESSED, to allow the model
to attempt an access to its internal memory. Honouring this requirement is
important for correct timing operation, and the timing modelling is only as
good as the values returned by the callback.
Note that for cache hit accesses, (cache_hit
non-zero) the returned number can be any value zero or greater (but not LM32_EXT_MEM_NOT_PROCESSED,
if in a mapped region) as the timing model uses a number based on the timing
for a cache access.
|
lm32_register_jtag_callback (p_lm32_jtagcallback_t callback_func)
|
description
|
The caller must provide, as the single input parameter, a
pointer to a function of type p_lm32_jtagcallback_t,
e.g.:
void cb_func(uint32_t *data, int type,
lm32_time_t time);
If a callback function is
registered, the model will execute this each time the model
accesses the JTX or JRX
registers (via the rcsr or wcsr instructions). One of three types is
passed in:
LM32_JTX_WR
LM32_JTX_RD
LM32_JRX_RD
There is no JRX
write type as this has no function, and the model ignores the instruction
internally. When the type is a write, the pointer '*data' points to the data byte to be written.
On read types, the returned data is placed in to the location pointed to by *data. For both JTX and JRX, bit 8 should contain
the 'full' status of the TX or RX register.
By default, the model has JTAG as an unimplemented
featured. This can be enabled or disabled via lm32_set_configuration(), or set at the model object's
construction, but a side-effect of registering a JTAG callback function is
that the feature is enabled, and the bit set in the CFG register automatically.
|
Internal memory access
The API provides direct access to the model's internal
memory, via two methods. Using these methods will also invoke any external
memory callback function, and so can be used to peek and poke memory areas implemented
externally via the callback.
lm32_read_mem(uint32_t
byte_addr, int type)
|
description
|
Returns a 32 bit word with the value at address specified
by 'byte_addr'. Valid
types are LM32_MEM_RD_xx
types as specified for the memory callback (see above). Any other type will
cause a fatal error.
|
lm32_write_mem(uint32_t
byte_addr, uint32_t data, int type, bool dbl_cyc_cnt)
|
description
|
Writes the data in 'data' to the address specified in 'byte_addr'. Valid types are
LM32_MEM_WR_xx types
as specified for the memory callback (see above). Any other type will cause a
fatal error.
By default, the cycle count is advanced whenever this
function is called, depending on other settings. This can be disabled with
the optional third argument for use when, say, loading binary program data
using this function. This argument is not available in the C interface
function.
|
There are no safe guards on the calling of these memory
access functions, and all internal memory is accessible from them. However, accessing
invalid areas of memory will cause fatal exceptions. Use with caution.
Internal state access
Three methods are provided to give access to internal
register state of the model, either to the output stream, or returned to the
calling program.
lm32_dump_registers()
|
description
|
will print out the complete set of internal registers
states to the output stream (as defined by the ofp
parameter of the constructor), formatted as shown in the example below:
r00 = 0x00000000 r01 = 0x00003303 r02 =
0x00000000 r03 = 0x00000000
r04 = 0x00000000 r05 = 0x00000000 r06 =
0x00000000 r07 = 0x00000000
r08 = 0x00000000 r09 = 0x00000000 r10 =
0x00000000 r11 = 0x00000000
r12 = 0x00000003 r13 = 0x00000000 r14 =
0x00000000 r15 = 0x00000000
r16 = 0x00000000 r17 = 0x00000000 r18 =
0x00000000 r19 = 0x00000000
r20 = 0x00000000 r21 = 0x00000000 r22 =
0x00000000 r23 = 0x00000000
r24 = 0x0000900d r25 = 0x0000fffc gp =
0x00000000 fp = 0x00000000
sp = 0x0000fff0 ra = 0x00000000 ea =
0x00000000 ba = 0x00000578
pc = 0x000005ac ie = 0x00000005 ip =
0x00000000 im = 0x00000000
icc = 0x00000000 dcc = 0x00000000 cfg =
0x01120037 cfg2 = 0x00000000
cc = 0x000005bb eba = 0x00000000
bp0 = 0x00000241 bp1 = 0x00000251 bp2 =
0x00000261 bp3 = 0x00000271
wp0 = 0x00003000 wp1 = 0x00003101 wp2 =
0x00003202 wp3 = 0x00003303
dc = 0x000000fc deba = 0x00000000
|
lm32_get_cpu_state()
|
description
|
Returns a class lm32_cpu::lm32_state containing a complete set of the
register values, plus some other persistent internal state needed by the
model. All register fields are of type uint32_t: one for each register. See source file lm32_cpu.h for the details
of the class definition.
|
lm32_set_cpu_state()
|
description
|
Sets the internal model state to that passed into the
function, of type lm32_cpu::lm32_state.
This contains all the CPU registers, plus some other internal state, needed
by the model. This is intended for use in save and restore operations, rather
than as a means to update the models internal state externally. See source
file lm32_cpu.h for
the details of the class definition.
|
lm32_set_hw_debug_reg(uint32_t
addr, int type)
|
description
|
Allows the external setting of the h/w debug registers.
The address to be written to the register is passed in on the 'addr' parameter, and the
register to access is defined via the 'type'. This can take on one of the following values.
LM32_CSR_ID_BP0
LM32_CSR_ID_BP1
LM32_CSR_ID_BP2
LM32_CSR_ID_BP3
LM32_CSR_ID_WP0
LM32_CSR_ID_WP1
LM32_CSR_ID_WP2
LM32_CSR_ID_WP3
If the model is configured to have less than the full
complement of break- or watch points, then attempting to set the registers
via this method will have no effect. Note also, that the breakpoint 'addr' value must include
the enable bit, exactly as defined for the BPx registers in the reference manual [1], but the
watchpoint 'addr' is
a pure 32 bit byte address so, in addition to the basic LM32_CSR_ID_WPx value for the
'type', to define
which watchpoint is updated, one of four settings must be 'ORed' in with the
value to make up the complete watchpoint access type:
LM32_WP_DISABLED
LM32_WP_BREAK_ON_READ
LM32_WP_BREAK_IN_WRITE
LM32_WP_BREAK_ALWAYS
The method updates the relevant Cn fields of the DC register, based on these type values.
|
lm32_set_gp_reg(uint32_t
idx, uint32_t value)
|
description
|
Allows the external setting of the 32 general purpose
registers. The GP
register to be written to is passed in on the 'idx' parameter, and the value to set is passed in on
the 'value' parameter.
This function is meant for debug, and initialisation.
Caution should be used if setting the registers externally, whilst code is
executing.
|
C Linkage Interface
A C linkage API is provided as an alternative to the C++
interface, for those who have a C environment that they wish to integrate the
model into. The API is purposely as similar to the C++ API as possible, as it
has been described above. There is a one-to-one correspondence to the C++
methods, with all the C API functions called lm32c_<c++ equivalent suffix>, and each has an
additional parameter, except for the initialisation function, as explained
below.
The constructor is replaced with a function lm32c_cpu_init(). The parameters are the same as for the C++ constructor, only
with the Boolean types now defined to be of type 'int'. Thus their default values are no longer 'true' or 'false', but the API defines
values TRUE and FALSE, which replace these.
The function returns a handle to a unique object, of type lm32c_hdl, which must be
saved, as all the other API functions require it to access the initialised model's
instantiation. This allows for multiple instantiations of the model.
All the other C++ methods have an lm32c_xxxx equivalent, with identical parameters,
except that a new first parameter must be given, which is the handle returned
by lm32c_cpu_init(). For
example, if the handle has been saved in a variable lm32Hdl of type lm32c_hdl, then a reset call is now lm32c_reset_cpu(lm32Hdl), in
place of the C++ method lm32_reset_cpu(),
described in the above sections. The lm32c_get_cpu_state() function returns a structure of type lm32c_state, rather than the
class lm32_state, but
the field names are identical. The full list of C linkage functions is thus:
- lm32c_cpu_init()
- lm32c_run_program()
- lm32c_reset_cpu()
- lm32c_register_int_callback()
- lm32c_register_ext_mem_callback()
- lm32c_register_jtag_callback()
- lm32c_set_verbosity_level()
- lm32c_get_current_time()
- lm32c_set_configuration()
- lm32c_get_configuration()
- lm32c_get_num_instructions()
- lm32c_read_mem()
- lm32c_write_mem()
- lm32c_dump_registers()
- lm32c_set_hw_debug_reg()
- lm32c_get_cpu_state()
The API is defined in the source file header lm32_cpu_c.h, which must be
included in code using the C API.
Disassembled Output
The model can output (to 'ofp') fully disassembled output in one of two ways. Either
the disassembled output can show program flow, during a normal execution of code
on the model, or it can simply display a disassembled output of the specified
ELF executable file.
Normal execution flow disassembly is instigated either by
setting the constructor's 'verbose' parameter,
or by calling lm32_set_verbosity_level(). When verbosity is enabled the output
looks something like the example fragment shown below:
0x01ac: (0x2b9f0034) lw ba, (sp +00052) @433
0x01b0: (0x379c0038) addi sp, sp, 000056 @434
0x01b4: (0xc3e00000) b ba @435
*
0x0304: (0x5e8c00a8) bne r20, r12, 0000672 @439
0x0308: (0x9a94a000) xor r20, r20, r20 @440
0x030c: (0x38013101) ori r1, r0, 0x3101 @441
0x0310: (0x30220000) sb (r1 +00000), r2 @442
0x0314: (0x5e8000a4) bne r20, r0, 0000656 @443
0x0318: (0x38010144) ori r1, r0, 0x0144 @444
0x031c: (0xd1010000) wcsr DC , r1 @445
0x0320: (0x9a94a000) xor r20, r20, r20 @446
0x0324: (0x38013101) ori r1, r0, 0x3101 @447
The first field displays address of the instruction (i.e.
the value of the PC register). When
the program flow is disrupted (due to a branch, call, or exception), this field
shows '*', and the rest
of the line is left blank, to ease finding the jumps when debugging. Field 2
gives the raw instruction value being executed followed by the actual
disassembled instruction in field 3. The current cycle count is displayed in
the last field. This mostly increases by one, but for instructions that take
more cycles to execute, or are stalled etc., the value jumps by a larger
amount. In the example above the branch instruction takes 4 cycles to issue,
and so the cycle count jumps from 435 to 439.
The pure disassembled output is specified by setting the disassemble_run parameter of
the constructor, and has nearly identical output to that of the run-time
output.
0x02f8: (0x9a94a000) xor r20, r20, r20
0x02fc: (0x38013101) ori r1, r0, 0x3101
0x0300: (0x10220000) lb r2, (r1 +00000)
0x0304: (0x5e8c00a8) bne r20, r12, 0000672
0x0308: (0x9a94a000) xor r20, r20, r20
0x030c: (0x38013101) ori r1, r0, 0x3101
0x0310: (0x30220000) sb (r1 +00000), r2
0x0314: (0x5e8000a4) bne r20, r0, 0000656
0x0318: (0x38010144) ori r1, r0, 0x0144
0x031c: (0xd1010000) wcsr DC , r1
0x0320: (0x9a94a000) xor r20, r20, r20
0x0324: (0x38013101) ori r1, r0, 0x3101
The main difference here is that there is no cycle count (as
this has no meaning in this context), and there will be no breaks in address as
the disassembling runs from the lowest to the highest address (of text areas) linearly.
Timing Model
The ISS makes an approximation of time using the issue
cycles and result cycles associated with each instruction, as defined in the
LatticeMico32 reference manual [1]. Each instruction executed will advance the
cycle count by at least its issue cycles, as the next instruction cannot be
executed before this time. In addition, if it updates a register, then the
result cycles value plus the current cycle count is stored for the target
register. This is the earliest time that a future instruction can access this
register. When an instruction is executed, its source registers (RY and, if applicable, RZ) have their availability times
checked, and the cycle count is advanced to the time of the latest register's
availability. This timing model does not take into account branch prediction,
and uses the issue cycle numbers for 'taken' and 'not taken' unmodified, as
defined in the reference manual [1]. The internal cycle count is also used as
the basis for the CC
register value. Since this register can be changed by software, but the cycle
count needs to run continuously, the CC value is emulated by keeping a track of the offset
from cycle count and the last programmed value, such that a read of the CC register will be correct,
whilst still being based on the internal cycle count. This means only a single
source is used for all timing.
The model can be advanced by single cycles, as well as
single instructions, to allow the model to be called at a regular clock tick
count. At each instruction the cycle count is advanced by one or more,
depending on the instruction. At each clock 'tick', the clock time is advanced
by one clock cycle. Only when the clock tick count matches the instruction
cycle count is the next instruction executed. This is also useful when multiple
instances of the model are instantiated, as they can be kept in
synchronisation, by calling with a clock 'tick' rather than single-stepping,
and their internal sense of time will advance at the same rate and remain
locked, with just minor differences due to instruction execution granularity.
Caches
The model implements configurable caches for the data and
instruction fetches. The cache models are for timing purposes only, and do not
actually store data within them, but keep a record of which addresses are
cached. If an access to a cached region of memory is made, whether to internal
memory or to a region mapped by an external memory access callback function,
the data is still accessed as normal, but the reported timing wait states are
ignored if a cache hit, and single cycle accesses of the cache are used to update
time instead. If a cache miss, however, the memory access wait states (if any)
are scaled by the number of words accesses required to fill the cache line, as
these would have been fetched by the cache. When no cache is configured, the
wait states from memory callbacks or internal memory accesses are used
unmodified to update time.
By default, the caches are disabled (i.e. the configuration
is for no caches implemented). To enable caching, the configuration register (CFG) must be modified at
instantiation or via the lm32_set_configuration()
method (see API section above).
Source Code Architecture
It is not the intention to go into minute detail for the
internal architecture of the model here, but a brief overview of the main
program flow, internal state, and major structures is in order, to allow
anyone wishing to understand or modify the code enough of a handle, that they
can explore the details on their own.
Main execution flow
Below is shown some pseudo-code of the main program flow
when executing a program. The main lm32_cpu class member functions are shown as "funcname()", and the
phrases between "<"
and ">"
describe local functionality. The indentation of the pseudo-code shows the
calling hierarchy as implemented in the code.
lm32_run_program()
<if running from reset...>
<load ELF program to memory>
<while no break point reached...>
execute_instruction()
process_exception()
<process external interrupts>
<if master interrupts enabled...>
interrupt()
<if interrupt outstanding...>
<generate exception>
<fetch opcode from PC location in memory>
<lookup decode_table information using
opcode>
<extract argument fields from opcode>
<if verbose or disassemble_run...>
disassemble()
<if not disassemble_run...>
<lookup instruction function in tbl_p>
<execute instruction function using
decode_table lookup data>
<check for break points and flag>
The above pseudo-code is a rough outline only, and doesn't
show callback handling, memory accesses, disassembling or instruction execution
(though this last is described below, in the Disassembled Output section).
Key Model State
The list below show some of the major state used in the
model.
•
state: Contains
all the CPU's modelled registers, e.g. r[32], pc,
im etc. There is a
field corresponding to each register in the Mico32 CPU, including debug
registers. It is of type lm32_cpu::lm32_state.
It also carries other persistent state, used by the model, that will need saving
for save and restore operations
o
state.int_flags:
bitmapped value indicating pending exception. Each of the bits, from bit 0 to
bit 7, corresponds to the exception ID as defined in the reference manual [1].
This is part of the state
structure.
o
state.cycle_count:
number of executed cycles since time 0. Note that this is not the number of
instructions executed. Instructions that take multiple cycles, increment this
count by more than 1. This is part of the state structure.
•
mem: pointer to
internal memory. This can be NULL
if none defined, and all memory handled by callback functions.
•
mem_tag: pointer
to internal memory tag that contains debug tag data to mark the access types
for internal memory locations. Can be NULL-see mem above.
•
rt: CPU general
purpose registers' next availability times. See the Instruction Functions
section below for details of usage.
•
tbl_p: pointer
to table of instruction function pointers. See the Decode Table section below
for more details.
•
decode_table:
table of instruction decode information. See the Decode Table section below
for more details.
Decode Table
At the heart of the execution of the model is a decode table
used for quick lookup of decode information for a given instruction's opcode.
The decode table consists of 64 entries with the following structure type:
struct lm32_decode_table_t {
const char* instr_name;
unsigned instr_fmt;
lm32_time_t result_cycles;
lm32_time_t issue_cycles;
lm32_time_t issue_not_takencycles;
unsigned signed_imm;}
It is a constant table, and held in the global decode_table variable,
initialised at compilation. The instr_name
field is a string for disassembly purposes, whilst the instr_fmt gives information as to the
instruction format for that opcode. There are slightly more formats than that
defined in the reference manual [1], as quirks of some instructions need
uniquely identifying. The definitions in lm32_cpu_mico32.h prefixed "LM32_FMT_" give all the possible values. The
three time based fields, correspond to the values of cycle taken for each
instruction as defined by the reference manual [1], with an issue count a
results count and (if a decision branch) a not taken issue count. The signed_imm field indicates
whether any immediate bits of the instruction are signed or not. For instructions
with no immediate value, this is a "don't care". An example initialisation for
a table entry, for the sextb
instruction is shown below:
{"sextb ", INSTR_FMT_RC,
1, 1, 0, INSTR_SE_DONT_CARE}
The table is used during execution of instructions. During
decode, a structure is used for constructing decode information, as shown
below.
struct lm32_decode_t {
uint32_t opcode;
uint32_t reg0_csr
uint32_t reg1;
uint32_t reg2;
uint32_t imm;
const lm32_decode_table_t* decode; }
This structure is like a form that is filled in as the
instruction is processed. The 'opcode'
field is set with the raw fetched instruction value, and then the fields are
separated into the regXX
and imm fields,
depending on the instruction type. The type is derived from the last field
which is a pointer to an entry in the decode_table, described above. During decode the opcode
is used to fetch the decode_table
location for the instruction, and the pointer to the entry is stored in the
decode field. It is a pointer to this structure that is ultimately passed in to
the instruction functions for use in execution the instruction functionality.
The tbl_p
pointer of the lm32_cpu
class points to a table of 64 entries, corresponding to the 64 opcodes, and
contains pointers to functions that will implement that opcode's function. The
table's type is an lm32_func_table
class, with an array of pointers of type pFunc_t. This corresponds to a member function of lm32_class, with a form
"void lm32_<instr_name>
(p_lm32_decode_t p)", with the sole argument being a pointer to an
object of type lm32_decode_t,
as shown above. The table is constructed an initialised in the constructor of
the lm32_cpu class.
Instruction Functions
The actual instruction execution functions are defined in
the source file lm32_cpu_inst.cpp,
and all have a similar basic format. An example is shown below for the byte
sign extend instruction (sextb).
void lm32_cpu::lm32_sextb (p_lm32_decode_t p) {
if (state.cfg & (1 << LM32_CFG_X)) {
cycle_count += calc_stall(p->reg0_csr,
NULL_REG_IDX, cycle_count);
int32_t ry = SIGN_EXT8(state.r[p->reg0_csr]
& BYTE_MASK);
state.r[p->reg2] = ry;
rt[p->reg2] = cycle_count +
p->decode->result_cycles;
state.pc += 4;
cycle_count += p->decode->issue_cycles;
} else
lm32_rsrvd(p);
}
The function is passed in a pointer to the decode
information looked up in execute(), and filled in with extracted argument
fields (e.g. rx, rz indices, or immediate
values etc.). As sign extension is an optional feature, the CFG register state (state.cfg) is inspected, and
if sign extension is not implemented, then the lm32_rsrvd() instruction function is called instead. Not
all instructions are optionally implemented, and these instruction's functions
don't have a test like this.
If it is implemented, then the first job is to see if any source
registers are stalled. In this case there is only one source register (indexed
by p->reg0_csr), and
calc_stall() is called
that returns a number representing any number of cycles to wait before that
source register is available. The 'rt' table has a list of cycle counts indicating when each
of the 32 general purpose registers are next available. Any source register for
an instruction that has an 'rt'
entry in the future (relative to cycle_count)
generates a wait state count that is the difference between cycle_count and the 'rt' entry for the register.
In the case of instructions with two source registers, the larger of the two
calculated wait cycles is returned. This is added to the current cycle_count to effectively
delay execution of the instruction.
The value of the register indexed is retrieved from state
and signed extended, as required for this instruction's function, into a
variable ry. The destination
register, indexed by p->reg2,
is updated with the ry
value, and then the 'rt'
table entry for the indexed destination register is updated to contain the
cycle time that it will next be available. This is the current cycle_count (with stalling
already added), plus the result_cycles
for this particular executed instruction, as defined in the decode_table entry passed
into the function.
The PC
is incremented to the next instruction (for branches this might be to a
different address), and the cycle_count
incremented by the value of the issue_cycles
for the instruction, as defined in the decode_table entry, that was passed into the function via
the 'p' pointer.
The 64 opcodes all have an entry in the tbl_p table, and point to a
function like that in the above example.
Testing
Test Platform
As has been mentioned above, an executable environment, cpumico32, is constructed
that instantiates the lm32 model, and provides sufficient control and
facilities to allow the model to be fully tested. This includes a command line
control interface for configuring the model and testing, as well as a set of
callback functions to allow testing of such things as interrupts etc.
Detailed discussion of the code is not undertaken here-the
code is not complicated, and inspection of the source should be sufficient-but
a brief description of the program's usage is given. The usage message for cpumico32 is as follows:
Usage: cpumico32 [-h] [-g] [-t] [-G <port #>] [-v]
[-x] [-d] [-D] [-I]
[-n <num>] [-b <addr>] [-r
<addr>]
[-R <num>] [-f <filename>] [-m
<num>] [-o < addr>] [-e <addr>]
[-l <filename>] [-c <num>] [-w
<wait states>] [-i <filename>] [-T]
-h Display this help message
-g Start up in GDB remote debug mode (default: off)
-t Specify TCP socket connection for GDB remote debug
(default: COM/pty connection)
-G Specify COM port to use for GDB remote debug
(default: 6)
-n Specify number of instructions to run (default:
run forever)
-b Specify address for breakpoint (default: none)
-f Specify executable ELF file (default: test.elf)
-l Specify log file output (default: stdout)
-m Specify size of internal memory in bytes (default:
65536)
-o Internal memory offset (default 0x00000000)
-e specify an entry point address (default
0x00000000)
-v Specify verbose output (default: off)
-x Enable disassemble mode (default: disabled)
-d Disable breaking on lock condition (default:
enabled)
-H Dump opcode statistics on termination (default: no dump)
-r Address to dump value from internal ram after
completion (default: no dump)
-R Number of bytes to dump from RAM if -r specified
(default 4)
-D Dump registers after completion (default: no dump)
-I Dump number of instructions executed (default: no
dump)
-c Set configuration word value to enable/disable
features
-w Set the number of wait states for internal memory
(default 0)
-i Specify a .ini filename to use for configuration
(default none)
-T Enable internal callback functions for test
(default disabled)
For the most part, the command line options map directly to
configuration options of the model's constructor, or configuration methods. The
options -m, -o, -e, -v, -x,
-d, -c, and -w all get mapped to the
constructor's inputs unmodified. The -l option specifies an executable filename, which cpumico32 opens for writing,
and passes the file pointer to the constructor. The option -g (plus modifiers -G and -t) runs the program in GDB
debug mode, which is discussed in the GDB Interface section.
The options -f,
-n, -b map to the first three
arguments of the lm32_run_program()
method (elf_name, run_cycles and break_addr respectively). The
exec_type and load_code inputs are
controlled internally by the program, with the exec_type value defaulting to LM32_RUN_FROM_RESET, but this can be overwritten
by a test program to change its type, via the memory callback function. The
program always loads the specified ELF program, but controls loading of the
code in case of a break, where lm32_run_program()
will be re-entered, but the code does not need reloading.
The options
-H,
-r,
-R, -D and -I all control the post-run
calling of debug data dumping. The option
-H
generates a histogram dump of the number that each of the sixty four opcodes have been executed.
The -r and -R
has the program dump memory values from internal RAM, via the lm32_read_mem() method, specifying
the start address and the number of bytes (always rounded to a whole word). The
-D option causes a call
to the lm32_dump_registers()
method, and -I a call to
the lm32_get_num_instructions()
method.
As mentioned before, the cpumico32 program has internal callback functions for the
three callback that can be registered with the model. These are specific to
testing the model, and can have side effects if non-test code is run.
Therefore, by default, they are not enabled. when testing, a -T option is specified to
enable them.
All of the above configuration command line options, and
additional configurations, can be set by using a .ini configuration file. The -i option is used to specify the .ini file to use. Values
specified in this configuration file override the default values but, in turn,
can be overridden by the command line option, allowing a mix of methods, and
final command line control. The default test .ini file, used by model testing,
is show below:
;
; INI file used for test. DO NOT EDIT!
;
[program]
filename=test.elf
entry_point_addr=0
[configuration]
cfg_word=0x11203f7
[debug]
log_fname=stdout
test_mode=false
verbose=false
ram_dump_addr=-1
ram_dump_bytes=0
dump_registers=false
dump_num_exec_instr=false
disassemble_run=false
[breakpoints]
user_break_addr=-1
num_run_instructions=-1
disable_reset_break=false
disable_hw_break=false
disable_lock_break=false
[memory]
mem_size=65536
mem_wait_states=0
mem_offset=0
[dcache]
cache_base_addr=0
cache_limit=0x0fffffff
cache_num_sets=512
cache_num_ways=2
cache_bytes_per_line=4
[icache]
cache_base_addr=0
cache_limit=0x7fffffff
cache_num_sets=1024
cache_num_ways=2
cache_bytes_per_line=4
The first five sections should be fairly self-explanatory,
and map to command line options. The two cache sections allow control of the
cache configurations that are passed into the model's constructor, which have
no command line equivalent, and so can only be modified from default settings
with a .ini
configuration file.
Callback Functionality
The cpumico32
program implements and registers three callbacks with the model in order to
allow full coverage in testing the model. It provides a means to generate
external interrupts, with a time to fire them, a means to alter the
configuration register to dynamically enable or disable hardware features, a
means to reset the model, changing the execution type as it does so, and to
test JTAG accesses.
These controls (except the JTAG) are implemented by memory
mapping 'registers' from location 0x20000000,
implemented in the memory callback function, with offsets defined in the source
code. An interrupt pattern can be written at offset COMMS_INT_PATTERN_OFFSET, along with a time
(relative to current time) at COMMS_TIME_OFFSET.
The interrupt callback function, when called will generate an interrupt if any
of the pattern bits are set, after the time set by a write to the COMMS_TIME_OFFSET register.
The individual bits of the configuration register (CFG) can be written to via
the next set of locally defined offsets (COMMS_NUM_INT_OFFSET to COMMS_WDOG_EN_OFFSET).
Note that when reading these locations returns the whole configuration register
value, not the individual bit. At an offset of COMMS_RESET_OFFSET, a write will reset the model, as if
the reset pin had been activated, and also set the local execution type
variable to whatever data value was written, to override the default.
The JTAG callback function implements a simply loopback
functionality. JTAG transmit register write loads a value to a local variable,
that can be read when the JTAG receive register is read (or the TX register
read).
With this functionality and configurability in cpumico32 all features of the
model can be exercised, and a set of assembler code tests have been constructed
to do just that, detailed in the Test Code section.
GUI for cpumico32
The package comes with a python based GUI for cpumico32 program
(as well as the lnxmico32
program, of which more below),
if configuring the command lines seems to complicated and cumbersome.
When run (e.g., on windows, python3 python/lm32.py), a window appears looking like
the following:
The script is
located in the python/
directory and is called lm32.py.
It uses Python 3, and the tkinter
and ttk modules, which
are usually bundled with the python package.
All the flags and arguments of the command line can be
controlled from this GUI. The windows will open with the default values of the cpumico32 program, and the
GUI is used to adjust these. It will check for valid inputs, and raise an error
if outside of prescribed limits The program to be run is selected from the menu
under 'File->Open ELF File.' or with the toolbar button , with the selection displayed at the
bottom in the 'Files' frame. The CFG register setting is shown in the
'Variables' frame, and can be updated like the other entries. In addition, to
alter this value, the box may be double-clicked with the mouse, and a new popup
appears, like that shown below:
The popup window has two tabs, with one for the binary flags
enabling or disabling features, whilst the send has values defining the number
of watch- and breakpoints, as well as the number of external interrupts.
Updating the flags and values automatically updates the value in the
configuration register box.
When configuration is complete, the 'Run' button on the toolbar menu can be
pressed, and it will execute a cpumico32
command with all the appropriate command line arguments. It is assumed that cpumico32 is on the search
path, or is in the directory from whence the script was run.
By default, the program will search for the cpumico32 executable on the
path, as indicated by the 'executable Dir'
box in the Directories area. This can be changed from the file menu ('File->Executable Folder.'), or from the
toolbar , to select a
particular folder containing an executable. This is useful to select between a
debug or a release development version, say. In addition, the program uses the
directory from which it was run as the working directory (as indicated by the
'Run Dir' box), but this can also be changed from the file menu ('File->Change Run Folder.'), or from the
toolbar . Any relative
references (such as for the program file, log file, etc.) are automatically
updated if the run directory is changed.
The output from the running the command is sent to a new
window, including the contents of any specified log file. The window will look
something like the following figure (where, in this example, registers and
number of executed instructions are dumped, and the contents of memory at 0xfffc are printed). The
first line in the window is the command that was executed, with the command
line options, as a reference for using in scripting etc.
If the Debug button is pressed instead of the Run button, then the
program is run with debug options (e.g. -g, -t,
-G), selected based on
whether the 'TCP gdb connection' flag is checked, and what 'Debug Port' number
configured.
The menu has a 'Mode' pulldown, where 'Fast Mode' can be
selected, or this can be toggled from the toolbar . This is used if running an executable that
has been compiled with LM32_FAST_COMPILE
(see Compile Options section below). This removes some of the options
available for the normally compiled program and so, when 'Fast Mode' is
selected, these options on the window are disabled and greyed out, making clear
which options can be configured.
Test Code
A set of assembler programs were developed and are provided
for execution on cpumico32,
that execute a range of self-tests to verify the model. In order to compile and run these
tests it is assumed that the LatticeMico System Development Tools and the Lattice Diamond
Design Software (required for the Mico tools) are downloaded and installed. These are
freely availabe from the Lattice Semiconductor website
under the "Support->FPGA Software Home" page.
These tests are all
directed tests, but cover nearly all aspects of the model including all
instructions and all exceptions. Each program lives in a solitary directory
under the directory test/<category>/
and each sub-directory has a single source file, test.s. These tests are self-checking and return a value 0x0000900d in memory location
0x0000fffc if the test
passes, or 0x00000bad
if it fails (if the program never terminates cleanly, then this value is
undefined-but is unlikely to be the pass value). Below is listed the features
covered by the tests, and the test directory that contains the test that covers
that feature.
Arithmetic
instructions
Instruction
Covering test location Status
add instructions/add/ Completed
addi instructions/add/ Completed
sub instructions/sub/ Completed
sextb instructions/sext/ Completed
sexth instructions/sext/ Completed
mul instructions/mul/ Completed
muli instructions/mul/ Completed
div* instructions/div/ Completed
divu instructions/div/ Completed
mod* instructions/div/ Completed
modu instructions/div/ Completed
Comparative instructions
Instruction Covering test location Status
cmpe instructions/cmp_e_ne Completed
cmpei instructions/cmp_e_ne Completed
cmpne instructions/cmp_e_ne Completed
cmpnei instructions/cmp_e_ne Completed
cmpg instructions/cmpg/ Completed
cmpgi instructions/cmpg/ Completed
cmpgu instructions/cmpg/ Completed
cmpgui instructions/cmpg/ Completed
cmpge instructions/cmpge/ Completed
cmpgei instructions/cmpge/ Completed
cmpgeu instructions/cmpge/ Completed
cmpgeui
instructions/cmpge/ Completed
Shift instructions
Instruction Covering test location Status
sl instructions/sl/ Completed
sli instructions/sl/ Completed
sr instructions/sr/ Completed
sri instructions/sr/ Completed
sru instructions/sr/ Completed
srui instructions/sr/ Completed
Logical instructions
Instruction Covering test location Status
and instructions/and/ Completed
andi instructions/and/ Completed
andhi instructions/and/ Completed
or instructions/or/ Completed
ori instructions/or/ Completed
orhi instructions/or/ Completed
nor instructions/or/ Completed
nori instructions/or/ Completed
xor instructions/xor/ Completed
xori instructions/xor/ Completed
xnori instructions/xor/ Completed
xnor instructions/xor/ Completed
Branch instructions
Instruction Covering test location Status
be instructions/branch_cond/ Completed
bne instructions/branch_cond/ Completed
bg instructions/branch_cond/ Completed
bgu instructions/branch_cond/ Completed
bge instructions/branch_cond/ Completed
bgeu
instructions/branch_cond/ Completed
b instructions/branch_uncond/ Completed
bi instructions/branch_uncond/ Completed
call instructions/branch_uncond/ Completed
calli instructions/branch_uncond/ Completed
Memory access
instructions
Instruction Covering test location Status
lb instructions/load/ Completed
lbu instructions/load/ Completed
lh instructions/load/ Completed
lhu instructions/load/ Completed
lw instructions/load/ Completed
sb instructions/store/ Completed
sh instructions/store/ Completed
sw instructions/store/ Completed
Control/Status access
instructions
Instruction Covering test location Status
rcsr
instructions/csr/ Completed
wcsr instructions/csr/ Completed
* Note that the div and mod instructions are listed
in the instruction table in the LatticMico32 Processor Reference Manual [1],
but are not documented in the instruction descriptions. They are not supported
in the GNU assembler either. The implementation in this ISS implementation assumes
signed arithmetic and the tests use '.word <opcode>' to insert the instruction into the
test that the assembler won't recognise and compile.
Exceptions
Exception Covering test location Status
reset exceptions/instruction/ Completed
divide by 0 exceptions/instruction/ Completed
system call exceptions/instruction/ Completed
break instr exceptions/instruction/ Completed
DC.RE exceptions/instruction/ Completed
ext interrupt exceptions/external/ Completed
rsrvd
exceptions/ibus_errors/ Completed
instr bus err exceptions/ibus_errors/ Completed
disable instr exceptions/ibus_errors/ Completed
data bus err exceptions/dbus_errors/ Completed
hw
breakpoint exceptions/hw_debug/ Completed
hw
watchpoint exceptions/hw_debug/ Completed
Reset Event exceptions/hw_debug/ Completed
ISS user API testing
Test Covering test location Status
Mem callback exceptions/external/ Completed
Int callback exceptions/external/ Completed
Instr count api/num_instr/ Completed
Run-time ctrl covered in cpumico32.cpp Completed
HW
debug ctrl covered in cpumico32.cpp Completed
State access covered in cpumico32.cpp Completed
Re-entrance covered in cpumico32.cpp Completed
Extnl breaks covered in cpumico32.cpp Completed
Executing Tests
The tests are all run via a 'runtest.sh' script that lives in the test/ directory. Changing
directory to 'test/'
and running 'runtest.sh'
will execute all the tests, giving a pass/fail criteria for each, with a
summary at the end. An easier way to execute the tests, especially when doing
coverage measurements, is to use the makefile. When building code, a command 'make test' will get the build
up-to-date, and then run the test script. The tail end of the output should be
something like that shown below:
.
.
Running test exceptions/dbus_errors
PASS
Running test exceptions/hw_debug
PASS
Running test api/num_instr
PASS
Tests run : 24
Tests pass: 24
Tests fail: 0
The test script runs cpumico32 with arguments of '-T -r 0xfffc', but additional arguments can be
added by setting the environment variable CPUMICO32_ARGS. This must contain a string of valid cpumico32 arguments but, even
when valid, it is not guaranteed that testing will pass for all possible
argument combinations, so use with care.
Coverage
Coverage for the self-tests was performed using gcov and lcov,
with support in the makefile. Excluded from the coverage was any disassembler
or debug output code as, although this can be covered to a level of 100%, it
cannot be verified in an automatic self-test, and it is does not affect the
accuracy of the model. Similarly, the cpumico32 top level code was not included, as this is a
test/example program, and not part of the model. The core files covered were
thus:
•
lm32_cpu.cpp
•
lm32_cpu.h
•
lm32_cpu_inst.cpp
•
lm32_cache.cpp
•
lm32_cache.h
•
lm32_cpu_elf.cpp
The diagram below shows the LCOV report generated by
executing the following commands:
make clean
make COVOPTS="-coverage" test
make coverage
The report generated is created in the directory cov_html/src, and accessed
via index.html.
In order to obtain a goal of 100% coverage, some waivers on
lines of code were needed on unreachable lines of code. The exceptions followed
6 broad categories, detailed below:
Checks on parameters etc. that should never fail, and are
meant as debug aids for invalid calls from elsewhere in the code, or to protect
against invalid values from the API. The following waivers are of this type.
method
|
coverage waiver
|
lm32_set_verbosity_level()
|
invalid verbosity level
|
lm32_set_hw_debug_reg()
|
invalid debug register type
|
lm32_read_mem()
|
invalid read access type
|
lm32_write_mem()
|
invalid write access type
|
interrupt()
|
invalid exception ID
|
lm32_rcsr()
|
Invalid CSR register index
|
lm32_wcsr()
|
Invalid CSR register index
|
lm32_cache()
|
Parameter checks
|
Memory allocation failures that should never happen. Mostly malloc() calls, where a
failure here would indicate a system level problem. The following were waivered
on this basis:
method
|
coverage waiver
|
lm32_write_mem()
|
memory allocation, and error handling
|
lm32_run_program()
|
memory allocation failure
|
Code associated with disassembled and debug output, cannot
be self-tested. Coverage is possible, but not meaningful. lm32_cpu_disassembler.cpp was
wholly excluded, but calls to disassembler methods from the core functionality
still needed waiving:
method
|
coverage waiver
|
execute_instruction()
|
call to disassemble() function and pc update in disassemble mode
|
lm32_run_program()
|
break on disassemble run
|
User defined break address handling actually terminates the
program, and so cannot be self-tested. The feature is for debug purposes only,
in any case, so a waiver was added:
method
|
coverage waiver
|
lm32_run_program()
|
user break address trap
|
File exception handling also terminates the program, and has
no meaning in terms of model accuracy:
method
|
coverage waiver
|
read_elf()
|
file opening exception handling
|
read_elf()
|
unexpected EOF handling
|
read_elf()
|
program load overflowing memory
|
ELF file checks only fire with an invalid ELF executable
file, and don't affect the model's accuracy for executing a valid ELF file, and
so the checks were waivered:
method
|
coverage waiver
|
read_elf()
|
all ELF header checks
|
Creation of cache without parameters is not done in the
testing, as the tests exercise the various settings (including the defaults) by
explicitly setting the cache configurations.
method
|
coverage waiver
|
lm32_set_configuration
|
default parameters on cache object creation
|
With the above waivers in place, the three listed files,
containing all the methods, bar the disassembling, have 100% coverage.
Not Covered
Despite the 100% measured coverage metrics, there are some
aspects of the model as yet uncovered by formal testing.
•
Various internal memory sizes (all tests run with the default 64K
RAM)
•
Running multiple instances of the model
•
Accuracy of the timing model against a known good reference
GDB Interface
A GDB interface is available for connecting the model with a
GDB session via a remote target using a serial connection or a TCP socket. It
does this, when configured, by opening a pseudo-terminal, and advertising the
device path, or opening a TCP socket on a given port and advertising the port
number, which can then by used by GDB to connect to the model using its command
target remote <device>.
With this arrangement the model then looks like a hardware system connected to
the host via a serial interface or TCP/IP connection. One important difference
is that the debugging is non-intrusive on the model. Normally for hardware
systems connected via a serial port a 'stub' is required to be compiled with
the program ([6], section 20.5), along with some user supplied routines that
know about the local serial interface protocols etc, that intercept interrupts
and communicate with GDB to affect the debugging functionality. With this
implementation, with the advatages of visibility within the model, the code
being debuggd does not need to be modified with an additional stub.
As mentioned elsewhere, it is expected that the Lattice
Semiconductor tool chain for the LatticeMico32 processor is available to use
the GDB facility. In particular, the GDB program that must be used is lm32-elf-gdb.
The GDB interface is supported under both Linux and Windows
(not tested under Cygwin), but uses slightly different methods between the two
for the serial connection (the TCP method is common to both). For Linux it relies
on the built-in pseudo-terminal facilities, which are simple to utilise,
whereas on Windows third-party code or the implementation of virtual ports and
null modems is required. The com0com
project has been used to test the windows implementation of the interface,
requiring the code to only open a COM serial port, and have the GDB session
connect to a paired COM port, with the two communicating via a NULL modem
model.
Supported Features
The implementation supports only a subset of the possible
GDB commands that can be sent over a remote interface, but it supports more than
the minimum required ([6], section E.1).
·
Register reads and write (P, p,
G and g packets)
·
Memory reads and write (X, M
and m packets)
·
Single thread control (s and c
packets)
·
Session termination (k and D
packets)
·
Hardware breakpoints (Z1 and z1
packets)
·
Hardware watchpoints (Z[2-4] and z[2-4]
packets)
In addition, soft breakpoints are implicitly supported via
the memory read and writes, as GDB substitutes the instruction at the break
point by reading (and storing away) the original instruction, and writing a break instruction in its
place. The breakpoint interrupt is rasied when the break instruction is
reached, just as for hardware breakpoints, and the model returns control to the
GDB interface. Before resumption, the original instruction is restored with a
further write to memory. Soft breakpoints, in reality, can only work in code
that is in volatile memory, where the instructions may be overwritten. In the
model all code resides in 'memory' that cab be overwritten, and so this
technique can be used on code, even if it is destined for ROM of Flash.
Currently only support for single threaded code is
implemented, though this may change in the future.
Usage
The GDB interface is activated using the command line option
-g, and is available on
both the cpumico32 and lnxmico32 compiled programs.
The -G
option is available to run in debug mode and specify COM or TCP port to use, if
different from the default. The -t
option selects the use of TCP rather than serial remote connection to GDB.
Specifying any of these three options enables debug mode.
The interface itself is not part of
the lm32_cpu class, as
it it needs to sit on top of the model to control it. It is implemented in the
files lm32_gdb.cpp and lm32_gdb.h, and any new
projects must include the header and compile in the source code in order to use
it. A single function constitutes the API:
int process_gdb (lm32_cpu* cpu,
int port_num
= LM32_DEFAULT_PORT_NUM,
bool
tcp_connection = false);
The function expects (at least) a single argument that is a
pointer to an lm32_cpu
object, which has been created and configured prior to calling the function.
The configuration can include the loading of a program in to memory, though
this would normally be done from within the GDB session. In the lnxmico32 implementation, for
instance, the loading and configuring of the µClinux code is skipped when in
GDB mode. A second argument specifies either a COM port or a TCP port number to
use, depending on the value of the third argument, which is a flag to use a TCP
socket (when true) rather
than a serial port (when false,
the default). For Linux, if not using TCP, the port number can, and should, be
ommited, and it can also be ommited from Windows if the default port number is
that required.
When run (e.g. cpumico32
-g, or cpumico32.exe -G6,
lnxmic032 -t -G49152) the
function will not return until it either losses attachment to a GDB session, or
some error has occurred. The function returns 0 for a normal termination,
otherwise an error occurred. Once called, for TCP mode, the function opens a
TCP socket with the given port number and prints out the port details with a
message. E.g.:
LM32GDB: Using TCP port number: 49152
The GDB session uses this port number to connect as <hostname>:<port#>.
The hostname can be a host on the network if connecting from a different
computer, or localhost,
if running on the same machine. Actually, of the host name is ommited (but not
the colon), this defaults to being the local host. E.g. from the gdb session:
(gdb) target remote :49152
Similarly, when in serial mode running in Linux, the function
opens a pseudo-terminal and prints out the device created. E.g.:
LM32GDB: Using pseudo-terminal /dev/pts/14
Note that this terminal can change between sessions, so it
must be noted for the GDB session that must connect to it. Something similar
happens for Windows in serial mode, but the terminal is fixed, as specified
with -G, or the default
value. The Windows message looks something like the following:
LM32GDB: Using serial port /dev/ttyS6
Despite opening a COM port, the value needed by GDB still
needs to be of the form of a Linux style device file, as the program uses
Cygwin under the hood. With com0com,
the device needs to be the paired COM port with that being used by the GDB
interface. The ttySn
numbering, however is one less that the windows COM port numberings, so ttyS6 refers to COM7. With com0com installed and a COM
port pair added, the device manager on windows shows the port pairs simlar to
that shown below:
In the example above, the two ports are COM6 and COM7. The
GDB interface must use the lower number of the paired port (e.g. COM6), as it
advertises the tty equivalent to the COM port that is one value above it (e.g
COM7 mapping to /dev/ttyS6).
A typical session might look like the following. In this
example one of the test programs is used as the code to debug, and compiled
with symbols include
lm32-elf-as -g test.s -o test.o
lm32-elf-ld test.o -o test.elf
Assuming that the cpumico32 is run with the -g option (and any other appropriate options, such as -TDIv, say) and has displayed
the pseudo-terminal path, as above, then the GDB session, in separate terminal,
can be run thus:
lm32-elf-gdb test.elf
GNU gdb 6.8
Copyright (C) 2008
Free Software Foundation, Inc.
License GPLv3+: GNU
GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free
software: you are free to change and redistribute it.
There is NO
WARRANTY, to the extent permitted by law. Type "show copying"
and "show
warranty" for details.
This GDB was
configured as "--host=i686-pc-linux-gnu --target=lm32-elf"...
(gdb) target
remote /dev/pts/14
Remote debugging
using /dev/pts/1
main () at test.s:37
37
xor r0, r0, r0
Current language:
auto; currently asm
(gdb) load
Loading section
.text, size 0xec lma 0x0
Start address 0x0,
load size 236
Transfer rate: 1888
bits in <1 sec, 236 bytes/write.
(gdb) hb _ok7
Hardware assisted
breakpoint 1 at 0xc8: file test.s, line 109.
(gdb) b _finish
Breakpoint 2 at
0xe4: file test.s, line 122.
(gdb) c
Continuing.
Program received
signal SIGTRAP, Trace/breakpoint trap.
_ok7 () at
test.s:111
111
sexth r2, r2 # Sign extend
(gdb) s
_ok7 () at test.s:112
112
addi r3, r1, TEST_VAL10
(gdb) c
Continuing.
Breakpoint 2,
_finish () at test.s:123
123
sw (r31+0), r30
(gdb) s
_end () at
test.s:125
125
be r0, r0, _end
(gdb) x /wx
0xfffc
0xfffc: 0x0000900d
(gdb) detach
Ending remote
debugging.
(gdb) quit
In the above run, the compiled code (test.elf) was debugged, as specified on the
command line to the GDB program. Connection was established to the already
running cpumico32
program in GDB debug mode with the target remote /dev/pts/14 command. The program is loaded
into the cpumico32
memory with the load command. We could have loaded the program when running cpumico32, but there is a
chance that the program loaded by cpumico32
and that for the gdb session, from whence it gets its symbol information, might
mismatch. So loading from gdb ensures that the running program and the synbol
tables come from the same source.
Two breakpoints were set; one at _ok7 with a hardware breakpoint (hb _ok7) and another at _finish using a soft
breakpoint (b _finish).
Execution is started using the 'continue' command (c)—as this is a remote debug sesssion, gdb
assumes the target is running already, paused waiting for commands. When the
target reaches the first breakpoint GDB halts and displays the source line of
the breakpoint. Next the program is stepped (s) to the next instruction, before another 'continue'.
The program breaks again at _finish,
and another single step taken to complete the program. Memory is inspected at
location 0xfffc (which
contains the pass/fail code) using x /xw 0xfffc. Since 0x0000900d is displayed, the test passed. Detaching from
the remote system (detach)
ensures a clean exit by cpumico32,
which terminates.
The above example shows the main points in using GDB to
control and debug the model and a program running on it but, of course, there
are many more features that can be used by GDB that are not shown in the above
simple session. Any IDE using the relevant GNU toolchain, including GDB, can be
used as a front end to the interface, giving full development capabilities to
the model. The LatticeMico System Development Tools use Eclipse, and have
remote target capabilities, normally used to connect to a development system. Details
of setting this up are beyond the scope of this document, but the diagram below
summarises the connections between an IDE (such as Eclipse), the GDB debugger
application (for lm32) and the model's GDB interface, via the alternative
connections of either a TCP socket or serial connection (using a
pseudo-terminal for Linux, or the com0com application in windows).
Support for running in debug mode from the python GUI for cpumico32 is available. An
additional 'Debug' button , next to the 'Run' button is added to the toolbar.
Using this in place of 'Run' executes the model in debug mode, and the
pseudo-terminal/port information is printed on stderr on the terminal from which the python script was
run. A flag to choose between serial or TCP connections is also added to the
flags section. A GDB remote debug session can connect this as described above.
Compile Options
By default, when cpumico32 is compiled, it has the behaviour as described
in the previous sections. However, it can be compiled with various definition
in order to modify it's behaviour. There are, presently, two conditional compile
definitions that can be set:
- LM32_FAST_COMPILE : Removes compiling of much code
that is not strictly necessary for execution to improve execution speed,
but removes many features. (Used in files lm32_cpu.cpp, lm32_cpu_inst.cpp, lm32_get_config.cpp.)
- LM32_MMU : Adds in the MMU functionality. By default the model
is as specified by the LatiiceSemiconductor reference, but defining this adds
in the TLB functionality.)
- LNXMICO32 : Removes some command line options and
features not supported by the lnxmico32 program (see the case study in the next
section), and a test for instruction type when disassembly, since lnxmico32 loads code
differently. (Used in files lm32_cpu.cpp,
lm32_get_config.cpp.)
The LM32_FAST_COMPILE definition is used to remove
as much code from the model, whilst still retaining a viable simulation, in
order to maximise the execution speed. To this end, the following features are
removed:
•
Break point specification removed
o
Removes command line options -n, -b,
-d
o
Can still break from interrupt callback
•
Memory wait state specification
o
Removes command line option -w
•
Disassemble mode
o
Removes command line option -x
•
Removes timing accuracy
o
No cache modelling of timing
o
Callback timing information is ignored
o
Pipeline stalling not calculated
•
Memory access alignment checks are disabled
•
Memory stats information is not logged (memory tagging removed)
•
No watchpoint support
•
No hardware breakpoint support
With the
compile definition, an additional memory restriction applies-memory size must
be a power of 2, as the range masking requires this to avoid using a division.
The LM32_MMU definition,
when defined duriong compilation, includes all
the code for basic MMU support, as added by the M-labs MilkyMist project [7].
Including this code, by default, does not change the model's functionality,
as the reset state of the MMU logic is to disable the TLBs and their
translation of addresses. However, additional checks are made when the
logic is present (such as checking the state of the disable bits),
and so the model runs marginally slower if included. If the MMU is not required,
then the code can be compiled without defining
LM32_MMU. Three additional
CSRs appear when the MMU is included, to control the TLBS:
TLBPADDR,
TLBVADDR and
PSW. The first of these
(TLBPADDR) is write only, and
overloaded with a read-only register,
TLBBADVADDR, and they share the same CSR ID [7].
The LNXMICO32 definition simply
removes some further command line options and functionality not needed by the lnxmico32
case study program
described in detail in the next section. The configuration code (lm32_get_config.cpp) is
shared between the lnxmico32
and cpumico32 programs,
and the majority of the uses of the definition are found in this file. With the
definition active, the -T
option is not used, as the test callbacks are not implemented in lnxmico32, which has its own
callback functions, used to model peripherals. In addition a -V option is added, but only
if LM32_FAST_COMPILE is
not defined as well, which acts like -v, but allows specification of a cycle time when
verbosity is activated. The LNXMICO32
program loads binary images directly to memory, rather than load an ELF file,
so the -f option is
also removed.
One other use,
modifying lm32_cpu.cpp
code, is to remove the test to see if a memory location is an instruction
before disassembling it. The lnxmico32
program loads code as binary data via the lm32_write_mem() function, and thus does not label it as
instruction data. In order to debug the code, the check is suspended for this
compilation option.
Case Study: An Embedded Linux System
A case study in the usage of the model is given here,
inspired by Reginaldo Silva's Javascript Emulator,
in order to demonstrate the features and extensibility of the ISS. A basic,
non-mmu Linux system is put together, using the u-boot and µClinux ports to
lm32. A minimal system is put together in order to be able to boot the Linux
OS, with a light weight Unix environment provided by BusyBox, targeted at
embedded platforms.
The diagram below shows the general system layout. It
consists of the mico32 model, with two UARTs (UART0 and UART1) and a timer
(TIMER0). These are modelled as part of the lnxmico32 environment (see lnxuart.cpp and lnxtimer.cpp), and model the Lattice IP implementations
for these functions.
The top level for the system is
called lnxmico32, and has
a top level source file lnxmico32.cpp.
This instantiates mico32 model, shown as the lm32 and RAM boxes in the diagram.
It registers its own callback functions for both the external memory accesses
(using the model's API method lm32_register_ext_mem_callback())
and interrupts (using lm32_register_int_callback()). The
callbacks handle the register accesses to the peripherals, along with the
'ticking' and passing back interrupt status.
When run, the two binary files, vmlinux.bin and romfs.ext2, are expected to be in the directory from
which the program is executed-as is, by default, a configuration file, lnx.ini (see next section).
The binary images for the
u-boot/µClinux and RAM filesystem are loaded to memory (at 0x08000000 and 0x08400000, respectively),
and then memory is updated for hardware setup configuration, and an initial
boot command string. The simulation can then be started, and the system boots,
sending output characters, via UART0, and, once booted, accepting keyboard
input to allow logging in and issuing of commands in the shell (msh-minimal shell). After
boot the screen will look something like the following:
To login to the system, login as root, with a password of lattice. To exit from the program, from
anywhere, type #!exit!
and press enter.
Configuration
The lm32 model
The mico32 model must be configured correctly for the system
to boot properly and, by default, the program, will look for a configuration
file lnx.ini in the
directory from which it is run. This can, of course, be overridden with the -i command line option. The lnxmico32 program shares a
number of command line options of the cpumico32 program, (indeed, it shares common
configuration code). The full usage message for the lnxmico32 program is as follows:
Usage: lnxmico32 [-D] [-I] [-r <addr>] [-R
<num>]
[-l <filename>] [-c <num>] [-i
<filename>] [-s <filename>] [-S] [-L]
-l Specify log file output (default: stdout)
-r Address to dump value from internal ram after
completion (default: no dump)
-R Number of bytes to dump from RAM if -r specified
(default 4)
-D Dump registers after completion (default: no dump)
-I Dump number of instructions executed (default: no
dump)
-c Set configuration word value to enable/disable
features
-i Specify a .ini filename to use for configuration
(default none)
-s Specify .sav filename (default lnxmico32.sav)
-S Save state on exit (default no save)
-L Load saved state before execution (default no
load)
The provided lnx.ini
options file, for the most part specifies default, but does set the
configuration word to a specific value that represents the minimum
configuration for lnxmico32
functionality. The file looks like the following:
;
; INI file used for lnxmico32. DO NOT EDIT!
;
[configuration]
cfg_word=0x00003017
[debug]
log_fname=stdout
ram_dump_addr=-1
ram_dump_bytes=0
dump_registers=false
dump_num_exec_instr=false
[state]
save_file_name=lnxmico32.sav
save_state=false
load_state=false
; When LM32_FAST_COMPILE not defined
; verbose=false
; disassemble_run=false
; [breakpoints]
; user_break_addr=-1
; num_run_instructions=-1
; disable_reset_break=false
; disable_hw_break=false
; disable_lock_break=false
; [memory]
; mem_wait_states=0
; [dcache]
; cache_base_addr=0
; cache_limit=0x0fffffff
; cache_num_sets=512
; cache_num_ways=2
; cache_bytes_per_line=4
; [icache]
; cache_base_addr=0
; cache_limit=0x7fffffff
; cache_num_sets=1024
; cache_num_ways=2
; cache_bytes_per_line=4
Some of the options (those commented out) are only available
if lnxmico32 is not
compiled with LM32_FAST_COMPILE
defined, which disables disassembling, breakpoints, memory wait states and
cache timing simulation. These can be reinstated when compiled without the
definition, but will cause a warning if uncommented when compiled with it
defined.
Using the GUI
As for cpumico32, the python GUI
lm32.py
(see GUI for cpumico32 section above) has support for
lnxmico32 to configure the model.
The main display defaults to
cpumico32, but tabs at the top
select between this and lnxmico32.
When lnxmico32 is selected, the layout
is similar to that for cpumico32
(and most options are common), but with some changes relevant to this program. The main differences
are the save state and load state flags (in place of the internal callback enable), the deletion of
the memory configuration entries (it is fixed in this model), and the specification of the the
state save/load file with a browse button (in place of the program file entry—an informative only entry).
It should be noted that the common flags and entries on each tab are a view onto the same data.
That is, if a flag or entry is changed in one view, it will have been changed in the other view as well.
Just like for the cpumico32 tab,
selecting 'fast mode' from the menu will disable and grey out those options that are no longer
relevant. When the script is run, and the lnxmico32
tab is selected, the window looks something like that shown below:
The system software
To configure boot and OS software before running, three
things need to happen:
- A hardware setup table must be constructed at 0x0BFFE000
- A boot command line string placed at 0x0BFFF000
- GP registers 1 to 4 pre-charged with addresses for the
above two entries, plus the addresses of the romfs.ext2 image start (0x84000000) and end (0x84000000 + file
length)
The hardware setup table consists of a consecutive list of
structures, with one for the CPU, the memory, the two UARTs and the timer. This
is followed by a termination structure. Each structure has a similar format
struct
{
uint32_t length;
uint32_t id;
.
.
<specific payload>
.
.
}
The length gives the size of the entry (including the length
bytes), and the ID is a unique number. The payload for each of the entries also
have similar structures (except the terminator), with a 32 byte string array
containing the name of the instance (which can be shorter, but not longer, than
32 bytes), followed by parameters for the particular device. The terminator is
just a length (8) followed by an ID of 0, with no payload.
For the CPU ("LM32")
the payload is simply a 32 bit number for the clock frequency, in Hz. The
memory ("ddr_sdram"),
has a parameter for the base address, followed by the size in bytes. The timer
("timer0") has a 32
bit word for the base address, then four bytes for a write tick count, read
tick count, start/stop/control and counter width. A following 32 bit word
specifies the number of reload ticks, and the a byte giving the interrupt
number (i.e. which of the 32 bit external interrupt pins it is connected to).
The structure is then padded to a 32 bit boundary with bytes of 0 value.
The UARTs ("uart0"
and "uart1") have a
base address and baud rate parameters (both 32 bits), followed by 8 bytes for
number of data bits, number of stop bits, interrupt enable, block on transmit,
block on receive, RX buffer size, TX buffer size and its interrupt number. More
information can be found in Appendix A of the "Linux Port to LatticeMico32
System Reference Guide" [3].
These hardware setup structures are written consecutively to
memory, starting at 0x0BFFF000,
in the order, CPU, memory, timer0, uart0, uart1 and the terminator.
The
command line string is used as the u-boot command arguments when starting the
system. For lnxmico32,
this is
root=/dev/ram0
console=ttyS0,115200 ramdisk_size=16384
Finally, the general purpose
registers GP1 to GP4 are pre-charged with four addresses, based on the above
configurations. If these were invariant, then a small assembler program could
be added in memory that set these values and then jumped to the system entry
point, with the initial entry point being this initialisation program. However,
to ease modification to system parameters, these are written directly, using
the lm32 model's lm32_set_gp_reg()
method. GP1 to GP4 are set to have the following addresses: The h/w setup base
address, the command string base address, the RAMFS load start address, and the
RAMFS load end address + 1.
Having
configured the system, the execution of the code can begin.
Use of Callbacks
The lnxmco32 system registers two callbacks with the
lm32 model; one for memory accesses (ext_mem_access()) and one for ticking/interrupt
generation (ext_interrupt()).
The first of these (ext_mem_access()) intercepts
all memory accesses to the peripherals-the timer and the two UARTs. It
separates out the address passed in by the lm32 model into a page address (in
this case a 4KB page), and offset within that page. If the page address matches
the base address of one of the peripherals, it processes the address, otherwise
it simply returns with a LM32_EXT_MEM_NOT_PROCESSED status, informing the
model it must handle this access.
All the peripheral models for
the lnxmico32 system
provide three functions: a read function, a write function and a tick function
(see next section). The memory callback function is called with an address and
an access type. If the access type is LM32_MEM_WR_ACCESS_WORD, then the selected peripheral's
write function is called with the offset address and data value. If not, the
read method is called with the offset address and a pointer to the data
variable in which to return the read value.
When the address is matched by
the callback, a processing time is returned, so that the lm32 model can advance
time accordingly with the delay of the access. In lnxmico32, this defaults to 1 cycle for all
peripheral register accesses.
The tick/interrupt callback
function (ext_interrupt())
is called regularly by the lm32 model, with a timestamp. The callback returns
an interrupts status in a 32 bit word, with each bit representing an external
interrupt pin on the mico32 processor, of which there are up to 32. Each time
the function is called, it calls each of the tick functions for the
peripherals. For the timer this is a function that simply takes the time as an
argument, and returns true or false to indicate whether it is interrupting or
not. For the UARTs, they take additional parameters to return termination
request status, indicate whether it is the keyboard UART and its context. The
context is needed, as the function's code is common to all UART instantiations,
but supports up to 4 different contexts, and identifies whether UART0 or UART1
calls. Like the timer tick function, the UART tick function returns a Boolean,
indicating interrupt request status.
The callback function ORs
together all the interrupt statuses of the peripherals, which is returned when
the function exits. The termination request statuses of the UARTs are also combined,
and if a UART is requesting termination, the value returned in the wakeup_time pointer is set as
LM32_EXT_TERMINATE_REQ,
indicating to the lm32 model that an external termination is active. If no
termination, a wakeup time of the current time plus LM32_INTERRUPT_GRANULARITY (1000 cycles in this
case), is returned. This means that the tick function will not be called for at
least 1000 cycles (plus a bit to, say, complete an instruction). This might
be set to 1, so it is called after every instruction, but the rate of activity
for these peripherals is not that high, and would produce unnecessary
processing overhead. A granularity of even higher might be possible, but 1000
seems to give little noticeable overhead in processing speed.
Peripheral Models
There are two peripheral models implemented for the
lnxmico32 system: a timer and a UART. Both provide three interface functions:
void lm32_<peripheral>_write
(const uint32_t address,
const uint32_t data,
const int cntx
= 0);
void
lm32_<peripheral>_read (const uint32_t
address,
uint32_t* data,
const int cntx = 0);
bool
lm32_<peripheral>_tick (lm32_time_t time,
[
bool
&terminate,
const bool kbd_connected = false,
]
const int cntx = 0);
The two register access functions are fairly self
explanatory, with an address, and either a data value passed in for writes, or
a pointer passed in for returning the read value. All accesses are for 32 bit
words, as all the registers for the peripherals are aligned to 32 bit word
boundaries.
The tick functions all have a time parameter input, and a context value (cntx), for selecting the
particular instance of the peripheral (up to 4 of each), with a default context
of 0, allowing this parameter to be omitted of only one peripheral
instantiated, as for the timer in lnxmico32.
The UART model also has a terminate
parameter (passed by reference) and a kbd_connected input flag. The terminate parameter allows the UART to request
termination of execution by setting this parameter to true. This done if the user types a certain
sequence of characters as input to the UART, flagging indication of
termination, and allowing the program to exit cleanly with any post processing
requirements, such as dumping registers etc. The keyboard flag input enables
internal processing of keystrokes as RX data to the UART. As only one UART can
process the single keyboard, this flag enables the nominated UART to process
key stroke inputs, whilst the others ignore them.
The timer model simulates the behaviour of the Lattice timer
IP [4], implementing the four registers to control operation, the counter and
the generation of an interrupt, when the counter reaches zero. The counter can
continue or stop, depending on the register control settings.
The UART model simulates the behaviour of the Lattice UART
[5], implementing the 8 registers. Transmission modelling is the much simpler
functionality. When the processer writes a byte to the transmit holding
register, the byte is passed straight on to stdout, via a putchar() call. However, timing is simulated, and the
status bits are cleared to indicate that there is no space for another byte.
The timer model's tick function notes the time when transmission status goes
active, and counts until the transmission time has passed, before setting the
status back to allow further transmission. The time calculated is a function of
the configured BAUD rate and the clock frequency, and assumes a start, parity
and stop bit, on top of 8 bits of data (i.e. 11 bits). When the status bits
indicate that the TX buffer is empty once more, if enabled, the model generates
an interrupt, which is returned as a true status when the tick function exits.
For keyboard input, each time the tick function is called
(and the kbd_connected
flag is set), the model checks if a key has been pressed, and fetches the byte
value if it has. This is placed in the RBR register and the "data received"
status set. If interrupts are enabled for a data reception, a interrupt status
of true is returned by
the tick function. As well as simulating keyboard input for the system, the
model also monitors the input to detect for a specific sequence of key strokes.
If the input sequence matches a particular string ("#!exit!<enter>"), it sets, to true, the terminate parameter passed in
(by reference) to the tick function call. The system model can choose to ignore
this, but in lnxmico32,
this request is passed back to the lm32 model to terminate its execution as a
user breakpoint.
Save and Restore
The lnxmico32
program has the ability to save the state of the system on exit, and to reload
that state on re-running to restore the running system to exactly where it was
upon exit. In order to do this, the state of the CPU, memory and of all the
peripherals must be saved and restored. The lm32 model already has two methods
to support this, as documented previously:
•
lm32_get_cpu_state()
•
lm32_set_cpu_state()
These two methods transfer state within a single object of
type lm32_cpu::lm32_state.
The CPU model has been designed so that it keeps all relevant internal state
within a structure of this type, and it is returned or set as a single
structure, so that the calling program need not know what state to save, or
even its details. The returned state can be pointed to by a pointer to bytes,
and saved as a byte stream to the size of the type. The CPU state does not
contain the contents of the memory, as this can be large and needs to be
handled externally, using the read/write methods of the CPU model, for optimal
handling (more below).
The UART and timer peripherals
of lnxmico32 have followed
the same practice as the CPU model, and have single structures containing all
the relevant state (with defined, and exported, types lm32_uart_state_t and lm32_utimer_state_t), and methods for retrieving
them and restoring them:
•
lm32_get_timer_state()
•
lm32_set_timer_state()
•
lm32_get_uart_state()
•
lm32_set_uart_state()
The state returned by the
peripherals contains the state for all the contexts that the peripheral code
supports, and not just a single context whether the context has been used or
not. This simplifies the save and restore, and makes the returned data size
fixed in all cases. The amount of data is small (compared to the memory image),
and so has little overhead. Since the data is fixed size, it is saved
completely raw, with no additional information, such as size or target
peripheral. The format of the .sav
file has a fixed order of data (see below), so that on restore it is known what
data is expected next, and its size inferred.
Saving of memory
The memory of the lnxmico32 program is fairly large at 64MB. A large
proportion of this contains the invariant boot and OS programs, and the
filesystem ROM image. As these can be reloaded on restore runs, the data in
these locations need not to be saved. To make saving of data more efficient,
the memory callback function, which is called for all CPU memory accesses,
monitors for any write access to the RAM. The RAM, for this purpose, is divided
up into 1K pages and a tag kept on each page, setting a flag to true if a write
access is made on that page. The size of the page is a compromise between
granularity of save data and the size of the tag array. This can be altered
without change the function's integrity.
When a save operation is performed, only the pages in RAM
that have had a write access are saved. Since the tag array is cleared after
that OS and FS are loaded, only subsequent writes are recorded and the pages
saved. Since the page sizes are fixed, only the address of the page needs to be
pre-pended to the data. This is saved first as four bytes, with MSB format. By
forcing this format, the data is independent of the host, and its byte ordering,
on which it is run. The data, then, is saved as a set of consecutive addresses
and 1K binary data images.
The File Format
As mentioned above, the peripheral data is fixed size. Also,
the CPU state comes as a fixed size object. The RAM data is a dynamic set of
pages, which will vary from save to save. The file format was chosen to place
all the fixed sized data first, and then followed by the RAM data, avoiding the
need to delimit the fixed sized data, though the order becomes fixed. The
format of the .sav file
is as shown below:
Control
By default the state is saved to a file lnxmico32.sav. This can be
changed to a different file name by using the -s command line option, or the save_file_name parameter in the [state] section of the .ini file. Saving of state is
enabled with the -S command
line option or save_state,
and loading of previously saved state is enabled with the -L option or load_state.
When both saving and restoring are enabled, the affects are
accumulative. That is, each load will re-mark the RAM pages that are restored
to, with new pages accessed added on top of this, and so on. This is needed as
there is no guarantee that all previous pages will be accessed on each new run.
Performance
Performance measures were made using the Linux system model,
as this is sufficiently complicated, and representative of a real system, as to
yield meaningful results. The system was tested for all supported platforms (as
documented previously), with the addition of MSVC Community 2015.
The platform used was an Intel® i7 920 CPU, running at 2.67GHz, with a system having 6GB
RAM on an ASUS P6T SE Motherboard. The test was to run lnxmico32 -I, boot Linux, login as root and
exit, which yielded a test of > 400 Million instructions.
Compilation for the Microsoft Visual C environment was all
done with the 'Release' mode. For the gcc compilations, optimisation options '-Ofast -fomit-frame-pointer
-march=native' were used.
The results are summarised in the table below:
OS
|
Compiler
|
LM32_FAST_COMPILE
|
Unmodified
|
Windows 10
|
MSVC Express 2010
|
34.2 MIPS
|
22.6 MIPS
|
MSVC Community 2015 x86
|
31.4 MIPS
|
21.2 MIPS
|
MSVC Community 2015 x64
|
38.8 MIPS
|
26.3 MIPS
|
Cygwin 32 Bit
|
gcc v5.4.0 (-m32)
|
45.4 MIPS
|
28.8 MIPS
|
Ubuntu 16.04 LTS
|
gcc v5.4.0 (-m64)
|
43.8 MIPS
|
30.2 MIPS
|
gcc v5.4.0 (-m32)
|
30.5 MIPS
|
19.8 MIPS
|
The surprising result here is that the Cygwin 32 bit compilation
comes out on top, rather than the native 64 bit Linux system (Ubuntu)—though this
situation is reversed when not compiled with LM32_FAST_COMPILE. Since Cygwin is a 32 bit compiler, and
Ubuntu is 64 bit, additional differences may arise from this, though the 32 bit Ubuntu compile
was the worst of all. However,
compiling for 64 bits in MSVC improved the situation over 32 bits, and
the same might be expected for GCC. Unfortunately 64 bit Cygwin was not available
at the time of testing, and so a
complete analysis has not been done for these differences.
So, focussing on the best result, the system runs at around
45 MIPS, when compiled with LM32_FAST_COMPILE,
which yields a 57.6% improvement over the fully featured model. The speed of
the model will very much depend on the nature of the code being executed, and
the profile of the particular instructions, and so the results documented
here are only a rough guide of 'best' performances for a limited, though not
contrived, test.
Multi-processor System Modelling
In this section is discussed the method for constructing a
system model with multiple instantiations of the CPU model, to create a
multi-processor system. Note that, to date, the model has not been tested
extensively in this manner, but the model is designed to be able to support
this, and the recommended method is described here.
Running Models Concurrently
In the case study described in the previous section, a single
model is instantiated, and is run by calling the model's lm32_run_program() method
with an exec_type
argument set to a value of LM32_RUN_FROM_RESET.
This means that the CPU model will loop internally, executing instructions
indefinitely, until such time as the registered external interrupt callback functions
signal for a termination (returning a negative wakeup time), when the model
will return from the function call. Using this method when requiring multiple
CPU models to be running concurrently would require having each model running
in a separate thread, with all the complexities that that would entail.
Fortunately this is not required.
The model has two other execution types that can be used to
enable concurrency: LM32_RUN_SINGLE_STEP
and LM32_RUN_TICK. The
differences between these two types is explained in the
Execution and Breakpoints section but, in summary, the first steps one instruction, whilst
the second advances the clock by one tick (which may, or may not execute an
instruction, if waiting for the last to complete). The single stepping is the
simplest and quickest way to advance the model but requires time
synchronisation between the models (more below), whereas the ticking advances a
known time (one cycle), but the models will needs calling more frequently. Note
that ticking assumes that the timing model is enabled within the model. When
compiled with LM32_FAST_COMPILE,
for instance, the timing model is disabled, and a tick call and an instruction call
are one and the same thing. In this case LM32_RUN_SINGLE_STEP should be used, as the clock state
is not updated.
When calling the run methods with either the step or tick
execution type, the model will return after just one instruction or clock
cycle. When multiple models are instantiated, these can be called within an
external loop, one after the other, with either a step or tick execution type
(don't mix the type between models though). Termination of this external loop
is up to the implementer, but the status returned by the calls to the model can
be inspected, and breaking the loop could be based on one or all of them
requesting termination.
Synchronising Time
Running the models with an active timing model, but using
stepped execution type (for speed), can cause drift in time between models if
this is not managed. This is because instructions take different times to execute,
and unless the models are all running identical programs, the state of their clocks
will advance differently. In this case, the external program's loop needs to
inspect time, and run the models appropriately.
The CPU model has a method to inspect time: lm32_get_current_time(). On
the first loop, all model are stepped, and their time inspected. On the next
iteration, the CPU with the earliest time is the one to be run, and any others
that have this same earliest time. Any with a time in advance of this are not
run. This continues on all subsequent iterations, until termination. This
ensures that the CPUs are never more than a few cycles adrift, and keeps them
in sync, whilst allowing reduction in the number of calls to the models from a
pure ticking model. Whether a ticking model is better than a stepping and
synchronising one is a matter of circumstances and preference.
Shared Callbacks
Any system model using the lm32 model will have to have memory
and interrupt callbacks registered in order to model external peripherals etc.,
as for the Linux case study described earlier. If a multi-processor system is
to have completely different functionality implemented in each of the callback
methods, then different functions can be registered for each of the instantiated
CPUs. However, if the CPUs have shared functionality, such as modelling connection
to a shared bus, with shared access to memory and peripherals in a common
address space, then something else needs to be done.
The model does not return an ID when calling its registered
callback functions, and so a method is needed to identify which CPU is making
the call, if the callback code is to be shared. The idea here is to have the
common code in a separate function, not registered with the models, but with an
addition of an ID parameter. Individual 'wrapper' callback functions are registered
with each separate CPU which simply calls the common code, with the addition of
the ID for the particular processor. It is then up to the common code to keep
separate contexts for each processor, where necessary, or access common state.
The diagram below illustrates this concept:
Shared Memory Space
As mentioned in previous sections, the model can have an
internal memory, with controllable size, and memory space offset. This memory
is separate for each CPU instantiation. The internal memory can be configured
to be all, or partly, removed, and the memory callback functions trap and
process memory addresses across all or part of the modelled space. To share an
address space, whether mapped to memory, or to peripheral registers, the
external memory callback functionality processes these addresses, and models the
functionality.
For shared space, the external model common callback code must
access the common state, regardless of the ID passed in. If modelling state
unique to each processor, it must switch context based on the ID. One can also
envisage state that is common between a subset of processors, but separate to
another subset.
A similar sharing of interrupt sources can be envisaged for the
interrupt callback as well, where some external interrupts raise an interrupt
pin on all the CPUs (for a broadcast mailbox function, say), or be CPU
specific (for a private peripheral, for example).
In this manner, it is possible to set up any arbitrary system
of shared or particular memory and peripheral set between multiple CPU instantiations.
Internal RAM can be used for private memory (modelling data and/or instruction TCM,
say), with external callbacks mapping shared memory and peripherals to the
appropriate CPUs. The CPU's program code can be located either privately in
internal memory, or be common to multiple instantiations, as required. Such an
example system is illustrated in the diagram below:
In this example three processors have access to shared
memory and peripherals A and B (and any other on this bus). The third processor
also has access to peripheral C, but has no internal memory. Thus, access to
shared memory and peripherals A and B are handled by the common callback code,
without regard to ID. All memory accesses from the third processor must be
handled by the memory callback (as it has no internal memory), and peripheral C
is only accessed when receiving an ID for the third processor, otherwise the
access is marked as unprocessed. Not shown, but similarly for interrupts, it
can be envisaged that one or more of the peripherals on the common bus could
interrupt one or more processors as necessary. Peripheral C, however, would
only interrupt the bottom processor. So two memory busses, and two interrupt busses
are modelled, and can be extended at will.
Downloads
The model is released under version 3 of the GPL, and comes with no warranties whatsoever.
A copy of the license is included.
The lm32 package is available for download on github. As
well as all the source code, make files and MSVC 2010 file, the package contains all the
test assembly code, and scripts to run them.
Appendix A: Running and Debugging with the Model in an IDE
In the GDB Interface section, it was indicated that, having
connected the model to GDB, it could therefore be integrated with IDEs such as
Eclipse. In order to do this, some configuration is necessary. In this appendix
an example of configuring Eclipse is given using the simple_c example provided with the package. The
example is for the Linux environment (Ubuntu 16.04LTS) and for the Oxygen
version of Eclipse. The configuration details are similar under Windows and
various previous versions of Eclipse.
Compile model and code
The simple_c
example requires input and output, and thus a UART model, so this configuration
is going to use the lnxmico32
model which has the appropriate peripheral support, which cpumico32 does not. By
default, building the lnxmico32
model will generate the fast compilation version, which lacks the debug
features necessary to integrate to GDB and Eclipse. Therefore, the build must
remove the definition of LM32_FAST_COMPILE,
set in the file makefile.lnx.
From the model's root directory, building should be something like the
following:
make -f
makefile.lnx clean && make -f makefile.lnx
LOCALOPTS="-DLM32_MMU"
The -DLM32_MMU
is optional, but is defined in the makefile.lnx file. In MSVC, under the debug configuration
LM32_FAST_COMPILE is
not defined, and so the debug version should be used, rather than the release
version.
The simple_c
example must be built as well, to generate simple_c.elf, and this is simply a matter of going to the
code's directory location and performing a make:
cd
examples/simple_c
make
Note that, by default, the example is built with CRT_DISABLE_CPU_RUN_INDICATOR
defined, as required for debugging with the model. If a build exists without
this defined (targetted at a hardware platform), then a make clean should be
performed first, and the code rebuilt.
Eclipse
Having built the model and the example code to specification
for debug, Eclipse must be run and a project created and configured. It is
assumed you have Eclipse installed with the CDT (C Development Tooling)
extensions (https://eclipse.org/cdt/).
Create Project
Create a new project with File->New->Makefile Project with Existing Code. The
window that appears should have the name simple_c, and the Existing Code Location updated using
the Browse button, and navigating to the <mico32 root>/examples/simple_c directory. The
Toolchain for Indexer Settings should default to <none>, or should be set so if not already.
Configure Project
The project just created must now be configured. Here we are
going to set up some paths for the code, specifically for the ancilliary driver
code, and then specify the debugging criteria.
In the Project Explorer (from the C/C++ view) right click on
the simple_c project,
and select Properties. In the window that opens, select C/C++ General->Paths and Symbols. The
Includes tab will show three Languages (Assembly, GNU C, GNU C++), and we
should update all three to include the path to the <mico32 root>/drivers directory. This is
done (in all cases) using the Add.
button. A window appears for a place for the directory (use the File system. button , if
preferred). Check the Add to
all languages box, then all three shown languages are updated at once.
Next, the debug configurations are set. Choose Run->Debug Configurations..
From the window a new C/C++ Remote Application is created by right clicking the
appropriate entry in the lefthand pane, and selecting New. Assuming the simple_c project is still open and selected, the right
hand pane should show details of the new project, and have some fields already
filled in.
From the Main tab, the C/C++ Application field should be
updated by browsing to the location of the previously built simple_c.elf file in <mico32 root>/examples/simple_c/
directory. In addition, Disable
auto build should be selected.
By default, Eclipse will usually select Using GDB (DSF) Automatic Remote
Debugging Launcher. This would be for connecting to a remote server and
launching and configuring gdbserver
on the remote system. This is all taken care of by the model's GDB interface,
and so we need Manual mode. To change click Select other. and, in the resultant window, check the Use
configuration specific settings, and select Using GDB (DSF) Manual Remote Debugging Launcher.
Many of the fields in the configuration window's Main tab should now
disappear. When this is all configured, the window Main tab should look something line the
following:
Now select the Debugger
tab to configure the debugging program, and connection details. In the Main sub-tab, the GDB Debugger field should be
changed to lm32-elf-gdb.
This assumes that this tool is in the search PATH. If not, then the Browse button can be used to navigate directly to the executable.
The GDB command file
field should be blanked, unless you wish to specify some commands. Note,
however, that not all possible commands will work properly (such as connecting
to remote target and loading code), as after the command file is run, Eclipse
will send commands to gdb
to do connections to remote target, and the user commands can interfere with
this with out of order issues.
The Connection
sub-tab must then be selected. In this example, the Type is chosen as TCP, the Host
name or IP address field should be localhost, and the Port number is set to 49152 (0xc000
in hex—the first unreserved port number). You can, of course, choose a
different port number and even select a Serial connection—the model supports all of these.
It is up to you whether you wish to Stop on startup at: main. You can uncheck this,
or change to some other location, such as _start—the entry point in crt0.s. When configuration is compete, click Apply, and close the Debug
Configurations window. On completion, the window should look something like the
following:
Debug Code
Having chosen a manual remote debugger launch, we must fire
up the model ourselves and load the target code.
cd
<mico32 root>/examples/simple_c
../../lnxmico32 -tG 49152 -f simple_c.elf
A message should appear: LM32GDB: Using TCP port number: 49152, indicating the
model is waiting. The debugging session can now be started. Select Run->Debug Configurations.,
and chose simple_c Default
and click the Debug
button at the bottom right of the window. If all went well, the model should
have printed a message:
LM32GDB:
host attached.
In Eclipse, the main windows should show the code stopped in
the main() function, ready for debug. All the facilities for debugging sould
name be available for stepping code, setting breakpoints, and inspecting state
etc. When run once Debug
Configurations., it can be run again much more simply using the 'Bug'
icon in the tool bar. The pull-down menu from this has the last few runs
listed, and simple_c Default
can be slected directly from here.
When debugging is finished by pressing the stop toolbar
button, you can check the model exited cleanly as the following message should
have apperaed, and the executable exited.
LM32GDB:
host detached or received 'kill' from target: terminating.
For each new debug session, the model must be relaunched
manually before running the debugger again. This is all that is needed to
integrate the model with Eclipse, and debug a program, but Eclipse can be, configured
to launch the model automatically, before running the debug session, avoiding the
need to run the model from a command line prompt each time. This is optional,
but makes for more convenient debugging.
Launch group for running model and debugging
To automatically run the model before debugging in Eclipse
we must create a Run
configuration to execute the model, and then a Launch Group to combine the model execution with
the debugging session. The run configuration is similar to the debug
configuration set up as described previously. To create a Run configuration for
lnxmico32 select RUN->Run Configurations.,
and create a new C/C++
Application, naming it lnxmico32.
On the Main sub-tab,
for the C/C++ Application
field, Browse. to debug
build of lnxmico32.
Also, disable auto build. In the Arguments
sub-tab, set to:
-tG 49152
-f simple_c.elf
Even though we have made a Run configuration, an issue arises when it is executed
where it appears to still stop at start up. To avoid this, open up Debug Configurations, in which
lnxmico32 will also
appear. Select and navigate to Debugger
sub-tab, and uncheck Stop on
startup at:.
Now we have created a Run configuration for the model, this needs to be
combined with the simple_c
debug configuration, previously detailed, within a Launch Group. Select RUN->Run Configurations. and create a new Launch Group which we can
also name simple_c. In
the Launches sub-tab
click the Add. button. In
the pop-up window, select a Launch
mode of run, and
from the C/C++ Applications
select lnxmico32, and then
change the Post launch action
to be a Delay of 1
second. OK this
configurations, and then repeat an Add., but this time adding simple_c Default from the C/C++ Remote Application. This needs a Launch mode of Debug, and a Post launch action of None. The resultant window
should look something like the following:
This new Launch
Group can be run just like for the Debug configuration, but now it will run first the lnxmico32 configuration which
will start the model, and then after a second delay, run the debug session. Consoles
are created for the configurations, and one will appear for the model for
general I/O.
Further Reading
[1] LatticeMico32 Processor Reference Manual, Lattice Semiconductors,
June 2012
[2] Using as,
the GNU Assembler, version 2.19.51, Free Software Foundation, 2009
[3] Linux Port to
LatticeMico32 System Reference Guide, 2008, Lattice Semiconductors
[4] LatticeMico Timer,
version 3.1, 2012, Lattice Semiconductors
[5] LatticeMico UART,
version 3.8, 2012, Lattice Semiconductors
[6] Stallman et. al., Debugging with GDB, 10th
Edition, Free Software Foundation, 2012
[7] LatticeMico32 Memory Management Unit,
M-Labs, 2013,
https://github.com/m-labs/lm32/blob/master/doc/mmu.rst
retrieved 11th May 2017