Reference Manual
for the tcpIpPg 10GbE TCP/IPv4
XGMII packet generator
Simon Southwell
August 2021
(last updated: 29th August 2021)
Abstract
A TCP/IPv4 Verilog simulation element, tcpIpPg, is presented that can generate packet data over
a 10GbE XGMII interface and receive packets over that same link. Formatted output of
received packets is displayed for ease of tracking activity. The verification IP is based around
the VProc virtual processor element and the
majority of the functionality is in software running in this component.
An example test bench is also included as a base test platform and to provide an example of tcpIpPg usage,
targeted at ModelSim. This test bench includes some basic connection and disconnection software.
The IP is freely available for download on GitHub.
Introduction
The tcp_ip_pg
project is a set of verification IP for generating and receiving 10GbE TCP/IPv4
Ethernet packets over an XGMII interface in a Verilog test environment. The
generation environment is a set of C++ classes, used to generate packets into a
buffer and then send that buffer over the Verilog XGMII interface. The
connection between the Verilog and the C++ domain is done using the Virtual
Processor, VProc [1], a piece of VIP that allows C and C++ code, compiled for
the local machine, to run within, and access, the Verilog simulation environment.
VProc is freely available on GitHub.
The intent for this packet generator is to allow ease of
test vector generation when verifying 10G Ethernet logic IP, such as a MAC,
and/or a server or client for TCP and IPv4 protocols. The bulk of the
functionality is defined in the provided C++ classes, making it easily
extensible to allow support for other protocols such as UDP and IPv6. It is
also meant to allow exploration of how these protocols function, as an
educational vehicle.
An example test environment is provided, for ModelSim, with
two packet generators instantiated, connected to one another: one acting as a
client and one acting as a server. Connection establishment and disconnection
software is provided in the test code to illustrate how packet generation is
done, and how to easily build up more complex and useful patterns of packets.
Formatted output of received packets can be displayed during the simulation.
Features
The basic functionality provided is as listed below:
- A Verilog module tcp_ip_pg
- Clock input, nominally 156.25 MHz (10 × 10⁹ ÷ 64)
- XGMII interface, with TX and RX data and control ports
- A halt output for use
in test bench control
- A class to generate a TCP/IPv4 packet into a buffer
- A class to send a generated packet over the XGMII interface
- A means to receive TCP/IPv4 packets over the XGMII interface
and buffer them
- A means to display, in a formatted manner, received packets
- A connection state machine is not part of the packet generation class,
but example connection code is provided as part of the test environment (not complete)
- A means to request a halt of the simulation (when there is no more test
data to send)
- A means to read a clock tick counter from the software
Architecture
The tcp_ip_pg
verification IP is split across two domains: namely, the Verilog HDL domain and
the C/C++ domain. The interface between these two domains is via the Virtual
Processor (VProc), using the Verilog PLI interface, which may be configured
for either PLI 1.0 (tf interface) or PLI 2.0 (VPI). The details of VProc will not be
discussed at length in this document, as the provided tcp_ip_pg logic and software hides most of this away, and more
details can be found in the VProc manual [1]. Suffice it to say that in the
Verilog domain a generic processing element is instantiated, allowing the
reading and writing of data over a memory-mapped bus, and in the C/C++ domain
an API is provided to read and write over this bus.
The diagram below shows the complete stack for the tcp_ip_pg component, with the VProc logic
wrapped in a Verilog module, and then the VProc interface software providing
access to a specific TCP driver, which provides an API to the packet generation
software. The user test code is written on top of this, using the packet
generation class to build and then send data. It also provides a means for
receiving packets and presenting them to the test routines. A VProc entry
point, VUserMainX, replaces main() (much like WinMain or DllMain
for Windows GUI and dynamic link library software), with X being the VProc node number.
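As an illustration of the shape of the user code, a minimal entry point for node 0 might look like the sketch below. This is a sketch only: the header file name and the use of C linkage are assumptions, the packet generator class (tcpIpPg) is described later in this manual, and the address and port values are arbitrary examples.

#include "tcpIpPg.h"        // assumed header name for the packet generator class

// VUserMain0 replaces main() for VProc node 0 (C linkage assumed)
extern "C" void VUserMain0()
{
    // Construct a generator for node 0 with example IPv4 address (192.168.0.1),
    // MAC address and TCP port number values
    tcpIpPg pktGen (0, 0xc0a80001, 0x001122334455ULL, 0x0400);

    // ... construct, send and receive packets here (see following sections) ...

    // When there is nothing left to do, keep ticking the simulation
    while (true)
    {
        pktGen.TcpVpSendIdle(100);
    }
}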
The Verilog Environment
The Verilog module is defined in tcp_ip_pg.v in the verilog/
folder. It has a clock input (clk)
which nominally needs to be 156.25MHz to meet the 10GbE standards. It also has
a halt output, which can be
controlled by test software to indicate a request to end the simulation when
all tests have completed.  
The only other ports are the transmitter and receiver data
and control. These are XGMII signals, with 64 bit data busses and 8 bit control
busses for each direction (txd/txc and rxd/rxc). The busses are organised with the
control and data bits on separate ports for flexibility, though some interfaces
may have a single 72 bit port, with 8 bits of data then the corresponding
control bit and so on for all byte lanes. Since this is just wiring, this is
left to be done externally.  
The tcp_ip_pg
model has a single parameter, NODE,
specifying a node number which is passed on to the VProc component. This
defaults to 0, but if multiple tcp_ip_pg
components are instantiated, or any other components with a VProc module, each
must have a unique node number. The node number determines the VUserMainX entry point, and software
accessing the virtual processors uses this number to address the correct
processor.  
This is all that is required in the Verilog domain. The
XGMII interface can now be used to connect to, say, a 10GbE MAC with an XGMII
interface, or perhaps a XAUI PHY.  
The Software Stack
The packet generation software can be found in the src/ folder of the repository.
VProc layers
The software layer starts with the API provided by VProc,
which has a set of software attached to the PLI (VSched)
and running in the same thread as the simulator. This communicates with code
running in a 'user' thread (VUser)
that provides an API to the application. Fortunately, the details of this are
hidden from the higher layers via an interface class called tcpVProc. This gets inherited by the packet
generator class to provide a low level TCP/IP oriented API. The following lists
the prototypes for the public methods:
- tcpVProc(int node) : the constructor with the VProc
node number specified
- uint32_t TcpVpSendIdle(uint32_t
ticks) : A method to 'pause' for a given number of cycles
- uint32_t TcpVpSendRawEthFrame(uint32_t*
frame, uint32_t len) : A method to
send a constructed packet out over the transmission interface.
- void TcpVpSetHalt(uint32_t
val) : A method to set or clear the halt port.
As mentioned above, this will be inherited by the packet
generator class described later. This effectively means this API is available
in that class directly. The separation of the access to the VProc component and
the generation of the TCP/IP packets is done to allow ease of reuse in other
environments. The higher layer software could interface to a different
simulation or modelling environment by simply replacing this access layer,
which is agnostic to the details of the packet data protocol. Or, conversely,
this access layer can be inherited by another protocol packet generator to
allow that to be used with the tcp_ip_pg
Verilog component.
The TcpVpSendIdle()
method will yield control back to the simulator for the specified number of
ticks. It is crucial that, when not sending packets out over the XGMII
interface, this method is used in the intervening time, for two reasons.
Firstly, all VProc programs execute whilst the simulation has halted, waiting
for new access calls from the program. When sending a packet, this is fine as
time advances whilst the data is sent over the interface. But if no packet is
being transmitted, time won't advance automatically, and the TcpVpSendIdle() method allows time to
advance in these periods. Think of it, from a simulation point of view, as if the
software runs infinitely fast whilst simulation time stands still. So a while(1); loop would hang the simulation.
The second reason is that, when time does advance, the code is monitoring for
received packet data. Again, when sending a frame, the receive inputs are
monitored as data is transmitted. When no data is transmitted, the idle method
allows the receive ports to still be monitored for arriving packets.
The TcpVpSendRawEthFrame()
method is the means to send the pre-constructed TCP/IP packet out over the
XGMII interface. It takes a pointer to the frame buffer as an argument, and a
length for the entire packet. It is expected that the frame buffer has already
had the packet data placed in it before calling the method (see below).
Finally the TcpVpSetHalt()
method sets or clears the value of the halt
output pin, based on the value of the argument's 0th bit. Note that
one would not normally need to clear the halt bit, as setting it is a request
to finish the simulation.
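Bringing these three methods together, a typical (hedged) usage pattern from user test code, once a frame has been constructed (frame construction is covered in the next section, and the buffer and length variables here are assumed to have been set up by that process), might be:

// 'frmBuf' is assumed to have been filled by the packet generation class,
// with 'frmLen' being the length of the constructed frame
pktGen.TcpVpSendRawEthFrame(frmBuf, frmLen);   // send the frame over XGMII

pktGen.TcpVpSendIdle(1000);                    // advance time whilst monitoring RX

pktGen.TcpVpSetHalt(1);                        // request an end to the simulation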
Packet Generator Class
The packet generator class, tcpIpPg, is
where the TCP/IPv4 packets are generated. Without the tcpVProc access class, it
just generates packets, optionally with payloads, into a local buffer.
Therefore it could be used as a standalone class in, say, a C++ model or in a
SystemC environment.
It is responsible for not only constructing
the packets, but also to provide a means, where desired, for receiving packets
and presenting them to the higher layer test software for processing. This is
achieved by allowing the higher layer test code to register a callback function
to be called whenever a packet is (correctly) received, passing in the packet
information and payload (if present).
The public API presented by the tcpIpPg class
(in addition to the inherited tcpVProc class methods) is as follows:
- tcpIpPg (uint32_t node, uint32_t
ipv4Addr, uint64_t macAddr, uint32_t tcpPort) : the
constructor specifying a unique node, an IPv4 address for the node, a MAC
address for the node and a connection TCP port number to use.
- uint32_t genTcpIpPkt (tcpConfig_t &cfg, uint32_t* frm_buf, uint32_t*
payload, uint32_t payload_len) : A method to construct a TCP/IPv4 packet suitable for transmission
on the tcp_ip_pg XGMII interface.
- void registerUsrRxCbFunc (pUsrRxCbFunc_t
pFunc, void* hdl) :
a method for registering a callback function on reception of a validated
packet.
Construction
The constructor generates a packet
generator object and requires some parameters. The node number must be unique
and match the node number specified for the tcp_ip_pg Verilog
component parameter. An IPv4 address is specified for the packet generator
along with a MAC address. These must be different from any other components'
addresses in the test environment. Finally a TCP port number is specified.
Currently only one TCP connection is supported. It may be possible to
instantiate multiple tcpIpPg objects, each with different port numbers (or even
different IP and MAC addresses), addressing the same tcp_ip_pg node,
but this has not been tested.
Packet Generation
To generate a TCP/IPv4 packet genTcpIpPkt is
called. It takes a configuration input, defined as a structure in the tcpIpPg class
and a pointer to a buffer to place the frame data. This buffer must be large
enough to contain the data. Since the maximum transmission unit is 1500 bytes, and
the overhead data is 18 bytes, a size of 2048 is sufficient. However, to accommodate
control bits, the data type of the required buffer is uint32_t, with one 'byte'
occupying one 32 bit value. So the buffer must be > 1518 uint32_t in
size.
If a data payload is required then a pointer
to a payload buffer is passed in, with a specified payload length. This, too,
must be an array of uint32_t, with a byte occupying a single entry. If no payload is required,
then the payload pointer may be NULL, but the payload_len value must be
0 in this case.
The configuration structure, tcpConfig_t,
passed in to genTcpIpPkt is defined within the class as follows:
typedef class {
public:
    uint32_t dst_port;
    uint32_t seq_num;
    uint32_t ack_num;
    bool     ack;
    bool     rst_conn;
    bool     sync_seq;
    bool     finish;
    uint32_t win_size;
    uint32_t ip_dst_addr;
    uint64_t mac_dst_addr;
} tcpConfig_t;
The configuration specifies all that's required to send out
a packet. Firstly the port number to be used (or being used) is specified. This
would normally be the same as the port number specified at construction, but
need not be (if, say, testing for rejection of a non-open port).
The sequence number and ACK number are the values to be
transmitted. These are not managed by the tcpIpPg
class, and it is expected that this is done as part of the test code; again,
potentially to send 'invalid' values to test certain cases. The provided test
bench example does manage these values to a certain degree as an example (see
later).
The four Boolean values are the flags that can be set in the
packet. Currently these concentrate on connection and disconnection, with other
flags not yet supported. The class allows invalid combinations (to test), but
correct usage in the test code allows connection, transmission and
disconnection protocols to be followed.
The window size parameter is used to advertise the space
available and thus flow control the reception of data. As the model will not run
out of space if received data is handled via the callback function, this
may be fixed at some suitable value. If flow control testing is required,
though, then this may be changed according to needs.
The packet generation needs foreknowledge of the destination
IP and MAC addresses to communicate with a remote piece of IP, and these are
provided in the last two fields. Once more, this may be set to any value if
testing of invalid/missing addresses is required.
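As a hedged example, pulling the above together, a SYN packet for a connection request might be constructed and sent along the following lines. The addresses, port and window size are arbitrary example values, and it is assumed that the value returned by genTcpIpPkt is the length of the constructed frame.

uint32_t frmBuf [2048];                  // one 'byte' per uint32_t entry
tcpIpPg::tcpConfig_t cfg;

cfg.dst_port     = 0x0401;               // destination TCP port
cfg.seq_num      = 0;                    // initial sequence number
cfg.ack_num      = 0;                    // no acknowledgement number for a SYN
cfg.ack          = false;
cfg.rst_conn     = false;
cfg.sync_seq     = true;                 // SYN flag set
cfg.finish       = false;
cfg.win_size     = 8192;                 // advertised window size
cfg.ip_dst_addr  = 0xc0a80002;           // destination IPv4 address (192.168.0.2)
cfg.mac_dst_addr = 0x665544332211ULL;    // destination MAC address

// No payload for a SYN, so a NULL pointer and a zero length
uint32_t frmLen = pktGen.genTcpIpPkt(cfg, frmBuf, NULL, 0);

pktGen.TcpVpSendRawEthFrame(frmBuf, frmLen);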
Packet Reception
Packet reception, if required, is provided to the user code
by means of a callback function. This registered function must be of type pUsrRxCbFunc_t, that is, have a prototype
of:
void cbFunc(rxInfo_t rx_info, void *hdl)
To register this function, registerUsrRxCbFunc()
is used, passing in the function pointer and a void pointer 'handle'. This
handle will be passed in to the callback function when called, and can be a
pointer to anything, or even NULL. The intended use is to allow the received
packet to be processed before returning from the callback function. So it might
be a pointer to a queue to push the packet on to, or a pointer to a function to
process the data. In the example test bench (see below), it is a pointer to the
test class (the 'this' pointer) in which the callback function is defined as a
static method. This gives it access to the other public (non-static) methods and
members of the class to which it belongs to help process the received packet.
However it is defined, the received packet must be processed and/or saved
before returning from the callback function, as the tcpIpPg buffers will be reused and the data lost. In addition,
the callback cannot make calls to access the tcpIpPg
methods that advance simulation time, as this becomes recursive. Instead, it is
expected that the callback does the minimum to process the packet into a buffer
or queue, which the main test program can then inspect and process.
The rx_info
argument passed in to the callback contains the received packet information.
The rxInfo_t is defined as:
typedef struct {
uint64_t mac_src_addr;
uint32_t ipv4_src_addr;
uint32_t tcp_src_port;
uint32_t tcp_seq_num;
uint32_t tcp_ack_num;
uint32_t tcp_flags;
uint32_t tcp_win_size;
uint8_t rx_payload[ETH_MTU];
uint32_t rx_len;
} rxInfo_t;
This contains similar information to that of the transmitted
packet configuration structure, but for the packets sent from the remote
device. Note that the tcpIpPg
receiver code will reject packets that do not match the MAC and IP addresses,
and the open TCP port number configured at construction, and will print a
warning that such a packet was received, but it will not call the callback
function, and discards the packet.
If a payload is present, the rx_len
field indicates its size, and the rx_payload
buffer has the data. Unlike transmission, as this is only user payload data,
with no control, this buffer is a byte buffer, with one byte per array entry.
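A hedged sketch of such a callback, and its registration, is shown below. The test class name (myTest) and its rxQueue member are purely illustrative, and it is assumed the rxInfo_t type is visible in the scope where the callback is defined.

// Static callback: do the minimum and queue the packet for the main test code.
// No tcpIpPg calls that advance simulation time may be made from here.
static void rxCallback (rxInfo_t rx_info, void* hdl)
{
    // The handle was registered as a pointer to the owning test object
    myTest* pTest = reinterpret_cast<myTest*>(hdl);

    // Copy the received packet information onto a queue for later processing
    pTest->rxQueue.push_back(rx_info);
}

// Registration from within the test class, with 'this' as the handle
pktGen.registerUsrRxCbFunc(rxCallback, this);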
Test Bench
A test bench is provided in the repository to validate the
model and to act as an example of the usage of the packet generator, both
logic and software.
Structure
The test bench consists of the instantiation of two tcp_ip_pg components, one acting as a
client (configured as node 0) and one acting as a server (configured as node
1). The two components then have their XGMII interface connected together (tx
to rx and vice versa), and a 156.25MHz clock routed to them. The halt outputs
are routed to some test bench control logic to stop the simulation when a test
is completed. The diagram below shows the test bench arrangement.
The test bench can be found in the test/ folder of the repository, which is where the test bench
Verilog can be found (tb.v), along
with the scripts and a makefile to
allow compilation. All the test software can be found in the src/ folder, which is a sub-folder of the test/ folder.
With reference to the Architecture section, the test bench
provides user test code via two separate classes for the client (tcpTest0) and the server (tcpTest1).
The Test Base Class
The code between the two test classes differs in some
respects, but much is common, and a tcpTestBase
class is defined for this common code. This is a virtual base class with all
the functionality defined, except a virtual method, runTest(), which must be provided by the two derived classes,
where the specific functionality is implemented. The class inherits a utility
class, tcpPrintPkt, which adds
formatting capability for received packets. This functionality was separated
out so that it might be easily reused in another context.
The constructor for the base class just requires a node
number, which is passed on to the tcpIpPg
class, and also used in the formatted output to identify the node.
A couple of utility methods are defined to aid simulation
control. The haltSim() method is
called when all testing has completed and the node wishes to end the
simulation. It is effectively a request to halt to the underlying test logic,
via the tcpIpPg class. The simulation
may not finish straight away, but control must be given back to the simulation
when idle, and so a sleepForever()
method is provided for calling as the last thing in a test, which will continue
to tick the simulation indefinitely until the simulation is complete.
The base class provides a common receive callback method to
be registered with the tcpIpPg class. In this method, a call is made to a
method to display a received packet in a formatted manner to the simulation
output console. The packet is then added to an internal vector queue, making it
available to the runTest() code
provided by the derived test class. It is expected that the test code will
monitor, periodically, the status of the queue and process packets as and when
it is able, popping each packet from the front of the queue when it's not
empty. This allows as many packets to be received as required, subject only to
the flow control set by the test program.
Connection and Disconnection Utility Class
As part of the test software, a class is defined, tcpConnect, to implement connection and
disconnection states. This follows the TCP protocols, but is example code only
and not considered complete. The tcpTestBase
class has a tcpConnect object defined
as a member variable for access to this functionality in the tests. The
connection class has two pairs of methods: a pair for connection and a pair for
disconnection.
To go through a connection cycle, the client uses the initiateConnect() method, whilst the server
uses the listenConnect() method. For
both methods a node number is required, along with a reference to a tcpIpPg object and to a vector of rxInfo_t types. This allows the class to
create, send and receive packets during the connection process. Also both have
initial window size and initial sequence number parameters, with the latter
defaulting to 0. The initiateConnect
method also requires a destination TCP port number and destination addresses
for the IPv4 and MAC.
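Purely as an illustration of the kind of call involved (the argument ordering and exact signature are assumptions and should be checked against the tcpConnect source), a client might initiate a connection along these lines:

// Illustration only: argument order and names are assumptions
tcpConnect connect;

connect.initiateConnect (node,                // this node's number
                         pktGen,              // reference to the tcpIpPg object
                         rxQueue,             // reference to a vector of rxInfo_t
                         8192,                // initial window size
                         0,                   // initial sequence number
                         0x0401,              // destination TCP port
                         0xc0a80002,          // destination IPv4 address
                         0x665544332211ULL);  // destination MAC address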
For disconnection, things are similar to connection. The two
methods provided by the class for disconnection are initiateTermination and waitForTermination.
Like the other methods a node number, tcpIpPg
object reference and a receive queue reference are required. Also, both require
a window size and sequence number. The initiateTermination
method also needs the TCP port number and the IP and MAC addresses of the
destination. The waitForTermination
method also takes a TCP port number as an 'open port' to specify the port
number for the open connection, and will reject disconnection for a
non-matching port number. The initiateTermination
method must also be passed the ACK
value needed at the point the disconnection is started, based on the traffic so
far transmitted and received. Finally, the waitForTermination
method has a couple of Boolean flags. The first of these, finAlreadyReceived, flags whether, on
entering the method, a received packet already had a FIN flag set. This defaults to false, meaning the method will
first wait for a packet with the FIN
flag set, before continuing the disconnection protocol. If it is set to true,
then the method assumes a FIN packet
has been received (but not acknowledged), and will skip waiting for a packet
with this flag. This allows a test to be written either where the order of the
packets, and thus the FIN packet's position,
is known in advance, or a test where a FIN
packet can arrive at an arbitrary point.
Whilst waiting for a FIN
packet, the waitForTermination method
can still process received packets if the processPkts
argument is set to true (default is false). In this case the code will process
received packets and acknowledge them. The test bench can generate packets with
data payloads, but these are (for this case) null terminated string messages
which can be printed to the simulation screen. Whilst waiting for a FIN packet, any payload of a received
packet will have the message printed to the screen (as well as the normal
formatted header output).
The Test Code
The two test classes simply inherit the base class and
provide the runTest() method to
perform a test. As mentioned before, the node 0 code (tcpTest0) acts as a client, whilst the node 1 code (tcpTest1) acts as a server. The main
difference being that the client will instigate a connection and close it down,
whilst the server will only respond.
In the particular example test provided, the client runTest() code (sketched in outline after this list):
- Creates a tcpIpPg object, providing the client IP and
MAC addresses and a TCP port
- It registers a receive callback function, using the method provided by
the tcpTestBase class, and uses its
'this' pointer as the handle.
- It then
initiates a connection using the tcpConnect
object's initiateConnect() method.
- The method sends
a SYN packet
- waits for a SYN+ACK
return
- ACKs this with a message payload
- It then sends a normal packet with a payload
- The payload has a string message
- It then waits
for an ACK
- A check is made
that the ACK acknowledges all packets
sent, and then a termination of the connection is initiated, using the tcpConnect object.
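A hedged outline of this flow is given below, with the connection, packet exchange and termination details elided. The member names (node, clientIpAddr, clientMacAddr, clientTcpPort, rxCallback) are illustrative rather than taken from the actual test source.

// Outline only: member names and arguments are illustrative
void tcpTest0::runTest()
{
    // Packet generator with the client node number, IP address, MAC address and TCP port
    tcpIpPg pktGen (node, clientIpAddr, clientMacAddr, clientTcpPort);

    // Register the common receive callback from the base class, with 'this' as the handle
    pktGen.registerUsrRxCbFunc (rxCallback, this);

    // Connect, exchange packets with payloads, then terminate the connection,
    // using the tcpConnect object (details elided)
    // ...

    // Request the end of the simulation and yield until it finishes
    haltSim();
    sleepForever();
}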
The server test follows a similar, but more passive flow:
- Creates a tcpIpPg object, providing the server IP and
MAC addresses and a TCP port
- It registers a receive callback function, using the method provided by
the tcpTestBase class, and uses its
'this' pointer as the handle
- It then calls
the listenConnect method of the tcpConnect object.
- It waits for a
first packet and checks it's a SYN
packet
- It sends a SYN+ACK
packet in return
- It then waits
for an ACK packet back, and processes
any payload, i.e. it prints the string message.
- It then calls waitForTermination using the tcpConnect object, with the processPkts argument set to true to allow
any packets sent by the client to be processed before it sends a FIN packet.
At this point, this is the limit of what this code does. It
is hoped that there is enough functionality implemented to demonstrate how to
use the tcpIpPg software and elaborate more complex patterns of packets and
more complex functionality.
Compiling and Running the Test Bench
The test bench provides various scripts and a make file to
compile and run the code, with GUI or batch simulation runs. It makes
assumptions about the tools installed and the prerequisites as defined in the
next section. It is expected that these can be adapted for other simulation
environments. The VProc [1] repository has many more scripts for other
simulator support to act as examples.
Prerequisites
The provided makefile
allows for compilation and execution of the example simulation test. As
mentioned above it is only for ModelSim at this time, and has some
prerequisites before being able to be run. The following tools are expected to
be installed:
- ModelSim Intel FPGA Starter Edition. Version used is 10.5b,
bundled with Quartus Prime 18.1 Lite Edition. The MODEL_TECH environment variable must be set to point to the
simulator binaries directory
- MinGW and Msys 1.0: For the tool chain and for the utility
programs (e.g. make). The PATH environment variable is expected to
include paths to these two tools' executables
- VProc: checked out to vproc/
in the same directory as the tcp_ip_pg
repository folder. See [1] in the References section for a link to GitHub.
The versions mentioned above are those that have been used
to verify the environment, but different versions will likely work just as
well. If the MODEL_TECH environment
variable is set correctly, and MinGW and Msys bin
folders added to the PATH, then the makefile should just use whichever versions
are installed. Note that the test code must be compiled for 32 bit, which the
simulation test makefile does. This
is required for ModelSim PLI code and may be different for other simulators.
So, if compiling PLI code independently of the VProc makefile then remember to add the -m32 flag.
Running the Test
The entire body of code needs to be compiled to a shared object (DLL, in Windows
parlance), so that it may be loaded into the simulator. The VProc code, packet
generator code and user code (VUserMainX,
and associated test source) are catered for using a makefile in test/.
The test makefile can both compile and run C/C++ source and also run the
simulations. Typing make help
displays a usage message summarising the available make targets.
Giving no arguments compiles the C/C++ code. The compile argument compiles the Verilog. The
various execution modes are for batch, GUI and batch with logging. The batch
run with logging assumes there is a wave.do
file (generated from the waveform viewer) that it processes into a batch.do with signal logging commands. When
run, a vsim.wlf file is produced by
the simulator, just as it would when running in GUI mode, with data for the
logged signals. The make wave command
can then be used to display the logged data without running a new simulation.
Note that the batch.do generation is
fairly simple at this point, and if the wave.do
is constructed with complex mappings the translation may not work. The make clean command deletes all the
artefacts from previous builds to ensure the next compilation starts from a
fresh state.
Output
When run, the test produces printed output of the received
packets at each node, along with any string payload messages. The following
diagram is a fragment of the output produced during simulation.
From a simulation point
of view, the XGMII interface has the valid signals corresponding to the packets
displayed on the console output, as shown below.
Usage
The test bench provided is just an example to illustrate the
usage of the packet generator VIP. A more realistic scenario is where the VIP
is used to test various other 10GbE IP. In this section an intended use case is
briefly discussed. The diagram below shows a test environment
using the tcp_ip_pg component, connected
to some 10GbE IP, along with a virtual processor acting as a processing element
for configuration and control, and a memory model based on the wyvernSemi
memory model VIP [2].
In this use case the IP under test is a 10GbE data path
chain with (optionally) a XAUI PHY block, a MAC, a TCP client or server and a
memory interface to read and write payload data (possibly with an interrupt to
indicate packet reception). The memory model serves as the buffer space for the
TX and RX payload and has the advantage that the programs running on the
virtual processors (including that on the tcp_ip_pg
block) can have direct software access to the memory via an API (see [2]) if
required.
If the path being tested does have a XAUI PHY, then an
external XAUI must be instantiated between the tcp_ip_pg
XGMII interface and the IP under test. Possibly this can be another
instantiation of the XAUI IP being tested, or some other model. If not, the
XGMII interface might connect directly to the MAC.
In this scenario, the VProc component acts as the system
processor, and is intended to control the test flow. As the test program
running on the tcp_ip_pg is also a
VProc program, and because it can have direct access to the memory,
communication can be established between the processor program and the packet
generator program via a shared memory interface. Note that the programs on the
two virtual processors are running in separate threads but, due to the nature of
the VProc design, only one thread can be active at any given time. Therefore,
exchanging data through shared memory requires no additional control to make
it thread safe. It is envisaged that the processor software and the tcp_ip_pg software would co-ordinate to
keep their programs in synchronisation, exchanging status information, and also
that, if required, the tcp_ip_pg
software can request updates to system registers by the processor as it has no
access itself.
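As only one thread runs at a time, even a simple shared variable can be used for such hand-shaking. The sketch below is purely illustrative, using an ordinary shared flag rather than the memory model API, and the names are placeholders.

// Flag shared between the processor program and the tcp_ip_pg program.
// No locking is required, as only one VProc thread is active at a time.
static volatile bool txDataReady = false;

// Processor program: set the flag once payload data is in place
//     txDataReady = true;

// tcp_ip_pg program: poll the flag whilst keeping simulation time advancing
//     while (!txDataReady)
//     {
//         pktGen.TcpVpSendIdle(100);
//     }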
A likely scenario is that the processor will be running on a
different clock to that of the 10GbE logic. The test environment may have other
pieces of IP that need to be controlled by the processor software running on a
system clock. In this case a bridge would be required for register access across
the two domains, but this is likely to be a piece of IP already available or
being developed and tested as part of the system implementation. Different
clock domains do not affect the direct memory access API as this is
independent of the simulation environment.
References
[1] Virtual Processor (VProc), Simon Southwell, June 2010, https://github.com/wyvernSemi/vproc
[2] Reference Manual for the Verilog Memory Model Simulation Component, Simon Southwell, August 2021, https://github.com/wyvernSemi/mem_model
Download
The source code is available for download on GitHub.