JPEG IP Development Process (incomplete)
by Simon Southwell
11th March 2014
Abstract
This article is the third paper discussing JPEG. It covers the development processes
used to implement the decoders and to verify their operation, along with the tools and methods used.
It serves as a case study of an
IP development using the minimum of outlay on CAD tools and test hardware, whilst covering
the major steps used in the development of digital IP and its quality assurance. The verification and
synthesis environments are described in detail, along with the performance and resource results obtained.
Finally, a critique of alternative designs is made for trading off performance versus resources.
Introduction
The development flow adopted for the JPEG decoder IP is based on developing a
C model decoder, and using this as a reference to verify the HDL implementation,
before targeting the IP at an FPGA development board and verifying operation
at full speed. For the most part, the tools used for both software and hardware
development are available at no cost, but are editions of the full commercial
products, so that all the development described is relevant to commercial tool
flows. The major cost incurred in implementing the processes described in this
article was for the FPGA development board, at a list price of around $150.
Some additional costs for cabling were also incurred, but the whole development
cost less than $200 (access to a PC is assumed). This article therefore aims
to show a complete digital IP development flow from start to finish,
covering the major processes of such a development (though not all, as there
are many additional requirements if targeting the IP at an ASIC), which can be
repeated or adapted by others for a minimum of financial outlay, whilst producing
IP to a high level of quality assurance. An overview of the development flow is
shown below:
Figure 13: JFIF Development Flow Overview
The diagram shows a simplified design flow, but attempts to give the general approach
used for most IP developments. Alongside the major steps are some graphics to indicate what
tools might be used at each stage (see the Third Party CAD tools section).
The heart of the process is the specification. This is vitally important,
as everything else hangs off this document. It is, of course, possible to have a specification 'in mind'
when developing IP, but if a project is being developed by multiple designers (possibly at multiple sites),
and is complex, with multiple complex sub-units, it would be all but impossible to develop a design
without this common reference. Depending on the project complexity, the specification may just be
a list of requirements or desired features, or it may specify the top level architecture and detail
the interfacing between the sub-blocks and other more detailed requirements. From this, all other documentation
follows, such as a datasheet, listing only those external features relevant to a user. Internal design
documentation also needs to be consistent with the specification, documenting how
the implementation achieves the requirements. A key document that flows from the specification is the
test plan. At a fundamental level, this is a checklist of all the functional requirements from the
specification that can be cross-referenced with individual tests, or sets of tests. This is key to
ensuring that all functionality is covered. The document may also include more details on the means
of testing, environment details, required tools and intended usage information.
Once the specification is settled, implementation begins. In reality, though not shown in the
simplified diagram, changes to the specification may occur during the implementation phase. This could be
because requirements from the end customer change, or because of unanticipated limitations of the
implementation, discovered only after having started, or even because of schedule constraints, where only
a sub-set of functionality can be achieved within the given timescales, and this is more acceptable
than failing to deliver on time. Many such reasons may arise. For our purposes, let's assume
the specification is finalised.
For JFIF, a reference C model is implemented, and is the heart of verifying the HDL implementation.
The code is compiled with the GNU C/C++ tool chain, and also has an MSVC solution defined. Both
gdb and the MSVC IDE were used in debugging. Verification consists of a set of directed tests,
with files containing all pertinent cases, and also of passing through a very large number of
random JPEG files. Many of the directed test files were picked from files in the random list that
originally failed, in order to verify that the fix placed in the code did not regress with subsequent
changes and updates to the model. Other files were generated via the GIMP application, where many
options are available when saving to JPEG format that control desirable features of the test image;
e.g. quantisation tables, DRI frequency, standard or optimised Huffman tables, colour or greyscale,
sub-sampling options etc. Verifying the contents of the test files, before the model was fully working,
was greatly aided by the use of JPEGsnoop, which allowed access to all the information in the encoded
data, for comparison with the model. These are specific to the JFIF project, but the general
philosophy for this point of the process is the need to have a means to generate test data and
a way of verifying the validity of the output, perhaps by a known good reference, or some other
constructed means to generate expected values. This maps to the HDL verification as well. In JFIF's
case, the reference model then becomes the expected value generator, but other means might be
more relevant to other projects. For instance, third party vendors can provide Verification Intellectual
Property (VIP), particularly for standardised functionality. An example is a USB bus-functional model
and monitor (from Synopsys and others), where input stimulus can be generated by the BFM and
output responses checked by the monitor. Obviously, implementation and verification of the
model is an iterative process. It is extremely unlikely that one can code the entire model and
generate all the test vectors, and have everything work by running the vectors through the model
just once. In practice, one implements some of the basic functionality, and the means to exercise
it, and then tests and debugs this before adding more advanced functionality. This
process iterates until all testing requirements are met. This will also be true of the HDL
verification. Indeed, it is not unreasonable to expect that an issue in the model will be highlighted
during testing of the HDL, and require going back to the model implementation and debug stage.
This highlights another philosophy in the verification strategy, in that by having two
separate implementations of the functionality (in this case the model and the HDL) it becomes
very unlikely that both will have the exact same defect, though the reference
could still be the one that does not conform to the specification. In ideal development
circumstances, the reference and the HDL would be constructed by different parties. There is
still a danger that the same wrong interpretations or assumptions are made in both implementations
if they come from the same group or individual; testing for self-consistency is much
easier than testing for compliance (but not very useful).
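The idea of using the reference model as the expected value generator can be illustrated with a small
comparison routine. What follows is only a minimal sketch in Python, not the project's own comparison
utility: the function name first_mismatch is hypothetical, and it assumes that the model output and
the simulation output have both already been read into byte strings.

```python
def first_mismatch(expected: bytes, actual: bytes):
    """Return None if the streams match, else (offset, expected_byte, actual_byte).

    A missing byte (one stream shorter than the other) is reported as None.
    """
    # Compare byte by byte over the common length
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return (offset, e, a)

    # Same content so far, but different lengths: report where the shorter one ends
    if len(expected) != len(actual):
        offset = min(len(expected), len(actual))
        e = expected[offset] if offset < len(expected) else None
        a = actual[offset] if offset < len(actual) else None
        return (offset, e, a)

    return None


# Hypothetical data standing in for reference model and simulation output
print(first_mismatch(b"\x10\x20\x30", b"\x10\x20\x30"))
print(first_mismatch(b"\x10\x20\x30", b"\x10\x21\x30"))
```

Reporting the first failing offset, rather than a simple pass or fail, is usually the more useful choice
when debugging, as it points at the first block of output that went wrong.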
Synthesis for JFIF consists of wrapping the core in code to interface to the FPGA pins, along
with ancillary module instantiation, such as PLLs for clock generation. A set of design
constraints is required to tell the synthesiser what the clock frequencies are and what the
timing requirements on the input and output pins are. For JFIF this is all fairly simple,
with a single clock domain, and the I/Os treated as asynchronous. In more complex, multi-clock
designs, more elaborate constraints are needed to define the design, including false-path definitions
for paths that cross the boundaries between unsynchronised sections of the design. When
running the Quartus II tool, the flow automatically moves on from synthesis (called mapping in Quartus)
to place-and-route (called fitting). Timing is checked in another step, static timing analysis (STA),
to report whether the mapping and fitting produced a solution that can run at the frequencies
specified in the constraints, and meet the I/O timing specs. If timing is not met, then
the implementation will need to be changed to give the same functionality as before, but to
reduce timings on 'critical paths'. It is well beyond the scope of this article to look at
strategies for doing this, and the problem is not a trivial one. It involves both an awareness of
the likely limitations of the target for the given frequencies (i.e. how much can be done in one
clock cycle) when initially implementing (this comes with experience) and techniques for
refactoring a design (e.g. cycle stealing). The final step of the Quartus II flow is to generate
(assemble) a configuration file for the FPGA. This is (very loosely) equivalent to 'tape out' or
'pattern generation' in an ASIC flow; i.e. generation of data suitable for the foundry that will
manufacture the ASIC. A great deal more is done before tape-out that is not covered in the JFIF
FPGA flow. Production test vectors need to be generated, gate level simulation would almost certainly
have been done, as would formal verification of the generated (or possibly modified) netlist
against the original RTL, and much more besides.
For JFIF, the final step is to make the IP available. This utilised the Inno Setup tool, which
allows a set of files to be bundled in a single executable package. This isn't really part of
a standard flow, but was done for JFIF, and various other means can be employed if IP is to be
made available. Of course, if making an ASIC, it may be that the source is not to be published
at all.
Specification and Features
Included Features
Below is listed a sketch specification for the JFIF decoder. The overall objective for the specification
is to support decoding of all baseline JFIF files, and some baseline variants of JPEG.
- Images up to 64K x 64K pixels
- SOF0 (baseline) decode
- Greyscale and Colour (YCbCr)
- Sub-sampling decode (4:4:4, 4:2:2, 4:2:0)
- Single and Multiple ECS (DRI of arbitrary interval)
- Out-of-order header section tolerance
- 32 bit FIFO style input interface
- 8 Byte Y[CbCr] FIFO style output interface
- Header and scan data error detection
- Single clock synchronous design
- Asynchronous reset
- >40MHz clock for Cyclone II EP2C20-C7 FPGA target
- <10K LEs for Cyclone II EP2C20-C7 FPGA target
- Pipelined iDCT architecture allowing multiple channel processing and 1 clock cycle average
x8 column or row processing
The specification is set so as not to limit the size of JFIF/JPEG files that can be decoded, which dictates
that the design cannot hold an entire scan within its internal buffering. This implies that the design
must be pipelined, and be able to be stalled. Some internal buffering can be employed
at the granularity of MCUs (8x8 matrices). The SOF0 support is for baseline encoded images, which
constitute the vast majority of the images encountered, and certainly of those sampled from the 15000
or so used in verifying the reference model. Both greyscale and colour images are supported. Some files
have been seen to be RGB rather than YCC encoded, and the reference model supports these. This is not
yet specified for support in the JFIF decoder. All normal sub-sampled files are to be supported. An exception
is the vertically sub-sampled 4:0:0 encoding; the code can support this, but this is not yet
specified for verification, and is an optional feature. Multiple ECS scan support is required.
The DRI reset interval segment must be supported, but the actual resets of the DC values are to
be a function of the RSTx codewords only. Thus, a mismatch between the DRI interval and the actual
occurrence of RSTx markers should not cause a failure, or an error code. Also,
the ordering of RSTx markers is not to be checked, and out-of-order sequences shall not cause an error.
Additional robustness is to be implemented in allowing any order of header segments between
SOI and SOS, regardless of JPEG or JFIF specifications. The SOS must be followed by the scan data,
and terminated with an EOI marker. There is no specification for automatic recovery from a missing
EOI, and external logic will be responsible for recovery, and the resetting of the core. The core can
skip over and ignore any APPx and COM sections. (Optionally, APPx segments can have internal data
decoded, if desired.) All unrecognised segments shall flag an error, as shall all unsupported
segment types, each with a separate error code.
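The RSTx requirement above can be illustrated in software. The following is a minimal sketch in Python,
and not taken from the reference model or the HDL: it splits entropy coded scan data wherever an RSTx
marker (0xFFD0 to 0xFFD7) actually occurs, ignoring both the DRI interval and the marker numbering, and
removes the 0xFF00 byte stuffing defined by the JPEG specification. The function name split_at_restarts
is hypothetical, and any other marker in the data (such as EOI) is deliberately left unhandled.

```python
def split_at_restarts(scan: bytes):
    """Split scan data into segments at RSTx markers, whatever their order.

    Returns (segments, markers), where markers holds the x of each RSTx seen.
    DC predictors would be reset at the start of every returned segment.
    """
    segments, markers = [], []
    current = bytearray()
    i = 0
    while i < len(scan):
        byte = scan[i]
        if byte == 0xFF and i + 1 < len(scan):
            nxt = scan[i + 1]
            if nxt == 0x00:
                # Stuffed byte: 0xFF00 represents a literal 0xFF in the data
                current.append(0xFF)
                i += 2
                continue
            if 0xD0 <= nxt <= 0xD7:
                # RSTx marker: close the segment; no check on x or on the interval
                segments.append(bytes(current))
                markers.append(nxt - 0xD0)
                current = bytearray()
                i += 2
                continue
        # Ordinary data byte (other markers are not handled in this sketch)
        current.append(byte)
        i += 1
    segments.append(bytes(current))
    return segments, markers


# Made-up scan bytes with out-of-order RST3 then RST1 markers
segs, marks = split_at_restarts(bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD3, 0x56, 0xFF, 0xD1, 0x78]))
print(segs, marks)
```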
The input interface shall consist of a 32 bit (4 byte) interface, with a unary byte valid input,
and a returned, single bit acknowledge. Data is transferred only when both a valid bit is set
and the acknowledge bit is set. An individual image decode (from SOI to EOI) must
be presented to the core 4 bytes at a time, until the last word transfer, where fewer than 4
(but contiguous) bytes may be transferred. The output interface is an 8 byte interface, with a single
bit valid output, and a single bit acknowledge returned. Data is transferred only when the
valid and the acknowledge signals are active. The data transferred is always 8 bytes. A set
of consecutive 8 byte transfers constitutes an MCU. For colour (YCC) images, the data is one or
more luminance (Y) MCUs (depending on the sub-sampling) followed by the Cb and the Cr MCUs in that order.
Each 8 byte output within an MCU is a column, starting from column 0 and incrementing. The byte order
within a column is byte 0 (bits 7 downto 0) containing the row 0 byte, through to byte 7 (bits 63 downto
56) containing the row 7 byte. Header information must also be available at the core top level for use
in subsequent colour decoding of the output data, and positioning in an image buffer. This shall consist
of a 144 bit SOF output interface and a 72 bit SOS interface. The bit ordering of these two vectors
is not defined in the specification (implemented bits are defined in rtl/verilog/jfif_extract_hdr.vh).
Any other internal state suitable for external visibility, such as FSM state, may be added to the core's ports.
Which state is brought to the core's edge is not specified here.
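The data formatting on the two interfaces can be sketched in software. The Python below is illustrative
only and rests on assumptions that the specification above does not state: that byte 0 of an input
word occupies bits 7 downto 0, and that the 'unary' byte valid is a thermometer code with one bit set
per valid byte (for example 0b0011 for two bytes). The output column packing follows the byte order
given above. All function names are hypothetical.

```python
def to_input_words(data: bytes):
    """Yield (word, byte_valid) pairs for a 32 bit input port.

    Assumptions: byte 0 in bits 7..0, byte valid as a thermometer code.
    """
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]                  # last chunk may hold fewer than 4 bytes
        word = int.from_bytes(chunk, "little")
        byte_valid = (1 << len(chunk)) - 1     # 1, 3, 7 or 15
        yield word, byte_valid


def pack_column(rows):
    """Pack the 8 row bytes of one column into a 64 bit word, row 0 in bits 7..0."""
    if len(rows) != 8:
        raise ValueError("a column is always 8 bytes")
    return int.from_bytes(bytes(rows), "little")


def unpack_column(word):
    """Recover the 8 row bytes (row 0 first) from a 64 bit output word."""
    return list(word.to_bytes(8, "little"))


for word, valid in to_input_words(bytes([1, 2, 3, 4, 5, 6])):
    print(f"{word:08x} {valid:04b}")
print(f"{pack_column([0, 1, 2, 3, 4, 5, 6, 7]):016x}")
```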
Internal error detection shall be flagged on an external port. The categories of error that shall be
reported are:
- Invalid marker : An unrecognised segment marker was encountered
- Unsupported marker : A recognised, but unsupported segment marker was encountered
- Unexpected code : A recognised marker was encountered, but when not expected (e.g. an EOI before an SOS)
- Scan error : An error was encountered in decoding scan data; i.e. a failure in decoding input in the active Huffman table.
- Internal error : Any other error encountered shall be reported under a single blanket code
When any of the above errors is encountered, a non-zero error code shall be reported on an external port
(zero shall be reserved for 'no error'). When an error is encountered the core shall halt in
an error state until a 'clear error' input is set, and then cleared, when the core shall resume searching
for a new image; i.e. gobbling input data until a new SOI marker is encountered.
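To make the error categories concrete, here is a hedged Python sketch of how a marker seen between SOI
and SOS might be classified. The numeric error codes are invented for illustration (the specification
only reserves zero for 'no error'), and the class and function names are hypothetical. The marker
values are those of the JPEG specification. Treating every SOF variant other than SOF0, plus the
arithmetic conditioning segment (0xCC), as 'unsupported', and everything else not listed as 'invalid',
is an assumption of this sketch rather than a statement of what the core does.

```python
from enum import IntEnum


class JfifError(IntEnum):
    # Hypothetical code values: only zero ('no error') is fixed by the spec
    NONE = 0
    INVALID_MARKER = 1
    UNSUPPORTED_MARKER = 2
    UNEXPECTED_CODE = 3
    SCAN_ERROR = 4
    INTERNAL_ERROR = 5


# Second byte of each marker (the first byte is always 0xFF)
SOF0, DHT, SOI, EOI, SOS, DQT, DRI, COM = 0xC0, 0xC4, 0xD8, 0xD9, 0xDA, 0xDB, 0xDD, 0xFE


def classify_header_marker(code: int) -> JfifError:
    """Classify a marker encountered in the header, between SOI and SOS."""
    if code in (SOF0, DHT, DQT, DRI, SOS, COM) or 0xE0 <= code <= 0xEF:
        return JfifError.NONE                # supported, or skippable APPx/COM
    if code in (SOI, EOI):
        return JfifError.UNEXPECTED_CODE     # recognised, but not valid at this point
    if 0xC1 <= code <= 0xCF:
        return JfifError.UNSUPPORTED_MARKER  # other SOFn types, DAC etc. (DHT handled above)
    return JfifError.INVALID_MARKER


print(classify_header_marker(0xC2).name)     # progressive SOF2
```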
The core will have fully synchronous operation, in a single clock domain. The core shall be reset with a single,
active low, asynchronous reset input. The clock rate shall be a minimum that allows 1080p HD colour frames,
with no sub-sampling (3 bytes per pixel), at a rate of 25 frames per second:
1920×1080×3×25/(4 bytes per clock) = 38.88MHz.
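The arithmetic above can be checked, or repeated for other frame sizes and rates, with a few lines of
Python. This is just a worked version of the formula given, with a hypothetical function name; the
figure of 4 bytes per clock is taken directly from the calculation above.

```python
def min_clock_hz(width, height, bytes_per_pixel, frames_per_second, bytes_per_clock):
    """Minimum clock rate needed to sustain the given pixel data rate."""
    bytes_per_second = width * height * bytes_per_pixel * frames_per_second
    return bytes_per_second / bytes_per_clock


# 1080p colour, no sub-sampling, 25 frames per second, 4 bytes per clock
print(min_clock_hz(1920, 1080, 3, 25, 4) / 1e6, "MHz")
```

The result, 38.88MHz, sits just under the >40MHz clock target listed in the included features.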
Unsupported Features
- Progressive scans
- Lossless scans
- Extended scans
- Differential scans
- Arithmetic scans
- RGB colour space (future feature—supported in reference model)
- No detection of missing EOI (future feature—add end-of-data input)
Third Party CAD tools
Figure 13 has, annotated on the flow diagram, logos representing example tools used for the various
stages. For example, at the top are shown the logos for
Windows with Cygwin, or Ubuntu as an alternative (both used in JFIF), as the major operating system
for the development. An editor is also going to be needed,
such as Emacs (or in JFIF's case Xvile), though any editor would do. In JFIF, the documentation (such as this page)
is HTML with CSS3, and the diagrams were created with Libre Office. The reference model was coded in C++ (and some C)
using both GNU gcc (and g++) and Microsoft Visual C++, and the GTK+ library was used. Test JPEG files were
created via various sources, but GIMP was used for many of the directed test files. The JPEGsnoop utility was
an enormous aid in debugging code and checking test files. Once the reference model is verified, development
of the HDL takes place, along with the verification environment. The test environment for JFIF involves
using bash shell scripts, though other languages, such as Python or Perl, would probably be used for more
sophisticated test platforms. Simulation of JFIF was done exclusively on Modelsim, though NC-SIM or VCS would
also be suitable. As JFIF was targeted at a Cyclone II device, Altera's Quartus II is used for
synthesis, with automatic place and route. For an ASIC flow, the synthesis might use Synopsys' Design
Compiler, targeting a specific vendor library for a given geometry (e.g. 90nm). The place and route
is a separate step, and far more complicated than for an FPGA, and involves additional features in the design
to allow for the production testing of the circuits. Once the final design is synthesised, the Cyclone II
FPGA on the development board can be configured, and full speed testing can begin. The release of the source
code, for JFIF, was via the Inno software.
What follows is a list of the tools used in the development of the JFIF project, and some instructions
for obtaining and installing them. The idea here is to provide enough information to replicate fully
the results of the JFIF development, and have all the tools available to further develop this project.
For the most part the discussion assumes a Cygwin environment on
a Windows platform, as the majority of the development was done with this setup. However, the project
has been tested under a Linux environment (using Ubuntu 12.04LTS), and some notes are included for
this to allow those wishing to use this operating system to do so.
The following table is a list of all the components, software and tools used during the development.
It is not essential to have all the tools listed above to compile and run the JFIF
package as delivered. As a minimum, however, some basic tools are needed. If
running under Cygwin, then Cygwin must be installed, with gcc/g++ and GTK+. Then
the Windows editions of Modelsim and Quartus II must be installed, and your Cygwin
environment updated to point to the binaries of these tools. This will allow a
full regression run and synthesis of the JFIF project. For Linux, the situation
is very similar, except the Linux versions of the tools must be installed, and
the OS must be a suitable Linux version. Ubuntu 12.04LTS has been verified, but
RedHat or similar is very likely to work with little trouble.
If wishing to load and run on an FPGA board, the DE1 and cable are the configured
and tested units, but the project can be adjusted to target other FPGAs and
development boards, though a lot more work is required if not an Altera based
board.
For the Windows/Cygwin environment, Microsoft Visual C++ Express is an alternative
for compiling the reference model. In this case, GTK+ must be available and
configured for Windows (and not just Cygwin), or the model must be compiled with
JPEG_NO_GRAPHICS defined. The resultant executable would have to
be copied to the sw/jpeg_cpp/build directory, as the other scripts expect a model
executable in this location. This is more useful if the reference model is of
interest stand-alone, as if Cygwin is the environment, one may as well let the
build scripts use gcc/g++ and place the results in the expected places.
As mentioned above, the GTK+ libraries and development headers are required by
default, and should easily be configured for Cygwin (or Linux). If there are
issues with this, then defining JPEG_NO_GRAPHICS when building the
reference model (by updating the makefile or MSVC Preprocessor Definitions)
removes the graphical capabilities, which are not used in the automatic
regression flow. Therefore GTK+ is not absolutely essential, but is
recommended, to avoid unnecessary work.
The coverage tools LCOV and GCOV are not essential tools, but were used to give
a measure of the test case coverage of JFIF functionality. With good coverage
on the model for the test cases used, good coverage on the HDL is achieved
by running the same cases through the hardware (in simulation) and bit-matching
against the reference. A 100% coverage, though, does not guarantee full functionality,
and these tools were used specifically to check that no functionality was left uncovered.
As a colleague once said to me, it is just to get a "warm feeling". There is still
lots of testing that would need doing.
Calvin Hass's JPEGsnoop tool is not essential, but was used extensively in the
development of the JFIF project. It is a fantastic utility for exploring the
details of a JPEG file, and proved invaluable as a debug aid. If you plan to
modify the code, you would do well to have this tool available. It is only
for the Windows environment though, and I haven't tried it under Wine
as yet.
The GNU Image Manipulation Program (GIMP) is another non-essential tool, but
was a useful application for generating JPEG files of various characteristics
from source bitmap files. When exporting to JPEG format, the tool gives various
options that allow generation of quantisation tables of varying compression/quality,
all the sub-sampling options and various forward DCT methods. Other tools
can probably do the same thing (e.g. Adobe Photoshop), but at least GIMP is free
of charge, and a Linux version is available.
Finally, the Inno Setup tool was only used to package up the project into a
self-extracting executable (Windows only). If you're not planning to re-release
modified code (under the terms of the licenses) then you probably don't need
this. It is easy to use, however, and a powerful way of packaging up source
files etc.
As well as the tools mentioned above, additional programs, programming languages
and scripting languages were used in the JFIF project that are standard in the
Cygwin and Linux environments. The heart of compiling and running the various
stages of the project is the GNU 'make' program, and various makefiles are used
throughout the project, with a top level makefile in "sim/rtl/run". This makes
use of various scripts, which have deliberately been restricted to BASH shell
scripts. The required sophistication of the scripts was not high, and the BASH
shell is ubiquitous in Linux based environments. The use of Perl or Python
was not warranted, would be overly complex, and suffers (to varying degrees) from
version incompatibilities. As a main objective of this project is to be a teaching
reference, adding this level of complexity for no functional advantage would
counter this objective. The reference model is written in C and C++ and, though
the teaching of these programming languages is not a goal of the project, it
serves as a reference point for these languages. Speed of execution is also
a factor for the reference model (though not an overriding one, as accurate
modelling of the h/w architecture takes precedence), and so a compiled language was
chosen over a scripting or interpreted language. C and C++ are very standard and
portable, making them ideal.
Package Structure
The diagram below gives a hierarchical view of the delivered JFIF package to aid
in navigation around the various files and directories. The directory structure is
based on that defined by the OpenCores organisation
(see the OpenCores coding guidelines
document). Unfortunately, the project was started before being aware of these guidelines.
The HDL is compliant with around 90% of the guidelines. The main violation is the
naming of port signal direction, where these are prefixed in JFIF, but the guidelines
specify suffixes. At some point, it is likely that the HDL will be updated to
meet these requirements fully.
<Install directory>
|
|-- README.txt                        -- Top level README
|-- LICENSE.txt                       -- LGPL v2.1
|
|-- doc/                              -- Documents
|   |-- itu-t81.pdf                   -- JPEG specification
|   |-- jfif3.pdf                     -- JFIF specification
|   |-- index.html                    -- Access to project documentation
|   |-- todo.txt                      -- Project status
|   `-- src/
|       `-- web/...                   -- Main project docs (not listed)
|
|-- rtl/                              -- Core RTL
|   `-- verilog/
|       |-- timescale.vh              -- Simulation global `timescale
|       |-- verilog.vh                -- Project common definitions
|       |-- idct.vh                   -- iDCT definitions
|       |-- jfif_extract_hdr.vh       -- Header extractor definitions
|       |-- jfif_huff_decode.vh       -- Huffman decoder definitions
|       |-- verilog_lib.v             -- Common modules
|       |-- idct_lib.v                -- Generic iDCT module library
|       |-- jfif_decoder.v            -- Top level core
|       |-- jfif_extract_hdr.v        -- Header extractor logic
|       |-- jfif_huff_decode.v        -- Huffman decode top level
|       |-- jfif_huff_decode_core.v   -- Huffman decode logic
|       |-- jfif_huff_swing_buff.v    -- Huffman decode o/p swing buffer
|       `-- jfif_idct.v               -- Top level iDCT module
|
|-- bench/                            -- source files for testbench
|   `-- verilog/
|       |-- testbench.vh
|       `-- top.v
|
|-- sim/                              -- Simulation files
|   `-- rtl/                          -- Simulation files for RTL
|       |-- run/                      -- Main run directory
|       |   |-- makefile              -- Top level makefile
|       |   |-- run.do                -- Modelsim script
|       |   `-- wave.do               -- Modelsim script
|       |-- bin/                      -- Compile and run test scripts
|       |   |-- compile.sh*
|       |   |-- regression.sh*
|       |   `-- runsim.sh*
|       |-- log/                      -- Directory for logs
|       `-- out/                      -- Directory for simulation output
|
|-- sw/                               -- Verification and ref model s/w
|   |-- comms/                        -- FPGA board driver software
|   |   |-- src/                      -- Source code directory
|   |   |   `-- serial_port.cpp       -- Driver code
|   |   |-- comms/                    -- MSVC files
|   |   |   |-- comms.vcxproj
|   |   |   `-- comms.vcxproj.filters
|   |   `-- commms.sln                -- MSVC solution file
|   |-- hex2rgb/                      -- Bitmap utility software
|   |   |-- cmpbmp.c
|   |   |-- hex2rgb.c
|   |   `-- makefile
|   `-- jpeg_cpp/                     -- Reference C/C++ model
|       |-- makefile                  -- Model makefile for cygwin/linux
|       |-- README.txt                -- Initial documentation
|       |-- LICENCE.txt               -- GPL v3
|       |-- msvc/                     -- MSVC 2010 solution
|       |   |-- jpeg/
|       |   |   |-- jpeg.vcxproj
|       |   |   `-- jpeg.vcxproj.filters
|       |   `-- jpeg.sln
|       |-- src/                      -- Reference model source code
|       |   |-- bitmap.h              -- 24 bit bitmap definitions
|       |   |-- jfif.h                -- External API definitions
|       |   |-- jfif_local.h          -- Internal model definitions
|       |   |-- jfif_class.h          -- jfif class definition
|       |   |-- jfif_idct.h           -- jfif_idct class definition
|       |   |-- jpeg_dct_cos.h        -- Math definitions for iDCT
|       |   |-- jfif_gtk.h            -- GTK+ graphical display definitions
|       |   |-- jfif.cpp              -- jfif class body. Main functionality
|       |   |-- jfif_idct.cpp         -- jfif_idct class body.
|       |   |-- jfif_gtk.c            -- GTK+ graphical display code
|       |   `-- jfif_main.c           -- Top level user interface code
|       |-- build/                    -- Build output directory
|       |-- build.mingw/              -- Build output directory (for mingw)
|       |-- obj/                      -- gcc object directory
|       `-- test/...                  -- All the test files (not listed)
|
`-- syn/                              -- Synthesis scripts
    `-- altera/                       -- Cyclone II synthesis
        |-- makefile
        |-- jfif_test.qpf             -- Quartus II project for test module
        |-- jfif_test.qsf             -- Quartus II settings for test module
        |-- jfif_test.sdc             -- Quartus II constraints for test module
        |-- jpeg_decoder.qpf          -- Quartus II project for speed/area meas
        |-- jpeg_decoder.qsf          -- Quartus II settings for speed/area meas
        |-- jpeg_decoder.sdc          -- Quartus II constraints for speed/area meas
        `-- src/                      -- Top level wrapper source code
            |-- alt_jfif_test.v       -- Top level FPGA test module
            |-- alt_jfif_pll.v        -- PLL wrapper for test module
            |-- controller.v          -- Test module controller
            |-- uart_transceiver.v    -- Test module UART
            |-- filter.v              -- Test module switch filter
            |-- bin_2_hex_led.v       -- Test module binary to hex LED convertor
            `-- alt_jpeg_decoder.v    -- Top level wrapper for speed/area meas
The package is divided into several convenient sections. All the documentation
(including this document) can be found under doc/. Copies of the ITU and JFIF specs can
also be found there. The decoder core HDL design is under rtl/. As this is all
written in verilog, all the code is under a verilog/ sub-directory. The starter
editions of Modelsim and Quartus do not support mixed language modes, and so
all RTL and testbench behavioural code is in a single language (verilog). This
brings us on to the verilog for the testbench under bench/. This is a single
top level file, with a header. RTL simulation files are under sim/rtl. This
includes all the scripts (sim/rtl/bin) and makefiles (sim/rtl/run) needed
to compile and run tests. Most standard activities associated with compilation and the
running of tests should be executed from sim/rtl/run/, using the makefile provided.
Synthesis can also be invoked from here.
The software directory (sw/) contains all C/C++ source code used for verification,
including the C reference model, and the utilities. The model files are housed
in jpeg_cpp, and the utilities under hex2rgb. Each sub-directory has its own
makefile, but these can be accessed via the sim/rtl/run/ makefile. The reference
model directory is also where all the directed test images are kept (sw/jpeg_cpp/test),
as these are common to both testing the reference model and the HDL. The reference
model source code and compile scripts all reside here, and the sw/jpeg_cpp sub-directory
can be separated as a complete stand-alone package, if only a software implementation
is of interest.
The final directory, syn/, is for HDL synthesis. As only Altera FPGA synthesis is currently
supported, a sub-directory (syn/altera/) houses the project and settings files
etc. A verilog wrapper is defined in syn/altera/src/. Currently this is just for
timing closure purposes, and not for targeting the development board. A makefile
is provided for synthesis compilation, which can also be accessed from the
sim/rtl/run makefile.
Verification Environment
Top Level Verilog
Utilities
Scripts
Reference Model
Compilation and Execution
The JFIF project uses GNU make to compile and run all its elements. There is a hierarchy of
makefiles, such that individual components, like the reference model, or the synthesis of
the FPGA image, can be made. However, a top level makefile is provided which gives
access to building all the individual components, suitable for almost all needs. The
makefile is located in sim/rtl/run. By changing to this directory and typing 'make',
a help message is displayed giving information about the various options. The table
below shows the details of these options.
Command      Options                         Description
run          INPUTFILE=<jpeg file>           Run a single simulation with the specified test file,
             HEXFILE=<o/p hex file>          outputs and simulation plusargs. Optionally run in a
             BMPFILE=<o/p bmp file>          GUI window. Will invoke a compilation. Optional
             GUI=yes|no                      compile verbosity.
             PLUSARGS="<plusargs string>"
             VERBOSE=yes|no
compile      INCR=yes|no                     Compile sim, with optional incremental compile
             VERBOSE=yes|no                  (default full). Invokes ref model make.
ref_model    VERBOSE=yes|no                  Build C reference model.
utils        VERBOSE=yes|no                  Build C verification utilities.
regression   VERBOSE=yes|no                  Run regression tests. Forces a clean and full
                                             compilation of sim, utils and reference model.
synth        PROJECT=<project name>          Run synth for Cyclone II FPGA. By default builds
                                             jpeg_decoder, for measuring area and speed of the
                                             decoder. When set to jfif_test, builds the full
                                             test environment.
clean        VERBOSE=yes|no                  Clean simulation and test files.
sparkle      VERBOSE=yes|no                  Clean all simulation, test, reference model and
                                             utils files.
all          VERBOSE=yes|no                  Does a clean, build, regression and synthesis run.
Starting with the top option 'run', this is used for running an individual RTL simulation.
Various options are available to select the input JPEG test file (INPUTFILE), the simulation
HEX output file (HEXFILE) and the bitmap file (BMPFILE) target when converting the hex data.
By default the simulation will run in batch mode, but by setting GUI=yes the simulation
will fire up the GUI window and run the simulation there, for debug. The simulation
environment is controllable via several PLUSARG options. These can be passed to the simulations
by setting PLUSARGS to a single string containing all the relevant plus options. Executing
a 'make run' will also invoke a compilation of the reference model, the utilities and the
simulation. The make process will display little output when running, unless an error occurs,
and normally simply returns without any messages. This is fine unless an error occurred in one of the
steps (particularly in a sub-make build), where it might be handy to know at what point in the
process the error occurred. By setting VERBOSE=yes, all messages regarding changes of directory
and the simulation compile messages etc. are enabled, allowing ease of debugging. The VERBOSE
option is common to almost all the make targets.
If only a simulation compilation is required, then 'make compile' is used. By default, this is a full
compilation, to ensure complete coverage. However, with INCR=yes set, an incremental
compile is invoked. Compiling the simulation will also compile the reference model and the
utilities. These can also be compiled separately with 'make ref_model' and 'make utils'.
To run a complete regression test of the directed test cases, 'make regression' is used. In order
to ensure that the regression run is definitely using the latest source code, a clean is
performed, followed by a full build of the simulation, as well as the reference model and the utilities,
before the tests are run. A pass/fail message is printed at the end of the regression run, and a
log is generated in sim/rtl/log/regression.log for post-test inspection, as well as logs of
the individual test runs.
To compile the core for the FPGA, 'make synth' is run. This causes the makefile in
syn/altera to be invoked, which is a batch compilation consisting of the four main stages
of the synthesis: mapping, fitting, static-timing and assembly. The syn/altera directory
will contain all the Quartus II output data, where individual reports may be inspected, or
the GUI invoked on jpeg_decoder.qpf, and the design results explored through this.
Two levels of cleaning are provided in the makefile, one for simulation files, and
a more thorough clean that invokes the sub-make cleaning. The commands for these
are 'make clean' and 'make sparkle'.
The final command to mention is 'make all'. This is a 'do everything' command that does
a deep clean, full build, regression and synthesis. It is a handy command to check the
package is installed and set up correctly.
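As an illustration of how the targets and options combine, the following is a minimal Python sketch that assembles a make command line from the table of options given earlier. The helper name build_make_command and the file name test.jpg are hypothetical, for illustration only, and are not part of the package. The only assumption about make itself is the standard GNU make '-C' option, which changes directory before reading the makefile.

```python
import shlex

# Targets and the options each accepts, as listed in the options table.
MAKE_TARGETS = {
    "run": {"INPUTFILE", "HEXFILE", "BMPFILE", "GUI", "PLUSARGS", "VERBOSE"},
    "compile": {"INCR", "VERBOSE"},
    "ref_model": {"VERBOSE"},
    "utils": {"VERBOSE"},
    "regression": {"VERBOSE"},
    "synth": {"PROJECT"},
    "clean": {"VERBOSE"},
    "sparkle": {"VERBOSE"},
    "all": {"VERBOSE"},
}


def build_make_command(target, run_dir="sim/rtl/run", **options):
    """Return an argv list for the top level makefile (hypothetical helper)."""
    if target not in MAKE_TARGETS:
        raise ValueError(f"unknown make target: {target}")
    unknown = set(options) - MAKE_TARGETS[target]
    if unknown:
        raise ValueError(f"{target} does not accept: {sorted(unknown)}")
    # GNU make's -C option changes to run_dir before reading the makefile.
    argv = ["make", "-C", run_dir, target]
    # Options are passed as NAME=value variable overrides on the command line.
    argv += [f"{name}={value}" for name, value in options.items()]
    return argv


if __name__ == "__main__":
    # test.jpg is a placeholder input file name, not one shipped with the package.
    cmd = build_make_command("run", INPUTFILE="test.jpg", GUI="yes", VERBOSE="yes")
    print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run() from a directory above sim/, or printed and pasted into a shell. Rejecting options that a target does not list is a design choice of the sketch; make itself would silently accept an unused variable.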
Note that using the makefile in sim/rtl/run sets specific configurations when building
sub-components; for example, the reference model, when DEBUG_MODE is enabled. If
compilations for the sub-components are required with different configurations, then
the makefile for the sub-component must be invoked in its home directory with the
appropriate options. For the vast majority of cases, though, the sim/rtl/run makefile
will be all that is needed. For reference, the current sub-component makefiles are
located in sw/jpeg_cpp/ for the reference model, and sw/hex2rgb/ for the utilities.
Synthesis
Two Quartus II projects are defined for synthesis. One is purely for measuring the speed and area
of the JPEG decoder—called jpeg_decoder. This is the default project when building from
the "sim/rtl/run/" or "syn/altera/" directories. A second project, jfif_test, is the full hardware
test environment, as documented below.
Constraints
The constraints for the projects are fairly straightforward. The top level wrappers
instantiate a PLL to give a clean and balanced clock. The constraints in jpeg_decoder.sdc
and jfif_test.sdc, then, just define the input clock, which feeds the PLL, and a generated
clock for the PLL output. Some clock uncertainty is defined for better accuracy. By default,
the constraints assume using the 24MHz input clock, and a multiplier of 5 and a division
of 3, to give an internal clock frequency of 40MHz. Obviously, for the jpeg_decoder, the
values were altered to find the maximum limits of speed for the design, the results of
which are detailed in the next section.
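The default clock arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above (input clock times multiplier, divided by divider), using exact fractions to avoid rounding; the function names are illustrative and do not correspond to anything in the constraint files.

```python
from fractions import Fraction


def pll_output_mhz(input_mhz, multiplier, divider):
    """Output frequency of a PLL with the given multiply and divide settings."""
    return Fraction(input_mhz) * multiplier / divider


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return Fraction(1000) / freq_mhz


if __name__ == "__main__":
    f_out = pll_output_mhz(24, 5, 3)  # default settings quoted in the text
    print(f"PLL output: {float(f_out)} MHz, period {float(period_ns(f_out))} ns")
```

With the default settings this gives the 40MHz internal clock, which corresponds to a 25ns period for the generated clock constraint.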
Implementation results
The following table gives a summary of synthesis results obtained for various Altera FPGA
target devices using the jpeg_decoder project. The devices chosen were based on allowing comparison with a commercially
available decoder core provided by Cast Inc.
They also give a spread of device types and speed grades to show what the limits are for each
target device.
For the purposes of obtaining these figures, the synthesis was carried out for the target devices,
routing the core input and output ports, as well as the error status port, to external pins. The
clock for the core was derived from a single PLL.
FPGA | Fmax Fast | Fmax Slow | LEs | Mem Usage | DSP |
Cyclone EP1C12-C6 | 97MHz | 51MHz | 9008 | 14848 | - |
Cyclone II EP2C8-C6 | 105MHz | 51MHz | 8145 | 14848 | 6 x 9 bits |
Cyclone III EP3C10-C6 | 108MHz | 64MHz | 9185 | 14848 | 6 x 9 bits |
Stratix EP1S10-C5 | 92MHz | 52MHz | 8406 | 15104 | 10 x 9 bits |
Stratix II EP2S15-C3 | 152MHz | 89MHz | 8486 | 15104 | 10 x 9 bits |
The Cast Inc. results table, for Altera devices, can be found
here
for comparison. Note that it is not stated, that I could find, which operating corner
the Fmax figures for the Cast offering come from. The above table shows both slowest and fastest corner figures
and either the JFIF core is comparable in speed, or runs at about 50%—I
wish I knew which.
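To put a number on the spread between the two corners, the short Python sketch below computes the slow-to-fast Fmax ratio for each device, using only the figures from the JFIF results table. It says nothing about the Cast figures themselves; it simply shows that the slow corner runs at somewhere between roughly a half and 0.6 of the fast corner, which is why the choice of corner matters so much for the comparison.

```python
# (fast corner Fmax, slow corner Fmax) in MHz, copied from the results table.
FMAX_MHZ = {
    "Cyclone EP1C12-C6": (97, 51),
    "Cyclone II EP2C8-C6": (105, 51),
    "Cyclone III EP3C10-C6": (108, 64),
    "Stratix EP1S10-C5": (92, 52),
    "Stratix II EP2S15-C3": (152, 89),
}


def slow_to_fast_ratio(fast_mhz, slow_mhz):
    """Fraction of the fast corner Fmax that is achieved at the slow corner."""
    return slow_mhz / fast_mhz


def corner_ratios(table=FMAX_MHZ):
    """Map each device name to its slow/fast Fmax ratio."""
    return {dev: slow_to_fast_ratio(fast, slow) for dev, (fast, slow) in table.items()}


if __name__ == "__main__":
    for device, ratio in corner_ratios().items():
        print(f"{device:24s} slow/fast = {ratio:.2f}")
```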
Development Board
The diagram below shows an overview of the FPGA "jfif_test" environment. A block diagram gives the
major components, from the test driver software, communicating with the FPGA on the development
board. The FPGA design consists of a UART for communication with the host, a central controller
for managing data transfer between the host and an SRAM buffer, and between the buffer and
the JPEG decoder itself. Internal status is displayed on the seven segment LED display.
Figure 14: JFIF Hardware Setup
Underneath the block diagram is shown the connection diagram for the system. A PC or laptop is
used to configure the FPGA board via the USB blaster cable, which may also power the board
when connected, if the power supply is not used.
Once configured the PC is also used to run the driver code for communication
and data transfer with the board, via the USB blaster cable, using FTD2XX driver code that
communicates to the DE1 board's FT245BL USB to Parallel FIFO chip, ultimately connected to the
controller code in the FPGA, via the USB blaster JTAG outputs.
Via this interface 'chunks' of data are placed in the buffer and
then fed through the decoder at full speed, back into the buffer, with the resultant decoded
'chunks' then streamed back through the interface to the host.
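The chunked flow described above can be sketched on the host side as follows. This is a simplified illustration only, not the actual driver code: the function names, the chunk size and the process_chunk callable (standing in for "write the chunk to the buffer, run the decoder, read the result back") are all hypothetical, and a real decode would return more data than was sent.

```python
def split_into_chunks(data, chunk_size):
    """Split a byte string into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def run_through_buffer(data, chunk_size, process_chunk):
    """Send data chunk by chunk and collect whatever comes back.

    process_chunk is a stand-in for the board round trip: place the chunk in
    the buffer, feed it through the decoder, and stream the result back.
    """
    result = bytearray()
    for chunk in split_into_chunks(data, chunk_size):
        result += process_chunk(chunk)
    return bytes(result)


if __name__ == "__main__":
    # Loopback stand-in for the hardware: returns each chunk unchanged.
    echoed = run_through_buffer(b"example jpeg bytes", 8, lambda chunk: chunk)
    print(echoed)
```

Passing the round trip in as a callable keeps the chunking logic testable without the board attached; a loopback function can be swapped for the real transport later.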
Decoder Wrapper Logic
UART
Controller
Support Modules
Test Environment
Performance Results
Project Status
The JFIF project is far from complete, and much more verification needs to be
done, as well as further development of the FPGA development board environment. Below is
shown the current status against planned work. In summary, the hardware is
implemented and passing regression runs of turn-on directed testing. It has
been synthesised for timing closure purposes, but not yet targeted at the
FPGA development board.
C/C++ reference model
---------------------
Code development
iDCT completed
header extractor completed
Amplitude decoding completed
bitmap generation completed
Command line interface completed
Graphical output completed
Debug/bitmatch output completed
C model environment
Linux/cygwin makefiles completed
MSVC project completed
Coverage not started
Put under source code control completed
RTL Development
---------------
Header extractor implemented and passing directed tests
Huffman/RLE decode implemented and passing directed tests
Dequantise implemented and passing directed tests
De-zigzag implemented and passing directed tests
Core structural code implemented and passing directed tests
Top level synth wrapper completed
RGB/bitmap convertor (optional/external) not started
Additional header checks (optional) not started
Put under source code control completed
Verification Infra-structure
----------------------------
Simulation compile/run scripts
compile.sh completed
runsim.sh completed
regression.sh completed
Top level testbench code completed
C-model bit matching comparator completed
Put under source code control completed
Simulation Testing
------------------
Turn-on testing with directed test cases
colour YCC completed
monochrome completed
colour RGB (optional) not yet supported
single ECS completed
multiple ECSs
various reset intervals started
ZRL code (0xF0) completed
PAD code (0xFF00) completed
Sub-sampling
4:4:4 completed
4:2:2 completed
4:2:0 completed
4:4:0 (maybe?) not started
Out-of-order header sections not started
Error/unsupported case handling not started
Invalid marker not started
Unsupported marker not started
Unexpected marker not started
Scan error not started
Restart after error not started
Multiple jpeg decodes in single run not started
Random directed testing
All case files with random i/o timings started
Random files not started
Coverage not started
FPGA
----
Synthesis setup/constraints completed
Operational frequency for various targets completed
PC driver code implemented
Data input/output block completed
Interface to internal memory started
Regression test script not started
Run all directed test cases not started
Run random file testing not started
Documentation
-------------
Concepts completed
C model completed
Core RTL completed
Development Processes
Verification
test environment started
Testplan completed (see above)
Results and performance measurements
Data rates started
Variance from floating point iDCT not started
FPGA
FPGA test setup started
Synthesis completed
Fmax (various targets) completed
Resources started
Alternative area/speed trade-offs not started
Datasheet started
Upload to web started
Conclusions
Copyright © 2014 Simon Southwell
simon@anita-simulators.org.uk