FPGA Function Accelerator

Altera DE1 Development Board

In my third year, I took the Digital Systems Design module, which introduced the design of digital systems built from Field Programmable Gate Array (FPGA) devices, memory devices and microprocessors. For the coursework, we were tasked with accelerating the function shown below.

$$f(x) = \sum^N_{i=1} 0.5 \cdot x_i + {x_i}^2 \cdot \cos\bigg(\frac{x_i-128}{128}\bigg)$$
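As a point of reference, the function can be written as a few lines of plain software. This is only a sketch of the kind of unaccelerated baseline that hardware would be compared against, not the coursework code. It assumes the summation covers the whole expression for each element, and the names `term` and `f` are chosen here for illustration.

```python
import math


def term(x_i: float) -> float:
    """One term of the summation for a single input element x_i.

    Assumption: the sum applies to the whole expression, so each
    element contributes 0.5*x_i + x_i^2 * cos((x_i - 128) / 128).
    """
    return 0.5 * x_i + x_i ** 2 * math.cos((x_i - 128.0) / 128.0)


def f(xs) -> float:
    """Software reference: accumulate one term per input element."""
    total = 0.0
    for x_i in xs:
        total += term(x_i)
    return total
```

For an input of 128 the cosine argument is zero, so the term reduces to 0.5 · 128 + 128² = 16448, which makes a convenient sanity check.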

We implemented the digital system on an Altera Cyclone V FPGA development board using the Nios II embedded processor, programmed using the Quartus IDE. We extended the processor with a custom instruction backed by a hardware block that calculated a single iteration (one term) of the summation. To accelerate the computation, we used:

  • The CORDIC algorithm, which uses only shifts and adds, to compute the cosine
  • Dedicated floating-point operators in hardware to replace floating-point emulation built from fixed-point operations
  • Parallelism to reduce latency
  • Data and instruction cache tuning
  • Word length optimisation to meet the error bound with the fewest resources
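To illustrate the first item in the list, here is a minimal software sketch of rotation-mode CORDIC for the cosine. It is not the hardware design from the coursework: the 16 fractional bits, the 16 iterations, and the names `FRAC_BITS`, `ITERATIONS` and `cordic_cos` are all assumptions made for this example. Inside the loop, only shifts, adds and a table lookup are used, which is what makes the algorithm cheap in FPGA logic.

```python
import math

FRAC_BITS = 16   # assumed fixed-point fraction width (illustrative)
ITERATIONS = 16  # assumed number of CORDIC iterations (illustrative)
ONE = 1 << FRAC_BITS

# Precomputed rotation angles atan(2^-i) in fixed point.
# In hardware this would be a small constant lookup table.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The rotations scale the vector by a known gain, so start from 1/gain.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K = round(ONE / _gain)


def cordic_cos(theta: float) -> float:
    """Approximate cos(theta) for |theta| up to roughly 1.74 rad."""
    z = round(theta * ONE)  # remaining angle to rotate through
    x, y = K, 0             # pre-scaled unit vector along the x axis
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate anticlockwise by atan(2^-i).
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x / ONE  # convert back to floating point for inspection only
```

Basic CORDIC only converges for angles of magnitude up to about 1.74 rad. If the inputs were, say, 8-bit values in the range 0 to 255 (an assumption, not something stated here), the argument (x − 128)/128 would stay within [−1, 1) and no range reduction would be needed. Changing `FRAC_BITS` and `ITERATIONS` trades accuracy for resources, which is the same trade-off the word length optimisation item refers to.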

The final design achieved a 98.7% decrease in latency compared with the version without hardware acceleration.