(1/2) Adventures in RISC-V - the Angry Goose Initiative
Over the course of my last co-op term, I ended up joining a project started by a few of my classmates: the Angry Goose Initiative. This project aims to create an FPGA implementation of a fully Linux-capable RISC-V CPU, from scratch. I was already becoming increasingly interested in CPU architecture and RTL design because of my exposure to ASIC design during my two co-op terms at Untether, as well as an ongoing interest in an FPGA implementation of a Sharp SM83 8-bit CPU (the one from the original Game Boy; blog post to come). So, when I heard that my classmate (and coworker for the term) John was working on such a project, I was sold. This would turn out to be an immensely valuable learning experience over the course of the term, as well as a good excuse to keep in touch with some friends from class.
The project aimed to build the core in two main phases. The first was to build a fully functional C++ software emulation of the core (IRVE), in order to become familiar with the architecture and work out as many software kinks as possible. The second was to use the lessons learned from the emulator to build the FPGA implementation in SystemVerilog (LETC). When I joined, the software emulator was nearly complete and booting Linux, with just a few kinks left to iron out.
Because of my experience on the firmware team at Untether, John suggested that a good first step to get acquainted with the project would be to do some work on the SBI (Supervisor Binary Interface). For those who are not familiar, the SBI serves as the lowest-level software interface between the program code and the CPU core itself. It runs on bare metal, and has entry points that are triggered by the various exceptions and calls raised while the Linux kernel is running. All of the baseline functionality had already been implemented, but one piece was still missing that the hardware implementation would need in order to work.
Some CPU instructions that were supported by the software emulator would be omitted from the hardware implementation for simplicity. The RISC-V architecture suggests a clever way to handle this:
When the CPU encounters an exception, execution is vectored into the SBI. The SBI can then check the “machine trap cause” (mcause) control and status register (CSR) to find out what the exception was, and behave accordingly. The “machine exception program counter” (mepc) CSR can also be checked to determine the location in memory at which the offending instruction was encountered. When the exception is an illegal instruction exception, we can handle it completely in software before jumping back to the next instruction in the main code. The following steps are taken:
1. Preserve all current register values on the stack.
2. Load the offending instruction pointed to by mepc.
3. Decode the instruction in software, and determine the necessary operation, the source registers, and the destination register.
4. Compute the operation, and modify the register values on the stack to replicate the result the instruction would have produced.
5. Advance mepc past the illegal instruction (by 4 bytes for a standard 32-bit instruction), restore the register values from the stack, and jump to that location to continue execution.
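To make the steps above a little more concrete, here is a minimal C++ sketch of the decode-and-emulate part for the multiply instructions. It is not the project's actual SBI code: the `TrapFrame` struct and the `handle_trap` function are hypothetical names, and it assumes a 32-bit core whose registers have already been saved and whose offending instruction has already been loaded from mepc. The field positions and the illegal-instruction cause code (2) come from the RISC-V specifications.

```cpp
#include <cstdint>

// Hypothetical trap frame: the 32 integer registers as saved on the stack,
// plus the saved mepc value. The real SBI's layout may differ.
struct TrapFrame {
    uint32_t x[32];
    uint32_t mepc;
};

// mcause exception code for "illegal instruction" in the privileged spec.
constexpr uint32_t CAUSE_ILLEGAL_INSTRUCTION = 2;

// Returns true if the trap was an M-extension multiply that was emulated.
// `insn` is the 32-bit instruction word loaded from the address in mepc.
bool handle_trap(TrapFrame& frame, uint32_t mcause, uint32_t insn) {
    if (mcause != CAUSE_ILLEGAL_INSTRUCTION) return false;

    // R-type instruction fields.
    const uint32_t opcode = insn & 0x7F;
    const uint32_t rd     = (insn >> 7) & 0x1F;
    const uint32_t funct3 = (insn >> 12) & 0x7;
    const uint32_t rs1    = (insn >> 15) & 0x1F;
    const uint32_t rs2    = (insn >> 20) & 0x1F;
    const uint32_t funct7 = insn >> 25;

    // M-extension instructions use the OP opcode with funct7 = 0000001.
    if (opcode != 0x33 || funct7 != 0x01) return false;

    const uint32_t a = frame.x[rs1];
    const uint32_t b = frame.x[rs2];
    const int64_t  sa = static_cast<int32_t>(a);  // rs1 sign-extended
    const int64_t  sb = static_cast<int32_t>(b);  // rs2 sign-extended
    uint32_t result;

    switch (funct3) {
        case 0:  // MUL: low 32 bits of the product
            result = a * b;
            break;
        case 1:  // MULH: high 32 bits, signed x signed
            result = static_cast<uint32_t>(static_cast<uint64_t>(sa * sb) >> 32);
            break;
        case 2:  // MULHSU: high 32 bits, signed x unsigned
            result = static_cast<uint32_t>(
                static_cast<uint64_t>(sa * static_cast<int64_t>(b)) >> 32);
            break;
        case 3:  // MULHU: high 32 bits, unsigned x unsigned
            result = static_cast<uint32_t>(
                (static_cast<uint64_t>(a) * static_cast<uint64_t>(b)) >> 32);
            break;
        default:  // DIV/DIVU/REM/REMU are left out of this sketch
            return false;
    }

    if (rd != 0) frame.x[rd] = result;  // x0 is hardwired to zero
    frame.mepc += 4;                    // resume after the emulated instruction
    return true;
}
```

For example, the word 0x027302B3 encodes `mul x5, x6, x7`, so calling `handle_trap` with that word and a frame holding 6 and 7 in x6 and x7 leaves 42 in x5 and moves mepc forward by 4.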
This is a pretty sneaky way to handle unsupported instructions, and I love it. It’s exactly the kind of crackpot scheme I like to think I would come up with in a pinch. With a plan in place, it was simply a matter of getting the emulator up and running, building the toolchain, and getting to work on some low-level firmware.
One of the biggest challenges in getting my development environment up and running was cross-compiling with the right toolchain. RISC-V has a handful of optional instruction set extensions, and the toolchain can be built with any combination of them enabled. The name of the game was to build the SBI without the “M” extension (integer multiply and divide), since these are the instructions we would be emulating. Building without “M” causes the compiler to fall back on GCC’s built-in software routines for multiplication and division, which lets us make those calculations in software with ease. The code running above the SBI, of course, still had to be built with the “M” extension, so that it would attempt to execute multiplies and divides and actually cause the exceptions.
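For a feel of what multiplying "in software" means on a core with no multiplier, here is a minimal sketch of the classic shift-and-add approach. This is not GCC's actual routine, and `soft_mul32` is a hypothetical name. It only illustrates the kind of algorithm a compiler can substitute when the “M” extension is disabled.

```cpp
#include <cstdint>

// Hypothetical helper: multiply two 32-bit values using only shifts and adds,
// returning the low 32 bits of the product (the same result MUL produces).
uint32_t soft_mul32(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1) {
            result += a;  // this bit of b contributes a shifted copy of a
        }
        a <<= 1;  // next bit of b is worth twice as much
        b >>= 1;
    }
    return result;
}
```

Because unsigned arithmetic wraps modulo 2^32, the same routine gives the correct low word for signed operands as well.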
I ran into a handful of issues with building the right toolchain, as well as accidentally building with the wrong toolchain (due to sloppy path variable editing on my part). Another problem was that I was building my test program to run in M-mode, the same mode as the SBI. This caused the emulator to load the code into the same memory space as the SBI, which led to all kinds of wacky issues in the debugger, and it took me a while to realize what the problem was. With these kinks ironed out, I was finally able to drop into the SBI from my S-mode code, do what I needed to do, and jump back.
Decoding and executing the instructions was the easy part. I already had a decent amount of experience working with bit-fields and the like, so it was simply a matter of reading the instruction reference and emulating the right behavior. I was also able to look at the emulator’s source code to see how the team had implemented it. After some trial and error, and lots of debugging and benchmarking, I had things working nicely, with the code structured in a logical and modular way. Once John was happy with my work, I would be ready to move on to the deeper part of the project: RTL for LETC.
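As one example of why reading the instruction reference matters, the RISC-V divide and remainder instructions never trap, and instead define fixed results for division by zero and for signed overflow. The sketch below shows those rules for a 32-bit core. The function names are hypothetical and this is not the code from the commit.

```cpp
#include <cstdint>

// DIV: signed division, rounding toward zero.
int32_t emulate_div(int32_t a, int32_t b) {
    if (b == 0) return -1;                            // divide by zero: all bits set
    if (a == INT32_MIN && b == -1) return INT32_MIN;  // overflow: return the dividend
    return a / b;
}

// REM: signed remainder, with the sign of the dividend.
int32_t emulate_rem(int32_t a, int32_t b) {
    if (b == 0) return a;                     // divide by zero: return the dividend
    if (a == INT32_MIN && b == -1) return 0;  // overflow: remainder is zero
    return a % b;
}

// DIVU: unsigned division.
uint32_t emulate_divu(uint32_t a, uint32_t b) {
    return b == 0 ? UINT32_MAX : a / b;  // divide by zero: all bits set
}

// REMU: unsigned remainder.
uint32_t emulate_remu(uint32_t a, uint32_t b) {
    return b == 0 ? a : a % b;  // divide by zero: return the dividend
}
```

The explicit checks also keep the C++ itself well defined, since dividing by zero or computing `INT32_MIN / -1` directly would be undefined behavior.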
My contributions to the SBI are summed up in this commit.
Continued in part 2!