Embedded Multicore Systems Using a Virtual Machine Approach
The purpose of the VirtuES project is (a) to create an efficient multicore hardware implementation for a currently popular programming language, Java, and (b) to develop an understanding of how a multicore platform should be programmed efficiently. This understanding will feed back into multicore hardware architecture design, Java virtual machine implementation, API-based language constructs and the design of parallel Java programs. Our project addresses three current trends: (i) the focus of processor hardware design is moving to multicore systems, (ii) Java continues to be a popular language for writing applications for embedded (mobile) systems, and (iii) writing efficient thread-level parallel programs for multicore systems is seen as very challenging, and there is no clear consensus on which parallel programming methods to use.
The starting point for the VirtuES project is a Java virtual machine partitioned so that the execution engine runs on a separate co-processor, apart from the portion that provides services and access to platform-dependent devices (disk systems, memory, network, etc.). As the execution engine we will use a modified version of the co-processor designed in a previous project, REALJava.
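The partitioning can be pictured as a request channel: the execution engine never touches devices directly, but forwards each platform-dependent operation to the host-side service portion and waits for the reply. The following is a minimal software sketch of that idea, assuming a simple queue-based channel; the class `PartitionedVm`, the `ServiceRequest` type and the operation names `"print"` and `"shutdown"` are hypothetical illustrations, not the REALJava interface, and the `BlockingQueue` only stands in for the real hardware link between co-processor and host CPU.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical model of a partitioned JVM: the execution engine (co-processor side)
 * never accesses devices itself; it forwards such operations to the host-side services.
 */
final class PartitionedVm {

    /** A request from the execution engine to the host-side services. */
    static final class ServiceRequest {
        final String operation;
        final String argument;
        final CompletableFuture<String> reply = new CompletableFuture<>();

        ServiceRequest(String operation, String argument) {
            this.operation = operation;
            this.argument = argument;
        }
    }

    // Stands in for the HW communication channel between co-processor and host CPU.
    private final BlockingQueue<ServiceRequest> channel = new LinkedBlockingQueue<>();

    /** Host side: runs on the general-purpose CPU and owns all platform-dependent access. */
    void startHostServices() {
        Thread host = new Thread(() -> {
            try {
                while (true) {
                    ServiceRequest req = channel.take();
                    switch (req.operation) {
                        case "print":
                            System.out.println(req.argument);
                            req.reply.complete("ok");
                            break;
                        case "shutdown":
                            req.reply.complete("bye");
                            return; // stop serving requests
                        default:
                            req.reply.completeExceptionally(
                                    new UnsupportedOperationException(req.operation));
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        host.setDaemon(true);
        host.start();
    }

    /** Engine side: blocks until the host has serviced the request. */
    String requestService(String operation, String argument)
            throws InterruptedException, ExecutionException {
        ServiceRequest req = new ServiceRequest(operation, argument);
        channel.put(req);
        return req.reply.get();
    }

    public static void main(String[] args) throws Exception {
        PartitionedVm vm = new PartitionedVm();
        vm.startHostServices();
        System.out.println(vm.requestService("print", "hello from the execution engine"));
        System.out.println(vm.requestService("shutdown", ""));
    }
}
```

Keeping the engine side free of device code is what allows it to be moved onto a co-processor; only the channel implementation would change.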
Our first goal is to introduce true hardware (HW) level parallelism to the system. This is done by adding more execution engines that execute Java threads in parallel with each other. As an extension of this, we aim to make our virtual machine multitasking, meaning that a single virtual machine would be able to run several Java programs, each with its own set of threads, at the same time. This would save a considerable amount of memory, as current implementations start a new virtual machine for each Java program running in the system.
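As a rough software analogy of both goals, a single VM instance can own a fixed set of execution engines and accept threads from several programs, keeping track of which program each thread belongs to. The sketch below assumes a thread pool as a stand-in for the HW execution engines; `MultitaskingVm`, `startThread` and `joinProgram` are hypothetical names chosen for illustration, and the engine count is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: one VM instance with a fixed number of execution engines
 * that runs the threads of several Java programs at the same time.
 */
final class MultitaskingVm {
    // Each pool worker stands in for one HW execution engine.
    private final ExecutorService engines;
    // Threads started so far, grouped by the program that owns them.
    private final Map<String, List<Future<?>>> programs = new ConcurrentHashMap<>();

    MultitaskingVm(int engineCount) {
        engines = Executors.newFixedThreadPool(engineCount);
    }

    /** Starts one Java thread of `program` on the next free engine. */
    void startThread(String program, Runnable body) {
        List<Future<?>> threads = programs.computeIfAbsent(program, p -> new ArrayList<>());
        synchronized (threads) {
            threads.add(engines.submit(body));
        }
    }

    /** Waits until every thread started so far by `program` has finished. */
    void joinProgram(String program) throws InterruptedException, ExecutionException {
        List<Future<?>> threads = programs.get(program);
        if (threads == null) return;
        List<Future<?>> snapshot;
        synchronized (threads) {
            snapshot = new ArrayList<>(threads);
        }
        for (Future<?> f : snapshot) f.get();
    }

    void shutdown() {
        engines.shutdown();
    }

    public static void main(String[] args) throws Exception {
        MultitaskingVm vm = new MultitaskingVm(4); // 4 engines, illustrative only
        AtomicInteger a = new AtomicInteger();
        AtomicInteger b = new AtomicInteger();
        for (int i = 0; i < 6; i++) {
            vm.startThread("programA", a::incrementAndGet);
            vm.startThread("programB", b::incrementAndGet);
        }
        vm.joinProgram("programA");
        vm.joinProgram("programB");
        vm.shutdown();
        System.out.println("A ran " + a.get() + " threads, B ran " + b.get() + " threads");
    }
}
```

This model runs each thread to completion on its engine, so threads that block waiting for each other could exhaust the engines; a real multicore VM would also need to schedule and switch threads on each engine.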
Our second research goal is to develop the programming methods used in multicore environments. Parallel algorithms are typically expressed via parallel loops, which implicitly create several threads and let the execution of each thread depend on its unique identity number. The efficiency of a parallel algorithm relies on semi-synchronous execution of threads, which can guarantee properties of the global state of computation that the algorithm depends on. Without such guarantees the threads might be forced to perform costly checks on the state of computation. Since modifying the Java language is not desirable, we consider providing a set of classes for expressing parallel loops and synchronization mechanisms. This enables parallel applications to run in ordinary Java environments while making it possible to use hardware-supported mechanisms for thread creation/handling and synchronization on multicore systems. Designing such an API is very challenging, since it must provide useful mechanisms for writing parallel programs while ensuring that those mechanisms can be efficiently supported in hardware on the multicore system.
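To make the idea of a class-based parallel loop concrete, here is a minimal sketch, assuming one thread per loop index and a barrier as the synchronization primitive. `ParallelLoop`, `parallelFor`, `Body` and `Sync` are hypothetical names, not the project's API; `java.util.concurrent.CyclicBarrier` is the standard library class and serves only as a software stand-in for HW-supported synchronization. The example algorithm, a Hillis-Steele inclusive prefix sum, shows why semi-synchronous execution matters: after each barrier every thread knows the whole previous step is complete, so no per-element checks on the global state are needed.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/**
 * Hypothetical library-style API for parallel loops: plain Java classes,
 * no language extensions. A multicore JVM could map Sync.barrier() to a HW barrier.
 */
final class ParallelLoop {

    /** Loop body: receives the thread's unique id and a barrier shared by all threads. */
    interface Body {
        void run(int id, Sync sync) throws Exception;
    }

    /** Barrier shared by all threads of one parallel loop. */
    static final class Sync {
        private final CyclicBarrier barrier;

        Sync(int parties) {
            barrier = new CyclicBarrier(parties);
        }

        /** Returns only when every thread of the loop has reached this point. */
        void barrier() throws InterruptedException, BrokenBarrierException {
            barrier.await();
        }
    }

    /** Runs body once per id in 0..n-1, each in its own thread, and waits for all of them. */
    static void parallelFor(int n, Body body) throws InterruptedException {
        if (n <= 0) return;
        Sync sync = new Sync(n);
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    body.run(id, sync);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
    }

    /** Inclusive prefix sum (Hillis-Steele), one thread per element, one barrier per step. */
    static int[] prefixSum(int[] input) throws InterruptedException {
        int n = input.length;
        int[][] buf = {input.clone(), new int[n]}; // double buffer: read one, write the other
        parallelFor(n, (id, sync) -> {
            int src = 0;
            for (int step = 1; step < n; step *= 2) {
                int[] in = buf[src], out = buf[1 - src];
                out[id] = id >= step ? in[id] + in[id - step] : in[id];
                // After the barrier the whole step is done, so the next step can read freely.
                sync.barrier();
                src = 1 - src;
            }
        });
        int steps = 0;
        for (int step = 1; step < n; step *= 2) steps++;
        return buf[steps % 2]; // the buffer written in the last step
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(prefixSum(new int[] {1, 2, 3, 4, 5})));
        // prints [1, 3, 6, 10, 15]
    }
}
```

One thread per element is only for illustration; a practical version would start one thread per core and let each thread handle a block of indices.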
The research will be carried out using FPGA demonstration boards and design tools from Xilinx for the hardware components and the GNU cross-compilation environment for the software components. These will be used to compose a complete embedded system consisting of hardware and software components, with an operating system running in the background. As the operating system we are going to use Linux, which is widely used in embedded systems in both the research community and industry. With this setup we can measure the actual impact of using a multicore virtual machine in an embedded system and also evaluate the benefits of the improved programming models developed within this project.
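One simple way to quantify that impact is to time the same multithreaded workload with 1, 2, 4 and 8 threads and report the speedup T1/Tn. The harness below is a minimal sketch under that assumption; `SpeedupBenchmark`, its CPU-bound loop and the iteration count are hypothetical placeholders for a real application benchmark. The warm-up run only matters on a JIT-compiling JVM.

```java
/**
 * Hypothetical micro-benchmark: times a CPU-bound multithreaded workload
 * with 1..8 threads and reports the speedup relative to one thread.
 */
final class SpeedupBenchmark {

    /** CPU-bound work split evenly over `threads` threads; returns the combined checksum. */
    static long runWorkload(int threads, long totalIterations) throws InterruptedException {
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        long chunk = totalIterations / threads;
        for (int t = 0; t < threads; t++) {
            final int id = t;
            final long start = id * chunk;
            // The last thread also takes the remainder of the division.
            final long end = (id == threads - 1) ? totalIterations : start + chunk;
            workers[t] = new Thread(() -> {
                long acc = 0;
                for (long i = start; i < end; i++) acc += i % 7;
                partial[id] = acc;
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long sum = 0;
        for (long p : partial) sum += p;
        return sum;
    }

    /** Wall-clock time in nanoseconds of one workload run. */
    static long timeNanos(int threads, long iterations) throws InterruptedException {
        long t0 = System.nanoTime();
        runWorkload(threads, iterations);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        long iterations = 100_000_000L;
        timeNanos(1, iterations); // warm-up so JIT compilation does not skew the baseline
        long base = timeNanos(1, iterations);
        for (int n = 1; n <= 8; n *= 2) {
            long t = timeNanos(n, iterations);
            System.out.printf("%d threads: %.1f ms, speedup %.2f%n", n, t / 1e6, (double) base / t);
        }
    }
}
```

The checksum is independent of the thread count, which gives a cheap sanity check that the parallel split does the same work as the sequential run.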
The project is organized into four main work packages. The first three will be carried out sequentially, while the fourth runs in parallel with the others. The work packages are: (1) creating a distributed Java Virtual Machine with multiple HW accelerators, (2) finding, analyzing and removing the inefficiencies in the multithreaded virtual machine, (3) implementing multitasking in the virtual machine and (4) creating a multicore programming model and API with HW acceleration support. The work packages are presented in more detail below, with a short summary of tasks at the end of each work package. The work packages are not assigned to any specific researcher, because active cooperation is required in each of them.
A multicore version of the REALJava co-processor has been successfully implemented on the Xilinx ML310 and ML410 platforms. Both platforms showed a significant gain in performance for multithreaded Java applications. The latest developments have moved the system to a new platform, a Virtex-5 based board from Avnet. The system currently supports 8 co-processor cores in parallel, limited only by the number of memory blocks in the FPGA device. Future work will improve the system and focus on the software techniques required to fully utilize the underlying multicore virtual machine.