Jaroslav Sýkora
Defense type
Ph.D.
Date of event
Venue
ČVUT
Mail
The roots of all evil are the latencies that are statically unpredictable. A dynamic schedule of operations,
constructed on the fly in data-driven machines, is needed to overcome them. Microthreading
is a unified data-driven and dynamically scheduled model for efficient programming of many-core
general-purpose processors. It overcomes unpredictable latencies in off-chip memories (DRAMs)
and in on-chip shared interconnect. As silicon chips became power-limited, causing the shift from
frequency scaling to many-core scaling, previous work envisioned large-scale homogeneous many-core
chips because it assumed that low-clock-frequency silicon is easily scalable in space. However,
contemporary and future power constraints will favour heterogeneous (specialized) rather than
homogeneous (general-purpose) many-cores because the thermal design power of a chip could be so
low that not all cores may be powered up simultaneously.
Besides the power issues, the other negative side effect of silicon scaling is an increase in the latency of
interconnect (metal wires) relative to that of gates: new designs are becoming limited by interconnect
delays. As the interconnect delays depend on details of the physical placement of modules in a chip
or in a reconfigurable array, they are difficult to predict accurately early in the design process.
Consequently, future hardware will be special-purpose and customized due to the power issues, and it
will be data-driven to overcome on-chip interconnect latencies.
This dissertation explores dataflow latency-tolerant techniques with a focus on customized hardware
design using reconfigurable hardware arrays. Dataflow is studied at the gate and chip levels:
gate-level dataflow overcomes on-chip interconnect delays, and chip-level dataflow allows for the
composition of scalable heterogeneous many-cores.
The first contribution is an analysis of a contemporary statically scheduled instruction-driven
architecture for customized computing realized in an FPGA. In contrast to the original design assumptions of
the architecture, it is shown here that high-frequency instruction issue is needed even in an architecture
with batch (vector-based) data processing. The second contribution is a method to achieve the high-frequency
instruction issue by using dictionary tables of instruction fragments.
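
The general idea of dictionary tables of instruction fragments can be illustrated with a minimal sketch. It is not taken from the dissertation: the three-field instruction format, the names `build_dictionaries` and `expand`, and the sample program are all hypothetical. Each wide instruction word is split into fields, every distinct fragment is stored once in a per-field table, and the program then holds only short table indices that are expanded back at issue time.

```python
# Hypothetical wide instruction words: (operation, source, destination).
PROGRAM = [
    ("vadd", "a", "b"),
    ("vmul", "a", "c"),
    ("vadd", "a", "b"),
]


def build_dictionaries(program):
    """Build one dictionary table per field and encode each word as indices."""
    n_fields = len(program[0])
    tables = [[] for _ in range(n_fields)]
    encoded = []
    for word in program:
        row = []
        for field, fragment in enumerate(word):
            if fragment not in tables[field]:
                tables[field].append(fragment)  # first occurrence: new entry
            row.append(tables[field].index(fragment))
        encoded.append(tuple(row))
    return tables, encoded


def expand(tables, row):
    """Rebuild a full instruction word from its per-field table indices."""
    return tuple(tables[field][index] for field, index in enumerate(row))


TABLES, ENCODED = build_dictionaries(PROGRAM)
```

In this sketch the encoded program stores one small index per field instead of the full fragment, so repeated fragments cost only a table lookup when the word is rebuilt.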
Statically scheduled data-paths used to be preferred because all latencies (including interconnect)
were assumed to be fully known early at design time. The third contribution is a new structured
and extensible approach for synthesis of hardware controllers from synchronous Petri nets. The fourth
contribution is a new technique for dataflow hardware synthesis from Petri nets. The technique is based
on augmented synchronous Petri nets with optimal throughput.
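
As a rough illustration of synchronous Petri net semantics (a generic textbook-style sketch, not the synthesis technique of the dissertation), the fragment below fires every enabled transition in the same clock step. It assumes a conflict-free net in which each place feeds at most one transition; the net `NET`, the place names, and the function names are hypothetical.

```python
# Hypothetical two-transition ring: one token circulates between p0 and p1.
NET = [
    {"name": "t0", "pre": {"p0": 1}, "post": {"p1": 1}},
    {"name": "t1", "pre": {"p1": 1}, "post": {"p0": 1}},
]


def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= count for place, count in pre.items())


def synchronous_step(marking, transitions):
    """Fire all transitions enabled in the current marking in one clock step.

    Assumes a conflict-free net, so simultaneous firing never overdraws a place.
    """
    firing = [t for t in transitions if enabled(marking, t["pre"])]
    next_marking = dict(marking)
    for t in firing:
        for place, count in t["pre"].items():
            next_marking[place] -= count  # consume input tokens
        for place, count in t["post"].items():
            next_marking[place] = next_marking.get(place, 0) + count  # produce
    return next_marking, [t["name"] for t in firing]
```

Enabling is evaluated against the marking at the start of the step, which mirrors how a clocked controller samples its state registers once per cycle.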
The fifth contribution is a technique that combines the data-driven microthreaded procedural
computation model with the special-purpose data-driven hardware in structurally programmed reconfigurable
arrays. Adaptive transparent migration of microthreads between the general-purpose and
special-purpose hardware is demonstrated.