An Efficient Scheme for Hardware Implementation of Processes with Multiple Active Instances

Bengt Svantesson, Shashi Kumar¹, Ahmed Hemani

Electronic System Design Laboratory

Royal Institute of Technology

Electrum 229, S-164 40 KISTA, SWEDEN

{bsv,shashi,ahmed}@ele.kth.se

¹ Visiting Professor from the Computer Science & Engineering Department, I.I.T. Delhi, INDIA

Abstract

Very large and complex digital systems can be described using communicating concurrent processes. Hardware synthesis of these systems is a recent area of research. This paper describes a scheme for hardware implementation of processes and inter-process communication. Specifically, our scheme is capable of efficiently handling processes with multiple active instances. A single copy of the hardware implementing a process is shared among its various instances, which leads to an efficient implementation. This scheme is being implemented as part of a system-level synthesis system that has SDL as its front-end language.

1.0 Introduction

Structural and functional hierarchy present in large and complex systems can be described and implemented using communicating concurrent processes. This has long been done when developing complex software systems. With “system on a chip” becoming a feasible option, including processor core(s), DSP processors, memories and random logic, implementation of concurrent processes in hardware has become important. In many important systems, multiple instances of processes may exist concurrently, and processes may be created and killed dynamically. For example, a telephone switch may be handling multiple calls (processes), and the calls may start and end dynamically. Unfortunately, the two most popular HDLs, VHDL and Verilog, cannot represent this situation in a natural manner. However, it is possible to describe and implement this situation in software using normal programming languages. The Specification and Description Language (SDL) allows specification of systems using static and dynamic processes [6]. It also allows multiple instances of processes to exist concurrently. Recently, researchers have started looking at hardware synthesis from system specifications in SDL [1][2][3][5] and other system specification languages [4].

A complex system may be implemented using a mix of hardware and software. A pure software solution may not be acceptable because of performance requirements, and a pure hardware solution may be too expensive. In such a system, we need to implement in hardware not only the processes but also inter-process communication and process management functions. In this paper, we describe a scheme for hardware implementation of a process. In particular, we describe how common hardware can be shared among multiple instances of a process. This problem has not been addressed by other researchers so far.

Two important issues must be handled for hardware implementation of systems with parallel processes. The first is the implementation of a process, and the second is the implementation of inter-process communication. Generally, a process can be represented as an Extended FSM (EFSM) and can therefore be synthesized using well-developed High Level Synthesis techniques [9]. There has been work in the area of hardware synthesis of communicating concurrent processes described in SDL [2][1] and VHDL [10]. All these researchers assumed statically declared processes and implemented a separate copy of hardware for each instance of a process. Hardware implementation of dynamic processes was considered prohibitively expensive and was therefore not pursued. To make hardware implementation of a process with multiple active instances cost-wise feasible, we need a scheme for sharing hardware resources among the multiple instances of the process. This sharing of resources brings in the requirement of hardware management of the multiple instances. It must be mentioned that in software implementations of systems, the CPU is shared by all the active processes. Lindh and colleagues [8] have proposed schemes for hardware implementation of real-time kernels to speed up real-time applications. In this paper, we describe a scheme in which multiple instances of a process can share a single hardware block implementing the EFSM corresponding to the process functionality.

The rest of the paper is organized as follows. In section 2, we describe the basic ideas of our scheme, the most important of which is the sharing of hardware resources among the various active instances of a process. In section 3, we discuss F4, the operations and maintenance functionality of an ATM switch, as a motivational example for our scheme. In section 4, we describe a generic hardware implementation of the proposed scheme. Section 5 concludes the paper by summarizing the advantages and limitations of our scheme and outlining areas for future work in this direction.

2.0 Overview of our Scheme

2.1 Important Issues

Hardware implementation of processes is motivated by the realization that it can give higher performance than software implementation. The price to be paid is in terms of cost. Direct hardware implementation [1][2] of processes requires separate hardware for each process, and if we extend this idea further, we require a separate copy of hardware for each instance of a process. Hardware implementation of processes having multiple instances, whether statically declared or dynamically generated, has not been considered earlier.

Another important issue is the implementation of inter-process communication when the processes are implemented in hardware. This issue has also been addressed by other researchers [1][2]. It becomes more complicated for systems whose processes have multiple active instances. This issue has been addressed for SDL processes in [5].

2.2 Basic Ideas

The basic idea of our strategy is to implement each process P with two hardware blocks, namely PCompute and PManagement. PCompute implements the computations and state transitions described in the process as an Extended FSM [6]. PCompute can basically be implemented using High Level Synthesis ideas and techniques [9]. PManagement implements the communication of P with other processes and the environment. If multiple instances of process P are active, PManagement is also responsible for scheduling the various instances of the process on a single copy of the PCompute hardware and for switching context between them. These ideas are borrowed from the design of Unix-type operating systems that allow multiple processes. However, there are important differences. There is a separate PManagement for every process in the system, rather than a single OS, and a PManagement has to manage only one type of process (multiple instances of the same process). Since PManagement is implemented in hardware, we must know the maximum number of instances of P that can be active in the system. Many system description languages, like SDL, allow a designer to specify the maximum number of instances that can be active during operation.
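Because PManagement is hardware, the set of instance slots it manages is fixed when the hardware is generated. The following Python sketch illustrates that constraint. The class `InstanceTable`, its methods and the lowest-free-slot allocation policy are hypothetical choices made for illustration. They are not part of the scheme as described.

```python
from typing import List, Optional


class InstanceTable:
    """Fixed set of instance slots for one process P (hypothetical model).

    In hardware, each slot would correspond to a valid bit plus reserved
    buffer space for that instance. The number of slots must therefore
    be known at specification time.
    """

    def __init__(self, max_instances: int) -> None:
        if max_instances < 1:
            raise ValueError("max_instances must be at least 1")
        self.valid: List[bool] = [False] * max_instances

    def create(self) -> Optional[int]:
        """Allocate the lowest free slot and return its instance number.

        Returns None when every slot is in use, since the hardware cannot grow.
        """
        for instance, in_use in enumerate(self.valid):
            if not in_use:
                self.valid[instance] = True
                return instance
        return None

    def kill(self, instance: int) -> None:
        """Release the slot of a terminated instance so it can be reused."""
        if not self.valid[instance]:
            raise ValueError(f"instance {instance} is not active")
        self.valid[instance] = False

    def active_count(self) -> int:
        return sum(self.valid)
```

A create request that arrives while all slots are busy simply fails. This is the behavioural reason the maximum instance count has to be part of the specification.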

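[Figure 1. Implementation of processes and inter-process communication. Each of the processes P1, P2 and P3 is implemented as a compute block (P1Compute, P2Compute, P3Compute) and a management block (P1Management, P2Management, P3Management), linked by an Inter-Process Communication Switch.]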

We are aware that a single copy of PCompute may become a performance bottleneck if the number of active instances of process P becomes very large. Although this possibility exists, we think that our simple solution is applicable to a large number of situations and can lead to much higher performance than pure software solutions. A system may be described using multiple instances of processes to specify parallelism in the system and/or for functional modularity and clarity. In practice, only a small subset of the created process instances may require hardware resources for computation at any time (the others may be waiting for input from other processes or the environment).

Figure 1 shows how a simple system with three processes is implemented in our scheme. Inter-process communication is handled by a block called the Inter-Process Communication Switch. This block is responsible for decoding the destination address and routing signals properly. It also makes it possible to share hardware interconnection resources, such as buses, among the various communicating pairs of processes. If the processes in a system are organized hierarchically in blocks, the Inter-Process Communication Switch may also be implemented as a tree of switches [5].
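The following Python sketch models the switch's two duties: decoding the destination address and sharing one interconnect among all communicating pairs. It assumes a single shared bus that moves at most one signal per cycle, with pending signals granted the bus in arrival order. The names `Signal` and `CommSwitch` and the string process addresses are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable


@dataclass(frozen=True)
class Signal:
    src: str    # address of the sending process (hypothetical naming scheme)
    dst: str    # address of the destination process
    name: str
    value: int


class CommSwitch:
    """Inter-Process Communication Switch modelled as one shared bus.

    Assumption for illustration: at most one signal crosses the bus per
    cycle, and waiting signals win the bus in arrival order.
    """

    def __init__(self, processes: Iterable[str]) -> None:
        self.pending: Deque[Signal] = deque()
        # One input port per destination process (feeds its PManagement).
        self.ports: Dict[str, Deque[Signal]] = {p: deque() for p in processes}

    def send(self, sig: Signal) -> None:
        # Decode the destination address before accepting the signal.
        if sig.dst not in self.ports:
            raise KeyError(f"unknown destination {sig.dst!r}")
        self.pending.append(sig)

    def cycle(self) -> None:
        """Transfer one pending signal over the shared bus, if any."""
        if self.pending:
            sig = self.pending.popleft()
            self.ports[sig.dst].append(sig)
```

A tree of switches, as mentioned above, would replace the single `ports` lookup with a per-level decode. The flat version is kept here for brevity.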

3.0 A Motivational Example

In this section, we illustrate the relevance of our scheme by describing F4, the operations and maintenance (OAM) functionality of ATM at the virtual path layer. F4 functionality can be classified into four types of tasks:

• Fault management: when the appearance of a fault is reported to F4, special OAM cells are generated and sent on all affected connections; if the fault persists, the management system should be notified.
• Performance monitoring: normal functioning of the network is monitored by continuous or periodic checking of the transmission of OAM cells.
• Fault localization: when a fault occurs, it may be necessary to localize it further. For this purpose, special loopback OAM cells are used.
• Activation/deactivation: a special protocol for activation and deactivation of OAM functions for performance monitoring.

One F4 block is present on each physical link connected to the ATM switch (see Figure 2). Since all connections are bidirectional, the F4 block has two inputs and two outputs, as shown in Figure 3. To perform its functions, the F4 block deals with specially marked ATM cells. These cells are referred to as OAM (Operations and Maintenance) cells. They are distinguished from user cells by dedicated values of the Virtual Channel Identifier (VCI), a field in the cell header.

[Figure 2. Location of F4 units in an ATM switch: one F4 unit on each physical link.]

A straightforward implementation of F4 would be to have one component or handler per virtual path. However, a more natural specification is to have a handler that can handle one path and to dynamically instantiate (or kill) a copy of it as the need arises. This dynamic array of handlers is fed by a single Receiver Unit that filters out OAM cells and routes them to the appropriate path handler. This is shown in Figure 3.
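The following Python sketch shows the Receiver Unit and its dynamic array of path handlers, keyed by the Virtual Path Identifier (VPI). The F4 OAM VCI values are taken from ITU-T I.610 [7]: VCI 3 for segment flows and VCI 4 for end-to-end flows. Creating a handler on the first OAM cell of a path is an assumption, since the text only says handlers are instantiated as the need arises. `PathHandler`, `ReceiverUnit` and `kill` are hypothetical names. In the scheme of this paper, all handler instances would share a single PCompute.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# VCI values that mark F4 (virtual path level) OAM cells in ITU-T I.610:
# 3 = segment flow, 4 = end-to-end flow.
F4_OAM_VCIS = {3, 4}


@dataclass
class Cell:
    vpi: int
    vci: int
    payload: bytes = b""


@dataclass
class PathHandler:
    """Hypothetical per-virtual-path OAM handler (one instance per VPI)."""
    vpi: int
    received: List[Cell] = field(default_factory=list)


class ReceiverUnit:
    def __init__(self) -> None:
        self.handlers: Dict[int, PathHandler] = {}  # live handler instances, by VPI
        self.user_cells: List[Cell] = []            # non-OAM cells for the Sender unit

    def receive(self, cell: Cell) -> None:
        if cell.vci not in F4_OAM_VCIS:
            self.user_cells.append(cell)  # user cell: passed through untouched
            return
        handler = self.handlers.get(cell.vpi)
        if handler is None:
            # First OAM cell seen on this path: instantiate a handler on demand.
            handler = PathHandler(cell.vpi)
            self.handlers[cell.vpi] = handler
        handler.received.append(cell)

    def kill(self, vpi: int) -> None:
        """Remove the handler of a path that no longer needs monitoring."""
        self.handlers.pop(vpi, None)
```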

We have specified the functionality of F4 in SDL using the multi-process scheme just described. Further, a hardware implementation is being worked out using the synthesis strategy described in [5]. This work is part of providing an SDL front end to our CMIST high-level synthesis system and the Bekka HW-SW co-design environment.

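[Figure 3. F4 block diagram. Inputs In1 and In2 enter a Receiver unit, which passes non-OAM cells to a Sender unit and OAM cells to the OAM handler component. The OAM handler component passes OAM cells on to the Sender unit, which drives outputs Out1 and Out2.]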

4.0 Hardware Implementation of PManagement

Figure 4 shows a schematic diagram of the implementation of a process with multiple active instances¹. As mentioned in section 2, the actual computations corresponding to the various instances of the process are carried out by sharing a single copy of PCompute. PCompute basically implements an EFSM corresponding to the process. It consists of hardware implementing the control part (FSM) and the data part (ALU and registers for storing local variables), together with their interconnections. PManagement has the following functions:

1. It manages the inputs coming from other processes for the various instances of the process. These inputs are buffered and forwarded to the FSM in PCompute to make the required state transitions.
2. The local variables and the FSM in PCompute hold the status of only the currently running instance of the process. The status of all other instances of the process is stored in PManagement.
3. PManagement also manages the creation of new instances of the process and the killing of existing instances.
4. It schedules a new instance on PCompute at an appropriate time. This involves switching context by saving the status of the running instance and loading the status of the new instance. We assume that a context switch is done only if the Input Data FIFO holds an input for an instance other than the currently running one. The scheduling scheme implements a first-come, first-served strategy.
5. PManagement also prepares outputs from the various instances of the process and routes them to other processes or the environment via the Inter-Process Communication Switch.

¹ Implementing a process with a single instance is a special case of this solution. In that case, most of the process management infrastructure disappears.

The functions of the PManagement block are implemented as shown in Figure 4. We assume that the designer specifies the maximum number of instances of the process that can be active at any time during the system's operation. This restriction is necessary for deciding the sizes of the various buffers. The components of PManagement are described below.

[Figure 4. Hardware implementation of PManagement. Blocks: Inter-Process Communication Switch, Input Data FIFO, Controller, State Buffer, Local Variable Buffer, Output Signal Handler, and PCompute (FSM with state register, ALU, local variables). Signals: input signals, output signals, control and status signals, result signals, and save/restore paths for the state and local variables.]

Input Data FIFO: A common FIFO buffer receives signals for the various instances of the process. Each input signal carries the identity of the instance along with the signal name and signal value. The number of words in the FIFO buffer depends on the maximum number of active instances of the process. It also depends on the rates at which the source processes produce input signals and the rate at which PCompute consumes (processes) them [5]. The width of the FIFO buffer depends on the input signal requiring the maximum number of bits for storing its value.
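The width rule can be made concrete with a small Python sketch. It assumes that each FIFO word packs three binary-coded fields: instance number, signal code, and a value field as wide as the widest input signal value. The field order and the class `FifoWordFormat` are hypothetical. The FIFO depth, which the text ties to production and consumption rates, is not modelled.

```python
from typing import Tuple


def bits_for(n: int) -> int:
    """Bits needed to give n items distinct codes, i.e. ceil(log2(n)); 0 if n == 1."""
    return (n - 1).bit_length() if n > 1 else 0


class FifoWordFormat:
    """Hypothetical layout of one Input Data FIFO word: instance | signal | value."""

    def __init__(self, max_instances: int, num_signals: int, max_value_bits: int) -> None:
        self.max_instances = max_instances
        self.num_signals = num_signals
        self.inst_bits = bits_for(max_instances)
        self.sig_bits = bits_for(num_signals)
        self.val_bits = max_value_bits  # set by the widest input signal value

    @property
    def width(self) -> int:
        return self.inst_bits + self.sig_bits + self.val_bits

    def pack(self, instance: int, signal: int, value: int) -> int:
        if not (0 <= instance < self.max_instances and 0 <= signal < self.num_signals):
            raise ValueError("instance or signal code out of range")
        if not 0 <= value < (1 << self.val_bits):
            raise ValueError("value does not fit in the value field")
        return (instance << (self.sig_bits + self.val_bits)) | (signal << self.val_bits) | value

    def unpack(self, word: int) -> Tuple[int, int, int]:
        value = word & ((1 << self.val_bits) - 1)
        signal = (word >> self.val_bits) & ((1 << self.sig_bits) - 1)
        instance = word >> (self.sig_bits + self.val_bits)
        return instance, signal, value
```

For example, 16 instances, 5 input signals and an 8-bit widest value give a word width of 4 + 3 + 8 = 15 bits.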

State Buffer: The State Buffer stores the FSM states of all the active instances of the process. The number of words in the State Buffer equals the maximum number of instances. The number of bits per word depends on the number of states in the FSM and on the state encoding. We assume that the maximum number of instances of every process is known at specification time.
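The following Python sketch computes the State Buffer dimensions under two common state encodings, binary and one-hot. The paper does not fix the encoding, so both are shown only as illustrative options. The function name `state_buffer_shape` is hypothetical.

```python
from typing import Tuple


def state_buffer_shape(max_instances: int, num_states: int,
                       encoding: str = "binary") -> Tuple[int, int]:
    """Return (words, bits per word) for a State Buffer holding one FSM state per instance."""
    if encoding == "binary":
        # ceil(log2(num_states)) bits; a single-state FSM needs no state bits
        width = (num_states - 1).bit_length() if num_states > 1 else 0
    elif encoding == "onehot":
        width = num_states  # one flip-flop per state
    else:
        raise ValueError(f"unsupported encoding {encoding!r}")
    return max_instances, width
```

With 32 instances of a 12-state FSM, this gives 32 words of 4 bits under binary encoding, or 32 words of 12 bits under one-hot encoding.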

Local Variable Buffer: This buffer stores the local variables of all the instances of the process. Its size depends on the number of instances and on the number and sizes of the local variables of the EFSM corresponding to the process.
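The size dependence can be sketched in Python for two possible memory organizations. Both are assumptions, since the paper does not specify the layout. In the bit-packed organization, variables are stored back to back. In the word-organized one, each variable gets its own word, as wide as the widest variable. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def packed_bits(max_instances: int, var_widths: Sequence[int]) -> int:
    """Bit-packed organization: every instance stores every local variable back to back."""
    return max_instances * sum(var_widths)


def word_organized_shape(max_instances: int, var_widths: Sequence[int]) -> Tuple[int, int]:
    """One word per local variable, each word as wide as the widest variable.

    Returns (words, bits per word).
    """
    return max_instances * len(var_widths), max(var_widths)
```

Take ten instances, each with variables of 8, 16 and 1 bits. The packed form needs 250 bits, while the word-organized form needs 30 words of 16 bits (480 bits). The word-organized form wastes storage on narrow variables but allows the simple address mapping used by the Controller.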

Controller: The Controller plays a key role in PManagement. It generates the required timing signals for the Input Data FIFO, the context switch, and the communication between PManagement and PCompute. It keeps track of active instances at all times. It also maps the various process instances to addresses in the State Buffer and the Local Variable Buffer. A simple direct mapping from instance number to address works in this case, since we reserve a fixed space in the State Buffer and the Local Variable Buffer for every possible instance. It also takes care of the creation and killing of process instances. The Controller can be implemented as an FSM.
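The direct mapping can be written out as follows. The sketch assumes the Local Variable Buffer reserves a fixed block of `vars_per_instance` words for each instance, with variables at consecutive offsets. All function names are hypothetical. The last function illustrates a standard hardware observation: when the block size is a power of two, the address is just the instance number concatenated with the variable index, so no multiplier is needed.

```python
def state_address(instance: int) -> int:
    """State Buffer: one word per instance, so the address is the instance number."""
    return instance


def local_var_address(instance: int, var_index: int, vars_per_instance: int) -> int:
    """Local Variable Buffer: a fixed block of vars_per_instance words per instance."""
    if not 0 <= var_index < vars_per_instance:
        raise ValueError("var_index out of range")
    return instance * vars_per_instance + var_index


def local_var_address_pow2(instance: int, var_index: int, index_bits: int) -> int:
    """Same mapping for a block of 2**index_bits words: bit concatenation only."""
    return (instance << index_bits) | var_index
```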

Output Signal Handler: The Output Signal Handler prepares signals that are to be sent to other processes. The addresses of the source and destination processes are appended to each signal before it is sent to the destination process through the Inter-Process Communication Switch.
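A minimal Python sketch of this step follows. It assumes that an address is a (process, instance) pair and that the handler queues stamped signals until the switch accepts them. The source instance is taken to be whichever instance is loaded in PCompute when the output is produced. `Address`, `OutSignal` and `OutputSignalHandler` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class Address:
    process: int   # process identifier (hypothetical numbering)
    instance: int  # instance number within that process


@dataclass(frozen=True)
class OutSignal:
    src: Address
    dst: Address
    name: str
    value: int


class OutputSignalHandler:
    """Stamps outputs of the shared PCompute with source and destination addresses."""

    def __init__(self, process_id: int) -> None:
        self.process_id = process_id
        self.queue: Deque[OutSignal] = deque()  # outputs waiting for the switch

    def emit(self, running_instance: int, name: str, value: int, dst: Address) -> None:
        # The source instance is the one the Controller has loaded into PCompute.
        src = Address(self.process_id, running_instance)
        self.queue.append(OutSignal(src, dst, name, value))

    def next_for_switch(self) -> Optional[OutSignal]:
        """Hand the oldest prepared signal to the Inter-Process Communication Switch."""
        return self.queue.popleft() if self.queue else None
```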

Context Switching: The data at the head of the FIFO buffer decides which instance of the process runs next on PCompute. If the data at the head of the queue is for the currently running instance, the data is given to PCompute and no context switch takes place. If the data is for another instance, a context switch takes place. The FSM state register and the registers storing local variables are saved in the State Buffer and the Local Variable Buffer. The state and local variables of the new instance are then restored into PCompute from the State Buffer and the Local Variable Buffer. This scheme of process scheduling results in a first-come, first-served strategy.
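The following behavioural Python sketch models the first-come, first-served context switching described above. It rests on several assumptions. The FIFO holds (instance, signal, value) tuples, and every instance starts in reset state 0 with zeroed local variables. The EFSM of PCompute is abstracted as a caller-supplied `step` function. `PManagement`, `run_one`, `context_of` and the toy `counter_step` EFSM are hypothetical names, and timing is not modelled.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# step(state, local_vars, signal, value) -> next state; may update local_vars in place
Step = Callable[[int, List[int], str, int], int]


class PManagement:
    """Behavioural sketch of FCFS context switching on one shared PCompute."""

    def __init__(self, max_instances: int, num_vars: int, step: Step) -> None:
        self.fifo: Deque[Tuple[int, str, int]] = deque()  # Input Data FIFO
        self.state_buffer = [0] * max_instances            # saved FSM states
        self.var_buffer = [[0] * num_vars for _ in range(max_instances)]  # saved locals
        # Registers of the shared PCompute; initially hold instance 0's reset context.
        self.running = 0
        self.state_reg = 0
        self.var_regs = [0] * num_vars
        self.step = step
        self.context_switches = 0

    def deliver(self, instance: int, signal: str, value: int) -> None:
        self.fifo.append((instance, signal, value))

    def run_one(self) -> bool:
        """Process the signal at the head of the FIFO; return False if it is empty."""
        if not self.fifo:
            return False
        instance, signal, value = self.fifo.popleft()
        if instance != self.running:
            # Save the running instance's context ...
            self.state_buffer[self.running] = self.state_reg
            self.var_buffer[self.running] = list(self.var_regs)
            # ... and restore the context of the instance at the FIFO head.
            self.state_reg = self.state_buffer[instance]
            self.var_regs = list(self.var_buffer[instance])
            self.running = instance
            self.context_switches += 1
        self.state_reg = self.step(self.state_reg, self.var_regs, signal, value)
        return True

    def context_of(self, instance: int) -> Tuple[int, List[int]]:
        """Current (state, local variables) of an instance, wherever it is held."""
        if instance == self.running:
            return self.state_reg, list(self.var_regs)
        return self.state_buffer[instance], list(self.var_buffer[instance])


def counter_step(state: int, local_vars: List[int], signal: str, value: int) -> int:
    """Toy EFSM: 'start' moves IDLE (0) to COUNTING (1); 'add' accumulates while COUNTING."""
    if state == 0 and signal == "start":
        return 1
    if state == 1 and signal == "add":
        local_vars[0] += value
    return state
```

Suppose five signals are queued for instances 0, 0, 2, 2 and 0, and the FIFO is then drained with `run_one`. Only two context switches occur, because consecutive signals for the same instance reuse the loaded context.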

As mentioned earlier in section 2, this scheme works well for the class of systems in which only a small subset of process instances require computation at any time. The scheme may not be able to give the required performance if this condition is not satisfied. We are currently working on a scheme to remove this limitation.

5.0 Conclusions

The scheme described in the previous section is being implemented as an environment for HW/SW codesign, with SDL as the front end for system specification. We have applied our scheme to implement small examples described in SDL, generating synthesizable VHDL code for them [5].

In this paper, we have described a scheme for hardware synthesis of systems described in terms of communicating concurrent processes. Our scheme can efficiently handle multiple instances of a process, whether they are dynamically created or statically declared. This problem has not been addressed by other researchers so far. This scheme is being incorporated into a large project that aims at adding an SDL front end to a High Level Synthesis system [9].

However, our scheme is efficient only if the shared hardware (PCompute) is comparable to or larger than the management overhead (PManagement). The second limitation is the performance (speed) degradation caused by a large number of process instances sharing PCompute. The problem is similar to poor response time on a multi-user computer system when the number of users (processes) becomes large. This limitation restricts our scheme to systems in which either the total number of instances is small or only a small fraction of process instances are executing at any time. An obvious, but nontrivial, extension of our work will be to allow more than one copy of the hardware (PCompute), so that the required speed performance can be achieved.
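A first-order area comparison makes the first limitation concrete. It is a simplified model, not one given in this paper. Let $N$ be the number of instances, $A_C$ the area of PCompute and $A_M$ the area of PManagement. The model ignores the Inter-Process Communication Switch, which both alternatives need, and treats one PCompute copy per instance as the replicated baseline.

```latex
\begin{aligned}
A_{\text{rep}} &= N\,A_C \qquad\qquad A_{\text{sh}} = A_C + A_M \\
A_{\text{sh}} < A_{\text{rep}} &\iff A_M < (N-1)\,A_C
\end{aligned}
```

For $N \ge 2$, any $A_M \le A_C$ satisfies the condition, which agrees with the requirement stated above. Under the same simplification, if $k$ instances compete for PCompute continuously, first-come, first-served scheduling gives each of them roughly $1/k$ of its throughput, before context-switch overhead. This is the speed limitation described above.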

6.0 References

[1] Jean-Marc Daveau, Gilberto Fernandes Marchioro, Carlos Alberto Valderrama, Ahmed Amine Jerraya, “VHDL generation from SDL specifications”, CHDL 97, April 1997.

[2] I.S. Bonatti and R.J. Figuerido, “An algorithm for the translation of SDL into synthesizable VHDL”, Current Issues in Electronic Modelling, Vol. 3, August 1995.

[3] O. Pulkkinen and K. Kronlöf, “Integration of SDL and VHDL for High Level Digital Design”, Proceedings of the European Design Automation Conference with Euro-VHDL, September 1992, pp. 624-629.

[4] Daniel D. Gajski, Frank Vahid, Sanjiv Narayan and Jie Gong, “Specification and Design of Embedded Systems”, PTR Prentice Hall, Englewood Cliffs, New Jersey 07632, ISBN 0 13 150731 1.

[5] S. Kumar, B. Svantesson, A. Hemani, “A Methodology and Algorithms for Efficient Synthesis from System Description in SDL”, Technical Report TRITA-ESD-1997-06, ESD Lab, KTH, Stockholm, Sweden.

[6] A. Olsen, O. Færgemand, B. Møller-Pedersen, R. Reed, J. R. W. Smith, “Systems Engineering Using SDL-92”, first edition, second impression, Elsevier Science B.V., ISBN 0 444 89872 7.

[7] ITU-T Recommendation I.610, “B-ISDN Operation and Maintenance Principles and Functions”.

[8] J. Stärner, J. Adomat, J. Furunäs and L. Lindh, “Real-Time Scheduling Coprocessor in Hardware for Single and Multiprocessor System”, Euromicro Control 1996, Prague, 1996.

[9] A. Hemani, B. Svantesson, P. Ellervee, “High-Level Synthesis of Control and Memory Intensive Communications System”, Eighth Annual IEEE International ASIC Conference and Exhibit, Austin, Texas, September 18-22, 1995, pp. 185-191.

[10] P. Eles, K. Kuchcinski, Z. Peng and M. Minea, “Synthesis of VHDL concurrent processes”, in Proc. of the European Design and Automation Conference (EDAC), pp. 540-545, 1994.
