ERCIM News No.45 - April 2001
Metacomputing Applications in All-optical Networks
by Thomas Eickermann, Helmut Grund, Wolfgang Ziegler and Lothar Zier
Today's commodity networking technology is one limiting factor for the performance of distributed applications with high-bandwidth communication. Another obstacle is the still rudimentary middleware layer for convenient setup, operation and steering of metacomputing applications. The proposed project RePhoNet aims to provide a testbed bringing together optical network technologies, advanced middleware for metacomputing and real-world applications. It will be funded by the German government and the DFN (German Research Network).
Following two other successful advanced networking projects of recent years (RTB-NRW and Gigabit Testbed West), RePhoNet is a new joint R&D project of 15 partners from industry, governmental research centres and universities: German Telekom, Siemens, GMD, FZJ, DLR, Caesar Bonn, the Universities of Bonn and Cologne, FH Rhein-Bonn-Sieg and others. It consists of three major building blocks:
- OptiNet, provision of the underlying photonic network technology of the testbed
- MetaComp, developing and implementing the essential technology to run metacomputing applications in the testbed
- a number of real world applications ranging from simulation of molecular dynamics to distributed virtual reality systems.
Most applications will use PC-clusters with shared-memory multi-processor nodes as compute resources. Six PC-clusters are available in the testbed today, distributed over five sites and having between 14 and 144 CPUs each. Each cluster has a fast internal Myrinet network. It is planned to interconnect the local Myrinet infrastructures with the optical network through special high-speed links.
Topology of the all-optical network testbed.
OptiNet will establish an optical network in the Bonn/Cologne region. An extension to Aachen and Darmstadt is planned for later. The network is based on optical switches from several vendors. Signalling will allow the dynamic, on-demand allocation of multiple lightpaths. In collaboration with MetaComp, the Meta-scheduler will be extended with features for optical signalling. Each of the optical connections will provide a capacity of several Gigabit/s with QoS guarantees. Various networking technologies in the end systems (eg ATM, Gigabit Ethernet, 10GigE, Myrinet) can be connected to the all-optical equipment in the network core.
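The on-demand allocation of lightpaths can be illustrated with a small sketch. It is not the signalling used in RePhoNet, which the article does not specify. It assumes a first-fit wavelength assignment in which a lightpath must use the same wavelength on every hop. The class name, the link list and the number of wavelengths per link are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


class LightpathAllocator:
    """Toy first-fit wavelength assignment (hypothetical, for illustration only)."""

    def __init__(self, links: List[Link], wavelengths_per_link: int) -> None:
        self.wavelengths = wavelengths_per_link
        # For every undirected link, the set of wavelength indices currently in use.
        self.in_use: Dict[Link, Set[int]] = {self._key(a, b): set() for a, b in links}

    @staticmethod
    def _key(a: str, b: str) -> Link:
        # Normalise the endpoint order so that (a, b) and (b, a) name the same link.
        return (a, b) if a <= b else (b, a)

    def allocate(self, path: List[str]) -> Optional[int]:
        """Reserve the lowest wavelength that is free on every hop of the path.

        Returns the wavelength index, or None if no common wavelength is free.
        Raises KeyError if the path uses a link that is not part of the network.
        """
        hops = [self._key(a, b) for a, b in zip(path, path[1:])]
        for w in range(self.wavelengths):
            if all(w not in self.in_use[hop] for hop in hops):
                for hop in hops:
                    self.in_use[hop].add(w)
                return w
        return None

    def release(self, path: List[str], wavelength: int) -> None:
        """Free a previously reserved wavelength on every hop of the path."""
        for a, b in zip(path, path[1:]):
            self.in_use[self._key(a, b)].discard(wavelength)


# Assumed two-link topology with two wavelengths per link.
net = LightpathAllocator([("Bonn", "Cologne"), ("Cologne", "Aachen")], wavelengths_per_link=2)
short_path = net.allocate(["Bonn", "Cologne"])
long_path = net.allocate(["Bonn", "Cologne", "Aachen"])
```

In this sketch the second request cannot reuse the first wavelength because it is already busy on the Bonn-Cologne hop, so it receives the next one on both hops. A request is rejected once no single wavelength is free along the whole path.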
MetaComp is active in two main areas: a metacomputing-enabled communication library (MetaMPI) and the Meta-scheduler. Additionally, MetaComp covers a survey of existing communication software and interconnect hardware, system management, and end-user advising.
MetaMPI will be ported to Myrinet/Intel clusters to allow efficient MPI communication between PC-clusters and other parallel computers. Apart from this work, it is necessary to generalize the communication model of MetaMPI to make optimal use of the different architectures of supercomputers and clusters: while supercomputers still communicate through dedicated router nodes, the planned Myrinet infrastructure allows direct point-to-point communication between all nodes of a distributed PC-cluster. In order to support MetaMPI's dynamic features (MPI-2 process spawning), an interface to the Meta-scheduler has to be implemented.
Using the Meta-scheduler requires that every PC-cluster run a local job scheduler that supports the RAA (Resource Allocation Protocol). This might be done either by porting and implementing GMD's EASY scheduler or by modifying and extending a suitable existing local scheduler. Another important task is the integration of a system that allows resource allocation on the networking level, i.e. scheduling of wavelengths, or binding a dedicated QoS to a job in the RePhoNet network.
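The need to allocate clusters and network resources together can be sketched as a search for a common free time slot. This is a minimal illustration under stated assumptions, not the Resource Allocation Protocol or the Meta-scheduler itself, neither of which is described here. It assumes that each local scheduler and the network reservation system can report their busy intervals. The function names and the resource names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # busy period [start, end) in minutes, end exclusive


def is_free(busy: List[Interval], start: int, duration: int) -> bool:
    """True if [start, start + duration) overlaps none of the busy intervals."""
    end = start + duration
    return all(end <= b_start or start >= b_end for b_start, b_end in busy)


def earliest_common_start(
    reservations: Dict[str, List[Interval]], duration: int, horizon: int
) -> Optional[int]:
    """Earliest start at which every resource is free for the whole duration.

    The earliest feasible start is either time 0 or the end of some busy
    interval, so only those candidates are checked. Returns None if the job
    does not fit before the planning horizon.
    """
    candidates = sorted({0} | {end for busy in reservations.values() for _, end in busy})
    for start in candidates:
        fits_horizon = start + duration <= horizon
        if fits_horizon and all(is_free(busy, start, duration) for busy in reservations.values()):
            return start
    return None


# Hypothetical busy intervals reported by two clusters and one lightpath.
reported = {
    "cluster_a": [(0, 30)],
    "cluster_b": [(20, 50)],
    "lightpath_a_b": [(45, 60)],
}
start = earliest_common_start(reported, duration=20, horizon=240)
```

Treating the lightpath as one more resource with its own busy intervals is a design choice of this sketch: it lets the same search cover compute nodes and wavelengths, which is the kind of combined allocation the paragraph above calls for.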
During the first phase of the three-year project the applications will focus on simulation and virtual reality systems. The goal of the simulation subproject (Molecular dynamics with proteins) is to run the first simulation of protein folding in an ion solution while breaking the nanosecond barrier, which has limited protein folding simulations until now. The RemoteCAVE subproject aims at building an infrastructure that interconnects museums and powerful scientific institutions, allowing museums and exhibitions to obtain graphics power on demand over the network. The approach is similar to allocating compute power over the net but has not been done before. The third subproject is a virtual multi-user environment with a large number of (distributed) participants, which could not be realized until now due to a lack of both compute power and network bandwidth. The RePhoNet project is expected to start in April 2001.
Link:
http://rephonet.gmd.de/
Please contact:
Thomas Eickermann, Helmut Grund, Lothar Zier - GMD
Tel: +49 2461 616596, +49 2241 14 2298, +49 2241 14 2943
E-mail: Th.Eickermann@fz-juelich.de, Helmut.Grund@gmd.de, Lothar.Zier@gmd.de