Like Hadoop, and unlike most DG systems, PAR is designed to be used exclusively on private resources. PAR's ideal scale is therefore smaller than what DG systems usually target, but this permits lower latency. For simplicity, PAR uses pull-driven task distribution. This removes the need for a complex software component (called a scheduler) and also allows smooth scaling even in large, dynamic and heterogeneous environments. In addition, PAR never requires administrator privileges and is only run on demand.
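In the pull model, the server merely holds a queue of pending commands and each worker asks for a new one whenever it becomes idle, so the load balances itself across machines of different speeds. A minimal sketch of this idea in Python (the function names are illustrative, not PAR's actual API):

    import queue, subprocess, threading

    def worker_loop(tasks):
        # Pull-driven: the worker requests work whenever it is idle,
        # so no scheduler has to guess each machine's speed or load.
        while True:
            try:
                cmd = tasks.get_nowait()
            except queue.Empty:
                return  # no work left: the worker simply stops
            subprocess.call(cmd, shell=True)
            tasks.task_done()

    tasks = queue.Queue()
    for cmd in ["echo task1", "echo task2", "echo task3"]:
        tasks.put(cmd)
    workers = [threading.Thread(target=worker_loop, args=(tasks,))
               for _ in range(2)]
    for w in workers:
        w.start()
    tasks.join()  # the server returns once all commands have run

A worker that joins late simply starts pulling from whatever work remains, which is what makes the scheme robust in dynamic environments.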
3 Example use
The first example experiment consists of computing the Alpha Carbon Root Mean-Square Deviation after optimal superposition, noted CαRMSDopt hereafter, on one thousand ab initio generated structures for the protein target 256B. Distances between proteins are computed using the software from Zhang and Skolnick (2004). The second experiment performs Molecular Replacement (MR), a method of solving the phase problem in X-ray crystallography using homologous structures, on a set of 192 decoys for the protein target 1m6t. We present the time elapsed with and without PAR. PAR in parallel mode uses several cores of a given computer, while the distributed mode uses distinct computers. The current implementation of PAR is known to work well with up to 16 CPUs in parallel mode and 64 CPUs in distributed mode.
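Concretely, the input given to PAR in the first experiment is a plain text file with one comparison command per line. A sketch of how such a file could be generated (the decoy file names and the TMscore binary of Zhang and Skolnick (2004) are given here for illustration; actual paths differ):

    # Hypothetical sketch: build the command file fed to PAR's server.
    # Each line compares one ab initio decoy to the native 256B structure.
    with open("commands.txt", "w") as out:
        for i in range(1, 1001):
            out.write("./TMscore decoy_%04d.pdb 256b.pdb > rmsd_%04d.out\n"
                      % (i, i))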
Prior to the timing experiments, the needed programs and data were copied to each machine by the user. During the experiments, PAR was started in server mode with a list of commands to execute. Workers were started soon after the server, but could have joined the computation later if we were not interested in the shortest completion time. The Unix 'time' command, averaged over two trials, was used to measure the real time spent by PAR to complete all tasks. Unlike previous job crushers, the PAR server's life cycle is tied only to the application's execution time (no Unix daemon is involved) and PAR runs only in user space.
Results are shown in Figure 1. The first bar is the real time elapsed when not using PAR. The second bar is the time spent when using PAR in parallel mode; the following bars are durations in distributed mode. On a CPU-intensive task using 16 CPUs, the speedup obtained by PAR can be as high as 14.01 in the parallel case and 15.54 in the distributed one. The lower performance of the parallel version is attributed to Python's limited support for multithreaded applications: the Python interpreter uses a global lock mechanism shared by all threads. The application scales remarkably well. The overhead due to communications between workers and the master is very small, which allows an effective use of the parallel hardware with minimum effort required on the user's side.
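The lock in question is CPython's Global Interpreter Lock (GIL): threads of a single interpreter cannot execute Python bytecode concurrently, so CPU-bound work scales across processes but not across threads. The contrast can be demonstrated with the standard library alone (an illustrative benchmark, unrelated to PAR's code base):

    import multiprocessing, threading, time

    def burn(n=10**7):
        # CPU-bound loop: threads serialize on the GIL, processes do not.
        while n:
            n -= 1

    def timed(make):
        jobs = [make(target=burn) for _ in range(4)]
        t0 = time.time()
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()
        return time.time() - t0

    if __name__ == "__main__":
        print("threads:   %.1f s" % timed(threading.Thread))
        print("processes: %.1f s" % timed(multiprocessing.Process))

This is also why the distributed mode, which runs one interpreter per machine, comes closer to the ideal speedup of 16.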
4 Future developments
PAR can be used on a network of Unix-like workstations. It can take advantage of a shared Network File System (NFS). However, because of poor NFS performance, data-intensive tasks should be computed on top of a Distributed File System (DFS). As DFSs are still rare even within clusters, we envisage plugging such functionality into PAR. A prototype has been implemented but is still at an experimental stage.
PAR should also integrate fault-tolerance policies, so that it can be used safely with more workers and over longer periods, while keeping the overhead minimal.
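One low-overhead policy would be to re-issue a task when the worker that pulled it has not reported a result within a deadline. A minimal sketch of such a policy (a hypothetical interface, not an implemented part of PAR):

    import time

    class RetryQueue:
        """Hand out tasks; re-issue any task not acknowledged in time."""
        def __init__(self, tasks, timeout=300.0):
            self.pending = list(tasks)   # not yet handed out
            self.inflight = {}           # task -> time it was handed out
            self.timeout = timeout

        def get(self):
            now = time.time()
            # Re-queue tasks whose worker is presumed dead.
            for task, t0 in list(self.inflight.items()):
                if now - t0 > self.timeout:
                    del self.inflight[task]
                    self.pending.append(task)
            if not self.pending:
                return None
            task = self.pending.pop(0)
            self.inflight[task] = now
            return task

        def ack(self, task):
            # Worker reported success: the task will not be re-issued.
            self.inflight.pop(task, None)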
Furthermore, compression could be added to speed up communications. Encryption would be similarly easy to add and would allow PAR to be used over untrusted networks.
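Both features amount to wrapping the messages exchanged between the server and the workers. Compression, for instance, is a few lines with Python's standard zlib module (a sketch of the idea, not PAR's wire format; encryption could wrap the same hooks, for example with the standard ssl module):

    import zlib

    def pack(message):
        # Compress a task description or result before sending it.
        return zlib.compress(message.encode("utf-8"))

    def unpack(data):
        return zlib.decompress(data).decode("utf-8")

    cmd = "./TMscore decoy_0001.pdb 256b.pdb"
    assert unpack(pack(cmd)) == cmd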
Finally, features can be added for large-scale experiments. For example, requesting groups of jobs instead of one at a time would lower the load on the server part (see the sketch after this paragraph). Allowing PAR to run both as a server and as a client would allow it to be deployed in layers, which could be used to connect several clusters together and increase scalability. Requests and contributions from users are also considered.
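Batching jobs only changes the granularity of the pull protocol: instead of returning one command per request, the server returns up to n of them. A sketch of this hypothetical interface:

    def get_batch(tasks, n=10):
        """Pop up to n pending commands in one request, reducing the
        number of round trips between a worker and the server."""
        batch = []
        while tasks and len(batch) < n:
            batch.append(tasks.pop(0))
        return batch

    todo = ["cmd%d" % i for i in range(25)]
    print(get_batch(todo))  # the first 10 commands leave the queue together

Each round trip saved this way directly reduces the load on the server, which matters once many workers poll it concurrently.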