Internal, open access

Campus Grid

Introduction

The Reading Campus Grid is a Condor Pool consisting of between 200 and 500 nodes. It is provided by IT Services and SSE.  The aim of the Campus Grid is to provide researchers with a High Throughput Computing (HTC) resource.

HTC differs from High Performance Computing (HPC) in that it is designed to process tasks that require fairly short processing times (usually minutes or hours) but that need to be run hundreds or thousands of times.

For example, say you have a process that takes a sample of data from a data-set, analyses the sample and then stores a result. This process takes 30 minutes but needs to be done 1000 times, which requires 500 hours of processing time. On a single computer this task will take about 21 days to complete, whereas on the Campus Grid (using around 250 nodes) it can be done in about 2 hours!
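The arithmetic behind this example can be checked with a short sketch. It assumes the ideal case in which every task is independent, all 250 nodes stay available throughout, and scheduling overhead is ignored; the function name is purely illustrative.

```python
import math

def wall_clock_hours(n_tasks: int, minutes_per_task: float, n_nodes: int) -> float:
    """Estimate elapsed hours when independent tasks are shared across nodes."""
    # Each node works through its tasks one after another, so the node
    # with the most tasks determines the elapsed time.
    rounds = math.ceil(n_tasks / n_nodes)
    return rounds * minutes_per_task / 60

total_cpu_hours = 1000 * 30 / 60                           # total processing time
single_machine_days = wall_clock_hours(1000, 30, 1) / 24   # one computer
grid_hours = wall_clock_hours(1000, 30, 250)               # around 250 nodes

print(f"{total_cpu_hours:.0f} CPU hours, "
      f"{single_machine_days:.1f} days on one machine, "
      f"{grid_hours:.0f} hours on the grid")
```

In practice the elapsed time will be somewhat longer, because the number of available machines rises and falls and interrupted jobs have to be restarted.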

Tasks that take longer than an hour or so can be run on the Campus Grid but may require some alteration to make the most of the way that the Campus Grid works. This normally involves creating check-points of the task's progress so that it can restart from where it left off if interrupted (please see the Campus Grid User Guide).

To give you a better feel for what the Campus Grid can be used for, the Campus Grid Users page gives details of some of the work that is already using it.

Although the worker machines in the Condor pool natively run Windows XP, CoLinux is installed as a Windows service on these machines to provide the Linux environment in which Condor runs. This means that users can run their Linux executables on the pool without the need to dual-boot the worker machines.

We anticipate that the capacity of the Campus Grid will increase to include more desktop machines and to provide access to high-end resources such as clusters around the University. We are in the process of linking the Campus Grid with the Campus Grid at the University of Oxford (OxGrid) to provide a powerful shared facility for researchers at both universities.

 

Features and Limitations

Machine availability

As it stands, up to about 500 machines could be available for running your jobs.  The actual number of machines will rise and fall during the day as students use the machines, but overnight nearly all are available.

When a user logs on to a machine in person, the machine is no longer available to the Campus Grid, and any Campus Grid job running on it is stopped.  After a while Condor will try restarting stopped jobs on another machine.

Therefore, your code should either consist of many short runs or, if it runs for a longer time, keep a record of how far it has got so that it can resume from that point when it is restarted on another machine.
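As a rough illustration of keeping such a record, the sketch below saves its progress to a small file at regular intervals and resumes from that file if one exists. The file name, the JSON format and the summing loop are all assumptions made for the example; the Campus Grid User Guide remains the reference for how check-pointing should be done on the Grid itself.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical file name, chosen for this example

def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no check-point exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}

def save_checkpoint(state, path=CHECKPOINT):
    """Write to a temporary file and then rename it, so that an
    interruption during the write cannot leave a corrupt check-point."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def run(n_steps, path=CHECKPOINT, save_every=10):
    """Run steps from the last saved position up to n_steps."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        state["total"] += step            # stand-in for the real work
        state["next_step"] = step + 1
        if state["next_step"] % save_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

If the job is stopped and later restarted on another machine, calling `run` again with the same check-point file repeats at most the steps done since the last save, rather than starting from the beginning.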

Account and Access

To access the Campus Grid you need an account on the main server.  This will have a different username to your normal Reading account (see the Getting Access section).  Access can be through ssh with a username and password, or using the GSI-SSHTerminal and an e-Science Certificate.

This does mean that you'll need to transfer files to and from this account.

Filesystem

All the nodes in the Campus Grid have access to the N drive belonging to your Grid account.  This has the advantage that a file written on any node is immediately visible on the other nodes, although it does mean that you need to make sure that each code run writes its results to a different output file.
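One simple way to keep output files separate is to build the file name from a number that is unique to each run. The sketch below assumes that such a run number is passed to your program (for example, the process number that Condor assigns to each job, if your submit file passes it as an argument); the function names and the "results" directory are hypothetical.

```python
from pathlib import Path

def output_path(run_id: int, results_dir: str = "results") -> Path:
    """Build a per-run output file name such as results/run_0042.txt."""
    directory = Path(results_dir)
    # exist_ok=True means several runs can safely create the same directory.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"run_{run_id:04d}.txt"

def write_result(run_id: int, result: str, results_dir: str = "results") -> Path:
    """Write one run's result to its own file and return the file's path."""
    path = output_path(run_id, results_dir)
    path.write_text(result + "\n")
    return path
```

Because every run writes to a different file, no two nodes overwrite each other's results on the shared drive, and the files can be collected and combined once all the runs have finished.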

The Grid account's N drive appears as your home directory on the nodes.  Initially your Grid account will have a quota of 2 GB, but we realise that for some this will not be enough, so we can extend this up to a maximum of 10 GB if required.

Operating System

The Campus Grid runs the Linux operating system: CentOS release 5.2 (fully compatible with Red Hat Enterprise Linux 5) on the login node and Fedora Core 8 on the computational nodes.

Software

For information on the software available on the Campus Grid please see the pages about Using Matlab on the Campus Grid and Using R on the Campus Grid.

National Grid Service (NGS)

As well as the local methods for accessing the Campus Grid described below, the Campus Grid also supports access through the National Grid Service interfaces.  It is also possible to use the NGS from the Campus Grid, which means that it can be quick to move your code to the wider National Grid should you need a different type of resource. In addition, there are many applications available for immediate use on the NGS (see NGS Software for more details). Resources include:

  • Central Core Services: these four resources (Leeds, Manchester, Oxford and STFC-RAL) have 256 cores with 2 GB per core, arranged in dual-CPU dual-core and quad-CPU dual-core nodes. They are High Performance Resources and so are suited to parallel jobs where the different parts of the job must communicate.
  • Other cluster compute resources: at Cardiff (SGI), Lancaster (Linux), Glasgow (Linux x2), Westminster (Linux).
  • Condor Pools: Cardiff's Windows Condor Pool.

For more details of these resources see NGS Resources.

 

Getting Access

To access the Campus Grid you will need to email ITSHelp@reading.ac.uk in IT Services and request an account on the Campus Grid.

 

Using the Grid

Once you have your account you can find detailed instructions for using the Campus Grid in the Campus Grid User Guide.

Further useful sources of information are:

  • e-Research Documentation - the e-Research documentation page has links to Condor help on other sites.
  • Useful user tutorials from the European Condor Week: Condor User Tutorial and Advanced User Tutorial (please note that some of the features described may not work on the Campus Grid, but if they would be useful for your work please contact ITSHelp and we will try to enable them).
  • The full Condor Manual from the Condor Project
