Fermilab Short Term SDSS Data Distribution to the Collaboration

There are currently four methods of receiving or retrieving SDSS data from Fermilab.

All of these processes are described in great technical detail at http://sdss.fnal.gov:8000/~bclee/guru/data.html. Data transmission, the current workhorse of data distribution, is described below.

Data Transmission

Author: Brian Lee (additional scripts written by Gordon Richards and Jeff Munn are used in the pre-transmit prep script.)

Description

The automated data transmissions are currently the primary means of distributing data to collaboration members who want large amounts of it, and they will continue until the SX provides access to all data.

After the processing of each run is completed, Fermilab copies the resulting data to a disk available to all collaborators. From there, Fermilab transmits a subset of the data files to any collaborating institution that requests it. Fermilab announces what is available, and when the transfer will take place, approximately one day in advance. Each institution selects a standard subset of files to receive, and these files are then transferred to a disk on the institution's own machines.
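The per-institution subset selection described above can be sketched as a simple pattern filter. This is only an illustration: the file names, the glob patterns, and the function name `select_files` are hypothetical and are not taken from the actual transmission scripts.

```python
from fnmatch import fnmatchcase


def select_files(available, patterns):
    """Return the files in `available` that match any glob in `patterns`, in order."""
    return [name for name in available
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


# Hypothetical file names for one processed run (illustration only).
available = [
    "run0001-catalog.fit",
    "run0001-image.fit",
    "run0001-mask.fit",
    "run0001.log",
]

# Hypothetical standard subset chosen by one institution.
subset = ["*-catalog.fit", "*.log"]

selected = select_files(available, subset)
print(selected)
```

An institution with a slower connection would simply register a narrower list of patterns.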

The transfer is done via scp with encryption disabled and RSA authentication. Although the channel is clear text, nothing passed over the network should allow an eavesdropper to access the account: no passwords or keys are sent, only an encrypted challenge. Encryption cannot be used because it is CPU-intensive and the CPU resources available to the EAG are limited and shared. With the large data sets and high transfer rates involved, encrypting the stream would require dedicated multiprocessor machines.
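To see why an encrypted challenge does not expose the account on a clear-text channel, consider the toy sketch below. It assumes a textbook-sized RSA key and a SHA-256 response, both chosen purely for illustration; the real exchange differs in detail and uses far larger keys, and none of the function names come from the transmission scripts.

```python
import hashlib
import secrets

# Toy RSA key pair (textbook-sized; real keys are vastly larger).
N = 61 * 53   # public modulus (3233)
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1


def digest(challenge):
    """Hash a challenge value so the plain value never crosses the network."""
    return hashlib.sha256(str(challenge).encode()).hexdigest()


def server_make_challenge():
    """Server picks a random number and encrypts it with the user's PUBLIC key."""
    challenge = secrets.randbelow(N - 2) + 2
    return challenge, pow(challenge, E, N)


def client_respond(encrypted_challenge):
    """Client decrypts with its PRIVATE key and sends back only a hash."""
    return digest(pow(encrypted_challenge, D, N))


def server_verify(challenge, response):
    """Server compares the response with the hash of the challenge it generated."""
    return digest(challenge) == response


challenge, encrypted = server_make_challenge()
print(server_verify(challenge, client_respond(encrypted)))
```

Only `encrypted` and the hashed response travel over the network. The private exponent never leaves the client, and each challenge is random, so a recorded exchange cannot be replayed later. With a key this small the scheme is trivially breakable; the sketch shows the message flow, not a usable level of security.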

Status

The automated data transmissions have been performed since October of 1999, with continual improvements in transfer rate, error handling, and automation. The system is now quite robust and transmissions are regularly made by various members of the data processing team.

This process is limited by the available bandwidth between Fermilab and the collaborating institutions. For instance, Fermilab can distribute all data products for a full night's run to the University of Chicago in well under 24 hours. The same is not true of all institutions; many have slower connections and must therefore request smaller subsets of the data products.
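The bandwidth constraint is simple arithmetic: transfer time is data volume divided by sustained rate. The sketch below makes that explicit. The data volume and link speeds are made-up numbers for illustration, not measurements of any SDSS run or institution.

```python
def transfer_hours(data_gigabytes, megabits_per_second):
    """Hours needed to move `data_gigabytes` at a sustained `megabits_per_second`."""
    megabits = data_gigabytes * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / megabits_per_second / 3600


def fits_in_window(data_gigabytes, megabits_per_second, window_hours=24):
    """True if the transfer completes within `window_hours`."""
    return transfer_hours(data_gigabytes, megabits_per_second) <= window_hours


# Hypothetical example: 100 GB of data products over two different links.
for rate in (10, 2):
    hours = transfer_hours(100, rate)
    print(f"{rate} Mb/s: {hours:.1f} h, fits in 24 h: {fits_in_window(100, rate)}")
```

Under these assumed numbers the faster link finishes in about 22 hours, while the slower one would need more than four days, which is why such a site has to request a smaller subset.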

Improvements

Work on complete automation of the transfers continues. At present, some options and logging for the batch system are still left to the operator. The default command options are generated and logged automatically, and the operator enters the commands by hand, with the option of changing them.
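The "generate a default, log it, let the operator override it" workflow might look like the following sketch. The function names, the host and path, and the log format are all hypothetical; the real batch system's command options are not given here, so the default command is left as a bare scp invocation.

```python
import logging
import shlex


def default_command(host, files, dest_dir):
    """Build a default transfer command as an argument list (hypothetical layout)."""
    return ["scp", *files, f"{host}:{dest_dir}"]


def confirm_command(default, operator_input, log):
    """Log the default, then return the operator's edited command if one was typed."""
    log.info("default command: %s", shlex.join(default))
    chosen = shlex.split(operator_input) if operator_input.strip() else default
    if chosen != default:
        log.info("operator override: %s", shlex.join(chosen))
    return chosen


logging.basicConfig(level=logging.INFO)
log = logging.getLogger("transmit")

# Hypothetical destination and file list.
cmd = default_command("sdss@example.edu", ["run0001-catalog.fit"], "/data/incoming")

# Empty operator input means "accept the default as generated".
print(confirm_command(cmd, "", log))
```

Keeping the command as an argument list until the last moment avoids quoting mistakes, and logging both the default and any override leaves a record of what the operator actually ran.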

Since data transmissions are a temporary solution until the SX is able to provide access to all data, and the system is currently fully functional and robust, no further major changes or improvements beyond improved automation are planned.


Brian Lee / bclee@fnal.gov / (630) 840-6646
Last modified: Wed Jul 5 16:34:10 GMT 2000