The input and output options can be used separately or together,
although register_data requires copy_data. See the instructions
below on how to get output back to your computer.
- To create X jobs:
crab -create X
- For ORCA, this will create X jobs in chunks of "job_number_of_events".
- For CMSSW, this will create X jobs, each running on "files_per_job" files
(files_per_job may be the result of rounding events_per_job to an integer number of files).
or, to create all jobs:
crab -create all
- For ORCA, this will create all jobs in chunks of "job_number_of_events"
for the "total_number_of_events".
- For CMSSW, this will create all jobs, each running on all the
events in its "files_per_job" files, until "total_number_of_events" is reached
(again, files_per_job may be the result of rounding events_per_job).
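For ORCA, the number of jobs produced by crab -create all follows from simple ceiling division. The sketch below assumes each job receives job_number_of_events events and the last job takes whatever remains; the njobs function is hypothetical and not part of CRAB.

```shell
# Hypothetical helper (not part of CRAB): how many jobs are needed to cover
# total_number_of_events in chunks of job_number_of_events, assuming the
# last job simply takes the remaining events (ceiling division).
njobs() {
    total=$1    # total_number_of_events
    per_job=$2  # job_number_of_events
    echo $(( (total + per_job - 1) / per_job ))
}

njobs 1000 300   # 300 + 300 + 300 + 100 events, so this prints 4
```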
- The jobs are created in the project directory, in the subdirectory
crab_?_date_time
This directory contains:
- the executable (if specified) and the needed libraries in a tarball
- submission and steering scripts
- The grid job-ids are stored in the crab directory, in the file
log/submission_id.log
- This directory has to be passed to crab for every
subsequent command, as the last argument:
crab ..... -continue crab_?_date_time
If the crab directory is omitted, the most recently created one is used.
- As the crab directory has a cryptic name, it is possible
to link it to a simpler name in the project directory
and use the link instead of the full crab directory name.
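A minimal sketch of such a link, assuming the crab directory is crab_0_051113_123437 (the name used in the status example in this section) and choosing the arbitrary, hypothetical link name myanalysis:

```shell
# Run in the project directory: give the crab directory a readable alias.
# "myanalysis" is an arbitrary example name.
ln -s crab_0_051113_123437 myanalysis

# The link can then stand in for the crab directory, e.g.
#   crab -status -continue myanalysis
```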
- To submit X of the created jobs:
crab -submit X -continue crab_?_date_time
or, to submit all created jobs:
crab -submit all -continue crab_?_date_time
- CRAB provides an easy interface to check the status of the submitted jobs:
crab -status -continue crab_?_date_time
- The output gives a list of the submitted jobs, their
crab job-ids and their current status:
Ready: job is submitted to the grid
Scheduled: job is accepted by the farm publishing the requested dataset and is waiting for execution
Running: job is running
Done: job has finished with EXIT_CODE
Cleared: job output has been retrieved
Aborted: job has been aborted
Killed: job has been killed
and looks like:
crab. crab (version Version 1) running on Sun Nov 13 12:45:25 2005
crab. Working options:
scheduler          boss
job type           ORCA (or CMSSW, FAMOS)
working directory  ... crab_0_051113_123437/
crab. Using BOSS
INTERNAL_ID  STATUS     E_HOST            EXE_EXIT_CODE  JOB_EXIT_STATUS
1            Running    lxgate13.cern.ch
2            Scheduled  lxgate13.cern.ch
3            Scheduled  lxgate13.cern.ch
4            Scheduled  lxgate13.cern.ch
5            Scheduled  lxgate13.cern.ch
6            Scheduled  lxgate13.cern.ch
7            Scheduled  lxgate13.cern.ch
8            Scheduled  lxgate13.cern.ch
9            Scheduled  lxgate13.cern.ch
10           Waiting
>>>>>>>>> 10 Total Jobs
>>>>>>>>> 8 Jobs Scheduled
>>>>>>>>> 1 Jobs Running
crab. Log-file is ... crab_0_051113_123437/log/crab.log
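For larger tasks it can help to tally the job states rather than reading every row. The following is a sketch, assuming each job row starts with the numeric INTERNAL_ID followed by the STATUS column, as in the example output; count_states is a hypothetical helper, not a CRAB command.

```shell
# Hypothetical helper: count jobs per state in "crab -status" output read
# from standard input. Only rows whose first field is a number (the
# INTERNAL_ID) are counted; headers and summary lines are skipped.
count_states() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 2 { n[$2]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Example using the job rows from the status output above;
# prints "Running 1", "Scheduled 8" and "Waiting 1", one per line.
count_states <<'EOF'
1 Running lxgate13.cern.ch
2 Scheduled lxgate13.cern.ch
3 Scheduled lxgate13.cern.ch
4 Scheduled lxgate13.cern.ch
5 Scheduled lxgate13.cern.ch
6 Scheduled lxgate13.cern.ch
7 Scheduled lxgate13.cern.ch
8 Scheduled lxgate13.cern.ch
9 Scheduled lxgate13.cern.ch
10 Waiting
EOF
```

Assuming crab writes the table to standard output, the status command can also be piped straight into the helper, e.g. crab -status -continue crab_?_date_time | count_states.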
The standard error and standard output are always
retrieved using CRAB commands.
- To retrieve standard error and standard output:
crab -getoutput job-id -continue crab_?_date_time
where job-id can be a single crab job-id, a comma-separated list of job-ids
or a range of job-ids from the status check. To get the output of all jobs:
crab -getoutput -continue crab_?_date_time
- If you set return_data to true (1), this command
will also retrieve any output files specified by
"output_file".
- Jobs do not have to be retrieved all at once.
- Output does not have to be retrieved from the shell used for submission or status checking.
To use a different shell, simply repeat the setup procedure in it first.
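The job-id argument accepted by -getoutput (and likewise by -resubmit and -kill) can be pictured with the helper below, which expands a specification into individual ids. It assumes that a range is written as first-last, e.g. 5-7, and that list elements are separated by commas; expand_ids is hypothetical and only illustrates the notation.

```shell
# Hypothetical helper: expand a job-id specification such as "1,3,5-7"
# into one id per line. Assumes ranges are written as first-last.
expand_ids() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        if [ -z "$last" ]; then
            echo "$first"            # single job-id
        else
            i=$first                 # range: first..last inclusive
            while [ "$i" -le "$last" ]; do
                echo "$i"
                i=$((i + 1))
            done
        fi
    done
}

expand_ids "1,3,5-7"   # prints 1, 3, 5, 6 and 7, one per line
```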
- The stdout and stderr dumps are saved in the crab directory, in the subdirectory
res
where the output of an individual crab job is identified by its job-id:
ORCA_000001.stderr (or CMSSW_000001.stderr, FAMOS_000001.stderr)
ORCA_000001.stdout
ORCA_000002.stderr
ORCA_000002.stdout
ORCA_000003.stderr
ORCA_000003.stdout
ORCA_000004.stderr
ORCA_000004.stdout
ORCA_000005.stderr
ORCA_000005.stdout
ORCA_000006.stderr
ORCA_000006.stdout
ORCA_000007.stderr
ORCA_000007.stdout
ORCA_000008.stderr
ORCA_000008.stdout
ORCA_000009.stderr
ORCA_000009.stdout
ORCA_000010.stderr
ORCA_000010.stdout
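Because the file names follow the zero-padded pattern above, a short loop can report which jobs still lack retrieved stdout or stderr. This is a sketch assuming six-digit padding as in the listing; missing_output is a hypothetical helper, and its prefix argument selects ORCA, CMSSW or FAMOS.

```shell
# Hypothetical helper: list the job-ids whose .stdout or .stderr file is
# not yet present in the res subdirectory of a crab directory.
# Assumes the PREFIX_NNNNNN naming shown above (six-digit, zero-padded id).
missing_output() {
    dir=$1      # crab directory, e.g. crab_0_051113_123437
    prefix=$2   # ORCA, CMSSW or FAMOS
    count=$3    # number of jobs in the task
    i=1
    while [ "$i" -le "$count" ]; do
        base=$(printf '%s/res/%s_%06d' "$dir" "$prefix" "$i")
        if [ ! -f "$base.stdout" ] || [ ! -f "$base.stderr" ]; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}

missing_output crab_0_051113_123437 ORCA 10
```

The printed ids can be joined with commas and passed to crab -getoutput as a job-id list.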
If you stored your output on a storage element (copy_data or register_data), you must
retrieve the files specified by output_file manually:
- First verify that the file exists. For sites using Castor as the
storage element file system (such as CERN), ssh
to the server hosting your storage element and execute the command:
nsls -l directory
For example, at CERN:
nsls -l /castor/cern.ch/user/u/username/subdir
For sites using dCache as the storage element file system (such as FNAL),
ssh to the server hosting your storage element and use ls as usual.
For example, at FNAL:
ls /pnfs/cms/WAX/resilient/username/subdir
If you wish to verify without logging into the storage element
server, use edg-gridftp-exists for the gsiftp protocol and
srm-get-metadata for the srm protocol.
- edg-gridftp-exists using the gsiftp protocol:
edg-gridftp-exists gsiftp://storage_elementstorage_path/output_file_job#.ext
This command will print no output if the file exists and
will print an error if the file does not exist. If you're
not certain of the name of your file, you can use:
edg-gridftp-ls gsiftp://storage_elementstorage_path/ | grep RegExp
to look for files with names matching RegExp. The
edg-gridftp-ls command will list all contents of the
directory, but typically the output is quite large,
necessitating grep.
- srm-get-metadata using the srm protocol:
srm-get-metadata srm://storage_element:8443storage_path/output_file_job#.ext
Note that if storage_path contains a ? and you are issuing this
command from a c-shell, you must replace every ? with a \?.
The srmls command should do the same thing as edg-gridftp-ls, but,
at the time of writing, this command does not work.
For now, you will need to know the exact name of any files sent to
dCache storage elements when using the srm protocol.
If your files don't exist, check the stdout file(s) from the crab -getoutput command
stored in the res subdirectory. Storage element output is near the bottom
of the file. If you set register_data to true (1), you
may still be able to retrieve your data (see below).
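The per-job names and URLs used in these checks can be assembled mechanically. The sketch below assumes that an output_file entry name.ext is written as name_<job>.ext (matching the output_file_job#.ext placeholder) and that storage_path starts with a slash; the function names and the host se.example.org are hypothetical. csh_escape applies the c-shell rule above by turning every ? into \?.

```shell
# Hypothetical helpers for building the per-job file name and the URLs used
# by edg-gridftp-exists / srm-get-metadata. Assumes "name.ext" is stored
# as "name_<job>.ext" and that storage_path begins with "/".
job_file() {     # output_file job
    printf '%s\n' "${1%.*}_${2}.${1##*.}"
}

gsiftp_url() {   # storage_element storage_path output_file job
    printf 'gsiftp://%s%s/%s\n' "$1" "$2" "$(job_file "$3" "$4")"
}

srm_url() {      # storage_element storage_path output_file job
    printf 'srm://%s:8443%s/%s\n' "$1" "$2" "$(job_file "$3" "$4")"
}

csh_escape() {   # replace every ? with \? for use from a c-shell
    printf '%s\n' "$1" | sed 's/?/\\?/g'
}

gsiftp_url se.example.org /store/user/username histo.root 3
# gsiftp://se.example.org/store/user/username/histo_3.root
```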
- Next, retrieve your data. This can be done simply at all sites. ssh to
the server hosting your data. If the server uses Castor, use rfcp just as
you use the cp command. For example, at CERN:
rfcp /castor/cern.ch/user/u/username/subdir/output_file_job#.ext destination
If the server uses dCache, use dccp just as you use the cp command.
For example, at FNAL:
dccp /pnfs/cms/WAX/resilient/username/subdir/output_file_job#.ext destination
If you wish to retrieve your data without logging into the storage element
server, use lcg-cp for the gsiftp protocol and srmcp for the srm protocol.
- lcg-cp using the gsiftp protocol:
lcg-cp --vo cms gsiftp://storage_elementstorage_path/output_file_job#.ext file:////`pwd`/localfilename.ext
- srmcp using the srm protocol:
srmcp srm://storage_element:8443storage_path/output_file_job#.ext file:////`pwd`/localfilename.ext
Attention: If you used register_data and the copy to the storage element and path
you specified failed, register_data will attempt to copy your output to a different
storage element. You can attempt to salvage your data using lcg-cp, the LFN
you specified and the variable lfc_home set in your crab.cfg file (typically /grid/cms):
lcg-cp --vo cms lfn:lfc_home/lfn_dir/output_file_job#.ext file:////`pwd`/localfilename.ext
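When many jobs wrote output, it can be convenient to generate the srmcp commands in a loop and inspect them before running anything. This dry-run sketch only prints the commands, in the same form as the srmcp example above; it assumes per-job files are named name_<job>.ext, and print_srmcp and every argument value in the example are hypothetical placeholders.

```shell
# Dry-run sketch: print (do not execute) one srmcp command per job.
# Assumes per-job files are named name_<job>.ext; all argument values
# in the example call are placeholders.
print_srmcp() {   # storage_element storage_path output_file njobs
    i=1
    while [ "$i" -le "$4" ]; do
        f="${3%.*}_${i}.${3##*.}"
        printf 'srmcp srm://%s:8443%s/%s file:////%s/%s\n' \
            "$1" "$2" "$f" "$PWD" "$f"
        i=$((i + 1))
    done
}

print_srmcp se.example.org /store/user/username histo.root 3
```

Once the printed commands look right, they can be executed by piping them to sh.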
- Finally, delete your data. Again, this can be done simply at the site.
For Castor sites, use the command rfrm just as you would use rm. At CERN:
rfrm /castor/cern.ch/user/u/username/subdir/output_file_job#.ext
For dCache sites, use rm. At FNAL:
rm /pnfs/cms/WAX/resilient/username/subdir/output_file_job#.ext
If you wish to delete without logging into the site, you can use srm-advisory-delete for the srm protocol.
There is currently no safe way to delete unregistered files using lcg tools
(for the gsiftp protocol).
Attention: If you used register_data, you should not
delete your files using rfrm/rm or srm-advisory-delete. Instead, use lcg-del,
the LFN you specified and the variable lfc_home set in your crab.cfg file
(typically /grid/cms). This will ensure the catalog is updated with the
deletion of the file.
lcg-del --vo cms -a lfn:lfc_home/lfn_dir/output_file_job#.ext
This command attempts to delete the LFN, the GUID and the file itself. It is usually
successful at deleting the LFN and GUID, but occasionally will not delete the file.
If you know where the file is stored, check if it was deleted after issuing this
command; if it wasn't, delete it manually.
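For a dCache site, the manual check after lcg-del can be as simple as the sketch below, run on the storage element server with a /pnfs path like the FNAL one above. remove_leftover is a hypothetical helper; at Castor sites the same pattern would use nsls and rfrm instead of test and rm.

```shell
# Hypothetical helper for dCache sites: after lcg-del, remove the physical
# file only if it is still present. The argument is a placeholder such as
# /pnfs/cms/WAX/resilient/username/subdir/output_file_job#.ext
remove_leftover() {
    if [ -e "$1" ]; then
        echo "still present, removing: $1"
        rm "$1"
    else
        echo "already deleted: $1"
    fi
}
```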
- If the status of the jobs in question is "Killed", "Aborted", or "Done" with an
"Exit_Code" other than 0, they can be resubmitted immediately with the same
executable and the same settings by:
crab -resubmit job-id -continue crab_?_date_time
where job-id can be a single crab job-id, a comma-separated list of job-ids
or a range of job-ids from the status check. To resubmit all "failed" jobs at once
(ignoring jobs with "Done" or "Cleared" status):
crab -resubmit -continue crab_?_date_time
- IMPORTANT: if the status is "Done" and the "Exit_Code" is not 0, the job output
must be retrieved before the job can be resubmitted.
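To resubmit only selected jobs, the eligible ids can be picked out of the status table. This sketch assumes rows laid out as INTERNAL_ID STATUS E_HOST EXE_EXIT_CODE JOB_EXIT_STATUS, so that for a Done job the exit code is the fourth field, and it treats EXE_EXIT_CODE as the "Exit_Code" mentioned above; resubmit_candidates is hypothetical.

```shell
# Hypothetical helper: read "crab -status" output on standard input and
# print a comma-separated list of job-ids that qualify for resubmission:
# Killed or Aborted jobs, and Done jobs with a non-zero exit code.
# Assumes the exit code of a Done job is the fourth field of its row.
resubmit_candidates() {
    awk '$1 ~ /^[0-9]+$/ {
             if ($2 == "Killed" || $2 == "Aborted" ||
                 ($2 == "Done" && $4 != "" && $4 != "0"))
                 ids = ids (ids == "" ? "" : ",") $1
         }
         END { print ids }'
}

printf '1 Done host 0 0\n2 Done host 1 0\n3 Aborted\n4 Running host\n' |
    resubmit_candidates   # prints 2,3
```

Any Done jobs in the resulting list still need crab -getoutput before the list is passed to crab -resubmit.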
- To kill submitted jobs:
crab -kill job-id -continue crab_?_date_time
where job-id can be a single crab job-id, a comma-separated list of job-ids
or a range of job-ids from the status check.