DPE 2_4 Production Operations Manual

(MOP for CMKIN/CMSIM/OSCAR and Hit with Condor_G Match-Making)

Quick Reference: General Setup:

Please carefully read all Notes and Important notes on this page.


1. Install DPE-Master 2_3 using pacman, as specified on the DPE page.
2. This creates a setup.sh file in your prod-ops directory with the following contents:
export PROD_OPS=`pwd`
export DPE_PATH=<<DPEClientPath>>
export GLOBUS_PATH=$DPE_PATH/globus
source $DPE_PATH/setup.sh
export MOP_DIR=$PROD_OPS
#
##MOP Staging Setup
#
#Stage-in
#
# Set this flag to true if you want your input data files to be staged in directly from dCache
export MOP_DCACHE_STAGEIN_FLAG=true
#
#Provide the dCache host and Path of the input files.
#Example: For assignment 3921, the input fz files were generated by assignment 3417
export MOP_DCACHE_STAGEIN_HOST=cmsgridftp.fnal.gov
export MOP_DCACHE_STAGEIN_PATH=/prod/PCP/3417/data         <-------- pre-existing directory (in dCache; mind the mount point, /pnfs/cms for FNAL)
#
#Stage-out
#
export MOP_LOG_STAGEOUT_HOST=cmsgridftp.fnal.gov
export MOP_DATA_STAGEOUT_HOST=cmsgridftp.fnal.gov
export MOP_MASTER_HOST=`hostname`
export MOP_LOG_GUC_DIR=/prod/PCP/validation/test/anzar/log        <-------- pre-create this directory (in dCache; mind the mount point)
export MOP_DATA_GUC_DIR=/prod/PCP/validation/test/anzar/data    <-------- pre-create this directory (in dCache; mind the mount point)
#
## McRunjob setup
#
export PROD_RESOURCES=$PROD_OPS/McRunjob/cms/ImpalaLite
export TrackingPath=$PROD_OPS/McRunjob/cms/ImpalaLite/cms_db
export MC_RUNJOB_PATH=$PROD_OPS/McRunjob/mcj_scripts/IMPLite
export localCacheArea=$PROD_OPS/McRunjob/mcj_scripts/IMPLite/localCache     <--------- pre-create this directory
export PYTHONPATH=$PROD_OPS/mop_submitter:$PROD_OPS/McRunjob/py_script
export commonOutDir=$PROD_OPS/IGT-tests/commonOutDir
export MOP_MATCHMAKER_URL=<<match-maker-hostname>>:/jobmanager-condor         <<---------------  This is a new variable

Note: In general the match-maker-hostname is the same as the MOP master host. One could also use plain jobmanager (fork), but using jobmanager-condor (condor) is highly recommended, even if that means setting up condor for the local host only.
3. Source setup.sh
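
Once setup.sh has been sourced, a quick check can catch unset variables or leftover placeholders before any jobs are created. The following is a minimal Python sketch and not part of DPE or MOP: the helper name check_setup is hypothetical, and treating every variable in the list as required is an assumption based on the setup.sh listing above (the dCache stage-in variables are left out because they depend on the stage-in flag).

```python
import os

# Variables exported by the setup.sh listing; treating all as required is an assumption.
REQUIRED_VARS = [
    "PROD_OPS", "DPE_PATH", "GLOBUS_PATH", "MOP_DIR",
    "MOP_LOG_STAGEOUT_HOST", "MOP_DATA_STAGEOUT_HOST", "MOP_MASTER_HOST",
    "MOP_LOG_GUC_DIR", "MOP_DATA_GUC_DIR",
    "PROD_RESOURCES", "TrackingPath", "MC_RUNJOB_PATH",
    "localCacheArea", "PYTHONPATH", "commonOutDir", "MOP_MATCHMAKER_URL",
]

# Local directories that must already exist (localCacheArea is marked "pre-create").
LOCAL_DIR_VARS = ["localCacheArea"]


def check_setup(env=None):
    """Return a list of problems found; an empty list means the check passed."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append("%s is not set" % name)
        elif "<<" in value:
            # e.g. <<DPEClientPath>> or <<match-maker-hostname>> was never filled in
            problems.append("%s still contains a <<placeholder>>" % name)
    for name in LOCAL_DIR_VARS:
        path = env.get(name, "")
        if path and not os.path.isdir(path):
            problems.append("%s points to a missing directory: %s" % (name, path))
    return problems
```

Calling check_setup() with no arguments inspects the current environment; printing the returned list shows what still needs attention. The dCache directories are not checked here because they live on the remote storage system.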

4. If you are user "X", make sure your DN is mapped to your user-id "X" in the grid-mapfile on the gatekeeper specified by MOP_MATCHMAKER_URL above (in general, the MOP Master). This is not required on other worker nodes. Contact Yujun Wu (yujun@fnal.gov) or the VOMS manager to make sure this is done on your MOP Master. If this condition is not met, the match-maker might not function.

5. cd mop_submitter and run
./mop_matchmaker_monitor.py --scheduler --group_size=20
This starts the Condor_G Match-Maker. Verify by running condor_q. This process should always be running.
Take special care when using wildcards with condor_rm, so that you do not remove this process too. If that happens, just restart it as described here.
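
To lower the risk of removing the match-maker by accident, the list of job IDs handed to condor_rm can be filtered first. The sketch below works on captured condor_q text. It assumes that each job line starts with an ID of the form cluster.proc and that the match-maker line contains the script name mop_matchmaker_monitor; both are assumptions, so check what condor_q actually prints at your site. The function names are hypothetical.

```python
import re

# A condor_q job line is assumed to start with an ID such as "12.0".
JOB_ID = re.compile(r"^\d+\.\d+$")


def job_ids(condor_q_text, pattern=None):
    """Return job IDs found in condor_q output, optionally only lines containing pattern."""
    ids = []
    for line in condor_q_text.splitlines():
        fields = line.split()
        if not fields or not JOB_ID.match(fields[0]):
            continue  # header, summary, or blank line
        if pattern is None or pattern in line:
            ids.append(fields[0])
    return ids


def removable_jobs(condor_q_text, keep_pattern="mop_matchmaker_monitor"):
    """Return job IDs that can be removed while leaving the match-maker job alone."""
    keep = set(job_ids(condor_q_text, keep_pattern))
    return [job for job in job_ids(condor_q_text) if job not in keep]
```

The IDs returned by removable_jobs can then be passed to condor_rm explicitly instead of using a wildcard.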

6.  Make sure the site files (in the mop_submitter/site-info directory) are UP TO DATE. Each site is now represented ONLY by a <site>.vars file (also read the note below on backward compatibility). Each file needs to have the following variables,
MOP_MAX_JOBS=100    <<-------Total number of Jobs allowed at a site.
# Globus gatekeeper contact strings.
MOP_REMOTE_JOB_MANAGER_FOR_RUN=<worker node>:/jobmanager-condor
MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN=<worker node>:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT=<worker node>:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH=<worker node>:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP=<worker node>:/jobmanager

# Where to create working directories for jobs.
MOP_REMOTE_RUNTIME_AREA=/home/anzar/MOPTMPAREA

MOP_EXPORT_DIR=/home/anzar/MOPTMPAREA/flatfiles

# this is the dir under which Globus is installed on the remote system
MOP_REMOTE_VDT_LOCATION=/vdt

# this is the dir under which the CMS DAR is installed on the remote system
MOP_REMOTE_DAR_ROOT=/home/anzar/DAR

##Leave this one like this
MOP_NO_SHARED_FS=N

###These are new parameters for localizing jobs on a worker node
#To turn on localization
MOP_USE_WORKER_SCRATCH=Y

#Local worker-node area where you want MOP to create the runtime area for your jobs (sufficiently big, ~2 GB)
MOP_WORKER_SCRATCH_AREA=/tmp
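
A site file can be checked for completeness before jobs are submitted. The sketch below assumes that a <site>.vars file consists of plain KEY=VALUE lines with '#' comments, as in the listing above; the function names are hypothetical, and the list of required names is simply taken from that listing. The "<<-------" annotations shown above are commentary in this manual and are not handled by the parser.

```python
# Variable names taken from the <site>.vars listing; treating all as required is an assumption.
REQUIRED_SITE_VARS = [
    "MOP_MAX_JOBS",
    "MOP_REMOTE_JOB_MANAGER_FOR_RUN",
    "MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN",
    "MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT",
    "MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH",
    "MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP",
    "MOP_REMOTE_RUNTIME_AREA",
    "MOP_EXPORT_DIR",
    "MOP_REMOTE_VDT_LOCATION",
    "MOP_REMOTE_DAR_ROOT",
    "MOP_NO_SHARED_FS",
    "MOP_USE_WORKER_SCRATCH",
    "MOP_WORKER_SCRATCH_AREA",
]


def parse_vars(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_site_vars(text):
    """Return the required variable names that the site file does not define."""
    defined = parse_vars(text)
    return [name for name in REQUIRED_SITE_VARS if name not in defined]
```

Reading a file with open(path).read() and passing the text to missing_site_vars gives the names that still have to be added.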


Note:

  1. The old format of five files per site is NOT preserved, so there is no need to maintain five files per site.
  2. Also make sure that the five site files for the "generic" site, plus generic.vars, are always present:
                                      generic.site, generic.site.publish, generic.site.stage-out, generic.site.cleanup, generic.site.stage-in, generic.vars
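
The presence of the generic site files can be verified with a few lines of Python. This is a sketch only: the helper name is hypothetical, and the directory to check (mop_submitter/site-info, per step 6) has to be passed in by the caller.

```python
import os

# File names as listed in the note above.
GENERIC_FILES = [
    "generic.site",
    "generic.site.publish",
    "generic.site.stage-out",
    "generic.site.cleanup",
    "generic.site.stage-in",
    "generic.vars",
]


def missing_generic_files(site_info_dir):
    """Return the generic site files that are not present in site_info_dir."""
    return [
        name for name in GENERIC_FILES
        if not os.path.isfile(os.path.join(site_info_dir, name))
    ]
```

An empty list from missing_generic_files means all generic files are in place.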

Site file Creation Using ConfMon:

The site file for a particular site can be generated by running the configuration monitor client, like this,

<DPE>/confmon/client/client_query_glue.py  site-host  GIIS-host mop

IMPORTANT:  Turning ON/OFF a worker site to Matchmaker.
Sites can be turned on or off for the match-maker, i.e. the match-maker will or will not submit jobs to a particular site.

Submitting to a site directly
1. Create jobs for the "generic" site. (ALWAYS.)
2. Submit to the <site>.

Running CMKIN Production assignments

   7. Edit McRunjob/cms/ImpalaLite/CMKIN.conf
Update the following variables as given here,

IfSaveOutput=true   <<<-----Note the change from previous version
OutProtocol=cp
OutputPath=/data/ANZAR/dgt-prod-ops/commonOutDir
useBoss=0
useDAR=1   <<<-----Note the change from previous version, instead of true/false we use 1/0 now.
DARpath=$MOP_REMOTE_DAR_ROOT
EnvironmentType=MOP
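
Settings like these can also be applied by script rather than by hand. The sketch below assumes that the .conf file is a flat list of KEY=VALUE lines, which this manual suggests but does not state; update_conf is a hypothetical helper, not part of McRunjob. It rewrites existing keys in place, leaves comment lines untouched, and appends keys that were not found.

```python
# CMKIN settings exactly as listed above.
CMKIN_SETTINGS = {
    "IfSaveOutput": "true",
    "OutProtocol": "cp",
    "OutputPath": "/data/ANZAR/dgt-prod-ops/commonOutDir",
    "useBoss": "0",
    "useDAR": "1",
    "DARpath": "$MOP_REMOTE_DAR_ROOT",
    "EnvironmentType": "MOP",
}


def update_conf(text, updates):
    """Return conf text with KEY=VALUE lines set from updates; missing keys are appended."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        is_comment = line.lstrip().startswith("#")
        key = line.split("=", 1)[0].strip() if "=" in line else None
        if key in remaining and not is_comment:
            out.append("%s=%s" % (key, remaining.pop(key)))
        else:
            out.append(line)  # unrelated line or comment: keep as is
    for key, value in remaining.items():
        out.append("%s=%s" % (key, value))
    return "\n".join(out) + "\n"
```

The same helper could be reused for the other .conf files in this manual by passing a different dictionary of settings.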

  8.  To run jobs,  
    i. cd McRunjob/py_script
    ii. Run Linker and create jobs
python Linker.py script=ImpalaMOPOneStepForAll.mcj AssignmentID=2210 AssignmentType=CMKIN mopSite=generic cmkinOutPath=$commonOutDir dagOutPath=$commonOutDir nloop=5
(Please note changes from the previous versions).
    iii. Submit jobs.
python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=2210 useBoss=0 mopSite=generic  mopDagPath=$commonOutDir mopNumOfJobs=1

Submitting to a site directly

python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=2210 useBoss=0 mopSite=<site>  mopDagPath=$commonOutDir mopNumOfJobs=1
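
The create and submit commands follow the same pattern for every assignment type, so they can be assembled by a small helper. The sketch below only builds argument lists from the key=value parameters shown above; it does not run Linker.py. The function names are hypothetical, and it is an assumption that Linker.py does not care about the order of the key=value arguments.

```python
def linker_command(script, **params):
    """Build the argument list for: python Linker.py script=<script> key=value ..."""
    args = ["python", "Linker.py", "script=%s" % script]
    args.extend("%s=%s" % (key, value) for key, value in params.items())
    return args


def create_and_submit(assignment_id, assignment_type, out_dir,
                      site="generic", nloop=5, num_jobs=1, extra=None):
    """Return (create, submit) argument lists for one assignment.

    Jobs are always created for the "generic" site; only the submit step
    names a specific site when submitting directly.
    """
    create_params = dict(
        AssignmentID=assignment_id,
        AssignmentType=assignment_type,
        mopSite="generic",
        dagOutPath=out_dir,
        nloop=nloop,
    )
    create_params.update(extra or {})  # e.g. cmkinOutPath for CMKIN
    create = linker_command("ImpalaMOPOneStepForAll.mcj", **create_params)
    submit = linker_command(
        "IMPLRunJob_MOP.mcj",
        Scheduler="MOP",
        AssignmentID=assignment_id,
        useBoss=0,
        mopSite=site,
        mopDagPath=out_dir,
        mopNumOfJobs=num_jobs,
    )
    return create, submit
```

For the CMKIN example above, out_dir would be the expanded value of $commonOutDir (for instance os.environ["commonOutDir"]), with extra={"cmkinOutPath": out_dir}. The resulting lists can be passed to subprocess from within McRunjob/py_script.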

Running CMSIM Production assignments

   7. Edit McRunjob/cms/ImpalaLite/CMSIM.conf
Update the following variables as given here,

GeometryFile=./cms133.rz
InputPath=/data/ANZAR/dgt-prod-ops/commonOutDir                             <<<-----Note the change from previous version
IfStageInput=true                      <<<-----Note the change from previous version                
IfSaveOutput=true                   <<<-----Note the change from previous version 
InProtocol=cp
IfSaveHBOOK=true
useBoss=0
useDAR=1                               <<<-----Note the change from previous version
DARpath=$MOP_REMOTE_DAR_ROOT
EnvironmentType=MOP

  8.  To run jobs,  
    i. cd McRunjob/py_script
    ii. Run Linker and create jobs
python Linker.py script=ImpalaMOPOneStepForAll.mcj AssignmentID=2788 AssignmentType=CMSIM mopSite=generic dagOutPath=$commonOutDir nloop=5
    iii. Submit jobs.
python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=2788 useBoss=0 mopSite=generic  mopDagPath=$commonOutDir  mopNumOfJobs=1

Submitting to a site directly

python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=2788 useBoss=0 mopSite=<site>  mopDagPath=$commonOutDir mopNumOfJobs=1

Running OSCAR Production assignments

   7. Edit McRunjob/cms/ImpalaLite/OSCAR.conf
Update the following variables as given here,
EnvironmentType=MOP
GeometryPath=.
InputPath=/data/ANZAR/dgt-prod-ops/commonOutDir       <-------- input base-path of datasetname directory containing ntpl file(s)
IfStageInput=true
InProtocol=cp
OutProtocol=cp
IfSaveOutput=true
useBoss=0
useDAR=1
DARpath=$MOP_REMOTE_DAR_ROOT

  8.  To run jobs,  
    i. cd McRunjob/py_script
    ii. Run Linker and create jobs
python Linker.py script=ImpalaMOPOneStepForAll.mcj AssignmentID=1278 AssignmentType=OSCAR mopSite=generic dagOutPath=$commonOutDir nloop=5
    iii. Submit jobs.
python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=1278 useBoss=0 mopSite=generic  mopDagPath=$commonOutDir mopNumOfJobs=1

Submitting to a site directly

python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=1278 useBoss=0 mopSite=<site>  mopDagPath=$commonOutDir mopNumOfJobs=1

Running Hit Production assignments

   7. Edit McRunjob/cms/ImpalaLite/Hit.conf
EnvironmentType=MOP
InputPath=/data/ANZAR/dgt-prod-ops/commonOutDir   <-------- input base-path of datasetname directory containing FZ file(s)
IfStageInput=true
IfSaveOutput=true
InProtocol=cp
OutProtocol=cp
useBoss=0
useDAR=1
DARpath=$MOP_REMOTE_DAR_ROOT
##Geometry_PATH will be resolved through DAR
  8.  To run jobs,  
    i. cd McRunjob/py_script
    ii. Run Linker and create jobs
python Linker.py script=ImpalaMOPOneStepForAll.mcj AssignmentType=Hit AssignmentID=3921 mopSite=generic dagOutPath=$commonOutDir nloop=10
    iii. Submit jobs.
python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=3921 useBoss=0 mopSite=generic mopDagPath=$commonOutDir mopNumOfJobs=10

Submitting to a site directly

python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP AssignmentID=3921 useBoss=0 mopSite=<site> mopDagPath=$commonOutDir mopNumOfJobs=10


To be Noted:
When submitting to the Generic site (Match Maker), for every submitted batch of jobs
there will be TWO (instead of one) dagman processes displayed by condor_q. Do not mistake this for an error. The first dagman is running the job that you submitted to the match-maker site; the match-maker site has in turn submitted another dagman job to the grid site it was matched with.
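
As a quick cross-check of what condor_q should show, the expected number of dagman processes can be worked out from the number of batches. The sketch below assumes, as the note implies, that a batch submitted directly to a site shows one dagman process and a batch submitted to the generic site shows two; the function name is hypothetical, and the match-maker process itself is not counted.

```python
def expected_dagman_processes(generic_batches, direct_batches=0):
    """Expected dagman count: two per generic (match-maker) batch, one per direct batch."""
    return 2 * generic_batches + direct_batches
```

For example, three batches submitted to the generic site and one submitted directly to a site would be expected to show seven dagman processes.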

Production Operation Tools.

MOP now has a new set of tools to help Production Operations. Thanks to Nickolai!
These tools are in the mop_submitter/misc directory. Adding this directory to your path/python-path will make them available.

Follow this link: http://home.fnal.gov/~kuropat/cms/cms.html
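
From within a Python session, the misc directory can be added to the module search path as follows. This is a minimal sketch: add_mop_tools is a hypothetical helper, and it assumes that PROD_OPS is set as in setup.sh (falling back to the current directory if it is not).

```python
import os
import sys


def add_mop_tools(prod_ops=None):
    """Put <PROD_OPS>/mop_submitter/misc at the front of sys.path and return the path."""
    prod_ops = prod_ops or os.environ.get("PROD_OPS", os.getcwd())
    misc = os.path.join(prod_ops, "mop_submitter", "misc")
    if misc not in sys.path:
        sys.path.insert(0, misc)  # avoid duplicate entries on repeated calls
    return misc
```

For shell use, the same directory would instead be appended to PATH and PYTHONPATH in setup.sh.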


Error Reporting: Please report errors to dpe-discuss[AT]fnal.gov or to the contact lists on the DPE page.



======================================================================
M. Anzar Afaq anzar[AT]fnal.gov
Fermi National Accelerator Laboratory phone: (630) 840-6856
Computing Division - CMS Group fax : (630) 840-2783
P.O.Box 500, MS 234, Batavia, IL 60510 http://home.fnal.gov/~anzar