DPE 2.1 Production Operations Manual
(MOP with Condor_G Match-Making)
Please carefully read all Notes and Important notes
on this page.
1. Install DPE-Master 2.1 using pacman, as specified on the DPE page.
2. The installation creates a setup.sh file in your prod-ops
directory with these contents:
export PROD_OPS=`pwd`
export DPE_PATH=<<DPEClientPath>>
export GLOBUS_PATH=$DPE_PATH/globus
source $DPE_PATH/setup.sh
export MOP_DIR=$PROD_OPS
export MOP_LOG_STAGEOUT_HOST=gyoza7.fnal.gov
export MOP_DATA_STAGEOUT_HOST=gyoza7.fnal.gov
export MOP_MASTER_HOST=`hostname`
export MOP_LOG_GUC_DIR=/prod/PCP/validation/test/anzar/log   <-------- pre-create this directory (in dCache; mind the mount point)
export MOP_DATA_GUC_DIR=/prod/PCP/validation/test/anzar/data   <-------- pre-create this directory (in dCache; mind the mount point)
export PROD_RESOURCES=$PROD_OPS/McRunjob/cms/ImpalaLite
export TrackingPath=$PROD_OPS/McRunjob/cms/ImpalaLite/cms_db
export MC_RUNJOB_PATH=$PROD_OPS/McRunjob/mcj_scripts/IMPLite
export localCacheArea=$PROD_OPS/McRunjob/mcj_scripts/IMPLite/localCache   <--------- pre-create this directory
export PYTHONPATH=$PROD_OPS/mop_submitter:$PROD_OPS/McRunjob/py_script
export commonOutDir=$PROD_OPS/IGT-tests/commonOutDir
export MOP_MATCHMAKER_URL=<<match-maker-hostname>>:/jobmanager-condor   <<--------------- This is a new variable
Note: In general the match-maker-hostname is the same as
the MOP master host. One could also use the plain jobmanager (fork), but using
jobmanager-condor (condor) is highly recommended, even if that means setting
up condor for the local host only.
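A quick sanity check after sourcing setup.sh can catch a variable that was never exported. The following is only a minimal sketch: the list of names is copied from the setup.sh contents above, while the function name missing_vars is made up for illustration and is not part of DPE or MOP. It is written for a current Python interpreter, which may differ from the one MOP itself runs under.

```python
import os

# Variable names taken from the setup.sh listing above.
REQUIRED_VARS = [
    "PROD_OPS", "MOP_DIR", "MOP_LOG_STAGEOUT_HOST", "MOP_DATA_STAGEOUT_HOST",
    "MOP_MASTER_HOST", "MOP_LOG_GUC_DIR", "MOP_DATA_GUC_DIR",
    "PROD_RESOURCES", "TrackingPath", "MC_RUNJOB_PATH", "localCacheArea",
    "commonOutDir", "MOP_MATCHMAKER_URL",
]


def missing_vars(env):
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; env is any mapping, e.g. os.environ.
    """
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All MOP variables are set.")
    # localCacheArea lives on the local file system and must be pre-created.
    cache = os.environ.get("localCacheArea")
    if cache and not os.path.isdir(cache):
        print("Missing directory:", cache)
```

The dCache directories are not checked here, because how they appear on the local host depends on the mount point.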
3. Source setup.sh
4. If you are user "X", make sure your DN is mapped to your
user-id "X" in the grid-mapfile on the gatekeeper specified
by MOP_MATCHMAKER_URL above (in general, the MOP Master). This is not required
on other worker nodes. Contact Yujun Wu (yujun@fnal.gov) or the VOMS manager
to make sure this happens on your MOP Master. If this condition is not met,
the Match-maker might not function.
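The mapping in step 4 can be checked by hand, or with a small script such as the sketch below. It assumes the usual grid-mapfile layout of one entry per line, a double-quoted DN followed by one or more comma-separated local user names. The function name mapped_users and the example DN in the usage note are illustrative only.

```python
import shlex


def mapped_users(gridmap_text, dn):
    """Return the local user names that the grid-mapfile text maps to dn.

    Hypothetical helper; assumes lines of the form: "<DN>" user1[,user2...]
    """
    users = []
    for line in gridmap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # keeps the quoted DN together as one field
        if len(fields) >= 2 and fields[0] == dn:
            users.extend(fields[1].split(","))
    return users
```

For example, with the text of the gatekeeper's grid-mapfile in gridmap_text, mapped_users(gridmap_text, "/DC=org/DC=example/CN=Jane Doe") should contain your user-id "X" if the mapping is in place.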
5. cd mop_submitter and run
./mop_matchmaker_monitor.py --scheduler --group_size = 20
This will start the Condor_G Match-Maker. Verify by running condor_q. This process
should always be running.
Take special care when using wildcards with condor_rm not to remove
this process too. If that happens, just restart it as described here.
6. Make sure the site files (in the mop_submitter/site-info directory) are
up to date. Each site is now represented by only a single <site>.vars file
(also read the note below on backward compatibility), which needs to define
the following variables:
MOP_MAX_JOBS=100   <<------- Total number of jobs allowed at a site.
# Globus gatekeeper contact strings.
MOP_REMOTE_JOB_MANAGER_FOR_RUN=<worker node>:/jobmanager-condor
MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN=<worker node>:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT=<worker node>:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH=<worker node>:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP=<worker node>:/jobmanager
# Where to create working directories for jobs.
MOP_REMOTE_RUNTIME_AREA=/home/anzar/MOPTMPAREA
MOP_EXPORT_DIR=/home/anzar/MOPTMPAREA/flatfiles
# this is the dir under which Globus is installed on the remote system
MOP_REMOTE_VDT_LOCATION=/vdt
# this is the dir under which the CMS DAR is installed on the remote system
MOP_REMOTE_DAR_ROOT=/home/anzar/DAR
##Leave this one like this
MOP_NO_SHARED_FS=N
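Since each site now depends on a single .vars file, it can help to check that a file defines every variable listed above before pointing the match-maker at it. This is a minimal sketch that assumes the file holds plain KEY=VALUE lines with "#" comments, as in the listing; parse_vars and missing_site_vars are illustrative names, not MOP functions.

```python
# Variable names taken from the <site>.vars listing above.
REQUIRED_SITE_VARS = [
    "MOP_MAX_JOBS",
    "MOP_REMOTE_JOB_MANAGER_FOR_RUN",
    "MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN",
    "MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT",
    "MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH",
    "MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP",
    "MOP_REMOTE_RUNTIME_AREA",
    "MOP_EXPORT_DIR",
    "MOP_REMOTE_VDT_LOCATION",
    "MOP_REMOTE_DAR_ROOT",
    "MOP_NO_SHARED_FS",
]


def parse_vars(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_site_vars(text):
    """Return the required names that are absent or empty in a .vars text."""
    values = parse_vars(text)
    return [name for name in REQUIRED_SITE_VARS if not values.get(name)]
```

Calling missing_site_vars on the contents of a site file returns an empty list when the file is complete.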
Note:
- The old format of having five files per site is preserved
for backward compatibility.
- Also make sure that you have the five site files for the matchmaker
itself.
The site files will be generated by a continuously running configuration-monitor
script (the script is not working for me at present; this is pending Yujun's return).
IMPORTANT: Turning a worker site ON/OFF for the Matchmaker.
Sites can be turned on or off for the match-maker, i.e. the match-maker
will or will not submit jobs to a particular site.
- To make a site visible to the matchmaker, just "touch" a <sitename>.ClassAd
file in the mop_submitter/site-info directory. Almost immediately you will notice
that the match-maker writes the site ClassAd into this file.
- To remove a submission site, just delete the <sitename>.ClassAd
file from the mop_submitter/site-info directory.
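The two operations above amount to creating and deleting one file per site, so they are easy to script. The sketch below is one possible way to do it; enable_site and disable_site are illustrative names, and the site-info directory is passed in rather than assumed.

```python
from pathlib import Path


def classad_path(site_info_dir, site_name):
    """Path of the <sitename>.ClassAd file inside the site-info directory."""
    return Path(site_info_dir) / f"{site_name}.ClassAd"


def enable_site(site_info_dir, site_name):
    """'touch' the ClassAd file so the match-maker starts using the site."""
    path = classad_path(site_info_dir, site_name)
    path.touch(exist_ok=True)
    return path


def disable_site(site_info_dir, site_name):
    """Delete the ClassAd file; return False if the site was already off."""
    path = classad_path(site_info_dir, site_name)
    if path.exists():
        path.unlink()
        return True
    return False
```

For example, enable_site("mop_submitter/site-info", "mysite") corresponds to touching mysite.ClassAd by hand, where "mysite" is a placeholder site name.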
Running CMKIN Production assignments
7. Edit McRunjob/cms/ImpalaLite/CMKIN.conf
Update the following variables as given here:
IfStageOutput=true
IfSaveLocal=false
IfSaveRemote=true
useBoss=0
useDAR=true
DARpath=$MOP_REMOTE_DAR_ROOT
EnvironmentType=MOP
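If the same settings have to be applied repeatedly, the edit can be scripted. The sketch below assumes that CMKIN.conf consists of plain key=value lines, which is an assumption about the file layout and not something this page confirms; apply_settings is an illustrative name. Existing keys are overwritten in place and missing ones are appended.

```python
# Values copied from step 7 above.
CMKIN_SETTINGS = {
    "IfStageOutput": "true",
    "IfSaveLocal": "false",
    "IfSaveRemote": "true",
    "useBoss": "0",
    "useDAR": "true",
    "DARpath": "$MOP_REMOTE_DAR_ROOT",
    "EnvironmentType": "MOP",
}


def apply_settings(conf_text, settings):
    """Return conf_text with each key in settings set to the given value.

    Assumes plain key=value lines; comment lines (#) are left untouched.
    """
    remaining = dict(settings)
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")  # overwrite in place
        else:
            out.append(line)
    # Append any settings that were not present in the original text.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

Keep a backup of the original file before writing the result back, since the real file may contain constructs this sketch does not handle.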
8. To run jobs,
i. cd McRunjob/py_script
ii. Run Linker and create jobs
python Linker.py script=ImpalaMOPOneStep.mcj AssignmentID=2210 \
    AssignmentType=CMKIN mopSite=matchmaker \
    cmkinOutPath=$commonOutDir \
    dagOutPath=$commonOutDir nloop=5
(Please note the changes from previous versions.)
iii. Submit jobs.
python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP \
    AssignmentID=2210 useBoss=0 \
    mopSite=matchmaker \
    mopDagPath=$commonOutDir mopNumOfJobs=1
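Both Linker invocations in step 8 are long key=value command lines, and building them in a script avoids wrapping mistakes. The following is a minimal sketch: linker_command is an illustrative helper, the parameter names are the ones shown in step 8, and the fallback output directory is a placeholder used only when commonOutDir is not set.

```python
import os


def linker_command(script, **params):
    """Build the argument list for 'python Linker.py script=... key=value ...'.

    Hypothetical helper; keyword order is preserved in the command line.
    """
    cmd = ["python", "Linker.py", f"script={script}"]
    cmd.extend(f"{key}={value}" for key, value in params.items())
    return cmd


# Placeholder fallback, for illustration only.
out_dir = os.environ.get("commonOutDir", "/tmp/commonOutDir")

# Step 8.ii: create the jobs.
create_jobs = linker_command(
    "ImpalaMOPOneStep.mcj",
    AssignmentID=2210,
    AssignmentType="CMKIN",
    mopSite="matchmaker",
    cmkinOutPath=out_dir,
    dagOutPath=out_dir,
    nloop=5,
)

# Step 8.iii: submit the jobs.
submit_jobs = linker_command(
    "IMPLRunJob_MOP.mcj",
    Scheduler="MOP",
    AssignmentID=2210,
    useBoss=0,
    mopSite="matchmaker",
    mopDagPath=out_dir,
    mopNumOfJobs=1,
)

# To execute from McRunjob/py_script:
#   import subprocess
#   subprocess.run(create_jobs, check=True)
#   subprocess.run(submit_jobs, check=True)
```

Printing " ".join(create_jobs) first is a cheap way to compare the generated command with the one given in step 8.ii.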
Running CMSIM Production assignments
7. Edit McRunjob/cms/ImpalaLite/CMSIM.conf
Update the following variables as given here:
GeometryFile=insert_mopgeom_loc
LocalInputPath=/pnfs/cms/prod/PCP/validation/test/anzar/data   <-------- input base-path of ntpl file(s) (could be anywhere)
IfStageinLocal=true
IfStageinRemote=false
IfStageOutput=true
IfSaveLocal=false
IfSaveRemote=true
IfSaveHBOOK=true
useBoss=0
useDAR=true
DARpath=$MOP_REMOTE_DAR_ROOT
EnvironmentType=MOP
8. To run jobs,
i. cd McRunjob/py_script
ii. Run Linker and create jobs
python Linker.py script=ImpalaMOPOneStep.mcj AssignmentID=2788 \
    AssignmentType=CMSIM mopSite=matchmaker \
    cmsimInputPath=/home/anzar/TEMP/IGT-tests/commonOutDir/Validation_USMOP \
    dagOutPath=$commonOutDir nloop=5
iii. Submit jobs.
python Linker.py script=IMPLRunJob_MOP.mcj Scheduler=MOP \
    AssignmentID=2788 useBoss=0 mopSite=matchmaker \
    mopDagPath=$commonOutDir mopNumOfJobs=1
To be Noted:
For every submitted batch of jobs, condor_q will display TWO dagman processes
instead of one. Do not mistake this for an error. The first dagman is running
the job that you submitted to the match-maker site; the match-maker site has
in turn submitted another dagman job to the grid site it matched with.
Important Note:
The current version is backward compatible, in case one needs to submit jobs
to a specific site (as in older versions of MOP):
- Make sure you have all five site files for your target site.
- Use <site-name> instead of "matchmaker" for the mopSite parameter
in steps 7 and 8 above.
Production Operation Tools
MOP now has a new set of tools to help Production Operations, thanks to Nickolai.
These tools are in the mop_submitter/misc directory. Adding this directory to your path/python-path will make them available.
For details, follow this link: http://home.fnal.gov/~kuropat/cms/cms.html
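For Python scripts, one way to make the tools directory importable without editing setup.sh is to extend the module search path at start-up. This is only a sketch: add_tools_dir is an illustrative name, and the directory layout is the mop_submitter/misc location mentioned above, relative to PROD_OPS.

```python
import os
import sys


def add_tools_dir(prod_ops, path_list):
    """Append <prod_ops>/mop_submitter/misc to path_list if not yet present."""
    tools_dir = os.path.join(prod_ops, "mop_submitter", "misc")
    if tools_dir not in path_list:
        path_list.append(tools_dir)
    return tools_dir


# Only extend sys.path when setup.sh has been sourced.
if "PROD_OPS" in os.environ:
    add_tools_dir(os.environ["PROD_OPS"], sys.path)
```

For the command-line tools themselves, adding the same directory to PATH in your shell has the equivalent effect.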
Error Reporting: Please report errors to dpe-discuss@fnal.gov or to the contact lists on the DPE page.
======================================================================
M. Anzar Afaq anzar@fnal.gov
Fermi National Accelerator Laboratory phone: (630) 840-6856
Computing Division - CMS Group fax : (630) 840-2783
P.O.Box 500, MS 234, Batavia, IL 60510 http://home.fnal.gov/~anzar