Setting Up a MOP Worker Site


            Overview
            ========

            MOP is a system for distributing CMS production jobs. Jobs are defined at a MOP master site
            by the CMS production scripts. The mop_submitter then distributes those jobs to remote
            sites through Condor-G and Globus. When the jobs finish, GDMP collects the output.

  
            Preparing remote sites for MOP jobs
            ===================================

            Remote site overview:

            In general, the remote site will need VDT installed on one or more machines and a Globus job
            manager for the local batch system.
           
            Accounts:

            A MOP worker site should map the cmsprod account from fnal.gov to one of its local user accounts.
           
            The certificate subject to map is:
            "/O=Grid/O=Globus/OU=fnal.gov/CN=CMS Production"

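            In a Globus installation this mapping is normally recorded in the grid-mapfile, one line
            per certificate subject followed by the local account name. The sketch below appends such
            an entry if it is missing. The helper name add_gridmap_entry, the local account name
            cmsprod and the file path used in the example are assumptions for illustration, not part
            of Globus or VDT; substitute your site's own values.

```shell
#!/bin/sh
# add_gridmap_entry GRIDMAP SUBJECT ACCOUNT
# Append '"SUBJECT" ACCOUNT' to GRIDMAP unless SUBJECT is already listed.
# Hypothetical helper, shown only to illustrate the grid-mapfile line format.
add_gridmap_entry() {
    gridmap=$1
    subject=$2
    account=$3
    # -F matches the quoted subject literally; -q gives exit status only.
    if [ -f "$gridmap" ] && grep -qF "\"$subject\"" "$gridmap"; then
        echo "subject already mapped in $gridmap"
        return 0
    fi
    printf '"%s" %s\n' "$subject" "$account" >> "$gridmap"
}
```

            For example, assuming the usual file location and a local account named cmsprod:

            % add_gridmap_entry /etc/grid-security/grid-mapfile \
                  "/O=Grid/O=Globus/OU=fnal.gov/CN=CMS Production" cmsprod
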
            Remote site summary:
           
            The following information needs to be conveyed to the MOP master site owner:

            --------------------------------------------------------------------------
            (1-1) Stage-in job manager.
            (1-2) GLOBUS_LOCATION value.
            (1-3) Shared directory for MOP files, if not the home directory.

            (2-1) Run job manager.
            (2-2) Location of the CMS DAR installation. NB: only the path needs to be provided. The DAR
            file(s) themselves can be installed through MOP.

            --------------------------------------------------------------------------

            Example remote site values:

            The first "remote" site is at Fermilab. Here are the Fermilab values:

            (1-1) droidf.fnal.gov:/jobmanager
            (1-2) /opt/globus/globus20
            (1-3) /cms/work

            (2-1) droidf.fnal.gov:/jobmanager-condor
            (2-2) /data/dar
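
            To avoid transcription errors, the summary can be generated from the values themselves.
            This is a minimal sketch; print_site_summary is a hypothetical helper, not part of MOP,
            and it only formats the five values in the numbering used above.

```shell
#!/bin/sh
# print_site_summary STAGEIN_JM GLOBUS_LOC SHARED_DIR RUN_JM DAR_PATH
# Print the five site values in the (1-1) ... (2-2) layout.
# Hypothetical helper for illustration only.
print_site_summary() {
    if [ "$#" -ne 5 ]; then
        echo "usage: print_site_summary STAGEIN_JM GLOBUS_LOC SHARED_DIR RUN_JM DAR_PATH" >&2
        return 1
    fi
    printf '(1-1) %s\n(1-2) %s\n(1-3) %s\n\n(2-1) %s\n(2-2) %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

            With the Fermilab values:

            % print_site_summary droidf.fnal.gov:/jobmanager /opt/globus/globus20 /cms/work \
                  droidf.fnal.gov:/jobmanager-condor /data/dar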

            The site parameters are stored in the mop_submitter/site-info directory. Job manager and scratch
            directory info is in the *.site.* files. The .vars files hold the following information:

            Addendum 1:
            ===========

            (1) In the $GLOBUS_LOCATION/etc/globus-job-manager-condor.conf file, edit the two lines at the
            bottom according to the comment above them, adding INTEL and LINUX as arguments, like so:

            -condor-arch INTEL
            -condor-os LINUX

            and remove the two comment lines (appearing just before):

            # Edit the following two lines to complete
            # the configuration of your condor jobmanager

            Then add another line to retain debugging info on errors:

            -save-logfile on_errors

            Finally, rename the file to globus-job-manager-condor-INTEL-LINUX.conf
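
            The edits in step (1) can also be scripted. The sketch below is one possible way to do
            it, under the assumption that the stock file contains the two comment lines exactly as
            quoted above and that the lines to complete begin with -condor-arch and -condor-os. The
            helper name make_condor_conf is hypothetical. It writes the result to a new file instead
            of renaming, so the original stays available for comparison.

```shell
#!/bin/sh
# make_condor_conf SRC DEST
# Write an edited copy of SRC (globus-job-manager-condor.conf) to DEST.
# Hypothetical helper; assumes the stock layout described in the text.
make_condor_conf() {
    src=$1
    dest=$2
    # Drop the two instruction comments and complete the arch/os lines.
    sed -e '/^# Edit the following two lines to complete/d' \
        -e '/^# the configuration of your condor jobmanager/d' \
        -e 's/^-condor-arch.*/-condor-arch INTEL/' \
        -e 's/^-condor-os.*/-condor-os LINUX/' \
        "$src" > "$dest" || return 1
    # Keep the job manager log file when a job fails (added only once).
    grep -q '^-save-logfile' "$dest" || echo '-save-logfile on_errors' >> "$dest"
}
```

            For example:

            % cd $GLOBUS_LOCATION/etc
            % make_condor_conf globus-job-manager-condor.conf \
                  globus-job-manager-condor-INTEL-LINUX.conf

            Check the new file by eye before removing the old one.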

            (2) Let's use testulix.phys.ufl.edu as an example. In
            $GLOBUS_LOCATION/etc/jobmanager-condor, add "-condor-os LINUX -condor-arch INTEL" to
            the end of the argument list, and change the existing -conf and -rdn arguments to refer to
            globus-job-manager-condor-INTEL-LINUX.conf and
            testulix.phys.ufl.edu/jobmanager-condor-INTEL-LINUX instead of the old names.

            Finally, rename the file to jobmanager-condor-INTEL-LINUX.

            We thereafter refer to that job manager as
            testulix.phys.ufl.edu/jobmanager-condor-INTEL-LINUX instead of just
            testulix.phys.ufl.edu/jobmanager-condor (when using globus-job-run, Condor-G,
            etc).
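
            Step (2) can be sketched the same way. This assumes the jobmanager-condor file holds the
            whole service definition on a single line and that its -conf argument names
            globus-job-manager-condor.conf; both are assumptions about the stock file, and the helper
            name make_jobmanager_entry is hypothetical. The new -rdn value is passed in as an
            argument rather than hard-coded.

```shell
#!/bin/sh
# make_jobmanager_entry SRC DEST RDN
# Write an edited copy of SRC (jobmanager-condor) to DEST, pointing -conf at
# the INTEL-LINUX config file, setting -rdn to RDN, and appending the
# -condor-os / -condor-arch arguments to the end of the last line.
# Hypothetical helper; RDN must not contain '|' or '&'.
make_jobmanager_entry() {
    src=$1
    dest=$2
    rdn=$3
    sed -e 's|globus-job-manager-condor\.conf|globus-job-manager-condor-INTEL-LINUX.conf|' \
        -e "s|-rdn  *[^ ]*|-rdn $rdn|" \
        -e '$s|[[:space:]]*$| -condor-os LINUX -condor-arch INTEL|' \
        "$src" > "$dest"
}
```

            For example:

            % cd $GLOBUS_LOCATION/etc
            % make_jobmanager_entry jobmanager-condor jobmanager-condor-INTEL-LINUX \
                  testulix.phys.ufl.edu/jobmanager-condor-INTEL-LINUX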

            At Florida, the Condor job manager file then reads:

            % cat jobmanager-condor-INTEL-LINUX
            stderr_log,local_cred - /usr/local/globus/globus-2.0/libexec/globus-job-manager
            globus-job-manager -conf
            /usr/local/globus/globus-2.0/etc/globus-job-manager-condor-INTEL-LINUX.conf -type
            condor -rdn testulix.phys.ufl.edu/condor-INTEL-LINUX -machine-type unknown
            -publish-jobs -condor-os LINUX -condor-arch INTEL

            and the Condor job manager config file reads:

            % cat globus-job-manager-condor-INTEL-LINUX.conf
            -home "/usr/local/globus/globus-2.0"
            -e /usr/local/globus/globus-2.0/libexec
            -globus-gatekeeper-host testulix.phys.ufl.edu
            -globus-gatekeeper-port 2119
            -globus-gatekeeper-subject "/O=Grid/O=Globus/CN=testulix.phys.ufl.edu"
            -globus-host-cputype i686
            -globus-host-manufacturer pc
            -globus-host-osname Linux
            -globus-host-osversion 2.2.14-5.0smp
            -condor-arch INTEL
            -condor-os LINUX
            -save-logfile on_errors

            Addendum 2:
            ===========

            By default, Condor is configured to suspend all jobs when it detects keyboard activity.
            To change this, modify "PART 3" of the $CONDOR_LOCATION/etc/condor_config file, below
            the comment block that reads:

            #####################################################################
            ## This where you choose the configuration that you would like to
            ## use. It has no defaults so it must be defined. We start this
            ## file off with the UWCS_* policy.
            ######################################################################

            The modifications are:

            Original condor_config file:
            START = $(UWCS_START)
            SUSPEND = $(UWCS_SUSPEND)
            CONTINUE = $(UWCS_CONTINUE)
            PREEMPT = $(UWCS_PREEMPT)
            KILL = $(UWCS_KILL)

            Modified condor_config file:
            #START = $(UWCS_START)
            START = True
            #SUSPEND = $(UWCS_SUSPEND)
            #CONTINUE = $(UWCS_CONTINUE)
            #PREEMPT = $(UWCS_PREEMPT)
            SUSPEND = False
            CONTINUE = True
            PREEMPT = False
            #KILL = $(UWCS_KILL)
            KILL = $(ActivityTimer) > $(MaxVacateTime)
            #PREEMPTION_REQUIREMENTS = $(UWCS_PREEMPTION_REQUIREMENTS)
            PREEMPTION_REQUIREMENTS = False

            Then, to apply the new Condor configuration, run

            % condor_reconfig node1 node2 node3 ...

            where node1, node2, node3 ... are the different Condor compute machines (including the
            Condor Master machine).
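
            The policy change above can also be produced by a script. The following is a sketch
            only: make_always_run_config is a hypothetical helper, and unlike the hand edit shown
            above it appends the replacement settings at the end of the file instead of placing
            them in PART 3. It assumes the policy macros are defined at the start of a line, and it
            writes to a new file so the result can be reviewed before it is installed.

```shell
#!/bin/sh
# make_always_run_config SRC DEST
# Write a copy of SRC (condor_config) to DEST with the top-level policy
# macros commented out and the always-run settings appended.
# Hypothetical helper for illustration only.
make_always_run_config() {
    src=$1
    dest=$2
    # Comment out existing definitions. UWCS_* macros are left untouched
    # because their names do not begin with the names listed here.
    sed -E 's/^(START|SUSPEND|CONTINUE|PREEMPT|KILL|PREEMPTION_REQUIREMENTS)[[:space:]]*=/#&/' \
        "$src" > "$dest" || return 1
    # Quoted heredoc delimiter: $(...) is written literally, not expanded.
    cat >> "$dest" <<'EOF'
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = $(ActivityTimer) > $(MaxVacateTime)
PREEMPTION_REQUIREMENTS = False
EOF
}
```

            For example:

            % make_always_run_config $CONDOR_LOCATION/etc/condor_config /tmp/condor_config.new

            After reviewing and installing the new file, run condor_reconfig as described above.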