These notes are located at http://home.fnal.gov/~mrenna/HCPSS.

These tutorials will guide you on an investigation of the similarities and differences that can be encountered when comparing predictions made with different Monte Carlo (MC) tools.

In all cases, we focus on W+ and W- production and decay to leptons at the LHC (a 7 TeV, proton-proton collider).

We are using a ROOT-based analysis package similar to the one used by the Generator Services (GENSER) group for release-to-release validation of MC tools distributed by CERN. Since this package makes use of the HepMC standard for MC event records, it is called HepMCAnalysis. We rely on the WplusJets analysis tool.

The Take Home Lessons

You should leave this tutorial understanding the following concepts:

  1. Different Monte Carlo tools provide predictions at different levels of approximation. You will be exposed to the terminology LO (leading order), LL (leading logarithm), and NLO (next-to-leading order). You will hear about these extensively in the lectures, and it is not the purpose of this tutorial to replace that. Rather, you should understand that they are different approximations and provide different predictions of physical observables.
  2. Matched or merged calculations contain both matrix element predictions for parton emissions and parton showers.
  3. Monte Carlo truth information is stored in a format defined by the HepMC standard. You should learn how to navigate this record and distinguish, for example, an electron arising from a W- decay from an electron arising from a b quark decay.
  4. Parton distribution functions, or PDFs, are an important input to phenomenological predictions. Many properties of events at hadron colliders are sensitive to them.
  5. In Electroweak processes, such as those studied here, spin correlations can lead to important observable effects. Not all Monte Carlo tools include spin correlations, and some include them to higher order in perturbation theory than others.
  6. ROOT is a tool for packing the HepMC record and for visualizing it. While the basic tutorial does not rely on any prior knowledge of ROOT, there are ample opportunities for students to learn how to write C++ code to work with ROOT.

Location of Examples

For the basic tutorial, the directory "hcpss" referred to below is located at


On this webpage, you can see output examples for all of the basic Tutorials.

You have several choices on how to proceed:

  1. You can choose to look at the end product of the analysis, and try to answer some of the questions below. This requires you only to use your web browser.
  2. You can choose to reproduce the steps of the analysis yourself as you read through the exercise. This requires you to build and modify analysis code. You will have to follow the HCPSSinstructions.
  3. You can perform Step A or Step B, then proceed immediately to the Expert exercises.


You may choose to work alone, or in pairs, or even in a larger group.

However, you should do your work in one of the assigned rooms.

In each room, there will be Mentors to help you.

If you do not have a laptop, cannot get the software to install, or do not want to try to install the software, then you should proceed to room One North (WH1N), where there are summer school laptops with the necessary software installed. It is likely you will have to share these.

You should feel free to ask your fellow students for help as well.

Tutorial 0

The first tutorial is based on Pythia (version 6.422) and MadGraph+Pythia.

Pythia is a leading-logarithm (LL) event generator including models for semi- and non-perturbative phenomena, such as hadronization and underlying event structure. LL parton showers can describe the main properties of event structure well, but they are based on soft and collinear approximations which break down for high-pT and well-separated partons. The Pythia parton shower comes in two incarnations. The first (and oldest) is the virtuality-ordered or mass-ordered parton shower. This is the version used for these datasets. The second (and newer) shower is based on ordering of the transverse momentum pT of parton shower emissions.

The ordering of the partons emitted in a shower, q^2_1 >> q^2_2 >> q^2_3 ... for mass-ordering or pT_1 >> pT_2 >> pT_3 ... for pT-ordering, leads to the appearance of large logarithms, e.g. log(pT_1/pT_2). The leading-logarithm (LL) approximation takes into account the effects of the most important (numerically largest) logarithms.

The matching calculation of MadGraph+Pythia incorporates tree-level, matrix-element predictions for high-pT, well-separated partons into the Pythia showering in such a way as to recover the Pythia prediction in the soft and collinear limit. Like Pythia, MadGraph uses LO Parton Distribution Functions (PDFs) for its matrix-element predictions.

Due to the simplicity of the W+1 parton matrix element, it can be easily mapped onto the first emission of the Pythia parton shower. In this manner, Pythia includes an internal reweighting of the parton shower, so that it reproduces the W+1 parton prediction for hard emissions. However, the MadGraph+Pythia prediction used here also includes matrix-element predictions for up to W+4 parton events.

We have already processed and compared these two datasets. The results can be studied in the directory hcpss/Tutorial0_WplusJets.

See WplusJetsDistributions for an explanation of the distributions that are being plotted and compared.

Check the distributions (by clicking on the thumbnails to observe a full view in the main pane), noting the results of the KS test.

Note that the KS test compares the integrated (cumulative) shapes of
two distributions (each ranging from 0 to 1), identifies the point of
maximum separation, and constructs a statistic from that separation.
A KS value close to 1 indicates that the two distributions are
consistent with having been drawn from the same parent probability
distribution.

See KSWikipedia for further details on the KS test.
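
The "point of maximum separation" can be made concrete with a short sketch. The function below is only an illustration, not the code used by HepMCAnalysis: the name ksDistance and the use of plain vectors of bin contents are assumptions made here. It returns the maximum distance between the two cumulative distributions. The KS value quoted on the plots is presumably a probability derived from such a distance (which is what ROOT's TH1::KolmogorovTest returns by default), and that conversion is not reproduced here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Maximum distance between the normalised cumulative distributions of two
// histograms that share the same binning. Returns -1 if the inputs are
// unusable (different sizes, no bins, or non-positive total content).
// Hypothetical helper for illustration; not part of HepMCAnalysis.
double ksDistance(const std::vector<double>& h1, const std::vector<double>& h2)
{
    if (h1.size() != h2.size() || h1.empty()) return -1.0;
    const double sum1 = std::accumulate(h1.begin(), h1.end(), 0.0);
    const double sum2 = std::accumulate(h2.begin(), h2.end(), 0.0);
    if (sum1 <= 0.0 || sum2 <= 0.0) return -1.0;

    double c1 = 0.0, c2 = 0.0, dmax = 0.0;
    for (std::size_t i = 0; i < h1.size(); ++i) {
        c1 += h1[i] / sum1;  // cumulative fraction, runs from 0 to 1
        c2 += h2[i] / sum2;
        dmax = std::max(dmax, std::fabs(c1 - c2));
    }
    return dmax;
}
```

Two histograms with the same shape give a distance of 0 regardless of their normalisation, while two histograms with no overlap give a distance of 1. A small distance corresponds to a KS probability near 1.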

Make a list of the similarities and differences of these distributions.

Note that many properties are quite similar, though there are noticeable differences.

Among the differences are:

  1. The pT of the W boson at low pT does not exactly match between the two samples. This is mainly because the Pythia matrix-element correction is internal to the parton shower, and it becomes part of the "tuning" of Pythia parameters. When we match MG+Pythia, this internal correction is turned off, and the matrix element for W+1jet is cut off at some finite pT, usually larger than 15-20 GeV. This effect is remedied when performing matching with the newer, pT-ordered shower, since the newer shower does not rely on the internal matrix-element correction to describe parton emissions below 15 or 20 GeV.
  2. As expected, MG+Pythia has a slightly harder pT spectrum for the leading jet.
  3. The hadron-level properties (the charged track pT) are quite different, indicating some mismatch in scale choices between the internal running of Pythia and the interface of external events from MadGraph. Special tuning of Pythia's non-perturbative parameters may be necessary when matching other programs to the Pythia shower.
  4. The pT of the lepton is different, but less so than the pT of the W boson. This is because the pT of the lepton is already non-zero when there are no QCD emissions, whereas the pT of the W is exactly zero in this case.

Why is this important?

  • Since there are differences in the low pT spectrum of the W boson, you would likely not use matched samples for precision measurements, such as the W mass. On the other hand, since the matched MG+Pythia samples include hard jets to high multiplicity, you may want to use such a sample for a new physics search involving a lepton, jets, and missing transverse energy.

Another point

  • Conceptually, MadGraph+Pythia and Pythia with its internal matrix element correction should give almost identical predictions for low pT of the W boson. In practice, however, there is not a perfect matching between the two types of calculations, and we observe residual differences in observable quantities. It is important to be aware of these differences when using different Monte Carlo samples.

Description of the analysis code

See WplusJetsCode for a description of the C++ code used to analyze the HepMC record and fill the histograms.

Understanding the code requires one to also understand the HepMC record. See HepMCRecord for a basic explanation.

To further understand how the HepMC record is analyzed, a tool printHepMCEvent is provided with the software package (see HCPSSinstructions) to dump the output to the terminal.
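
As a rough illustration of what "navigating the record" means, the sketch below classifies a particle by walking up its chain of mothers. It does not use the HepMC classes: the Particle struct, the mother index, and the function names are hypothetical stand-ins chosen to keep the example self-contained. In HepMC itself, the mothers of a particle are reached through its production vertex rather than through an index. The bottom test is a crude check on the digits of the PDG id and is only meant for ordinary b quarks and b hadrons.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical, simplified stand-in for a HepMC particle: PDG id, status
// code, and the index of its mother in the event (-1 if it has none).
struct Particle {
    int pdgId;
    int status;
    int mother;
};

// Rough test for a b quark (5) or a hadron containing one: b mesons have
// PDG ids of the form 5xx and b baryons of the form 5xxx.
bool isBottom(int pdgId)
{
    const int id = std::abs(pdgId);
    return id == 5 || (id / 100) % 10 == 5 || (id / 1000) % 10 == 5;
}

enum Origin { FromW, FromBottom, Other };

// Walk up the chain of mothers of particle i; the nearest W (PDG id +-24)
// or bottom ancestor decides the classification.
Origin classifyOrigin(const std::vector<Particle>& event, std::size_t i)
{
    if (i >= event.size()) return Other;
    int m = event[i].mother;
    // Bound the walk by the event size so a malformed record cannot loop forever.
    for (std::size_t step = 0; step < event.size() && m >= 0; ++step) {
        if (static_cast<std::size_t>(m) >= event.size()) break;
        const int id = event[m].pdgId;
        if (std::abs(id) == 24) return FromW;
        if (isBottom(id)) return FromBottom;
        m = event[m].mother;
    }
    return Other;
}
```

With this kind of walk, an electron (PDG id 11) whose mother is a W- (PDG id -24) is labelled FromW, while an electron whose mother is a B0 meson (PDG id 511) is labelled FromBottom.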

Tutorial 1

Since the LHC is a proton-proton collider, the rates for W+ and W- production are expected to be different. To investigate this, you can modify the analysis code to filter out just W+ or W- events. We have made this change for you, allowing for a comparison of Pythia W+- production and Pythia W+ production. The results can be viewed in hcpss/Tutorial1.
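
The filter itself can be very small. The sketch below is an assumption about how such a selection might look, not the change actually made in the analysis code: containsW and the flat list of PDG ids are hypothetical, standing in for a loop over the particles of a HepMC event. The only external fact used is the PDG numbering, in which W+ is 24 and W- is -24.

```cpp
#include <cstddef>
#include <vector>

// PDG Monte Carlo numbering: W+ is 24 and W- is -24.
const int kWplus = 24;
const int kWminus = -24;

// Returns true if the list of PDG ids of an event contains the requested W.
// The flat list of ids is a hypothetical stand-in for iterating over the
// particles of a HepMC event.
bool containsW(const std::vector<int>& pdgIds, int wantedW)
{
    for (std::size_t i = 0; i < pdgIds.size(); ++i) {
        if (pdgIds[i] == wantedW) return true;
    }
    return false;
}
```

An event would then be kept for the W+ sample only if containsW(ids, kWplus) is true, and skipped otherwise.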

Are there any observable kinematic differences between W+ and W- production?

Yes, since the hard process relies on different species of incoming partons.

A useful tool for investigating effects arising from PDFs is the PDF plotting tool available at:


(open this in a new tab of your browser)

Use this tool to explain the kinematic difference.

Hint: remember that quarks in the proton come from valence and sea,
while antiquarks all come from the sea (i.e. g-> q qbar evolution).
The relevant scale for W production is Q**2=6400 GeV**2, and assume
that the dominant processes are u dbar -> W+ and d ubar -> W-.
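
To know where to look on the PDF plots, it helps to translate the W rapidity into parton momentum fractions. At leading order, neglecting the transverse momentum of the W, a W of mass M produced at rapidity y in collisions of energy sqrt(s) comes from partons with x1 = (M/sqrt(s)) exp(+y) and x2 = (M/sqrt(s)) exp(-y). The sketch below evaluates this relation; the struct and function names are illustrative assumptions.

```cpp
#include <cmath>

// Momentum fractions of the two incoming partons (leading-order kinematics,
// W transverse momentum neglected).
struct PartonX {
    double x1;  // parton from the proton moving along +z
    double x2;  // parton from the proton moving along -z
};

// mass and sqrtS must be given in the same units (e.g. GeV);
// y is the rapidity of the W boson.
PartonX partonMomentumFractions(double mass, double sqrtS, double y)
{
    const double ratio = mass / sqrtS;
    PartonX x = { ratio * std::exp(y), ratio * std::exp(-y) };
    return x;
}
```

For M = 80 GeV (so that Q**2 = 6400 GeV**2) and sqrt(s) = 7000 GeV, a central W (y = 0) probes both partons at x of about 0.011. At larger rapidity one parton is pushed to larger x, where valence quarks matter more, and the other to smaller x, where the sea dominates.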

Here is an example screenshot:


and the example output:


The lesson

  • .. is that the parton distribution functions can have important effects on final states with different charges.

Tutorial 2

So far, we have concentrated on leading-order (LO) matrix elements and leading-logarithm (LL) parton shower predictions. This tutorial compares LO/LL predictions to a next-to-leading-order (NLO) prediction generated with PowHeg. PowHeg is a Monte Carlo tool based on a methodology for adding parton showers to complete NLO calculations. Thus, it should provide NLO accuracy for rates and inclusive distributions of the W boson and/or decay leptons, while providing realistic events.


To emphasize differences, we retain the filter on just W+ events.

We have collected results for you in hcpss/Tutorial2.

Note similarities and differences, paying attention to the KS statistic.

You should have observed:

  1. The rapidity of the W+ boson has changed at NLO. In particular, the NLO prediction is more central. The reason for this will become apparent later.

  2. The rapidity of the decay lepton is also more central, as a side-effect of observation 1.

  3. The transverse momentum (pT) distribution of the decay lepton has a larger tail in the Pythia prediction. The effect is subtle, but real, and arises from spin correlations. A spin correlation means that the momentum of one parton depends non-trivially on another parton's momentum. PowHeg includes the exact spin correlations between the lepton and the quarks up to NLO. For example, in q_1 g -> l+ nu_l q_2, the matrix element depends upon the dot product of q_1 and nu_l and the dot product of q_2 and l+, but NOT on the dot product of q_1 and l+. Pythia includes the exact spin correlation only up to LO, i.e. for q qbar -> l+ nu_l. Even though Pythia includes an ME correction for the parton shower, that correction is not propagated to the decay lepton!

    An interesting study for the advanced student is to plot the ratio
    of the l+ and l- pT distributions for the PowHeg and Pythia predictions.

The lesson

  • .. is that NLO can be important for many reasons. Note that, for W+- production, the "NLO" effect arises from a tree-level matrix-element.

Tutorial 3

Since PowHeg is based on a consistent NLO calculation, it uses NLO PDFs for the generation of the hard process (here, hard refers to partons not generated by the parton shower). Pythia, however, uses LO PDFs for the hard process, the parton shower, and the underlying event (UE) model. It is instructive to compare with Pythia results based on an NLO PDF. Some aspects of the parton shower or UE model may not be consistent, but we are mainly interested in the properties of the W boson and decay lepton.

We have generated a Pythia sample using the CTEQ6M (NLO) PDF. The comparison with PowHeg for W+ production is collected in hcpss/Tutorial3.

What do you observe? Make sure to compare with the results of Tutorial 2.

You should note that the rapidity distributions of the W boson and the decay lepton are in better agreement with the NLO prediction, illustrating the impact of the choice of PDF for the hard process. (In fact, the KS test for the W+ rapidity is slightly worse for the Pythia-CTEQ6M sample with these statistics. However, "by eye," one observes that this distribution is less peaked at large rapidity. Larger statistics would confirm this.)

The fact that LO ME's + NLO PDFs describe high-pT kinematics for several key processes at the LHC, while NLO PDFs can be problematic for small-x aspects of the event simulation, has led to the development of modified LO PDFs. Using the PDF plotter tool, compare a modified PDF (MRST2007lomod) to CTEQ6L1 and CTEQ6M for the u quark and gluon distributions. How does the lomod PDF compare to the LO and NLO PDFs in different kinematic regimes?

The lesson

  • .. is that much of the effect of NLO for W production at the LHC is in the choice of PDFs, not in the matrix elements. This is true also for Z, Higgs, and t-tbar production, especially when considering the rapidity distribution. It is not a theorem, but a useful rule of thumb. However, this is only the case when discussing the shapes of distributions. NLO is very important for understanding the absolute normalization of the distributions.

Tutorial 4

In the course of the Summer School, you will have lectures on QCD, including NLO calculations, by one of the experts in this field. Many such predictions are available in computer programs. However, most of these do not produce "events" that can be compared to multiparticle final states. On the other hand, suitable averages of the data can be made that allow a comparison to theory predictions.

In this tutorial, we compare to a dataset generated from the NLO program MCFM. Since we have already seen a prediction of inclusive W+- production using the PowHeg event generator, we will instead focus on a process that has not yet been merged with a parton shower. In particular, we will compare W+1jet inclusive production at NLO with our Pythia dataset. For the MCFM calculation, we require at least one jet in the final state with pT > 15 GeV, defined using the anti-kT jet algorithm. This "jet" may be composed of one or two final state partons. To allow for a reasonable comparison, the Pythia dataset was filtered to require the W boson to have pT > 15 GeV.

The results are collected at hcpss/Tutorial4.

Late Note

To make sensible comparisons, the jet algorithms must be the same. The MCFM predictions are made using the so-called "anti-kT" algorithm, not "SISCone". Therefore, the analysis code must be modified to define the new algorithm. You will need to add the following to the end of your "wjets.cc" code to override the definition in the shared object libraries.

#include "fastjet/ClusterSequence.hh"
#include "fastjet/SISConePlugin.hh"
#include "include/baseAnalysis.h"

// InitJetFinder: Initialisation of JetFinder
int baseAnalysis::InitJetFinder(double coneRadius, double overlapThreshold, double jet_ptmin, double lepton_ptmin, double DeltaR_lepton_track)
{
     // If the pointers already exist, delete them
     if(m_jetDef) {
         delete m_jetDef;
         m_jetDef = 0;
     }
     if(m_plugin) {
         delete m_plugin;
         m_plugin = 0;
     }

     // initialise fastjet
     m_coneRadius = coneRadius;
     m_overlapThreshold = overlapThreshold;
     m_jet_ptmin = jet_ptmin;
     m_lepton_ptmin = lepton_ptmin;
     m_DeltaR_lepton_track = DeltaR_lepton_track;

     // The SISCone plugin is still created, but the jet definition below
     // uses the anti-kT algorithm with radius m_coneRadius instead of it.
     m_plugin = new fastjet::SISConePlugin(m_coneRadius, m_overlapThreshold);
     m_jetDef = new fastjet::JetDefinition(fastjet::antikt_algorithm, m_coneRadius);

     if(!m_jetDef) return false;

     return true;
}

To make these comparisons, we needed to ignore bins with negative weights from the MCFM calculations.

Note the similarities between many of the distributions, though the KS test shows they are not identical. The most similar are:

  1. The transverse mass
  2. pT(W) above 30 GeV
  3. pT(leading jet) below 50 GeV

More interesting are the differences:

  1. pT(W) below 30 GeV. Here we see remnants of the cancellation between positive and negative weights. In this region, the Pythia calculation is more reliable.
  2. The rapidity of the W boson. NLO predicts a more central distribution.
  3. The pT of the decay lepton. Again, the Pythia distribution is larger on the tail.
  4. The leading jet pT. NLO predicts a harder jet spectrum.
  5. Charged track properties. We artificially labelled the outgoing partons from MCFM to be pi+ or pi-, but this was only to allow the jet algorithm to work in the C++ code. Since MCFM is not an event generator, it does not produce a sensible distribution of charged tracks per event.

The lesson

  • .. is that NLO tools provide more reliable predictions for suitably defined observables.

Advanced Exercises

  1. Given the Monte Carlo datasets, reproduce all of the comparisons done in Tutorials 0-4.

  2. The Sherpa Challenge

    Sherpa is a Monte Carlo tool that implements its own method for merging matrix element predictions and parton showers. In some cases, it provides different predictions for some jet and lepton-jet observables than most other methods for matching. It is useful to know where differences can arise.

    Using the Sherpa sample, compare to any other dataset of your choosing.

    Did something surprising happen?

    None of the histograms associated with the W boson or decay lepton should be filled. The reason for this is that Sherpa includes no W boson mother in the event record. As a result, the HepMCAnalysis tool as written cannot plot properties of the W boson.

    Modify the analysis code to identify the correct, hard leptons, and reconstruct the W boson by hand.


    The HepMC documentation may be useful in this respect. For Sherpa, you should identify the vertex that has the neutrino as a stable (status==1), outgoing particle. The incoming particles to this vertex are the ones that can be used to reconstruct the W boson.


Vertex:-9        ID:6     (X,cT)=+4.21e-01,+4.96e-02,-4.96e+00,-6.80e+01
I:2  10028    12        +1.19e+01,-2.24e+01,-3.12e+02,+3.13e+02 2   -9
     10029    -11       -3.58e+01,+4.04e+01,-2.87e+02,+2.92e+02 2   -9
O:3  10030    12        +1.19e+01,-2.24e+01,-3.12e+02,+3.13e+02 1
     10031    -11       -1.25e+01,+1.40e+01,-9.96e+01,+1.01e+02 1
     10032    22        -2.32e+01,+2.63e+01,-1.88e+02,+1.91e+02 1

Redo the comparisons in Tutorials 0-4.
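
Reconstructing the W "by hand" amounts to adding the four-momenta of the two leptons entering the vertex and computing the invariant mass of the sum. The sketch below shows only that arithmetic, using a minimal FourVector struct instead of the HepMC classes; the struct and function names are assumptions made for illustration. Locating the vertex in the real record is left to the analysis code.

```cpp
#include <cmath>

// Minimal four-vector (px, py, pz, E), in GeV. Hypothetical helper type.
struct FourVector {
    double px, py, pz, e;
};

// Component-wise sum of two four-momenta.
FourVector addMomenta(const FourVector& a, const FourVector& b)
{
    FourVector s = { a.px + b.px, a.py + b.py, a.pz + b.pz, a.e + b.e };
    return s;
}

// Invariant mass; a slightly negative m^2 from rounding is clamped to zero.
double invariantMass(const FourVector& p)
{
    const double m2 = p.e * p.e - p.px * p.px - p.py * p.py - p.pz * p.pz;
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;
}

// Transverse momentum with respect to the beam (z) axis.
double transverseMomentum(const FourVector& p)
{
    return std::sqrt(p.px * p.px + p.py * p.py);
}
```

Applied to the two incoming particles of the vertex printed above, the neutrino (+1.19e+01, -2.24e+01, -3.12e+02, +3.13e+02) and the positron (-3.58e+01, +4.04e+01, -2.87e+02, +2.92e+02), the sum has an invariant mass close to 80 GeV, as expected for a W boson. The agreement is limited by the three significant digits of the printout. Note that the outgoing positron and photon momenta add up to the incoming positron momentum, so using the incoming particles gives the leptons before photon radiation.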
  1. The Reweighting Exercise

    The pT spectrum of the W boson is predicted to a higher accuracy by an NLO calculation than by Pythia. Reweight the Pythia pT spectrum to match that of an NLO prediction, and observe the effect on all other distributions (by comparing to the unweighted ones). Use the MCFM W+1jet dataset to generate a histogram that can be used as a reweighting function. Comment on the utility of this method for improving the Pythia predictions. Try reweighting with a 2-D histogram in the pT and rapidity of the W boson.