Cluster finding in an type of environment

The usage of cluster algorithms for applications started in the late 1970's. A number of different approaches were proposed [Bab80], see review in [Mor98]. Of these, we will here only discuss those based on binary joining. In this kind of approach, initially each final-state particle is considered to be a cluster. Using some distance measure, the two nearest clusters are found. If their distance is smaller than some cut-off value, the two clusters are joined into one. In this new configuration, the two clusters that are now nearest are found and joined, and so on until all clusters are separated by a distance larger than the cut-off. The clusters remaining at the end are often also called jets. Note that, in this approach, each single particle belongs to exactly one cluster. Also note that the resulting jet picture explicitly depends on the cut-off value used. Normally the number of clusters is allowed to vary from event to event, but occasionally it is more useful to have the cluster algorithm find a predetermined number of jets (like 3).

The obvious choice for a distance measure is to use squared invariant
mass, i.e. for two clusters and to define the
distance to be

(293) |

Another instability may be seen by considering the clustering in a simple 2-jet event. By the time that clustering has reached the level of three clusters, the `best' the clustering algorithm can possibly have achieved, in terms of finding three low-mass clusters, is to have one fast cluster around each jet, plus a third slow cluster in the middle. In the last step this third cluster would be joined with one of the fast ones, to produce two final asymmetric clusters: one cluster would contain all the slow particles, also those that visually look like belonging to the opposite jet. A simple binary joining process, with no possibility to reassign particles between clusters, is therefore not likely to be optimal.

The solution adopted [Sjö83] is to reject invariant
mass as distance measure. Instead a jet is defined as a collection
of particles which have a limited transverse momentum with respect to
a common jet axis, and hence also with respect to each other. This
picture is clearly inspired by the standard fragmentation picture,
e.g. in string fragmentation. A distance measure between two
particles (or clusters) with momenta and should
thus not depend critically on the longitudinal momenta but only on the
relative transverse momentum. A number of such measures were tried,
and the one eventually selected was

(294) |

For small relative angle , where
and
, this measure reduces to

(295) |

The basic scheme is of the binary joining type, i.e. initially each particle is assumed to be a cluster by itself. Then the two clusters with smallest relative distance are found and, if , with some predetermined distance, the two clusters are joined to one, i.e. their four-momenta are added vectorially to give the energy and momentum of the new cluster. This is repeated until the distance between any two clusters is . The number and momenta of these final clusters then represent our reconstruction of the initial jet configuration, and each particle is assigned to one of the clusters.

To make this scheme workable, two further ingredients are introduced, however. Firstly, after two clusters have been joined, some particles belonging to the new cluster may actually be closer to another cluster. Hence, after each joining, all particles in the event are reassigned to the closest of the clusters. For particle , this means that the distance to all clusters in the event has to be evaluated and compared. After all particles have been considered, and only then, are cluster momenta recalculated to take into account any reassignments. To save time, the assignment procedure is not iterated until a stable configuration is reached, but, since all particles are reassigned at each step, such an iteration is effectively taking place in parallel with the cluster joining. Only at the very end, when all , is the reassignment procedure iterated to convergence -- still with the possibility to continue the cluster joining if some should drop below due to the reassignment.

Occasionally, it may occur that the reassignment step leads to an empty cluster, i.e. one to which no particles are assigned. Since such a cluster has a distance to any other cluster, it is automatically removed in the next cluster joining. However, it is possible to run the program in a mode where a minimum number of jets is to be reconstructed. If this minimum is reached with one cluster empty, the particle is found which has largest distance to the cluster it belongs to. That cluster is then split into two, namely the large-distance particle and a remainder. Thereafter the reassignment procedure is continued as before.

Secondly, the large multiplicities normally encountered means that, if each particle initially is to be treated as a separate cluster, the program will become very slow. Therefore a smaller number of clusters, for a normal event typically 8-12, is constructed as a starting point for the iteration above, as follows. The particle with the highest momentum is found, and thereafter all particles within a distance from it, where . Together these are allowed to form a single cluster. For the remaining particles, not assigned to this cluster, the procedure is iterated, until all particles have been used up. Particles in the central momentum region, are treated separately; if their vectorial momentum sum is above they are allowed to form one cluster, otherwise they are left unassigned in the initial configuration. The value of , as long as reasonably small, has no physical importance, in that the same final cluster configuration will be found as if each particle initially is assumed to be a cluster by itself: the particles clustered at this step are so nearby anyway that they almost inevitably must enter the same jet; additionally the reassignment procedure allows any possible `mistake' to be corrected in later steps of the iteration. However, if computer time is not a problem, one may well let and circumvent this preclustering.

Thus the jet reconstruction depends on one single parameter, , with a clearcut physical meaning of a transverse momentum `jet-resolution power'. Neglecting smearing from fragmentation, between two clusters of equal energy corresponds to half the invariant mass of the two original partons. If one only wishes to reconstruct well separated jets, a large should be chosen, while a small would allow the separation of close jets, at the cost of sometimes artificially dividing a single jet into two. In particular, quark jets may here be a nuisance. The value of to use for a fixed jet-resolution power in principle should be independent of the c.m. energy of events, although fragmentation effects may give a contamination of spurious extra jets that increases slowly with for fixed . Therefore values as low as a GeV was acceptable at PETRA/PEP, while 3-4 GeV may be better for applications at LEP and beyond.

This completes the description of the main option of the `PYCLUS`
routine. Variations are possible. One such is to skip the reassignment
step, i.e. to make use only of the simple binary joining procedure,
without any possibility to reassign particles between jets.
(This option is included mainly as a reference, to check how important
reassignment really is.) The other main alternative is
to replace the distance measure used above with the ones used in the
JADE or Durham algorithms.

The JADE cluster algorithm [JAD86] is an attempt to save the
invariant mass measure. The distance measure is defined to be

(296) |

The measure is very closely related to the squared mass distance measure: the two coincide (up to the difference in normalization) if . However, consider a pair of particles or clusters with non-vanishing individual masses and a fixed pair mass. Then, the larger the net momentum of the pair, the smaller the measure. This somewhat tends to favour clustering of fast particles, and makes the algorithm less unstable than the one based on true invariant mass.

The successes of the JADE algorithm are well known: one obtains a good agreement between the number of partons generated on the matrix-element (or parton-shower) level and the number of clusters reconstructed from the hadrons, such that QCD aspects like the running of can be studied with a small dependence on fragmentation effects. Of course, the insensitivity to fragmentation effects depends on the choice of fragmentation model. Fragmentation effects are small in the string model, but not necessarily in independent fragmentation scenarios. Although independent fragmentation in itself is not credible, this may be seen as a signal for caution.

One should note that the JADE measure still suffers from some of the diseases of the simple mass measure (without reassignments), namely that particles which go in opposite directions may well be joined into the same cluster. Therefore, while the JADE algorithm is a good way to find the number of jets, it is inferior to the standard measure for a determination of jet directions and energies [Bet92]. The measure also gives narrower jets, which agree better with the visual impression of jet structure.

Later, the Durham algorithm was introduced [Cat91],
which works as the JADE one but with a distance measure

(297) |

The main difference therefore is that the standard `PYCLUS`
option allows reassignments, while the Durham algorithm does not.
The latter is therefore more easily calculable on the perturbative
parton level. This point is sometimes overstressed, and one could
give counterexamples why reassignments in fact may bring better
agreement with the underlying perturbative level. In particular,
without reassignments, one will make the recombination that seems
the `best' in the current step, even when that forces you to make
`worse' choices in subsequent steps. With reassignments, it is
possible to correct for mistakes due to the too local sensitivity of
a simple binary joining scheme.