
Selection From a Distribution

The situation that is probably most common is that we know a function $f(x)$ which is non-negative in the allowed $x$ range $x_{\mathrm{min}} \leq x \leq x_{\mathrm{max}}$. We want to select an $x$ `at random' so that the probability in a small interval $\d x$ around a given $x$ is proportional to $f(x) \, \d x$. Here $f(x)$ might be a fragmentation function, a differential cross section, or any of a number of distributions.

One does not have to assume that the integral of $f(x)$ is explicitly normalized to unity: by the Monte Carlo procedure of picking exactly one accepted $x$ value, normalization is implicit in the final result. Sometimes the integral of $f(x)$ does carry a physics content of its own, as part of an overall weight factor we want to keep track of. Consider, for instance, the case when $x$ represents one or several phase-space variables and $f(x)$ a differential cross section; here the integral has a meaning of total cross section for the process studied. The task of a Monte Carlo is then, on the one hand, to generate events one at a time, and, on the other hand, to estimate the total cross section. The discussion of this important example is deferred to section [*].

If it is possible to find a primitive function $F(x)$ which has a known inverse $F^{-1}(x)$, an $x$ can be found as follows (method 1):

\begin{displaymath}
\int_{x_{\mathrm{min}}}^{x} f(x) \, \d x =
R \int_{x_{\mathrm{min}}}^{x_{\mathrm{max}}} f(x) \, \d x
\quad \Longrightarrow \quad
x = F^{-1}(F(x_{\mathrm{min}}) +
R(F(x_{\mathrm{max}}) - F(x_{\mathrm{min}}))) ~.
\end{displaymath} (2)

The statement of the first line is that a fraction $R$ of the total area under $f(x)$ should be to the left of $x$. However, seldom are functions of interest so nice that the method above works. It is therefore necessary to use more complicated schemes.
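A minimal sketch of method 1 in Python may make the prescription concrete. The distribution used here, $f(x) = 2x$ on $[0,1]$ with primitive $F(x) = x^2$ and inverse $F^{-1}(u) = \sqrt{u}$, is an assumed example, not one from the text:

```python
import math
import random

def sample_inverse(x_min, x_max, F, F_inv, rng=random.random):
    """Method 1: inverse-transform sampling.

    F is a primitive function of f, F_inv its inverse; R is a
    uniform random number in [0, 1)."""
    R = rng()
    return F_inv(F(x_min) + R * (F(x_max) - F(x_min)))

# Example: f(x) = 2x on [0, 1], so F(x) = x**2 and F_inv(u) = sqrt(u).
x = sample_inverse(0.0, 1.0, lambda t: t * t, math.sqrt)
```

Each call returns one accepted $x$; the normalization of $f$ never enters explicitly, as noted above.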

Special tricks can sometimes be found. Consider e.g. the generation of a Gaussian $f(x) = \exp(-x^2)$. This function has no primitive expressible in closed form, but if we combine it with the same Gaussian distribution of a second variable $y$, it is possible to transform to polar coordinates

\begin{displaymath}
f(x) \, \d x \, f(y) \, \d y = \exp(-x^2-y^2) \, \d x \, \d y =
r \exp(-r^2) \, \d r \, \d\varphi ~,
\end{displaymath} (3)

and now the $r$ and $\varphi$ distributions may be easily generated and recombined to yield $x$. At the same time we get a second number $y$, which can also be used. For the generation of transverse momenta in fragmentation, this is very convenient, since in fact we want to assign two transverse degrees of freedom.
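The polar trick of eq. (3) can be sketched in a few lines of Python. The $r$ distribution $r \exp(-r^2) \, \d r$ has primitive $1 - \exp(-r^2)$, giving $r = \sqrt{-\ln R}$ by method 1, while $\varphi$ is flat in $[0, 2\pi)$; both the variable names and the pair-returning interface are choices of this sketch:

```python
import math
import random

def gaussian_pair(rng=random.random):
    """Generate two independent numbers distributed as exp(-x^2),
    via the polar-coordinate trick of eq. (3).

    The r distribution r*exp(-r^2) dr has primitive 1 - exp(-r^2),
    so method 1 gives r = sqrt(-ln R); phi is flat in [0, 2*pi)."""
    r = math.sqrt(-math.log(1.0 - rng()))  # 1 - R avoids log(0)
    phi = 2.0 * math.pi * rng()
    return r * math.cos(phi), r * math.sin(phi)
```

Both returned numbers follow $\exp(-x^2)$, i.e. a Gaussian with variance $1/2$, which is exactly the two-degrees-of-freedom situation described above for transverse momenta.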

If the maximum of $f(x)$ is known, $f(x) \leq f_{\mathrm{max}}$ in the $x$ range considered, a hit-or-miss method will always yield the correct answer (method 2):

1. select an $x$ with even probability in the allowed range, i.e. $x = x_{\mathrm{min}} + R(x_{\mathrm{max}} - x_{\mathrm{min}})$;
2. compare a (new) $R$ with the ratio $f(x)/f_{\mathrm{max}}$; if $f(x)/f_{\mathrm{max}} \le R$, then reject the $x$ value and return to point 1 for a new try;
3. otherwise the most recent $x$ value is retained as final answer.
The probability that $f(x)/f_{\mathrm{max}} > R$ is proportional to $f(x)$; hence the correct distribution of retained $x$ values. The efficiency of this method, i.e. the average probability that an $x$ will be retained, is $(\int \, f(x) \, \d x)
/ (f_{\mathrm{max}}(x_{\mathrm{max}} - x_{\mathrm{min}}))$. The method is acceptable if this number is not too low, i.e. if $f(x)$ does not fluctuate too wildly.
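The hit-or-miss loop translates directly into code. In this sketch the test distribution $f(x) = x^2$ on $[0,1]$ with $f_{\mathrm{max}} = 1$ is an assumed example:

```python
import random

def hit_or_miss(f, x_min, x_max, f_max, rng=random.random):
    """Method 2: accept-reject against a flat envelope f_max >= f(x)."""
    while True:
        # point 1: x even in the allowed range
        x = x_min + rng() * (x_max - x_min)
        # points 2-3: retain x with probability f(x)/f_max
        if f(x) / f_max > rng():
            return x

# Example: f(x) = x**2 on [0, 1], f_max = 1; efficiency is 1/3 here.
x = hit_or_miss(lambda t: t * t, 0.0, 1.0, 1.0)
```

The efficiency of the example is $\int_0^1 x^2 \, \d x / 1 = 1/3$, so on average three tries are needed per accepted $x$.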

Very often $f(x)$ does have narrow spikes, and it may not even be possible to define an $f_{\mathrm{max}}$. An example of the former phenomenon is a function with a singularity just outside the allowed region, an example of the latter an integrable singularity just at the $x_{\mathrm{min}}$ and/or $x_{\mathrm{max}}$ borders. Variable transformations may then be used to make a function smoother. Thus a function $f(x)$ which blows up as $1/x$ for $x \rightarrow 0$, with an $x_{\mathrm{min}}$ close to 0, would instead be roughly constant if transformed to the variable $y = \ln x$.
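The $1/x$ example of such a transformation can be written out explicitly: in $y = \ln x$ the distribution is flat between $\ln x_{\mathrm{min}}$ and $0$, so a uniform $y$ is picked and mapped back. The choice $x_{\mathrm{max}} = 1$ is an assumption of this sketch:

```python
import math
import random

def sample_inverse_x(x_min, rng=random.random):
    """Sample f(x) proportional to 1/x on [x_min, 1] by working in
    y = ln x, where the distribution is flat."""
    # y even between ln x_min and 0
    y = math.log(x_min) * (1.0 - rng())
    return math.exp(y)
```

Equivalently, this is method 1 with $F(x) = \ln x$; the point of the transformation view is that the blown-up region near $x_{\mathrm{min}}$ is sampled without any rejection losses.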

The variable transformation strategy may be seen as a combination of methods 1 and 2, as follows. Assume the existence of a function $g(x)$, with $f(x) \leq g(x)$ over the $x$ range of interest. Here $g(x)$ is picked to be a `simple' function, such that the primitive function $G(x)$ and its inverse $G^{-1}(x)$ are known. Then (method 3):

1. select an $x$ according to the distribution $g(x)$, using method 1;
2. compare a (new) $R$ with the ratio $f(x)/g(x)$; if $f(x)/g(x) \le R$, then reject the $x$ value and return to point 1 for a new try;
3. otherwise the most recent $x$ value is retained as final answer.
This works, since the first step will select $x$ with a probability $g(x) \, \d x = \d G(x)$ and the second retain this choice with probability $f(x)/g(x)$. The total probability to pick a value $x$ is then just the product of the two, i.e. $f(x) \, \d x$.
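A sketch of method 3 in Python, for an assumed example not taken from the text: $f(x) = \exp(-x) \cos^2 x$, majorized by $g(x) = \exp(-x)$, whose primitive $G(x) = -\exp(-x)$ is easily inverted:

```python
import math
import random

def sample_method3(x_min, x_max, rng=random.random):
    """Method 3 sketch for the assumed f(x) = exp(-x) * cos(x)**2,
    with envelope g(x) = exp(-x), G(x) = -exp(-x)."""
    a, b = math.exp(-x_min), math.exp(-x_max)
    while True:
        # step 1: x according to g(x) = exp(-x), by method 1
        x = -math.log(a - rng() * (a - b))
        # steps 2-3: retain with probability f(x)/g(x) = cos(x)**2
        if math.cos(x) ** 2 > rng():
            return x
```

The envelope absorbs the overall exponential fall-off, so the rejection step only has to pay for the $\cos^2 x$ modulation, with an average efficiency of roughly $1/2$.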

If $f(x)$ has several spikes, method 3 may work for each spike separately, but it may not be possible to find a $g(x)$ that covers all of them at the same time, and which still has an invertible primitive function. However, assume that we can find a function $g(x) = \sum_i g_i(x)$, such that $f(x) \leq g(x)$ over the $x$ range considered, and such that the functions $g_i(x)$ each are non-negative and simple, in the sense that we can find primitive functions and their inverses. In that case (method 4):

1. select an $i$ at random, with relative probability given by the integrals
\begin{displaymath}
\int_{x_{\mathrm{min}}}^{x_{\mathrm{max}}} g_i(x) \, \d x =
G_i(x_{\mathrm{max}}) - G_i(x_{\mathrm{min}}) ~;
\end{displaymath}
2. for the $i$ selected, use method 1 to find an $x$, i.e.
\begin{displaymath}
x = G_i^{-1}(G_i(x_{\mathrm{min}}) +
R(G_i(x_{\mathrm{max}})-G_i(x_{\mathrm{min}}))) ~;
\end{displaymath}
3. compare a (new) $R$ with the ratio $f(x)/g(x)$; if $f(x)/g(x) \le R$, then reject the $x$ value and return to point 1 for a new try;
4. otherwise the most recent $x$ value is retained as final answer.
This is just a trivial extension of method 3, where steps 1 and 2 ensure that, on the average, each $x$ value picked there is distributed according to $g(x)$: the first step picks $i$ with relative probability $\int g_i(x) \, \d x$, the second $x$ with absolute probability $g_i(x) / \int g_i(x) \, \d x$ (this is one place where one must remember to do normalization correctly); the product of the two is therefore $g_i(x)$ and the sum over all $i$ gives back $g(x)$.
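A sketch of method 4 for an assumed two-spike example, not taken from the text: $f(x) = (x^{-1/2} + (1-x)^{-1/2})(2 + \cos 2\pi x)/3$ on $(0,1)$, covered by $g(x) = g_1(x) + g_2(x)$ with $g_1(x) = x^{-1/2}$ and $g_2(x) = (1-x)^{-1/2}$, each with an invertible primitive:

```python
import math
import random

def sample_method4(rng=random.random):
    """Method 4 sketch for the assumed
        f(x) = (1/sqrt(x) + 1/sqrt(1-x)) * (2 + cos(2*pi*x)) / 3
    on (0, 1), covered by g = g1 + g2 with g1 = 1/sqrt(x),
    g2 = 1/sqrt(1-x)."""
    while True:
        # step 1: both g_i integrate to 2 on (0, 1), so pick i evenly
        if rng() < 0.5:
            # step 2 for g1: G1(x) = 2*sqrt(x), so x = R**2
            x = rng() ** 2
        else:
            # step 2 for g2: mirror image of g1
            x = 1.0 - rng() ** 2
        # steps 3-4: retain with probability f(x)/g(x)
        if (2.0 + math.cos(2.0 * math.pi * x)) / 3.0 > rng():
            return x
```

Each $g_i$ absorbs one of the integrable end-point singularities, so the rejection step only sees the smooth $(2 + \cos 2\pi x)/3$ factor.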

We have now arrived at an approach that is sufficiently powerful for a large selection of problems. In general, for a function $f(x)$ which is known to have sharp peaks in a few different places, the generic behaviour at each peak separately may be covered by one or a few simple functions $g_i(x)$, to which one adds a few more $g_i(x)$ to cover the basic behaviour away from the peaks. By a suitable selection of the relative strengths of the different $g_i$'s, it is possible to find a function $g(x)$ that matches well the general behaviour of $f(x)$, and thus achieve a reasonable Monte Carlo efficiency.

The major additional complication is when $x$ is a multidimensional variable. Usually the problem is not so much $f(x)$ itself, but rather that the phase-space boundaries may be very complicated. If the boundaries factorize it is possible to pick phase-space points restricted to the desired region. Otherwise the region may have to be inscribed in a hyper-rectangle, with points picked within the whole hyper-rectangle but only retained if they are inside the allowed region. This may lead to a significant loss in efficiency. Variable transformations may often make the allowed region easier to handle.

There are two main methods to handle several dimensions, each with its set of variations. The first method is based on a factorized ansatz, i.e. one attempts to find a function $g(\mathbf{x})$ which is everywhere larger than $f(\mathbf{x})$, and which can be factorized into $g(\mathbf{x}) = g^{(1)}(x_1) \, g^{(2)}(x_2) \cdots g^{(n)}(x_n)$, where $\mathbf{x} = (x_1,x_2,\ldots,x_n)$. Here each $g^{(j)}(x_j)$ may in its turn be a sum of functions $g^{(j)}_i$, as in method 4 above. First, each $x_j$ is selected independently, and afterwards the ratio $f(\mathbf{x})/g(\mathbf{x})$ is used to determine whether to retain the point.
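The factorized ansatz can be sketched for an assumed two-dimensional example, not taken from the text: $f(x,y) = x y \exp(-xy)$ on the unit square, majorized by $g(x,y) = g^{(1)}(x) \, g^{(2)}(y) = x \cdot y$ since $\exp(-xy) \le 1$:

```python
import math
import random

def sample_2d(rng=random.random):
    """Factorized-ansatz sketch for the assumed
    f(x, y) = x*y*exp(-x*y) on the unit square, with envelope
    g(x, y) = g1(x)*g2(y) = x*y."""
    while True:
        # each factor g(t) = t is sampled by method 1: t = sqrt(R)
        x = math.sqrt(rng())
        y = math.sqrt(rng())
        # retain the point with probability f/g = exp(-x*y)
        if math.exp(-x * y) > rng():
            return x, y
```

The two coordinates are selected independently; only the final rejection step sees the full non-factorizing function.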

The second method is useful if the boundaries of the allowed region can be written in a form where the maximum range of $x_1$ is known, the allowed range of $x_2$ only depends on $x_1$, that of $x_3$ only on $x_1$ and $x_2$, and so on until $x_n$, whose range may depend on all the preceding variables. In that case it may be possible to find a function $g(\mathbf{x})$ that can be integrated over $x_2$ through $x_n$ to yield a simple function of $x_1$, according to which $x_1$ is selected. Having done that, $x_2$ is selected according to a distribution which now depends on $x_1$, but with $x_3$ through $x_n$ integrated over. In particular, the allowed range for $x_2$ is known. The procedure is continued until $x_n$ is reached, where now the function depends on all the preceding $x_j$ values. In the end, the ratio $f(\mathbf{x})/g(\mathbf{x})$ is again used to determine whether to retain the point.
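The sequential method can be sketched for an assumed minimal example, not taken from the text: a flat density over the triangle $0 \le x_2 \le x_1 \le 1$. Integrating over $x_2$ gives a marginal proportional to $x_1$, from which $x_1$ is picked first; $x_2$ is then flat in its now-known range $[0, x_1]$:

```python
import math
import random

def sample_triangle(rng=random.random):
    """Sequential sketch: points even over the assumed region
    0 <= x2 <= x1 <= 1.  Integrating the flat density over x2 gives
    a marginal proportional to x1."""
    # marginal g(x1) = 2*x1, so method 1 gives x1 = sqrt(R)
    x1 = math.sqrt(rng())
    # x2 flat in its allowed range [0, x1], which depends on x1
    x2 = x1 * rng()
    # here f = g, so the final f/g rejection step always succeeds
    return x1, x2
```

In a realistic application $f$ would not equal $g$, and the ratio $f(\mathbf{x})/g(\mathbf{x})$ would decide whether the point is retained, exactly as in the factorized method.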

Stephen Mrenna 2007-10-30