A non-positive measure still allows the law of large numbers to be applied whenever a sketch of the standard convergence proof can be carried out with the particular measure and variables in question. There are fairly intuitive heuristics for spotting when this works. The basic one is to look at the absolute value of the measure to see how wide the distribution is and how slow its falloff is; this tells you how fast the convergence will be. When it looks like it works, Polyakov won't bother justifying it with careful estimates, as this is tedious. But it can be justified if you need to do it.

It is true for any measure that two "independent random variables" (in scare quotes because the measure $\mu$ is no longer positive) $x$ and $y$ have additive "means" (when defined):

$$ \langle x + y \rangle = \langle x \rangle + \langle y \rangle $$

and further, if you subtract out the means so that the variables have zero mean, the variance is additive (when defined):

$$ \langle (x+y)^2 \rangle = \langle x^2 \rangle + \langle y^2 \rangle $$
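As a quick sanity check of both additivity statements, here is a minimal sketch using two made-up discrete signed "distributions" (the dictionaries `mu_x` and `mu_y` and the helper names are my own illustrative assumptions, not anything from the question). "Independence" is taken to mean the joint weight is the product of the individual weights, and each set of weights sums to 1 even though some weights are negative.

```python
from itertools import product

# Hypothetical signed "distributions": value -> weight.
# The weights sum to 1, but some are negative.
mu_x = {-1.0: 0.75, 0.0: -0.5, 2.0: 0.75}
mu_y = {-2.0: -0.25, 1.0: 1.0, 3.0: 0.25}

def expect(mu, f):
    """'Expectation' of f under a signed measure."""
    return sum(w * f(v) for v, w in mu.items())

def expect_joint(mu_a, mu_b, f):
    """'Expectation' of f(a, b) under the product (independent) measure."""
    return sum(wa * wb * f(a, b)
               for (a, wa), (b, wb) in product(mu_a.items(), mu_b.items()))

mean_x = expect(mu_x, lambda v: v)
mean_y = expect(mu_y, lambda v: v)
mean_sum = expect_joint(mu_x, mu_y, lambda a, b: a + b)

# Subtract the means first, then compare second moments.
var_x = expect(mu_x, lambda v: (v - mean_x) ** 2)
var_y = expect(mu_y, lambda v: (v - mean_y) ** 2)
var_sum = expect_joint(mu_x, mu_y,
                       lambda a, b: ((a - mean_x) + (b - mean_y)) ** 2)

print("means:    ", mean_sum, "vs", mean_x + mean_y)
print("variances:", var_sum, "vs", var_x + var_y)
```

Note that with these weights the "variance" of `mu_y` comes out negative, which is allowed for a signed measure; the additivity only uses the product structure and the normalization.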

The question is whether the "probability density function" of the sum of many values of some function of these variables converges to a delta function. This is the law of large numbers.

The convergence here is in the sense of distributions: the statement is that the "probability density function" $\mu_S$ of the sum $\sum_i F(x_i)$ for these i.i.d. variables (with positive/negative distribution) converges to a delta function around the central value, i.e., the integral against any smooth test function converges to the value of the test function at $N \langle F\rangle$ as the number of variables becomes large.

The "distribution" $\mu_S$ is still the $N$-fold convolution of the "distribution" $\mu_F$ with itself:

$$ \mu_S = \mu_F * \mu_F * \mu_F * \cdots * \mu_F $$

where the number of factors is $N$ (in your case, $T/\epsilon$). The main step of the proof of the standard central limit theorem consists of taking the Fourier transform of both sides of the equation, and noting that:

$$ \mu_S(k) = \mu_F(k)^N $$

So that as long as the Fourier transform values obey the rules:

normalization/zero-center: $\mu_F(0) = 1$

shrinky-ness: $|\mu_F(k)| < |\mu_F(0)| $

Then, for large $N$, outside of a small region near $k=0$, the Fourier transform of the distribution of the sum will vanish in a way controlled by the cusp behavior at $k=0$. The shrinking condition needs to be stated more precisely: you want the Fourier transform to not approach one in absolute value away from zero. In the case of interest this is obvious, because you have something like a Gaussian or exponentially decaying Fourier transform. I said "zero center" instead of "zero mean" because the mean might diverge. You can still define a center by translating the measure until the Fourier transform at 0 is real (and then you can normalize by rescaling so that the value is 1).
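The convolution identity and the two conditions can be checked on a toy example. This is only a sketch: the signed weights in `mu_F` are invented for illustration (they live on a few integers, sum to 1, and include negative entries), and the helper names are mine.

```python
import cmath

def convolve(mu_a, mu_b):
    """Convolution of two discrete (possibly signed) measures given as value -> weight."""
    out = {}
    for a, wa in mu_a.items():
        for b, wb in mu_b.items():
            out[a + b] = out.get(a + b, 0.0) + wa * wb
    return out

def fourier(mu, k):
    """Fourier transform sum_v w(v) * exp(i k v) of a discrete measure."""
    return sum(w * cmath.exp(1j * k * v) for v, w in mu.items())

# Hypothetical signed measure: total weight 1, negative weights at +-2.
mu_F = {-2: -0.05, -1: 0.3, 0: 0.5, 1: 0.3, 2: -0.05}

N = 6
mu_S = mu_F
for _ in range(N - 1):
    mu_S = convolve(mu_S, mu_F)   # N factors in total

k = 0.3
lhs = fourier(mu_S, k)            # transform of the N-fold convolution
rhs = fourier(mu_F, k) ** N       # N-th power of the single transform
print("identity error:", abs(lhs - rhs))

# Normalization and "shrinky-ness" on a grid of k in (0, pi).
normalized = abs(fourier(mu_F, 0.0) - 1.0) < 1e-12
shrinks = all(abs(fourier(mu_F, 0.1 * j)) < 1.0 for j in range(1, 32))
print("normalized:", normalized, " shrinks away from k=0:", shrinks)
```

Because this toy measure sits on the integers, its transform is periodic in $k$, so the shrinking check only makes sense within one period; a continuous density like the one in the question does not have that complication.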

When $\mu_F(k)$ behaves like $1 - A{k^2\over 2}$ near $k=0$ (here $A$, minus the second derivative of the Fourier transform at zero, is necessarily the variance of $F$), you get the standard Gaussian when you raise to a large power, simply from the law:

$$ \left(1 - A{k^2\over 2}\right)^N \approx e^{-ANk^2/2} $$

This is one of the definitions of the exponential: $\lim_{N\rightarrow\infty} (1+A/N)^N = e^A$. Polyakov would have just replaced $1-A{k^2\over 2} \approx e^{-Ak^2/2}$ to show this, although the replacement is strictly only valid in the small-$k$ region that matters at large $N$.

When $\mu_F(k) = 1 - A |k|^\alpha$ with $0<\alpha<2$, you get Levy behavior. In this case $\alpha=1$, and you get the Fourier transform of the spreading Cauchy distribution:

$$ (1 - A|k|)^N \approx e^{-AN|k|} $$

From this, you can see that the sum of many Cauchy distributed variables spreads out linearly in $N$, that is, linearly in time. To get rid of the $N$ dependence, you need to absorb $N$ into $k$ by rescaling linearly (the linear shrinking of the $k$ distribution turns into linear spreading of the $x$ distribution). In the Gaussian case, to get rid of the $N$ dependence of $Nk^2$, you rescale $k$ to absorb $\sqrt{N}$, so that the usual Gaussian variables have the normal $\sqrt{N}$ spreading. This is the standard argument for the central limit theorem/Levy's theorem. It doesn't assume that the measure is positive, only that the Fourier transform is biggest at the origin, and has a definite differentiable or cusp behavior there.
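The rescaling step can be checked numerically. The sketch below assumes a transform of the generic form $1 - a|k|^\alpha$ near the origin, where `a` is just a stand-in coefficient of my own choosing; it absorbs $N$ by setting $k = q/N^{1/\alpha}$ and compares the $N$-th power with $e^{-a|q|^\alpha}$ for the Gaussian case ($\alpha=2$, $\sqrt{N}$ rescaling) and the Cauchy case ($\alpha=1$, linear rescaling).

```python
import math

def rescaled_power(a, alpha, q, n):
    """(1 - a|k|^alpha)^n evaluated at the rescaled argument k = q / n**(1/alpha)."""
    k = q / n ** (1.0 / alpha)
    return (1.0 - a * abs(k) ** alpha) ** n

def limit(a, alpha, q):
    """Claimed large-n limit exp(-a|q|^alpha)."""
    return math.exp(-a * abs(q) ** alpha)

a, q = 0.5, 1.3   # illustrative values only
for alpha in (2.0, 1.0):
    for n in (10, 1000, 100000):
        err = abs(rescaled_power(a, alpha, q, n) - limit(a, alpha, q))
        print(f"alpha={alpha}  N={n:6d}  |power - limit| = {err:.2e}")
```

The error should shrink roughly like $1/N$ in both cases; the only difference between the two is which power of $N$ had to be absorbed into $k$.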

This means that the central limit convergence can be established with Gaussians with complex means and complex variances too, since the Fourier transform is still Gaussian. It also works with Cauchy distributed variables, or other Levy distributions, as long as the Fourier transform of the pseudo-probability function is still well behaved.

But you weren't quite asking about the central limit theorem, you were asking about the law of large numbers. In this case, you add N times the center value to the convolved distribution, to find the new center value of the sum, and rescale the Fourier transform appropriately for the average value (or whatever you are computing). The convergence of this to a delta function is guaranteed when the rescaled Fourier transform becomes constant over a wider and wider range (in k) as N becomes large.

The special case of Cauchy distribution is right on the boundary where the law of large numbers stops working, because here, the width parameter of the distribution scales linearly in N. I bring it up, because if you look at the absolute value of the $\gamma$ distribution, it is

$${1\over \gamma^2 + c^2}$$

This is the Cauchy case. If you were adding together $\gamma$s, they wouldn't obey the law of large numbers, as the positive case Cauchy distribution is on the border of obeying it.
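To see the borderline behavior concretely, one can look at the Fourier transform of the average of $N$ variables, which is the single-variable transform evaluated at $k/N$ and raised to the $N$-th power. The sketch below uses ordinary positive, zero-centered Gaussian and Cauchy distributions as stand-ins (an assumption for illustration; the function names and the unit width parameters are mine).

```python
import math

def gauss_cf(k, var=1.0):
    """Fourier transform of a zero-centered Gaussian with variance var."""
    return math.exp(-0.5 * var * k * k)

def cauchy_cf(k, c=1.0):
    """Fourier transform of a zero-centered Cauchy distribution of width c."""
    return math.exp(-c * abs(k))

def cf_of_average(cf, k, n):
    """Transform of the average of n independent copies: cf(k/n)**n."""
    return cf(k / n) ** n

k = 2.0
for n in (1, 100, 10**6):
    print(f"N={n:8d}  gaussian: {cf_of_average(gauss_cf, k, n):.6f}"
          f"  cauchy: {cf_of_average(cauchy_cf, k, n):.6f}")
```

The Gaussian column flattens toward 1 (a delta function in $x$), while the Cauchy column stays at $e^{-|k|}$ for every $N$: the average of Cauchy variables is exactly as wide as a single one.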

But for your case, you are adding together the quantities $1\over \gamma +i c $, which transforms the Cauchy-like variable $\gamma$ into a variable with finite mean and variance. This means the result will be a standard Gaussian central limit type thing, and you can just find the mean and variance.
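The reason the transformed variable is tame is that $|1/(\gamma + ic)| \le 1/c$, so all its moments are finite no matter how heavy the tails of $\gamma$ are. As a rough illustration only, the sketch below uses an ordinary positive Cauchy density of width $c$ as a stand-in for the absolute value of your measure (your actual measure carries a phase, so the numbers will differ). Substituting $\gamma = c\tan\theta$ makes $\theta$ uniform on $(-\pi/2, \pi/2)$, and for this stand-in the mean works out to $-i/(2c)$ and $\langle |F|^2\rangle$ to $1/(2c^2)$.

```python
import math

def moments_of_F(c, steps=2000):
    """Mean and <|F|^2> of F = 1/(gamma + i c), with gamma drawn from a
    positive Cauchy density of width c (a stand-in assumption).
    Uses gamma = c*tan(theta), theta uniform on (-pi/2, pi/2), midpoint rule."""
    h = math.pi / steps
    mean = 0j
    second = 0.0
    for j in range(steps):
        theta = -math.pi / 2 + (j + 0.5) * h
        gamma = c * math.tan(theta)
        F = 1.0 / complex(gamma, c)
        mean += F / steps            # each midpoint carries weight h/pi = 1/steps
        second += abs(F) ** 2 / steps
    return mean, second

c = 2.0
mean, second = moments_of_F(c)
print("mean     :", mean, " expected", complex(0.0, -1.0 / (2 * c)))
print("<|F|^2>  :", second, " expected", 1.0 / (2 * c * c))
```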

These types of limit arguments are common in Polyakov, and I go through each one justifying it quickly mentally like this. The main way in which this appears in the physics is when you substitute the classical value for a variable with no derivative terms in a path integral. Each position value is fluctuating, but the integral over a region, or the average value of the variable, is given by the minimum-action location. The justification for this is through the rigmarole above, but you can't always go through all the steps, so you use your gut.

To justify this stuff is much easier than justifying path-integrals.