# Unusual generalization of the law of large numbers

I have seen in the physics literature an example of a very unusual application of the law of large numbers. I would like to understand how legitimate this use is and whether there are mathematically rigorous results in this direction; any clarification would be helpful. What is atypical in the example is that the "probability measure" is not positive but complex-valued (though still normalized to one).

The example is taken from the book "Gauge Fields and Strings", $\S$ 9.1, by A. Polyakov. The argument is part of the computation of a path integral.

Let us fix $T>0$. Divide the segment $[0,T]$ into $T/\varepsilon$ parts of equal length $\varepsilon$. For small $c>0$ consider the integral $$\int_{\mathbb{R}^{T/\varepsilon}}\left(\prod_{t=1}^{T/\varepsilon}d\gamma_t(\gamma_t-ic)^{-2}e^{i\varepsilon (\gamma_t-ic)}\right)\Phi(R,-i\varepsilon\sum_t(\gamma_t-ic)^{-1}),$$ where $\Phi(R,x)=x^{-2}\exp(-R^2/x)$. (Here $R$ is a real number; my notation is slightly different from the book's.)

The measure is not normalized, but one can divide by the total measure. Clearly the $(\gamma_t-ic)^{-1}$ are i.i.d., and the above integral depends only on their sum $\sum_{t=1}^{T/\varepsilon}(\gamma_t-ic)^{-1}$. Thus, formally, it looks like one is in a position to apply some form of the LLN as $\varepsilon\to 0$ and replace this sum inside $\Phi$ by $T/\varepsilon$ times the expectation of $(\gamma_t-ic)^{-1}$. (In fact, Polyakov gives a few more estimates of the variance to justify this. It would be standard if the measure were positive, but otherwise it looks mysterious to me.)
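
Spelled out, the formal replacement described above would read as follows. This is only a restatement under the assumption that the LLN applies; the bracket $\langle\cdot\rangle_\varepsilon$ for the normalized complex "expectation" is notation introduced here for illustration and is not taken from the book.

$$\langle g\rangle_\varepsilon=\frac{\int_{\mathbb{R}}d\gamma\,(\gamma-ic)^{-2}e^{i\varepsilon(\gamma-ic)}\,g(\gamma)}{\int_{\mathbb{R}}d\gamma\,(\gamma-ic)^{-2}e^{i\varepsilon(\gamma-ic)}},\qquad \Phi\Big(R,-i\varepsilon\sum_{t=1}^{T/\varepsilon}(\gamma_t-ic)^{-1}\Big)\;\longrightarrow\;\Phi\Big(R,-iT\,\big\langle(\gamma-ic)^{-1}\big\rangle_\varepsilon\Big).$$

The factor $T$ arises because the prefactor $\varepsilon$ multiplies a sum of $T/\varepsilon$ terms, each replaced by the same "expectation".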

This post imported from StackExchange MathOverflow at 2015-04-28 10:48 (UTC), posted by SE-user MKO
asked Apr 26, 2015
retagged Apr 28, 2015
I am not sure I understand your notation. Anyway, I suggest that you apply the LLN to the total variation of your measure (which is a positive measure). $1/n \sum X_i$ converges to $1/|\mu|(X) \int X d|\mu|$ for a complex-valued measure, not to $\int X d\mu$.

This post imported from StackExchange MathOverflow at 2015-04-28 10:48 (UTC), posted by SE-user coudy
Technically one can do that. But I think this is not what is done in the book mentioned above.

This post imported from StackExchange MathOverflow at 2015-04-28 10:48 (UTC), posted by SE-user MKO

## 2 Answers

A non-positive measure allows application of the law of large numbers whenever a sketch of the standard convergence proof can be carried out with the particular measure and variables in question. There are fairly intuitive heuristics for spotting when this works. The basic one is to look at the absolute value of the measure to see how wide the distribution is and how slow its falloff is; this tells you how fast the convergence will be. When it looks like it works, Polyakov won't bother justifying it with careful estimates, as this is tedious. But it can be justified if you need to do it.

It is true for any measure that two "independent random variables" (in scare quotes because the measure $\mu$ is no longer positive) $x$ and $y$ have additive "means" (when defined):

$$\langle x + y \rangle = \langle x \rangle + \langle y \rangle$$

and further, if you subtract out the means so that the variables have zero mean, the variance is additive (when defined):

$$\langle (x+y)^2 \rangle = \langle x^2 \rangle + \langle y^2 \rangle$$
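
Both identities are easy to check numerically on a toy example. The following is a minimal sketch, assuming two made-up signed weight arrays on a few points (the points, the weights, and the helper `expect` are hypothetical and chosen only for illustration); "independence" is modelled by taking the joint weight to be the product of the two marginals.

```python
import numpy as np

# Hypothetical signed "distributions": each weight array sums to 1,
# but some weights are negative, so these are not probability measures.
points_x = np.array([-1.0, 0.0, 2.0])
weights_x = np.array([0.7, -0.4, 0.7])
points_y = np.array([0.0, 1.0, 3.0])
weights_y = np.array([-0.2, 0.9, 0.3])

def expect(values, weights):
    """Formal "expectation": sum of values against signed weights."""
    return float(np.sum(weights * values))

# "Independence": the joint weight is the product of the marginals.
joint = np.outer(weights_x, weights_y)
x = points_x[:, None]  # first variable, varies along rows
y = points_y[None, :]  # second variable, varies along columns

mean_x = expect(points_x, weights_x)
mean_y = expect(points_y, weights_y)
mean_sum = expect(x + y, joint)

# Subtract the means, then compare the second moments.
var_x = expect((points_x - mean_x) ** 2, weights_x)
var_y = expect((points_y - mean_y) ** 2, weights_y)
var_sum = expect(((x - mean_x) + (y - mean_y)) ** 2, joint)

print("mean of sum:", mean_sum, " sum of means:", mean_x + mean_y)
print("variance of sum:", var_sum, " sum of variances:", var_x + var_y)
```

The cross term drops out only because each weight array is normalized to 1, so that the centered variables have zero "mean"; nothing in the check uses positivity.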

The question is whether the "probability density function" of the sum of many values of some function of these variables converges to a delta function; this is the law of large numbers.

The convergence here is in the sense of distributions: the statement is that the "probability density function" $\mu_S$ of the sum $\sum_i F(x_i)$ of these i.i.d. variables (with a distribution that takes both signs) converges to a delta function around the central value. That is, the integral against any smooth test function converges to the value of the test function at $N \langle F\rangle$ as the number of variables $N$ becomes large.

The "distribution" $\mu_S$ is still the convolution of the "distribution" $\mu_F$ of $F$ with itself:

$$\mu_S = \mu_F * \mu_F * \mu_F * \cdots * \mu_F$$

where the number of factors is $N$ (in your case, $T/\varepsilon$). The main step of the proof of the standard central limit theorem consists of taking the Fourier transform of both sides of the equation and noting that

$$\mu_S(k) = \mu_F(k)^N$$

So as long as the Fourier transform values obey the rules

normalization/zero-center: $\mu_F(0) = 1$

shrinky-ness: $|\mu_F(k)| < |\mu_F(0)|$ for $k \neq 0$

then, for large $N$, outside of a small region near $k=0$, the Fourier transform of the distribution of the sum vanishes in a way controlled by the cusp behavior at $k=0$. The shrinking condition needs to be stated more precisely: you want the Fourier transform not to approach one in absolute value away from zero. In the case of interest this is obvious, because you have something like a Gaussian or exponentially decaying Fourier transform. I said "zero center" instead of "zero mean" because the mean might diverge. You can still define a center by translating the measure until the Fourier transform at 0 is real (and then you can normalize by rescaling so that the value is 1).