# What's the probability distribution of a deterministic signal or how to marginalize dynamical systems?


In many signal processing calculations, the (prior) probability distribution of the theoretical signal (not the signal + noise) is required.

In random signal theory, this distribution is typically given by a stochastic process model, e.g. a Gaussian or a uniform process.

What do such distributions become in deterministic signal theory? That is the question.

To keep things simple, consider a discrete-time real deterministic signal

$s\left( {1} \right),s\left( {2} \right),...,s\left( {M} \right)$

For instance, they may be samples from a continuous-time real deterministic signal.

By the standard definition of a discrete-time deterministic dynamical system, there exist:

- a phase space $\Gamma$, e.g. $\Gamma \subset \mathbb{R} {^d}$
- an initial condition $z\left( 1 \right)\in \Gamma$
- a state-space equation $f:\Gamma \to \Gamma$ having $z\left( 1 \right)$ in its domain of definition such that $z\left( {m + 1} \right) = f\left[ {z\left( m \right)} \right]$
- an output or observation equation $g:\Gamma \to \mathbb{R}$ such that $s\left( m \right) = g\left[ {z\left( m \right)} \right]$

Hence, by definition we have

$\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \left\{ {g\left[ {z\left( 1 \right)} \right],g\left[ {f\left( {z\left( 1 \right)} \right)} \right],...,g\left[ {{f^{M - 1}}\left( {z\left( 1 \right)} \right)} \right]} \right\}$

or, in probabilistic notation,

$p\left[ {\left. {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right|z\left( 1 \right),f,g,\Gamma ,d} \right] = \prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}}$
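To make these definitions concrete, here is a minimal Python sketch that generates $s(m) = g[f^{m-1}(z(1))]$ for one hypothetical choice that is not taken from the question: the Hénon map on $\Gamma = \mathbb{R}^2$ (textbook parameters $a = 1.4$, $b = 0.3$) as $f$, and the first coordinate as $g$. Once $z(1)$, $f$ and $g$ are fixed, every sample is fixed, which is exactly why the conditional distribution is a product of Dirac deltas.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]


def henon(z: State, a: float = 1.4, b: float = 0.3) -> State:
    """Hypothetical state-space equation f: Gamma -> Gamma (Henon map on R^2)."""
    x, y = z
    return (1.0 - a * x * x + y, b * x)


def observe_x(z: State) -> float:
    """Hypothetical output equation g: Gamma -> R (first coordinate)."""
    return z[0]


def deterministic_signal(z1: State, f: Callable[[State], State],
                         g: Callable[[State], float], M: int) -> List[float]:
    """Return [g(z1), g(f(z1)), ..., g(f^{M-1}(z1))]."""
    s, z = [], z1
    for _ in range(M):
        s.append(g(z))
        z = f(z)
    return s


if __name__ == "__main__":
    print(deterministic_signal((0.1, 0.2), henon, observe_x, 5))
```

Calling it twice with the same `z1`, `f` and `g` returns the same list, which is the discrete counterpart of the delta-function likelihood.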

Therefore, by the law of total probability and the product rule, the marginal joint prior probability distribution of a discrete-time deterministic signal, conditional on the phase space $\Gamma$ and its dimension $d$, can be written formally/symbolically as

$p\left[ {\left. {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right|\Gamma ,d} \right] = \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}p\left( {z\left( 1 \right),f,g} \right)} } } }$
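The functional integrals over $f$ and $g$ are the hard part. As a rough sketch of only the innermost integral, one can fix $f$ and $g$ (a point-mass prior on them, which is an assumption and not what the question asks), draw $z(1)$ from a hypothetical uniform prior on $[-0.5, 0.5]^2$, and replace each Dirac delta by a narrow Gaussian of width `eps`. The Monte Carlo average below then approximates the signal density conditional on that fixed $f$ and $g$; the Hénon map and the first-coordinate output are again hypothetical choices.

```python
import math
import random
from typing import List, Tuple

State = Tuple[float, float]


def henon(z: State, a: float = 1.4, b: float = 0.3) -> State:
    # Hypothetical fixed state-space equation f (Henon map on R^2).
    x, y = z
    return (1.0 - a * x * x + y, b * x)


def signal_from(z1: State, M: int) -> List[float]:
    # s(m) = g[f^{m-1}(z(1))] with the hypothetical output equation g(x, y) = x.
    s, z = [], z1
    for _ in range(M):
        s.append(z[0])
        z = henon(z)
    return s


def marginal_density(s: List[float], eps: float = 0.05,
                     n_samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of p[s(1..M) | f, g], marginalizing z(1) only.

    Each Dirac delta is replaced by a Gaussian of standard deviation eps, and
    z(1) is drawn from a hypothetical uniform prior on [-0.5, 0.5]^2.
    """
    rng = random.Random(seed)
    M = len(s)
    norm = (2.0 * math.pi * eps * eps) ** (-M / 2.0)  # Gaussian normalisation
    total = 0.0
    for _ in range(n_samples):
        z1 = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
        sq = sum((p - o) ** 2 for p, o in zip(signal_from(z1, M), s))
        total += math.exp(-sq / (2.0 * eps * eps))
    return norm * total / n_samples
```

For example, `marginal_density(signal_from((0.1, 0.2), 4))` is positive, while a signal no initial condition in the box can produce, such as `[10.0] * 4`, gets (numerically) zero density. Letting `eps` go to zero recovers the delta, at the price of needing ever more samples.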

If the phase space $\Gamma$ and its dimension $d$ are also unknown *a priori*, they must be marginalized as well, so that the most general marginal prior probability distribution for a deterministic signal that I'm interested in can be written formally/symbolically as

$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \sum\limits_{d = 2}^{ + \infty } {\int\limits_{\wp \left( {{\mathbb{R}^d}} \right)} {{\text{D}}\Gamma \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}p\left( {z\left( 1 \right),f,g,\Gamma ,d} \right)} } } } } }$

where ${\wp \left( {{\mathbb{R}^d}} \right)}$ stands for the powerset of ${{\mathbb{R}^d}}$.

Dirac's $\delta$ distributions are certainly welcome to "digest" those very high dimensional integrals. However, we may also be interested in probability distributions like

$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] \propto \sum\limits_{d = 2}^{ + \infty } {\int\limits_{\wp \left( {{\mathbb{R}^d}} \right)} {{\text{D}}\Gamma \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\int\limits_{{\mathbb{R}^ + }} {{\text{d}}\sigma {\sigma ^{ - M}}{e^{ - \sum\limits_{m = 1}^M {\frac{{{{\left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}}^2}}}{{2{\sigma ^2}}}} }}p\left( {\sigma ,z\left( 1 \right),f,g,\Gamma ,d} \right)} } } } } }$
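If one additionally assumes (an assumption not made in the question) that $\sigma$ is *a priori* independent of the other quantities, with the improper Jeffreys prior $p(\sigma) \propto 1/\sigma$, the $\sigma$-integral can be done in closed form. Writing $Q$ for the sum of squared residuals and $\Gamma_{\mathrm{E}}$ for Euler's gamma function (to avoid a clash with the phase space $\Gamma$), the substitution $u = Q/(2\sigma^2)$ gives:

```latex
Q\left[z(1),f,g\right] \equiv \sum_{m=1}^{M} \left\{ g\left[ f^{m-1}\left( z(1) \right) \right] - s(m) \right\}^2 > 0,

\int_{0}^{+\infty} \mathrm{d}\sigma\, \sigma^{-M-1}\, e^{-Q/(2\sigma^2)}
= \frac{1}{2}\left(\frac{2}{Q}\right)^{M/2} \int_{0}^{+\infty} u^{M/2-1} e^{-u}\, \mathrm{d}u
= \frac{\Gamma_{\mathrm{E}}(M/2)}{2}\left(\frac{2}{Q}\right)^{M/2},

p\left[s(1),\ldots,s(M)\right] \propto \sum_{d=2}^{+\infty} \int_{\wp(\mathbb{R}^d)} \mathrm{D}\Gamma \int_{\mathbb{R}^\Gamma} \mathrm{D}g \int_{\Gamma^\Gamma} \mathrm{D}f \int_\Gamma \mathrm{d}^d z(1)\; Q\left[z(1),f,g\right]^{-M/2}\, p\left(z(1),f,g,\Gamma,d\right).
```

Under this assumption the Gaussian kernel is traded for the heavier-tailed kernel $Q^{-M/2}$, while the functional integrals over $f$, $g$ and $\Gamma$ remain untouched.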

Please, what can you say about those important probability distributions beyond the fact that they should not be invariant under permutation of the time points, i.e. not De Finetti-exchangeable?
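A minimal numerical check of the non-exchangeability intuition, for one hypothetical special case only: a single fixed $f$ (the logistic map $f(x) = 4x(1-x)$ on $\Gamma = [0,1]$, so $d = 1$ for brevity), $g$ the identity, and $z(1)$ uniform on $[0,1]$. Exchangeability of $(s(1), s(2))$ would force $E[s(1)^2 s(2)] = E[s(1) s(2)^2]$, but the exact values are $4(1/4 - 1/5) = 1/5$ and $16(1/4 - 2/5 + 1/6) = 4/15$. This says nothing yet about the full marginal over $f$, $g$ and $\Gamma$.

```python
import random
from typing import Tuple


def logistic(x: float) -> float:
    # Hypothetical fixed f: the fully chaotic logistic map on Gamma = [0, 1].
    return 4.0 * x * (1.0 - x)


def mixed_moments(n_samples: int = 200_000, seed: int = 3) -> Tuple[float, float]:
    """Monte Carlo estimates of E[s1^2 s2] and E[s1 s2^2], s2 = f(s1), s1 ~ U[0, 1].

    If (s1, s2) were exchangeable, these two moments would be equal.
    """
    rng = random.Random(seed)
    a = b = 0.0
    for _ in range(n_samples):
        s1 = rng.random()
        s2 = logistic(s1)
        a += s1 * s1 * s2
        b += s1 * s2 * s2
    return a / n_samples, b / n_samples
```

The two estimates come out close to $0.2$ and $0.267$, so swapping the two time points changes the joint distribution in this example.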

What can you say about such strange-looking functional integrals (for the state-space and output equations $f$ and $g$) and even set-theoretic integrals (for the phase space $\Gamma$) over sets of cardinality at least ${\beth_2}$? Are they already well known in some branch of mathematics I do not know yet, or are they only abstract nonsense?

More generally, I'd like to learn more about functional integrals in probability theory. Any pointer would be highly appreciated. Thanks.

edited Apr 29, 2016

I don't understand the goal of your question. The only difference between the deterministic and the stochastic case is that in the dynamics the coefficient of the noise term is zero. Thus one can use all tools for stochastic time series analysis also in the deterministic case, where only the initial condition is random. (That one cannot easily evaluate certain integrals is a problem one has everywhere.)

Are you interested in the discrete or the continuous time case?

Regarding comment 2: I'm interested in both the discrete- and continuous-time cases but the discrete-time one is already sufficiently nasty I believe!

Regarding comment 1: suppose the experimental noise is additive. It is common practice to model the sum of the theoretical signal + noise as a stochastic process and to use the tools from stochastic time series analysis/signal processing.

But there are in fact two radically different cases: either the theoretical signal is itself stochastic or it is deterministic. It appears that most of the time we are actually assuming, more or less explicitly, that the theoretical signal is itself stochastic.

From this, it also appears that common tools in stochastic time series analysis/signal processing, such as Wiener's classical cross-correlation function, may not be suitable for deterministic signals. Please see this question on MO, which is the motivation underlying this question:

http://mathoverflow.net/questions/236527/is-there-a-bayesian-theory-of-deterministic-signal-prequel-and-motivation-for-m?rq=1

I'm going to ask it on PO as well.

So, my goal was precisely to fix classical cross-correlation functions for deterministic signals.

For this purpose, in theory I just need to assign a suitable joint probability distribution for the samples of my discrete-time deterministic signals in order to determine more suitable time series/signal processing tools for deterministic signals.

But when you write down such probability distributions, by marginalizing 1) the initial condition, 2) the state-space equation, 3) the output/observation equation, and 4) the phase space and its dimension, you run into seemingly monstrous functional integrals that are still unidentified at this time.

If those probability distributions for deterministic signals were also usual stochastic processes, in particular if they were invariant under permutation of the time points, then classical time series analysis/signal processing tools would work for both stochastic/random and deterministic theoretical signals.

But if they differ from usual stochastic processes because time still plays an essential role in them, while time plays essentially no role in (i.i.d. or De Finetti-exchangeable) stochastic processes, then there would exist two different theories of time series analysis/signal processing: one for stochastic theoretical signals, which we know well, and the other for deterministic signals, still waiting to be developed to the best of my knowledge, if we can ever define and compute those monstrous functional integrals.


A discrete stochastic process for $x_t$ with a deterministic dynamics $x_{t+1}=f(x_t)$ is specified by the distribution of the initial condition.

Thus one models $x_0$ as a random vector $x_0(\omega)$ with a measure $d\mu$ on the space $\Omega$ over which $\omega$ varies, and defines $x_{t+1}(\omega):=f(x_t(\omega))$. This specifies all expectations $$\langle \phi(x_0,\ldots,x_t)\rangle=\int d\mu(\omega)\,\phi(x_0(\omega),\ldots,x_t(\omega))$$ of test functions $\phi$, and hence the (highly singular) joint probability distribution. Working with the functional integral is, in my opinion, overkill in this case.
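A minimal Monte Carlo sketch of this construction, assuming (hypothetically) the logistic map $f(x) = 4x(1-x)$ as the deterministic dynamics and $d\mu$ the uniform measure on $[0,1]$ for $x_0$. The argument `phi` plays the role of the test function inside $\langle\cdot\rangle$. For $\phi(x_0, x_1) = x_0 x_1$ the exact value is $\int_0^1 x \cdot 4x(1-x)\,dx = 4(1/3 - 1/4) = 1/3$.

```python
import random
from typing import Callable


def logistic(x: float) -> float:
    # Hypothetical deterministic dynamics f (fully chaotic logistic map).
    return 4.0 * x * (1.0 - x)


def expectation(phi: Callable[..., float], t: int,
                n_samples: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of <phi(x_0, ..., x_t)> when only x_0 is random.

    x_0(omega) ~ Uniform[0, 1] stands in for the measure d mu(omega);
    x_{k+1}(omega) = f(x_k(omega)) is then fully determined.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        path = [rng.random()]
        for _ in range(t):
            path.append(logistic(path[-1]))
        acc += phi(*path)
    return acc / n_samples
```

`expectation(lambda x0, x1: x0 * x1, 1)` returns a value close to $1/3$; no functional integral appears, only an ordinary integral over $\omega$.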

If the deterministic model equation is not known, one generally assumes a parametric form $f(x)=F(\theta,x)$ for it. Then all expectations above depend on $\theta$ as well, and one can use experimental data to estimate $\theta$ in the traditional way from a number of empirical expectations.
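As a minimal sketch of this parametric route, assume (hypothetically) the logistic family $F(\theta, x) = \theta x(1-x)$ with $\theta = r$. Least squares then gives $\hat r$ as a ratio of two empirical expectations, which recovers $r$ up to rounding on noise-free data.

```python
from typing import List


def simulate(r: float, x0: float, n: int) -> List[float]:
    # Noise-free orbit of the hypothetical model x_{t+1} = r x_t (1 - x_t).
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def estimate_r(xs: List[float]) -> float:
    """Least-squares estimate of theta = r in F(theta, x) = r x (1 - x).

    Minimising sum_t (x_{t+1} - r u_t)^2 with u_t = x_t (1 - x_t) gives
    r_hat = <x_{t+1} u_t> / <u_t^2>, a ratio of two empirical expectations.
    """
    u = [x * (1.0 - x) for x in xs[:-1]]
    num = sum(xn * ut for xn, ut in zip(xs[1:], u))
    den = sum(ut * ut for ut in u)
    return num / den
```

For example, `estimate_r(simulate(3.8, 0.3, 500))` returns $3.8$ to within rounding error.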

On the other hand, in practical estimation, one always assumes the presence of process noise and estimates it together with the noise in the initial conditions, the noise in the observations, and the parameters of the process. The process can be taken to be deterministic if the standard deviation of the process noise is negligible compared with the signal according to some test for negligible covariance parameters. Indeed, this is the way to numerically distinguish deterministic chaotic time series from stochastic ones. In particular, one can use all standard statistical tools for time series.
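The process-noise idea can be sketched as follows, assuming the hypothetical logistic family $x_{t+1} = r x_t(1-x_t) + \varepsilon_t$ with Gaussian process noise $\varepsilon_t$: fit $r$ by least squares and look at the standard deviation of the one-step residuals. For a deterministic orbit it sits at rounding level; with genuine process noise it recovers the noise level. A real analysis would use a proper test for negligible covariance parameters, which this sketch does not implement.

```python
import math
import random
from typing import List, Tuple


def simulate_noisy(r: float, x0: float, n: int, sigma: float,
                   seed: int = 2) -> List[float]:
    # Hypothetical model: logistic map plus Gaussian process noise of std sigma.
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n - 1):
        x = xs[-1]
        xs.append(r * x * (1.0 - x) + rng.gauss(0.0, sigma))
    return xs


def process_noise_std(xs: List[float]) -> Tuple[float, float]:
    """Return (r_hat, residual std) from a least-squares fit of the logistic family."""
    u = [x * (1.0 - x) for x in xs[:-1]]
    r_hat = sum(a * b for a, b in zip(xs[1:], u)) / sum(b * b for b in u)
    res = [a - r_hat * b for a, b in zip(xs[1:], u)]
    return r_hat, math.sqrt(sum(e * e for e in res) / (len(res) - 1))
```

With `sigma = 0.0` the estimated process-noise level is essentially zero (deterministic), with `sigma = 0.01` it comes out close to `0.01` (stochastic).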

answered Apr 28, 2016 by (15,787 points)
edited May 2, 2016

@ArnoldNeumaier

Yes Arnold, that was just a comment following your answer, not an answer to my own question.

I don't understand how my comment became an answer; I have to be more careful!

I'm preparing a comment following your answer's update and I will post it ASAP.

See you... Fabrice

@ArnoldNeumaier

Allow me to reply, Arnold. To be very brief, I can tell you
that Poincaré is right; I've been studying this particular point
over the last 20 years. Hint/starting point:

Mister Poincaré, you are wrong. Your works prove that some people
can think only nonsense.

Vladimir Ilitch Oulianov, better known as Lenin, Materialism and
empiriocriticism, 1908.

Lenin --> Stalin --> Kolmogorov (Stalin prize, 1941)

Kolmogorov was definitely not allowed to follow Poincaré (= a one-way
ticket to the Gulag) but, of course, he would have followed him.

Read the Grundbegriffe one more time very carefully, then, right after,
one of his last papers, Foundations of probability theory, 1983...

want. My question does not come from outer space...

Fabrice

@FabricePautot

I have just reinstalled some comments that got lost due to our recent technical problems.
Maybe you would like to consider registering an account, such that I can correctly assign all of your contributions to it?

Yes, definitely, I need to register.

I'm happy to see that our discussion with Arnold has finally been restored.

One remark please: French accents have been corrupted due to those recent technical difficulties: for instance Poincaré now displays as Poincar&eacute.

Kindest regards, Fabrice.

@FabricePautot

Ok, I have just created a thread to claim unregistered contributions

http://physicsoverflow.org/36103/claims-of-unregistered-contributions

Maybe you can answer it as soon as you have registered?

After your contributions are assigned to your registered account, you will have full control over them to edit or correct them etc ...

If you want to answer practical questions, you can always add a tiny
amount of Gaussian process noise and then, in the answer, take the limit
of vanishing variance.
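One way to sketch this recipe, under assumptions not in the comment (a hypothetical logistic-map family $x_{t+1} = r x_t(1-x_t)$, a flat prior on a grid of $r$ values, Gaussian process noise of standard deviation `sigma`): as `sigma` shrinks, the posterior over $r$ computed from a deterministic orbit collapses onto the grid value that generated it. The $-(n-1)\log\sigma$ term of the log-likelihood is common to all grid points and is dropped.

```python
import math
from typing import List


def simulate(r: float, x0: float, n: int) -> List[float]:
    # Deterministic orbit of the hypothetical logistic model.
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def grid_posterior(xs: List[float], r_grid: List[float], sigma: float) -> List[float]:
    """Posterior over r on a grid: flat prior, Gaussian process noise N(0, sigma^2)."""
    log_like = []
    for r in r_grid:
        sse = sum((b - r * a * (1.0 - a)) ** 2 for a, b in zip(xs[:-1], xs[1:]))
        log_like.append(-sse / (2.0 * sigma * sigma))
    m = max(log_like)                      # log-sum-exp for numerical stability
    w = [math.exp(l - m) for l in log_like]
    z = sum(w)
    return [wi / z for wi in w]
```

With `xs = simulate(3.8, 0.3, 200)` and the grid `[3.7 + 0.01 * k for k in range(21)]`, `sigma = 1.0` gives an almost flat posterior, while `sigma = 1e-3` puts essentially all the mass on the grid point at $3.8$.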

@ArnoldNeumaier

Dear Arnold,

Thank you again for your kind reply. Ok, this is my last comment.

As you said, you are still considering the problem of
estimating/identifying/modelling a dynamical system from (noisy)
experimental data. This is a kind of problem I know quite well, since
I earn my living modelling and processing nonlinear deterministic
signals, in particular physiological signals from
electroencephalography, electromyography, electrooculography, or MRI and
Computed Tomography functional imaging of the brain.

My PO question/problem arises from some practical problems in this area:
quantifying the dependency between two signals/time series. For random
signals having, for instance, an improper uniform distribution, it is easy to
prove (see Scargle's paper) that a sufficient statistic for this
problem is the classical covariance. But for deterministic signals,
that's a completely different story: we have many tools such as
nonlinear dependencies, instantaneous phase synchronization via the Hilbert
transform, many entropy-based measures, etc. See for instance this thesis from
TU Wien:

http://publik.tuwien.ac.at/files/PubDat_189752.pdf
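A toy illustration of why covariance alone can miss a dependency between deterministic signals (a hypothetical example, not taken from Scargle's paper or the TU Wien thesis): over whole periods, $x(m) = \cos(\omega m)$ and $y(m) = \cos(2\omega m) = 2x(m)^2 - 1$ have essentially zero sample covariance, although $y$ is an exact deterministic function of $x$.

```python
import math
from typing import List


def covariance(x: List[float], y: List[float]) -> float:
    """Classical (biased) sample covariance of two equally long signals."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


M = 1000
omega = 2.0 * math.pi / 50.0            # 20 whole periods over M samples
x = [math.cos(omega * m) for m in range(M)]
y = [2.0 * xm * xm - 1.0 for xm in x]   # y(m) = cos(2 omega m), a function of x(m)

print(covariance(x, y))                 # zero up to rounding error
```

The covariance vanishes up to rounding error, yet knowing $x(m)$ determines $y(m)$ exactly.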

But as far as I know, all of them are adhockeries from the point of view
of Bayesian probability theory: they are not derived from the joint
marginal posterior probability distribution for the problem at hand. In
particular, for a given problem, there should be only one sufficient
statistic, not dozens of them!

In theory, we just need to compute this joint marginal posterior
probability distribution and marginalize all nuisance parameters in
order to derive our sufficient statistic. See Scargle's paper for an
important, illuminating example. But in order to do that, we need to supply
the prior probability distribution of our signal(s).

Here is the main problem and the purpose of my PO question: again, in
theory, we know how to compute the (marginal) prior probability
distribution of our deterministic signal: just marginalize all nuisance
"parameters" that are (at most) the initial condition, the state-space
equation, the output equation, the phase space and its dimension for
dynamical systems.

But in "practice" it is not yet clear, at least to my poor
understanding, if we can well define (noninformative) joint prior
probability distributions over those parameters because some of them are
not usual random variables but functions. Subsequently, it is even less
clear how to compute the required marginal prior probability
distribution of the signal(s) from those hypothetical joint prior
probability distributions.

So, starting from some practical problems for which no sufficient
statistic seems to be known, to the best of my knowledge, we finally arrive
at a purely theoretical and mathematical one (which has nothing to do
with dynamical system identification/estimation or modelling). To my mind
it is of fundamental interest, because its solution could give birth
to a new theory of deterministic signals (and their processing) if it ever
turns out that (noninformative) prior probability distributions of
deterministic signals differ from usual stochastic processes, for instance
by not being invariant under permutation of the time points/De
Finetti-exchangeable, because time, in particular the arrow of time, still
plays an essential role in them. Contrary to your point of view, at this
point I am unable to see any reason why those prior probability
distributions should necessarily match usual, e.g. i.i.d. or De
Finetti-exchangeable, stochastic processes. On the contrary, some of us
(at least myself!) could/would conjecture that they are NOT De
Finetti-exchangeable, because we are not yet ready to abandon time (and
its arrow) within deterministic dynamical system theory.

That was my last chance to explain my problem.

Finally, I used to believe, together with Henri Poincaré, that
time (and its arrow), dynamical system theory and Bayesian probability theory
were all part of (mathematical) physics. Please see his Calcul des
Probabilités:

http://visualiseur.bnf.fr/Visualiseur?Destination=Gallica&O=NUMM-29064

Time to say goodbye to you Arnold. Thanks for the many comments!!!

Hope I will continue the discussion with other good fellows.


By conditioning on the event D = {'System deterministic'}, you are not restricting the search space: infinitely many non-parametrized functions agree with it. You will find such a deterministic function when $\mid \Omega \mid$ = 1 for the chosen probability space. Formulated as an optimization problem, D states, in the best case, that such a minimum exists.
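A small sketch of why determinism alone does not restrict the search space, with $\Gamma = \mathbb{R}$ and the logistic map as hypothetical choices: adding to $f_1$ any polynomial that vanishes at the states visited before the last sample produces a different map $f_2$ that generates exactly the same finite observed sequence. The constant `c` is arbitrary, so there are infinitely many such maps.

```python
from typing import Callable, List


def orbit(f: Callable[[float], float], x0: float, M: int) -> List[float]:
    """Return [x0, f(x0), ..., f^{M-1}(x0)]."""
    xs = [x0]
    for _ in range(M - 1):
        xs.append(f(xs[-1]))
    return xs


def f1(x: float) -> float:
    # Hypothetical "true" deterministic map (logistic map, r = 4).
    return 4.0 * x * (1.0 - x)


x0, M = 0.2, 8
observed = orbit(f1, x0, M)


def f2(x: float, c: float = 5.0) -> float:
    # f1 plus c * prod_t (x - x_t) over every observed state except the last:
    # the extra term is exactly zero on the data, so the orbit is unchanged.
    bump = c
    for xt in observed[:-1]:
        bump *= x - xt
    return f1(x) + bump
```

`orbit(f2, x0, M)` equals `observed`, while `f2(0.5)` differs from `f1(0.5)`: the data alone cannot tell the two deterministic systems apart.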

The question is closely related to Kolmogorov complexity, algorithmic information theory, and machine learning.
