# What's the probability distribution of a deterministic signal or how to marginalize dynamical systems?

+ 4 like - 0 dislike
1304 views

In many signal processing calculations, the (prior) probability distribution of the theoretical signal (not the signal + noise) is required.

In random signal theory, this distribution is typically a stochastic process, e.g. a Gaussian or a uniform process.

What do such distributions become in deterministic signal theory?, that is the question.

To make it simple, consider a discrete-time real deterministic signal

$s\left( {1} \right),s\left( {2} \right),...,s\left( {M} \right)$

For instance, they may be samples from a continuous-time real deterministic signal.

By the standard definition of a discrete-time deterministic dynamical system, there exists:

- a phase space $\Gamma$, e.g. $\Gamma \subset \mathbb{R} {^d}$
- an initial condition $z\left( 1 \right)\in \Gamma$
- a state-space equation $f:\Gamma \to \Gamma$ having $z\left( 1 \right)$ in its domain of definition such as $z\left( {m + 1} \right) = f\left[ {z\left( m \right)} \right]$
- an output or observation equation $g:\Gamma \to \mathbb{R}$ such as $s\left( m \right) = g\left[ {z\left( m \right)} \right]$

Hence, by definition we have

$\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \left\{ {g\left[ {z\left( 1 \right)} \right],g\left[ {f\left( {z\left( 1 \right)} \right)} \right],...,g\left[ {{f^{M - 1}}\left( {z\left( 1 \right)} \right)} \right]} \right\}$

or, in probabilistic notations

$p\left[ {\left. {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right|z\left( 1 \right),f,g,\Gamma ,d} \right] = \prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}}$

Therefore, by total probability and the product rule, the marginal joint prior probability distribution for a discrete-time deterministic signal conditional on phase space $\Gamma$ and its dimension $d$ formally/symbolically writes

$p\left[ {\left. {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right|\Gamma ,d} \right] = \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}p\left( {z\left( 1 \right),f,g} \right)} } } }$

Should phase space $\Gamma$ and its dimension $d$ be also unknown *a priori*, they should be marginalized as well so that the most general marginal prior probability distribution for a deterministic signal I'm interested in formally/symbolically writes

$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \sum\limits_{d = 2}^{ + \infty } {\int\limits_{\wp \left( {{\mathbb{R}^d}} \right)} {{\text{D}}\Gamma \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}p\left( {z\left( 1 \right),f,g,\Gamma ,d} \right)} } } } } }$

where ${\wp \left( {{\mathbb{R}^d}} \right)}$ stands for the powerset of ${{\mathbb{R}^d}}$.

Dirac's $\delta$ distributions are certainly welcome to "digest" those very high dimensional integrals. However, we may also be interested in probability distributions like

$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] \propto \sum\limits_{d = 2}^{ + \infty } {\int\limits_{\wp \left( {{\mathbb{R}^d}} \right)} {{\text{D}}\Gamma \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\int\limits_{{\mathbb{R}^ + }} {{\text{d}}\sigma {\sigma ^{ - M}}{e^{ - \sum\limits_{m = 1}^M {\frac{{{{\left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}}^2}}}{{2{\sigma ^2}}}} }}p\left( {\sigma ,z\left( 1 \right),f,g,\Gamma ,d} \right)} } } } } }$

Please, what can you say about those important probability distributions beyond the fact that they should not be invariant by permutation of the time points, i.e. not De Finetti-exchangeable?

What can you say about such strange looking functional integrals (for the state-space and output equations $f$ and $g$) and even set-theoretic integrals (for phase space $\Gamma$) over sets having cardinal at least ${\beth_2}$? Are they already well-known in some branch of mathematics I do not know yet or are they only abstract nonsense?

More generally, I'd like to learn more about functional integrals in probability theory. Any pointer would be highly appreciated. Thanks.

edited Apr 29, 2016

I don't understand the goal of your question. The only difference between the determinsitic and the stochastic case is that in the dynamics the coefficient of the noise term is zero. Thus one can use all tools for stochastic time series analysis also in the deterministic case - where only the initial condition is random. (That one cannot easily evaluate certain integrals is a problem one everywhere has....)

Are you interested in the discrete or the continuous time case?

Regarding comment 2: I'm interested in both the discrete- and continuous-time cases but the discrete-time one is already sufficiently nasty I believe!

Regarding comment 1: suppose the experimental noise is additive. It is common practice to model the sum of the theoretical signal + noise as a stochastic process and to use the tools from stochastic time series analysis/signal processing.

But there are in fact two radically different cases: either the theoretical signal is itself stochastic or it is deterministic. It appears that most of time we are actually assuming, more or less explicitly, the theoretical signal to be itself stochastic.

From this, it also appears that common tools in stochastic time series/signal processing such as Wiener's classical cross-correlation function may not be not suitable for deterministic signals. Please see this question on MO, which is the motivation underlying this question:

http://mathoverflow.net/questions/236527/is-there-a-bayesian-theory-of-deterministic-signal-prequel-and-motivation-for-m?rq=1

I'm gonna ask it on PO as well.

So, my goal was precisely to fix classical cross-correlation functions for deterministic signals.

For this purpose, in theory I just need to assign a suitable joint probability distribution for the samples of my discrete-time deterministic signals in order to determine more suitable time series/signal processing tools for deterministic signals.

But when you write down such probability probability distributions, by marginalizing 1) the initial condition 2) the state-space equation 3) the output/observation equation 4) and the phase space and its dimension, you fall on seemingly monstrous functional integrals that are still unidentified at this time.

Should those probability distributions for deterministic signals be also usual stochastic processes, in particular should they be invariant by permutation of the time points, then classical time series analysis/signal processing tools would work for both stochastic/random and deterministic theoretical signals.

But should they be different from usual stochastic processes because time still plays an essential role in them, while time plays essentiallu no role in (i.i.d. or De Finetti-exchangeable) stochastic processes, then there would exist two different theories of time series analysis/signal processing, one for stochastic theoretical signals that we know well, the other one for deterministic signals waiting to be developed, to the best of my knowledge, if we can ever define and compute those monstrous functional integrals.

+ 2 like - 0 dislike

A discrete stochastic process for $x_t$ with a deterministic dynamics $x_{t+1}=f(x_t)$ is specified by the distribution of the initial condition.

Thus one models $x_0$ as a random vector $x_0(\omega)$ with a measure $d\mu$ on the space $\Omega$ over which $\omega$ varies, and defines $x_{t+1}(\omega):=f(x_t)(\omega)$. This specifies all expectations $$\langle f(x_0,\ldots,x_t)\rangle=\int d\mu(\omega)f(x_0(\omega),\ldots,x_t(\omega))$$ and hence the (highly singular) joint probability distribution. Working with the functional integral is in my opinion overkill in this case.

if the determinsitc model equation is not known one generally assumes a parametric form $f(x)=F(\theta,x)$ for it. then all expectations above depend on $\theta$ as well, one one can use experimental or data to estimate in the traditional way $\theta$ from a number of empirical expectations.

On the other hand, in practical estimation, one always assumes the presence of process noise and estimates it together with the noise in the initial conditions, the noise in the observations, and the parameters of the process. The process can be taken to be deterministic if the standard deviation of the process noise is negligible compared with the signal according to some test for negligible covariance parameters. Indeed, this is the way to numerically distinguish deterministic chaotic time series from stochastic ones. In particular, one can use all standard statistical tools for time series.

answered Apr 28, 2016 by (14,437 points)
edited May 2, 2016

I added a paragraph to my answer to address this. The parameter $\theta$ there may include information about an unknown dimension of the state space. I see no necessity for introducing all the complexity you talk about. It is irrelevant for the estimation problem.

Moreover, if you don't know the dynamics how can you know that it is a deterministic dynamics? This is why it is wise to include process noise into the model. It usually gives a more parsimonious description still consistent with all data.

(Note that your ping worked, contrary to what you said.)

@ArnoldNeumaier

Yes Arnold, that was just a comment following your answer, not an answer to my own question.

I don't understand how my comment became an answer, I've to be more careful!

I'm preparing a comment following your answer's update and I will post it ASAP.

See you... Fabrice

Given a finite amount of indormation only. the mere fact that the dynamics is determinsitic without specifying a parametric law says nothing at all, Any finite set of data can be exactly explained by a smooth deterministic model. (This is an ill-posed interpolation problem, with infinitely many solutions) Typically, the estimated solutions just won't generalize to additional data. This is the reason why in practice one assumes noisy dynamics.

@ArnoldNeumaier

Hello Arnold,

Of course, I perfectly agree that any finite set of data can be exactly explained by a smooth deterministic model (and infinitely many of them). This is still true for a countably infinite set of data.

But does this really imply that the mere fact that the dynamics is deterministic without specifying a parametric law says nothing at all?

This is exactly what I'd like to know and the only way I see to answer this question is to compute the marginal, unconditional joint prior probability distribution of a deterministic signal.

Do you see another way? (in my understanding, it might be possible to answer this question without computing the unconditional probability distribution of a deterministic signal. For instance, it might be possible to prove that it is (NOT) De Finetti-exchangeable without computing it explicitly but I don't see how this could be done).

But your point is highly relevant: there are many, so many state-space equations that it is not easy to see how to marginalize them at a first glance. But only the images of a finite (or countably infinite) number $M$ of points under those functions are relevant for our problem in the discrete-time case. For this reason, we may need to consider the sets of all functions having the same given images in order to compute the required functional integral and to marginalize all possible functions/state-space equations $f$...

Yes, definitely, I need to register.

I'm happy to see that our discussion with Arnold has finally been restored.

One remark please: French accents have been corrupted due to those recent technical difficulties: for instance Poincaré now displays as Poincar&eacute.

Kindest regards, Fabrice.

@FabricePautot

I have just reinstalled some comments that got lost due to our recent techical problems.
Maybe you would like to consider registering an account, such that I can correctly assign all of your contributions to it?

@FabricePautot

Ok, I have just created a thread to claim unregistered contributions

http://physicsoverflow.org/36103/claims-of-unregistered-contributions

Maybe you can answer it as soon as you have registered?

After your contributions are assigned to your registered account, you will have full control over them to edit or correct them etc ...

+ 1 like - 0 dislike

Conditioning by event D = {'System deterministic'} you are not restricting the search space. There are infinite non-parametrized functions that will agree with it. You will find such a deterministic function when $\mid \Omega \mid$= 1 of the chosen probability space. Formulated as an optimization problem, D states, in the best case, that such a minimum exists.

The question closely related to Kolmogorov complexity, algorithmic information theory and machine learning.

 Please use answers only to (at least partly) answer questions. To comment, discuss, or ask for clarification, leave a comment instead. To mask links under text, please type your text, highlight it, and click the "link" button. You can then enter your link URL. Please consult the FAQ for as to how to format your post. This is the answer box; if you want to write a comment instead, please use the 'add comment' button. Live preview (may slow down editor)   Preview Your name to display (optional): Email me at this address if my answer is selected or commented on: Privacy: Your email address will only be used for sending these notifications. Anti-spam verification: If you are a human please identify the position of the character covered by the symbol $\varnothing$ in the following word:p$\hbar$ysic$\varnothing$OverflowThen drag the red bullet below over the corresponding character of our banner. When you drop it there, the bullet changes to green (on slow internet connections after a few seconds). To avoid this verification in future, please log in or register.