And Other Withered Stumps Of Time

Poisson Process: I


I am taking the course Introduction to Stochastic Processes offered by my institute and taught by Parthanil Roy this semester, and wanted to jot down some notes interspersed with my own thoughts somewhere, hence this post. We have started by discussing point processes, so that is what I expect to talk about in the first few posts of the series. I have recycled the exposition and terminology from the lecture notes of the instructor of the course, and from the text “Lectures on the Poisson Process” by Last and Penrose, also available online, which I am reading independently.

A point process on a space \(\mathbb{X}\) is a random collection of countably many points of \(\mathbb{X}\). One can think of it as a probability measure on the configuration space of countably many points on \(\mathbb{X}\), but this becomes rather uselessly abstract. One way to understand it is to imagine \(\mathbb{X}\) as a dartboard at which we randomly throw countably many darts. We can fix any measurable subset \(B \subset \mathbb{X}\) of the dartboard and ask how many darts hit \(B\). This count is itself a random variable, and taking its expectation defines a measure \(\lambda\) on \(\mathbb{X}\), where \(\lambda(B)\) is the average number, or the intensity, of darts that hit \(B\). As we vary \(B \subset \mathbb{X}\), probing the whole dartboard by its measurable subsets, the random variables counting the number of darts in the region \(B\) determine the full random configuration of darts.

To make this setup precise, we’ll change our viewpoint. Let \((\mathbb{X}, \mathcal{X})\) be a measurable space, and let \(M(\mathbb{X})\) be the space of \(\Bbb N_0\)-valued measures on this space. Moreover, let \(\overline{M}(\mathbb{X})\) be the space of countable sums of elements of \(M(\mathbb{X})\); we shall think of elements of this space as counting measures on \(\mathbb{X}\), although in general they can be more complicated than counting measures, which are really countable linear combinations of Dirac masses at distinct points of \(\mathbb{X}\). We equip \(\overline{M}(\mathbb{X})\) with the \(\sigma\)-algebra \(\mathcal{M}\) generated by the collection of all cylinder subsets of the form \(\{\mu \in \overline{M}(\mathbb{X}) : \mu(B) = k\}\), where \(B \in \mathcal{X}\), \(k \in \Bbb N_0 \cup \{\infty\}\). Define a point process on \((\mathbb{X}, \mathcal{X})\) to be a measurable map \(\eta : (\Omega, \mathcal{A}, \Bbb P) \to (\overline{M}(\mathbb{X}), \mathcal{M})\) from some probability space \((\Omega, \mathcal{A}, \Bbb P)\).

This looks different from what we described in the first paragraph, but all we did was a currying trick; for any fixed measurable set \(B \subset \mathbb{X}\), denote by \(\eta(B)\) the random variable \((\Omega, \mathcal{A}, \Bbb P) \to \Bbb N_0 \cup \{\infty\}\), \(\omega \mapsto \eta(\omega)(B)\), which is the number of points of \(\eta\) in \(B\). The intensity measure on \((\mathbb{X}, \mathcal{X})\) is given by \(\lambda(B) := \Bbb E \eta(B)\). What this perspective shows is that a point process can alternatively be described as a random counting measure on \((\mathbb{X}, \mathcal{X})\). One can also unfurl the whole thing into a map \(\eta : \Omega \times \mathcal{X} \to \Bbb N_0 \cup \{\infty\}\), \((\omega, B) \mapsto \eta(\omega)(B)\). This is called the transition kernel of the process.
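To make the counting-measure viewpoint concrete, here is a minimal sketch in Python (with hypothetical names like PointConfiguration and count; none of this is from the course or the text) of a single realization \(\eta(\omega)\) as an object exposing the map \(B \mapsto \eta(\omega)(B)\), together with a Monte Carlo estimate of the intensity measure \(\lambda(B) = \Bbb E\,\eta(B)\) for a toy process:

```python
import numpy as np

rng = np.random.default_rng(42)

class PointConfiguration:
    """A single realization of a point process on X = [0, 1]: a finite
    configuration of points, viewed as the counting measure B |-> eta(B)."""

    def __init__(self, points):
        self.points = np.asarray(points)

    def count(self, B):
        # eta(omega)(B): number of points in B, with the measurable set B
        # encoded as a boolean indicator function on X
        return int(np.sum(B(self.points)))

# Monte Carlo estimate of the intensity lambda(B) = E[eta(B)] for a toy
# process: 3 i.i.d. uniform points on [0, 1].
B = lambda x: x < 0.25                    # the set [0, 0.25)
draws = [PointConfiguration(rng.uniform(size=3)) for _ in range(10_000)]
print(np.mean([eta.count(B) for eta in draws]))   # should be ~ 3 * 0.25 = 0.75
```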

A simple example of a point process is as follows. Let \(X_1, \cdots, X_n : (\Omega, \mathcal{A}, \Bbb P) \to (\mathbb{X}, \mathcal{X})\) be \(\mathbb{X}\)-valued random variables, or random points on \(\mathbb{X}\), which are independent and identically distributed with common law \(\mu\). Then define \(\eta := \delta_{X_1} + \cdots + \delta_{X_n}\), where \(\delta_x\) denotes the Dirac mass at \(x \in \mathbb{X}\). This is called the binomial process with sample size \(n\) and sampling distribution \(\mu\) on \(\mathbb{X}\), as

\[\displaystyle \Bbb P(\eta(B) = k) = \binom{n}{k} \mu(B)^k (1 - \mu(B))^{n-k}\]
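As a quick sanity check (a sketch of mine, not from the notes, with an illustrative choice of \(\mu\) and \(B\)), one can simulate the binomial process with \(\mu = \mathrm{Unif}(0, 1)\) and \(B = [0, 0.3)\), so that \(\mu(B) = 0.3\), and compare the empirical distribution of \(\eta(B)\) with the binomial pmf above:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 0.3          # sample size, and mu(B) for mu = Unif(0, 1), B = [0, 0.3)

# eta = delta_{X_1} + ... + delta_{X_n} with X_i i.i.d. Unif(0, 1);
# eta(B) is then the number of the X_i falling in B
counts = (rng.uniform(size=(100_000, n)) < p).sum(axis=1)

for k in range(n + 1):
    exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"P(eta(B) = {k}): empirical {np.mean(counts == k):.4f} vs {exact:.4f}")
```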

A point process \(\eta\) on \(\mathbb{X}\) is called proper if there exist \(\mathbb{X}\)-valued random variables \(X_1, X_2, \cdots\) and an \(\Bbb N_0 \cup \{\infty\}\)-valued random variable \(Z\) such that almost surely, \(\eta = \sum_{i = 1}^Z \delta_{X_i}\).

The class of examples of central interest to us is that of Poisson processes. Let us begin by introducing such a process on the non-negative real line \([0, \infty)\) for ease of exposition.

A countable collection of points on \([0, \infty)\) can be imagined as marking off the times (starting the clock at \(0\)) at which certain random events of a similar nature occur; for example, arrivals of buses at a bus stop, or of calls at a telephone exchange, or perhaps decays of a particle (we shall stick to the second analogy for its relative simplicity and antiquity). In such an experiment we have successive arrival times \(0 = S_0 \leq S_1 \leq S_2 \leq S_3 \leq \cdots\) of the phone calls. Let us denote by \(X_i = S_i - S_{i-1}\), \(i \in \Bbb N\), the interarrival time between the \((i-1)\)-th and the \(i\)-th phone calls. Motivated by statistical and physical models, we shall assume that the interarrival times are independent and identically distributed exponential variables, i.e., \(X_1, X_2, \cdots \stackrel{\mathrm{iid}}{\sim} \mathrm{Exp}(\alpha)\). This describes our point process completely; it is a random configuration of countably many points in \([0, \infty)\) such that the spacing between two consecutive points is exponential with parameter \(\alpha\). We call this the homogeneous Poisson point process \(\wp_\alpha\) on \([0, \infty)\) with intensity \(\alpha\).
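This description is directly simulable. Here is a minimal sketch, assuming nothing beyond the definition just given: draw i.i.d. \(\mathrm{Exp}(\alpha)\) interarrival times, take cumulative sums to get the arrival times \(S_n\), and keep those landing in a window \([0, T]\) (the oversampling factor is an ad hoc choice that covers \([0, T]\) with overwhelming probability):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, T = 2.0, 10.0          # intensity and time horizon

# Interarrival times X_i ~ Exp(alpha); numpy parametrizes by scale = 1/alpha.
# Arrival times S_n are their cumulative sums; keep those landing in [0, T].
X = rng.exponential(scale=1/alpha, size=int(5 * alpha * T) + 50)
S = np.cumsum(X)
arrivals = S[S <= T]
print(len(arrivals), "arrivals in [0, 10]:", np.round(arrivals[:5], 3), "...")
```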

But we would like to put this in the context of our earlier discussion involving the kernel of a point process. So let us start by denoting by \(N_t = \max\{n \geq 0 : S_n \leq t\}\) the number of phone calls that arrive up to time \(t\). This is the number of points of \(\wp_\alpha\) in \([0, t]\) in our language from earlier. Note that \(N_t\) is a priori an \(\Bbb N_0 \cup \{\infty\}\)-valued random variable for every \(t \geq 0\). Observe:

\[\displaystyle \Bbb P(N_t = n) = \Bbb P(S_n \leq t < S_{n+1}) = \Bbb P(S_n \leq t) - \Bbb P(S_{n+1} \leq t)\]

Remember that \(S_m = X_1 + \cdots + X_m\), where the \(X_i\) are i.i.d. exponential with parameter \(\alpha\); thus \(S_m \sim \mathrm{Gamma}(m, \alpha)\). Plugging the cdf of the Gamma distribution into the expression above, we obtain for any \(n \in \Bbb N\),

\[\displaystyle \begin{aligned}\Bbb P(N_t = n) &= \int_{0}^t \frac{\alpha^n}{(n-1)!} s^{n-1} e^{-\alpha s} ds - \int_{0}^t \frac{\alpha^{n+1}}{n!} s^n e^{-\alpha s} ds \\ &= \frac{\alpha^n}{n!} e^{-\alpha t} t^n = \frac{(\alpha t)^n}{n!} e^{-\alpha t}\end{aligned}\]

The second equality follows from integration by parts on the first integral, writing \(s^{n-1} \, ds = d(s^n)/n\), which produces the boundary term \((\alpha t)^n e^{-\alpha t}/n!\) and cancels the second integral. (The formula also holds for \(n = 0\), since directly \(\Bbb P(N_t = 0) = \Bbb P(X_1 > t) = e^{-\alpha t}\).) Thus we obtain \(\Bbb P(N_t < \infty) = \sum_{n \in \Bbb N_0} \Bbb P(N_t = n) = 1\), hence \(N_t\) can be treated as an \(\Bbb N_0\)-valued random variable, in which case \(N_t \sim \mathrm{Poi}(\alpha t)\) for all \(t > 0\) by comparing pmf’s. In the case \(t = 0\), we simply have \(N_0 \equiv 0\).
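One can corroborate the computation numerically: simulate many copies of \(N_t\) and compare the empirical frequencies against the pmf \((\alpha t)^n e^{-\alpha t}/n!\) we just derived. A sketch:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
alpha, t, trials = 2.0, 3.0, 100_000

# N_t = max{n : S_n <= t}, i.e. the number of arrival times falling in [0, t]
X = rng.exponential(scale=1/alpha, size=(trials, 80))
N_t = (np.cumsum(X, axis=1) <= t).sum(axis=1)

for n in range(5):
    exact = (alpha * t)**n * math.exp(-alpha * t) / math.factorial(n)
    print(f"P(N_t = {n}): empirical {np.mean(N_t == n):.4f} vs Poisson {exact:.4f}")
```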

Next, we would like to compute more generally the number of phone calls that arrive in a specific time interval \(I \subset [0, \infty)\), that is to say, the number of points of the point process \(\wp_\alpha\) in \(I\). To proceed, fix \(t_0 > 0\) and consider the interval \(I = [t_0, t_0 + t]\). Then the number of interest is \(\widetilde{N}_t := N_{t_0 + t} - N_{t_0}\). Taking a cue from the telephone exchange analogy, we set up the whole process again, but starting our clock at time \(t_0\). That is to say, suppose the telephone operator slept through their alarm and arrived at the telephone exchange office late. They hurriedly start jotting down the arrival times of the calls after time \(t_0\), which would then be

\[t_0 \leq S_{N_{t_0}+1} \leq S_{N_{t_0}+2} \leq \cdots\]

The relevant interarrival times are then given by \(\widetilde{X}_i = S_{N_{t_0} + i} - S_{N_{t_0} + i-1} = X_{N_{t_0} + i}\) for all \(i \geq 2\) and \(\widetilde{X}_1 = S_{N_{t_0} + 1} - t_0\). Define \(\widetilde{S}_n = \widetilde{X}_1 + \cdots + \widetilde{X}_n\) and \(\widetilde{S}_0 \equiv 0\). What we have done is simply the following: we have chopped off the original Poisson process on \([0, \infty)\) at time \(t_0\) and translated \([t_0, \infty)\) back to \([0, \infty)\) to define a new point process, by using the random configuration of countably many points that appear after \(t_0\). It turns out that this is also a Poisson process with intensity \(\alpha\)!

To see this, all we have to do is prove \(\widetilde{X}_1, \widetilde{X}_2, \widetilde{X}_3, \cdots \stackrel{\mathrm{iid}}{\sim} \mathrm{Exp}(\alpha)\). Observe,

\[\displaystyle \begin{aligned}& \Bbb P(\widetilde{X}_1 > x_1, \cdots, \widetilde{X}_k > x_k, N_{t_0} = n) \\ &= \Bbb P(S_{n+1} > t_0 + x_1, X_{n+2} > x_2, \cdots, X_{n+k} > x_k, S_n \leq t_0 < S_{n+1}) \\ &= \Bbb P(S_n \leq t_0, S_{n+1} > t_0 + x_1, X_{n+2} > x_2, \cdots, X_{n+k} > x_k) \\ &= \Bbb P(S_n \leq t_0, S_{n+1} > t_0 + x_1) \Bbb P(X_{n+2} > x_2, \cdots, X_{n+k} > x_k) \end{aligned}\]

where in the last equality we used \((S_n, S_{n+1}) \perp \!\!\! \perp (X_{n+2}, \cdots, X_{n+k})\). A long calculation involving the law of total probability ensues:

\[\displaystyle \begin{aligned}\Bbb P(S_n \leq t_0, S_{n+1} > t_0 + x_1) &= \Bbb P(S_n \leq t_0, X_{n+1} > t_0 + x_1 - S_n) \\ &= \int_0^{t_0} \Bbb P(X_{n+1} > t_0 + x_1 - u \vert S_n = u) f_{S_n}(u) du \\ &= \int_0^{t_0} \Bbb P(X_{n+1} > t_0 + x_1 - u) f_{S_n}(u) du \\ &= \int_0^{t_0} e^{-\alpha(t_0 + x_1 - u)} f_{S_n}(u) du \\ & = e^{-\alpha x_1} \int_0^{t_0} e^{-\alpha(t_0 - u)} f_{S_n}(u) du \\ &= \Bbb P(X_{n+1}> x_1) \int_0^{t_0} \Bbb P(X_{n+1} > t_0 - u) f_{S_n}(u) du \\ &= \Bbb P(X_{n+1} > x_1) \int_0^{t_0} \Bbb P(X_{n+1} > t_0 - u\vert S_n = u) f_{S_n}(u) du \\ &= \Bbb P(X_{n+1} > x_1) \Bbb P(S_n \leq t_0, X_{n+1} > t_0 - S_n) \\ &= \Bbb P(X_{n+1} > x_1) \Bbb P(S_n \leq t_0, S_{n+1} > t_0) \\ &= \Bbb P(X_{n+1} > x_1) \Bbb P(N_{t_0} = n)\end{aligned}\]

The crucial step in the middle was the factorization \(e^{-\alpha(t_0 + x_1 - u)} = e^{-\alpha x_1} \, e^{-\alpha(t_0 - u)}\), i.e., the memorylessness of the exponential distribution. Substituting this back into the earlier equation, we obtain:

\[\displaystyle \Bbb P(\widetilde{X}_1 > x_1, \cdots, \widetilde{X}_k > x_k, N_{t_0} = n) = \Bbb P(X_{n+1} > x_1) \cdots \Bbb P(X_{n+k} > x_k) \, \Bbb P(N_{t_0} = n)\]

This is of course expected, because the identity expresses the fact that, in distribution, the new Poisson process is obtained by shifting the old Poisson process back by time \(t_0\). Note that this not only proves that the variables \(\widetilde{X}_i\) are i.i.d. exponential with parameter \(\alpha\) (replace the variables \(X_{n+i}\) on the right-hand side of the identity by the identically distributed exponential variables \(X_i\), and sum over \(n\)), but also that \((\widetilde{X}_1, \widetilde{X}_2, \cdots) \perp \!\!\!\perp N_{t_0}\). The new Poisson process has forgotten how many calls arrived before time \(t_0\). This is the most rudimentary form of the Markov property of stochastic processes.
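Both conclusions, that \(\widetilde{X}_1 \sim \mathrm{Exp}(\alpha)\) and that it is independent of \(N_{t_0}\), are easy to probe by simulation (a sketch of mine, not from the notes); the mean of \(\widetilde{X}_1\) should come out close to \(1/\alpha\) both overall and on each slice \(\{N_{t_0} = n\}\):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, t0, trials = 1.5, 2.0, 100_000

X = rng.exponential(scale=1/alpha, size=(trials, 65))
S = np.cumsum(X, axis=1)
N_t0 = (S <= t0).sum(axis=1)                 # calls before the operator arrives
X1_tilde = S[np.arange(trials), N_t0] - t0   # first arrival after t0, shifted

# X~_1 should be Exp(alpha): mean 1/alpha overall, and the same mean on
# each slice {N_t0 = n}, witnessing independence from N_t0.
print("overall mean:", X1_tilde.mean(), "vs 1/alpha =", 1/alpha)
for n in range(3):
    print(f"mean on {{N_t0={n}}}:", X1_tilde[N_t0 == n].mean())
```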

Observe now that \(\widetilde{N}_t = \max\{n \geq 0 : \widetilde{S}_n \leq t\}\) is just the number of phone calls arriving before time \(t\) in the new chopped-off Poisson process with intensity \(\alpha\), so of course \(\widetilde{N}_t \sim \mathrm{Poi}(\alpha t)\) as well. This proves that the number of points of \(\wp_\alpha\) in an interval \(I \subset [0, \infty)\) follows the Poisson distribution with parameter \(\alpha \vert I\vert\), i.e., \(\wp_\alpha(I) \sim \mathrm{Poi}(\alpha \vert I\vert )\). Therefore, the intensity measure induced by \(\wp_\alpha\) on \([0, \infty)\) is the Lebesgue measure scaled by \(\alpha\). The fact that \(N_t \perp \!\!\! \perp N_{t + s} - N_t\) for any \(t, s \geq 0\) indicates more generally that if \(0 \leq t_1 \leq t_2 \leq \cdots \leq t_k\) then the increments \(N_{t_1}, N_{t_2} - N_{t_1}, \cdots, N_{t_k} - N_{t_{k-1}}\) are independent random variables, which is to say, if \(I_1, I_2, \cdots, I_k \subset [0, \infty)\) are pairwise disjoint intervals then \(\wp_\alpha(I_1), \cdots, \wp_\alpha(I_k)\) are independent random variables.
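Independence of increments can also be probed empirically (again a sketch; zero empirical covariance and factorizing joint frequencies are evidence, not proof):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, trials = 2.0, 100_000

S = np.cumsum(rng.exponential(scale=1/alpha, size=(trials, 60)), axis=1)

# Counts in the disjoint intervals [0, 2] and (3, 5], both of length 2
C1 = (S <= 2.0).sum(axis=1)
C2 = ((S > 3.0) & (S <= 5.0)).sum(axis=1)

print("means:", C1.mean(), C2.mean(), "vs alpha*|I| =", 2 * alpha)
print("covariance:", np.cov(C1, C2)[0, 1])   # should be ~ 0
joint = np.mean((C1 == 3) & (C2 == 3))
print("P(C1=3, C2=3):", joint, "vs product:", np.mean(C1 == 3) * np.mean(C2 == 3))
```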

In general, let \((\Bbb{X}, \mathcal{X})\) be a measurable space and \(\lambda\) a \(\sigma\)-finite measure on \(\Bbb{X}\). We define an (inhomogeneous) Poisson point process \(\wp_\lambda\) with intensity measure \(\lambda\) to be a point process such that for every \(B \in \mathcal{X}\), \(\wp_\lambda(B) \sim \mathrm{Poi}(\lambda(B))\), and for every collection \(B_1, \cdots, B_n \in \mathcal{X}\) of pairwise disjoint sets, \(\wp_\lambda(B_1), \cdots, \wp_\lambda(B_n)\) are independent random variables.
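Although the general construction is deferred to the next post, it may help to preview the standard two-step recipe in the case \(\lambda(\Bbb X) < \infty\): draw \(Z \sim \mathrm{Poi}(\lambda(\Bbb X))\), then throw \(Z\) i.i.d. points with the normalized law \(\lambda / \lambda(\Bbb X)\). A sketch for \(\lambda = \alpha \cdot \mathrm{Leb}\) on the unit square (the \(\sigma\)-finite case needs an extra patching step):

```python
import numpy as np

rng = np.random.default_rng(4)

# lambda = alpha * Lebesgue on the unit square, so lambda(X) = alpha and the
# normalized law lambda / lambda(X) is uniform on the square.
alpha = 10.0
Z = rng.poisson(alpha)               # total number of points ~ Poi(lambda(X))
points = rng.uniform(size=(Z, 2))    # Z i.i.d. points with law lambda/lambda(X)

# Sanity check on B = [0, 1/2] x [0, 1]: the count should be ~ Poi(alpha/2)
print(Z, "points in total,", int(np.sum(points[:, 0] <= 0.5)), "of them in B")
```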

We have constructed such a process on \([0, \infty)\). It is worthwhile to note that in this case the full package of the Poisson process can be presented as just the stochastic process \(\{N_t\}_{t \in [0, \infty)}\) of the number of phone calls up to time \(t\), because of the linear order of the underlying space; namely, \(\wp_\alpha([a, b]) = N_b - N_a\) for any interval \([a, b] \subset [0, \infty)\), so it suffices to specify the variables \(N_t\) for all \(t \geq 0\) to recover the whole point process. \(\{N_t\}_{t \in [0, \infty)}\) is a continuous-time stochastic process with independent Poisson increments, which is precisely what the two properties in the definition of a general Poisson process above are trying to capture. In the next post we shall construct a Poisson process on any measurable space with a given \(\sigma\)-finite measure.

Finally, let us close this conversation with a few words on the Markov property. We view the homogeneous Poisson process with intensity \(\alpha\) on \([0, \infty)\) as a random collection of countably many points on \([0, \infty)\), as before in the telephone exchange model. We claim that for any fixed \(n \geq 1\), the conditional distribution \((S_1, \cdots, S_n)\vert S_{n+1} = s\) of the first \(n\) points given the \((n+1)\)-th point is the same as the distribution of the order statistics vector \((U_{(1)}, \cdots, U_{(n)})\) of a sample \(U_1, \cdots, U_n \stackrel{\mathrm{iid}}{\sim} \mathrm{Unif}(0, s)\) of \(n\) uniform points.

To prove this, define a linear transformation \(T(x_1, \cdots, x_{n+1}) = (x_1, x_1+x_2, \cdots, x_1+\cdots+x_{n+1})\) for all \(\mathbf{x} \in (0, \infty)^{n+1}\). This has range \(\{\mathbf{s} \in \Bbb R^{n+1} : 0 < s_1 < \cdots < s_{n+1}\}\) and is invertible on its range, with inverse given by \(T^{-1}(s_1, \cdots, s_{n+1}) = (s_1, s_2 - s_1, \cdots, s_{n+1} - s_n)\). Clearly \(\det(T) = 1\), as the matrix of \(T\) is lower triangular with \(1\)’s along the diagonal. By the change of density formula,

\[\displaystyle \begin{aligned}f_{S_1, \cdots, S_{n+1}}(s_1, s_2, \cdots, s_{n+1}) & = f_{X_1, \cdots, X_{n+1}}(s_1, s_2 - s_1, \cdots, s_{n+1} - s_n) \\ &= \alpha e^{-\alpha s_1} \alpha e^{-\alpha (s_2 - s_1)} \cdots \alpha e^{-\alpha (s_{n+1} - s_n)} \mathbf{1}_{(0 < s_1 < \cdots < s_{n+1})} \\ &= \alpha^{n+1}e^{-\alpha s_{n+1}} \mathbf{1}_{(0 < s_1 < \cdots < s_{n+1})} \end{aligned}\]

\(S_{n+1}\) is distributed as \(\mathrm{Gamma}(n+1, \alpha)\), hence \(f_{S_{n+1}}(s) = \alpha^{n+1} s^n e^{-\alpha s}/n! \cdot \mathbf{1}_{(s > 0)}\), so the pdf of the conditional distribution \((S_1, \cdots, S_n)\vert S_{n+1} = s\) is given by

\[\displaystyle \begin{aligned} f_{S_1 \cdots S_n \vert S_{n+1}}(s_1, \cdots, s_n \vert s) = \frac{f_{S_1, \cdots, S_{n+1}}(s_1, \cdots, s_n, s)}{f_{S_{n+1}}(s)} & = \frac{\alpha^{n+1} e^{-\alpha s}}{\alpha^{n+1}s^ne^{-\alpha s}/n!} \mathbf{1}_{(0<s_1<\cdots<s_n<s)} \\ &= n! \left (\frac{1}{s}\right )^n \mathbf{1}_{(0<s_1<\cdots<s_n<s)} \end{aligned}\]

which is precisely the pdf of \((U_{(1)}, \cdots, U_{(n)})\), as promised. One way to interpret this is to say that knowledge of the position of the \((n+1)\)-th point in the configuration gives no insight into the positions of the earlier points beyond their relative order; and we have already seen that restarting the clock completely reboots the Poisson process.
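To see the claim in action (a sketch of mine), one can condition on \(S_{n+1} = s\) approximately by rejection, keeping only runs with \(\vert S_{n+1} - s \vert < \epsilon\), and compare against the known moments of uniform order statistics on \((0, s)\), e.g. \(\Bbb E U_{(k)} = ks/(n+1)\):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, n, s, eps = 1.0, 4, 5.0, 0.05

# Condition on S_{n+1} = s approximately: keep runs with |S_{n+1} - s| < eps
X = rng.exponential(scale=1/alpha, size=(1_000_000, n + 1))
S = np.cumsum(X, axis=1)
kept = S[np.abs(S[:, -1] - s) < eps, :n]   # accepted samples of (S_1, ..., S_n)

print(len(kept), "accepted runs")
print("E[S_1 | S_5 = s] ~", kept[:, 0].mean(), "vs s/(n+1) =", s / (n + 1))
print("E[S_4 | S_5 = s] ~", kept[:, -1].mean(), "vs ns/(n+1) =", n * s / (n + 1))
```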
