Economics 520 Lecture Note 9: Introduction to Stochastic Processes

This Version: October 5, 2013

These notes are based on S. Ross, Introduction to Probability Models, Academic Press, and J. Hamilton, Time Series Analysis, Princeton University Press.

Definition 1 A stochastic process {X(t), t ∈ T} is a collection of random variables: for each t ∈ T, X(t) is a random variable. The set T is called an index set, and t ∈ T is often interpreted as time. So X(t) would be the (random) state of a process at time t. Sometimes for simplicity we write X_t = X(t).

Note that each X_t is a random variable, so really we should be writing X_t(ω). Each X_t is a function from the sample space Ω to a subset of R. In applications we often are interested in modelling the evolution of some variable over time, so it is reasonable that the range of X_t is the same across time. In that case we call the range of X_t the state space.

If the index set T is a countable set, we call the stochastic process a discrete-time process. If the index set is an interval of the real line, we call the process a continuous-time process. Although t is often used to indicate time, it can be used in other ways as well. For example, when modelling spatial phenomena (e.g. geographical concentrations of pollution), we might use a two-dimensional index t corresponding to longitude and latitude.

Example 1: Consider flipping a coin repeatedly. The sample space Ω would contain every possible infinite sequence of Hs and Ts. We could define the index set as T = {1, 2, 3, ...} and X_1 = 1 if the first toss is heads and 0 otherwise, X_2 = 1 if the second toss is heads and 0 otherwise, and so on. This defines a stochastic process {X_t, t ∈ T}, where X_s is independent of X_r for s ≠ r.

Next, we could define a new stochastic process {Y_t, t ∈ T}, where Y_t is the total number of heads up to that point in time:

    Y_t = Σ_{i=1}^t X_i.

Now there is a very distinct dependence between, say, Y_s and Y_{s+1}. We could also consider another stochastic process {Z_t, t ∈ T} where

    Z_t = Y_t / t = (1/t) Σ_{i=1}^t X_i.

Since Z_t is the average of X_1, ..., X_t up to that point in time, we might think that Z_t would converge to 1/2 as t increases. We will come back to this idea in later lecture notes.

Markov Chains

Suppose that {X_t, t = 1, 2, 3, ...} is a discrete-time stochastic process with a finite or countable state space. That is, X_t takes on a finite or countable number of possible values, and for simplicity let us say that the range is actually the nonnegative integers 0, 1, 2, 3, .... We can fully specify the joint probability distribution of these random variables by starting with the marginal probabilities

    Pr(X_1 = i) = f_{X_1}(i), i = 0, 1, 2, ...

and then defining various conditional probabilities recursively:

    Pr(X_2 = j | X_1 = i)
    Pr(X_3 = j | X_2 = i, X_1 = k)

and so on. Suppose that the conditional probabilities have a simple form, depending only on the most recent past random variable:

    Pr(X_{t+1} = j | X_t = i, X_{t-1} = k_{t-1}, ..., X_1 = k_1) = Pr(X_{t+1} = j | X_t = i) = P_ij,  for all t = 1, 2, 3, ...

We call such a stochastic process a Markov chain. The numbers P_ij represent the transition probabilities: P_ij is the probability of going from state i to state j. They must be nonnegative, P_ij ≥ 0 for all i and j, and for each i we must have Σ_{j=0}^∞ P_ij = 1. It's handy to collect these into a matrix:

    P = [ P_00  P_01  P_02  ··· ]
        [ P_10  P_11  P_12  ··· ]
        [  ...   ...   ...      ]

Example 1 continued: For independent coin flips, the possible values are 0 and 1, and the transition matrix would be

    P = [ .5  .5 ]
        [ .5  .5 ].

Next, consider Y_t, the cumulative sum of heads.

For any time t, Y_{t+1} can either be equal to Y_t (with probability 1/2) or Y_t + 1 (with probability 1/2). Thus P_ij = .5 for j = i, i + 1, and 0 otherwise.

Example 2: The paper "Intrafirm Mobility and Sex Differences in Pay," by Michael Ransom and Ronald Oaxaca, studies employment records for a major grocery store. They construct transition probabilities between different job categories, separately for males and females. For example, male produce clerks have a .17 probability of being terminated the following year, a .65 probability of remaining in their position, a .04 probability of being promoted to produce manager, and various other probabilities of shifting position within the firm. A female produce clerk has a .25 probability of being terminated, a .375 probability of remaining as a produce clerk, and so on.

Next, we can calculate the transition probabilities more than one step ahead: let the m-step transition probabilities be

    P_ij^m = Pr(X_{t+m} = j | X_t = i),  t ≥ 0, i, j ≥ 0.

Result 1 For a Markov chain, and for any n, m ≥ 0, the (n + m)-step transition probabilities are related to the n-step and m-step transition probabilities by:

    P_ij^{n+m} = Σ_{k=0}^∞ P_ik^n P_kj^m.    (1)

Proof: The intuition is that the right hand side represents the probability of starting at state i, passing through state k at the nth transition, and then going to j after another m transitions. Formally:

    P_ij^{n+m} = Pr(X_{n+m} = j | X_0 = i)
              = Σ_{k=0}^∞ Pr(X_{n+m} = j, X_n = k | X_0 = i)
              = Σ_{k=0}^∞ Pr(X_{n+m} = j | X_n = k, X_0 = i) Pr(X_n = k | X_0 = i)
              = Σ_{k=0}^∞ Pr(X_{n+m} = j | X_n = k) Pr(X_n = k | X_0 = i)    (by the Markov property)
              = Σ_{k=0}^∞ P_kj^m P_ik^n.

The equations (1) are often referred to as the Chapman-Kolmogorov equations. They are particularly easy to write in matrix form.
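The Chapman-Kolmogorov equations are easy to verify numerically. A minimal sketch, using an invented 3-state transition matrix (the numbers below are not from any example above; they are chosen only so that each row sums to 1):

```python
import numpy as np

# A hypothetical 3-state transition matrix (invented numbers; rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
S = range(3)

def n_step(n):
    """Compute the n-step probabilities by applying equation (1) with m = 1 repeatedly."""
    Pn = np.eye(3)                      # 0-step: stay where you are with probability 1
    for _ in range(n):
        # P^{n+1}_{ij} = sum_k P^n_{ik} P_{kj}
        Pn = np.array([[sum(Pn[i, k] * P[k, j] for k in S) for j in S]
                       for i in S])
    return Pn

# Check equation (1) directly for n = 2, m = 3:
# P^5_{ij} should equal sum_k P^2_{ik} P^3_{kj}.
lhs = n_step(5)
rhs = np.array([[sum(n_step(2)[i, k] * n_step(3)[k, j] for k in S) for j in S]
                for i in S])
print(np.allclose(lhs, rhs))   # True
```

Each row of the 5-step matrix is still a probability distribution over the three states, as it must be.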

If P^(n) denotes the matrix of n-step transition probabilities, then (1) can be written as

    P^(n+m) = P^(n) · P^(m),

where the multiplication is the usual matrix multiplication. Thus P^(2) = P · P, P^(3) = P · P · P, and in general

    P^(n) = P^n.

In some cases, as n → ∞, the P^n converge to a constant matrix. The following theorem is one of the key results in Markov chain theory:

Theorem 1 Suppose that a Markov chain with transition matrix P satisfies:

1. Aperiodicity: for all i, P_ii > 0.
2. Positive recurrence: starting in any state i, the expected time to return to state i is finite.
3. Irreducibility: for all states i and j, there is some n such that P_ij^n > 0. Thus, state j is "accessible" from state i.

Then lim_{n→∞} P_ij^n exists and is independent of i. Furthermore, letting

    π_j = lim_{n→∞} P_ij^n,

the limiting probabilities are the unique nonnegative solution to

    π_j = Σ_{i=0}^∞ π_i P_ij,  j ≥ 0,
    Σ_{j=0}^∞ π_j = 1.

Notice that if we start with probabilities π_i of being in state i, then at the next step we have the same probabilities of being in any given state. Thus, the vector π could be thought of as an "invariant" or "steady-state" distribution over the states.
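A sketch of Theorem 1 in action, using a hypothetical two-state chain (the matrix is an invented example; being finite, irreducible, and having P_ii > 0, it satisfies all three conditions): the rows of P^n converge to a common vector π, which also solves π = πP.

```python
import numpy as np

# Hypothetical two-state chain: aperiodic (P_ii > 0), irreducible,
# and positive recurrent (any finite irreducible chain is).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# lim P^n: for large n, every row of P^n is (approximately) the same vector pi.
Pn = np.linalg.matrix_power(P, 100)

# Solve pi = pi P with sum(pi) = 1 via the eigenvector of P' for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

print(Pn[0], Pn[1])   # both rows agree
print(pi)             # approximately [0.8, 0.2]
```

Here the balance equation .1 π_0 = .4 π_1 together with π_0 + π_1 = 1 gives π = (0.8, 0.2) by hand, matching the limit of the rows of P^n.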

Also: π_j is interpreted as the limiting probability that the process is in state j at time n, as n grows large. It can be shown that π_j also equals the long-run proportion of time that the process is in state j.

Poisson Processes

Next, we will consider a simple but important example of a continuous-time process that is used to model the total number of events up to different points in time. In a sense, we have already discussed counting processes, especially in connection with the exponential and Poisson distributions, so some of the following will be familiar.

Definition 2 A counting process is a stochastic process {N(t), t ≥ 0} with index set [0, ∞) such that:

1. N(t) ≥ 0.
2. N(t) is integer-valued.
3. If s < t, then N(s) ≤ N(t).

The counting process has the following interpretation: for s < t, N(t) − N(s) equals the number of events that have occurred in the interval (s, t). The Poisson process is a special kind of counting process:

Definition 3 A counting process {N(t), t ≥ 0} is a Poisson process with rate λ > 0 if

1. N(0) = 0.
2. If (s, t) and (a, b) are disjoint intervals of time, then N(t) − N(s) is independent of N(b) − N(a).
3. The number of events in any interval of length t is Poisson with mean λt: for any s, t ≥ 0,

    Pr(N(s + t) − N(s) = n) = e^{−λt} (λt)^n / n!,  for n = 0, 1, ...

This should look very familiar. In Lecture Note 5, we considered such a process, and showed that the time to the next event is an exponentially distributed random variable with parameter λ.

Autoregressive and Moving Average Processes

The Poisson process is a continuous-time, discrete state space stochastic process. Next, let us consider discrete-time, continuous state space processes. Here is a particularly simple one:

Definition 4 A discrete-time stochastic process {ε_t, t = 1, 2, ...} is a white noise process if:

1. For all t, E[ε_t] = 0.
2. For all t, V[ε_t] = σ².
3. For all t, s such that t ≠ s, Cov(ε_t, ε_s) = 0.

A white noise process has constant mean 0 and constant variance, and zero correlation across time periods. The terminology comes from signal processing and Fourier analysis, where this type of process was used to describe excess noise attached to a signal at all frequencies. Often we work with a particularly convenient type of white noise process, Gaussian white noise, where ε_t ~iid N(0, σ²). The notation "iid" stands for "independent and identically distributed," and ~iid means that the random variables are independent with the same N(0, σ²) distribution. (The term "Gaussian" is another name for the normal distribution.)

The Gaussian white noise process exhibits no dependence across observations. However, we can use this process as a building block to construct many interesting processes with various types of dependence.

1. First-order autoregression (AR(1)): let Y_1 have some distribution F_1, and let

    Y_t = α + ρ Y_{t−1} + ε_t.

The term "autoregression" deserves some explanation.
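Before unpacking the terminology, it may help to generate the building block itself. A minimal sketch (the seed and σ are arbitrary choices) that draws Gaussian white noise and checks the three defining properties by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, reps = 2.0, 200_000

# Draw many independent realizations of a Gaussian white noise pair
# (eps_1, eps_2), each N(0, sigma^2) and independent of the other.
eps = rng.normal(0.0, sigma, size=(reps, 2))

print(eps[:, 0].mean())                  # property 1: E[eps_t] = 0
print(eps[:, 0].var())                   # property 2: V[eps_t] = sigma^2 = 4
print(np.mean(eps[:, 0] * eps[:, 1]))    # property 3: Cov(eps_1, eps_2) = 0
```

All three sample moments should be close to their population values, with sampling error on the order of 1/√reps.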

In general, a "regression" is a model describing how the mean of a variable depends on another variable. (Regression analysis is the main focus of Economics 522A.) An "autoregression" means that we are explaining the mean of Y_t in terms of its own past value. Since the white noise term has mean zero,

    E[Y_t | Y_{t−1}, Y_{t−2}, ...] = α + ρ Y_{t−1}.

In the special case that α = 0, ρ = 1, we get Y_t = Y_{t−1} + ε_t. This is a type of random walk, where the current value of the process is the best (in terms of squared error) predictor of next period's value.

2. pth-order autoregression (AR(p)):

    Y_t = α + ρ_1 Y_{t−1} + ρ_2 Y_{t−2} + ··· + ρ_p Y_{t−p} + ε_t.

3. qth-order moving average (MA(q)):

    Y_t = µ + ε_t + θ_1 ε_{t−1} + ··· + θ_q ε_{t−q}.

Thus, Y_t is a weighted average of ε_t, ε_{t−1}, ..., ε_{t−q}.

4. ARMA(p, q) process:

    Y_t = α + ρ_1 Y_{t−1} + ··· + ρ_p Y_{t−p} + ε_t + θ_1 ε_{t−1} + ··· + θ_q ε_{t−q}.

A key issue with these processes is whether they are in some sense "stable." One notion of stability is covariance stationarity:

Definition 5 A discrete-time stochastic process {Y_t, t = 1, 2, ...} is said to be covariance stationary or weakly stationary if:

1. E[Y_t] = µ for all t.
2. E[(Y_t − µ)(Y_{t−j} − µ)] = γ_j for all t and all j.

This says that the mean is constant, the variance is constant, and the covariances only depend on the distance between the two points in time. It's easy to see that any white noise process is covariance stationary: the mean is constant at 0, and

    E[(Y_t − 0)(Y_{t−j} − 0)] = σ² if j = 0, and 0 if j ≠ 0.

What about the AR(1) process Y_t = α + ρ Y_{t−1} + ε_t? It turns out that if −1 < ρ < 1, then the process is covariance stationary. The random walk, where ρ = 1, is not stationary.
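A numerical illustration of this last point (a sketch; the parameter values are invented): simulate an AR(1) with |ρ| < 1 and a random walk side by side. In the stable case the variance settles at the standard AR(1) value σ²/(1 − ρ²); for the random walk it grows linearly with t.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, T, reps = 1.0, 400, 20_000
eps = rng.normal(0.0, sigma, size=(reps, T))

def simulate_ar1(alpha, rho):
    """Simulate reps paths of Y_t = alpha + rho*Y_{t-1} + eps_t, starting at Y_0 = 0."""
    Y = np.zeros((reps, T))
    for t in range(1, T):
        Y[:, t] = alpha + rho * Y[:, t - 1] + eps[:, t]
    return Y

stable = simulate_ar1(0.0, 0.5)   # covariance stationary: -1 < rho < 1
walk = simulate_ar1(0.0, 1.0)     # random walk: rho = 1

# Stable case: variance settles near sigma^2 / (1 - rho^2) = 4/3 at both dates.
print(stable[:, 200].var(), stable[:, 399].var())
# Random walk: Var(Y_t) = t * sigma^2, so it keeps growing with t.
print(walk[:, 200].var(), walk[:, 399].var())
```

The cross-sectional variance of the stable process is roughly the same at t = 200 and t = 399, while the random walk's variance roughly doubles between those dates, which is exactly the failure of condition 2 of Definition 5.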