Suppose you had a black box, and all you could measure from it are streams of data, which for the sake of simplicity, are streams of bits.
We would like to construct a theory that explains its behaviour. Indeed, this is essentially what science is about – the construction of theories to explain phenomena we observe.
Of course, there’s an infinite number of different constructions that explain output. We can explain the ways the stars move by saying individual angels moved them each day, but it wouldn’t be a good explanation since you’re left with the question of why the angels are moving them that way. Essentially, good explanations should compress information.
So, our basic goal is to
Construct the simplist machine that accurately captures the patterns within the data.
In this article, I’ll outline one possible approach, known as epsilon-machines, developed by C. Shalizi and J. Crutchfield.
Machines that Generate Strings – A Simple Example:
Essentially, every machine can be thought of as a physical system that can be in one of N possible physical states. The dynamics of the system can then be defined as transitions between different states. Since we want a system that can produce bitstrings, we’re going to add another condition. At every discrete timestep, the system is going to emit a bit, which is determined by what dynamics is happening at that timestep.
For example, one simple system as a two-state system, whose states we label A and B that oscillates between the A and B, emitting a 1 or a 0 depending on its current state. Such a system, when you run it, will clearly always give bitstrings of the form
What we want to do is the reverse. We have the bitstring above… and we want to reverse engineer the two state machine.
Any physical system can exists in a number of different states. Thus, the first step for us is to introduce bare minimum number of physical states that allow us to explain the output.
We begin with a finite state machine, where all the information about the future dynamics and hence output of the machine is determined by its current state. Any extra information would be redundant. Thus, the idea is to take a string of bits up to a certain point (the history), and ask what information is required to predict the next bit.
Therefore, we group the set of all possible output histories into equivalence classes, such that replacing one history with another history within the same class does not alter does not affect our prediction of its future evolution. We refer to these classes as Causal States
In our example, for instance, we observe that all strings alternate between 0 and 1. So, any string the ends with a 0 will have identical future evolution (the same also applies to 1). Thus, we can define a machine with two causal states
The next step is to model of dynamics. That is, at each discrete time step, our system jumps from one causal state, to another, and in process, emits a bit. We model these transitions by transition matrices, where represents the probability of transitioning fromto while emitting the bit s. Thus, this simple example, features two matrices, and , with values as listed in the figure above.
With the intuition afforded by out example, we can define an epsilon machine more generally by the ordered pair , where S is the collection of causal states, and T the collection of matrices the define how the states transform.
In out simple example, the string we analysed was deterministic, such that 0 always succeeded a 1 and vice versa. In general, e-machines describe stochastic processes. For example, the completely random string can be described by a trivial e-machine with a single causal state, with .
Features of e-Machines:
Some useful properties of these machines I list without proof:
- The complexity of a e-machine equates the the Entropy of its eternal states, i.e. where p_i is the probability it is in causal state $$S_i$$. This is easy to compute.
- e-machines are the simplest possible representations of a process. Complexity. The entropy of its causal states is minimal when compared to any other state machines that predicts the dynamics of the process.
Note this this more general definition takes into account non-deterministic emissions… the measurement results can contain inherent randomness.