Suppose you had a black box, and all you could measure from it are streams of data which, for the sake of simplicity, are streams of bits.

10100010010…

00101100100…

We would like to construct a theory that explains its behaviour. Indeed, this is essentially what science is about – the construction of theories to explain phenomena we observe.

Of course, there’s an infinite number of different constructions that explain the output. We can explain the way the stars move by saying individual angels move them each day, but it wouldn’t be a good explanation, since you’re left with the question of why the angels are moving them that way. Essentially, good explanations should *compress* information.

So, our basic goal is to

*Construct the simplest machine that accurately captures the patterns within the data.*

In this article, I’ll outline one possible approach, known as *epsilon-machines*, developed by C. Shalizi and J. Crutchfield.

**Machines that Generate Strings – A Simple Example:**

Essentially, every machine can be thought of as a physical system that can be in one of *N* possible physical states. The dynamics of the system can then be defined as transitions between different states. Since we want a system that can produce bitstrings, we’re going to add another condition. At every discrete timestep, the system is going to emit a bit, which is determined by the transition occurring at that timestep.

For example, one simple system is a two-state system, whose states we label *A* and *B*, that oscillates between *A* and *B*, emitting a 1 or a 0 depending on its current state. Such a system, when you run it, will clearly always give bitstrings of the form

..1010101010101…
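As a minimal sketch (the function name and state labels are illustrative, not from the original), the oscillator described above can be simulated directly:

```python
# Two-state oscillator: state A emits 1, state B emits 0, and the
# machine deterministically alternates between them each timestep.
def run_oscillator(steps, state="A"):
    emit = {"A": "1", "B": "0"}   # bit emitted in each state
    step = {"A": "B", "B": "A"}   # deterministic transition
    bits = []
    for _ in range(steps):
        bits.append(emit[state])
        state = step[state]
    return "".join(bits)

print(run_oscillator(8))  # -> 10101010
```

Running it from either starting state produces the alternating bitstring shown above.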

What we want to do is the reverse. We have the bitstring above… and we want to reverse-engineer the two-state machine.

Any physical system can exist in a number of different states. Thus, the first step for us is to introduce the bare minimum number of physical states that allows us to explain the output.

We begin with a finite state machine, where all the information about the future dynamics and hence output of the machine is determined by its current state. Any extra information would be redundant. Thus, the idea is to take a string of bits up to a certain point (the history), and ask what information is required to predict the next bit.

Therefore, we group the set of all possible output histories into equivalence classes, such that replacing one history with another history within the same class does not affect our prediction of its future evolution. We refer to these classes as *Causal States*.

In our example, for instance, we observe that all strings alternate between 0 and 1. So, any string that ends with a 0 will have an identical future evolution (the same also applies to 1). Thus, we can define a machine with two causal states.
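The grouping step can be sketched in code. This is an assumption-laden toy (fixed history length, empirical frequencies standing in for the true conditional distributions), but it shows the idea: two histories land in the same causal state exactly when they predict the same distribution over the next bit.

```python
from collections import defaultdict

def causal_states(string, history_len=3):
    # Count next-bit frequencies conditioned on each observed history.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(history_len, len(string)):
        hist = string[i - history_len:i]
        counts[hist][string[i]] += 1
    # Group histories whose conditional next-bit distributions coincide.
    classes = defaultdict(list)
    for hist, c in counts.items():
        total = sum(c.values())
        dist = tuple(sorted((bit, n / total) for bit, n in c.items()))
        classes[dist].append(hist)
    return list(classes.values())

# The alternating string yields exactly two classes: histories ending
# in 0 and histories ending in 1.
print(causal_states("1010101010101010"))
```

On the alternating string this recovers the two causal states described above.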

The next step is to model the dynamics. That is, at each discrete time step, our system jumps from one causal state to another, and in the process, emits a bit. We model these transitions with transition matrices $$T^{(s)}$$, where the entry $$T^{(s)}_{ij}$$ represents the probability of transitioning from $$S_i$$ to $$S_j$$ while emitting the bit *s*. Thus, this simple example features two matrices, $$T^{(0)}$$ and $$T^{(1)}$$: the only non-zero entries are the transition from *B* to *A* while emitting a 0, and from *A* to *B* while emitting a 1, each with probability 1.
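As a sketch in NumPy (the state ordering (A, B) and the matrix values are reconstructed from the oscillator dynamics described earlier, not taken from the original), the two transition matrices look like this:

```python
import numpy as np

# T[s][i][j]: probability of moving from state i to state j while
# emitting bit s, with states ordered (A, B).
T0 = np.array([[0.0, 0.0],
               [1.0, 0.0]])   # B -> A, emitting 0
T1 = np.array([[0.0, 1.0],
               [0.0, 0.0]])   # A -> B, emitting 1

# Sanity check: summing over emitted bits must give a stochastic
# matrix, i.e. the outgoing probabilities from every state total 1.
print((T0 + T1).sum(axis=1))  # -> [1. 1.]
```

The stochasticity check is a useful invariant for any e-machine: summed over all emitted symbols, the transition probabilities out of each causal state must equal 1.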

**The Formalities:**

With the intuition afforded by our example, we can define an epsilon machine more generally by the ordered pair $$(\mathbf{S}, \mathbf{T})$$, where **S** is the collection of causal states, and **T** the collection of matrices that define how the states transform.

In our simple example, the string we analysed was deterministic: a 0 always followed a 1 and vice versa. In general, e-machines describe stochastic processes. For example, a completely random string can be described by a trivial e-machine with a single causal state, whose emission probabilities are $$T^{(0)} = T^{(1)} = 1/2$$.

**Features of e-Machines:**

Some useful properties of these machines, listed without proof:

- The complexity of an e-machine equates to the entropy of its internal states, i.e. $$C_\mu = -\sum_i p_i \log_2 p_i$$, where $$p_i$$ is the probability that the machine is in causal state $$S_i$$. This is easy to compute.
- e-machines are the simplest possible representations of a process: the entropy of their causal states is minimal compared to that of any other state machine that predicts the dynamics of the process.
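The complexity measure in the first bullet can be sketched in a few lines (the function name is illustrative):

```python
import numpy as np

def statistical_complexity(p):
    # Shannon entropy of the distribution over causal states,
    # C_mu = -sum_i p_i log2 p_i, using the convention 0 log 0 = 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# The oscillator spends half its time in each of its two causal states:
print(statistical_complexity([0.5, 0.5]))  # -> 1.0

# A single-state machine, such as the fair-coin process above, has
# zero complexity:
print(statistical_complexity([1.0]))       # -> 0.0
```

Note that the fair-coin process is maximally random yet has zero complexity, which is exactly the sense in which this measure captures structure rather than randomness.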

Note that this more general definition takes into account non-deterministic emissions… the measurement results can contain inherent randomness.
