# Information: what exactly is it?

I was walking to the tennis courts in Battersea Park a few years back, when I heard something on my Walkman radio. It stuck with me for years, and until tonight I haven’t followed up on it, read about it or written about it. Though I have told everyone at my work, which has resulted, as usual, in groans about how nerdy I am (and genuine amazement at how I could spend valuable time pondering these things).

What I heard was a very short anecdote about someone who wrote a little-regarded paper in the 1940s (see ref below) in which he made an attempt to define a ‘measure’ for information. Although I never read any more about it (until today), what I heard was enough to set me thinking…

————–

Now, if you know lots about this subject then bear with me. For those readers who don’t know what he came up with, I challenge you with this question:

• what contains more information, a phone-number, a ringtone or a photo?

Are they even comparable?

### Bits & Bytes…

In this computer age, we already have some clues. We know that text doesn’t use up much disk space, and that photos & video can fill up the memory stick much quicker.

But what about ZIP files? These are a hint that file-size is not a very accurate measure of information content.
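A quick way to see this for yourself is to compress two inputs of exactly the same size. The sketch below uses Python's standard `zlib` module, which implements the DEFLATE compression that ZIP files commonly use; the two sample inputs are made up purely for illustration.

```python
import os
import zlib

SIZE = 10_000  # both inputs are exactly this many bytes

repetitive = b"a" * SIZE         # the same letter over and over
unpredictable = os.urandom(SIZE)  # random bytes supplied by the operating system

packed_repetitive = zlib.compress(repetitive)
packed_unpredictable = zlib.compress(unpredictable)

# Same size going in, very different sizes coming out.
print("repetitive:   ", len(repetitive), "->", len(packed_repetitive), "bytes")
print("unpredictable:", len(unpredictable), "->", len(packed_unpredictable), "bytes")
```

The repetitive input shrinks to a tiny fraction of its size, while the random one barely shrinks at all, even though both took up the same space on disk to begin with.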

So what is a megabyte? Is it just so many transistors on a microchip? Happily, it’s not; it’s something much more intuitive and satisfying.

### Information: what is it?

If you go to Wikipedia and try to look up Information Theory, within a few seconds you are overrun with jargon and difficult concepts like Entropy; I hope to avoid that.

Let’s rather think about 20 questions. 20 Questions is the game where you have 20 questions to home in on the ‘secret’ word/phrase/person/etc. The key, however, is that the questions need to elicit a yes/no response.

To define information simply: the more questions you need in order to identify a ‘piece of information’, the more information content is embodied in that piece of information (and its context).

This helps us to answer questions like: “How much information is in my telephone number?”

Let’s play 20 questions on this one. How would you design your questions? (Let’s assume we know it has 7 digits)

You could attack it digit by digit: “Is the first digit ‘0’? Is the first digit ‘1’?”, then move on to the next digit when you get a yes. If the number is 7 digits long, this may take up to 70 questions (though in fact, if you think a little, you will never need more than 9 per digit, and on average you’ll only need about 5 per digit, averaging ~35 in total).

But can you do better? What is the optimum strategy?

Well let’s break down the problem. How many questions do we really need per digit?

We know that there are 10 choices. You could take pot luck, and you could get the right number first time, or you might get it the 9th time (if you get it wrong 9 times, you don’t need a 10th question). However, this strategy will need on average 5 questions.

What about the divide and conquer method? Is it less than 5? If yes, you have halved the options from 10 to 5. Is it less than three? Now you have either 2 or 3 options left. So you will need 3 or 4 questions, depending on your luck, to ID the number.
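To check these counts, here is a minimal Python sketch of both strategies for a single digit. The function names are my own, and the halving version always splits the remaining options as evenly as it can, which is one reasonable reading of "divide and conquer" rather than the only one.

```python
def sequential_questions(secret: int, options: int = 10) -> int:
    """Ask "is it 0?", "is it 1?", ... and count the questions used."""
    questions = 0
    for guess in range(options - 1):
        questions += 1
        if guess == secret:
            return questions
    # Every option but the last was ruled out, so no further question is needed.
    return questions


def halving_questions(secret: int, options: int = 10) -> int:
    """Ask "is it less than X?", halving the candidates each time."""
    low, high = 0, options  # candidates are low .. high - 1
    questions = 0
    while high - low > 1:
        middle = (low + high) // 2
        questions += 1
        if secret < middle:
            high = middle
        else:
            low = middle
    return questions


digits = range(10)
sequential = [sequential_questions(d) for d in digits]
halving = [halving_questions(d) for d in digits]

print("sequential average:", sum(sequential) / 10)  # 5.4
print("halving average:   ", sum(halving) / 10)     # 3.4
print("halving range:     ", min(halving), "to", max(halving))  # 3 to 4
```

Counted exactly over all ten digits, pot luck averages 5.4 questions and halving averages 3.4, with halving never needing more than 4.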

Aside for nerds: Note now that if your number system only allowed 8 options (the so-called octal system), you would always be able to get to the answer in 3. If you had 16 options (hexadecimal), you would always need 4.

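For the decimal system, if you played this game on a few hundred random digits, you would find that you need, on average, about 3.4 questions per digit; the theoretical minimum, which you can only approach by guessing several digits at once, is 3.3219… questions. This is the same as asking “how many times do you need to halve the options until no more than one option remains?”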

Aside 2 for nerds: The mathematicians amongst you will have spotted that $2^{3.3219} \approx 10$
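Both asides can be checked with the base-2 logarithm in Python's standard `math` module. This is only a sketch; `questions_for` is a name I made up for the illustration.

```python
import math


def questions_for(options: int) -> tuple[float, int]:
    """Return (exact halvings, whole yes/no questions) for a number of options."""
    exact = math.log2(options)
    return exact, math.ceil(exact)


for options in (8, 10, 16):
    exact, whole = questions_for(options)
    print(f"{options:>2} options: log2 = {exact:.4f}, whole questions = {whole}")

# And the other way round: 2 raised to 3.3219 is very nearly 10.
print(2 ** 3.3219)
```

Octal and hexadecimal come out as whole numbers (3 and 4), while decimal lands in between at 3.3219…, which is why a single decimal digit sometimes needs 3 questions and sometimes 4.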

Now, we could use 4 questions (I don’t know how to ask 0.32 questions) on each of the 7 digits, and get the phone number, and we will have improved from 35 questions (though variable) to a certain 28 questions.

But we could take the entire number with the divide and conquer method. There are $10^7$ (10 million) options (assuming you can have any number of leading zeroes). How many times would you need to halve that?

1. 5 000 000
2. 2 500 000
3. ….

22. 2.38…
23. 1.19…
24. 0.59…

So we only needed 24 questions. Note that calculators (and MS Excel) have a shortcut to calculate this sort of thing: $\log_2(10^7) \approx 23.25$…
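The halving count and the logarithm shortcut can be compared in a few lines of Python. This is a sketch under my own assumptions: the function names are invented, and the phone number 5551234 is a made-up example.

```python
import math


def halvings_needed(options: int) -> int:
    """Halve the number of options until no more than one remains."""
    remaining = float(options)
    count = 0
    while remaining > 1:
        remaining /= 2
        count += 1
    return count


def play_game(secret: int, options: int) -> int:
    """Find the secret with "is it less than X?" questions; return how many were asked."""
    low, high = 0, options  # candidates are low .. high - 1
    questions = 0
    while high - low > 1:
        middle = (low + high) // 2
        questions += 1
        if secret < middle:
            high = middle
        else:
            low = middle
    return questions


OPTIONS = 10 ** 7  # every 7-digit number, leading zeroes allowed

print(halvings_needed(OPTIONS))      # 24
print(round(math.log2(OPTIONS), 2))  # 23.25
print(play_game(5551234, OPTIONS))   # 23 or 24, depending on the number
```

Because 23.25 is not a whole number, some phone numbers are pinned down in 23 questions and the rest in 24, but none ever needs more.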

OK, so we have played 20 questions. Why? How is the number of questions significant? Because it is actually the accepted measure of information content! This is the famous ‘bit’ of information. Your 7 digit number contains about 24 bits of information!

### Epilogue

As you play with this concept, you will quickly see that the amount of information in a number (say the number 42), depends hugely on the number of possible numbers the number could have been. If it could have been literally any number (an infinite set) then, technically speaking, it contains infinite information (see, I’ve proven the number 42 is all-knowing!).

But the numbers we use daily all have context, without context they have no practical use. Any system that may, as part of its working, require ‘any’ number from an infinite set would be unworkable, so this doesn’t crop up often.

Computer programmers are constantly under pressure to ‘dimension’ their variables to the smallest size they can get away with. Once a variable is dimensioned, the number of bits available for its storage is set, and it doesn’t matter what number you store in that variable: it will always require all those bits, because it is the number of possibilities that defines the information content of a number, not the size of the number itself.
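Python does not make you dimension ordinary variables, so as a stand-in this sketch uses the standard `struct` module, which packs a number into a fixed-width binary field much as a dimensioned variable would store it. The labels and the `stored_bits` helper are my own names for the illustration.

```python
import math
import struct

# Fixed-width unsigned integer formats ("<" selects standard sizes).
FORMATS = {"8-bit": "<B", "16-bit": "<H", "32-bit": "<I"}


def stored_bits(value: int, fmt: str) -> int:
    """Number of bits used to store `value` in a field of the given format."""
    return len(struct.pack(fmt, value)) * 8


for name, fmt in FORMATS.items():
    # The value 0 and the value 42 occupy exactly the same space.
    print(name, stored_bits(0, fmt), stored_bits(42, fmt))

# A 16-bit field allows 2**16 possibilities, which is 16 bits of information.
print(math.log2(2 ** 16))
```

The width is fixed by how many values the field could hold, not by the value that happens to be in it: 0, 42 and 65535 all take 16 bits in a 16-bit field.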

————

I hope that was of interest! Please let me know if I’ve made any errors in my analysis – I do tend to write very late at night 😉

References:

# The scientific method defined (well hypothesised at any rate)

I recently realised that the jury is out on exactly what science and the scientific method are (or should be, at least).

Some would say that science is the endeavour to understand the world, answer the “how” behind the ocean tides, rainbows or seed germination. So the scientific method is any way we might do this. Sounds reasonable to me.

However, some would say that science is the business of ‘facts’ or ‘truth’ and proofs. We do experiments to ‘prove’ our hypothesis. This is the definition I would like to take issue with.

### Theories and facts confused…

I get really agitated when I hear people say that evolution is a ‘fact’. Not because I’m a nutty young earth creationist (I’m not), but because no one has yet furnished a proof. But, you may argue, there’s loads of evidence, it’s clearly a fact.

But evidence is not the same as proof.

Even if something is 99.999% sure, it is still not sure.

I think the trouble comes because people are never taught that those ‘theorems’ and ‘proofs’ they learned in maths class are not quite the same as the theories and evidence in the scientific method.

So is maths a science? Well, yes, sort of. But while it can deal with real things, like counting sheep, it actually deals with a sort of imaginary world (the so-called Platonic ‘world of ideas’). The whole of maths is a mental construct with no known (‘proven’) basis in reality. But nonsense, you say, of course there are numbers in the real world! Well so there are, but there are no proofs!

Proofs are only possible in a fully ‘understood’ world, and because the world of maths is underpinned by a set of axioms, it is, more or less, ‘understood’. But the real world in which we live is not like that. We don’t understand how the brain works, we don’t know how many dimensions there are, we don’t even know if there is a god.

So does that mean we don’t know anything? The media (and opponents of science) use this uncertainty to undermine science. “You can’t prove there is no God, because there is!” Hey presto, a proof of God.

No, science and the scientific method don’t do proofs and facts. So what do they do?

Let’s consider the old chestnut, evolution. People had a book that explained the marvellous spectrum of life, from the caterpillar to the jellyfish. This was good enough for many years. But some clever folks started to question why God would bother to make different tortoises on different islands, and why He would go to all the trouble of putting dinosaur bones in certain rocks and why He would disguise their uranium-lead isotopes to make them look millions of years old.

So a theory was proposed (Darwin’s natural selection) that explained the incredible story of species and, for good measure, predicted that humans are apes, which went down well in the church.

Since then, loads and loads of observations have been made that confirm the theory (with the odd tweak). It’s a theory that would have been easy to disprove. If it was wrong, some animals that couldn’t logically have been explained by the theory would have cropped up. But they haven’t.

But all this evidence is not proof. And the lack of a disproof isn’t a proof.

The same is true for all accepted theories. The sun and the moon are thought to cause the tides. Is that a fact?

If you ask a scientist, even a good one, he/she may well say yes, it’s a fact. Because it is so darn likely to be right. Because there is no good alternative theory. Because no one is disputing it. Because the maths is just so neat. Because the theory can make predictions. All good reasons to accept a theory. But they do not make it fact.

So we do know ‘stuff’, plenty of stuff, facts to all intents and purposes, but not strictly facts in the sense of logical proof.

So what is the scientific method, then?

Science is the system of theories and hypotheses about the nature of reality that have not yet been disproven and which are ranked by the weight of evidence in their favour.

It is like a model of the world that we are ever refining, chucking out wrong theories, refining the ones that work. The scientific method is that refinement process. Well that is my hypothesis. The truth may be altogether different!