04 July 2009

Problems with WHO's Swine flu numbers

Anyone who has talked to me recently will know that I'm going on a lot about swine flu right now.  That is because I think it's important in a way that most of what hits the headlines is not.  I'll be addressing issues with infection and death figures in this post.  I will probably come back to the topic in subsequent posts.

How many people are/were infected?

The WHO collates figures on numbers infected, right? No! The WHO collates the number of tests for swine flu that have come back positive.  Wikipedia also collates these figures (and does so faster than the WHO).  At the time I'm writing this the figure stands at 80,633.

I don't know of any credible estimate of worldwide infections at all.  However, the CDC recently estimated that 1,000,000 people have been made ill in the USA.  The difference between these two figures is large but it can be explained as follows:  Once the number of cases rises beyond about 1,000, countries stop testing people they believe have the disease because the testing facilities are insufficient.  The more people get infected the worse a guide these numbers will become (and they're probably poor to start with).  The USA stopped testing most people for swine flu after a few thousand cases.  But they can still estimate how many people have become ill from how many people turn up with flu-like symptoms, adjusted for how many of them on average test positive (or by other similar statistical techniques).
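As a rough illustration of the kind of adjustment described above, here is a minimal sketch in Python.  The function name, the inputs and the extra correction for people who never see a doctor are all my own assumptions for the sake of the example; they are not the CDC's actual method and the numbers are made up.

```python
def estimate_illnesses(flu_like_visits, sample_tested, sample_positive,
                       fraction_seeking_care):
    """Very rough estimate of total illnesses from symptom surveillance.

    All parameters are hypothetical inputs for illustration only.
    """
    # Share of people with flu-like symptoms who actually test positive.
    positive_rate = sample_positive / sample_tested
    # Scale the symptomatic visits by that share.
    cases_seen = flu_like_visits * positive_rate
    # Assumed correction: many ill people never turn up at a clinic.
    return cases_seen / fraction_seeking_care


# Made-up numbers, chosen only to show the arithmetic.
estimate = estimate_illnesses(
    flu_like_visits=500_000,
    sample_tested=2_000,
    sample_positive=800,
    fraction_seeking_care=0.25,
)
print(round(estimate))
```

The point is only that an estimate like this can run far ahead of the count of positive tests, because it does not need every ill person to be tested.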

The WHO numbers (and those reported by national governments) have been endlessly reported, picked over and analyzed.  However, these figures are a poor way to track the disease (except for most other ways that are currently available).  I will explain two of the problems with the figures so you get the idea:

1) The reporting procedures of countries may differ.  Some countries will test almost no one (such as many African countries) and some may test everyone they can in the first few weeks and then start monitoring the disease in other ways (like the USA and Australia).

2) The chance of a case being noticed will be higher in a person for whom the disease is worse so more infections may be detected amongst groups who are worse affected.

At the end of the day these figures are really lower bounds and probably only give a very rough guide to the early stages of the disease in a country.  Better estimates of the numbers infected should begin to become available but the best estimates may well emerge many months or years after the first wave of the pandemic is over.

Who is getting infected?

This is relatively easy to answer.  Most people who are infected are young, in the Southern hemisphere or in North America.  The geographic distribution will change and the age distribution could change too.  Unlike the seasonal flu few old people are getting infected (and very few are dying).

Who is dying?

The highest numbers of deaths are among people aged between 15 and 50.  The USA has (as of now) a mean age of death of 37.  Mexico had a mean age of death in the 20s towards the end of April (I don't have more up to date figures).  I have found good information on who is at greater risk: pregnant women, asthmatics, diabetics and those with heart disease.  Each of these conditions increases the risk significantly.  However, between 1/3 and 1/2 of all deaths are in fit healthy people.

How many people have died from swine flu?

The WHO is keeping a running tally of essentially all deaths from swine flu, right? No, wrong again!  The WHO's confirmed death figures cannot even be used as an estimate of this number.  This might seem absurd (surely it's easy to notice that someone has died?) but I'll explain why this is:

People die all the time.  There are many many deaths every day and a significant number die in a way that looks superficially similar to a flu caused death.  Many people who have seasonal flu die from opportunistic secondary infections or complications that are essentially caused by the flu itself.  By the time these people die the flu symptoms and the virus may have gone.  Because we are in the early stages of this pandemic the deaths are swamped by deaths that look superficially similar.  This makes it hard to detect deaths in people not already known to have swine flu.

As an example of this difficulty consider the deaths from flu in the USA.  The number of death certificates issued each year which list flu as a contributing factor is about 1,800.  However, it is estimated that flu is a significant factor in the deaths of 36,000 people in the USA each year.  That is a difference of about 20 times.  The second figure can be calculated by varying methods (and the numbers change but not enough to invalidate my point) which rely on statistical sampling or modeling.

So the number of deaths from the swine flu could easily be 20 times as great as the WHO confirmed figure.  However, we shouldn't assume that the problems with estimation are the same for seasonal flu as for swine flu.  We may:

*Miss fewer deaths because more young people are dying from swine flu.  Younger people are less likely to randomly die so they are more likely to be detected above the background noise.  The strong media attention has surely increased the chance that people who are in a serious condition will be tested for swine flu.  On the other hand the availability of tests is much more limited for swine flu than for seasonal flu.

*Miss more deaths because at this early stage it's easier for deaths to be drowned out by the noise and economic pressures may cause countries to mislead the international community about the deaths they've had.  Sadly this second point is probably significant.

Finally it should be noted that you cannot calculate a fatality rate by dividing the confirmed deaths by the estimated infections (as many news sites have).  That will just give you a hopelessly inaccurate figure.
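To see why that division misleads, here is a small sketch in Python.  The death-certificate and estimated-death figures are the seasonal flu numbers quoted above; the confirmed death count and the infection estimate are made-up round numbers of my own, used only to show how much the answer moves.

```python
def fatality_rate(deaths, infections):
    """Deaths divided by infections, as a plain fraction."""
    return deaths / infections


# Seasonal flu figures quoted in this post.
certificate_deaths = 1_800
estimated_deaths = 36_000
undercount_factor = estimated_deaths / certificate_deaths

# Hypothetical inputs, for illustration only.
confirmed_deaths = 100
estimated_infections = 1_000_000

naive = fatality_rate(confirmed_deaths, estimated_infections)
adjusted = fatality_rate(confirmed_deaths * undercount_factor,
                         estimated_infections)

print(f"naive rate: {naive:.4%}")
print(f"rate if deaths are undercounted like seasonal flu: {adjusted:.4%}")
```

Since nobody knows whether the true undercount for swine flu is nearer 1 or 20 (or something else), the naive rate could be out by more than an order of magnitude.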

It's not just me saying this either.  For instance the CDC is on record saying "Only counting deaths where influenza was included on a death certificate would be a gross underestimation of influenza’s true impact."

01 July 2009

Apology

There will be another post along soon.  Please check back at the weekend!

14 June 2009

Is there a lowest level of reality?

I want to take my idea of a coding invariance principle further (see my earlier post defining the philosophical principle).

What do we mean when we say "The law of gravity is true"? We could mean that the law of gravity matches all the available data.  We could even require that it matches all true statements about the universe (or does so to within some degree of approximation).  But more might be required.  We might ask that there is no other simpler theory that matches the world as well or better.

What if there are two theories explaining different physics (like quantum mechanics and relativity) and a third simpler theory comes along that explains both areas (the very small and the very large) better than either theory.  This third theory might be simpler than the combination of quantum mechanics and relativity.  But in this case would it make sense to say quantum physics is true or even the best theory for the very small anymore?

Clearly it's conceivable that a theory explains a narrow domain well but is superseded by a theory which is little better on this domain but explains a much wider domain and is therefore a better theory overall.  A theory could also be better at explaining that narrow domain than our original theory in which case you would also say that it was a better theory.

A significant question is then "Is there a best theory to explain any domain of our universe?".  It is consistent that the answer is no and that for any theory there is another theory which is either more elegant (explains more with little additional complexity) or explains the same data with a higher degree of accuracy.

This isn't to say that in such a universe science could not progress and make better predictions over time.  However, it does draw into question whether it is meaningful to talk about a theory giving a truthful description of the universe.

There are various bizarre possibilities in such a universe:
For instance the best available theory may keep changing its opinions on matters such as:

1) Whether the universe had a beginning
2) What exactly happened in the past
3) Whether the universe is infinite
4) The number of dimensions the universe has

A universe that has many partial descriptions matching 1,2,3,4 would be a strange beast indeed but it would certainly be an interesting place to reside!

31 May 2009

Three versions of Occam's razor

In today's post I will be discussing differing interpretations of Occam's razor.

Wikipedia defines it as recommending that "the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory"

I shall give three common interpretations of how the razor should be applied.  I have come across each of these versions in use but they are not equivalent to each other.

1) One should always assume the existence of the fewest objects necessary to achieve an explanation.

This version has major problems.  It depends on what you consider your basic level of reality to be and how you measure quantity.  So, for example, if we consider matter to be the basic level of reality and quantity to be mass then we should assume that only those galaxies we have direct measurements from exist.  However, our best cosmological theories imply the existence of a multitude of galaxies for which we have no other evidence.

But had we applied such a principle with stars or planets in the past then we would have been proved wrong.  Indeed it's hard to think of an instance where it could have been correctly applied.  Occam's razor is often justified by a claim that "it's worked in the past!".  Although this is circular (similar to the sentence "I think I make accurate claims") it's still the case that if it hadn't worked in the past this would be evidence against it (similar to the sentence "I think I make inaccurate claims").

I have a further criticism of this version using the coding equivalence principle.  The problem is that universes with differing ontologies (accounts of what exists) are equivalent under this principle.  So this is going to cause inconsistencies.  There are ways to get around this but they're messy and not easy to justify.


2) Of two theories the simplest is more likely to be true.

Note that this version makes a strong truth claim for the simplest theory (it doesn't just say that the simplest theory is closest to the truth).

In particular according to this version one should always assume any theory that matches reality is true if there are no simpler theories that do.  But this is problematic because almost every theory of physics so far proposed has been proven false including ones which at some point had no evidence against them.  Again as it hasn't worked in the past we shouldn't trust it in the future.

Note also that this version is supportive of the idea of a general unified theory in that it will always assign a non-zero probability to such a theory existing.

3) The simplest theory is the best choice of theory.  If two theories produce identical predictions choose the simpler presentation.

This version I am happier with.  But why is it good?  Is it because it is easier to have a theory that is simple for making calculations?  I would argue no as complex theories are often easier to compute with in practice.

I think that this version of Occam produces theories that are better because they produce better predictions.

One example would be continental drift as opposed to the various ad hoc explanations previously accepted (land bridges and the like).

A second example would be how Newtonian dynamics was better than previous theories of planetary motion.

Note that Einstein's theory of relativity does not harm this version of Occam's razor as it makes no truth claims concerning simpler theories.  Simple theories are often replaced by better theories that are more complicated.  Nevertheless one cannot necessarily tell which more complicated theory will end up replacing the current theory so this does not invalidate this version of Occam as a heuristic.

This is the only version of the razor of the three which has worked well in the past and I argue that it is the version that we should use in the future.

There are some interesting applications of the first two versions that do not follow from the third:

A) The 2nd version is used to argue that the 2nd law of thermodynamics is probably absolutely true (not just approximately true).

The prediction of the heat death of the universe needs the 2nd law of thermodynamics to be absolute.

B) The first version supports the assumption that the universe is finite and the second the assumption that a general unified theory exists.

Neither of these assumptions is supported by the third version.  Indeed it is difficult to see how one could ever have positive evidence of either of these claims.

The third version of the razor also gives us the advantage of being able to ignore silly questions such as "are photons particles or waves?".  For the third version what matters is the mathematical form of the theories and whether these forms make accurate predictions or not.




17 May 2009

The data flood

Over the last decade western societies have begun to generate vast quantities of data.  Examples include:

1) Sales statistics
2) Medical records
3) Web traffic data
4) Media usage data (scrobbling, play counts etc.)
5) User generated data (facebook pages, text messages, emails etc.)
6) Online gaming data (second life, online board game archives etc.)

We have also begun to mine this data for useful patterns.

I give two major problems with the way in which we are doing this. 
A) The privacy of this data is often undermined by the careless manner in which it is stored.  Whilst the data has great economic value (mostly to companies and governments but also to the consumer/citizen) it can also be used for new and worrying kinds of abuse.

B) We depend too much on people to analyze this data.  In my opinion the main limiting factor in gaining economic advantage from this data is our ability to process it.

I predict two trends over the next decade with regard to data mining. 
Firstly there will be significant abuses of people's personal data resulting in a backlash against the mass storage of such data.  This will cause more data to be stored in a distributed or properly encrypted form.  Computation on this data may be performed in a distributed manner too with the programs that perform the computations being more likely to be under the control of the users.

Secondly simple artificial intelligence agents (not much more intelligent than insects) will proliferate as the cost of the necessary hardware to run them approaches zero.  These agents will be called upon to dramatically reduce the burden of decision making for users.

Various developments have been hampered in the past by the unwillingness of people to make large numbers of trivial choices.  Some examples include:

1) Highly efficient payment structures (for text, music and video) which require large numbers of individual payment choices.

2) Consumer boycotts based on the complexities of the supply chain and company financing.

So I think that we'll deal with the flood of information by finding better ways to control the movement of that information and by finding better ways to utilize the value in that information.

03 May 2009

Consequences of the coding equivalence principle

See my earlier post on the coding equivalence principle.

In my earlier article I motivated, defined and gave some reasons for believing in a philosophical coding invariance principle.  Today I will address the profound implications that this principle has, discuss the definition of Occam's razor in light of this principle and finally consider in what ways Turing computation may prove to be an inadequate theory of computation with which to formalize this principle.

If a universe is described by a finitely computable general unified theory then it is equivalent to the empty universe (a general unified theory is a physics theory that predicts everything and finitely computable means you could simulate it on a sufficiently powerful computer).

This is profound because it says that our universe might as well not exist if physics could ever fully describe it!  Of course denying the possibility of a general unified theory would not imply physics was useless.  Indeed science could still be the best way to arrive at truth.  All it says is that the work of physics will never end.

Any finite universe is equivalent to the empty universe.  This is an extraordinary claim as current thinking is that our universe is finite in extent.  However strictly speaking the rule applies to the full space-time diagram of the universe which might be infinite in the time direction.  If the universe will only last for a finite time before ceasing to exist (and certain other currently believed technical conditions hold) then the universe might as well not exist.

These two facts might be considered to raise problems with the coding principle.  However, we do not know that our universe is finite and we have no reason to believe that there is a general unified theory.  Indeed I will argue in another blog post that we could never have any evidence for such a proposal.

Fortunately for the coding equivalence principle it does not imply that all infinite universes are equivalent to the empty universe.  You might take this as a good reason to assume that our universe is infinite (at least in potential).  Indeed I hold this view myself.

There are also significant ethical consequences of this coding principle which will be addressed in another blog post.

Let us consider Occam's razor now.  If there are multiple codings for the same universe which do you use when checking how simple a theory is?  I think this question can be satisfactorily resolved but the details are too mathematical for this blog post.

I say that two descriptions should be taken to describe the same universe if they are each computable from the other.  But there is a significant technical problem with this.  Turing computation is simply not defined for the entities with which we are dealing (abstract sets).  I think the answer to this is to generalize Turing computation to dealing with this type of data.  Unfortunately there have been multiple ways of doing this suggested and no canonical form identified.  So for this principle to be properly applicable we really need further research into types of hypercomputation.  This is a major research interest of mine (mainly for other reasons).

26 April 2009

Reduced service announcement!

Dear all readers,
                        Apologies but I will be posting only every fortnight until my thesis has been submitted.

Take care,
Barnaby

19 April 2009

Computational equivalence principle

Today I am going to motivate and define a philosophical principle which I shall call "coding invariance".  I do not claim to have invented this principle as I can't remember exactly where I got the idea from.  I shall also explain its intuitive appeal, give simple examples of its application and explain why certain weakenings of it are not strong enough.

Consider a universe and then consider collecting together all the facts about this universe throughout its history into one large description of that universe.  We mean to include here all physical facts concerning the universe such as which electrons appear where at what times (or what the quantum wave function's state is at what times).  Other information about the universe may be present in patterns of these facts (laws of physics, larger scale facts about the universe).  We shall be considering how these facts about the universe might be represented.

Clearly there is not going to be any one unique way to write down the facts.  For instance we might use a "+" symbol to represent a proton but someone else might use a "p" or even a "-".  As long as the method chosen doesn't give the same name to protons and electrons (or any two particles) then any naming system could be chosen (practically some may be more useful but philosophically no one could be preferred).

So conceivably we could have two differing descriptions of the same universe.  Two different sets of facts may turn out to describe the same universe only using different symbolic conventions.

The coding invariance principle states that two sets of facts should be considered to refer to the same universe if there is a way of recoding one as the other and vice versa.  The intuition here is that the method of recording facts about a universe (including the names used and the data storage structures) should not matter.  Or put another way "A rose by any other name would smell as sweet".

Now let us look at some simple examples of choices of coding.  Here we represent a collection of facts as a sequence of 0's and 1's (this can be done for simple collections of facts in simple languages):

00000110001...
11111001110...

These two sequences share the same pattern but the digits used to describe it differ.  In this case one can translate just by swapping 0's and 1's.

0 0 1 1 0 1 1 1 0 1 0 1 ...
000011110011111100110011...

I have spaced out the first of these two sequences so that you can see that the latter is just the former with each digit written twice.  One is just a recoding of the other.

001100 11  00   11    11     00      ...
0 11000111100000111111111111100000000...

The rule here is harder to discern.  The nth digit is repeated n times in the 2nd coding but only twice in the first coding.
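The three recodings so far can be written as short Python functions.  This is only a sketch of my own: the function names are made up, and I treat the sequences as finite strings even though the real ones carry on indefinitely.  Each recoding comes with an inverse, which is what makes the two descriptions recodable as each other.

```python
def flip(bits):
    """First example: swap every 0 with a 1 and vice versa (its own inverse)."""
    return "".join("1" if b == "0" else "0" for b in bits)


def double(bits):
    """Second example: write every digit twice."""
    return "".join(b * 2 for b in bits)


def undouble(bits):
    """Inverse of double: keep every other digit."""
    return bits[::2]


def stretch(bits):
    """Third example: write the nth digit n times (counting from 1)."""
    return "".join(b * n for n, b in enumerate(bits, start=1))


def unstretch(bits):
    """Inverse of stretch: read one digit from each run of length 1, 2, 3, ..."""
    out, pos, run = [], 0, 1
    while pos < len(bits):
        out.append(bits[pos])
        pos += run
        run += 1
    return "".join(out)


print(flip("00000110001"))
print(double("001101110101"))
print(stretch("0101"))
```

To get from the first coding of the third example to the second you would apply `undouble` and then `stretch`; to go back you apply `unstretch` and then `double`.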

00
10

In the second coding 00 is represented as 10, whereas in the first coding it is represented as 00.  This last example may seem less reasonable than the others; however, I shall argue that it is necessary to accept it as equally valid.

The rules that are used in the first two examples seem reasonable because the descriptions of the rules are relatively simple compared to the complexity of the data coded.  We might like to say that some rules are silly because they are too complicated or they seem like overkill for the data they code.  This is a reasonable stance practically (for computer science) but it is not easy to defend philosophically because it really just depends on your perspective.

The complexity of the rule set depends on the language you use to describe the rules or the concept of algorithm you use to conceptualize them.  The problem is there is no way to choose a canonical language to decide which rule sets are too complex.  Therefore all languages must be equally good philosophically speaking and this implies we cannot consistently maintain that there should be a restriction on complexity.

Of course from our perspective codings can seem better or worse but that is really down to us having had an explicit coding choice already made for us (as our life experiences are a particular way of experiencing reality).

So we are left with the coding principle that two collections of facts describe the same universe if they can each be computably recoded as the other.

NB: All examples given in this post were of sequences but this was just for simplicity's sake.  The principle could be applied to unordered sets of facts too.

06 April 2009

Pseudo-anonymity

Our personal data is being harvested for many purposes these days.  It is stored in vast databases with little or no encryption, and organizations from our supermarkets to the security services are mining it for reasons ranging from commercial gain to national security.  How we keep confidential and sensitive information private is much debated.  However, important facts concerning cryptography are being ignored in this discourse.  Today I want to describe what is possible as a result of the revolution in cryptography that occurred at the end of the 20th century.

Our intuitions tell us that ineffective solutions aside (such as identification by indexed numbers) it is impossible to have both the benefits of anonymity and those of transparency.  But this is false.  Cryptography can combine benefits of anonymity and benefits of transparency.  Pseudo-anonymity is possible and comes in many forms.  Without an understanding of these possibilities any discussion concerning privacy will be missing out on a huge range of potential solutions.

In what follows I will be making a couple of technical assumptions that are not hugely controversial.  Firstly I shall assume that certain widely believed mathematical conjectures are true or at least not usefully false.  I shall also assume that we are not going to be able to build quantum computers any time soon.  Our entire banking transfer system is based on these assumptions so I am in good company in making them!

Various counter-intuitive things are possible with modern strong cryptography.  A cryptographic signature scheme uses a pair of keys, one private and one public.  Anyone in possession of the private key can sign messages (but no-one else can).  Anyone in possession of the public key can check the signature on a message.  Creating such a key pair is easy with the right software.  Cryptographic signatures can be used to create a virtual identity which is hard (or if desired impossible) to tie to the person who created it.  However, over time such virtual identities can acquire trust in much the same way that individuals have for millennia.
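To make the idea of signing concrete, here is a minimal sketch in Python of a Lamport one-time signature, which needs nothing beyond a hash function.  I have picked this scheme only because it fits in a few lines using the standard library; it is not what real systems normally use (they use schemes such as RSA or elliptic curve signatures), and a Lamport key pair must only ever sign one message.  All the function names are my own.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """The 256 bits of the message's SHA-256 digest, most significant first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def generate_keypair():
    # Private key: 256 pairs of random values, one pair per digest bit.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    # Public key: the hash of every private value.
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private, message: bytes):
    # Reveal one private value from each pair, chosen by the digest bit.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(part) == public[i][bit]
               for i, (part, bit) in enumerate(zip(signature, _bits(message))))


private_key, public_key = generate_keypair()
signature = sign(private_key, b"I wrote this post")
print(verify(public_key, b"I wrote this post", signature))
print(verify(public_key, b"I wrote some other post", signature))
```

Anyone holding `public_key` can run `verify`, but only the holder of `private_key` could have produced the signature.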

Networks may be set up which allow anonymous communications to be sent.  This allows not only the content of a message but the existence of a message to be hidden.  Such networks already exist (for example Tor) and are used in high tech music piracy peer to peer networks.

Protocols are possible in which a certain action (such as decrypting a document) is only possible if certain people agree to the operation.  As an example it is possible to encrypt a document so that any 3 people out of 5 key holders can decrypt it but no 2 of them can decrypt it acting alone.
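One well known way to get a "3 out of 5" rule is Shamir's secret sharing, sketched below in Python.  The idea is to split the decryption key (here just a number) into 5 shares which are points on a random polynomial of degree 2; any 3 points pin the polynomial down, while 2 points leave the key completely undetermined.  The function names and the choice of prime are my own assumptions for this sketch, and a real deployment would want a vetted library rather than this.

```python
import secrets

# A large prime (2**127 - 1); all arithmetic is done modulo this number.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` so that any `threshold` of the `shares` recover it."""
    # Random polynomial whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, poly(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange interpolation at x = 0 over the given (x, y) points."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


key = 123456789
all_shares = split_secret(key, threshold=3, shares=5)
print(recover_secret(all_shares[:3]) == key)
print(recover_secret([all_shares[0], all_shares[2], all_shares[4]]) == key)
```

Each key holder is given one share; the document itself would be encrypted with `key` by some ordinary cipher.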

A canary is a characteristic piece of data which identifies the source of a document.  The Ordnance Survey include minor errors dotted over their maps.  This allows them to detect any map which has been copied from an Ordnance Survey map.  Canaries can be added to many types of data to help identify unauthorized copies (although their utility is restricted to situations where few people have access to data).
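Here is a toy sketch in Python of how a canary might work for a text document: each recipient gets a copy with a slightly different combination of spellings, so a leaked copy points back at whoever received it.  The word pairs, the template and the function names are all invented for this example; real schemes are subtler and harder to strip out.

```python
# Pairs of interchangeable spellings; each pair encodes one bit.
VARIANTS = [("colour", "color"), ("grey", "gray"), ("whilst", "while")]


def mark_copy(template: str, recipient_id: int) -> str:
    """Fill the template's slots using the bits of the recipient's number."""
    words = [pair[(recipient_id >> i) & 1] for i, pair in enumerate(VARIANTS)]
    return template.format(*words)


def identify_copy(leaked: str) -> int:
    """Read the bits back from whichever spellings appear in a leaked copy."""
    recipient_id = 0
    for i, (_, alternative) in enumerate(VARIANTS):
        if alternative in leaked:
            recipient_id |= 1 << i
    return recipient_id


template = "The {0} of the {1} sky changed {2} we watched."
copy_for_recipient_5 = mark_copy(template, 5)
print(copy_for_recipient_5)
print(identify_copy(copy_for_recipient_5))
```

Three word pairs only distinguish 8 recipients, which fits the point above that canaries are mostly useful when few people have access to the data.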

No one has absolute privacy.  There are many ways in which your privacy may be intentionally violated.  For instance private detectives can be employed to follow you, public records mined for information or bribery used to obtain sensitive information.  We shouldn't try to get any absolute guarantees of privacy because we know it to be impossible.  In practice maintaining privacy is a matter of raising the cost of violating privacy to the extent that it is not worth the effort for the eavesdropper.

What matters is the cost of access to private data, the people who can access it, how easy it is to trace them and how susceptible the data is to abuse.  The problem with storing masses of credit card details in centralized databases is not that the information needs to be private but that the cost of stealing each record is lower by a kind of mass-stealing economy.  If only 2 or 3 people have access to a confidential file and anonymous blackmail threats are made then there is already a ready made shortlist of suspects available.  If furthermore there are telltale mistakes in the blackmailer's threat (because canaries have been used) then the perpetrator may be identifiable.  Finally the ease with which data can be abused matters.  There are sometimes alternative methods to store information and some may be less prone to abuse than others.

So taking all this into account we should worry when:

1) Data is stored in central databases

The more data in one place the cheaper it is to illegally access that data (per record)

2) This Data is in a computer readable format

Working through masses of data by hand takes many more resources than if that process can be automated.  As an example a supermarket's credit card database is computer readable but Google Street View is not (in any useful way).

3) Data is in abusable format

Records of transactions are necessary but these records can be stored in a manner which doesn't expose people's bank accounts to fraud.

4) The data is sensitive

Data about what music you like is not as open to abuse as data identifying which whores you've been visiting.

5) Many people have access to the data

It stands to reason that the more people have access to records the easier it is to trick/bribe them and the more likely it is that there are bad apples.

6) It's hard to trace the source of a leak

Clearly the easier it is to identify abuse the easier it is to discourage it.

7) The value of the data is high

A database containing movie preferences is much less valuable than one containing details of police cautions.  The second one needs much better protection than the first.

I will now give some examples of what cryptography could be used for.  Firstly it is possible to have electronic voting systems which are private but for which everyone involved in the process can count the votes themselves.  Unfortunately no system that is currently in operation uses the necessary technology.  Hence I am not against electronic voting in principle but I oppose all systems currently used.

Secondly any interaction that can be thought of as a sort of game with hidden information (such as a game of poker or a financial transaction) can be implemented using cryptography in such a way that the information can be hidden (such as the face of a card) and yet, when it is revealed, the information is still known to be correct (revealing your hand).

Thirdly identity cards are possible which allow you to prove that you are a member of some group (such as non-terrorists or over 18s) without identifying who you actually are.  It is possible to do this without making the system more prone to abuse by terrorists or underage drinkers! Please note that the UK government's proposals for ID unbelievably do not use such technology.
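The simplest building block for this sort of hidden-then-revealed information is a commitment, sketched below in Python using a hash function.  You publish the commitment (which gives nothing away about the card), and later reveal the card together with a random value so that everyone can check you have not changed your mind.  This is only a sketch under my own naming; a full poker protocol needs a good deal more than this.

```python
import hashlib
import hmac
import secrets


def commit(value: str):
    """Return (commitment, nonce).  Publish the commitment, keep the nonce."""
    # The fixed-length random nonce stops anyone guessing the value by
    # hashing every possible card and comparing.
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + value.encode("utf-8")).digest()
    return commitment, nonce


def reveal_is_valid(commitment: bytes, value: str, nonce: bytes) -> bool:
    """Check that a revealed value and nonce match the earlier commitment."""
    expected = hashlib.sha256(nonce + value.encode("utf-8")).digest()
    return hmac.compare_digest(commitment, expected)


commitment, nonce = commit("ace of spades")
print(reveal_is_valid(commitment, "ace of spades", nonce))
print(reveal_is_valid(commitment, "two of clubs", nonce))
```

The player cannot later claim a different card because that would require finding a second input with the same hash.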

The benefits of modern cryptography then are (A) that pseudo-anonymity is possible and can be used to prove facts such as your age or your criminal status without revealing any other information, (B) that signature schemes allow proofs of transactions without increasing the risk of fraud and (C) that cryptographic protocols are stricter instruments of public policy than laws in that they can (subject to our assumptions) be mathematically proven to prevent abuses.  One of the many failings of modern liberal democracies is a failure to put our understanding of cryptography to work to provide these benefits and a failure to recognize the need for cryptographic solutions to provide privacy for the public, data for the government and intelligence for the police.

There are problems with cryptography too though.  Cryptographic protocols take time to perform but as computers get faster this objection becomes weaker and weaker.  Cryptographic protocols are brittle and not easy to adapt to new usage patterns.  I think that it is better to live with this than to risk the massive privacy violations that will occur without it.

05 April 2009

Coming soon...

Sorry there was no post last weekend.  This week's post should be up soon and is likely to be about the false dichotomy between anonymity and transparency.  Otherwise there are further posts on vegetarianism, utilitarianism and corruption as a universal sociological/biological law.