Thursday, January 22, 2009

Principal Components of Principles

Alix and Jennie have resurrected the peculiar, fascinating 2005 political compass, the one that suggests some odd-looking criteria for defining left and right.

I come out pretty close to Alix: left on the main axis of internationalism/rehabilitation, and right on the minor axis of free trade and war. I was against the war, so I must be really keen on free trade.

OK, the fascinating thing about this survey is that these axes were not chosen by the authors, but arose as a statistical outcome of the actual answers actual voters gave to the questions. Let me try to explain this a little bit.

Let's say you have 1000 responses to a survey of 20 political questions with simple for/against answers. What principal components analysis does is find a formula for a single number (in fact a weighted sum of a person's answers) that would let you predict that person's answers to the whole survey, given no other information, with the greatest possible accuracy.
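Here is a minimal sketch of that idea in Python with NumPy. The data are made up, not the compass survey's: 1000 simulated respondents answer 20 for/against questions, coded +1/-1, with each answer driven by one hypothetical hidden attitude plus noise. The "formula" is the first principal component, one weight per question, and a person's number is the weighted sum of their centred answers. PCA chooses the weights so that score times weights reconstructs the centred answers with the smallest possible squared error, which is a more precise version of "greatest possible accuracy".

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated survey (an assumption, not real data): 1000 respondents x 20 questions,
# answers coded +1 (for) / -1 (against), each driven by one hidden attitude plus noise.
n_people, n_questions = 1000, 20
attitude = rng.normal(size=n_people)
question_sign = rng.choice([-1.0, 1.0], size=n_questions)  # which side each question leans
noise = 0.5 * rng.normal(size=(n_people, n_questions))
answers = np.sign(np.outer(attitude, question_sign) + noise)

# Centre each question's answers, then take the SVD; rows of vt are the components.
centred = answers - answers.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)

weights = vt[0]              # the "formula": one weight per question, unit length
scores = centred @ weights   # each respondent's position on the main axis

# Note: the overall sign of a component is arbitrary, so "left" and "right" may be flipped.
print(np.round(weights, 2))
print(np.round(scores[:5], 2))
```

The weights come out with roughly equal size and the same signs as the hidden question leanings (up to a global flip), because in this simulation every question is driven by the same attitude.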

And it turns out that this number reflects opinions on hanging/flogging and nationalism. So this is the principal principle axis. What this means is that if you know somebody is, say, -4 on this axis, you can predict their answers to the whole survey with, say, 80% accuracy.
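A rough sketch of the prediction step, again on simulated +1/-1 answers rather than the real survey (the seed, the noise level and the single hidden attitude are all assumptions): rebuild each answer from one score per person, round it to for or against, and count how often that matches. For comparison, the baseline guesses each question's majority answer with no information about the person.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated survey (an assumption): one hidden attitude drives 20 answers coded +1/-1, plus noise.
n_people, n_questions = 1000, 20
attitude = rng.normal(size=n_people)
question_sign = rng.choice([-1.0, 1.0], size=n_questions)
noise = 0.5 * rng.normal(size=(n_people, n_questions))
answers = np.sign(np.outer(attitude, question_sign) + noise)

# First principal component of the centred answers.
mean = answers.mean(axis=0)
centred = answers - mean
_, _, vt = np.linalg.svd(centred, full_matrices=False)
weights = vt[0]
scores = centred @ weights

# Predict every answer from one number per person, then round to for (+1) / against (-1).
predicted = np.sign(mean + np.outer(scores, weights))
accuracy = np.mean(predicted == answers)

# Baseline with no information about the person: guess each question's majority answer.
majority = np.where(mean >= 0, 1.0, -1.0)
baseline = np.mean(majority == answers)

print(f"baseline: {baseline:.2f}, one number: {accuracy:.2f}")
```

The exact percentages depend entirely on the made-up noise level; the point is only that one well-chosen number beats guessing by a wide margin when the answers share an underlying attitude.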

Nobody chose hanging/flogging/internationalism - it is just that this formula and this number give you the best predictive power. Some other formula might reflect different priorities, but only give 75% accuracy.

It gets a bit more complicated with two axes, but the principle (the principle of the principal principle?) remains the same. With a second formula and a second number, you can raise the predictive power to, say, 90%. A number reflecting views on war/free trade does this better than any other.
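To see a second axis adding predictive power, here is a sketch on a hypothetical survey with two independent hidden attitudes, each driving 10 of the 20 questions. Instead of a percent-correct figure, it reports the share of squared variation in the centred answers that the first one and first two components reconstruct, which is the quantity PCA actually maximises.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated survey (an assumption): two independent hidden attitudes, each driving 10 questions.
n_people, n_questions = 1000, 20
attitudes = rng.normal(size=(n_people, 2))
which = np.repeat([0, 1], n_questions // 2)   # questions 0-9 follow attitude 0, 10-19 attitude 1
question_sign = rng.choice([-1.0, 1.0], size=n_questions)
latent = attitudes[:, which] * question_sign
answers = np.sign(latent + 0.5 * rng.normal(size=(n_people, n_questions)))

centred = answers - answers.mean(axis=0)
s = np.linalg.svd(centred, compute_uv=False)  # singular values, largest first

# Cumulative share of squared variation captured by the first k axes.
explained = np.cumsum(s**2) / np.sum(s**2)
print(f"one axis:  {explained[0]:.2f}")
print(f"two axes:  {explained[1]:.2f}")
```

Adding a component can never make this reconstruction worse, so the second number always helps at least a little; how much it helps depends on how much shared structure is left after the first axis.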

What does this mean? Well, it means that the data is telling us that views belonging together on an axis are highly correlated with each other. Supporters of hanging and flogging tend to be nationalist, and supporters of free markets tend to be pro-war. It also tells us how much the parties overlap, and therefore how difficult it is for any party to position itself clearly without alienating a lot of supporters.

OK, that's the good and the interesting. What's wrong with this picture? Well, for starters, there were only 1 or 2 questions on war and 6 on economics. Having 6 similar questions means that those 6 are 30% of the data (6 out of 20) and will almost certainly be reflected in a principal component. War may have found itself tacked on as a statistical artefact, but I suggest there should be 6 questions on war too before we read too much into this correlation.

Similarly, there were 6 questions on inter/nationalism (another 30% of the data) and two on crime. So nationalism is going to come out of the analysis, picking up one or two more questions in the same way.
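The block effect is easy to reproduce in a simulation. This sketch assumes a hypothetical 20-question survey: 6 near-duplicate "economics" questions sharing one attitude, 6 near-duplicate "nationalism" questions sharing another, and 8 assorted questions (war among them, say) answered by coin flip. It prints how much of each of the first two axes' unit-length weight falls on the 12 block questions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated survey (an assumption): two 6-question blocks of near-duplicates,
# plus 8 assorted questions that share no attitude at all (pure coin flips).
n_people = 1000
econ, nation = rng.normal(size=(2, n_people))
econ_qs = np.sign(econ[:, None] + 0.5 * rng.normal(size=(n_people, 6)))
nation_qs = np.sign(nation[:, None] + 0.5 * rng.normal(size=(n_people, 6)))
other_qs = rng.choice([-1.0, 1.0], size=(n_people, 8))
answers = np.hstack([econ_qs, nation_qs, other_qs])  # columns 0-11 are the two blocks

centred = answers - answers.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
top_two = vt[:2]  # weights for the first two axes

# Fraction of each axis's squared weight that sits on the 12 block questions.
block_share = np.sum(top_two[:, :12] ** 2, axis=1)
print(np.round(block_share, 3))
```

Values close to 1 mean the two axes are essentially the two blocks. The labels are just the topics that happened to be asked about six times; rename the blocks "electoral reform" and "income tax" and the arithmetic is identical.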

So I think a safer interpretation is that there are two components, or groupings, by which answers are predictable: 1. nationalism and 2. economics, in a survey where nearly a third of the questions are on nationalism, nearly a third are on economics, and the rest are an assortment of other topics.

Oh. That's not exactly surprising, is it? If we had a survey with 6 questions about electoral reform, 6 about income tax and 8 assorted others, I wonder what the two axes would be labelled then?