Have you ever noticed how CNN ‘calls’ a US election for a candidate or party, and the call does come true? Have you ever wondered why (practically) every opinion or exit poll in India generally, and in Tamil Nadu specifically, is so wrong? No one seems to get it right. We saw it in the TN Assembly Elections 2011, where opinion polls conducted within days of each other gave diametrically opposite projections.
So, what exactly is going on? As I blogged almost 2 months ago (has it been so long?), there are four factors affecting opinion and exit polls in India, namely:
- Apprehensive voters: reluctant to ‘truthfully’ disclose their voting intention;
- Lack of a decent database: the vote share of the various parties;
- Inability to gauge the swing: the voter’s mood or intention;
- Inability to gauge the ‘anti-incumbency’ factor: the amount of swing.
Apart from the above, there are other 'minor' considerations: adequate sample size, frequent sampling, concise data collection, and 'bias'.
The lack of a decent vote-share database is not a problem in, say, the USA, where 72% of people are registered (to vote) as Democrats or Republicans. Even among the remaining 28%, people identify as Democrats, Republicans or lean-Dem/GOP.
In Tamil Nadu, it is common for a person to hold a membership card in two or more parties! The most glaring example is the Tenkasi (222) candidate and actor Sarath Kumar, who is the leader of his own party but pulled out an ADMK membership card when challenged while filing his nomination!
But, you may ask, why does a correct, decent vote share of the various parties matter? Because the initial dataset, or ‘initialization’ to IT people, is the most crucial factor in deciding the accuracy of an opinion poll projection, just as in many real-life situations. Let us see 2 examples of how wrong initialization can lead to errors, even to the extent of a face palm.
The first face palm is mine, and it occurred just about 2 years ago, during the All India General Elections 2014. While I correctly predicted that the voter would flip the 2009 vote share omelette around (BJP 19% versus Congress 29%), the tally of seats was stunning: the BJP surged from 116 to 282, and the Congress slumped from 206 to 44. This occurred because my initial data set was wrong. I did not load the correct data, which was that the BJP’s votes are concentrated in pockets, whereas Congress voters are spread all over India. Hence, for a similar percentage of votes, the BJP would gain more seats.
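To see why concentration matters, here is a toy sketch in Python. The 10-seat map, the 100 voters per seat and the party names ("Concentrated", "Spread", "Others") are all made up for illustration; this is not real election data, only a minimal first-past-the-post tally under those assumptions.

```python
from collections import Counter

def tally(constituencies):
    """Return (vote_share, seats) for a list of {party: votes} dicts,
    where each seat goes to the party with the most votes in it."""
    votes = Counter()
    seats = Counter()
    for result in constituencies:
        votes.update(result)                      # add this seat's votes to the totals
        winner = max(result, key=result.get)      # first-past-the-post winner
        seats[winner] += 1
    total = sum(votes.values())
    share = {party: count / total for party, count in votes.items()}
    return share, seats

# Hypothetical map: "Concentrated" polls heavily in 5 seats and nothing elsewhere,
# while "Spread" polls the same 30 votes in every seat.
strongholds = [{"Concentrated": 60, "Spread": 30, "Others": 10}] * 5
elsewhere = [{"Concentrated": 0, "Spread": 30, "Others": 70}] * 5

share, seats = tally(strongholds + elsewhere)
for party in share:
    print(f"{party}: {share[party]:.0%} of votes, {seats[party]} seats")
```

In this made-up map both parties poll 30% overall, yet the concentrated one takes 5 seats and the spread-out one takes none. A projection that starts only from the overall vote share cannot see this difference.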
The second example is even more empirical. The 2 major weather prediction models in the world are the USA’s NOAA-NCEP Global Forecast System (GFS) and the model of the European Centre for Medium-Range Weather Forecasts (ECMWF). Surprisingly, the ECMWF model has been more accurate than the GFS model. After monkeying around, researchers finally found what was probably wrong with the GFS accuracy: the initialization data. When GFS was run with ECMWF initialization data, the forecasts became more accurate.
So initial data matters, and is in fact crucial for accurate projections, whether in weather or in elections.
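The same point can be shown with a toy model in Python. This is emphatically not GFS or ECMWF: it is the logistic map, a textbook one-line model chosen here only as an assumption-laden stand-in, and the starting values (0.3, 0.31, 0.3001) are arbitrary. The model is identical in every run; only the initial data differs.

```python
def forecast(x0, steps, r=3.9):
    """Run the logistic map x -> r * x * (1 - x) for `steps` steps from x0.
    A stand-in 'forecast model'; r=3.9 is an arbitrary illustrative choice."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

true_start = 0.3       # the 'real' initial state (hypothetical)
poor_start = 0.31      # badly measured initial data
good_start = 0.3001    # better measured initial data

for steps in (1, 5, 10, 20):
    truth = forecast(true_start, steps)
    poor_error = abs(forecast(poor_start, steps) - truth)
    good_error = abs(forecast(good_start, steps) - truth)
    print(f"step {steps:2d}: error with poor init {poor_error:.6f}, "
          f"with good init {good_error:.6f}")
```

Running it prints how far each run drifts from the 'true' run as the forecast horizon grows. The model never changes between runs, so any difference in the errors comes from the starting data alone.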
Here is my estimate of the Basic Vote Share (also called vote bank) of the various parties in TN Assembly Elections 2016.