CSIR elections prediction system showed accurate results in recent polls

30th August 2016 By: Keith Campbell - Creamer Media Senior Deputy Editor

The Council for Scientific and Industrial Research (CSIR) has enjoyed considerable success in predicting the outcomes of the recent local government elections during the period between the closing of the polls and the announcement of the final results, a period that is crucial for the media and for the political analysts being interviewed. “We were accurate to within 1% of the final result, in almost every municipality,” reported CSIR institutional planning head Dr Zaid Kimmie.

“We are trying to bridge the information gap in the early hours after voting has concluded in an election, when you only have early results coming out, which will not reflect the final result. We’re trying to predict that final result,” he explained. “We also want to show how useful mathematics and statistics are.”

The CSIR developed special algorithms for this purpose. They were originally created by a small team led by now-retired CSIR nuclear physicist Jan Greben and have since been incrementally improved.

The algorithms are run on a standard desktop computer, but a laptop could be used instead. Indeed, the CSIR election prediction team jokes that the algorithms could be run on a smartphone, if someone developed an app for them. “It’s not computationally complex,” explained Kimmie.

The algorithms do not make use of poll data. The process simply extrapolates from districts that have reported their results to those that have not. In practice, however, this is more complicated than the principle suggests.

“We use some quite fancy maths,” Kimmie pointed out. “We split the country into little groups of voting districts and we extrapolate within these groups.” To give a simplified example, districts that have previously voted for the Democratic Alliance would be grouped together, and the outcomes for those that reported their results early would be extrapolated to those that had not yet reported. In reality, the districts and clusters are defined in a more sophisticated way than this, including by use of socioeconomic factors.
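
The simplified example above can be illustrated with a short sketch. This is not the CSIR's algorithm, whose details are not given here; it is a minimal illustration of extrapolating within groups, under assumptions that are not taken from the article: each district already carries a group label, the number of registered voters per district is known, and unreported districts in a group are assumed to vote in the same proportions and at the same turnout as the reported ones. The function name and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def project_totals(districts):
    """Project final vote totals per party by extrapolating within clusters.

    Each district is a dict with hypothetical keys:
      cluster    - label of the group the district belongs to
      registered - number of registered voters (assumed to be > 0)
      votes      - {party: count} if reported, or None if not yet reported
    """
    reg_reported = defaultdict(int)  # registered voters in reported districts
    reg_pending = defaultdict(int)   # registered voters in unreported districts
    cluster_votes = defaultdict(lambda: defaultdict(int))

    for d in districts:
        if d["votes"] is None:
            reg_pending[d["cluster"]] += d["registered"]
        else:
            reg_reported[d["cluster"]] += d["registered"]
            for party, n in d["votes"].items():
                cluster_votes[d["cluster"]][party] += n

    totals = defaultdict(float)
    # Clusters with no reported districts are skipped: there is nothing
    # to extrapolate from yet.
    for cluster, votes in cluster_votes.items():
        # Assumption: pending districts behave like reported ones in the
        # same cluster, so reported votes are scaled up by registered voters.
        scale = 1 + reg_pending[cluster] / reg_reported[cluster]
        for party, n in votes.items():
            totals[party] += n * scale
    return dict(totals)
```

Scaling each group separately is what distinguishes this from a naive national extrapolation: if districts of one kind tend to report earlier than others, the early national tally is skewed, whereas the projection within each group is not affected in the same way.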

“This is intended as a serious aid to the media,” he observed. “We try and give good information early on, to help the analysts. And it’s nice to be first with the results!”