Pundits Vs. Machine: Predicting Controversies In The Presidential Race

From NPR:

The Computer

The computer is run by Quid, a data analytics firm that uses proprietary software to search, visualize and analyze text. Since the computer can’t speak, Dan Buczaczer, Quid’s head of marketing, is going to speak for it and explain how it “thinks.”

“Quid uses proprietary software to search, visualize, and then analyze massive amounts of text,” Buczaczer says. “In this case, what we’re talking about today, that massive amount of text happens to come from news sources and blogs written about a particular topic, in this case the presidential election.”

And Buczaczer is talking about some 300,000 U.S. blogs and publications amounting to nearly 7.5 million articles — everything that has been written about Hillary Clinton and Donald Trump since they announced they were running for president.

Quid’s computers sifted through all that coverage to find every controversy that has plagued each candidate. “Two-thirds of them were Donald Trump, one-third Hillary Clinton,” Buczaczer says. “So Trump is the winner in terms of overall number of controversies generated.” Though Clinton had fewer controversies, they still generated as much coverage as Trump’s.

The computer sifted through all of them for patterns, such as which ones kept reappearing.

“We kind of mapped it against both reoccurrence and importance — what sort of an impact did it have at its peak?” Buczaczer says. “In a lot of ways this was probably the heaviest part of computation around what we think is going to show up again and again for each candidate.”
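Quid's software is proprietary and the article does not describe how it computes these measures, so the following is only a rough sketch of what scoring a controversy on "reoccurrence" and peak "importance" could look like. It assumes each controversy comes with a list of daily article counts, treats a run of consecutive days at or above a threshold as one burst of coverage, and uses the largest single-day count as the peak. The names `count_bursts`, `score`, `rank`, and the threshold value are all hypothetical, and the numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ControversyScore:
    name: str
    recurrence: int  # number of separate bursts of coverage
    peak: int        # largest single-day article count


def count_bursts(daily_counts, threshold):
    """Count separate runs of consecutive days at or above the threshold."""
    bursts = 0
    in_burst = False
    for count in daily_counts:
        if count >= threshold:
            if not in_burst:
                bursts += 1  # a new burst starts on this day
            in_burst = True
        else:
            in_burst = False
    return bursts


def score(name, daily_counts, threshold=10):
    """Score one controversy on recurrence and peak impact (assumed measures)."""
    return ControversyScore(
        name=name,
        recurrence=count_bursts(daily_counts, threshold),
        peak=max(daily_counts, default=0),
    )


def rank(scores):
    """Order controversies by recurrence first, then by peak impact."""
    return sorted(scores, key=lambda s: (s.recurrence, s.peak), reverse=True)
```

For example, with invented counts, `score("topic A", [0, 12, 15, 3, 0, 20, 1])` sees two bursts (days 2 to 3 and day 6) and a peak of 20 articles. Ranking by recurrence before peak is one arbitrary choice among many; a real system would likely weigh the two differently.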

Buczaczer and his computer have done some forecasting about which controversies will get the most coverage for each candidate between Sept. 12 and Oct. 12. But I’m not going to tell you what they …

