How This Algorithm Detected The Ebola Outbreak Before Humans Could
When an infectious disease starts spreading, it seldom takes its time. And when that infection is called Ebola, any delay in halting its spread can take a very real toll in human lives. The trouble, of course, is that it takes time for people to even figure out that an outbreak has occurred. Thankfully, machines are getting smarter.
Nine days before the World Health Organization announced the African Ebola outbreak now making headlines, an algorithm had already spotted it. HealthMap, a data-driven mapping tool developed out of Boston Children’s Hospital, detected a “mystery hemorrhagic fever” after mining thousands of web-based data sources for clues.
“We’ve been operating HealthMap for over eight years now,” says cofounder Clark Freifeld. “One of the main things that has allowed it to flourish is the availability of large amounts of public event data being accessible on the Internet.”
Those data sources include news reports, social media, international health organizations, government websites, and even the personal blogs of health care workers operating in affected areas. The team’s custom-built web crawler traverses RSS feeds and APIs, analyzing the text from these content sources for disease-related terminology and clues about geography.
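To make that text analysis concrete, here is a toy Python sketch. It is not HealthMap's code, and the article doesn't describe the crawler's internals. It reads items from an RSS 2.0 document and tags each one with any disease terms and place names found by whole-word matching. The feed contents, the two tiny vocabularies, and the helper names `read_items`, `find_terms` and `tag_item` are all hypothetical. A real system would need far larger term lists and proper geocoding.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature vocabularies; a real crawler would need far larger
# curated term lists and a proper geocoder for place names.
DISEASE_TERMS = {"ebola", "hemorrhagic fever", "cholera", "measles", "outbreak"}
PLACE_NAMES = {"guinea", "liberia", "sierra leone", "conakry"}

# A made-up RSS 2.0 document standing in for one fetched feed
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Mystery hemorrhagic fever kills dozens in southeastern Guinea</title></item>
  <item><title>Local team wins cup final</title></item>
</channel></rss>"""


def read_items(feed_xml: str) -> list[str]:
    """Return the title and description text of every <item> in an RSS feed."""
    root = ET.fromstring(feed_xml)
    texts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        texts.append(f"{title} {description}".strip())
    return texts


def find_terms(text: str, vocabulary: set[str]) -> list[str]:
    """Return the vocabulary entries that occur in text as whole words, ignoring case."""
    lowered = text.lower()
    # \b anchors stop "guinea" from matching inside a longer word like "guineafowl"
    return [term for term in sorted(vocabulary)
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]


def tag_item(text: str) -> dict[str, list[str]]:
    """Tag one feed item with the disease terms and place names it mentions."""
    return {
        "diseases": find_terms(text, DISEASE_TERMS),
        "places": find_terms(text, PLACE_NAMES),
    }


# Keep only items that mention at least one disease term
for text in read_items(SAMPLE_FEED):
    tags = tag_item(text)
    if tags["diseases"]:
        print(text, tags)
```

Only the first sample item survives the filter, tagged with "hemorrhagic fever" and "guinea". Simple keyword matching like this is exactly what lets noise through, which is the problem the next step addresses.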
As anyone who’s ever looked at the Internet knows, any bulk consumption of web content is bound to scoop up tons of noise, especially when sources like Twitter and blogs are involved. To cope with this, HealthMap applies a machine learning algorithm to filter out irrelevant information like posts about “Bieber fever” or uses of terms like “infection” and “outbreak” that don’t pertain to actual public health events.
“The algorithm actually looks at hundreds of thousands of example articles that have been labeled by our analysts and uses the examples to pick up on key words and phrases that tend to be associated with actual outbreak reports,” explains Freifeld. “The algorithm is continually improving, learning from our analysts through a feedback loop.”
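Freifeld doesn't say which model HealthMap uses. The sketch below therefore swaps in one common choice for keyword-driven relevance filtering: a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. The six labeled headlines, the class `RelevanceFilter` and its methods are hypothetical. Each call to `add_example` stands in for one analyst label. The feedback loop Freifeld describes would amount to calling it again whenever an analyst corrects a decision.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class RelevanceFilter:
    """Multinomial naive Bayes with add-one smoothing (a hypothetical stand-in model).

    Both classes need at least one labeled example before classifying.
    """

    def __init__(self) -> None:
        # Word frequencies and document counts per label (True = real outbreak report)
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def add_example(self, text: str, label: bool) -> None:
        """Record one analyst-labeled example; later corrections use the same call."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list[str], label: bool) -> float:
        """Log prior plus summed log likelihoods of the words under one label."""
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing out the score
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def looks_like_outbreak(self, text: str) -> bool:
        """True if the outbreak label scores higher than the noise label."""
        words = tokenize(text)
        return self._log_score(words, True) > self._log_score(words, False)


# Hypothetical analyst-labeled headlines: True = outbreak report, False = noise
LABELED = [
    ("Health ministry confirms hemorrhagic fever cases in Guinea", True),
    ("Cholera outbreak reported in refugee camp, clinics overwhelmed", True),
    ("Patients hospitalized with fever and bleeding, samples sent to lab", True),
    ("Fans catch Bieber fever at sold-out concert", False),
    ("Outbreak of laughter as comedy show goes viral", False),
    ("New malware infection spreads across office computers", False),
]

clf = RelevanceFilter()
for text, label in LABELED:
    clf.add_example(text, label)

print(clf.looks_like_outbreak("Hemorrhagic fever cases confirmed in Guinea"))  # True
print(clf.looks_like_outbreak("Bieber fever sweeps the concert crowd"))  # False
```

Naive Bayes is used here only because it is small and transparent. The stored word counts show directly which words push an item toward the outbreak class, such as "hemorrhagic" and "guinea", and which push it toward noise, such as "bieber" and "concert". With just six examples the model is fragile. The point is the mechanism, not the accuracy.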
The latest string of Ebola infections became public knowledge on March 23, when the World Health Organization issued its first report on it. Since then, the outbreak, which appears to have started with a 2-year-old boy in Guinea, has spread to other countries in West Africa and killed more than 1,000 people.
By that point, HealthMap had already picked up on the spread of the virus, even if it hadn’t been identified as Ebola yet. In this case, the automated detection of the disease didn’t help stem the outbreak, but the promise of such machine intelligence is hard to deny.
In addition to the breadth of the content available online, Freifeld credits the “availability of inexpensive Internet hosting and computation resources” with allowing HealthMap to crunch and store so much data. Only a few years ago, working at that scale would have been much harder to afford. And with big data and machine intelligence still such young fields, one can only imagine where technology like this is headed.
In the short term, the team behind HealthMap is busily working on improving its filtering algorithms and adding new sources of data, one of which is decidedly old-school.
“We allow anyone, anywhere in the world, to submit a direct report of an outbreak event, and as more people become connected,” Freifeld says, “it opens even greater possibilities.”