Big Data and Crowdsourcing: Keys to a Cancer Cure

WIRED Brand Lab at WIRED

At any given moment, scores of cancer-related studies and research trials are ongoing. But all too often, the treatment models suggested by one group of doctors run counter to what another group found.

For patients, the conflicting feedback can be confusing and frustrating. For doctors and pharmaceutical companies, it’s costly because exploring disparate paths can devour resources.

“The current system is pretty unsustainable in terms of money spent on drug development and how we work as a community,” says Jim Costello, assistant professor in the department of pharmacology at the University of Colorado’s Anschutz Medical Campus. “We’re at the stage where information technology and big data are really coming together and pushing things forward.”

To bring researchers together as a community against a common foe (rather than silo them in their own efforts), Costello and others are leveraging the power of big data and online crowdsourcing. These tools have the potential to change the way the medical world approaches cancer studies by connecting scientists around the world to perform unbiased, all-encompassing cancer research.

Big Data Powers a Cure

In November, Costello—in conjunction with Sage Bionetworks and the DREAM organization, a non-profit devoted to leveraging crowdsourcing to find better computational models—published the first results from a crowdsourced, big data-driven project that asked data scientists to develop a more accurate prognosis method for metastatic castration-resistant prostate cancer.

The study worked a bit like a contest: Costello’s group provided entrants (data scientists) with data pooled from a number of independent trials, in hopes that, with the data in one place, these researchers could finally glean some meaningful information.

Getting all the data in one place is certainly easier said than done, though. The dataset for the prostate cancer challenge was formidable, with data from about 2,300 patients.

For another crowdsourced study, DREAM compiled over 640,000 images from 81,000 patients—about 10 TB of data. For some context, that’s the equivalent of streaming HD video for 210 hours straight.

The technology required to house the massive amounts of data for these sorts of studies is substantial as well. DREAM receives assistance from different cloud storage providers so data scientists can easily access and mine the data. DREAM and Sage Bionetworks also use Sage’s own cloud-based platform as a portal, through which teams working on a challenge can build upon one another’s work to evolve more accurate predictive models.

Keeping Sensitive Data Confidential

The search for cancer solutions in big data is hardly limited to prostate cancer. DREAM is currently in the midst of another challenge focused on improving the accuracy of breast cancer detection.

“We’re asking: Can we improve the classification of digital mammograms to predict cancer?” says Justin Guinney, director of computational oncology at Sage Bionetworks. “Are there elements that can be observed by machine learning—that might not be obvious to the radiologists’ eyes—that might predict cancer a year out?”

And while sharing patient information with a large number of data scientists might raise some fears about patient confidentiality, any identifying information is stripped from the studies before they are delivered to DREAM, Guinney says.
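As a rough illustration of what stripping identifying information can look like, here is a minimal Python sketch. The field names and the sample record are hypothetical, since the article does not say which fields are removed, and real de-identification of medical data involves far more than dropping a few columns.

```python
# Hypothetical list of fields treated as identifying; not taken from DREAM or Sage.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "address", "medical_record_number"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without the fields listed as identifying."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFYING_FIELDS}


# Made-up example record, for illustration only.
record = {
    "name": "Jane Doe",
    "date_of_birth": "1950-01-01",
    "medical_record_number": "000000",
    "age_group": "60-69",
    "lab_value": 12.4,
}

shared = strip_identifiers(record)
print(shared)
```

Only the fields not on the list survive in the copy that would be passed along; the original record is left unchanged.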

The Future of Cancer Studies Is in the Data

Big data cancer studies have already produced some tangible benefits. Foremost, perhaps, is the elimination of the self-assessment trap some researchers get caught in. The trap works like this: a single group comes up with the challenge it wants to study, then devises a solution based on data it selects, creating the potential for unintentional bias.

“Currently there’s an idea that you’re judge, jury and executioner of everything to do with your methodology,” Costello says. “We take an unbiased assessment of these predictors and have them evaluated by an independent party.”
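The independent evaluation Costello describes can be pictured with a small Python sketch: an organizer keeps patient outcomes hidden, entrants submit only predicted risk scores, and the organizer scores every submission the same way. The team names, the numbers and the choice of a simple pairwise concordance score are all assumptions made for illustration, not the challenge's actual scoring method, and the sketch ignores complications such as patients whose outcomes are not yet known.

```python
from itertools import combinations


def concordance(risk_scores, survival_months):
    """Fraction of comparable patient pairs in which the patient with the
    shorter survival was given the higher predicted risk (ties count half)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(survival_months)), 2):
        if survival_months[i] == survival_months[j]:
            continue  # pairs with identical outcomes cannot be ranked
        comparable += 1
        shorter, longer = (i, j) if survival_months[i] < survival_months[j] else (j, i)
        if risk_scores[shorter] > risk_scores[longer]:
            concordant += 1.0
        elif risk_scores[shorter] == risk_scores[longer]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Held by the organizer only; entrants never see these (made-up) outcomes.
HIDDEN_SURVIVAL_MONTHS = [6, 30, 12, 24]

# Hypothetical submissions: one predicted risk score per patient.
submissions = {
    "team_a": [0.9, 0.1, 0.7, 0.3],
    "team_b": [0.2, 0.8, 0.5, 0.4],
}

leaderboard = {
    team: concordance(scores, HIDDEN_SURVIVAL_MONTHS)
    for team, scores in submissions.items()
}
print(leaderboard)
```

Because no team scores its own model or chooses its own test data, every entry is judged against the same outcomes by the same rule.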

Compiling data from multiple cancer studies for assessment by independent parties is gaining acceptance as a key to treatment breakthroughs. In fact, it was a tenet of the Cancer Moonshot Blue Ribbon Panel, a task force led by former Vice President Joe Biden to combat cancer.

In a 2016 report, the panel called for the development of a National Cancer Data Ecosystem that would allow researchers to mine cancer-related data with the goal of developing new strategies to prevent, diagnose and treat the disease.

“Our ability to accelerate progress against cancer demands that researchers, clinicians and patients across the country collaborate in sharing their collective data and knowledge about the disease,” the panel wrote.

Guinney says that’s the ultimate goal for DREAM.

“We want to build communities of researchers,” he says. “In a way, we’re thinking about a world where data is not siloed and hoarded. It’s being shared by diverse teams who have an interest in improving human health and fighting disease—and there are a lot of people very motivated to help in that project.”

The future of cancer studies lies in data.
