$4.5 million for big-data projects in ecology, astronomy, microscopy

By Robert Sanders

With new computational tricks, Laura Waller hopes to turn simple microscopes into cutting-edge imaging machines.

Joshua Bloom wants to train computers to be surrogate astronomers so they can discover new celestial phenomena within the streaming torrent of data from telescopes.

Laurel Larsen plans to harness environmental data to develop models that could predict potentially disastrous tipping points in streams, wetlands and other ecosystems.

To realize their goals, each of these professors at the University of California, Berkeley, will receive $1.5 million over the next five years from the Gordon and Betty Moore Foundation as part of the foundation’s Data-Driven Discovery Initiative. The initiative, one of the largest privately funded data science programs of its kind, is committed to enabling new types of scientific breakthroughs by supporting interdisciplinary, data-driven researchers.

The three faculty members are among 14 new Moore Investigators in Data-Driven Discovery announced today (Thursday, Oct. 2). Together, they will receive $21 million in unrestricted funds to “harness the unprecedented diversity of scientific data now available and answer new kinds of questions,” according to a statement from the foundation.

“Science is generating data at unprecedented volume, variety and velocity, but many areas of science don’t reward the kind of expertise needed to capitalize on this explosion of information,” said Chris Mentzel, program director of the foundation’s broader, $60 million, five-year Data-Driven Discovery Initiative. “We are proud to recognize these outstanding scientists, and we hope these awards will help cultivate a new type of researcher and accelerate the use of interdisciplinary, data-driven science in academia.”

“It speaks to the strength and breadth of UC Berkeley’s data science efforts that three campus researchers from very different disciplines received this prestigious award. We have a very vibrant data science research community that cuts across the entire university,” said UC Berkeley Vice Chancellor for Research Graham Fleming.

The three grants come in the wake of the Moore Foundation’s support last year of the Berkeley Institute for Data Science (BIDS). BIDS is part of a data science collaboration among three institutions – UC Berkeley, the University of Washington and New York University – supported by $37.8 million over five years from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation.

BIDS announced its first cohort of Data Science Fellows over the summer, and has moved into new quarters in Doe Library that will formally open on Dec. 10.

Laura Waller, an assistant professor of electrical engineering and computer sciences with a background in optics, uses computation to avoid having to deal with big sets of data. Today’s high-end microscopes capture ever more data and require more computer analysis to produce an image, but can take an hour or more to scan a large volume. She wants to capture less data in less time using cheaper equipment, but to do so in a smart way that still allows her to produce images as good as those from expensive optical and X-ray microscopes.

By capturing images from a dozen different angles, often using cheap LED illumination, she’s already able to produce high-magnification images from low-resolution microscopes.

“We can use cheap and dirty optics, but achieve the results of expensive, highly corrected microscopes,” said Waller, who runs the Computational Imaging Lab. “By using data mining ideas to capture only the important data in the image, we can do it in real time.”

“The end goal,” she added, “is an adaptive, real-time system that is constantly crunching and updating what measurements to take next, so we get the most efficient data capture possible.”

Joshua Bloom, a professor of astronomy and head of the campus’s Center for Time Domain Informatics, has pioneered the use of machine learning in astronomy. He and his students teach computers to sift and analyze data, ideally in real time, to pick out anomalies that may signal new and weird cosmic phenomena – from explosive events in space to unusual variable stars. Machine learning techniques he has applied to data from the Palomar Transient Factory have produced 65 papers so far.

As a Moore Investigator in Data-Driven Discovery, Bloom will be able to collaborate with statisticians and computer scientists to explore machine learning more thoroughly, and find ways to expand into other fields, such as particle physics, where large amounts of data are typical.

“As we look down the barrel of this amazing data deluge in astronomy, we are reaching a breaking point where we just can’t have people looking at all the data,” Bloom said. “This is true today with data from the Sloan Digital Sky Survey and the Palomar Transient Factory, but in five years’ time there will be new telescopes with data from different wave bands that no one can look at.”

One question is how much actual physics a machine learning algorithm needs to know. Perhaps finding patterns in the data indicating new stellar phenomena or new elementary particles doesn’t require a computer with a thorough knowledge of physics as much as an ability to recognize patterns in multidimensional space.

“In our data-driven world, we have to, in some sense, give up our physical intuition, and that is very scary and not the way modern scientists have been taught to think. But I think it is the direction we have to head,” he said.

Laurel Larsen, an assistant professor of geography whose background includes civil engineering, hydrology and systems science, has been modeling ecosystems such as the Florida Everglades to understand how local changes in the environment impact larger areas, and how brief events can have long-lasting effects.

“I want to find the dominant feedback processes by which ecosystems evolve and landscapes are shaped and function,” said Larsen, who heads the Environmental Systems Dynamics Laboratory. “The Moore Investigator Award is a wonderful opportunity, now that we are collecting massive amounts of environmental data, to more efficiently discover the time and length scales of critical processes that govern a functioning ecosystem.”

One goal is predicting irreversible and perhaps destructive tipping points. In the Everglades, for example, she has looked at what causes transitions from patterned landscapes with interconnecting waterways ideal for fish migration to sawgrass meadows unsuitable for fish or wading birds.

“There is a very strong possibility that we can use this approach to actually detect catastrophic shifts or critical transitions in the ecosystem before they happen,” she said. Larsen is excited by the possibility of applying these models to entirely new questions, such as what triggers the transition from normal brain function to an epileptic seizure.