Pioneering data science tool — Jupyter — receives top software prize


Fernando Pérez and Brian Granger discuss the architecture of Project Jupyter. Adriana Restrepo photo.

You may not have heard of Jupyter Notebooks, but they’re taking the data science world by storm and becoming the go-to tool for sharing and improving computer code, documents, data visualizations and more.

Students in UC Berkeley’s fastest-growing course, Foundations of Data Science, use the software tool, as do those in the upper-division course Principles and Techniques of Data Science, both offered as part of the campus’s new data science major. Many other universities, and even high schools, employ Jupyter Notebooks to teach not only data science, but also aerodynamics, computer science, statistics, physics and cognitive science, among other subjects.

Industry, likewise, has snapped it up: Jupyter Notebooks are used in daily computation and data analysis at companies such as Microsoft, Google and IBM, which have created hosted services based on Jupyter.

More than 2 million Jupyter Notebooks are hosted on the popular GitHub service, covering technical documentation, course materials, books and academic publications.

Jupyter has even contributed to the scientific collaboration that discovered gravitational waves. The LIGO observatory, whose discovery was recognized with the 2017 Nobel Prize in Physics, publishes Jupyter Notebooks that allow anyone to replicate its original analyses of the black holes and neutron stars that collide and generate these ripples in spacetime.

The international team that developed Jupyter, which was co-led by UC Berkeley’s Fernando Pérez, is now being recognized with the 2018 Software System Award from the Association for Computing Machinery, the world’s largest educational and scientific computing society. Project Jupyter, in turn, evolved from IPython, which Pérez created 17 years ago when he was a graduate student in Colorado studying particle physics.

“One afternoon in late 2001, I was a physics graduate student at the University of Colorado working on my dissertation and decided to spend an afternoon writing the original, tiny version of IPython,” said Pérez, now a UC Berkeley assistant professor of statistics and a faculty scientist in the Department of Data Science and Technology at Lawrence Berkeley National Laboratory. “I could not have imagined that this would grow into a worldwide platform almost two decades later. For me, it’s been a wild ride, made possible by going from a personal exploration to an open collaboration with an incredible team.”

IPython – an interactive add-on to the Python programming language – is a free, open-source platform that provides a unified environment for scientific computing. IPython evolved over the years to meet the needs of various communities and in 2014 was rebranded as Jupyter. In 2015, Pérez and Brian Granger of California Polytechnic State University in San Luis Obispo received $6 million from the Leona M. and Harry B. Helmsley Charitable Trust, Alfred P. Sloan Foundation and Gordon and Betty Moore Foundation to expand and improve the capabilities of the Jupyter Notebook.

With current funding from many companies and the U.S. Department of Energy, the Project Jupyter collaboration continues to develop tools for “human computer interplay for scientific exploration and data analysis,” Pérez said. This includes the next-generation user interface for the Jupyter Notebook, JupyterLab.

“This is a project that has demonstrated 20 years of intellectual contributions with major impact in research, education and industry, and it continues to make its advances available to the world as an open platform,” said Kathy Yelick, a professor of electrical engineering and computer sciences at Berkeley and associate Berkeley Lab director for computing sciences. “The ACM Software System Award is an incredible honor, and this team is entirely deserving of this recognition.”

In addition to Pérez, other members of the Project Jupyter collaboration include Granger and Carol Willing of California Polytechnic State University in San Luis Obispo, Matthias Bussonnier of UC Berkeley, Paul Ivanov and Jason Grout of Bloomberg, Thomas Kluyver of the European XFEL, Damián Avila of Anaconda, Inc., Steven Silvester of JPMorgan Chase, Jonathan Frederic of Google, Kyle Kelley of Netflix, Jessica Hamrick of DeepMind, Sylvain Corlay of QuantStack and Peter Parente of Valassis Digital.

The award and a prize of $35,000 will be presented to the team at the ACM Awards banquet in San Francisco on June 23.


