Opinion, Berkeley Blogs

Public open data: The good, the bad, the future

New technology tools, combined with rising expectations among voters and stakeholders for government transparency, have sparked a movement toward “open government.” Championed by advocacy organizations and a few high-profile elected officials, the trend seeks to promote greater accountability and responsiveness within the systems of representative democracy. An area of particular opportunity, as well as potential concern, is the growing cache of large datasets of public information now available on the Internet.

Government entities from cities to nations are making data not only public but accessible. In the past, such data was often buried in City Hall filing cabinets, released only after Freedom of Information Act requests, or published electronically but in cumbersome formats. Machine-readable formats allow anyone with basic skills and an Internet connection to develop new applications, analyses and visualizations. Datasets from many corners of government are coming online: public health and demographic information, business licenses and property ownership, campaign contributions and expenditures, crime reports, school test scores, and much more.

The Obama White House has been a leader on transparency and launched the website data.gov in 2009. To date, nearly 100,000 datasets are available on the site. Other countries soon followed: the U.K., Kenya, Brazil, India and more than 30 other countries have created portals for public data. The European Union Open Data Portal offers more than 6,000 datasets from its member countries. International organizations from the UN to the World Bank add their own repositories to the surfeit of online information.

The trend is also growing at the state and local levels. Chicago reportedly offers the most public datasets (950) among cities. San Francisco has an extensive open data policy and was one of the first cities in the nation to hire a Chief Data Officer.

The benefit of providing public information in accessible formats is undeniable. When entrepreneurs and citizens develop their own tools, they may foster a sense of community and reduce reliance on outside experts, advantages California Lt. Gov. Gavin Newsom champions in his book Citizenville. Citizens identify the apps that will be of greatest use to them; their work creating designs and prototypes reduces pressure on strained city budgets and sidesteps the cumbersome process of competitive procurement. Some of the most powerful tools combine official public data with social media or other citizen input, such as the recent partnership between Yelp and the public-health departments in New York and San Francisco to display restaurant hygiene inspection ratings. In other contexts, such tools can help uncover and ultimately reduce corruption by making it easier to “follow the money.”

Despite the opportunities offered by “free data,” this trend also raises new challenges and concerns, among them personal privacy and security. While much attention has been devoted to the unsettling power of big data analysis and “predictive analytics” in corporate marketing, similar questions could be asked about the value of public data. Does it strengthen community cohesion that I can find out, with a single query, how much my neighbors paid for their house or, if they work for public agencies, what they earn? Indeed, some studies suggest that greater transparency leads not to greater trust in government but to resignation and apathy.

Exposing certain law enforcement data also increases the possibility of vigilantism. California law, for instance, requires the registration and publication of the home addresses of known sex offenders. Or consider the controversy and online threats that erupted when, shortly after the Newtown tragedy, a newspaper in New York posted an interactive map of gun permit holders in nearby counties.

Mind the ‘big data gap’

Ad hoc apps built on public open data embody all the advantages and pitfalls of volunteer labor: free, committed and passionate, but perhaps unsustainable or unrepresentative of a diverse constituency. Whom does open data omit? What kinds of apps will be created by programmers and designers who have the discretionary time and the tech tools to build them? Policymakers and officials must still mind the “big data gap.”

So what does the future hold for open data? Publishing data is only one part of the information ecosystem. For the data to be useful, tools must also be developed for cleaning, sorting, analyzing and visualizing it. The much-heralded new field of “data science” has applications in industry as well as in the non-profit and government sectors. (UC Berkeley’s School of Information just launched an online master’s degree program to train professionals in this area.) Fluency in parsing and presenting public data is also clearly relevant for journalists, and “data journalism” is a bright spot in the struggling newspaper industry.

For-profit companies and non-profit watchdog organizations will continue to emerge and expand, building on the foundation of this data flood. Public-private partnerships such as those between San Francisco and Appallicious or Granicus, start-ups created by Code for America’s Incubator, and non-partisan organizations like the Sunlight Foundation and MapLight rely on public data repositories for their innovative applications and analysis.

Making public data more accessible is an important goal and offers enormous potential to increase civic engagement. To make the most effective and equitable use of this resource for the public good, cities and other government entities should invest in the personnel and equipment (hardware and software) needed to make it universally accessible. At the same time, Chief Data Officers (or those in equivalent roles) should remain alert to the often hidden challenges of equity, inclusion, privacy and security.

Join the CITRIS Data and Democracy Initiative and UC Berkeley’s Institute for Governmental Studies for a one-day symposium, “Can Open Data Improve Democratic Governance?” The Sept. 12 event is open to the public and will be held on the Berkeley campus. Visit http://tinyurl.com/citris-opendata for agenda and registration.

Cross-posted from PBS MediaShift IdeaLab, Sept. 4, 2013.