Berkeley Talks transcript: Product engineer Amy Heineike on how humans and machines interact with AI

Introduction: It gives me great pleasure to introduce Amy Heineike. She’s the vice president of product engineering at Primer AI. She says that she’s always been interested in using data and algorithms to tell stories and understand the world that we live in. Amy was originally a mathematician and became interested in data that could be used to tell stories that aren’t visible to an individual. This data-driven storytelling theme drew her to Silicon Valley in 2008. She says that she’s had a strange background for someone at a startup, and it’s not something that she planned on, but her curiosity-driven nature led to this path as the vice president of product engineering. She admits that it was a struggle to name the team because their product is an NLP tool, and she describes it as applied machine learning and applied science wrapped with data science into a product.

Primer AI has been able to wrestle with some interesting problems. One area that they’re active in is news data and news cycles. I think we can admit that we’re overwhelmed by content; it’s a pretty big theme and there are a lot of big problems in the world currently. Primer AI models the contrasting narratives that people are telling around global stories using millions of statistical observations about entities and their relationships. Another area that they’re very active in is Wikipedia. Writing summaries by hand and maintaining them is extremely time-intensive, and Primer AI has formulated NLP approaches to creating and maintaining pages.

They also look at how people use Wikipedia and what it means for technology. Another area that Amy is very interested in and works closely on is the makeup of her teams: as Primer AI has grown, she’s focused more and more on diversity. She believes that solving and wrestling with these problems requires a variety of viewpoints and approaches. Her team currently includes a computational chemist, an astrophysicist and a history of computer science graduate.

For her talk, she will focus on an idea that Primer AI had from the very beginning: the challenge of Wikipedia, how we think about humans and machines interacting with AI, and understanding the data and then overcoming the bias that you discover. So please join me in welcoming Amy Heineike, VP of Product Engineering from Primer AI.

Amy Heineike: 02:50 Well thank you so much. That was a really lovely introduction and it’s a real pleasure to be here today. I feel like there’ve been some great conversations and it’s a little bit tricky trying to figure out how to follow such esteemed panelists right now. I have some slides so we’ll figure out how to get them in just a second, but I can give you some introduction to start with. So I’ve been with Primer since it was founded four years ago. The team is now 70 people. We’re based in San Francisco. We’ve been growing quickly. We raised our Series B back in the autumn, and we were founded on an observation: the amount of content that’s available to us to read, the amount of text that we wrestle with, is growing exponentially. There are more and more news articles, reports, filings, documents available that tell us stories about what’s happening in the world that we want to understand. But our time and attention is not growing anything like that, right?

So we work with analyst teams, with people in organizations who need to be experts on the world, who have to develop a view of what’s going on so they can build decisions off of that.

04:14 And their time isn’t growing exponentially either, right? So we see this difference between the content we can consume and what’s available. We call that difference the intelligence gap. And that gap manifests itself in a lot of different ways. So we talk to financial analysts who will very sheepishly admit that they don’t actually read all the 10-K filings that they should be reading, that they skip over analyst reports that, you know, are probably pretty important for their investment decisions. And as individuals, as concerned citizens, we feel this when we read a lot of news about some topic and feel like we know a lot about it, and then we talk to our cousins on the other side of the country and realize that actually our information is completely different from each other’s.

05:02 We have a disjoint view of the world because we’re not actually reading the same content, we’re sampling it in funny ways. So this creates a lot of problems. But I think we’re very excited about the promise of natural language technology to help us grapple with this and help us build tools. So I’m going to talk a bit about a project we’ve been working on over the last year that I think is very interesting to this room. And I’m actually super happy that Victoria is here to kind of correct me if I get anything wrong from the Wikimedia perspective.

So, Wikipedia is an incredibly important resource for us, and I don’t have to make that case. It’s something that we use all the time, but we also use it to train our AI models, right? As machine learning researchers, it turns out it’s one of the most interesting, rich text data sets that we can leverage, with all this linked data and multilingual data. It’s very important to us, and the fact that it’s so important means that the places where there are flaws or failings in it matter a lot to us too, right? And we should think about those and engage with those questions.

So one way to think about how well something performs is the precision and recall framework. Precision is the idea of: the stuff that’s there, how good is it? And on the recall side: what’s missing? It’s actually very clear that Wikipedia does a very good job on precision. So precision and recall is already a useful framing, right? When we think about Wikipedia’s precision, if you think about when you go and read Wikipedia pages, they’re actually incredibly good content.

07:27 They’re incredibly good quality. The grammatical correctness, the references behind the facts that are there, the fact that you can go back and read good source material, and especially for the well-trafficked pages that a lot of people go to, it’s incredibly impressive. And that’s really driven by the editorial processes, the teams of people that are working on this, and a lot of hard, dedicated work from wonderful people basically. When we think about recall, that’s actually an interesting framing because there are some challenges here. There are two kinds of recall we can think of: one of them is the challenge of staleness, and the other is coverage.
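
To make that framing concrete, here is a minimal sketch of how precision and recall are typically computed over a set of facts; the function and numbers are illustrative assumptions, not Primer’s actual evaluation code.

# Illustrative only: precision and recall over sets of facts.
# "produced" is what a page (or system) contains; "relevant" is everything that should be there.
def precision_recall(produced: set, relevant: set):
    true_positives = len(produced & relevant)
    precision = true_positives / len(produced) if produced else 0.0  # of what's there, how much is good?
    recall = true_positives / len(relevant) if relevant else 0.0     # of what should be there, how much made it in?
    return precision, recall

# Example: a page states 8 facts and 7 are correct, while 20 facts belong on the page.
# Precision is 7/8 (very high); recall is 7/20 (the staleness and coverage gap).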

08:18 So this is the page for Alexander Kogan, who’s the academic behind the Cambridge Analytica scandal. This was front-page news for a while, and while it was front-page news this page was written. It’s a very well-written page, it’s very well referenced and backed up. But what’s interesting is that as soon as the story dropped off the front pages, the page basically went stale; nobody went in to update it. The very last reference is from the 26th of April, and following on from that there were actually many major updates to the story that impacted him. Cambridge Analytica filed for bankruptcy and there were follow-on hearings and that kind of stuff. So you might not realize it if you’ve read the page, but there’s more that you could have known.

09:08 The second problem, which Victoria mentioned already, is the coverage challenge. Coverage is actually quite hard to spot; it shows up in the pages that you really expect to be there that actually aren’t. So on the morning that Donna Strickland received the call announcing that she’d won the Nobel Prize in physics, she didn’t have a Wikipedia page. And this is the page that was written after that announcement came out. It’s pretty stunning that this was missing, and it’s really good to hear about the work that was done to try and unpack and think about why, in this particular case, that could have happened, and to figure out how to build tools to help. One thing that’s very clear, though, is that it’s an enormous challenge to think about what could belong in Wikipedia, what information could be there, and how you could keep up with the scale of that.

10:09 How could you spend the time to keep all of these pages up to date, to create them all? At first blush, the number of editors of Wikipedia is absolutely immense, right? Tens of millions of people have made edits to Wikipedia and changed it. But when you actually cut down to the number of dedicated souls who are regularly logging in and making a lot of changes, who are doing all those edits, the numbers get to be a lot smaller. And so three and a half thousand people doing 100 edits a month, that’s an impressive number. It’s a lot bigger than the Encyclopedia Britannica team, I’m sure. But it’s still a bit daunting when you compare that to the number of people who are appearing in the news, the number of new technologies that are getting developed, the number of new companies that are out there.

11:01 And so this is actually a pretty big challenge for us to think about. So what did we do? Well, what we wanted to think about is how we could address and interact with that recall problem. So we built a system called Quicksilver. The idea of Quicksilver is that we could draft a first version of a Wikipedia page, and we could present this to people who wanted to create pages so that it would be quicker for them to then go and do the final draft, the final edits, and publish that. We started with a focus on scientists because I guess we just like scientists a lot. And you can go to this page and have a look. There are examples for a hundred different scientists that you can go and read to see how this is performing.

11:57 We also put up a blog post where we described the methodology in a lot more detail and put up some of the training data. So I’m going to go through and talk about how we did this, and then we’re going to talk a little bit after that about how we think about it. So we started off, as any good machine learning project should, with a big pile of data. Our friends over at the Allen Institute for AI gave us a list of 200,000 scientists who had published recently. We’ve got a corpus of half a billion English-language news stories, and then we got access to the whole of Wikipedia and the sister project Wikidata. So our first challenge was to link these together. A good example of why this is hard is the Michael Jordan of AI, who is actually Berkeley’s very own Michael Jordan.

12:53 If you search for his name, you probably don’t get his research as the first thing back. You have to start adding in other words, like his university or his research interests, and then you can find the wonderful things he’s written. So we had to train the models to be able to do something similar. But once we had, we could make some pretty interesting data to look at: we could build a list of all the scientists, how many news articles we could find about them, and what their coverage was in Wikipedia. And then we could look at some of the missing cases, so, who wasn’t making it into Wikipedia who maybe should have been.
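
To give a flavor of what that linking step involves, here is a simplified sketch of name disambiguation; the scoring function, field names and threshold are hypothetical, not the actual Quicksilver implementation.

# Hypothetical sketch: score a news article against what we know about a scientist,
# so that "Michael Jordan" the Berkeley professor isn't conflated with the athlete.
def link_score(article_text: str, scientist: dict) -> float:
    text = article_text.lower()
    score = 0.0
    if scientist["name"].lower() in text:
        score += 1.0
    if scientist["affiliation"].lower() in text:
        score += 2.0  # an affiliation match is strong evidence
    score += sum(0.5 for kw in scientist["keywords"] if kw.lower() in text)
    return score

candidate = {"name": "Michael Jordan",
             "affiliation": "UC Berkeley",
             "keywords": ["machine learning", "statistics"]}
# Keep only the articles whose score clears a (hypothetical) threshold:
# linked_articles = [a for a in articles if link_score(a, candidate) >= 2.5]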

The next thing we could do is go back and start to train some models. We could look at all the scientists on Wikipedia for whom we could find news and for whom we had a page, and we could build a model of how the content in the news maps to the content that’s put into Wikipedia. And then we could apply that model to the people who had the news but didn’t have the Wikipedia page. So here are a couple of examples of what it looks like. This is the page for Karen Lips. She’s a researcher on frogs; she’s done some incredible research. There are a few things going on on this page that I can point out. First of all, we could structure some biographical information about her, so we could go and look for a field of study, institutions, awards she may have had.

14:35 And secondly, we could build a model that looks for the rich kind of biographical information that was talked about in the news articles and surface that to make very rich paragraphs of information. And then thirdly, we could bring in the idea of an event stream: of all the news content that was coming out, we can highlight the most recent stories, because they may be the things that I’m most likely to have missed, that need to be brought in to keep the page fresh. So here’s another example: Andrej Karpathy is the director of AI for Tesla, and amazingly, when we started this project he also didn’t have a Wikipedia page; that’s subsequently been built. Again, those three components are here. And we also put in full references. If you go to the Quicksilver page, the references are all formatted nicely, so they’re easy to copy and paste into a Wikipedia editor.
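
As a rough sketch of how those three components plus references might hang together in a draft, here is an illustrative data structure; the class, field names and wikitext output are assumptions for illustration, not the actual Quicksilver code.

# Illustrative only: one way to represent a Quicksilver-style draft page.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DraftPage:
    name: str
    infobox: Dict[str, str]            # structured facts: field of study, institutions, awards
    summary_paragraphs: List[str]      # biographical prose surfaced from news articles
    recent_events: List[str]           # newest stories, to keep the page fresh
    references: List[str] = field(default_factory=list)

    def to_wikitext(self) -> str:
        # A very loose approximation of wikitext output for a human editor to review.
        lines = [f"'''{self.name}'''"]
        lines += self.summary_paragraphs
        lines.append("== Recent news ==")
        lines += [f"* {event}" for event in self.recent_events]
        lines.append("== References ==")
        lines += [f"* {ref}" for ref in self.references]
        return "\n".join(lines)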

15:33 So here’s what we did: we posted them to the site so that you can go and look at them. I want to go back to this framing of precision and recall and think about what we can learn here, based on what we did. It’s very clear that on the precision stakes, people are very, very good, right? And I think if you read through the computer-generated summaries, they’re getting pretty good actually, but they’re still clunky. There’s still work to be done there. But on the recall side, it’s actually very interesting. We were able to create 40,000 summaries overnight, very quickly, and those summaries were looking across vast numbers of news articles.

16:25 So far more than for an individual page, far more than a person could easily even read through if they’re trying to draft the article. So it turns out it’s a lot of work to create a Wikipedia page. It takes a lot of time to gather the evidence and draft that content. And so when we built the system, we didn’t want to just immediately publish the pages to Wikipedia, because, you know, Wikipedia has some very high quality bars, which we respect. But what we did want to do is build for people who were creating content. So here’s an example of one of the people who inspired us and who we’ve shared this work with.

17:13 So Jessica Wade is a physics researcher at Imperial College London. She was shocked when she got into physics and looked around and realized that there weren’t very many women around, and that, beyond that, the women who were there, their stories weren’t being well told. So what did she do about it? Well, she went on a mission to go and create many, many pages. She did 270 pages in a year, which is hugely impressive. Other groups inspire us too. 500 Women Scientists is an organization that brings people together for edit-a-thons, where teams of people sit down and encourage each other and help each other through the process of writing pages and publishing them. And so these kinds of tools are very useful for those kinds of people, because it means that the effort they put into creating the pages can be much more efficient: the legwork of gathering the information can be done by the machine, and they can bring in their expertise, their vision, their good grammatical skills to create that final thing and post it.

18:25 So, one of the things that we’ve thought a lot about is what the goals of AI systems are. And I think this is something that got highlighted, especially in the last panel by Karen. It’s very tempting when you’re dealing with ML to think that the right thing to do is to build a predictive system that’s going to automate everything. You think that’s the goal, and if you don’t quite get there, then that’s still what you’re trying to reach. But actually what’s interesting is that you can have a different set of goals, which would be to look at a human system and figure out how to support it. For some tasks, for example if you’re building an ML system to help you sort fruit while you’re picking it in the field, you want that to be as automated as you possibly can.

19:18 It’s great if it’s automated and it’s fine if it makes some mistakes. But there are plenty of other systems where there are huge human implications of the decisions that are made, and there are processes already in place where people are making decisions about what to do. This could be in the legal space, deciding sentencing and parole opportunities for people, right? We work with analysts in different industries, and they’re making very impactful decisions: who do I invest in, or how do I understand what’s going on in a region of the world? There are already teams in place, and they already have their processes for how they make these decisions. And so instead of just trying to automate the whole thing, our goal can be: how do we make them more efficient?
20:04 How do we give them more access to data so that they have better inputs to that process? And what’s very interesting is that when your goal shifts to explicitly empowering people to do these decision-making processes, the set of problems that you end up needing to unpack shifts a little bit, right? The machine learning algorithm is still really important, its performance, its architecture, the training data for it. That’s important. But there’s a bunch of other questions that open up.

So the framing of, well, you know, what actually is the most useful information that we could put in front of somebody? What kind of evidence would they need to be able to see, to be persuaded that they could make a decision? So, actually user experience becomes very important. The question of what information is presented, how do people interact with it?

20:58 That becomes very important, and there’s actually this whole host of questions that we end up wrestling with. So this is a picture of a few of my wonderful colleagues from Primer, and I think there are a couple at the back of the room waving who’ve come. The biggest privilege of my career has been to get to build a team, to actually make some choices about, you know, who are we looking for? Who do we want to hire, who do we bring in? And when we look around at Primer, there are actually some people who’ve come from very, very different backgrounds. A couple of these people were working as journalists before, publishing news articles and working on interactive graphics: how do people learn quickly from data and interpret it?

21:51 Well, Rain asked a question earlier this morning, but I think she had to leave. She’s a history undergrad who’s been thinking lots about language, and she can actually write prose really well. So it turns out that when you’re thinking about language generation, it’s nice to have people who care about that in the room. Anna, who’s at the back, and Emmanuel are computational scientists, so they came in with a perspective on understanding really messy data and how you apply algorithms to it. Steph, who’s also at the back, is a wonderful designer who thinks about how pages look and how you interact with them. I highlight these people because I think tech comes from people at the end of the day. And one of the things, as we’ve wrestled with these problems, is you realize that when you have this hugely diverse set of problems that you actually have to solve to be able to build systems that are useful for people, you need different kinds of people in the room, right?

22:51 You want people who bring very different perspectives, so when you sit down and you’re trying to figure out how to solve something, you can say, “Well, how much of this is a data visualization problem? How much of this is a data analysis problem?” You want to be able to draw on those different skill sets and bring them together. So this also makes me kind of curious and hopeful when I think about the future of AI.

So, one of the things that’s been very clear from today is there’s a huge amount of work for us to do. There are a lot of interesting opportunities, fun things we can build, but there are a lot of hard problems we’ve got to wrestle with as we go down that path, the ethics and accountability questions. But there are a lot of interesting people who want to get engaged with that, which is kind of cool.

And I think one of the things that’s interesting now, one of the other trends we’re seeing, is that as more of these kinds of algorithms get well written about and explained, and some of the tools get easier to download and play around with, there’s actually an opportunity for us to make it easier for more people to come into the field, for people to come from very different paths and have learned different things on their way before they get into the room. And I think that makes it really interesting to think about the kinds of problems we’ll be solving, and the ways that we’ll be solving them.