Undergraduates Work to Improve Search Relevance in Visual History Archive
(L-r: Foster, Shearer, Bhogaonker, Maierhofer and Li talk about their project at the USC Shoah Foundation ITS office)
It’s that time of year again: four talented college students are diving into the math and technology behind the Visual History Archive as part of the annual Research in Industrial Projects for Students (RIPS) program at the UCLA Institute for Pure and Applied Mathematics (IPAM).
Each summer, undergraduate students from around the world convene at UCLA for the two-month program. They are placed in small groups, and each group works on an applied mathematics problem posed by the sponsor organization to which it is assigned. Past sponsors have included Intel, HRL and the Los Angeles Police Department. After eight weeks, the groups present their findings to their sponsors. This is the sixth year USC Shoah Foundation’s ITS department has participated in RIPS.
Usually, ITS staff assigns the RIPS team a specific problem to work on, such as information retrieval in IWitness (last year) or improving the Visual History Archive’s “quick search” function (2013). However, this year they decided to let the team explore the Visual History Archive and come up with a project on its own. Mills Chang, senior software architect at ITS who advises the RIPS group, said he and his ITS colleagues wanted to give the students more freedom to identify a problem that interested them and think of creative ways to solve it.
“Before, every time we have a specific problem we try to solve, and most of the time they come up with a great answer, but we find that students have more brilliant ideas than we have – so that’s why we tried to open it up for students to do their own thing and see what they can bring to our system,” Chang said.
This year’s team is made up of Adam Foster and Georg Maierhofer, mathematics students at Cambridge University who are from the UK and Austria, respectively; Megan Shearer, a math student at the University of Arizona; and Hangjian Li, an applied math and economics student at UCLA who is from China. Their academic mentor is Krishna Bhogaonker, a master’s candidate in statistics at UCLA.
For their project, they decided to improve the relevance of search results in the Visual History Archive. While exploring the archive, they noticed that the testimonies that came up first when they entered various search terms were not always the most relevant to what they were looking for. To solve this problem, they will experiment with models of information retrieval and with Latent Dirichlet allocation, a topic model used in natural language processing that explains the words in a collection of documents through unobserved (latent) topics.
Time permitting, they also hope to create a graphical interface for the archive that could suggest connections between search queries, video interviews and latent topics within the videos.
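To give a sense of what ranking by relevance can mean in practice, here is a minimal sketch of TF-IDF scoring, a standard textbook information retrieval technique. It is not the team's method, which the article does not describe in detail; the function names and the placeholder documents are invented for illustration and do not come from the archive.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def rank_by_tfidf(query, documents):
    """Return document indices ordered from most to least relevant.

    Each document is scored by summing, over the query terms it contains,
    term frequency times a smoothed inverse document frequency.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scored = []
    for idx, tokens in enumerate(tokenized):
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term_counts[term]:
                tf = term_counts[term] / len(tokens)
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        scored.append((score, idx))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


# Placeholder texts (hypothetical, not real archive data).
docs = [
    "liberation of the camp in 1945",
    "childhood memories of school and family",
    "school school school before the war",
]
print(rank_by_tfidf("school", docs))
```

A scheme like this rewards documents in which the query terms are frequent, and weights rare terms more heavily than common ones. Topic models such as Latent Dirichlet allocation go further by grouping related words, so a search could in principle match testimonies that discuss a subject without using the exact query term.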
Shearer said a Nanjing Massacre testimony and a Holocaust testimony about rape were particularly affecting for her. “Watching actual survivors talking about it is more emotional than reading about it in a textbook,” she said.
As the team perused the archive, its task became clear. The students had difficulty finding certain testimonies and found that no algorithm was ranking search results to ensure that the most relevant testimonies appeared for their searches. It seemed to the team that this was one of the most crucial problems they could address, and that it would also allow them to work on a second task.
“Search is something that is sort of obvious because all the users have to get through it,” Foster said. “We also knew that if we worked on this and did some back-end work with topic modeling, there might be other payoffs in terms of maybe being able to produce visualizations of the archive.”
So far, the team has been studying the 50-plus data tables that Chang provided, which contain statistics about user activity in the archive, data from the testimonies, keyword searches, and other useful information. They have also been studying the mathematics behind the work they want to do and figuring out ways to achieve their goals through programming.
Li said one of the most challenging aspects of the project is working with such a large data set in a short amount of time. The team may have to reassess what they can realistically achieve in the next seven weeks as they continue to experiment with the data. And then there’s all the new software they’ve had to learn.
“So far I’ve learned maybe five to six new softwares in two weeks!” Li said.
But having the opportunity to work with the Visual History Archive and potentially create something that can have an impact on its millions of users is extremely rewarding, they said. At their presentation to Institute staff at the end of the summer, they hope to present a prototype that the Institute could really implement.
“It is a great experience to be able to make an impact on the research people do on that part of history which is, I think, very important,” Maierhofer said. “It’s good that we can share our part in keeping it alive.”