


Page history last edited by PBworks 12 years, 3 months ago

Eric Larson - BibApp


UW-Madison and UIUC have co-created BibApp, free Open Source "Institutional Bibliography" software.


BibApp imports and analyzes citations, helping librarians identify which articles to archive locally. It also provides a unique social view of research on campus, showing who publishes most often with whom, and which departments are collaborating with one another.


Eric Larson is head of Digital and Computer Services at Wendt Library and leads BibApp's development. He told us what BibApp is all about and how it can help librarians and others.


What it's about


Example of why BibApp is useful:


People at UW are studying the tokamak, a machine that may be able to create virtually unlimited, environmentally friendly energy. Those experts should be able to find each other.


If you do a Google search (tokamak site:wisc.edu), it's not that helpful.


Eric came up with the idea for BibApp while thinking about how to serve the ECE department on campus. Contact information for its researchers is already available in UW databases, and those researchers generate citations. We can combine those citations with "SHERPA RoMEO", which tells us what rights we have with regard to each publication. Some of those articles we can put into Minds@UW, UW's Institutional Repository, and make them available to everyone in the world for free.
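The rights check described above amounts to joining citations against a table of publisher policies. Here is a minimal sketch of that join, assuming the policies have already been looked up and reduced to a yes/no flag per journal. The `archivable` function, the dictionary layout, and the placeholder journal names are all hypothetical; they are not BibApp's code or real SHERPA RoMEO data.

```python
def archivable(citations, policies):
    """Return the citations whose journal policy permits self-archiving.

    `policies` maps a journal title to True/False (a hypothetical,
    simplified stand-in for a rights lookup). Journals that are missing
    from the table are excluded, to stay on the safe side.
    """
    return [c for c in citations if policies.get(c["journal"], False)]


# Placeholder data for illustration only.
policies = {"Journal A": True, "Journal B": False}
citations = [
    {"title": "Paper 1", "journal": "Journal A"},
    {"title": "Paper 2", "journal": "Journal B"},
    {"title": "Paper 3", "journal": "Journal C"},  # unknown policy
]
candidates = archivable(citations, policies)
```

Only "Paper 1" survives the filter here; the other two would need a person to check the rights by hand.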


Why this is complex


  • Citations. There are 50 people in the ECE department, and about 3000 unique citations. We could pull the names out of the citations to figure out who the people are... but it's not that easy. The names in citations aren't people, they're just strings of text. They're name strings, and one researcher may appear under several different formats of his or her name. The 50 faculty members in ECE lead to over 4000 unique name strings, and we need to map those name strings back to the right people and their publications.
  • Publications. "Applied Physics Letters" may be cited with a variety of abbreviations.
  • Publishers. Same issue.


In other words, we have a lot of bad/messy data that needs to be cleaned up.
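To make the name-string problem concrete, here is a small sketch of one common cleanup step: reducing each name string to a rough key (surname plus first initial) so that variants of the same name group together. This is an illustration under simple assumptions (Western name order, the last word is the surname when there is no comma), not a description of how BibApp itself does it. The function names are made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name_string):
    """Reduce a citation name string to a (surname, first initial) key."""
    # Strip accents so that accented and unaccented spellings compare equal.
    text = unicodedata.normalize("NFKD", name_string)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.strip().lower()
    if "," in text:
        # "Morgan, D." or "Morgan, Dane"
        surname, given = [part.strip() for part in text.split(",", 1)]
    else:
        # "Dane Morgan" or "D. Morgan": assume the last word is the surname.
        tokens = text.split()
        if not tokens:
            return ("", "")
        surname, given = tokens[-1], " ".join(tokens[:-1])
    # Drop periods and other punctuation from the given-name part.
    given = re.sub(r"[^a-z ]", "", given).strip()
    return (surname, given[:1])


def group_name_strings(name_strings):
    """Group name strings that share the same rough key."""
    groups = defaultdict(list)
    for s in name_strings:
        groups[name_key(s)].append(s)
    return dict(groups)


groups = group_name_strings(
    ["Morgan, D.", "Dane Morgan", "D. Morgan", "Morgan, Dane D.", "Larson, E."]
)
```

A key like this only groups spellings. It cannot tell two different people with the same surname and initial apart, so a human or a smarter matching step still has to decide who each group belongs to.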


Goal of BibApp


Take the example of a paper written by a specific person on campus, "Morgan, D.". There are two faculty members at UW with that name, one in Engineering and one in History. We need to decide which one of them wrote this paper, and mark the record accordingly. Someone checks the publication, notices it's a physics paper, and marks it as belonging to the Engineering person. Over time, as you verify which articles go with which professors, the system gets smarter and begins to make very good predictions about the right match. BibApp gradually builds a better understanding of Dane Morgan's writings (in Engineering), and of the likelihood that he wrote a particular paper you're trying to process. BibApp looks at the year of publication, the journals he's published in before, the authors he's published with before, etc.
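The idea of learning from verified matches can be sketched as a simple scoring heuristic. The notes do not say how BibApp actually computes its predictions, so everything below is an assumption for illustration: the `Profile` class, the `score` function, and especially the weights are invented, and the citation fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Verified publication history for one person (hypothetical structure)."""
    name: str
    journals: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)
    years: set = field(default_factory=set)

    def verify(self, citation):
        """Record a citation that a human has confirmed belongs to this person."""
        self.journals.add(citation["journal"])
        self.coauthors.update(citation["coauthors"])
        self.years.add(citation["year"])


def score(profile, citation):
    """Heuristic match score. The weights are arbitrary illustrative values."""
    points = 0.0
    if citation["journal"] in profile.journals:
        points += 2.0  # has published in this journal before
    # One point for every co-author this person has published with before.
    points += 1.0 * len(profile.coauthors & set(citation["coauthors"]))
    # Small bonus if the year is within 5 years of known activity.
    if profile.years and min(profile.years) - 5 <= citation["year"] <= max(profile.years) + 5:
        points += 0.5
    return points


def best_match(profiles, citation):
    """Pick the profile with the highest score for this citation."""
    return max(profiles, key=lambda p: score(p, citation))
```

Each call to `verify` adds evidence to a profile, so later calls to `best_match` have more to go on. That is the sense in which the predictions improve as more records are checked.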


About 1200 articles in Minds@UW came from the BibApp process.


Amazon has a service called "Mechanical Turk" that you can use to get lots of people to do small tasks for a few cents, e.g., draw a picture of a sheep for 3 cents. The sort of matching described above for BibApp might be farmed out to this sort of worker.


Other features


You can create a sort of virtual business card that pulls together a person's faculty information with his or her associations with other staff -- the people with whom he or she has published. You can also analyze the citations and figure out which articles can be pulled into Minds@UW, making them available for free to the world.
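The "who publishes with whom" view boils down to counting how often each pair of people appears on the same citation. A minimal sketch, assuming the name strings have already been resolved to people (the hard part described earlier); `coauthor_counts` is a made-up name, not a BibApp function:

```python
from collections import Counter
from itertools import combinations


def coauthor_counts(citations):
    """Count how often each pair of authors appears on the same citation.

    `citations` is a list of author lists, one list per citation.
    """
    pairs = Counter()
    for authors in citations:
        # set() removes duplicates and sorted() gives every unordered
        # pair a single canonical form, so (A, B) and (B, A) count together.
        for a, b in combinations(sorted(set(authors)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Calling `most_common()` on the result lists the most frequent collaborations first. Replacing each author with his or her department before counting would give the department-level view.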




BibApp is built on Ruby on Rails. It uses the Solr search server for the indexing behind the scenes (which makes faceted search work).




They use Google Code and Subversion for source control. Since it's Open Source, the code is freely available to anyone.


UW got UIUC involved simply by setting up an example with UIUC's data and showing them how cool it would be. Illinois signed up when they saw it. To generate more interest, Eric will present a half-day seminar on using BibApp at a conference later in June.


Later, to build connections between institutions, they want to build a FOAF ("Friend of a Friend") representation, an XML-based (RDF) format showing how people are related to one another.
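For a sense of what such a file might contain, here is a small sketch that writes a FOAF (Friend of a Friend) description as RDF/XML using Python's standard library. The two namespace URIs are the standard RDF and FOAF ones; using `foaf:knows` to stand for "has published with" is an assumption made for this example, as is the `foaf_document` helper. The notes do not say how BibApp plans to model the relationship.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def foaf_document(name, knows):
    """Build an RDF/XML string describing one person and the people they know."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}name").text = name
    for other in knows:
        # Assumption: co-authorship is expressed with foaf:knows.
        link = ET.SubElement(person, f"{{{FOAF}}}knows")
        friend = ET.SubElement(link, f"{{{FOAF}}}Person")
        ET.SubElement(friend, f"{{{FOAF}}}name").text = other
    return ET.tostring(root, encoding="unicode")


# "Coauthor A" is a placeholder name for illustration.
xml_text = foaf_document("Dane Morgan", ["Coauthor A"])
```

Because the output uses a shared vocabulary, another institution could read it and link its own people to the same names without knowing anything about BibApp's internal database.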
