Guest Blog: Three Graduate Projects and a Dissertation

By: Christopher Harris, School Library System, Genesee Valley Educational Partnership (New York), and member of the OITP E-books Task Force and the OITP Advisory Committee

Short back story: I am getting ready to teach LIS 506: Introduction to Information Technology, a course in the State University of New York at Buffalo’s LIS program. For that course, my primary text is James Gleick’s The Information: A History, a Theory, a Flood. Gleick breaks information down into its atomic parts to show how it flows according to the same science as fluid dynamics and obeys the laws of entropy like the rest of our universe. Couple that with the fact that my wife and I have been watching Numb3rs on Netflix, and you can probably see where this blog post comes from…

I am starting to think that it really isn’t totally our fault that we can’t figure out this whole ebook thing. It certainly isn’t the publishers’ fault either. The problem is that you have a bunch of humanities folks sitting around pondering the biggest math problems to ever hit the book world. The last big math to strike publishing was learning how to count pages to fold them into a folio. Now we are dealing with geek math trying to figure out circulation models for items with no physical presence. That means we can’t even count them on our fingers as they pass beneath our scanners, and that makes this hard.

There are two problems: what we don’t know, and what we think we know. I am not sure which is more dangerous, but I have a feeling it is the latter. For example, we think we know how many books to buy: public librarians look at the ratio of available copies to pending holds and buy more copies if the ratio is off. That formula is totally reactive, and I am not sure how effective it is. Are we at the leading edge of the demand curve, with a bigger spike to come? Are the pending holds the last ones that will ever be placed, so that we could probably just wait it out? What we need is a Charlie Eppes algorithm (named for the mathematician at the heart of Numb3rs) to start figuring this stuff out mathematically instead of humanistically.
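To make the point concrete, here is the reactive rule written out as code. This is a minimal sketch, not any library’s actual policy: the `max_holds_per_copy` threshold of 4 is a hypothetical value chosen for illustration. Notice that its only inputs are a snapshot of copies and holds.

```python
import math


def copies_to_buy(copies_owned: int, pending_holds: int, max_holds_per_copy: int = 4) -> int:
    """Reactive purchase rule: buy enough copies that holds per copy stays at or under a target.

    max_holds_per_copy is a hypothetical threshold used only for illustration.
    """
    if max_holds_per_copy <= 0:
        raise ValueError("max_holds_per_copy must be positive")
    # Smallest number of copies that keeps the holds-per-copy ratio within the target.
    copies_needed = math.ceil(pending_holds / max_holds_per_copy)
    # This rule only ever adds copies; it never suggests withdrawing any.
    return max(0, copies_needed - copies_owned)


# Twenty holds on three copies: the rule says buy two more, whether demand is rising or about to vanish.
print(copies_to_buy(copies_owned=3, pending_holds=20))
```

Nothing in the function asks where a title sits on its demand curve, which is exactly the weakness described above.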

So I propose three graduate projects and one dissertation.

1) We assume that libraries are the very embodiment of the long tail, but is this really true? Are we helping people access the long tail or is the long tail just gathering dust on our shelves? What we need to do is compare collection information to circulation data for a large library to evaluate true usage patterns of materials. What percentage of our collections never circulates? How far down the tail can we afford to go? What is the value of the tail? What is the optimal tail length for different types of libraries in different locations with different levels of access to libraries with longer tails?

2) How does patron-driven acquisition work in practice? Do books purchased through this model see higher use? Is relying on the public an effective and efficient collection development method? We need to review multiple years of circulation data from a large library and ask: given the observed usage patterns, how efficient and effective would that institution’s collection development have been if it had followed an established model of patron-driven acquisition?

3) At what point does “unlimited simultaneous usage” become a mathematically null statement? In other words, at what value of X does a library or consortium having access to X copies of a book cease to be statistically different from unlimited access? This can be evaluated by looking at access data for other electronic resources. There are already some rudimentary formulas for determining the appropriate number of access seats to buy for online services; they look at the number of turnaways (people who tried to log in but could not because the maximum number of simultaneous users had already been reached). But what is the break point for ebooks?

D) Can an algorithm for library collection development and circulation be developed to determine how many copies of a certain book a given library population will need to satisfy demand over a set period of time? What factors will impact this calculation? What economic models can be developed to match this algorithm? What might this algorithm reveal about consortia (for example, is the optimal consortium size a rising curve or a bell curve)? Basically, is there a unifying mathematical model for how libraries work? My hunch is that there is, but it will involve some serious work on the fluid dynamics of information flow, the entropic decay of demand for a work over time, the impact of mob psychology on circulation, and many other factors I can’t even imagine.
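For the first project, a starting point might be to join a holdings list to circulation counts and measure how concentrated use really is. This is a minimal sketch, assuming a hypothetical input that maps each item ID to its lifetime circulation count; a real study would pull these from an ILS export and control for item age and acquisition date.

```python
from typing import Mapping


def tail_report(circ_counts: Mapping[str, int], head_fraction: float = 0.2) -> dict:
    """Summarize how use is spread across a collection.

    circ_counts: hypothetical mapping of item ID -> lifetime circulation count.
    head_fraction: share of the collection (ranked by use) treated as the "head".
    """
    counts = sorted(circ_counts.values(), reverse=True)
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("empty collection")
    total_circ = sum(counts)
    never_circulated = sum(1 for c in counts if c == 0)
    # Number of top-ranked items that make up the head (at least one item).
    head_size = max(1, round(n_items * head_fraction))
    head_circ = sum(counts[:head_size])
    return {
        "share_never_circulated": never_circulated / n_items,
        "head_share_of_circulation": head_circ / total_circ if total_circ else 0.0,
        "tail_share_of_circulation": (total_circ - head_circ) / total_circ if total_circ else 0.0,
    }


# Hypothetical ten-item collection.
sample = {"a": 40, "b": 25, "c": 10, "d": 3, "e": 1, "f": 1, "g": 0, "h": 0, "i": 0, "j": 0}
print(tail_report(sample))
```

The 20 percent `head_fraction` is an arbitrary cut; varying it, and comparing results across library types and locations, is the kind of question the project would answer.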
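For the second project, one way to evaluate patron-driven acquisition after the fact is to replay historical use against a purchase trigger. This is a minimal sketch under stated assumptions: a hypothetical rule that buys a title once it has been used `trigger` times (loosely modeled on use-triggered purchase; real vendor terms vary), applied to a hypothetical chronological log with one title ID per recorded use.

```python
from collections import Counter
from typing import Iterable


def replay_pda(use_log: Iterable[str], trigger: int = 3) -> dict:
    """Replay a chronological use log under a hypothetical use-triggered purchase rule.

    Each entry in use_log is a title ID, one entry per recorded use.
    A title is "purchased" when its use count reaches `trigger`; uses up to and
    including the triggering one are treated as pre-purchase access.
    """
    if trigger < 1:
        raise ValueError("trigger must be at least 1")
    uses_so_far = Counter()
    purchased = set()
    post_purchase_uses = 0
    total_uses = 0
    for title in use_log:
        total_uses += 1
        if title in purchased:
            # Use of a title the library would already own under this rule.
            post_purchase_uses += 1
            continue
        uses_so_far[title] += 1
        if uses_so_far[title] >= trigger:
            purchased.add(title)
    return {
        "titles_purchased": len(purchased),
        "titles_used": len(uses_so_far),
        "post_purchase_use_share": post_purchase_uses / total_uses if total_uses else 0.0,
    }


print(replay_pda(["a", "a", "b", "a", "a", "c", "a", "b"], trigger=3))
```

Comparing `titles_purchased` and `post_purchase_use_share` against what the library actually bought over the same years would give a first, rough answer to whether patron-selected titles see higher use.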
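For the third project, the seat-sizing problem has a long history outside libraries: telephone engineers use the Erlang B loss formula to decide how many lines are needed so that only a target share of calls is blocked. Borrowing it for ebooks is my assumption, not an established library method, and it assumes loan requests arrive at random (Poisson) and that a turned-away patron simply leaves rather than joining a holds queue. Under those assumptions, the offered load is the average number of simultaneous loans demanded (requests per day times average loan length in days), and the sketch finds the smallest number of copies X whose turnaway probability falls below a chosen threshold. Past that X, “unlimited” access would be hard to tell apart from X copies.

```python
def erlang_b(offered_load: float, copies: int) -> float:
    """Probability that a new loan request is turned away (Erlang B loss formula).

    offered_load: mean simultaneous demand in erlangs (requests per day * mean loan days).
    copies: number of simultaneous-use licenses.
    Uses the standard recursion B(k) = A*B(k-1) / (k + A*B(k-1)), with B(0) = 1.
    """
    if offered_load < 0 or copies < 0:
        raise ValueError("offered_load and copies must be non-negative")
    blocking = 1.0  # B(0) = 1: with zero copies every request is turned away
    for k in range(1, copies + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


def copies_for_turnaway_target(offered_load: float, max_turnaway: float = 0.01) -> int:
    """Smallest number of copies whose turnaway probability is at or below max_turnaway.

    max_turnaway is a hypothetical policy threshold chosen for illustration.
    """
    if not 0 < max_turnaway <= 1:
        raise ValueError("max_turnaway must be in (0, 1]")
    if offered_load == 0:
        return 0  # no demand, no copies needed
    copies = 0
    while erlang_b(offered_load, copies) > max_turnaway:
        copies += 1
    return copies


# Hypothetical title: 0.5 new loan requests per day, 14-day loans -> 7 erlangs of demand.
print(copies_for_turnaway_target(0.5 * 14, max_turnaway=0.01))
```

Turnaway logs from existing e-resources could be used to check whether the random-arrival assumption holds before trusting any number this produces.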
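For the dissertation question, even a toy model shows how the decay of demand changes the answer. This is a minimal sketch under assumptions of my own: weekly demand starts at a hypothetical `initial_weekly_demand` and decays exponentially at `decay_rate`; every request joins a holds queue; each copy serves one patron per `loan_weeks`; and the queue is treated as a continuous fluid rather than individual patrons. It searches for the smallest number of copies that keeps the longest estimated wait under a target. Renewals, abandoned holds, and real demand curves are ignored.

```python
import math


def max_wait_weeks(copies: int, initial_weekly_demand: float, decay_rate: float,
                   loan_weeks: float = 2.0, horizon_weeks: int = 52) -> float:
    """Longest estimated holds wait (in weeks) under a deterministic fluid model.

    Hypothetical assumptions: demand in week t is initial_weekly_demand * exp(-decay_rate * t),
    every request joins the holds queue, and each copy serves 1 / loan_weeks patrons per week.
    """
    if copies <= 0:
        return math.inf
    capacity = copies / loan_weeks  # patrons served per week across all copies
    queue = 0.0
    worst = 0.0
    for week in range(horizon_weeks):
        queue += initial_weekly_demand * math.exp(-decay_rate * week)
        # A patron joining this week waits roughly queue / capacity weeks.
        worst = max(worst, queue / capacity)
        queue = max(0.0, queue - capacity)
    return worst


def copies_needed(initial_weekly_demand: float, decay_rate: float, max_wait: float = 4.0,
                  loan_weeks: float = 2.0, horizon_weeks: int = 52) -> int:
    """Smallest copy count whose worst-case estimated wait stays within max_wait weeks."""
    if max_wait <= 0:
        raise ValueError("max_wait must be positive")
    copies = 1
    while max_wait_weeks(copies, initial_weekly_demand, decay_rate,
                         loan_weeks, horizon_weeks) > max_wait:
        copies += 1
    return copies


# Same launch-week demand, different decay rates: slower decay calls for more copies.
print(copies_needed(10.0, decay_rate=0.5), copies_needed(10.0, decay_rate=0.1))
```

Running `copies_needed` over pooled demand for groups of libraries of different sizes would be one crude way to start probing the consortium-size question.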

So really, except for the librarians with advanced degrees in math whom I hold entirely responsible, it probably isn’t our fault that we are having a hard time understanding the transition of libraries from an art to a math. We are not trained to think of information in a mathematical/scientific fashion; as a group we instead tend to focus on the aesthetic and humanistic aspects of our profession.

What will be our fault, however, is if we continue to think in this narrow way despite the changing nature of libraries and information.

The views expressed in this guest blog post do not necessarily reflect those of the ALA.



About Marijke Visser

As associate director of OITP, Marijke leads and coordinates all of ALA’s work on E-rate. In addition to E-rate, Marijke supports the Program on Networks focusing on broadband adoption issues for diverse populations. Marijke also serves as Program Director for OITP’s emerging portfolio on children, youth, and technology.


  1. I don’t know if any algorithm devised would be relevant for long. Look at how much the demographics of potential ebook users have changed in just a year, for example. We’re still at the early adopter stage, but not for much longer, I believe. After the initial investment in ebooks, usage statistics are easier to maintain, not harder as you seem to indicate. With physical books, in-house use is only counted if the patron leaves the book out for staff to reshelve. With ebooks, we can track every book patrons open, and even how long they looked at each one. Much like book leasing, we can license access to multiple copies of the most popular books when they first come out and stop licensing them when they are no longer used. My library is acquiring its first ebooks using a couple of models. I will have to really look at the stats and see if there are identifiable patterns that might lead to an algorithm – thanks.

  2. Christopher Harris

    While demographics might factor into some calculations, I think there are many basic questions that are independent of ebook adoption demographics. For example, what is the ideal consortium size for ebooks? Should individual libraries even be buying ebooks? Should we even be pursuing a national digital library project?

    It’s great to hear that your library is trying some things out; I just think we need to ask more questions (and get more mathematically based answers) before we get in too deep.
