What is a use?

One of the metrics that we track for the various digital library systems that we run at work is the item “use”.

This post will hopefully explain a bit more about how a use is calculated and presented.

The digital library systems that we operate (The Portal to Texas History, the UNT Digital Library, and The Gateway to Oklahoma History) use Google Analytics to log and report on access to these systems. Below is a screenshot of the Google Analytics data for the last month for The Portal to Texas History.

Google Analytics Screenshot for The Portal to Texas History

From Google Analytics we are able to get a rough idea of the number of users, sessions, and pageviews as well as a whole host of information that is important for running a large website like a digital library.

There are a number of features of Google Analytics that we can take advantage of that allow us to understand how users are interacting with our systems and interfaces.

One of the challenges we have with this kind of analytics is that it collects information only when triggered by JavaScript on the page. This can happen when the page is loaded or when something is clicked on the page. That is sometimes not enough for our reporting, because much of the content in our various digital libraries is linked to directly by outside resources, either embedded in discussion forums or through links that send users straight to the PDF representation of the item.

A few years ago we decided to start accounting for this kind of usage of our systems in addition to the data that Google Analytics provides. To do this we developed a set of scripts that run each night and work on the previous day's worth of log files from the application servers that serve our digital library content. These log files are aggregated to a single place, parsed, and then filtered to leave us with the information we are interested in for the day. The resulting data are the unique uses that an item has had, where a use is counted once for a given IP address during a 30-minute window. This allows us to report on uses of theses and dissertations that may be linked to directly from a Google search result, or possibly an image that was embedded in another site's blog post and that comes from one of our digital libraries.
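To make the counting rule concrete, here is a minimal sketch in Python. It is an illustration only, not the nightly scripts themselves. It assumes the log lines have already been parsed and filtered into (timestamp, IP address, item identifier) tuples, and it assumes the 30-minute window starts at the last counted use for that IP address and item; the function name `count_uses` and the identifiers in the example are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed window length, taken from the "30 minute window" described above.
WINDOW = timedelta(minutes=30)


def count_uses(hits):
    """Count uses per item from parsed log hits.

    `hits` is an iterable of (timestamp, ip, item_id) tuples. A hit counts
    as a new use only if the same IP address has no counted use of the
    same item within the previous 30 minutes (an assumed interpretation).
    """
    last_counted = {}  # (ip, item_id) -> timestamp of the last counted use
    uses = {}          # item_id -> number of uses
    for timestamp, ip, item_id in sorted(hits):
        key = (ip, item_id)
        previous = last_counted.get(key)
        if previous is None or timestamp - previous >= WINDOW:
            last_counted[key] = timestamp
            uses[item_id] = uses.get(item_id, 0) + 1
    return uses


# Hypothetical example data: three hits on one item from the same address,
# one hit from a second address, and one hit on a different item.
start = datetime(2015, 1, 1, 10, 0)
example_hits = [
    (start, "10.0.0.1", "item-a"),
    (start + timedelta(minutes=10), "10.0.0.1", "item-a"),  # inside the window
    (start + timedelta(minutes=45), "10.0.0.1", "item-a"),  # new use
    (start + timedelta(minutes=5), "10.0.0.2", "item-a"),   # different address
    (start, "10.0.0.1", "item-b"),
]
example_uses = count_uses(example_hits)
```

With this example data, `item-a` ends up with three uses and `item-b` with one. A fixed half-hour bucket (for example 10:00 to 10:30) would be an equally plausible reading of the window and would give slightly different counts in edge cases.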

Once we have the data for a given object we are able to aggregate that usage information to the collection and partner to which the item belongs. This allows us to show information about usage at the collection or partner level. Finally, the item use information is aggregated at the system level so that you can see the information for The Portal to Texas History, the UNT Digital Library, or The Gateway to Oklahoma History in one place.
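The roll-up from items to collections, partners, and the whole system can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each item is assumed to belong to exactly one collection and one partner, and the function name `aggregate_uses`, the lookup table, and all identifiers in the example are hypothetical.

```python
def aggregate_uses(item_uses, item_info):
    """Roll per-item use counts up to collection, partner, and system totals.

    `item_uses` maps item_id -> number of uses.
    `item_info` maps item_id -> (collection, partner); one of each per item
    is an assumption made to keep the sketch short.
    """
    by_collection = {}
    by_partner = {}
    system_total = 0
    for item_id, uses in item_uses.items():
        collection, partner = item_info[item_id]
        by_collection[collection] = by_collection.get(collection, 0) + uses
        by_partner[partner] = by_partner.get(partner, 0) + uses
        system_total += uses
    return by_collection, by_partner, system_total


# Hypothetical example: three items spread over two collections and partners.
example_item_uses = {"item-a": 3, "item-b": 1, "item-c": 2}
example_item_info = {
    "item-a": ("collection-1", "partner-1"),
    "item-b": ("collection-1", "partner-2"),
    "item-c": ("collection-2", "partner-1"),
}
collections, partners, total = aggregate_uses(example_item_uses, example_item_info)
```

Here `collection-1` receives four uses, `partner-1` receives five, and the system total is six.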

Item page in the UNT Digital Library

The above image shows how an end user can see the usage data for an item on the item's about page. This shows up in the “Usage” section, which displays total usage, uses in the last 30 days, and uses yesterday.

Usage Statistics for item in the UNT Digital Library

If a user clicks on the stats tab, they are taken to the item's stats page. There they can see the most recent 30 days or select a month or year from the table below the graph.

Referral data for item in the UNT Digital Library

A user can view the referral traffic for a selected month or year by clicking on the referral tab.

Collection Statistics for the UNT Scholarly Works Repository in the UNT Digital Library

Each item use is also aggregated to the collection and partner level.

System statistics for the UNT Digital Library

Finally, a user is able to view statistics for the entire system. At this time we have usage data for the systems going back to 2009, when we switched over to our current architecture.

I will probably write another post detailing the specifics of what we do and don’t count when we are calculating a “use”. More on that later.