This is going to be another multi-post series as I wade through some of the data we have been collecting over the past year related to metadata editing and the various events within a metadata record’s lifecycle.
Background
For the past few years the UNT Libraries has been collecting data about how long our metadata editors spend editing records in our systems. We’ve written about the overall change of metadata in our digital library and presented those findings at last year’s Dublin Core Metadata Initiative conference in Austin, Texas, in a paper called “How Descriptive Metadata Changes in the UNT Libraries’ Collection: A Case Study”. The goal of collecting data about metadata change is to get a better idea of how our metadata editors interact with our systems.
What is an edit event?
Our metadata system creates a log entry when a user opens a record to begin editing. This log entry acts as the start of a timer for that user’s edit session on that specific record. When the user publishes the record back into the system, the log entry is queried and the elapsed time is recorded, along with the metadata editor’s username, the record’s identifier, and the state (hidden or unhidden) the record is in when it is saved. This information is submitted to the Metadata Event Service and logged.
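The open/publish cycle described above can be sketched as a small in-memory timer. This is a minimal sketch, not the UNT implementation: the class and method names are hypothetical, and it assumes the event is stamped with the publish time and that duration is measured in whole seconds.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple


@dataclass
class EditEvent:
    """One completed edit session, as it might be sent to the event service."""
    event_date: datetime  # assumption: stamped with the publish time
    duration: int         # assumption: seconds between open and publish
    username: str
    record_id: str
    hidden: bool


class EditSessionLog:
    """Hypothetical in-memory stand-in for the "record opened" log entries."""

    def __init__(self) -> None:
        # (username, record_id) -> time the record was opened for editing
        self._open: Dict[Tuple[str, str], datetime] = {}

    def open_record(self, username: str, record_id: str, now: datetime) -> None:
        # Opening a record starts the timer for this user/record pair.
        self._open[(username, record_id)] = now

    def publish_record(self, username: str, record_id: str,
                       hidden: bool, now: datetime) -> EditEvent:
        # Publishing looks up the matching log entry (KeyError if none exists)
        # and turns the elapsed time into the event's duration.
        started = self._open.pop((username, record_id))
        duration = int((now - started).total_seconds())
        return EditEvent(now, duration, username, record_id, hidden)
```

Opening a record at 22:56:36 and publishing it at 22:57:00 would produce an event with a duration of 24, matching the shape of the example row that follows.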
An edit event looks like this once it has been created:
id | event_date | duration | username | record_id | record status | record status change | record quality | record quality change |
73515 | 2014-01-04T22:57:00 | 24 | mphillips | ark:/67531/metadc265646 | 1 | 0 | 1 | 0 |
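To work with rows like the one above programmatically, each pipe-delimited line can be split into typed fields. This is a hypothetical sketch: the field names simply mirror the column headers, and it assumes no cell is ever empty.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EditEventRow:
    event_id: int
    event_date: datetime
    duration: int
    username: str
    record_id: str
    record_status: int
    record_status_change: int
    record_quality: int
    record_quality_change: int


def parse_row(line: str) -> EditEventRow:
    """Split one pipe-delimited edit event row into typed fields."""
    # Dropping blank cells removes the one produced by the trailing "|";
    # this assumes no real field is ever empty.
    cells = [c.strip() for c in line.split("|") if c.strip()]
    (event_id, date, duration, user, record,
     status, status_change, quality, quality_change) = cells
    return EditEventRow(int(event_id), datetime.fromisoformat(date),
                        int(duration), user, record,
                        int(status), int(status_change),
                        int(quality), int(quality_change))
```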
With this information we can build a number of views into the metadata editing workflow in our environment. We can easily see the number of metadata edits on a given day, within a month, and for the entire period we’ve been collecting data. We can view the total number of edits, the number of unique records edited, and the number of hours our users have spent editing records within a given period.
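Each of these views is essentially a group-by aggregation. The sketch below computes the three per-day figures; the function name and the tuple layout are hypothetical, it assumes durations are in seconds, and the real service may compute these differently.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# (event_date, duration_seconds, username, record_id); duration unit assumed.
Event = Tuple[datetime, int, str, str]


def daily_summary(events: Iterable[Event]) -> Dict[str, dict]:
    """Per-day edit count, unique records edited, and hours spent editing."""
    edits: Dict[str, int] = defaultdict(int)
    records: Dict[str, Set[str]] = defaultdict(set)
    seconds: Dict[str, int] = defaultdict(int)
    for when, duration, _user, record_id in events:
        day = when.date().isoformat()
        edits[day] += 1
        records[day].add(record_id)  # a set keeps each record once per day
        seconds[day] += duration
    return {
        day: {
            "edits": edits[day],
            "unique_records": len(records[day]),
            "hours": seconds[day] / 3600,
        }
        for day in edits
    }
```

Grouping on `when.strftime("%Y-%m")` instead of the date would give the monthly view with no other changes.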
Below are a few screenshots from our Edit Event Service web interface.
We can query a given day, month, or year to view statistics, as well as show the rankings and information for a specific user or digital object in the system.
Analyzing a year of data
We were interested in taking a deeper look at the metadata edit events, and that is what the following posts in this series will cover. A year’s worth of metadata edit data was extracted from the event service and paired with two other datasets. The first is descriptive metadata about the items edited, including the contributing institution, collection, resource type, and format fields. For the second, we classified each user in the dataset by status, either UNT-Employee or Non-UNT-Employee, and by rank, either Librarian, Staff, Student, or Unknown. These datasets were merged to form a complete record for each metadata event in the Edit Events Dataset, and the merged records were added to a Solr index that we used to analyze the data.
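The merge step amounts to two lookups per event: item metadata keyed by record identifier and user classification keyed by username. This is a minimal sketch with hypothetical field names; defaulting unclassified users to rank "Unknown" is an assumption, and each flat dict is only the kind of document one might post to a Solr index.

```python
from typing import Dict, List


def merge_events(events: List[dict],
                 items: Dict[str, dict],
                 users: Dict[str, dict]) -> List[dict]:
    """Attach item metadata and user classification to each edit event."""
    merged = []
    for event in events:
        doc = dict(event)  # copy so the source event is not mutated
        # Item fields (institution, collection, resource type, format) by record.
        doc.update(items.get(event["record_id"], {}))
        # User fields (status, rank) by username; assumption: unclassified
        # users fall back to rank "Unknown".
        doc.update(users.get(event["username"], {"rank": "Unknown"}))
        merged.append(doc)
    return merged
```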
A total of 94,222 edit events occurred from January 1, 2014 to December 31, 2014 and are the base dataset for the analysis presented here.
Month, Day, Hour
During 2014 we averaged 7,852 metadata edits per month (94,222 edits over 12 months).
Month | Edit Events |
January | 10,133 |
February | 5,082 |
March | 5,960 |
April | 5,543 |
May | 6,622 |
June | 5,136 |
July | 8,099 |
August | 10,508 |
September | 10,989 |
October | 12,840 |
November | 7,712 |
December | 5,598 |
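The monthly average can be checked directly against the table above; the snippet also picks out the busiest and quietest months.

```python
# Monthly edit counts for 2014, copied from the table above.
MONTHLY_EDITS = {
    "January": 10133, "February": 5082, "March": 5960, "April": 5543,
    "May": 6622, "June": 5136, "July": 8099, "August": 10508,
    "September": 10989, "October": 12840, "November": 7712, "December": 5598,
}

total = sum(MONTHLY_EDITS.values())
average = round(total / len(MONTHLY_EDITS))
busiest = max(MONTHLY_EDITS, key=MONTHLY_EDITS.get)
quietest = min(MONTHLY_EDITS, key=MONTHLY_EDITS.get)
print(total, average, busiest, quietest)  # 94222 7852 October February
```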
Looking at the day of the week on which metadata edits occurred shows the expected pattern: most metadata editing takes place during the work week, with far fewer edits on the weekend. The breakdown by day of the week is presented in the table below.
Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
2,765 | 17,506 | 19,580 | 16,876 | 20,838 | 14,416 | 2,241 |
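Bucketing `event_date` values by weekday takes only a few lines. This sketch uses `datetime.weekday()` with its own name list rather than `strftime("%A")` so the labels do not depend on locale; the function name is hypothetical. It also computes the weekend share from the table above.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Names indexed by datetime.weekday(), which numbers Monday as 0.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Day-of-week totals for 2014, copied from the table above.
DAY_TOTALS = {"Sunday": 2765, "Monday": 17506, "Tuesday": 19580,
              "Wednesday": 16876, "Thursday": 20838, "Friday": 14416,
              "Saturday": 2241}


def edits_by_weekday(timestamps: Iterable[str]) -> Counter:
    """Count ISO 8601 event_date strings by day-of-week name."""
    return Counter(WEEKDAY_NAMES[datetime.fromisoformat(ts).weekday()]
                   for ts in timestamps)


weekend = DAY_TOTALS["Saturday"] + DAY_TOTALS["Sunday"]
share = weekend / sum(DAY_TOTALS.values())
print(weekend, f"{share:.1%}")  # 5006 5.3%
```

The example edit event from earlier (2014-01-04T22:57:00) lands in the Saturday bucket.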
The hour of day at which metadata is edited is also interesting to look at. Most editing happens during working hours, and the afternoon is when the most records are edited; the busiest hour is 14:00, with 11,653 edit events. The full data is presented below.
Hour | Edit Events |
0:00 | 237 |
1:00 | 77 |
2:00 | 58 |
3:00 | 41 |
4:00 | 19 |
5:00 | 86 |
6:00 | 290 |
7:00 | 601 |
8:00 | 1,836 |
9:00 | 6,189 |
10:00 | 8,948 |
11:00 | 8,868 |
12:00 | 8,134 |
13:00 | 10,760 |
14:00 | 11,653 |
15:00 | 11,184 |
16:00 | 9,114 |
17:00 | 4,868 |
18:00 | 3,564 |
19:00 | 2,439 |
20:00 | 1,947 |
21:00 | 1,787 |
22:00 | 937 |
23:00 | 585 |
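The same bucketing works for hour of day. The sketch below (function name hypothetical) counts timestamps by hour, then uses the table above to find the peak hour and the share of edits made between 13:00 and 16:59.

```python
from datetime import datetime
from typing import Iterable, List

# Edit events per hour for 2014, copied from the table above (index = hour).
HOURLY_EDITS = [237, 77, 58, 41, 19, 86, 290, 601, 1836, 6189, 8948, 8868,
                8134, 10760, 11653, 11184, 9114, 4868, 3564, 2439, 1947,
                1787, 937, 585]


def hour_counts(timestamps: Iterable[str]) -> List[int]:
    """Bucket ISO 8601 event_date strings into 24 hour-of-day counts."""
    counts = [0] * 24
    for ts in timestamps:
        counts[datetime.fromisoformat(ts).hour] += 1
    return counts


peak_hour = max(range(24), key=lambda h: HOURLY_EDITS[h])
afternoon = sum(HOURLY_EDITS[13:17])  # 13:00 through 16:59
share = afternoon / sum(HOURLY_EDITS)
print(peak_hour, afternoon, f"{share:.1%}")  # 14 42711 45.3%
```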
Presented as a graph, the afternoon swell in metadata editing is easy to see.
Combining the day-of-week and hour-of-day data into a single table gives something like this.
In the image above, green represents fewer edits and red represents more. It shows that Thursday afternoons tend to be very busy, while Friday is much lighter than the other weekdays.
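A day-by-hour table like the one in the image can be built as a 7 × 24 grid of counts, with each cell shaded on a green-to-red scale. This is a hypothetical sketch: the function names are illustrative, and the linear color interpolation is an assumption, not necessarily the scale used in the image.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Row order follows the day-of-week table: Sunday first.
DAY_ROWS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def day_hour_grid(timestamps: Iterable[str]) -> List[List[int]]:
    """7 x 24 grid of edit counts: rows are days (Sunday first), columns hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        when = datetime.fromisoformat(ts)
        row = (when.weekday() + 1) % 7  # weekday() has Monday=0; shift Sunday to 0
        grid[row][when.hour] += 1
    return grid


def shade(count: int, max_count: int) -> Tuple[int, int, int]:
    """Linear green (few edits) to red (many edits) RGB color for one cell."""
    t = count / max_count if max_count else 0.0
    return (round(255 * t), round(255 * (1 - t)), 0)
```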
That’s it for the first post in this series. I plan to write about Who is editing records, What records they are editing, and finally How Much time we are spending on metadata editing. Check back for future posts.
As always feel free to contact me via Twitter if you have questions or comments.