11 June 2014

Applying 10 rules for care & feeding of scientific data

Based on the rules in a slide posted by @flexnib on Twitter


The rules come from:

10 Simple Rules for the Care and Feeding of Scientific Data by

Alyssa Goodman, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Mercè Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth, Margaret Hedstrom, David W. Hogg, Vinay Kashyap, Ashish Mahabal, Aneta Siemiginowska, Aleksandra Slavkovic


… here is how we applied them to the data from our research into libraries' use of Instagram, presented at VALA 2014.

The perfect storm: The convergence of social, mobile and photo technologies in libraries (data set)
Wendy Abbott, Jessie Donaghey, Joanna Hare, Peta J. Hopkins.

Date range: 2013
Paper presented at VALA 2014, February 3-5, Melbourne, Vic.
200102 (Communication Technologies and Digital Media Studies); 080709 (Social and Community Informatics)
Creative Commons Licence

  1. Love your data, and help others love it too. The advice here is to cherish, document and publish your data.
    We put some effort into compiling the data into easily consumable file types, published it online, and mentioned it, along with its URL, in our conference presentation (video available). We also blogged about the paper and data.
  2. Share your data online, with a permanent identifier.
    We posted our data to our institutional repository. We don’t have a DOI, but the repository is an archive that provides long-lasting URLs and metadata to make datasets findable.
  3. Conduct science with a particular level of reuse in mind.
    We planned for our data to be inspectable; if a curious mind wanted to do something creative with it or extend it, that was a bonus. Our paper and presentation describe the methods used to compile it, albeit at a high level. In addition, the survey instruments we used were included in the data set.
  4. Publish workflow as context.
    I’m going to have to check how well we recorded this and made it available with the data set. The workflow included some basic modifications to the raw data from the third-party monitoring tool we used, because the headings in the first set of data from the tool differed slightly from those in the second set. This was due to Instagram making some changes to its service, e.g., introducing video sharing. We also made some minor modifications to make sure that the country information was comprehensive. We exported some of this data to CSV and uploaded it to create a Google map. While we covered our methods at a high level in the paper and presentation, I suspect we could have done better with this rule when it came to publishing the dataset. Ah, well – there’s always room for improvement.
  5. Link your data to your publications as often as possible.
    Our slides (Prezi) include the URLs of both the paper and the dataset, and in our presentation we mentioned the paper and the dataset. However, on inspection we found that we had neglected to add links between the paper and the dataset in the institutional repository. So that’s on my to-do list.
  6. Publish your code (even the small bits).
    We didn’t write any code; we used a third-party product to gather public data from Instagram accounts.
  7. Say how you want to get credit.
    We published our data (and paper) under a Creative Commons licence. The licence is recorded in the dataset's metadata elements.
  8. Foster and use data repositories.
    As librarians in an academic library, we support and promote the use of our institutional repository, e-publications@bond. Our Scholarly Publications & Copyright Team provides research data management support to our University community, including uploading metadata to Research Data Australia.
  9. Reward colleagues who share their data properly.
    Tell your librarians how you have “loved and fed” your research data, and they can help raise its profile through research repositories, inclusion in open access collections, and recommendations to those who Ask-A-Librarian for help finding information. We undertake to always credit the sources of data that we use in accordance with best practice.
  10. Be a booster for data science.
    Well, I’m writing this post to demonstrate that it is not that hard to apply these rules to simple data. The more complex the data, the more time is needed to sort out a data management plan and implement it. Many academic libraries are ready to provide advice on research data management, from the planning stage through to publishing.
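For anyone wanting to do better on rules 4 and 6 than we did, the heading clean-up described under rule 4 is the kind of "small bit" that could be captured as a short script and published alongside the data set. The sketch below is purely illustrative and is not what we actually did (we made our changes without code). The column names, the `HEADING_MAP` and `COUNTRY_FIXES` lookups, and the function names are all hypothetical assumptions, not the real headings from the monitoring tool.

```python
import csv
import io

# HYPOTHETICAL: maps headings in the first export to the headings used in the
# second export. These names are illustrative, not the tool's real headings.
HEADING_MAP = {"Username": "Account", "Followers Count": "Followers", "Location": "Country"}

# HYPOTHETICAL: lookup used to make country values consistent and complete.
COUNTRY_FIXES = {"USA": "United States", "UK": "United Kingdom", "": "Unknown"}


def harmonise(rows, heading_map):
    """Rename each row's keys so the first export matches the second."""
    return [{heading_map.get(key, key): value for key, value in row.items()} for row in rows]


def normalise_country(rows):
    """Replace inconsistent or blank country values using COUNTRY_FIXES."""
    for row in rows:
        value = row.get("Country", "").strip()
        row["Country"] = COUNTRY_FIXES.get(value, value)
    return rows


def merge_exports(first_csv, second_csv):
    """Combine two CSV exports (as text) into one list of harmonised rows."""
    first = harmonise(csv.DictReader(io.StringIO(first_csv)), HEADING_MAP)
    second = list(csv.DictReader(io.StringIO(second_csv)))
    return normalise_country(first + second)


def to_csv(rows, fieldnames):
    """Write the chosen columns back out as CSV text, ignoring extra columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    first = "Username,Followers Count,Location\nlibA,120,USA\n"
    second = "Account,Followers,Country,Videos\nlibB,300,UK,2\n"
    rows = merge_exports(first, second)
    print(to_csv(rows, ["Account", "Followers", "Country"]))
```

Publishing even a script this small with the data set records exactly which headings were renamed and which country values were changed, which is the context rule 4 asks for.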