Research Data Things 2/23

This week's thing looks at issues associated with research data management.

I read the post "Big Data: The 5 Vs Everyone Must Know" by Bernard Marr. The quotes below are from this post.

My meandering thoughts turned to:

"With big data technology we can now store and use these data sets with the help of distributed systems, where parts of the data is stored in different locations and brought together by software." 
This raises the challenge of bringing distributed data back together in a way that makes sense. What if we bring the wrong bits back, assuming that this goes with that when in fact they are unrelated? We need good description and notes about where, how and why the data is stored the way it is. That helps avoid poor contextualisation and just plain wrong deductions. Big data is often used for making BIG decisions, so getting it wrong can have widespread consequences.

"In fact, 80% of the world’s data is now unstructured, and therefore can’t easily be put into tables (think of photos, video sequences or social media updates). " 

There is a critical need for metadata. Describing unstructured data sets adds meaning to a collection that might span multiple formats and multiple sources. For example, the current advocacy for improved funding for the National Library's Trove is made up of all of the following:

How best to pull all of these together? How do we identify that they are all associated with each other? Good description, giving the background to the advocacy and the sources of all these data, makes future research into and analysis of what happened far more efficient.

"It is all well and good having access to big data but unless we can turn it into value it is useless. So you can safely argue that 'value' is the most important V of Big Data. It is important that businesses make a business case for any attempt to collect and leverage big data." 

Predicting the value of data in the future. Aaagh! Libraries and other GLAM sector institutions have been undertaking the biggest BIG DATA collection for centuries. It is very hard (impossible?) to predict how much value other researchers will derive from a data set in the future, which is why it is so hard for cultural institutions to argue for current funding based on unknown future value??? Once again, in my mind it comes back to the metadata. Without good description, the chances of deriving value in the future are very low.