18 May 2017

Blogjune 2017 join the challenge

It's that time of the year again - time to ramp up your blog mojo.

The challenge: Blog every day in June - or as often as you can manage, or comment on someone else's blog every day.
Register by posting a tweet. The tweet must contain:
  1. URL where you will be blogging
  2. hashtag #registerblogjune
Optionally include other words like "Aaagh what am I thinking", "I'm a blog junkie" or anything else that comes to mind.
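For anyone curious how a registration tweet could be checked against those two rules, here is a minimal sketch in Python. It is not how the spreadsheet is actually populated; the function name `parse_registration` and the simple URL pattern are assumptions made for illustration only.

```python
import re

# Assumption: the hashtag is matched case-insensitively and any http(s) link counts as the blog URL.
HASHTAG_PATTERN = re.compile(r"#registerblogjune\b", re.IGNORECASE)
URL_PATTERN = re.compile(r"https?://\S+")


def parse_registration(tweet_text):
    """Return the first URL in the tweet if it also carries the hashtag, else None."""
    if not HASHTAG_PATTERN.search(tweet_text):
        return None
    match = URL_PATTERN.search(tweet_text)
    return match.group(0) if match else None
```

A tweet with the hashtag but no URL, or a URL but no hashtag, comes back as `None`, so it would not count as a registration under this sketch.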

When you register, your Twitter handle, blog URL and the text of your tweet will be added to the Google Docs spreadsheet below. Your blog will (probably) be included in an OPML file that will be published for those who want to subscribe to all the blogjune participants. Watch for Twitter updates about that at https://twitter.com/petahopkins

Next steps...

  • On the 1st of June post to your blog. Share each post on Twitter using the hashtag #blogjune
  • Check back here to find new Twitter accounts to follow, or... I will periodically update this Twitter list and you can subscribe to that list. I'm not guaranteeing it will be exact or up to date with registrations. I haven't found a good way to automatically add registrants to the list.
  • Read others' posts and comment on them
  • Keep posting

Long time Junebloggers - how about sharing your tips or ideas for topics and themes in the comments of this post to welcome the newbies to the challenge?

PS. I doubt that I will be blogging this June. I have 3 concurrent quilting projects on! But I will enjoy reading some of your posts.

25 March 2017

Finding Open Access moves on

Some time ago in 2013 the OA button arrived and I wrote a bit about it.

Back then it was a bookmarklet that sat with your browser bookmarks. You could use it to identify when and where you were trying to access a paywalled article.

Four years on, it is a browser extension (I'm using Chrome - have not checked Firefox or others) that you can use on the web to locate open access versions of articles or, if none is found, to request that an OA version be made available. The requests are forwarded to researchers so that legal open access copies can be archived in a repository. There is no guarantee that your request will be satisfied, but it helps to communicate to researchers the demand for and importance of OA.

I have used it from our Library discovery system (Primo) and it worked OK, I assume using the DOI in the article record I was viewing. DOI, PMID and some other identifiers may be used.
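As an illustration of how a tool might pick a DOI out of the text of a record page, here is a rough Python sketch. It is not how the extension itself works (I have not looked at its code); the function name `find_doi` is hypothetical, and the pattern is a commonly used approximation that covers most modern DOIs rather than every valid one.

```python
import re

# Assumption: DOIs look like "10.<4-9 digits>/<suffix>"; older or unusual DOIs may not match.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string found in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop sentence punctuation that often trails a DOI in a citation.
    return match.group(0).rstrip(".,;")
```

Run over a citation string or the visible text of an article record, this returns the bare identifier (without any `https://doi.org/` prefix), which is the form a lookup service would typically expect.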

There is also an Unpaywall extension. Its official launch is April 4 2017 - so it is a less mature product. This one is designed to automatically display an indicator of whether an OA version is available while you are viewing an article metadata page. It is not working with my Library discovery system, nor with ResearchGate, nor with the Australian Library Journal on Taylor and Francis pages, but it does work with some journal sites.

Ex Libris has planned to incorporate oaDOI as an option in Alma's uresolver. This will provide a similar kind of finding option to locate open access versions where a DOI is available in a citation in Primo. I'm looking forward to adding that one to our interface. It will be interesting to see what, if any, impact this has on our document delivery service.

For news about these extensions:
Follow @oaDOI_org
Follow Unpaywall

9 September 2016

Research Data Thing 23/23 - Making Connections

This is the last thing! Woot!

I have:
And for now I think that's enough. No doubt opportunities and ideas will arise from this experience.

Thank you ANDS and fellow thingers.

Research Data Thing 22/23 - What's in a name

The penultimate thing!

I've been listening to more podcasts lately, so instead of sharing videos as suggested in the thing, here are some podcasts that might be interesting on big data topics.

  • Data Skeptic - short episodes exploring data concepts and longer interviews with practitioners on data science.

4 September 2016

Research Data Thing 21/23 - Tools of the (dirty data) trade

Thing 21 is about dirty data and some strategies and tools for fixing data issues.

I have been involved in implementing data systems at work that required data migration and feeds from other systems with transformations, so the pitfalls of dirty data are quite familiar. For example, we built an organisation code structure in a new system based on partial strings from a payroll system, and we sourced person records from two separate systems and deduplicated them (people who were both staff and students). The problems soon started appearing during the testing phase, particularly as we looked at report generation and business processes that relied on choosing a specific record.

One of the difficulties was individuals who had name variations between the two systems but were in fact the same person. Sometimes the only way these were found was through someone knowing that a staff member had changed their name, or used a diminutive in their student record. This led to changing some business processes to help identify persons between the two systems.
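To make the name-variation problem concrete, here is a small Python sketch of matching staff and student records on a normalised name. It is not what our systems did; the field names (`id`, `given`, `family`), the function names and the tiny diminutive table are all made up for illustration.

```python
# Hypothetical lookup of diminutives to full given names; a real list would be far longer.
DIMINUTIVES = {"liz": "elizabeth", "bill": "william", "kate": "katherine"}


def name_key(given, family):
    """Build a comparison key: trimmed, lower-cased, diminutives expanded."""
    given = given.strip().lower()
    given = DIMINUTIVES.get(given, given)
    return (given, family.strip().lower())


def find_matches(staff, students):
    """Return (staff id, student id) pairs whose normalised names agree."""
    students_by_key = {}
    for record in students:
        key = name_key(record["given"], record["family"])
        students_by_key.setdefault(key, []).append(record)

    matches = []
    for person in staff:
        key = name_key(person["given"], person["family"])
        for record in students_by_key.get(key, []):
            matches.append((person["id"], record["id"]))
    return matches
```

The pairs this returns are only candidates for a person to review, since two different people can share a name. It also does nothing for someone whose family name has changed, which is exactly the case that needed local knowledge and a change in business process.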

This thing talks about using Google Spreadsheets and a scraping extension to gather tabular data from websites. In the past, when websites used <table> tags in the HTML, it was relatively easy to import tables directly into Excel using the method in this video. I was hoping to try it again, but could not find a suitable table to play with. (Sites mostly seem to use these tags for ads now, and alternative methods for tabular data!)

The feature to do this is available in Excel 2016 in the data ribbon.

This is my first time trying Google Spreadsheets for scraping data, so here is a table from the Wikipedia page on Australia at the Olympics.

Medals by Summer Games

In the Wikipedia page the column "Totals" has bold text. In the scraped data the wiki encoding for bold has been captured as asterisks surrounding each value - a prime candidate for some cleansing.
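The asterisk cleanup could be done in a few lines of Python once the scraped column is exported. This is only a sketch: the function name `clean_total` is mine, and it assumes each cell arrives as text with the asterisks on the outside of the value.

```python
def clean_total(cell):
    """Strip surrounding asterisks and whitespace; return an int when the rest is all digits."""
    text = str(cell).strip().strip("*").strip()
    return int(text) if text.isdigit() else text


# Example with made-up cell values in the shape described above.
cleaned = [clean_total(value) for value in ["**58**", "*5*", " 12 ", "Total"]]
```

Cells that are not plain numbers, such as a header, are returned as text with only the asterisks removed, so nothing is silently lost.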

I was going to have a go with OpenRefine, but it was downloaded on a different computer and I can't be bothered shifting gears to finish this on the other one.