Last week there was a flurry of comments around a post by Bret Taylor, "We need a Wikipedia for data". Taylor describes a model for a wiki that would aggregate common data in a single database that could be cross-searched. Great idea.
One interesting thing about the types of datasets he mentions is that they are all copyrighted: stations own TV schedules, exchanges own market data (the free stuff is usually delayed by 20 minutes), and a variety of companies own publishing rights over telephone numbers. This is the data that could be really useful if it were truly free, but given the amount of updating required, I wonder who would maintain it without a business or legislative imperative.
But that issue is perhaps beside the point. There are many, many incredible datasets out there, covering everything from Census data to older market information to astronomy. Reading the comments and suggestions on Taylor’s post and on Read/Write Web’s post about the topic turned up dozens of sites where these resources can be found.
Looking through the list, though, I felt that libraries may have missed an opportunity. We have been recommending and linking to various datasets on our websites for years, but there is huge potential to go beyond this, build something collaboratively and use it as an input for different libraries. Many libraries now load records for open access journals into their catalogues and search engines via the DOAJ (Directory of Open Access Journals), and there is no reason we could not do something similar for open data.
Certainly, it is a problem that few of these datasets can talk to each other, but perhaps the move towards a more standards-based Semantic Web will encourage standardisation and interoperability, at least within individual government departments, for example, so that Census records can be analysed against education records.
One of the sites recommended by Read/Write Web is CKAN, which is backed by the Open Knowledge Foundation, whose leadership includes someone who has worked in the library sector. Are these the kinds of groups more of us should be getting involved in if we want a role in information access on a larger scale?
If there was any doubt that libraries belong in semantic web development, check out these two examples of the semantic web applications now emerging. Most of the applications available right now (or in preview/beta) focus on information management and search. Sounds like our business to me. With some predicting 2008 as the year of the semantic web, now is the time for libraries to take a closer look.
Freebase pulls in data from sources such as Wikipedia and creates relationships and meaning between different data points. Here is their example of it in use:
For example, if you ask Freebase for Jennifer Connelly films with actors who have appeared in a Steven Spielberg movie, you’ll get a tidy list of eight movies.
It also includes an API for adding Freebase results to your own web projects, which could be useful at the reference desk.
If you’re a statistics person, this concept will be very familiar to you. Essentially, it’s a giant database of cross-tabulated data. But unlike the data you might be used to cross-tabulating, such as government statistics or census data (as I used to do), Freebase has one big difference built into the way its database is populated:
Because Freebase lets anyone edit the data, there’s always a chance that somebody has, intentionally or unintentionally, introduced a mistake.
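To make the idea of "relationships between data points" a little more concrete, here is a minimal sketch of how a question like the Connelly/Spielberg one can be answered by following links between facts. It is not Freebase's query interface: it assumes a hypothetical in-memory list of (subject, predicate, object) triples, with placeholder film and actor names and illustrative predicate names (`directed_by`, `starring`) rather than Freebase's real schema.

```python
# Hypothetical toy data: placeholder names, not real filmographies.
triples = [
    ("Film A", "directed_by", "Director D"),
    ("Film A", "starring", "Actor X"),
    ("Film B", "starring", "Star S"),
    ("Film B", "starring", "Actor X"),
    ("Film C", "starring", "Star S"),
    ("Film C", "starring", "Actor Y"),
    ("Film D", "directed_by", "Director D"),
    ("Film D", "starring", "Actor Z"),
]


def subjects(triples, predicate, obj):
    """All subjects linked to obj by predicate."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """All objects linked from subject by predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def films_linking_star_to_director(triples, star, director):
    """Films starring `star` that share a cast member with any film by `director`."""
    # Step 1: collect every actor who appeared in one of the director's films.
    alumni = set()
    for film in subjects(triples, "directed_by", director):
        alumni |= objects(triples, film, "starring")
    # Step 2: keep the star's films whose other cast members overlap that set.
    result = []
    for film in sorted(subjects(triples, "starring", star)):
        co_stars = objects(triples, film, "starring") - {star}
        if co_stars & alumni:
            result.append(film)
    return result


print(films_linking_star_to_director(triples, "Star S", "Director D"))  # prints ['Film B']
```

The point of the sketch is that no single record holds the answer; it only emerges by hopping from director to films to actors and back again, which is why the quality of every individual link (including any user-edited mistakes) matters.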
The second example aims to retrieve more relevant, reliable search results. As librarians used to teaching all kinds of complicated methods for getting better results, not only from search engines but also from scholarly databases, we might wonder whether such a goal is possible! This and other semantic search projects will be interesting to watch over the next year.
Two articles out this week describe semantic projects which aim to improve findability and personalisation:
An EU-funded project is described in "Next-generation hi-fi: deepening the musical experience". One of the aims of the project, called Semantic Hi-Fi, is to increase the number of ways in which you can search for music:
As a result of this work, users of future hi-fi can expect to be able to navigate easily through their collections using search criteria, such as tempo, genre, instrumentation, in addition to the traditional search criteria of artist and title. If you have a particular tune running through your head, but no information on it, you can simply hum the tune into the system’s microphone and it will find it for you!
When I worked in a music library, we used to include all of this information in the catalogue, but it was done manually, and only if the information was printed on the CD. This project goes much further by analysing the audio file itself to extract the data.
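Once tempo, genre and instrumentation sit in a record as separate fields, searching on them is straightforward. The following is a minimal sketch under that assumption, showing each criterion as an optional facet over catalogue records. The `Recording` fields, the `search` function and the sample records are all hypothetical placeholders; the audio analysis that Semantic Hi-Fi uses to extract these values, and its hum-to-search feature, are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    """A hypothetical catalogue record with audio-derived fields."""
    title: str
    artist: str
    genre: str
    tempo_bpm: int  # tempo in beats per minute
    instrumentation: frozenset = field(default_factory=frozenset)


def search(records, genre=None, min_bpm=None, max_bpm=None, instrument=None):
    """Return titles of records matching every criterion supplied.

    Criteria left as None are ignored, so each one acts as an optional facet.
    """
    hits = []
    for r in records:
        if genre is not None and r.genre != genre:
            continue
        if min_bpm is not None and r.tempo_bpm < min_bpm:
            continue
        if max_bpm is not None and r.tempo_bpm > max_bpm:
            continue
        if instrument is not None and instrument not in r.instrumentation:
            continue
        hits.append(r.title)
    return hits


# Placeholder records for illustration only.
catalogue = [
    Recording("Track 1", "Artist A", "jazz", 92, frozenset({"piano", "double bass"})),
    Recording("Track 2", "Artist B", "jazz", 160, frozenset({"saxophone", "piano"})),
    Recording("Track 3", "Artist C", "classical", 70, frozenset({"piano"})),
]

print(search(catalogue, genre="jazz", instrument="piano", max_bpm=100))  # prints ['Track 1']
```

The search logic is the easy part; what changes with a project like this is where the field values come from, since they no longer depend on someone keying in whatever happened to be on the CD.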
Libraries which circulate music may have an interest in this type of semantic technology - it could assist with research and reference questions about music, especially classical and jazz in academic environments, and in recommending new titles for clients. Further information is available on the project’s website.
The second article is about ubiquitous computing, but it raises some interesting possibilities for the semantic web and libraries. "Democratic Parties: An Interview With UCLA Computer Scientist Kevin Eustice" discusses applications that work out a person's social and location context and make recommendations accordingly. Eustice gives the example of providing local information at museums:
A pretty straightforward application we’ve considered is extending this to support context-aware museum experiences. You can provide media content directly to a device; you can customize it to social groups; you can customize based on the age or language of a person visiting the exhibit.
Some libraries in Asia have started using barcode technology to trigger downloads of audio and data to mobile phones and other handheld devices. Technology like the kind Eustice describes is another way of providing experiences tailored to individuals.