Archive for the “search” Category

Continuing on the theme of trust as a key component of the Semantic Web, and hot on the heels of Hakia’s announcement that they would seek help from librarians to build their search engine with credible websites, comes the announcement a few weeks ago of Reference Extract, this time from inside the library world. This project is being headed by two well-known LIS academics, Mike Eisenberg and David Lankes, and is cooperatively developed by Syracuse University, the University of Washington and OCLC with grant funding from the John D. and Catherine T. MacArthur Foundation. I’m particularly interested in Lankes’ involvement as he has spoken regularly about the concept of participatory librarianship, and is interested in discoverability of library resources.

Rather than having librarians actively select credible sites, as in Hakia’s model, Reference Extract will do as its title suggests and mine data about the websites that librarians recommend in virtual reference queries. While this is an interesting, low-impact approach, I still wonder how qualified we librarians actually are to assess credibility. Yes, we can look at the mechanics of a site to determine whether it looks authentic, but a subject expert may be better placed to speak to the content, especially as misleading sites become harder and harder to tell apart from authentic ones by surface examination alone.

I also wonder, how many sites do librarians recommend in virtual reference inquiries? In my experience, librarians don’t recommend websites all that often in an academic environment other than online subscribed content, a few government sites, and depending on the client, repositories and preprint archives.

I am interested to see how this project progresses and the kinds of results that emerge. How often are government sites recommended? Professional association sites? UN sites? What else do, or should, we recommend? How often do we point clients to open access journals and repositories?
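Questions like these come down to tallying which sites turn up in reference transcripts. Here is a minimal sketch of that kind of tally. It is my own illustration, not Reference Extract's method: the function name, the URL pattern and the sample transcripts (which use placeholder example.org and example.gov addresses) are all assumptions.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Rough pattern for URLs embedded in chat text (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")

def count_recommended_sites(transcripts):
    """Count how many transcripts mention each host at least once."""
    counts = Counter()
    for text in transcripts:
        # Use a set so a host cited twice in one transcript is counted once.
        hosts = {urlparse(url).netloc.lower() for url in URL_PATTERN.findall(text)}
        counts.update(hosts)
    return counts

# Hypothetical transcripts with placeholder addresses.
sample_transcripts = [
    "Try https://www.example.org/stats and https://www.example.org/census",
    "See http://data.example.gov/report for the figures",
    "https://www.example.org/ may also help",
]

site_counts = count_recommended_sites(sample_transcripts)
```

Counting per transcript rather than per link keeps one enthusiastic answer from dominating the totals. Grouping hosts by type (government, association, repository) would be the next step.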

It’s still early days for the project, as Barbara Quint notes in “The Wisdom of Crowds of Librarians Is on the Way-In Time: Reference Extract”:

“Over the next 2 months, the team will conduct a variety of meetings and solicit comments with a blog on the website. They will release news and notes, hold webinars, appear at a national conference, and even stream a video blog. All this is aimed at creating a proposal which, according to Lankes, they ‘hope to implement next year, building it and running it by people and then rolling out real services sometime in 2010.’”

There’s more commentary about Reference Extract available at:

  1. Calling all librarians – Reference Abstract, Allan’s Library
  2. Reference Extract in the Press, Reference Extract blog
  3. OCLC, Syracuse University and University of Washington to help develop a new Web search experience based on expertise from librarians
  4. The Wisdom of Crowds of Librarians Is on the Way-In Time: Reference Extract, Infotoday Newsbreaks, November 24 2008


If there were any doubt that libraries belong in semantic web development, check out these two examples of semantic web applications that are emerging. The majority of applications available right now (or in preview/beta) focus on information management and search. Sounds like our business to me. With 2008 predicted by some as the year of the semantic web, the time for libraries to take a closer look is now.

Freebase

Pulls in data from datasets such as Wikipedia and creates relationships and meaning between different data points. Their example of usage:

For example, if you ask Freebase for Jennifer Connelly films with actors who have appeared in a Steven Spielberg movie, you’ll get a tidy list of eight movies.

Also includes an API to add Freebase results to your own web projects. Possibly useful at the refdesk.
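To show what a relationship query like the Jennifer Connelly example involves, here is a minimal sketch over a tiny in-memory dataset. It does not use the Freebase API or its data. The films, actors and directors are hypothetical placeholders, and the function name is my own.

```python
# Hypothetical placeholder data: which actors appear in which film,
# and who directed each film. None of this comes from Freebase.
cast = {
    "Film A": {"Actor 1", "Actor 2"},
    "Film B": {"Actor 1", "Actor 3"},
    "Film C": {"Actor 4", "Actor 5"},
    "Film D": {"Actor 2", "Actor 6"},
}
directors = {
    "Film A": "Director X",
    "Film B": "Director Y",
    "Film C": "Director X",
    "Film D": "Director Y",
}

def films_with_costars_directed_by(star, director):
    """Films featuring `star` where some co-star has worked with `director`."""
    # Step 1: everyone who has appeared in any film by the director.
    directed_actors = set()
    for film, film_director in directors.items():
        if film_director == director:
            directed_actors |= cast[film]
    # Step 2: the star's films that share at least one co-star with that group.
    matches = []
    for film, actors in cast.items():
        co_stars = actors - {star}
        if star in actors and co_stars & directed_actors:
            matches.append(film)
    return sorted(matches)

result = films_with_costars_directed_by("Actor 1", "Director X")
```

The query chains two relationships (actor to film, film to director), which is the kind of linking a plain keyword search cannot express.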

If you’re a statistics person, this concept will be very familiar to you. Essentially, it’s a giant database of cross-tabulated data. But unlike the kinds of data that you might be used to cross-tabulating, such as government statistics or census data (as I used to do), Freebase has a big difference inherent in the datasets chosen to populate its database:

Because Freebase lets anyone edit the data, there’s always a chance that somebody has, intentionally or unintentionally, introduced a mistake.

Hakia

Aims to retrieve more relevant, reliable search results. As librarians used to teaching all kinds of complicated methods for retrieving better results, not only from search engines but also from scholarly databases, we might wonder whether such a goal is possible! This and other semantic search projects will be interesting to watch over the next year.


ReadWrite Web has a recent article on how semantic search works, Semantic Search: Myth and Reality. The article points out that semantic search will not solve all our search problems; that is simply impossible. For those working with datasets, there will always be a great deal of cross-tabulation, manipulation (to reformat and present data) and manual work to be done to bring together answers to queries. Yes, semantic search might help save time in the initial stages of a query by giving more meaning to the terms we query with, but search is still, at its heart, a human construct and will always be open to interpretation and error. The article makes the important point that:

These are computationally challenging problems that really have nothing to do with understanding semantics. The misconception has been perpetuated since early days of the Semantic Web that somehow, because we will annotate the web, we will be able to solve these super complex problems. This is simply not true. There are fundamental limits to what we can compute, and a class of problems that have an exponential number of possible solutions is not going to be magically solved because we represent data as RDF.

Right now we search based on word occurrence (ignoring for the moment Google’s fancy rankings, inbound link rankings, and other fancy search criteria and patterns in high-end databases). In the future we will search using semantics for words+concepts+inferred trust. We will probably never be able to search purely by opinion or emotion. And that’s fine – because we can make that judgement for ourselves. We’re human after all.
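To make “search based on word occurrence” concrete, here is a minimal sketch of a word index and an all-words-must-match lookup. It is a toy illustration and not how any real search engine is implemented. The documents and function names are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Made-up documents for illustration.
docs = {
    "d1": "Jaguar cars are fast",
    "d2": "The jaguar is a big cat",
    "d3": "Big cats of South America",
}
index = build_index(docs)
jaguar_hits = search(index, "jaguar")
big_cat_hits = search(index, "big cat")
```

A search for “jaguar” returns both the car and the animal, and “big cat” misses the document that says “cats”. Those are the gaps that semantics (words+concepts) is meant to narrow.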
