Archive for the “semantic web” Category

Allan Cho and Dean Giustini have recently published an article on Web 3.0 and Health Libraries [PDF]. It’s a good introduction to Web 3.0 and the Semantic Web in their field of librarianship. They discuss some of the definitions of Web 3.0 and their own vision, noting -

“The common theme here is a focus on information organization and retrieval”

Cho and Giustini explain why Web 3.0/Semantic Web will be good for health librarians - because it will improve the accuracy and efficiency of searching. Searching for health information is currently difficult, not only because the big databases demand a great deal of specificity (MeSH helps some, but not everyone can/will use it) but also because working out what information is reliable can be very difficult.

The article describes other problems with current search technologies - it is easy to miss important information because of the way searches are constructed and the limitations of databases. Findability and trust are important.

Library standards and other projects are also discussed, like RDA and RDF. But one very important point that Cho and Giustini make is that,

“there is a sense that the two groups - library professionals and semantic technologists - do not communicate or see their potential synergies.”

It is important that librarians are participating in discussions about the Semantic Web and RDF, but at the same time, we should be welcoming in those who might be interested in how we do things.

Reference:
Cho, A., Giustini, D. (2008). Web 3.0 and health librarians: an introduction. Journal of the Canadian Health Libraries Association, 29(1), 13-18.

Comments 4 Comments »

Several blogs have posted about Web 3.0 recently, most trying to come up with a central set of ideas about what it might be. For some, Web 3.0 = Semantic Web; for others, the Semantic Web is just a part of it. My own take, written in October last year and assuming there is such a thing as Web 3.0, can be found on the About page of this blog:

  • Semantic web: True write once, publish many: hamstrung until now by proprietary software, proliferation of XML schemas, and a lack of end-user tools.
  • Metadata: Meaning and context within and between objects, new languages.
  • Rich open data: Geotagging, eScience data for everyone
  • Content anywhere, especially mobile
  • Make your own software: bringing software and tool creation to the masses
  • Two opposing ends: on demand anywhere (video, TV, radio, text), lightweight flexible architecture

Politics and governance issues will continue to evolve to bring:

  • Ubiquitous Open Access
  • Access to Knowledge (A2K) in the developing world

I think it is important to keep in mind the political and governance issues surrounding the web. The technical part of Web 3.0 is not possible without supportive research, funding, and policies. Additionally, if Web 3.0 is to improve people’s lives by making communication and information management easier, it has to include all kinds of knowledge (social, government, entertainment, scholarly) and be accessible to people all over the world regardless of language, socioeconomic and geographical barriers.

While I enjoyed all the good things the supposed Web 2.0 movement offered - community, interaction, etc - I am hoping that Web 3.0 will pay attention to the difficult issues around data, scholarly communication, and dissemination of research. Linked data is an area of research on this topic, but I hope to see more standards, policies and funding in this area.

Jonas Bolinder posted a study of the different ideas people have about Web 3.0, which fell into four categories -

  • Semantic Web
  • APIs and Web Services
  • Mobile Web and other devices
  • Implicit Web (personalisation and recommendation)

There’s a little bit of each of these in my view of Web 3.0.

A post on Read/Write Web, Web 3.0 Through the Ages, sums up some of the current thinking around the term, valid or not, and concludes -

“…the discussions we have about defining the next web help to solidify our vision of where we’re going — and you can’t get there until you decide where you want to go. “

I agree, and I’m interested to see where the discussions lead next.

Other recent posts on Web 3.0:

Comments 2 Comments »

Web sites and applications burst on the scene out of nowhere, attract massive usage and undergo continual improvements to make them better. We wonder how we ever got along without them, until they get bought out, put up access restrictions or paywalls, or just disappear.

Libraries have long been concerned with preserving information for the future, and increasingly that includes digital information and websites (for example, Pandora at the National Library of Australia, which archives everything from blogs to the 2000 Games site).

So where do they intersect? And how can we take a more proactive approach to design for sustainability rather than saving retrospectively? The Semantic Web is all about linking, openness, and relationships between data. In some ways the Semantic Web is, in my view, how we will move towards a more Sustainable Web.

What might the Sustainable Web be?

Adapting the Triple Bottom Line approach to sustainability, web developers and those who create data could take a lifecycle approach to how they create, manage and produce sites and information. When planning a new website, dataset or service, in addition to deciding on purpose, standards and features, you could also include a statement about how you would -

  • Distribute the data if you were no longer maintaining the site (using a LOCKSS principle, perhaps?)
  • Migrate to future standards
  • Ensure that your site is indexed in the Internet Archive (all pages and data, not just the index)
  • Give people ownership of their data (if you’re running an online service where people store or save information) so they can get it out when they want, or own it if the site closes or the terms of service change significantly (e.g., in the instance of a buyout).

Depending on what type of site it is, there may be governance and political impacts now or in the future. If you’re running a scientific research portal, how might changes in government policy affect the site? What obligations might be imposed on sharing or accessing the data you provide?

Using open standards as the backbone

A starting point is to use open standards. In addition to W3C standards most of us already know (like HTML and CSS), we can extend this to Semantic Web standards like OWL and RDF. Adherence to standards allows information to be interpreted correctly, exchanged, and migrated to newer standards in the future. Standards may also make it easier to hand datasets over to someone else or distribute copies to keep them accessible. It’s a key part of understanding the potential of the Semantic Web, according to this summary of a talk by Nova Spivack at last week’s The Next Web -

“The semantic web is not so much about “semantics” as it is set of open standards defined at W3C. The semantic web approach builds on open standard meta data which is in line with previous presentations that supported the open data approach. The idea is that everyone profits from everyone’s metadata. The semantic web is a compromise in making the data smarter and the software smarter. It is the best of both worlds.”

Keeping data usable

Over the past two years, libraries, museums, companies and other organisations have set up pages in Facebook, MySpace and other social networking sites. In some libraries, this is the work of an emerging technologies specialist; in others it’s an added role for an individual that may or may not be sustained if that person leaves or changes job focus.

Whatever the situation, it’s not the best use of time to have to create a new profile and create networks in every service. This is where a move towards data standards and portability is a plus. Being able to move data between and in/out of these services saves time and sustains online networks and communities. Data Portability is one of the major projects looking at these issues. According to Chris Saad from the project, “The new innovation platform is data” and this is certainly true if looking at things from a Semantic Web point of view.

Libraries and the sustainable web

A recent article in Interactions stresses the importance of designing for sustainability of content on the web - the authors note that libraries and other cultural institutions will be at the heart of these efforts,

“Digital technology makes it possible to extend the walls of the archive beyond a single space or person, as well as ensure preservation and access in locations around the world [...] Libraries, museums, and archives will need to collaborate with business interests to build lasting social structures that are sustainable over time.” (Churchill E, Ubois J, 2008)

Libraries have played a significant role in participating in a variety of digital and web preservation projects over the years, but what’s the next step? How do we get more involved in conversations that take place in business?

———–
Churchill, E, Ubois J. 2008. Designing for Digital Archives. Interactions. March/April 2008. Retrieved from: http://interactions.acm.org/content/?p=1089 (full text via ACM Portal)

Comments 3 Comments »

Two projects and some food for thought, all new this week and all in our sector.

Open Calais and Museums

The Powerhouse Museum is using Reuters’ Open Calais service, launched earlier this year, to generate tags for their online collections. The ReadWrite Web article notes that,

“That the museum has so much of its collection online is actually quite impressive in its own right. About 70% of the museum’s electronically documented collection is online in the database which went live in June 2006. Museum objects are searchable, taggable (by humans) and painstakingly described.”

A bit of extra background about this museum that’s interesting for libraries (it’s located in my home state in Australia): the Powerhouse is government-funded by the state, and it falls under the Arts portfolio, which also includes the State Library. There’s some healthy competition between them, which is a good thing - libraries are looking to the museum sector for ideas and inspiration. We should look at the museum sector more, especially as their online education and digital programmes increase. The Powerhouse is also behind a project to create an email archive, complementing the National Library’s Pandora program, which archives Australian websites.

LCSH as Linked Data

Here’s something interesting and fun - take a look at the Library of Congress Subject Headings represented as Linked Data. The site makes use of several Linked Data browsers which provide a different type of interface to browse through the headings. The good thing about this project is that it uses a concept most librarians would be familiar with (subject headings), which may make understanding a new concept (Linked Data) easier.
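For readers who have not seen Linked Data before, here is a minimal sketch of the idea in Python. It assumes a SKOS-style vocabulary (the skos:prefLabel and skos:broader properties are real W3C terms), but the example.org URIs and the three headings are made-up placeholders, not actual Library of Congress identifiers, and the tiny in-memory list stands in for a real RDF store.

```python
# Each statement is a (subject, predicate, object) triple.
# The example.org URIs are hypothetical placeholders for illustration only.
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/subjects/"

triples = [
    (BASE + "libraries", SKOS + "prefLabel", "Libraries"),
    (BASE + "medical-libraries", SKOS + "prefLabel", "Medical libraries"),
    (BASE + "medical-libraries", SKOS + "broader", BASE + "libraries"),
    (BASE + "hospital-libraries", SKOS + "prefLabel", "Hospital libraries"),
    (BASE + "hospital-libraries", SKOS + "broader", BASE + "medical-libraries"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def label(uri):
    """Return the preferred label for a heading, or the URI if none is stated."""
    labels = objects(uri, SKOS + "prefLabel")
    return labels[0] if labels else uri


def broader_chain(uri):
    """Follow 'broader' links upwards and collect the labels along the way."""
    chain = []
    seen = {uri}  # guards against accidental cycles in the data
    current = uri
    while True:
        parents = objects(current, SKOS + "broader")
        if not parents or parents[0] in seen:
            break
        current = parents[0]
        seen.add(current)
        chain.append(label(current))
    return chain


print(broader_chain(BASE + "hospital-libraries"))
```

The point of the sketch is that the broader/narrower relationships librarians already know from subject headings become links that software can follow, which is roughly what a Linked Data browser does when it moves from one heading to the next.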

Semantifying existing content

Richard Akerman at Science Library Pad writes about the Five Laws of Library Science and adds two new laws for the machine. He discusses how and where people might add new information to existing content to make it Semantic, but adds the caveat -

“Now I have no Semantic Web illusions that people are going to nobly go back and markup all their content with semantic information, that vision is a fantasy that lingers with us from the SGML days and it’s never going to happen.”

And didn’t we say the same about metadata, migrating between HTML versions, etc etc?

Akerman goes on to discuss the advantages of microformats and points out that findability is important -

“Even a slight advantage in discovery can be a huge motivator to people.”

I agree. If people can make their content, data, or whatever they have on the web easier to find with little effort, they will do it. When blogging first arrived, there was no way of notifying people that you’d updated. Now, you wouldn’t think of having a blog without a feed. Although it’s mostly part and parcel for new bloggers these days, a few years back it required some effort to add this functionality (I used to handcode mine before I started using Movable Type). RSS added great benefits to blogs for relatively little effort and that’s how the Semantic Web has to be too.
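To give a sense of how little is involved, here is a rough sketch of generating a bare-bones RSS 2.0 feed with Python’s standard library. The blog title, the example.org links and the build_feed helper are all hypothetical, and a real feed would normally carry more fields (dates, descriptions per item, and so on).

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, posts):
    """Build a minimal RSS 2.0 document and return it as a string.

    `posts` is a list of dicts with "title" and "link" keys
    (a made-up structure for this sketch).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
    return ET.tostring(rss, encoding="unicode")


# Placeholder blog details, for illustration only.
feed = build_feed(
    "An example library blog",
    "http://example.org/blog/",
    "Posts about libraries and the Semantic Web",
    [
        {"title": "LCSH as Linked Data", "link": "http://example.org/blog/lcsh"},
        {"title": "The Sustainable Web", "link": "http://example.org/blog/sustainable"},
    ],
)
print(feed)
```

A few lines of structured markup are enough for feed readers to discover new posts, which is the kind of low-effort, high-reward step the Semantic Web needs to offer.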

Comments 3 Comments »

If you are finding your way to the blog via the Talking with Talis podcast, hello! I wanted to expand on why I am interested in the Semantic Web as I only briefly touched on this in the podcast.

eResearch and Data

A couple of years ago, I attended a conference where the theme was eResearch. Librarians described how they have responded to the challenges of managing datasets, ever-increasing amounts of raw information and data, as well as grey literature, preprints, and other publications. Several scientists also gave perspectives on how they thought libraries could assist with their research. The scientists discussed the issues with being able to collect so much data, the increased complexity in manipulating it, and how so much of their work has shifted online and, in some fields, to Open Access. This led me to think about how librarians can work with researchers to assist them better, beyond what we do now. How can we assist with the way data is structured and shared, and perhaps even become part of research teams, assisting with the gigabytes upon gigabytes of data that teams create, use and share?

At the same time, there was a growing focus on research metrics, quality and impact in several countries. We know how limited ISI is, so what else can we build to do this better? How can we trace data through a published presentation back to where it was created? How can we connect ideas, people and projects online to find collaborators and like-minds in a field?

The conference didn’t mention the Semantic Web as a way to assist with these issues, but to me, as I read more about the concepts behind it, it seems a logical fit for Open Data, Open Access and issues of managing gigabytes of data.

Looking to the future

Other ideas, like Next Generation Catalogues, are also really interesting. What is important is a focus on the structure and quality of the data we have in the catalogue. There is no point in bells-and-whistles presentation features like tag clouds and facets without good data to work with. There’s a growing number of librarians who are focusing on this and taking a strong interest in RDA and other projects.

Beyond these issues, I’m interested in what’s next. I think libraries are a natural fit for the Semantic Web because of its emphasis on RDF, and data and metadata. In some ways it’s a return to what we do best - organising information, provenance, databases.

I do think that this involvement should not come just from technical staff, cataloguers and research librarians - there really does need to be involvement from all types of librarians to ensure that we are really participating in projects that meet user needs, not just in libraries but on the Internet as a whole.

Comments 1 Comment »