Tietolinja

News 1/1999




Uniform Resource Names

Juha Hakala






URNs, or Uniform Resource Names, are persistent and unique identifiers of Internet documents. The URN given to a document will never change as long as its intellectual content remains the same, and URNs that have been used are never given to other documents. Once allocated to documents, URNs can be utilised in Internet information retrieval in many different ways.

URNs have been discussed in the Internet community for a while. The first attempt to develop them came to a halt in 1995. The second attempt, which has been more successful, was launched in 1996, when the Internet Engineering Task Force (IETF) established the URN Working Group. As of this writing (November 1998) the working group is finalising its work. The URN syntax was defined in RFC 2141 in May 1997. Work since then has concentrated on standardising URN delivery mechanisms and the Internet infrastructure needed to resolve URNs into document location information (URLs). In addition to the standards, the group has also developed applications which demonstrate the feasibility of the framework. A good introduction to the work of the URN Working Group has been written by the co-chair of the group, Leslie Daigle, and other URN pioneers (Daigle).

URN basics

URN is an "umbrella" system: it can incorporate all current and future identifiers. An Internet document, RFC 2288, already defines how ISBN, ISSN and SICI can be used as URNs. It is equally easy to use Digital Object Identifiers or national bibliography numbers as Uniform Resource Names. By the way, all DOIs are valid URNs, but not all URNs can be used as DOIs.
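As a rough illustration of turning an ISBN into an ISBN-based URN, the Python sketch below first checks the ISBN-10 check digit (weights 10 down to 1, weighted sum divisible by 11) and only then builds the URN. The function names are hypothetical, only 10-digit ISBNs are handled, and removing hyphens is a simplifying assumption of this sketch; RFC 2288 remains the authority on the exact lexical form.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10: weights 10..1, weighted sum must be divisible by 11."""
    digits = isbn.replace("-", "").upper()
    if len(digits) != 10 or not digits[:9].isdecimal():
        return False
    last = digits[9]
    if not (last.isdecimal() or last == "X"):  # "X" stands for the value 10
        return False
    values = [int(c) for c in digits[:9]] + [10 if last == "X" else int(last)]
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


def isbn_to_urn(isbn: str) -> str:
    """Build an ISBN-based URN; hyphens are removed (a simplifying assumption)."""
    if not isbn10_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    return "URN:ISBN:" + isbn.replace("-", "").upper()


if __name__ == "__main__":
    print(isbn_to_urn("9282776433"))  # URN:ISBN:9282776433
```

Validating before building the URN matters because a mistyped ISBN would otherwise yield a syntactically fine URN that points at the wrong book, or at nothing.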

How does URN accommodate all other identifiers? Every URN consists of three parts separated by colons: the string "URN", a namespace identifier, and a namespace specific string. The "URN" string enables Internet indexing applications to locate URNs easily in any kind of text document. The namespace identifier (NID) identifies the identifier system used within the URN. For instance, ISBN will most probably receive the NID "ISBN", and the NID "NBN" has been reserved for national bibliography numbers. The namespace specific string contains the actual identifier, such as 9282776433. An example of an ISBN-based URN is then URN:ISBN:9282776433. URN as such does not identify electronic books; it needs another identification system to provide the actual identifiers.
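The three-part structure can be checked mechanically. The following Python sketch is a simplified reading of the RFC 2141 grammar (case-insensitive "urn:" prefix, an NID of 1 to 32 letters, digits or hyphens not starting with a hyphen, and a non-empty namespace specific string of URN characters or %-escapes). It is an illustration under those assumptions, not a complete validator: namespace-specific rules such as ISBN check digits are not enforced.

```python
import re

# Simplified reading of the RFC 2141 grammar:  "urn:" <NID> ":" <NSS>
# NID: 1-32 letters, digits or hyphens, not starting with a hyphen.
# NSS: one or more URN characters or %-escapes (two hex digits).
URN_PATTERN = re.compile(
    r"urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,31})"
    r":(?P<nss>(?:[A-Za-z0-9()+,\-.:=@;$_!*'/?#]|%[0-9A-Fa-f]{2})+)",
    re.IGNORECASE,  # the "urn:" prefix and the NID are case-insensitive
)


def parse_urn(text: str):
    """Return (nid, nss) for a syntactically valid URN, otherwise None."""
    match = URN_PATTERN.fullmatch(text)
    if match is None or match.group("nid").lower() == "urn":
        return None  # the NID "urn" itself is reserved
    return match.group("nid"), match.group("nss")


print(parse_urn("URN:ISBN:9282776433"))  # ('ISBN', '9282776433')
```

Because the NID cannot contain a colon, the first colon after the NID always marks the start of the namespace specific string, which may itself contain further colons.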

Another fundamental difference between URN and, say, ISBN is that URN is a production system: not only an identifier but something far more interesting, an Internet infrastructure which will enable users to locate documents on the Internet. A global URN resolution service will translate URNs into URLs, into metadata related to the document, or into the document itself. Long-term storage of electronic documents, and de-duplication of stored documents, will be a lot easier once the documents have identifiers. The Digital Object Identifier (DOI) will also provide not only identification of documents but also resolution services, and most likely many other services such as copyright control.

URN namespaces

Assignment of namespace identifiers must be a managed process. In November 1998 the Internet Engineering Task Force agreed that the Internet Assigned Numbers Authority (IANA) will be responsible for this work. A soon-to-be-released Internet standard, URN Namespace Definition Mechanisms, specifies the NID registration process in considerable detail. The document defines three categories of URN namespaces: experimental (X-yyy), informal (iana-xxx) and formal. Most of the identifier systems that libraries use belong to the last category.

Since the URN system is now ready for the registration of traditional identifiers used by libraries, we should proceed with registering NBN, ISBN, ISSN and SICI. Helsinki University Library has already sent a proposal for the formal registration of NBN (see http://www.lib.helsinki.fi/urn/NBN_registration.htm). In a similar manner, the ISBN and ISSN agencies can easily register the systems they control. It is important to manage the NID registration process, since duplicate NIDs might ruin the whole URN system. But it may also be difficult for IANA to check whether all applicants really deserve their own URN namespaces. The URN Namespace Definition Mechanisms standard reserves country codes for eventual national registrations of URN namespaces, but it does not give any details on how this activity could be arranged.
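Why duplicate NIDs must be refused can be made concrete with a toy registry. The Python class below is a hypothetical sketch, not IANA's actual procedure: it rejects a second registration of the same NID (compared case-insensitively, since NIDs are case-insensitive) and labels each namespace using the X-yyy / iana-xxx notation quoted above, which the final standard may spell differently.

```python
from typing import Dict, Optional, Tuple


class NamespaceRegistry:
    """Toy NID registry: refuses duplicate NIDs and labels each namespace's category."""

    def __init__(self) -> None:
        # lower-cased NID -> (category, registrant); NIDs compare case-insensitively
        self._entries: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def category(nid: str) -> str:
        """Classify by prefix, following the X-yyy / iana-xxx notation (an assumption)."""
        lowered = nid.lower()
        if lowered.startswith("x-"):
            return "experimental"
        if lowered.startswith("iana-"):
            return "informal"
        return "formal"

    def register(self, nid: str, registrant: str) -> str:
        key = nid.lower()
        if key in self._entries:
            owner = self._entries[key][1]
            raise ValueError(f"NID {nid!r} is already registered to {owner}")
        category = self.category(nid)
        self._entries[key] = (category, registrant)
        return category

    def lookup(self, nid: str) -> Optional[Tuple[str, str]]:
        return self._entries.get(nid.lower())


registry = NamespaceRegistry()
print(registry.register("NBN", "Helsinki University Library"))  # formal
```

A second call such as `registry.register("nbn", "someone else")` raises `ValueError`, which is exactly the kind of collision a managed registration process is meant to prevent.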

Helsinki University Library will propose to the IETF the use of ISBN publisher identifiers for subdividing country-code-based URN namespaces. The required publisher identifiers could be assigned by the national ISBN agencies. Any publisher who uses ISBN would be able to acquire an URN namespace within the country code namespace (for instance URN:FI-9510: or URN:FI:9510: for the largest publisher in Finland) by contacting the national ISBN Agency. The agency would then pass the information required for resolving this URN namespace to IANA. This may not always be necessary: a publisher might use some or all of these identifiers for internal purposes only, and use other systems such as ISBN or NBN for published documents.

The country code is needed because a single country may have more than one ISBN country code. If a publisher has several publisher IDs, it should pick one of them. Small publishers who do not use ISBN, or whose publishing volume is very small, could rely solely on NBN-based URNs. In some cases an active Web publisher could obtain an ISBN publisher identifier mainly in order to be able to provide URNs.
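Since the subdivision scheme is only a proposal, the following Python code is a hypothetical sketch of it. It builds both prefix forms mentioned in the text, URN:FI-9510: and URN:FI:9510:, from a two-letter country code and a numeric ISBN publisher identifier. The function names, the validation rules and the local identifier "report-1998-17" are invented for illustration.

```python
def publisher_namespace(country: str, publisher_id: str, hyphenated: bool = True) -> str:
    """Prefix such as URN:FI-9510: (hyphenated) or URN:FI:9510: (colon form)."""
    if len(country) != 2 or not country.isalpha():
        raise ValueError("expected a two-letter country code")
    if not publisher_id.isdecimal():
        raise ValueError("expected a numeric ISBN publisher identifier")
    separator = "-" if hyphenated else ":"
    return f"URN:{country.upper()}{separator}{publisher_id}:"


def publisher_urn(country: str, publisher_id: str, local_id: str, hyphenated: bool = True) -> str:
    """Append the publisher's own (hypothetical) local identifier to its namespace."""
    return publisher_namespace(country, publisher_id, hyphenated) + local_id


print(publisher_urn("fi", "9510", "report-1998-17"))  # URN:FI-9510:report-1998-17
```

Prefixing the country code keeps the same numeric publisher identifier issued in two different countries from producing the same namespace.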

The decision on whether a publisher really needs an URN namespace should be made at the national level, since IANA does not have the resources to check the validity of such requests on a global scale, whereas the national ISBN Agencies can quite easily find out about the publishers' activities. Another option would be to use DOI publisher IDs (but not DOIs) for subdividing country-code-based URN namespaces. A problem with this approach is that not all publishers who make a significant number of documents available on the Web will ever acquire a DOI publisher ID.

Organisationally this solution would be based on a similar infrastructure, since it seems likely that the national ISBN Agencies will also co-ordinate DOI publisher ID delivery.

As of this writing it is difficult to anticipate the full scale of future URN use. But it is already obvious that it will extend beyond the publishing world to, for example, social security numbers and product IDs. National libraries may not have interests beyond electronic publications, but in this area it is vitally important for us that things are properly identified. To attain the best results, we should have a central role in how URNs are delivered and used. I believe that this is not possible unless we can also, to some extent, control URN namespaces. Because of our unique knowledge of what is published in our countries, we national libraries are in an ideal position to manage this work properly.

URN resolution service

In the future, Internet users will be able to type URNs into the Location field of their Web browsers and get the actual documents, wherever they are on the net. This is not utopia, although it may well be impossible to build a resolution system that covers every Internet document that has an URN.

RFC 2276, Architectural Principles of Uniform Resource Name Resolution, specifies on a general level how the global URN resolution service will work. It will be a two-step process. At the top level there will be a Resolver Discovery Service (RDS), which helps clients find URN resolvers. A "resolver" is a service that translates URNs into URLs (Uniform Resource Locators) or URCs (Uniform Resource Characteristics). Some resolvers may provide direct access to the resources as well.

To resolve an URN, a browser will first contact the Resolver Discovery Service. The RDS is searched with the namespace identifier part of the URN, and in reply the browser gets information on where to find a resolution service for this namespace. That resolution service then processes the namespace specific string part of the URN and translates it into the locations of the document.
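The two steps can be sketched in memory with Python. In this hypothetical sketch the RDS is just a dictionary from lower-cased NID to a resolver function, and each resolver maps a namespace specific string to candidate URLs. In RFC 2276 the RDS is a distributed network service whose design this code does not attempt to model; the table contents and the example.org URL are invented.

```python
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]  # takes an NSS, returns candidate URLs


def table_resolver(table: Dict[str, List[str]]) -> Resolver:
    """Stand-in namespace resolver backed by a plain dict (hypothetical)."""
    def resolve(nss: str) -> List[str]:
        return list(table.get(nss, []))
    return resolve


def resolve_urn(urn: str, rds: Dict[str, Resolver]) -> List[str]:
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = parts
    # Step 1: ask the Resolver Discovery Service which resolver serves this NID.
    resolver = rds.get(nid.lower())
    if resolver is None:
        return []  # no resolver known for this namespace
    # Step 2: that resolver interprets the namespace specific string.
    return resolver(nss)


rds = {"isbn": table_resolver({"9282776433": ["http://books.example.org/9282776433"]})}
print(resolve_urn("URN:ISBN:9282776433", rds))  # ['http://books.example.org/9282776433']
```

The split mirrors the text: the RDS looks only at the NID, and only the namespace's own resolver needs to understand the namespace specific string.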

As of this writing a demonstration of this kind of service is already available, but it requires the installation of a Netscape plug-in. The IETF URN Working Group sees this only as an interim step; in the future this kind of service will be more closely integrated into existing browser technologies.

Two RFCs specify how to establish URN resolution services with existing protocols: RFC 2168 uses the Domain Name System and RFC 2169 uses the HTTP protocol. In the future these protocols can easily be replaced by something else, since URNs are not protocol-dependent. In this respect the URN system is very different from the Persistent URL (PURL), which relies heavily on the HTTP protocol. From the national libraries' point of view, it is important that PURL is only a technology, not a standard. The PURL developer, OCLC, does not have any plans to standardise the PURL system; it was never intended to be more than an interim solution.
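RFC 2169 defines a deliberately simple convention in which a resolver answers HTTP requests of the form /uri-res/<service>?<URN>, where the N2L service maps a URN to a URL. The Python sketch below only builds such a request URL. The resolver host is hypothetical, how a client finds that host (for example through the DNS-based method of RFC 2168) is outside the sketch, and the choice of characters to escape is this sketch's own assumption, so RFC 2169 should be checked for its exact encoding rules.

```python
from urllib.parse import quote

# Characters left unescaped: URN punctuation that is also harmless in an HTTP
# query string. "%" is kept so existing %-escapes in the URN pass through as-is;
# "?" and "#" are escaped so they cannot break the request URL.
_SAFE = ":()+,=@;$!*'/%"


def n2l_request_url(resolver_host: str, urn: str) -> str:
    """URL for an RFC 2169-style N2L (URN to URL) request to a given resolver host."""
    return f"http://{resolver_host}/uri-res/N2L?{quote(urn, safe=_SAFE)}"


print(n2l_request_url("resolver.example.org", "URN:ISBN:9282776433"))
# http://resolver.example.org/uri-res/N2L?URN:ISBN:9282776433
```

Nothing in the URN itself refers to HTTP; only this request convention does, which is why the transport can later be replaced without changing any URNs.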

In the Nordic countries, URN resolution will initially be based on the Nordic Web Index (NWI) application. In NWI one can search with an URN and get URLs as a reply. NWI and PURL-based resolution services share the same handicap: a user needs to know where to go to translate an identifier into URLs. But there is also a fundamental difference: the PURL resolution database has to be maintained manually, whereas the NWI database is updated automatically. Anders Ardö from the Danish Technical Information Center has already made the modifications NWI needs to become a "poor man's" URN resolution service, but as of this writing these features have not yet been implemented in the Nordic NWI systems.
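The advantage of automatic updating can be illustrated with a harvesting sketch. This is not NWI's code, and its internals are not described here; the Python below only shows the general idea of building a URN-to-URL table from harvested pages. It assumes that URNs are embedded in HTML as META tags named DC.Identifier (an assumption of this sketch) and normalises keys by lower-casing the "urn" prefix and the NID, which RFC 2141 treats as case-insensitive, while leaving the namespace specific string unchanged.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, Iterable, List, Tuple


class UrnMetaExtractor(HTMLParser):
    """Collect URNs from <META NAME="DC.Identifier" CONTENT="URN:..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.urns: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser lower-cases tag and attribute names
            return
        fields = {name: value or "" for name, value in attrs}
        content = fields.get("content", "")
        position = content.lower().find("urn:")
        if fields.get("name", "").lower() == "dc.identifier" and position >= 0:
            self.urns.append(content[position:].strip())


def normalize(urn: str) -> str:
    """Lower-case the "urn" prefix and the NID; keep the NSS as-is."""
    _, nid, nss = urn.split(":", 2)
    return f"urn:{nid.lower()}:{nss}"


def build_index(pages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each harvested URN to the sorted URLs of the pages that carry it."""
    index = defaultdict(set)
    for url, page_html in pages:
        extractor = UrnMetaExtractor()
        extractor.feed(page_html)
        extractor.close()
        for urn in extractor.urns:
            if urn.count(":") >= 2:  # skip values too short to be a URN
                index[normalize(urn)].add(url)
    return {urn: sorted(urls) for urn, urls in index.items()}
```

Re-running `build_index` on each harvest refreshes the table without manual work, whereas a PURL-style table changes only when someone edits it.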

Even an OPAC may work as an URN resolution service. A bibliographic record may contain both an URN and URL(s), in which case you can try your luck with URL(s) without checking first whether an URN resolution service is available for this namespace and "knows" about this particular document.

There are two practical reasons why libraries should start using URNs in cataloguing without delay. First, if a library puts e.g. an ISBN and an ISBN-based URN into a MARC record, it may be possible to locate the document via an URN resolution service even when the URL in the record is no longer valid (which may happen within days). In the long run we may, and should, replace URLs with URNs entirely, since the average lifetime of URLs is far too short for our timescale. The other reason is more prosaic: the MARC format has a place for URNs, and therefore any identifier, including for instance SICI and BICI, which do not have a tag in all MARC formats for the time being, can be accommodated immediately.
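The first reason amounts to a simple fallback rule: try the stored URLs, and if they have all gone stale, resolve the URN. The Python sketch below expresses that rule. The record is a hypothetical dictionary with optional "urls" and "urn" keys standing in for the corresponding MARC fields, and the link checker and resolver are passed in as functions so that no particular network library is assumed.

```python
from typing import Callable, List, Optional


def locate(record: dict,
           url_is_alive: Callable[[str], bool],
           resolve: Callable[[str], List[str]]) -> Optional[str]:
    """Return a working URL for a catalogue record, or None.

    `record` is a hypothetical dict with optional "urls" and "urn" keys,
    standing in for the URL and URN fields of a MARC record.
    """
    # 1. Try the URLs stored in the record itself.
    for url in record.get("urls", []):
        if url_is_alive(url):
            return url
    # 2. Fall back to URN resolution if the stored URLs have gone stale.
    urn = record.get("urn")
    if urn:
        for url in resolve(urn):
            if url_is_alive(url):
                return url
    return None


live = {"http://new.example.org/doc.html"}
record = {"urls": ["http://old.example.org/doc.html"], "urn": "URN:ISBN:9282776433"}
found = locate(record,
               url_is_alive=lambda u: u in live,
               resolve=lambda urn: ["http://new.example.org/doc.html"])
print(found)  # http://new.example.org/doc.html
```

A record that has only a stale URL and no URN cannot be rescued this way, which is the practical argument for adding URNs at cataloguing time.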

State of the art of URN

CENL (Conference of European National Librarians) decided in October 1998 that the CENL libraries should implement URNs. The decision was based on a discussion paper written by Esko Häkli and Juha Hakala (http://www.lib.helsinki.fi/urn/urnimp.html). Some libraries, Helsinki University Library among them, have not waited for a joint recommendation. We made the internal decision to use URNs immediately after the URN syntax was published as RFC 2141 in May 1997. After a 12-month preparation period, URN delivery started in May 1998 in Finland and Sweden. The service is based on a simple software application called the URN generator, and on a user guide which explains how to use URNs and how to embed them in HTML metadata (see http://www.lub.lu.se/metadata/URN-help.html). As of this writing hundreds of URNs have been delivered, although the service has not been advertised outside the library community because proper URN resolution services are still lacking.
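The actual URN generator is a Perl application whose NBN format is not described in this article. The Python sketch below is not a port of it. It only illustrates the two tasks the text mentions: handing out identifiers that are never reused under one namespace, and embedding the result in HTML metadata. The prefix "URN:NBN:fi-example-", the six-digit running number and the DC.Identifier META name are assumptions made for illustration.

```python
import html
import itertools


class UrnGenerator:
    """Hand out sequential URNs under one prefix (a hypothetical scheme).

    A real generator must persist its counter so that no number is ever
    handed out twice; this in-memory version only illustrates the idea.
    """

    def __init__(self, prefix: str, start: int = 1) -> None:
        self.prefix = prefix
        self._numbers = itertools.count(start)

    def next_urn(self) -> str:
        return f"{self.prefix}{next(self._numbers):06d}"


def meta_tag(urn: str) -> str:
    """HTML META element carrying the URN as a Dublin Core identifier (assumed form)."""
    return f'<META NAME="DC.Identifier" CONTENT="{html.escape(urn)}">'


generator = UrnGenerator("URN:NBN:fi-example-")
first = generator.next_urn()
print(first)            # URN:NBN:fi-example-000001
print(meta_tag(first))  # <META NAME="DC.Identifier" CONTENT="URN:NBN:fi-example-000001">
```

Keeping the counter in one place per namespace is what guarantees that a used URN is never given to another document.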

Other Nordic countries are also among the early starters: Norway will publish its own URN generator in the near future, and Denmark has also decided to launch URN services. The same basic technology is used in all Nordic countries. The URN generator specification, written at Helsinki University Library, contains nothing specific to Finland or Sweden, so in principle it can be used in any country in the world with equal ease. The URN generator application, programmed by Mattias Borell from the Lund University Library NetLab unit, is written in Perl and can therefore be ported to a number of different platforms. Like the other products developed in the Nordic metadata project, the software is available for free.

From the standardisation point of view, URN has been a safe choice for a long time. A major step forward during the autumn has been the finalisation of URN Namespace Definition Mechanisms, but there has also been important progress in the standardisation of URN resolution services. As said earlier, registration of URN namespaces for our traditional identification systems must be carried out as soon as possible. It is also important to inform librarians, archivists and other information intermediaries about URNs. Many organisations that would probably find URNs very useful are not yet familiar with the system and its potential.

Juha Hakala, Development Director
Helsinki University Library
email: Juha.Hakala@helsinki.fi


References

Daigle, Leslie, Daniel, Ron & Preston, Cecilia: Uniform Resource Identifiers and Online Serials. The Serials Librarian, vol. 33 (1998) no. 3/4, pp. 325-341.

Useful addresses

DOI, Digital Object Identifier: http://www.doi.org
IANA, Internet Assigned Numbers Authority: http://www.iana.org/
IETF's URN Working Group: http://www.ietf.org/html.charters/urn-charter.html
RFC 2141: URN Syntax: ftp://ftp.isi.edu/in-notes/rfc2141.txt
RFC 2276: Architectural Principles of Uniform Resource Name Resolution: ftp://ftp.isi.edu/in-notes/rfc2276.txt
RFC 2288: Using Existing Bibliographic Identifiers as Uniform Resource Names: ftp://ftp.isi.edu/in-notes/rfc2288.txt
Nordic Web Index: http://nwi.funet.fi/cgi-bin/egwcgi/egwirtcl/nwiquery.tcl/lang=uk
PURL: http://purl.org/
URN Namespace Definition Mechanisms Standard: http://www.ietf.org/internet-drafts/draft-ietf-urn-nid-req-07.txt

Tietolinja 3/1998