Comments on: this is a fantastically cool idea

By: Gunther Eysenbach

Gunther Eysenbach — Sun, 25 Nov 2007 03:20:00 +0000

Great move, Lars.
What exactly is the point of plagiarizing WebCite?
Your "innovation" that the standard output is "a URL that reveals within it the original URL so no more opaque URLs." is not really an innovation.
Perhaps you missed my response where I pointed out that "transparent" URLs like
http://www.webcitation.org/query?url=http://www.ehealthinnovation.org&date=2006-02-02
are fully supported by WebCite. The abbreviated format using a TinyURL format is mainly for publishers and citing authors, who in their list of references usually also provide the "live" URL.

WebCite is meanwhile used by hundreds of journals and publishers like Biomed Central.

Well, I guess imitation is the sincerest form of flattery.

By: Lars Bell

Lars Bell — Thu, 02 Aug 2007 04:55:33 +0000

Hi,

I have a web service now that should answer many if not all of the above concerns.
http://www.stayboystay.com
Free on demand archival service.

The standard output is a URL that reveals within it the original URL so no more opaque URLs. The date of the capture is also within the new URL.

In addition, the new archive URL has a hash built into it. This provides a guarantee that the cached version has not been changed long after it was stored. Due to the cryptographic nature of the public domain hash, it is computationally infeasible to change the content and then come up with the same hash.
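
The hash-in-URL idea can be sketched roughly as follows. This is only an illustration: the comment does not say which hash the service uses or how its URLs are laid out, so SHA-256, the host `archive.example`, and the function names are all assumptions.

```python
import hashlib
from urllib.parse import quote


def archive_url(original_url: str, capture_date: str, content: bytes) -> str:
    """Build a hypothetical archive URL: date, content hash, then the original URL."""
    digest = hashlib.sha256(content).hexdigest()  # assumed hash function
    # Percent-encode the original URL so it contributes no extra "/" separators.
    return f"http://archive.example/{capture_date}/{digest}/{quote(original_url, safe='')}"


def content_matches(url: str, content: bytes) -> bool:
    """Check stored content against the hash embedded in the archive URL."""
    # Pieces: "http:", "", host, date, digest, encoded original URL.
    digest = url.split("/")[4]
    return hashlib.sha256(content).hexdigest() == digest
```

Anyone holding the archive URL can recompute the hash of the served copy and compare it; a mismatch means the stored content no longer matches what was captured.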

This service is free and simple to use. You can use it anonymously or you can sign up for a free account and get some additional administration functions.

thanks

Lars Bell

By: Gunther Eysenbach

Gunther Eysenbach — Mon, 11 Sep 2006 15:02:56 +0000

Some of the commentators here would benefit from looking at the detailed technical description (http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf ).
First of all, URLs are not necessarily opaque. A format such as http://www.webcitation.org/query?url=http://www.ehealthinnovation.org&date=2006-02-02 is also functional and can be used as an alternative format for citation purposes. This URL, however, gets very long (if the cited URL is already long), so it can be replaced by a shorter URL using an ID, such as http://www.webcitation.org/5IlFymF33 or http://www.webcitation.org/query?id=5IlFymF33 .
If you know only the cited URL, not the ID, but want to cite the ID version, you can look up the ID using the query form at http://www.webcitation.org/query
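
As a rough sketch, the two URL formats described above could be built and read back like this. The `url`, `date`, and `id` parameter names come from the examples in this thread; the helper names are hypothetical, and percent-encoding the cited URL is an assumption (the examples here show it unencoded).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://www.webcitation.org/query"


def transparent_url(cited_url: str, date: str) -> str:
    """Long form: the cited URL and the snapshot date stay visible in the query string."""
    return f"{BASE}?{urlencode({'url': cited_url, 'date': date})}"


def short_url(snapshot_id: str) -> str:
    """Short form: only the snapshot ID appears."""
    return f"{BASE}?{urlencode({'id': snapshot_id})}"


def cited_url_of(archive_url: str):
    """Recover the cited URL from a transparent archive URL, or None if it is absent."""
    params = parse_qs(urlsplit(archive_url).query)
    return params.get("url", [None])[0]
```

The point of the long form is that `cited_url_of` works without contacting the archive at all, which is what makes the URL "transparent".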
Copyright concerns are addressed in the FAQ (http://www.webcitation.org/query) - essentially, recent jurisprudence in the context of the Google cache supports archiving projects like this, where the copyright holder can opt out using robot exclusion standards, meta tags, or an email requesting removal of material.
Somebody mentioned the idea of a toolbar. WebCite welcomes any initiatives to create a toolbar (or to embed this into existing toolbars), but it should be pointed out that a "bookmarklet" - as offered and described on the WebCite page - works just as easily and conveniently.
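
For illustration, a robots-exclusion opt-out check might look like the sketch below, using Python's standard `urllib.robotparser`. The user-agent string and the function name are made up for the example; how WebCite's own robot identifies itself or applies these rules is not stated here.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules already fetched from the site
    return parser.can_fetch(user_agent, url)


# Example rules: a site owner opting part of the site out for every robot.
rules = "User-agent: *\nDisallow: /private/\n"
allowed = may_archive(rules, "ExampleArchiver", "http://example.org/public.html")
blocked = may_archive(rules, "ExampleArchiver", "http://example.org/private/page.html")
```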
The answer to questions about sustainability of such a service and incentives for the WebCite consortium to actually maintain the archive is that the WebCite consortium is a consortium of academic editors and publishers who are using WebCite in their journals. They have an intrinsic motivation to keep this service running, otherwise everything which has been cited in their (printed) academic journals would vanish.
The WebCite consortium also collaborates with the U of T library and seeks active collaboration with other archiving projects such as IA.
Future iterations of WebCite will contain features like first displaying the live page and only if it is not the same as the archived version displaying the archived snapshot.
The cited snapshot is usually exactly what the citing author saw and archived (a given webpage at a certain date/time). The ONLY exception is if the dynamic page looks different for different viewer IPs (e.g. different countries) - in which case WebCite will archive/display the page that the WebCite robot, which is located in Canada, "sees" - but as somebody remarked, such pages probably should not be cited anyway.
I should also clarify that WebCite is not a "company", but an open source / community project, and everybody who thinks he could contribute code or ideas is more than welcome to contact the WebCite consortium.

For further background about this see also the following article (published in a journal which uses WebCite routinely for all references):

Eysenbach G, Trudel M
Going, Going, Still There: Using the WebCite Service to Permanently Archive Cited Web Pages
J Med Internet Res 2005;7(5):e60
http://www.jmir.org/2005/5/e60/

By: Juan

Juan — Mon, 11 Sep 2006 11:21:45 +0000

Hi all. I am the developer behind WebCite and I wanted to clarify a few of the points noted here.

1. The snapshot ID shown is just a way to keep URLs short and is intended to look 'pretty' in print publications. It is also possible to search for a given webcite by using URL parameters on the query page. For example, http://www.webcitation.org/query?url=http://lessig.org/blog/&date=2006-09-08 also gets you to the archive of this blog.

2. How do you know that WebCite won't itself disappear? WebCite is in talks with the University of Toronto Library and it appears that they will provide hosting should the Centre for eHealth Innovation ever stop being the host. -- BioMedCentral already archives with us and PLoS is starting soon -- showing that some big players are already relying on the service and thus providing some 'guarantee'.

3. The purpose of WebCite is not to be used as legal proof that a given website looked a certain way. It was designed for the purpose of archiving web citations in academic papers.

I welcome any other questions/comments: jalperin [ at ] ehealthinnovation.org

Please also visit our FAQ. Note there is a link there to a "Best Practices Guide" that explains all URL parameters and other technical ins and outs.

By: Lessig

Lessig — Mon, 11 Sep 2006 05:56:19 +0000

I think some of this is missing the point. The purpose is not to prove with 100% certainty what was at a particular URL at a particular time. The purpose, as I understand it, is to make URLs usable as citations. A simple way, that is, to go back to the thing cited, for the purpose of completing the reference. Sure, more confidence is better than less. But that you don't have perfect confidence doesn't mean the service has no important value.

Second, as I said in the post, better would be if the site published a table of its opaque URLs and the originals, so someone could at least go back to the original (and alternative archives for the original URL) if needed.

Third, of course there's always a risk that the archive disappears. But if people start supporting the archiving movement, there's less risk they will disappear than that a single URL will disappear.

By: QrazyQat

QrazyQat — Sun, 10 Sep 2006 16:00:03 +0000

Plus, really, who says this service is going to be around forever? Is it really going to archive all those pages for years; where's their incentive? At least the original owner/poster has some incentive to keep the info online, although they may not. Why would you assume this service will do it?

I note that web archive, which I understood was originally going to archive it all (sure), simply dropped pages after a while.

By: Guilherme P. de Freitas

Guilherme P. de Freitas — Sun, 10 Sep 2006 13:55:59 +0000

I'd just like to emphasize the points made by Seth and Philip:

1. "Opaque" URLs are not nice (Seth).
2. At least in principle, they may archive a different version of the webpage due to location differences (Philip).
3. Authenticity of the archived content is an issue (Philip).

Point 3 is critical; point 1 could be easily addressed, I guess; point 2 is important, but you can always check if it archived the right version of the webpage in order to avoid mistakes.

By: Max Battcher

Max Battcher — Sun, 10 Sep 2006 07:35:25 +0000

"If so, it would give you that page; if not, it would take you to the archive. Difficulty with this is dynamic pages."

A truly dynamic page shouldn't be used in a citation, as it isn't a verifiable source, right? A "dynamic page" (ultimately static over time but within a dynamic web engine) generated by blog software or what have you, should still attempt to send a correct Last-Modified header if at all possible, and you could certainly rely on that just as so many cache engines do (such as Google Cache).
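
A minimal sketch of that Last-Modified check, assuming the archive knows when it took its snapshot. The function name and the fall-back-to-archive policy are illustrative assumptions, not a description of how any existing service behaves.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def unchanged_since(last_modified_header, archived_at: datetime) -> bool:
    """True if the Last-Modified header shows no change after the snapshot was taken.

    archived_at must be timezone-aware. A missing or unparsable header counts
    as "possibly changed", so a caller would fall back to the archived copy.
    """
    if not last_modified_header:
        return False
    try:
        modified = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        return False
    if modified.tzinfo is None:
        # HTTP dates are in GMT; treat a naive result as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified <= archived_at


archived = datetime(2006, 9, 8, 12, 0, tzinfo=timezone.utc)
serve_live = unchanged_since("Fri, 08 Sep 2006 10:00:00 GMT", archived)
```

A redirecting service could send the reader to the live page when this returns True and to the archived snapshot otherwise. It only works as well as the header is honest, which is the caveat raised above.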

By: Philip Weiss

Philip Weiss — Sat, 09 Sep 2006 18:30:20 +0000

It's nice to have that feature, but there are a whole host of problems related to it.

First, there's no guarantee that the page archived is the same one you viewed. Many web sites show different versions of a page based on browser, IP address, country of the viewer, etc., so they may return one thing to the citation company and another to you.

Similarly, there's no guarantee that what a person views now is the same as what was originally stored in the archive. While probably good enough for referencing in news articles, if you want to use it for legal purposes, or for other reasons where you need a correct copy, this is not a good way to do it. (For example, I want to record the price advertised for an item so I can take advantage of a competitor's "meet or beat" guarantee.) Can you trust the citation company and its agents to maintain integrity? I wouldn't put money on it.

By: Andreas

Andreas — Sat, 09 Sep 2006 16:52:43 +0000

Although I definitely see the value of this kind of service (I second Seth's doubts about the opaque URI format though), isn't this a bit of a grey zone with regard to copyright? Services like Furl and other bookmark managers that save a copy of the pages in your library haven't added "world sharing" or even "group sharing" features for exactly that reason. More at http://furl.net/faq.jsp#copy