A cross disciplinary study of link decay and the effectiveness of mitigation techniques
[Abstract] Background: The dynamic, decentralized World Wide Web has become an essential part of scientific research and communication. Researchers create thousands of websites every year to share software, data and services, but these valuable resources tend to disappear over time. The problem has been documented in many subject areas. Our goal is to conduct a cross-disciplinary investigation of the problem and to test the effectiveness of existing remedies.

Results: We accessed 14,489 unique web pages found in abstracts indexed in Thomson Reuters' Web of Science citation index and published between 1996 and 2010. The median lifespan of these web pages was 9.3 years, and 62% of them were archived. Survival analysis and logistic regression were used to identify significant predictors of URL lifespan. The availability of a web page depends most strongly on when it was published and on its top-level domain name. Similar statistical analysis revealed biases in current solutions: the Internet Archive favors web pages with fewer layers in the Uniform Resource Locator (URL), while WebCite is significantly influenced by the source of publication. We also created a prototype of a process for submitting web pages to the archives. It increased the coverage of our list of scientific web pages in the Internet Archive and WebCite by 22% and 255%, respectively.

Conclusion: Our results show that link decay continues to be a problem across disciplines, and that current solutions for static web pages are helping but can be improved.
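The abstract reports a median URL lifespan obtained through survival analysis. The following is a minimal illustrative sketch, not the authors' code or data. It computes a Kaplan-Meier survival curve and its median from censored link-check results. A URL that was still reachable at its last check counts as censored rather than dead. The function names `kaplan_meier` and `median_survival` and the ten sample durations are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: years from publication until the URL was found dead (event),
               or until the last check at which it was still alive (censored).
    observed:  True if the URL was found dead, False if it was censored.
    Returns a list of (time, S(t)) pairs, one per time with at least one death.
    """
    deaths = Counter(t for t, dead in zip(durations, observed) if dead)
    leaving = Counter(durations)  # deaths plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]  # Counter returns 0 for times with no deaths
        if d:
            # Conditional probability of surviving past t, given alive just before t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone whose observation ends at t (dead or censored) leaves the risk set
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """Smallest time at which S(t) <= 0.5, or None if the curve never reaches it."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical link-check data for ten URLs (not from the study).
durations = [2, 3, 3, 5, 7, 8, 10, 12, 12, 14]
observed = [True, True, False, True, True, False, True, True, False, False]

print(median_survival(kaplan_meier(durations, observed)))  # → 10
```

Censoring is the reason survival methods suit this problem. Suppose the still-live URLs were simply dropped and only dead links were averaged. The resulting lifespan estimate would be biased downward, because the longest-lived pages would be excluded.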
[Publication date] 2013-10-09
[Keywords] Optical Character Recognition; Uniform Resource Locator; Internet Archive; Naming Authority; Survival Regression Model