Abstract
The World Wide Web is growing at an enormous speed and has become an indispensable source of information and research. New pages are constantly added, but other processes are at work as well: pages are moved or removed, and their content changes. We report here the results of an eight-year project that started in 1998, when multiple search engines were used to identify a set of pages containing the term informetrics. Data collection was repeated once a year over the following eight years (with the exception of 2000 and 2001), both by querying search engines and by revisiting previously identified pages. The results show that the number of pages grew from 866 in 1998 to 28,914 in 2006, a 33-fold growth. Besides the obvious growth of the topic on the Web, we observed both decay (pages disappearing from the Web) and modification. Even though most of the pages from 1998 either disappeared or ceased to contain the term informetrics, 165 pages (19.1%) still existed in 2006 and contained the search term. We followed the “fate” of these 165 pages: we characterized the publishers, the contents and the changes that occurred over the whole period. In recent years, e-print servers and publishers’ sites have become sources of a large number of pages related to informetrics. Longitudinal studies following the evolution of a topic on the Web are very important, since they provide insights about content and the underlying Web processes.
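The two headline ratios in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three counts quoted above; the function and variable names are hypothetical and are not taken from the study.

```python
# Counts quoted in the abstract (the names are illustrative assumptions).
pages_1998 = 866        # pages containing "informetrics" found in 1998
pages_2006 = 28_914     # pages containing "informetrics" found in 2006
survivors_2006 = 165    # 1998 pages still online and still containing the term


def growth_factor(start: int, end: int) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def survival_share(survivors: int, cohort: int) -> float:
    """Return the surviving percentage of the original cohort."""
    return 100.0 * survivors / cohort


growth = growth_factor(pages_1998, pages_2006)
share = survival_share(survivors_2006, pages_1998)

print(f"growth 1998-2006: {growth:.1f}-fold")
print(f"1998 pages surviving in 2006: {share:.1f}%")
```

The ratio comes out slightly above 33, in line with the "33-fold" figure, and the survival share rounds to the reported 19.1%.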
Cite this article
Bar-Ilan, J., Peritz, B.C. The lifespan of “informetrics” on the Web: An eight year study (1998–2006). Scientometrics 79, 7–25 (2009). https://doi.org/10.1007/s11192-009-0401-7
DOI: https://doi.org/10.1007/s11192-009-0401-7