Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Masanès, J. (2006). Web Archiving: Issues and Methods. In: Web Archiving. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46332-0_1
DOI: https://doi.org/10.1007/978-3-540-46332-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23338-1
Online ISBN: 978-3-540-46332-0
eBook Packages: Computer Science, Computer Science (R0)