Internet Archive Blogs

A blog from the team at archive.org

Defining Web pages, Web sites and Web captures

We define a webpage as a valid web capture that is an HTML document, a plain text document, or a PDF.
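As a rough illustration of that definition, here is a minimal Python sketch. It assumes each capture carries a Content-Type header value and that some earlier step has already decided whether the capture is valid; the function name `is_webpage`, the `is_valid_capture` flag, and the choice of MIME types are assumptions made for this example, not a description of how the Archive actually classifies captures.

```python
# Assumed mapping of the three document kinds to standard MIME types.
WEBPAGE_MIME_TYPES = {"text/html", "text/plain", "application/pdf"}


def is_webpage(content_type: str, is_valid_capture: bool) -> bool:
    """Return True if a capture counts as a webpage under the definition above.

    `is_valid_capture` is a hypothetical flag computed elsewhere; this sketch
    does not define what makes a capture valid.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return is_valid_capture and mime in WEBPAGE_MIME_TYPES


if __name__ == "__main__":
    print(is_webpage("text/html; charset=UTF-8", True))
    print(is_webpage("image/png", True))
```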

A domain on the web is an owned section of the internet namespace, such as google.com, archive.org, or bbc.co.uk. A host on the web is identified by a fully qualified domain name (FQDN), which specifies its exact location in the tree hierarchy of the Domain Name System. An FQDN consists of two parts: the hostname and the domain name. For example, for the host blog.archive.org, the hostname is blog and the host is located within the domain archive.org.
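The hostname/domain split can be sketched in a few lines of Python. Domains such as bbc.co.uk sit under a two-label suffix, so a simple "last two labels" rule is not enough. The sketch below uses a tiny hard-coded set, `MULTI_LABEL_SUFFIXES`, as a stand-in for a full public suffix list. Both that set and the function name `split_fqdn` are assumptions made for this example.

```python
# Hypothetical, deliberately tiny stand-in for a full public suffix list.
MULTI_LABEL_SUFFIXES = {"co.uk"}


def split_fqdn(fqdn: str) -> tuple[str, str]:
    """Split an FQDN into (hostname, domain).

    The hostname is empty when the FQDN is the bare domain itself.
    """
    labels = fqdn.rstrip(".").lower().split(".")
    # The suffix spans two labels for entries like "co.uk", otherwise one.
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
    domain_len = suffix_len + 1  # the owned label plus its suffix
    if len(labels) < domain_len:
        raise ValueError(f"not inside an owned domain: {fqdn!r}")
    hostname = ".".join(labels[:-domain_len])
    domain = ".".join(labels[-domain_len:])
    return hostname, domain


if __name__ == "__main__":
    for name in ("blog.archive.org", "www.bbc.co.uk", "archive.org"):
        print(name, split_fqdn(name))
```

A real implementation would need the complete suffix list rather than a single hard-coded entry.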

We define a website to be a host that has served webpages and has at least one incoming link from a webpage belonging to a different domain.
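To make the two conditions concrete, here is a minimal Python sketch of that test over a toy link graph. The `Link` record, its field names, and the `find_websites` function are hypothetical names invented for this example; it assumes the set of hosts that have served webpages and the domain of each link endpoint are already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A hyperlink found on a webpage (hypothetical record layout)."""
    source_domain: str  # domain of the webpage that contains the link
    target_host: str    # FQDN of the host the link points to
    target_domain: str  # domain that the target host belongs to


def find_websites(hosts_with_webpages: set[str], links: list[Link]) -> set[str]:
    """Return hosts that served webpages and have at least one incoming
    link from a webpage belonging to a different domain."""
    externally_linked = {
        link.target_host
        for link in links
        if link.source_domain != link.target_domain
    }
    return hosts_with_webpages & externally_linked


if __name__ == "__main__":
    hosts = {"blog.archive.org", "lonely.example.com"}
    links = [
        Link("bbc.co.uk", "blog.archive.org", "archive.org"),
        # Same-domain link: does not count as an incoming external link.
        Link("example.com", "lonely.example.com", "example.com"),
    ]
    print(find_websites(hosts, links))
```

Note that links between two hosts of the same domain do not count, which is why the comparison is made on domains rather than on hosts.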

As of today, the Internet Archive officially holds 273 billion webpages from over 361 million websites, taking up 15 petabytes of storage.


