Showing posts with label Library of Congress. Show all posts

January 2, 2012

The Internet Lives


The Internet, with all its information, is constantly changing as pages are updated and added. It is good to know that it is being preserved and archived, so that present and future generations can "travel back," see what it looked like at earlier points in time, and access the wealth of information it contained.

This is what the Internet Archive does--this non-profit organization functions as the Library of the Internet. It is building a "permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format."

In the Internet Archive you will find "texts, audio, moving images, and software as well as archived web pages" from 1996 to today.
I tested the Archive's Wayback Machine with my site, The Total CIO, and was able to see how it looked back on October 24, 2010.
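For readers who want to try the same kind of lookup, here is a minimal sketch of how a Wayback Machine address can be put together from a page address and a date. It assumes the publicly visible `https://web.archive.org/web/<timestamp>/<url>` address pattern; the function name `wayback_url` and the `example.com` address are placeholders for illustration, not anything taken from the Archive's own documentation.

```python
from datetime import datetime

# Assumption: the public address pattern is /web/<YYYYMMDDhhmmss>/<original URL>.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for page_url at (or near) the given time."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


# Placeholder site; substitute the page you want to look up.
print(wayback_url("http://example.com/", datetime(2010, 10, 24)))
```

If no capture exists for the exact moment requested, the Wayback Machine generally redirects to the nearest capture it has, so the page shown may be dated a little earlier or later than the one asked for.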

It is wonderful that the Internet Archive is preserving our digital records, just as our paper records are preserved in archives such as The Library of Congress (considered "the world's most comprehensive record of human creativity and knowledge"), The National Archives, which preserves government and historical records, and The National Security Archive, a research institute and library at The George Washington University that "collects and publishes declassified documents through the Freedom of Information Act...[on] topics pertaining to national security, foreign, intelligence, and economic policies of the United States."

The Internet Archive is located in San Francisco (and my understanding is that there is a backup site in Egypt).

The Internet Archive is built by spider programs that crawl the publicly available pages of the Internet and copy and store the data, which is indexed three-dimensionally to allow browsing across multiple periods of time.
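To make the idea of browsing across time concrete, here is a toy sketch of an index that stores several captures of the same page and returns the one that was current on a given date. This is only an illustration under my own assumptions and not how the Archive actually stores or indexes its data; the class name `SnapshotIndex`, its methods, and the sample page are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class SnapshotIndex:
    """Toy in-memory index: page address -> captures ordered by capture time."""

    def __init__(self):
        # Hypothetical layout: url -> list of (captured_at, content), kept sorted by time.
        self._by_url = defaultdict(list)

    def add(self, url, captured_at, content):
        """Record one capture of a page, as a crawler might after fetching it."""
        entries = self._by_url[url]
        entries.append((captured_at, content))
        entries.sort(key=lambda entry: entry[0])

    def lookup(self, url, as_of):
        """Return the latest capture taken at or before as_of, or None if there is none."""
        entries = self._by_url.get(url, [])
        times = [captured_at for captured_at, _ in entries]
        position = bisect_right(times, as_of)
        if position == 0:
            return None
        return entries[position - 1][1]


index = SnapshotIndex()
index.add("http://example.com/", datetime(2009, 5, 1), "2009 version")
index.add("http://example.com/", datetime(2010, 10, 1), "2010 version")
print(index.lookup("http://example.com/", datetime(2010, 10, 24)))
```

The point of the sketch is simply that each page has many dated copies, and a lookup needs both the address and a point in time.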

The Archive now contains roughly 2 petabytes of information and is growing by 20 terabytes per month. According to the Archive, the data is stored on hundreds of slightly modified x86 machines running Linux, each storing approximately a terabyte of data (by my count, 2 petabytes at a terabyte apiece works out to about 2,000 machines).
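The back-of-the-envelope count can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above and assumes decimal units (1 petabyte = 1,000 terabytes) and a steady growth rate; the function names are mine, chosen for illustration.

```python
TB_PER_PB = 1000  # Assumption: decimal units; with binary units this would be 1024.


def machines_needed(total_tb: int, tb_per_machine: int) -> int:
    """Machines required to hold total_tb, rounding up to a whole machine."""
    return -(-total_tb // tb_per_machine)  # ceiling division


def months_to_double(total_tb: int, growth_tb_per_month: int) -> float:
    """Months until the collection doubles, if monthly growth stays constant."""
    return total_tb / growth_tb_per_month


archive_tb = 2 * TB_PER_PB  # roughly 2 petabytes
print(machines_needed(archive_tb, 1))    # machines at about 1 TB each
print(machines_needed(20, 1))            # extra machines per month at 20 TB/month
print(months_to_double(archive_tb, 20))  # months to double at a steady 20 TB/month
```

With these figures the count comes to about 2,000 machines, with roughly 20 more needed each month, and the collection would take about 100 months to double if growth stayed flat.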

According to the FAQs, it takes some time for web pages to show up, somewhere between 6 months and 2 years, because of the process of indexing them and transferring them to long-term storage. Hopefully the process will get faster, but in my opinion an organized, archived collection of the Internet is well worth the wait.

Ultimately, the Internet Archive may someday be (or be part of) the Time Capsule of human knowledge and experience that helps us survive human or natural disaster by providing the means to reconstitute the human race itself.

(Source Photo: here)
