The Internet Archive for Genealogists

History of the Internet Archive

The Internet Archive is a non-profit that is building a digital library of Internet sites and other cultural artifacts in digital form. It provides free access to researchers, historians, scholars, people with print disabilities, and the general public. Its mission is to provide Universal Access to All Knowledge.

The Archive began archiving the Internet in 1996. It now has over twenty-three years of web history accessible through the Wayback Machine and works with over one thousand library and other partners to identify important web pages. The Archive soon began providing digital versions of other published works as well. At this time it contains seven hundred thirty-five billion web pages, forty-one million books and texts, 14.7 million audio recordings (including two hundred forty thousand live concerts), 8.4 million videos (including 2.4 million television news programs), 4.4 million images, and eight hundred ninety thousand software programs.

It pays to set up your free account with the Archive, as some e-books are accessible only if you have one. Also, anyone with a free account can upload media to the Archive. It works with thousands of partners globally to save copies of their work into special collections.

The Archive began a program to digitize books in 2005, and today it scans four thousand three hundred books per day in eighteen locations around the world. Books published prior to 1927 are available for download, and hundreds of thousands of modern books can be borrowed through its Open Library site. Even better, you can create a free account to borrow e-books, upload material, create virtual bookshelves, and mark favorite resources.

Internet Archive and Genealogy

Since our focus is on genealogy, we’re going to explore the many resources genealogists can access within the Internet Archive. The value of these resources should not be underestimated!

The Genealogy section was added in December 2008 and continues to expand. It includes resources from Allen County Public Library Genealogy Center; Robarts Library/University of Toronto; University of Illinois Urbana-Champaign Library; Brigham Young University; National Library of Scotland; Indianapolis City Library City Directory and Yearbooks Collection; Leo Baeck Institute Archives of German-speaking Jewry; Boston Public Library; and many more. Items include (among many things) books on surname origins, vital statistics, parish records (UK), census records, passenger lists of vessels, yearbooks, city directories, local/family histories, etc. Currently, Genealogy contains 103 collections, all of which are in the public domain.

Locating Genealogical Resources in the Internet Archive

One way to use the Archive is to do a full-text search using the Search Box on the Internet Archive home page. The box is located below the string of colored icons representing the different forms of media in the Archive: Web, books, video, audio, software, and images. You can search metadata, text contents, TV news captions, radio transcripts, and archived web sites within the Archive. NOTE: You can search only text in the Internet Archive; handwritten items are browseable only.

You can perform a general search from the home page for an individual and see where that person’s name appears on the site. For example, enter the name Peter Pyeatt, select “Search text contents” below the search box, and click “Go” or press the “Enter” key on your keyboard. Be sure to use quotes around phrases and full names (Figure 1).

Figure 1. Internet Archive homepage with search terms. Search box is below the line of nine kinds of media in the Archive.
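
The same quoted-phrase search can also be prepared as a link to bookmark or share. The short Python sketch below only builds the address and does not contact the site. The `https://archive.org/search` address and the `query` and `sin=TXT` parameters (the latter intended to select the “Search text contents” option) are assumptions that do not come from this article, so check them against the address your own browser shows after a search.

```python
from urllib.parse import urlencode

# Assumed address of the Internet Archive search page (not given in the article).
SEARCH_PAGE = "https://archive.org/search"


def quote_phrase(name: str) -> str:
    """Wrap a full name or phrase in double quotes so it is searched as a unit."""
    # Collapse stray whitespace and drop any quotes the user already typed.
    cleaned = " ".join(name.split()).strip('"')
    return f'"{cleaned}"'


def full_text_search_url(name: str) -> str:
    """Build a search-page link for a quoted phrase.

    "sin=TXT" is an assumed parameter for the "Search text contents" option.
    """
    params = {"query": quote_phrase(name), "sin": "TXT"}
    return f"{SEARCH_PAGE}?{urlencode(params)}"


if __name__ == "__main__":
    print(full_text_search_url("Peter Pyeatt"))
```

Quoting inside the helper means the name is always sent as one phrase, which mirrors the advice above about putting quotes around full names.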

This search reveals twenty results showing the name Peter Pyeatt, although just twelve are shown here (Figure 2). Each result shows a thumbnail of the book cover with author and title, an excerpt with the search term(s) highlighted, and the number of views the item has had. Filters appear on the left side and sorting options at the top. Clicking on a thumbnail will take you to the first appearance of the search term(s) in that publication. The list of filters is longer than what’s shown here, so be sure you check the entire list for all filtering options.

Figure 2. Some of the search results for Peter Pyeatt.

Below is the only search result for Peter Pyeatt in the 24 June 1882 Scientific American thumbnail (Figure 3). The result shows the OCR (optical character recognition) transcription of text on the right that contains the search term, and the actual page, with the search term highlighted. You’ll see this format for every appearance of the search term within the publication.

Figure 3. Search result for Peter Pyeatt in the 24 June 1882 issue of The Scientific American.

Once you have your search result, you can do a variety of things with it. At the upper left side in Figure 3 is a list of icons representing the various functions you can perform with the document. From top to bottom, they are Search; Download; Bookmark; Visual Adjustment (adjust brightness, adjust contrast, inverted colors, grayscale, zoom); and Forward. You can also change search terms. At the bottom right is a row of icons that allow you to flip left or right; see a one-page view; see a two-page view; see a thumbnail view of the entire publication; have the book read aloud if allowable; zoom in and out; and view the page in full-screen mode.

Genealogy Collection

The Internet Archive has an assemblage of one hundred thirty collections of various genealogical materials scanned from books, periodicals, and microfilm that were contributed by many institutions. You can access this collection directly (Figure 4) or via the home page, by clicking on “Books”, then on “Genealogy.”

On this collection’s homepage, look for the Search this Collection input field on the left side of the page, then enter your search term there and hit the Return or Enter key. Results will show items in that collection. You can also search within an institution’s collection.

Figure 4. The Internet Archive Genealogy Collection homepage.
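
For readers who prefer to script a collection-scoped search, here is a minimal Python sketch that builds a request for the Archive's `advancedsearch.php` service. It only constructs the address. The collection identifier `genealogy` and the choice of returned fields are assumptions made for illustration, and this kind of query matches item descriptions (metadata) rather than the scanned text itself.

```python
from urllib.parse import urlencode

# Search service address; the collection identifier below is an assumption.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"
GENEALOGY_COLLECTION = "genealogy"  # assumed identifier, not given in the article


def collection_query(term: str, collection: str = GENEALOGY_COLLECTION) -> str:
    """Combine a collection filter with a quoted search term."""
    return f'collection:{collection} AND "{term}"'


def collection_search_url(term: str, rows: int = 25) -> str:
    """Build a request that asks for item identifiers and titles as JSON."""
    params = [
        ("q", collection_query(term)),
        ("fl[]", "identifier"),  # fields to return for each matching item
        ("fl[]", "title"),
        ("rows", str(rows)),     # maximum number of results
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(collection_search_url("Sabo"))
```

Passing a different `collection` value to `collection_query` would narrow the search to a single institution's collection, assuming you know its identifier.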

Click on “Collection” just below the introductory paragraph to see a list of all genealogy collections (Figure 5). You can sort by collection, show collections as a list, and get collection details. You can select a collection and search it if it’s searchable, select a download format for the item you want, then save or print it. Figure 6 shows the results of a search for the surname Sabo in the New Jersey Marriage Index shown in Figure 5.

Figure 6. Results of searching for the surname Sabo in New Jersey Marriage Index (Brides) – 1915-1919 – Surnames R-S

It’s important to remember that the thumbnail displays the general title of the item. When you click on the thumbnail and go to the actual item itself, you’ll see its specific contents. In the case of the New Jersey Marriage Index, the full description states that this is the index for brides whose last names begin with “R” and “S” who were married between 1915 and 1919. This means you can select just this collection and check the full descriptions for the desired last name and year of marriage.

Family Genealogy Collection

The Internet Archive Family Genealogy Collection is separate from its Genealogy Collection. It includes 3,773 texts that are out of copyright. You can perform the same tasks as with other collections (Figure 7). Figures 8 and 9 show sample pages from a book in this collection.

Figure 7. The Internet Archive Family Genealogy Collection.
Figures 8 and 9. Pages from the book Genealogy of the descendants of Thomas Gleason of Watertown, Mass. 1607-1909 by John Barber White (1909).

School Yearbooks

You can also check for free school yearbooks and perform searches within them (Figure 10). It’s not a large collection, but it’s still worth a look, and it’s especially helpful if you’re interested in a Massachusetts yearbook. Boston Public Library arranged for thousands of yearbooks to go online from about 140 Massachusetts cities and towns, ranging from the 1920s to today!

Figure 10. School Yearbooks homepage.

Figure 11 displays the result of a search for Freda Belkin (lower right side) in the 1934 yearbook for Waltham High School.

Figure 11. Waltham High School (Waltham, Massachusetts) 1934 yearbook.

OpenLibrary (Figure 12) is the Internet Archive’s free, digital lending library of over two million e-books that can be read in a browser or downloaded for reading off-line. The Archive is undertaking a unique and tremendous project: building one web page for every book ever published. Over twenty million books already have a page on OpenLibrary.

Figure 12. Banner from OpenLibrary’s homepage.

There are several ways that genealogists can search for genealogy books on this site. Here is one way:

  • From the Internet Archive homepage, click on “Books” at the top left side of the screen, then click on “OPENLIBRARY”.
  • Sign in with your Internet Archive login and password
  • Under “Browse”, click “Subject”
  • In the Subject box, enter “genealogy” and hit the “Enter” key
  • Select your subject of interest, e.g. “Texas Genealogy” or enter a search term, then hit the “Enter” key
  • On the new page, click the number of works next to subject heading and hit the “Enter” key
  • On the far right, under “Ebook?”, click “Yes”

The results will list the Internet Archive genealogy books you can borrow and use.
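
The subject-and-e-book filtering in the steps above can also be expressed as a single request. The Python sketch below builds an address for Open Library's subject listings and does not fetch anything. The `https://openlibrary.org` address, the lowercase underscore form of the subject key, and the `ebooks=true` and `limit` parameters are assumptions that are not taken from this article, and a given heading such as “Texas Genealogy” may or may not exist as a subject.

```python
from urllib.parse import quote, urlencode

# Assumed site address; the article does not spell it out.
OPEN_LIBRARY = "https://openlibrary.org"


def subject_key(subject: str) -> str:
    """Turn a subject heading into a lowercase, underscore-separated key."""
    return "_".join(subject.lower().split())


def ebook_subject_url(subject: str, limit: int = 20) -> str:
    """Build a subject-listing request restricted to works with e-books."""
    key = quote(subject_key(subject))
    # "ebooks" and "limit" are assumed parameter names for the subject listing.
    params = urlencode({"ebooks": "true", "limit": limit})
    return f"{OPEN_LIBRARY}/subjects/{key}.json?{params}"


if __name__ == "__main__":
    print(ebook_subject_url("Texas Genealogy"))
```

Borrowing a title still requires signing in through the site itself; a request like this would only list candidate works.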

Figure 13. The Genealogy page at OpenLibrary. Currently it contains 78,450 works, but not all of them can be borrowed.

You can also access the OpenLibrary main page directly (Figure 12) and follow steps 2-7 from the above list, or go straight to the genealogy search page (Figure 13), sign in, and follow steps 3-7 from the above list.

The Wayback Machine

The Internet Archive’s web archive, the Wayback Machine, has been capturing web pages since 1996 (Figure 14). It contains over two petabytes of compressed data, or over one hundred fifty billion web captures, and includes content from every top-level domain, over two hundred million web sites, and over forty languages. You can use this tool by entering current or defunct web site addresses (URLs) to see what pages its web crawler found at a specific point in time between 1996 and the current year.

Figure 14. The Wayback Machine can be useful in accessing past snapshots of current genealogy web sites as well as accessing defunct sites.

To use the Wayback Machine, follow these steps:

  • Enter the specific web site address into the Wayback Machine search bar on the Archive home page and hit “Enter”. This takes you to a page showing a timeline extending from 1996 to the current year, which is the default year, and highlights the days the web crawler got a snapshot of that site during that year.
  • Select the year you want to check for web crawler “hits”.
  • Select a highlighted day and hit “Enter,” which will take you to that “snapshot” page. Some days display multiple times that the web crawler took a snapshot of the site.

Figure 15. Some web crawler snapshots taken of the site in 2000.

The example in Figure 15 displays partial results for the site for the year 2000. In some cases you might not be able to access certain sites because their owners have blocked the web crawler from being able to find them.
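
The lookup described in the steps above can also be scripted. The following Python sketch, offered as one possible approach rather than the Archive's recommended method, uses the Wayback Machine's availability service to find the capture closest to the start of a chosen year. The site `example.com` is a placeholder, and the sample record in the usage section is illustrative only, not real output.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(site: str, year: int) -> str:
    """Build a lookup for the capture closest to 1 January of the given year."""
    params = {"url": site, "timestamp": f"{year:04d}0101"}
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the address of the closest capture, or None if none is reported."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(site: str, year: int) -> Optional[str]:
    """Fetch the availability record over the network and extract the capture."""
    with urlopen(availability_url(site, year), timeout=30) as reply:
        return closest_snapshot(json.load(reply))


if __name__ == "__main__":
    # Illustrative record in the shape this sketch expects; not real data.
    sample = {
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20000301000000/http://example.com/",
                "timestamp": "20000301000000",
                "status": "200",
            }
        }
    }
    print(availability_url("example.com", 2000))
    print(closest_snapshot(sample))
```

An empty result does not necessarily mean the site never existed. As noted above, some owners block the crawler, in which case no capture is reported.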


Using Google is another way to search for genealogical materials within the Internet Archive. You can enter search terms such as “internet archive genealogy” in the search box, which can bring back a lot of great results, including helpful videos and articles on using the Archive for genealogy.

You can also search for items within the Archive’s Genealogy Collection by entering “[keyword(s)] +” in the search box. This can be the quickest way to find certain items, such as inventories of county courthouse archives (Figure 16) and especially city directories (Figure 17).

Figure 16. Google search results for published county archives in the Internet Archive.
Figure 17. Google search results for city directories in the Internet Archive.

The Internet Archive is a treasure trove for genealogists. Take some time to explore it, and be sure you include this fantastic site among your top genealogical resources.
