What's distant reading?
Feb 02, 2017
By John D. Reid, Canada's Anglo-Celtic Connections

What's distant reading? Only done by those with hypermetropia?

Distant reading seeks understanding not by studying particular texts, but by aggregating and analyzing massive amounts of textual data from a corpus.

Distant reading has evident limitations for genealogy, where you need to pick through records to find something particular, say, an obituary of your great-grandmother. That's called, unsurprisingly, close reading.

What distant reading can do for the family historian is provide context. Suppose your great-grandmother died of influenza and you'd like to know if it was a year when the disease was prevalent.

You may be familiar with Google Ngram, where you can explore how frequently a word or phrase has been used in a corpus of books over time. This example shows the profile for cholera in red and influenza in blue. There's a huge spike for cholera in 1884 and upticks for both in 1942. While there is an uptick for influenza in 1918/19, for the pandemic that killed more than 20 million, perhaps as many as 50 million, the Ngram peak is not as significant as in 1905. The problem for genealogy is that the book corpus is tied to publication dates, which may not bear any relationship to current events. It is good for long-term trends: try cigarette, aircraft, newspaper, radio, television.
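For readers curious about the mechanics, here is a minimal Python sketch of the kind of counting that sits behind an Ngram-style chart: for each year, the number of times a word appears divided by the total number of words for that year. It is not Google's method. The function name yearly_relative_frequency and the three short "documents" are invented placeholders, used only to show the arithmetic.

```python
from collections import defaultdict


def yearly_relative_frequency(documents, term):
    """Return {year: occurrences of term / total words for that year}.

    documents is an iterable of (year, text) pairs.
    """
    term = term.lower()
    term_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for year, text in documents:
        words = text.lower().split()
        total_counts[year] += len(words)
        term_counts[year] += words.count(term)
    return {
        year: term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year]
    }


# Invented placeholder text, NOT real historical data.
toy_corpus = [
    (1917, "the harvest was good this year"),
    (1918, "influenza closed the schools and influenza filled the wards"),
    (1919, "the town recovered slowly from influenza"),
]

freq = yearly_relative_frequency(toy_corpus, "influenza")
for year, value in freq.items():
    print(year, round(value, 3))
```

Dividing by the yearly total matters because the amount of text published varies from year to year; raw counts alone would mostly track the size of the corpus.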

Recent months have seen several articles published on distant reading using newspaper databases as the corpus. The British Newspaper Archive, Chronicling America and a Dutch newspaper database have all been explored. While newspapers cover current events, there are still issues of representativeness, as discussed in the article Bridging the gap between quantitative and qualitative research in digital newspaper archives.

The studies using the British Newspaper Archive have been conducted by a group from the University of Bristol led by Professor Nello Cristianini. A recent article, Content analysis of 150 years of British periodicals, includes a diagram reproduced here, with the bottom panels showing the difference between the frequencies through the years for cholera, influenza, smallpox and plague from newspapers (left) and books (right). Note the more prominent peak for influenza in the newspaper corpus in 1918.

That same article also gives a link where you can download a huge file giving the year-by-year frequency of occurrence of different Ngrams from the newspapers. It's a computing challenge, not for those unprepared to wrangle large data files. I've been experimenting with it. A graph produced for cholera, influenza and cancer is appended.
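One way to approach a file too large to open in a spreadsheet is to read it a line at a time and keep only the words of interest. The sketch below assumes a layout the article does not specify: one tab-separated row per term and year, in the order term, year, value. The real download may be organized differently, so treat the column order, the function name extract_series and the sample values as hypothetical.

```python
import csv
import io
from collections import defaultdict


def extract_series(lines, wanted):
    """Stream rows and keep only the wanted terms: {term: {year: value}}.

    Assumed (hypothetical) row layout: term <TAB> year <TAB> value.
    """
    wanted = {w.lower() for w in wanted}
    series = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            continue  # skip rows that do not match the assumed layout
        term, year, value = row
        if term.lower() not in wanted:
            continue
        try:
            parsed_year, parsed_value = int(year), float(value)
        except ValueError:
            continue  # skip rows with non-numeric year or value
        series[term.lower()][parsed_year] = parsed_value
    return dict(series)


# Placeholder rows standing in for the real file; the numbers are made up.
sample = io.StringIO(
    "cholera\t1849\t0.5\n"
    "railway\t1849\t0.9\n"
    "influenza\t1918\t0.7\n"
)
result = extract_series(sample, ["cholera", "influenza", "cancer"])
print(result)
```

With a real file, the same function could be handed an open file object, for example open("ngrams.tsv", newline="", encoding="utf-8"), where the filename is again a placeholder. Because rows are processed one at a time, memory use stays small no matter how large the file is.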

More later if and when time permits.

It should not go unremarked that there is no possibility of performing a similar analysis on Canadian newspapers, since Canada lacks any national newspaper digitization program.
