FindMyPast Yearly N-grams and Entities Dataset

Alternative title Secondary Data for Content Analysis of 150 Years of British Periodicals
Creator(s) The FindMyPast Newspaper Team, Nello Cristianini, Saatviga Sudhahar, Thomas Lansdall-Welfare
Funder European Research Council
Contributor(s) James Thompson, Justin Lewis
Publication date 19 Dec 2016
Language eng
Publisher University of Bristol
Licence Non-Commercial Government Licence for public sector information
DOI 10.5523/bris.dobuvuu00mh51q773bo8ybkdz
Complete download (zip) http://data.bris.ac.uk/datasets/tar/dobuvuu00mh51q773bo8ybkdz.zip
Citation The FindMyPast Newspaper Team, Nello Cristianini, Saatviga Sudhahar, Thomas Lansdall-Welfare (2016): FindMyPast Yearly N-grams and Entities Dataset. http://dx.doi.org/10.5523/bris.dobuvuu00mh51q773bo8ybkdz
Total size 3.1 GiB

This dataset contains the secondary data for the paper "Content Analysis of 150 Years of British Periodicals". It comprises the yearly time series for the 1,000,000 most frequent 1-, 2-, and 3-grams from the corpus described in the paper, the yearly time series for the 100,000 most frequent named entities linked to Wikipedia, and the list of articles and newspapers from FindMyPast used in the study.

When using this data, please cite:

Lansdall-Welfare, T. et al. (2016). Content Analysis of 150 Years of British Periodicals. In: Proceedings of the National Academy of Sciences of the United States of America.

Data Resources