This project has two primary interrelated goals:

  • To better understand whether digital methods associated with “distant reading” (particularly topic modeling), applied to general academic journals in Jewish studies, can help to reveal the “state of the field”;
  • To develop a workflow and toolkit for working with data provided by Data for Research (a JSTOR initiative) that could be applied more generally.

The project began in a course taught by Michael Satlow at Brown University entitled “Introduction to Digital Humanities.”  One of the presentations that Ashley Champagne of the Center for Digital Scholarship (a division of the Brown University Library) made in the class covered a topic model of PMLA by Andrew Goldstone.  Satlow recruited a student from that class, Alexander Berry, as a collaborator, particularly on the technical side.  The original idea was simply to acquire the data from JSTOR, using AJS Review (the journal of the Association for Jewish Studies), and plug it into Goldstone’s code, which he had made publicly available.  AJS Review was a good candidate for this kind of project both because it is devoted to Jewish studies in general and because it has had a relatively short run, allowing us to develop meaningful techniques on a reasonably sized dataset.  But nothing is ever that easy, is it?

Since Goldstone’s project, the format of the data provided by Data for Research has changed.  That, along with the desire to experiment with different kinds of analyses and visualizations of this data, prompted us to develop our own code and workflow.  Our code often builds on existing, freely available Python code, which we attempt to acknowledge.  All of our code can be freely accessed on our GitHub site.

We developed four kinds of visualizations:

  • Gender.  We were first interested in visually representing the gender of the authors of research articles and of book reviews through time.  For this, we automatically extracted the authors’ names and ran a gender detection routine on the given names.  We then manually cleaned and corrected the results.
  • Trends.  As a first attempt at distantly reading the articles, we used sparklines to get a sense of the trends in the most important words over time.
  • Topic Models.  Topic models are digital representations of the “topics” of a text, produced through the mathematical analysis of word frequency and proximity.  We developed several different views of the topics in AJS Review, some of which are based on segmenting the run into different “bins” by year and type and others on tracing topics through time.  For more information on our topic models and their development, see this page.
  • Citation Analysis.  We were interested in whether one can chart the citation webs of who is citing whom.  We have tried this on the level of authors and of individual works.
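
To give a concrete sense of the trends view described in the list above, here is a minimal sketch in Python.  It is not the code from our repository: the function names (yearly_frequency, sparkline) and the three-document toy corpus are invented for illustration, and it assumes that each article is available as a (year, text) pair.  It computes the relative frequency of one word per year and draws the result as a text sparkline.

```python
from collections import defaultdict
import re

# Eight block characters, from lowest to highest bar.
BARS = "▁▂▃▄▅▆▇█"


def yearly_frequency(docs, word):
    """Return {year: relative frequency of `word`} for (year, text) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        # Crude tokenizer: runs of ASCII letters only.
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


def sparkline(values):
    """Scale numbers between their min and max and map them onto BARS."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


# Made-up toy corpus, for illustration only.
docs = [
    (1976, "rabbinic law and rabbinic texts"),
    (1977, "medieval poetry and rabbinic law"),
    (1978, "modern history and modern politics"),
]
freq = yearly_frequency(docs, "rabbinic")
print(sparkline(list(freq.values())))
```

Using relative rather than raw counts keeps a year with many long articles from dominating the line.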

Data

Our analyses are only as good as our data.  The data we received from JSTOR was generally good enough to work with, but it had three important limitations.  First, because of the moving paywall, we could get this data only for publications up until 2014, so our analyses do not extend beyond that date.  Second, many of the text files of the articles, particularly the earlier ones, were produced through imperfect OCR and contain mistakes, particularly in words hyphenated at the ends of lines.  Third, and specific to AJS Review (although similar issues are found in many other journals), there is a mix of languages, with some articles in Hebrew.  Since there are too few articles in Hebrew to produce a reliable topic model, we excluded these from most of our analyses.
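
One common way to reduce the line-end hyphenation problem is to rejoin a word that was split by a hyphen at a line break.  The following is a minimal sketch of that idea, not a description of what our pipeline does; the function name join_line_end_hyphens is hypothetical.  Note its main weakness: it also fuses genuine compounds that happen to break at a line end (“well-” / “known” becomes “wellknown”), so its output still needs checking.

```python
import re


def join_line_end_hyphens(text):
    """Rejoin words split by a hyphen at a line break.

    A hyphen that directly follows a word character and is followed by a
    newline (with optional spaces or tabs) and another word character is
    removed together with the line break.
    """
    return re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)


# Made-up sample text, for illustration only.
sample = "The manu-\nscript tradition is well attested."
print(join_line_end_hyphens(sample))
```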

It is also important to take into account, especially in the trends and topic models, our list of stopwords: those words that we excluded because they are not informative.  These include general stopwords (e.g., “a”, “the”, “in”) as well as words that skewed our dataset (e.g., “Jews”, “Jewish”, and “New York”, which appears frequently in the footnotes that are incorporated into the OCR files).  Before drawing any conclusions about missing words, consult those stopwords (found in the program files on GitHub).
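
As an illustration of how such a list affects what the analyses can see, here is a minimal sketch of stopword filtering in Python.  The three lists below are short, hypothetical stand-ins built from the examples in the preceding paragraph (plus “of” and “and”), not our actual lists, and the function name content_tokens is invented.  Multi-word entries such as “New York” are removed as phrases before the text is split into words.

```python
import re

# Hypothetical stand-ins; the real lists live in the project's program files.
GENERAL_STOPWORDS = {"a", "the", "in", "of", "and"}
CORPUS_STOPWORDS = {"jews", "jewish"}
STOP_PHRASES = ["new york"]


def content_tokens(text):
    """Lowercase the text, drop stop phrases, then drop single stopwords."""
    lowered = text.lower()
    for phrase in STOP_PHRASES:
        # Remove the phrase as a whole so "york" alone is left untouched.
        lowered = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", lowered)
    # Crude tokenizer: runs of ASCII letters only (non-Latin text is ignored).
    tokens = re.findall(r"[a-z]+", lowered)
    stop = GENERAL_STOPWORDS | CORPUS_STOPWORDS
    return [t for t in tokens if t not in stop]


print(content_tokens("The Jews of New York in the medieval imagination"))
```

Any word on the lists disappears from every downstream count, which is why an absent word in a visualization may reflect the stopword list rather than the journal.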

Copyright and Citation

All visualizations here are licensed under CC-BY-4.0.  Code is licensed under the MIT License, which can be found on our GitHub site.

Principal Investigators

Michael Satlow, Professor of Religious Studies and Judaic Studies, Brown University.  Satlow specializes in early Judaism.

Alexander Berry, ScM in Data Science candidate, Brown University.

You can contact us by email (addresses are available from the Brown University search page).

 
