Monday, 23 April 2012

Digital Shakespeares 5 and HBWS12

This is the final post in the series exploring databases that contain Shakespearean texts. From Stoppard I have learned that “there is an art in delay.” Throughout this series I have constantly postponed revealing the collection of databases itself. After introducing the topic, I spent four posts on a list of criteria that I think help to assess digital databases. Originally I thought it would be enough to post the sixteen questions I found relevant when reflecting on databases, but then I realized that criteria formulated as questions without explanation would be less useful, so I added a paragraph-long explanation to each question. Last week, having finished posting these questions, I had to admit that the delay could no longer be justified. So this time, on the one hand, I should present the list of databases.

On the other hand, this post does not only direct attention to databases that might come in handy when doing research on Shakespeare; it is also a contribution to another project, the celebration of Shakespeare’s 448th birthday. The Happy Birthday Shakespeare website can be found here. This is not the first time that a blog post functions as a gift to the long dead and still living Bard. Last year I wrote a post for the same project on the given theme: “How did Shakespeare shape my life, my intellectual life?” That said, it may be clear that if I intend to take part in this festive event again this year, I cannot retell the same story. Of course, hermeneutics would remind me that a year later, having changed (hopefully for the better), the same story would not, could not be the same, yet I think this year I should do something else. So this year, as I guess Shakespeare would be interested in what has happened to his texts, I present to him, and to anybody else interested, the list of databases that contain Shakespeare’s texts.

So this time, both as a gift and as a conclusion to my previous posts, I am going to list databases, not unexpectedly in an indirect way that makes the experience interactive. There is a simple route for whoever is interested in the list: following the link to my Delicious stack, “Databases of Shakespearean texts,” one may go to the list directly and check out the items immediately without reading the rest of this post. To those, however, who would like to stay here longer, I shall give some explanation of how these otherwise different types of databases can be classified. I am quite sure that a lot of databases have been left out, but as I promised in the introductory post, I have only dealt with databases that have either an institutional basis, or scholarly references, or both.

There are seven ways in which the individual databases can be classified. Some of the databases, or at least their text analysis software, can be downloaded, such as WordHoard or WordCruncher; the rest can be used via a web browser. Most of the databases are dedicated to Shakespeare studies, while two of them, WordCruncher and Wolfram|Alpha, are rather text analysis tools demonstrating their power on the Shakespearean corpus. Most of the databases are Open Access, but some are firmly behind a paywall, such as Gale Catalog: The Shakespeare Collection and XMAS, and one project, The Shakespeare Electronic Archive, is not behind a paywall yet requires a password, which may or may not be granted. Most of the databases contain only Shakespeare, while two include texts by Shakespeare and many others as well: Project Gutenberg and the Internet Archive. Most of the databases include a text analysis tool, but a few contain only digital texts, such as Project Gutenberg, the Internet Archive, the Shakespeare Quartos Archive, Shakespeare in Quarto, etc. Some of the databases rely on a corpus that is unreliable or somewhat questionable from a strictly philological point of view, while others use either digital versions of reliable early prints (Shakespeare Quartos Archive, Shakespeare in Quarto) or even modern critical editions (Internet Shakespeare Editions, The Shakespeare Electronic Archive). Most of the databases are device independent, while at least one has been built only for the iPad: Shakespeare's The Tempest for iPad.

The lines of this classification create a rather complicated matrix upon which the individual databases can be located. This complexity is both an advantage and a disadvantage. It is an advantage as it demonstrates the interest in Shakespeare in the digital space: scholars in the 21st century use digital technology to study, and thus represent, the Bard’s texts in a great number of ways and modes. But this variety also demonstrates that enthusiasm for digital scholarship is dispersed and that funds are scattered, instead of forces and resources being united to create a database that would be equally useful for a variety of scholarly approaches and for several levels of interest, from the scholarly to the general. Do you like this, Will? Anyway, I wish you a happy birthday in the heavenly theatre with this multifocal symphony of textual databases.

PS. The advantage of checking my Delicious stack is that it may well be improved in the long run. I can imagine, however, that somebody would like to see the list here as well, so here it is:

3.      Hamlet Works
4.      Internet Archive
6.      MONK Project
8.      Open Shakespeare
26.  WordCruncher
27.  WordHoard
28.  XMAS 3.1

Thursday, 12 April 2012

Digital Shakespeares: Features of a Database 4

This post is number five in the series of posts working out a possible methodology for assessing and accounting for databases containing Shakespearean texts. After an introductory post, four others have been dedicated to listing, explaining and contextualizing questions that might come in handy when pondering these databases. So far the areas of basic facts, transparency and flexibility have been covered in the first three posts, and now, as I have promised, I am going to present and reflect on questions pertaining to what I would like to term “interdisciplinary openness.”

Most of the databases reduce texts to their linguistic aspect. Queries focus on words, strings of words, linguistic units, grammatical units and verbal statistics. The databases can also visualize tendencies and create diagrams in a variety of formats about the linguistic construction of the text. All this is fine, as most of the time when reading a Shakespearean play the reader will be interested in the ways a text communicates its layers of meaning through verbal means. There has been, however, a tendency in scholarly circles to claim, in a great number of ways, that a text does not reveal layers of meaning via its linguistic construction alone, but that meaning is also a social construct embedded in the material ways a text functions in the world. So scholars claim that bibliographical data, from the date of publication to the publisher, from the typesetting to the type of paper, from decoration to page size, play their part in the process of constituting meaning. Here a long list of authors, theoretical and pragmatic, may be presented: from David Scott Kastan to John N. King, from Woudhuysen to McGann, from Shillingsburg to Hayles, from Marshall McLuhan to Andrew Murphy, to mention a few authorities in the field. It is beneficial if a database allows for research that goes beyond the linguistic aspect. The next three questions, thus, explore ways in which a database may cater for interests in aspects other than the linguistic one.

  14. Format of the digital text (txt, xml, jpg, tiff etc.)

Interdisciplinary research presupposes the complexity of possible questions to be asked, and this complexity can only be provided through presenting the texts in a variety of formats. Sometimes the best choice is a rather unmarked list of words, e.g. in a txt file. This is sufficient and even more fruitful for some queries, especially when it is not clear how the file is read by a text analysis tool. For another set of questions encoding is needed, say for tokenised or lemmatised queries. At other times it is best if there are only images, which may be analyzed in ways unimaginable before. It is the format of the file that enables these differing approaches, so it is good if the same text is accessible in a variety of formats.

  15. Is it the linguistic, digital or bibliographic aspect that is emphasized?
The linguistic aspect refers to the language, the linguistic elements of the digital text. The bibliographical aspect refers to the material aspect, but in this very case it does not define the digital text as digital, but as a reproduction of the visual aspect of some original printed material. The digital aspect refers to the computational coding of a text that enables both the visual aspect and the searchable quality of these texts. It is clear that builders of databases have to decide on what they intend to achieve. Unfortunately, there is no database that lays, or could lay, equal emphasis on every aspect of a digital text. Databases vary in paying special attention to the text as a linguistic unit, or to the text as a deeply encoded entity that allows for complex and intelligent queries, or to aspects that are relevant for the historian of the book.

  16. Which aspect of the text is open to queries?
If it is possible to present the text in a variety of formats, a variety of disciplinary approaches may be occasioned within the database. If this is so, it is also relevant which aspect of the text is open to queries, as it is the query that makes computer-enabled research fruitful. It is the query that makes research faster and more accurate. So it is great if the image file is there to enable research related to the history of the book, but if this aspect of the text is not open to queries, computation is like a disabled giant: it is there, but the scholar cannot make use of the power of computer technology. The Text Encoding Initiative enables marking up a text for queries about the visual aspect of a work, and there are even free image mark-up tools, so technologically it is not impossible to prepare a database in which the bibliographical code is open to queries.
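
To make the difference between these aspects a little more tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the fragment is an invented, simplified TEI-style snippet without namespaces (using `w` elements with a `lemma` attribute and a `hi` element with `rend="italic"`), and the function names are hypothetical. It is not how any of the databases discussed here actually work; it only shows that the same file can answer a plain-text query, a lemmatised query and a query about the visual aspect, provided the mark-up is there.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample for illustration only: a simplified, TEI-style fragment
# without namespaces. It is not taken from any of the databases discussed.
SAMPLE = """
<div>
  <sp>
    <speaker><hi rend="italic">Ham.</hi></speaker>
    <l>
      <w lemma="to">To</w> <w lemma="be">be</w>, <w lemma="or">or</w>
      <w lemma="not">not</w> <w lemma="to">to</w> <w lemma="be">be</w>,
      <w lemma="that">that</w> <w lemma="be">is</w> <w lemma="the">the</w>
      <w lemma="question">question</w>
    </l>
  </sp>
</div>
"""


def surface_counts(root):
    """Linguistic query on the bare text: count word forms, ignoring mark-up."""
    text = "".join(root.itertext())
    return Counter(re.findall(r"[a-z]+", text.lower()))


def lemma_counts(root):
    """Query that needs encoding: count the lemmas recorded on <w> elements."""
    return Counter(w.get("lemma") for w in root.iter("w"))


def italic_spans(root):
    """Query on the visual aspect: text that the mark-up records as italic."""
    return ["".join(hi.itertext()) for hi in root.iter("hi")
            if hi.get("rend") == "italic"]


root = ET.fromstring(SAMPLE)
print(surface_counts(root)["be"])  # occurrences of the word form "be"
print(lemma_counts(root)["be"])    # occurrences of the lemma "be", incl. "is"
print(italic_spans(root))          # passages marked as italic in the source
```

The word-form count and the lemma count differ because “is” is recorded under the lemma “be,” which is exactly the kind of question a bare txt file cannot answer, while the italic query is only possible because the visual feature has been encoded at all.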

* * *

This time, thus, we have seen the remaining three criteria for assessing a database. These questions covered an area that I have labeled “interdisciplinary openness.” The interdisciplinarity of a database manifests itself in the variety of file formats and in the types of queries that a user may conduct. Naturally, these criteria may or may not apply to each and every database and can only be used as a means of orientation. So neither these three criteria nor the other thirteen should be thought of as complete and compelling, but rather as a means of discussing a database or databases critically. What follows from this is that a positive assessment does not necessarily require the highest possible score on each and every criterion, as it can easily happen that a database can be used fruitfully even though a review based on the above sixteen criteria would suggest that it is of lesser quality. Assessment at its best relies on criteria relevant to the individual database. Having thus finished the meditation on the criteria of assessment, next time I shall start a new series of posts exploring databases one by one.

Monday, 2 April 2012

Digital Shakespeares: Features of a Database 3

This is the last but one post in the series “Digital Shakespeares: Features of a Database.” The previous posts presented and explained the first eight questions of the list that I used when assessing databases containing Shakespeare’s texts. The first eight questions explored some basic facts and the documentation of the database. This time the focus will be on another aspect that I label flexibility. This is an important aspect, as a database becomes more usable if it can be bent to the researchers’ expectations and interests. Before this larger area of questions there are two extra ones that pertain to the ease of use of a database.

  9. Is the interface clear and logical?

    The question about whether the interface is clear and logical does not invite an answer in the form of a subjective aesthetic judgement, but rather a reflection on the pragmatic aspect of the interface. What I am interested in here is whether one can navigate from one action to another with relative ease, without much thinking and without many mistaken steps. Nevertheless, I am aware that this feature of a database is a rather subjective one, as something that seems illogical and complicated to one user may well be straightforward and simple to another. Yet hopefully the response to this question will not reflect on the interface in isolation, but will keep an eye on other databases and even other applications, so that subjectivity can be reduced via experience and comparison.

    10. Is it possible to create a researcher’s room?
    A researcher’s “room” is a handy opportunity if the database is an online one. It is handy if one can stop working whenever necessary without losing the findings of the current research, and can continue working when it is possible again. This feature is also important because this may be the cyber-spatial “room” where one may share results with colleagues and may expect some reaction from them to her/his work. A researcher’s “room” can be a place that anybody may customize to her/his expectations, work-method and needs, and where one can leave notes and reflections on where one is in the process of research.

    The theoretical problem addressed by the following questions is this. A database is most of the time built for one type of research, which is no problem, for how can one foresee what other researchers would like to do with a particular database? One may well argue that the virtue of a database is that it does what it promises in the best way, and I agree with this argument. An equally powerful claim could be, however, that if a database is tuned for only one type of research, naturally the one that best suits the builder, then why and how could it be used by other researchers with slightly or completely different purposes? So in this Kantian or Pyrrhonian situation, where there are two equally powerful claims in opposition, I would like to vote for some sort of flexibility that provides more opportunities than the ones envisioned by the builders. I can imagine that a database that can be adapted to a variety of purposes will be the one that attracts researchers’ attention.

    11. Can the digital text be downloaded?
    Sometimes it is beneficial to be able to download the text that one works with. This adds to the usability of a database, as it can easily happen that the analytical tools of a database do not harmonize completely with the needs of a researcher. It is then beneficial if the text or texts can be downloaded and fed into another search engine. This may well be the case with thoroughly cleansed texts to be used with independent text analysis tools, or with deeply marked-up texts, when the mark-up is deeper than what the facilities of the database allow one to explore. In this latter case it is also possible that queries tuned for specific aspects can be executed outside the database.

    12. Can the results of the query be saved, downloaded?
    It may well be fruitful if the findings can be saved and downloaded to be deployed elsewhere than within the application. This may be appropriate if results from one database are to be compared with the findings from another, or if they are to be arranged in another way than what is offered by an application. A third scenario in which saving and downloading is fruitful may be when one intends to insert or copy-paste the results of the query into an article, paper or blog post. (Only between round brackets do I dare to add here that, as a Zotero fan, I would find it nice if a database could be linked to Zotero, so that referencing would be a matter of clicking here and there. I am aware that this is only the lazy researcher’s dream…)

    13. Is the source-code open, i.e. can the search tools be modified?
    This attribute is something that is both beneficial and nice. It is beneficial because the tools may be tuned for the analysis of texts from another database without building the search tool from scratch. Naturally it can happen that it is easier to start from scratch, but it can happen as well that coding means just fine-tuning. Open-source code is nice too, as it tells the users that the builder trusts them, shares everything with them, and admits that the application can be developed and used elsewhere and in other ways than first envisioned.
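
The questions about downloading texts and saving results can be illustrated with a small sketch in Python. It rests entirely on assumptions: the one-line sample stands in for a downloaded plain text, and the functions `kwic` and `rows_to_csv` are hypothetical helpers written for this post, not part of any of the databases or tools mentioned. The sketch runs a simple keyword-in-context query outside any database and writes the hits as CSV, a format that can be opened in a spreadsheet or compared with results from elsewhere.

```python
import csv
import io
import re

# Stand-in for a text downloaded from a database (illustrative sample only).
SAMPLE_TEXT = "To be, or not to be, that is the question"


def kwic(text, keyword, width=3):
    """Return keyword-in-context rows: `width` words on each side of a hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            rows.append({
                "left": " ".join(tokens[max(0, i - width):i]),
                "match": token,
                "right": " ".join(tokens[i + 1:i + 1 + width]),
            })
    return rows


def rows_to_csv(rows):
    """Serialize the query results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["left", "match", "right"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


results = kwic(SAMPLE_TEXT, "be")
csv_text = rows_to_csv(results)
print(csv_text)
```

Writing `csv_text` to a file would make the findings portable; the point is only that a downloadable text and exportable results let the researcher continue the work with tools of her/his own choosing.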

    To sum up, this time I pondered the features of a database that I labelled “flexibility.” The flexibility of a database lies in whether or not a researcher can adapt the texts included in the database and its analytical tools to her/his needs. Flexibility is important not only because the database may then serve a variety of purposes, but also because this way it will attract more users. Having thus accounted for this feature of a database, what remains for the next post are the attributes that I classify as “interdisciplinary openness.”