One of the concepts I found most compelling in this week’s readings is the idea, expressed by Clay Shirky in both his book “Here Comes Everybody” and his Smithsonian webcast, that failure now comes at a lower price than ever before. With the rise of the Internet and, more recently, the proliferation of user-friendly software that non-programmers can handle and even master with ease, publishing one’s work in a digital environment has become a matter of “publish, then filter” rather than the traditional reverse.
I couldn’t help but think about our own projects while reading and listening to his ideas. I’m sure we are all proud of our research and eager to demonstrate what we know to the public. Before the Internet (or, really, before web-design software made it easy for beginners to build their own sites), how many of us could have published what we’re currently working on? Would a book publisher have risked money and reputation printing the work of a student with few credentials and no previous publishing experience? And yet, here we are, about to make our work public with essentially no risk of financial loss.
It is this notion that binds this week’s readings together. If more risk were involved in this type of production, Nina Simon’s vision of museum interactivity (with guidelines as she proposes) would be nearly impossible to carry out; it would simply be too financially risky to offer museum visitors the chance to shape the exhibits themselves rather than remain passive visitors who pose no potential loss to the museum. The same is true for Wikipedia. If the cost of digital production were not so low and the notion of “publish, then filter” did not apply to things created digitally, then a digital encyclopedia relying on contributions from amateurs and non-scholars would be too great a risk to attempt (although, as Roy Rosenzweig points out, Wikipedia’s critics might argue that the risk in its case is less financial than a loss of true education and knowledge).
Some thoughts, rather than a tech question this week…
One of the things that struck me in this week’s readings was the notion of scanning versus full-text reading. The authors of Writing for the Web emphasize the tendency of computer users to scan webpages and Web-based writing rather than read the text in full. I agree that most of us likely scan a page before investing time in reading it, but I was shocked to discover that only 16% of Web users read the entire text. Then again, I believe this number reflects a study done on only two websites, both affiliated with the Sun Science office, which may not require a full-text read.
I agree with the authors’ suggestions for improving navigation and site search by including keywords, headers, lists, image captions, and embedded meta-tags. However, I fear that their emphasis on text reduction contradicts much of what Rosenzweig and Cohen write in Collecting History Online and may even discourage potential Web contributors from sharing their own stories digitally. A potential contributor may become disillusioned with the promise of Web 2.0 if asked to trim their text because studies show that people generally read less online than in print. Such contributors may decide that abridging their story would strip it of its value and opt not to contribute at all.
I would argue that different types of sites attract audiences more willing to read word for word. It is important to keep the suggestions offered in Writing for the Web in mind, but not to lose sight of each individual website’s mission. Often that mission cannot be achieved without lengthy contributions from site visitors, and those contributions serve little purpose if not read in full by the audience.
In her chapter on Preservation, Abby Smith discusses the idea of emulation as a form of preservation. She mentions that while it is possible to emulate “retrospectively,” as has been done with computer games designed years ago for now-obsolete systems, it has not yet been possible to emulate “prospectively.” I’m very confused by this concept. How would one possibly design a method to emulate software on machines that have yet to be created? Is this something actually in the works?
On a related note, something that really struck me was the reality that proprietary software is guarded with such secrecy that its long-term preservation becomes nearly impossible to guarantee. Keeping the source code private prevents the dissemination of enough documentation to allow for emulation later on. An agreement, somewhat like a will, whereby the proprietors agree to transfer the source code to another party if the company folds or terminates the product, strikes me as a clever way of preventing a situation where software, and the items that depend on it, become inaccessible once the proprietors no longer exercise control over their product. Is this happening regularly? I wonder how often software is left to rot with no way to preserve the items that rely on it.
Below is the link to my online archive. I’m having some trouble changing my theme, so it is currently still in the default theme. My goal is to digitize as much of the collection of personal papers of Mary Ann Dickinson Smith as possible in the time available. Mary, the wife of Senator Truman Smith of Connecticut, was a copious letter writer. In addition to writing daily, she saved what seems like most, if not all, of the correspondence she received. Remarkably, much of the correspondence she wrote herself, especially letters sent to her husband and sons, was also saved and is included in the collection. Mary’s letters illustrate the intimacies of daily life in the mid-19th century while giving a contemporary reader a window into a world long since gone.
After reading through the explanations of the Dublin Core elements, I’m left with a few questions. If we are to rely on the Library of Congress Subject Headings and a controlled vocabulary for the Subject field, should we do the same for the Creator field? If the creator of our item appears in the Library of Congress Name Authorities, is the protocol to list the creator exactly as the Library of Congress records it, or does it make more sense to write out the name in a more informal “first name last name” format without the extra elements the Name Authorities add? The Dublin Core elements page does not specify.
I am hoping to model my online archive on the Seeking Michigan site and, more specifically, its Civil War Manuscript collection online. In a Tags field for each item, Seeking Michigan provides numerous variations in spelling and format of the same name. I imagine some of the names were pulled from the official Library of Congress Name Authorities while others are informal tags. Does it make sense to include variant spellings and formats of the same name? And would we include these as tags or under the Dublin Core Subject field?
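To make the question concrete, here is a minimal sketch (in Python) of the kind of variant-name list I have in mind, mapping one authority-style heading to its informal variants. The heading and the variant forms are my own guesses for this collection, not records pulled from the Name Authorities, and I am only assuming that Omeka’s tag box takes a comma-separated list.

```python
# One authority-style heading mapped to the informal variants a visitor
# might actually search for. All names here are my own guesses, not
# entries copied from the LC Name Authorities.
name_tags = {
    "Smith, Truman, 1791-1884": [
        "Truman Smith",
        "Senator Truman Smith",
        "T. Smith",
    ],
}

# Flatten each group into the comma-separated list I assume Omeka's
# tag field expects.
for heading, variants in name_tags.items():
    print(", ".join([heading] + variants))
```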
Is there a controlled vocabulary for the Source field, or is it up to us to write it however we think fits best? The Dublin Core description does not specify an official way of formatting the Source note.
For the Relation field, there seem to be numerous qualifiers. How do I represent these in the Relation field provided by Omeka? Omeka does not seem to offer an option for Qualified Dublin Core. If I am archiving letters written to and from Civil War soldiers, all found within the same collection and nowhere else, is the Relation field even necessary? I’m at a loss as to what I would include in it.
And for the Format field, the Dublin Core description mentions including physical size. Would it have made sense to measure the actual size of the paper on which each letter was written?
Using the Seeking Michigan site as a model: in the Format field they use the term “Document,” and in the Type field they use the term “Correspondence.” I am still confused about exactly how to differentiate the two and how to describe each. According to the DCMI Type Vocabulary, correspondence is a form of “Text” but not its own term. Perhaps I’m getting too bogged down in the details.
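To keep all of these fields straight in my own head, I sketched out how one letter’s Dublin Core record might look, using Python’s built-in xml.etree module. Every value is my own guess at an answer to the questions above, not official practice: Creator in the inverted, authority-style form, Subject in LCSH style, Source as free text, Type from the DCMI vocabulary, Format following Seeking Michigan’s usage, and Relation omitted entirely on the understanding that every Dublin Core element is optional.

```python
# A sketch of one letter's Dublin Core record; every value below is an
# illustrative guess, not settled practice for this collection.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
fields = [
    ("creator", "Smith, Mary Ann Dickinson"),  # inverted, authority-style form
    ("subject", "United States--History--Civil War, 1861-1865--Correspondence"),  # LCSH-style
    ("type", "Text"),        # nearest DCMI Type Vocabulary term
    ("format", "Document"),  # following Seeking Michigan's usage
    ("source", "Mary Ann Dickinson Smith papers, Stamford Historical Society"),  # free text
]
for name, value in fields:
    ET.SubElement(record, f"{{{DC}}}{name}").text = value

print(ET.tostring(record, encoding="unicode"))
```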
Some of the tagging comments posted for today made me realize I had another question, this time related to tagging. Both Everything Is Miscellaneous and the YouTube video emphasize the importance of tagging in helping us find what we need in a digital environment where we can’t rely on a traditional chronological, alphabetical, or similarly structured hierarchy. Weinberger describes, for example, various efforts to record every known name for every species (the All Species Foundation project, the uBio project, and others), including both scientific and common names and variations of those names based on geography. With this in mind, should we focus less on Library of Congress Subject Headings and more on our own social tagging of our items, or should the emphasis be roughly equal? I wonder whether our items will be more accessible through the non-standard tags we create than through the official Library of Congress Subject Headings.
I’m not sure if we’ve discussed this in class, or perhaps it is something we will discuss this week, but I’m a bit confused about what differentiates the Dublin Core metadata fields from the Item Type Metadata list. And should the Type field in the Item Type Metadata match the Type field in the Dublin Core list? In the Dublin Core list, Type seemed to ask for the genre of the item itself (in this case, correspondence), but in the Item Type Metadata list it appears to ask for the format/type of object.
Also, in the Item Type metadata list, what is the Text box traditionally used for? Is this where we add a transcription, if we choose to do so? If not, where would a transcription go?
I too am having trouble uploading my images. They are rather large TIFF files, and I’m assuming this is why they won’t upload. I had thought that when Omeka receives a TIFF file it creates additional JPEG versions of the same image, so I wanted to give this a try. Since I don’t think I want to provide TIFF files on the site anyway, I am going to try uploading JPEG versions instead.
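In case the conversion itself turns out to be the hurdle, below is a minimal sketch of the batch conversion I plan to try, using the Pillow imaging library. The folder names and the quality setting are my own assumptions, not anything Omeka requires.

```python
# Batch-convert archival TIFF masters to smaller JPEGs for upload.
from pathlib import Path
from PIL import Image

masters = Path("tiff_masters")      # archival TIFF scans (hypothetical folder)
web_copies = Path("jpeg_uploads")   # smaller derivatives to upload instead
web_copies.mkdir(exist_ok=True)

for tiff in sorted(masters.glob("*.tif")):
    with Image.open(tiff) as img:
        rgb = img.convert("RGB")    # JPEG cannot store alpha or 16-bit channels
        rgb.save(web_copies / (tiff.stem + ".jpg"), quality=85)
```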
My goal for our Creating Digital History project is twofold: I am scanning letters from a collection housed at the Stamford Historical Society in order to introduce the public to the collection and make the letters accessible to those conducting research online. At the same time, I would like to provide archival-quality digital copies of the letters for the historical society as a backup to the original analog forms. I want to maintain “fidelity” to the original letters. Most of the letters I am working with were written in black ink (though some are in pencil and a few in blue ink) on white or light blue paper and stationery.
In their discussions of digitizing texts, all four authors of this week’s readings agree that for displaying images of scanned textual items, a resolution of 300 dpi or above is generally best practice. Cohen and Rosenzweig suggest that a scan intended for display should be set at 24-bit color (for OCR, only 1-bit is necessary). Will this produce a true archival scan of the letters? If not, what will? Cohen and Rosenzweig claim that even inexpensive scanners can scan at these settings. I attended a tutorial on archival scanning in the preservation department and remember a color calibration taking place, though I don’t remember the details. Is calibration necessary for creating a digital archival master? Or can this step be bypassed if the scanner does not have the capability?
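While I wait for an answer, here is a quick sanity check I sketched with Pillow to confirm that each of my masters at least matches the settings Cohen and Rosenzweig suggest; the folder name is hypothetical, and I am relying on the fact that Pillow reports 24-bit color as “RGB” mode.

```python
# Verify each TIFF master against the suggested 300 dpi / 24-bit color target.
from pathlib import Path
from PIL import Image

for tiff in sorted(Path("tiff_masters").glob("*.tif")):
    with Image.open(tiff) as img:
        dpi = float(img.info.get("dpi", (0, 0))[0])  # horizontal resolution, if recorded
        print(f"{tiff.name}: mode={img.mode}, {dpi:.0f} dpi")
        if img.mode != "RGB" or dpi < 300:
            print("  -> below the 300 dpi / 24-bit color target")
```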
On another note, is the Do History site (http://dohistory.org/diary/index.html), which uses Laurel Thatcher Ulrich’s book A Midwife’s Tale as a case study, an example of rekeying the text of a scanned image and attaching that text to the digital facsimile? The site provides images of the diary pages but also allows a keyword/date search through the entire text. I am assuming that OCR would not work for such a document, because both articles/chapters we read this week stress that OCR does not work with handwritten documents.
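If I understand rekeying correctly, the payoff looks something like the sketch below: each facsimile image is paired with hand-typed text, which a simple keyword search can then reach even though the image itself is unsearchable. The filenames and snippets are invented placeholders, not text from the actual diary.

```python
# Pair each page image with its hand-typed transcription, then search
# the transcriptions to find the matching facsimiles.
transcriptions = {
    "page_001.jpg": "Clear and pleasant. Spent the morning writing letters.",
    "page_002.jpg": "Rain all day. Mended stockings and read by the fire.",
}

def search(keyword: str) -> list[str]:
    """Return the page images whose transcription mentions the keyword."""
    keyword = keyword.lower()
    return [page for page, text in transcriptions.items()
            if keyword in text.lower()]

print(search("letters"))   # -> ['page_001.jpg']
```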
One of the things that struck me while completing this week’s readings was the contrast in viewpoints on appropriate text length for a website. Krug argues that viewers rarely read the text; rather, they scan the webpage for salient material. A website’s audience will likely plow forward, interacting with the site without acquainting themselves with its content, design, or function beforehand. To compensate, a site’s organization must be obvious (i.e., less text), catering to the majority of viewers who will not take the time to read. The less thinking required, the more successful the website.
Cohen and Rosenzweig, on the other hand, reject this notion of text reduction. Unlike those in the usability camp, they champion the use of longer passages of text, especially in academic/historical websites whose content might lose some of its potency and value if condensed.
Reading these contrasting viewpoints made me wonder whether, in time, those like Krug might reconsider the necessity of reducing text and eliminating introductory passages and instructional content (Krug’s examples). As more and more people become familiar with, and even begin to rely on, digital reading (Kindles, online newspaper subscriptions…), perhaps the fear of losing an audience to extended text will wane.
On a completely different and more technical note, in describing the award-winning Mesoamerican Ballgame website, Cohen and Rosenzweig mention formatting the contents of a website according to percentages rather than pixels. Does Omeka give us this choice? And which is recommended for maintaining a consistent appearance regardless of browser?