Week 9: Data and Metadata
Nov 4th, 2009 by Amanda French

First question: What was it like trying to describe your object?

Second question: What one thing did you take away from the David Weinberger book?

Main lesson for today: It is important to be able to share data and metadata, whether it’s first-order, second-order, or third-order (according to Weinberger’s typology).

Examples of XML (used for text, both data and metadata):

Examples of databases:

Data and metadata sharing:

  • HousingMaps.com is a “mashup” of Google Maps and Craigslist.
  • The Open Archives Initiative sets standards for sharing archives data.
  • OAIster is a database that allows you to find digital archival collections that share their metadata. (now available through WorldCat!)
  • The Omeka plugin OAI-PMH Harvester allows you to automatically import lots of metadata records at a time.
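To make the harvesting idea a little more concrete, here is a minimal sketch of what an OAI-PMH request looks like. A harvester asks a repository for records by sending an ordinary web request with a "verb" and a metadata format. The `ListRecords` verb and the `oai_dc` (Dublin Core) prefix come from the OAI-PMH standard; the repository address `http://example.org/oai` and the function name are made up for illustration, and this is not how the Omeka plugin itself is written.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one named set (collection).
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, used only for illustration.
print(list_records_url("http://example.org/oai"))
```

Pasting a URL like this (with a real repository address) into a browser returns an XML document full of metadata records, which is what a harvester reads and imports.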

Note that someday, because your bibliographies are in shareable data form, someone might be able to reuse them.

Week 8: Digitization
Oct 28th, 2009 by Amanda French

WEEK 8: DIGITIZATION

Here are some basic terms and concepts that you need to be aware of in digitization. Parts of this post are adapted from the University of Virginia’s Electronic Text Center Scanning Help Sheets.

Note: The best place to go for help with digitization is the Digital Studio on the 2nd floor of Bobst Library, where they have many “digital authoring” tools and where they will help you learn the software and make decisions about standards.

Image Scanning

Software

The gold standard of image capture and editing software is Photoshop. However, there are other image editing programs that can do many of the most common image editing tasks. You might want to download and try these:

* Paint.net for Windows
* Seashore for Mac
* GIMP for Windows, Mac, or Linux

There’s also a “light” (and cheaper) version of Photoshop called Photoshop Elements or Photoshop LE.

Pixels, DPI, and Resolution

A “pixel,” or “dot,” is the atom of a digital image. The resolution of a digital image is measured in dots per inch (dpi). If you keep zooming in on a digital image, for instance, the image will become “pixelated,” meaning that you can see the individual pixels.


The NINCH Guide to Good Practice recommends a minimum of 300 dpi for image scanning. At this resolution, images can be printed with little loss of quality. If you have enough space, you may want to scan at 600 dpi instead. Serious preservation scanning is done at 1200 dpi. Computer monitors display images at a resolution of 72 dpi, so an image that will only be displayed on screen (not printed) need not have a resolution any higher than that.
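If it helps to see the arithmetic, here is a rough sketch of how dpi translates into the pixel dimensions of a scan: multiply each side of the original, in inches, by the dpi. The 8 x 10 inch page and the function name are only assumptions chosen for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for an original measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


# An 8 x 10 inch original at the resolutions mentioned above.
for dpi in (72, 300, 600, 1200):
    width_px, height_px = pixel_dimensions(8, 10, dpi)
    print(f"{dpi:>4} dpi: {width_px} x {height_px} pixels")
```

Doubling the dpi doubles each side, so it quadruples the total number of pixels. That is why higher-resolution scans take up so much more space.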

Color

Images can be scanned with four basic levels of color information:

* 1-bit black and white — each dot can be either black or white; used for cartoons, line drawings, and to increase contrast on images of text.

* 8-bit greyscale — each dot can be one of 256 grey shades; used for ordinary (non-archival) scanning of what we normally call “black and white” images.

* 8-bit color — each dot can be one of 256 colors; 8-bit color can look a little grainy at times; it’s used mainly for color cartoons and “clip art.”

* 24-bit color — each dot can be one of 16.8 million colors; best used for photographs. There’s even 48-bit color, now, but many programs will not support it.
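The numbers in this list all come from the same rule: a dot stored in n bits can take one of 2 to the power n values. The sketch below checks that rule and also estimates how large an uncompressed image would be at each color depth. The 2400 x 3000 pixel image (roughly an 8 x 10 inch page at 300 dpi) and the function names are assumptions for illustration, and real files add some overhead for headers.

```python
def color_count(bits_per_pixel):
    """Number of distinct values a single dot can take."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Approximate size of the raw pixel data, ignoring file headers."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical image: about an 8 x 10 inch page scanned at 300 dpi.
for bits in (1, 8, 24, 48):
    size_mb = uncompressed_bytes(2400, 3000, bits) / 1_000_000
    print(f"{bits:>2}-bit: {color_count(bits):,} values per dot, about {size_mb:.1f} MB")
```

So "16.8 million colors" is just 2 to the 24th power (16,777,216), rounded.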

File Formats, Loss, Losslessness, and Compression

* TIFF — a lossless format (usually saved uncompressed) that works on all platforms. The TIFF file has long-term archival use, but is usually too big to put on a web site.

* PNG — a losslessly compressed format that works on all platforms; the best one for web delivery.

* JPG or JPEG — a highly compressed format that, unlike PNG, is “lossy”: some image information is discarded to make the file smaller.

* GIF — a moderately compressed format that is only suitable for 8-bit or 1-bit color images (i.e., cartoons and logos).

Text Scanning

Optical Character Recognition (OCR)

To perform “OCR” on an image is to transform a picture of text into editable, searchable, manipulatable text. One piece of software that can do this is called “FineReader,” made by a company called ABBYY.

The books available on Google Book Search have been OCR’d. It can be instructive to view the “plain text” behind the page images, as in this page from the introduction to a version of Jane Eyre. There are some small errors; such errors are inevitably greater when the page is stained or faded, or when the font is unusual.

As the University of Virginia’s Electronic Text Center advises, “even with clean text of a decent type size there will be occasional errors; this error rate increases as the text’s size and clarity decreases. Altering the brightness and resolution can improve results, but little can be done with a badly faded photocopy or a 17th or 18th century typeface.” When it is important to treat a text *as* text, and not simply as images, many prefer to outsource the work of retyping the text by hand. Some firms that do this guarantee a 99.99% rate of accuracy.

Audio and Video Digitization

Note: Audio and video are sometimes called “time-based media,” for reasons that are obvious if you think about them — audio and video happen over time, whereas images and text are “still.” Many of the issues for digitizing audio and video are therefore the same.

Software

One of the best audio editing programs is a free, open-source program called Audacity. You can use this to manage digital audio files. The Digital Studio also has Pro Tools, a more advanced program.

There are still not many very good free video editing and conversion programs. The Digital Studio has both iMovie (which is simpler) and Final Cut Pro (which is more difficult).

Sample rate and bit rate

“Sample rate” and “bit rate” in audio are analogous to “resolution” for images: they measure how much information the computer has captured. (Strictly speaking, the 16-bit and 24-bit figures below describe “bit depth,” the number of bits stored for each sample.) If you would like to read more about what exactly sample rates and bit rates are, check the NINCH Guide’s section on Audio and Video Management. The basic thing you need to know, however, is that audio should be sampled at a rate of at least 44.1 kHz, which is the same rate as a CD; 96 kHz is the new professional archival standard. The archival bit depth is 24-bit, although many projects have opted to use 16-bit.
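To get a feel for what these settings cost in disk space, here is a small sketch. Uncompressed audio takes up the sample rate, times the bits per sample, times the number of channels, times the length in seconds. The one-minute stereo recording and the function name are assumptions for illustration, and real files add a small header.

```python
def uncompressed_audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Approximate size of raw (uncompressed) audio data in bytes."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8


# One minute of stereo audio at CD quality and at the archival settings.
cd_quality = uncompressed_audio_bytes(44_100, 16, 2, 60)
archival = uncompressed_audio_bytes(96_000, 24, 2, 60)

print(f"44.1 kHz / 16-bit: {cd_quality / 1_000_000:.1f} MB per minute")
print(f"96 kHz / 24-bit:   {archival / 1_000_000:.1f} MB per minute")
```

The archival settings come to a bit more than three times the size of CD-quality audio, which is part of why some projects settle for 16-bit.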

Video digitization is more complicated. There is a good tutorial about archival digitization of video at the University of Texas’s Information School.

Format

Just as with images, audio and video digital files can be saved in formats that are either “lossy” (meaning that some information is discarded during compression so that the file is smaller) or “lossless” (meaning that no information has been lost).

* mp3 — The mp3 is a heavily compressed (lossy) audio format, and therefore the one that is most common on the Internet. It is not suitable for archival purposes.

* aiff — Lossless audio format developed by Apple.

* wav — Lossless audio format developed by Microsoft and IBM; the most common for archival audio.

* mpeg — The most common video format, used for both preservation and access.

Other

The Digital Studio has a slide scanner that they can teach you to use, if necessary. The Microfilm department at NYU’s Bobst can also help you get a digital copy of a microfilm, if necessary.

HTML and Installing Omeka
Oct 21st, 2009 by Amanda French

Note — be sure to see the Glossary for a refresher on terms such as “client” and “server.”

Preparatory exercises — Dreamhost and FTP

  1. Write down your domain name, your Dreamhost Web ID, and your Dreamhost password on the sheet provided.
  2. Watch me while I log in to my Dreamhost account and create a new FTP user. Then log in to your own Dreamhost account at http://panel.dreamhost.com and create a new FTP user. Write down the FTP user name and password on the sheet provided.
  3. Watch me while I create a database. Then create a new database yourself and write down the database name, database hostname, database username, and database password on the sheet provided.
  4. Watch me while I open my FTP program and connect to the aphdigital.org server. Then open your own and connect to your server. Make sure that the FTP program is set to “Show Hidden Files” — in FileZilla, this is in the menu Server –> Force showing hidden files.
  5. Find the folder named “omeka-1.1” (NOT the file named “omeka-1.1.zip”). Rename it “project.” Upload it to your server. This will take about 40-50 minutes to upload; in the meantime, we will learn to write a web page.

Exercises — HTML

  1. Go to http://www.westland.net/coneyisland/. Right-click on the image and choose “View Image.” Note that the image is a file named “titlepic.jpg” which has its own URL.
  2. Go to http://www.westland.net/coneyisland/. Click on the File menu in your browser, and choose to save the page as “Web Page, Complete” to your Desktop. Notice that what you have is a file called “Coney Island History Web Site.htm” and a folder with several items in it, two of which are .jpg images.
  3. Close your browser. Now open it again. Go to the File menu and choose “Open File.” Navigate to the downloaded file named “Coney Island History Web Site.htm” and open it. Notice that what appears looks exactly like the web site, except that your browser’s address bar is prefaced by file:// instead of by http://.
  4. In your browser, go to View –> Page Source. What you see there is simple HTML. You can view and copy any HTML code on the internet this way.
  5. Create a folder (aka directory) in your My Documents folder and call it “website.” Create a folder (aka directory) inside that folder and call it “images.”
  6. Go to Start –> Accessories –> Notepad (or Applications –> TextEdit, on a Mac) and open a new file. Save it in the folder “website” and call it “index.html.” Copy the image file titled “titlepic.jpg” from the Coney Island website to your own “images” directory.
  7. Follow along with me as I write some basic HTML. Put the following on your page:
    1. A paragraph of text
    2. An image (use the one titled “titlepic.jpg” that we downloaded)
    3. A link to another website
  8. Open the file “index.html” in your browser. Make some changes to the paragraph of text in your text editor, then refresh (reload) the page in your browser. Notice the changes. Then, move the image file out of the images folder and hit refresh. What happens and why?
  9. Rename the “images” folder to “pictures.” Put the image back in it. What happens and why? How can you get the site to work again?
  10. Spend at least 10 minutes making changes to your web page. Ask me if you have any questions. Refer to the HTML, XHTML, and CSS book or the W3Schools HTML Tutorial for further help.
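For reference after class, here is a rough sketch of the kind of page step 7 asks for: a paragraph, an image, and a link. Since the code samples on this page are in Python, the sketch is a small Python function that assembles the HTML and prints it; in class we simply type the HTML by hand. The function name, the page title, and the sample wording are all made up for illustration. Notice that the image’s `src` value is a relative path, which is the piece steps 8 and 9 ask you to think about.

```python
from html import escape


def build_page(paragraph, image_path, link_url, link_text):
    """Assemble a minimal HTML page with a paragraph, an image, and a link."""
    return f"""<!DOCTYPE html>
<html>
  <head>
    <title>My first web page</title>
  </head>
  <body>
    <p>{escape(paragraph)}</p>
    <img src="{escape(image_path)}" alt="Coney Island title picture">
    <p><a href="{escape(link_url)}">{escape(link_text)}</a></p>
  </body>
</html>
"""


page = build_page(
    "This is my first web page.",
    "images/titlepic.jpg",  # relative to the folder that holds index.html
    "http://www.westland.net/coneyisland/",
    "Coney Island History Web Site",
)
print(page)
```

Saving the printed text as “index.html” in your “website” folder should give you a page much like the one we wrote together.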

Exercises — Installing Omeka

  1. Once the folder named “project” (which is really the Omeka software) has finished uploading to the server, go to the page http://omeka.org/codex/Installation and follow along with me as we go through each step.
  2. Watch me while I log in to the Omeka administrative panel and choose a theme. Choose different themes yourself and see which ones you like.
  3. Once the Omeka software is installed, watch me while I download and install a theme from http://omeka.org/add-ons/themes. Download and install an extra theme yourself. Do these themes seem to meet Krug’s criteria for usability? What do you make of the fact that Omeka was developed by Dan Cohen’s minions — does it match what his expressed theories of design are in the chapter from Digital History?
  4. Watch me while I write some HTML in the “Simple Pages” part of Omeka. Write a simple page yourself.
  5. For the rest of the class period, PLAY! Play with the software you’ve installed, and see if you understand it. Raise your hand if you have any questions, and I will come around to answer them.
  6. In future classes, we will go over more carefully how to use Omeka: in particular, how to describe the items you upload to your archive and how to create an exhibit.
  7. When you have a question or a problem, please check first in the support documentation for Omeka at http://omeka.org/documentation. There are a great many very well-written documents and tutorials there, including video screencasts — thank heaven. There is also a set of active and well-monitored forums where you can post questions. These should be your first recourse, but if you still have trouble, please please please do send your question to the class at creating-digital-history@lists.nyu.edu


Preparing for class this Wednesday
Oct 20th, 2009 by Amanda French

In preparation for class this Wednesday, please download the following:

1) The FTP (File Transfer Protocol) program FileZilla, available at http://filezilla-project.org/ (download the Client version, not the Server version) or from http://download.cnet.com/FileZilla/3000-2160_4-10308966.html?tag=mncol. This program can be installed on either Windows or Mac OS X. Please also install the program after you download it. Here’s a quick guide to downloading and installing free software that I wrote for you; go through it carefully if you’re at all unclear on the process.

2) The Omeka 1.1 software package, available at http://omeka.org. We will install this software on your servers during class. Be sure to unzip the .zip file, and be sure you know where the resulting folder (which will be titled “omeka-1.1”) is located. (Update: All you need to do is to download the Omeka files. Do not try to install Omeka on your laptop by double-clicking on anything. We will install Omeka on your server space during class.)

That’s it. See you on Wednesday!

Downloading and installing software
Oct 20th, 2009 by Amanda French

There is a great deal of free, high-quality software available on the Internet. These free applications are written by developers who want to improve their skills (rather like an internship), developers who want to make a reputation for themselves, and developers who want to make a contribution to society. There’s also plenty of low-cost (and also high-cost) software available on the Internet from developers who may have all the above motivations and who also want to make a little money.

One of the best sites to help you find this plethora of software is called simply Download.com. Download.com provides both editorial reviews and user reviews of the applications it lists, for one thing, along with very useful categories and options.

Here are some tips for downloading and installing software:

1) Create a folder called “Download” (if one doesn’t already exist) on your computer, and save all your downloaded files there. You can (and should) set this folder to be the default saved-file folder in the preferences of your browser (Firefox, IE, Safari, etc.).

2) Make sure that you have a program installed on your computer that can “unzip” or “uncompress” or “unstuff” or “extract” files. Software installation files can be very large, so they are often downloaded in compressed formats such as .zip files for Windows and .sit files for Mac. QuickZip is a good free uncompresser for Windows, while Stuffit Expander is a good uncompression program for Mac. Note that you might already have such a program on your computer, in which case you don’t need to get a different one. If you’re unsure, double-click on any compressed file and see what happens. Again, make sure that you know where the program puts the files it uncompresses; I recommend that you make sure the program puts the uncompressed files in the Download folder.

3) Once uncompressed, a software installation file for Windows will have the .exe extension. Double-click on the file and follow the instructions to install the program. Once uncompressed, a software installation file for Mac will have the .dmg or .pkg extension. Double-click on the file and follow the instructions to install the program.

Update: The above instructions refer only to client-side (i.e., desktop) software such as FileZilla, NOT to server-side software such as Omeka. Server-side software has to be uploaded to a server.

Digital History Projects to evaluate
Oct 14th, 2009 by Amanda French

DIGITAL HISTORY PROJECTS

Who Killed William Robinson?
http://www.canadianmysteries.ca/sites/robinson/home/indexen.html

Urban Simulation Team, UCLA: World’s Columbian Exposition of 1893
http://www.ust.ucla.edu/ustweb/Projects/columbian_expo.htm

Philip Ethington, “Los Angeles and the Problem of Urban Historical Knowledge”
http://www.usc.edu/dept/LAS/history/historylab/LAPUHK/

William G. Thomas and Edward Ayers, “The Differences Slavery Made: A Close Analysis of Two Communities”
http://www2.vcdh.virginia.edu/AHR/

The Dolley Madison Project
http://www2.vcdh.virginia.edu/madison/index.html

Sarah Toton, “Vale of Amusements: Modernity, Technology, and Atlanta’s Ponce De Leon Park, 1870-1920,” Southern Spaces
http://www.southernspaces.org/contents/2008/toton/1b.htm

Post links to online bibliographies here
Oct 13th, 2009 by Amanda French

I’ve updated the bibliography assignment with some more specific advice about how to publish your bibliographies to the web. If you’re having trouble, I do always recommend searching the help documentation for the particular software you’re using (which, everyone said in class, was either Zotero or RefWorks) — there’s plenty of useful information there.

Please comment on this post with the URL to your bibliography before class time tomorrow. Here’s a sample Zotero bibliography and a sample RefWorks bibliography (neither has notes; sorry). These samples are now also on the assignment page.

Week 5: Copyright
Oct 7th, 2009 by Amanda French

CDH Week 5: Copyright

5:00 – 5:15 Introduction

Housekeeping: Changes to digital archive assignment, please visit links page, suggest terms for glossary, collaboration redux, forums?

There are two issues here: you as consumers of copyrightable content, and you as producers of copyrightable content. We’re going to spend the first hour talking about the first, and the second hour talking about the second. The general argument here is that U.S. copyright law is a mess, that it has not kept up at all with the digital age, and that Lawrence Lessig and Siva Vaidhyanathan are right to say that copyright exerts too strong an influence on us.

Introduction to copyright law — full text of the law is available at http://www.copyright.gov/title17/92chap1.html:

  • Copyright in unpublished works (this covers most material in archives) lasts for at least the life of the author plus 70 years, a term which is longer than it used to be.

  • There is a special exemption for libraries and archives to make digital copies of materials for the sole purpose of preservation.

  • Works published in the U.S. before 1923 are in the public domain.
  • A “fair use” is a use for which you do not have to ask permission. You cannot know in advance whether a use is fair; a judge must determine whether the use was fair after the fact. There are four factors for judging whether a use is fair. All four must be considered; if a use meets one criterion but not another, the use is likely to be judged unfair:
    • the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
    • the nature of the copyrighted work;
    • the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
    • the effect of the use upon the potential market for or value of the copyrighted work.

  • There are several tools to help you discover whether a work is copyrighted and whether a planned use is fair, including:
  • There are also some search tools that help you discover items that you are free to use and re-use:

5:15 – 6:00: Copyright, permissions, and us(ers)

Who here is nervous about copyright? Why? Tell me your own experiences, both in planning the project for this course and with other experiences.

6:00 – 6:20: 5-minute writing exercise

What other laws make you similarly nervous? What other laws don’t make you nervous at all? Why don’t they, and how could copyright law (and practice) be reformed to imitate those laws?

6:20 – 6:30: BREAK

6:30 – 7:20: Our own copyrightable / commercializable content

Now let’s talk about you as producers of potentially copyrightable content. Let’s discuss Roy Rosenzweig’s question: Should historical scholarship be free? What about digital copies of the materials in libraries, archives, and museums–should those be free? What issues did you bring up in your discussion questions?

Let’s hear from Rachel, specifically, since she’s been working on these issues.

Note that there are reasonable arguments that these materials should not be free: examples include the article by Robert Townsend of AHA and the point made by the Smithsonian Institution that there are two significant costs in making images freely available: managing digital collections and clearing rights. (See http://cnx.org/content/m27791/latest/)

REMINDER: ANNOTATED BIBLIOGRAPHY DUE NEXT WEEK — questions?

Glossary additions?
Oct 6th, 2009 by Amanda French

I’ve added terms here and there to the glossary, but it’s easy for me to get behind. If there are terms that you yourself had to look up, it’d be great if you’d mention them here in the comments. I’ll add them to the glossary, and it’ll be useful for future students, I hope. Thanks for your help.

Collaboration redux
Oct 5th, 2009 by Amanda French

Wanted to point out that we sparked a bit of a discussion on Twitter when I mentioned that we were talking about collaboration — Garrett McMahon, the Institutional Repository Content Manager at Trinity College in Dublin, even took the trouble to find our course website and post a comment in response to Stacey’s remark that she finds Irish Studies to be particularly collaborative. Those who weighed in on soldier blogs, Omeka use, and collaboration might be particularly good people to begin eavesdropping on via Twitter (I’ll let you investigate who these folks are):

And my promised advice on how to build a social network: start by eavesdropping, continue by replying, end by writing. This works the same way with Twitter as with blogs, Flickr, YouTube and so on: if you find a blog post that’s interesting and you comment on it, leaving a link back to your own blog or site, people will often look at what you’re doing out of sheer curiosity. Thus are networks born.

For Twitter, specifically, you might begin by noticing when sites you visit have a “Follow us on Twitter” link. You also might search for a specific (but not too specific) term to discover who’s talking about the same things you’re interested in. The concept of a “hashtag” is important here: hashtags are simply keywords prefaced by the # sign — it’s an organic way of generating metadata. For instance, a Twitter search of #archives produces great results; public historians haven’t quite coalesced around a hashtag in the same way, but you can always look for #museums.
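Because a hashtag is nothing more than a keyword with a # in front of it, it is easy for a program to pull hashtags out of ordinary text, which is what makes them work as metadata. Here is a rough sketch; the pattern is a simplification and not Twitter’s actual rule, and the sample tweet and function name are invented.

```python
import re

# Simplified pattern: a # followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the hashtags in a piece of text, lowercased, without the #."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


# Invented example tweet.
print(hashtags("Great finds in the reading room today #Archives #museums"))
```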

One final note about Twitter: several of the authors whose works we are reading are (gasp!) alive and Twittering. Ted Friedman, author of Electric Dreams, is currently teaching a course down at Georgia Tech (and is Twittering about it); Lawrence Lessig, whose presentation you’ll watch this week, twitters at @lessig; and Siva Vaidhyanathan, author of Copyrights and Copywrongs, twitters amusingly and informatively from @sivavaid. Siva has asked, in fact, that you let him know of any updates you’d recommend to Copyrights and Copywrongs — a tenth anniversary edition is planned in 2011.

See you Wednesday night.
