Week 14: Open House
Dec 16th, 2009 by Amanda French

Here’s the schedule for tonight’s open house. Please plan to bring your laptop, log in to your site so that everyone can see any private items, and prepare to answer questions about your project.

Again, remember that everyone in a particular time slot will be presenting at the same time — the rest of us will be circulating. I’ll have comment sheets for you, and I’d like everyone to fill out at least 3 comment sheets on other projects.

5pm – 5:30pm

  • Ann Christiansen
  • Nicole Milano
  • Kait Medley
  • Brigid Harmon

5:40pm – 6:10pm

  • Samantha Gibson
  • Meredith Davidson
  • John Bence
  • Paula Wagner

6:15pm – 6:45pm

  • Rachel Moskowitz
  • Ashley Jones
  • Tracie Logan
  • Amita Manghnani

6:50pm – 7:20pm

  • LEJ Rachell
  • Nicole DeRise
  • Sarah Hodge
  • Amanda Timolat
  • Julianna Monjeau
Week 8: Digitization
Oct 28th, 2009 by Amanda French

Here are some basic terms and concepts that you need to be aware of in digitization. Parts of this post are adapted from the University of Virginia’s Electronic Text Center Scanning Help Sheets.

Note: The best place to go for help with digitization is the Digital Studio on the 2nd floor of Bobst Library, where they have many “digital authoring” tools and where they will help you learn the software and make decisions about standards.

Image Scanning

Software

The gold standard of image capture and editing software is Photoshop. However, there are other image editing programs that can do many of the most common image editing tasks. You might want to download and try these:

* Paint.net for Windows
* Seashore for Mac
* GIMP for Mac (also available for Windows and Linux)

There’s also a “light” (and cheaper) version of Photoshop called Photoshop Elements or Photoshop LE.

Pixels, DPI, and Resolution

A “pixel,” or “dot,” is the atom of a digital image. The resolution of a digital image is measured in dots per inch (dpi). If you keep zooming in on a digital image, the image will eventually become “pixelated,” meaning that you can see the individual pixels.
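To see why zooming produces visible blocks, here is a small Python sketch that enlarges a tiny greyscale image by nearest-neighbour scaling. It is an illustration only, not how any particular image viewer works: each original pixel is simply repeated, so it turns into a solid square. The image values and the `enlarge` function are hypothetical.

```python
def enlarge(image, factor):
    """Nearest-neighbour upscaling: repeat every pixel `factor` times
    horizontally and every row `factor` times vertically."""
    result = []
    for row in image:
        wide_row = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            result.append(list(wide_row))
    return result

# A hypothetical 2x2 greyscale image (0 = black, 255 = white).
tiny = [[0, 255],
        [255, 0]]

for row in enlarge(tiny, 3):
    print(row)
```

Enlarging adds no new detail; each pixel just becomes a bigger square, which is what you see when an image is pixelated.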

The NINCH Guide to Good Practice recommends a minimum of 300 dpi for image scanning. At this resolution, images can be printed with little loss of quality. If you have enough storage space, you may want to scan at 600 dpi instead. Serious preservation scanning is done at 1200 dpi. Computer monitors have traditionally been assumed to display images at about 72 dpi (on screen, what really matters is an image's size in pixels), so an image that will only be displayed on screen (not printed) need not have a resolution any higher than that.
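The relationship between an original's physical size, the scanning resolution, and the number of pixels you end up with is simple multiplication: pixels along each side = inches × dpi. The Python sketch below assumes a hypothetical 8 × 10 inch photograph and runs it through the resolutions mentioned above; the `pixel_dimensions` function is illustrative, not part of any scanning software.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for an original of the given size scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)

# A hypothetical 8 x 10 inch photograph at the resolutions discussed above.
for dpi in (72, 300, 600, 1200):
    w, h = pixel_dimensions(8, 10, dpi)
    print(f"{dpi:>5} dpi: {w} x {h} pixels ({w * h:,} pixels in total)")
```

Doubling the dpi quadruples the total number of pixels, because both the width and the height double. That is why 600 dpi and 1200 dpi scans need so much more storage space.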

Color

Images can be scanned with four basic levels of color information:

* 1-bit black and white — each dot can be either black or white; used for cartoons, line drawings, and to increase contrast on images of text.

* 8-bit greyscale — each dot can be one of 256 grey shades; used for ordinary (non-archival) scanning of what we normally call “black and white” images.

* 8-bit color — each dot can be one of 256 colors; 8-bit color can look a little grainy at times; it’s used mainly for color cartoons and “clip art.”

* 24-bit color — each dot can be one of 16.8 million colors; best used for photographs. There’s even 48-bit color now, but many programs do not support it.
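The number of possible colors follows directly from the bit depth: a pixel with n bits can take 2^n distinct values. Bit depth also sets the raw size of a scan, width × height × bits ÷ 8 bytes. In this Python sketch the 2400 × 3000 pixel image (an 8 × 10 inch original at 300 dpi) is a hypothetical example, and real files differ because of headers, row padding, and compression.

```python
def colour_count(bits):
    """Number of distinct values a pixel with `bits` bits can take."""
    return 2 ** bits

def uncompressed_bytes(width_px, height_px, bits):
    """Raw image size in bytes, ignoring file headers, padding, and compression."""
    return width_px * height_px * bits // 8

for bits in (1, 8, 24, 48):
    size_mb = uncompressed_bytes(2400, 3000, bits) / 1_000_000
    print(f"{bits:>2}-bit: {colour_count(bits):,} values per pixel, "
          f"about {size_mb:.1f} MB uncompressed")
```

Since 2 ** 24 = 16,777,216, the “16.8 million colors” of 24-bit color is just that number rounded.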

File Formats, Loss, Losslessness, and Compression

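* TIFF — a lossless format (usually uncompressed) that works on all platforms. TIFF files are well suited to long-term archival use, but are usually too big to put on a web site.

* PNG — a losslessly compressed format that works on all platforms; well suited to web delivery.

* JPG or JPEG — a highly compressed, lossy format. Unlike PNG, it permanently discards some image information to make files smaller, so it is widely used for photographs on the web but is not suitable for archival copies.

* GIF — a lossless, moderately compressed format limited to 256 colors, so it is only suitable for 8-bit or 1-bit color images (i.e., cartoons and logos).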
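Python's standard library can show the difference between lossless and lossy compression. PNG's compression is based on the DEFLATE algorithm, which Python's `zlib` module implements, so a zlib round trip returns exactly the original bytes. The “lossy” step below is a deliberately crude stand-in (rounding brightness values to 16 coarse levels), not the actual JPEG algorithm. It shows only that once information is discarded, decompression cannot bring it back. The pixel data is made up.

```python
import zlib

# Hypothetical greyscale pixel values: a smooth gradient repeated many times.
pixels = bytes(range(256)) * 100

# Lossless: DEFLATE (the method PNG uses) shrinks the data, and
# decompressing restores every byte exactly.
packed = zlib.compress(pixels)
assert zlib.decompress(packed) == pixels
print(f"original {len(pixels)} bytes, compressed {len(packed)} bytes")

# Crude stand-in for lossy compression: keep only 16 brightness levels.
# The result still compresses, but the discarded detail is gone for good.
quantized = bytes((value // 16) * 16 for value in pixels)
packed_lossy = zlib.compress(quantized)
print(f"quantized data compressed to {len(packed_lossy)} bytes")
print("identical to original after decompression?",
      zlib.decompress(packed_lossy) == pixels)
```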

Text Scanning

Optical Character Recognition (OCR)

To perform “OCR” on an image is to transform a picture of text into editable, searchable, manipulatable text. One piece of software that can do this is called “FineReader,” made by a company called ABBYY.

The books available on Google Book Search have been OCR’d. It can be instructive to view the “plain text” behind the page images, as in this page from the introduction to an edition of Jane Eyre. There are some small errors, and such errors inevitably become more frequent when the page is stained or faded, or when the font is unusual.

As the University of Virginia’s Electronic Text Center advises, “even with clean text of a decent type size there will be occasional errors; this error rate increases as the text’s size and clarity decreases. Altering the brightness and resolution can improve results, but little can be done with a badly faded photocopy or a 17th or 18th century typeface.” When it is important to treat a text *as* text, and not simply as images, many prefer to outsource the work of retyping the text by hand. Some firms that do this guarantee a 99.99% accuracy rate.
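Accuracy figures like “99.99%” are usually character-level: the share of characters that come out right. One common way to measure this is to compare OCR output against a hand-checked transcription using edit (Levenshtein) distance. That is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Here is a minimal Python sketch. The sample strings are invented, and individual vendors may define accuracy differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def character_accuracy(ocr_text, reference):
    """1 minus the character error rate, relative to the reference length."""
    return 1 - edit_distance(ocr_text, reference) / len(reference)

reference = "Reader, I married him."
ocr_output = "Rcader, I rnarried him."  # hypothetical OCR mistakes: "e" read as "c", "m" as "rn"
print(f"{character_accuracy(ocr_output, reference):.1%}")
```

Even 99.99% accuracy is not perfection. It means about one error per 10,000 characters, so a 500,000-character book would still contain about 50 errors.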

Audio and Video Digitization

Note: Audio and video are sometimes called “time-based media,” for reasons that are obvious if you think about them — audio and video happen over time, whereas images and text are “still.” Many of the issues for digitizing audio and video are therefore the same.

Software

One of the best audio editing programs is a free, open-source program called Audacity. You can use it to record, edit, and convert digital audio files. The Digital Studio also has Pro Tools, a more advanced program.

There are still few good free video editing and conversion programs. The Digital Studio has both iMovie (which is simpler) and Final Cut Pro (which is more difficult).

Sample rate and bit depth

“Sample rate” and “bit depth” (often loosely called “bit rate”) are to audio what “resolution” is to images: they measure how much information the computer has captured. The sample rate is how many times per second the sound is measured; the bit depth is how precisely each measurement is recorded. If you would like to read more about what exactly sample rates and bit depths are, check the NINCH Guide’s section on Audio and Video Management. The basic thing you need to know, however, is that audio should be sampled at a rate of at least 44.1 kHz, which is the same rate as a CD; 96 kHz is the new professional archival standard. The archival bit depth is 24-bit, although many projects have opted to use 16-bit (the CD standard).
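The sample rate and the number of bits per sample (the 16-bit and 24-bit figures above) together determine how much data uncompressed audio produces: samples per second × bits per sample × channels. This Python sketch works through that arithmetic for stereo audio at the CD and archival settings. It assumes uncompressed PCM audio and ignores file headers.

```python
def pcm_bits_per_second(sample_rate_hz, bit_depth, channels=2):
    """Data rate of uncompressed (PCM) audio in bits per second."""
    return sample_rate_hz * bit_depth * channels

def megabytes_per_minute(sample_rate_hz, bit_depth, channels=2):
    """Approximate uncompressed size of one minute of audio, in megabytes."""
    return pcm_bits_per_second(sample_rate_hz, bit_depth, channels) * 60 / 8 / 1_000_000

settings = {
    "CD quality (44.1 kHz, 16-bit)": (44_100, 16),
    "Archival (96 kHz, 24-bit)": (96_000, 24),
}
for label, (rate, depth) in settings.items():
    kbps = pcm_bits_per_second(rate, depth) / 1000
    print(f"{label}: {kbps:,.1f} kbit/s, "
          f"about {megabytes_per_minute(rate, depth):.1f} MB per minute")
```

The CD figure, 1,411.2 kbit/s, is the standard data rate of audio CDs. Archival settings produce more than three times as much data.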

Video digitization is more complicated. There is a good tutorial about archival digitization of video at the University of Texas’s Information School.

Format

Just as with images, audio and video digital files can be saved in formats that are either “lossy” (meaning that they are compressed so that the file is smaller) or “lossless” (meaning that no information has been lost).

* mp3 — A lossy, compressed audio format; because its files are small, it is the most common audio format on the Internet. It is not suitable for archival purposes.

* aiff — Lossless (usually uncompressed) audio format developed by Apple.

* wav — Lossless (usually uncompressed) audio format developed by Microsoft and IBM; the most common format for archival audio.

* mpeg — The most common family of video formats. It is lossy, but it is used for both preservation and access.
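Lossy formats like mp3 are small because they discard data. Comparing a compressed bit rate against uncompressed CD-quality audio gives a rough sense of how much. In this Python sketch, 128, 192, and 320 kbit/s are assumed as common mp3 settings for illustration. Encoders offer many other options, and the ratio alone says nothing about how the result sounds.

```python
# Uncompressed CD-quality stereo: 44,100 samples/s x 16 bits x 2 channels, in kbit/s.
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2

def compression_ratio(compressed_kbps, uncompressed_kbps=CD_PCM_KBPS):
    """How many times smaller the compressed stream is than uncompressed audio."""
    return uncompressed_kbps / compressed_kbps

for mp3_kbps in (128, 192, 320):
    ratio = compression_ratio(mp3_kbps)
    print(f"mp3 at {mp3_kbps} kbit/s: roughly {ratio:.0f}:1 compared with CD-quality audio")
```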

Other

The Digital Studio has a slide scanner that they can teach you to use. The Microfilm department at NYU’s Bobst Library can also help you get a digital copy of a microfilm, if necessary.

Week 5: Copyright
Oct 7th, 2009 by Amanda French

CDH Week 5: Copyright

5:00 – 5:15 Introduction

Housekeeping: Changes to digital archive assignment, please visit links page, suggest terms for glossary, collaboration redux, forums?

There are two issues here: you as consumers of copyrightable content, and you as producers of copyrightable content. We’re going to spend the first hour talking about the first, and the second hour talking about the second. The general argument here is that U.S. copyright law is a mess, that it has not kept up at all with the digital age, and that Lawrence Lessig and Siva Vaidhyanathan are right to say that copyright exerts too strong an influence on us.

Introduction to copyright law — full text of the law is available at http://www.copyright.gov/title17/92chap1.html:

  • Copyright in unpublished works (this covers most material in archives) lasts for at least the life of the author plus 70 years, a term longer than it used to be.

  • There is a special exemption for libraries and archives to make digital copies of materials for the sole purpose of preservation.

  • Works published in the U.S. before 1923 are in the public domain.
  • A “fair use” is a use for which you do not have to ask permission. You cannot know in advance whether a use is fair; only a judge can determine, after the fact, whether the use was fair. There are four factors for judging whether a use is fair. All four must be considered and weighed together; a use that is favored by one factor but not by another may well be judged unfair:
    • the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

    • the nature of the copyrighted work;

    • the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

    • the effect of the use upon the potential market for or value of the copyrighted work.

  • There are several tools to help you discover whether a work is copyrighted and whether a planned use is fair, including:
  • There are also some search tools that help you discover items that you are free to use and re-use:

5:15 – 6:00: Copyright, permissions, and us(ers)

Who here is nervous about copyright? Why? Tell me about your own experiences, both from planning your project for this course and from elsewhere.

6:00 – 6:20: 5-minute writing exercise

What other laws make you similarly nervous? What other laws don’t make you nervous at all? Why don’t they, and how could copyright law (and practice) be reformed to imitate those laws?

6:20 – 6:30: BREAK

6:30 – 7:20: Our own copyrightable / commercializable content

Now let’s talk about you as producers of potentially copyrightable content. Let’s discuss Roy Rosenzweig’s question: Should historical scholarship be free? What about digital copies of the materials in libraries, archives, and museums? Should those be free? What issues did you bring up in your discussion questions?

Let’s hear from Rachel, specifically, since she’s been working on these issues.

Note that there are reasonable arguments that these materials should not be free: examples include the article by Robert Townsend of the American Historical Association (AHA) and the Smithsonian Institution’s point that making images freely available involves two significant costs, managing digital collections and clearing rights. (See http://cnx.org/content/m27791/latest/)

REMINDER: ANNOTATED BIBLIOGRAPHY DUE NEXT WEEK — questions?

Collaboration redux
Oct 5th, 2009 by Amanda French

Wanted to point out that we sparked a bit of a discussion on Twitter when I mentioned that we were talking about collaboration — Garrett McMahon, the Institutional Repository Content Manager at Trinity College in Dublin, even took the trouble to find our course website and post a comment in response to Stacey’s remark that she finds Irish Studies to be particularly collaborative. Those who weighed in on soldier blogs, Omeka use, and collaboration might be particularly good people to begin eavesdropping on via Twitter (I’ll let you investigate who these folks are):

And my promised advice on how to build a social network: start by eavesdropping, continue by replying, end by writing. This works the same way with Twitter as with blogs, Flickr, YouTube and so on: if you find a blog post that’s interesting and you comment on it, leaving a link back to your own blog or site, people will often look at what you’re doing out of sheer curiosity. Thus are networks born.

For Twitter, specifically, you might begin by noticing when sites you visit have a “Follow us on Twitter” link. You also might search for a specific (but not too specific) term to discover who’s talking about the same things you’re interested in. The concept of a “hashtag” is important here: hashtags are simply keywords prefaced by the # sign — it’s an organic way of generating metadata. For instance, a Twitter search of #archives produces great results; public historians haven’t quite coalesced around a hashtag in the same way, but you can always look for #museums.
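Because a hashtag is just a # followed by a keyword, hashtags are easy to pull out of text programmatically, which is part of what makes them useful as metadata. Here is a minimal Python sketch using a regular expression. The sample tweets are invented, and Twitter’s own rules for what counts as a hashtag are more involved than this simple pattern.

```python
import re
from collections import Counter

# A # followed by one or more letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")

def hashtags(text):
    """Return the hashtags in `text`, lowercased, without the # sign."""
    return [tag.lower() for tag in HASHTAG.findall(text)]

tweets = [  # hypothetical examples
    "New finding aid for our photograph collection #archives",
    "Digitizing oral histories this week #Archives #museums",
    "Great exhibit opening tonight #museums",
]
counts = Counter(tag for tweet in tweets for tag in hashtags(tweet))
print(counts.most_common())
```

Lowercasing treats #Archives and #archives as the same tag. Counting tags across many posts is one quick way to see which terms a community has coalesced around.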

One final note about Twitter: several of the authors whose works we are reading are (gasp!) alive and Twittering. Ted Friedman, author of Electric Dreams, is currently teaching a course down at Georgia Tech (and is Twittering about it); Lawrence Lessig, whose presentation you’ll watch this week, twitters at @lessig; and Siva Vaidhyanathan, author of Copyrights and Copywrongs, twitters amusingly and informatively from @sivavaid. Siva has asked, in fact, that you let him know of any updates you’d recommend to Copyrights and Copywrongs — a tenth anniversary edition is planned in 2011.

See you Wednesday night.
