Digital Photo Archiving
What is a Digital Archive
A photo archive is simply a collection of images: prints, negatives, slides, or digital files. The simplest digital archive is, of course, a folder on the hard drive containing digital photos. If there are only a few hundred images, this is a fine way to store them; there might be individual folders by subject to make searching the archive a little easier.
With more images, the next step up is to make a flat-file database in a spreadsheet program like Excel. If the photo files are stored on CD-R media, each line in the spreadsheet can correspond to one numbered CD, with a brief description of the images on that CD. A quick search of the spreadsheet then locates the CD with the required image. (This can be integrated with a billing system, with each job stored on a separate CD.)
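The spreadsheet approach is essentially a one-table lookup. The minimal sketch below assumes the sheet has been exported as CSV with two hypothetical columns, `cd_number` and `description`; the sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical CSV export of the spreadsheet: one row per numbered CD.
SHEET = """cd_number,description
001,Commencement ceremony and receptions
002,MBA class portraits at the Babcock School
003,Campus scenics in winter snow
"""


def find_cds(sheet_text: str, term: str) -> list[str]:
    """Return the numbers of CDs whose description contains `term` (case-insensitive)."""
    needle = term.casefold()
    rows = csv.DictReader(io.StringIO(sheet_text))
    return [row["cd_number"] for row in rows if needle in row["description"].casefold()]


if __name__ == "__main__":
    print(find_cds(SHEET, "winter"))  # ['003']
```

Note that the search only narrows things down to a disc; finding the individual file still means browsing that CD.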
Taking it one step further, it should be possible to create an archive that is much easier to use, and more powerful, than a simple spreadsheet. Such an archive might:
    •    Keep track of images no matter where they are stored
 
    •    Help categorize pictures in a logical manner
 
    •    Search for pictures by caption, filename, date shot, photographer, or any other useful information
 
    •    Create a thumbnail of each image for easy reference
 
    •    Allow others to have access to your images

Media Asset Management applications
Programs that manage a database of digital files are called Media Asset Management applications. These include Canto Cumulus, Extensis Portfolio, iView, and others. These applications all do essentially the same thing as any database: they create a 'Record' for each digital photo, and that Record has a number of 'Fields' that contain information about the image. These Fields usually include a small thumbnail image, the filename, the path to the location where the original file is stored, and the dates the record was created and last modified. The fields may also include IPTC data (captions and other information) and EXIF data (technical data taken from the camera).
Note that the database Record does NOT contain the original photograph. The original is left alone, and the Record contains a marker showing its location. The original image may be stored on a CD, on a local hard drive, or on a server anywhere in the world.
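To make the Record and Field idea concrete, here is a minimal sketch of one record as a Python dataclass. The field names, the volume path, and the metadata values are hypothetical; real applications such as Cumulus define their own fields. The point it illustrates is that the record stores a path to the original, never the original itself.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import PurePosixPath


@dataclass
class AssetRecord:
    """A catalog record: it describes and locates the original, but does not contain it."""
    filename: str
    original_path: PurePosixPath  # where the original lives: CD, local disk, or server
    thumbnail: bytes = b""  # small preview stored in the catalog itself
    created: datetime = field(default_factory=datetime.now)
    modified: datetime = field(default_factory=datetime.now)
    iptc: dict[str, str] = field(default_factory=dict)  # captions and other IPTC text
    exif: dict[str, str] = field(default_factory=dict)  # technical data from the camera


# Hypothetical example record; the path is invented for illustration.
record = AssetRecord(
    filename="20060309mbaclass1234.jpg",
    original_path=PurePosixPath("/Volumes/Archive/20060309/20060309mbaclass1234.jpg"),
    iptc={"Caption": "MBA students in a Babcock School classroom"},
    exif={"Model": "Example camera"},
)
print(record.original_path.name)  # 20060309mbaclass1234.jpg
```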
Trial versions of several of these applications are available from their websites (see Useful Links, below). While the underlying job of all these programs is the same, they go about it in very different ways. It's best to try several applications to decide which one best fits into a specific workflow.

Preparing Images for Archiving
Before archiving digital photographs using one of these programs, it helps to standardize the files in some way. There are several decisions to make:
    •    Archive every single RAW image, or just the best processed files? This depends on how the archive will be used, and who will have access to it.
 
    •    What size files to archive: archiving high-resolution, print-size files makes it easy for the photographer and other users to access images for publication or web use. However, for more control over what gets published, putting only web-resolution files in the archive forces end users to request print-resolution files.
 
At Wake Forest, our Cumulus archive is available to everyone in Creative Services and the News Service. As such, it includes only the best images from any given assignment, saved as Adobe-RGB JPEG files. The files have been processed from the RAW originals using Adobe Camera Raw. The images receive a very light capture sharpening, or none at all (see the sharpening page for details). The photos are full-resolution files, saved at the highest possible JPEG quality setting.
File naming conventions
After the archive grows past a few hundred images, it is very easy to have file naming problems, especially two different files with the same name. (How many images might be called 'campus.jpg'?) A simple way to give every single file a unique name is to use a file naming convention based on the shoot date. After some trial and error, I settled on a format that uses the first 8 characters of the file name as the date in YYYYMMDD order, followed by content information, then a frame number and the .JPG suffix. Filenames look something like this:
20060309mbaclass1234.jpg
where the date is followed by the Jobname (a user-defined variable in Photo Mechanic), then the frame number taken automatically from the camera, and the file type suffix. Files are batch-renamed in Photo Mechanic immediately after they are downloaded. This naming convention has several advantages:
    •    It is very difficult to create two images with the same filename (exception: when shooting with two cameras of the same model, be careful that they are not using overlapping frame numbers).
 
    •    Finding the original raw image is easier, since the shoot date is in the filename, and all original photos are stored by shoot date.
 
    •    A folder full of random images on the desktop is automatically sorted into chronological order by filename.
 
A file naming convention will be specific to the user’s individual needs. For a large catalog of stock images, the Dewey Decimal or Library of Congress system can identify the subject of the photographs. For very specialized catalogs, a three- or four-letter coding system can provide a lot more content information than is available in a simple 4-8 character jobname. The key is to integrate the naming system into the workflow so that every file is renamed properly before it ends up in the archive.
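In this workflow Photo Mechanic does the renaming; the sketch below only illustrates the convention itself. It assumes a lowercase alphanumeric jobname and a 4-digit frame number (both inferred from the example filename), and `make_name` and `shoot_date` are hypothetical helpers, not part of any product.

```python
import re
from datetime import date, datetime

# Assumed pattern: YYYYMMDD shoot date + lowercase alphanumeric jobname
# + 4-digit camera frame number + .jpg suffix.
NAME_RE = re.compile(r"^\d{8}[a-z0-9]+\d{4}\.jpg$")


def make_name(shot: date, jobname: str, frame: int) -> str:
    """Build a name such as 20060309mbaclass1234.jpg."""
    return f"{shot:%Y%m%d}{jobname.lower()}{frame:04d}.jpg"


def shoot_date(filename: str) -> date:
    """Read the shoot date back out of the first 8 characters."""
    if not NAME_RE.match(filename.lower()):
        raise ValueError(f"not in YYYYMMDD + jobname + frame form: {filename}")
    return datetime.strptime(filename[:8], "%Y%m%d").date()


if __name__ == "__main__":
    print(make_name(date(2006, 3, 9), "mbaclass", 1234))  # 20060309mbaclass1234.jpg
    # Because the date comes first, alphabetical order is chronological order.
    print(sorted(["20060415commence0012.jpg", "20050901campus0456.jpg"]))
```

Putting the date first is what makes the plain alphabetical sort double as a chronological one; a name like 'campus.jpg' simply fails the pattern check.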
Captions
As regular readers of this website know, I am a major proponent of captioning digital images as completely as possible, and as soon as possible after shooting. Captioning images helps in several ways:
    •    The subject(s) of the photograph can be identified long after it was made, and potentially long after the photographer has moved on.
 
    •    Copyright and contact information is permanently attached to each image.
 
    •    The caption becomes a searchable field in the media asset management program, providing a powerful search tool.
 
Captioning digital images is so much easier than captioning film that I am always surprised at the resistance I get to this part of the program. Using an image browser like Photo Mechanic or BreezeBrowser, it is a simple matter to attach IPTC-compliant caption information, plus all the other IPTC fields (photographer, location, etc.), to even very large selections of images. Using batch captioning and the IPTC stamp in Photo Mechanic makes this process even easier (see the workflow page for more information).
In any case, the most important advantage of captioning is the last one: being able to search the caption field in Cumulus is a terrific tool, and gaining the benefits of that tool requires good captions.
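Batch captioning with a stamp amounts to merging one set of shared fields into many images while leaving each image's own caption alone. The sketch below models only that merge, on plain dictionaries; it does not read or write real IPTC data (Photo Mechanic, or a tool such as ExifTool, would do that). The keys follow IPTC field names, and all values are placeholders.

```python
# Shared fields for every image from one shoot (placeholder values).
STAMP = {
    "By-line": "Photographer Name",
    "Copyright Notice": "(c) 2006 Example University",
    "City": "Example City",
}


def apply_stamp(selection: dict[str, dict[str, str]], stamp: dict[str, str]) -> None:
    """Add the stamp to every selected image, keeping any value already set per image."""
    for fields in selection.values():
        for key, value in stamp.items():
            fields.setdefault(key, value)


# Hypothetical selection: filename -> IPTC fields already entered for that image.
selection = {
    "20060309mbaclass1234.jpg": {"Caption-Abstract": "MBA students in class."},
    "20060309mbaclass1235.jpg": {"Caption-Abstract": "Professor lecturing.", "City": "Other City"},
}
apply_stamp(selection, STAMP)
```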

Organizing the Archive: Using Categories and Keywords
Up to this point, everything has applied equally to any of the commercially available image database programs, and even to any homemade system. From here on, this page will deal more specifically with the Canto Cumulus application.
Many applications work on a Keyword system for organizing and retrieving images. There is a fixed list of Keywords which are applied to each image after it is acquired into the database. This system has several advantages: using a fixed list (a 'controlled vocabulary') allows more control over how data is entered into the system. More than one person can enter image data, for example, without the chaos that would result from each person having their own idea of how to keyword the image. It also makes searching by Keyword easier, since there is a fixed list and no need to guess at the keywords.
Canto Cumulus implements a fixed keyword system without ever using the word 'keyword', and in such a way that it emulates the Macintosh hierarchical folder structure. A Cumulus Archive is organized by Category folders, which bear a remarkable resemblance to folders on the Macintosh desktop. After photos are acquired into the Cumulus database, they are placed into the Category folders, just like putting images into a folder. Double-clicking one of the Category folders displays all the images in that Category, just like opening a folder of images. This is conceptually a very simple way to organize images, especially for Macintosh users.
The category system can be as complex or as simple as needed: each Category folder can have multiple levels of sub-categories, and any one image file can be in many different Category folders at the same time. For example, a simple category structure at a university might start with a root category called Academics:
    •    Academics
        •    By School
            •    Undergraduate
            •    Babcock School (our graduate business school)
            •    Divinity School
            •    Law School
        •    By Location
            •    Classroom
            •    Laboratory
            •    Field
        •    By Subject
            •    Business
            •    Science and Mathematics
            •    Art and Music
            •    Humanities
 
The graduate business photo mentioned above, 20060309mbaclass1234.jpg, would go in the By School>Babcock School, By Location>Classroom, and By Subject>Business Category folders. It is a simple matter to drag and drop the thumbnail into each category after the photo has been acquired.
The power of this system comes into play when looking for pictures of a specific subject. Find every classroom picture by double-clicking By Location>Classroom. Find every Babcock photo by double-clicking the By School>Babcock School category. Or combine the two searches: select both the By School>Babcock School and the By Location>Classroom folders, and Cumulus shows only the images that are in both folders at the same time. It is possible to conduct a very narrow search this way.
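The combined search is a set intersection: each category holds a set of images, and selecting several categories keeps only the images common to all of them. This is a minimal sketch of that logic, not of how Cumulus stores categories internally; `assign`, `in_all`, and the second and third filenames are hypothetical.

```python
from collections import defaultdict

# Category path -> asset names. One image can sit in many categories.
categories: dict[str, set[str]] = defaultdict(set)


def assign(asset: str, *paths: str) -> None:
    """Drag one thumbnail into several Category folders."""
    for path in paths:
        categories[path].add(asset)


def in_all(*paths: str) -> set[str]:
    """AND search: only images that are in every selected category."""
    if not paths:
        return set()
    return set.intersection(*(categories.get(p, set()) for p in paths))


assign("20060309mbaclass1234.jpg", "By School>Babcock School", "By Location>Classroom", "By Subject>Business")
assign("20060310lawclass0042.jpg", "By School>Law School", "By Location>Classroom")
assign("20060311mbaretreat0101.jpg", "By School>Babcock School", "By Location>Field")

print(in_all("By School>Babcock School", "By Location>Classroom"))  # {'20060309mbaclass1234.jpg'}
```

Each extra category can only shrink the result, which is why selecting several folders produces such a narrow search.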
Setting Up the Category structure
It should be obvious that the Category (or Keyword) structure is the most important part of creating a digital photo archive. When I first started shooting with a digital camera in January 2001, I created a simple Cumulus single-user archive, keeping the archive on my laptop and the original images on CDs in my studio. Within a few months, it became painfully obvious that my Category structure wasn't very well thought out. This was okay, since I was the only person using the archive, and I pretty much knew where to find everything.
When I was directed to create an on-line archive, accessible by everyone in our office (see below), I knew that the Category structure would have to be intuitive, easy to use, and very comprehensive. It would have to be searchable not only by subject, but also by concept, as our designers often need an image that just says 'technology,' or 'winter.' I spent about 6 months working on an interface that looks very simple to start, and gets more complex the deeper you go.
In setting up the Categories, take into account who will be using the archive, how knowledgeable they are about digital images (and computers in general), and what sort of images they might need. Remember, too, that the categories might change if the archive contains all the original RAW images, for example, or different kinds of files. One simple example of a top-level directory for a large Public Affairs Office might be:
    •    Photographs
    •    Illustrations
    •    Stories
    •    Audio Clips
    •    Layouts
 
The user can easily look at only the kind of files required, but then each root category can be subdivided by subject. On a university campus, it might go like this:
    •    Photographs
        •    Campus
            •    academic buildings
            •    residential buildings
            •    scenics
        •    Faculty and Staff
            •    headshots
            •    portraits
            •    in class
        •    Students
            •    in class
            •    residence life
            •    features
        •    Athletics
            •    By Team
                •    list of teams
            •    headshots
 
This is far too general, but it’s a start. Remember that images can go into multiple categories. One suggestion is to have conceptual categories in a separate folder from specific subject categories.
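Drafting the tree as data before building it in Cumulus makes it easy to review and to list every category path in one place. This sketch assumes a nested dictionary; the 'Concepts' branch name is hypothetical, following the suggestion to keep conceptual categories (such as 'technology' or 'winter') apart from subject categories.

```python
# Hypothetical draft tree: subject categories and conceptual categories
# live in separate branches.
TREE = {
    "Photographs": {
        "Campus": {"academic buildings": {}, "residential buildings": {}, "scenics": {}},
        "Students": {"in class": {}, "residence life": {}, "features": {}},
    },
    "Concepts": {"technology": {}, "winter": {}},
}


def all_paths(tree: dict, prefix: tuple[str, ...] = ()) -> list[str]:
    """Flatten a nested category tree into 'A>B>C' strings, parents before children."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(">".join(path))
        paths.extend(all_paths(children, path))
    return paths


for p in all_paths(TREE):
    print(p)
```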
Test, Test, Test
Cumulus can be a difficult application to set up. Not only is there a complex category structure, but there are various preferences in different places that need to be customized before Cumulus will work properly. It is a very powerful program, but that power comes at the price of easy setup. For example, several preferences need to be changed before Cumulus will 'see' the IPTC Caption info, and even more before it will make it a searchable field. It can be a waste of time to have Cumulus acquire several gigabytes of pictures, which takes many hours, only to find out that it didn't put the Caption in the right place, or one set of categories is wrong.
The key to setting up Cumulus is to test. Create a new folder with about 20-25 images in it, making sure that the images cover the whole range of categories. Then keep acquiring that one folder until everything is set just right.
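Before acquiring the test folder, it can help to confirm that its images really do touch every category. Cumulus does not provide this check as far as this page describes; it is a hypothetical planning aid, and the category names and filenames below are illustrative.

```python
def uncovered(categories: set[str], test_images: dict[str, set[str]]) -> set[str]:
    """Categories that no image in the test folder exercises."""
    used = set().union(*test_images.values())
    return categories - used


CATEGORIES = {
    "By School>Babcock School",
    "By School>Law School",
    "By Location>Classroom",
    "By Location>Field",
}
# Hypothetical test folder: filename -> categories it should land in.
TEST_FOLDER = {
    "20060309mbaclass1234.jpg": {"By School>Babcock School", "By Location>Classroom"},
    "20060310lawclass0042.jpg": {"By School>Law School", "By Location>Classroom"},
}
print(uncovered(CATEGORIES, TEST_FOLDER))  # {'By Location>Field'}
```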

Finding Pictures: Searching by Category and Using the Find Tool
Once the Cumulus archive is up and running, it is a very powerful tool for finding images. As we discussed earlier, it is possible to set up a search using multiple Categories to find only pictures that are in a very narrow range. It is also easy to look at a broad range of images, to see everything in one very general category. (Note that every image in a sub-folder is automatically in the folders above it. So in the example above, double-clicking Students would show every image that is in every sub-folder in that category.)
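The rule that an image in a sub-folder also counts as being in every folder above it can be expressed as a prefix match on category paths. This is a minimal sketch under that assumption, with hypothetical data; it is not Cumulus's own implementation.

```python
# Each image's category assignments, as paths from the root (hypothetical data).
assignments = {
    "20060309mbaclass1234.jpg": [("Photographs", "Students", "in class")],
    "20060401dorm0007.jpg": [("Photographs", "Students", "residence life")],
    "20060402library0100.jpg": [("Photographs", "Campus", "academic buildings")],
}


def images_under(category: tuple[str, ...]) -> set[str]:
    """Images in `category` or in any of its sub-categories."""
    n = len(category)
    return {
        asset
        for asset, paths in assignments.items()
        if any(path[:n] == category for path in paths)
    }


print(sorted(images_under(("Photographs", "Students"))))
```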
The Find tool searches for specific pictures, rather than in general categories. The most powerful implementation of the Find tool is the Caption Contains search. After configuring Cumulus to acquire the IPTC caption and make it a searchable field (this is not the default), select 'Caption' 'contains' 'string', where 'string' is the word or phrase in the Caption. Cumulus then shows every image that has that word or phrase in its caption.
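A 'Caption contains string' search is a substring test against each caption. The sketch below assumes the match ignores case (this page does not say how Cumulus handles case), and the captions are invented for illustration.

```python
def caption_contains(captions: dict[str, str], phrase: str) -> list[str]:
    """Asset names whose caption contains `phrase`, ignoring case (an assumption)."""
    needle = phrase.casefold()
    return sorted(name for name, text in captions.items() if needle in text.casefold())


# Hypothetical captions keyed by asset name.
CAPTIONS = {
    "20060309mbaclass1234.jpg": "MBA students discuss a case study in a Babcock School classroom.",
    "20060115snow0077.jpg": "Snow covers the quad on the first day of spring classes.",
    "20060401dorm0007.jpg": "Students move into a residence hall.",
}
print(caption_contains(CAPTIONS, "classroom"))  # ['20060309mbaclass1234.jpg']
```

The search is only as good as the captions: the snow photo mentions classes but not a classroom, so it is correctly left out.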
Cumulus can also search for 'Asset Name' 'contains' 'string', where the Asset Name is the File Name of the original file. By following a specific file naming convention, it is possible to find every picture shot on a certain date or, if the jobname includes a person's name, every picture of that person.
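With date-first filenames, an 'Asset Name contains' search doubles as a date search: a full YYYYMMDD string finds one day, and a shorter YYYYMM prefix finds a whole month. This sketch emulates the search on a plain list; `name_contains` and the filenames are hypothetical.

```python
# Hypothetical asset names following the YYYYMMDD + jobname + frame convention.
NAMES = [
    "20060309mbaclass1234.jpg",
    "20060309mbaclass1235.jpg",
    "20060310lawclass0042.jpg",
    "20060415commence0012.jpg",
]


def name_contains(names: list[str], text: str) -> list[str]:
    """Emulates an 'Asset Name contains text' search, ignoring case."""
    needle = text.lower()
    return [n for n in names if needle in n.lower()]


print(name_contains(NAMES, "20060309"))  # everything shot on 9 March 2006
print(name_contains(NAMES, "200603"))  # everything from March 2006
print(name_contains(NAMES, "law"))  # a jobname search
```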
Search Box
Although Cumulus provides several very powerful search tools, and I have provided a well-organized Category structure, I have found that most users don’t use them.
Most of the end users in our office use the default Search Box in the toolbar of the Cumulus Client. Note that by default, this box does not search within the Caption field, only the Asset Name (filename) and Notes field. Setting the preferences to make this box search within Captions makes this search method much more useful.
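The difference the preference makes can be shown by searching the same records over two field lists: the default (Asset Name and Notes) and the default plus Caption. The field names come from this page; the records and the `quick_search` helper are hypothetical, and case-insensitive matching is an assumption.

```python
DEFAULT_FIELDS = ("Asset Name", "Notes")
WITH_CAPTION = DEFAULT_FIELDS + ("Caption",)

# Hypothetical catalog records.
RECORDS = [
    {"Asset Name": "20060309mbaclass1234.jpg", "Notes": "", "Caption": "Professor leads a case discussion."},
    {"Asset Name": "20060310lawclass0042.jpg", "Notes": "professor headshot needed", "Caption": ""},
]


def quick_search(
    records: list[dict[str, str]],
    text: str,
    fields: tuple[str, ...] = DEFAULT_FIELDS,
) -> list[str]:
    """Asset names of records where any of `fields` contains `text`."""
    needle = text.casefold()
    return [
        r["Asset Name"]
        for r in records
        if any(needle in r.get(f, "").casefold() for f in fields)
    ]


print(quick_search(RECORDS, "professor"))  # only the Notes hit
print(quick_search(RECORDS, "professor", WITH_CAPTION))  # both records
```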

Making Pictures Available on a Network
Canto sells several versions of Cumulus that work across a network. We have implemented the Cumulus Workgroup client/server package. The server software runs on a Macintosh OS-X Server in the photo studio, with all of the original files available on an external SATA hard drive enclosure attached to the server. Everyone in my work group has the Client software installed on their computer, and can access the Cumulus catalog, search for pictures, and download the originals to their desktop.
This system has several advantages:
    •    We control who has access to the images (unlike a web-based solution)
 
    •    People who need images can find them, and pick the image(s) they want from a range of choices, all without ever contacting me
    •    Many more of my images are seen and used than was ever possible with film
 
There are several things to keep in mind when setting up a Cumulus Workgroup system. These are in no particular order:
    •    The initial purchase of the client and server software includes a fixed number of clients that can be logged into the server at the same time. The client software may be installed on an unlimited number of computers, but server connections are limited to the number of client licenses purchased. In a small office, with 10-12 people, we purchased six client licenses, which is probably too many. Three or four would have been sufficient. However, as we add archive catalogs for our graduate schools and athletic offices, the extra client licenses will be useful.
 
    •    The client software is available for any platform, and should be able to access the Cumulus catalog anywhere on the network. We are a PC campus, with Macs used in the Creative Services office, and the catalog is running on a Mac, but anyone with the client software and a user ID can access the catalog from anywhere in the world.
 
    •    Installing the Cumulus Server software on an OS-X machine turned out to be quite an interesting experience for a simple photographer. I have christened this particular Mac the 'Unix Server of Death.'
 
    •    The photographer running the Cumulus server becomes a systems administrator, responsible for keeping the server up and running, responding to emergency breakdowns, making sure that everything is backed up properly, and dealing with power supplies, network connections, usernames, etc. It's very different from being a photographer.
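The licensing point in the first item (unlimited installs, but only as many simultaneous server connections as licenses purchased) can be modeled as a simple seat pool. This is a hypothetical model of the rule as described, not how Canto actually enforces licensing.

```python
class LicensePool:
    """Hypothetical model: unlimited installs, limited simultaneous server connections."""

    def __init__(self, seats: int) -> None:
        self.seats = seats
        self.connected: set[str] = set()

    def connect(self, user: str) -> bool:
        if user in self.connected:
            return True  # already holding a seat
        if len(self.connected) >= self.seats:
            return False  # every purchased license is in use
        self.connected.add(user)
        return True

    def disconnect(self, user: str) -> None:
        self.connected.discard(user)


pool = LicensePool(seats=6)
results = [pool.connect(f"user{i}") for i in range(7)]
print(results)  # six True, then False
pool.disconnect("user0")
print(pool.connect("user6"))  # True: a seat has been freed
```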
 
I archive only selected images from each assignment and stock shoot, usually the best 4-10 selects (but it can be many more from a large event like Commencement.) There are over 8000 images in the archive right now (out of about 150,000 on the photo server.) They are color corrected, toned, and saved as Adobe-RGB JPEG files (highest quality). I add new images every few days. Once the catalog is up and running, adding images and general catalog maintenance only takes an hour or two a week.
More Information
I hope this page has been helpful. Find additional information about Cumulus at Canto's web site. Feel free to contact me with any specific questions.
 
About this page
This page discusses various topics related to digital asset management (archiving). Specific examples come from the Canto Cumulus Workgroup installation at Wake Forest, but the general principles apply to most database programs.
 
Useful Links
Canto Cumulus
 
Extensis Portfolio
 
iView Multimedia
 
Controlled Vocabulary: David Riecks' excellent site on this topic.
 
 
Copyright Notice and Disclaimer: The complete contents of this site are ©Copyright 1997-2006 by Ken Bennett and/or Wake Forest University. All rights are reserved: no part of this site may be downloaded, copied, folded, spindled, or mutilated without express written permission of the owner. The views expressed in this site are the individual opinions of Ken Bennett, and in no way imply any endorsement by Wake Forest.