Database Building Options and Costs
by Marjorie M.K. Hlava
Note: This white paper was first published as part of Information Today’s Private Files Series, Volume 1, Issue 7.
People often comment that the cost estimates for database construction seem exceedingly high.
In response to those surprised voices, this paper explores the options available to those who are building a database and details the costs associated with each option. This should help clarify the assorted costs of automating data and show how to determine the most efficient, cost-effective way to produce a database.
The stages of automation include collecting the data; tagging or indexing the records to indicate in what field each item of information should be entered; writing abstracts; keying the data into computer files; and converting these files into the database. In certain cases, some of these steps will either be already completed or entirely unnecessary. The following case studies detail options for completing various stages of database construction.
Case Study One
The data is on cards collected in card catalog drawers. The information is a special collection. There are 10,000 items and the library has decided to mount it using a home-grown system that will be provided by the company’s computer services department. Since the data has been collected and indexed and the system is already developed, the only problem is to make the data machine-readable and convert it to the system. Standard library cards are estimated at 300 characters each. The total number of characters to be entered is approximately 3 million. The library decides to put out a small request for proposal (RFP) to local and offshore facilities. The RFP details the specifications and the bids come in as follows:
Bid A. The local optical character reader (OCR) company bids to scan the cards for 52 cents each with 99 percent accuracy. This data will not be sorted into fields, but the company’s MIS department can sort it by title, subject headings, author, etc., after receiving the tape from the scanning company.
Cost: $0.52 per record x 10,000 records = $5,200.
Bid B. A local keyboarding company proposes to enter the data by the fields of author, title, and subject. The remaining information on the card will be entered into a bibliographic notes field. The company will charge $1.50/1,000 characters. The accuracy rate will be 99 percent.
Cost: $1.50/1,000 characters x 3,000,000 characters = $4,500.
Bid C. Another local keyboarding company also proposes to keyboard the data. This company enters the fields of author, title, and subject, and puts the remaining card information into a bibliographic notes field. The guarantee is 99 percent accuracy. The charge is $1.60/1,000 characters.
Cost: $1.60/1,000 characters x 3,000,000 characters = $4,800.
Bid D. Another company bids to keyboard the data, fielding author, title, subject heading, and bibliographic notes. First typing is charged at $1.50/1,000 characters. Key verification (a second, complete typing), which guarantees 99.997 percent accuracy, is charged at $1.30/1,000 characters, for a total of $2.80/1,000 characters.
Cost: $2.80/1,000 characters x 3,000,000 characters = $8,400.
Bid E. A library conversion organization proposes to supply MARC records for 15 cents a card for every card that matches a record in its own catalog. It estimates that it can cover 75 percent of this corporate library’s card catalog. Everything else will have to be typed manually by library staff at $9/hour. So 7,500 of the records will cost 15 cents each ($1,125), and the remaining 2,500 records, keyed by library personnel at a rate of 10 cards per hour (or one every six minutes), will take 250 hours at $9/hour ($2,250).
Cost: $1,125 + $2,250 = $3,375.
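The bid arithmetic above can be checked with a short script. This is only a sketch: the bid names and helper functions are illustrative and not part of the original RFP, and every total is recomputed directly from the per-unit rates stated in the bids (10,000 cards at 300 characters each, and for the MARC bid a 75 percent match rate with staff keying 10 cards per hour at $9 per hour).

```python
# Recompute each bid from the per-unit rates quoted in the text.
# Bid names and function names are illustrative assumptions.
RECORDS = 10_000
CHARS_PER_CARD = 300
TOTAL_CHARS = RECORDS * CHARS_PER_CARD  # 3,000,000 characters


def per_record(cents_per_record: int, records: int = RECORDS) -> float:
    """Dollar cost when the vendor charges a flat rate per card."""
    return cents_per_record * records / 100


def per_thousand_chars(cents_per_thousand: int, chars: int = TOTAL_CHARS) -> float:
    """Dollar cost when the vendor charges per 1,000 characters keyed."""
    return cents_per_thousand * chars / 1000 / 100


def marc_plus_staff() -> float:
    """MARC records for the matched cards, staff keying for the rest."""
    matched = RECORDS * 75 // 100       # 7,500 cards covered by MARC records
    unmatched = RECORDS - matched       # 2,500 cards keyed in-house
    vendor_cost = per_record(15, matched)
    staff_hours = unmatched / 10        # 10 cards per hour
    return vendor_cost + staff_hours * 9  # $9 per hour


bids = {
    "OCR scanning, 99% accuracy": per_record(52),
    "Keyboarding at $1.50/1,000, 99%": per_thousand_chars(150),
    "Keyboarding at $1.60/1,000, 99%": per_thousand_chars(160),
    "Keyboarding + verification, 99.997%": per_thousand_chars(280),
    "MARC records + staff keying": marc_plus_staff(),
}

# List the bids from cheapest to most expensive.
for name, cost in sorted(bids.items(), key=lambda item: item[1]):
    print(f"{name:<40} ${cost:>9,.2f}")
```

Price alone does not settle the choice; the accuracy guarantee and the demand on staff time also matter.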
The collected bids then are analyzed. First, look at the accuracy rate. There are 1,500 characters on the average typewritten page. At 99 percent accuracy, that means one error for every 100 characters. That amounts to 15 typos per typewritten page.
Translated to 300-character library cards, that is three typos per card. This is what an unverified 99 percent accuracy means. The 99.997 percent accuracy rate means three typos for every 100,000 characters, or approximately one typo for every 111 cards.
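To make the accuracy comparison concrete, the expected number of typos can be computed from the accuracy percentage and the character count. The following is a minimal sketch; the function name is hypothetical, and exact fractions are used so that rates such as 99.997 percent are not distorted by floating-point rounding.

```python
from fractions import Fraction

CHARS_PER_PAGE = 1_500   # average typewritten page, per the text
CHARS_PER_CARD = 300     # standard library card, per the text


def expected_errors(accuracy_percent: str, chars: int) -> Fraction:
    """Expected number of wrong characters among `chars` keystrokes."""
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return error_rate * chars


for accuracy in ("99", "99.997"):
    per_page = expected_errors(accuracy, CHARS_PER_PAGE)
    per_card = expected_errors(accuracy, CHARS_PER_CARD)
    cards_per_error = 1 / per_card
    print(
        f"{accuracy}% accuracy: {float(per_page):.3f} typos per page, "
        f"{float(per_card):.3f} typos per card, "
        f"one typo every {float(cards_per_error):.1f} cards"
    )
```

At 99 percent the script reports 15 typos per page and 3 per card; at 99.997 percent it reports 0.009 typos per card, or roughly one typo every 111 cards.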
In Case Study One, the library decided that accuracy wasn’t crucial. They could correct each error on their in-house system since the system would only be used by the internal staff. The library also determined they did not want to tie up their library staff with the job of keying the cards that could not be duplicated by MARC tapes. Bid B, the local keyboarding company’s $4,500 proposal, was the winning bid in this instance.
Case Study Two
In this library, all of the cards have been collected in drawers. The computer services department will give the library access to the STAIRS system which they already use in-house. The library can timeshare on the system. Full computer support will be available and a member of the library staff is familiar with the computer system, so programming should not be a problem.
The library staff has decided a card catalog type entry is adequate and abstracts are not necessary. They do feel, however, that some enhancement of the subject headings is important because the library supports a fairly technical R&D center and enhanced subject headings will help the library personnel. The staff estimates an average of three subject headings per entry will be adequate.
There is no hurry, so the staff decides that over a two-year period the library secretary and the clerk will be able to enter all of the data into the catalog. Here again, 10,000 records will be entered. The professional staff will add the keywords.
Some administrators might think this project will be inexpensive since it is being completed by in-house staff. But an examination of the labor costs is quite illuminating.
The staff should be able to enhance the subject headings at a rate of 10 to 15 cards per hour. The librarians make $15 an hour, with no overhead included since that will be absorbed by the company. The clerks make an average of $9 an hour. They can keyboard the cards with the new subject headings at a rate of 20 cards per hour. They make no errors, so there is no need for proofreading or corrections. At about 12.5 cards per hour, enhancing 10,000 cards takes 800 librarian hours, or $12,000; keying them takes 500 clerk hours, or $4,500. So, over the two-year period the company will spend approximately $16,500 in labor to produce this file. But it’s all invisible cost and, therefore, seems negligible to the company.
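The in-house labor estimate can be reproduced from the hourly rates and throughput figures given above. This is a rough sketch: the function and constant names are illustrative, and the 12.5 cards-per-hour midpoint is an assumption chosen because it reproduces the $16,500 figure, not a rate stated in the text.

```python
RECORDS = 10_000
LIBRARIAN_RATE = 15        # dollars per hour, overhead excluded
CLERK_RATE = 9             # dollars per hour
CLERK_CARDS_PER_HOUR = 20  # keying speed for cards with new headings


def labor_cost(indexing_cards_per_hour: float) -> float:
    """Total labor cost for enhancing headings and keying all records."""
    indexing_hours = RECORDS / indexing_cards_per_hour
    keying_hours = RECORDS / CLERK_CARDS_PER_HOUR
    return indexing_hours * LIBRARIAN_RATE + keying_hours * CLERK_RATE


# Slowest, midpoint (assumed), and fastest indexing speeds from the text.
for cards_per_hour in (10, 12.5, 15):
    cost = labor_cost(cards_per_hour)
    print(f"{cards_per_hour:>4} cards/hour -> ${cost:,.0f}")
```

Across the stated range of 10 to 15 cards per hour, the labor cost runs from about $19,500 down to about $14,500.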
Case Study Three
This is a special collection of documents numbering 10,000 items. These are the company’s technical reports, so they include author, subject, and title of report for each record.
The reports have already been collected and are sitting in boxes. The author, title, and report number have been recorded. The library still needs to add the subject terms and an abstract if an abstract is not already supplied by the author.
There is no concern about copyright because these are company reports. The information center has decided to hire a temporary person through an agency to help with the work. This person will cost $15 an hour. The temporary employee is very familiar with the subject and, therefore, can write abstracts when necessary at a rate of three per hour. The data entry will be done by another temporary employee at $9 an hour. The system is owned by the company, so the cost of the system software and support is not relevant to this project.
A cursory review reveals that about half of the technical reports already have abstracts and that the data-entry temporary employee can type 14,000 keystrokes per hour. The citation with abstract for each technical report will be about 1,000 characters. The costs incurred in this project are:
5,000 abstracts at 3 per hour = 1,667 hours; 1,667 hours x $15 = $25,005.
10,000 items at 1,000 characters each = 10,000,000 characters; 10,000,000 characters at 14,000 characters/hour = 714 hours; 714 hours x $9/hour = $6,426.
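The two cost lines above can be combined into a project total. The sketch below follows the same rounding as the text (hours rounded to the nearest whole hour); the variable names are illustrative, and the combined total is simply the sum of the two figures rather than a number quoted in the original.

```python
ITEMS = 10_000
ABSTRACTS_NEEDED = ITEMS // 2   # about half the reports lack an abstract
ABSTRACTS_PER_HOUR = 3
ABSTRACTOR_RATE = 15            # dollars per hour
CHARS_PER_ITEM = 1_000          # citation plus abstract
KEYSTROKES_PER_HOUR = 14_000
DATA_ENTRY_RATE = 9             # dollars per hour

# Abstract writing: hours rounded to the nearest whole hour, as in the text.
abstract_hours = round(ABSTRACTS_NEEDED / ABSTRACTS_PER_HOUR)
abstract_cost = abstract_hours * ABSTRACTOR_RATE

# Data entry: total characters divided by typing speed.
total_chars = ITEMS * CHARS_PER_ITEM
entry_hours = round(total_chars / KEYSTROKES_PER_HOUR)
entry_cost = entry_hours * DATA_ENTRY_RATE

total_cost = abstract_cost + entry_cost

print(f"Abstracts:  {abstract_hours:,} hours -> ${abstract_cost:,}")
print(f"Data entry: {entry_hours:,} hours -> ${entry_cost:,}")
print(f"Total labor: ${total_cost:,}")
```

On these figures, writing abstracts accounts for roughly four-fifths of the labor cost.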
Juggling the Budget
These three case studies should give an idea of the variety of costs involved in producing a file. We have not discussed the costs of data acquisition because they vary depending on the data. Nor have any of these cases included cataloging.
It is certainly not necessary to do a full-blown, minute-detail, government-type database where every bit of data is caught and expanded, where abstracts are fully written from scratch. But the options outlined above give an indication of what some of the costs are if you want to build a database in the least expensive fashion.
Building a database is never free; it will always cost at least time if not direct money. But there are certainly many ways to manipulate the budget so the costs are not directly represented.
Once a database is fairly far along, it is much easier to convince management to put in a little extra capital to bring it up to perfection. If you don’t have something to show them, it is much more difficult to explain.
Marjorie M.K. Hlava is chairman of Access Innovations, based in Albuquerque, New Mexico, the provider of the Data Harmony line of software used for indexing and data structuring. You can reach her at firstname.lastname@example.org