Section 5: Post-digitization
The text above discusses the importance of creating an ingest protocol and coordinating with the stakeholders in the client organization (and perhaps partner organizations) whom the project depends on or directly affects. The discussion of the ingest protocol also walked through the details of populating client systems with media and metadata. All of these are critical elements of properly planning for post-digitization. One aspect not yet covered is how to calculate storage requirements for each of the target systems, in total and over time. These figures are important for IT infrastructure planning, for budgeting for the project's delivery media, and for logistical planning of quality control, ingest, and longer-term storage.
The basis for calculating storage requirements will be an inventory of items selected for digitization. Populate a spreadsheet with the following columns:
- Format
- Quantity
- Estimated average duration (use media duration if program duration is unknown)
- Estimated total duration (quantity x estimated average duration)
- Preservation master file size (data rate in GB or TB per hour or minute x estimated total duration)
- Mezzanine copy file size (data rate in GB or TB per hour or minute x estimated total duration)
- Access copy file size (data rate in GB or TB per hour or minute x estimated total duration)
- Total file size (preservation master file size + mezzanine copy file size + access copy file size)
Summing across all formats will provide the total required storage capacity. If the capacity of the delivery media is known, then the quantity of media to purchase can be calculated, and the client can plan for what will be received.
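The spreadsheet arithmetic above can also be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool. The formats, quantities, data rates (GB per hour), and the 4,000 GB delivery media capacity are hypothetical placeholders and should be replaced with the project's own inventory and file specifications.

```python
import math
from dataclasses import dataclass


@dataclass
class InventoryRow:
    """One spreadsheet row: a source format and its estimated data rates."""
    fmt: str
    quantity: int
    avg_duration_hours: float
    pm_gb_per_hour: float      # preservation master data rate (assumed value)
    mezz_gb_per_hour: float    # mezzanine copy data rate (assumed value)
    access_gb_per_hour: float  # access copy data rate (assumed value)

    @property
    def total_duration_hours(self) -> float:
        # Estimated total duration = quantity x estimated average duration
        return self.quantity * self.avg_duration_hours

    @property
    def total_gb(self) -> float:
        # Total file size = PM + mezzanine + access sizes for this format
        rate = self.pm_gb_per_hour + self.mezz_gb_per_hour + self.access_gb_per_hour
        return rate * self.total_duration_hours


def total_storage_gb(rows):
    """Sum the total file size across all formats."""
    return sum(row.total_gb for row in rows)


def media_units_needed(total_gb, media_capacity_gb):
    """Number of delivery media units, rounded up to whole units."""
    return math.ceil(total_gb / media_capacity_gb)


# Hypothetical inventory; every number here is a placeholder.
inventory = [
    InventoryRow("VHS", 200, 1.5, 100.0, 20.0, 1.0),
    InventoryRow("Audio cassette", 500, 1.0, 2.0, 0.5, 0.1),
]

total = total_storage_gb(inventory)
print(f"Total storage required: {total:,.0f} GB")
print(f"Delivery media units (4,000 GB each): {media_units_needed(total, 4000.0)}")
```

Note that this simple count assumes files can be packed onto delivery media with no wasted space. In practice some headroom is usually needed, so the result is best treated as a lower bound.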
Knowing the total storage capacity is helpful, but a finer-grained figure that is more useful when planning IT storage infrastructure is how that data will accumulate over time. This matters far more for large projects, where files arrive in batches, than for small projects, where everything will come in at once.
To calculate storage capacity growth over time, the client organization will need to have either specified, or at least estimated, the frequency and quantity of the batches the vendor will deliver. Note that this same information is also useful for planning quality control and ingest staffing. Once the frequency and quantities (and possibly formats) of deliveries are known, the spreadsheet created to calculate total storage capacity can be used to calculate storage capacity growth over time. Furthermore, if it is known which target formats will populate which target systems, this calculation can be performed at the target system level.
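The growth calculation can be sketched the same way. The example below is an illustration under stated assumptions: the batch schedule, data rates, average durations, and the mapping of each derivative to a target system (the names `preservation_storage`, `production_storage`, and `access_platform`) are all hypothetical and stand in for whatever the client and vendor have agreed.

```python
from collections import defaultdict

# Assumed data rates in GB per hour for each derivative, by source format.
RATES_GB_PER_HOUR = {
    "VHS": {"preservation": 100.0, "mezzanine": 20.0, "access": 1.0},
    "Audio cassette": {"preservation": 2.0, "mezzanine": 0.5, "access": 0.1},
}

# Assumed average duration per item, in hours.
AVG_DURATION_HOURS = {"VHS": 1.5, "Audio cassette": 1.0}

# Hypothetical mapping of each derivative to the system that will store it.
TARGET_SYSTEM = {
    "preservation": "preservation_storage",
    "mezzanine": "production_storage",
    "access": "access_platform",
}

# Hypothetical delivery schedule, in delivery order: format -> item count.
BATCHES = [
    {"VHS": 50, "Audio cassette": 100},
    {"VHS": 50, "Audio cassette": 200},
    {"VHS": 100, "Audio cassette": 200},
]


def batch_gb_by_system(batch):
    """Storage added by one batch, in GB, keyed by target system."""
    totals = defaultdict(float)
    for fmt, quantity in batch.items():
        hours = quantity * AVG_DURATION_HOURS[fmt]
        for derivative, rate in RATES_GB_PER_HOUR[fmt].items():
            totals[TARGET_SYSTEM[derivative]] += rate * hours
    return dict(totals)


def cumulative_growth(batches):
    """Running storage total per target system after each delivery."""
    running = defaultdict(float)
    growth = []
    for batch in batches:
        for system, gb in batch_gb_by_system(batch).items():
            running[system] += gb
        growth.append(dict(running))
    return growth


for number, snapshot in enumerate(cumulative_growth(BATCHES), start=1):
    total = sum(snapshot.values())
    print(f"After batch {number}: {total:,.0f} GB total")
    for system, gb in sorted(snapshot.items()):
        print(f"  {system}: {gb:,.0f} GB")
```

Attaching a delivery date to each batch would turn the per-batch snapshots into a timeline that IT staff can compare against planned storage purchases.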