Scott Kehoe, Technology Consultant - Northeast Massachusetts Regional Library System (NMRLS)
scott@nmrls.org / 978-762-4433x16 /
AIM-bibliotechy / del.icio.us/bibliotechy

  Scanning 101

 

 

Scanning Terminology


Bit, Bit-depth & Grayscale | Pixel | Resolution & DPI | Interpolation
TWAIN | JPEG or JPG | TIFF or TIF | Megapixel
OCR (Optical Character Recognition)
CD & DVD discs | RESOLUTION RULES OF THUMB | Web Links


 

Bit, Bit-depth & Grayscale - Bit depth determines the maximum number of colors available in a digital image.  The more bits per pixel, the more colors, or the more depth of color as perceived by the human brain.  The following are standard bit depths:

 

24 bit image - An image can contain up to 16.7 million colors.

8 bit image - An image can contain up to 256 colors.

8 bit Grayscale image - An image can contain up to 254 shades of gray, plus black and white.
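
The color counts listed for each bit depth follow from simple arithmetic: each added bit doubles the number of values a pixel can hold. The short Python sketch below is only an illustration; the function name is made up for this handout.

```python
def values_for_bit_depth(bits):
    """Return how many distinct values one pixel can hold at a given bit depth."""
    return 2 ** bits

for bits in (8, 24):
    print(f"{bits} bit image: {values_for_bit_depth(bits):,} possible values")

# An 8 bit grayscale image has 256 levels: black, white,
# and the 254 shades of gray in between.
shades_of_gray = values_for_bit_depth(8) - 2
print(f"8 bit grayscale: {shades_of_gray} shades of gray, plus black and white")
```

The "16.7 million" figure for a 24 bit image is this number rounded: 2 to the 24th power is 16,777,216.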

 

 

 

Pixel (picture element) - Digital images are made up of individual pixels.  Each pixel carries a single color, and the range of colors it can take depends on the bit-depth of the image.  The more pixels captured when scanning or digitally photographing an image, the more detail the image holds.

 

 

Resolution & DPI

A longer explanation is available below, but suffice it to say that for everyday use of your scanner, “what you don’t know won’t hurt you.”
But, what you should know are the … Resolution Rules of Thumb

 RESOLUTION RULES OF THUMB


Email / Web pages = 100dpi

Printing photos / OCR text = 300dpi

Digital Master / “Archival” image = at least 600dpi
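
To see what these rules of thumb mean in practice, the Python sketch below works out how many pixels a scan will contain for a given item size. The dictionary keys, the function name, and the 4 x 6 inch photo are assumptions chosen for illustration; only the dpi values come from the rules above.

```python
# dpi values taken from the rules of thumb; the key names are illustrative.
RULES_OF_THUMB_DPI = {
    "web": 100,      # email / web pages
    "print": 300,    # printing photos / OCR text
    "master": 600,   # digital master (minimum)
}

def scan_pixel_dimensions(width_in, height_in, purpose):
    """Return (width, height) in pixels for an item scanned for a given purpose."""
    dpi = RULES_OF_THUMB_DPI[purpose]
    return round(width_in * dpi), round(height_in * dpi)

# Example: a 4 x 6 inch photograph
for purpose in ("web", "print", "master"):
    w, h = scan_pixel_dimensions(4, 6, purpose)
    print(f"{purpose}: {w} x {h} pixels")
```

Pixel dimensions scale directly with dpi, so doubling the dpi doubles both the width and the height in pixels.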

 

DIGITAL MASTER
A high resolution is important for the initial scan, which provides the digital master.  The idea is to scan an item once at the highest resolution possible; it can then be edited and resized to suit other needs such as display on a website, reprints, or reproduction for t-shirts and coffee mugs!  A digital master will also be insurance against future improvements in digital display technology.  It is recommended that digital master images be saved as uncompressed TIFF files.

PRINTING SCANNED IMAGES
To print a scanned image at its best and at the dimensions of the original item (1:1 copy), most scanner and printer manufacturers recommend that the image be scanned at a resolution of 300dpi.  DPI is not as major a concern for on-screen images as it is for prints made from digital images.

ON-SCREEN DISPLAY
The rule of thumb for the dpi of images for on-screen display, or web pages, has generally been 72dpi – 100dpi, but many collections provide a 300+dpi image to facilitate printing.  Note that the dpi at which an item is scanned also affects the file size of the resulting image! In other words, higher dpi images take up more room on a hard drive and take longer to load on a screen or download over the Internet.  How an on-screen image appears depends on the resolution of the monitor displaying it.  Most current monitors are set to at least 1024 horizontal pixels by 768 vertical pixels (1024x768), but many flat panel and laptop displays run at 1280x800 or higher.  When an image is viewed at full size (100%), the monitor assigns one screen pixel to every image pixel, so the physical size of the image on screen will change depending on whether the monitor is set for 1024x768 or 1280x800.
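
As a rough illustration of how dpi drives file size, the Python sketch below estimates the size of an uncompressed scan. It assumes a 24 bit image (3 bytes per pixel) and ignores file headers; the function name and the 4 x 6 inch example are made up for this handout.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bit_depth=24):
    """Estimate the uncompressed size of a scan (pixel data only, no headers)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bit_depth // 8

# Example: a 4 x 6 inch photograph scanned at three resolutions
for dpi in (100, 300, 600):
    size = uncompressed_size_bytes(4, 6, dpi)
    print(f"{dpi}dpi: about {size / 1_000_000:.1f} million bytes")
```

Because dpi applies to both width and height, file size grows with the square of the dpi: the 600dpi scan is 36 times the size of the 100dpi scan, not 6 times.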


DPI (Dots Per Inch) & PPI (Pixels Per Inch)
DPI is technically a printing term referring to how many ink dots are in a linear inch.  With inkjet printers a higher dpi image usually means a better looking print, but the inkjet technology and the type of paper being printed on are also factors.  If the paper allows the ink dots to bleed into each other, known as dot gain, the image can look fuzzy no matter how sharp the original digital image looks.  DPI is also the term every scanner and digital camera manufacturer has co-opted to express the pixels per inch (PPI) capability of their products.  A scanner’s capability is often expressed as up to 600dpi when what is actually meant is 600ppi, or pixels per inch.  This is so ingrained in the marketplace and common parlance that dpi as a scanning term is now the “technical” way to speak about the digital capability of scanners or the resolution of images.

 

 

Interpolation or Interpolated Resolution - Enhanced resolution created by software.  This is additional resolution above what a scanner is optically able to produce.  Interpolation adds what it thinks is appropriate data to an image in an attempt to achieve a higher resolution.  Note that most Library, Museum, and Archive digitization projects rely on the optical resolution of their equipment to determine standards and best practices.  For instance, NMRLS Digital Library Initiative standards call for a non-interpolated resolution of 600dpi for digital master images.
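
The idea of software "adding what it thinks is appropriate data" can be shown with a small Python sketch. It uses simple linear interpolation on one row of grayscale values, which is an assumption made for illustration; real scanner software may use more elaborate methods, and the function name is hypothetical.

```python
def interpolate_row(row):
    """Insert an estimated value between each pair of scanned grayscale values."""
    if not row:
        return []
    out = []
    for left, right in zip(row, row[1:]):
        out.append(left)                  # value the scanner actually measured
        out.append((left + right) // 2)   # value estimated by software
    out.append(row[-1])
    return out

scanned = [0, 100, 200]           # three values read optically
print(interpolate_row(scanned))   # five values, two of them estimated
```

The result has more pixels, but the extra ones are averages of their neighbors. No new detail from the original item has been captured, which is why digitization standards are based on optical resolution.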

 

 

TWAIN - a name that does not officially stand for anything, though some report that it means “Technology Without An Interesting Name.”  Essentially, TWAIN software is what allows a computer and a scanner to talk to each other.  You need TWAIN drivers installed on your computer for the computer to recognize and utilize your scanner.

 

 

OCR (Optical Character Recognition) - a process that converts text in an image file to a word processing document.  For example, a TIFF image file of an invoice is run through OCR software to produce a Microsoft Word document that can be fully edited.  As a TIFF image file, Microsoft Word would treat it as a picture and you would not be able to correct misspellings, typos, etc.  Once run through OCR software, this former image becomes a document that can be treated as any other word processing document.

 

This is a very useful process for older documents that have no digital or computer version (type-written letters, newspapers).  It is not a foolproof process: different fonts, word spacing, and foreign languages can produce word processing documents that need a great amount of proof-reading and correction.  Unfortunately, it is not a process that is particularly useful for hand-written documents, as the variations in hand-writing styles are still too great for OCR software to overcome at this time.  Most new home-use flatbed scanners come with rudimentary OCR software which can handle most simple type-written documents.

 

 

 

Image Files

 

Lossless & Lossy Compression – terms referring to the way an image file format compresses data.

 

Lossless files compress image data and do not lose any image information when an image is saved and/or compressed.  But lossless image formats can create very large files.

Examples of lossless image file formats: TIFF, PNG

 

Lossy files compress image data and do lose image information every time an image is saved and compressed.  Over time images may appear grainier and grainier as an image is saved, re-edited, and saved again.  Lossy compression is not applied each time a file is opened and closed for viewing, only when the image is opened, edited, and saved in an image-editing program like PaintShop Pro.

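Examples of lossy image file formats: JPEG.  GIF is often grouped here as well: its compression is technically lossless, but it is limited to 256 colors, so saving a full-color photograph as a GIF discards color information.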
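
The difference between the two kinds of compression can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any image format is actually implemented: zlib stands in for lossless compression, and rounding each value down to a multiple of 8 stands in for lossy compression (JPEG uses a far more sophisticated method). The sample data and the quantize function are made up.

```python
import zlib

# Made-up "image data": 800 bytes of repeating pixel values.
pixels = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 100)

# Lossless: the data gets smaller, and decompressing restores every byte.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print(len(pixels), "bytes compressed to", len(packed), "bytes")
print("Lossless round trip identical:", restored == pixels)

# Lossy stand-in: throw away fine detail by rounding values down.
def quantize(data, step=8):
    """Round every value down to a multiple of step; the remainder is discarded."""
    return bytes((value // step) * step for value in data)

lossy = quantize(pixels)
print("Lossy result identical:", lossy == pixels)
```

Once the fine detail has been rounded away there is no way to get it back from the saved file, which is why a digital master should be kept in a lossless format.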

 

JPEG or JPG, (Joint Photographic Experts Group) – the most popular format for digital images.  JPEG is a lossy format and always compresses image data.  This is desirable because it takes up less storage space and takes less time to download over the Internet, thus the popularity of JPEG images on web sites.  Unfortunately, the more a JPEG image is compressed, the more data is lost and the image’s appearance may deteriorate.  The amount of compression that takes place when saving a JPEG image is determined by the camera or image-editing software being used.  Image-editing software (PaintShop Pro, Photoshop) gives the user latitude in how much they want to compress a JPEG image.

 

TIFF or TIF, (Tagged Image File Format) – a lossless image format used by most scanners, but not popular for use on the Internet due to its file size.  TIFFs are widely used in the corporate world for document imaging and have become the de facto standard as a digital master format for nearly all major library digitization projects.  TIFF images can be compressed (lossless), but for archival storage they are usually left uncompressed (NMRLS DLI recommended standard).

 

 

 

Other Related Technologies

 

Megapixel - a digital camera term referring to a resolution of 1 million pixels.  A 2.1 megapixel camera, for example, takes photographs at a resolution of roughly 1200 pixels high by 1800 pixels wide (1200 x 1800 = 2,160,000).  This term has become the yardstick by which camera manufacturers measure their cameras’ capabilities against their competitors’.  The higher the megapixel count, the more pixels and the more detail an image can capture.  (The number of colors depends on bit depth, not on the pixel count.)
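
The megapixel arithmetic is just width times height, divided by one million. A minimal Python sketch, with a function name made up for this handout:

```python
def megapixels(width_px, height_px):
    """Return the megapixel count for an image of the given pixel dimensions."""
    return width_px * height_px / 1_000_000

print(megapixels(1800, 1200))   # the 1200 x 1800 example from the text
```

The same calculation works for scans: a 4 x 6 inch photo scanned at 300dpi is also 1200 x 1800 pixels.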

 

 

CD-R (Compact Disc Recordable) - A CD-R can hold up to 700 megabytes of data.  A CD-R cannot have data erased from it, but it can have data continually added until it fills up the 700 megabyte capacity.  CD-R discs are easily obtainable, cheap, and readable in nearly every CD drive and in most DVD drives too.  While Compact Discs have proven to be a reliable and cheap way to store data, their true longevity is still unknown.

For long-term data storage NMRLS recommends using the ISO9660 File Format.  The ISO9660 File Format is an international cross-platform standard used for recording data, and is readable by virtually all computer operating systems and CD/DVD drives.  If you are archiving data for future access this is also a very good choice, as all operating systems can read it.

 

a note about DVDs …

DVD-R and DVD+R formats hold up to 4.7 gigabytes of information!  And prices for DVD readers (players), writers (burners), and the discs themselves keep falling.  
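
For planning purposes, a rough Python sketch of how many digital master scans fit on one disc is shown below. It rests on several assumptions: capacities are treated as 700 million and 4.7 billion bytes, file system overhead and TIFF headers are ignored, and each master is an uncompressed 24 bit scan of a 4 x 6 inch photo at 600dpi. The names are made up for this handout, and real-world counts will be somewhat lower.

```python
CD_R_BYTES = 700_000_000      # assumed: 700 megabytes
DVD_BYTES = 4_700_000_000     # assumed: 4.7 gigabytes

def master_scan_bytes(width_in, height_in, dpi=600, bit_depth=24):
    """Estimate the size of one uncompressed master scan (pixel data only)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bit_depth // 8

def scans_per_disc(disc_bytes, scan_bytes):
    """Return how many whole scans fit on a disc of the given capacity."""
    return disc_bytes // scan_bytes

one_scan = master_scan_bytes(4, 6)
print("One master scan:", one_scan, "bytes")
print("Scans per CD-R:", scans_per_disc(CD_R_BYTES, one_scan))
print("Scans per DVD:", scans_per_disc(DVD_BYTES, one_scan))
```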

 

Care and Handling of CDs and DVDs: A Guide for Librarians and Archivists, Fred R. Byers.

Council on Library and Information Resources / National Institute of Standards and Technology, October 2003.

(CLIR pub121). FREE DOWNLOAD (PDF format):  http://www.clir.org/pubs/reports/pub121/pub121.pdf

 

 

 

Links

 

 

 

 

Updated: 3/27/08 Scott Kehoe, Technology Consultant, NMRLS