Diversified Data Systems  
| Home | Company | Products | Services | FAQ | White Papers | Support | Events | Links | Contact Us | Site Map |

Data Normalization


During the past few years, we have spent a great deal of time analyzing, planning, and implementing data normalization with our clients and prospects.

In some of our recent projects, data conversion and normalization have consumed more effort and resources than the rest of the project combined.

Normalization is the process of establishing and maintaining data that is correct, consistent, and complete. Data can be called “normal” after it has been “normalized.”

The use of the term “normal” to describe data should not be confused with “normal form,” which describes the design of databases (cf. “third normal form,” etc., à la Dr. E. F. Codd et al.).

Normal Data
Normal data must be correct. This means that values are accurate. For example, if a value should be “2” and is recorded as “3,” then the value of the data is obviously compromised.

Normal data must be consistent. For example, if two instruments are both made by the same manufacturer, then the name of the manufacturer should be identical for the two instruments. Nomenclatures for a given model of equipment should be consistent. Without consistency in the data, analysis of the data is virtually impossible.

Normal data must be complete. Missing data leaves “gaps” during analysis that seriously degrade and compromise the data that does exist. For example, if the manufacturer is omitted for an item, then it becomes impossible to aggregate the item into the model group to which it belongs. If the acquisition value or replacement value is omitted for an item, then it becomes impossible to compute aggregate inventory values.
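
The effect of missing values on aggregation can be illustrated with a minimal Python sketch. The field names (manufacturer, model, replacement_value), the helper functions, and the sample records are all hypothetical, chosen only for illustration; they do not describe any particular product.

```python
from collections import defaultdict

# Hypothetical field names; a real inventory would define its own.
REQUIRED_FIELDS = ("manufacturer", "model", "replacement_value")


def find_incomplete(records):
    """Return (index, missing_fields) for every record lacking a required field."""
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            problems.append((i, missing))
    return problems


def total_by_manufacturer(records):
    """Sum replacement values per manufacturer and count the records skipped.

    A record is skipped when its manufacturer or replacement value is
    missing, so every gap makes the totals less trustworthy.
    """
    totals = defaultdict(float)
    skipped = 0
    for rec in records:
        if rec.get("manufacturer") in (None, "") or rec.get("replacement_value") is None:
            skipped += 1
            continue
        totals[rec["manufacturer"]] += rec["replacement_value"]
    return dict(totals), skipped


# Made-up sample data: the second and third records are incomplete.
records = [
    {"manufacturer": "Acme", "model": "M100", "replacement_value": 400.0},
    {"manufacturer": "", "model": "M200", "replacement_value": 1200.0},
    {"manufacturer": "Acme", "model": "M300", "replacement_value": None},
]
```

In this sample, only one of the three records can contribute to the aggregate value; the other two are reported as incomplete rather than silently counted as zero.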

Establishing Normal Data
The most common opportunity for normalizing existing data is during the conversion to a new computer system.

Normalizing existing data can be a truly daunting task. Several factors tend to make data harder to normalize. Data entered by one person tends to be more normalized than data entered by several people. Data entered into one computer system tends to be more normalized than data entered into several different systems or several different versions of a single system. Data collected in a single location or facility tends to be more normalized than data collected in two or more locations or facilities.

Sometimes the effort to normalize existing data is simply too great to justify. Usually, however, at least some cleanup or normalization is incorporated during the data conversion effort to any new computer system.

Once the decision has been made to normalize existing data, considerable effort is usually required to identify the data that is currently incorrect, inconsistent, or incomplete and to formulate strategies to normalize it.

During the normalization process, it is often valuable, and sometimes virtually mandatory, to utilize a master catalog of normalized data (such as QuikPDR from C. A. Motzko & Associates, one of our valuable strategic partners) to establish baselines for consistency and completeness. Such a master catalog accumulates aliases which can be mapped into consistent data values. For example, QuikPDR has accumulated nearly 300 different representations for Hewlett Packard in various databases.
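
The idea of mapping accumulated aliases onto one consistent value can be sketched in a few lines of Python. This is an illustrative assumption only, not a description of how QuikPDR works: the alias table, its four entries, and the function names are hypothetical.

```python
import re

# Hypothetical alias table. A real master catalog would hold far more
# entries for each manufacturer.
ALIASES = {
    "hewlett packard": "Hewlett Packard",
    "hewlett packard co": "Hewlett Packard",
    "hp": "Hewlett Packard",
    "h p": "Hewlett Packard",
}


def lookup_key(name):
    """Reduce a raw name to a lookup key.

    Lowercases the name, turns every run of punctuation or whitespace
    into a single space, and trims the ends.
    """
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def canonical_manufacturer(name, aliases=ALIASES):
    """Return the canonical name for a known alias, or None if unrecognized."""
    return aliases.get(lookup_key(name))
```

Returning None for an unrecognized name, instead of guessing, leaves the decision to a person, who can then add the new alias to the table.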

Maintaining Normal Data
After investing the effort to normalize a database, diligence, vigilance, training, and an appropriate database software system are required to maintain normal data.

The database software system must include extensive tools and technologies to standardize data that is added to the system. Once the data has been normalized, the system should serve as its own master catalog for additions of similar data.
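
One way a system might serve as its own master catalog is to compare each new entry against the values it already holds and suggest the closest one. The following Python sketch uses the standard library's difflib for that comparison; the function name, the cutoff of 0.8, and the sample values are assumptions for illustration and do not describe how OpenMETRIC is implemented.

```python
import difflib


def suggest_existing(value, known_values, cutoff=0.8):
    """Return the stored value that most closely matches value, or None.

    Comparison is case-insensitive. cutoff is a similarity threshold
    between 0 and 1; higher values demand a closer match.
    """
    by_lower = {k.lower(): k for k in known_values}
    matches = difflib.get_close_matches(
        value.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


# Hypothetical values already stored in the normalized database.
known = ["Hewlett Packard", "Tektronix"]
```

A suggestion of this kind is best shown to the person entering the data for confirmation, since a near match is not always the same manufacturer.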

We believe that OpenMETRIC currently provides the best capabilities on the market for maintaining normalized data for metrology and calibration.

In some cases, however, adding the capabilities of a master catalog (such as QuikPDR) can significantly improve OpenMETRIC’s data normalization results.

"If the Data Don't Match, then the Answers Won't Hatch!"
With apologies to a famous attorney, this is the light-hearted title for a tutorial scheduled for the NCSL International Conference in Salt Lake City in 2004 by Mr. Don Wyatt, our President.  (Hope to see you there!)

This is a slightly different way of saying something we've all heard for years; namely, “garbage in, garbage out.” Nowhere is this axiom more applicable than with metrology data. The most sophisticated systems in the world will not achieve their potential if the data is not normalized.
 

