Data Normalization
During the past few years, we have spent a great deal of time
analyzing, planning, and implementing data normalization with our clients
and prospects.
In some of our recent projects, data conversion and normalization have
consumed more effort and resources than the rest of the project combined.
Normalization is the process of establishing and maintaining data that is
correct, consistent, and complete. Data can be called “normal” after it
has been “normalized.”
The use of the term “normal” to describe data should not be confused with
the “normal forms” used to describe the design of databases (for example,
“third normal form,” as defined by Dr. E. F. Codd and others).
Normal Data
Normal data must be correct. This means that values are accurate.
For example, if a value should be “2” and is recorded as “3,” then the
value of the data is obviously compromised.
Normal data must be consistent. For example, if two instruments are
both made by the same manufacturer, then the name of the manufacturer
should be identical for the two instruments. Nomenclatures for a given
model of equipment should be consistent. Without consistency in the data,
analysis of the data is virtually impossible.
Normal data must be complete. Missing data leaves “gaps” during
analysis that seriously degrade and compromise the data that does
exist. For example, if the manufacturer is omitted for an item, then it
becomes impossible to aggregate the item into the model group to which it
belongs. If the acquisition value or replacement value is omitted for an
item, then it becomes impossible to compute aggregate inventory values.
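The completeness requirement can be illustrated with a minimal sketch. The record layout, the field names (`manufacturer`, `replacement_value`), and the sample values are hypothetical and are not taken from any particular system. The sketch lists the required fields each record lacks and declines to report an aggregate value when any input is missing.

```python
# Hypothetical inventory records; None marks a value that was never recorded.
inventory = [
    {"id": "A-100", "manufacturer": "Hewlett Packard", "replacement_value": 1200.0},
    {"id": "A-101", "manufacturer": None, "replacement_value": 850.0},
    {"id": "A-102", "manufacturer": "Hewlett Packard", "replacement_value": None},
]

# Assumed set of fields that every record must carry to be "complete".
REQUIRED_FIELDS = ("manufacturer", "replacement_value")


def missing_fields(record):
    """Return the required fields that are absent or empty in one record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def completeness_report(records):
    """Map the id of each incomplete record to the fields it is missing."""
    report = {}
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            report[record["id"]] = gaps
    return report


def total_replacement_value(records):
    """Sum replacement values, or return None if any value is missing."""
    values = [r.get("replacement_value") for r in records]
    if any(v is None for v in values):
        return None  # an aggregate over incomplete data would be misleading
    return sum(values)


print(completeness_report(inventory))
print(total_replacement_value(inventory))
```

Returning `None` rather than a partial sum is one possible design choice; it makes the gap visible instead of silently understating the inventory value.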
Establishing Normal Data
The most common opportunity for normalizing existing data is during the
conversion to a new computer system.
Normalizing existing data can be a truly daunting task. Several factors
tend to make data harder to normalize. Data entered by one person tends to
be more normalized than data entered by several people. Data entered into
one computer system tends to be more normalized than data entered into
several different systems or several different versions of a single
system. Data collected in a single location or facility tends to be more
normalized than data collected in two or more locations or facilities.
Sometimes the effort to normalize existing data is simply too great to
justify. Usually, however, at least some cleanup or normalization is
incorporated into the data conversion effort for any new computer system.
Once the decision has been made to normalize existing data, considerable
effort is usually required to identify the data that is currently
incorrect, inconsistent, or incomplete and to formulate strategies to
normalize it.
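One simple way to begin identifying inconsistent data is to group values that differ only in capitalization, spacing, or punctuation. The following is a minimal sketch under that assumption; the function names and the sample values are hypothetical, and the comparison key is deliberately crude.

```python
import re
from collections import defaultdict


def crude_key(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_variants(values):
    """Group values that share a crude key; keep groups with 2+ spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[crude_key(value)].add(value)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical manufacturer column pulled from an existing database.
manufacturers = [
    "Hewlett Packard",
    "Hewlett-Packard",
    "HEWLETT PACKARD",
    "HP",
]

print(find_variants(manufacturers))
```

A key of this kind will not connect an abbreviation such as "HP" to the full name, so it can only flag candidates for review. Resolving abbreviations and true aliases requires a maintained list of known equivalents.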
During the normalization process, it is often valuable, and sometimes
virtually mandatory, to utilize a master catalog of normalized data (such
as QuikPDR from C. A. Motzko & Associates, one of our valuable strategic partners) to
establish baselines for consistency and completeness. Such a master
catalog accumulates aliases which can be mapped into consistent data
values. For example, QuikPDR has accumulated nearly 300 different
representations for Hewlett Packard in various databases.
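The idea of mapping accumulated aliases onto one consistent value can be sketched as a simple lookup. This is an illustration only: the `ALIASES` table, the function names, and the sample records are hypothetical and do not represent QuikPDR's actual contents, structure, or interface.

```python
# Hypothetical alias table: each known spelling maps to one canonical name.
ALIASES = {
    "hewlett packard": "Hewlett Packard",
    "hewlett-packard": "Hewlett Packard",
    "hp": "Hewlett Packard",
    "h.p.": "Hewlett Packard",
}


def canonical_manufacturer(raw):
    """Return the canonical name for a raw value, or None if it is unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return ALIASES.get(key)


def normalize_records(records):
    """Split records into normalized ones and ones needing manual review."""
    normalized, unresolved = [], []
    for record in records:
        name = canonical_manufacturer(record["manufacturer"])
        if name is None:
            unresolved.append(record)  # no known alias; do not guess
        else:
            normalized.append({**record, "manufacturer": name})
    return normalized, unresolved


# Hypothetical records from a database being converted.
records = [
    {"id": 1, "manufacturer": "H.P."},
    {"id": 2, "manufacturer": "  hewlett   packard "},
    {"id": 3, "manufacturer": "Acme Instruments"},
]

normalized, unresolved = normalize_records(records)
print(normalized)
print(unresolved)
```

Values with no known alias are set aside rather than guessed at, so that each newly confirmed spelling can be added to the table and the catalog grows over time.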
Maintaining Normal Data
After investing the effort to normalize a database, diligence, vigilance,
training, and an appropriate database software system are required to
maintain normal data.
The database software system must include extensive tools and technologies
to standardize data that is added to the system. Once the data has been
normalized, the system should serve as its own master catalog for
additions of similar data.
We believe that OpenMETRIC currently provides the best capabilities on the
market for maintaining normalized data for metrology and calibration.
In some cases, however, adding the capabilities of a master catalog (such
as QuikPDR) can significantly improve OpenMETRIC’s data normalization
results.
"If the Data Don't Match, then the Answers Won't Hatch!"
With apologies to a famous attorney, this is the
light-hearted title for a tutorial scheduled for the NCSL International
Conference in Salt Lake City in 2004 by Mr. Don Wyatt, our President.
(Hope to see you there!)
This is a slightly different way of saying something we've all heard for years; namely,
“garbage in, garbage out.”
Nowhere is this axiom more applicable than with metrology data. The most
sophisticated systems in the world will not achieve their potential if the
data is not normalized.