Data Administration: A Data Naming Primer

Version Date: 8-31-95
Version 1: 095-046

IRM Guideline 9, Version 1

DOCUMENT OVERVIEW

Purpose for This Document
The purposes of this document are to: 1) inform managers and decision-makers of the need for a good data naming program, 2) explain the business benefits of a good data naming program, 3) show how data naming fits within a larger data administration function, and 4) identify action steps that enable agencies to begin realizing the benefits of a data naming program.

Using This Document
This document is organized into four sections:

  1. A Document Overview,
  2. A 'Data Naming Rules' section: briefly describes important aspects of data naming rules,
  3. A 'Data Naming Strategies and Management' section: explains how data naming fits into a data administration organization, and how to establish a data naming program, and
  4. An 'Appendix' section: shows how data naming and data administration fit into IRM; how data naming fits within data administration; and which aspects of data naming are in the scope of this document and a companion document, "Data Administration: A Data Naming Practitioner's Guide".

Audiences for This Document
The intended audiences for this document are executives, managers, decision-makers and data administrators with responsibility for an organization's data resources. Data base administrators, warehouse or repository implementers, systems and business analysts and those preparing vendor contract performance requirements may find this document useful for background or general information about data naming. Technical audiences should also refer to the companion document: "Data Administration: A Data Naming Practitioner's Guide".

Potential audiences are identified in the following chart. The chart identifies why each audience should be concerned with data naming, actions to take, how to proceed, and which sections of this document to read.

POTENTIAL AUDIENCES & SUGGESTED ACTIONS

Who? Why? What? (Actions) How? (To Proceed) What To Read?
Executives; managers; decision-makers (including CIOs) Data naming is key to maximizing the value of data resources (and resource investment), sharing data with others, meeting customer data needs, & realizing other business benefits
  • Ensure data administration (DA) function exists, with data naming responsibilities for the organization.
  • Ensure activities associated with managing the organization's data resources are connected to the organization's DA function.
  • Establish DA skills within the organization.
  • Promote data sharing: develop common names across organizations.
  • Circulate this guideline to those in the organization with responsibility for developing, providing access to, or otherwise managing data resources.
  • Charge DA with establishing and enforcing data naming standards, consistent with statewide standards.
  • Ensure DA has companion document, "Data Administration: A Data Naming Practitioner's Guide".
Strongly suggested:
  • Executive Summary;
  • Data Naming Rules;
  • Data Naming: Strategies and Management;
  • Appendix I
  • Appendix II

For the technically inclined or interested:

  • "Data Administration: A Data Naming Practitioner's Guide" (companion document)
Data Administrators DA is responsible for the integrity of the organization's data resources. Read this document if general background, or information about DA or data naming, are needed.
  • Develop data naming policies, rules & methods.
  • Also read "Data Administration: A Data Naming Practitioner's Guide"
Strongly suggested:
  • Executive Summary;
  • Data Naming Rules;
  • Data Naming: Strategies and Management;
  • Appendix I
  • Appendix II
Data Base Administrators; Systems Analysts Need to understand the role of DA and its perspective on data naming; how to work with DA. Read this document if general background, or information about DA or data naming, is needed.
  • Become familiar with basic data naming strategies;
  • Also refer to "Data Administration: A Data Naming Practitioner's Guide" (as needed)
Optional (but recommended):
  • Executive Summary
  • Data Naming Rules;
  • Data Naming: Strategies and Management;
  • Appendix I
  • Appendix II
Business Analysts Need to understand how DA facilitates data sharing and data access; how to work with DA. Read this document if general background, or information about DA or data naming, is needed.
  • Become familiar with basic data naming strategies;
  • Communicate important naming requirements to business users
Optional (but recommended):
  • Executive Summary
  • Data Naming Rules;
  • Data Naming: Strategies and Management;
  • Appendix I
  • Appendix II
Creators of vendor performance contracts for data-related services Need to understand the impact of DA, and data naming standards, on future vendor performance contracts. Read this document if general background, or information about DA or data naming, is needed in order to prepare contracts or contract performance clauses.
  • Become familiar with basic data naming strategies;
  • Also refer to "Data Administration: A Data Naming Practitioner's Guide" (as needed)
Optional (but recommended):
  • Executive Summary
  • Data Naming Rules;
  • Data Naming: Strategies and Management;
  • Appendix I
  • Appendix II

Why Should Data Naming Be Important to an Agency?
'Data names' are unique identifiers of data that provide links to information about specific data, or to the actual data itself. A data name identifies data in the same way a person's name identifies a person. However, there is an important difference between data names and people names: there are no naming standards for people to prevent two people from ending up with the same name. Data names, when developed using data naming standards, are unique and accurate identifiers that prevent duplication within a particular environment.

Properly created data names are used to help manage data resources by ensuring integrity (without duplication), providing clarity of meaning, and making data accessible to those who need it through precise identification of the required data. Data naming standards are typically developed and administered by a data administration function within an organization. Data administration is part of the data perspective within an organization's IRM (Information Resource Management) program. Appendix I, pages 1-3, shows how IRM and data administration fit together and contribute to the quality of information resources.

Data naming is not a new activity: state agencies already establish data names on a regular basis as part of developing information (data) resources. However, a formal data administration program that includes a data naming strategy and standards is important for consistently achieving the desired business benefits.

An example of the use and value of good data names is the Yellow Pages of the phone book. Unique data names have been developed for use in the Yellow Pages. The "data naming strategy" within the Yellow Pages ensures integrity without duplication by maintaining a category, 'Automobiles', without also maintaining the redundant category, 'Cars'. Those who try accessing information about 'Cars' find a cross-reference to 'Automobiles', instead of a duplicate listing under 'Cars'. Without naming standards, it is unlikely the Yellow Pages could have avoided having some entries under 'Automobiles' and others under 'Cars'.

However, the Yellow Pages example is somewhat oversimplified in terms of the data naming problems faced by most large organizations today. A more appropriate analogy might be an effort to standardize all Yellow Pages across phone companies. Since no naming standards existed across phone companies in the past, it is likely there are multiple naming standards in place. A feasible approach to consolidation might be to provide cross-referencing between phone books based on a newly developed set of standard names. The new names could then be cross-referenced to any number of phone books, while providing a point of continuity between the books. As long as one started a search from the new standard names, any existing book could be accessed, thus leading to data sharing among phone companies.
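
The cross-referencing approach described above can be sketched as a simple lookup table. This is only an illustration under stated assumptions: the book labels ("Book A", "Book B"), the category names and the function names are hypothetical and do not come from any actual phone book or state standard.

```python
# Hypothetical cross-reference: each standard name points to the local
# name used in each individual phone book (or, by analogy, each system).
CROSS_REFERENCE = {
    "Automobiles": {"Book A": "Automobiles", "Book B": "Cars"},
    "Taxis": {"Book A": "Cabs", "Book B": "Taxicabs"},
}


def local_name(standard_name, book):
    """Start from a standard name and find the name a given book uses."""
    return CROSS_REFERENCE[standard_name][book]


def standard_name_for(local, book):
    """Reverse lookup; returns None when no mapping has been recorded."""
    for standard, by_book in CROSS_REFERENCE.items():
        if by_book.get(book) == local:
            return standard
    return None


print(local_name("Automobiles", "Book B"))   # Cars
print(standard_name_for("Cabs", "Book A"))   # Taxis
```

In this sketch the existing books keep their own names. Only the cross-reference is new, which is why a search that starts from the standard name can reach any book.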

Within most organizations today, varied computer systems, data base managers and programming languages have been responsible for a proliferation of data naming standards. In effect, organizations today are faced with the equivalent of multiple Yellow Pages to consolidate, both within their organizations and externally when they share data with other organizations.

A good data naming strategy with proper discipline and management can help with data consolidation by providing a common point of continuity. Good data names also help reduce data costs (especially those associated with data redundancy) and improve the quality of data and other information resources. Data is a valuable asset that needs to be managed and protected like any other valuable resource. The value of data assets can be maximized by keeping data management costs to a minimum while maintaining - and improving - data quality.

Business Benefits of Having a Good Data Naming Strategy
The following list includes some of the business benefits that can be realized through the use of a good data naming program:

  • Greater reusability when data can be easily identified. Reusable data minimizes costs associated with data creation (no need to reinvent the wheel), and improves delivery time for information resources that use the data.
  • Improved ability to meet customer needs as a result of having identifiable data. (Data that cannot be uniquely identified cannot be accessed.)
  • Reduced redundancy (lower cost, less obsolescence, less data to manage).
  • Better reliability of data because of less ambiguity and the reuse of existing (reliable) data.
  • Ease of access for business users and citizens. Improved reliability of accessed data.
  • Improved ability to develop quality data warehouses, data repositories and executive information systems to meet the needs of business users and citizens.
  • Increased clarity of communication through precise, agreed-upon language to describe data.
  • Ability to perform impact analysis using data names: for example, an impact analysis of the year 2000 on computer systems would be possible if data naming standards (enforcing the use of the data element name 'Date') had been in place when the systems were built.
  • Ability to establish concise vendor contract performance requirements, and non-performance clauses for the development of state information resources. Agencies that hire outside data base developers, data analysts and modelers need to set performance requirements and non-performance clauses. Naming standards can be used to set and measure performance requirements.
  • Help agencies be in compliance with the Minnesota Government Data Practices Act (MGDPA) by avoiding data accuracy problems (caused by unplanned redundant data), and providing access paths, via standardized data names, to make public data accessible under the law.
  • Improved quality and understandability of object and data models by using naming standards.
  • Increased (software) product quality as a result of more reliable data, better reusability, reduced costs and reduced delivery time for information resources. NOTE: Purchased software quality can also benefit from good data names, both when selecting software to meet data requirements and when implementing or integrating with existing systems.

DATA NAMING RULES

In order to share data between organizations, or between computer systems within an organization, data must be uniquely and accurately identified. Accurate identification ensures that data can be defined in one place, and then shared with, or transmitted to, another place without losing its meaning or clarity. Data meaning and clarity are enforced through data names that have consistent formats and content. Standardized data names are developed based on two types of naming rules: format rules and content rules.

Format rules identify the parts of a data name and how the parts are put together in sequence to form a complete name. Content rules define what each part of the name may (or may not) contain and which abbreviations are permitted. Rules for both are within the scope of this document.

Data Naming Format Rules
Consistent data naming formats ensure data names are always constructed the same way, regardless of who constructed the name, or where it was constructed. Computer systems can be programmed to recognize parts of names that are consistently formatted. Business users or citizens can also access the appropriate parts of data names when searching for key words.

For example, searching for dates is possible when the word 'Date' in a data name is always located in the same position within the name. 'Date of Birth' and 'Birth Date' do not follow consistent formats, thus a computer would have to search for the word 'date' in order to locate both names. Unfortunately this search might also find "dated" and "dateline", depending on how the search instruction was defined. Without consistent formats, those accessing data need to be more sophisticated searchers and even, in some cases, have sophisticated computer expertise to formulate search commands.
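
The difference between a keyword search and a position-based lookup can be sketched in a few lines of Python. This is only an illustration: the sample names are invented, and the rule that the word 'Date' always occupies the last position is an assumption made for the example, not a format prescribed by this guideline.

```python
# Hypothetical data names; none are taken from a real system.
names = [
    "Birth Date",
    "Date of Birth",
    "Dateline Location",
    "Dated Material Code",
    "Hire Date",
]


def substring_search(names, word):
    """Naive search: matches the word anywhere, even inside other words."""
    return [n for n in names if word.lower() in n.lower()]


def position_search(names, word):
    """Format-aware search: assumes (for this sketch) the word is always last."""
    return [n for n in names if n.split()[-1].lower() == word.lower()]


naive_hits = substring_search(names, "date")
positional_hits = position_search(names, "date")

print(naive_hits)
print(positional_hits)
```

The naive search returns all five names, including the unwanted 'Dateline Location' and 'Dated Material Code'. The position-based search returns only 'Birth Date' and 'Hire Date'. It misses 'Date of Birth' because that name does not follow the assumed format, which is exactly the kind of gap a consistently applied format rule is meant to prevent.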

Data Naming Content Rules
Consistent content within data names ensures the words used in the names mean the same thing regardless of who constructed the name, or where it was constructed. Consistent content also means the words used in names are as clear as possible. Words used in data names that have ambiguous meanings tend to prevent accurate data identification or comparison with other data.

For example, both 'Birth Date' and 'Birthday' would be allowed without content rules. The computer search for the word 'date' would only find the first name, thus duplication could exist without being discovered. This example also shows a second problem with content due to ambiguous meanings: 'Birthday' might mean only month and day, while 'Birth Date' more clearly includes the year of birth (standardized dates include month, day and year). The above example also shows a violation of format rules: the part of the name (format) that contains date information is not consistent, so a computer could not match words based on their position.
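
A content rule of this kind could be checked mechanically. The following is a minimal sketch, assuming a hypothetical approved word list and a hypothetical table of non-standard words with their preferred replacements. Neither list comes from this guideline or from any statewide standard.

```python
# Hypothetical approved vocabulary and non-standard synonyms.
APPROVED_WORDS = {"birth", "hire", "date", "name", "customer"}
PREFERRED = {"birthday": "Birth Date"}


def check_content(name):
    """Return a list of content-rule problems found in a data name."""
    problems = []
    for word in name.split():
        key = word.lower()
        if key in PREFERRED:
            problems.append(f"'{word}' is not approved; use '{PREFERRED[key]}'")
        elif key not in APPROVED_WORDS:
            problems.append(f"'{word}' is not in the approved word list")
    return problems


print(check_content("Birth Date"))  # []
print(check_content("Birthday"))    # ["'Birthday' is not approved; use 'Birth Date'"]
```

With such a check in place, 'Birthday' would be flagged when the name is proposed, before a duplicate of 'Birth Date' could enter the organization's data resources.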

DATA NAMING: STRATEGIES AND MANAGEMENT

Understanding 'Data Administration'
Data naming is typically a function within data administration (DA). Data administration has only recently evolved into a unique discipline. Functions now associated with DA were originally part of other disciplines, primarily data base administration (DBA). Data administration differs from data base administration in two ways. First, data administration is oriented around an organization's assets, whereas data base administration focuses on data detail. Second, data administration has a broader IRM focus, and usually reports to the organization's CIO (Chief Information Officer). Data base administration usually reports within the IS (or IT) development organizations.

As DA evolved, different views of its purpose and scope have surfaced. Some views are functional, while others are organizational, thus making it difficult to compare them. Organizational views tend to be arbitrary, since several can work successfully. Organizational views also tend to be less generic than functional views. For example, in some (organizational) views, data administration directs data base administration, while in other (functional) views the functions overlap. Functional views leave the organizational structure and specific implementation of the functions up to each organization.

This document views the disciplines functionally (as overlapping), rather than organizationally. Some data management functions that typically fall within data administration's area of influence include:

  • Data naming policies, rules and methods: creation and enforcement of policies and rules. The DA function typically defines rules for data name content and format.
  • Data names: creation and enforcement of standard names. The DA function typically determines whether to use 'Car' or 'Automobile'; 'Cab' or 'Taxi', and ensures access mechanisms are in place (similar to the Yellow Pages cross-references or indices).
  • Data specifications: Field size and type, description, data access paths for researchers, etc.
  • Information modeling policies, rules and methods: creation and enforcement.
  • Data administration management policies, rules and methods: creation and enforcement.
  • Configuration management: data change history, versioning, etc.
  • Data security; integrity; reliability; archiving: ensuring and maintaining.
  • Business liaison (between business users and DBA).

A model for data administration functions is shown in Appendix I, pg. 3. Data naming activities within data administration are shown in Appendix II, pg. 1. The portions of data naming within the scope of this document are shown in Appendix II, pg. 2. Detailed technical aspects of data naming, covered in the companion document, "Data Administration: A Data Naming Practitioner's Guide" are shown in Appendix II, pg. 3.

When Are Data Naming Standards Important?
Data naming standards provide consistency and continuity to data names, whether the names appear on data models or in data bases. Because data names uniquely identify data, naming standards promote a level of data integrity that is important for any data management environment. Data naming standards should generally be treated as one of the "best practices" for data management. However, there are certain areas in which data naming standards are critical to success, and these should be among the first priorities for implementation.

Priorities for developing and implementing data naming standards should focus on:

  • Data that will be shared: data that is received from others, provided to others, or for which there are other stakeholders (such as local government or private sector collaboration).
  • Data involved in inter- or intra-agency efforts: community data, created or used by multiple organizations, or departments within a larger organization.
  • Public data: data that must be accessible to the public.
  • Data for current systems development or integration projects: to realize internal data integrity improvements for projects currently underway (especially those that will have to create data names or match / cross-reference duplicate data anyway).
  • Data involved in current modeling efforts: Data naming standards should be developed for the naming of entities, relationships and attributes on object and data models. Without good naming standards on models, it will be difficult to transform model information to real data and be difficult to integrate models or share data.

What Is Management's Role In Establishing a Data Naming Program?
To ensure an effective data naming program is in place, agency management should:

  1. Establish a data administration (DA) function, reporting to agency executive management (i.e., CIO).
  2. Ensure the DA function has responsibility and authority for creating and enforcing data naming policies, rules and methods for the organization.
  3. Ensure the DA function has responsibility and authority for creating and enforcing actual data names, consistent with the standards. (Appendix I, page 3 shows a sample model of a data administration function.)
  4. Ensure that necessary data administration skills exist within the organization.
  5. Ensure that all data activities related to data naming (see Appendix II, page 1) are linked organizationally, or through policy and procedures, to the data administration function.
  6. Ensure data administration and data modeling "practitioners" have access to this document, and the companion document, "Data Administration: A Data Naming Practitioner's Guide".
  7. Promote the development of shared common names across organizations to facilitate data sharing and data reuse with other organizations.

Prerequisites for a Data Naming Program

  1. Special Legal Requirements Related to Data
    This section lists legal references that may apply to the topics covered in this document: Minnesota Government Data Practices Act (MGDPA), Chapter 13.

    Information technologies and systems in Minnesota government also fall under the jurisdiction of legislation that requires:

    • A state technology architecture, standards and guidelines
    • Information systems that do not "needlessly duplicate or needlessly conflict" with other systems
    • Efficient, cost-effective production and storing of data
    • Sharing of data between agencies
    • Required accessibility to government data
  2. Applicable Statewide and Local Standards
    When applicable, statewide standards and guidelines are patterned after international and federal standards (ISO, ANSI, FIPS, NIST). Statewide policies, standards and guidelines are documented in "Creating and Managing Information Resources for Minnesota Government Organizations", available from OT in paper form, or via the Internet (accessed through the OT Policies, Standards & Guidelines).

    To achieve legislative mandates, statewide standards and guidelines:

    • Identify common structure, content and format requirements for information resources, including data. (Sharable data are a priority for Minnesota government.)
    • Define an open systems environment that allows free flow of information within and among systems. (NOTE: One aspect of an open systems environment is common file formats for the exchange of information that ensure data is defined and transmitted consistently.)
  3. "Starter List" of Policies, Rules and Methods for Data Naming
    The following is a suggested "starter list" of data administration policies, rules and methods pertaining to data naming. Data Administrators may begin with this set of rules, most of which can be created using information in the companion guideline, "Data Administration: A Data Naming Practitioner's Guide".

Policies for data naming
Rules for data name content
- Enterprise data
- Object / data model data
Rules for data name format
- Enterprise data
- Object / data model data
Rules for alternate names
- Enterprise data
Methods for adding to, or modifying, data rules

APPENDICES:

Appendix I:
"Framework for Conducting Business Within an IRM Environment"
"Where Does Data Administration Fit into the Organization?"
"What is Data Administration?"
"Data Naming Activities Within Data Administration"
"Scope of 'Data Administration: A Data Naming Primer'"

Appendix II:
"Scope of 'Data Administration: A Data Naming Primer'"