When starting a mapping project, the question of data scope arises. At first glance, it often seems essential to map everything: how can an incomplete map deliver value? However, this approach runs into several limitations: creation and maintenance costs, the relevance of the information to the use case, and the user populations concerned.
If not everything is to be documented (at least initially), we must identify what needs to be documented first. For this, we use the Key Data Elements (KDE) methodology.
What are KDEs?
KDEs are data that:
- have a significant impact on an activity or analytical process,
- and/or span multiple systems and reports,
- and/or are used by management to make critical decisions.
KDEs are items where any data quality issue impacts critical decisions. They are therefore specific data that require special attention. Not all elements of a system or report are considered KDEs.
Why identify them?
As we have seen above, cataloging all the data in the organization is challenging and time-consuming, so the scope must be limited. Creating a use case is an excellent first step. This approach can be complemented by the key data approach: identifying this key data allows us to focus our actions on the data that really matters to the organization or to the audience targeted by the use case.
How to identify them?
To identify key data, it is necessary to make an inventory of the available data and to measure the criticality of each item. To do this, we propose the following scoring framework:
| Question | Value |
| --- | --- |
| Has the data been identified as a KDE by a domain expert? | 10 |
| Is the data necessary to link systems? | 4 |
| Is the data necessary to build a report or make decisions? | 4 |
| Does the quality of this data impact customers? | 4 |
| Is the data used in many reports? | 2 |
| Is the data identified as a source for another KDE? | 2 |
| Does the quality of the data have a direct impact on the data modeling? | 2 |
| Is the data identified by an external organization? | 2 |
| Is the data used for segmentation purposes? | 2 |
| Is the data made up of personal data? | 2 |
This framework can, of course, be extended or amended according to the constraints of the organization.
The score of a data item is then simply the sum of the "values" of every condition it meets.
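As an illustration, the scoring rule can be sketched in a few lines of Python. The weights are copied from the table; the criterion names (`links_systems`, `contains_personal_data`, and so on), the `kde_score` function, and the `customer_email` example are hypothetical and only serve this sketch.

```python
# Weights copied from the scoring table; the key names are hypothetical labels.
KDE_WEIGHTS = {
    "identified_by_domain_expert": 10,
    "links_systems": 4,
    "needed_for_report_or_decision": 4,
    "quality_impacts_customers": 4,
    "used_in_many_reports": 2,
    "source_for_another_kde": 2,
    "quality_impacts_data_modeling": 2,
    "identified_by_external_organization": 2,
    "used_for_segmentation": 2,
    "contains_personal_data": 2,
}


def kde_score(criteria_met):
    """Sum the weights of every criterion the data item meets."""
    met = set(criteria_met)  # a criterion counts once, even if listed twice
    unknown = met - set(KDE_WEIGHTS)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    return sum(KDE_WEIGHTS[name] for name in met)


# Hypothetical data item: links systems, impacts customers, personal data.
customer_email = ["links_systems", "quality_impacts_customers", "contains_personal_data"]
print(kde_score(customer_email))  # 4 + 4 + 2 = 10
```

Keeping the weights in a single dictionary makes it easy to amend the framework for a given organization without touching the scoring logic.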
All data with a score of 10 or higher are then KDEs, for which documentation actions must be carried out as a priority. The KDEs can be classified even more finely using the score obtained. For example:
| Obtained score | Priority |
| --- | --- |
| [10 - 14] | 3 |
| [15 - 17] | 2 |
| [18 - 20] | 1 |
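The classification above can be sketched as a simple lookup. This is a minimal sketch, assuming the bands are inclusive at both ends; `PRIORITY_BANDS` and `kde_priority` are hypothetical names. Since the example table stops at 20, the sketch leaves scores outside the listed bands unclassified rather than guessing a priority for them.

```python
# Score bands copied from the example priority table: (low, high, priority).
# Assumption: both bounds are inclusive.
PRIORITY_BANDS = [
    (10, 14, 3),
    (15, 17, 2),
    (18, 20, 1),
]


def kde_priority(score):
    """Return the documentation priority for a score, or None if no band matches."""
    for low, high, priority in PRIORITY_BANDS:
        if low <= score <= high:
            return priority
    # Score is below the first band or above the last band of the example table.
    return None


for score in (8, 10, 16, 20):
    print(score, kde_priority(score))
```

An organization that amends the scoring framework would adjust or extend the bands in the same place, for instance to cover scores above 20.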
How to apply this methodology?
The evaluation of KDEs follows an iterative and incremental method in 5 steps.
