This article explains what labelled data is and when to use this option.

 

When you create a new column in your Data Garden, the Main type field defines what type of data the column holds and how it is used within the Data Garden. One of the options is Labelled Data. In the CYS system, you can use labelled data to categorize values and keep your data ‘clean’. This is useful when imported values that mean the same thing are not written consistently.


For example, the values “NL”, “the Netherlands” and “The Netherlands” can be grouped into one category called “Netherlands”.


Another use is combining stores into regions. If the organizational structure changes in the future, you will have to assign stores to new regions. Labelled data allows you to do that, so your data stays structured and useful.


The following example shows how to create your labelled data:


In the Data Garden – Database, open the Definitions tab and click the Add column definition button. A box pops up in which you can fill in the necessary information.


 


Header Name: this is the name of your column.

Main type: choose Labelled Data.

Sub type: choose one of the 4 options (IS, CONTAINS, SOUNDEX and REGEX).

  • IS: the imported value must match the value you add exactly.
  • CONTAINS: the imported value must contain the value you add.
  • SOUNDEX and REGEX: group your data based on Soundex (values that sound alike) or a regular expression (values that match a pattern).
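
To make the difference between the four sub types concrete, here is a minimal Python sketch of how such matching rules could work. It is not the CYS implementation: the function names `soundex` and `matches` are hypothetical, and the sketch assumes that IS and CONTAINS are case-sensitive and that SOUNDEX uses the standard American Soundex algorithm.

```python
import re

# Standard American Soundex digit for each consonant.
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of a word ("" if it has no letters)."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = _SOUNDEX_CODES.get(ch, "")  # vowels have no digit
        if code and code != prev:
            encoded += code
        prev = code
    return (encoded + "000")[:4]


def matches(sub_type: str, rule_value: str, imported_value: str) -> bool:
    """Check one imported value against one code book value (hypothetical helper)."""
    if sub_type == "IS":
        return imported_value == rule_value  # assumption: case-sensitive
    if sub_type == "CONTAINS":
        return rule_value in imported_value  # assumption: case-sensitive
    if sub_type == "SOUNDEX":
        return soundex(imported_value) == soundex(rule_value)
    if sub_type == "REGEX":
        return re.search(rule_value, imported_value) is not None
    raise ValueError(f"Unknown sub type: {sub_type}")
```

With these assumptions, `matches("CONTAINS", "Netherlands", "The Netherlands")` is true, while `matches("IS", "NL", "nl")` is false because IS requires an exact match.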

 

After saving your definition, you are ready to set up your code book, so the system knows how to handle your data.


You can download an example Excel file which you can use to set up your code book. This file contains a code (numerical), the label for that code, the sub type (e.g. CONTAINS) and all values that need to be translated into that category. You can upload your code book and the system will show the codes and values you have imported. If necessary, you can always import a new version to update your code book.
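
As a rough illustration of the information a code book holds, the Python sketch below reads a small code book from CSV text and groups the values per code. The column names (`code`, `label`, `sub_type`, `value`) and the one-row-per-value layout are assumptions made for this sketch; the example Excel file you download may be laid out differently.

```python
import csv
import io

# Hypothetical code book content: one row per value that must be translated.
CODE_BOOK_CSV = """code,label,sub_type,value
1,Netherlands,IS,NL
1,Netherlands,IS,the Netherlands
1,Netherlands,IS,The Netherlands
"""


def load_code_book(text: str) -> dict:
    """Group code book rows by their numerical code."""
    book = {}
    for row in csv.DictReader(io.StringIO(text)):
        entry = book.setdefault(
            int(row["code"]),
            {"label": row["label"], "sub_type": row["sub_type"], "values": []},
        )
        entry["values"].append(row["value"])
    return book


code_book = load_code_book(CODE_BOOK_CSV)
```

Here `code_book[1]` holds the label Netherlands, the sub type IS and the three values that are translated into that category.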


For the example above, where several values should be grouped together with label “Netherlands”, this is the corresponding code book:



If data with the value NL is imported, the CYS system will now automatically label this data as code 1 with the label Netherlands. It will do the same with the values the Netherlands and The Netherlands. So even though your imported values differ, everything in your project is now labelled the same way and you can use a single notation while analyzing and reporting on your data.

If you import data with a value that is not in the code book, the system automatically creates a new code for that value.
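
The labelling behaviour described above can be sketched in Python as follows. This is only an illustration, not how the CYS system is built: the function `label_values` is hypothetical, and the sketch assumes that a new code is the next unused number and that the new label is the imported value itself. The value Holland is a made-up example of a value that is missing from the code book.

```python
def label_values(values, lookup):
    """Label imported values; unknown values get a new code (added to lookup)."""
    # Assumption: a new code is simply the next unused number.
    next_code = max((code for code, _ in lookup.values()), default=0) + 1
    labelled = []
    for value in values:
        if value not in lookup:
            # Assumption: the new label is the imported value itself.
            lookup[value] = (next_code, value)
            next_code += 1
        code, label = lookup[value]
        labelled.append((value, code, label))
    return labelled


# Exact-match (IS) lookup built from the Netherlands example.
lookup = {
    "NL": (1, "Netherlands"),
    "the Netherlands": (1, "Netherlands"),
    "The Netherlands": (1, "Netherlands"),
}

result = label_values(["NL", "The Netherlands", "Holland"], lookup)
```

In this sketch, NL and The Netherlands both end up as code 1 with the label Netherlands, while Holland receives the new code 2.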