YAML: Use mapped list vs array

Solution 1:

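With the first option (a mapping keyed by ID), YAML itself enforces that there are no duplicate IDs, because the specification requires mapping keys to be unique. Therefore, an editor with YAML support may help your user by flagging the duplicate as they type. Note that not every parser enforces this rule; PyYAML, for instance, silently keeps the last value. With the second option (a list of entries that each carry an ID), you need to check uniqueness in your own code, and the user only sees the error when your application loads the syntactically correct YAML.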
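With the second option, the uniqueness check might look like the following minimal Python sketch. It assumes the file has already been loaded into native data structures (for example with PyYAML's `yaml.safe_load`), so the list arrives as a list of dicts; the `find_duplicate_ids` helper, the `id` field name, and the sample data are hypothetical.

```python
from collections import Counter


def find_duplicate_ids(items, id_field="id"):
    """Return the IDs that occur more than once, in first-seen order.

    `items` is the list of dicts produced by loading the second option
    (a YAML sequence of mappings that each carry an ID field).
    """
    counts = Counter(item[id_field] for item in items)
    # Counter is a dict subclass, so its keys keep first-seen order.
    return [item_id for item_id, n in counts.items() if n > 1]


# Hypothetical data, as a loader would return it for the second option.
datasets = [
    {"id": "rate_variation", "type": "POINTS_2D"},
    {"id": "frequency_variation", "type": "POINTS_2D"},
    {"id": "rate_variation", "type": "POINTS_2D"},  # duplicate ID
]

duplicates = find_duplicate_ids(datasets)
if duplicates:
    print("duplicate dataset IDs:", duplicates)
```

Under the first option the equivalent check cannot run after loading at all: a strict parser rejects the duplicate key itself, while a lenient one keeps a single entry, so the duplicate is already gone by the time your code sees the data.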

However, there are other factors to consider. One is which in-memory data structures you prefer. If you use a standard YAML implementation that deserializes to native data structures (PyYAML, SnakeYAML, etc.), the YAML structure dictates the type of the in-memory data structure (you can customize this by writing custom constructors, but that is not trivial). For example, if you want to ask a dataset object for its ID, that is only directly possible with the second structure; with the first structure, you would need to search the parent mapping for the dataset value you hold in order to find its key, which is its ID.
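The difference between the two in-memory structures can be sketched in Python, assuming a loader that maps YAML mappings to dicts and sequences to lists (as PyYAML's `safe_load` does). The dataset names and the `id_of` helper are hypothetical.

```python
# First option: a mapping keyed by ID. The ID exists only as a dict key.
by_key = {
    "rate_variation": {"type": "POINTS_2D"},
    "frequency_variation": {"type": "POINTS_2D"},
}

# Second option: a list of mappings. Each dataset carries its own ID.
as_list = [
    {"id": "rate_variation", "type": "POINTS_2D"},
    {"id": "frequency_variation", "type": "POINTS_2D"},
]


def id_of(dataset, parent):
    """Find the key under which `dataset` is stored in `parent` (first option).

    The dataset dict does not know its own ID, so we scan the parent
    mapping and compare by identity, not by equality.
    """
    for key, value in parent.items():
        if value is dataset:
            return key
    raise KeyError("dataset not found in parent mapping")


dataset = by_key["frequency_variation"]
print(id_of(dataset, by_key))  # needs the parent mapping

dataset = as_list[1]
print(dataset["id"])           # the ID is available directly
```

Comparing by identity (`is`) matters here: both sample datasets have equal contents, so an `==` comparison would return the wrong key.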

So the final answer is, as always: it depends. Think about what you want to do with it. For simple configuration files, my second argument may carry less weight than my first, but I don't know exactly how you plan to use the data.

Solution 2:

Quick Answer (TL;DR)

YAML can be normalized cleanly and straightforwardly using the YAML ddconfig format.
Using this approach can simplify the construction and maintenance of configuration files and make them flexible enough for later use by many types of consuming applications.

Detailed Answer

Context

Data normalization (aka YAML schema definition) with the YAML ddconfig format
- (tag:[email protected],2017:ddconfig)
- dmid://uu773yamldata1620421509

Problem

Scenario: Developer graille_stentiplub is creating a configuration file format for use with YAML.
- the data structure (i.e., schema) for the YAML must be flexible for use in many contexts.
- the schema should be amenable to arbitrary and flexible queries where the structure of the YAML does not "get in the way".
- the schema should be easy for humans to read and understand.
- the schema should be easily manipulated by any programming environment capable of processing standard YAML.
Special considerations: graille_stentiplub wants an easy way to determine when to use lists vs mappings.

Example

The following is a simple config file in the YAML ddconfig format:

  dataroot:

      file_metadata_str: |
        ### <beg-block>
        ### - caption: "my first project"
        ###   notes:  |
        ###     * href="//home/sm/docs/workup/my_first_project.txt"
        ### <end-block>

      project_info:
        prj_name_nice:        StackOverflow Demo Answer Project
        prj_name_mach:        stackoverflow_demo_001a
        prj_sponsor_url:      https://stackoverflow.com/questions/54349286
        prj_dept_url:         https://demo-university.edu/dept/basketweaving

      dataset_recipient_list:
        - [email protected]
        - [email protected]
        - [email protected]

      dataset_variations_table:
          -   dvar_id:            rate_variation
              dvar_name:          Rate variation over time      # Optional
              dvar_description:   Description here              # Optional
              dvar_type:          POINTS_2D
              dvar_opt_refresh_per_second: 5                    # Time in seconds

          -   dvar_id:            frequency_variation
              dvar_name:          Frequency variation over time
              dvar_description:   Description here              # Optional
              dvar_type:          POINTS_2D

Explanation

The entire data structure is nested under a top-level key called dataroot (this is optional).
- Including the dataroot key makes the YAML structure more addressable, but it is not necessary.
- Using a filesystem analogy, you can think of dataroot as a root-level directory.
- Using an XML analogy, you can think of this as the root-level XML tag.
The entire data structure consists of a YAML mapping (aka dictionary, aka associative array).
- Every mapping key is a first-level child of dataroot (or a top-level key if dataroot is omitted).
There are different types of mapping keys:
- String: (suffix _str) indicates that the mapped value is a string (aka scalar) value.
- List: (suffix _list) indicates the mapped value is a list (aka sequence).
- Info: (suffix _info) indicates the mapped value is a mapping (aka dictionary, aka associative array).
- Table: (suffix _table) indicates the mapped value is a sequence-of-mappings (aka table).
- Tree: (suffix _tree) indicates a composite structure with support for one or more nested parent-child relationships.
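Because the suffix announces the kind of value, a consuming application can check (or dispatch on) each entry from the key name alone. A minimal Python sketch, assuming the file has been loaded into native dicts and lists; the `kind_of`, `matches_kind`, and `mismatched_keys` helpers and the sample data are hypothetical, and the check is deliberately shallow (it does not enforce the no-nesting rules).

```python
SUFFIX_KINDS = {
    "_str": "str",
    "_list": "list",
    "_info": "info",
    "_table": "table",
    "_tree": "tree",
}


def kind_of(key):
    """Return the ddconfig kind implied by a key's suffix, or None."""
    for suffix, kind in SUFFIX_KINDS.items():
        if key.endswith(suffix):
            return kind
    return None


def matches_kind(value, kind):
    """Shallow type check of a loaded value against its ddconfig kind."""
    if kind == "str":
        return isinstance(value, str)
    if kind == "list":
        return isinstance(value, list)
    if kind == "info":
        return isinstance(value, dict)
    if kind == "table":
        return isinstance(value, list) and all(isinstance(r, dict) for r in value)
    return True  # "tree" and unknown suffixes are not constrained here


def mismatched_keys(dataroot):
    """List the keys whose values do not match the kind their suffix announces."""
    return [k for k, v in dataroot.items() if not matches_kind(v, kind_of(k))]


# Hypothetical loaded dataroot with one deliberate mistake.
dataroot = {
    "file_metadata_str": "### <beg-block>\n### <end-block>\n",
    "project_info": {"prj_name_mach": "stackoverflow_demo_001a"},
    "dataset_variations_table": [{"dvar_id": "rate_variation"}],
    "broken_table": {"dvar_id": "oops"},  # a mapping, not a sequence
}

print(mismatched_keys(dataroot))  # ['broken_table']
```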

Rationale

The YAML ddconfig format fits well with many different contexts and tools.
This simplifies decision making when laying out the configuration file format, as well as the programming needed to parse the file.

Simplicity

A _list mapping consists of a sequence of scalar-value items with no nesting.
An _info mapping consists of scalar keys and scalar values (name-value pairs) with no nesting.
A _table mapping is simply a sequence of _info mappings.
Nesting of arbitrary depth can be accomplished through YAML anchors and aliases, thus supporting the _tree composite data structure.
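The no-nesting rules for _list, _info, and _table can be written as small predicates over loaded data. This is a sketch under the assumption that a scalar is any of `str`, `int`, `float`, `bool`, `None`, or a date/time value (the types PyYAML's safe loader typically produces for plain scalars); the function names are hypothetical, and _tree values are not checked.

```python
import datetime

# Assumed set of scalar types after loading (datetime.datetime is a subclass of date).
SCALARS = (str, int, float, bool, type(None), datetime.date)


def is_scalar(value):
    return isinstance(value, SCALARS)


def is_flat_list(value):
    """_list: a sequence of scalar items, no nesting."""
    return isinstance(value, list) and all(is_scalar(v) for v in value)


def is_flat_info(value):
    """_info: scalar keys mapped to scalar values, no nesting."""
    return isinstance(value, dict) and all(
        is_scalar(k) and is_scalar(v) for k, v in value.items()
    )


def is_table(value):
    """_table: simply a sequence of _info mappings."""
    return isinstance(value, list) and all(is_flat_info(row) for row in value)


print(is_flat_list(["a", "b"]))                                    # True
print(is_flat_info({"prj_name_mach": "stackoverflow_demo_001a"}))  # True
print(is_table([{"dvar_id": "rate_variation",
                 "dvar_opt_refresh_per_second": 5}]))              # True
print(is_table([{"dvar_id": "x", "tags": ["a", "b"]}]))            # False: nested list
```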

Similarity to relational databases

You can think of a ddconfig _info mapping as a single record from a standard table in a relational database.
You can think of a ddconfig _table mapping as a standard table in a relational database.
This similarity makes it straightforward to load the YAML data into a database if and where necessary.
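The correspondence can be illustrated with Python's built-in `sqlite3` module: each row of a _table value becomes a database row, the columns are the union of the row keys, and missing optional keys become NULL. A minimal sketch, assuming the table has already been loaded into a list of dicts; the `load_table` helper, the table name, and the in-memory database are illustrative only. Column and table names are interpolated into the SQL, so they should come only from your own trusted schema.

```python
import sqlite3


def load_table(conn, table_name, rows):
    """Create `table_name` from a ddconfig _table (a list of flat dicts) and insert its rows.

    Columns follow the row keys in first-seen order; keys missing from a
    row (optional fields) are stored as NULL.
    """
    columns = list(dict.fromkeys(key for row in rows for key in row))
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table_name}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table_name}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )


# Part of dataset_variations_table from the example, as loaded data.
dataset_variations_table = [
    {"dvar_id": "rate_variation", "dvar_type": "POINTS_2D",
     "dvar_opt_refresh_per_second": 5},
    {"dvar_id": "frequency_variation", "dvar_type": "POINTS_2D"},
]

conn = sqlite3.connect(":memory:")
load_table(conn, "dataset_variations", dataset_variations_table)
query = ("SELECT dvar_id, dvar_opt_refresh_per_second "
         "FROM dataset_variations ORDER BY dvar_id")
for row in conn.execute(query):
    print(row)
```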

Anchors and aliases

The YAML ddconfig format works well with YAML anchors and aliases.
One or more _info mappings can be easily converted to a _table mapping by way of aliases.
Multiple _info mappings can be combined together into another _info mapping by way of YAML merge keys.
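In native data terms, an alias refers to the node its anchor names (many loaders, PyYAML among them, return the same Python object for both), and a merge key (`<<`) inserts the merged mappings' entries into the current mapping, with the current mapping's own keys taking precedence and, for a list of merged mappings, earlier entries winning over later ones. Merge keys come from the YAML 1.1 type repository, so loader support varies. The sketch below emulates those two rules on plain Python dicts to show the resulting structures; it is not a YAML loader, and the `merge_info` helper and sample mappings are hypothetical.

```python
def merge_info(local, *merged):
    """Emulate `<<: [*a, *b, ...]` on flat _info mappings.

    Keys written in `local` win over merged keys; among the merged
    mappings, earlier ones win over later ones.
    """
    result = {}
    for source in reversed(merged):  # apply later sources first...
        result.update(source)        # ...so earlier sources overwrite them
    result.update(local)             # explicit keys in the current mapping win
    return result


# Hypothetical anchored _info mappings (think &defaults and &fast in YAML).
defaults_info = {"dvar_type": "POINTS_2D", "dvar_description": "Description here"}
fast_info = {"dvar_description": "Fast refresh", "dvar_opt_refresh_per_second": 5}

# Roughly what a loader builds for: {<<: [*defaults, *fast], dvar_id: rate_variation}
rate_info = merge_info({"dvar_id": "rate_variation"}, defaults_info, fast_info)
frequency_info = merge_info({"dvar_id": "frequency_variation"}, defaults_info)

# Roughly what a loader builds for: [*rate, *frequency]
# The table rows are the anchored objects themselves, not copies.
variations_table = [rate_info, frequency_info]

print(rate_info["dvar_description"])     # Description here (defaults listed first)
print(variations_table[0] is rate_info)  # True
```

Because aliased rows share one object, changing `rate_info` in memory also changes the corresponding table row; copy the dict first if the rows must stay independent.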