YAML: Use mapped list vs array
Solution 1:
With the first option (a mapping keyed by ID), YAML itself requires the IDs to be unique, because a mapping may not contain duplicate keys. An editor with YAML support may therefore help your user by flagging a duplicate ID as an error. With the second option (a list of items that each carry an ID field), you need to check uniqueness in your own code, and the user only sees the error when the syntactically correct YAML is loaded into your application.
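For the second option, the uniqueness check has to live in application code. A minimal sketch in Python, assuming the list has already been loaded into native structures and that each item stores its ID under a hypothetical "id" key (the data below is made up for illustration):

```python
from collections import Counter

def find_duplicate_ids(items, id_key="id"):
    """Return the IDs that occur more than once in a list of mappings."""
    counts = Counter(item[id_key] for item in items)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical data, shaped like the list-based option after loading.
datasets = [
    {"id": "rate_variation", "type": "POINTS_2D"},
    {"id": "frequency_variation", "type": "POINTS_2D"},
    {"id": "rate_variation", "type": "POINTS_3D"},
]

duplicates = find_duplicate_ids(datasets)
if duplicates:
    print("duplicate IDs:", ", ".join(duplicates))
```

A check like this runs only at load time, which is the drawback described above: the file is valid YAML, so an editor has nothing to complain about.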
However, there are other factors to consider. For example, you may have a preference for the resulting in-memory data structures. If you use a standard YAML implementation that deserializes to native data structures (PyYAML, SnakeYAML, etc.), the YAML structure dictates the type of the in-memory data structure (you can change this by writing custom constructors, but that is not trivial). For example, if you want to ask a dataset object for its ID, that is only directly possible with the second structure, where each item carries its own ID. With the first structure, you would need to search the parent mapping for the dataset value you have in order to get its ID.
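To make the difference concrete, here is a rough Python sketch of what each structure looks like once loaded. The key names and the helper functions find_id and to_records are hypothetical, and the data is written as plain dictionaries rather than loaded through a YAML library:

```python
# Hypothetical data shaped like the first option: IDs are mapping keys.
by_id = {
    "rate_variation": {"name": "Rate variation over time"},
    "frequency_variation": {"name": "Frequency variation over time"},
}

def find_id(parent, dataset):
    """Search the parent mapping for the key whose value is this dataset object."""
    for key, value in parent.items():
        if value is dataset:
            return key
    raise KeyError("dataset is not part of this mapping")

def to_records(parent, id_key="id"):
    """Convert the mapping into a list of records that each carry their own ID."""
    return [{id_key: key, **value} for key, value in parent.items()]

dataset = by_id["frequency_variation"]
print(find_id(by_id, dataset))   # the lookup needs access to the parent mapping

records = to_records(by_id)
print(records[1]["id"])          # the record itself knows its ID
```

Converting once after loading, as to_records does, is one way to keep the duplicate-key protection of the mapping in the file while still working with self-describing records in code.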
So the final answer is (as always): it depends. Think about what you want to do with the data. For simple configuration files, my second argument may carry less weight than my first one, but I don't know exactly what you want to do with the data.
Solution 2:
Quick Answer (TL;DR)
- YAML can be normalized quite cleanly and in a straightforward manner using the YAML ddconfig format.
- Using this approach can simplify construction and maintenance of configuration files, and make them highly flexible for later use by many types of consuming applications.
Detailed Answer
Context
- Data normalization (aka YAML schema definition) with the YAML ddconfig format
  - (tag:[email protected],2017:ddconfig)
- dmid://uu773yamldata1620421509
Problem
- Scenario: Developer graille_stentiplub is creating a configuration file format for use with YAML.
  - The data structure (i.e., schema) for the YAML must be flexible for use in many contexts.
  - The schema should be amenable to arbitrary and flexible queries where the structure of the YAML does not "get in the way".
  - The schema should be easy to read and understand by humans.
  - The schema should be easily manipulated by any programming environment capable of processing standard YAML.
- Special considerations: graille_stentiplub wants an easy way to determine when to use lists vs mappings.
Example
- The following is a simple config file using the YAML ddconfig format:

    dataroot:
      file_metadata_str: |
        ### <beg-block>
        ### - caption: "my first project"
        ###   notes: |
        ###     * href="//home/sm/docs/workup/my_first_project.txt"
        ### <end-block>
      project_info:
        prj_name_nice: StackOverflow Demo Answer Project
        prj_name_mach: stackoverflow_demo_001a
        prj_sponsor_url: https://stackoverflow.com/questions/54349286
        prj_dept_url: https://demo-university.edu/dept/basketweaving
      dataset_recipient_list:
        - [email protected]
        - [email protected]
        - [email protected]
      dataset_variations_table:
        - dvar_id: rate_variation
          dvar_name: Rate variation over time  # Optional
          dvar_description: Description here  # Optional
          dvar_type: POINTS_2D
          dvar_opt_refresh_per_second: 5  # Time in seconds
        - dvar_id: frequency_variation
          dvar_name: Frequency variation over time
          dvar_description: Description here  # Optional
          dvar_type: POINTS_2D
Explanation
- The entire data structure is nested under a toplevel key called dataroot (this is optional).
  - Inclusion of the dataroot key makes the YAML structure more addressable but is not necessary.
  - Using a filesystem analogy, you can think of dataroot as a root-level directory.
  - Using an XML analogy, you can think of this as the root-level XML tag.
- The entire data structure consists of a YAML mapping (aka dictionary) (aka associative array).
  - Every mapping key is a first-level child of dataroot (or else a toplevel key if dataroot is omitted).
- There are different types of mapping keys:
  - String: (suffix _str) indicates that the mapped value is a string (aka scalar) value.
  - List: (suffix _list) indicates the mapped value is a list (aka sequence).
  - Info: (suffix _info) indicates the mapped value is a mapping (aka dictionary) (aka associative array).
  - Table: (suffix _table) indicates the mapped value is a sequence-of-mappings (aka table).
  - Tree: (suffix _tree) indicates a composite structure with support for one or more nested parent-child relationships.
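Because the suffix announces the shape of the value, a consuming program can check a loaded file against the convention. The following is only a sketch of that idea in Python: the function check_suffixes, the key "broken_table" and the sample data are hypothetical, the config is written as a plain dictionary rather than loaded through a YAML library, and _tree keys are left unchecked.

```python
# Shape checks per suffix; each takes the loaded value and returns True or False.
EXPECTED = {
    "_str": lambda v: isinstance(v, str),
    "_list": lambda v: isinstance(v, list),
    "_info": lambda v: isinstance(v, dict),
    "_table": lambda v: isinstance(v, list) and all(isinstance(r, dict) for r in v),
}

def check_suffixes(mapping):
    """Return the keys whose loaded value does not match the key's suffix."""
    bad = []
    for key, value in mapping.items():
        for suffix, ok in EXPECTED.items():
            if key.endswith(suffix) and not ok(value):
                bad.append(key)
    return bad

# Hypothetical loaded config; "broken_table" deliberately violates the convention.
config = {
    "file_metadata_str": "### <beg-block>\n### <end-block>\n",
    "project_info": {"prj_name_mach": "stackoverflow_demo_001a"},
    "dataset_recipient_list": ["first recipient", "second recipient"],
    "dataset_variations_table": [{"dvar_id": "rate_variation"}],
    "broken_table": "this should have been a sequence of mappings",
}

print(check_suffixes(config))
```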
Rationale
- The YAML ddconfig format coincides nicely with many different contexts and tools.
- This allows for simplified decision making when laying out the configuration file format, as well as simplified programming when parsing the file.
Simplicity
- A _list mapping consists of a sequence of scalar-value items with no nesting.
- An _info mapping consists of a scalar-key and a scalar-value (name-value pairs) with no nesting.
- A _table mapping is simply a sequence of _info mappings.
- Nesting of arbitrary depth can be accomplished through YAML anchors and aliases, thus supporting the _tree composite data structure.
Similarity to relational databases
- You can think of a ddconfig _info mapping as a single record from a standard table in a relational database.
- You can think of a ddconfig _table mapping as a standard table in a relational database.
- This similarity makes it extremely straightforward to transmit YAML to a database if and where necessary.
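As an illustration of the database analogy, the sketch below copies a loaded _table value into an in-memory SQLite table using Python's standard sqlite3 module. The helper load_table and the table name "dataset_variations" are hypothetical, and the rows are written out by hand rather than loaded from a YAML file:

```python
import sqlite3

# Hypothetical rows, shaped like a loaded _table value (a list of flat mappings).
dataset_variations_table = [
    {"dvar_id": "rate_variation", "dvar_name": "Rate variation over time",
     "dvar_type": "POINTS_2D", "dvar_opt_refresh_per_second": 5},
    {"dvar_id": "frequency_variation", "dvar_name": "Frequency variation over time",
     "dvar_type": "POINTS_2D"},
]

def load_table(conn, table_name, rows):
    """Create a table from the union of the row keys and insert every row."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    # Identifiers are quoted, but names should still come from a trusted config.
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table_name}" ({column_sql})')
    conn.executemany(
        f'INSERT INTO "{table_name}" ({column_sql}) VALUES ({placeholders})',
        # Keys missing from a row become NULL in that row.
        [tuple(row.get(c) for c in columns) for row in rows],
    )

conn = sqlite3.connect(":memory:")
load_table(conn, "dataset_variations", dataset_variations_table)
ids = [r[0] for r in conn.execute(
    "SELECT dvar_id FROM dataset_variations ORDER BY dvar_id")]
print(ids)
```

Each _info mapping becomes one row, and optional keys that a row omits simply end up as NULL, which is why the flat, no-nesting rule makes the transfer mechanical.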
Anchors and aliases
- The YAML ddconfig format works well with YAML anchors and aliases.
- One or more _info mappings can be easily converted to a _table mapping by way of aliases.
- Multiple _info mappings can be combined together into another _info mapping by way of YAML merge keys.
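In loaders that build native structures, an alias typically resolves to the anchored node itself, and a merge key behaves much like a dictionary merge in which locally defined keys win. The sketch below mirrors that in plain Python without a YAML library; the key names are hypothetical, and each comment shows the YAML that the following statement is meant to imitate:

```python
# defaults_info: &defaults
#   dvar_type: POINTS_2D
#   dvar_opt_refresh_per_second: 5
defaults_info = {"dvar_type": "POINTS_2D", "dvar_opt_refresh_per_second": 5}

# rate_info: &rate
#   <<: *defaults
#   dvar_id: rate_variation
rate_info = {**defaults_info, "dvar_id": "rate_variation"}

# frequency_info: &frequency
#   <<: *defaults
#   dvar_id: frequency_variation
#   dvar_opt_refresh_per_second: 1    (a local key overrides the merged one)
frequency_info = {
    **defaults_info,
    "dvar_id": "frequency_variation",
    "dvar_opt_refresh_per_second": 1,
}

# dataset_variations_table:
#   - *rate
#   - *frequency
dataset_variations_table = [rate_info, frequency_info]

for row in dataset_variations_table:
    print(row["dvar_id"], row["dvar_opt_refresh_per_second"])
```

Note that merge keys come from the YAML 1.1 type repository rather than the core specification, so support depends on the YAML implementation in use.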
See also
- github link https://github.com/dreftymac/trypublic/search?q=uu773yamldata1620421509