YAML (and JSON) files

Information files are written in YAML or JSON, mostly YAML. YAML is a markup language that permits users to encode data in a structured format which can be shared, written and read by many applications using text files (rather than binary ones), a process known as serialization of data. It is one of the standard tools for this purpose, others being JSON and XML, which tend to be more verbose and harder for humans to read and write. This tutorial will center on YAML, but it is readily translatable to JSON for users fluent in the use of that markup language, which can, at any rate, be learned here. Keep in mind YAML is a superset of JSON, so some functionality is not readily implemented in the latter.

This is not a YAML tutorial. We freely mix required YAML syntax with best practices advocated for obsinfo and leave out many aspects of the language. For people wanting to get acquainted with YAML, a number of resources are available, such as this.

Basic YAML syntax

YAML files, as in the other markup languages mentioned, are structured hierarchically. The basic structure is the key-value pair, which permits to assign a value retrievable by key (and easily readable by a human user), such as:

last_name  : "Presley"
first_name : "Elvis"

Being hierarchical, these key-value pairs can be nested:

artist_name:
  last_name  : "Presley"
  first_name : "Elvis"

Space indentation is used in YAML to nest key-value pairs. NEVER use tabs. All key-value pairs at the same level must be equally indented. As a convention, two spaces are used for indentation.

YAML uses three dashes “—” to separate different streams of data. Always put as the first line in your file the three dashes as a best practice. Each obsinfo information file has one data stream. YAML files should have a .yaml extension.

YAML data types

There are other observations for the little piece of code above. First, the data types. Scalars can be either a number, a boolean (True or False as values) or a string, enclosed in double quotes. Numbers can be integers (without decimal point) or floating point (with decimal point).

Other data structures include lists and dictionaries. A list is simply an enumeration of elements which can be any data type, enclosed in brackets or listed in separate indented lines which start by dashes. The following are equivalent:

die_toss: [1,2,3,4,5,6]

die_toss:
 - 1
 - 2
 - 3
 - 4
 - 5
 - 6

Dictionaries are a collection of key-value pairs. They can either be indented, as above (last_name: “Presley” and first_name: “Elvis” are actually elements of a dictionary value associated with the key artist_name) or as enumerations enclosed in curly brackets, or, again, as dashes. As a side note, the curly brackets syntax is also used in JSON. The following are equivalent:

artist_name:
  last_name  : "Presley"
  first_name : "Elvis"

artist_name: { last_name  : "Presley", first_name : "Elvis"}

YAML variables

YAML is case sensitive. obsinfo uses keys in lower case with words separated by underscores. The only exceptions are the “ADCONVERSION”, “DIGITAL” and “ANALOG” filter types

Code reuse

The “$ref” special variable, a feature of JSON, is used in obsinfo to import the content of another file into the current file:

revision:
   date: "2018-06-01"
   authors:
       - $ref: "Wayne_Crawford.author.yaml#author"

In this example, only the part of the file corresponding to the key author will be included. To save space, if the key is the same as the file_type (second-to-last suffix), you don’t have to include it, so the above could be written as:

revision:
   date: "2018-06-01"
   authors:
       - $ref: "Wayne_Crawford.author.yaml"

$ref references will totally override all other keys at their level. For example, if we had another field in the same list element:

revision:
   date: "2018-06-01"
   authors:
       - $ref: "Alfred_Wegener.author.yaml"
         email: Alfred_Wegenerd@yahoo.de

the email field would disappear in the final result.

Contrast this with YAML anchors, to be discussed next.

YAML anchors

YAML anchors are used to avoid repetition, according to the DRY “don’t repeat yourself”) principle. They were important in early obsinfo versions, but are now almost completely replaced by “$ref” instances. The following description is for interested, advanced users: the obsinfo authors no longer use YAML anchors in their own files. In the following example, an anchor is defined which has the value of a dictionary:

yaml_anchors:
   obs_clock_correction_linear_defaults: &LINEAR_CLOCK_DEFAULTS
       time_base: "Seascan MCXO, ~1e-8 nominal drift"
       reference: "GPS"
       start_sync_instrument: 0

Further down the information file the following appears in several places wiwth different values for the start_sync_reference, end_sync_reference and end_sync_instrument keys:

processing:
  - clock_correction_linear_drift:
      base:
          <<: *LINEAR_CLOCK_DEFAULTS
      start_sync_reference: "2015-04-21T21:06:00Z"
      end_sync_reference: "2016-05-28T20:59:00.32Z"
      end_sync_instrument: "2016-05-28T20:59:03Z"

When an anchor is referenced with a star (*) it’s called an alias and has the effect of replacing the alias by the anchor definition. The effect will be:

processing:
  - clock_correction_linear:
      base:
        time_base: "Seascan MCXO, ~1e-8 nominal drift"
        reference: "GPS"
        start_sync_instrument: 0
      start_sync_reference: "2015-04-21T21:06:00Z"
      end_sync_reference: "2016-05-28T20:59:00.32Z"
      end_sync_instrument: "2016-05-28T20:59:03Z"

Furthermore, the << label above indicates that this is a mapping. If no fields with the similar name appear under the alias, it simply replaces the alias by the anchor. But if there are fields such as time_base under the alias, like this:

- clock_correction_linear:
    base:
       <<: *LINEAR_CLOCK_DEFAULTS
       time_base: "unavailable"

those fields will be overriden (IS THIS TRUE??), so the effect of the latter piece of code is the following:

- clock_correction_linear_drift:
                    time_base: "Seascan MCXO, ~1e-8 nominal drift"
                    reference: "GPS"
                    start_sync_instrument: 0

overriding the value of time_base.

All this allows for code reuse without needing an external file as in $ref.

Next page, Information File Structure, discusses how to start creating obsinfo information files.