Introducing obsinfo

Philosophy and comparison to other systems

obsinfo is a system to create standard seismological metadata files (currently StationXML), as well as processing flows specific to ocean bottom seismometer (OBS) data. Its basic philosophy is:

  1. Break down every component of the system into “atomic”, non-repetitive units.

  2. Follow StationXML structure where possible, but:

    1. Add entities missing from StationXML where necessary

    2. Use appropriate units for each component (for example, specifying the offset for a digital filter, not the delay, which depends on the sampling rate)

  3. Allow full specification of a deployment using text files, for repeatability and provenance

File formats

Compared to StationXML files

  • Minimizes repeated information

    • for example, in StationXML

      • Each channel could have the same datalogger but all of the datalogger specifications are repeated for each channel.

      • Within a channel’s response itself, several of the stages may be identical (except for the offset).

  • Eliminates fields that can be calculated from other fields, such as:

    • The <InstrumentSensitivity> field, which depends on the Stages that follow

    • The <Delay> for a digital filter stage, which can be calculated from <Offset> * <Factor> / <InputSampleRate>

Compared to RESP files

RESP files (mostly used in the Nominal Response Library) are just text representations of the Dataless SEED files that preceded the StationXML standard, so they share the repetitive nature of StationXML files and add the complexity of a non-standard text format.

Compared to AROL

The Atomic Response Objects Library (AROL) replaces the RESP-based Nominal Response Library in the new YASMINE system. Its files use the same atomic concept and YAML structure as obsinfo. In fact, the AROL format was based on a previous version of obsinfo and we try to keep the two compatible.

AROL lacks the subnetwork, station and instrumentation levels as these are assembled by YASMINE.

Metadata creation systems

Compared to PDCC

PDCC is a graphical user interface allowing one to assemble different components (sensors, dataloggers, amplifiers) and then add in deployment information. Components can be added from the Nominal Response Library (NRL), which combines RESP files with textual configuration files which allow the user to select the exact component and configuration they used. obsinfo uses a fully textual description of instruments and deployments rather than a graphical user interface.

Compared to IRIS DMC IRISWS

I don’t know much about this system. It appears to be a web service for obtaining component responses, but I’m not sure how they are meant to be assembled. It may simply be a more modern way to access the NRL components, intended for newer systems.

Compared to YASMINE

YASMINE is a new StationXML metadata creation tool. Its major difference from PDCC is its use of atomic response files, which should be compatible with obsinfo files. It provides a graphical user interface (YASMINE-EDITOR) and a command-line interface (YASMINE-CLI). The major differences from obsinfo are the lack of instrumentation, station and subnetwork levels, as well as of processing information such as instrument clock drift.

File formats

All information files can be written in YAML or JSON format. Use whichever you prefer. YAML is generally easier for humans to write and read, whereas JSON is easier for computers. The tutorial includes a section describing YAML files as used in obsinfo (tutorial:tutorial-1). There are many sites for converting from one format to the other and for validating either format, including this json-to-yaml-convertor and this yaml-validator.

The Tutorial

This training course is meant to accompany an instructor. The tutorial provides a more detailed step-by-step explanation and we refer to sections of the Tutorial throughout this training course.

Structural units

A full obsinfo subnetwork description consists of the following entities (starred fields are optional):

format_version: {}
*revision: {}
*notes: []
subnetwork:
    network: {}
    operators: []
    *restricted_status: <string>
    *comments: []
    *extras: {}
    *reference_names: {}
    stations:
        <STATIONNAME1>:
            site: <string>
            start_date: <string>
            end_date: <string>
            locations: {}
            location_code: <string>
            instrumentation:
                base:
                    equipment: {}
                    channels:
                        default:
                            *orientation: <string or {}>
                            datalogger:
                                << GENERIC_COMPONENT
                                *configuration: <string>
                                sample_rate: <number>
                                *correction: <number>
                            *preamplifier:
                                *<< GENERIC_COMPONENT
                                *configuration: <string>
                            sensor:
                                << GENERIC_COMPONENT
                                seed_codes:
                                *configuration: <string>
                            *location_code: <string> # otherwise inherits from station
                            *comments: []
                            *extras: {}
                        <SPECIFIC-CHANNEL1>: {}
                        <SPECIFIC-CHANNEL2>: {}
                        ...
                *serial_number: <string>
                *modifications: {}
                *channel_modifications: {}
            *notes: []
            *comments: []
            *operators: []
            *extras: {}
            *processing:
                - *clock_correction_linear: {}
                - *clock_correction_leapsecond: {}
        <STATIONNAME2>:
            ...

Where GENERIC_COMPONENT is:

equipment: {}
*configuration_default: <string>
*configurations: {}
*stage_modifications: {}
*notes: []
*stages:
    - stage:
        base:
            input_units: <string>
            output_units: <string>
            gain: <number>
            *name: <string>
            *description: <string>
            *decimation_factor: <integer>
            *delay: <number>
            *calibration_date: <string>
            *polarity: '+' or '-'     # default is '+'
            *input_sample_rate: <number>
            *filter:
                type: <string>
                <fields depending on type>
        *configuration: <string>
        *modifications: {}
    - stage:
    - ...

And FILTER is:

type: <string>  # one of "PolesZeros", "FIR", "Coefficients",
                # "ResponseList", "Polynomial", "ADConversion",
                # "Analog", "Digital"
*description: <string>
*delay.samples: <number>  # for all except "Analog" and "PolesZeros"
*delay.seconds: <number>  # for "Analog" and "PolesZeros"
# other parameters specific to the specified type

This could all be in one file, in which case there would be little benefit over StationXML. The power of obsinfo comes from the ability to put any sub-entity into a separate file, which is called from the parent file using the $ref field.
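
The $ref mechanism described above can be illustrated with a minimal Python sketch. This is not obsinfo's own implementation: the FILES dictionary stands in for parsed YAML/JSON files, the network content ("code": "XX") is a made-up placeholder, and the resolver only handles a single-level "#field" fragment.

```python
import copy

# Hypothetical in-memory "files", standing in for parsed YAML/JSON documents.
FILES = {
    "subnetwork.yaml": {
        "format_version": "0.111",
        "subnetwork": {
            "network": {"$ref": "networks/xxx.network.yaml#network"},
        },
    },
    "networks/xxx.network.yaml": {
        "format_version": "0.111",
        "network": {"code": "XX"},  # placeholder content, for illustration only
    },
}


def resolve_refs(node):
    """Recursively replace {"$ref": "file#field"} objects with the referenced content."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            filename, _, fragment = node["$ref"].partition("#")
            target = FILES[filename]
            if fragment:
                # Keep only the named top-level field of the referenced file.
                target = target[fragment]
            # The referenced content may itself contain further $ref fields.
            return resolve_refs(copy.deepcopy(target))
        return {key: resolve_refs(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item) for item in node]
    return node


resolved = resolve_refs(FILES["subnetwork.yaml"])
print(resolved["subnetwork"]["network"])
```

Because the reference names a field inside the target file, the target file's own format_version stays at its base level and is not copied into the parent.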

Standard file levels are: subnetwork, instrumentation_base, datalogger_base, preamplifier_base, sensor_base, stage_base and filter. The schema files are defined at these same levels, allowing the command-line tool obsinfo-validate to validate any file ending with {one of the above}.{yaml,json}. Other elements often put into separate files are author, location_base, network_info and operator.
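
As an illustration of the naming convention above, the following Python sketch extracts the file level from a file name. The function name file_type is hypothetical and this is not how obsinfo-validate is necessarily implemented; it only assumes the {level}.{yaml,json} ending described in the text.

```python
from pathlib import PurePath

# File levels listed in the text above.
FILE_TYPES = {
    "subnetwork", "instrumentation_base", "datalogger_base",
    "preamplifier_base", "sensor_base", "stage_base", "filter",
}


def file_type(filename):
    """Return the level encoded in a name like 'xxx.sensor_base.yaml', or None."""
    parts = PurePath(filename).name.split(".")
    if len(parts) < 2 or parts[-1] not in ("yaml", "json"):
        return None
    # The level is the dot-separated element just before the extension.
    return parts[-2] if parts[-2] in FILE_TYPES else None


print(file_type("sensors/xxx.sensor_base.yaml"))
print(file_type("networks/xxx.network.yaml"))
```

A file such as xxx.network.yaml returns None here because network is not one of the standard levels with its own schema.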

A common file structure is then (this time showing only the required fields):

  • a subnetwork file:

    format_version: <string>
    subnetwork:
        operators: []
        network: {$ref: networks/xxx.network.yaml#network}
        stations:
            <STATIONNAME1>:
                site: <string>
                start_date: <string>
                end_date: <string>
                location_code: <string>
                instrumentation:
                    base: {$ref: instrumentations/xxx.instrumentation_base.yaml#instrumentation_base}
                locations: {}
            <STATIONNAME2>:
                ...
            <STATIONNAME3>:
                ...
            ...
    
  • instrumentation_base files:

    format_version: <string>
    instrumentation_base:
        equipment: {}
        channels:
            default:
                datalogger: {base: {$ref: dataloggers/xxx.datalogger_base.yaml#datalogger_base}}
                sensor: {base: {$ref: sensors/xxx.sensor_base.yaml#sensor_base}}
            <SPECIFIC-CHANNEL1>: {}
            <SPECIFIC-CHANNEL2>: {}
            ...
    
  • datalogger_base files:

    format_version: <string>
    datalogger_base:
        << GENERIC_COMPONENT
        sample_rate: <number>
    
  • sensor_base files:

    format_version: <string>
    sensor_base:
        << GENERIC_COMPONENT
        seed_codes:
    
  • stage_base files:

    format_version: <string>
    stage_base:
        input_units : {}
        output_units : {}
        gain : {}
        filter :
            type : <string>
    
  • filter files:

    There are five filter types corresponding directly to their StationXML analogues: PolesZeros, FIR, Coefficients, ResponseList and Polynomial. Three other types allow simpler information entry:

    • Analog: An analog stage with no filtering (translated to a StationXML PolesZeros stage without any poles or zeros)

    • Digital: A digital stage with no filtering (translated to a StationXML Coefficients stage without any coefficients)

    • ADConversion: like an analog stage, plus information about input voltage and output counts limits

    For examples, see Information_Files/{datalogger, preamplifier, sensor}/stages/filters. PolesZeros example:

    ---
    format_version: "0.111"
    filter:
        type: "PolesZeros"
        transfer_function_type: "LAPLACE (RADIANS/SECOND)"
        zeros:
           - '0.0 + 0.0j'
           - '0.0 + 0.0j'
        poles:
           - '19.99 + 19.99j'
           - '19.99 - 19.99j'
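
The poles and zeros in the example above are written as strings. The following Python sketch shows one way such strings could be turned into complex numbers and used to evaluate the unnormalized transfer function; it is an illustration only, not obsinfo's own parsing code, and the helper names to_complex and response are hypothetical.

```python
import math

# The filter from the example above, as it would look once the file is parsed.
filter_info = {
    "type": "PolesZeros",
    "transfer_function_type": "LAPLACE (RADIANS/SECOND)",
    "zeros": ["0.0 + 0.0j", "0.0 + 0.0j"],
    "poles": ["19.99 + 19.99j", "19.99 - 19.99j"],
}


def to_complex(text):
    """Convert a string such as '19.99 - 19.99j' to a complex number."""
    # Python's complex() rejects embedded spaces, so strip them first.
    return complex(text.replace(" ", ""))


def response(info, frequency_hz):
    """Evaluate prod(s - zero) / prod(s - pole) at s = j*2*pi*f, without normalization."""
    s = 2j * math.pi * frequency_hz
    numerator = 1 + 0j
    for zero in info["zeros"]:
        numerator *= s - to_complex(zero)
    denominator = 1 + 0j
    for pole in info["poles"]:
        denominator *= s - to_complex(pole)
    return numerator / denominator


print(abs(response(filter_info, 1.0)))
```

With two zeros at the origin and two poles, the magnitude is zero at 0 Hz and tends toward 1 at frequencies well above the corner.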
    

You don’t actually need to put the information in each file under a field named after the file type. If you didn’t, you would save a little typing, as you could specify, for example,

{$ref: xxx.datalogger_base.yaml}

instead of:

{$ref: xxx.datalogger_base.yaml#datalogger_base}

But the second style is preferred, as it allows the files to contain useful provenance and version information at the base level. To encourage you to use the second style, obsinfo-validate only accepts this style.

Comments, notes and extras

Comments and notes are both lists of text.

comments can be entered at the subnetwork, station and channel levels and will be transformed into StationXML comments at the same level.

notes will not go into the StationXML file; they are for your information only. They can be entered at the base, station, and component levels.

extras is a free object-based field. It can be used to add fields that may be useful in a future version of obsinfo. Nothing there is put into the StationXML file unless the obsinfo software is specifically updated to do so (which allows new fields without breaking compatibility or schema rules). extras can be entered at the subnetwork, station or channel level.

Configurations, channel modifications and shortcuts

Components can have pre-defined configurations, and their internal values can be modified from higher levels.

The simplest and most common example is specifying each station’s sampling rate, which is done as follows:

modifications:
    datalogger: {configuration: "125sps"}

Configurations

Configurations modify parameters in a given component according to an existing configuration_definition in the component’s information file.

Allowed fields are:

  • datalogger_configuration

  • sensor_configuration

  • preamplifier_configuration

Configurations can be specified at the following levels, in order of priority:

  1. station:channel_modifications

  2. instrumentation:channels:{CHNAME}

  3. instrumentation:channels:default

Configurations are defined in the component information files under the configuration_definition field.
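
The priority order listed above can be sketched in Python as follows. This is a simplified illustration, not obsinfo's actual logic: each level is assumed to be a plain dictionary that may contain one of the allowed fields, and the function name select_configuration is hypothetical.

```python
def select_configuration(field, station_channel_mods, channel_specific, channel_default):
    """Return the value of `field` from the highest-priority level that defines it.

    Levels are checked in the order given in the text:
    station:channel_modifications, instrumentation:channels:{CHNAME},
    then instrumentation:channels:default.
    """
    for level in (station_channel_mods, channel_specific, channel_default):
        if field in level:
            return level[field]
    return None


# The station-level value overrides the instrumentation-level ones.
chosen = select_configuration(
    "datalogger_configuration",
    {"datalogger_configuration": "125sps"},  # station:channel_modifications
    {},                                      # instrumentation:channels:{CHNAME}
    {"datalogger_configuration": "250sps"},  # instrumentation:channels:default
)
print(chosen)
```

The "250sps" value is a made-up configuration name used only to show the override.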

Channel Modifications

channel_modifications directly modify one or more parameters in a given element. This gives complete control to the user but assumes knowledge of the obsinfo hierarchy.

Details of channel_modifications are provided in the Advanced Topics section advanced/chan_mods

Shortcuts

datalogger_configuration, preamplifier_configuration and sensor_configuration are actually shortcuts for common channel_modifications. Shortcuts are hard-coded into obsinfo to allow simpler representation of common configurations or modifications. Others may be added, including XX_serial_number, where XX could be datalogger, sensor, preamplifier or instrumentation.

Other sources

  • Channel modifications are described briefly in /tutorial/tutorial-3:channel modifications and in detail in Channel modifications

  • Component configurations are described in /tutorial/tutorial-4:configurations and /tutorial/tutorial-5:configuration definitions and /tutorial/tutorial-6:datalogger configuration definitions

Details

  • Referenced files are searched for starting at the paths given in the ~/.obsinforc file
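
A search over a list of base paths can be sketched in Python as follows. This sketch does not read ~/.obsinforc (its format is not described here); the paths are passed in as a plain list, the first match is assumed to win, and the function name find_referenced_file is hypothetical.

```python
from pathlib import Path
import tempfile


def find_referenced_file(relative_name, search_paths):
    """Return the first existing search_path/relative_name, or None if nothing matches."""
    for base in search_paths:
        candidate = Path(base) / relative_name
        if candidate.is_file():
            return candidate
    return None


# Demonstration with two temporary directories standing in for configured paths.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    target = Path(second) / "sensors" / "xxx.sensor_base.yaml"
    target.parent.mkdir(parents=True)
    target.write_text("")  # empty placeholder file
    found = find_referenced_file("sensors/xxx.sensor_base.yaml", [first, second])
    print(found == target)
```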

delay, offset, and correction

One area where obsinfo differs from StationXML is in its handling of delays in digital filters. StationXML (and RESP) have three parameters in each stage, relating to the time delay created by the stage, in each Stage’s Decimation section:

offset

Sample offset chosen for use. If the first sample is used, set this field to zero. If the second sample, set it to 1, and so forth.

delay

The estimated pure delay for the stage (in seconds). This value will almost always be positive to indicate a delayed signal.

correction

The time shift, if any, applied to correct for the delay at this stage. The sign convention used is opposite the <Delay> value; a positive sign here indicates that the trace was corrected to an earlier time to cancel the delay caused by the stage and indicated in the <Delay> element.

StationXML specifies the delay for each stage, leaving the offset equal to zero. A digital filter’s true delay is in samples, not seconds, meaning that the delay will depend on the sampling rate.

obsinfo’s atomic philosophy does not allow a variable delay (in seconds) when there is a constant delay (in samples). obsinfo puts delay at the stage level but offset at the filter level. For digital filters, offset should be filled with the delay in samples and delay should not be provided.
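
The dependence on sampling rate can be seen with a short Python calculation. The 29-sample delay is a made-up value for illustration, and delay_seconds is a hypothetical helper, not an obsinfo function.

```python
def delay_seconds(delay_samples, input_sample_rate):
    """Convert a filter delay from samples to seconds at the given input sample rate (Hz)."""
    return delay_samples / input_sample_rate


# The same digital filter (constant delay in samples) used at three input rates.
for rate in (125.0, 250.0, 500.0):
    print(rate, delay_seconds(29, rate))
```

The delay in samples is a property of the filter alone, whereas the delay in seconds changes every time the filter is used at a different rate, which is why only the former belongs in an atomic filter file.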

Command-line files

All of the command-line files start with obsinfo-, so if you have a decent shell you should be able to see them by typing obsinfo<TAB>.

  • obsinfo-makeStationXML makes StationXML files from an obsinfo subnetwork file and its dependencies

  • obsinfo-validate validates subnetwork, instrumentation, datalogger, sensor, preamplifier, stage and filter files

  • obsinfo-print

  • obsinfo-print_version

  • obsinfo-setup creates the .obsinforc file and can also create an example database.

  • obsinfo-test runs a series of validation tests

The different obsinfo-makescripts-* command-line scripts are used for making IPGP-specific data processing flows, as described below. They could be used as a basis for creating your own data processing flows.

The directory obsinfo/obsinfo/addons/ contains programs to create processing scripts using the information in the subnetwork files.

This is addressed in more detail in the training_course/4_advanced module.