Metadata

File-level metadata associated with a data file are collected in a ReadStatMeta, while variable-level metadata associated with each data column are collected in ReadStatColMetas. These metadata objects are stored in a ReadStatTable along with the data columns and can be accessed via methods compatible with DataAPI.jl.

File-Level Metadata

Each ReadStatTable contains a ReadStatMeta for file-level metadata.

ReadStatTables.ReadStatMeta - Type
ReadStatMeta <: AbstractMetaDict

A collection of file-level metadata associated with a data file processed with ReadStat.

Metadata can be retrieved and modified from the associated ReadStatTable via methods compatible with DataAPI.jl. A dictionary-like interface is also available for directly working with ReadStatMeta.

Fields

  • row_count::Int: number of rows returned by the ReadStat parser; -1 if not available in the metadata; may reflect the value set with the row_limit parser option rather than the actual number of rows in the data file.
  • var_count::Int: number of data columns returned by the ReadStat parser.
  • creation_time::DateTime: timestamp for file creation.
  • modified_time::DateTime: timestamp for file modification.
  • file_format_version::Int: version number of file format.
  • file_format_is_64bit::Bool: indicator for 64-bit file format; only relevant to SAS.
  • compression::readstat_compress_t: file compression mode; only relevant to certain file formats.
  • endianness::readstat_endian_t: endianness of data file.
  • table_name::String: name of the data table; only relevant to .xpt format.
  • file_label::String: label of data file.
  • file_encoding::String: character encoding of data file.
  • notes::Vector{String}: notes attached to data file.
  • file_ext::String: file extension of data file.

To retrieve the ReadStatMeta from the ReadStatTable:

julia> metadata(tb)
ReadStatMeta:
  row count           => 5
  var count           => 7
  modified time       => 2021-04-22T21:36:00
  file format version => 118
  file label          => A test file
  file extension      => .dta

The value associated with a specific metadata key can be retrieved via:

julia> metadata(tb, "file_label")
"A test file"

julia> metadata(tb, "file_label", style=true)
("A test file", :note)

To obtain a complete list of metadata keys:

julia> metadatakeys(tb)
("row_count", "var_count", "creation_time", "modified_time", "file_format_version", "file_format_is_64bit", "compression", "endianness", "table_name", "file_label", "file_encoding", "notes", "file_ext")

Metadata contained in a ReadStatMeta can be modified, optionally with a metadata style set at the same time:

julia> metadata!(tb, "file_label", "A file label", style=:default)
ReadStatMeta:
  row count           => 5
  var count           => 7
  modified time       => 2021-04-22T21:36:00
  file format version => 118
  file label          => A file label
  file extension      => .dta

Since ReadStatMeta has a dictionary-like interface, one can also directly work with it:

julia> m = metadata(tb)
ReadStatMeta:
  row count           => 5
  var count           => 7
  modified time       => 2021-04-22T21:36:00
  file format version => 118
  file label          => A file label
  file extension      => .dta

julia> keys(m)
KeySet for a ReadStatMeta with 13 entries. Keys:
  "row_count"
  "var_count"
  "creation_time"
  "modified_time"
  "file_format_version"
  "file_format_is_64bit"
  "compression"
  "endianness"
  "table_name"
  "file_label"
  "file_encoding"
  "notes"
  "file_ext"

julia> m["file_label"]
"A file label"

julia> m["file_label"] = "A new file label"
"A new file label"

julia> copy(m)
Dict{String, Any} with 13 entries:
  "file_ext"             => ".dta"
  "file_encoding"        => ""
  "file_label"           => "A new file label"
  "var_count"            => 7
  "row_count"            => 5
  "modified_time"        => DateTime("2021-04-22T21:36:00")
  "file_format_version"  => 118
  "file_format_is_64bit" => true
  "table_name"           => ""
  "creation_time"        => DateTime("2021-04-22T21:36:00")
  "endianness"           => READSTAT_ENDIAN_LITTLE
  "compression"          => READSTAT_COMPRESS_NONE
  "notes"                => String[]
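
To illustrate what a dictionary-like interface over a fixed set of fields can look like, here is a minimal sketch in plain Julia. The type ToyMeta and its two fields are hypothetical and chosen only for illustration; this is not the package's actual ReadStatMeta or AbstractMetaDict implementation. The sketch assumes that string keys map one-to-one to field names.

```julia
# Hypothetical toy type for illustration only; not the package's ReadStatMeta.
mutable struct ToyMeta <: AbstractDict{String,Any}
    row_count::Int
    file_label::String
end

# The number of entries equals the number of fields.
Base.length(::ToyMeta) = fieldcount(ToyMeta)

# Look up a field by its string key, returning `default` for unknown keys.
function Base.get(m::ToyMeta, key::AbstractString, default)
    name = Symbol(key)
    return hasfield(ToyMeta, name) ? getfield(m, name) : default
end

# Indexing throws a KeyError for keys that are not field names.
function Base.getindex(m::ToyMeta, key::AbstractString)
    name = Symbol(key)
    hasfield(ToyMeta, name) || throw(KeyError(key))
    return getfield(m, name)
end

# Assignment converts the value to the declared field type before storing it.
function Base.setindex!(m::ToyMeta, value, key::AbstractString)
    name = Symbol(key)
    hasfield(ToyMeta, name) || throw(KeyError(key))
    setfield!(m, name, convert(fieldtype(ToyMeta, name), value))
    return m
end

# Iterate over "field name" => value pairs in field order.
function Base.iterate(m::ToyMeta, i::Int=1)
    i > fieldcount(ToyMeta) && return nothing
    name = fieldname(ToyMeta, i)
    return (String(name) => getfield(m, name), i + 1)
end

m = ToyMeta(5, "A test file")
m["file_label"] = "A new file label"   # updates the field in place
c = copy(m)                            # generic AbstractDict copy into a new dictionary
```

With only these methods defined, generic functions such as keys and copy work through the AbstractDict fallbacks in Base, which is one plausible reason a copy of such an object comes back as an ordinary dictionary rather than the original type.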

Variable-Level Metadata

A ReadStatColMeta is associated with each data column for variable-level metadata.

ReadStatTables.ReadStatColMeta - Type
ReadStatColMeta <: AbstractMetaDict

A collection of variable-level metadata associated with a data column processed with ReadStat.

Metadata can be retrieved and modified from the associated ReadStatTable via methods compatible with DataAPI.jl. A dictionary-like interface is also available for directly working with ReadStatColMeta, but it does not allow modifying metadata values. An alternative way to retrieve and modify the metadata is via colmetavalues.

Fields

  • label::String: variable label.
  • format::String: variable format.
  • type::readstat_type_t: original variable type recognized by ReadStat.
  • vallabel::Symbol: name of the dictionary of value labels associated with the variable; see also getvaluelabels for the effect of modifying this field.
  • storage_width::Csize_t: variable storage width in data file.
  • display_width::Cint: width for display.
  • measure::readstat_measure_t: measure type of the variable; only relevant to SPSS.
  • alignment::readstat_alignment_t: variable display alignment.

To retrieve the ReadStatColMeta for a specified data column contained in a ReadStatTable:

julia> colmetadata(tb, :mylabl)
ReadStatColMeta:
  label         => labeled
  format        => %16.0f
  type          => READSTAT_TYPE_INT8
  value label   => mylabl
  storage width => 1
  display width => 16
  measure       => READSTAT_MEASURE_UNKNOWN
  alignment     => READSTAT_ALIGNMENT_RIGHT

The value associated with a specific metadata key can be retrieved via:

julia> colmetadata(tb, :mylabl, "label")
"labeled"

julia> colmetadata(tb, :mylabl, "label", style=true)
("labeled", :note)

To obtain a complete list of metadata keys:

julia> colmetadatakeys(tb, :mylabl)
("label", "format", "type", "vallabel", "storage_width", "display_width", "measure", "alignment")

Metadata contained in a ReadStatColMeta can be modified, optionally with a metadata style set at the same time:

julia> colmetadata!(tb, :mylabl, "label", "A variable label", style=:default)
ColMetaIterator{ReadStatColMeta} with 7 entries:
  :mychar => ReadStatColMeta(character, %-1s)
  :mynum  => ReadStatColMeta(numeric, %16.2f)
  :mydate => ReadStatColMeta(date, %td)
  :dtime  => ReadStatColMeta(datetime, %tc)
  :mylabl => ReadStatColMeta(A variable label, %16.0f)
  :myord  => ReadStatColMeta(ordinal, %16.0f)
  :mytime => ReadStatColMeta(time, %tcHH:MM:SS)

A ReadStatColMeta also has a dictionary-like interface:

julia> m = colmetadata(tb, :mylabl)
ReadStatColMeta:
  label         => A variable label
  format        => %16.0f
  type          => READSTAT_TYPE_INT8
  value label   => mylabl
  storage width => 1
  display width => 16
  measure       => READSTAT_MEASURE_UNKNOWN
  alignment     => READSTAT_ALIGNMENT_RIGHT

julia> keys(m)
KeySet for a ReadStatColMeta with 8 entries. Keys:
  "label"
  "format"
  "type"
  "vallabel"
  "storage_width"
  "display_width"
  "measure"
  "alignment"

julia> m["label"]
"A variable label"

julia> copy(m)
Dict{String, Any} with 8 entries:
  "label"         => "A variable label"
  "format"        => "%16.0f"
  "display_width" => 16
  "measure"       => READSTAT_MEASURE_UNKNOWN
  "alignment"     => READSTAT_ALIGNMENT_RIGHT
  "type"          => READSTAT_TYPE_INT8
  "storage_width" => 0x0000000000000001
  "vallabel"      => :mylabl

However, it cannot be modified directly via setindex!:

julia> m["label"] = "A new label"
ERROR: MethodError: no method matching setindex!(::ReadStatColMeta, ::String, ::String)

Closest candidates are:
  setindex!(::AbstractDict, ::Any, ::Any, !Matched::Any, !Matched::Any...)
   @ Base abstractdict.jl:550

Instead, since the metadata associated with each key are stored consecutively in arrays internally, one may directly access the underlying array for a given metadata key:

julia> v = colmetavalues(tb, "label")
7-element Vector{String}:
 "character"
 "numeric"
 "date"
 "datetime"
 "A variable label"
 "ordinal"
 "time"

Notice that changing any value in the array returned above will affect the corresponding ReadStatColMeta:

julia> colmetadata(tb, :mychar, "label")
"character"

julia> v[1] = "char"
"char"

julia> colmetadata(tb, :mychar, "label")
"char"
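
The behavior above follows from storing each metadata key as one array shared by all columns. The following is a minimal sketch of that layout in plain Julia. The names ToyColMetaStore, ToyColMeta, getmeta and metavalues are hypothetical and exist only for this illustration; they are not the package's internal types or functions.

```julia
# Hypothetical toy types for illustration only; not the package's internals.

# One vector per metadata key, with one element per data column.
struct ToyColMetaStore
    label::Vector{String}
    format::Vector{String}
end

# A lightweight per-column view: it holds the shared store and a column index.
struct ToyColMeta
    store::ToyColMetaStore
    col::Int
end

# Read a value by looking up the column's slot in the shared vector.
getmeta(m::ToyColMeta, key::Symbol) = getfield(m.store, key)[m.col]

# Return the underlying vector for one key (the role colmetavalues plays in the text).
metavalues(s::ToyColMetaStore, key::Symbol) = getfield(s, key)

store = ToyColMetaStore(["character", "numeric"], ["%-1s", "%16.2f"])
mychar = ToyColMeta(store, 1)

v = metavalues(store, :label)
v[1] = "char"               # mutate the shared vector in place
getmeta(mychar, :label)     # the per-column view now reads the updated value
```

Because the per-column object keeps no copy of its own, any in-place change to the vector is visible through every view that refers to it.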

Metadata Styles

Metadata styles provide additional information on how the metadata should be processed in certain scenarios. ReadStatTables.jl does not require such information. However, specifying metadata styles can be useful when the metadata need to be transferred to some other object (e.g., DataFrame from DataFrames.jl). Packages that implement metadata-related methods compatible with DataAPI.jl are able to recognize the metadata contained in ReadStatTable.

By default, metadata on labels and notes have the :note style; all other metadata have the :default style. Keys for metadata with user-specified styles, along with those that have the :note style by default, are recorded in a dictionary:

julia> metastyle(tb)
Dict{Symbol, Symbol} with 4 entries:
  :label      => :default
  :vallabel   => :note
  :notes      => :note
  :file_label => :default

All metadata associated with keys not listed above are of :default style. To modify the metadata style for those associated with a given key:

julia> metastyle!(tb, "modified_time", :note)
Dict{Symbol, Symbol} with 5 entries:
  :label         => :default
  :vallabel      => :note
  :notes         => :note
  :modified_time => :note
  :file_label    => :default

The same method is also used for variable-specific metadata. However, since the styles are only determined by the metadata keys, metadata associated with the same key always have the same style and hence are not distinguished across different columns.

julia> metastyle!(tb, "label", :default)
Dict{Symbol, Symbol} with 5 entries:
  :label         => :default
  :vallabel      => :note
  :notes         => :note
  :modified_time => :note
  :file_label    => :default

julia> colmetadata(tb, :mychar, "label", style=true)
("char", :default)

julia> colmetadata(tb, :mynum, "label", style=true)
("numeric", :default)
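
The key-based lookup described in this section can be sketched in a few lines of plain Julia. The helper names getstyle and setstyle! are hypothetical and used only for illustration; the sketch assumes a single dictionary keyed by metadata key with a fallback to :default, which matches the behavior shown above but is not taken from the package's source.

```julia
# Hypothetical sketch for illustration only; not the package's implementation.

# Keys that start with the :note style, as described in the text.
styles = Dict{Symbol,Symbol}(
    :file_label => :note,
    :notes      => :note,
    :label      => :note,
    :vallabel   => :note,
)

# Keys absent from the dictionary fall back to :default.
getstyle(styles, key) = get(styles, Symbol(key), :default)

# A style is recorded under the key alone, so it applies to every column.
function setstyle!(styles, key, style::Symbol)
    styles[Symbol(key)] = style
    return styles
end

before = getstyle(styles, "modified_time")   # not listed, so falls back to :default
setstyle!(styles, "modified_time", :note)
after = getstyle(styles, "modified_time")    # now recorded explicitly
```

Since no column name enters the lookup, two columns that share a key such as "label" necessarily share its style.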

ReadStatTables.metastyle - Function
metastyle(tb::ReadStatTable, [key::Union{Symbol, AbstractString}])

Return the specified style(s) of all metadata for table tb. If a metadata key is specified, only the style for the associated metadata is returned. By default, metadata on labels and notes have the :note style; all other metadata have the :default style.

The style of metadata is only determined by key and hence is not distinguished across different columns.

ReadStatTables.metastyle! - Function
metastyle!(tb::ReadStatTable, key::Union{Symbol, AbstractString}, style::Symbol)

Set the style of all metadata associated with key to style for table tb.

The style of metadata is only determined by key and hence is not distinguished across different columns.
