Metadata
File-level metadata associated with a data file are collected in a ReadStatMeta, while variable-level metadata associated with each data column are collected in ReadStatColMetas. These metadata objects are stored in a ReadStatTable along with the data columns and can be accessed via methods compatible with DataAPI.jl.
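As a minimal sketch of where the two kinds of metadata live, the snippet below reads a file with readstat and pulls out both objects. It assumes the sample Stata file used in the examples on this page is available at data/sample.dta (adjust the path for your own files); the column name :mylabl comes from that sample file.

```julia
using ReadStatTables

# Assumed location of the sample Stata file used throughout this page;
# replace it with the path to your own data file.
tb = readstat("data/sample.dta")

# File-level metadata: a single ReadStatMeta for the whole table.
filemeta = metadata(tb)

# Variable-level metadata: one ReadStatColMeta per data column.
colmeta = colmetadata(tb, :mylabl)

println(filemeta["file_ext"])   # dictionary-style access to a file-level key
println(colmeta["label"])       # dictionary-style access to a column-level key
```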
File-Level Metadata
Each ReadStatTable contains a ReadStatMeta for file-level metadata.
ReadStatTables.ReadStatMeta (Type)
ReadStatMeta <: AbstractMetaDict
A collection of file-level metadata associated with a data file processed with ReadStat.
Metadata can be retrieved and modified from the associated ReadStatTable via methods compatible with DataAPI.jl. A dictionary-like interface is also available for directly working with ReadStatMeta.
Fields
- row_count::Int: number of rows returned by the ReadStat parser; -1 if not available in the metadata; may reflect the value set with the row_limit parser option instead of the actual number of rows in the data file.
- var_count::Int: number of data columns returned by the ReadStat parser.
- creation_time::DateTime: timestamp for file creation.
- modified_time::DateTime: timestamp for file modification.
- file_format_version::Int: version number of the file format.
- file_format_is_64bit::Bool: indicator for 64-bit file format; only relevant to SAS.
- compression::readstat_compress_t: file compression mode; only relevant to certain file formats.
- endianness::readstat_endian_t: endianness of the data file.
- table_name::String: name of the data table; only relevant to the .xpt format.
- file_label::String: label of the data file.
- file_encoding::String: character encoding of the data file.
- notes::Vector{String}: notes attached to the data file.
- file_ext::String: file extension of the data file.
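One way to inspect every field listed above at once is to loop over metadatakeys and ask for each value together with its style. This is a sketch assuming the same sample file path as the other examples; the second part only illustrates the row_limit caveat for row_count, so it prints the value without assuming what it will be.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file

# Print each file-level metadata key with its value and metadata style.
for key in metadatakeys(tb)
    value, style = metadata(tb, key; style=true)
    println(rpad(key, 22), " => ", repr(value), " (", style, ")")
end

# row_count may reflect the row_limit parser option rather than the
# number of rows actually stored in the file.
limited = readstat("data/sample.dta"; row_limit=2)
println("row_count with row_limit=2: ", metadata(limited, "row_count"))
```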
To retrieve the ReadStatMeta from the ReadStatTable:
julia> metadata(tb)
ReadStatMeta:
  row count => 5
  var count => 7
  modified time => 2021-04-22T21:36:00
  file format version => 118
  file label => A test file
  file extension => .dta
The value associated with a specific metadata key can be retrieved via:
julia> metadata(tb, "file_label")
"A test file"
julia> metadata(tb, "file_label", style=true)
("A test file", :note)
To obtain a complete list of metadata keys:
julia> metadatakeys(tb)
("row_count", "var_count", "creation_time", "modified_time", "file_format_version", "file_format_is_64bit", "compression", "endianness", "table_name", "file_label", "file_encoding", "notes", "file_ext")
Metadata contained in a ReadStatMeta can be modified, optionally with a metadata style set at the same time:
julia> metadata!(tb, "file_label", "A file label", style=:default)
ReadStatMeta:
  row count => 5
  var count => 7
  modified time => 2021-04-22T21:36:00
  file format version => 118
  file label => A file label
  file extension => .dta
Since ReadStatMeta has a dictionary-like interface, one can also work with it directly:
julia> m = metadata(tb)
ReadStatMeta:
  row count => 5
  var count => 7
  modified time => 2021-04-22T21:36:00
  file format version => 118
  file label => A file label
  file extension => .dta
julia> keys(m)
KeySet for a ReadStatMeta with 13 entries. Keys:
  "row_count"
  "var_count"
  "creation_time"
  "modified_time"
  "file_format_version"
  "file_format_is_64bit"
  "compression"
  "endianness"
  "table_name"
  "file_label"
  "file_encoding"
  "notes"
  "file_ext"
julia> m["file_label"]
"A file label"
julia> m["file_label"] = "A new file label"
"A new file label"
julia> copy(m)
Dict{String, Any} with 13 entries:
  "file_ext" => ".dta"
  "file_encoding" => ""
  "file_label" => "A new file label"
  "var_count" => 7
  "row_count" => 5
  "modified_time" => DateTime("2021-04-22T21:36:00")
  "file_format_version" => 118
  "file_format_is_64bit" => true
  "table_name" => ""
  "creation_time" => DateTime("2021-04-22T21:36:00")
  "endianness" => READSTAT_ENDIAN_LITTLE
  "compression" => READSTAT_COMPRESS_NONE
  "notes" => String[]
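Because copy returns a plain Dict{String, Any}, it can serve as a detached snapshot: editing the copy leaves the ReadStatMeta untouched. The sketch below assumes the same sample file path as before; the label strings are arbitrary.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file
m = metadata(tb)

# Modify a value through the dictionary-like interface of ReadStatMeta.
m["file_label"] = "Edited via setindex!"

# copy returns an independent Dict{String, Any}.
snapshot = copy(m)
snapshot["file_label"] = "Edited in the copy only"

println(m["file_label"])         # the ReadStatMeta keeps its own value
println(snapshot["file_label"])
```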
Variable-Level Metadata
A ReadStatColMeta is associated with each data column for variable-level metadata.
ReadStatTables.ReadStatColMeta (Type)
ReadStatColMeta <: AbstractMetaDict
A collection of variable-level metadata associated with a data column processed with ReadStat.
Metadata can be retrieved and modified from the associated ReadStatTable via methods compatible with DataAPI.jl. A dictionary-like interface is also available for directly working with ReadStatColMeta, but it does not allow modifying metadata values. An alternative way to retrieve and modify the metadata is via colmetavalues.
Fields
- label::String: variable label.
- format::String: variable format.
- type::readstat_type_t: original variable type recognized by ReadStat.
- vallabel::Symbol: name of the dictionary of value labels associated with the variable; see also getvaluelabels for the effect of modifying this field.
- storage_width::Csize_t: variable storage width in the data file.
- display_width::Cint: width for display.
- measure::readstat_measure_t: measure type of the variable; only relevant to SPSS.
- alignment::readstat_alignment_t: variable display alignment.
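To compare a few of these fields across all columns, colmetadata can be called once per column. The column names below are those of the sample file used on this page (assumed path as before); for other files, substitute your own column names.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file

# Column names of the sample file, as listed in the examples on this page.
cols = (:mychar, :mynum, :mydate, :dtime, :mylabl, :myord, :mytime)

for col in cols
    label = colmetadata(tb, col, "label")
    format = colmetadata(tb, col, "format")
    vallabel = colmetadata(tb, col, "vallabel")
    println(rpad(string(col), 8), " label = ", repr(label),
            ", format = ", format, ", vallabel = ", repr(vallabel))
end
```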
To retrieve the ReadStatColMeta for a specified data column contained in a ReadStatTable:
julia> colmetadata(tb, :mylabl)
ReadStatColMeta:
  label => labeled
  format => %16.0f
  type => READSTAT_TYPE_INT8
  value label => mylabl
  storage width => 1
  display width => 16
  measure => READSTAT_MEASURE_UNKNOWN
  alignment => READSTAT_ALIGNMENT_RIGHT
The value associated with a specific metadata key can be retrieved via:
julia> colmetadata(tb, :mylabl, "label")
"labeled"
julia> colmetadata(tb, :mylabl, "label", style=true)
("labeled", :note)
To obtain a complete list of metadata keys:
julia> colmetadatakeys(tb, :mylabl)
("label", "format", "type", "vallabel", "storage_width", "display_width", "measure", "alignment")
Metadata contained in a ReadStatColMeta can be modified, optionally with a metadata style set at the same time:
julia> colmetadata!(tb, :mylabl, "label", "A variable label", style=:default)
ColMetaIterator{ReadStatColMeta} with 7 entries:
  :mychar => ReadStatColMeta(character, %-1s)
  :mynum => ReadStatColMeta(numeric, %16.2f)
  :mydate => ReadStatColMeta(date, %td)
  :dtime => ReadStatColMeta(datetime, %tc)
  :mylabl => ReadStatColMeta(A variable label, %16.0f)
  :myord => ReadStatColMeta(ordinal, %16.0f)
  :mytime => ReadStatColMeta(time, %tcHH:MM:SS)
A ReadStatColMeta also has a dictionary-like interface:
julia> m = colmetadata(tb, :mylabl)
ReadStatColMeta:
  label => A variable label
  format => %16.0f
  type => READSTAT_TYPE_INT8
  value label => mylabl
  storage width => 1
  display width => 16
  measure => READSTAT_MEASURE_UNKNOWN
  alignment => READSTAT_ALIGNMENT_RIGHT
julia> keys(m)
KeySet for a ReadStatColMeta with 8 entries. Keys:
  "label"
  "format"
  "type"
  "vallabel"
  "storage_width"
  "display_width"
  "measure"
  "alignment"
julia> m["label"]
"A variable label"
julia> copy(m)
Dict{String, Any} with 8 entries:
  "label" => "A variable label"
  "format" => "%16.0f"
  "display_width" => 16
  "measure" => READSTAT_MEASURE_UNKNOWN
  "alignment" => READSTAT_ALIGNMENT_RIGHT
  "type" => READSTAT_TYPE_INT8
  "storage_width" => 0x0000000000000001
  "vallabel" => :mylabl
However, it cannot be modified directly via setindex!:
julia> m["label"] = "A new label"
ERROR: MethodError: no method matching setindex!(::ReadStatColMeta, ::String, ::String)
The function `setindex!` exists, but no method is defined for this combination of argument types.
Closest candidates are:
  setindex!(::AbstractDict, ::Any, ::Any, ::Any, ::Any...)
   @ Base abstractdict.jl:552
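Since setindex! is not defined for ReadStatColMeta, an update that fails as above can go through colmetadata! on the table instead, as in the earlier colmetadata! example. The sketch assumes the same sample file path; catching the MethodError is only there to show the failure without stopping the script.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file
m = colmetadata(tb, :mylabl)

# Direct assignment is not supported and raises a MethodError.
try
    m["label"] = "A new label"
catch err
    err isa MethodError || rethrow()
    println("setindex! is not defined for ReadStatColMeta")
end

# Modify the value through the table instead.
colmetadata!(tb, :mylabl, "label", "A new label"; style=:note)
println(colmetadata(tb, :mylabl, "label"))
```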
Instead, since the metadata associated with each key are stored consecutively in arrays internally, one may directly access the underlying array for a given metadata key:
ReadStatTables.colmetavalues (Function)
colmetavalues(tb::ReadStatTable, key)
Return an array of metadata values associated with key for all columns in tb.
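Since colmetavalues exposes the underlying array rather than a copy, it is convenient for bulk edits. A minimal sketch, assuming the same sample file path: the in-place broadcast with .= writes into the returned array, so every column label changes at once, whereas rebinding the name (labels = uppercase.(labels)) would create a new array and leave the table untouched.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file

# The returned array holds the "label" values of all columns.
labels = colmetavalues(tb, "label")

# In-place broadcast: writes into the array shared with the table.
labels .= uppercase.(labels)

println(colmetadata(tb, :mychar, "label"))
println(colmetadata(tb, :mylabl, "label"))
```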
julia> v = colmetavalues(tb, "label")
7-element Vector{String}:
 "character"
 "numeric"
 "date"
 "datetime"
 "A variable label"
 "ordinal"
 "time"
Notice that changing any value in the array returned above will affect the corresponding ReadStatColMeta:
julia> colmetadata(tb, :mychar, "label")
"character"
julia> v[1] = "char"
"char"
julia> colmetadata(tb, :mychar, "label")
"char"
Metadata Styles
Metadata styles provide additional information on how the metadata should be processed in certain scenarios. ReadStatTables.jl itself does not require such information. However, specifying metadata styles can be useful when the metadata need to be transferred to another object (e.g., a DataFrame from DataFrames.jl). Packages that implement metadata-related methods compatible with DataAPI.jl are able to recognize the metadata contained in a ReadStatTable.
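To illustrate how a consumer of DataAPI.jl metadata might use styles, the hypothetical helper note_metadata below keeps only the file-level entries that have the :note style. It relies only on metadatakeys and metadata(x, key; style=true), the DataAPI.jl-compatible methods shown earlier on this page; the helper name and the filtering rule are illustrative assumptions, not the behavior of any particular package. The sample file path is assumed as before.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file

# Hypothetical helper: collect file-level metadata whose style is :note,
# using only DataAPI.jl-compatible methods.
function note_metadata(x)
    out = Dict{String, Any}()
    for key in metadatakeys(x)
        value, style = metadata(x, key; style=true)
        if style === :note
            out[String(key)] = value
        end
    end
    return out
end

kept = note_metadata(tb)
println(sort(collect(keys(kept))))
```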
By default, metadata on labels and notes have the :note style; all other metadata have the :default style. Keys for metadata with user-specified styles, along with those that have the :note style by default, are recorded in a dictionary:
julia> metastyle(tb)
Dict{Symbol, Symbol} with 4 entries:
  :label => :default
  :vallabel => :note
  :notes => :note
  :file_label => :default
All metadata associated with keys not listed above have the :default style. To modify the metadata style for those associated with a given key:
julia> metastyle!(tb, "modified_time", :note)
Dict{Symbol, Symbol} with 5 entries:
  :label => :default
  :vallabel => :note
  :notes => :note
  :modified_time => :note
  :file_label => :default
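metastyle! takes one key at a time, so changing the style of several keys is a simple loop. This sketch assumes the same sample file path as the other examples; the choice of keys is arbitrary and only meant to show the pattern.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file

# Give both timestamps the :note style.
for key in ("creation_time", "modified_time")
    metastyle!(tb, key, :note)
end

# Check the resulting style of every file-level key.
for key in metadatakeys(tb)
    _, style = metadata(tb, key; style=true)
    println(rpad(key, 22), " => ", style)
end
```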
The same method is also used for variable-specific metadata. However, since the styles are only determined by the metadata keys, metadata associated with the same key always have the same style and hence are not distinguished across different columns.
julia> metastyle!(tb, "label", :default)
Dict{Symbol, Symbol} with 5 entries:
  :label => :default
  :vallabel => :note
  :notes => :note
  :modified_time => :note
  :file_label => :default
julia> colmetadata(tb, :mychar, "label", style=true)
("char", :default)
julia> colmetadata(tb, :mynum, "label", style=true)
("numeric", :default)
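Because the style is attached to the key rather than to a column, a single metastyle! call changes the style reported for that key by every column. A sketch with the sample file (assumed path as before) and its column names:

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file
cols = (:mychar, :mynum, :mydate, :dtime, :mylabl, :myord, :mytime)

# One call sets the style of "label" for all columns at once.
metastyle!(tb, "label", :default)

# Collect the style each column now reports for its "label" metadata.
label_styles = [colmetadata(tb, col, "label"; style=true)[2] for col in cols]
println(label_styles)
```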
ReadStatTables.metastyle (Function)
metastyle(tb::ReadStatTable, [key::Union{Symbol, AbstractString}])
Return the specified style(s) of all metadata for table tb. If a metadata key is specified, only the style of the associated metadata is returned. By default, metadata on labels and notes have the :note style; all other metadata have the :default style.
The style of metadata is only determined by key and hence is not distinguished across different columns.
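A short sketch of both call forms, assuming the same sample file path: without a key, metastyle returns the dictionary of recorded styles; with a key, it returns the style for that key. The sketch assumes, following the description above, that a key absent from the dictionary reports :default.

```julia
using ReadStatTables

tb = readstat("data/sample.dta")  # assumed path to the sample file

# Dict{Symbol, Symbol} of keys with recorded (non-default or :note) styles.
all_styles = metastyle(tb)
println(all_styles)

println(metastyle(tb, "file_label"))   # a label, so :note by default
println(metastyle(tb, :row_count))     # assumed to fall back to :default
```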
ReadStatTables.metastyle! (Function)
metastyle!(tb::ReadStatTable, key::Union{Symbol, AbstractString}, style::Symbol)
Set the style of all metadata associated with key to style for table tb.
The style of metadata is only determined by key and hence is not distinguished across different columns.