Value Labels
Value labels collected from the data files are incorporated into the associated data columns via a custom array type LabeledArray.
LabeledValue and LabeledArray
LabeledValue and LabeledArray are designed to imitate how variables associated with value labels are represented in the original data files from the statistical software. The former wraps a single data value with a reference to the value labels, while the latter wraps a data array. The element of a LabeledArray is always a LabeledValue. However, for efficient storage, a LabeledValue is only constructed when an element is retrieved from a LabeledArray via getindex.
Some noteworthy distinctions of a LabeledArray are highlighted below:
- Values are never re-encoded when a LabeledArray is constructed.[1]
- It is allowed for some values in a LabeledArray to not have a value label.[2]
- A label is always a String even when it is associated with missing.
In essence, a LabeledArray is simply an array of data values (typically numbers) bundled with a dictionary of value labels. There is no restriction imposed on the correspondence between the data values and value labels. Namely, a data value in a LabeledArray does not necessarily have a value label in the associated dictionary, and the key of a value label contained in the dictionary may not match any array element. Furthermore, the dictionary of value labels may be switched and shared across different LabeledArrays. When setting values in a LabeledArray, the array of data values is modified directly with no additional check on the associated dictionary of value labels. For this reason, the functionality of a LabeledArray is not equivalent to that of an array type designed for categorical data (e.g., CategoricalArray from CategoricalArrays.jl). They are not complete substitutes for each other.
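The loose coupling between values and labels can be illustrated with a minimal sketch. It assumes that ReadStatTables is loaded and that the constructor keeps a reference to the dictionary passed in rather than copying it; the labels and variable names are made up for illustration.

```julia
using ReadStatTables

# Hypothetical value labels; the value 3 is deliberately left unlabeled
lbls = Dict(1 => "low", 2 => "high")

# Two arrays attached to the same dictionary of value labels
x = LabeledArray([1, 2, 3], lbls)
y = LabeledArray([2, 2, 1], lbls)

# A value without a matching key falls back to string(v) as its label
labels_x = collect(valuelabels(x))

# Expected to be true only if the dictionary is stored by reference (an assumption)
shared = getvaluelabels(x) === getvaluelabels(y)

# push! appends to the underlying values only; no label is checked or added
push!(y, 99)
has_label_99 = haskey(getvaluelabels(y), 99)
```

Because nothing ties the values to the dictionary keys, an unlabeled value such as 99 is accepted without complaint, which is the main behavioral difference from an array type designed for categorical data.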
More details are below:
ReadStatTables.LabeledValue — Type
LabeledValue{T, K}
Value of type T associated with a dictionary of value labels with keys of type K. If a value v is not equal (==) to a key in the dictionary, then string(v) is taken as the value label. See also LabeledArray.
The value underlying a LabeledValue can be accessed via unwrap. The value label can be obtained by calling valuelabel or converting a LabeledValue to String via convert. The dictionary of value labels (typically associated with a data column) can be accessed via getvaluelabels.
Comparison operators ==, isequal, <, isless and isapprox compare the underlying value of type T and disregard any value label. To compare the value label, use valuelabel to retrieve the label first.
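As a brief sketch of this distinction, the following uses two different values that share the label "a". Comparing the LabeledValue objects looks only at the underlying values, whereas comparing the results of valuelabel looks only at the labels. The variable names are illustrative.

```julia
using ReadStatTables

lbls = Dict{Int,String}(0 => "a", 1 => "a")
v0 = LabeledValue(0, lbls)
v1 = LabeledValue(1, lbls)

# Compares the underlying values 0 and 1; the shared label is ignored
same_value = v0 == v1

# Compares the labels "a" and "a" after retrieving them explicitly
same_label = valuelabel(v0) == valuelabel(v1)

# Ordering also relies on the underlying values
ordered = v0 < v1
```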
Examples
julia> lbls = Dict{Int,String}(0=>"a", 1=>"a");
julia> v0 = LabeledValue(0, lbls)
0 => a
julia> v1 = LabeledValue(1, lbls)
1 => a
julia> vm = LabeledValue(missing, lbls)
missing => missing
julia> v0 == v1
false
julia> v1 == 1
true
julia> isnan(v1)
false
julia> isequal(vm, missing)
true
julia> unwrap(v0)
0
julia> valuelabel(v1) == "a"
true
julia> getvaluelabels(v1) === lbls
true
ReadStatTables.LabeledArray — Type
LabeledArray{V, N, A<:AbstractArray{V, N}, K} <: AbstractArray{LabeledValue{V, K}, N}
N-dimensional dense array with elements associated with value labels.
LabeledArray provides functionality that is similar to what value labels achieve in statistical software such as Stata. When printed to REPL, a LabeledArray just looks like an array of value labels. Yet, only the underlying values of type V are stored in an array of type A. The associated value labels are looked up from a dictionary of type Dict{K, String}. If a value v is not equal (==) to a key in the dictionary, then string(v) is taken as the value label. The elements of type LabeledValue{V, K} are only constructed lazily when they are retrieved.
The array of values underlying a LabeledArray can be accessed via refarray. The dictionary of value labels associated with a LabeledArray can be accessed via getvaluelabels. An iterator over the value labels for each element, which has the same array shape as the LabeledArray, can be obtained via valuelabels.
Equality comparison (==) involving a LabeledArray only compares the underlying values and disregards any value label. To compare the value labels, use valuelabels to obtain the labels first.
Additional array methods such as push!, insert!, deleteat!, append! are supported for LabeledVector. They are applied to the underlying array of values retrieved via refarray and do not modify the dictionary of value labels.
For convenience, LabeledArray(x::AbstractArray{<:AbstractString}, ::Type{T}=Int32) converts a string array to a LabeledArray by encoding the string values with integers of the specified type (Int32 by default).
Examples
julia> lbls1 = Dict(1=>"a", 2=>"b");
julia> lbls2 = Dict(1.0=>"p", 2.0=>"q");
julia> x = LabeledArray([0, 1, 2], lbls1)
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
julia> y = LabeledArray([0.0, 1.0, 2.0], lbls2)
3-element LabeledVector{Float64, Vector{Float64}, Float64}:
0.0 => 0.0
1.0 => p
2.0 => q
julia> x == y
true
julia> x == 0:2
true
julia> refarray(x)
3-element Vector{Int64}:
0
1
2
julia> getvaluelabels(x)
Dict{Int64, String} with 2 entries:
2 => "b"
1 => "a"
julia> valuelabels(x) == ["0", "a", "b"]
true
julia> push!(x, 2)
4-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
2 => b
julia> push!(x, 3 => "c")
5-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
2 => b
3 => c
julia> deleteat!(x, 4:5)
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
julia> append!(x, [0, 1, 2])
6-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
0 => 0
1 => a
2 => b
julia> v = ["a", "b", "c"];
julia> LabeledArray(v, Int16)
3-element LabeledVector{Int16, Vector{Int16}, Union{Char, Int32}}:
1 => a
2 => b
3 => c
ReadStatTables.LabeledVector — Type
LabeledVector{V, A, K} <: AbstractVector{LabeledValue{V, K}}
Alias for LabeledArray{V, 1, A, K}.
ReadStatTables.LabeledMatrix — Type
LabeledMatrix{V, A, K} <: AbstractMatrix{LabeledValue{V, K}}
Alias for LabeledArray{V, 2, A, K}.
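Since == on a LabeledArray ignores labels, two arrays with entirely different labels can compare as equal. The sketch below, based on the arrays from the LabeledArray docstring example, contrasts that with an explicit comparison of the labels; the names values_equal and labels_equal are introduced here for illustration only.

```julia
using ReadStatTables

x = LabeledArray([0, 1, 2], Dict(1 => "a", 2 => "b"))
y = LabeledArray([0.0, 1.0, 2.0], Dict(1.0 => "p", 2.0 => "q"))

# Only the underlying values are compared
values_equal = x == y

# Collect the labels first to compare what is displayed
labels_equal = collect(valuelabels(x)) == collect(valuelabels(y))
```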
Accessing Values and Labels
For LabeledValue, the underlying data value can be retrieved via unwrap. The value label can be obtained via valuelabel or conversion to String. For LabeledArray, the underlying data values can be retrieved via refarray. An iterator of value labels that maintains the shape of the LabeledArray can be obtained by calling valuelabels.
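The accessors mentioned above can be tried together in a short sketch. It assumes that ReadStatTables is loaded and makes unwrap and refarray available, as in the docstring examples on this page; the data and variable names are made up.

```julia
using ReadStatTables

lbls = Dict(1 => "a", 2 => "b")
x = LabeledArray([1, 2, 3], lbls)

# Element access constructs a LabeledValue on retrieval
v = x[2]
raw = unwrap(v)             # underlying data value
lab = valuelabel(v)         # value label
str = convert(String, v)    # the same label via conversion to String

# Array-level accessors
vals = refarray(x)                # underlying array of values
labs = collect(valuelabels(x))    # labels gathered into an ordinary array
```

Collecting the iterator returned by valuelabels is a convenient way to hand the labels to code that expects plain strings, at the cost of discarding the underlying values.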
DataAPI.unwrap — Function
unwrap(x::LabeledValue)
Return the value underlying the value label of x.
ReadStatTables.valuelabel — Function
valuelabel(x::LabeledValue)
Return the value label associated with x.
ReadStatTables.getvaluelabels — Function
getvaluelabels(x::LabeledValue)
Return the dictionary of value labels (typically associated with a data column) attached to x.
getvaluelabels(x::LabeledArray)
getvaluelabels(x::SubArray{<:Any, <:Any, <:LabeledArray})
getvaluelabels(x::Base.ReshapedArray{<:Any, <:Any, <:LabeledArray})
getvaluelabels(x::SubArray{<:Any, <:Any, <:Base.ReshapedArray{<:Any, <:Any, <:LabeledArray}})
Return the dictionary of value labels attached to x.
getvaluelabels(tb::ReadStatTable)
getvaluelabels(tb::ReadStatTable, name::Symbol)
Return a dictionary of all value label dictionaries contained in tb obtained from the data file. Return a specific dictionary of value labels if a name is specified.
Each dictionary of value labels is associated with a name that may appear in the variable-level metadata under the key vallabel for identifying the dictionary of value labels attached to each data column. The same dictionary may be associated with multiple data columns. Modifying the metadata value of vallabel for a data column switches the associated value labels for the data column. If the metadata value is set to Symbol(""), the data column is not associated with any value label.
DataAPI.refarray — Function
refarray(x::LabeledArray)
refarray(x::SubArray{<:Any, <:Any, <:LabeledArray})
refarray(x::Base.ReshapedArray{<:Any, <:Any, <:LabeledArray})
refarray(x::SubArray{<:Any, <:Any, <:Base.ReshapedArray{<:Any, <:Any, <:LabeledArray}})
Return the array of values underlying a LabeledArray.
ReadStatTables.valuelabels — Function
valuelabels(x::AbstractArray{<:LabeledValue})
Return an iterator over the value labels of all elements in x. The returned object is a subtype of AbstractArray with the same size as x.
The iterator can be used to collect value labels to arrays while discarding the underlying values.
Examples
julia> x = LabeledArray([1, 2, 3], Dict(1=>"a", 2=>"b"))
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
1 => a
2 => b
3 => 3
julia> lbls = valuelabels(x)
3-element ReadStatTables.LabelIterator{LabeledVector{Int64, Vector{Int64}, Int64}, 1}:
"a"
"b"
"3"
julia> collect(lbls)
3-element Vector{String}:
"a"
"b"
"3"
julia> CategoricalArray(lbls)
3-element CategoricalArray{String,1,UInt32}:
"a"
"b"
"3"