Value Labels
Value labels collected from the data files are incorporated into the associated data columns via a custom array type LabeledArray.
LabeledValue and LabeledArray
LabeledValue and LabeledArray are designed to imitate how variables associated with value labels are represented in the original data files from the statistical software. The former wraps a single data value with a reference to the value labels, while the latter wraps a data array. The element of a LabeledArray is always a LabeledValue. However, for efficient storage, a LabeledValue is only constructed when an element of a LabeledArray is retrieved via getindex.
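The lazy construction described above can be sketched as follows. This is a minimal illustration, assuming that `using ReadStatTables` brings `LabeledArray`, `LabeledValue`, `refarray`, `unwrap` and `valuelabel` into scope as in the examples on this page; the values and labels are made up.

```julia
using ReadStatTables

# Illustrative data: three codes, only two of which have a label
lbls = Dict(1 => "low", 2 => "high")
x = LabeledArray([1, 2, 3], lbls)

# Only the raw values are stored in the underlying array
vals = refarray(x)

# Indexing wraps the stored value in a LabeledValue on demand
v = x[2]
is_labeled = v isa LabeledValue
raw = unwrap(v)        # the stored data value
lab = valuelabel(v)    # the label looked up from the dictionary
```

Nothing beyond the array of values and the dictionary needs to be kept in memory; the wrapper exists only for the element being accessed.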
Some noteworthy distinctions of a LabeledArray are highlighted below:
- Values are never re-encoded when a LabeledArray is constructed.[1]
- Some values in a LabeledArray are allowed to have no value label.[2]
- A label is always a String, even when it is associated with missing.
In essence, a LabeledArray is simply an array of data values (typically numbers) bundled with a dictionary of value labels. No restriction is imposed on the correspondence between the data values and the value labels: a data value in a LabeledArray does not necessarily have a value label in the associated dictionary, and a key in the dictionary may not match any array element. Furthermore, the dictionary of value labels may be switched and shared across different LabeledArrays. When setting values in a LabeledArray, the array of data values is modified directly with no additional check on the associated dictionary of value labels. For this reason, a LabeledArray is not functionally equivalent to an array type designed for categorical data (e.g., CategoricalArray from CategoricalArrays.jl). The two are not complete substitutes for each other.
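The loose coupling between values and labels can be illustrated with a short sketch. It assumes the names used in the examples on this page are in scope after `using ReadStatTables`, and that assigning a plain value by index works as the paragraph above describes; the data are made up.

```julia
using ReadStatTables

lbls = Dict(1 => "a", 2 => "b")
x = LabeledArray([1, 2], lbls)
y = LabeledArray([2, 2, 1], lbls)   # constructed with the same dictionary

# Setting a value that has no label is allowed; no check is made
x[1] = 9
vals_x = refarray(x)
labels_x = collect(valuelabels(x))  # unlabeled values fall back to string(v)

# The dictionary still has a key (1) that no longer matches any element of x
has_unused_key = haskey(getvaluelabels(x), 1)

# Expected to hold if the constructor stores the dictionary without copying
# (an assumption here, not something stated on this page)
shared = getvaluelabels(x) === getvaluelabels(y)
```

A categorical array type would typically reject or register the new level at assignment time, which is the behavioral difference the paragraph above points at.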
More details are below:
ReadStatTables.LabeledValue — Type
LabeledValue{T, K}
Value of type T associated with a dictionary of value labels with keys of type K. If a value v is not equal (==) to a key in the dictionary, then string(v) is taken as the value label. See also LabeledArray.
The value underlying a LabeledValue can be accessed via unwrap. The value label can be obtained by calling valuelabel or converting a LabeledValue to String via convert. The dictionary of value labels (typically associated with a data column) can be accessed via getvaluelabels.
Comparison operators ==, isequal, <, isless and isapprox compare the underlying value of type T and disregard any value label. To compare the value label, use valuelabel to retrieve the label first.
Examples
julia> lbls = Dict{Int,String}(0=>"a", 1=>"a");
julia> v0 = LabeledValue(0, lbls)
0 => a
julia> v1 = LabeledValue(1, lbls)
1 => a
julia> vm = LabeledValue(missing, lbls)
missing => missing
julia> v0 == v1
false
julia> v1 == 1
true
julia> isnan(v1)
false
julia> isequal(vm, missing)
true
julia> unwrap(v0)
0
julia> valuelabel(v1) == "a"
true
julia> getvaluelabels(v1) === lbls
true
ReadStatTables.LabeledArray — Type
LabeledArray{V, N, A<:AbstractArray{V, N}, K} <: AbstractArray{LabeledValue{V, K}, N}
N-dimensional dense array with elements associated with value labels.
LabeledArray provides functionality that is similar to what value labels achieve in statistical software such as Stata. When printed to REPL, a LabeledArray just looks like an array of value labels. Yet, only the underlying values of type V are stored in an array of type A. The associated value labels are looked up from a dictionary of type Dict{K, String}. If a value v is not equal (==) to a key in the dictionary, then string(v) is taken as the value label. The elements of type LabeledValue{V, K} are only constructed lazily when they are retrieved.
The array of values underlying a LabeledArray can be accessed via refarray. The dictionary of value labels associated with a LabeledArray can be accessed via getvaluelabels. An iterator over the value labels for each element, which has the same array shape as the LabeledArray, can be obtained via valuelabels.
Equality comparison (==) involving a LabeledArray only compares the underlying values and disregards any value label. To compare the value labels, use valuelabels to obtain the labels first.
Additional array methods such as push!, insert!, deleteat! and append! are supported for LabeledVector. They are applied on the underlying array of values retrieved via refarray and do not modify the dictionary of value labels.
For convenience, LabeledArray(x::AbstractArray{<:AbstractString}, ::Type{T}=Int32) converts a string array to a LabeledArray by encoding the string values with integers of the specified type (Int32 by default).
Examples
julia> lbls1 = Dict(1=>"a", 2=>"b");
julia> lbls2 = Dict(1.0=>"p", 2.0=>"q");
julia> x = LabeledArray([0, 1, 2], lbls1)
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
julia> y = LabeledArray([0.0, 1.0, 2.0], lbls2)
3-element LabeledVector{Float64, Vector{Float64}, Float64}:
0.0 => 0.0
1.0 => p
2.0 => q
julia> x == y
true
julia> x == 0:2
true
julia> refarray(x)
3-element Vector{Int64}:
0
1
2
julia> getvaluelabels(x)
Dict{Int64, String} with 2 entries:
2 => "b"
1 => "a"
julia> valuelabels(x) == ["0", "a", "b"]
true
julia> push!(x, 2)
4-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
2 => b
julia> push!(x, 3 => "c")
5-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
2 => b
3 => c
julia> deleteat!(x, 4:5)
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
julia> append!(x, [0, 1, 2])
6-element LabeledVector{Int64, Vector{Int64}, Int64}:
0 => 0
1 => a
2 => b
0 => 0
1 => a
2 => b
julia> v = ["a", "b", "c"];
julia> LabeledArray(v, Int16)
3-element LabeledVector{Int16, Vector{Int16}, Union{Char, Int32}}:
1 => a
2 => b
3 => c
ReadStatTables.LabeledVector — Type
LabeledVector{V, A, K} <: AbstractVector{LabeledValue{V, K}}
Alias for LabeledArray{V, 1, A, K}.
ReadStatTables.LabeledMatrix — Type
LabeledMatrix{V, A, K} <: AbstractMatrix{LabeledValue{V, K}}
Alias for LabeledArray{V, 2, A, K}.
Accessing Values and Labels
For LabeledValue, the underlying data value can be retrieved via unwrap. The value label can be obtained via valuelabel or conversion to String. For LabeledArray, the underlying data values can be retrieved via refarray. An iterator of value labels that maintains the shape of the LabeledArray can be obtained by calling valuelabels.
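The accessors above can be tried together on a small matrix. This is a sketch with made-up data, assuming the same names are in scope after `using ReadStatTables` as in the examples on this page.

```julia
using ReadStatTables

lbls = Dict(1 => "yes", 2 => "no")
x = LabeledArray([1 2; 2 3], lbls)   # 2×2 matrix; the value 3 has no label

# Element-level access
v = x[1, 2]
raw = unwrap(v)              # underlying data value
lab = valuelabel(v)          # value label as a String
str = convert(String, v)     # conversion to String also yields the label

# Array-level access
vals = refarray(x)           # underlying array of values
labs = valuelabels(x)        # iterator of labels with the shape of x
label_matrix = collect(labs) # materialize the labels, dropping the values
```

Collecting the iterator is useful when only the labels are needed downstream, for example when exporting a column as text.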
DataAPI.unwrap — Function
unwrap(x::LabeledValue)
Return the value underlying the value label of x.
ReadStatTables.valuelabel — Function
valuelabel(x::LabeledValue)
Return the value label associated with x.
ReadStatTables.getvaluelabels — Function
getvaluelabels(x::LabeledValue)
Return the dictionary of value labels (typically associated with a data column) attached to x.
getvaluelabels(x::LabeledArray)
getvaluelabels(x::SubArray{<:Any, <:Any, <:LabeledArray})
getvaluelabels(x::Base.ReshapedArray{<:Any, <:Any, <:LabeledArray})
getvaluelabels(x::SubArray{<:Any, <:Any, <:Base.ReshapedArray{<:Any, <:Any, <:LabeledArray}})
Return the dictionary of value labels attached to x.
getvaluelabels(tb::ReadStatTable)
getvaluelabels(tb::ReadStatTable, name::Symbol)
Return a dictionary of all value label dictionaries in tb that were obtained from the data file. Return a specific dictionary of value labels if a name is specified.
Each dictionary of value labels is associated with a name that may appear in the variable-level metadata under the key vallabel, which identifies the dictionary of value labels attached to each data column. The same dictionary may be associated with multiple data columns. Modifying the metadata value of vallabel for a data column switches the value labels associated with that column. If the metadata value is set to Symbol(""), the data column is not associated with any value label.
DataAPI.refarray — Function
refarray(x::LabeledArray)
refarray(x::SubArray{<:Any, <:Any, <:LabeledArray})
refarray(x::Base.ReshapedArray{<:Any, <:Any, <:LabeledArray})
refarray(x::SubArray{<:Any, <:Any, <:Base.ReshapedArray{<:Any, <:Any, <:LabeledArray}})
Return the array of values underlying a LabeledArray.
ReadStatTables.valuelabels — Function
valuelabels(x::AbstractArray{<:LabeledValue})
Return an iterator over the value labels of all elements in x. The returned object is a subtype of AbstractArray with the same size as x.
The iterator can be used to collect value labels into arrays while discarding the underlying values.
Examples
julia> x = LabeledArray([1, 2, 3], Dict(1=>"a", 2=>"b"))
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
1 => a
2 => b
3 => 3
julia> lbls = valuelabels(x)
3-element ReadStatTables.LabelIterator{LabeledVector{Int64, Vector{Int64}, Int64}, 1}:
"a"
"b"
"3"
julia> collect(lbls)
3-element Vector{String}:
"a"
"b"
"3"
julia> CategoricalArray(lbls)
3-element CategoricalArray{String,1,UInt32}:
"a"
"b"
"3"