Value Labels

Value labels collected from the data files are incorporated into the associated data columns via a custom array type LabeledArray.

LabeledValue and LabeledArray

LabeledValue and LabeledArray are designed to imitate how variables associated with value labels are represented in the original data files from the statistical software. The former wraps a single data value with a reference to the value labels, while the latter wraps a data array. The element of a LabeledArray is always a LabeledValue. However, for efficient storage, a LabeledValue is only constructed from a LabeledArray when an element is retrieved via getindex.

Some noteworthy distinctions of a LabeledArray are highlighted below:

  • Values are never re-encoded when a LabeledArray is constructed.[1]
  • It is allowed for some values in a LabeledArray to not have a value label.[2]
  • A label is always a String even when it is associated with missing.

In essence, a LabeledArray is simply an array of data values (typically numbers) bundled with a dictionary of value labels. No restriction is imposed on the correspondence between the data values and the value labels. Namely, a data value in a LabeledArray does not necessarily have a value label in the associated dictionary, and a key in the dictionary need not match any array element. Furthermore, the dictionary of value labels may be switched and shared across different LabeledArrays. When values are set in a LabeledArray, the array of data values is modified directly with no additional check on the associated dictionary of value labels. For this reason, the functionality of a LabeledArray is not equivalent to that of an array type designed for categorical data (e.g., CategoricalArray from CategoricalArrays.jl). The two are not complete substitutes for each other.
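The loose coupling between values and labels can be illustrated with a minimal sketch. It assumes ReadStatTables is installed and exports the names used in the REPL examples on this page; the label strings and the unused key 9 are made up for illustration.

```julia
using ReadStatTables

# One dictionary shared by two arrays; key 9 matches no element of either array.
lbls = Dict(1 => "low", 2 => "high", 9 => "unused")
x = LabeledArray([1, 2, 3], lbls)   # the value 3 has no label
y = LabeledArray([2, 2, 1], lbls)

# Both arrays refer to the very same dictionary object.
shared = getvaluelabels(x) === getvaluelabels(y)

# A value without a label falls back to its string representation.
before = collect(valuelabels(x))    # "low", "high", "3"

# A label added to the shared dictionary is picked up on the next lookup.
lbls[3] = "mid"
after = collect(valuelabels(x))     # "low", "high", "mid"

# Writing to the underlying values leaves the dictionary untouched.
refarray(x)[1] = 7
nkeys = length(lbls)                # still the four keys 1, 2, 3 and 9
```

Because nothing ties the keys to the stored values, keeping the two consistent is left to the user.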

More details are below:

ReadStatTables.LabeledValue - Type
LabeledValue{T, K}

Value of type T associated with a dictionary of value labels with keys of type K. If a value v is not equal (==) to a key in the dictionary, then string(v) is taken as the value label. See also LabeledArray.

The value underlying a LabeledValue can be accessed via unwrap. The value label can be obtained by calling valuelabel or converting a LabeledValue to String via convert. The dictionary of value labels (typically associated with a data column) can be accessed via getvaluelabels.

Comparison operators ==, isequal, <, isless and isapprox compare the underlying value of type T and disregard any value label. To compare the value label, use valuelabel to retrieve the label first.

Examples

julia> lbls = Dict{Int,String}(0=>"a", 1=>"a");

julia> v0 = LabeledValue(0, lbls)
0 => a

julia> v1 = LabeledValue(1, lbls)
1 => a

julia> vm = LabeledValue(missing, lbls)
missing => missing

julia> v0 == v1
false

julia> v1 == 1
true

julia> isnan(v1)
false

julia> isequal(vm, missing)
true

julia> unwrap(v0)
0

julia> valuelabel(v1) == "a"
true

julia> getvaluelabels(v1) === lbls
true
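As a complement to the session above, the following sketch contrasts comparing values with comparing labels. It assumes the same exported names as the examples on this page; the two dictionaries are hypothetical.

```julia
using ReadStatTables

# Two values that are equal but labeled by different dictionaries.
a = LabeledValue(1, Dict(1 => "yes"))
b = LabeledValue(1, Dict(1 => "oui"))

same_value = a == b                            # only the underlying values are compared
same_label = valuelabel(a) == valuelabel(b)    # labels must be compared explicitly
label_a = convert(String, a)                   # conversion to String yields the label
```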
ReadStatTables.LabeledArray - Type
LabeledArray{V, N, A<:AbstractArray{V, N}, K} <: AbstractArray{LabeledValue{V, K}, N}

N-dimensional dense array with elements associated with value labels.

LabeledArray provides functionality that is similar to what value labels achieve in statistical software such as Stata. When printed in the REPL, a LabeledArray looks just like an array of value labels. Yet, only the underlying values of type V are stored in an array of type A. The associated value labels are looked up from a dictionary of type Dict{K, String}. If a value v is not equal (==) to a key in the dictionary, then string(v) is taken as the value label. The elements of type LabeledValue{V, K} are only constructed lazily when they are retrieved.

The array of values underlying a LabeledArray can be accessed via refarray. The dictionary of value labels associated with a LabeledArray can be accessed via getvaluelabels. An iterator over the value labels for each element, which has the same array shape as the LabeledArray, can be obtained via valuelabels.

Equality comparison (==) involving a LabeledArray only compares the underlying values and disregards any value label. To compare the value labels, use valuelabels to obtain the labels first.

Additional array methods such as push!, insert!, deleteat! and append! are supported for LabeledVector. They are applied to the underlying array of values retrieved via refarray and do not modify the dictionary of value labels.

For convenience, LabeledArray(x::AbstractArray{<:AbstractString}, ::Type{T}=Int32) converts a string array to a LabeledArray by encoding the string values with integers of the specified type (Int32 by default).

Examples

julia> lbls1 = Dict(1=>"a", 2=>"b");

julia> lbls2 = Dict(1.0=>"p", 2.0=>"q");

julia> x = LabeledArray([0, 1, 2], lbls1)
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
 0 => 0
 1 => a
 2 => b

julia> y = LabeledArray([0.0, 1.0, 2.0], lbls2)
3-element LabeledVector{Float64, Vector{Float64}, Float64}:
 0.0 => 0.0
 1.0 => p
 2.0 => q

julia> x == y
true

julia> x == 0:2
true

julia> refarray(x)
3-element Vector{Int64}:
 0
 1
 2

julia> getvaluelabels(x)
Dict{Int64, String} with 2 entries:
  2 => "b"
  1 => "a"

julia> valuelabels(x) == ["0", "a", "b"]
true

julia> push!(x, 2)
4-element LabeledVector{Int64, Vector{Int64}, Int64}:
 0 => 0
 1 => a
 2 => b
 2 => b

julia> push!(x, 3 => "c")
5-element LabeledVector{Int64, Vector{Int64}, Int64}:
 0 => 0
 1 => a
 2 => b
 2 => b
 3 => c

julia> deleteat!(x, 4:5)
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
 0 => 0
 1 => a
 2 => b

julia> append!(x, [0, 1, 2])
6-element LabeledVector{Int64, Vector{Int64}, Int64}:
 0 => 0
 1 => a
 2 => b
 0 => 0
 1 => a
 2 => b

julia> v = ["a", "b", "c"];

julia> LabeledArray(v, Int16)
3-element LabeledVector{Int16, Vector{Int16}, Union{Char, Int32}}:
 1 => a
 2 => b
 3 => c
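The difference between comparing values and comparing labels, and the string-encoding constructor, can also be checked programmatically. This is a small sketch that reuses the data from the session above and assumes the same exported names.

```julia
using ReadStatTables

x = LabeledArray([0, 1, 2], Dict(1 => "a", 2 => "b"))
y = LabeledArray([0.0, 1.0, 2.0], Dict(1.0 => "p", 2.0 => "q"))

values_equal = x == y                            # underlying values agree
labels_equal = valuelabels(x) == valuelabels(y)  # labels do not

# Encoding a string array: the labels reproduce the original strings,
# while the stored values use the requested integer type.
v = ["a", "b", "c"]
z = LabeledArray(v, Int16)
roundtrip = valuelabels(z) == v
stored_type = eltype(refarray(z))
```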

Accessing Values and Labels

For LabeledValue, the underlying data value can be retrieved via unwrap. The value label can be obtained via valuelabel or conversion to String. For LabeledArray, the underlying data values can be retrieved via refarray. An iterator of value labels that maintains the shape of the LabeledArray can be obtained by calling valuelabels.
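A brief sketch of these accessors side by side, assuming the exported names used in the REPL examples on this page:

```julia
using ReadStatTables

x = LabeledArray([0, 1, 2], Dict(1 => "a", 2 => "b"))

# Element level: a LabeledValue is built when the element is retrieved.
e = x[2]
is_labeled = e isa LabeledValue
val = unwrap(e)              # the underlying data value
lbl = valuelabel(e)          # its value label

# Array level: the stored values and a label iterator of the same shape.
vals = refarray(x)
lbls = valuelabels(x)
same_shape = size(lbls) == size(x)
```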

DataAPI.unwrap - Function
unwrap(x::LabeledValue)

Return the value underlying the value label of x.

ReadStatTables.getvaluelabels - Function
getvaluelabels(x::LabeledValue)

Return the dictionary of value labels (typically associated with a data column) attached to x.

getvaluelabels(x::LabeledArray)
getvaluelabels(x::SubArray{<:Any, <:Any, <:LabeledArray})
getvaluelabels(x::Base.ReshapedArray{<:Any, <:Any, <:LabeledArray})
getvaluelabels(x::SubArray{<:Any, <:Any, <:Base.ReshapedArray{<:Any, <:Any, <:LabeledArray}})

Return the dictionary of value labels attached to x.

getvaluelabels(tb::ReadStatTable)
getvaluelabels(tb::ReadStatTable, name::Symbol)

Return a dictionary of all value label dictionaries contained in tb obtained from the data file. Return a specific dictionary of value labels if a name is specified.

Each dictionary of value labels is associated with a name that may appear in the variable-level metadata under the key vallabel for identifying the dictionary of value labels attached to each data column. The same dictionary may be associated with multiple data columns. Modifying the metadata value of vallabel for a data column switches the associated value labels for the data column. If the metadata value is set to Symbol(""), the data column is not associated with any value label.

DataAPI.refarray - Function
refarray(x::LabeledArray)
refarray(x::SubArray{<:Any, <:Any, <:LabeledArray})
refarray(x::Base.ReshapedArray{<:Any, <:Any, <:LabeledArray})
refarray(x::SubArray{<:Any, <:Any, <:Base.ReshapedArray{<:Any, <:Any, <:LabeledArray}})

Return the array of values underlying a LabeledArray.
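The SubArray methods listed for refarray and getvaluelabels suggest that views can be queried in the same way as the parent array. The sketch below illustrates this under that assumption; it has not been checked against every wrapper type listed above.

```julia
using ReadStatTables

x = LabeledArray([0, 1, 2], Dict(1 => "a", 2 => "b"))
s = view(x, 2:3)   # a SubArray whose parent is a LabeledArray

sub_values = refarray(s)                            # values covered by the view
same_dict = getvaluelabels(s) === getvaluelabels(x) # the parent's dictionary
sub_labels = collect(valuelabels(s))                # labels of the viewed elements
```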

ReadStatTables.valuelabels - Function
valuelabels(x::AbstractArray{<:LabeledValue})

Return an iterator over the value labels of all elements in x. The returned object is a subtype of AbstractArray with the same size as x.

The iterator can be used to collect value labels to arrays while discarding the underlying values.

Examples

julia> x = LabeledArray([1, 2, 3], Dict(1=>"a", 2=>"b"))
3-element LabeledVector{Int64, Vector{Int64}, Int64}:
 1 => a
 2 => b
 3 => 3

julia> lbls = valuelabels(x)
3-element ReadStatTables.LabelIterator{LabeledVector{Int64, Vector{Int64}, Int64}, 1}:
 "a"
 "b"
 "3"

julia> collect(lbls)
3-element Vector{String}:
 "a"
 "b"
 "3"

julia> CategoricalArray(lbls)
3-element CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "3"
  • [1] The values themselves are sometimes meaningful and should not be treated as reference values.
  • [2] In case a label is requested for a value that is not associated with a label, the value is converted to String.