Date and Time Values

Date and time values in the data files are recognized based on the format of each variable. Many data/time formats can be recognized without user intervention.[1] In case certain date/time formats are not recognized, they can be added easily.

Translating Date and Time Values

For all date/time formats from Stata, SAS and SPSS, the date and time values are stored as the numbers of periods elapsed since a reference date or time point (epoch) chosen by the software. Therefore, knowing the reference data/time and the length of a single period is sufficient for uncovering the represented date/time values for a given format.

Info

Two exceptions are Stata format "%tw" for weeks and "%ty" for years. Stata always counts the week numbers starting from the first day of a year. Each year always consists of 52 weeks. Any remaining day at the end of a year is counted as the 52th week within that year. Conversion for a variable with format "%tw" is therefore handled differently. For "%ty", the recorded numerical values are simply the calendar years without any transformation. A variable with format "%ty" is not converted to Julia Date or DateTime.

If a variable is in a date/time format that can be recognized, the values will be displayed as Julia Date or DateTime when printing a ReadStatTable. Notice that the underlying numerical values are preserved and the conversion to the Julia Date or DateTime happens only lazily via a MappedArray when working with a ReadStatTable.

julia> using ReadStatTables, DataFrames
julia> tb = readstat("data/sample.dta")5×7 ReadStatTable: Row │ mychar mynum mydate dtime mylabl ⋯ │ String3 Float64 Date? DateTime? Labeled{Int8} Label ⋯ ─────┼────────────────────────────────────────────────────────────────────────── 1 │ a 1.1 2018-05-06 2018-05-06T10:10:10 Male ⋯ 2 │ b 1.2 1880-05-06 1880-05-06T10:10:10 Female ⋯ 3 │ c -1000.3 1960-01-01 1960-01-01T00:00:00 Male ⋯ 4 │ d -1.4 1583-01-01 1583-01-01T00:00:00 Female ⋯ 5 │ e 1000.3 missing missing Male ⋯ 2 columns omitted
julia> tb.mydate5-element mappedarray(ReadStatTables.Num2DateTime{Date, Dates.Day}(Date("1960-01-01"), Dates.Day(1)), ReadStatTables.DateTime2Num{ReadStatTables.Num2DateTime{Date, Dates.Day}}(ReadStatTables.Num2DateTime{Date, Dates.Day}(Date("1960-01-01"), Dates.Day(1))), ::SentinelArrays.SentinelVector{Float64, Float64, Missing, Vector{Float64}}) with eltype Union{Missing, Date}: 2018-05-06 1880-05-06 1960-01-01 1583-01-01 missing
julia> tb.mydate.data5-element SentinelArrays.SentinelVector{Float64, Float64, Missing, Vector{Float64}}: 21310.0 -29093.0 0.0 -137696.0 missing
julia> colmetadata(tb, :mydate, "format")"%td"

The variable-level metadata key named format informs ReadStatTable whether the variable represents date/time and how the numerical values should be interpreted. Changing the format directly affects how the values are displayed, although the numerical values remain unchanged.

julia> colmetadata!(tb, :mydate, "format", "%tm")ColMetaIterator{ReadStatColMeta} with 7 entries:
  :mychar => ReadStatColMeta(character, %-1s)
  :mynum  => ReadStatColMeta(numeric, %16.2f)
  :mydate => ReadStatColMeta(date, %tm)
  :dtime  => ReadStatColMeta(datetime, %tc)
  :mylabl => ReadStatColMeta(labeled, %16.0f)
  :myord  => ReadStatColMeta(ordinal, %16.0f)
  :mytime => ReadStatColMeta(time, %tcHH:MM:SS)
julia> tb.mydate5-element mappedarray(ReadStatTables.Num2DateTime{Date, Dates.Month}(Date("1960-01-01"), Dates.Month(1)), ReadStatTables.DateTime2Num{ReadStatTables.Num2DateTime{Date, Dates.Month}}(ReadStatTables.Num2DateTime{Date, Dates.Month}(Date("1960-01-01"), Dates.Month(1))), ::SentinelArrays.SentinelVector{Float64, Float64, Missing, Vector{Float64}}) with eltype Union{Missing, Date}: 3735-11-01 -0465-08-01 1960-01-01 -9515-05-01 missing
julia> colmetadata!(tb, :mydate, "format", "%8.0f")ColMetaIterator{ReadStatColMeta} with 7 entries: :mychar => ReadStatColMeta(character, %-1s) :mynum => ReadStatColMeta(numeric, %16.2f) :mydate => ReadStatColMeta(date, %8.0f) :dtime => ReadStatColMeta(datetime, %tc) :mylabl => ReadStatColMeta(labeled, %16.0f) :myord => ReadStatColMeta(ordinal, %16.0f) :mytime => ReadStatColMeta(time, %tcHH:MM:SS)
julia> tb.mydate5-element SentinelArrays.SentinelVector{Float64, Float64, Missing, Vector{Float64}}: 21310.0 -29093.0 0.0 -137696.0 missing

Copying a ReadStatTable (e.g., converting to a DataFrame) may drop the underlying numerical values. Hence, users who wish to directly work with the underlying numerical values may want to preserve the ReadStatTable generated from the data file.

julia> df = DataFrame(tb)5×7 DataFrame
 Row │ mychar   mynum    mydate     dtime                mylabl     myord      ⋯
     │ String3  Float64  Float64?   DateTime?            LabeledV…  LabeledV…  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ a            1.1    21310.0  2018-05-06T10:10:10  Male       low        ⋯
   2 │ b            1.2   -29093.0  1880-05-06T10:10:10  Female     medium
   3 │ c        -1000.3        0.0  1960-01-01T00:00:00  Male       high
   4 │ d           -1.4  -137696.0  1583-01-01T00:00:00  Female     low
   5 │ e         1000.3  missing    missing              Male       missing    ⋯
                                                                1 column omitted
julia> df.mydate5-element Vector{Union{Missing, Float64}}: 21310.0 -29093.0 0.0 -137696.0 missing

In the above example, df.mydate only contains the Date values and the underlying numerical values are lost when constructing the DataFrame.

The full lists of recognized date/time formats for the statistical software are stored as dictionary keys; while the associated values are tuples of reference date/time and period length.[2] If a date/time format is not found in the dictionary, no type conversion will be attempted. Additional formats may be added by inserting key-value pairs to the relevant dictionaries.

julia> ReadStatTables.stata_dt_formatsDict{String, Tuple{Union{Date, DateTime}, Dates.Period}} with 6 entries:
  "%tc" => (DateTime("1960-01-01T00:00:00"), Millisecond(1))
  "%tw" => (Date("1960-01-01"), Week(1))
  "%td" => (Date("1960-01-01"), Day(1))
  "%tq" => (Date("1960-01-01"), Month(3))
  "%tm" => (Date("1960-01-01"), Month(1))
  "%th" => (Date("1960-01-01"), Month(6))
julia> ReadStatTables.sas_dt_formats["MMDDYY"](Date("1960-01-01"), Dates.Day(1))
julia> ReadStatTables.spss_dt_formats["TIME"](DateTime("1582-10-14T00:00:00"), Dates.Second(1))
  • 1For Stata, all date/time formats except "%tC" and "%d" are supported. The only difference between the "%tC" format and the "%tc" format is that "%tC" takes into account leap seconds while "%tc" does not. Since the DateTime type in the Dates module does not allow leap seconds, only the "%tc" format is supported. The "%d" format that appears in earlier versions of Stata is no longer documented in recent versions. For SAS and SPSS, the coverage of date/time formats might be less comprehensive.
  • 2For Stata, the reference for date/time value translation is the official Stata documentation. Only the first three characters in the format strings affect the coding. For SAS and SPSS, the reference is pyreadstat/_readstat_parser.pyx.