vignettes/DDharmonize_1YearCounts.Rmd
The purpose of this process is to standardize and harmonize one-year vital and population counts taken from the UNDP database.
There are two functions involved in this process: DDharmonize_Vitals1() is applied to births and deaths data, while DDharmonize_Pop1() is applied to population data.
The functions are defined as follows:
DDharmonize_Vitals1(indata, type = c("births", "deaths"))
DDharmonize_Pop1(indata)
In this vignette, we will use the pop1_df dataset that ships with this package, together with the DDharmonize_Pop1() function, to show how one-year population counts are harmonized. This dataset contains population counts for Bangladesh for the year 1950, restricted to records where SexName = “Both sexes” (SexID = 3).
## Load the packages required
library(rddharmony)
library(kableExtra)
library(dplyr)
library(purrr)
## Create a function to be used to generate the table output
tab_output <- function(tab) {
kable(tab, booktabs = TRUE, align = "c", table.envir = "capctable", longtable = TRUE) %>%
kable_styling() %>%
row_spec(0, bold = TRUE, color = "white", background = "#6ebed8") %>%
kable_paper(html_font = "helvetica") %>%
scroll_box(width = "100%", height = "300px")
}
The data, with only a few variables displayed, is shown below.
pop_complete <- pop1_df %>%
select(LocID, LocName, DataSourceYear, DataStatusName, SexName, SexID, TimeLabel, TimeMid, starts_with("Age"), DataValue, SeriesID) %>%
select(-agesort) %>%
arrange(AgeLabel)
pop_complete %>% tab_output()
LocID | LocName | DataSourceYear | DataStatusName | SexName | SexID | TimeLabel | TimeMid | AgeID | AgeUnit | AgeStart | AgeEnd | AgeSpan | AgeMid | AgeLabel | AgeSort | DataValue | SeriesID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2001 | Year | 0 | 1 | 1 | 0.5 | 0 | 1 | 1910501.750 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2002 | Year | 1 | 2 | 1 | 1.5 | 1 | 33 | 1725512.500 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2011 | Year | 10 | 11 | 1 | 10.5 | 10 | 130 | 968541.750 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2012 | Year | 11 | 12 | 1 | 11.5 | 11 | 145 | 928481.750 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2013 | Year | 12 | 13 | 1 | 12.5 | 12 | 155 | 899861.688 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2014 | Year | 13 | 14 | 1 | 13.5 | 13 | 166 | 873766.562 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2015 | Year | 14 | 15 | 1 | 14.5 | 14 | 173 | 855092.188 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2016 | Year | 15 | 16 | 1 | 15.5 | 15 | 186 | 831297.375 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2017 | Year | 16 | 17 | 1 | 16.5 | 16 | 212 | 811367.500 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2018 | Year | 17 | 18 | 1 | 17.5 | 17 | 228 | 786055.812 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2019 | Year | 18 | 19 | 1 | 18.5 | 18 | 236 | 760603.375 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2020 | Year | 19 | 20 | 1 | 19.5 | 19 | 253 | 748705.250 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2003 | Year | 2 | 3 | 1 | 2.5 | 2 | 48 | 1515458.875 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2021 | Year | 20 | 21 | 1 | 20.5 | 20 | 260 | 730268.000 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2022 | Year | 21 | 22 | 1 | 21.5 | 21 | 282 | 728241.250 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2023 | Year | 22 | 23 | 1 | 22.5 | 22 | 292 | 705503.188 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2024 | Year | 23 | 24 | 1 | 23.5 | 23 | 295 | 682028.750 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2025 | Year | 24 | 25 | 1 | 24.5 | 24 | 300 | 637370.375 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2026 | Year | 25 | 26 | 1 | 25.5 | 25 | 304 | 631710.812 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2027 | Year | 26 | 27 | 1 | 26.5 | 26 | 319 | 620529.625 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2028 | Year | 27 | 28 | 1 | 27.5 | 27 | 323 | 606084.812 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2029 | Year | 28 | 29 | 1 | 28.5 | 28 | 326 | 589641.812 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2030 | Year | 29 | 30 | 1 | 29.5 | 29 | 327 | 576689.625 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2004 | Year | 3 | 4 | 1 | 3.5 | 3 | 54 | 1381031.625 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2031 | Year | 30 | 31 | 1 | 30.5 | 30 | 329 | 558884.125 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2032 | Year | 31 | 32 | 1 | 31.5 | 31 | 337 | 525863.188 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2033 | Year | 32 | 33 | 1 | 32.5 | 32 | 346 | 510112.344 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2034 | Year | 33 | 34 | 1 | 33.5 | 33 | 347 | 494640.594 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2035 | Year | 34 | 35 | 1 | 34.5 | 34 | 350 | 479107.219 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2036 | Year | 35 | 36 | 1 | 35.5 | 35 | 353 | 464034.875 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2037 | Year | 36 | 37 | 1 | 36.5 | 36 | 364 | 468768.000 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2038 | Year | 37 | 38 | 1 | 37.5 | 37 | 368 | 452547.281 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2039 | Year | 38 | 39 | 1 | 38.5 | 38 | 370 | 436699.844 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2040 | Year | 39 | 40 | 1 | 39.5 | 39 | 371 | 383876.344 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2005 | Year | 4 | 5 | 1 | 4.5 | 4 | 61 | 1311924.250 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2041 | Year | 40 | 41 | 1 | 40.5 | 40 | 373 | 372721.312 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2042 | Year | 41 | 42 | 1 | 41.5 | 41 | 382 | 377564.375 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2043 | Year | 42 | 43 | 1 | 42.5 | 42 | 386 | 365664.250 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2044 | Year | 43 | 44 | 1 | 43.5 | 43 | 388 | 353987.594 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2045 | Year | 44 | 45 | 1 | 44.5 | 44 | 389 | 342388.125 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2046 | Year | 45 | 46 | 1 | 45.5 | 45 | 392 | 330876.625 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2047 | Year | 46 | 47 | 1 | 46.5 | 46 | 409 | 337556.375 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2048 | Year | 47 | 48 | 1 | 47.5 | 47 | 414 | 324955.875 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2049 | Year | 48 | 49 | 1 | 48.5 | 48 | 415 | 312400.969 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2050 | Year | 49 | 50 | 1 | 49.5 | 49 | 418 | 280398.500 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2006 | Year | 5 | 6 | 1 | 5.5 | 5 | 65 | 1213268.875 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2051 | Year | 50 | 51 | 1 | 50.5 | 50 | 420 | 270636.750 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2052 | Year | 51 | 52 | 1 | 51.5 | 51 | 429 | 260230.234 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2053 | Year | 52 | 53 | 1 | 52.5 | 52 | 433 | 250146.094 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2054 | Year | 53 | 54 | 1 | 53.5 | 53 | 434 | 240261.172 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2055 | Year | 54 | 55 | 1 | 54.5 | 54 | 435 | 230474.688 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2056 | Year | 55 | 56 | 1 | 55.5 | 55 | 438 | 220809.125 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2057 | Year | 56 | 57 | 1 | 56.5 | 56 | 449 | 211293.625 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2058 | Year | 57 | 58 | 1 | 57.5 | 57 | 452 | 201874.750 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2059 | Year | 58 | 59 | 1 | 58.5 | 58 | 453 | 192542.125 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2060 | Year | 59 | 60 | 1 | 59.5 | 59 | 454 | 186251.766 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2007 | Year | 6 | 7 | 1 | 6.5 | 6 | 88 | 1180320.625 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2061 | Year | 60 | 61 | 1 | 60.5 | 60 | 457 | 177076.109 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2062 | Year | 61 | 62 | 1 | 61.5 | 61 | 468 | 168137.172 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2063 | Year | 62 | 63 | 1 | 62.5 | 62 | 475 | 159553.344 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2064 | Year | 63 | 64 | 1 | 63.5 | 63 | 476 | 151449.359 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2065 | Year | 64 | 65 | 1 | 64.5 | 64 | 477 | 143912.688 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2066 | Year | 65 | 66 | 1 | 65.5 | 65 | 481 | 137008.281 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2067 | Year | 66 | 67 | 1 | 66.5 | 66 | 490 | 130768.383 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2068 | Year | 67 | 68 | 1 | 67.5 | 67 | 494 | 125131.289 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2069 | Year | 68 | 69 | 1 | 68.5 | 68 | 497 | 120036.930 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2070 | Year | 69 | 70 | 1 | 69.5 | 69 | 500 | 126044.266 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2008 | Year | 7 | 8 | 1 | 7.5 | 7 | 104 | 1126399.000 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2071 | Year | 70 | 71 | 1 | 70.5 | 70 | 502 | 121063.992 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2072 | Year | 71 | 72 | 1 | 71.5 | 71 | 509 | 112953.422 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2073 | Year | 72 | 73 | 1 | 72.5 | 72 | 513 | 104817.023 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2074 | Year | 73 | 74 | 1 | 73.5 | 73 | 515 | 96714.945 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2075 | Year | 74 | 75 | 1 | 74.5 | 74 | 516 | 88679.055 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2076 | Year | 75 | 76 | 1 | 75.5 | 75 | 518 | 80752.039 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2077 | Year | 76 | 77 | 1 | 76.5 | 76 | 526 | 72982.914 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2078 | Year | 77 | 78 | 1 | 77.5 | 77 | 528 | 65430.473 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2079 | Year | 78 | 79 | 1 | 78.5 | 78 | 529 | 58156.363 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2080 | Year | 79 | 80 | 1 | 79.5 | 79 | 531 | 51223.043 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2009 | Year | 8 | 9 | 1 | 8.5 | 8 | 117 | 1077033.875 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2081 | Year | 80 | 81 | 1 | 80.5 | 80 | 533 | 44688.840 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2082 | Year | 81 | 82 | 1 | 81.5 | 81 | 539 | 38596.305 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2083 | Year | 82 | 83 | 1 | 82.5 | 82 | 542 | 32968.590 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2084 | Year | 83 | 84 | 1 | 83.5 | 83 | 543 | 27824.787 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2085 | Year | 84 | 85 | 1 | 84.5 | 84 | 544 | 23183.484 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2086 | Year | 85 | 86 | 1 | 85.5 | 85 | 547 | 19059.059 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2087 | Year | 86 | 87 | 1 | 86.5 | 86 | 552 | 15447.104 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2088 | Year | 87 | 88 | 1 | 87.5 | 87 | 554 | 12324.689 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2089 | Year | 88 | 89 | 1 | 88.5 | 88 | 556 | 9665.167 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2090 | Year | 89 | 90 | 1 | 89.5 | 89 | 558 | 7441.825 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2010 | Year | 9 | 10 | 1 | 9.5 | 9 | 126 | 1004101.062 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2091 | Year | 90 | 91 | 1 | 90.5 | 90 | 560 | 5625.794 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2092 | Year | 91 | 92 | 1 | 91.5 | 91 | 566 | 4177.848 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2093 | Year | 92 | 93 | 1 | 92.5 | 92 | 569 | 3048.548 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2094 | Year | 93 | 94 | 1 | 93.5 | 93 | 570 | 2186.498 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 2095 | Year | 94 | 95 | 1 | 94.5 | 94 | 572 | 1540.306 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 769 | Year | 95 | 0 | -1 | 97.5 | 95+ | 574 | 2993.026 | 2.144924e+12 |
50 | Bangladesh | 2018 | Final | Both sexes | 3 | 1950 | 1950.5 | 700 | Year | 0 | -1 | -1 | -1.0 | Total | 999 | 41397624.000 | 2.144924e+12 |
We begin by initializing sex specific outputs.
cpl_sex <- NULL
abr_from_cpl_sex <- NULL
sexes <- unique(pop_complete$SexID)
sexes
#> [1] 3
This process should be looped over each of the sex ids, but our data only has “Both sexes” records.
sex <- 3
df <- pop_complete %>%
dplyr::filter(SexID == sex & !is.na(DataValue)) %>%
mutate(AgeLabel = as.character(AgeLabel)) %>%
distinct()
If “Final” data status is available, we keep only the final series.
if ("Final" %in% unique(df$DataStatusName)) {
df <- df %>%
dplyr::filter(DataStatusName == "Final")
}
We check for multiple series ids and, for each series id, check whether it is a full series with all age groups represented and an open age greater than 60.
ids_series <- unique(df$SeriesID)
n_series <- length(ids_series)
df_out <- NULL
for (i in seq_len(n_series)) {
df_one <- df %>% dplyr::filter(SeriesID == ids_series[i])
df_one_std <- df_one[df_one$AgeSpan %in% c(-1, -2, 1), ]
if (nrow(df_one_std) > 60) {
df_one$check_full <- dd_series_isfull(df_one_std, abridged = FALSE)
} else {
df_one$check_full <- FALSE
}
df_out <- rbind(df_out, df_one)
}
df <- df_out
rm(df_out)
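To make the idea of a "full" series concrete, here is a minimal sketch on toy data. The helper `is_full_sketch()` and its `min_oag` argument are hypothetical names introduced for illustration; this is not the implementation of `dd_series_isfull()`, only one plausible reading of "all single-year ages present and closed by an open age group above 60".

```r
# Hypothetical helper (not part of rddharmony): TRUE when single-year ages
# run from 0 without gaps and an open age group starts right after the
# last single age, at an age above min_oag.
is_full_sketch <- function(d, min_oag = 60) {
  single <- sort(unique(d$AgeStart[d$AgeSpan == 1]))
  oag <- d$AgeStart[d$AgeSpan == -1 & d$AgeStart > 0]  # excludes "Total"
  if (length(single) == 0 || length(oag) == 0) return(FALSE)
  no_gaps <- identical(as.numeric(single), as.numeric(0:max(single)))
  closes <- any(oag == max(single) + 1 & oag > min_oag)
  no_gaps && closes
}

# Toy series: single ages 0..94 plus an open group starting at 95
toy <- data.frame(
  AgeStart = c(0:94, 95),
  AgeSpan  = c(rep(1, 95), -1)
)

full_ok  <- is_full_sketch(toy)                        # complete series
full_gap <- is_full_sketch(toy[toy$AgeStart != 40, ])  # age 40 removed
```

In this toy case the first call reports a full series and the second does not, because the gap at age 40 breaks the run of single-year ages.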
In cases where a series has multiple “Total” age labels and one of them equals the computed total, we keep that one and drop the rest.
df <- df %>% dd_multiple_totals()
If there is more than one series and the latest series is full, we keep only that series. If it is not full, we keep the latest data source record for each age. We also tidy up the data frame at this point.
if (n_series > 1) {
latest_source_year <- max(df$DataSourceYear)
check_latest_full <- unique(df$check_full[df$DataSourceYear == latest_source_year])
if (any(check_latest_full)) {
df <- df[df$DataSourceYear == latest_source_year, ]
} else {
df <- df %>% dd_latest_source_year()
}
}
df <- df %>%
select(DataSourceYear, AgeStart, AgeEnd, AgeLabel, AgeSpan, DataValue) %>%
distinct()
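The "latest data source record for each age" rule can be illustrated on a toy data frame. This is a base R sketch of the idea only, with made-up values; it is an assumption about the behaviour described above and not the code of `dd_latest_source_year()`.

```r
# Toy data: age "1" is reported by two sources, age "0" by one
toy <- data.frame(
  DataSourceYear = c(2010, 2010, 2018),
  AgeLabel       = c("0", "1", "1"),
  DataValue      = c(100, 90, 95)
)

# ave() returns, for every row, the maximum source year within its age label
latest_year <- ave(toy$DataSourceYear, toy$AgeLabel, FUN = max)

# Keep only rows that come from the most recent source for their age
latest <- toy[toy$DataSourceYear == latest_year, ]
latest
```

Age "0" keeps its only record (2010), while age "1" keeps the 2018 record and drops the 2010 one.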
If there are still duplicate age groups, we keep the last one in current sort order.
df <- df %>%
mutate(sorting = 1:nrow(df)) %>%
group_by(AgeLabel) %>%
mutate(keeping = max(sorting)) %>%
ungroup() %>%
dplyr::filter(sorting == keeping) %>%
select(-sorting, -keeping)
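The same "keep the last duplicate" rule can be checked on a small example. The sketch below uses base R's `duplicated()` with `fromLast = TRUE` as an alternative to the sorting/keeping columns; the toy values are made up for illustration.

```r
# Toy data: age "1" appears twice
toy <- data.frame(
  AgeLabel  = c("0", "1", "1", "2"),
  DataValue = c(10, 20, 25, 30)
)

# With fromLast = TRUE, every occurrence except the last one is flagged
deduped <- toy[!duplicated(toy$AgeLabel, fromLast = TRUE), ]
deduped
```

Only the second record for age "1" (value 25) survives, which matches keeping the row with the highest position in the current sort order.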
If there is no record for unknown age, we add one with a data value of zero. If the difference between the reported total and the sum over age (referred to as the computed total) is equal to the unknown-age value, that value is also set to zero.
if (!("Unknown" %in% df$AgeLabel)) {
df <- df %>%
bind_rows(data.frame(
AgeStart = -2,
AgeEnd = -2,
AgeSpan = -2,
AgeLabel = "Unknown",
DataSourceYear = NA,
DataValue = 0
))
}
df <- df %>% dd_drop_unknowns()
If the “Total” value is less than the sum over age, we discard it.
total_reported <- df$DataValue[df$AgeLabel == "Total"]
total_computed <- sum(df$DataValue[df$AgeLabel != "Total"])
if (!is_empty(total_reported) && !is_empty(total_computed)) {
if (total_reported < total_computed) {
df <- df %>%
dplyr::filter(AgeLabel != "Total")
}
}
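A small worked example of this comparison, using made-up numbers, shows when the reported total is discarded. It is a base R sketch of the rule stated above.

```r
# Toy data: the reported total (110) is smaller than the sum over age (120)
toy <- data.frame(
  AgeLabel  = c("0", "1", "2+", "Total"),
  DataValue = c(50, 40, 30, 110)
)

total_reported <- toy$DataValue[toy$AgeLabel == "Total"]
total_computed <- sum(toy$DataValue[toy$AgeLabel != "Total"])

# A reported total below the computed total cannot be right, so drop it
if (length(total_reported) == 1 && total_reported < total_computed) {
  toy <- toy[toy$AgeLabel != "Total", ]
}
toy
```

Here the "Total" row is removed because 110 is less than 50 + 40 + 30.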
We then identify the start age of the open age group needed to close the series and flag whether this open age group exists in the series. We then drop records for open age groups that do not close the series.
oag_start <- dd_oag_agestart(df, multiple5 = FALSE)
oag_check <- paste0(oag_start, "+") %in% df$AgeLabel
if (!all(df$AgeLabel %in% c("Total", "Unknown"))) {
df <- df %>%
dplyr::filter(!(AgeStart > 0 & AgeSpan == -1 & AgeStart != oag_start)) %>%
arrange(AgeStart, AgeSpan)
}
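The effect of this filter is easier to see on a toy series that carries two open age groups. In the sketch below, `oag_start_sketch` is derived with an assumed rule (one year above the last single-year age); the actual logic of `dd_oag_agestart()` is not shown in this vignette and may differ.

```r
# Toy series: single ages 0..2, open groups "2+" and "3+", and a total
toy <- data.frame(
  AgeStart = c(0, 1, 2, 2, 3, 0),
  AgeSpan  = c(1, 1, 1, -1, -1, -1),
  AgeLabel = c("0", "1", "2", "2+", "3+", "Total")
)

# Assumption for this sketch: the closing open age group starts right
# after the last single-year age
oag_start_sketch <- max(toy$AgeStart[toy$AgeSpan == 1]) + 1

# Same condition as in the filter above: drop open age groups (AgeSpan == -1,
# AgeStart > 0) that do not start at the closing age
kept <- toy[!(toy$AgeStart > 0 & toy$AgeSpan == -1 &
              toy$AgeStart != oag_start_sketch), ]
kept$AgeLabel
```

The "2+" record is dropped because it overlaps the single age 2, while "3+" and "Total" are retained.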
An AgeSort field that identifies the standard age groups is added to the data.
df <- dd_age_standard(df, abridged = FALSE) %>%
dplyr::filter(!is.na(DataValue))
We also check that the data is in fact a complete series starting at age zero and without gaps …
check_cpl <- df %>%
dplyr::filter(AgeSpan == 1) %>%
summarise(
minAge = min(AgeStart),
maxAge = max(AgeStart),
nAge = length(unique(AgeStart))
)
check_cpl <- check_cpl$minAge == 0 & check_cpl$nAge == check_cpl$maxAge + 1
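The completeness condition (minimum age of zero and as many distinct ages as the maximum age plus one) can be tried on a few toy vectors. The helper name `is_complete_sketch()` is hypothetical and only wraps the same two comparisons.

```r
# Hypothetical helper mirroring the check above: ages must start at 0 and
# the number of distinct ages must equal the maximum age plus one
is_complete_sketch <- function(age_start) {
  ages <- unique(age_start)
  min(ages) == 0 && length(ages) == max(ages) + 1
}

cpl_full    <- is_complete_sketch(0:94)            # no gaps
cpl_gap     <- is_complete_sketch(c(0:39, 41:94))  # age 40 missing
cpl_no_zero <- is_complete_sketch(1:94)            # does not start at 0
```

Only the first vector passes: a gap makes the count of distinct ages fall short of the maximum age plus one, and a series that starts at age 1 fails the minimum-age test.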
… and compute all possible open age groups given the available input.
if (check_cpl == TRUE) {
df_oag <- dd_oag_compute(df, age_span = 1)
df <- df %>%
bind_rows(df_oag[!(df_oag$AgeLabel %in% df$AgeLabel) &
df_oag$AgeStart == oag_start, ]) %>%
arrange(AgeSort)
}
if (!("AgeSort" %in% names(df))) {
df <- dd_age_standard(df, abridged = FALSE) %>%
dplyr::filter(!is.na(DataValue))
check_cpl <- FALSE
}
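As a rough illustration of what "all possible open age groups" means for a complete series, the sketch below accumulates counts from the oldest age downwards. The numbers are made up, and this is an assumption about the arithmetic involved rather than the implementation of `dd_oag_compute()`.

```r
single      <- c(50, 40, 30)  # toy single-year counts for ages 0, 1, 2
closing_oag <- 20             # toy count for the closing group "3+"

# The open group starting at age a is the sum of all counts at ages >= a,
# so a reversed cumulative sum gives every open group at once
oag <- rev(cumsum(rev(c(single, closing_oag))))
names(oag) <- paste0(0:3, "+")
oag
```

In this toy case "2+" is 30 + 20 = 50, "1+" is 90, and "0+" equals the overall sum of 140.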
We check again whether any open age group exists and …
oag_start <- df %>% dd_oag_agestart()
oag_check <- paste0(oag_start, "+") %in% df$AgeLabel
… if total is missing and series is otherwise complete, we compute the total.
if (!("Total" %in% df$AgeLabel) && oag_check == TRUE) {
df <- df %>%
bind_rows(data.frame(
AgeStart = 0,
AgeEnd = -1,
AgeLabel = "Total",
AgeSpan = -1,
AgeSort = 184,
DataSourceYear = NA,
DataValue = sum(df$DataValue[df$AgeSpan == 1]) +
df$DataValue[df$AgeSpan == -1 & df$AgeStart == oag_start] +
df$DataValue[df$AgeLabel == "Unknown"]
))
}
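The total computed here is the sum of three parts: the single-year ages, the closing open age group, and the unknown-age record. A toy calculation with made-up values:

```r
# Toy series: ages 0 and 1, closing group "2+", and an unknown-age record
toy <- data.frame(
  AgeLabel  = c("0", "1", "2+", "Unknown"),
  AgeStart  = c(0, 1, 2, -2),
  AgeSpan   = c(1, 1, -1, -2),
  DataValue = c(50, 40, 30, 5)
)
toy_oag_start <- 2  # start age of the closing open age group in this toy data

toy_total <- sum(toy$DataValue[toy$AgeSpan == 1]) +
  toy$DataValue[toy$AgeSpan == -1 & toy$AgeStart == toy_oag_start] +
  toy$DataValue[toy$AgeLabel == "Unknown"]
toy_total
```

The result is 90 + 30 + 5 = 125, which is the value that would be stored in the new "Total" row.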
A note is then added to flag series with missing data.
df$note <- NA
if (check_cpl == FALSE || oag_check == FALSE) {
df$note <- "The complete series is missing data for one or more age groups."
}
df$SexID <- sex
The process is repeated for all the sex ids and, finally, a series field is added to the data indicating that this is a complete series.