The purpose of this process is to standardize and harmonize one-year vital and population counts taken from the UNDP database.

There are two functions involved in this process: DDharmonize_Vitals1 [1] is applied to births and deaths data, while DDharmonize_Pop1 [2] is applied to population data.

The functions are defined as follows:

DDharmonize_Vitals1(indata, type = c("births", "deaths"))

DDharmonize_Pop1(indata)

In this vignette, we use the pop1_df dataset included in this package, together with the DDharmonize_Pop1() function, to show how one-year population counts are harmonized. The dataset contains population counts for Bangladesh in 1950, restricted to records where SexName is “Both sexes” (SexID = 3).

## Load the packages required
library(rddharmony)
library(kableExtra)
library(dplyr)
library(purrr)

## Create a function to be used to generate the table output
tab_output <- function(tab) {
  kable(tab, booktabs = TRUE, align = "c", table.envir = "capctable", longtable = TRUE) %>%
    kable_styling() %>%
    row_spec(0, bold = TRUE, color = "white", background = "#6ebed8") %>%
    kable_paper(html_font = "helvetica") %>%
    scroll_box(width = "100%", height = "300px")
}

The data, with only a few variables displayed, is shown below.

pop_complete <- pop1_df %>%
  select(LocID, LocName, DataSourceYear, DataStatusName, SexName, SexID, TimeLabel, TimeMid, starts_with("Age"), DataValue, SeriesID) %>%
  select(-agesort) %>%
  arrange(AgeLabel)

pop_complete %>% tab_output()
LocID LocName DataSourceYear DataStatusName SexName SexID TimeLabel TimeMid AgeID AgeUnit AgeStart AgeEnd AgeSpan AgeMid AgeLabel AgeSort DataValue SeriesID
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2001 Year 0 1 1 0.5 0 1 1910501.750 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2002 Year 1 2 1 1.5 1 33 1725512.500 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2011 Year 10 11 1 10.5 10 130 968541.750 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2012 Year 11 12 1 11.5 11 145 928481.750 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2013 Year 12 13 1 12.5 12 155 899861.688 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2014 Year 13 14 1 13.5 13 166 873766.562 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2015 Year 14 15 1 14.5 14 173 855092.188 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2016 Year 15 16 1 15.5 15 186 831297.375 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2017 Year 16 17 1 16.5 16 212 811367.500 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2018 Year 17 18 1 17.5 17 228 786055.812 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2019 Year 18 19 1 18.5 18 236 760603.375 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2020 Year 19 20 1 19.5 19 253 748705.250 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2003 Year 2 3 1 2.5 2 48 1515458.875 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2021 Year 20 21 1 20.5 20 260 730268.000 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2022 Year 21 22 1 21.5 21 282 728241.250 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2023 Year 22 23 1 22.5 22 292 705503.188 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2024 Year 23 24 1 23.5 23 295 682028.750 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2025 Year 24 25 1 24.5 24 300 637370.375 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2026 Year 25 26 1 25.5 25 304 631710.812 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2027 Year 26 27 1 26.5 26 319 620529.625 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2028 Year 27 28 1 27.5 27 323 606084.812 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2029 Year 28 29 1 28.5 28 326 589641.812 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2030 Year 29 30 1 29.5 29 327 576689.625 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2004 Year 3 4 1 3.5 3 54 1381031.625 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2031 Year 30 31 1 30.5 30 329 558884.125 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2032 Year 31 32 1 31.5 31 337 525863.188 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2033 Year 32 33 1 32.5 32 346 510112.344 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2034 Year 33 34 1 33.5 33 347 494640.594 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2035 Year 34 35 1 34.5 34 350 479107.219 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2036 Year 35 36 1 35.5 35 353 464034.875 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2037 Year 36 37 1 36.5 36 364 468768.000 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2038 Year 37 38 1 37.5 37 368 452547.281 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2039 Year 38 39 1 38.5 38 370 436699.844 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2040 Year 39 40 1 39.5 39 371 383876.344 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2005 Year 4 5 1 4.5 4 61 1311924.250 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2041 Year 40 41 1 40.5 40 373 372721.312 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2042 Year 41 42 1 41.5 41 382 377564.375 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2043 Year 42 43 1 42.5 42 386 365664.250 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2044 Year 43 44 1 43.5 43 388 353987.594 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2045 Year 44 45 1 44.5 44 389 342388.125 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2046 Year 45 46 1 45.5 45 392 330876.625 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2047 Year 46 47 1 46.5 46 409 337556.375 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2048 Year 47 48 1 47.5 47 414 324955.875 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2049 Year 48 49 1 48.5 48 415 312400.969 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2050 Year 49 50 1 49.5 49 418 280398.500 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2006 Year 5 6 1 5.5 5 65 1213268.875 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2051 Year 50 51 1 50.5 50 420 270636.750 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2052 Year 51 52 1 51.5 51 429 260230.234 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2053 Year 52 53 1 52.5 52 433 250146.094 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2054 Year 53 54 1 53.5 53 434 240261.172 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2055 Year 54 55 1 54.5 54 435 230474.688 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2056 Year 55 56 1 55.5 55 438 220809.125 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2057 Year 56 57 1 56.5 56 449 211293.625 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2058 Year 57 58 1 57.5 57 452 201874.750 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2059 Year 58 59 1 58.5 58 453 192542.125 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2060 Year 59 60 1 59.5 59 454 186251.766 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2007 Year 6 7 1 6.5 6 88 1180320.625 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2061 Year 60 61 1 60.5 60 457 177076.109 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2062 Year 61 62 1 61.5 61 468 168137.172 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2063 Year 62 63 1 62.5 62 475 159553.344 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2064 Year 63 64 1 63.5 63 476 151449.359 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2065 Year 64 65 1 64.5 64 477 143912.688 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2066 Year 65 66 1 65.5 65 481 137008.281 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2067 Year 66 67 1 66.5 66 490 130768.383 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2068 Year 67 68 1 67.5 67 494 125131.289 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2069 Year 68 69 1 68.5 68 497 120036.930 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2070 Year 69 70 1 69.5 69 500 126044.266 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2008 Year 7 8 1 7.5 7 104 1126399.000 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2071 Year 70 71 1 70.5 70 502 121063.992 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2072 Year 71 72 1 71.5 71 509 112953.422 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2073 Year 72 73 1 72.5 72 513 104817.023 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2074 Year 73 74 1 73.5 73 515 96714.945 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2075 Year 74 75 1 74.5 74 516 88679.055 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2076 Year 75 76 1 75.5 75 518 80752.039 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2077 Year 76 77 1 76.5 76 526 72982.914 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2078 Year 77 78 1 77.5 77 528 65430.473 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2079 Year 78 79 1 78.5 78 529 58156.363 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2080 Year 79 80 1 79.5 79 531 51223.043 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2009 Year 8 9 1 8.5 8 117 1077033.875 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2081 Year 80 81 1 80.5 80 533 44688.840 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2082 Year 81 82 1 81.5 81 539 38596.305 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2083 Year 82 83 1 82.5 82 542 32968.590 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2084 Year 83 84 1 83.5 83 543 27824.787 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2085 Year 84 85 1 84.5 84 544 23183.484 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2086 Year 85 86 1 85.5 85 547 19059.059 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2087 Year 86 87 1 86.5 86 552 15447.104 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2088 Year 87 88 1 87.5 87 554 12324.689 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2089 Year 88 89 1 88.5 88 556 9665.167 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2090 Year 89 90 1 89.5 89 558 7441.825 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2010 Year 9 10 1 9.5 9 126 1004101.062 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2091 Year 90 91 1 90.5 90 560 5625.794 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2092 Year 91 92 1 91.5 91 566 4177.848 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2093 Year 92 93 1 92.5 92 569 3048.548 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2094 Year 93 94 1 93.5 93 570 2186.498 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 2095 Year 94 95 1 94.5 94 572 1540.306 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 769 Year 95 0 -1 97.5 95+ 574 2993.026 2.144924e+12
50 Bangladesh 2018 Final Both sexes 3 1950 1950.5 700 Year 0 -1 -1 -1.0 Total 999 41397624.000 2.144924e+12

We begin by initializing the sex-specific outputs.


cpl_sex <- NULL
abr_from_cpl_sex <- NULL

sexes <- unique(pop_complete$SexID)
sexes
#> [1] 3

This process should be looped over each SexID, but our data contain only “Both sexes” records (SexID = 3).

sex <- 3
df <- pop_complete %>%
  dplyr::filter(SexID == sex & !is.na(DataValue)) %>%
  mutate(AgeLabel = as.character(AgeLabel)) %>%
  distinct()

If “Final” data status is available, we keep only the final series.

if ("Final" %in% unique(df$DataStatusName)) {
  df <- df %>%
    dplyr::filter(DataStatusName == "Final")
}

We check for multiple series IDs and, for each one, check whether it is a full series, with all age groups represented and an open age group starting above age 60 [3].

ids_series <- unique(df$SeriesID)
n_series <- length(ids_series)


df_out <- NULL
for (i in seq_len(n_series)) {
  df_one <- df %>% dplyr::filter(SeriesID == ids_series[i])

  df_one_std <- df_one[df_one$AgeSpan %in% c(-1, -2, 1), ]
  if (nrow(df_one_std) > 60) {
    df_one$check_full <- dd_series_isfull(df_one_std, abridged = FALSE)
  } else {
    df_one$check_full <- FALSE
  }
  df_out <- rbind(df_out, df_one)
}
df <- df_out
rm(df_out)
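
The package helper dd_series_isfull() is not reproduced in this vignette. As a rough illustration only, the sketch below shows one way such a check could work for a complete (single-year) series. The function is_full_sketch() and its min_open argument are hypothetical, and the rule it applies (ages run from 0 without gaps and are closed by an open age group starting above age 60) is an assumption taken from the description above, not the package's actual implementation.

```r
# Hypothetical stand-in for illustration; NOT the package's dd_series_isfull().
is_full_sketch <- function(age_start, age_span, min_open = 60) {
  single <- sort(unique(age_start[age_span == 1]))     # single-year ages
  open   <- age_start[age_span == -1 & age_start > 0]  # open age groups (Total excluded)
  if (length(single) == 0 || length(open) == 0) {
    return(FALSE)
  }
  no_gaps <- single[1] == 0 && all(diff(single) == 1)  # starts at 0, consecutive ages
  closes  <- any(open == max(single) + 1 & open > min_open)
  no_gaps && closes
}

# Ages 0..94 closed by 95+: full under the assumed rule.
full_ok <- is_full_sketch(c(0:94, 95), c(rep(1, 95), -1))
# Same series with age 50 missing: not full.
full_gap <- is_full_sketch(c(0:49, 51:94, 95), c(rep(1, 94), -1))
```

Under these assumptions the first call returns TRUE and the second returns FALSE.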

If the same series has multiple “Total” records and one of them equals the computed total, we keep that one and drop the rest [4].
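
No code is shown for this step, so here is a minimal base R sketch of the idea on a toy data frame. The data, the df_toy name and the 1e-6 tolerance are all assumptions made for illustration; the package's own routine may differ.

```r
# Toy series with two competing "Total" records (illustrative values only).
df_toy <- data.frame(
  AgeLabel  = c("0", "1", "2+", "Total", "Total"),
  DataValue = c(10, 20, 30, 60, 65),
  stringsAsFactors = FALSE
)

is_total <- df_toy$AgeLabel == "Total"
computed <- sum(df_toy$DataValue[!is_total])
# Compare with a small tolerance rather than exact floating-point equality.
matches <- is_total & abs(df_toy$DataValue - computed) < 1e-6

if (sum(is_total) > 1 && any(matches)) {
  # Keep every age row plus the matching total; unique() collapses exact repeats.
  df_toy <- unique(df_toy[!is_total | matches, ])
}
```

Here the computed total is 60, so the “Total” record of 65 is dropped and the one equal to 60 is kept.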

If there is more than one series and the latest series is full, we keep only that series. If it is not full, we keep the record from the latest data source for each age [5]. We also tidy up the data frame at this point.

if (n_series > 1) {
  latest_source_year <- max(df$DataSourceYear)
  check_latest_full <- unique(df$check_full[df$DataSourceYear == latest_source_year])
  if (any(check_latest_full)) {
    df <- df[df$DataSourceYear == latest_source_year, ]
  } else {
    df <- df %>% dd_latest_source_year()
  }
}

df <- df %>%
  select(DataSourceYear, AgeStart, AgeEnd, AgeLabel, AgeSpan, DataValue) %>%
  distinct()
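
To make the "latest data source record for each age" rule concrete, the following base R sketch applies it to a toy data frame. It is an assumption about what dd_latest_source_year() does, based only on the sentence above; the toy data and variable names are hypothetical.

```r
# Toy records: age "1" is reported by both the 2010 and the 2018 source.
df_toy <- data.frame(
  DataSourceYear = c(2010, 2010, 2018, 2018),
  AgeLabel       = c("0", "1", "1", "2"),
  DataValue      = c(100, 90, 95, 80),
  stringsAsFactors = FALSE
)

# For each age label, find the most recent source year ...
latest_year <- ave(df_toy$DataSourceYear, df_toy$AgeLabel, FUN = max)
# ... and keep only the rows that come from that year.
latest <- df_toy[df_toy$DataSourceYear == latest_year, ]
```

Age “0” is only available from the 2010 source and is kept, while age “1” takes its value (95) from the 2018 source.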

If there are still duplicate age groups, we keep the last one in current sort order.

df <- df %>%
  mutate(sorting = row_number()) %>%
  group_by(AgeLabel) %>%
  mutate(keeping = max(sorting)) %>%
  ungroup() %>%
  dplyr::filter(sorting == keeping) %>%
  select(-sorting, -keeping)
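
For readers who prefer base R, the same "keep the last duplicate" rule can be sketched with duplicated() and its fromLast argument. The toy data frame below is made up for illustration.

```r
# Toy data: age "1" appears twice; the second occurrence should win.
df_toy <- data.frame(
  AgeLabel  = c("0", "1", "1", "2"),
  DataValue = c(100, 90, 95, 80),
  stringsAsFactors = FALSE
)

# duplicated(..., fromLast = TRUE) flags every occurrence except the last one,
# so negating it keeps the last row for each AgeLabel in the current order.
deduped <- df_toy[!duplicated(df_toy$AgeLabel, fromLast = TRUE), ]
```

The result keeps one row per age label, with the value 95 for age “1”.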

If there is no record for unknown age, we add one with a value of zero. The unknown-age value is also set to zero if the difference between the reported total and the sum over age (the computed total) equals the unknown-age count [6].

if (!("Unknown" %in% df$AgeLabel)) {
  df <- df %>%
    bind_rows(data.frame(
      AgeStart = -2,
      AgeEnd = -2,
      AgeSpan = -2,
      AgeLabel = "Unknown",
      DataSourceYear = NA,
      DataValue = 0
    ))
}

df <- df %>% dd_drop_unknowns()

If the “Total” value is less than the sum over age, we discard it.

total_reported <- df$DataValue[df$AgeLabel == "Total"]
total_computed <- sum(df$DataValue[df$AgeLabel != "Total"])

if (!is_empty(total_reported) && !is_empty(total_computed)) {
  if (total_reported < total_computed) {
    df <- df %>%
      dplyr::filter(AgeLabel != "Total")
  }
}
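
A small worked example may help here. The toy values below are invented: the ages sum to 60 but the reported total is only 55, so the total is discarded.

```r
# Toy series whose reported total (55) is below the sum over age (60).
df_toy <- data.frame(
  AgeLabel  = c("0", "1", "2+", "Total"),
  DataValue = c(10, 20, 30, 55),
  stringsAsFactors = FALSE
)

total_reported_toy <- df_toy$DataValue[df_toy$AgeLabel == "Total"]
total_computed_toy <- sum(df_toy$DataValue[df_toy$AgeLabel != "Total"])

# Discard the reported total when it cannot cover the age-specific counts.
if (length(total_reported_toy) == 1 && total_reported_toy < total_computed_toy) {
  df_toy <- df_toy[df_toy$AgeLabel != "Total", ]
}
```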

We then identify the start age of the open age group needed to close the series [7] and flag whether this open age group exists in the series. Records for open age groups that do not close the series are then dropped.


oag_start <- dd_oag_agestart(df, multiple5 = FALSE)
oag_check <- paste0(oag_start, "+") %in% df$AgeLabel

if (!all(df$AgeLabel %in% c("Total", "Unknown"))) {
  df <- df %>%
    dplyr::filter(!(AgeStart > 0 & AgeSpan == -1 & AgeStart != oag_start)) %>%
    arrange(AgeStart, AgeSpan)
}
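
The filter above can be illustrated on a toy series. In this sketch the closing open age group is assumed to start one year after the highest single-year age; that rule is a simplification for illustration and is not necessarily how dd_oag_agestart() determines it.

```r
# Toy series with single ages 0-2 and two open age groups, "2+" and "3+".
df_toy <- data.frame(
  AgeStart  = c(0, 1, 2, 2, 3, 0),
  AgeSpan   = c(1, 1, 1, -1, -1, -1),
  AgeLabel  = c("0", "1", "2", "2+", "3+", "Total"),
  DataValue = c(10, 20, 15, 40, 25, 70),
  stringsAsFactors = FALSE
)

# Assumed rule: the series is closed by the open group starting right after
# the last single-year age.
oag_start_toy <- max(df_toy$AgeStart[df_toy$AgeSpan == 1]) + 1

# Same condition as the filter in the text: drop open age groups (AgeSpan == -1,
# AgeStart > 0) that do not start at the closing age. "Total" has AgeStart 0
# and is therefore kept.
drop <- df_toy$AgeStart > 0 & df_toy$AgeSpan == -1 & df_toy$AgeStart != oag_start_toy
closed <- df_toy[!drop, ]
```

Only “2+” is removed, because “3+” is the group that closes the single-year ages 0 to 2.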

An AgeSort field that identifies the standard age groups is then added to the data [8].

df <- dd_age_standard(df, abridged = FALSE) %>%
  dplyr::filter(!is.na(DataValue))

We also check that the data is in fact a complete series starting at age zero and without gaps …

check_cpl <- df %>%
  dplyr::filter(AgeSpan == 1) %>%
  summarise(
    minAge = min(AgeStart),
    maxAge = max(AgeStart),
    nAge = length(unique(AgeStart))
  )
check_cpl <- check_cpl$minAge == 0 & check_cpl$nAge == check_cpl$maxAge + 1
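
The logic of this check is that a series starting at age 0 with no gaps must contain exactly maxAge + 1 distinct single-year ages. The helper below, is_complete_sketch(), is a hypothetical base R restatement of that condition for illustration.

```r
# Hypothetical helper restating the condition minAge == 0 & nAge == maxAge + 1.
is_complete_sketch <- function(age_start) {
  ages <- unique(age_start)
  length(ages) > 0 && min(ages) == 0 && length(ages) == max(ages) + 1
}

complete_ok     <- is_complete_sketch(0:94)            # starts at 0, no gaps
complete_gap    <- is_complete_sketch(c(0:10, 12:94))  # age 11 is missing
complete_nozero <- is_complete_sketch(1:94)            # does not start at 0
```

Only the first series passes; the other two fail because of a gap and a missing age zero, respectively.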

… and compute all possible open age groups given the available input [9].

if (check_cpl == TRUE) {
  df_oag <- dd_oag_compute(df, age_span = 1)

  df <- df %>%
    bind_rows(df_oag[!(df_oag$AgeLabel %in% df$AgeLabel) &
      df_oag$AgeStart == oag_start, ]) %>%
    arrange(AgeSort)
}

if (!("AgeSort" %in% names(df))) {
  df <- dd_age_standard(df, abridged = FALSE) %>%
    dplyr::filter(!is.na(DataValue))
  check_cpl <- FALSE
}
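
To show what "all possible open age groups" means, the sketch below computes them with a reverse cumulative sum on toy counts. It is an illustration of the idea only and does not reproduce dd_oag_compute(); the data and variable names are assumptions.

```r
# Toy counts for single ages 0-4, followed by an existing open group 5+.
age_start <- 0:5
counts    <- c(10, 20, 15, 5, 2, 3)  # the last value belongs to "5+"

# The open age group starting at age x is the sum of all counts from x upward,
# which is a cumulative sum taken from the oldest age backward.
oag_toy <- data.frame(
  AgeStart  = age_start,
  AgeLabel  = paste0(age_start, "+"),
  AgeSpan   = -1,
  DataValue = rev(cumsum(rev(counts))),
  stringsAsFactors = FALSE
)
```

In this toy case “0+” equals 55 (the whole population), “3+” equals 10, and “5+” is the original open group of 3.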

We check again whether any open age group exists and …

oag_start <- df %>% dd_oag_agestart()
oag_check <- paste0(oag_start, "+") %in% df$AgeLabel

… if total is missing and series is otherwise complete, we compute the total.

if (!("Total" %in% df$AgeLabel) && oag_check == TRUE) {
  df <- df %>%
    bind_rows(data.frame(
      AgeStart = 0,
      AgeEnd = -1,
      AgeLabel = "Total",
      AgeSpan = -1,
      AgeSort = 184,
      DataSourceYear = NA,
      DataValue = sum(df$DataValue[df$AgeSpan == 1]) +
        df$DataValue[df$AgeSpan == -1 & df$AgeStart == oag_start] +
        df$DataValue[df$AgeLabel == "Unknown"]
    ))
}
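
As a worked example of this arithmetic, the toy series below has single ages summing to 45, an open age group of 25 and 5 persons of unknown age, giving a total of 75. All values and names are invented for illustration.

```r
# Toy series without a "Total" record.
df_toy <- data.frame(
  AgeStart  = c(0, 1, 2, 3, -2),
  AgeSpan   = c(1, 1, 1, -1, -2),
  AgeLabel  = c("0", "1", "2", "3+", "Unknown"),
  DataValue = c(10, 20, 15, 25, 5),
  stringsAsFactors = FALSE
)
oag_start_toy <- 3

# Total = single-year ages + closing open age group + unknown age.
total_toy <- sum(df_toy$DataValue[df_toy$AgeSpan == 1]) +
  df_toy$DataValue[df_toy$AgeSpan == -1 & df_toy$AgeStart == oag_start_toy] +
  df_toy$DataValue[df_toy$AgeLabel == "Unknown"]
```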

A note is then added to flag any missing data.

df$note <- NA
if (check_cpl == FALSE || oag_check == FALSE) {
  df$note <- "The complete series is missing data for one or more age groups."
}

df$SexID <- sex

The process is repeated for each SexID and, finally, a series field is added to the data indicating that this is a complete series.

cpl_sex <- rbind(cpl_sex, df)

if (!is.null(cpl_sex)) {
  cpl_sex <- cpl_sex %>%
    mutate(
      abridged = FALSE,
      complete = TRUE,
      series = "complete"
    )
}