Poll averaging for the Voice referendum

I average public polls estimating voting intentions for the Voice referendum. My analysis produces estimates of the daily level of and trend in the proportion supporting “Yes”, as well as the distinctiveness of each pollster’s estimates.

Author: Simon Jackman
Affiliation: University of Sydney
Published: 12:28 AM, 15 October 2023

Summary

  • “Yes” is currently estimated to have 42.1% support (\pm 1.6 percentage points).

  • Support for “Yes” fell by about a quarter of a percentage point per week over the first quarter of the year (§5), accelerating to a decline of almost three-quarters of a percentage point per week around mid-June, before easing back to a decline of about four-tenths of a percentage point per week in the closing weeks of the campaign (§7).

  • Caveats abound when interpreting the poll average as a forecast of the referendum outcome. Poll averages have missed outcomes by up to two to three percentage points in recent Australian federal elections. Further, we have no historical record against which to calibrate referendum polling.

  • Given the model’s assumptions and the recent set of polls entering the model, the poll average for “Yes” is forecast to be 42.1% \pm 1.6 on the day of the referendum, 14 October 2023. Interpret this forecast of the poll average with care, given the caveats noted above.

  • There is considerable cross-pollster dispersion. Recent polls from Newspoll-Pyxis and Redbridge report lower levels of “Yes” voting intention than other pollsters. Essential and Roy Morgan on-line produce higher levels of “Yes” support relative to other pollsters. A poll from The Australia Institute is a clear outlier with an estimate for Yes about 10 percentage points higher than the poll average at the time it was fielded. (§8).

1 Polls are like 1940s radar

Consider an imprecise and possibly biased sensor intermittently reporting the position of a moving target. This is an apt analogy for public opinion polls commissioned by media organisations.

In this analogy:

  • Sensor imprecision (or “noise” or unreliability) corresponds to a poll’s sampling error. In the best case of simple random sampling the magnitude of sampling error decreases in the square root of the poll’s sample size.1

  • Bias is the expected or average value of sensor noise. Every pollster’s procedures induce at least some bias, ideally negligible in magnitude, but unknown a priori.2

  • While political campaigns sometimes use high frequency “tracking” polls, commercial polls reported in the media have fixed, relatively short field periods, as if a sensor was on-line intermittently or directed away from the target of interest for much of its operations.

  • 1 Recall that the standard error of a proportion y \in [0,1], estimated from n independent binary observations, is \sqrt{y(1-y)/n}. In practice, the use of post-stratification weights to reduce non-response bias means that the “effective” n is smaller than the poll’s stated sample size.

  • 2 Sources of bias include (a) coverage errors in sampling and non-response bias not remedied by weighting, (b) item non-response, (c) question-wording and ordering effects on the questionnaire, (d) in-house procedures for data processing, handling missing data and the reporting of results.

  • Combining these imprecise, possibly biased, temporally fragmented signals in a statistical framework lets us estimate the position and trajectory of public opinion.
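  As a quick illustration of the sampling-error arithmetic in footnote 1 (the numbers here are purely illustrative, not from any particular poll):

    Code
    ## standard error of a proportion under simple random sampling
    y <- 0.43    # hypothetical "Yes" proportion
    n <- 1000    # hypothetical sample size
    sqrt(y * (1 - y) / n)       # ~0.0157, i.e. about 1.6 percentage points
    ## with post-stratification weighting, an effective n of (say) 700
    ## inflates the sampling error to about 1.9 points
    sqrt(y * (1 - y) / 700)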

    2 Poll averaging model

    We use state space models, with a long and distinguished lineage in signal processing, where this class of model is sometimes referred to as a “Kalman filter”.3 Elaborations of these models are the foundations of robotics, autonomous systems, self-driving vehicles, and, in this case, the rigorous measurement and analysis of public opinion.4

  • 3 Applications include target tracking in defence and aerospace. Rudolf Kalman’s seminal work on the topic at NASA’s Ames Research Center was in support of the first lunar missions.

  • 4 See Chapter 9 of my book, Bayesian Analysis for the Social Sciences, Wiley, 2009.

  • In brief:

    • poll p yields a proportion “Yes”, y_p \in [0,1], fielded on day(s) t(p) by pollster j(p) with sample size n_p.

    • we seek estimates of an underlying state vector (\xi_t, \zeta_t)', t = 1, \ldots, T, its elements corresponding to the level of and trend in public opinion, respectively.

    • over short (“local”) time scales, the level of public opinion \xi_t follows a linear trend with random Gaussian innovations.

    • the trend component of the state vector, \zeta_t, is interpretable as a “rate of change” parameter and evolves via a Gaussian random walk.

    • time t is discretised to days.

    • the model includes time-constant biases \delta specific to each pollster, indexed by j \in 1, \ldots, J.

    • pollster biases sum to zero across the J pollsters, a normalisation required since there is no unbiased, reference measurement against which we can calibrate the polls.56

  • 5 Any “invariant to translation” normalisation of the \delta_j is sufficient to resolve this indeterminacy in the model. The particular normalisation that \sum_j \delta_j = 0 corresponds to the assumption that the polls are “collectively” unbiased. This assumption has been incorrect in recent Australian federal elections, resulting in poll averages that were wrong by as much as two to three percentage points in 2019. We simply have no way of knowing if the “sum to zero” normalisation is reasonable until the referendum is held and so caveats abound.

  • 6 Contrast poll-averaging after an election, when electoral authorities have published final, accurate tallies of votes. These results supply an exact reference point \xi_T from which we can reconstruct the trajectory of public opinion and calibrate polls’ biases, subject to an assumed “law of motion” for public opinion. This is akin to forensically reconstructing missile trajectories from landing points.

  • Operationalised mathematically: \begin{align} y_p & = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \xi_{t(p)} \\[3pt] \zeta_{t(p)} \end{pmatrix} + \delta_{j(p)} + \varepsilon_p \\[12pt] \varepsilon_p & \sim N \left( 0, \, \frac{y_p \cdot (1-y_p)}{n_p} \right) \\[12pt] \begin{pmatrix} \xi_{t+1} \\ \zeta_{t+1} \end{pmatrix} & = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{pmatrix} \xi_t \\ \zeta_t \end{pmatrix} + \begin{pmatrix} v_t \\ w_t \end{pmatrix} \\[12pt] v_t & \sim N(0, \sigma^2_v) \\ w_t & \sim N(0, \sigma^2_w) \end{align} We also adopt the following priors, initialisations and normalisations:

    \begin{align} \xi_1 & \sim N(.5, .10^2) \\ \zeta_1 & = 0 \\ \sigma_v & \sim \text{Uniform}(0, .01) \\ \sigma_w & \sim \text{Uniform}(0, .001) \\ \delta_j & \sim N(0, {.075}^2) \\ \sum_{j=1}^J \delta_j & = 0 \end{align}

    Implementation is via Markov chain Monte Carlo (MCMC) in Stan, producing arbitrarily many samples from the posterior density of the parameters (\xi_1, \ldots, \xi_T, \zeta_2, \ldots, \zeta_T, \delta_1, \ldots, \delta_J, \sigma_v, \sigma_w)' conditional on the poll estimates y, their sample sizes n, their field dates and the model. These samples are then summarised so as to produce graphical and tabular summaries; see below.
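    As a concrete illustration of the model’s data-generating process, a minimal R simulation follows; all numbers are made up for illustration and are not estimates from the model.

    Code
    ## simulate the local linear trend + pollster bias data-generating process
    set.seed(1)
    T_days <- 200; J <- 5
    sigma_v <- 0.002; sigma_w <- 0.0002
    xi <- zeta <- numeric(T_days)
    xi[1] <- 0.55; zeta[1] <- 0
    for (t in 2:T_days) {
      xi[t]   <- xi[t - 1] + zeta[t - 1] + rnorm(1, 0, sigma_v)  ## level transition
      zeta[t] <- zeta[t - 1] + rnorm(1, 0, sigma_w)              ## trend random walk
    }
    delta <- rnorm(J, 0, 0.02); delta <- delta - mean(delta)     ## sum-to-zero pollster biases
    
    ## 40 hypothetical polls on random days, from random pollsters, n = 1200
    polls <- data.frame(t = sample(T_days, 40, replace = TRUE),
                        j = sample(J, 40, replace = TRUE),
                        n = 1200)
    polls$y <- with(polls,
                    rnorm(nrow(polls),
                          mean = xi[t] + delta[j],                 ## measurement equation
                          sd   = sqrt(xi[t] * (1 - xi[t]) / n)))
    head(polls)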

    3 Data

    Data is compiled by Casey Briggs (Australian Broadcasting Corporation).

    • Yes proportion y_p computed as Yes/(Yes + No), setting aside undecideds/DK. The sample size n_p is adjusted as necessary, to correspond to the count of Yes and No responses in poll p.

    • We “telescope out” each single poll to supply an observation for each day the poll is in the field, allocating the total sample size evenly to each field day.

    • We use an “effective” n if reported by the pollster, rather than the reported sample size; see below.

    Code
    ## assumed setup: load packages used below (they may also be loaded in an
    ## earlier, unshown chunk)
    library(tidyverse)
    library(lubridate)
    library(here)
    
    #d <- read_csv(here("data/Voice_PollingData.csv"))  
    
    ## swap the day and month components of a date
    ## (repairs dates parsed with day and month transposed)
    swap_day_month <- function(dt){
      d <- day(dt)
      m <- month(dt)
      y <- year(dt)
      out <- as.Date(paste(m, d, y, sep = "/"),
                     format = "%d/%m/%Y")
      return(out)
    }
    
    library(httr)
    url <- "https://raw.githubusercontent.com/caseybriggs/polling-data/main/Voice_PollingData.csv"
    response <- GET(
      url,
      add_headers(
        Authorization = paste("Token", Sys.getenv("GITHUB_PAT"))
        )
      )
    
    d <- read.csv(textConnection(content(response,"text"))) %>%
      janitor::remove_empty(c("cols","rows"))
    
    #d <- read.csv("~/Downloads/Voice_PollingData.csv") %>% 
    #  janitor::remove_empty(c("cols","rows"))
    
    d <- d %>% 
      mutate(
        across(ends_with("_prop"),as.numeric),
        across(ends_with("SampleSize"),as.numeric)
      ) %>% 
      mutate(
        
        startDate = as.Date(FieldedStart, format = "%d/%m/%Y"),
    
        startDate = if_else(
          is.na(startDate),
          swap_day_month(as.Date("1900-01-01") + as.integer(FieldedStart) - 2),
          startDate
          ),
        
        endDate = as.Date(FieldedEnd, format = "%d/%m/%Y"),
    
        endDate = if_else(
          is.na(endDate),
          swap_day_month(as.Date("1900-01-01") + as.integer(FieldedEnd) - 2),
          endDate
          )
      ) 
      
    ## add in last Newspoll 
    lastPoll <- data.frame(
      Pollster = "Newspoll-Pyxis",
      startDate = as.Date("2021-10-04"),
      endDate = as.Date("2021-10-12"),
      SampleSize = 2638,
      EffectiveSampleSize = 1575,
      Yes_prop = 37,
      No_prop = 57,
      Unsure_prop = 6
    )
    
    if((d %>% tail(1) %>% pull(Pollster)) != "Newspoll-Pyxis"){
      d <- d %>% 
        bind_rows(lastPoll)
    }
    
    thePollsters <- sort(unique(d$Pollster))
    startDate <- min(d$startDate) - 4
    endDate <- as.Date("2023-10-14")
    dateSeq <- seq.Date(startDate,endDate,by = "day")
    
    ## sample size deflator, industry-wide
    deflator_IndustryWide <- median(d$EffectiveSampleSize/d$SampleSize,
                                    na.rm = TRUE)
    
    ## compute an implied effective n where it isn't provided
    d <- d %>% 
      mutate(
        MarginOfError = as.numeric(MarginOfError),
        ImpliedEffectiveSampleSize = 1/((MarginOfError/100)^2)
        )
    
    ## adjustments for effective n 
    ## use effective sample size if given
    ## use implied effective sample size if given and
    ## is smaller than 90% of nominal sample size
    ## otherwise deflator is undefined for that poll
    ## and makes no contribution to pollster median deflator
    d <- d %>% 
      mutate(
        z = if_else(
          is.na(EffectiveSampleSize),
          ImpliedEffectiveSampleSize,
          EffectiveSampleSize),
        z = if_else(
          z < .9*SampleSize & is.na(EffectiveSampleSize),
          z,
          EffectiveSampleSize
          )
        )
      
    pollster_deflators <- d %>% 
      group_by(Pollster) %>% 
      summarise(
        deflator = median(z/SampleSize,
                          na.rm = TRUE)
        ) %>% 
      ungroup()
    
    d <- d %>% 
      left_join(pollster_deflators,by = "Pollster")
    
    makeData_fromPoll <- function(obj){
      dseq <- seq.Date(from = obj$startDate,
                       to = obj$endDate,"day")
      ndays <- length(dseq)
      
      out <- tibble(
        day = dseq,
        i = match(dseq,dateSeq),
        mid_date = trunc(median(dseq)),
        Unsure_prop = obj$Unsure_prop,
        y = if_else(is.na(Unsure_prop),
                    obj$Yes_prop/100,
                    obj$Yes_prop/(obj$Yes_prop + obj$No_prop)
                    ),
        undecided = Unsure_prop/100,
        n = if_else(!is.na(obj$z),
                    obj$z,
                    obj$deflator*obj$SampleSize)
      ) %>%
        mutate(
          n_eff_model = if_else(
            !is.na(obj$z),
            obj$z,
            obj$deflator*obj$SampleSize
            ),
          n_eff_model = if_else(
            is.na(n_eff_model),
            deflator_IndustryWide*obj$SampleSize,
            n_eff_model
            ),
          undecided = if_else(is.na(undecided),0,undecided),
          n_eff_model = n_eff_model*(1 - undecided),
          n = n_eff_model/ndays,
          v = y*(1 - y)/n,
          sigma = sqrt(v)
          ) %>%
        select(-Unsure_prop)
      
      return(out)
    }  
      
    d_long <- d %>% 
      mutate(j = match(Pollster,thePollsters)) %>% 
      mutate(i_poll = 1:n()) %>% 
      group_nest(i_poll) %>% 
      mutate(g = map(.x = data, .f = ~makeData_fromPoll(.x))) %>% 
      ungroup() %>% 
      unnest(c(data,g)) 
    
    d_output_plot <- d_long %>% 
      group_by(i_poll) %>% 
      filter(day == mid_date) %>% 
      ungroup()

    3.1 Polls in the analysis

    3.2 Sample size adjustments

    Polls enter the averaging model with adjustments to their nominal sample size n, down-weighting the precision of each poll as a function of how much weighting appears to have been used by the pollster. This also helps guard against the poll average being unduly influenced by polls with apparently large sample sizes.

    Further, since we work with just “Yes” vs “No” proportions, we scale down the sample sizes by any reported “Undecided” proportion.

    We then compute the “effective” sample size n_{\text{eff}} for each poll as follows:

    • did the poll report an “effective sample size”, n_{\text{eff}}? If so, set n = n_{\text{eff}} in the computation of the poll’s variance y_p(1-y_p)/n.

    • did the poll report a margin of error, m? Does this margin of error imply a sample size smaller than either the nominal or effective sample size?7 If so, use this implied, effective sample size.

    • else, if neither m nor n_{\text{eff}} is reported, deflate the poll’s nominal sample size by the median value of n_{\text{eff}}/n observed for the pollster in question.

    • if none of the previous adjustments can be computed, then apply the median n_{\text{eff}}/n computed over the entire set of polls.

  • 7 If, per convention, the margin of error m is equal to two standard errors around a sample estimate of p = .5, then m = 2 \sqrt{p(1-p)/n_{\text{eff}}} implies n_{\text{eff}} = m^{-2}.
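    A worked example of the margin-of-error and undecided adjustments, using made-up numbers rather than any poll in the dataset:

    Code
    ## hypothetical poll: nominal n of 1,500, no reported effective n,
    ## margin of error of 3 percentage points, 8% undecided
    n_nominal <- 1500
    moe       <- 3 / 100                   ## margin of error as a proportion
    n_implied <- 1 / moe^2                 ## ~1111, per footnote 7
    ## the implied effective n is below 90% of the nominal n, so use it
    n_eff <- if (n_implied < 0.9 * n_nominal) n_implied else n_nominal
    ## scale down by the undecided share, since we model Yes/(Yes + No)
    n_eff_model <- n_eff * (1 - 0.08)      ## ~1022
    n_eff_model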

  • Median ratio of effective sample size (reported or implied) to nominal sample size, by pollster:

    Code
    library(DT)   ## datatable(), formatRound(), JS()
    datatable(pollster_deflators, rownames = FALSE) %>% 
      formatRound(2, digits = 3)

    Nominal and effective sample sizes (reported and/or implied), and the resulting n_{\text{eff}}-per-sample-day computed for each poll in the analysis, in the right-most column of this table:

    Code
    datatable(
      d_long %>%
      group_by(i_poll) %>% 
      slice(1) %>% 
        mutate(
          startDate = strftime(startDate,"%y-%m-%d"),
          endDate = strftime(endDate,"%y-%m-%d")
          ) %>% 
      select(Pollster,
             startDate,
             endDate,
             SampleSize,
             EffectiveSampleSize,
             undecided,
             MarginOfError,
             ImpliedEffectiveSampleSize,
             n_eff_model,
             n) %>% 
      ungroup() %>% 
        select(-i_poll),
      rownames = FALSE,
      colnames = c("Pollster","start","end","n","n_eff","Undecided",
                   "moe",
                   "moe n_eff","n_eff_model","n_eff/day"),
      options = list(
        initComplete = JS(
          "function(settings, json) {",
          "$(this.api().table().header()).css({'font-size': '67%'});",
          "$(this.api().table().body()).css({'font-size': '67%'});",
          "}"
          )
        )
      ) %>% 
      formatRound(c("SampleSize",
                    "EffectiveSampleSize",
                    "ImpliedEffectiveSampleSize",
                    "n_eff_model"), 
                  mark = ",",
                  digits = 0) %>% 
      formatRound("MarginOfError",digits = 2) %>% 
      formatRound("n", digits = 1) %>% 
      formatRound(8, digits = 2)

    4 Fit model in Stan

    Define run constants, bundle up data and parameter initialisation:

    Code
    library(rstan)
    debug <- FALSE
    ncores <- parallel::detectCores() - 2
    run_stan_flag <- TRUE
    test <- FALSE
    testScale <- 2
    niter <- 2500
    burnin <- 500
    if (test) {
      niter <- as.integer(round(niter/testScale))
      burnin <- as.integer(round(burnin/testScale))
    }
    
    forStan <- as.list(
      d_long %>%
        select(i, j, y, n, sigma) 
      )
    
    forStan$tau_omega_upper <- .01
    forStan$tau_zeta_upper  <- .001
    forStan$tau_xi_one_init <- .10
    forStan$tau_zeta_one_init <- .0005
    forStan$NOBS <- length(forStan$y)
    forStan$NPOLLSTERS <- length(thePollsters)
    forStan$NDAYS <- length(dateSeq)
      
    init_function <- function(chain_id = 1,
                              data = forStan){
      m <- lm(y ~ i,data = forStan)
      xi <- predict(m,
                    newdata = data.frame(i = 1:forStan$NDAYS))
      xi <- xi + runif(n = length(xi),
                       min = -.005,
                       max =  .005)
      xi_init <- cbind(xi,0)
      tmpData <- tibble(y = forStan$y,
                        j = forStan$j,
                        xi = xi[forStan$i])
      delta_raw <- tmpData %>% 
        group_by(j) %>% 
        summarise(e = mean(y - xi)) %>% 
        ungroup() %>% 
        arrange(j) %>% 
        pull(e)
      delta_raw <- delta_raw + runif(min = -.01, max = .01, n = length(delta_raw))
      
      print(xi_init)
      
      list(
        xi = xi_init,
        delta_raw = delta_raw#,
        #tau_omega  = .001,
        #tau_zeta = .0005
      )
      
    }
    
    run_stan_mtime <- file.mtime(here("data/voice_stan_output.rds")) 
    dtime <- difftime(Sys.time(),run_stan_mtime,units = "days")
    run_stan <- is.na(dtime) | dtime > 4
    Code
    nchains <- ncores
    nchains <- 4
    m <- stan(file = here("sum_to_zero_alt.stan"),
              data = forStan,
              pars = c("xi","delta","xi_star","d0","tau_omega","tau_zeta"),
              init = if (nchains > 1) {
                     lapply(1:nchains,
                            function(id){
                              init_function(chain_id = id,
                                            data = forStan)
                            })
                   } else {
                     init_function
                   },
                iter = niter,
                warmup = burnin,
                thin = if_else(test,1,5),
                cores = ncores,
                chains = nchains,
                control = list(adapt_delta = 0.98,
                             max_treedepth = 14),
                verbose = TRUE)
    
    saveRDS(m,
            file = here("data/voice_stan_output.rds")
            )

    4.1 Extract results

    A little fancy footwork here: level is in xi_star but trend is in xi[,2].

    Code
    m <- readRDS(file = here("data/voice_stan_output.rds"))
    s <- summary(m, pars = c("xi","xi_star"))
    xi <- s$summary %>%
      as_tibble() %>%
      mutate(param = rownames(s$summary),
             j = as.integer(str_extract(
               param,
               pattern = "\\[([0-9]{1,3})",
               group = 1
             ))) %>%
      ## rename the credible-interval bounds used in the plots below
      rename(lo = `2.5%`, up = `97.5%`) %>%
      ## this next line tosses out xi[,1] (unnormalized level estimates) 
      filter(!str_detect(pattern = fixed(",1]"),
                         string = param)) %>%
      ## if the parameter name has "[,2]" then it's a trend parameter,
      ## otherwise it's a level parameter
      mutate(type = if_else(
        str_detect(pattern = fixed(",2]"),
                   string = param),
        "trend",
        "level")
        )
    
    xi <- xi %>%
      mutate(
        date = rep(
          seq.Date(from = startDate,
                   by = "day",
                   length = max(j)),
          times = 2)
        )

    5 Support for “Yes” over time

    With the referendum drawing closer (just 0 days away as of the time of writing), I project the current trajectory of the poll average out to 14 October 2023:

    • this projection is not a forecast of the referendum outcome, but of the poll average;

    • the projection is based on the trend in recent polls, which may not hold to 14 October;

    • the 95% credible interval around the forecast poll average gets wider as we forecast further into the future, reflecting uncertainty in day-to-day movements in sentiment and uncertainty in the magnitude of the trend, even under the assumptions encoded in the model.

    With these qualifications and caveats in mind, the model forecast of the poll average on October 14 is 42.1% \pm 1.6.

    Also included on the graph are two horizontal reference lines, one at 50% and another at 45.1%, the national “Yes” vote in the Republic referendum in 1999. A minimal static sketch of this chart follows the notes below.

    • roll over data points to reveal details about specific polls.

    • proportion “Yes” plotted here is Yes/(Yes+No), setting aside Undecideds.

    • shaded area corresponds to a 95% credible interval around the estimated trajectory.

    • note how uncertainty (literally) balloons in the absence of polls.

    • see below for discussion of the outlier Australia Institute poll and estimates of relative pollster biases.
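    A minimal static sketch of the chart described above (the published version is interactive), assuming the xi tibble from §4.1, with its lo/up credible-interval columns, and the per-poll midpoints in d_output_plot from §3:

    Code
    ## static sketch of the "Yes" poll average with reference lines
    g_level <- ggplot(xi %>% filter(type == "level"),
                      aes(x = date,
                          y = mean * 100,
                          ymin = lo * 100,
                          ymax = up * 100)) +
      geom_hline(yintercept = 50, linetype = "dashed") +
      geom_hline(yintercept = 45.1, linetype = "dotted") +  ## 1999 Republic referendum "Yes"
      geom_ribbon(fill = "orange", alpha = .15) +
      geom_line(linewidth = 1.5) +
      geom_point(data = d_output_plot,
                 aes(x = mid_date, y = y * 100),
                 inherit.aes = FALSE,
                 size = 2, alpha = .5) +
      scale_x_date("",
                   date_labels = "%b %y",
                   date_breaks = "month",
                   minor_breaks = NULL,
                   expand = c(0, 0)) +
      scale_y_continuous("", minor_breaks = NULL) +
      labs(title = "Poll average, Yes on Voice to Parliament",
           subtitle = "Percentage Yes of Yes + No responses; shading is a 95% credible interval",
           caption = "Data compiled by Casey Briggs (ABC).") +
      theme_minimal()
    
    print(g_level)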

    6 Poll average for Yes, last 90 days of the campaign

    7 Trend

    Code
    library(ggrepel)
    g_trend <- ggplot(xi %>%
                        filter(type == "trend") %>%
                        filter(date <= Sys.Date()),
                aes(
                  x = date,
                  y = mean * 700,
                  ymin = lo * 700,
                  ymax = up * 700
                  )
                ) +
      geom_hline(yintercept = 0) +
      geom_ribbon(fill = "orange",
                  alpha = .15) +
      geom_line(linewidth = 2) +
      scale_x_date(
        "",
        date_labels = "%b %y",
        date_breaks = "month",
        minor_breaks = NULL,
        expand = c(0, 0)
      ) + 
      scale_y_continuous("",
                         breaks = seq(-2.25, .75, by = .25),
                         minor_breaks = NULL) + 
      labs(title = "Trend in poll average",
           subtitle = "Weekly percentage point change in Yes on Voice to Parliament",
           caption = "Data compiled by Casey Briggs (ABC); analysis and computation by Prof Simon Jackman.") +
      theme_minimal(base_family = "SF Pro Text")
    
    print(g_trend)

    • the graph above reports \zeta_t times 700, to make the trend estimate interpretable as a weekly rate of change in percentage points (\zeta_t is a daily change on the proportion scale; multiplying by 7 days and by 100 converts it to percentage points per week).

    • positive/negative values of the trend indicate an upward/downward slope in the level of “Yes”.

    • the trend estimates indicate that the “Yes” percentage has been falling by about a quarter of a percentage point a week since Christmas 2022.

    • this rate of decline increased through the first half of 2023, reaching almost three-quarters of a percentage point per week by mid-June 2023.

    8 Pollster bias

    The chart below shows estimates of each pollster’s \delta_j, or “relative bias”:

    • the point shows the estimated \delta_j, with bars extending to cover 50% and 95% credible intervals.

    • by construction, total or average cross-pollster bias is zero.

    • positive \delta_j corresponds to over-estimates of “Yes” relative to other pollsters.

    • conversely, negative \delta_j corresponds to under-estimates of “Yes” relative to other pollsters.

    There is considerable cross-pollster dispersion evident in the stream of polling, perhaps even ticking up in recent weeks, with Redbridge and Newspoll-Pyxis producing lower “Yes” estimates relative to other pollsters. This is not to say these estimates are wrong or biased. In fact, the referendum results could reveal these pollsters to be the most accurate, or not; we will know by mid-evening, Saturday night.

    Code
    delta <- extract(m,pars = "delta")$delta * 100
    colnames(delta) <- thePollsters
    delta <- delta %>%
      as_tibble() %>%
      pivot_longer(cols = everything(),
                   names_to = "label",
                   values_to = "value") %>% 
      mutate(label = fct_reorder(label,value))
    
    xlims <- max(abs(quantile(delta$value,c(.0025,.9975))))
    xlims <- c(-1,1)*xlims
    
    library(ggdist)
    ggplot(delta,
           aes(y = label,
               x = value)) + 
      geom_vline(xintercept = 0) +
      stat_gradientinterval(fill = "skyblue",point_size = 4)  +
      scale_y_discrete("") + 
      scale_x_continuous(
        expand = c(0,0),
        limits = xlims,
        "Pollster bias in Yes direction, relative to cross-pollster average",
        minor_breaks = NULL) + 
      theme_minimal(base_family = "SF Pro Text")

    The precision of the pollster bias estimates increases as we observe more polls and/or larger samples from a given pollster.