Population estimating for British Columbia: Kick out the outliers

by William Warren Munroe, September, 2006

Why are the population estimates for British Columbian municipalities unreliable? Because BC Stats uses substandard, non-statistical techniques to estimate population change for BC, including changing the numbers outside of the models. As well, the real methods used to estimate and forecast population are not available for public scrutiny.

Demographers use statistical methods to estimate population. In BC, the change in the number of electrical hook-ups (from BC Hydro and Fortis) and telephone hook-ups (from Telus) is used to estimate the change in population. Every five years these population estimates, based on the land line hook-ups, are compared to the census results. For 2001, the numbers generated by the hook-up method differed from the census numbers by approximately 6%, or about a quarter of a million people for BC. The hook-up method was not working, so what to do?

In order to resolve the difference, an old technique considered acceptable in the 1950s and 60s was employed: kick out the outliers. The thinking here was (is) that if some of the data do not fit with your expectations, kick them out. Outliers, in the old way of thinking, are considered to be "noise" which was (is) acceptable to ignore. The prevailing view emphasized homogeneity (similarity) and was averse to heterogeneity (difference). Once the outliers are removed, the model is rerun to see, in this case, whether the change in hook-ups between the census years 1996 and 2001 accurately indicates the change in the census results.
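To make the "kick out the outliers" loop concrete, here is a minimal sketch of what such a procedure can look like. The actual BC Stats model has not been published, so everything here is an assumption made for illustration: the one-predictor least-squares fit, the two-standard-deviation cutoff, the function names, and the made-up area figures.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def trim_and_refit(areas, threshold=2.0, max_rounds=10):
    """Drop areas whose residual is more than `threshold` standard
    deviations from the fitted line, refit, and repeat.

    `areas` maps an area name to (hook-up change, census population change).
    Returns the areas still in the model and the names that were dropped.
    """
    kept = dict(areas)
    dropped = []
    for _ in range(max_rounds):
        names = list(kept)
        xs = [kept[n][0] for n in names]
        ys = [kept[n][1] for n in names]
        a, b = fit_line(xs, ys)
        residuals = {n: kept[n][1] - (a + b * kept[n][0]) for n in names}
        spread = pstdev(list(residuals.values()))
        outliers = [n for n in names
                    if spread > 0 and abs(residuals[n]) > threshold * spread]
        # Stop when nothing is flagged or too few areas would remain.
        if not outliers or len(kept) - len(outliers) < 3:
            break
        for n in outliers:
            dropped.append(n)
            del kept[n]
    return kept, dropped

# Hypothetical figures only: eight areas where population change is exactly
# twice the hook-up change, plus one large area that does not fit the pattern.
example_areas = {
    "Area1": (1, 2), "Area2": (2, 4), "Area3": (3, 6), "Area4": (4, 8),
    "BigCity": (5, 28),
    "Area6": (6, 12), "Area7": (7, 14), "Area8": (8, 16), "Area9": (9, 18),
}

kept_areas, dropped_areas = trim_and_refit(example_areas)
print("dropped:", dropped_areas)
```

The fit looks better after each round only because the areas that disagree with it are no longer counted. The people living in those areas have not gone anywhere.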

Since there were so many outliers, this technique was repeated for two and a half months, but to no avail. Why didn't this tried, if not true, technique work? Because Surrey was an outlier, and it is too big, population-wise, to be kicked out without raising more than eyebrows. Smaller municipalities, particularly rural areas, were removed without worry, but Surrey could not so easily be ignored.

While this dated technique was being pushed to its limits, I found that the change in the number of people per household between 1996 and 2001 varied considerably across the province. While the trend to fewer people per household continued for most of the municipalities and aggregated unorganized areas, there were some areas that stayed the same and still others that saw an increase in the number of people per household, including, you guessed it, Surrey.

At this point some thought as to why Surrey stands out would have been in order. Unfortunately, because so much time was wasted kicking out outliers, the deadline for publication of the population estimates demanded immediate action. Therefore, instead of testing other indicators of population change, like births, another dated quasi-statistical technique was imposed: splitting the data. I would also like to mention here that I ran a spatial analysis of the change in the number of births for Local Health Areas from 1986 to 2003 and found that Surrey had far more births than the rest of the province, and was four standard deviations out from the norm. When I showed the map of the spatial interpolation of the change in number of births, the manager said it had no relevance and I was told not to use the color printer any more. However, my co-worker was allowed to move the color printer to his office and print personal photos. BC Stats needs to be cleaned up.
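For readers unfamiliar with the phrase "four standard deviations out from the norm", the check behind it can be sketched in a few lines. This is not the spatial analysis described above. It is a simplified, non-spatial illustration that assumes each area is compared against the mean and sample standard deviation of all the other areas. The function name, the cutoff parameter, and the figures are hypothetical.

```python
from statistics import mean, stdev

def flag_extreme_areas(changes, cutoff=4.0):
    """Return the areas whose value lies more than `cutoff` standard
    deviations from the mean of all the *other* areas.

    `changes` maps an area name to its change in number of births.
    """
    flagged = []
    for name, value in changes.items():
        others = [v for n, v in changes.items() if n != name]
        centre = mean(others)
        spread = stdev(others)  # sample standard deviation of the other areas
        if spread > 0 and abs(value - centre) / spread > cutoff:
            flagged.append(name)
    return flagged

# Hypothetical changes in births, not real Local Health Area data.
example_changes = {
    "AreaA": -30, "AreaB": -10, "AreaC": 0, "AreaD": 10,
    "AreaE": 20, "AreaF": -20, "AreaG": 5, "AreaH": 25,
    "AreaX": 200,
}

print(flag_extreme_areas(example_changes))
```

An area flagged this way is a reason to look more closely at what is happening there, not a reason to discard it.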

The population estimates since 2002 were based on two groups of municipalities: those with fewer people per household and those with the same or more people per household. Surrey was in the latter, smaller group of 22 areas, along with the municipalities in the Okanagan, which are serviced by Fortis. Flags should have gone up to signal the possibility that the administrative data from Fortis could not simply be lumped in with the BC Hydro data without greater scrutiny and possible revision. My concerns about the Fortis data were ignored.
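The two-group split described above can be sketched as follows. This is only an illustration of the grouping rule as stated in the text (fewer people per household versus the same or more). The function name, the input layout, and the numbers are assumptions, not BC Stats code or census figures.

```python
def split_by_household_size_change(areas):
    """Split areas into two groups by the change in people per household
    between the 1996 and 2001 censuses.

    `areas` maps an area name to
    (population 1996, households 1996, population 2001, households 2001).
    Returns (areas with fewer people per household,
             areas with the same or more people per household).
    """
    fewer, same_or_more = [], []
    for name, (pop96, hh96, pop01, hh01) in areas.items():
        change = pop01 / hh01 - pop96 / hh96
        if change < 0:
            fewer.append(name)
        else:
            same_or_more.append(name)
    return fewer, same_or_more

# Hypothetical figures only.
example_households = {
    "ShrinkingHouseholds": (10000, 4000, 10200, 4250),  # 2.5 -> 2.4
    "GrowingHouseholds": (50000, 20000, 56000, 20000),  # 2.5 -> 2.8
    "UnchangedHouseholds": (3000, 1200, 3000, 1200),    # 2.5 -> 2.5
}

fewer_group, same_or_more_group = split_by_household_size_change(example_households)
print("fewer people per household:", fewer_group)
print("same or more people per household:", same_or_more_group)
```

Note that a rule like this groups areas by outcome only. It says nothing about why an area landed in a group, or whether the data source (BC Hydro or Fortis) played a part.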

The use of Telus data should be scrutinized as well, since cell phone use reduces the usefulness of this data source. My concerns about the Telus data and my requests to have the new methods presented to the public were also ignored and, unfortunately, considered to be an affront to the close-knit group of people controlling the population numbers. Indeed, I was removed from meetings, yelled at, and, when I asked for team effectiveness training to address the yelling, I was set up for a constructive dismissal and fired. I was an outlier considered to be noise.

Unfortunately, the technique of splitting the data into two groups assumes that the changes in the number of people per household that occurred between the 1996 and 2001 census years will be the same for the postcensal years 2002 to 2006.
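The assumption can be written out as a small calculation. The sketch below assumes a straight-line carry-forward of the 1996 to 2001 change, which is one plausible reading of "will be the same". The real projection method has not been published, and the function name and values are hypothetical.

```python
def carry_forward_people_per_household(pph_1996, pph_2001, years_after_2001):
    """Project people per household after 2001 by assuming the average
    yearly change seen between the 1996 and 2001 censuses simply continues.
    """
    yearly_change = (pph_2001 - pph_1996) / 5  # five years between censuses
    return pph_2001 + yearly_change * years_after_2001

# Hypothetical area: 2.5 people per household in 1996, 2.4 in 2001.
for year in range(2002, 2007):
    projected = carry_forward_people_per_household(2.5, 2.4, year - 2001)
    print(year, round(projected, 2))
```

If the real trend in an area slows, stops, or reverses after 2001, the error in a projection like this grows every year until the next census exposes it.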

These dated techniques reflect a dated statistical organization more interested in finding others to blame than in using sound statistical techniques. Statistical organizations providing information and interpretation of data about human activity need to be able to learn from differences in order to understand change.

Comment added, February 2008

Another example of how BC Stats is not interested in improvements is its resistance towards Statistics Canada's refinements in the census regarding Aboriginal Identification. Since Statistics Canada began to allow Canadians to identify themselves, the number of people with Aboriginal ancestry has increased. The BC Stats manager viewed the increase in the number of Aboriginals as being a result of people having watched "Dances with Wolves", or said they were "looking for a hand out", and that people "should not be allowed to identify themselves". He also laughingly stated that the problems experienced by Aboriginals are because of "their enjoyment of [a brand of cheap wine]". He also described having had to work with an Aboriginal, saying that he was impossible to work with and was encouraged to move on.

Not only do these statistical techniques need to be revised (outliers should not be kicked out just because they don't fit), but the internal organization in the Population Section of BC Stats needs to be revised as well. Statistical organizations need to foster a work environment that is capable of sharing ideas and testing hypotheses, as well as sharing data, rather than pretending self-worth by claiming ownership of the public's data. Indeed, BC Stats is a perfect example of how tax-funded organizations stifle innovation. Please find time to watch Professor Rosling's video about unveiling the beauty of statistics.

Organizations that resist and restrict the sharing of data and ideas will be unable to keep up with the changing world we live in. Providing sound data and interpretation allows for sound decisions regarding capital expenditures for health care, education facilities, transportation, etc. Methods papers should be published in a timely manner to allow the public to weigh the value of the numbers being generated. The people of BC deserve to be able to scrutinize the data and the methods used to decide whether to open or close health care facilities and schools.

William Munroe is a migration analyst who has worked on projects with Stats Canada and BC Stats, such as testing alternative indicators of social and economic integration. He worked for BC Stats of the Ministry of Labour and Citizen's Services from 2002 to 2006 as a Population Analyst, with the responsibility of provincial expert on migration for BC municipalities, Regional Districts, Economic Regions, Local Health Areas (including Micro Health Areas such as Vancouver's West End and the Downtown Eastside), Health Regions, School Districts, and Provincial Electoral Districts.

He was forced to leave the position after submitting papers providing solutions to reduce error and insisting that the real methods papers be published.