
Data Methodology


Data Update 2024.2

Overview:

CiG’s geo-demographic projected data is developed to address and transform the outdated Canadian data development and licensing model for marketers. We focus on providing current, robust and diverse data for defining, understanding and precisely targeting audiences at scale.

A Faster Projection Process for More Dynamic Insights:

As of the 2023.1 Data Release, we can project data collected through consumer surveys onto the entire Canadian population, which lets us add new attributes and data to our reports much faster.

This is achieved through dynamic micro-clustering, which identifies patterns between households and population cohorts. We have also automated the way we load data into the intelligentVIEW platform, reducing the time needed to perform an update from months to hours.

Overall, this enhanced process provides the most up-to-date, accurate and useful data for your projects. With more than 30,000 consumer attributes and 94 actionable segments, we can accurately address all 40 million Canadians across demographic, financial, lifestyle and other variables to build the best addressable audiences.

Here’s how our unique data approach is driving actionable insights for better marketing effectiveness.

Approach:

CiG’s unique approach to developing a data ecosystem provides a solution to:

  • Data Latency – our methodology allows for the delivery of quarterly and semi-annual data updates through intelligentVIEW, our cloud-based insights platform.
  • Resource-Intensive Data Processing – we leverage the rapid growth of statistical and modelling software, and the increase in processing speeds, to automate the process.
  • Restrictive Usage – data is included in all of CiG's plans, with the flexibility to download and use data as needed, removing the prohibitive usage allowances found in the traditional data licensing model.
  • Cost – data is included in all SaaS pricing plans, with add-ons available to download data on demand, so you pay only for what you need.
  • Coverage – data is developed with the latest geographic files, updated quarterly, providing maximum access to addressable market audiences where they live.

CiG is transforming how companies build, maintain and validate their first-, second- and third-party data. We deliver this through a combination of:

  • An ever-expanding list of new, privacy-compliant data sources that are becoming available in the Canadian marketplace.
  • New tools and technology that automate data processing, decreasing reliance on traditional data providers and costly consultants.
  • Distribution through the cloud.

CiG has automated and modernized the access to, and usage of, consumer and market data to drive insights for Canadian businesses.

Some of our Data Sources:

  • Statistics Canada Data (under licence) – includes census, wealth, asset and household expenditure data, supplied and continually updated at the various levels of Statistics Canada geography for census and intercensal periods.
  • Canada Post Householder Plus Data (HHP) (under licence) – used for unaddressed advertising mail, the HHP contains the most up-to-date list of active Canadian postal codes, updated monthly. Up-to-date addressable postal codes and current-vintage Canada Post Smart Mail monthly data feeds ensure timely access to highly targetable audiences.
  • Mover Listing Data – publicly available real estate listings data, updated monthly and used for microeconomic financial indicators. Real estate values are a strong indicator, and highly predictive, of overall household wealth.
  • Numeris RTS Survey Data (under licence) – collected semi-annually, the Numeris RTS is a national research study containing demographics, media preferences and consumer brand and product behaviours across a survey sample of approximately 45,000 Canadian households.
  • Postal Code Conversion File – built from a variety of publicly available sources, CiG's PCCF is an up-to-date and accurate link between Canada Post postal codes and other geographies, including Canadian Census geographies, and is updated monthly.

Our Methodology:

CiG’s methodology begins with our unique approach that pools our data sources into an ecosystem where sources are used to cross-validate and contribute to each other’s robustness.

Treating our data as part of an ecosystem, rather than as independent tables or data products, means that statistical relationships are exploited, bases (populations, etc.) are consistent, and each data source contributes to maintaining the others.

Maintaining the CiG data ecosystem solves the challenge of data sources being released at different frequencies and with varying lag times: monthly, semi-annually, annually and, in the case of the Canadian census, every five to seven years.

With the CiG data ecosystem approach, marketers and analysts see the benefit of the most current release of source data soon after it is released, rather than waiting to gather all the data sources over a year, collating them and spending months projecting them. Without the ecosystem, marketers see data projections based on source data that are several releases old.

CiG designed processes to bring older releases of data, such as the 2011 and 2016 Census, up to date, project them to the six-digit postal code level, and then leverage those patterns to project other attributes from samples (e.g., the Numeris RTS study) to populations.
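The postal code projection step can be illustrated with a simple proportional allocation: a count published for a census dissemination area (DA) is split across that DA's postal codes according to each postal code's share of current households. This is a minimal sketch under that assumption, not CiG's published process; the DA and postal code identifiers, the `allocate_to_postal_codes` helper and all counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs (illustrative only).
# A census attribute counted at the dissemination area (DA) level:
da_counts = {"DA001": 120, "DA002": 45}
# Current households per postal code, tagged with the DA the postal code falls in:
postal_households = {
    "A1A1A1": ("DA001", 300),
    "A1A1A2": ("DA001", 100),
    "A1A1A3": ("DA002", 150),
}


def allocate_to_postal_codes(da_counts, postal_households):
    """Split each DA-level count across its postal codes by share of current households."""
    # Total current households per DA, used as the denominator for each share.
    da_totals = defaultdict(int)
    for da, households in postal_households.values():
        da_totals[da] += households

    estimates = {}
    for pc, (da, households) in postal_households.items():
        total = da_totals[da]
        share = households / total if total else 0.0
        estimates[pc] = da_counts.get(da, 0) * share
    return estimates


estimates = allocate_to_postal_codes(da_counts, postal_households)
print(estimates)  # {'A1A1A1': 90.0, 'A1A1A2': 30.0, 'A1A1A3': 45.0}
```

Because the shares within each DA sum to 1, the postal code estimates add back up to the original DA counts.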

It is well known that census counts do not exactly reflect the actual population: some residents are missed by the census, and the population keeps changing after census day. By leveraging data sources including historical real estate patterns, housing starts and Canada Post monthly data releases, we can accurately adjust population estimates at both the micro and macro geographic levels.
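One common way to combine micro and macro adjustments is to apply local growth factors first and then rescale so that postal code totals match an independent control total for the wider area. The sketch below assumes that approach; the growth factors (which might come from housing starts or new Canada Post delivery points), the control total and the `adjust_populations` helper are hypothetical, and CiG's actual adjustment model is not described on this page.

```python
# Hypothetical inputs (illustrative only).
base_pop = {"A1A1A1": 1000, "A1A1A2": 500, "A1A1A3": 500}  # e.g. census-based counts
growth = {"A1A1A2": 1.2}  # micro signal, e.g. new dwellings in one postal code
control_total = 2205  # macro estimate for the area containing these postal codes


def adjust_populations(base_pop, growth, control_total):
    """Apply postal-code growth factors, then rescale so the total matches control_total."""
    # Micro step: postal codes without a growth factor are left unchanged.
    grown = {pc: pop * growth.get(pc, 1.0) for pc, pop in base_pop.items()}
    # Macro step: one proportional factor keeps relative sizes and hits the control total.
    factor = control_total / sum(grown.values())
    return {pc: value * factor for pc, value in grown.items()}


adjusted = adjust_populations(base_pop, growth, control_total)
print({pc: round(v, 1) for pc, v in adjusted.items()})
# {'A1A1A1': 1050.0, 'A1A1A2': 630.0, 'A1A1A3': 525.0}
```

Applying the micro factors before the macro rescale lets local signals change the distribution of people while the control total fixes the overall level.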

Statistics Canada releases adjusted quarterly population estimates.

According to CiG's latest data update in March 2024, aggregating all postal code populations puts Canada's population at an estimated 40,770,158 people in 16,339,440 households.

Estimated population by quarter:

Quarter    Estimated population
Q2 2023    39,171,407
Q3 2023    40,097,761
Q4 2023    40,288,907
Q1 2024    40,480,052
Q2 2024    40,770,158
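The quarter-over-quarter changes implied by these estimates, and the average number of persons per household implied by the March 2024 totals, can be checked with a few lines of Python. The figures are copied from this page; the `quarterly_changes` helper exists only for this illustration.

```python
# Quarterly population estimates as published on this page.
estimates = {
    "Q2 2023": 39_171_407,
    "Q3 2023": 40_097_761,
    "Q4 2023": 40_288_907,
    "Q1 2024": 40_480_052,
    "Q2 2024": 40_770_158,
}
households = 16_339_440  # household count from the March 2024 update


def quarterly_changes(estimates):
    """Return (previous quarter, current quarter, absolute change) for consecutive quarters."""
    quarters = list(estimates)  # dicts preserve insertion order
    return [
        (prev, curr, estimates[curr] - estimates[prev])
        for prev, curr in zip(quarters, quarters[1:])
    ]


for prev, curr, change in quarterly_changes(estimates):
    pct = 100 * change / estimates[prev]
    print(f"{prev} -> {curr}: {change:+,} ({pct:.2f}%)")

print(f"Persons per household: {estimates['Q2 2024'] / households:.2f}")
```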

Keeping in mind that the main purpose of intelligentVIEW data is to create, understand and precisely target audiences for marketing at scale, we need to accurately project the characteristics of panel respondents onto the Canadian population.

Panel data, such as Numeris RTS and Vividata, have research samples that are carefully collected and weighted to represent the Canadian population. We developed an approach that clusters postal codes and panel respondents on matching geodemographic characteristics, aligning postal code demographics with the weighting attributes collected from panel respondents.

This approach effectively creates estimates at the six-digit postal code level for the entire country, allowing marketers to profile customers, markets and prospects across thousands of consumer characteristics with a near 100 percent match rate.
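The projection idea can be sketched as follows, assuming each panel respondent and each postal code's households have already been assigned to the same set of clusters. The weighted incidence of an attribute among respondents in a cluster is applied to the households in that cluster within each postal code. The cluster keys, weights and counts are hypothetical, and exact matching on a cluster key stands in for CiG's own clustering, which this page does not detail.

```python
from collections import defaultdict

# Hypothetical panel respondents: (cluster key, survey weight, has_attribute).
# The key stands in for matched geodemographic characteristics (age band, income band).
respondents = [
    (("25-44", "high"), 1.0, True),
    (("25-44", "high"), 1.0, False),
    (("25-44", "high"), 2.0, True),
    (("45-64", "mid"), 1.5, False),
    (("45-64", "mid"), 0.5, True),
]

# Hypothetical postal codes: household counts broken down by the same cluster keys.
postal_codes = {
    "A1A1A1": {("25-44", "high"): 200, ("45-64", "mid"): 100},
    "A1A1A2": {("45-64", "mid"): 400},
}


def cluster_rates(respondents):
    """Weighted share of respondents with the attribute, per cluster key."""
    weight_yes = defaultdict(float)
    weight_all = defaultdict(float)
    for key, weight, has_attr in respondents:
        weight_all[key] += weight
        if has_attr:
            weight_yes[key] += weight
    return {key: weight_yes[key] / weight_all[key] for key in weight_all}


def project(postal_codes, rates):
    """Estimated households with the attribute in each postal code.
    Keys with no respondents contribute 0 here; a real system would need a fallback."""
    return {
        pc: sum(count * rates.get(key, 0.0) for key, count in mix.items())
        for pc, mix in postal_codes.items()
    }


rates = cluster_rates(respondents)
projected = project(postal_codes, rates)
print(projected)  # {'A1A1A1': 175.0, 'A1A1A2': 100.0}
```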

Accurate geographic allocation of data is integral not only to executing marketing programs but also to integrating and cross-validating many data sources.

CiG has carefully built, and maintains, an accurate location database linking postal codes to Statistics Canada standard geographies through their many one-to-one and one-to-many relationships. These files are often referred to as postal code conversion files. We have further enhanced our PCCF with a comprehensive urbanity classification made up of the following eleven classes:

  • MOSTLY DOWNTOWN APARTMENTS
  • BIG CITY SUBURBS
  • MOSTLY URBAN APARTMENTS
  • COMMERCIAL/RESIDENTIAL
  • URBAN RESIDENTIAL MIX
  • DOWNTOWN RESIDENTIAL MIX
  • RURAL
  • COMMERCIAL
  • MOSTLY URBAN HOMES
  • SMALL CITY, TOWNS, AND SUBURBS
  • MOSTLY DOWNTOWN HOMES
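Because one postal code can span several census geographies, rolling postal code estimates up to a standard geography requires an allocation share for each postal-code-to-geography link. The sketch below assumes PCCF rows carry such a household share; the row layout, identifiers, shares and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical PCCF-style rows: (postal code, dissemination area, share of the
# postal code's households in that DA). A1A1A2 straddles two DAs.
pccf = [
    ("A1A1A1", "DA001", 1.0),
    ("A1A1A2", "DA001", 0.6),
    ("A1A1A2", "DA002", 0.4),
]
# Hypothetical postal-code-level estimates, e.g. households with some attribute.
postal_estimates = {"A1A1A1": 50.0, "A1A1A2": 100.0}


def roll_up(postal_estimates, pccf):
    """Aggregate postal code estimates to DAs using each link's household share."""
    totals = defaultdict(float)
    for pc, da, share in pccf:
        totals[da] += postal_estimates.get(pc, 0.0) * share
    return dict(totals)


da_estimates = roll_up(postal_estimates, pccf)
print({da: round(v, 1) for da, v in da_estimates.items()})
# {'DA001': 110.0, 'DA002': 40.0}
```

As long as the shares for each postal code sum to 1, the roll-up preserves the overall total.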

CiG’s data ecosystem is now used to expedite the production of estimates across all attributes at the postal code level, add new attributes from sources and validate outputs against sources and geographies.

Some of the processes and techniques leveraged in the construction of the original base data sources and in the maintenance of the ecosystem are:

  • Principal components analysis
  • Correlation analysis
  • Linear regression
  • Logistic regression
  • Other non-linear regression techniques
  • Clustering (mostly k-means / k-median)

Part of CiG's investment in developing core data assets and our data ecosystem is the release of our proprietary segmentation system, intelligentSEGMENTS.

IntelligentSEGMENTS are built using k-median and k-means techniques that first create a base of 20 large clusters across core socio-economic characteristics at the postal code level. The postal codes within each large cluster are then clustered further into 5 sub-clusters each, with a focus on behaviour, ethnicity, housing type, education and hundreds of other characteristics.

The resulting 100 clusters are re-ranked on socio-economics, first at the group level and then at the subgroup level. Small clusters are analyzed relative to their neighbouring clusters and collapsed into them, resulting in a system of 94 robust segments.
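A scaled-down sketch of this two-stage process: k-means on socio-economic features forms the large groups, k-means on other features splits each group, undersized subclusters are merged into the nearest adequately sized subcluster in the same group, and groups and subgroups are ranked by a socio-economic feature. It uses only k-means (not k-median), plain Python and toy data; the merge rule, the ranking feature and all function names are assumptions, not CiG's implementation. Run with `k_groups=20, k_sub=5` on real postal code features, it would start from the 20 × 5 = 100 clusters described above.

```python
import math
import random
from collections import Counter


def centroid(points):
    """Mean of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means on equal-length tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: member mean; an empty cluster keeps its previous centroid.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(centroid(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def two_stage_segments(records, k_groups, k_sub, min_size):
    """records: (socio_features, other_features) tuples. Returns a (group_rank,
    subgroup_rank) code per record; rank 1 = highest mean of the first socio feature."""
    socio = [r[0] for r in records]
    groups = kmeans(socio, k_groups)  # stage 1: socio-economic groups
    codes = [None] * len(records)
    group_means = {}
    for g in set(groups):
        idx = [i for i, lab in enumerate(groups) if lab == g]
        other = [records[i][1] for i in idx]
        subs = kmeans(other, min(k_sub, len(idx)))  # stage 2: split on other features
        # Merge undersized subclusters into the nearest adequately sized one.
        sizes = Counter(subs)
        cents = {s: centroid([o for o, lab in zip(other, subs) if lab == s]) for s in sizes}
        keep = [s for s in sizes if sizes[s] >= min_size]
        if keep:
            remap = {
                s: s if s in keep else min(keep, key=lambda t: math.dist(cents[s], cents[t]))
                for s in sizes
            }
            subs = [remap[s] for s in subs]

        def sub_mean(s):
            vals = [socio[i][0] for i, lab in zip(idx, subs) if lab == s]
            return sum(vals) / len(vals)

        # Rank subgroups within the group, highest socio-economic mean first.
        order = sorted(set(subs), key=sub_mean, reverse=True)
        for i, lab in zip(idx, subs):
            codes[i] = (g, order.index(lab) + 1)
        group_means[g] = sum(socio[i][0] for i in idx) / len(idx)
    # Rank the groups the same way and replace raw group labels with ranks.
    group_order = sorted(group_means, key=group_means.get, reverse=True)
    return [(group_order.index(g) + 1, s) for g, s in codes]


# Toy data: income ($000s) as the socio-economic feature, apartment share as the other.
records = (
    [((100.0 + i,), (0.1,)) for i in range(4)]
    + [((100.0 + i,), (0.9,)) for i in range(4)]
    + [((103.0,), (0.5,))]
    + [((40.0 + i,), (0.2,)) for i in range(4)]
    + [((40.0 + i,), (0.8,)) for i in range(5)]
)
segments = two_stage_segments(records, k_groups=2, k_sub=3, min_size=2)
print(sorted(Counter(segments).items()))
```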

Penetration and index values of thousands of additional variables are used to further describe and bring the 94 intelligentSEGMENTS to life. Automated processes identify attributes of significance and generate and compile data-driven statements from these profiling tables to form initial segment descriptions.
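As commonly defined in marketing analytics, penetration is the share of a segment's households that have an attribute, and an index compares that share with the share across all households, so 100 means average. The sketch below computes both and flags attributes whose index meets a cut-off, a simple stand-in for the automated significance step. The attribute names, counts and the threshold of 120 are hypothetical.

```python
def penetration(count, base):
    """Percentage of the base (households) that has the attribute."""
    return 100.0 * count / base


def index(segment_count, segment_base, total_count, total_base):
    """Segment penetration relative to overall penetration; 100 = average."""
    return 100.0 * penetration(segment_count, segment_base) / penetration(total_count, total_base)


def significant_attributes(profile, segment_base, total_base, threshold):
    """Return (attribute, penetration %, index) for attributes with index >= threshold."""
    results = []
    for attr, (seg_count, total_count) in profile.items():
        idx = index(seg_count, segment_base, total_count, total_base)
        if idx >= threshold:
            results.append((attr, penetration(seg_count, segment_base), idx))
    return results


# Hypothetical profile for one segment: attribute -> (households in segment, households overall).
profile = {
    "Owns a hybrid vehicle": (1_200, 800_000),
    "Listens to podcasts weekly": (3_100, 4_800_000),
    "Has a home garden": (2_000, 5_120_000),
}
SEGMENT_HOUSEHOLDS = 10_000  # hypothetical segment size
TOTAL_HOUSEHOLDS = 16_000_000  # hypothetical round figure
THRESHOLD = 120  # hypothetical cut-off for "significant"

for attr, pen, idx in significant_attributes(profile, SEGMENT_HOUSEHOLDS, TOTAL_HOUSEHOLDS, THRESHOLD):
    # Compile a simple data-driven statement from the profiling values.
    print(f"{pen:.0f}% of households in this segment: {attr.lower()} (index {idx:.0f})")
```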

CiG data team analysts, with the assistance of marketing, further enhance the written descriptions, generate core profile statistics and assemble imagery that portrays the households and families that populate each intelligentSEGMENT.