New method for confidentialising business demography tables
This page explains our new method for confidentialising business demography tables for the 2016 release.
See New Zealand Business Demography Statistics: At February 2016 – additional tables for the first release of tables using the new method.
Statistics NZ has implemented an ‘input perturbation’ approach to confidentialising business demography tables.
Input perturbation adds a small amount of ‘noise’ to the data at the individual (ie business or person) level. The noise is added in such a way that tables derived from the perturbed data are unbiased and contain as much information as possible, while protection is targeted at the sensitive cells.
Input perturbation is used by other statistical agencies
Perturbation methods are being used in production by a number of other official statistical agencies. In particular, the US Census Bureau uses a ‘noise infusion’ method to protect longitudinal employment data (Abowd et al, 2012), and the Australian Bureau of Statistics uses noise in the protection of frequency tables accessed via their remote server TableBuilder (Chipperfield et al, 2016).
A coordinated approach to count tables and magnitude tables
We have developed an approach which perturbs both count and magnitude tables – we call this the Noised Counts and Magnitudes (NCM) method. This method is being considered more widely across the organisation as part of the development of an automated confidentiality service.
Note that, in the context of business demography, the respondent whose confidentiality is being protected is the business. This means that tables of employee counts are considered magnitude tables, as the number of employees is a magnitude with respect to the business.
How it works
Each business is assigned a random number uniformly distributed between 0 and 1. This random number is fixed across time to ensure the same degree of perturbation is applied to the business over time.
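The following is a minimal sketch of how a permanent random number could be held for each business. The class name, the in-memory dictionary, and the business identifiers are assumptions made for illustration; this page does not describe how the numbers are actually generated or stored.

```python
import random


class PermanentRandomNumbers:
    """Hypothetical registry: one uniform(0, 1) draw per business, kept forever."""

    def __init__(self, seed=None):
        # A private generator so the draws do not depend on global random state.
        self._rng = random.Random(seed)
        self._store = {}

    def get(self, business_id):
        # Draw a number the first time a business is seen, then always reuse it,
        # so the business receives the same degree of perturbation over time.
        if business_id not in self._store:
            self._store[business_id] = self._rng.random()
        return self._store[business_id]


registry = PermanentRandomNumbers(seed=2016)
first_period = registry.get("business-A")   # illustrative identifier
second_period = registry.get("business-A")  # same value in a later period
```

In practice the numbers would need to be kept in persistent, secured storage rather than in memory, because anyone who knows them could undo the perturbation.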
For count tables, the business-level random numbers are used to generate a new random number for each group of businesses that falls in a cell. This cell-level random number is the basis for a ‘fixed’ version of random rounding to base 3 (FRR3), which ensures that the same group of businesses is always rounded the same way in related tables.
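The fixed rounding idea can be sketched as below. The rounding probabilities are those of conventional unbiased random rounding to base 3. The rule for combining business-level random numbers into a cell-level number (the fractional part of their sum) is an assumption chosen for illustration, as are the function names; this page does not state the rule actually used.

```python
import math


def cell_random_number(business_random_numbers):
    # Assumption: combine the members' numbers as the fractional part of their
    # sum. math.fsum makes the result independent of the order of the members,
    # so the same group of businesses always yields the same cell-level number.
    return math.fsum(business_random_numbers) % 1.0


def fixed_rr3(count, u):
    """Round count to a multiple of 3 using the cell-level random number u."""
    remainder = count % 3
    if remainder == 0:
        return count  # multiples of 3 are left unchanged
    lower = count - remainder
    # Round up with probability remainder/3, so the expected value equals count.
    return lower + 3 if u < remainder / 3 else lower


group = [0.25, 0.5, 0.125, 0.0625]  # hypothetical business-level random numbers
u = cell_random_number(group)
rounded_count = fixed_rr3(len(group), u)
```

Because the rounding direction depends only on which businesses are in the cell, the same cell appearing in two related tables receives the same rounded count.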
For magnitude tables (ie employee counts), the business-level random number is used to generate a ‘noise multiplier’. The noise aggregates to the table level in such a way that it is targeted towards sensitive cells, where there is a disclosure risk.
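One way a noise multiplier could be derived from a business's random number is sketched below. The shape of the distribution, the function names, and the values of `min_pct` and `max_pct` are all hypothetical; this page does not give the distribution or the value of n. The sketch only illustrates two properties: the multiplier is fully determined by the random number, and it never lies closer to 1 than the minimum percentage.

```python
def noise_multiplier(u, min_pct=10.0, max_pct=20.0):
    """Map a uniform(0, 1) number u to a multiplier bounded away from 1.

    min_pct and max_pct are placeholder values, not published parameters.
    """
    # The lower half of the unit interval perturbs downwards, the upper half upwards.
    direction = -1.0 if u < 0.5 else 1.0
    # Rescale the position of u within its half to the range [0, 1).
    position = (u * 2.0) % 1.0
    size = (min_pct + position * (max_pct - min_pct)) / 100.0
    return 1.0 + direction * size


def noised_cell_total(employee_counts, random_numbers):
    # Each business's value is multiplied by its own multiplier before summing.
    return sum(
        count * noise_multiplier(u)
        for count, u in zip(employee_counts, random_numbers)
    )


# A cell with a single business: its total moves by at least min_pct percent.
single_business_total = noised_cell_total([40], [0.75])
```

Because up and down perturbations are equally likely and symmetric in size under this assumed design, the noise is zero on average.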
Individual values are protected by at least +/- n% so, for the most vulnerable cells with only one business, we guarantee this level of uncertainty about the employee count of that business. For cells composed of many businesses the noise will tend to cancel out. We will flag cells with more than a certain level of noise so that analysts can treat these values with caution.
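The cancelling effect and the flagging rule can be illustrated with a short sketch. The multipliers, the employee counts, and the 5 percent flagging threshold are made-up values for illustration; this page does not state the threshold that will be used.

```python
def cell_noise_pct(values, multipliers):
    """Absolute percentage difference between the noised and true cell totals."""
    true_total = sum(values)
    if true_total == 0:
        return 0.0  # nothing to perturb in an empty or all-zero cell
    noised_total = sum(v * m for v, m in zip(values, multipliers))
    return abs(noised_total - true_total) / true_total * 100.0


def flag_cell(values, multipliers, threshold_pct=5.0):
    # threshold_pct is a hypothetical cut-off for warning analysts.
    return cell_noise_pct(values, multipliers) > threshold_pct


# One business: the full noise shows in the cell, so the cell is flagged.
single = flag_cell([40], [1.1])

# Several businesses: upward and downward noise largely offset each other.
many = flag_cell([40, 35, 50, 45], [1.1, 0.9, 0.88, 1.12])
```

In the second cell the individual values are each perturbed by at least 10 percent, yet the cell total moves by well under 1 percent, so it is not flagged.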
Suppression of small counts not required
Application of the NCM method in some other contexts – for example the population census – would require some suppression of small cells in the count tables. This is because there is a small chance that random rounding to base 3 (RR3) is ‘breakable’ for certain combinations of rounded cell values in the interior and margins of a table, in which case attribute disclosure could be possible from the tabulation variables.
For business demography tables, however, there is not a risk of disclosing new information from the tabulation variables used to define the business count tables – because this information is already in the public domain. In addition, the original counts cannot be used to help derive the magnitude tables (of employee counts), which are already protected to at least n% even when the exact corresponding counts are known.
Therefore, we propose that no suppression is required for the business demography output tables. A final decision on this will be informed by the testing.
The benefits of this NCM method compared with the previous confidentialisation method are that:
- more data will be released
- related tables will be consistent with each other – that is, the same cell in related tables will have the same value.
References
Abowd, JM, Gittings, RK, McKinney, K, Stephens, B, Vilhuber, L, & Woodcock, S (2012). Dynamically consistent noise infusion and partially synthetic data as confidentiality protection measures for related time series. Retrieved from http://dx.doi.org/10.2139/ssrn.2159800
Chipperfield, J, Gow, D, & Loong, B (2016). The Australian Bureau of Statistics and releasing frequency tables via a remote server. Statistical Journal of the IAOS 32(1), 53–64. Retrieved from http://content.iospress.com.
Updated 19 December 2016