CENSUS_INS21ES_A_LV_2021_0000
Total usually resident population by place of usual residence, sex, current activity status (number of employed persons), country/place of birth and place of usual residence one year prior to the census, from the results of the 2021 population and housing census, geocoded to the 1 km<sup>2</sup> European grid.
Data were aggregated into 13 indicators across 6 topics: total population; sex (male, female); age (under 15 years, 15–64 years, 65 years and over); current activity status (employed); country of birth (reporting country, other Member State, elsewhere); and place of usual residence one year prior to the census (usual residence unchanged, move within the reporting country, move from outside the reporting country).
The data tables are SDMX-compliant (based on the SDMX GRID DSDs).
‘Geographic Information System’ (GIS) software performed the following steps:
1. Given (national) coordinates were transformed into the required EU grid reference frame.
2. The corresponding 1 km² grid cell in which the transformed coordinates lie was identified.
3. The respective grid cell code was allocated to the microdata record in question.
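Steps 2 and 3 above can be illustrated with a minimal sketch. It assumes the coordinates have already been transformed to ETRS89-extended / LAEA Europe (EPSG:3035) in metres, and it assumes a cell code of the form `CRS3035RES1000mN<northing>E<easting>` built from the south-west corner of the cell; the function name, the record layout and the code format are illustrative assumptions, not taken from this record.

```python
import math


def grid_cell_code(x_laea: float, y_laea: float, resolution_m: int = 1000) -> str:
    """Return the code of the grid cell that contains a point given in EPSG:3035 metres."""
    # Step 2: snap the point to the south-west corner of the containing cell.
    easting = math.floor(x_laea / resolution_m) * resolution_m
    northing = math.floor(y_laea / resolution_m) * resolution_m
    # Step 3: build the cell code (format is an assumption, see the lead-in).
    return f"CRS3035RES{resolution_m}mN{northing}E{easting}"


# Hypothetical microdata records with already-transformed coordinates.
records = [{"person_id": "A", "x": 5123456.7, "y": 4012345.6}]
for record in records:
    record["grid_id"] = grid_cell_code(record["x"], record["y"])
```

Flooring (rather than rounding) keeps every point inside a cell assigned to that cell's south-west corner, so a point exactly on a cell's western or southern edge belongs to that cell.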
Aggregation was performed on estimated individual data derived from registers.
The Latvian 1 km² grid and the European 1 km² grid differ in orientation: their cells lie at an angle to each other.
- Date (Creation)
- 2021-01-01
- Citation identifier
- gisco-services / https://gisco-services.ec.europa.eu/census/2021/INSPIRE/Data/CENSUS_INS21ES_A_LV_2021_0000
- Point of contact
- Organisation name: Central Statistical Bureau of Latvia, Social Statistics Methodology Section
- Individual name: sigita.meldere@csp.gov
- Electronic mail address: sigita.meldere@csp.gov.lv
- Role: Owner
- Keywords
- EUROSTAT metadata
- Access constraints
- Other restrictions
- Other constraints
- No conditions apply to access and use
- Use constraints
- Other restrictions
- Other constraints
- No conditions apply to access and use
- Spatial representation type
- Vector
- Distance
- 1000 metres
- Language
- English
- Topic category
-
- Society
- Begin date
- 2021-01-01
- Reference system identifier
- ETRS89-extended / LAEA Europe
- Distribution format
-
Name Version Geopackage file(.gpkg)
GML
3.2.1
Text
CSV
Text
SDMX
- OnLine resource
- Protocol: WMTS
- Linkage: https://gisco-services.ec.europa.eu/cmaps/service?REQUEST=GetCapabilities&SERVICE=WMTS
- Name: ViewService (WMTS) of the Census 2021 data
- OnLine resource
- Protocol: ATOM
- Linkage: https://gisco-services.ec.europa.eu/census/2021/INSPIRE/Data/LV_PD_3035_CSV.zip
- Name: The compressed resource (CSV) file contains data and metadata
- OnLine resource
- Protocol: ATOM
- Linkage: https://gisco-services.ec.europa.eu/census/2021/INSPIRE/Data/LV_PD_3035_GML.zip
- Name: The compressed resource (GML) file contains spatial data and metadata
- OnLine resource
- Protocol: ATOM
- Linkage: https://gisco-services.ec.europa.eu/census/2021/INSPIRE/Data/LV_PD_3035_GPKG.zip
- Name: The compressed resource (GPKG) file contains spatial data and metadata
- OnLine resource
- Protocol: ATOM
- Linkage: https://gisco-services.ec.europa.eu/census/2021/INSPIRE/Data/LV_PD_3035_SDMX.zip
- Name: The compressed resource (SDMX) file contains spatial data and metadata
- OnLine resource
- Protocol: ATOM
- Linkage: https://gisco-services.ec.europa.eu/census/2021/INSPIRE/PD.atom
- Name: Downloadservice (ATOM-Feed) of all the various packages
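The four download links above share one naming pattern, which a small helper can reproduce. This is only a convenience sketch: the function name is hypothetical, and the idea that other country codes follow the same pattern is an assumption that this record does not confirm.

```python
BASE_URL = "https://gisco-services.ec.europa.eu/census/2021/INSPIRE/Data"
# Formats listed in this record's online resources.
FORMATS = ("CSV", "GML", "GPKG", "SDMX")


def package_url(fmt: str, country: str = "LV") -> str:
    """Build the download URL of a compressed package (pattern inferred from the links above)."""
    fmt = fmt.upper()
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {FORMATS}")
    return f"{BASE_URL}/{country}_PD_3035_{fmt}.zip"


csv_url = package_url("csv")
```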
- Hierarchy level
- Dataset
Conformance result
- Date (Publication)
- 2010-12-08
- Explanation
-
This data set is conformant with the INSPIRE Implementing Rules for the interoperability of spatial data sets and services
- Pass
- Yes
- Statement
-
Data on usually resident population at the reference period 01/01/2021:
<p class="Abstract">The Population and Housing Census 2011 established the resident population of Latvia, which differed notably, by 155 thousand or 7%, from the population figure calculated from the Register of Natural Persons maintained by the Office of Citizenship and Migration Affairs (because of unregistered migration). As the European Union has no common methodology for estimating population size, the Central Statistical Bureau developed a new method for estimating the population of Latvia more precisely. The method is based on statistical classification and migration mirror statistics. Statistical classification divides the persons registered in the Register of Natural Persons of Latvia into two classes (groups): persons actually living in Latvia (the usually resident population of Latvia) and persons actually living abroad. The classification model was developed using logistic regression analysis and predicts, for each individual, the probability of being a resident. The probability required for inclusion in the usually resident population differs by age and gender. The model was developed using 2011 Census data on actual place of residence and administrative data referring to 2010, 1 January 2011 or 1 March 2011.
<p class="Abstract">Data from the Register of Natural Persons and other administrative data sources are used to determine the status of a person. Using administrative register data, more than 200 characteristic variables have been derived for each person registered in the Population Register.
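The classification idea described above (a logistic model that yields a residence probability, compared against a threshold that depends on age and gender) can be sketched as follows. All feature names, coefficients and thresholds here are invented for illustration; the actual model uses more than 200 variables and its parameters are not given in this record.

```python
import math

# Hypothetical coefficients of a fitted logistic regression (illustration only).
COEFFICIENTS = {"intercept": -1.0, "has_employment_record": 2.5, "has_health_visit": 1.5}

# Hypothetical inclusion thresholds by (age group, sex).
THRESHOLDS = {("15-64", "M"): 0.6, ("15-64", "F"): 0.5}
DEFAULT_THRESHOLD = 0.5


def resident_probability(features: dict) -> float:
    """Predicted probability that a registered person actually lives in the country."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_usual_resident(features: dict, age_group: str, sex: str) -> bool:
    """Classify a person by comparing the probability with the group-specific threshold."""
    threshold = THRESHOLDS.get((age_group, sex), DEFAULT_THRESHOLD)
    return resident_probability(features) >= threshold


person = {"has_employment_record": 1, "has_health_visit": 0}
included = is_usual_resident(person, "15-64", "M")
```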
<p class="Abstract">Detailed methodology available:
<ul>
<li><a href=" https://stat.gov.lv/sites/default/files/Metadati/Iedz_Metodologija_ENG_2022f.pdf">Method Used to Produce Population Statistics</a></li>
<li><a href=" https://stat.gov.lv/en/metadata/5911-population-and-key-demographic-indicators/sims2">Population and key demographic indicators</a></li>
<li><a href=" https://stat.gov.lv/en/metadata/12730-activity-status-population/sims2">Activity status of population</a></li>
</ul>
Individual data from administrative data sources are received electronically (via server) under agreements signed between the statistical institution and the data providers.
For the administrative data sources, the number of records and the length of ID codes (as well as the number of duplicate ID codes) were validated, and mathematical and logical verification was carried out.
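A minimal sketch of the ID-code checks mentioned above (record count, code length, duplicate codes) might look like this. The function name, the report layout and the expected length passed in the example are assumptions made for illustration.

```python
from collections import Counter


def validate_id_codes(id_codes: list, expected_length: int) -> dict:
    """Summarise basic checks on a list of ID codes from an administrative source."""
    counts = Counter(id_codes)
    return {
        # Number of records received.
        "n_records": len(id_codes),
        # Codes whose length differs from the expected one.
        "wrong_length": [code for code in id_codes if len(code) != expected_length],
        # Codes that occur more than once.
        "duplicates": sorted(code for code, n in counts.items() if n > 1),
    }


# Invented sample codes; the expected length of 4 is for the example only.
report = validate_id_codes(["A001", "A002", "A002", "B1"], expected_length=4)
```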
For the compiled data (results), mathematical and logical verification was carried out, and the corresponding indicators from the Labour Force Survey (LFS) were used for quality assessment. Cross-tabulations were created, and 95% confidence intervals were constructed from the LFS data and compared with the Census results.
"Place of usual residence": there are no particular reasons for data unreliability for this topic.
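The comparison with LFS confidence intervals can be sketched as below. The sketch assumes a normal-approximation interval (estimate ± 1.96 × standard error); how the intervals were actually constructed is not stated in this record, and the function names and figures are illustrative.

```python
def confidence_interval_95(estimate: float, standard_error: float) -> tuple:
    """Normal-approximation 95% confidence interval around a survey estimate."""
    half_width = 1.96 * standard_error
    return estimate - half_width, estimate + half_width


def census_within_lfs_interval(census_value: float, lfs_estimate: float,
                               lfs_standard_error: float) -> bool:
    """True if the census figure lies inside the 95% interval of the LFS estimate."""
    lower, upper = confidence_interval_95(lfs_estimate, lfs_standard_error)
    return lower <= census_value <= upper


# Invented figures for one cross-tabulation cell.
consistent = census_within_lfs_interval(950, lfs_estimate=1000, lfs_standard_error=50)
```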
To meet the requirements of Regulation (EU) 2018/1799, census data have to be linked with geographic coordinates. The State Address Register (SAR) of Latvia holds address codes and the geographic coordinates of each address. The address code is used to link persons in the Register of Natural Persons (RNP) with an address and with housing data from the Real Estate State Cadastre Information System (RESC IS). Where an address contained more than one residential building, methodological rules were applied. For instance, if there were several one-dwelling houses at the address, the one with the largest floor space was treated as occupied; if there was one one-dwelling house and the rest were houses with two or more dwellings, only the one-dwelling house was treated as occupied. Where an address did not give the precise location of the residential building, more precise geographic coordinates were taken from the RESC IS, where they are available at building level. In addition, SAR experts carried out systematic work to ensure the quality of address data and the compliance of the names and numbers of addressing objects with regulatory requirements: requests for information or for clarification of addresses missing from the SAR, and of addresses that do not meet the rules of the Addressing System, were prepared and sent to local governments, and solutions were found. As a result, only 4841 persons (less than 0.3% of the total population) remained unallocated in the Population and Housing Census 2021 database of Latvia and were placed in the virtual cell.
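The two example rules for addresses with several residential buildings can be written down as a small function. This is a sketch of those two rules only: the field names are hypothetical, the first rule is read as applying when all buildings at the address are one-dwelling houses, and any other combination is returned as unresolved because the text does not describe it.

```python
def select_occupied_building(buildings: list):
    """Pick the building treated as occupied at an address with several buildings.

    Each building is a dict with 'dwellings' (int) and 'floor_space' (float).
    Returns None when the two rules described in the text do not decide the case.
    """
    if not buildings:
        return None
    if len(buildings) == 1:
        return buildings[0]
    one_dwelling = [b for b in buildings if b["dwellings"] == 1]
    # Rule 1: only one-dwelling houses -> the one with the largest floor space.
    if len(one_dwelling) == len(buildings):
        return max(one_dwelling, key=lambda b: b["floor_space"])
    # Rule 2: exactly one one-dwelling house among multi-dwelling houses -> that house.
    if len(one_dwelling) == 1:
        return one_dwelling[0]
    return None


houses = [{"dwellings": 1, "floor_space": 80.0}, {"dwellings": 1, "floor_space": 120.0}]
chosen = select_occupied_building(houses)
```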
IF STAT = 'T' and OBS_VALUE = 0 then POPULATED = '0'. Discrepancies are possible, depending on the statistical disclosure control algorithms applied.
<span>For key STAT, OBS_VALUE for the total T equals the sum of F and M. Discrepancies are possible, depending on the statistical disclosure control algorithms applied.</span>
<span><span>For key STAT, OBS_VALUE for the total T equals NAT + OTH + EU_OTH. Discrepancies are possible, depending on the statistical disclosure control algorithms applied.</span></span>
<span><span>For key STAT, OBS_VALUE for the total T equals Y_LT15 + Y15-64 + Y_GE65. Discrepancies are possible, depending on the statistical disclosure control algorithms applied.</span></span>
Confidentiality of individual data is protected by the <a href=" https://likumi.lv/ta/en/en/id/274749">Statistics Law</a>:
Section 7. Competence of the Statistical Institution in Production of Official Statistics
<ul>
<li>(2) The statistical institution shall:</li>
<li>8) ensure statistical confidentiality in accordance with the procedures laid down in this Law;</li>
</ul>
Section 17. Data Processing and Statistical Confidentiality
Section 19. Dissemination of Official Statistics
<ul>
<li>(1) The statistical institution shall disseminate official statistics in a way that does not allow either directly or indirectly identify a private individual or a State institution in cases other than those laid down in Section 25 of this Law.</li>
<li>(2) The statistical institution shall publish the official statistics which have been produced within the framework of the Official Statistics Programme in a publicly available form and by a predetermined deadline on the portal of official statistics. Until the moment of publication of official statistics this statistics shall not be published.</li>
</ul>
More on <a href=" https://www.csp.gov.lv/en/information-security-and-data-protection">information security and data protection</a>.
<strong>Total population</strong>
Statistical disclosure control methods should minimize information loss, particularly in inhabited and uninhabited areas. To ensure the confidentiality of grid data, the cell-key method is used. 52 % of grid cells are empty, and 64 % of populated grid cells have small values (fewer than 10 persons). After the cell-key method was applied, there are 6 % “fake zeroes” (flagged as inhabited), which is a better result than the 64 % of cells that would be suppressed if the cell suppression method were used. Inhabited flags are used as a compromise between disclosure control and information loss. Grid cells with zero frequencies were not changed to positive frequencies.
Some of the demographic variables are published on a national square grid with the size of the grid cell ranging from 1 km to 100 m and at the level of administrative and territorial units, such as statistical regions, municipalities and rural territories.
To ensure data confidentiality, many indicators on the national grid are published only as shares within the respective cell. When publishing the share of the population by country of birth, only the indicators “born in Latvia” and “born outside Latvia” were used to ensure confidentiality (other countries are also singled out in the data tables on a larger scale). Because the data are sensitive, grid data by nationality are published only as the share of Latvian citizens.
The 1km² grid was analyzed in more detail after using the cell-key method. The cell-key method has considerable advantages, especially in comparison to the more often used cell suppression method.
To ensure confidentiality in the Latvian grid, cell suppression is used where a grid cell contains fewer than 10 persons; this reduces the chance that persons could be recognised by combining the LV grid and the EU grid.
<strong><em>Other 12 datasets</em></strong>
The cells of the 1 km² grid map of Latvia are placed at a different angle than the cells of the European grid map.
If the disclosure risk is high, an SDC protection method needs to be applied to the data. A protection method changes the data to reduce the disclosure risk and make the data release possible.
Grid data carry a high disclosure risk. This is especially important for countries with a low population density, such as Latvia.
Grid data are calculated on 1 km × 1 km squares. This geographical variable needs to be considered from the viewpoint of statistical disclosure control, especially with regard to already existing and used geographical variables. The importance of grid data lies in their easily understandable interpretation.
The CSB of Latvia used the cell key method, which protects against disclosure by differencing by adding "noise" to the dataset, a technique known as data perturbation.
Data perturbation adds "noise" to datasets to protect the confidentiality of individual records. It allows users to obtain summary information about the data while reducing the risk of a security breach.
The cell key method offers protection against disclosure by differencing, where two or more slightly different datasets could be compared to expose an individual respondent, and in instances where several datasets could be constructed and otherwise linked together to reconstruct records from the microdata. As required by Regulation (EU) 2018/1799, grid cells with zero frequencies were not changed to positive frequencies.
A typical dataset has around 10–20% of its cell counts perturbed by a small amount, and small counts are more likely to have been perturbed than large counts.
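The general mechanics of a cell key method can be sketched as follows. This is a simplified illustration, not the procedure actually applied by the CSB: the record keys, the way the cell key is formed (fractional part of the sum of record keys) and the tiny perturbation table are all assumptions, and real perturbation tables also depend on the size of the count.

```python
import random


def assign_record_keys(n_records: int, seed: int = 2021) -> list:
    """Give every microdata record a fixed pseudo-random key in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_records)]


def cell_key(record_keys: list) -> float:
    """Cell key: fractional part of the sum of the keys of the records in the cell."""
    return sum(record_keys) % 1.0


def noise_for(count: int, key: float) -> int:
    """Hypothetical perturbation table: -1, 0 or +1, chosen by the cell key."""
    if count == 0:
        return 0  # zero frequencies are never changed to positive ones
    if key < 0.1:
        return -1
    if key >= 0.9:
        return 1
    return 0


def perturb_cell(record_keys: list) -> tuple:
    """Return (perturbed count, populated flag) for the records falling in one cell."""
    count = len(record_keys)
    perturbed = count + noise_for(count, cell_key(record_keys))
    populated = 1 if count > 0 else 0  # flag keeps a "fake zero" marked as inhabited
    return perturbed, populated


fake_zero = perturb_cell([0.05])  # one person, perturbed to 0 but flagged as populated
```

Because the noise is derived from the keys of the records in the cell, the same set of records always receives the same perturbation in every table, which is what limits disclosure by differencing.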
The existence of, for example, a count “1” in a hypercube or grid cell means that the respective individual unit is at risk of being identified from the table. Often it is nevertheless not possible to infer anything about this individual. It is still necessary to assess the risk that these data could be combined and analysed together with other data.
After using cell key method and adding noise, the data was analysed, and it was concluded that the risk of recognition and attribute disclosure in sparsely populated grid cells remains.
Attribute disclosure might happen if an attribute of an individual/household (or more individuals/households) can be learnt from the hypercube/grid data. Any non-zero frequency presented in a hypercube (or grid cell) discloses that at least one individual/household exists in the dataset with the attributes defining the cell. This might be considered especially critical if a group defined by the cross-combination of a subset of the variables defining the non-zero cell is already very small. An intruder might identify a person/household in this group and learn from the data that some (at least one) of these people also have properties defined by the remaining part of the variable combination of the non-zero cells. Imagine a small group of people with a specific age/sex combination in a small municipality where the data show that one or some of them fall into a category of the place-of-birth variable considered sensitive. Obviously, risks of attribute disclosure are more direct than the identification risks associated with the publication of small counts.
Disclosure by differencing might happen if we take the difference of two tables (or some parts of two tables) and the resulting table is disclosive. Different geographical variables, such as grid squares and the NUTS classification, potentially increase the risk of disclosure by differencing. Grid data in particular, combined with other traditional geographical variables, might be susceptible to disclosure by differencing.
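A small worked example may make disclosure by differencing concrete. The numbers, category names and the function below are invented for illustration: a table for an administrative unit is compared with the sum of the grid cells that lie entirely inside it, and the difference describes the small remaining area.

```python
def difference_tables(larger_area: dict, smaller_area: dict) -> dict:
    """Subtract the counts of a nested smaller area from those of the larger area."""
    return {
        category: larger_area[category] - smaller_area.get(category, 0)
        for category in larger_area
    }


# Invented counts: a municipality and the grid cells that lie fully inside it.
municipality = {"born_in_country": 120, "born_elsewhere": 7}
grid_cells_inside = {"born_in_country": 119, "born_elsewhere": 7}

# The remainder covers the sliver of the municipality outside those grid cells.
sliver = difference_tables(municipality, grid_cells_inside)
```

Neither published table contains a small count, yet the difference isolates a single person in the remaining area, which is why two non-nested geographies have to be assessed together.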
It was decided to use the cell suppression method for the grid data tables.
The CSB of Latvia used cell suppression to protect the grid datasets (except for the total population). The sensitive entries, called primary suppressions, need to be suppressed. Cells with fewer than 10 inhabitants were hidden.
The same cells are hidden in all 12 data tables; they make up 29% of the populated cells.
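The suppression rule described above (one set of hidden cells, determined by a threshold of 10 inhabitants and applied identically to every table) can be sketched like this. The function names, the use of `None` as the suppression marker and the sample cell identifiers are assumptions made for the example.

```python
THRESHOLD = 10  # cells with fewer inhabitants than this are hidden


def cells_to_suppress(total_population_by_cell: dict, threshold: int = THRESHOLD) -> set:
    """Populated cells whose total population is below the threshold."""
    return {
        cell for cell, total in total_population_by_cell.items()
        if 0 < total < threshold
    }


def suppress(table: dict, hidden_cells: set) -> dict:
    """Replace the values of hidden cells with None, leaving other cells unchanged."""
    return {cell: (None if cell in hidden_cells else value) for cell, value in table.items()}


# Invented totals and one invented indicator table on the same cells.
totals = {"cell_a": 3, "cell_b": 25, "cell_c": 0}
employed = {"cell_a": 2, "cell_b": 11, "cell_c": 0}

hidden = cells_to_suppress(totals)
employed_published = suppress(employed, hidden)
```

Deriving the hidden set once from the totals and reusing it for every indicator table keeps the suppression pattern identical across tables.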
Some of the demographic variables in Latvia are published on a national square grid with grid cell sizes ranging from 1 km to 100 m, and at the level of administrative and territorial units such as statistical regions, municipalities, and rural areas. The interconnectedness of these two geographical classifications presents a challenge for statistical disclosure control methods. It is important to check the disclosure risk that arises from publishing the same data on two parallel, non-nested geographical classifications.
To ensure data confidentiality, many indicators on the national grid are published only as shares within the respective cell. When publishing the share of the population by country of birth, only the indicators “born in Latvia” and “born outside Latvia” were used to ensure confidentiality (other countries are also singled out in the data tables on a larger scale). Because the data are sensitive, grid data by nationality are published only as the share of Latvian citizens.
To ensure confidentiality in the national grid, cell suppression is used where a grid cell contains fewer than 10 persons; this reduces the chance that persons could be recognised by combining the Latvian grid and the EU grid. The Latvian 1 km² grid and the European 1 km² grid differ in that they lie at a different angle to each other.
The coverage of the European grid data by the national 100 m grid was also evaluated. Both the noise added in the EU grid and the non-publication of data below a certain threshold in the national grid reduce the risk of recognition.
Metadata
- File identifier
- CENSUS_INS21ES_A_LV_2021_0000
- Metadata language
- English
- Character set
- UTF8
- Hierarchy level
- Dataset
- Date stamp
- 2024-01-08T09:00:00
- Metadata author
- Organisation name: Central Statistical Bureau of Latvia, Social Statistics Methodology Section
- Individual name: sigita.meldere@csp.gov
- Electronic mail address: sigita.meldere@csp.gov.lv
- Role: Point of contact