This glossary contains non-technical descriptions of terms used frequently by the Data Linkage Branch. If you would like other terms added, please contact dataservices@health.wa.gov.au.

 

General Definitions

Ad-hoc Linkage

A single-purpose, one-off linkage, usually relating to a specific research project. These are typically, but not always, linked using a manually created dataset, without being loaded into the WA Data Linkage System.

Application for Data

The method used to request linked data from the DLB. The Application for Data uses a set of modular forms that are specific to each type of service and dataset requested.

CARES

Custodian Administered Research Extract Server: a DLB initiative that streamlines linked data extraction, quality control and delivery services.

Cohort Selection

The first phase of data coordination for a linked data request that defines the study group of interest. E.g. a group of people born in a specific time range, or selected from inpatient records by searching for a particular diagnosis code. In adherence to the Separation Principle, this phase is run separately from the Data Linkage Team if it involves clinical information.

Core Linkages

The long-running, routine linkages that represent the ‘spine’ of the WA Data Linkage System to which other datasets can be linked. Comprises: (1) Hospital Morbidity; (2) Emergency Department; (3) Midwives Notifications; (4) Mental Health; (5) Cancer Registry; (6) Births, Deaths, Marriages; and (7) Electoral Roll. See also ‘Non-core Linkages’.

Data

Can refer to:

(1) the demographic data used in the Data Linkage process; or
(2) information pertaining to services provided to people or their clinical information (available only from Data Custodians, including via CARES).

Data Applicant

A person, group or entity that formally requests access to linked data.

Data Custodian

The person within an organisation/agency formally assigned to collect, manage, secure and disclose a dataset on a day-to-day basis at the direction of the Data Steward.

Data Flow Diagram

A pictorial representation of the flow of data from party to party. Most often used for specific linked data requests, and a requirement for explaining complex data requests.

Data Linkage

A complex technique for connecting data records within and between datasets using demographic data (e.g. name, date of birth, address, sex, medical record number). Also known as ‘Record Linkage’ or ‘Linkage’. See also ‘Probabilistic Linkage’ and ‘Deterministic Linkage’.

Data Services

Can refer to:

(1) The Data Services Team – another name for the Data Linkage Branch Client Services Team. The Team can be contacted through the general Data Services email address of dataservices@health.wa.gov.au; or
(2) The unique services offered by the Data Linkage Branch, including Linkage, Extraction, Family Connections, Geocoding and Sample Selection. These services relate directly to the Data Services Forms, which are completed by Data Applicants.

Data Steward

The person within an organisation/agency formally assigned to set the strategic purpose, operation and disclosure model of a data collection.

Dataset

A collection of similar items of information, for example a WA Births dataset might contain many thousands of pieces of information, each of which contains the name, place, and date of birth for WA people.

Deidentified

Related to the identifiability of a dataset/data item; where the identity of a person/organisation has been removed and therefore is not immediately obvious nor can be reasonably ascertained using other sources of information. The National Health and Medical Research Council discourages the use of this term as its meaning can be unclear; however, it is still used by a number of other authorities. See related term ‘Unidentifiable’.

Derived Indigenous Status Flag

A DLB-created value that represents a ‘best guess’ of the indigenous status of a person, via an algorithm that collects indigenous status information from various records in a linkage chain to calculate a single result. Formerly known as the ‘Getting Our Story Right (GOSR) flag’.

DLB

Data Linkage Branch: the specialist team at the Department of Health who are responsible for developing and maintaining the WA Data Linkage System, performing data linkage, and the facilitation of access to linked data.

DOH HREC

Department of Health Human Research Ethics Committee: a group that provides governance over Department of Health personal health information and its use in research. See also http://ww2.health.wa.gov.au/Corporate/Articles/N_R/Research-ethics

EOI

Expression of Interest: the colloquial term for a draft Application for Data.

Extraction

Can refer to:

(1) the extraction of linkage keys (by Linkage Officers); or
(2) the extraction of service data to which these keys will be appended (by CARES or the relevant Data Custodians).

Family Connections

A link that connects people who belong to the same family (e.g. mother, father, sibling, cousin), provided as a list of pairs of related ROOTS and their relation type.

Feasibility Letter

A letter from the DLB Project Officer indicating in-principle support from Data Custodians and the DLB for a linked data request. The letter communicates to an Applicant that his/her request is technically feasible and that the application can proceed to the next phase of ethical review by DOH HREC. The letter does not communicate formal approval.

Identifiable

Related to the identifiability of a dataset/data item; where the identity of a person/organisation is immediately obvious.

Infrastructure Linkage

A dataset approved for linkage to the WADLS that can potentially be accessed for multiple projects and is not tied to one specific project approval, timeline or group.

Link

A connection between records indicating that two records are deemed to belong to the same person.

Linkage

See ‘Data Linkage’.

MoU

Memorandum of Understanding: a non-binding agreement between parties. In this context, MoUs are generally a mechanism to allow data sharing between a data provider and DLB. E.g. to establish an infrastructure linkage, DLB would have an MoU with the Data Provider and also request approval for the new linkage from DOH HREC.

MyFT

The secure online file transfer system operated by WA Health. This system is used by the DLB to send and receive data.

Non-core Linkages

All linkages of datasets (other than the Core Linkages) that have been loaded into the WADLS.

Potentially Reidentifiable

Related to the identifiability of a dataset/data item; where the identity of a person/organisation is not immediately obvious but could be ascertained through a unique combination of fields or using information that the recipient already holds.

Quality Control

A process that examines an extract of data for completeness and correctness. This includes checking file counts, consistency of roots, presence of requested fields, overlap between groups, field formatting and the correct application of inclusion/exclusion criteria.
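
As a rough illustration of three of the checks listed above (file counts, presence of requested fields and overlap between groups), here is a minimal Python sketch. The function name, field names and data layout are hypothetical and chosen only for this example; they do not describe the DLB's actual tooling.

```python
def quality_check(rows, requested_fields, expected_count, case_roots, control_roots):
    """Return a list of plain-language problems found in an extract.

    rows             -- list of dicts, one per extracted record (assumed layout)
    requested_fields -- field names the applicant asked for
    expected_count   -- number of records the extract should contain
    case_roots, control_roots -- identifiers of two groups that must not overlap
    """
    problems = []

    # File count: the extract should contain the expected number of records.
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")

    # Presence of requested fields: every record should carry every field.
    for i, row in enumerate(rows):
        missing = [f for f in requested_fields if f not in row]
        if missing:
            problems.append(f"row {i} is missing fields: {', '.join(missing)}")

    # Overlap between groups: no person should sit in both groups.
    overlap = set(case_roots) & set(control_roots)
    if overlap:
        problems.append(f"{len(overlap)} root(s) appear in both groups")

    return problems


# Made-up example data: one row short, one field missing, one shared root.
example_rows = [{"root": "A1", "sex": "F"}, {"root": "B2"}]
example_problems = quality_check(
    example_rows, ["root", "sex"], 3, ["A1", "B2"], ["B2", "C3"]
)
for problem in example_problems:
    print(problem)
```

An empty list means that none of the three checks found a problem; the remaining checks named above (consistency of roots, field formatting, inclusion/exclusion criteria) are not covered by this sketch.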

Role Separation

The practice of separating access to identifiable information from clinical or service information. For example, within a Project Team, one person manages the identifiable information of consenting participants, while another member of the team analyses and manages the deidentified health service information. Neither member has access to both sets of information.

Record

A single data item sourced from a data collection, which typically refers to one event, instance or registration (e.g. hospital discharge, birth registration, car crash), although in rare cases it can refer to more than one. The specifics of what constitutes a record vary between data collections, depending on how the data is recorded and stored. Each record contains: (1) demographic information (names, addresses, etc.) that DLB uses to link the data; and (2) service information (diagnoses, procedures, etc.) that Applicants use to perform analysis.

Record Linkage

See ‘Data Linkage’.

Separation Principle

The separation of roles to ensure privacy is maintained, whereby identifiable information is kept separate to clinical/service information. The DLB strictly adheres to this principle and it must be maintained in all projects requesting linked data. Also see ‘Role Separation’.
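
To make the idea concrete, the following Python sketch splits a single record (see ‘Record’) into its demographic part and its service part, so that each part can be handled by a different party. The field names and the `record_id` key are assumptions made for illustration, not the DLB's actual data layout.

```python
# Hypothetical list of demographic (identifying) fields used for linkage.
DEMOGRAPHIC_FIELDS = {"surname", "given_name", "dob", "address", "sex"}


def separate(record, key_field="record_id"):
    """Split one record into (demographic_part, service_part).

    Both parts keep the record key so they can be re-joined later by an
    authorised party, but neither part alone holds both the identifying
    and the clinical/service information.
    """
    demographic = {
        k: v for k, v in record.items()
        if k in DEMOGRAPHIC_FIELDS or k == key_field
    }
    service = {k: v for k, v in record.items() if k not in DEMOGRAPHIC_FIELDS}
    return demographic, service


# Made-up example record.
example_record = {
    "record_id": 42,
    "surname": "SMITH",
    "given_name": "JOHN",
    "dob": "1980-01-02",
    "diagnosis": "J45",
}
demographic_part, service_part = separate(example_record)
print(demographic_part)
print(service_part)
```

In this sketch the demographic part would go to whoever performs the linkage and the service part to whoever performs the analysis.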

SUFEX

The secure online file transfer system established by the Population Health Research Network and managed by Curtin University. SUFEX is similar in security and functionality to MyFT. DLB uses it as the backup data transfer system if MyFT is unable to be used.

Unidentifiable

Related to the identifiability of a dataset/data item; where the identity of a person/organisation is not immediately obvious and it is not reasonably possible to reidentify a person/organisation using other sources of information.

WAAHEC

Western Australian Aboriginal Health Ethics Committee: the ethics committee run through the Aboriginal Health Council of WA. The Committee’s objectives are to effectively monitor ethically sound and culturally appropriate research where Aboriginality is of interest, and to ensure benefits to Aboriginal people.

WADLS

Western Australian Data Linkage System: the Western Australian system used to connect available health and other related information for the WA population. This incorporates database tables holding demographic data and linkage keys, and the bespoke tools used by Linkage Staff to process, create, store and retrieve them.

 

 

Linkage Process Definitions

Blocking

A part of the linkage process whereby pairs of records are subsetted for comparison based on a list of fields on which they must match exactly.

Chain

All of the linked records thought to belong to a single person. Also see definition for ROOT.

Chain Number

The common identifier assigned by the WADLS to records that have been linked together.

Clerical Review

The process whereby potential matches that do not match strongly enough to automatically match, nor weakly enough to automatically discard, are manually evaluated by a Linkage Officer.

Data Cleaning

The process of standardising the demographic fields of data to be linked, to ensure maximum compatibility with existing linked data.

Deterministic Linkage

A method of linking records using unique identifiers (e.g. Medicare number, Elector number), where any two records with the same unique identifier are deemed to be a definite match. No clerical review is done when undertaking deterministic linkage. It is assumed that the unique identifiers being used are truly unique across the population being linked.

Encrypted LPNO

Encrypted versions of the LPNO are provided to clients to identify individual records.

Encryption

A process where information is transformed so that it is unrecognisable, and where this transformation can only be reversed (decrypted) by a person with the same secret key used to encrypt the original data.

False Negative

A correct link that has not been discovered (i.e. a ‘missed’ link). DLB’s interactive, multi-pass, multi-dataset approach to linkage is designed to minimise false negatives.

False Positive

A link that has been made in error between the data for two or more distinct people. DLB has numerous quality assurance measures in place to ensure the maximum number of false positives are filtered out prior to links being loaded into the WADLS.

Geocode

A point on the earth’s surface described spatially by a geographical coordinate, usually a latitude/longitude, representing the position of a known feature such as a street address, a named place or an item of infrastructure. Alternatively, this term is often used to describe the derived boundaries.

Geocoded Address

A geographical coordinate that approximates the centre of either the property or the main building on the property described by the property street address.

Geocoding

The process of assigning a geographical coordinate to a named feature. In the case of the feature being a street address, the address must first be matched to a known address in a spatially referenced dataset such as those maintained by Landgate (in WA) and Australia Post. Once a geographical coordinate has been assigned to the feature, census statistical areas (SA1, SA2, LGA) can be derived.

Linkage Key

Can refer to:

(1) a Chain Number; or
(2) a ROOT.

LPNO

DLB’s in-house record identifier, assigned to every record loaded into the WA Data Linkage System.

Matching

A part of the linkage process whereby blocked pairs of records are compared, according to user-set parameters, to determine the strength of the match.

%-Linked

For each dataset, this refers to the percentage of records that have been linked to any other record in the WA Data Linkage System. Also referred to as the ‘completeness’ of the given linkage, noting that this number may be impacted by records that are impossible to link (due to missing or invalid demographic information).

Probabilistic Linkage

A method of linking records using non-unique identifiers (e.g. name, date of birth) to establish weights which represent the likelihood that two records belong to the same person. These weights are used to inform matches and non-matches, and can include clerical review for a selected ‘grey area’ in between.

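
The terms Blocking, Matching, Probabilistic Linkage and Clerical Review fit together as steps of one process. The Python sketch below shows one common way such a process can be arranged, using log-ratio agreement weights in the Fellegi-Sunter style. This is an illustrative sketch only: the field names, probabilities and thresholds are invented for the example and are not the parameters or method used by the DLB.

```python
from itertools import combinations
from math import log2

# Hypothetical per-field probabilities (assumed values for illustration):
# m = chance the field agrees when two records belong to the same person,
# u = chance the field agrees when they belong to different people.
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "dob": (0.98, 0.003),
}
UPPER = 10.0  # assumed threshold: at or above is an automatic match
LOWER = 0.0   # assumed threshold: at or below is an automatic non-match


def blocked_pairs(records, block_fields):
    """Blocking: only pair up records that agree exactly on every block field."""
    blocks = {}
    for rec in records:
        key = tuple(rec[f] for f in block_fields)
        blocks.setdefault(key, []).append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)


def match_weight(a, b):
    """Matching: add a positive weight for each agreeing field, negative otherwise."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total


def classify(weight):
    """Weights between the two thresholds form the 'grey area' for clerical review."""
    if weight >= UPPER:
        return "match"
    if weight <= LOWER:
        return "non-match"
    return "clerical review"


# Made-up example records.
records = [
    {"id": 1, "surname": "SMITH", "given_name": "JOHN", "dob": "1980-01-02"},
    {"id": 2, "surname": "SMITH", "given_name": "JOHN", "dob": "1980-01-02"},
    {"id": 3, "surname": "SMYTH", "given_name": "JOHN", "dob": "1980-01-02"},
    {"id": 4, "surname": "JONES", "given_name": "MARY", "dob": "1975-06-30"},
]
pairs = list(blocked_pairs(records, ["dob"]))
for a, b in pairs:
    w = match_weight(a, b)
    print(a["id"], b["id"], round(w, 2), classify(w))
```

In this example, blocking on date of birth means record 4 is never compared with the others. Records 1 and 2 agree on every field and are classified as a match, while each of them paired with record 3 (same given name and date of birth, different spelling of the surname) falls in the grey area and would go to clerical review.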
ROOT

The ‘master’ LPNO used to identify a Chain. It is generally the LPNO belonging to the earliest record in a person’s Chain that has been sourced from a Core health dataset. The ROOT and Chain Number will largely correspond 1-to-1.

The ROOT is used for:
(1) ease of processing (they can be handled the same as LPNOs); and
(2) security (i.e., DLB provides the encrypted ROOT to Data Applicants to identify Chains, rather than the Chain Number).
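
The ROOT rule described above, and the ‘%-Linked’ measure, can both be sketched from a list of records that already carry a Chain Number. The Python below is an illustration under stated assumptions: the dataset labels, the choice of which datasets count as ‘Core health’, the fallback when a Chain has no such record, and the treatment of an unlinked record as a Chain of one are all hypothetical and are not taken from the WADLS.

```python
from collections import Counter, defaultdict

# Hypothetical labels for core health datasets (assumed for illustration).
CORE_HEALTH = {"hospital_morbidity", "emergency", "midwives", "mental_health", "cancer"}


def choose_roots(records):
    """Return {chain_number: root_lpno}.

    The ROOT is taken as the LPNO of the earliest record in the Chain that
    comes from a core health dataset.
    """
    chains = defaultdict(list)
    for rec in records:
        chains[rec["chain"]].append(rec)

    roots = {}
    for chain, members in chains.items():
        core = [r for r in members if r["dataset"] in CORE_HEALTH]
        # Assumption: if the Chain has no core health record, fall back to
        # the earliest record of any kind.
        candidates = core or members
        earliest = min(candidates, key=lambda r: (r["date"], r["lpno"]))
        roots[chain] = earliest["lpno"]
    return roots


def percent_linked(records, dataset):
    """Percentage of a dataset's records that share a Chain with another record."""
    chain_sizes = Counter(r["chain"] for r in records)
    in_dataset = [r for r in records if r["dataset"] == dataset]
    if not in_dataset:
        return 0.0
    linked = sum(1 for r in in_dataset if chain_sizes[r["chain"]] > 1)
    return 100.0 * linked / len(in_dataset)


# Made-up example records; dates are ISO strings so they sort chronologically.
example_records = [
    {"lpno": 101, "chain": 1, "dataset": "electoral_roll", "date": "1999-03-01"},
    {"lpno": 102, "chain": 1, "dataset": "hospital_morbidity", "date": "2004-07-15"},
    {"lpno": 103, "chain": 1, "dataset": "emergency", "date": "2002-11-20"},
    {"lpno": 104, "chain": 2, "dataset": "emergency", "date": "2010-05-05"},
]
roots = choose_roots(example_records)
print(roots)
print(percent_linked(example_records, "emergency"))
```

In this example the electoral roll record is the earliest in Chain 1, but it is not from a core health dataset, so the ROOT is the LPNO of the earliest emergency record instead. One of the two emergency records is linked to another record, so the emergency dataset is 50% linked.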