Infrastructure linkages – Overview

Infrastructure linkages are linkages of datasets that are generally available for a range of approved projects (as opposed to linkages of datasets for a single approved project only). The most accessed infrastructure linkages are the DLB’s 'core' linkages; however, the DLB also maintains a wide range of other non-core infrastructure linkages. For more information on the infrastructure datasets available, please see the Dataset Menu.

Infrastructure linkages are established according to the following principles:

  • the linkage is technically feasible;
  • data is provided regularly for linkage updates (e.g. annually);
  • data can be accessed by multiple projects and/or groups according to the relevant application and approval process;
  • there is ongoing value for the linkage to be performed as infrastructure work rather than project-specific work;
  • the new linkage is approved by the Department of Health Human Research Ethics Committee (DOH HREC), a group that provides governance over Department of Health personal health information and its use in research (see also http://ww2.health.wa.gov.au/Corporate/Articles/N_R/Research-ethics);
  • the Data Provider is ‘linkage ready’ insofar as having the staff, skills and resources needed to supply the data for linkage, answer data queries, and fulfil project-specific data extraction requirements once all of the necessary approvals are in place (data extraction can refer to: (1) the extraction of linkage keys by Linkage Officers; or (2) the extraction of service data to which these keys will be appended, by CARES or the relevant Data Custodians);
  • the DLB charges for linkages to ensure there is sufficient funding available for initial development and ongoing operational costs;
  • the Data Provider and/or a Research Group is responsible for meeting the aforementioned costs;
  • the DLB schedules work (including linkages) to optimise the balance between capacity and demand.

All linkage activities involve logistical risks, and these increase with complexity. Generally, infrastructure linkages are associated with the greatest complexity (and risk) due to a variety of inter-related factors, including legal, ethical and legislative considerations, the stakeholders involved, the underlying data quality, and the effort, duration, skills and budget required. It is often difficult to estimate complexity with certainty before an infrastructure linkage commences. For this reason, it can take the DLB several months to link data before it is available for approved uses. Sometimes this takes longer due to unforeseen circumstances (e.g. changes in staff, data collection systems, dataset specifications or available resources).

 

Requesting a new Infrastructure linkage

The DLB Project Manager coordinates the application process for new infrastructure linkages. If you have any queries about infrastructure linkages, please contact Tom Eitelhuber at tom.eitelhuber@health.wa.gov.au.

 

Standard linkage processes

Data can be transferred to the DLB via secure online file transfer systems such as MyFT (the secure online file transfer system operated by WA Health, which the DLB uses to send and receive data). New datasets provided to the DLB for linkage go through multiple stages before they can be linked, including:

  • liaison with Data Provider;
  • evaluation;
  • process preparation;
  • importing;
  • cleaning;
  • assignment of linkage-specific record IDs. A record is a single data item sourced from a data collection, which typically refers to one event, instance or registration (e.g. hospital discharge, birth registration, car crash), although in rare cases it can refer to more than one. What constitutes a record varies between data collections, depending on how the data is recorded and stored. Each record contains: (1) demographic information (names, addresses, etc.) that the DLB uses to link the data; and (2) service information (diagnoses, procedures, etc.) that Applicants use to perform analysis; and
  • linkage strategy development.

In theory, data updates (‘refreshes’) are more straightforward to complete, because processes and linkage strategies can be reused. Be aware that changes to the data, or poor consolidation against what has been received before, can lead to problems, especially additional time and cost.

 

General advice for making data as ‘linkage ready’ as possible

Data linkage (also called ‘record linkage’ or ‘linkage’) is a complex technique for connecting data records within and between datasets using demographic data (e.g. name, date of birth, address, sex, medical record number). It uses information about ‘who’ a person is more than data about ‘what has happened’ to them. The most useful fields for linkage are:

  • Given name;
  • Surname;
  • Date of birth;
  • Street address;
  • Suburb; and
  • Postcode.


Multi-component fields such as name and address are best split into their subcomponents and provided as separate fields. For example:

  • First name;
  • Middle name;
  • Surname;
  • Street address;
  • Suburb;
  • Postcode.
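
As an illustration only, the sketch below splits a combined name field into separate first name, middle name and surname fields. It assumes the source system stores names as 'Surname, First Middle'; the function name `split_name` and the output field names are hypothetical, and real data will need rules that match how names are actually recorded.

```python
def split_name(full_name: str) -> dict:
    """Split 'Surname, First Middle' into separate fields.

    The input layout is an assumption for this example; adjust the
    parsing to match the way names are stored in the source system.
    """
    surname, _, given = full_name.partition(",")
    parts = given.split()
    return {
        "first_name": parts[0] if parts else "",
        "middle_name": " ".join(parts[1:]),  # empty when there is no middle name
        "surname": surname.strip(),
    }


example = split_name("Citizen, Jane Mary")
print(example)
```

Values that do not follow the assumed layout (for example, a name with no comma) are best reviewed by hand rather than split automatically.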


Information that ‘deterministically connects’ (matches exactly) the new data to another dataset is very valuable. Examples include Unit Medical Record Number (UMRN) and Elector Number.


The data formats the DLB accepts are character-delimited text files (e.g. delimited by commas or tabs), fixed-width text files and Excel spreadsheets. If a data provider cannot provide data in one of these formats, they should contact the DLB to discuss.


Every record must have a unique ID that maps back to the original system/collection. If the data does not have a unique record ID, consider creating one, or providing the DLB with a list of fields which, when taken together, are unique (e.g. person ID and event date).
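
One way a data provider might test whether a set of fields is unique is sketched below. The field names `person_id` and `event_date` and the helper `duplicate_keys` are hypothetical; the same check works for a single record ID field.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return the key values that occur on more than one record."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Illustrative records only.
records = [
    {"person_id": "P1", "event_date": "2024-01-05"},
    {"person_id": "P1", "event_date": "2024-02-10"},
    {"person_id": "P2", "event_date": "2024-01-05"},
    {"person_id": "P1", "event_date": "2024-01-05"},
]

# person_id alone repeats, and so does the pair (person_id, event_date).
print(duplicate_keys(records, ["person_id"]))
print(duplicate_keys(records, ["person_id", "event_date"]))
```

An empty list means the chosen fields identify each record uniquely; any keys returned point to records that need a further distinguishing field.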


If the new data is 'person based' rather than 'event based' (i.e. everyone has one record with all of their information contained within), this could be problematic, particularly if the data provider overwrites old values with new ones (e.g. updating addresses). Contact the DLB to discuss.


Non-human records need to be removed before linkage, as some datasets include animals, vehicles, etc. Please endeavour to remove these records before supplying the data to the DLB.


Text equivalents of NULL, such as 'no fixed permanent address' and 'N/A', can be problematic, as the DLB wants to avoid mistaking them for a match to one another. Data will be easier to link if these values are left empty instead.
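
A minimal sketch of blanking such values before export is shown below. The placeholder list is an assumption made for the example, not a list supplied by the DLB; a data provider would build it from the values that actually occur in their data.

```python
# Hypothetical list: build the real one from the values seen in your own data.
PLACEHOLDERS = {"n/a", "na", "null", "none", "no fixed permanent address"}


def blank_placeholder(value):
    """Return '' for text that only stands in for a missing value."""
    if value is None:
        return ""
    cleaned = value.strip()
    return "" if cleaned.lower() in PLACEHOLDERS else cleaned


row = {
    "surname": "Citizen",
    "street_address": "No Fixed Permanent Address",
    "suburb": "N/A",
}
cleaned_row = {field: blank_placeholder(value) for field, value in row.items()}
print(cleaned_row)
```

The comparison ignores case and surrounding spaces so that variants such as 'n/a' and ' N/A ' are treated the same way.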


Metadata is very useful when determining how to use fields for linkage, particularly when they are understood by only a select group of people with specialised knowledge. Data providers must ensure they provide the DLB with data dictionaries and code lists, if they exist.


Data should be checked closely for errors after exporting it from the original system. Some of the most common errors that the DLB encounters are:

  • single records wrapping onto two lines;
  • inconsistent number of delimiters or fields;
  • inclusion of the separating character in the field value (e.g. addresses with commas in them, within a comma-delimited file);
  • the same record ID being assigned to more than one record; and
  • inadvertently reformatting fields (e.g. expressing a number in truncated scientific notation in Excel).
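
Several of these errors can be caught with a short script before the file is sent. The sketch below is one possible check for a comma-delimited export, written with Python's standard `csv` module; the function name `check_export`, the sample data and the message wording are all illustrative. A record that has wrapped onto two lines, or an unquoted delimiter inside a value, shows up as a row with the wrong number of fields.

```python
import csv
import io
import re
from collections import Counter

# Matches values such as "6.14E+11", which Excel can produce from long numbers.
SCI_NOTATION = re.compile(r"\d(\.\d+)?[eE]\+\d+")


def check_export(text, id_field, delimiter=","):
    """Return a list of human-readable problems found in a delimited export."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    id_index = header.index(id_field)
    problems = []
    id_counts = Counter()
    for row in reader:
        line_no = reader.line_num  # physical line on which this row ended
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
            continue
        id_counts[row[id_index]] += 1
        for value in row:
            if SCI_NOTATION.fullmatch(value.strip()):
                problems.append(
                    f"line {line_no}: value {value!r} looks like scientific notation"
                )
    for record_id, count in id_counts.items():
        if count > 1:
            problems.append(f"record ID {record_id!r} appears {count} times")
    return problems


# Illustrative export containing one example of each problem.
sample = (
    "id,name,address\n"
    '1,Jane,"1 High St, Perth"\n'
    "2,John,2 Low St, Perth\n"
    "3,Ann\n"
    "1,Bob,3 Mid St\n"
    "6.14E+11,Sam,4 Top St\n"
)
for problem in check_export(sample, id_field="id"):
    print(problem)
```

Note that the quoted address on the second line is read correctly as one field, which is why quoting values that contain the delimiter avoids the problem altogether.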

A data provider should cross-check the field list in the relevant Data Agreement with the DLB to ensure they have supplied all approved fields.


If ongoing data updates will be provided to the DLB, these will be easiest to process if the data provider can supply only the records that are new or have been modified since the previous provision. If this is impossible, the data provider can in theory supply a full refresh of all records; however, this becomes very problematic if the data has undergone any changes since the last update.
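
As a sketch only, selecting new or modified records could look like the following. It assumes the source system keeps a last-modified date on every record in ISO format (YYYY-MM-DD); the field name `last_modified` and the function `changed_since` are hypothetical.

```python
from datetime import date


def changed_since(records, last_provision, modified_field="last_modified"):
    """Keep records created or modified after the previous provision date."""
    return [
        r for r in records
        if date.fromisoformat(r[modified_field]) > last_provision
    ]


# Illustrative records only.
records = [
    {"record_id": "A1", "last_modified": "2023-11-02"},
    {"record_id": "A2", "last_modified": "2024-03-15"},
    {"record_id": "A3", "last_modified": "2024-07-01"},
]
update = changed_since(records, last_provision=date(2024, 1, 31))
print([r["record_id"] for r in update])
```

This approach only works if the system reliably updates the modification date whenever a record changes; where it does not, discuss the options with the DLB.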


Data changes such as the addition of new fields, moving the data to a new storage system, or reallocation of record IDs, will impact processing requirements for updates. Provide the DLB with as much information as possible to bridge the gap between the old version of the data and the new one (e.g. new-to-old ID mapping files, change of field name information, etc.). It is helpful if a data provider can format the data to look as similar as possible to the previous version (e.g. same format, same field order, same naming conventions, with newly added fields appended to the end of each record). 
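
The formatting advice above can be sketched as a small helper that restores the previous field names and order and appends any new fields at the end. The names `conform_layout`, `previous_fields` and `renamed` are hypothetical, and the rename mapping (new name to old name) is assumed to come from the provider's own change documentation.

```python
def conform_layout(record, previous_fields, renamed=None):
    """Order a record like the previous version, with new fields appended."""
    renamed = renamed or {}  # maps new field name -> previous field name
    current = {renamed.get(name, name): value for name, value in record.items()}
    # Previous fields first, in their original order (blank if now missing).
    ordered = {name: current.get(name, "") for name in previous_fields}
    # Then any fields that did not exist in the previous version.
    for name, value in current.items():
        if name not in ordered:
            ordered[name] = value
    return ordered


new_record = {"dob": "1990-01-01", "rec_id": "7", "email": "jane@example.org"}
conformed = conform_layout(
    new_record,
    previous_fields=["record_id", "dob"],
    renamed={"rec_id": "record_id"},
)
print(list(conformed))
```

Any renaming applied this way should still be reported to the DLB alongside the data, so that the change is documented at both ends.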


Ensure old records do not disappear from your system.


Document thoroughly any related data processes to prevent knowledge loss at the source. This is especially important during periods of staff changeover or system redevelopment/replacement.