Data Ready

La Trobe University

Data Ready background


Glossary

A

ADA - Australian Data Archive.

Aggregated data - A combination of unit records created with the objective that individual details are not disclosed.
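As a minimal sketch of aggregation, the example below collapses a handful of unit records into counts per group. The records, the field names and the `aggregate_by` helper are all hypothetical and chosen only for illustration. Very small counts can still be disclosive, so a real workflow would usually apply further checks.

```python
from collections import Counter

# Hypothetical unit records: one row per survey respondent.
unit_records = [
    {"respondent_id": 1, "postcode": "3086", "age_group": "18-24"},
    {"respondent_id": 2, "postcode": "3086", "age_group": "18-24"},
    {"respondent_id": 3, "postcode": "3083", "age_group": "25-34"},
    {"respondent_id": 4, "postcode": "3086", "age_group": "25-34"},
]


def aggregate_by(records, key):
    """Count records per value of `key`; every other field is dropped."""
    return dict(Counter(record[key] for record in records))


# The aggregated result no longer contains respondent IDs or postcodes.
age_group_counts = aggregate_by(unit_records, "age_group")
print(age_group_counts)
```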

ALA - Atlas of Living Australia.

ANDS - Australian National Data Service.

Anonymisation - The process of adapting data so that individuals cannot be identified from it.

API - Application Programming Interface. A way computer programs talk to one another. Can be understood in terms of how a programmer sends instructions between programs.

ATSIDA - Aboriginal and Torres Strait Islander Data Archive.

Attribution Licence - A licence that requires that the original source of the licensed material is cited (attributed).

Australian Code for the Responsible Conduct of Research - A code that guides Australian institutions and researchers in responsible research practices. Compliance with the Code is a prerequisite for receipt of National Health and Medical Research Council (NHMRC) and Australian Research Council (ARC) funding. It was jointly developed by the NHMRC, the ARC and Universities Australia, and is often referred to simply as "the Code" or by the acronym ACRCR.

Authoritative - Able to be trusted as being accurate or true; reliable: e.g. “clear, authoritative information”.

Authoritative data source - A recognised or official data production source with a designated mission statement or source/product to publish reliable and accurate data for subsequent use by customers. An authoritative data source may be the functional combination of multiple, separate data sources.

 

B

Big data - A loose term, not formally defined, for high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing to give enhanced insight and decision making.

Big data analytics - The process of examining and interrogating big data assets to derive insights of value for decision making.

BitTorrent - A protocol for transferring very large files that distributes the bandwidth load across the computers participating in the transfer. Rather than downloading a file from a single source, peers download from each other.

 

C

Commercial Use/Re-Use - Use that is intended for or directed toward commercial advantage or private monetary compensation.

Connectivity - The ability of communities to connect to the Internet, especially the World Wide Web.

Content - The collection of information stored for a purpose in a file, folder or electronic message.

Copyright - A right for the creators of creative works to restrict others’ use of those works. An owner of copyright is entitled to determine how others may use that work.

Creative Commons - A non-profit US organisation that enables the sharing and use of creativity and knowledge through free legal tools.

CSIRO - Commonwealth Scientific and Industrial Research Organisation.

CSV - Comma-separated values. A file type used to store tabular data (numbers and text) in plain-text form.
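To illustrate, the sketch below reads a small piece of CSV text with Python's standard `csv` module. The sample data and the `read_csv_rows` helper are made up for this example. Because CSV is plain text, every value is read back as a string and must be converted explicitly if numbers are needed.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by two data rows.
csv_text = "site,species,count\nBundoora,Magpie,12\nBendigo,Galah,7\n"


def read_csv_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_csv_rows(csv_text)

# Values arrive as strings, so convert before doing arithmetic.
total_count = sum(int(row["count"]) for row in rows)
print(rows[0]["species"], total_count)
```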

Curation - Curation of digital materials involves active intervention to mitigate digital obsolescence, for example by migrating data to comply with evolving industry standards so that it continues to be accessible. This may require migration from one storage technology to another, or data manipulation to meet new standards for data recording and presentation in human- or machine-readable forms.

 

D

Data - Facts and statistics collected together for reference or analysis.

Data Access Protocol - A system that allows outsiders to be granted access to databases without overloading either system.

Data Discovery - The process of finding out what data exists and how it can be accessed.

Data Management - Data Management refers to those activities that control how data is collected, organised, used, disseminated and disposed of. It includes measurement, monitoring, and auditing of all these activities. Data Management excludes the use of data in research, but includes recording entities and processes involved in producing and influencing the data in order to assist reproducibility.

Data Sharing - The transfer, by agreement, of data collected for a specific purpose between two or more parties.

Dataset - A collection of data, usually in tabular form, presented either electronically or in other formats.

De-anonymisation - The process of attempting to determine the identity of the individual to whom a pseudonymised dataset relates.

Derived data - A data element or dataset adapted from other data sources using a mathematical, logical, or other type of transformation, e.g. arithmetic formula, composition, aggregation. See Value-added data.

Digital rights management - A class of access control technologies that are used by hardware manufacturers, publishers, copyright holders and individuals with the intent to limit the use of digital content and devices after sale.

Disclosive - Data is potentially disclosive if, despite the removal of obvious identifiers, characteristics of this dataset in isolation or in conjunction with other datasets might lead to identification of the individual to whom a record belongs.

Disposal - A range of processes associated with implementing records destruction or transfer decisions which are documented in retention authorities.

Document - Any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording).

Dryad - A nonprofit repository for data underlying the international scientific and medical literature.

 

G

Geospatial Data - Also known as spatial data or geographic information, it is the data that represents the geographic location of natural and man-made features on Earth.

 

H

HTML - HyperText Markup Language. The standard markup language used to create web pages.

 

I

IAR - Information Asset Register. A register to capture and organise metadata about the information held by government departments and agencies.

Information - Interpretation and analysis of data that when presented in context represents added value, message or meaning.

Intellectual property rights - Monopolies granted to individuals for intellectual creations.

 

J

JSON - JavaScript Object Notation. An open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
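As a small illustration of attribute-value pairs, the sketch below converts a Python dictionary to JSON text and back using the standard `json` module. The record and its field names are hypothetical.

```python
import json

# Hypothetical dataset description expressed as attribute-value pairs.
record = {"title": "Rainfall observations", "year": 2015, "open": True}

# Serialise to human-readable JSON text, as would be sent between programs.
json_text = json.dumps(record, sort_keys=True)

# Parse the text back into an equivalent Python object.
restored = json.loads(json_text)

print(json_text)
```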

 

L

Licence (Noun) - A legal document giving permission to use information.

Linked data - A term describing the best practice of exposing, sharing and connecting items of data on the semantic web using Uniform Resource Identifiers (URIs) and the Resource Description Framework (RDF). Not to be confused with data linking.

 

M

Machine-readable - A format is machine-readable if computer programs can easily extract its data.

Metadata - Data that describes or defines other data.

Modelled data - Information created by mathematical representation of data relationships, sometimes used to simulate environments that are difficult to observe reliably or consistently.

 

N

Non-commercial use - Use that is not intended for or directed toward commercial advantage or private monetary compensation.

 

O

Ontology - A formal representation of knowledge as a set of concepts within a domain, and the relationships among those concepts.

Open data - Data is open if anyone is free to access, use, modify, and share it, subject, at most, to measures that preserve provenance and openness.

Open government data - Open data produced by the government. This is generally accepted to be data gathered during the course of business-as-usual activities that does not identify individuals or breach commercial sensitivity.

Open standards - Generally understood as technical standards which are free from licensing restrictions. Can also be interpreted to mean standards which are developed in a vendor-neutral manner.

 

P

PARADISEC - Pacific and Regional Archive for Digital Sources in Endangered Cultures.

Personal Data - Data which relate to a living individual who can be identified.

Plain text - The contents of an ordinary sequential file readable as textual material without much processing.

Primary Research Materials - Primary research materials comprise data and materials generated or collected by the researcher as part of their research.

Pseudonymised Data - Data relating to a specific individual where the identifiers have been replaced by artificial identifiers to prevent identification of the individual.
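The replacement of identifiers can be sketched as follows. This is only an illustration under simple assumptions: the `pseudonymise` helper, the `P0001`-style codes and the sample records are all hypothetical. The key table that links codes back to real identifiers would need to be stored securely and separately, and other fields left in the data may still be disclosive.

```python
def pseudonymise(records, id_field):
    """Replace each distinct value of `id_field` with an artificial code.

    Returns the pseudonymised records and the key table mapping
    original identifiers to their codes.
    """
    key_table = {}
    output = []
    for record in records:
        original = record[id_field]
        if original not in key_table:
            # Assign the next sequential code, e.g. P0001, P0002, ...
            key_table[original] = f"P{len(key_table) + 1:04d}"
        replaced = dict(record)  # copy so the input is not modified
        replaced[id_field] = key_table[original]
        output.append(replaced)
    return output, key_table


# Hypothetical records; the same person appears twice.
source_records = [
    {"name": "Alice", "score": 5},
    {"name": "Bob", "score": 3},
    {"name": "Alice", "score": 4},
]
pseudonymised, key_table = pseudonymise(source_records, "name")
print(pseudonymised)
```

The same individual always receives the same code, so records can still be linked within the dataset without exposing the original identifier.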

Public domain - Works that are publicly available and in which the intellectual property rights have expired or been waived.

Public sector information - Information collected or controlled by the public sector.

 

R

Raw data - Data collected which has not been subjected to processing or any other manipulation beyond that necessary for its first use. "Raw data" (that is, unprocessed data) is a relative term: data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.

RDF - Resource Description Framework. A W3C standard, it is the foundation of several technologies for modelling distributed knowledge and is meant to be used as the basis of the Semantic Web.

Re-use - Use of content outside of its original intention.

Records (AS ISO 15489.1-2002, s.3.15) - Information created, received, and maintained as evidence and information by an organisation or person, in pursuance of legal obligations, or in the transaction of business.

Research - Original investigation undertaken to gain knowledge, understanding and insight.

Research data - Any data collected during research which would be used to validate the research findings and/or facilitate the reproduction of the research.

Research Data Management Plan - A Research Data Management Plan (RDMP) is a document that describes how you will collect, organise, manage, store, secure, back up, preserve, and share your data.

Research materials - Any materials used or generated in the course of conducting research.

Researcher - In the context of these guidelines, the term researcher refers to anyone undertaking or piloting research in association or affiliation with La Trobe University including but not limited to academics, students, higher degree by research candidates, professional staff and third party associates.

Retention - The act of retaining and ensuring readability of records for specified periods.

RSS feed - Rich Site Summary. A family of standard web feed formats used to publish frequently updated information such as blog entries, news headlines, audio and video.

 

S

Semantic web - A web of data that can be processed directly and indirectly by machines, providing a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is based on the Resource Description Framework (RDF).

Share-alike licence - A licence that requires users of a work to provide the content under the same or similar conditions as the original.

SLDR - Speech and Language Data Repository.

Secondary Data - Secondary data is data collected by someone other than the researcher.

Staff - In the context of these guidelines, the term 'staff' refers to all employees of La Trobe University or affiliated enterprises with which the University has a formal agreement and includes casual employees, clinical staff and unpaid members of the University such as Honorary and Adjunct appointments, all of which are registered on the HR system.

 

T

Taxonomy - The science or technique of classification.

tDAR - The Digital Archaeological Record.

TERN - Terrestrial Ecosystem Research Network.

TSV - Tab-separated values. A very common text file format for sharing tabular data. The format is extremely simple and highly machine-readable.

 

U

Unit records - Individual items of information from surveys or observations that often contain confidential details.

URI - Uniform Resource Identifier. A generic term for all types of names and addresses that refer to objects on the World Wide Web. A URL is one kind of URI.

URL - Uniform resource locator, a type of URI that identifies a resource via a representation of its network location.
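To show how a URL encodes a network location, the sketch below splits one into its parts with Python's standard `urllib.parse` module. The address is a made-up example on the reserved `example.org` domain, not a real dataset.

```python
from urllib.parse import urlsplit

# Hypothetical URL for a dataset; example.org is reserved for documentation.
dataset_url = "https://data.example.org/datasets/rainfall?format=csv"

parts = urlsplit(dataset_url)

# scheme: how to connect; netloc: which host; path and query: which resource.
print(parts.scheme, parts.netloc, parts.path, parts.query)
```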

 

V

Value-Added Information (or Data) - Data to which value has been added to enhance and facilitate its use and effectiveness by or for users.

 

W

Web API - An API that is designed to work over the Internet.

 

X

XML - Extensible Markup Language. A markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.
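As a brief illustration, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The element names and values are hypothetical. The same text is readable by a person and can be navigated by a program.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML describing a dataset.
xml_text = (
    "<dataset>"
    "<title>Rainfall observations</title>"
    "<year>2015</year>"
    "</dataset>"
)

# Parse the text into a tree and read values from named child elements.
root = ET.fromstring(xml_text)
title = root.findtext("title")
year = int(root.findtext("year"))

print(root.tag, title, year)
```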