Data Ready

La Trobe University

Data Ready background


Glossary

A

ADA - Australian Data Archive.

Aggregated data - A combination of unit records created with the objective that individual details are not disclosed.
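
As a minimal sketch of aggregation, the Python below collapses a handful of unit records into per-region summaries. The records, the field names and the `aggregate_by_region` helper are all invented for illustration, and real aggregation usually also needs a check that small groups do not disclose individuals.

```python
from collections import defaultdict

# Hypothetical unit records: one row per survey respondent.
unit_records = [
    {"id": 1, "region": "North", "age": 34},
    {"id": 2, "region": "North", "age": 41},
    {"id": 3, "region": "South", "age": 29},
    {"id": 4, "region": "South", "age": 52},
    {"id": 5, "region": "South", "age": 47},
]


def aggregate_by_region(records):
    """Return the respondent count and mean age per region, dropping ids."""
    ages = defaultdict(list)
    for record in records:
        ages[record["region"]].append(record["age"])
    return {
        region: {"count": len(values), "mean_age": sum(values) / len(values)}
        for region, values in ages.items()
    }


summary = aggregate_by_region(unit_records)
print(summary)
```

The summary keeps only group-level figures, so no single respondent's id or age appears in the output.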

ALA - Atlas of Living Australia.

ANDS - Australian National Data Service.

Anonymisation - The process of adapting data so that individuals cannot be identified from it.

API - Application Programming Interface. A way computer programs talk to one another. Can be understood in terms of how a programmer sends instructions between programs.

Archiving - Moving data that is no longer actively used to electronic storage that allows long-term retention.

ATSIDA - Aboriginal and Torres Strait Islander Data Archive.

Attribution Licence - A licence that requires that the original source of the licensed material is cited (attributed).

Australian Code for the Responsible Conduct of Research - A code that guides Australian institutions and researchers in responsible research practices. Compliance with the Code is a prerequisite for receipt of National Health and Medical Research Council and Australian Research Council funding. It was jointly developed by the National Health and Medical Research Council, the Australian Research Council and Universities Australia, and is often referred to simply as "the Code" or by the acronym ACRCR.

Authoritative - Able to be trusted as being accurate or true; reliable: e.g. “clear, authoritative information”.

Authoritative data source - A recognised or official data production source with a designated mission statement or source/product to publish reliable and accurate data for subsequent use by customers. An authoritative data source may be the functional combination of multiple, separate data sources.

 

B

Backup - A copy of computer data that can be used to restore the original data if it is lost.

Big data - A loose term, not formally defined, for high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing to enable enhanced insight and decision making.

Big data analytics - The process of examining and interrogating big data assets to derive insights of value for decision making.

BitTorrent - A protocol that shares the bandwidth needed to transfer very large files among the computers participating in the transfer. Rather than downloading a file from a single source, BitTorrent allows peers to download from each other.

 

C

Character encoding - The method by which standard and non-standard characters can be described by software so that they can display correctly.
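
A small Python sketch can show why the encoding matters. The word used here is just an illustrative sample: the same text produces different bytes under UTF-8 and Latin-1, and decoding with the wrong encoding gives garbled output.

```python
text = "café"  # sample text containing one non-ASCII character

# The same characters become different byte sequences under each encoding.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Decoding with the encoding that was used to write the bytes recovers the text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as Latin-1 does not fail, but it displays incorrectly.
garbled = utf8_bytes.decode("latin-1")
print(round_trip, garbled)
```

Recording the encoding alongside a file (UTF-8 is a common choice) helps others avoid the garbled result.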

Character sets - A list of characters that can be used by applications.

Chief Investigator - Staff member responsible for all intellectual, administrative and ethical aspects of a research project, from conception to finalisation. They are also responsible for the communication of project outcomes.

CloudStor - A file sharing and cloud storage solution for the research and education sector. It is a La Trobe recommended data storage and sharing solution. All La Trobe staff and students have free access to this service.

Collaborator - A researcher who contributes to the project for some or all of its duration, or who makes frequent or substantial contributions. Collaborators generally exclude those who make only an occasional or relatively minor contribution to the research, and those not seen as researchers (e.g. technicians and research assistants).

Commercial Use/Re-Use - Use that is intended for or directed toward commercial advantage or private monetary compensation.

Connectivity - The ability of communities to connect to the Internet, especially the World Wide Web.

Content - The collection of information stored for a purpose in a file, folder or electronic message.

Controlled Vocabulary - A controlled vocabulary is one where the language used is restricted to an authoritative list of terms.

Copyright - A right for the creators of creative works to restrict others’ use of those works. An owner of copyright is entitled to determine how others may use that work.

Creative Commons - A non-profit US organisation that enables the sharing and use of creativity and knowledge through free legal tools.

CSIRO - Commonwealth Scientific and Industrial Research Organisation.

CSV - Comma-separated values. A file type used to store tabular data (numbers and text) in plain-text form.
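
To make the format concrete, here is a short Python sketch that reads a tiny CSV table with the standard `csv` module. The column names and values are made up for illustration.

```python
import csv
import io

# A hypothetical three-column table stored as plain text.
csv_text = "site,species,count\nA,Eucalyptus,12\nB,Acacia,7\n"

# DictReader uses the first line as column headers; every value is read as text.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Numbers must be converted explicitly because CSV carries no type information.
total = sum(int(row["count"]) for row in rows)
print(rows[0]["species"], total)
```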

Curation - Curation of digital materials involves active intervention to mitigate digital obsolescence, for example by migrating data to comply with evolving industry standards so that it continues to be accessible. This may require migration from one storage technology to another, or data manipulation to meet new standards for data recording and presentation in human- or machine-readable forms.

 

D

Data - Facts and statistics collected together for reference or analysis.

Data Access Protocol - A system that allows outsiders to be granted access to databases without overloading either system.

Data Documentation - Information that will allow others to understand and analyse data, including method of collection, definitions of all fields, and analysis methods.

Data Discovery - The process of finding out what data exists and how it can be accessed.

Data Encryption - A way to modify files so that they can be transferred securely.

Data Preservation - The process of conserving files to ensure that they remain secure and readable.

Data Retention - Plans to ensure data files will remain secure and accessible for the long-term.

Data Sharing - The transfer, by agreement, of data collected for a specific purpose between two or more parties.

Dataset - A collection of data, usually in tabular form, presented either electronically or in other formats.

De-anonymisation - A process of attempting to determine the identity of the individual to whom a pseudonymised dataset relates.

Derived data - A data element or dataset adapted from other data sources using a mathematical, logical, or other type of transformation, e.g. an arithmetic formula, composition or aggregation. See Value-added data.

Digital rights management - A class of access control technologies that are used by hardware manufacturers, publishers, copyright holders and individuals with the intent to limit the use of digital content and devices after sale.

Disclosive - Data is potentially disclosive if, despite the removal of obvious identifiers, characteristics of this dataset in isolation or in conjunction with other datasets might lead to identification of the individual to whom a record belongs.

Disposal - A range of processes associated with implementing records destruction or transfer decisions which are documented in retention authorities.

Document - Any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording).

DOI (Digital Object Identifier) - A unique, persistent, registered identifier assigned to a digital object to ensure that it will be citable, institutionally managed, and accessible for long-term use.

Durable File Format - A file format is considered 'durable' if files saved in it can be opened by others, or by your future self, using readily available programs beyond the duration of the research project.

Dryad - A nonprofit repository for data underlying the international scientific and medical literature.

 

E

Electronic Laboratory Notebook - An electronic application that can be used to securely record and store research plans, experimental designs, procedures, and results.

Enterprise Storage - A central electronic storage facility. La Trobe has a number of enterprise storage options available to researchers.

 

F

FAIR principles - A set of guidelines (Findable, Accessible, Interoperable, Reusable) to ensure that anyone can find, read, use and reuse data.

Field of Research - A standard by which research is classified according to subject. Each code consists of three levels: a two-digit Division, a four-digit Group and a six-digit Field, e.g. Division 09 Engineering | Group 0901 Aerospace Engineering | Field 090101 Aerodynamics (excl. Hypersonic Aerodynamics).
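
The nesting of the three levels can be sketched in a few lines of Python. The `split_for_code` helper is hypothetical and only illustrates how the Division and Group are prefixes of the six-digit Field code.

```python
def split_for_code(code):
    """Split a six-digit Field of Research code into its three levels."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a six-digit code such as '090101'")
    return {
        "division": code[:2],  # first two digits
        "group": code[:4],     # first four digits
        "field": code,         # all six digits
    }


levels = split_for_code("090101")
print(levels)
```

For the example in the definition, this yields Division 09, Group 0901 and Field 090101.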

 

G

Geospatial Data - Also known as spatial data or geographic information, it is the data that represents the geographic location of natural and man-made features on Earth.

 

H

HTML - HyperText Markup Language. The standard markup language used to create web pages.

 

I

IAR - Information Asset Register. A register to capture and organise metadata about the information held by government departments and agencies.

Identifier - An identifier is a reference number or string of characters that can be used to uniquely identify a particular data element within your dataset.

Information - Interpretation and analysis of data that when presented in context represents added value, message or meaning.

Intellectual property rights - Monopolies granted to individuals for intellectual creations.

 

J

JSON - JavaScript Object Notation. An open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
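
As an illustration of attribute-value pairs, this Python sketch writes a small record to JSON text and reads it back using the standard `json` module. The record itself is invented.

```python
import json

# A hypothetical dataset description made of attribute-value pairs.
record = {
    "title": "Soil moisture survey",
    "year": 2019,
    "keywords": ["soil", "moisture"],
}

# Serialise to human-readable text, then parse the text back into Python objects.
json_text = json.dumps(record, indent=2)
restored = json.loads(json_text)

print(json_text)
print(restored == record)
```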

 

L

Licence (Noun) - A legal document giving permission to use information.

Linked data - A term describing the best practice of exposing, sharing and connecting items of data on the semantic web using uniform resource identifiers (URIs) and the Resource Description Framework (RDF). Not to be confused with data linking.

 

M

Machine-readable - Machine-readable formats are those from which computer programs can easily extract data.

Mediated access - Access to data and files that is limited to certain users, either by automated methods (such as a password) or by manual intervention (such as a researcher emailing encrypted files).

Metadata - Data that describes or defines other data.

Modelled data - Information created by mathematical representation of data relationships, sometimes used to simulate environments that are difficult to observe reliably or consistently.

 

N

Non-commercial use - Use that is not intended for or directed toward commercial advantage or private monetary compensation.

 

O

Online research notebook - La Trobe University's online application designed to replace paper research notebooks.

Ontology - A formal representation of knowledge as a set of concepts within a domain, and the relationships among those concepts.

Open data - Data is open if anyone is free to access, use, modify, and share it, subject, at most, to measures that preserve provenance and openness.

Open government data - Open data produced by the government. This is generally accepted to be data gathered during the course of business as usual activities which do not identify individuals or breach commercial sensitivity.

Open standards - Generally understood as technical standards which are free from licensing restrictions. Can also be interpreted to mean standards which are developed in a vendor-neutral manner.

 

P

PARADISEC - Pacific and Regional Archive for Digital Sources in Endangered Cultures.

Personal Data - Data which relate to a living individual who can be identified.

Personal Equipment - Electronic devices that belong to the researcher (e.g. a laptop owned by them) rather than the University. Storing data and files on these is not recommended.

Plain text - The contents of an ordinary sequential file readable as textual material without much processing.

Primary Research Materials - Primary research materials comprise data and materials generated or collected by the researcher as part of their research.

Pseudonymised Data - Data relating to a specific individual where the identifiers have been replaced by artificial identifiers to prevent identification of the individual.
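
One simple way to picture pseudonymisation is replacing each name with an artificial code, as in the Python sketch below. The `pseudonymise` helper, the code format and the sample records are all assumptions made for illustration. The lookup key it returns can re-identify people, so it would need to be stored securely and separately from the data.

```python
def pseudonymise(records, id_field):
    """Replace the values of id_field with artificial codes (P001, P002, ...)."""
    key = {}     # maps original identifier -> artificial identifier
    output = []
    for record in records:
        original = record[id_field]
        if original not in key:
            key[original] = f"P{len(key) + 1:03d}"
        cleaned = dict(record)  # copy so the input records are left unchanged
        cleaned[id_field] = key[original]
        output.append(cleaned)
    return output, key


# Hypothetical records; the same person can appear more than once.
participants = [
    {"name": "Alice", "score": 7},
    {"name": "Bob", "score": 5},
    {"name": "Alice", "score": 9},
]
pseudonymised, lookup_key = pseudonymise(participants, "name")
print(pseudonymised)
```

Repeated appearances of the same person receive the same code, so records can still be linked without exposing the name.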

Public domain - Works that are publicly available and in which the intellectual property rights have expired or been waived.

Public sector information - Information collected or controlled by the public sector.

 

Q

Qualtrics - Survey software recommended for use by La Trobe researchers.

 

R

Raw data - Data collected which has not been subjected to processing or any other manipulation beyond that necessary for its first use. Raw data, i.e. unprocessed data, is a relative term; data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.

RDF - Resource Description Framework. A W3C standard, it is the foundation of several technologies for modelling distributed knowledge and is meant to be used as the basis of the Semantic Web.

Re-use - Use of content outside of its original intention.

Records (AS ISO 15489.1-2002, s.3.15) - Information created, received, and maintained as evidence and information by an organisation or person, in pursuance of legal obligations, or in the transaction of business.

REDCap - Survey software recommended for use by La Trobe researchers.

Research - Original investigation undertaken to gain knowledge, understanding and insight.

Researchdata.latrobe - La Trobe University's collaborative digital repository powered by Figshare.

Research data - Any data collected during research which would be used to validate the research findings and/or facilitate the reproduction of the research.

Research Data Management Plan - A Research Data Management Plan (RDMP) is a document that describes how you will collect, organise, manage, store, secure, back up, preserve, and share your data.

Research materials - Any materials used or generated in the course of conducting research.

Researcher - In the context of these guidelines, the term researcher refers to anyone undertaking or piloting research in association or affiliation with La Trobe University including but not limited to academics, students, higher degree by research candidates, professional staff and third party associates.

Retention - The act of retaining and ensuring readability of records for specified periods.

RSS feed - Rich Site Summary. A family of standard web feed formats used to publish frequently updated information such as blog entries, news headlines, audio and video.

 

S

Semantic web - A web of data that can be processed directly and indirectly by machines, providing a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is based on the Resource Description Framework (RDF).

Sensitive data - Data that can be used to identify an individual, species, object, or location and that introduces a risk of discrimination, harm, or unwanted attention.

Share-alike licence - A licence that requires users of a work to provide the content under the same or similar conditions as the original.

Secondary Data - Secondary data is data collected by someone other than the researcher.

Staff - In the context of these guidelines, the term 'staff' refers to all employees of La Trobe University or affiliated enterprises with which the University has a formal agreement and includes casual employees, clinical staff and unpaid members of the University such as Honorary and Adjunct appointments, all of which are registered on the HR system.

 

T

Taxonomy - The science or technique of classification.

tDAR - The Digital Archaeological Record.

TERN - Terrestrial Ecosystem Research Network.

TSV - Tab-separated values. A very common text file format for sharing tabular data. The format is extremely simple and highly machine-readable.

 

U

Unicode - A standard applying to how textual characters are coded, stored, and displayed.

Unit records - Individual items of information from surveys or observations that often contain confidential details.

URI - Uniform resource identifier, generic term for all types of names and addresses that refer to objects on the World Wide Web. A URL is one kind of URI.

URL - Uniform resource locator, a type of URI that identifies a resource via a representation of its network location.
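
To show how a URL encodes a network location, this Python sketch splits one into its parts with the standard `urllib.parse` module. The address is a made-up example, not a real dataset.

```python
from urllib.parse import urlsplit

# A hypothetical URL pointing at a file on a web server.
parts = urlsplit("https://example.org/datasets/soil.csv?version=2")

print(parts.scheme)  # how to connect (the protocol)
print(parts.netloc)  # which host to contact
print(parts.path)    # where the resource sits on that host
print(parts.query)   # extra parameters sent with the request
```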

 

V

Value-Added Information (or Data) - Data to which value has been added to enhance and facilitate its use and effectiveness by or for users.

 

W

Web API - An API that is designed to work over the internet.

 

X

XML - Extensible Markup Language. A markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.
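
As a brief illustration, the Python below parses a tiny XML document with the standard `xml.etree.ElementTree` module. The element names and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# A hypothetical dataset description marked up as XML.
xml_text = "<dataset><title>Soil moisture survey</title><year>2019</year></dataset>"

# Parse the text into a tree and read values out of named elements.
root = ET.fromstring(xml_text)
title = root.findtext("title")
year = int(root.findtext("year"))

print(root.tag, title, year)
```

The same text that a program parses here can also be read directly by a person, which is the sense in which XML is both human-readable and machine-readable.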