About datasets

A dataset is a collection of data. It is often in the form of a spreadsheet, table or database. However, it could also be a collection of notes from qualitative studies, or videos or images taken as part of a research project.

Open data means that the data "can be freely used, shared and built-on by anyone, anywhere for any purpose" (Open Knowledge Foundation). Making research and other data openly available is increasingly common. For example:

  • Academic journals increasingly require authors to share the datasets associated with their research findings. The published article usually includes a reference to the dataset.
     
  • Funding bodies such as the National Health and Medical Research Council (NHMRC) and the Australian Research Council (ARC) often encourage or require grant recipients to openly publish datasets from their funded studies, so that the funders obtain the maximum benefit from their investment.
     
  • Open Government Data (OGD). Many governments worldwide follow open data practices. This promotes transparency and accountability, and allows reuse of the vast amounts of data collected by government agencies.

As a result of these policies, more datasets are becoming available to researchers for investigation and reuse. 

Why use datasets?

You can use existing datasets to verify or replicate the results of a study. Making research data available increases the transparency of the research project.

Researchers can use datasets to conduct secondary analyses. This maximizes the use of existing data by enabling researchers to investigate new hypotheses or different perspectives, or to aggregate data across multiple studies.

Academics can use open datasets in statistics classes to provide real-world examples for students.

Evaluating datasets

When evaluating a dataset, you might consider:

  • Provenance – who created or published the dataset? Government agencies, universities and other well-established institutions are generally reliable sources of datasets.
  • Quality of metadata – how well is the data described? Are the variables described clearly so that you can confidently use the data?
  • Size of the dataset – is it large enough for your purpose?
  • Availability – some datasets are only available if certain conditions are met. You might need permission from the data owner, or approval from an ethics committee.