This is an important step that will be covered in Module 4, visualisation, but it is good to keep it in the back of your mind now.
Finally, you need to establish whether you are going to use a sample of the data or look at everything.
Sampling is a technique in which well-defined statistical measures determine how many observations must be taken for the sample to be representative of the wider population.
This approach is used because measuring everything takes too much time, and because a well-chosen sample can still provide accurate estimates.
Note that the sampling methodology itself can introduce errors if the sample is collected inconsistently or is biased.
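As a rough illustration of how a "well-defined measure" can set the number of samples, the sketch below uses Cochran's formula with a finite population correction. This formula is one common choice and is not named in the course text. The function name `sample_size` and its default values (95% confidence via z = 1.96, a 5% margin of error, and an assumed proportion of 0.5) are assumptions made for the example.

```python
import math


def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Estimate how many records to sample from a population of known size.

    Assumptions (not from the course text): simple random sampling,
    Cochran's formula, and a finite population correction.
    """
    # Sample size for an effectively unlimited population.
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    # Shrink the estimate when the population itself is small.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up, because a fraction of a record cannot be sampled.
    return math.ceil(n)


print(sample_size(10_000))  # 370
print(sample_size(500))     # 218
```

Under these assumptions, a population of 10,000 needs about 370 records, while a population of 500 still needs 218. The required sample does not shrink in proportion to the population, which is one reason sampling saves the most effort on large datasets. The formula says nothing about bias: a sample of the right size can still mislead if it is not drawn at random.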
Identify the location or source of the data. This could be a place in the process rather than a physical location.
For example, you might conduct a survey at a movie theatre and collect data before and after the movie: the theatre is the physical location, and ‘before’ and ‘after’ are the points in the process at which you collect.
Will you use an existing collection or piece of software to collect data? Do you need to negotiate licences or access to the dataset? Do you need to sign approvals or agreements to ensure that all legal requirements are met?
If you are doing your own data collection, have you followed ethical principles, trained your collectors, and found people to do the gathering work?
You need to work out how you will measure the data or how it was measured. Was it collected from rain gauges, from a survey, or by counting people? Knowing what you are measuring and how it is being measured really refines your approach to the data.
You will need enough data to see the patterns or trends, as well as to apply any statistical techniques.
The amount of data you need is a concept that covers a number of records including:
Work out which data gives you the required answers. Remember that some data can be used to answer multiple questions, but also that sometimes you need multiple data sources to answer even simple questions.
The data you collect and use should be relevant to your research question.
Ask the question: ‘What is the data that will show me the reality of the situation I need to understand in order to answer the question?’