Five Ways to Use Previously Collected Survey Data to Improve Quality in a Survey

When you survey a population about whom significant information is already known before they complete the survey (as in a longitudinal survey, a panel, or a known group such as a membership organization), that data can be used as part of the survey instrument design.

It is tempting to use such data whenever possible, but care should be given to how and when it is used, as doing so is not always recommended.  Using previously collected data may introduce a wide range of errors, which may in some cases be more significant than the intended gain in quality.

Some of the possible issues with using previously collected data include:

  • Flaws in the collection or maintenance of the original data source (these could include all forms of error and may be the result of the data being collected for alternative purposes - not for research),
  • Changes to respondents' experiences between when the original data was collected/generated and the current questionnaire,
  • Context changes / environmental changes that may have influenced how the respondent answered previous questions,
  • Respondent recall / memory problems,
  • Respondent memory changes that could lead to rejection of previous answers, 
  • Respondent confidentiality concerns when a topic is sensitive.

As such, care should be taken to only use existing data in a survey when it is methodologically important to do so (there is clear improved data quality as a result of doing so), and it should never be done just because it is possible to do.

Here is some general guidance about when and how I recommend using previous survey data in a current survey:

  1. To drive sample selection.  When you are looking to learn more about a specific characteristic, you may use previous data to identify individuals you would like to follow up with and ask some additional questions.  For example, a questionnaire entirely devoted to asking about being a cancer survivor could be fielded to a sample of people who are known to be cancer survivors.
  2. To drive question logic.  When you have data that you do not expect will have changed, it can be effectively used to drive question logic in the current survey.  For example, those who indicated that they have ever been diagnosed with high blood pressure at Time 1 may now be asked about medications they have taken for high blood pressure in the past, without having to ask about the diagnosis again.
  3. To provide parameters for validation checks.  Previously known data can help validate later responses.  This must be done with care because it is possible that the previous data may be incorrect or have changed, and a validation triggered now may frustrate respondents.  I would only do this if the value of obtaining a response that is guided by the validation is greater than the possible loss of the respondent from the study if it causes frustration.
  4. To provide context for the respondent.  Often it can be helpful to provide context for respondents, so that their response is as comparable to that of others as possible.  If your goal is to learn more about a vehicle that the respondent identified as having been purchased in a Time 1 survey, and you want to ask more questions about that car today at Time 2, you may want to use their previous responses about the car to set the context for them.  For example, "When you responded to a survey last fall, you told us about your blue Buick Enclave that you were using as your primary vehicle.  Thinking about that vehicle...."
  5. To ease the burden of updating routine data.  Sometimes you may make the survey response task much simpler for the respondent if you provide their previous responses back to them, allowing them to keep or change each item depending on their current situation.  For example, when respondents are being asked several questions about each family member, it may be helpful to pre-load a list of family member names into the survey and present it to the respondent, asking if there are any changes to their immediate family before you proceed.  Another example is when you are capturing preferred contact information, best times of day for a telephone follow-up, or other administrative items which are not likely to change much and are not of significant value to the research.  Presenting those responses back to the respondent, and asking them to simply click "next" to accept the same responses as before, can reduce respondent burden while giving you an opportunity to identify updated information.  However, be careful with this approach.  If it is used for anything that is too specific, too sensitive, or likely to change, you may increase the burden rather than reduce it.  Also, because respondents will be most likely to simply accept what is entered, this should never be done if you truly wish to capture a valid response that may change.  While this can be useful in specific situations, it is not a normally recommended approach.
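
For those who program their own instruments, here is a minimal sketch of how items 2, 3 and 4 above might look in code.  It is illustrative only, not the syntax of any particular survey platform.  The Preload record, its field names, the question identifiers, the birth year check, and the fallback wording are all hypothetical and assumed for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Preload:
    """Hypothetical Time 1 record loaded into the Time 2 instrument."""
    ever_high_bp: Optional[bool]   # ever diagnosed with high blood pressure
    birth_year: Optional[int]      # assumed item, used only for the validation demo
    vehicle: Optional[str]         # e.g. "blue Buick Enclave"


def next_questions(p: Preload) -> List[str]:
    """Item 2: use a stable Time 1 answer to drive question logic."""
    if p.ever_high_bp is True:
        # An "ever diagnosed" yes is not expected to change, so skip the diagnosis item.
        return ["bp_medications"]
    # A missing or "no" answer may have changed since Time 1, so ask again.
    return ["bp_diagnosis", "bp_medications_if_diagnosed"]


def check_birth_year(p: Preload, answer: int) -> Optional[str]:
    """Item 3: soft validation that asks for confirmation and never blocks."""
    if p.birth_year is not None and answer != p.birth_year:
        # The stored value is not shown, since the earlier data may itself be wrong.
        return "This differs from what we have on file. Please confirm your entry."
    return None


def vehicle_intro(p: Preload) -> str:
    """Item 4: pipe a previous answer into the question text to set context."""
    if p.vehicle:
        return (
            "When you responded to a survey last fall, you told us about "
            f"your {p.vehicle}. Thinking about that vehicle..."
        )
    # Assumed generic wording for respondents with no usable Time 1 answer.
    return "Thinking about your primary vehicle..."
```

Note that each function has a path for missing previous data, and that the validation returns a prompt rather than rejecting the answer.  Both choices follow from the cautions above: the earlier data may be wrong or out of date, and a hard stop risks frustrating the respondent.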

Again, care should always be taken to only use previous data when it is methodologically important to do so.  If you are considering using previous data, build the case for why it is important to do so.  If you truly reduce burden (through asking fewer questions as a result) and you know that you will get better data by doing so, then this is a wonderful technique to use in today's highly programmatic surveys.