
10.4 Threats to Validity for Qualitative Research

10.4.1 Techniques for Demonstrating Validity in Qualitative Studies

Several techniques contribute to validity in qualitative research, such as the methods employed in differing investigations to demonstrate or assure specific validity criteria [36]. Qualitative research methodology requires a multitude of strategic choices, many of which are practical; however, the rationale for inquiry is not based on a set of deterministic rules. Contextual factors contribute to the decision as to which technique will optimally reflect specific criteria of validity in particular research situations. Techniques can be variously employed, adapted, and combined to achieve different purposes.

Whittemore et al. [36] divide these techniques into four main categories: design consideration, data generation, analytics, and presentation. We combined the techniques from Whittemore et al. [36] with the ones from Maxwell [27] and Lincoln and Guba [22]. Table 10.6 shows the techniques that we believe are the most relevant to secure software engineering research.

Table 10.6: Techniques for addressing threats to validity in qualitative research.

Type of Technique      Technique

Design Consideration   Developing a self-conscious research design
                       Sampling decisions (i.e., sampling adequacy)
                       Employing triangulation
                       Peer debriefing
                       Performing a literature review
                       Sharing perquisites of privilege

Data Generation        Articulating data collection decisions
                       Demonstrating prolonged engagement
                       Rich data – demonstrating persistent/intense observation
                       Referential adequacy – providing verbatim transcription
                       Reflexive journaling
                       Demonstrating saturation

Analytics              Articulating data analysis decisions
                       Member checking or respondent validation
                       Expert checking
                       Exploring rival explanations, discrepant evidence, and negative cases
                       Triangulation
                       Drawing data reduction tables

Presentation           Providing evidence that supports interpretations
                       Acknowledging the researcher perspective
                       Thick descriptive data

Most of the techniques are self-explanatory, but some warrant a description and further details. The following descriptions are based on Maxwell [27] and Lincoln and Guba [22]:

Intensive long-term involvement: Lengthy and intensive contact with the phenomena (or respondents) in the field to assess possible sources of distortion and especially to identify saliencies in the situation. It provides more complete data about specific situations and events than any other method. Not only does it provide a larger amount and variety of data, it also enables the researcher to check and confirm the observations and inferences. Repeated observations and interviews, as well as the sustained presence of the researcher in the study setting, can help rule out spurious associations and premature theories. It also allows a much greater opportunity to develop and test alternative hypotheses in the course of the research process. Finally, the period of prolonged engagement also provides the investigator an opportunity to build trust.

Rich data / persistent observation: Both long-term involvement and intense interviews enable the researcher to collect rich data, i.e., data that are detailed and varied enough to provide a full, revealing picture of what is going on. The purpose of persistent observation is to identify those characteristics and elements in the situation that are most relevant to the problem or issue being pursued and to focus on them in detail. In interview studies, such data generally require verbatim transcripts of the interviews, not just notes of what the researcher felt was significant.

Respondent validation or member checking: This is about systematically soliciting feedback about the data and conclusions from the people under study. It is the single most important way of ruling out the possibility of misinterpreting the meaning of what participants say and do and the perspective they have on what is going on, as well as an important way of identifying biases and misunderstandings of what is observed.

Searching for discrepant evidence and negative cases: This is a key part of the logic of validity testing in qualitative research. Instances that cannot be accounted for by a particular interpretation or explanation can point to important defects in that account. The basic principle here is that the researcher needs to rigorously examine both the supporting and the discrepant data to assess whether it is more plausible to retain or modify the conclusion, being aware of all of the pressures to ignore data that do not fit the conclusions.

Triangulation: Collecting information from a diverse range of individuals and settings, using a variety of methods, and, at times, different investigators and theories. This strategy reduces the risk of chance associations and of systematic biases due to a specific method, and allows a better assessment of the generality of the explanations that one develops.

Peer debriefing: Exposing oneself to a disinterested professional peer to “keep the inquirer honest,” assist in developing working hypotheses, develop and test the emerging design, and facilitate emotional catharsis.

Member checking: The process of continuous, informal testing of information by soliciting reactions of respondents to the investigator’s reconstruction of what he or she has been told or otherwise found out and to the constructions offered by other respondents or sources, and a terminal, formal testing of the final case report with a representative sample of stakeholders. Member checking is both informal and formal, and it should occur continuously.

Thick descriptive data: Narrative description of the context so that judgments about the degree of fit or similarity may be made by others who may wish to apply all or part of the findings elsewhere. (Although it is by no means clear how thick a thick description needs to be.) Dybå et al. [14] discuss how to define what context variables should be accounted for in a study.

Referential adequacy: A means for establishing the adequacy of critiques written for evaluation purposes under the connoisseurship model. The recorded materials provide a kind of benchmark against which later data analysis and interpretations (the critiques) can be tested for adequacy. Aside from the obvious value of such materials for demonstrating that different analysts can reach similar conclusions given whatever data categories have emerged, they can also be used to test the validity of the conclusions.

Reflexive journaling: A kind of diary in which the investigator, on a daily basis or as needed, records a variety of information about themselves (what is happening in terms of their own values and interests, and speculation about growing insights) and about method (information about methodological decisions made and the reasons for making them), in addition to the daily schedule and logistics of the study.

In the following subsection, we give examples of studies that address some of these validity threats.

10.4.2 Examples of Threats to Validity for Qualitative Studies

We searched five systematic reviews in software security and did not find many examples of how researchers handle threats to validity in their studies. The only two qualitative studies in these systematic reviews that deal with or mention threats to validity are described as examples below.

10.4.2.1 Case Studies

This subsection gives an example of threats to validity analysis for a questionnaire-based empirical study, which was reported in [28]. An overview of the study and the threats to validity analysis follow.

Overview of the Study. Today, companies are required to have control over their IT assets, and to provide proof of this in the form of independent IT audit reports. However, many companies have outsourced various parts of their IT systems to other companies, which potentially threatens the control they have over their IT assets. To provide proof of having control over outsourced IT systems, the outsourcing client and outsourcing provider need a written service-level agreement (SLA) that can be audited by an independent party. SLAs for availability and response time are common practice in business, but so far there is no practical method for specifying confidentiality requirements in an SLA. Specifying confidentiality requirements is hard because, in contrast to availability and response time, confidentiality incidents cannot be monitored: attackers who breach confidentiality try to do this unobserved by both client and provider. In addition, providers usually do not want to reveal their own infrastructure to the client for monitoring or risk assessment. Elsewhere, the authors have presented an architecture-based method for confidentiality risk assessment in IT outsourcing. The authors adapt this method to confidentiality requirements specification, and present a case study to evaluate this new method. The method is based on specifying confidentiality requirements according to risk assessment results.

Threats to Validity. A summary of the threats-to-validity analysis of the study follows. The discussed threats are:

Credibility: The authors say that: “...To validate a method, we eventually need a realistic context in which the method is applied. Applying it to a toy problem is fine for illustration, and testing in an experiment is good for improving our understanding of the method, but in order to know whether the method will work in practice, it has to be used in practice. This could be done by a field experiment, in which practitioners use the method to solve an experimental problem. This is extremely expensive but not impossible. In our case, we opted for the more realistic option, given our budget, of using the method ourselves for a real world problem.”

Transferability: The authors applied their method for confidentiality risk assessment and comparison twice with similar results, both in multinational industrial companies where confidentiality was not a critical requirement until external regulators enforced it. The authors also state where the transferability of the results may apply: operating in highly competitive markets, these companies are very cost-sensitive and will therefore not aim at maximum confidentiality. This might well be different in privacy-sensitive organizations such as health care or insurance companies, or in high-confidentiality organizations such as the military. Nevertheless, confidentiality is not the highest-priority requirement for the context of the study. All of this supports reusability in any context that satisfies the three assumptions, with similar answers to the research questions for those contexts.

Dependability: The authors say: “...We answered the reusability question by identifying the conditions under which the methods can be used, and actually showing that it could be used in another case satisfying these assumptions. Like all inductive conclusions, our conclusion that the method can be used in other cases is uncertain, but because we used analytic reasoning rather than statistical reasoning, we cannot quantify this uncertainty.” Thereby, the authors have shown that they are concerned about the reliability of the results. However, the authors affirm that the repeatability of the results needs further research.

Confirmability: The authors say: “we find no reasoning errors or observational mistakes so we claim these answers are valid,” but the aspect of neutrality is not clear-cut when the authors were the ones using the method and providing feedback. The findings still reflect the participants’ voice and the conditions of the inquiry, but it remains unclear to what extent the authors took their own researcher biases, perspectives, or motivations into account.

Observations. Even though the authors did not use the nomenclature we used above, they were quite conscious of revealing the possible threats to the study results and what they had done to mitigate them. In addition, the authors were very thorough in describing the context of the case study, so transferability can be established more easily.

10.4.2.2 Interviews

This subsection gives an example of threats to validity analysis for an interview-based empirical study, which was reported in [3]. An overview of the study and the threats to validity analysis follow.

Overview of the Study. Agile methods are widely employed to develop high-quality software, but theoretical analyses argue that agile methods are inadequate for security-critical projects. However, most agile-developed software today needs to satisfy baseline security requirements, so we need to focus on how to achieve this for typical agile projects. The author provides insights from the practitioner’s perspective on security in agile development and reports on exploratory, qualitative findings from 10 interviews. The goal of the study is to expand on the theoretical findings on security-critical agile development through an exploration of the challenges and their mitigation in typical agile development projects.

Threats to Validity. A summary of the threats-to-validity analysis of the study follows.

Credibility: The participants’ views were collected in interviews only; the possible threats of the chosen design are not addressed.

Transferability: The author says: “We conducted the sampling of the participants in a way to represent a wide variety of agile development contexts. Sampling dimensions included the interviewee’s process role and project characteristics, such as team size and development platform.” This statement refers to the degree to which the results of the qualitative research can be generalized or transferred to other contexts or settings.

Dependability: Not discussed in the paper. But the author mentions: Since the sample size is limited for interviews, we focused on covering a broad range of development contexts. The results are, by study design, not sound and representative, but extend the prior theoretical findings with a practical perspective and offer a description as an initial hypothesis for further research.

Confirmability: The author mentions: “We directly contacted the interviewees primarily through agile development meet-ups. The interviews lasted between 30 and 60 minutes. Based on detailed notes on each interview, we structured the statements by common concepts iteratively with each interview and clustered related aspects to derive the findings on challenges and mitigations.” Regarding the neutrality aspect, the author mentions that the interviews offer only subjective data and are prone to researcher or participant bias, but does not explain what he did to mitigate these threats.

Observations. The author did not specify the threats to the validity of the study in detail and failed to show how he tried to mitigate some threats to validity.