Week 5 Assignment


This chapter:
- covers general features of fixed design research, typically involving the collection of quantitative data;
- discusses how the trustworthiness (including reliability, validity and generalizability) of findings from this style of research can be established;
- explores the attractions and problems of doing experiments in real world research;
- gives particular attention to the randomized controlled trial (RCT) and whether it can legitimately be viewed as the ‘gold standard’ of research designs;
- attempts to provide a balanced view of the ubiquitous evidence‐based movement;
- differentiates between true experimental, quasi‐experimental and single‐case experimental designs;
- considers non‐experimental fixed designs;
- and concludes by discussing how to decide on sample sizes in fixed design research.



This chapter deals with approaches to social research where the design of the study is fixed before the main stage of data collection takes place. In these approaches the phenomena of interest are typically quantified. This is not a necessary feature. As pointed out by Oakley (2000, p. 306) there is nothing intrinsic to such designs which rules out qualitative methods or data. Murphy, Dingwall, Greatbatch, Parker and Watson (1998) give examples of purely qualitative fixed design studies, and of others using both qualitative and quantitative methods, in the field of health promotion evaluation. It has already been argued in Chapter 3 that there can be considerable advantage in linking research to theory. With fixed designs, that link is straightforward: fixed designs are theory‐driven. The only way in which we can, as a fixed design requires, specify in advance the variables to be included in our study and the exact procedures to be followed, is by having a reasonably well‐articulated theory of the phenomenon we are researching. Put in other terms, we must already have a substantial amount of conceptual understanding about a phenomenon before it is worthwhile following the risky strategy of investing precious time and resources in such designs. This may be in the form of a model, perhaps represented pictorially as a conceptual framework as discussed in Chapter 3. Such models help to make clear the multiple and complex causality of most things studied in social research. Hard thinking to establish this kind of model before data collection is invaluable. It suggests the variables we should target: those to be manipulated or controlled in an experiment and those to be included in non‐experimental studies. In realist terms, this means that you have a pretty clear idea about the mechanisms likely to be in operation and the specific contexts in which they will, or will not, operate. 
You should also know what kind of results you are going to get, and how you will analyse them, before you collect the data. If the study does deliver the expected relationships, it provides support for the existence of these mechanisms and their actual operation in this study. This does not preclude your following up interesting or unexpected patterns in the data. They may suggest the existence of other mechanisms which you had not thought of. Large‐scale studies can afford to draw the net relatively wide. Large numbers of participants can be involved, several subgroups established, perhaps a range of different contexts covered, more possible mechanisms tested out. For the small‐scale studies on which this text focuses, and in real world settings where relevant previous work may be sparse or non‐existent, there is much to be said for a multi‐strategy design (see Chapter 8) with an initial flexible design stage which is primarily exploratory in purpose. This seeks to establish, both from discussions with professionals, participants and others involved in the initial phase, and from the empirical data gathered, likely ‘bankers’ for mechanisms operating in the situation, contexts where they are likely to operate and the characteristics of participants best targeted. The second fixed design phase then incorporates a highly focused survey, experiment or other fixed design study. Even with a preceding exploratory phase, fixed designs should always be piloted. You carry out a mini‐version of the study before committing yourself to the big one. This is, in part, so you can sort out technical matters to do with methods of data collection to ensure that, say, the questions in a questionnaire are understandable and unambiguous. Just as importantly, it gives you a chance to ensure you are on the right lines conceptually. Have you ‘captured’ the phenomenon sufficiently well for meaningful data to be collected? 
Do you really have a good grasp of the relevant mechanisms and contexts? This is an opportunity to revise the design: to sharpen up the theoretical framework; develop the research questions; rethink the sampling strategy. And perhaps to do a further pilot.


Although this may seem like overkill, in fixed design research piloting is essential: once you start collecting ‘real’ data (i.e. after the pilot work) and participants are engaged, the design cannot be changed. This is especially salient in funded research, for if you are not delivering what the funder wants then there will be issues further down the line. Therefore, it’s always sensible to discuss the outputs of your pilot with your funders. Also, while the central part of what you are going to do with your data should be thought through in advance, i.e. you are primarily engaged in a confirmatory task in fixed designs, there is nothing to stop you also carrying out exploratory data analysis (see Chapter 17, p. 415). It may be that there are unexpected patterns or relationships which reveal inadequacies in your initial understanding of the phenomenon. You cannot expect to confirm these revised understandings in the same study, but they may well provide an important breakthrough suggesting a basis for further research. This chapter seeks to provide a realist‐influenced view of fixed design research. There is coverage of true experimental, single‐case experimental, quasi‐experimental and non‐experimental fixed designs. The differences between these types of design are brought out and some examples given. In the ‘true’ experiment, two or more groups are set up, with random allocation of people to the groups. The experimenter then actively manipulates the situation so that different groups get different treatments. Single‐case design, as the name suggests, focuses on individuals rather than groups and effectively seeks to use persons as their own control, with each person being subjected to different experimentally manipulated conditions at different times. Quasi‐experiments lack the random allocation to different conditions found in true experiments. Non‐experimental fixed designs do not involve active manipulation of the situation by the researcher. 
However, the different fixed designs are similar in many respects, as discussed in the following section.
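The random allocation that distinguishes a ‘true’ experiment from a quasi‐experiment can be sketched in a few lines of code. The following is a minimal illustration, not a prescription; the participant IDs and group labels are hypothetical, and real trials would typically use more sophisticated schemes (e.g. stratified or blocked randomization).

```python
import random

def randomly_allocate(participants, groups=("treatment", "control"), seed=None):
    """Randomly allocate participants to experimental groups.

    Shuffling the whole list and then dealing round-robin keeps group
    sizes as equal as possible, which per-person coin flips do not.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(person)
    return allocation

# Hypothetical participant IDs, purely for illustration
people = [f"P{i:02d}" for i in range(1, 21)]
alloc = randomly_allocate(people, seed=42)
print({g: len(members) for g, members in alloc.items()})
# → {'treatment': 10, 'control': 10}
```

The key property is that group membership is determined by chance alone, so pre‐existing differences between participants are, on average, spread evenly across conditions.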


General features of fixed designs


Fixed designs are usually concerned with aggregates: with group properties and with general tendencies. In traditional experiments, results are reported in terms of group averages rather than what individuals have done. Because of this, there is a danger of the ecological fallacy – that is, of assuming that inferences can be made about individuals from such aggregate data (Connolly, 2006; Harrison & McCaig, 2014). Single‐case experimental designs are an interesting exception to this rule. Most non‐experimental fixed research also deals with averages and proportions. The relative weakness of fixed designs is that they cannot capture the subtleties and complexities of individual human behaviour. For that you need flexible designs. Or, if you want to capture individual complexities, as well as group aggregates, then a multi‐strategy design is a more appropriate route to take (see Chapter 8). Even single‐case designs are limited to quantitative measures of a single simple behaviour or, at most, a small number of such behaviours. The advantage of fixed designs is in being able to transcend individual differences and identify patterns and processes which can be linked to social structures and group, or organizational, features. Fixed designs traditionally assume a ‘detached’ researcher to guard against the researcher having an effect on the findings of the research. Researchers typically remain at a greater physical and emotional distance from the study than those using flexible designs. In experimental research, the experimenter effect is well known. It is now widely acknowledged that the beliefs, values and expectations of the researcher can influence the research process at virtually all of its stages (Rosenthal, 1976, 2003; Rosnow & Rosenthal, 1997; Kazdin, Rosenthal & Rosnow, 2009). Hence the stance now taken is that all potential biases should be brought out into the open by the researcher and every effort made to counter them. 
There are often long periods of preparation and design preliminaries before data collection and a substantial period of analysis after data collection. This does not, of course, in any way absolve the researcher from familiarity with the topic of the research, which is typically acquired vicariously from others, or from a familiarity with the literature, or from an earlier, possibly qualitative, study. There will be involvement during the data collection phase, but with some studies such as postal surveys this may be minimal. Your personal preference for a relatively detached, or a more involved, style of carrying out research is a factor to take into account when deciding the focus of your research project and the selection of a fixed or flexible design. It has been fashionable in some academic and professional circles to denigrate the contribution of quantitative social research. As Bentz and Shapiro (1998) comment, in a text primarily covering qualitative approaches:


There is currently an anti‐quantitative vogue in some quarters, asserting or implying that quantitative research is necessarily alienating, positivistic, dehumanizing, and not ‘spiritual’. In fact, it is clear that using quantitative methods to identify causes of human and social problems and suffering can be of immense practical, human, and emancipatory significance, and they are not necessarily positivistic in orientation. For example, quantitative methods are currently being used in the analysis of statistics to help identify the principal causes of rape. Baron and Straus have analyzed police records on rape quantitatively to look at the relative roles of gender inequality, pornography, gender cultural norms about violence, and social disorganization in causing rape (1989). Clearly knowing the relative contribution of these factors in causing rape would be of great significance for social policy, economic policy, the law, socialization, and the criminal justice system, and it is difficult to see how one would arrive at compelling conclusions about this without quantitative analysis (p. 124).


Baron and Straus (1993) also point out that quantitative and experimental methods have been used to understand social problems and criticize prevailing ideologies in a way which contributes to social change and the alleviation of human suffering (i.e. for emancipatory purposes as discussed in Chapter 2, p. 32). However, a move to fixed designs in areas where, traditionally, flexible designs have been used can only work if relatively large data samples can be collected to allow statistical analysis. Oakley (2000) suggests that this antipathy to quantitative, and in particular experimental, research derives in part from the influence of feminist methodologists who have viewed quantitative research as a masculine enterprise, contrasting it with qualitative research which is seen as embodying feminine values. She rejects this stereotyping and in her own work has made the transition from being a qualitative researcher to a staunch advocate of true randomized experiments.


Establishing trustworthiness in fixed design research


This is to a considerable extent a matter of common sense. Have you done a good, thorough and honest job? Have you tried to explore, describe or explain in an open and unbiased way? Or are you more concerned with delivering the required answer or selecting the evidence to support a case? If you can’t answer these questions with yes, yes and no, respectively, then your findings are essentially worthless in research terms. However, pure intentions do not guarantee trustworthy findings. You persuade others by clear, well‐written and presented, logically argued accounts which address the questions that concern them. These are all issues to which we will return in Chapter 19 on reporting. This is not simply a presentational matter, however. Fundamental issues about the research itself are involved. Two key ones are validity and generalizability. Validity, from a realist perspective, refers to the accuracy of a result. Does it capture the real state of affairs? Are any relationships established in the findings true, or due to the effect of something else? Generalizability refers to the extent to which the findings of the research are more generally applicable, for example in other contexts, situations or times, or to persons other than those directly involved.



Suppose that we have been asked to carry out some form of research study to address the research question: Is educational achievement in primary schools improved by the introduction of standard assessment tests at the age of seven? Leave on one side issues about whether or not this is a sensible question and about the most appropriate way to approach it. Suppose that the findings of the research indicated a ‘yes’ answer – possibly qualified in various ways. In other words, we measure educational achievement, and it appears to increase following the introduction of the tests. Is this relationship what it appears to be – is there a real, direct, link between the two things? Central to the scientific approach is a degree of scepticism about our findings and their meaning (and even greater scepticism about other people’s). Can we have been fooled so that we are mistaken about them? Unfortunately, yes – there is a wide range of possibilities for confusion and error.



Some problems come under the heading of reliability. This is the stability or consistency with which we measure something. For example, consider how we are going to assess educational achievement. This is no easy task. Possible contenders, each with their own problems, might include: a formal ‘achievement test’ administered at the end of the primary stage of schooling; or teachers’ ratings, also at the end of the primary stage; or the number, level and standard of qualifications gained throughout life. Let’s say we go for the first. It is not difficult to devise something which will generate a score for each pupil. However, this might be unreliable in the sense that if a pupil had, say, taken it on a Monday rather than a Wednesday, she would have got a somewhat different score. There are logical problems in assessing this, which can be attacked in various ways (e.g. by having parallel forms of the test which can be taken at different times, and their results compared). These are important considerations in test construction – see Chapter 13 for further details. Unless a measure is reliable, it cannot be valid. However, while reliability is necessary, it is not sufficient. A test for which all pupils always got full marks would be totally consistent but would be useless as a way of discriminating between the achievements of different pupils (there could of course be good educational reasons for such a test if what was important was mastery of some material). Unreliability may have various causes, including:
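The parallel‐forms approach mentioned above can be illustrated with a simple correlation between pupils' scores on two versions of the test: a high correlation suggests the measure is consistent. This is a minimal sketch with invented scores; real reliability work would use established coefficients (e.g. Cronbach's alpha) and far larger samples.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical scores for ten pupils on two parallel forms of the test
form_a = [55, 62, 47, 70, 58, 65, 52, 74, 60, 49]
form_b = [53, 65, 45, 72, 55, 63, 50, 71, 62, 51]

r = pearson_r(form_a, form_b)
print(round(r, 2))  # a value near 1 indicates the two forms rank pupils consistently
```

Note the point made in the text: a reliability coefficient like this tells you the measure is stable, not that it measures the right thing; the all-pupils-score-full-marks test would be perfectly reliable yet useless for discrimination.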


Participant error

In our example the pupil’s performance might fluctuate widely from occasion to occasion on a more or less random basis. Tiredness due to late nights could produce changes for different times of the day, pre‐menstrual tension monthly effects or hay fever seasonal ones. There are tactics which can be used to ensure that these kinds of fluctuations do not bias the findings, particularly when specific sources of error can be anticipated (e.g. keep testing away from the hay fever season).


Participant bias

This is more problematic from a validity point of view. It could be that pupils seek to please or help their teacher, knowing the importance of ‘good results’ for the teacher and for the school, by making a particularly strong effort at the test; disaffected pupils might do the reverse. Here it would be very difficult to disentangle whether this was simply a short‐term effect which had artificially affected the test scores, or a more long‐lasting side‐effect of a testing‐oriented primary school educational system. Consideration of potential errors of these kinds is part of the standard approach to experimental design.


Observer error

This would be most obvious if the second approach, making use of teachers’ ratings as the measure of pupil achievement, had been selected. These could also lead to more or less random errors if, for example, teachers made the ratings at a time when they were tired or overstretched and did the task in a cursory way. Again, there are pretty obvious remedies (perhaps involving the provision of additional resources).


Observer bias

This is also possible and, like participant bias, causes problems in interpretation. It could be that teachers in making the ratings were, consciously or unconsciously, biasing the ratings they gave in line with their ideological commitment either in favour of or against the use of standard assessment tests. This is also a well‐worked area methodologically, with procedures including ‘blind’ assessment (the ratings being made by someone in ignorance of whether the pupil had been involved in standard assessment tests) and the use of two independent assessors (so that inter‐observer agreements could be computed). Further details are given in Chapter 14, p. 331.
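The inter‐observer agreement mentioned above is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Below is a minimal sketch with invented ratings from two hypothetical assessors; values near 0 mean chance‐level agreement and values near 1 mean near‐perfect agreement.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: inter-observer agreement corrected for chance."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Chance agreement: probability both raters pick the same category
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical achievement ratings from two independent assessors
rater_1 = ["high", "high", "low", "medium", "high",
           "low", "medium", "medium", "low", "high"]
rater_2 = ["high", "medium", "low", "medium", "high",
           "low", "medium", "high", "low", "high"]

k = cohens_kappa(rater_1, rater_2)
print(round(k, 2))  # → 0.7
```

Here the raters agree on 8 of 10 pupils (0.8 raw agreement), but because some agreement would occur by chance, kappa is lower, at about 0.70.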


Types of validity

If you have made a serious attempt to get rid of participant and observer biases and have demonstrated the reliability of whatever measure you have decided on, you will be making a pretty good job of measuring something. The issue then becomes – does it measure what you think it measures? In the jargon – does it have construct validity? There is no easy, single, way of determining construct validity. At its simplest, one might look for what seems reasonable, sometimes referred to as face validity. An alternative looks at possible links between scores on a test and the third suggested measure – the pupils’ actual educational achievement in their later life (i.e. how well does it predict performance on the criterion in question, or predictive criterion validity). These and other aspects of construct validity are central to the methodology of testing. The complexities of determining construct validity can lead to an unhealthy concentration on this aspect of carrying out a research project. For many studies there is an intuitive reasonableness to assertions that a certain approach provides an appropriate measure. Any one way of measuring or gathering data is likely to have its shortcomings, which suggests the use of multiple methods of data collection. One could use all three of the approaches to assessing educational achievement discussed above (achievement tests, teachers’ ratings and ‘certificate counting’) rather than relying on any one measure. This is one form of triangulation – see Chapter 7, p. 171. Similar patterns of findings from very different methods of gathering data increase confidence in the validity of the findings. Discrepancies between them can be revealing in their own right. It is important to realize, however, that multiple methods do not constitute a panacea for all methodological ills. They raise their own theoretical problems; and they may in many cases be so resource‐hungry as to be impracticable (see Chapter 15, p. 383). 
Let us say that we have jumped the preceding hurdle and have demonstrated satisfactorily that we have a valid measure of educational achievement. However, a finding that achievement increases after the introduction of the tests does not necessarily mean that it increased because of the tests. This gets us back to the consideration of causation which occupied us in Chapter 2 (see p. 32). What we would like to do is to find out whether the treatment (introduction of the tests) actually caused the outcome (the increase in achievement). If a study can plausibly demonstrate this causal relationship between treatment and outcome, it is referred to as having internal validity. This term was introduced by Campbell and Stanley (1963), who provided an influential and widely used analysis of possible ‘threats’ to internal validity. These threats are other things that might happen which confuse the issue and make us mistakenly conclude that the treatment caused the outcome (or obscure possible relationships between them). Suppose, for example, that the teachers of the primary school children involved in the study are in an industrial dispute with their employers at the same time that testing is introduced. One might well find, in those circumstances, a decrease in achievement related to the disaffection and disruption caused by the dispute, which might be mistakenly ascribed to the introduction of tests per se. This particular threat is labelled as ‘history’ by Campbell and Stanley – something which happens at the same time as the treatment. There is the complicating factor here that a case might be made for negative effects on teaching being an integral part of the introduction of formal testing into a child‐centred primary school culture, i.e. that they are part of the treatment rather than an extraneous factor. However, for simplicity’s sake, let’s say that the industrial dispute was an entirely separate matter. 
Campbell and Stanley (1963) suggested eight possible threats to internal validity which might be posed by other extraneous variables. Cook and Campbell (1979) have developed and extended this analysis, adding a further four threats. All 12 are listed in Box 6.1 (Onwuegbuzie and McLean, 2003, expand this list to 22 threats at the research design and data collection stage, with additional threats present at the data analysis and interpretation stages). The labels used for the threats are not to be interpreted too literally – mortality doesn’t necessarily refer to the death of a participant during the study (though it might). Not all threats are present for all designs. For example, the ‘testing’ threat is only there if a pre‐test is given, and in some cases, its likelihood, or perhaps evidence that you had gained from pilot work that a ‘testing’ effect was present, would cause you to avoid a design involving this feature.




Robson, C., & McCartan, K. (2016). Real world research: A resource for users of social research methods in applied settings. Chichester: John Wiley & Sons.
