Multiple Sclerosis
be recapped. When a set of items is used as a scale, a claim is being made that a construct is being measured.[41] Implicit to this claim is some theory of the construct being measured (a construct theory).[42] For example, the RMI (see Figure 3) uses a set of 15 items. It makes a claim that mobility is being measured. As such, there must be some theory of mobility underpinning the use of these specific 15 items. It follows that the aim of validity testing is to establish the extent to which a specific construct is being measured and, by implication, the extent to which the construct theory is supported.

Current methods for establishing scale validity cannot achieve these aims because they do not include formal methods for defining and testing construct theories.[42] While scales (e.g. the RMI) and the constructs they purport to measure (e.g. mobility) always have names, they are rarely underpinned by a theory of the construct being measured that has been deduced. Thus, there are rarely construct theories to test formally. History has proved that proposing and challenging theories is central to scientific development.[43,44]

This situation seems surprising as explicit definitions of constructs would seem to be pre-requisites for establishing scale validity. It has arisen, in part, because the constructs measured by many scales are determined during their development. Typically, scale developers generate a large pool of items, group them into potential scales, either statistically or thematically, decide what construct each group seems to measure, and then remove unwanted or irrelevant items. The main limitation of this approach is that the scale content, rather than the construct intended for measurement, defines what the scale measures. Neither grouping items statistically nor thematically ensures that the items in a group measure the same construct, but this does explain why items such as ‘having trouble meeting the needs of my family’ and ‘few social contacts outside the home’ appear in scales purporting to measure mobility and fatigue, respectively. Furthermore, both methods of grouping items avoid the process of defining, conceptualizing, and operationalizing variables, which is central to valid measurement.[45–48]

Even if the circumstances were different, and scales were underpinned by explicit construct theories, current methods of validity testing would not enable those theories to be tested adequately. Why? Because current methods, which integrate evidence from non-statistical and statistical tests, provide circumstantial evidence at best that a set of items is measuring a specific construct.

Non-statistical tests of validity typically consist of assessments of content and face validity. Content validation assesses whether scale development sampled all the relevant or important content or domains[49] and used ‘sensible methods of scale construction’ and a ‘representative collection of items.’[50] Face validation assesses whether the final scale looks, on the face of it,[49] like it measures what is intended.[50] In the middle of the last century, Guilford named these evaluations ‘validity by assumption’ and ‘faith validity,’[51] yet they remain essentially unchallenged.

Statistical tests of scale validity are more formal than their non-statistical counterparts, but remain weak evaluations of the extent to which a set of items measures a construct. For example, examinations of internal construct validity (e.g. factorial validity, internal consistency)[52] test the extent to which the items of a scale are related statistically. This does not confirm that a set of items marks out a clinically meaningful variable of interest, let alone tell us what a scale measures.

Examinations of external construct validity (e.g. correlations with other measures,[53,54] testing known group differences,[55] hypothesis testing[52,53]) assess the extent to which scale scores ‘behave’ as predicted and seek to determine whether a scale ‘does what it is intended to do.’[21] These tests, which focus on person scores and between-person variation in those scores, are weak because there is no independent means of assessing the extent to which the intention of the scale is attained.[56] Consequently, these validation techniques entail circular reasoning,[56] generate only circumstantial evidence of validity,[31] enable limited development of construct theories, and result in ‘primitive’ understandings of exactly what is being measured.[42] Like their non-statistical counterparts, they have remained essentially unchallenged for decades.

Solution 2: Theory-referenced Measurement

Two things are needed to advance our understanding of precisely what scales measure: explicit theories of the constructs being measured, and explicit methods of testing those theories. Over the last 25 years, a number of groups have addressed these issues.[42,56–59,60,61] One group in particular has developed their ideas to an advanced level.[42,56,59] However, their work is largely inaccessible to clinicians as it concerns the measurement of reading ability. A review of that work is illuminating.

The central premise of this group’s approach is a change in focus from studying people to studying items.[42] An example helps to make this idea tangible. The Lexile system is a scale for measuring people’s reading ability. The items of the scale are passages of text with different levels of readability (reading difficulty). Responses to the items are scored to give a measure of reading ability. Theories suggest that the reading difficulty of a passage of text is determined by the frequency of its words as they are used in everyday communications and sentence length. Empirical studies support this construct theory by showing that these two item characteristics (word frequency and sentence length) combine to form a construct specification equation consistently explaining >80% of the variation in item location (text difficulty).[59]
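To make the idea of a construct specification equation concrete, the following is a minimal sketch of how such an equation might be fitted and checked. It is not the published Lexile equation: the linear form, the log-scaled predictors, the generating coefficients, and the simulated items are all assumptions made for illustration, and the function names (`fit_specification_equation`, `simulate_items`) are hypothetical. The sketch regresses item location (text difficulty) on two item characteristics and reports the proportion of variance explained, which is the quantity the text describes as exceeding 80%.

```python
"""Sketch: testing a construct specification equation by least squares.

All data and coefficients here are made up for illustration only.
"""
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_specification_equation(predictors, difficulties):
    """Fit difficulty = b0 + b1*x1 + b2*x2 + ...; return (coefs, r_squared)."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, difficulties)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(difficulties) / len(difficulties)
    ss_res = sum((y - f) ** 2 for y, f in zip(difficulties, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in difficulties)
    return coefs, 1.0 - ss_res / ss_tot


def simulate_items(n_items=200, seed=0):
    """Generate hypothetical text passages; generating values are arbitrary."""
    rng = random.Random(seed)
    predictors, difficulties = [], []
    for _ in range(n_items):
        word_freq = rng.uniform(1.0, 5.0)  # assumed: log mean word frequency
        sent_len = rng.uniform(1.5, 3.5)   # assumed: log mean sentence length
        noise = rng.gauss(0.0, 0.3)
        # Assumed form: rarer words and longer sentences make a passage harder.
        difficulties.append(2.0 - 0.8 * word_freq + 1.2 * sent_len + noise)
        predictors.append((word_freq, sent_len))
    return predictors, difficulties


if __name__ == "__main__":
    x, y = simulate_items()
    coefs, r2 = fit_specification_equation(x, y)
    print("coefficients (intercept, word frequency, sentence length):",
          [round(c, 2) for c in coefs])
    print("proportion of variance in item location explained:", round(r2, 2))
```

The point of the sketch is the shift in the unit of analysis: each row is an item (a passage of text), not a person, and the theory is tested by how well item characteristics predict item location. A low proportion of variance explained would count against the construct theory, which is what makes this kind of test stronger than the circumstantial evidence described earlier.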
16 US NEUROLOGY