Psychometric Evaluation of PHQ9 Using Item-Response Theory.
|Publication Type||Journal Article|
|Authors||Amtmann D, Cook KF, Ehde DM, Johnson KL, Hinton K, Bombardier CH|
|Journal||International Journal of MS Care|
Background: The nine-item Patient Health Questionnaire (PHQ9) is a scale used to screen for depressive episodes and symptoms in medical patients. Although it has shown promise in other populations, it has not been validated in individuals with multiple sclerosis (MS).

Purpose: The purpose of this study was to apply item response theory (IRT) methods to explore the psychometric properties of the PHQ9 and examine how the scale functions in individuals with MS.

Sample: Data were collected from 107 people with MS. Participants responded to the PHQ9 items as well as demographic, clinical, and other quality-of-life measures.

Analyses: An IRT model appropriate for items with more than two response options (Andrich’s rating scale model) was used to calibrate the PHQ9 items. The IRT assumptions of unidimensionality and local item independence were assessed by confirmatory factor analysis. Fit of the PHQ9 items to the rating scale model was evaluated, and the effective measurement range of the scale was estimated.

Results: The interitem consistency of the PHQ9 was extremely low (Cronbach’s α = .30). The measure had modest fit to a unidimensional model (comparative fit index [CFI] = 0.928, Tucker-Lewis index [TLI] = 0.944, root mean square error of approximation [RMSEA] = 0.145), and a substantial percentage (44%) of item pairs exhibited higher than optimal residual correlation (>0.10). All but two of the items met traditional fit standards after calibration to the rating scale model. The least well-fitting items were “moving/speaking slowly” and “better off dead.” The PHQ9 items were most effective at measuring people with higher levels of depression.

Conclusions: Appropriate to its purpose as a screening instrument, the PHQ9 discriminates best among people with higher levels of depression. The results suggest that it would not be a good instrument for discriminating among lower levels of depression. There are several published cut points for the PHQ9. Future research should evaluate the equivalence of these cut points based on an IRT calibration of item responses.
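For readers unfamiliar with the two statistics named in the abstract, the following is a minimal Python sketch of Andrich’s rating scale model and of Cronbach’s alpha. It is an illustration only, not the authors’ analysis code: the function names, the person and item locations, and the threshold values are all hypothetical. It assumes the standard PHQ9 scoring of 0 to 3 per item, so each item has three thresholds shared across items.

```python
import math
from statistics import variance


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under Andrich's rating scale model.

    theta: person location in logits (hypothetical value)
    delta: item location in logits (hypothetical value)
    taus:  thresholds shared by all items; m thresholds give m + 1 categories
    """
    # Category x has log-numerator sum_{k<=x} (theta - delta - tau_k);
    # category 0 has an empty sum, i.e. 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item-score rows."""
    k = len(rows[0])  # number of items
    item_var_sum = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Illustrative thresholds only; these are not estimates from the study.
example_taus = [-1.0, 0.0, 1.0]
low = rsm_category_probs(-1.0, 0.0, example_taus)
high = rsm_category_probs(2.0, 0.0, example_taus)
```

With these made-up values, a respondent located well above the item (`high`) places more probability on the upper response categories than one located below it (`low`), which is the sense in which an item is more or less informative at different levels of depression.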