OBJECTIVE: To evaluate the reliability and agreement of semi-quantitative scoring (SQS) and quantitative scoring (QS) systems for colour Doppler ultrasound of the wrist in rheumatoid arthritis (RA). To compare the two scoring systems and to investigate the construct validity of each.
METHODS: A total of 46 RA patients (median disease duration 6.5 years) were enrolled. Each patient was examined with colour Doppler ultrasound at the central position of the wrist. The disease activity score based on 28 joints (DAS-28) was calculated for all patients using C-reactive protein (CRP). Two assessors trained in the SQS system and two trained in the QS system evaluated the 46 anonymized images. Each assessor scored all images twice, so that both intra- and inter-reader reliability could be assessed.
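The abstract does not state which reliability coefficient produced the values reported under RESULTS. For a fully crossed design in which every image is scored by the same readers, one common choice is the two-way random-effects, absolute-agreement, single-measure intraclass correlation, ICC(2,1), of Shrout and Fleiss. The sketch below assumes that coefficient; the function name `icc_2_1` is hypothetical, and the example matrix is the textbook 6-subject, 4-rater example from Shrout and Fleiss (1979), not study data.

```python
from statistics import mean


def icc_2_1(matrix):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Assumption: this is one plausible coefficient, not necessarily the
    one used in the study. matrix[i][j] is the score given to image i
    by rater (or reading) j; every image must have k scores.
    """
    n = len(matrix)      # number of images (subjects)
    k = len(matrix[0])   # number of raters or repeated readings
    grand = mean(x for row in matrix for x in row)
    row_means = [mean(row) for row in matrix]
    col_means = [mean(row[j] for row in matrix) for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and in total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical illustration: 6 subjects rated by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

For intra-reader reliability the two columns would be one assessor's first and second readings; for inter-reader reliability, the two assessors' scores. Whether the authors used this ICC form, a weighted kappa for the ordinal SQS grades, or another statistic is not stated.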
RESULTS: Reliability was 0.964 for QS and 0.817 for SQS, and inter-reader agreement was comparable for the two systems: the 95% limits of agreement were -7.7% to +6.7% on the colour fraction scale (0-100%) for QS, and -0.8 to +0.8 on the ordinal scale from 0 to 3 for SQS. The relationship between the two scoring systems was direct but non-linear (Spearman's r = 0.73), and the comparison revealed critical conceptual issues in the agreement between the systems. Construct validity was poor for both systems, with only a weak correlation with CRP.
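The abstract reports 95% limits of agreement and a Spearman correlation without giving the formulas. A minimal sketch, assuming the conventional definitions: Bland-Altman limits of agreement as the mean paired difference ± 1.96 sample standard deviations of the differences, and Spearman's rho as the Pearson correlation of ranks, with tied values sharing their average rank. The function names and the example scores are hypothetical, not study data.

```python
import math
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Assumed Bland-Altman form: bias and bias ± 1.96 SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five images on the two scales.
qs = [0.0, 2.5, 10.0, 22.0, 40.0]  # colour fraction (%)
sqs = [0, 1, 1, 2, 3]              # ordinal grade 0-3
print(round(spearman(qs, sqs), 3))  # 0.975
```

Because rho depends only on ranks, it suits the ordinal SQS scale and captures a monotonic but non-linear relationship with QS. A high rho shows that the two scores rise together, not that they agree on a common metric, which is why limits of agreement are meaningful only within a single scale.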
CONCLUSION: Both scoring systems showed high reliability and good inter-reader agreement when applied to the same patient cohort. The two scoring systems appear to be highly correlated, although the relationship between them is non-linear.