BACKGROUND AND PURPOSE: Some uncertainty persists regarding the reproducibility of the recommended core set of performance-based tests, as well as common muscle function tests, when applied in individuals with knee osteoarthritis (KOA). The purpose of this study was to investigate the intrarater reliability and agreement of the recommended core set of performance-based tests and common muscle function tests in KOA.
METHODS: Participants (N = 40) with radiographic and/or symptomatic KOA were evaluated twice, with a 3-day interval between test sessions, using the following tests: leg extensor (LE) maximal muscle power measured in a Nottingham Power Rig; knee extensor (KE) peak isometric strength measured with a handheld dynamometer; 40-m walk test; 30-second chair-stand test; and 9-step stair climb test. Reliability was assessed using a 2-way, mixed-effects, single-measures model (3,1), absolute agreement-type intraclass correlation coefficient (ICC). Agreement was assessed using 95% limits of agreement (LOA) and LOA relative to the mean score from test and retest (LOA-%).
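The reliability and agreement statistics named in the methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the ICC uses the standard ANOVA mean-squares formula for a 2-way, single-measures, absolute-agreement coefficient, and the LOA half-width is assumed to follow the common convention of 1.96 times the standard deviation of the test-retest differences.

```python
import numpy as np


def icc_absolute_agreement_single(scores):
    """Two-way, single-measures, absolute-agreement ICC.

    scores: array of shape (n_participants, k_sessions).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares for participants (rows), sessions (columns), and error.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-participant mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def limits_of_agreement(test, retest):
    """Return (mean difference, 95% LOA half-width, LOA as % of grand mean).

    Assumption: half-width = 1.96 * SD of the retest-minus-test differences,
    and LOA-% divides that half-width by the mean of all test and retest scores.
    """
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    diff = retest - test
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    grand_mean = np.concatenate([test, retest]).mean()
    return bias, half_width, 100.0 * half_width / grand_mean
```

Because the absolute-agreement form penalizes a systematic shift between sessions, a learning effect at retest lowers this ICC even when participants keep the same rank order. A nonzero mean difference returned by `limits_of_agreement` flags the same shift.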
RESULTS: Reliability for all tests was very high (ICC ≥ 0.97). LOA (LOA-%) was ±32.3 watts (W) (±22%) for LE power; ±22.7 N·m (±24%) for KE strength; ±0.2 m/s (±10%) for the 40-m walk test; ±2.4 repetitions (±14%) for the 30-second chair-stand test; and ±2 seconds (±20%) for the stair climb test. A potential participant learning effect was found for all 3 performance-based tests, indicated by significantly better scores at retest.
DISCUSSION: The very high reliability found for the performance-based tests supports findings from previous studies and confirms the discriminative reliability of these tests at the group level. Very high reliability estimates were also demonstrated for both muscle function tests. In addition, this study provided estimates of agreement for both performance-based and muscle function tests, which are important to consider when using these tests at the individual level in clinical practice.
CONCLUSION: When these tests are used to monitor changes over time in the clinic, improvements smaller than 10% to 24%, depending on the test, could result from measurement error alone and therefore should not be interpreted as an actual improvement after treatment.
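As an illustration of how the reported LOA might be applied to an individual patient, the sketch below compares an observed change with the absolute LOA for each test. The dictionary keys and the function name are hypothetical. The sketch also assumes the LOA are symmetric around zero, so it ignores the possible learning effect noted in the results.

```python
# Absolute 95% limits of agreement reported in the results (half-widths).
LOA = {
    "le_power_w": 32.3,        # leg extensor power, watts
    "ke_strength_nm": 22.7,    # knee extensor strength, N·m
    "walk_40m_m_per_s": 0.2,   # 40-m walk test, m/s
    "chair_stand_reps": 2.4,   # 30-second chair-stand test, repetitions
    "stair_climb_s": 2.0,      # 9-step stair climb test, seconds
}


def exceeds_measurement_error(test_name, baseline, follow_up):
    """Return True if the change between two sessions is larger than the LOA.

    The absolute change is used so the check works both for tests where
    higher is better (e.g. repetitions) and where lower is better (seconds).
    """
    return abs(follow_up - baseline) > LOA[test_name]


# A gain of 2 chair-stand repetitions lies within the reported LOA of 2.4,
# whereas a gain of 3 repetitions exceeds it.
print(exceeds_measurement_error("chair_stand_reps", 10, 12))
print(exceeds_measurement_error("chair_stand_reps", 10, 13))
```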