Background: The history of the severity of seasonal allergic symptoms is often obtained post-seasonally as a retrospective assessment. Accurate rating is essential when determining the efficacy of pharmaceutical treatment, indications for allergen-specific immunotherapy (SIT), or inclusion in controlled clinical studies. Objectives: To investigate the agreement between in- and post-seasonal ratings of seasonal symptoms, and to investigate whether the effect of SIT could be detected retrospectively. Material and methods: Thirty-five birch pollen-allergic patients were allocated to SIT or placebo in a double-blind study. The severity of symptoms from the nose, eyes and lungs was assessed daily during the 2000 season, and post-seasonally 6 months after the 1999 and 2000 seasons. A four-point verbal descriptor scale (VDS-4) was used on all occasions. A mean in-seasonal symptom rating was calculated for four periods: the day, the week and the 2 weeks with the highest symptom scores, and the arithmetic season (the period covering the middle 90% of the accumulated pollen count). In- and post-seasonal ratings were compared with Cohen's weighted kappa (κw). Results: Agreement between in-seasonal and retrospective ratings was fair to moderate (κw: 0.30-0.60). Post-seasonal ratings were most closely related to symptoms experienced in the week with the highest symptom scores, and least closely related to the arithmetic season. The post-seasonal ratings were significantly skewed towards higher symptom scores than the mean of in-seasonal ratings in periods ≥ 2 weeks. Although the two groups were comparable before intervention, a significant decrease in post-seasonal ratings of rhinoconjunctivitis severity was apparent only in the SIT-treated group (P < 0.05). Asthma scores were not reduced, but fewer patients in the SIT group reported lung symptoms (P < 0.001). Conclusion: Post-seasonal assessment of seasonal allergic symptoms generally describes a shorter period than the arithmetic season.
Post-seasonal assessment tends to over-rate average symptom severity, but appears sufficiently sensitive to detect treatment efficacy.
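The agreement statistic named in the methods, Cohen's weighted kappa, can be illustrated with a minimal sketch. The abstract does not state which weighting scheme was used, so linear disagreement weights are an assumption here, as is the coding of the four VDS-4 categories as the integers 0 to 3. The function name `weighted_kappa` and the example ratings are hypothetical and are not data from the study.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories=4):
    """Cohen's weighted kappa for paired ordinal ratings coded 0..n_categories-1.

    Uses linear disagreement weights |i - j| / (k - 1); this weighting
    scheme is an assumption, not something stated in the abstract.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = n_categories
    for r in list(ratings_a) + list(ratings_b):
        if not 0 <= r < k:
            raise ValueError("rating outside 0..n_categories-1")

    n = len(ratings_a)
    # Observed proportions: rows = first rating, columns = second rating.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions, used for the chance-expected table.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical example: four patients whose post-seasonal ratings run
# one category higher than their in-seasonal ratings where possible.
in_season = [0, 1, 2, 3]
post_season = [1, 2, 3, 3]
print(round(weighted_kappa(in_season, post_season), 2))  # prints 0.4
```

With these invented ratings the result falls inside the fair-to-moderate range reported above, which shows how a systematic shift towards higher post-seasonal scores lowers agreement even when the rank order of patients is preserved.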