The spread of early-stage (T1 and T2) adenocarcinomas to locoregional lymph nodes is a key event in disease progression of colorectal cancer (CRC). The cellular mechanisms behind this event are not completely understood, and existing predictive biomarkers are imperfect. Here, we used an end-to-end deep learning algorithm to identify risk factors for lymph node metastasis (LNM) status in digitized histopathology slides of the primary CRC and its surrounding tissue. In two large population-based cohorts, we show that this system can predict the presence of more than one LNM in pT2 CRC patients with an area under the receiver operating characteristic curve (AUROC) of 0.733 (0.67-0.758) and the presence of any LNM with an AUROC of 0.711 (0.597-0.797). Similarly, in pT1 CRC patients, the presence of more than one LNM or any LNM was predictable with an AUROC of 0.733 (0.644-0.778) and 0.567 (0.542-0.597), respectively. Based on these findings, we used the deep learning system to guide human pathology experts towards highly predictive regions for LNM in the whole slide images. This hybrid human observer and deep learning approach identified inflamed adipose tissue as the most predictive feature for LNM presence. Our study is a first proof of concept that artificial intelligence (AI) systems may be able to discover potentially new biological mechanisms in cancer progression. Our deep learning algorithm is publicly available and can be used for biomarker discovery in any disease setting. © 2021 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.