The DigiLut Data Challenge 2024, organized by the Foch Hospital in collaboration with the Health Data Hub and funded by Banque Publique d’Investissement (Bpifrance), aimed to advance the detection of graft rejection following lung transplantation. This challenge attracted data scientists and AI experts from around the globe, who applied cutting-edge deep learning techniques to analyze digitized transbronchial biopsy slides. The objective was to develop algorithms capable of identifying pathological zones and determining the presence and severity of graft rejection.
Challenge Overview
The DigiLut challenge focused on detecting graft rejection in digitized transbronchial biopsy slides from lung transplant patients. Participants were provided with a unique image bank containing both annotated and non-annotated data. The goal was to create algorithms that could generate region-of-interest boxes for type A lesions on new, non-annotated slides.
Contextual Implementation
During the challenge, competitors were provided with a managed JupyterHub environment for their development work. Each team had access to a computational setup that included 6 vCPUs, 13 GB of RAM, and 512 GB of persistent storage.
The environment supported real-time collaboration, allowing team members to work on the same notebooks and data. It was provided mainly so that teams could preprocess the large dataset before downloading it for local training.
An SFTP server was also made available to competitors who wanted to download the complete dataset (over 3 TB).
Evaluation Metric
In the DigiLut Data Challenge 2024, evaluation relied on the Generalized Intersection over Union (GIoU) and the F1 score, chosen to balance precision and recall while particularly minimizing false negatives. GIoU extends the traditional Intersection over Union (IoU) by incorporating the area of the smallest enclosing box that contains both the predicted and ground-truth bounding boxes, giving a more informative measure of lesion-detection performance, especially when predicted and ground-truth boxes do not overlap.
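Concretely, for a predicted box A, a ground-truth box B, and the smallest box C enclosing both:

```latex
% GIoU penalizes predictions whose enclosing box C contains empty space
\mathrm{IoU}(A,B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
\mathrm{GIoU}(A,B) = \mathrm{IoU}(A,B) - \frac{|C \setminus (A \cup B)|}{|C|}
```

GIoU takes values in (-1, 1] and, unlike plain IoU, still produces a useful gradient of scores when the two boxes are disjoint.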
The F1 score, the harmonic mean of precision and recall, was the primary measure for ranking models on the private leaderboard. It ensures a balanced evaluation by penalizing both false positives and false negatives, which is crucial in medical imaging tasks where missed lesions carry a high cost.
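In terms of true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN},
\qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```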
Winners and Their Solutions
CVN Team: Loïc Le Bescond and Aymen Sadraoui
URL: First Solution
The CVN Team secured first place with a sophisticated approach that excelled in both accuracy and efficiency. Their solution involved a comprehensive preprocessing step that segmented the Whole Slide Images (WSIs) into smaller, more manageable units. They employed a hybrid model combining convolutional neural networks (CNNs) and transformer-based architectures to capture both local and global features in the biopsy slides, and they enhanced robustness through extensive data augmentation and pseudo-labels derived from unlabeled data.
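The WSI-segmentation preprocessing step can be sketched roughly as below. This is a minimal illustration assuming the openslide-python library; the tile size, stride, and tissue-filter threshold are placeholder values, not the team's actual settings.

```python
# Minimal WSI tiling sketch (illustrative defaults, not the CVN team's exact pipeline).
import numpy as np
import openslide

def tile_slide(path, tile_size=1024, stride=1024, tissue_thresh=0.1):
    """Yield (x, y, tile) patches from a whole-slide image, skipping mostly-empty tiles."""
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions  # level-0 dimensions in pixels
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            tile = slide.read_region((x, y), 0, (tile_size, tile_size)).convert("RGB")
            arr = np.asarray(tile)
            # Keep tiles where enough pixels are non-background (background is near-white).
            if (arr.mean(axis=2) < 220).mean() > tissue_thresh:
                yield x, y, tile
```

Filtering out near-empty tiles keeps the downstream models focused on tissue rather than glass background, which matters when a single slide spans billions of pixels.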
MaD Lab Team: Emmanuelle Salin and Arijana Bohr
URL: Second Solution
The MaD Lab Team took second place with a two-step pipeline: ranking followed by detection. Their approach split each WSI into “structures” (tissue segments); a ranking model predicted whether a structure contained a bounding box (bbox), and a detection model then made precise bbox predictions. They leveraged multiple folds of CoAT and MaxViT models for ranking and Co-DINO models for detection, ensuring high accuracy and reliability.
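The ranking-then-detection flow can be sketched as follows. Here `ranker` and `detector` are hypothetical stand-ins for the team's CoAT/MaxViT rankers and Co-DINO detectors, and combining the two scores by multiplication is an assumption for illustration, not their documented method.

```python
# Illustrative two-stage pipeline: rank tissue structures, then detect only on promising ones.
def predict_slide(structures, ranker, detector, rank_thresh=0.5):
    """Return (bbox, confidence) predictions for one slide's structures."""
    predictions = []
    for structure in structures:
        score = ranker(structure)  # probability the structure contains a lesion bbox
        if score < rank_thresh:
            continue               # skip expensive detection on low-ranked structures
        for bbox, conf in detector(structure):
            predictions.append((bbox, conf * score))  # blend ranker and detector confidence
    return predictions
```

The appeal of this design is efficiency: the cheap ranker prunes most of the slide so the heavy detector only runs where a lesion is plausible.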
Harshit Sheoran
URL: Third Solution
Harshit Sheoran, the third-place winner, likewise adopted a two-part strategy of ranking and detection. His pipeline divided each WSI into smaller “structures” and used a ranking model to identify those containing bboxes; a detection model then made precise bbox predictions for these structures. He employed pseudo-labeling to enhance model training and used a blend of CoAT, MaxViT, and Co-DINO models. His technical implementation ran on high-performance hardware: an AMD Threadripper 3960X CPU, 128 GB of RAM, and four NVIDIA RTX 3090 GPUs.
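A simplified sketch of one pseudo-labeling round is shown below; the `model` callable, the confidence threshold, and the merge step are illustrative assumptions, not details taken from his write-up.

```python
# Sketch of one pseudo-labeling round: confident predictions on unlabeled
# structures become extra training labels for the next round.
def pseudo_label_round(model, unlabeled_structures, conf_thresh=0.9):
    """Return (structure, bboxes) pairs where the model is highly confident."""
    pseudo_labeled = []
    for structure in unlabeled_structures:
        confident = [(bbox, conf) for bbox, conf in model(structure) if conf >= conf_thresh]
        if confident:
            pseudo_labeled.append((structure, [bbox for bbox, _ in confident]))
    # The result is typically merged with the annotated set before retraining.
    return pseudo_labeled
```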
The complete winners' solutions, including notebooks and reports, are available in our GitHub repository: https://github.com/Trustii-team/DigiLut
You can also access and download all participants' code at https://app.trustii.io => DigiLut Data Challenge => Leaderboard => Private
Technical Summary
The winning solutions of the DigiLut Data Challenge 2024 demonstrate the power of combining traditional CNNs with transformer-based architectures. The hybrid models used by the top teams effectively captured intricate details in the biopsy slides, leading to highly accurate predictions. The use of pseudo-labeling to incorporate unlabeled data was a common theme among the winners, highlighting its efficacy in improving model performance.
The preprocessing step of segmenting WSIs into smaller structures was crucial in managing the large image sizes and focusing the models on relevant regions. The application of advanced techniques like Soft Non-Maximum Suppression (soft-NMS) further refined the bbox predictions, ensuring precise identification of pathological zones.
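Soft-NMS decays the scores of overlapping candidate boxes instead of discarding them outright, which helps when true lesions sit close together. Below is a minimal NumPy sketch of the Gaussian variant; the `sigma` and pruning threshold are illustrative defaults, not values taken from the winning solutions.

```python
import numpy as np

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    Returns a list of (box, decayed_score) in descending-score order."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = []
    while scores.size > 0:
        i = int(np.argmax(scores))
        best = boxes[i]
        keep.append((best, float(scores[i])))
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        if scores.size == 0:
            break
        # IoU of the kept box with every remaining box
        x1 = np.maximum(best[0], boxes[:, 0])
        y1 = np.maximum(best[1], boxes[:, 1])
        x2 = np.minimum(best[2], boxes[:, 2])
        y2 = np.minimum(best[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (best[2] - best[0]) * (best[3] - best[1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_best + areas - inter)
        scores = scores * np.exp(-(iou ** 2) / sigma)  # Gaussian score decay
        mask = scores > score_thresh                   # prune boxes whose score collapsed
        boxes, scores = boxes[mask], scores[mask]
    return keep
```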
Encouragement to the Trustii.io Data Science Community
The DigiLut Data Challenge 2024 has set a new benchmark in the application of deep learning to medical image analysis. The winning solutions are not only technically impressive but also offer valuable insights and methodologies that can be reused and built upon. We encourage the Trustii.io data science community to explore the winning solutions available on GitHub, delve into the detailed documentation, and leverage these state-of-the-art techniques in their own projects.
By reusing and iterating on the winners' code, the community can contribute to advancing the field of medical image analysis and improving patient outcomes in lung transplantation and beyond.
If you have any questions, reach out to us at challenges@trustii.io