SNOMED International is pleased to announce the winners of its recent Entity Linking Challenge. The challenge, which ran from the beginning of January 2024 to March 5, 2024, was hosted and organized by SNOMED International with three partner organizations: DrivenData, which hosts online data science competitions; AI consultancy Veratai; and PhysioNet, the Research Resource for Complex Physiologic Signals.
The goal of the challenge was to train machine learning models to link clinical notes with specific topics based on the largest publicly available dataset of labeled clinical notes that had been de-identified and annotated with SNOMED CT concepts.
Since much of the world's healthcare data is stored in free-text documents (usually clinical notes taken by doctors), unstructured data can be challenging for clinicians, researchers and other stakeholders to analyze and extract meaningful insights from. The results of the challenge demonstrate that by applying a standardized terminology like SNOMED CT, healthcare organizations can convert this free-text data into a structured format that can be readily analyzed by computers, in turn stimulating the development of new medicines, treatment pathways, and better patient outcomes.
The challenge, which was open to teams and individuals in any field, offered a share of $25,000 in prize money to the top three winners. Participants could chart their progress against each other on a leaderboard, fostering a spirit of healthy competition.
“SNOMED International thanks all the organizations that worked with us on this initiative and all of those who entered and completed the challenge,” said CEO Don Sweete. “The better we understand how SNOMED CT can be used to analyze and extract meaning from clinical notes, the greater impact it will have. This competition will result in the development of tools and techniques that can be shared within the SNOMED CT and broader health standards and research communities and used to unlock and supercharge the clinical data currently captured in free text. That is how we demonstrate the deep capabilities of SNOMED CT, whether to support the interoperability of patient records or to mine clinical data for a multitude of research purposes.”
The entrants and winners
A total of 553 individuals participated in the challenge, all of whom were part of teams. Of those, approximately 40 submitted the final data required. Some entrants compete in a variety of machine learning and artificial intelligence challenges beyond the healthcare industry, raising awareness of SNOMED CT in other sectors.
Winners were selected using a complex mathematical scoring mechanism, which established a benchmark that the three winning teams surpassed.
Congratulations to:
First place winners ($12,500): Guy Amit, Yonatan Bilu, Irena Girshovitz and Chen Yanover (KI Institute, Israel)
Second place winners ($7,500): Gleb Sokolov, Mikhail Kulyabin and Aleksandr Galaida (Erlangen, Germany and Yerevan, Armenia)
Third place winners ($5,000): Vincenzo Della Mea, Mihai Horia Popescu and Kevin Roitero (Medical Informatics, Telemedicine & eHealth Lab, Italy)
For more information on the winners, the benchmark solution and the competition metric, read a recent blog authored by DrivenData.
SNOMED International Chief Digital Information Officer Rory Davidson said, “As this is the first time the organization has hosted a competition like this, it was difficult to predict the level of interest and participation from the community. We are extremely pleased to have received so many entrants, especially considering the niche nature of entity linking with SNOMED CT itself.” Rory added, “One of the rewarding outcomes is that there have been individuals and teams taking part who would likely never have crossed paths with SNOMED CT and entity linking if we had not run the competition. By introducing SNOMED CT to new communities that we haven’t worked with before, we can share the message of how the terminology can be used alongside different AI technologies to improve the quality of healthcare data.”
Veratai AI Consultant Will Hardman added that while there are many existing entity linking approaches and models, there are very few high-performance, well-trained sets of entity linking models in the public domain that serve a variety of use cases. Going into this challenge, he said he expected three potential outcomes: the first, the stimulation of research into building good open source entity linking models; the second, teaching people who build entity linking models how to extract knowledge from the target knowledge base and use that to help inform the selection of concepts; and the third, the creation of a good, publicly available, SNOMED CT-coded data set.
Tom Joseph Pollard, Research Scientist at MIT and Technical Director of PhysioNet, said, “The availability of annotated health data is crucial for advancing patient care and medical research.” Pollard went on to say, “The manual annotation of data with standardized concepts remains a demanding and labor-intensive task. Through this partnership, PhysioNet and SNOMED International worked alongside the global research community to establish new benchmarks in entity linking. The challenge highlights the considerable potential of advances in artificial intelligence to transform the process of annotation, paving the way for medical research innovations that could ultimately improve patient outcomes.”
Next steps
SNOMED International will review the results of the challenge to better understand what can be learned from it. One of the known outcomes is that the artifacts produced from the process, such as annotation guidelines and lessons learned, will be shared with SNOMED International Members and the wider community. The winning solutions will also be published as open source so they will be freely available.
“The Entity Linking Challenge won’t result in something concrete that you can plug into your system tomorrow, but the lessons learned are invaluable and help us better understand and inform the extensive capabilities of SNOMED CT,” Rory said.
The completion of the competition marks the end of a years-long planning process – and potentially the beginning of additional endeavors to illustrate and harness the power of SNOMED CT in healthcare data analytics. SNOMED International is currently reviewing what form future competitions might take, including whether they could be based on a different dataset, a different challenge or even a different language. The challenges of hosting such competitions include finding a suitable dataset and potential partners, and identifying a repeatable, manageable and formulaic scoring mechanism.
Visit the Entity Linking Challenge website for more information.