
The main goal of this project is to address the challenge faced by healthcare professionals who must sift through extensive and often unstructured patient records to make informed clinical decisions. The project seeks to automate the summarization of medical records, reducing the manual workload, minimizing the risk of missing critical information, and ensuring that essential details are readily accessible. By the end of the project, students are expected to deliver a functional tool capable of processing raw medical records and generating meaningful, context-aware summaries that support continuity of care and improve clinical workflows.
To complete the Medical Record Summarizer project, students will work through a structured sequence of tasks. The project begins with a literature review, in which students research state-of-the-art NLP methods and summarization techniques, particularly those relevant to the medical domain. Next, students will collect and preprocess medical record datasets so that the data is clean and suitable for NLP processing. They will then design and implement an NLP pipeline that includes tokenization, sentence segmentation, named entity recognition (NER) for extracting medical terms, and information extraction to identify relevant clinical details. The core of the project is developing or adapting summarization algorithms, covering both extractive and abstractive approaches, to generate concise summaries from the processed data. Students will evaluate their models using automatic metrics (for example, ROUGE) and, if feasible, feedback from healthcare professionals. The project also requires thorough documentation, including a detailed final report, and culminates in a formal presentation and demonstration of the working prototype.
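As a rough illustration of the pipeline stages described above, the following Python sketch chains sentence segmentation, tokenization, a frequency-based extractive summarizer, and a ROUGE-1-style recall score. It is a minimal baseline under stated assumptions, not the project's prescribed design: the function names (`split_sentences`, `tokenize`, `extractive_summary`, `rouge1_recall`), the tiny `STOPWORDS` set, and the scoring rule are all hypothetical choices made for illustration. NER and abstractive summarization are omitted because they would normally rely on a trained model.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "with", "for", "on"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace.

    Clinical abbreviations such as 'Dr.' will be split incorrectly; a real
    pipeline would need a domain-aware segmenter.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, in their original order.

    A sentence's score is the mean document frequency of its non-stopword tokens.
    """
    sentences = split_sentences(text)
    freq = Counter(
        t for s in sentences for t in tokenize(s) if t not in STOPWORDS
    )

    def score(sentence):
        tokens = [t for t in tokenize(sentence) if t not in STOPWORDS]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(summary, reference):
    """Unigram recall: fraction of reference tokens that also occur in the summary."""
    ref_counts = Counter(tokenize(reference))
    sum_counts = Counter(tokenize(summary))
    overlap = sum(min(count, sum_counts[tok]) for tok, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    # Invented example note, not real patient data.
    note = (
        "Patient reports chest pain. "
        "Chest pain worsens with exertion. "
        "Family lives nearby."
    )
    summary = " ".join(extractive_summary(note, k=2))
    print(summary)
    print(rouge1_recall(summary, "Chest pain worsens with exertion."))
```

A frequency baseline like this is mainly useful as a point of comparison: it gives students a reference score against which NER-informed or abstractive approaches can later be measured using the same evaluation function.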