Shared Task on Systematic Review Updates

UP2DATE @ SCOLIA 2027

Tasks

UP2DATE @ SCOLIA 2027 is a shared task focusing on the development of new methods to assist in updating systematic reviews.

Systematic reviews constitute the highest form of evidence in medicine. They are central to decision-making across health and medicine, including the regulatory approval of new treatments, the development of clinical guidelines, and the formulation of institutional and governmental health policies. However, creating a systematic review costs upwards of EUR 130,000 and may take over a year to complete.

Unfortunately, it is exceedingly common for a systematic review to be outdated by the time it is published. New medical studies appear at a rate of roughly two per minute, and the rigorous procedures that ensure the comprehensiveness and accuracy of systematic reviews prevent medical experts from keeping pace.

UP2DATE is divided into three main tasks:

  • Task 1: Review Update Prediction. This task corresponds to the phase of the systematic review process in which reviewers decide when an update is warranted. Since there is little methodological research to guide update timing, methods developed in this shared task have a good chance of influencing future guidelines for systematic review updates. Participants will develop approaches that, given a systematic review topic and a collection of studies, predict a point in time such that all new studies would be captured by an updated review.
  • Task 2: Study Retrieval. This task corresponds to the study retrieval phase of a systematic review. Participants specifically target retrieving the additional studies that should be included in a review update, allowing them to exploit knowledge about the studies that were included in a review’s initial version. Participants will develop Boolean queries, either manually or via automatic approaches, that, given a systematic review topic and a collection of studies, maximize recall and precision, ideally retrieving exactly the studies that are ultimately included.
  • Task 3: Study Classification. This task corresponds to the study screening phase of a systematic review. Participants will develop classification systems, either human-in-the-loop or fully automatic, that predict whether a retrieved study should be included in the updated review. Participants are explicitly allowed to exploit information about studies included in the initial version of the review to assist with predictions.

Important Dates

  • XX.XX.2026 Release of data for Task 1.
  • XX.XX.2026 Submission deadline for Task 1.
  • XX.XX.2026 Release of data for Task 2.
  • XX.XX.2026 Submission deadline for Task 2.
  • XX.XX.2026 Release of data for Task 3.
  • XX.XX.2026 Submission deadline for Task 3.
  • XX.XX.2026 Evaluation results released.
  • XX.XX.2027 Participant paper submission deadline.
  • XX.XX.2027 UP2DATE session at SCOLIA 2027.

Data

The dataset comprises the metadata of 35 open-access Cochrane systematic reviews that have been updated. We provide topic information (i.e., title, query, included document IDs) for both the initial and updated versions of the reviews. We provide 20 training topics with both initial and updated information, and hold out the updated information for 15 test topics; it will only be released at the end of the shared task. The initial versions of all 35 topics will be available for each task.

The document collection will be based on a baseline PubMed dump that we will index for participants. The document collection is accessible in three ways:

  • The PubMed index can be downloaded from this link (TODO).

Submitting

Participants are invited to submit results for any or all of the three tasks independently.

  • For Task 1, participants should produce a plain text file containing a single date in the form YYYY-MM-DD that we will use to filter documents using the same index available to participants.
  • For Task 2, participants should produce a plain text file containing a PubMed Boolean query and a TREC run file containing their retrieval results. We will validate that the Boolean query corresponds to the TREC run file by running it on the Lucene index with pybool_ir.
  • For Task 3, participants should produce a plain text file where each line corresponds to a study classified to be included in the review. When we release the data for this task, we will also provide the set of studies that should be classified.
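The three submission formats above are simple enough to check mechanically. The following is a minimal, unofficial sketch of such format checks; the function names are our own, and the run-tag conventions for the TREC run file are assumptions rather than part of the task specification (the standard TREC run format has six whitespace-separated columns: topic ID, the literal "Q0", document ID, rank, score, run tag).

```python
import re

# Hedged sketch: basic format checks for the three submission file types.
# Function names and the run-tag convention are illustrative assumptions.

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD (Task 1)

def check_task1(text: str) -> bool:
    """A Task 1 submission is a single YYYY-MM-DD date on one line."""
    return bool(DATE_RE.match(text.strip()))

def trec_run_line(topic_id: str, doc_id: str, rank: int,
                  score: float, run_tag: str) -> str:
    """One line of a standard six-column TREC run file (Task 2)."""
    return f"{topic_id} Q0 {doc_id} {rank} {score} {run_tag}"

def check_task3(text: str) -> bool:
    """A Task 3 submission lists one included study ID per line."""
    lines = [l for l in text.splitlines() if l.strip()]
    return len(lines) > 0 and all(len(l.split()) == 1 for l in lines)
```

Validating submissions locally against checks like these before uploading to TIRA can save a submission round-trip.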

Submissions are handled via TIRA.

Evaluation

All tasks will be evaluated in terms of precision, recall, and the F-measure (F0.5, F1, and F2).

Organisers