Strategic Refinement

Summary

The goal of this project is to strategically refine the Mondo hierarchy. The first step focuses on removing classes that group diseases by phenotypic features and classes that group "similar" diseases (e.g., 'dysostosis with predominant craniofacial involvement', MONDO:0800085).

Justification: Orphanet classifies diseases differently from Mondo: it groups many disease entities together based on phenotypic features. These groupings cause some incorrect inferences in Mondo.

Approach: Obsolete many of the Orphanet grouping classes, then review the resulting hierarchy to ensure that the children are properly classified. The work is incremental and iterative. In overview:

  1. Review ORDO grouping classes for obsoletion
  2. Tag terms for obsoletion, share the obsoletion candidates with the community, and wait at least two months for feedback
  3. Generate tables for curator review
  4. Curator review of proposed obsoletion candidates
  5. Curation workflow

Discussion board

A discussion board is available to ask questions about this process. Please tag specific curators/developers as needed.

Relevant tickets

Review ORDO grouping classes for obsoletion

  1. The tab candidates.tsv lists grouping classes that have xrefs to ORDO and no xrefs to any other external source.
  2. Mondo curators (Nicole and Sabrina) did a first pass over this list and decided, for each term, whether it could be obsoleted or should be ‘rescued’. A rescued term is not obsoleted right now, but will be revisited later and considered for obsoletion.
  3. The reasons for ‘rescuing’ a term varied and are noted in column J.
  4. Nicole and Sabrina both looked at all the terms and noted whether they agreed that a term should be obsoleted. Where they disagreed, the term was rescued and will be revisited later.
  5. For every term marked for obsoletion, we added obsoletion tags (see workflow here) so that the class is obsoleted two months after the date the tag was added (for this initial round, the obsoletion dates were either 2023-09-01 or 2023-10-01).
  6. When the work below is done (Curator review of proposed obsoletion candidates and Curation workflow), we need to re-review all of the rescued terms, determine whether they should be obsoleted, and repeat this process.
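
The selection rule in step 1 (xrefs to ORDO and to no other source) can be sketched as a small filter. This is only an illustration: the column names (`id`, `label`, `xrefs`), the `|` separator, and the `Orphanet:` prefix for ORDO xrefs are assumptions about how candidates.tsv might be laid out, and the rows are made up.

```python
import csv
import io


def ordo_only_candidates(rows, xref_column="xrefs", sep="|", ordo_prefix="Orphanet:"):
    """Return rows that have at least one xref and whose xrefs all come from ORDO."""
    selected = []
    for row in rows:
        xrefs = [x.strip() for x in row[xref_column].split(sep) if x.strip()]
        if xrefs and all(x.startswith(ordo_prefix) for x in xrefs):
            selected.append(row)
    return selected


# Made-up rows for illustration; the real candidates.tsv may use other columns.
SAMPLE_TSV = (
    "id\tlabel\txrefs\n"
    "EX:1\tgrouping A\tOrphanet:100\n"
    "EX:2\tgrouping B\tOrphanet:200|OMIM:300\n"
    "EX:3\tgrouping C\t\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
candidates = ordo_only_candidates(rows)
print([row["id"] for row in candidates])
```

Only the first row is kept: the second also has an xref from another source, and the third has no xrefs at all.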

Generate tables for curator review

Content to be added

Curator review of proposed obsoletion candidates

Curator Branch Review

  1. In this spreadsheet, there are branches with children that are prioritized for obsoletion.
  2. Each branch has a corresponding GitHub ticket, an assigned curator (assigned in GitHub), and a corresponding spreadsheet that lists the terms that will either:

    • be orphaned
    • leave the branch
    • stay in the branch
  3. For each term that will be orphaned or will "leave the branch", create the following columns in the spreadsheet (if this has not been done already):

| Label for the parent | parent class | source PMID        | Curator confidence |
|                      | SC %         | >A oboInOwl:source |                    |
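
One possible reading of the three outcomes above can be sketched over a plain parent map. The definitions used here are assumptions made for illustration, not the project's actual table-generation logic: a term is "orphaned" if all of its direct parents are obsoleted, it "leaves the branch" if it keeps a parent but can no longer reach the branch root, and it "stays in the branch" otherwise. All identifiers are made up, and the branch root itself is assumed not to be obsoleted.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` reachable through the parent map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def classify_terms(parents, to_obsolete, branch_root):
    """Classify each term currently under `branch_root` after the obsoletions."""
    removed = set(to_obsolete)
    # Hierarchy after obsoletion: drop removed classes and edges pointing to them.
    remaining = {
        term: {p for p in ps if p not in removed}
        for term, ps in parents.items()
        if term not in removed
    }
    result = {}
    for term in parents:
        if term in removed or branch_root not in ancestors(term, parents):
            continue  # only terms currently in the branch are classified
        if not remaining[term]:
            result[term] = "orphaned"
        elif branch_root in ancestors(term, remaining):
            result[term] = "stay in the branch"
        else:
            result[term] = "leave the branch"
    return result


# Toy hierarchy: G is the grouping class scheduled for obsoletion.
parents = {
    "root": set(),
    "G": {"root"},
    "kept": {"root"},
    "other": set(),
    "A": {"G"},           # loses its only parent
    "B": {"G", "other"},  # keeps a parent outside the branch
    "C": {"G", "kept"},   # keeps a parent inside the branch
}
outcome = classify_terms(parents, {"G"}, "root")
print(outcome)
```

In this toy hierarchy, A is orphaned, B leaves the branch, and C and kept stay in it.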

Review and reassign superclasses to Orphaned terms

  1. Review the terms that will become orphaned when the grouping classes are obsoleted
  2. Assign a new parent to each term
  3. See video here for more details
  4. Share the spreadsheet with another curator for review, if needed
  5. Nicole, Trish, or Sabrina should then proceed with the obsoletion pipeline

Review and reassign superclasses to "leave the branch" terms

  1. Review the terms that will leave the branch when the grouping classes are obsoleted
  2. Assign a new parent when appropriate, i.e., when a term that would leave the branch should instead remain in it
  3. If you agree that a term should leave the branch, assign a "curator confidence" to indicate that the term was reviewed

Curation workflow

1. Add new parents to orphaned terms

  1. Add new parents via ROBOT template
    1. Make sure to include a column for the GH issue related to the review
  2. See example template here
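
As a rough illustration of what such a template might look like, the sketch below writes a ROBOT template TSV with a human-readable header row followed by a row of template strings. `SC %` and `>A oboInOwl:source` come from the column table in the review section; the column order, the helper name `build_parent_template`, the template string used for the GitHub issue column, and all identifiers in the rows are assumptions, so check them against the example template before use.

```python
import csv
import io


def build_parent_template(assignments, issue_url, issue_template):
    """Build a ROBOT template TSV assigning one new parent per row.

    assignments: iterable of (term_id, parent_id, source) tuples.
    issue_template: template string for the GitHub issue column (an assumption;
    copy the real one from the example template).
    """
    header = ["ID", "parent class", "source", "GitHub issue"]
    template_row = ["ID", "SC %", ">A oboInOwl:source", issue_template]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)        # row 1: human-readable column names
    writer.writerow(template_row)  # row 2: ROBOT template strings
    for term_id, parent_id, source in assignments:
        writer.writerow([term_id, parent_id, source, issue_url])
    return buffer.getvalue()


# Made-up identifiers; the issue column's template string is a placeholder guess.
template = build_parent_template(
    [("EX:0000001", "EX:0000100", "PMID:0000000"),
     ("EX:0000002", "EX:0000100", "PMID:0000000")],
    issue_url="https://github.com/monarch-initiative/mondo/issues/6739",
    issue_template=">A rdfs:seeAlso",
)
print(template)
```

Keeping the issue URL in its own column means every asserted parent can be traced back to the review ticket.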

2. Obsolete Terms

  1. Go to relevant GitHub ticket (for example, https://github.com/monarch-initiative/mondo/issues/6739)
  2. Copy and paste the table into a new tab in the spreadsheet (for example, see here)
  3. Create a new column with the CURIEs (see column C)
  4. Create a new file in mondo/src/ontology/config/ named obsolete_me.txt
  5. Copy and paste the CURIEs into obsolete_me.txt and save
  6. Run sh run.sh make mass_obsolete2 -B GITHUB_ISSUE_URL=GITHUB-ISSUE-URL, replacing GITHUB-ISSUE-URL with the URL of the relevant ticket, for example https://github.com/monarch-initiative/mondo/issues/6739 (the value does not need to be in quotes)
  7. Note: some terms may already have been obsoleted (e.g. in the context of another branch). The pipeline skips these terms when it runs.
    1. Determine the list of terms that were already obsoleted. You can do this by comparing the file obsolete_me.txt (which contains all the terms to obsolete) with the file filtered_obsolete_me.txt (from which the already obsoleted terms have been removed).
    2. Add the GH issue tracker to the terms that were already obsoleted (either manually or by using a ROBOT template)
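
The comparison in step 7 amounts to a set difference between the two files. The sketch below assumes both files hold one CURIE per line; the helper names and the identifiers in the demo are hypothetical, and the demo writes temporary copies rather than touching the real files.

```python
import tempfile
from pathlib import Path


def read_curies(path):
    """Read one CURIE per line, ignoring blank lines and surrounding whitespace."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def already_obsoleted(all_terms_path, filtered_path):
    """Terms listed in obsolete_me.txt but absent from filtered_obsolete_me.txt."""
    filtered = set(read_curies(filtered_path))
    return [curie for curie in read_curies(all_terms_path) if curie not in filtered]


# Demo with made-up identifiers in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    all_path = Path(tmp) / "obsolete_me.txt"
    filtered_path = Path(tmp) / "filtered_obsolete_me.txt"
    all_path.write_text("EX:0000001\nEX:0000002\n\nEX:0000003\n")
    filtered_path.write_text("EX:0000001\nEX:0000003\n")
    skipped = already_obsoleted(all_path, filtered_path)

print(skipped)
```

The resulting list preserves the order of obsolete_me.txt, which makes it easy to paste back into the spreadsheet.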

3. Review changes in Protege

  1. Check the branch and review changes for obsoleted terms and orphaned terms:
    • Spot check a few terms to ensure they were properly obsoleted and have the correct annotations
    • Spot check a few terms to make sure they are assigned the correct parent (per the ROBOT template) and have the correct source annotation(s)
    • Check the top-level disease branch to ensure it has only two subclasses: 'human disease' and 'non-human disease'. If any other classes appear there, assert new superclasses for those terms. Note in the PR if you are uncertain about a superclass assertion and would like additional review.
  2. Run the reasoner and make sure there are no unsatisfiable classes.
  3. Commit the changes, create a PR, and assign another curator to review it

Review PR

  1. See review spreadsheet here