Create Stats based on Mondo Release Tags¶
Various statistics can be generated for the Mondo release tag asset mondo.owl. This document describes these statistics and how they can be generated.
Prerequisites¶
The statistics are run as make goals within the Ontology Development Kit (ODK), so ODK and Docker are required, just as for any other make goal used in Mondo development tasks, for example the command that merges a ROBOT template into Mondo (sh run.sh make merge_template).
The Community/GitHub Issue statistics can be run as a GitHub Action or from the command line. Running them from the command line requires Python and your GitHub token (this goal does not run within ODK).
By default, statistics generated from the command line are run on the most recent tagged release of Mondo, determined by date. To generate statistics from the content of a specific tagged release instead, supply its release tag, e.g. v2025-02-04.
These statistics are currently (2-Apr-2025) being manually saved for each tagged release in the Mondo Monthly statistics Google Sheet.
General Statistics¶
The General statistics include the following counts:
- Total number of human diseases
- Total number of non-human diseases
- Total number of human diseases in the rare subset
- Total number of genetic diseases (human)
- Total number of infectious diseases (human)
- Total number of cancer diseases (human)
- Total number of genetic diseases (non-human)
- Total number of infectious diseases (non-human)
- Total number of cancer diseases (non-human)
Usage:
The General statistics can be run on the most recent tagged release as:
sh run.sh make create-general-mondo-stats-all
The General statistics can be run on a specific tagged release as:
sh run.sh make MONDO_TAG=v2025-02-04 create-general-mondo-stats-all
Rare Subset Statistics¶
The Rare subset statistics include counts of the following rare subsets:
- rare
- nord rare
- gard rare
- orphanet rare
- inferred rare
- mondo rare
Usage:
The Rare statistics can be run on the most recent tagged release as:
sh run.sh make create-rare-mondo-stats-all
The Rare statistics can be run on a specific tagged release as:
sh run.sh make MONDO_TAG=v2025-02-04 create-rare-mondo-stats-all
Synonym Statistics¶
The Synonym statistics include counts of the following:
- exact
- narrow
- broad
- related
Within the set of exact synonyms, counts are also generated by the source(s) of each exact synonym. The sources counted are limited to: OMIM, Orphanet, NCIT, DOID, ICD10CM, icd11.foundation.
Usage:
The Synonym statistics can be run on the most recent tagged release as:
sh run.sh make create-synonym-mondo-stats-all
The Synonym statistics can be run on a specific tagged release as:
sh run.sh make MONDO_TAG=v2025-02-04 create-synonym-mondo-stats-all
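The General, Rare, and Synonym goals share the same invocation pattern, with MONDO_TAG as an optional variable. As a minimal sketch of that pattern, the snippet below only assembles the command strings; the helper name stats_commands is hypothetical and not part of the Mondo repository, and whether the three goals should be run back to back is left to the reader.

```python
import shlex

# Goal names as documented in the sections above.
STATS_GOALS = [
    "create-general-mondo-stats-all",
    "create-rare-mondo-stats-all",
    "create-synonym-mondo-stats-all",
]


def stats_commands(mondo_tag=None):
    """Return one shell command string per statistics goal.

    mondo_tag: optional release tag such as "v2025-02-04". When omitted,
    no MONDO_TAG variable is passed, so the goal uses its default
    (the most recent tagged release).
    """
    commands = []
    for goal in STATS_GOALS:
        argv = ["sh", "run.sh", "make"]
        if mondo_tag is not None:
            argv.append(f"MONDO_TAG={mondo_tag}")
        argv.append(goal)
        commands.append(shlex.join(argv))
    return commands


# Hypothetical helper: prints the commands rather than executing them.
for command in stats_commands("v2025-02-04"):
    print(command)
```

Printing rather than executing keeps the sketch independent of Docker and ODK; the printed lines can be pasted into a terminal at the usual working directory.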
Community/GitHub Issue Statistics¶
The Community/GitHub Issue statistics include counts of new and closed issues between two calendar dates, together with a count of every issue label within each of those two sets. A list of unique GitHub handles for the new and closed tickets is also generated, along with the count of their unique GitHub labels across these opened and closed tickets.
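To make the shape of these counts concrete, here is a minimal sketch of tallying labels and collecting unique handles. The sample records are hypothetical and only mimic a small subset of the fields on a GitHub REST API issue object (user.login and labels[].name); the summarize helper is illustrative and is not the script behind the make goal.

```python
from collections import Counter

# Hypothetical sample data, shaped like a subset of GitHub issue objects.
issues = [
    {"number": 1, "user": {"login": "alice"},
     "labels": [{"name": "New term request"}]},
    {"number": 2, "user": {"login": "bob"},
     "labels": [{"name": "New term request"}, {"name": "rare"}]},
    {"number": 3, "user": {"login": "alice"}, "labels": []},
]


def summarize(issues):
    """Return (label counts, sorted unique handles) for a set of issues."""
    label_counts = Counter(
        label["name"] for issue in issues for label in issue["labels"]
    )
    handles = sorted({issue["user"]["login"] for issue in issues})
    return label_counts, handles


label_counts, handles = summarize(issues)
print(label_counts.most_common())
print(handles)
```

The same function would be applied once to the set of new issues and once to the set of closed issues.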
By default, the dates used to generate these statistics are the two most recent Mondo tagged release dates as of the day the statistics are generated. For example, if today is 28-Mar-2025, the two most recent Mondo tagged release dates are 2025-03-04 and 2025-02-04 (see Mondo tagged releases). The date parameters can be overridden if needed (see below for details on how to do this).
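The default date selection can be sketched as follows, assuming release tags follow the vYYYY-MM-DD pattern seen in this document. The function name and the tag list are illustrative only (the first tag in the list is a made-up placeholder); how the real goal obtains the tags is not shown here.

```python
from datetime import date


def last_two_release_dates(tags, today):
    """Return the two most recent release dates on or before `today`,
    newest first. Tags are assumed to look like "v2025-02-04"."""
    dates = sorted(
        (date.fromisoformat(tag.lstrip("v")) for tag in tags),
        reverse=True,
    )
    eligible = [d for d in dates if d <= today]
    if len(eligible) < 2:
        raise ValueError("need at least two releases on or before the given date")
    return eligible[0], eligible[1]


# Illustrative tag list; "v2025-01-01" is a hypothetical placeholder.
tags = ["v2025-01-01", "v2025-02-04", "v2025-03-04"]
newest, previous = last_two_release_dates(tags, date(2025, 3, 28))
print(previous, newest)
```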
NOTE: The GitHub UI shows issues according to the timezone in your system settings, whereas the data returned by the GitHub API, which is used to generate the Community/GitHub Issue statistics, is in UTC (GitHub Timezones and the REST API). Depending on your system settings, issue data filtered in the GitHub UI may therefore differ slightly from what the GitHub API returns, and hence from the Community Statistics reports.
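The effect of the timezone difference is easiest to see near midnight. The sketch below uses a hypothetical local timezone with a fixed UTC-08:00 offset and a made-up issue timestamp; it only illustrates how one instant can fall on two different calendar dates.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local timezone: a fixed UTC-08:00 offset.
local_tz = timezone(timedelta(hours=-8))

# A made-up issue created at 20:00 local time on 3 March 2025 ...
created_local = datetime(2025, 3, 3, 20, 0, tzinfo=local_tz)
# ... is the same instant as 04:00 UTC on 4 March 2025.
created_utc = created_local.astimezone(timezone.utc)

print(created_local.date())  # 2025-03-03
print(created_utc.date())    # 2025-03-04
```

An issue like this one would appear under 3 March in a UI using the local timezone but under 4 March in data keyed on UTC, which is enough to move it across a reporting boundary.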
Usage:
From the command line:
- Export your GitHub token as:
export GITHUB_TOKEN=<YOUR-GITHUB-TOKEN>
- Run the make goal as:
make github-issue-stats
NOTE: This is not run with ODK, therefore sh run.sh is not needed.
- Alternatively, to run with custom dates, e.g. from 2025-02-22 to 2025-03-01, use:
make github-issue-stats FROM_DATE=2025-02-22 TO_DATE=2025-03-01
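Before invoking the goal, it can help to confirm that the token is exported and the custom dates are well formed. The following is a rough pre-flight sketch, not part of the Mondo tooling: build_make_command is a hypothetical helper, and the rule that FROM_DATE and TO_DATE must be given together is an assumption of this sketch rather than documented behavior.

```python
import os
from datetime import date


def build_make_command(from_date=None, to_date=None, env=None):
    """Return the argv list for the github-issue-stats goal.

    Raises if GITHUB_TOKEN is missing or the dates are malformed.
    Assumption: custom dates are supplied as a pair or not at all.
    """
    env = os.environ if env is None else env
    if not env.get("GITHUB_TOKEN"):
        raise RuntimeError("GITHUB_TOKEN is not set; export it first")

    argv = ["make", "github-issue-stats"]
    if (from_date is None) != (to_date is None):
        raise ValueError("provide both FROM_DATE and TO_DATE, or neither")
    if from_date is not None:
        # date.fromisoformat rejects anything that is not YYYY-MM-DD.
        start = date.fromisoformat(from_date)
        end = date.fromisoformat(to_date)
        if start > end:
            raise ValueError("FROM_DATE must not be after TO_DATE")
        argv += [f"FROM_DATE={start.isoformat()}", f"TO_DATE={end.isoformat()}"]
    return argv


# Example with a placeholder environment instead of the real one.
print(build_make_command("2025-02-22", "2025-03-01",
                         env={"GITHUB_TOKEN": "example-token"}))
```

The returned list can be passed to subprocess.run from the directory where the make goal is normally run.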
From the GitHub Action:
- Go to the GitHub Action called Generate GitHub Issue Statistics
- Click on "Run workflow" and select the branch "master"
- By default, the GitHub Action uses the two most recent Mondo tagged release dates, since that is generally the time period of interest. However, custom dates can be used (see screenshot below).
- Click the green "Run workflow" button
- Once complete, scroll to the bottom of the page to find the "Artifacts" section and click on the artifact name to download the ZIP file(s) with the reports.
Examples of GitHub UI filters for Issues:
- Filter by created date:
is:issue created:2025-02-04..2025-03-04
- Filter by closed date:
is:issue closed:2025-02-04..2025-03-04
- Filter by closed date and label:
is:issue closed:2025-02-04..2025-03-04 label:"New term request"
Tips for searching by dates: GitHub - Query by dates
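The filters above follow a regular pattern, so they can be generated for any reporting window. A small sketch, where issue_query is a hypothetical helper name and the output is simply text to paste into the GitHub issue search box:

```python
from datetime import date


def issue_query(field, start, end, label=None):
    """Build a GitHub issue search string for a created/closed date range."""
    if field not in ("created", "closed"):
        raise ValueError("field must be 'created' or 'closed'")
    query = f"is:issue {field}:{start.isoformat()}..{end.isoformat()}"
    if label is not None:
        # Labels containing spaces must be wrapped in double quotes.
        query += f' label:"{label}"'
    return query


start, end = date(2025, 2, 4), date(2025, 3, 4)
print(issue_query("created", start, end))
print(issue_query("closed", start, end, label="New term request"))
```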
NOTE: An issue has a "created_at" value and, if it has been closed, a "closed_at" value, each holding a date (Issue event types - closed) and (REST API endpoints for issue events). There is no event type of "open"; instead, an issue has a "state" of "open" or "closed". Consequently, filtering for issues "created" between two dates can return issues whose state is either "open" or "closed".
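The distinction between the creation and closing dates and the current state can be illustrated with a few made-up records. This is a sketch under assumptions: the issue numbers and timestamps are hypothetical, the field names follow the GitHub REST API issue object, and the window is treated as inclusive on both ends and compared on UTC dates.

```python
from datetime import date, datetime, timezone

# Hypothetical issues using GitHub REST API field names.
issues = [
    {"number": 101, "state": "open",
     "created_at": "2025-02-10T12:00:00Z", "closed_at": None},
    {"number": 102, "state": "closed",
     "created_at": "2025-02-12T09:30:00Z", "closed_at": "2025-02-20T15:00:00Z"},
    {"number": 103, "state": "closed",
     "created_at": "2025-01-15T08:00:00Z", "closed_at": "2025-02-25T10:00:00Z"},
]


def utc_date(timestamp):
    """Parse an ISO 8601 timestamp ending in 'Z' and return its UTC date."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).date()


def in_window(timestamp, start, end):
    """True if the timestamp exists and falls within [start, end]."""
    return timestamp is not None and start <= utc_date(timestamp) <= end


start, end = date(2025, 2, 4), date(2025, 3, 4)
created = [i for i in issues if in_window(i["created_at"], start, end)]
closed = [i for i in issues if in_window(i["closed_at"], start, end)]

print([i["number"] for i in created])       # [101, 102]
print([i["number"] for i in closed])        # [102, 103]
print(sorted({i["state"] for i in created}))  # ['closed', 'open']
```

Issue 102 appears in both sets because it was created and closed within the window, while the created set mixes open and closed states, as the note describes.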
Ontology Change Statistics¶
To be added
Alignment Statistics¶
To be added
Third party maintained - xrefs Statistics¶
To be added
Third party maintained - other Statistics¶
To be added