karafecho edited this page Dec 11, 2023 · 3 revisions

General Description

imProving Agent is an Autonomous Reasoning Agent built on top of the Scalable Precision Medicine Oriented Knowledge Engine (SPOKE) and is part of the NCATS Biomedical Data Translator Network. SPOKE is a large, precomputed knowledge graph: a graph database that contains >25M "concepts" (the nodes) and >50M "factual" relationships (the edges) between the nodes.

It aims to improve user queries by using data from electronic health records (EHR), clinical data sets, and multi-omic studies of large cohorts to extract the empirical relevance of a given concept (node) for a particular context (e.g. the disease contained in a query).

Use these links to find out more about imProving Agent's data, its algorithms, or some of its multi-omic cohort data.

Description of inference

imProving Agent employs simple algorithms to respond to the Translator queries "Which drugs may treat <disease>?" and "Which compounds affect activity of <gene>?"

Which drugs may treat <disease>?

For this query, imProving Agent first queries the SPOKE KP for known treatments for <disease> at any clinical trial phase; this knowledge has been extracted from ChEMBL and DrugCentral. These known treatments are ranked with our normal ranking algorithm and appear at the top of the results.

Next, imProving Agent determines an empirical relevance value, the Propagated SPOKE Entry Vector (PSEV), for the <disease> in the question. (The PSEV values of the nodes were precomputed by training the SPOKE KG on the feature vectors of patients in the aforementioned cohorts, which also contain disease labels.) Given the PSEV, imProving Agent finds compounds that are closely associated with the <disease> of interest. SPOKE then scores these compounds with a penalty for lack of clinical trial data and appends them to the list of results recovered in the first step.

Unfortunately, this algorithm breaks down when no PSEV can be found for a queried disease identifier. This may occur when a disease is rare enough that sufficient cases weren’t available in the empirical cohort data for training, when there is no Disease Ontology identifier for the disease, or when the disease has been newly described since the PSEVs were trained.
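The two-step ordering described above can be sketched as follows. This is a minimal illustration, not imProving Agent's actual code: the function name `rank_treatments`, the `Result` type, the multiplicative `penalty` factor, and the choice to return only the known treatments when no PSEV exists are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    compound: str
    score: float
    source: str  # "known_treatment" or "psev"


def rank_treatments(
    known: Dict[str, float],
    psev: Optional[Dict[str, float]],
    penalty: float = 0.5,  # hypothetical penalty for missing clinical trial data
) -> List[Result]:
    """Return known treatments first, then penalized PSEV-derived compounds."""
    # Step 1: known treatments, ranked by their (already computed) score.
    results = [
        Result(c, s, "known_treatment")
        for c, s in sorted(known.items(), key=lambda kv: kv[1], reverse=True)
    ]
    if psev is None:
        # Assumed fallback: with no PSEV for the disease, stop after step 1.
        return results
    # Step 2: PSEV-associated compounds not already listed, scored with a penalty.
    inferred = [
        Result(c, s * penalty, "psev") for c, s in psev.items() if c not in known
    ]
    inferred.sort(key=lambda r: r.score, reverse=True)
    return results + inferred


# Toy inputs; compound names and scores are made up.
ranked = rank_treatments(
    known={"drugA": 0.9, "drugB": 0.7},
    psev={"drugA": 0.8, "drugC": 0.6, "drugD": 0.2},
)
print([r.compound for r in ranked])
```

Keeping the PSEV-derived compounds in a separate, appended list means a penalized inferred result can never outrank a treatment with clinical trial evidence, which matches the ordering described in the text.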

Which compounds may affect activity of <gene>?

(Note: this section covers all variants of this question.)

First, imProving Agent will search for compounds that are known to directly affect the expression of a gene. This information comes from LINCS L1000 experiments. These answers are ranked first.

Next, imProving Agent uses a simple algorithm to search for a two-hop connection from compounds to the gene of interest using the following combinations:

[Figure: imProving Agent's two-hop logic for the Compound affects Gene query]

Given the above, imProving Agent will assert an "affects" edge directly between <compound> and <gene>. These results are scored by imProving Agent’s ranking algorithm and appended to the results of the direct query in the first step.
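As a rough sketch of how a two-hop path can be collapsed into an asserted "affects" edge, consider the example below. The specific edge-type combinations imProving Agent accepts are the ones given in its two-hop logic figure and are not reproduced here; the predicate names (`BINDS`, `REGULATES`, `INTERACTS`), the node names, and the function `infer_affects` are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def infer_affects(
    edges: Iterable[Edge],
    gene: str,
    compounds: Set[str],
    allowed: Set[Tuple[str, str]],
) -> List[Edge]:
    """Assert compound -affects-> gene for each allowed two-hop path."""
    edges = list(edges)
    # Index the predicates of second-hop edges that end at the gene of interest.
    into_gene: Dict[str, List[str]] = {}
    for subj, pred, obj in edges:
        if obj == gene:
            into_gene.setdefault(subj, []).append(pred)
    inferred = set()
    for subj, first_pred, mid in edges:
        if subj not in compounds:
            continue  # the first hop must start at a compound
        for second_pred in into_gene.get(mid, []):
            if (first_pred, second_pred) in allowed:
                inferred.add((subj, "affects", gene))
    return sorted(inferred)


# Toy graph; all identifiers and predicates are hypothetical.
toy_edges = [
    ("cmpd1", "BINDS", "protA"),
    ("protA", "REGULATES", "geneX"),
    ("cmpd2", "BINDS", "protB"),
    ("protB", "INTERACTS", "geneX"),
]
print(infer_affects(toy_edges, "geneX", {"cmpd1", "cmpd2"}, {("BINDS", "REGULATES")}))
```

Passing the permitted predicate pairs in as data keeps the traversal generic, so the real combinations can be substituted without changing the search.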

Description of imProving Agent’s resources

Scalable Precision Medicine Oriented Knowledge Engine (SPOKE)

imProving Agent primarily uses SPOKE (available in Translator via SPOKE-KP) as its knowledge source. SPOKE is a biomedical knowledge graph combining and connecting dozens of different data sources. You can read more about SPOKE here and visit the project page.

Propagated SPOKE Entry Vectors (PSEVs)

These vectors are embeddings of empirical cohort data onto the SPOKE knowledge graph. In brief, the algorithm uses multi-node biased random walks to simulate graph diffusion and compute the relevance, expressed as PSEV_Q(i), of every node i (prescribed drugs, symptoms, diseases, etc.) in SPOKE for a concept Q of interest. It proceeds as follows:

  • Patient cohorts for a given concept Q (disease, drug, etc.) are identified in the EHR or other clinical data sets.
  • Data from these cohorts are de-identified, extracted, and translated into identifiers that map to SPOKE.
  • Occurrence rates of different concepts i are used as weights to kick off many random walks across the SPOKE graph.
  • The number of times x_Q(i) that a given node i is visited is recorded during the random walks. The vector of the normalized frequency x_Q(i) for all nodes i makes up the resulting embedding vector for the concept Q.
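The last two steps above can be illustrated with a toy random walk. This is a simplified sketch rather than the published PSEV implementation: the restart probability, the step count, the adjacency-list graph format, and the name `psev_sketch` are assumptions chosen for the example.

```python
import random
from collections import Counter
from typing import Dict, List


def psev_sketch(
    graph: Dict[str, List[str]],
    entry_weights: Dict[str, float],
    n_steps: int = 10_000,
    restart: float = 0.3,  # assumed restart probability, not from the source
    seed: int = 0,
) -> Dict[str, float]:
    """Normalized visit frequency of each node under a biased random walk."""
    rng = random.Random(seed)
    entries = list(entry_weights)
    weights = [entry_weights[e] for e in entries]
    visits: Counter = Counter()
    # Start at an entry node, sampled in proportion to its occurrence rate.
    node = rng.choices(entries, weights=weights)[0]
    for _ in range(n_steps):
        visits[node] += 1
        neighbors = graph.get(node, [])
        if not neighbors or rng.random() < restart:
            # Jump back to a weighted entry node (the "bias" of the walk).
            node = rng.choices(entries, weights=weights)[0]
        else:
            node = rng.choice(neighbors)
    nodes = set(graph) | set(visits)
    return {n: visits[n] / n_steps for n in nodes}


# Toy graph and occurrence rates; every name and number here is made up.
toy_graph = {
    "fever": ["diseaseQ"],
    "rash": ["diseaseQ"],
    "diseaseQ": ["fever", "rash", "drugX"],
    "drugX": ["diseaseQ"],
}
vector = psev_sketch(toy_graph, {"fever": 0.9, "rash": 0.1})
print(sorted(vector.items(), key=lambda kv: kv[1], reverse=True))
```

Because every step records exactly one visit, the returned frequencies sum to 1, and nodes near frequently observed entry concepts receive higher values than distant ones.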

You can read more about the PSEV algorithm here.

Translator Reasoner API

Supports TRAPI 1.4
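For orientation, a one-hop TRAPI query graph for "Which drugs may treat <disease>?" might be built as shown below. This is a sketch under assumptions: the helper `build_treats_query`, the choice of `biolink:ChemicalEntity` as the drug category, and the example CURIE are illustrative and not taken from imProving Agent's documentation, which may expect different categories, predicates, or extra edge properties.

```python
import json
from typing import Any, Dict


def build_treats_query(disease_curie: str) -> Dict[str, Any]:
    """Build a minimal TRAPI message asking what treats the given disease."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Unpinned node: the drugs or compounds to be returned.
                    "n0": {"categories": ["biolink:ChemicalEntity"]},
                    # Pinned node: the disease of interest.
                    "n1": {
                        "ids": [disease_curie],
                        "categories": ["biolink:Disease"],
                    },
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:treats"],
                    }
                },
            }
        }
    }


# Illustrative CURIE; substitute the identifier of the disease of interest.
query = build_treats_query("DOID:9352")
print(json.dumps(query, indent=2))
```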

Knowledge Providers Accessed

  • Multiomics Provider - BigGIM, Disease Risk Models, COVID-19
  • COHD
  • Text Mining
  • Genetics Provider (planned pending TRAPI 1.0)

Source Code

imProving Agent

https://github.com/suihuanglab/improving-agent

  • Takes a query from the ARS and extracts a graph q (the output graph) from its internal Knowledge Network (SPOKE)
  • Queries KPs for relevant edge metadata that can be used in ranking, e.g. EHR co-occurrence
  • Checks empirical evidence from raw data of cohorts (EHR and multi-omics studies)

SPOKE PSEVs:

https://github.com/baranzini-lab/PSEV/

External Documentation
