JSON Knowledge Graph (JKG) is a JSON specification for working with general knowledge graphs--specifically, property graphs.
JKG serves two roles:
- It is a data exchange format that allows both import into and extraction from a knowledge graph.
- It implements a data model for a general knowledge graph.
The JKG was originally developed to support the particular use case of knowledge graphs of discrete, encoded biomedical data. However, the JKG should support knowledge graphs of any data that can be represented as assertions involving triplets--i.e., statements that link entities through relationships in the format
subject predicate object
A JKG JSON is a JSON file that conforms to the JKG specification.
JKG specifies the entities (nodes), relationships (edges), and properties of a knowledge graph. JKG makes no assumptions about the platform of the graph database that might host a knowledge graph, aside from the following:
- The platform supports JSON files.
- The platform supports the basic elements of a property knowledge graph (nodes, edges, and properties).
Implementations of the JKG in a particular graph database platform, of course, require integration with the architecture of the platform, including the methods by which data can be exchanged.
An implementation of JKG specific to neo4j is available in the jkg-neo4j repository.
A JSON that conforms to the JKG specification will contain the general information needed to instantiate a knowledge graph. This information includes categorical valuesets of reference information used for labels or classes of information.
It is possible to validate a JSON file against the JKG schema. Validated JKG JSON files should import into a graph database without issues such as violations of referential integrity.
The jkg-neo repository includes an application (validate_jkg_json) that validates a JKG JSON file against the JKG Schema. Although jkg-neo is a neo4j implementation of JKG, the validation application is a Python application that is independent of any graph database platform. The repository includes documentation that describes the forms of JKG validation.
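As a rough illustration of what a referential integrity check involves, the following Python sketch reports relationship endpoints that do not match any node. It is not the validate_jkg_json application: the function name find_dangling_rels and the sample data are hypothetical, and the sketch assumes that a JKG JSON has top-level keys named nodes and rels, with node and rel objects laid out as in the examples in this document.

```python
def find_dangling_rels(jkg):
    """List rel endpoints whose id matches no node id in the same JKG JSON."""
    # Collect the id of every node object, whatever its labels.
    node_ids = {node["properties"]["id"] for node in jkg.get("nodes", [])}
    problems = []
    for index, rel in enumerate(jkg.get("rels", [])):
        for side in ("start", "end"):
            endpoint_id = rel[side]["properties"]["id"]
            if endpoint_id not in node_ids:
                problems.append((index, side, endpoint_id))
    return problems


# Hypothetical input: one Concept node and a CODE rel whose Term node is missing.
example = {
    "nodes": [
        {"labels": ["Concept"], "properties": {"id": "UMLS:C1183212"}},
    ],
    "rels": [
        {
            "label": "CODE",
            "start": {"properties": {"id": "UMLS:C1183212"}},
            "end": {"properties": {"id": "Set of straight venules of kidney"}},
            "properties": {"sab": "FMA", "def": "", "tty": "PT", "codeid": "FMA:72007"},
        }
    ],
}

dangling = find_dangling_rels(example)
print(dangling)
```

A stricter check could also require that the start of every rel is a Concept node and that the end of a CODE rel is a Term node; the sketch only tests that the ids exist.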
A knowledge graph that is based on the JKG Schema will implement the JKG data model.
A JKG JSON consists of two arrays:
- a nodes array of node objects
- a rels array of relationship objects (rel objects)
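The top-level layout can be sketched in Python as follows. This is a minimal illustration that assumes the two arrays are stored under top-level keys literally named nodes and rels; the helper name empty_jkg is hypothetical.

```python
import json


def empty_jkg():
    """Return the two top-level arrays of a JKG JSON, both empty."""
    return {"nodes": [], "rels": []}


jkg = empty_jkg()
# Add a single Term node object, in the shape used by the examples in this document.
jkg["nodes"].append(
    {"labels": ["Term"], "properties": {"id": "Straight venules of kidney"}}
)

# Serialize to JSON text and read it back to confirm the round trip is lossless.
serialized = json.dumps(jkg, indent=2)
restored = json.loads(serialized)
```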
An important characteristic of the JKG data model is that it derives from its structure instead of being defined by its structure.
Although JKG can model a knowledge graph that links entities by relationships, it actually represents one of its fundamental entities (the code) as a relationship with properties.
The entities of the JKG model are concepts, codes, and terms.
A concept is a discrete codification of an idea--or, as the UMLS Reference Manual describes it, a "meaning".
For example, the UMLS (a source of data for a particular JKG implementation) represents the idea "Set of straight venules of kidney" with a concept that has the identifier C1183212.
A code is a representation of a concept in a particular vocabulary. Vocabularies include ontologies and databases, such as those in the OBO Foundry.
For example, the Foundational Model of Anatomy Ontology (FMA) represents
the concept UMLS:C1183212 (Set of straight venules of kidney) with the code 72007.
Uberon, on the other hand, represents the same concept with the code 0010181.
As described in Relationships below, a code is not represented as a node, but by the properties of a relationship.
A term is a "lexical variant" of a string descriptor for either a concept or code. For example, the code UBERON:0010181 can be described with terms that include "straight venules of kidney"; "venulae rectae"; and "venula rectae renis".
There are two types of relationships in the JKG model: CODE relationships and relationships between concepts.
CODE relationships (also known as coderels in JKG) link a concept with a code (a representation of the concept in a source).
Although a code is an entity in JKG, it is not represented with a type of node; instead, it is represented as a set of properties of a CODE relationship between a concept entity and one of the code's term entities.
Relationships between concepts correspond to the common understanding of "relationship" (or predicate) in an ontology assertion. For example, the concept linked to the code for UBERON:0010181 has an isa relationship with the concept linked to the code for UBERON:0006544.
JKG features two basic types of node objects:
- Entity node objects that correspond directly to entities in the JKG data model.
- Reference node objects that correspond to information used to specify entity node objects, such as labels and categories.
A Source node is a reference node object that describes a source of encoded data used to populate a JKG JSON. Types of sources include:
- ontologies (e.g., OWL files), such as those in the OBO Foundry or NCBO BioPortal
- vocabularies, such as those maintained in the National Library of Medicine's UMLS Metathesaurus
- online repositories, such as UniProtKB
Keys of the Source node object are:
labels: This is always ["Source"].
properties: A nested object (dict) containing key/value pairs:
id: An identifier for the source in the format owner:identifier
| Type of source | owner | identifier |
|---|---|---|
| UMLS vocabulary | UMLS | versioned source identifier (VSAB) |
| non-UMLS vocabulary | SAB | SAB |
SAB is the Source ABbreviation, an acronym that identifies the source.
name: Short name for the source
sab: Source ABbreviation for the source
description: Description of the source
source_version: Version identifier for the source. Version information can take a variety of forms, including:
- official version identifier
- the release date of a source file
- the download date
srl: For UMLS sources, the UMLS Metathesaurus Source Restriction Level.
ttyl: A list of the UMLS term types used by the source. Term types are acronyms used to categorize terms used to describe a concept or code--e.g., PT for preferred term; SY for synonym; etc. UMLS term types are different from the JKG Term node object type.
source: A URL to a page maintained by the source owner
Following are examples of both a non-UMLS source (Uberon) and a UMLS source (SNOMEDCT_US).
{
"labels":["Source"],
"properties":{
"id":"UBERON:UBERON",
"name":"Uberon",
"description":"Uberon multi species anatomy ontology",
"sab":"UBERON",
"source_version":"2025-JAN-15",
"source":"http://purl.obolibrary.org/obo/uberon/uberon-base.owl"
}
},
{
"labels":["Source"],
"properties":{
"id":"UMLS:SNOMEDCT_US_2025_09_01",
"name":"US Edition of SNOMED CT, 2025_09_01",
"sab":"SNOMEDCT_US",
"source_version":"2025_09_01",
"srl":9,
"ttyl":["FN","IS","MTH_FN","MTH_IS","MTH_OAF","MTH_OAP","MTH_OAS","MTH_OF","MTH_OP","MTH_PT","MTH_PTGB","MTH_SY","MTH_SYGB","OAF","OAP","OAS","OF","OP","PT","PTGB","SB","SY","SYGB","XM"]
}
}
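The owner:identifier rule for Source ids can be sketched in Python as follows. The function names make_source_id and make_source_node are hypothetical and not part of any JKG tooling. The VSAB is passed in explicitly because this document does not describe how a VSAB is derived from a SAB.

```python
def make_source_id(sab, vsab=None):
    """Build a Source id in the format owner:identifier."""
    if vsab is not None:
        # UMLS vocabulary: the owner is "UMLS" and the identifier is the VSAB.
        return f"UMLS:{vsab}"
    # Non-UMLS vocabulary: the SAB is both owner and identifier.
    return f"{sab}:{sab}"


def make_source_node(sab, name, source_version, vsab=None, **extra):
    """Assemble a Source node object; extra holds optional keys such as srl or ttyl."""
    properties = {
        "id": make_source_id(sab, vsab),
        "name": name,
        "sab": sab,
        "source_version": source_version,
    }
    properties.update(extra)
    return {"labels": ["Source"], "properties": properties}


uberon = make_source_node(
    "UBERON", "Uberon", "2025-JAN-15",
    description="Uberon multi species anatomy ontology",
    source="http://purl.obolibrary.org/obo/uberon/uberon-base.owl",
)
snomed = make_source_node(
    "SNOMEDCT_US", "US Edition of SNOMED CT, 2025_09_01", "2025_09_01",
    vsab="SNOMEDCT_US_2025_09_01", srl=9,
)
```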
A Node_Label is a reference node object that describes a string used as an additional label for Concept node objects.
The set of Node_Label node objects currently corresponds to a subset of the semantic types of the UMLS Semantic Network.
Examples of node labels include "Laboratory Procedure" and "Substance".
Keys of the Node_Label node object are:
labels: This is always ["Node_Label"].
properties: A nested object (dict) containing key/value pairs:
id: The UMLS identifier of the semantic type to which the Node_Label corresponds, in the format UMLS:TUI, where TUI is a Type Unique Identifier.
def: The definition of the Node_Label
node_label: The string for the Node_Label
sab: The SAB of the source
{
"labels":["Node_Label"],
"properties":{
"id":"UMLS:T167",
"def":"A material with definite or fairly definite chemical composition.",
"node_label":"Substance",
"sab":"UMLS"
}
}
A Rel_Label reference node object describes a string used as a label for a relationship.
Keys of Rel_Label node objects include:
labels: This is always ["Rel_Label"].
properties: An object (dict) containing key/value pairs:
id: The identifier for the Rel_Label node object. The default format concatenates the SAB with the relationship label string.
def: The definition of the Rel_Label node object
rel_label: The relationship label string
sab: The SAB for the source for which the relationship label is valid
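The default id format can be sketched in Python as follows. The function name make_rel_label_node is hypothetical. The colon separator and the fallback of using the label string as the definition are assumptions, inferred from the allele_has_abnormality example in this document.

```python
def make_rel_label_node(sab, rel_label, definition=None):
    """Assemble a Rel_Label node object with the default id format."""
    return {
        "labels": ["Rel_Label"],
        "properties": {
            # Assumption: SAB and label string are joined with a colon.
            "id": f"{sab}:{rel_label}",
            # Assumption: fall back to the label string when no definition is given.
            "def": definition if definition is not None else rel_label,
            "rel_label": rel_label,
            "sab": sab,
        },
    }


rel_label_node = make_rel_label_node("UMLS", "allele_has_abnormality")
```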
{
"labels":["Rel_Label"],
"properties":{
"id":"UMLS:allele_has_abnormality",
"def":"allele_has_abnormality",
"rel_label":"allele_has_abnormality",
"sab":"UMLS"}}
A Concept node object represents a concept entity in the JKG model.
Keys of Concept node objects include:
labels: A list that always contains the value "Concept" and may also contain one of the Node_Label values (e.g., "Substance").
properties: A nested object (dict) containing key/value pairs:
id: The Concept Unique Identifier (CUI) for the concept. The JKG extends the UMLS notion of CUI to non-UMLS vocabularies.
- UMLS CUIs are alphanumeric strings starting with "C"--e.g., C1183212.
- CUIs for non-UMLS sources concatenate a code from the vocabulary with " CUI"--e.g., UBERON:8600004 CUI.
pref_term: The preferred term for the concept. A concept has only one preferred term.
sab: The source of the concept--i.e., either "UMLS" or a source SAB.
{"labels":["Concept","Laboratory Procedure"],
"properties":{
"id":"UMLS:C2237094",
"pref_term":"arterial blood gases % oxygen saturation left atrium (lab test)",
"sab":"UMLS"
}
}
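The CUI convention for non-UMLS sources and the labels rule for Concept nodes can be sketched in Python as follows. The function names cui_for_code and make_concept_node are hypothetical, and the preferred term in the usage lines is a placeholder rather than real Uberon data.

```python
def cui_for_code(code_id):
    """Build a CUI for a non-UMLS code by appending " CUI" to the code."""
    return f"{code_id} CUI"


def make_concept_node(cui, pref_term, sab, node_label=None):
    """Assemble a Concept node object with an optional Node_Label value."""
    labels = ["Concept"]
    if node_label is not None:
        labels.append(node_label)
    return {
        "labels": labels,
        "properties": {"id": cui, "pref_term": pref_term, "sab": sab},
    }


# Placeholder preferred term, for illustration only.
concept = make_concept_node(
    cui_for_code("UBERON:8600004"), "placeholder preferred term", "UBERON"
)
lab_test = make_concept_node(
    "UMLS:C2237094",
    "arterial blood gases % oxygen saturation left atrium (lab test)",
    "UMLS",
    node_label="Laboratory Procedure",
)
```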
A Term node represents the Term entity in the JKG model. A Term node object describes a string that can be used as:
- a preferred term for a concept
- a term for a CODE relationship (coderel)
Keys of Term node objects include:
labels: This is always ["Term"].
properties: An object (dict) containing a single key, id, for which the value is a term string.
{"labels":["Term"],
"properties":{
"id":"arterial blood gases % oxygen saturation left atrium (lab test)"
}
}
Rel objects represent the relationships of the JKG model.
There are two types of rel objects:
- coderel objects that link concepts to terms (and define codes)
- all other rel objects that link concepts
Although a coderel object links a Concept node to a Term node in the JKG structure, it also represents the code entity of the JKG model.
Keys of coderel objects include:
label: This is always "CODE".
A nested object (dict) containing key/value pairs:
start: A nested object (dict) that represents the origin of the code relationship--i.e., a concept. The start object contains a properties object with an id key for which the value is the CUI of the concept.
end: A nested object (dict) that represents the terminus of the code relationship--i.e., a term. The end object contains a properties object with an id key for which the value is a term linked to a code--i.e., the id of a Term node of a particular term type.
properties: A nested object that represents the code entity of the JKG model. Keys of the properties object include:
sab: The SAB for the code
def: The definition for the code
codeid: The ID of the code in its vocabulary. codeid is in the format SAB:code--e.g., FMA:72007.
tty: The UMLS term type of the term as it relates to its code. For example, a tty of "PT" identifies the Preferred Term for a code.
Following are coderels that link the concept with CUI UMLS:C1183212 to two terms of the code FMA:72007: the code's preferred term (Set of straight venules of kidney) and one of its synonyms (Straight venules of kidney).
{
"label":"CODE",
"end":{
"properties":{
"id":"Set of straight venules of kidney"
}
},
"properties":{
"sab":"FMA",
"def":"",
"tty":"PT",
"codeid":"FMA:72007"
},
"start":{
"properties":{
"id":"UMLS:C1183212"
}
}
},
{
"label":"CODE",
"properties":{
"sab":"FMA",
"def":"",
"tty":"SY",
"codeid":"FMA:72007"},
"start":{
"properties":{
"id":"UMLS:C1183212"}
},
"end":{
"properties":{
"id":"Straight venules of kidney"
}
}
}
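Because a code exists only as properties of coderels, recovering the code entity means grouping coderels by code ID. The following Python sketch shows one way to do this. The function names make_coderel and codes_from_coderels are hypothetical, and the key name codeid follows the examples in this document.

```python
def make_coderel(cui, term, sab, codeid, tty, definition=""):
    """Assemble a coderel object linking a concept (start) to a term (end)."""
    return {
        "label": "CODE",
        "properties": {"sab": sab, "def": definition, "tty": tty, "codeid": codeid},
        "start": {"properties": {"id": cui}},
        "end": {"properties": {"id": term}},
    }


def codes_from_coderels(rels):
    """Group coderels by codeid to recover each code entity."""
    codes = {}
    for rel in rels:
        if rel.get("label") != "CODE":
            continue  # concept-concept rels do not define codes
        props = rel["properties"]
        code = codes.setdefault(
            props["codeid"], {"sab": props["sab"], "cuis": set(), "terms": {}}
        )
        code["cuis"].add(rel["start"]["properties"]["id"])
        # Index the code's terms by UMLS term type (PT, SY, ...).
        code["terms"].setdefault(props["tty"], []).append(rel["end"]["properties"]["id"])
    return codes


coderels = [
    make_coderel("UMLS:C1183212", "Set of straight venules of kidney", "FMA", "FMA:72007", "PT"),
    make_coderel("UMLS:C1183212", "Straight venules of kidney", "FMA", "FMA:72007", "SY"),
]
codes = codes_from_coderels(coderels)
```

Here the two coderels collapse into a single entry for FMA:72007 that is linked to one concept and carries one preferred term and one synonym.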
A concept-concept rel object represents a relationship in the JKG model.
Keys of concept-concept rel objects include:
label: A relationship label string that corresponds to a Rel_Label object.
An object (dict) containing key/value pairs:
start: A nested object (dict) that represents the origin of the relationship--a concept. The start object contains a properties object with an id key for which the value is the CUI of the originating concept.
end: A nested object (dict) that represents the terminus of the relationship--a concept. The end object contains a properties object with an id key for which the value is the CUI of the terminating concept.
{
"label":"isa",
"properties":{
"sab":"UBERON"},
"start":{
"properties":{
"id":"UBERON:0011153 CUI"}
},
"end":{
"properties":{
"id":"UBERON:0010912 CUI"}
}
}
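A concept-concept rel reads as a subject predicate object assertion: start is the subject, label is the predicate, and end is the object. The following Python sketch builds such a rel and looks up the objects of a given subject and predicate. The function names make_concept_rel and related_cuis are hypothetical.

```python
def make_concept_rel(label, sab, start_cui, end_cui):
    """Assemble a concept-concept rel object."""
    return {
        "label": label,
        "properties": {"sab": sab},
        "start": {"properties": {"id": start_cui}},
        "end": {"properties": {"id": end_cui}},
    }


def related_cuis(rels, start_cui, label):
    """Return the CUIs reached from start_cui through rels with the given label."""
    return [
        rel["end"]["properties"]["id"]
        for rel in rels
        if rel["label"] == label and rel["start"]["properties"]["id"] == start_cui
    ]


rels = [
    make_concept_rel("isa", "UBERON", "UBERON:0011153 CUI", "UBERON:0010912 CUI"),
]
parents = related_cuis(rels, "UBERON:0011153 CUI", "isa")
```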
