Organization of the config files

graph_config.json

This configuration file drives the parsing and discovery of the RDF graphs. It is used by both the ontology converter and the data converter. It defines the RDF terms to expect in the graph, the reserved URIs, etc. It also features a lookup table for terminology RDF graphs, so that the ontology converter can load them in a memory slot distinct from the main ontology graph, which speeds up the computations. The fields are as follows:

Field description of graph_config

ONTOLOGY_GRAPHS_LOCATIONS

A list of paths (absolute or relative to the code repository root) to the folders containing the main ontology graphs. All the graphs found under these locations (searched recursively, following symlinks) will be read.

RDF_FORMAT

Format of the RDF files. All formats supported by RDFLib can be used (see https://rdflib.readthedocs.io/en/stable/plugin_parsers.html). If set to ‘*’, all types are allowed. If set to one of the possible keywords (e.g. ‘turtle’), only that format is allowed.
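The recursive, symlink-following search combined with the format filter can be sketched as follows. This is only an illustration under stated assumptions: the function name find_graph_files and the keyword-to-extension table EXTENSIONS are hypothetical, and the converter's actual lookup may differ.

```python
import os

# Hypothetical keyword-to-extension table (an assumption, not the tool's own).
EXTENSIONS = {"turtle": (".ttl",), "xml": (".rdf", ".xml", ".owl"), "nt": (".nt",)}


def find_graph_files(locations, rdf_format="*"):
    """Collect RDF files under each location, recursively, following symlinks."""
    if rdf_format == "*":
        # '*' allows every known extension.
        allowed = tuple(ext for exts in EXTENSIONS.values() for ext in exts)
    else:
        allowed = EXTENSIONS[rdf_format]
    found = []
    for root in locations:
        for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
            for name in sorted(filenames):
                if name.lower().endswith(allowed):
                    found.append(os.path.join(dirpath, name))
    return found
```

Calling find_graph_files(paths, "turtle") would then keep only the .ttl files found under the given folders.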

PREF_LANGUAGE

The language tag to choose first for the label predicate. It determines the display language of your ontology items.
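A minimal sketch of how a preferred language tag could drive label selection. The function pick_label and the fallback to the first available label are assumptions made for illustration; the converter may handle missing languages differently.

```python
def pick_label(labels, pref_language):
    """labels: list of (text, language_tag) pairs attached to one resource."""
    for text, lang in labels:
        if lang == pref_language:
            return text
    # Assumed fallback: the first label available, or None if there is none.
    return labels[0][0] if labels else None
```

With labels [("Heart", "en"), ("Coeur", "fr")] and PREF_LANGUAGE set to "fr", this sketch returns "Coeur".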

ALLOW_MIXED_TREES

When set to False (the default), subclass detection is blocked for classes that have at least one DatatypeProperty or ObjectProperty. Expert use only.

TERMINOLOGIES_GRAPHS

A mapping (dict) between RDF prefixes and filenames (without extension). Example: {"http://snomed.info/id/": "snomed-ct-20220131"} means that the file snomed-ct-20220131.* will be loaded in memory and accessed upon discovery of a resource whose URI uses the prefix http://snomed.info/id/. Delete the value field to avoid loading the terminology file in memory (to speed up a testing session, for example). Specifying terminology graphs in this field loads them into a separate Graph instance and speeds up all the computations on the RDF graph.
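The prefix lookup described above can be pictured with a few lines of Python. The helper terminology_file_for is a hypothetical name used only for this sketch; the mapping is the one from the example.

```python
TERMINOLOGIES_GRAPHS = {"http://snomed.info/id/": "snomed-ct-20220131"}


def terminology_file_for(uri, terminologies=TERMINOLOGIES_GRAPHS):
    """Return the terminology file stem whose prefix starts the URI, else None."""
    for prefix, filename in terminologies.items():
        if uri.startswith(prefix):
            return filename
    # No prefix matched: the resource belongs to the main ontology graph.
    return None
```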

uris

Macrocategory for global variables that define reserved URIs in the graph to parse.

ROOT_URIS

A list of URIs to be used as roots of the ontology. Every item in ENTRY_CONCEPTS should be a subclass of exactly one element in this list.

SUBCLASS_PREDICATE_URI

The subclass reserved URI. Default is rdfs:subClassOf

RANGE_PREDICATE_URI

The range reserved URI. Default is rdfs:range

TYPE_PREDICATE_URI

The type reserved URI. Default is rdf:type

COMMENT_PREDICATE_URI

The comment reserved URI. Default is rdfs:comment

LABEL_PREDICATE_URI

The label reserved URI. Default is rdfs:label

VALUESET_MARKER_URIS

Defines a Valueset abstract class which indicates that a class A (itself of type Valueset) should be substituted by a list of classes. Example: death-status is of type Class and of type Valueset. It should be interpreted as a multi-choice list (dead, alive, unknown) since in the graph, :dead, :alive and :unknown are all of type death-status.
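The death-status example can be reproduced on a toy set of triples. This sketch uses plain tuples instead of an RDF library, and the marker URI ":Valueset" as well as the function valueset_choices are hypothetical names chosen for illustration.

```python
RDF_TYPE = "rdf:type"
VALUESET = ":Valueset"  # hypothetical marker, as would be listed in VALUESET_MARKER_URIS

triples = [
    (":death-status", RDF_TYPE, "owl:Class"),
    (":death-status", RDF_TYPE, VALUESET),
    (":dead", RDF_TYPE, ":death-status"),
    (":alive", RDF_TYPE, ":death-status"),
    (":unknown", RDF_TYPE, ":death-status"),
]


def valueset_choices(cls, triples, markers=(VALUESET,)):
    """Return the choices replacing cls if cls is typed with a valueset marker."""
    is_valueset = any(
        s == cls and p == RDF_TYPE and o in markers for s, p, o in triples
    )
    if not is_valueset:
        return []
    # Every resource typed with the valueset class is one choice of the list.
    return [s for s, p, o in triples if p == RDF_TYPE and o == cls]
```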

ENTRY_CONCEPTS

The whitelist of top concepts to be discovered while descending from the ROOT_URIS using the SUBCLASS_PREDICATE_URI predicate.

BLACKLIST

A list of predicate URIs that should be ignored in the walk.
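How ROOT_URIS and ENTRY_CONCEPTS could interact during the descent is sketched below on toy triples. This is an assumption-laden illustration, not the converter's algorithm: the function discover_entry_concepts is hypothetical, and the BLACKLIST filtering of other predicates is left out.

```python
from collections import deque

SUBCLASS = "rdfs:subClassOf"  # default SUBCLASS_PREDICATE_URI


def discover_entry_concepts(triples, root_uris, entry_concepts):
    """Walk down the subclass tree from the roots, keeping whitelisted classes."""
    # Index the direct subclasses of each class.
    children = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            children.setdefault(o, []).append(s)
    found, seen, queue = [], set(root_uris), deque(root_uris)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            if child in entry_concepts:
                found.append(child)
            queue.append(child)
    return found
```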

data_config.json

This file is only used by the data converter. It specifies a data-specific blacklist, the paths to the data and dependencies graphs, and the mappings for RDF contextual fields that are not featured in the i2b2 ontology. It contains structured instructions about how these fields should be unpacked and mapped to specific tables and columns of the i2b2 star schema.

Field description of data_config

DATA_GRAPHS_LOCATION

CONTEXT_GRAPHS_LOCATION

A list of paths (absolute or relative to the code repository root) to the folders containing context data graphs (typically units or providers). All the graphs found under these locations (searched recursively, following symlinks) will be read. Make sure no ontology graph lies there!

MAX_BATCH_SIZE

The number of concept instances to process at once before writing. Reduce this number to lower resource usage (typically memory), at the cost of slower computation.
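The effect of MAX_BATCH_SIZE can be pictured with a small generator. The name batches is hypothetical; the sketch only shows the general idea of processing a bounded number of instances before each write.

```python
from itertools import islice


def batches(instances, max_batch_size):
    """Yield successive lists of at most max_batch_size instances."""
    iterator = iter(instances)
    while True:
        batch = list(islice(iterator, max_batch_size))
        if not batch:
            return
        yield batch
```

A smaller batch size keeps fewer instances in memory at once but triggers more write operations.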

data_global_uris

A macrocategory for reserved URIs.

PROVIDER_CLASS_URI

The URI marking data provider information.

TO_IGNORE

A data-specific blacklist of classes, that will be appended to the original (see graph_config) BLACKLIST.

ENTRY_DATA_CONCEPTS

The classes from which data exploration starts. Exploration proceeds from these classes rather than patient by patient. This should typically be a subset of the ENTRY_CONCEPTS defined in graph_config.

COLUMNS_MAPPING

User-defined mappings from RDF data elements to i2b2 columns.

VALUE

A macrocategory for elements whose value should end up in either the NVAL_NUM or the TVAL_CHAR column of the OBSERVATION_FACT table. Each element of this category should be a dictionary implementing the ‘col’ and ‘misc’ keys.

col

The name of the OBSERVATION_FACT column to write the value into.

mandatory

[OPTIONAL] A flag stating if this element is necessary (‘True’) for the observation to make sense. By default, a patient identifier is mandatory, but an encounter ID isn’t.

misc

A dictionary hardcoding values into other columns of OBSERVATION_FACT. Example: {"VALTYPE_CD": "N", "TVAL_CHAR": "E"}

transform

[OPTIONAL] An ordered list of exact method names to be called in sequence on the object to extract the actual value, if applicable. Example: ["__repr__"] or ["year"] or ["year", "__repr__"].
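The ‘transform’ chain can be sketched as successive attribute lookups. This is an illustration under assumptions: apply_transform is a hypothetical name, the value is taken to be a Python date, and names that resolve to plain attributes (such as year) are used as-is while callables are called without arguments.

```python
from datetime import date


def apply_transform(value, transform=()):
    """Apply each named method or attribute of the chain, in order."""
    for name in transform:
        attribute = getattr(value, name)
        # Call methods; use plain attributes (e.g. date.year) directly.
        value = attribute() if callable(attribute) else attribute
    return value


year_as_text = apply_transform(date(2020, 5, 17), ["year", "__repr__"])
```

Here ["year"] would yield the integer 2020, and ["year", "__repr__"] its string form.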

CONTEXT

A macrocategory for elements defining the observation context (patient, provider, encounter, unit, date, etc.). Each element should be a dictionary with the following keys:

col

The name of the OBSERVATION_FACT column to write the value into.

overwrite

‘True’ or ‘False’ (string) stating whether the item should be refreshed instead of inheriting the value of its parent. For example, one might want all modifier dates to match the date of the concept instance, or not. If ‘True’, the current value replaces the parent’s value. If ‘False’, the parent’s value is used and the current value is ignored.

pred_to_value

A list of RDF predicates to follow in order to reach the actual value. This makes it easy to hop through complex Patient, Provider or Encounter objects while being sure of which property defines the actual identifier.

verbose_value

[OPTIONAL] A list of RDF predicates to navigate to reach the (optional) verbose value (alternate value with more details). Some processes will look for it.
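The ‘pred_to_value’ navigation and the ‘overwrite’ flag can be sketched on a toy graph stored as nested dictionaries. The function names, the graph layout and the predicates :hasSubject and :hasIdentifier are all hypothetical and serve only as an illustration.

```python
def follow_predicates(graph, start, pred_to_value):
    """graph: {subject: {predicate: object}}. Return the end of the chain or None."""
    node = start
    for predicate in pred_to_value:
        node = graph.get(node, {}).get(predicate)
        if node is None:
            return None
    return node


def resolve_context(parent_value, own_value, overwrite):
    """Keep the parent's value unless overwrite is the string 'True'."""
    return own_value if overwrite == "True" else parent_value


graph = {
    ":obs1": {":hasSubject": ":patient1"},
    ":patient1": {":hasIdentifier": "P-001"},
}
patient_id = follow_predicates(graph, ":obs1", [":hasSubject", ":hasIdentifier"])
```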

i2b2_rdf_config.json

This file describes the bindings between RDF datatypes and i2b2 table-columns, and defines additional filters for ontology elements that should or should not appear in the final ontology.

Field description of i2b2_rdf_config

DEBUG

If ‘True’ (string), the output of the conversion will include cleartext instead of fixed-length basecodes. This helps for data engineering and quality control. See the ‘verbose mode’ details.

MAX_BASECODE_LENGTH

Length of the basecode to be used. Default 50. We recommend not modifying it.

OUTPUT_TABLES_LOCATION

The path (relative to the root of the repository, or absolute) to which the output CSV files should be written.

PROJECT_NAME

The name of your project to be written in the tables identifiers.

IGNORE_TERM_ID

Display option. A list of terminology names (e.g. snomed, icd-10, etc.) for which you don’t want the item IDs to be part of the display name (by default, the display looks like ‘SNOMED 0123432-lorem ipsum’).

ONTOLOGY_DROP_DIC

A list of classes to be ignored for i2b2-specific reasons (such as patient and encounter information, which isn’t part of the ontology in the i2b2 data model). Predicates that point only to classes whose URIs are in this list are automatically ignored during graph discovery.

UNDROP_LEAVES

A dictionary to finely manage exceptions to the ONTOLOGY_DROP_DIC list depending on the context. Keys should be URIs specified in ONTOLOGY_DROP_DIC. Values are lists of class URIs. If one of those classes features the dropped class as a direct child, the blacklisting is ignored. Example: ban the rdfs:comment class and the rdfs:hasComment properties mapping to it, except when the property applies to rdfs:FreeTextConcept or rdfs:ComplicatedConcept, for which it might be useful.
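The interplay between ONTOLOGY_DROP_DIC and UNDROP_LEAVES can be sketched as a single check. The function is_dropped and the ex: URIs are hypothetical; the sketch assumes the decision only depends on the dropped class and its direct parent.

```python
ONTOLOGY_DROP_DIC = ["ex:Comment"]
UNDROP_LEAVES = {"ex:Comment": ["ex:FreeTextConcept", "ex:ComplicatedConcept"]}


def is_dropped(class_uri, parent_uri, drop_list=ONTOLOGY_DROP_DIC,
               undrop_leaves=UNDROP_LEAVES):
    """True if class_uri must be ignored when found as a direct child of parent_uri."""
    if class_uri not in drop_list:
        return False
    # The exception applies only under the parents listed for this dropped class.
    return parent_uri not in undrop_leaves.get(class_uri, [])
```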

DATA_LEAVES

Gives identifiers to the tails of the hierarchy (usually native XML types). Allows N-to-1 mapping (e.g. if xls:Float and xls:double should be used equivalently).

EQUIVALENCES

Mappings from the keys defined as right-side identifiers in DATA_LEAVES to their i2b2 translation.

VALUETYPE_CD

The single-character i2b2 code defining the type of value (‘T’ for textual, ‘N’ for numerical).

C_METADATAXML

A configuration binding a specific type to its appropriate representation in the i2b2 XML pop-up (see the XML_PATTERN variable below).

Datatype (example of C_METADATAXML tag), possible values are Integer, PosInteger, Float, PosFloat, String, etc.
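The two-step lookup from a datatype to its i2b2 binding can be sketched as follows. The exact layout of DATA_LEAVES and EQUIVALENCES in the configuration file is not shown here, so the dictionaries below are assumptions: the identifiers "FLOAT" and "TEXT", the entry "xls:string" and the function i2b2_binding are hypothetical.

```python
# N-to-1 mapping: several datatypes share one right-side identifier (assumed layout).
DATA_LEAVES = {"xls:Float": "FLOAT", "xls:double": "FLOAT", "xls:string": "TEXT"}

# One i2b2 translation per right-side identifier (assumed layout).
EQUIVALENCES = {
    "FLOAT": {"VALUETYPE_CD": "N", "C_METADATAXML": {"Datatype": "Float"}},
    "TEXT": {"VALUETYPE_CD": "T", "C_METADATAXML": {"Datatype": "String"}},
}


def i2b2_binding(datatype_uri):
    """Return the i2b2 translation of a datatype, or None if it is not a known leaf."""
    leaf_id = DATA_LEAVES.get(datatype_uri)
    return EQUIVALENCES.get(leaf_id)
```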

MIGRATIONS

A macrocategory defining how to aggregate ontology elements. See the admonition below this table for more details on how to use it.

concept

The concept to which the modifier to relocate is bound.

destination

The list of destination modifiers (as shortened paths) into which the details of the deleted element will be merged. ‘.’ points to the concept element itself, ‘item1/*’ points to all children of ‘item1’.

xmlvaluetype

The Datatype to be written (by default it will be imported from the deleted element but overwriting is allowed). Must be compatible with the C_METADATAXML panel.

newvisualattribute

The new display icon code (for example, an item with no siblings could be merged into its parent: the parent shouldn’t be displayed as a folder anymore). i2b2 display codes are defined here: https://community.i2b2.org/wiki/display/ServerSideDesign/C_VISUALATTRIBUTES
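The ‘destination’ syntax (‘.’ for the concept itself, ‘item1/*’ for all children of ‘item1’) can be sketched as a small resolver. The function resolve_destinations, the children index and the code:A / code:B items are hypothetical and only illustrate how the shortened paths could be interpreted.

```python
def resolve_destinations(concept, children, destination):
    """children: {modifier short path: [child short paths]} under one concept."""
    targets = []
    for path in destination:
        if path == ".":
            # '.' points to the concept element itself.
            targets.append(concept)
        elif path.endswith("/*"):
            # 'item1/*' points to all children of 'item1'.
            targets.extend(children.get(path[:-2], []))
        else:
            targets.append(path)
    return targets


children = {"myprefix:hasLabResultLabTestCode": ["code:A", "code:B"]}
targets = resolve_destinations(
    "myprefix:LabResult", children, ["myprefix:hasLabResultLabTestCode/*"]
)
```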

XML_PATTERN

The default XML coding for the pop-up window bound to specific ontology items (it appears for text and numeric values). The details of its fields can be found at https://lcbru-trac.rcs.le.ac.uk/wiki/i2b2%20Ontology%20c_metadataxml%20Column

COLUMNS

A macrocategory for the i2b2 tables to be generated and their (ordered) columns. The default value of every column is null.

CONCEPT_DIMENSION

The ordered column names for the CONCEPT_DIMENSION table.

MODIFIER_DIMENSION

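The ordered column names for the MODIFIER_DIMENSION table.

METADATA

The ordered column names for the METADATA table.

TABLE_ACCESS

The ordered column names for the TABLE_ACCESS table.

OBSERVATION_FACT

The ordered column names for the OBSERVATION_FACT table.

VISIT_DIMENSION

The ordered column names for the VISIT_DIMENSION table.

ENCOUNTER_MAPPING

The ordered column names for the ENCOUNTER_MAPPING table.

PATIENT_DIMENSION

The ordered column names for the PATIENT_DIMENSION table.

PATIENT_MAPPING

The ordered column names for the PATIENT_MAPPING table.

PROVIDER_DIMENSION

The ordered column names for the PROVIDER_DIMENSION table.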

Explanation of the migrate-reduce option

Use example:

{
    "MIGRATIONS": {
        "myprefix:LabTestValue": {
            "concept": "myprefix:LabResult",
            "destination": [
                "myprefix:hasLabResultLabTestCode/*"
            ],
            "xmlvaluetype": "Float"
        }
    }
}

Ontology items fine-grained relocation

The most complex item of this configuration file is the list of instructions for element migration/aggregation. It is useful when an ontology item A should be moved to another location in the ontology tree, or when its properties should be merged into one or more other items B_i. For example, one could want a SubjectAge concept to be flagged as a numeric item, whereas strictly following the RDF input might lead to the SubjectAge concept having a child element AgeValue flagged as a numeric item. In this case, one would fill in the instructions for migrating AgeValue into SubjectAge. For a laboratory result, one could want the laboratory test codes to carry the numeric value instead of having a distinct LabResultValue element, which makes sense in RDF but not in i2b2 (where a single item can carry more information).

The same kind of manipulation is done by default on the datetime and unit elements, since i2b2 allows an ontology item to carry its identifier, a date, a value and a unit all at once if necessary. We use a naive approach for the datetime and unit, applying them to the direct parent (which then makes them trickle down to its other children). Such an approach doesn’t make sense for numerical values, hence the present instructions to configure aggregations.

Example of reduction:

_images/migrations_example.drawio.svg

While the RDF graph features 1-dimensional objects, i2b2 can deal with more complex objects. Automatic aggregation of several RDF properties into a single i2b2 element is limited. The top element shows the original RDF graph, the left element shows the output of a naive conversion and the right element shows the result of a better reduction, making full use of the ‘unit’ and ‘value’ fields of i2b2 items.

Another example of reduction:

_images/migrations_age.drawio.svg

The aggregation-migration system can also allow you to reduce unnecessarily complex hierarchies into a parent element.
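The SubjectAge case mentioned earlier can be pictured on a toy tree. This is a sketch under assumptions, not the converter's implementation: the tree layout, the "datatype" key and the function migrate_into_parent are hypothetical.

```python
def migrate_into_parent(tree, parent, child, xmlvaluetype=None):
    """tree: {item: {"children": [...], "datatype": str or None}}. Mutates tree."""
    removed = tree.pop(child)
    tree[parent]["children"].remove(child)
    # By default the datatype is imported from the deleted element.
    tree[parent]["datatype"] = xmlvaluetype or removed["datatype"]
    return tree


tree = {
    "SubjectAge": {"children": ["AgeValue"], "datatype": None},
    "AgeValue": {"children": [], "datatype": "Float"},
}
reduced = migrate_into_parent(tree, "SubjectAge", "AgeValue")
```

After the call, SubjectAge has no child left and carries the Float datatype itself.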