Table Configuration Reference

Table configurations define how Tablassert transforms tabular data (Excel, CSV, TSV) into knowledge graph assertions.

Purpose

A table configuration specifies:
- Data source location and format
- How to extract subject-predicate-object triples
- Entity resolution rules (taxonomic filtering, category preferences)
- Provenance metadata
- Optional edge annotations

Template vs Sections

Table configurations support two patterns:

Pattern 1: Template Only

Use when processing a single table with one output.

template:
  syntax: TC3
  source: {...}
  statement: {...}
  provenance: {...}

Pattern 2: Template + Sections

Use when processing variations of the same data (different columns, predicates, etc.) while sharing common configuration.

template:
  syntax: TC3
  source: {...}  # Shared by all sections
  provenance: {...}  # Shared by all sections

sections:
  - statement:  # Section 1: Gene-Disease
      subject: {encoding: gene_column}
      predicate: associated_with
      object: {encoding: disease_column}

  - statement:  # Section 2: Gene-Pathway
      subject: {encoding: gene_column}
      predicate: participates_in
      object: {encoding: pathway_column}

Merge Behavior (fastmerge)

Sections inherit from the template and override specific fields:

Dictionaries: Recursive merge, section overrides template keys

template:
  statement:
    subject: {encoding: A}
    predicate: related_to

sections:
  - statement:
      predicate: associated_with  # Overrides, subject stays "A"

Lists: Concatenation (extends)

template:
  statement:
    subject:
      prioritize: [Gene]

sections:
  - statement:
      subject:
        prioritize: [Protein]  # Result: [Gene, Protein]

Scalars: Section replaces template

template:
  syntax: TC3

sections:
  - syntax: TC2  # Replaces "TC3" (illustration only; syntax must be "TC3")
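The three merge rules can be sketched in Python. This is a minimal illustration of the behavior described above, not Tablassert's fastmerge code: the `merge` function name is hypothetical, and treating mismatched types (for example a dict in the template and a string in the section) as a plain replacement is an assumption.

```python
from typing import Any


def merge(template: Any, section: Any) -> Any:
    """Merge a section over a template using the documented rules.

    Dictionaries merge recursively, lists concatenate (template first),
    and any other value from the section replaces the template's value.
    """
    if isinstance(template, dict) and isinstance(section, dict):
        merged = dict(template)  # shallow copy; the template is not mutated
        for key, value in section.items():
            merged[key] = merge(template[key], value) if key in template else value
        return merged
    if isinstance(template, list) and isinstance(section, list):
        return template + section  # extend: template items, then section items
    return section  # scalars (and, by assumption, mismatched types): section wins


template = {
    "syntax": "TC3",
    "statement": {
        "subject": {"encoding": "A", "prioritize": ["Gene"]},
        "predicate": "related_to",
    },
}
section = {
    "statement": {
        "subject": {"prioritize": ["Protein"]},
        "predicate": "associated_with",
    }
}

print(merge(template, section))
```

The printed result keeps `syntax` and `encoding: A` from the template, extends `prioritize` to `["Gene", "Protein"]`, and replaces `predicate` with the section's value.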

Use Cases

Single output: Template only

template:
  source: {kind: text, local: data.csv}
  statement: {...}

Multiple predicates, same source:

template:
  source: {kind: excel, local: data.xlsx}
  provenance: {publication: PMC123}

sections:
  - statement: {predicate: treats}
  - statement: {predicate: prevents}

Multiple columns, shared provenance:

template:
  source: {kind: text, local: data.csv}
  provenance: {publication: PMID456}
  statement:
    subject: {encoding: gene_symbol}

sections:
  - statement: {object: {encoding: column_A}}
  - statement: {object: {encoding: column_B}}

Configuration Schema

Template Metadata

Field	Type	Required	Description
`syntax`	String	Yes	Configuration version (must be `"TC3"`)
`status`	String	No	Development status: `"alpha"`, `"beta"`, `"primetime"`

Source

Defines the data file location and format.

Excel Source

Field	Type	Required	Description
`kind`	String	Yes	Must be `"excel"`
`local`	Path	Yes	Local file path for caching
`url`	URL	Yes	Download URL (HTTP/HTTPS)
`sheet`	String	No	Sheet name (default: `"Sheet1"`)
`row_slice`	List[Int | "auto"]	No	Row range: `[start, end]` or `[start, "auto"]`
`rows`	List[Int]	No	Specific rows to include
`reindex`	List[Reindex]	No	Conditional row filtering

Example:

source:
  kind: excel
  local: ./data/mydata.xlsx
  url: https://example.com/data.xlsx
  sheet: "Sheet1"
  row_slice:
    - 2  # Start at row 2 (skip header)
    - auto  # Read to end

Text Source (CSV/TSV)

Field	Type	Required	Description
`kind`	String	Yes	Must be `"text"`
`local`	Path	Yes	Local file path for caching
`url`	URL	Yes	Download URL
`delimiter`	String	No	Column delimiter (default: `","`)
`row_slice`	List[Int | "auto"]	No	Row range
`rows`	List[Int]	No	Specific rows
`reindex`	List[Reindex]	No	Conditional filtering

Example:

source:
  kind: text
  local: ./data/mydata.tsv
  url: https://example.com/data.tsv
  delimiter: "\t"
  row_slice:
    - 1
    - auto

Reindexing (Conditional Filtering)

Filter rows based on column values.

Field	Type	Description
`column`	String	Column name to evaluate
`comparison`	String	Operator: `"eq"`, `"ne"`, `"lt"`, `"le"`, `"gt"`, `"ge"`
`comparator`	String | Int | Float	Value to compare against

Example:

reindex:
  - column: p_value
    comparison: lt
    comparator: 0.05  # Keep rows where p_value < 0.05
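A minimal Python sketch of the reindex semantics, assuming rows are plain dictionaries and that multiple reindex rules combine with AND (each rule filters the rows kept by the previous one). The six operator names in the table map one-to-one onto functions in Python's `operator` module; the `reindex` helper and the sample rows are hypothetical.

```python
import operator
from typing import Any

# The documented operator names match functions in Python's operator module.
COMPARISONS = {name: getattr(operator, name) for name in ("eq", "ne", "lt", "le", "gt", "ge")}


def reindex(rows: list[dict[str, Any]], rules: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only rows that satisfy every rule (rules are applied one after another)."""
    kept = rows
    for rule in rules:
        compare = COMPARISONS[rule["comparison"]]
        kept = [row for row in kept if compare(row[rule["column"]], rule["comparator"])]
    return kept


rows = [
    {"gene": "GENE_A", "p_value": 0.01},
    {"gene": "GENE_B", "p_value": 0.20},
    {"gene": "GENE_C", "p_value": 0.049},
]
significant = reindex(rows, [{"column": "p_value", "comparison": "lt", "comparator": 0.05}])
print([row["gene"] for row in significant])  # ['GENE_A', 'GENE_C']
```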

Statement (Triple Definition)

Defines subject-predicate-object relationships.

Field	Type	Required	Description
`subject`	NodeEncoding	Yes	Subject entity configuration
`predicate`	String	Yes	Biolink predicate (e.g., `"associated_with"`)
`object`	NodeEncoding	Yes	Object entity configuration
`qualifiers`	List[Qualifier]	No	Edge qualifiers (context)

Example:

statement:
  subject:
    method: column
    encoding: gene_symbol
    prioritize: [Gene]
  predicate: treats
  object:
    method: column
    encoding: disease_name
    prioritize: [Disease]
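To make the role of `method` concrete, here is a hypothetical Python sketch that turns rows into raw (subject, predicate, object) tuples before any entity resolution. It assumes rows are dictionaries keyed by column name; `node_value` and `build_triples` are illustrative names, not Tablassert functions, and the disease values are placeholders.

```python
from typing import Any


def node_value(row: dict[str, Any], node: dict[str, Any]) -> Any:
    """Return the raw value for one node: a literal, or the cell in the named column."""
    if node["method"] == "value":
        return node["encoding"]  # same literal for every row
    if node["method"] == "column":
        return row[node["encoding"]]  # per-row cell value
    raise ValueError(f"unknown method: {node['method']!r}")


def build_triples(rows: list[dict[str, Any]], statement: dict[str, Any]) -> list[tuple]:
    """Produce one (subject, predicate, object) tuple per row, before entity resolution."""
    return [
        (
            node_value(row, statement["subject"]),
            statement["predicate"],
            node_value(row, statement["object"]),
        )
        for row in rows
    ]


statement = {
    "subject": {"method": "value", "encoding": "CHEBI:41774"},
    "predicate": "treats",
    "object": {"method": "column", "encoding": "disease_name"},
}
rows = [{"disease_name": "DISEASE_1"}, {"disease_name": "DISEASE_2"}]
for triple in build_triples(rows, statement):
    print(triple)
```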

NodeEncoding

Defines how to extract and resolve entities.

Field	Type	Required	Description
`method`	String	Yes	`"value"` (literal) or `"column"` (column reference)
`encoding`	String | Int | Float	Yes	Literal value or column name
`taxon`	Int	No	NCBI Taxon ID for filtering (e.g., `9606` for human)
`prioritize`	List[String]	No	Preferred Biolink categories
`avoid`	List[String]	No	Excluded Biolink categories
`regex`	List[Regex]	No	Pattern replacements
`fill`	String	No	Null-filling strategy: `"forward"`, `"backward"`, `"min"`, `"max"`, `"mean"`, `"zero"`, `"one"`
`remove`	List[String]	No	Strings to filter out
`prefix`	String	No	Add prefix to values
`suffix`	String	No	Add suffix to values
`explode_by`	String	No	Delimiter to split multi-value cells
`transformations`	List[Math]	No	Mathematical transformations

Method: Value vs Column

method: value - Use a literal value

subject:
  method: value
  encoding: CHEBI:41774  # All rows get this CURIE

method: column - Reference a column

Excel columns use letters converted to column_N:
- Column A → column_1 or just "A"
- Column B → column_2 or just "B"

subject:
  method: column
  encoding: A  # Read from column A
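The letter-to-`column_N` conversion can be sketched as below. Only columns A and B are documented here; treating multi-letter labels with the usual spreadsheet convention (AA follows Z, so AA → column_27) is an assumption, and `excel_column_to_name` is a hypothetical helper.

```python
def excel_column_to_name(letters: str) -> str:
    """Convert a spreadsheet column label such as "A" or "AB" to "column_N" (1-based)."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for char in letters.upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"not a column letter: {char!r}")
        index = index * 26 + (ord(char) - ord("A") + 1)  # bijective base-26
    return f"column_{index}"


for label in ("A", "B", "Z", "AA"):
    print(label, "->", excel_column_to_name(label))
```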

CSV/TSV columns use header names:

subject:
  method: column
  encoding: gene_symbol  # Read from "gene_symbol" column

Taxonomic Filtering

taxon: int - Filter entities by organism

subject:
  encoding: gene_column
  taxon: 9606  # Only human genes (Homo sapiens)

Common taxon IDs:
- 9606 - Homo sapiens (human)
- 10090 - Mus musculus (mouse)
- 7227 - Drosophila melanogaster (fruit fly)
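A minimal sketch of taxonomic filtering over the candidate matches for one cell. The `Candidate` record and `filter_by_taxon` helper are hypothetical; the sketch assumes that when a taxon is set, candidates with no taxon (or a different one) are discarded, which may not match how Tablassert treats organism-agnostic entities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    curie: str
    label: str
    taxon: Optional[int]  # NCBI Taxon ID of the organism, if any


def filter_by_taxon(candidates: list[Candidate], taxon: Optional[int]) -> list[Candidate]:
    """Keep candidates from the requested organism; with no taxon set, keep everything."""
    if taxon is None:
        return list(candidates)
    return [c for c in candidates if c.taxon == taxon]


candidates = [
    Candidate("NCBIGene:7157", "TP53", 9606),     # human
    Candidate("NCBIGene:22059", "Trp53", 10090),  # mouse
]
print([c.curie for c in filter_by_taxon(candidates, 9606)])  # ['NCBIGene:7157']
```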

Category Prioritization

prioritize: list[category] - Prefer specific Biolink categories

subject:
  encoding: A
  prioritize:
    - Gene
    - Protein

If "TP53" maps to both a Gene and a Protein, the Gene is preferred.

avoid: list[category] - Exclude specific categories

subject:
  encoding: organism_name
  prioritize:
    - OrganismTaxon
  avoid:
    - Gene

Prevents misclassifying organism names as genes.
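One way to read `prioritize` and `avoid` together is sketched below: drop avoided categories first, then return the first candidate whose category appears earliest in the priority list. The fallback to the first remaining candidate when nothing matches a priority, the `choose_candidate` name, and the `HYPOTHETICAL:0001` CURIE are assumptions for illustration.

```python
from typing import Optional, Sequence


def choose_candidate(
    candidates: Sequence[tuple[str, str]],
    prioritize: Sequence[str] = (),
    avoid: Sequence[str] = (),
) -> Optional[tuple[str, str]]:
    """Pick one (curie, category) pair: drop avoided categories, then honor priority order."""
    allowed = [c for c in candidates if c[1] not in avoid]
    for category in prioritize:  # earlier entries in the list win
        for candidate in allowed:
            if candidate[1] == category:
                return candidate
    return allowed[0] if allowed else None  # assumed fallback: first remaining candidate


tp53 = [("UniProtKB:P04637", "Protein"), ("NCBIGene:7157", "Gene")]
print(choose_candidate(tp53, prioritize=["Gene", "Protein"]))  # ('NCBIGene:7157', 'Gene')

organism = [("HYPOTHETICAL:0001", "Gene"), ("NCBITaxon:562", "OrganismTaxon")]
print(choose_candidate(organism, prioritize=["OrganismTaxon"], avoid=["Gene"]))
```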

Text Transformations

regex: list[{pattern, replacement}] - Pattern-based replacements

subject:
  encoding: A
  regex:
    - pattern: ".*g__"
      replacement: ""  # Strip everything up to and including the "g__" genus marker
    - pattern: ";s__"
      replacement: " "  # Replace the species separator with a space

Replacements are applied in the order listed.
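The two rules above can be reproduced with Python's `re.sub`, assuming cells hold Greengenes/QIIME-style lineage strings (the sample value and the `apply_regex` helper are hypothetical). Python's `re` module may differ in edge cases from the regex engine Tablassert actually uses.

```python
import re

# Replacement rules mirroring the YAML above, applied top to bottom.
RULES = [
    (r".*g__", ""),  # drop every rank up to and including the "g__" genus marker
    (r";s__", " "),  # turn the species separator into a space
]


def apply_regex(value: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) pair in order, feeding the output forward."""
    for pattern, replacement in rules:
        value = re.sub(pattern, replacement, value)
    return value


lineage = "k__Bacteria;f__Enterobacteriaceae;g__Escherichia;s__coli"
print(apply_regex(lineage, RULES))  # Escherichia coli
```

Order matters: running the `;s__` rule first would still work here, but in general a later rule sees the output of every earlier one.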

remove: list[string] - Filter out specific strings

subject:
  encoding: A
  remove:
    - "^NA "  # Remove rows starting with "NA "

prefix / suffix - Prepend or append text to each value

object:
  encoding: identifier
  prefix: "CUSTOM:"  # "123" → "CUSTOM:123"

Null Handling

fill: string - Fill null values using a strategy

Available strategies:
- "forward" - Fill nulls with previous non-null value
- "backward" - Fill nulls with next non-null value
- "min" - Fill with column minimum
- "max" - Fill with column maximum
- "mean" - Fill with column mean
- "zero" - Fill with 0
- "one" - Fill with 1

subject:
  encoding: gene_symbol
  fill: forward  # Propagate values down through null rows

annotations:
  - annotation: expression_level
    method: column
    encoding: expression
    fill: mean  # Replace nulls with column average
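A minimal pure-Python sketch of the seven fill strategies over one column, with `None` standing in for null. It is an illustration, not Tablassert's implementation, and `fill_nulls` is a hypothetical name. In this version, leading nulls stay null under forward fill, trailing nulls stay null under backward fill, and an all-null column is returned unchanged by min, max, and mean.

```python
from statistics import mean
from typing import Any, Sequence


def fill_nulls(values: Sequence[Any], strategy: str) -> list[Any]:
    """Replace None entries using one of the documented fill strategies."""
    if strategy == "forward":
        filled, last = [], None
        for value in values:
            if value is not None:
                last = value  # remember the most recent non-null value
            filled.append(last)
        return filled
    if strategy == "backward":
        # Backward fill is forward fill applied to the reversed column.
        return fill_nulls(list(values)[::-1], "forward")[::-1]

    present = [v for v in values if v is not None]
    if strategy in ("zero", "one"):
        replacement = 0 if strategy == "zero" else 1
    elif strategy in ("min", "max", "mean"):
        if not present:
            return list(values)  # nothing to aggregate, so leave the nulls alone
        replacement = {"min": min, "max": max, "mean": mean}[strategy](present)
    else:
        raise ValueError(f"unknown fill strategy: {strategy!r}")
    return [replacement if v is None else v for v in values]


column = [1.0, None, 3.0, None]
for strategy in ("forward", "backward", "mean", "zero"):
    print(strategy, fill_nulls(column, strategy))
```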

Multi-Value Handling

explode_by: string - Split delimited values into multiple rows

object:
  encoding: pathway_list
  explode_by: ";"  # "P1;P2;P3" → 3 separate edges
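A sketch of `explode_by` over rows represented as dictionaries: each delimited piece becomes its own row, with the other columns copied. Trimming whitespace around each piece and dropping empty pieces (as in `P1;;P2`) are assumptions, and `explode` is a hypothetical helper.

```python
from typing import Any


def explode(rows: list[dict[str, Any]], column: str, delimiter: str) -> list[dict[str, Any]]:
    """Split one column on a delimiter, emitting one row per non-empty piece."""
    exploded = []
    for row in rows:
        for piece in str(row[column]).split(delimiter):
            piece = piece.strip()  # assumption: surrounding whitespace is trimmed
            if piece:  # assumption: empty pieces are dropped
                exploded.append({**row, column: piece})  # copy the row, swap in the piece
    return exploded


rows = [{"gene": "GENE_A", "pathway_list": "P1;P2;P3"}]
for row in explode(rows, "pathway_list", ";"):
    print(row)
```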

Mathematical Transformations

transformations: list[{function, arguments}]

Available functions: copysign, pow

Use the "values" token to reference column values in transformations.
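This section does not show a full transformation entry, so the sketch below is assumption-heavy: it treats `arguments` as the positional arguments of the named function, substitutes each cell value wherever the `"values"` token appears, and maps `copysign` and `pow` onto Python's `math.copysign` and `math.pow`. The step list and the `apply_transformations` helper are hypothetical.

```python
import math
from typing import Any

# Only the two documented functions, mapped to Python's math module by assumption.
FUNCTIONS = {"copysign": math.copysign, "pow": math.pow}


def apply_transformations(values: list[float], transformations: list[dict[str, Any]]) -> list[float]:
    """Apply each {function, arguments} step in order; "values" stands for the current cell."""
    for step in transformations:
        func = FUNCTIONS[step["function"]]
        values = [
            func(*[value if arg == "values" else arg for arg in step["arguments"]])
            for value in values
        ]
    return values


# Hypothetical use: undo a log10 transform, then force every result negative.
steps = [
    {"function": "pow", "arguments": [10, "values"]},
    {"function": "copysign", "arguments": ["values", -1]},
]
print(apply_transformations([-2.0, 0.0, 1.0], steps))
```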

Qualifiers

Add context to edges (anatomical location, species, etc.).

Field	Type	Description
`qualifier`	String	Biolink qualifier (e.g., `"species_context"`)
(inherits NodeEncoding)		All NodeEncoding fields available

Example:

qualifiers:
  - qualifier: species_context
    method: value
    encoding: NCBITaxon:9606

Provenance

Required metadata about the data source.

Field	Type	Required	Description
`repo`	String	Yes	Repository: `"PMC"`, `"PMID"`
`publication`	String	Yes	Identifier (e.g., `"PMC11708054"`, `"PMID123"`)
`contributors`	List[Contributor]	Yes	Curation information

Contributor fields:

Field	Type	Required	Description
`kind`	String	Yes	`"curation"`, `"validation"`, `"tool"`
`name`	String	Yes	Contributor name
`date`	String	Yes	Date (free format)
`organizations`	List[String]	No	Affiliations
`comment`	String	No	Notes

Example:

provenance:
  repo: PMC
  publication: PMC11708054
  contributors:
    - kind: curation
      name: Skye Lane Goetz
      date: 09 JAN 2025
      organizations:
        - Institute for Systems Biology
        - CalPoly SLO
      comment: Migrated from TC2 to TC3

Annotations

Optional edge attributes (statistical metadata, notes, etc.).

Field	Type	Description
`annotation`	String	Attribute name (e.g., `"p value"`, `"sample size"`)
(inherits Encoding)		All Encoding fields available (method, encoding, regex, etc.)

Example:

annotations:
  - annotation: p value
    method: column
    encoding: C  # Read from column C

  - annotation: sample size
    method: value
    encoding: 450  # Literal value for all edges

  - annotation: multiple testing correction method
    method: value
    encoding: "Benjamini Hochberg"

Complete Example

Minimal table configuration:

template:
  syntax: TC3
  status: alpha

  source:
    kind: text
    local: ./data.csv
    url: https://example.com/data.csv
    row_slice: [1, auto]
    delimiter: ","

  statement:
    subject:
      method: column
      encoding: gene
      prioritize: [Gene]
    predicate: associated_with
    object:
      method: column
      encoding: disease
      prioritize: [Disease]

  provenance:
    repo: PMID
    publication: PMID12345678
    contributors:
      - kind: curation
        name: Example User
        date: 27 JAN 2026

  annotations:
    - annotation: p value
      method: column
      encoding: p_val
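Before running a configuration, it can help to check the fields marked Required in the schema tables above. The hypothetical `check_template` helper below works on an already-parsed dictionary (for example, the result of loading the YAML with PyYAML's `yaml.safe_load`); it is not part of Tablassert and covers only those required fields. For the template + sections pattern, run it on each merged section rather than on the bare template.

```python
from typing import Any


def check_template(template: dict[str, Any]) -> list[str]:
    """Return a list of problems with the required fields; an empty list means none found."""
    problems = []
    if template.get("syntax") != "TC3":
        problems.append('syntax must be "TC3"')
    source = template.get("source", {})
    if source.get("kind") not in ("excel", "text"):
        problems.append('source.kind must be "excel" or "text"')
    for key in ("local", "url"):
        if key not in source:
            problems.append(f"source.{key} is required")
    statement = template.get("statement", {})
    if "predicate" not in statement:
        problems.append("statement.predicate is required")
    for node in ("subject", "object"):
        for key in ("method", "encoding"):
            if key not in statement.get(node, {}):
                problems.append(f"statement.{node}.{key} is required")
    provenance = template.get("provenance", {})
    for key in ("repo", "publication", "contributors"):
        if key not in provenance:
            problems.append(f"provenance.{key} is required")
    for i, contributor in enumerate(provenance.get("contributors", [])):
        for key in ("kind", "name", "date"):
            if key not in contributor:
                problems.append(f"provenance.contributors[{i}].{key} is required")
    return problems


# The minimal example above, as a parsed dictionary.
template = {
    "syntax": "TC3",
    "status": "alpha",
    "source": {"kind": "text", "local": "./data.csv", "url": "https://example.com/data.csv"},
    "statement": {
        "subject": {"method": "column", "encoding": "gene"},
        "predicate": "associated_with",
        "object": {"method": "column", "encoding": "disease"},
    },
    "provenance": {
        "repo": "PMID",
        "publication": "PMID12345678",
        "contributors": [{"kind": "curation", "name": "Example User", "date": "27 JAN 2026"}],
    },
}
print(check_template(template))  # []
```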

Next Steps

Advanced Example - Real-world configuration with complex transformations
Graph Configuration - How to orchestrate multiple tables
Tutorial - Step-by-step walkthrough