Tablassert

Tablassert is a declarative knowledge graph backend that extracts knowledge assertions from tabular data and exports them as NCATS Translator-compliant KGX (Knowledge Graph Exchange) NDJSON.

What is Tablassert?

Tablassert transforms biomedical tabular data (Excel, CSV, TSV) into knowledge graphs through:

  • Declarative YAML configuration - Define data transformations without code
  • Entity resolution - Map text to biological entities (genes, diseases, chemicals) using comprehensive databases
  • Multi-stage quality control - Exact matching, fuzzy matching, and BioBERT semantic validation
  • KGX compliance - Writes NCATS Translator-compatible NDJSON node and edge files
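
As a rough illustration of the output shape, the sketch below serializes a few node and edge records as NDJSON (one JSON object per line). The field names follow the general KGX convention (id, name, category for nodes; subject, predicate, object for edges). The helper name to_ndjson and the sample records are illustrative assumptions, not Tablassert's actual code or output, and real KGX edges typically carry further fields such as provenance.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as NDJSON: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


# Hypothetical records, for illustration only.
nodes = [
    {"id": "NCBIGene:7157", "name": "TP53", "category": ["biolink:Gene"]},
    {"id": "MONDO:0004992", "name": "cancer", "category": ["biolink:Disease"]},
]
edges = [
    {
        "subject": "NCBIGene:7157",
        "predicate": "biolink:associated_with",
        "object": "MONDO:0004992",
    },
]

nodes_ndjson = to_ndjson(nodes)  # contents of a node file
edges_ndjson = to_ndjson(edges)  # contents of an edge file
print(nodes_ndjson, end="")
print(edges_ndjson, end="")
```

Keeping nodes and edges in separate line-delimited files means each record can be parsed on its own, which suits streaming over large graphs.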

Key Features

  • Named Entity Recognition: Case-dependent, provenance-rich NER with taxonomic filtering
  • Quality Control: Three-stage validation (exact → fuzzy → BERT embeddings)
  • Biolink Compliance: Uses Biolink categories and predicates throughout
  • Performance: Parallel processing with disk caching for expensive operations
  • Reproducibility: UV-based development environment with deterministic builds
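
The exact → fuzzy → embedding cascade listed above can be pictured with a minimal sketch. This is not Tablassert's implementation: the function name resolve, the 0.85 threshold, the use of difflib for fuzzy scoring, and the pluggable semantic_check callable (standing in for a BioBERT-based check) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def resolve(term, vocabulary, fuzzy_threshold=0.85, semantic_check=None):
    """Return (matched_label, stage), or (None, None) if nothing qualifies.

    vocabulary: iterable of known labels (a stand-in for an entity database).
    semantic_check: optional callable (term, candidate) -> bool that stands in
    for an embedding-based validation step; it is not implemented here.
    """
    labels = list(vocabulary)

    # Stage 1: exact, case-sensitive match.
    if term in labels:
        return term, "exact"

    # Stage 2: best fuzzy match by character-level similarity ratio.
    best, best_score = None, 0.0
    for label in labels:
        score = SequenceMatcher(None, term.lower(), label.lower()).ratio()
        if score > best_score:
            best, best_score = label, score
    if best is None or best_score < fuzzy_threshold:
        return None, None

    # Stage 3: optionally let a semantic check accept or reject the candidate.
    if semantic_check is not None:
        if not semantic_check(term, best):
            return None, None
        return best, "fuzzy+semantic"
    return best, "fuzzy"


print(resolve("TP53", ["TP53", "BRCA1"]))
print(resolve("diabetes melitus", ["diabetes mellitus", "asthma"]))
```

Ordering the stages from cheapest to most expensive means the costly semantic step only runs on candidates that survive the earlier filters.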

Quick Start

# Install from PyPI (UV)
uv tool install tablassert
tablassert --help

# Install from PyPI (pip)
pip install tablassert
tablassert --help

# Install runtime-compatible Polars build
# (for CPUs without the required Polars instructions)
uv tool install "tablassert[rtcompat]"
# or
pip install "tablassert[rtcompat]"
tablassert --help

# Or install latest from GitHub main
uv tool install git+https://github.com/SkyeAv/Tablassert.git@main
tablassert --help

The rtcompat extra (tablassert[rtcompat]) is defined in pyproject.toml and installs a runtime-compatible Polars build for CPUs that lack the instructions the default Polars build requires.

For development from source:

git clone https://github.com/SkyeAv/Tablassert.git
cd Tablassert
uv sync

# Run with your configuration
uv run tablassert build-knowledge-graph <config>

Authors

  • Skye Lane Goetz - Institute for Systems Biology, CalPoly SLO
  • Gwênlyn Glusman - Institute for Systems Biology
  • Jared C. Roach - Institute for Systems Biology

License

See repository for license information.