Tablassert

Tablassert is a high-performance, declarative knowledge graph backend that extracts knowledge assertions from tabular data and exports NCATS Translator-compliant KGX (Knowledge Graph Exchange) NDJSON.

What is Tablassert?

Tablassert transforms biomedical tabular data (Excel, CSV, TSV) into knowledge graphs through:

Declarative YAML configuration - Define data transformations without code
Entity resolution - Map text to biological entities (genes, diseases, chemicals) using comprehensive databases
Multi-stage quality control - Exact matching, fuzzy matching, and BioBERT semantic validation
KGX compliance - Outputs NCATS Translator-compatible NDJSON node and edge files
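KGX NDJSON is newline-delimited JSON: each line of the nodes file holds one node object, and each line of the edges file holds one edge object. The minimal sketch below is not Tablassert's own code. It serializes two nodes and one edge using core KGX fields (`id`, `name`, `category` for nodes; `subject`, `predicate`, `object` for edges). The CURIEs, the Biolink predicate, and the helper name `to_ndjson` are illustrative assumptions, and real Tablassert output may carry additional provenance fields.

```python
import json

# Illustrative node records: example CURIEs with Biolink categories.
nodes = [
    {"id": "NCBIGene:7157", "name": "TP53", "category": ["biolink:Gene"]},
    {"id": "MONDO:0007254", "name": "breast cancer", "category": ["biolink:Disease"]},
]

# Illustrative edge record: a subject-predicate-object assertion between the nodes.
edges = [
    {
        "subject": "NCBIGene:7157",
        "predicate": "biolink:gene_associated_with_condition",
        "object": "MONDO:0007254",
    },
]


def to_ndjson(records):
    """Serialize records as NDJSON: one JSON object per line, newline-terminated."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


nodes_ndjson = to_ndjson(nodes)
edges_ndjson = to_ndjson(edges)
print(nodes_ndjson, end="")
print(edges_ndjson, end="")
```

Writing each string to its own file yields the separate node and edge files that KGX tooling expects. Because each record occupies exactly one line, the files can be streamed, split, or concatenated without parsing the whole graph.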

Key Features

Named Entity Recognition: Case-dependent, provenance-rich NER with taxonomic filtering
Quality Control: Three-stage validation (exact → fuzzy → BioBERT embeddings)
Biolink Compliance: Uses Biolink categories and predicates throughout
Performance: Parallel processing with disk caching for expensive operations
Reproducible: uv-based development environment with deterministic builds
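The three-stage validation can be pictured as a cascade in which the cheapest, most trusted check runs first. The sketch below is an illustration under stated assumptions, not Tablassert's implementation: `SYNONYMS` is a hypothetical lookup table, the fuzzy stage uses Python's standard `difflib` in place of whatever matcher Tablassert actually uses, and the semantic stage is an injected `semantic_ok` callback standing in for BioBERT embedding similarity. It also assumes that the semantic check gates fuzzy candidates rather than exact hits.

```python
import difflib
from typing import Callable, Optional, Tuple

# Hypothetical synonym table: normalized text -> CURIE (illustrative only).
SYNONYMS = {
    "tp53": "NCBIGene:7157",
    "tumor protein p53": "NCBIGene:7157",
    "breast cancer": "MONDO:0007254",
}


def resolve(
    text: str,
    semantic_ok: Callable[[str, str], bool],
    fuzzy_cutoff: float = 0.85,
) -> Optional[Tuple[str, str]]:
    """Return (curie, stage) from the first stage that accepts a match, else None."""
    key = text.strip().lower()

    # Stage 1: exact match on the normalized string (cheapest, most trusted).
    if key in SYNONYMS:
        return SYNONYMS[key], "exact"

    # Stage 2: fuzzy string match; catches typos and small spelling variants.
    candidates = difflib.get_close_matches(key, SYNONYMS, n=1, cutoff=fuzzy_cutoff)
    if not candidates:
        return None

    # Stage 3 (assumed to gate fuzzy hits): a semantic check on the candidate.
    # Tablassert uses BioBERT embeddings here; this sketch injects a callback.
    best = candidates[0]
    if semantic_ok(key, best):
        return SYNONYMS[best], "fuzzy+semantic"
    return None


def toy_semantic_ok(a: str, b: str) -> bool:
    """Stand-in for an embedding-similarity threshold: require a shared word."""
    return bool(set(a.split()) & set(b.split()))


print(resolve("TP53", toy_semantic_ok))          # ('NCBIGene:7157', 'exact')
print(resolve("breast cancr", toy_semantic_ok))  # ('MONDO:0007254', 'fuzzy+semantic')
print(resolve("aspirin", toy_semantic_ok))       # None
```

In a real pipeline, `semantic_ok` would compare embeddings of the input text and the candidate label, for example against a cosine-similarity threshold. The toy version exists only so the example runs without a model. Keeping the expensive semantic stage last means it only runs on the minority of inputs that fail exact matching but pass the fuzzy filter.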

Quick Start

# Install from PyPI (uv)
uv tool install tablassert
tablassert --help

# Install from PyPI (pip)
pip install tablassert
tablassert --help

# Install runtime-compatible Polars build
# (for CPUs without the required Polars instructions)
uv tool install "tablassert[rtcompat]"
# or
pip install "tablassert[rtcompat]"
tablassert --help

# Or install the latest version from the GitHub main branch
uv tool install git+https://github.com/SkyeAv/Tablassert.git@main
tablassert --help

The tablassert[rtcompat] extra is defined in pyproject.toml. It installs a runtime-compatible Polars build for CPUs that lack the instruction-set support required by the default Polars build.

For development from source:

git clone https://github.com/SkyeAv/Tablassert.git
cd Tablassert
uv sync

# Run with your configuration
uv run tablassert build-knowledge-graph <config>
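Because the KGX output is plain NDJSON, a finished build can be sanity-checked with the Python standard library alone. The sketch below counts node categories and edge predicates. The function name `summarize_ndjson` and the file names `nodes.ndjson` and `edges.ndjson` are hypothetical; the demo writes its own small files, and the names and contents Tablassert actually produces may differ.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_ndjson(path: Path, field: str) -> Counter:
    """Count the values of `field` across an NDJSON file (one JSON object per line)."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            value = json.loads(line).get(field)
            if value is None:
                continue  # field absent on this record
            # KGX node categories are lists; edge predicates are single strings.
            for item in (value if isinstance(value, list) else [value]):
                counts[item] += 1
    return counts


# Demo on small hypothetical files; real output names and contents will differ.
with tempfile.TemporaryDirectory() as tmp:
    nodes_path = Path(tmp) / "nodes.ndjson"
    edges_path = Path(tmp) / "edges.ndjson"
    nodes_path.write_text(
        '{"id": "NCBIGene:7157", "category": ["biolink:Gene"]}\n'
        '{"id": "MONDO:0007254", "category": ["biolink:Disease"]}\n',
        encoding="utf-8",
    )
    edges_path.write_text(
        '{"subject": "NCBIGene:7157", "predicate": "biolink:related_to", "object": "MONDO:0007254"}\n'
        '{"subject": "NCBIGene:672", "predicate": "biolink:related_to", "object": "MONDO:0007254"}\n',
        encoding="utf-8",
    )
    category_counts = summarize_ndjson(nodes_path, "category")
    predicate_counts = summarize_ndjson(edges_path, "predicate")

print(category_counts)
print(predicate_counts)
```

Reading line by line keeps memory flat even for large graphs, which suits NDJSON better than loading the whole file with a single `json.load`.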

Documentation Sections

Installation - Installation methods (PyPI, GitHub main, source development)
CLI Reference - Command-line interface usage
Tutorial - Step-by-step example with synthetic data
Configuration - Graph and table configuration reference
API Reference - Core functions documentation

Authors

Skye Lane Goetz - Institute for Systems Biology, CalPoly SLO
Gwênlyn Glusman - Institute for Systems Biology
Jared C. Roach - Institute for Systems Biology

License

See the repository for license information.