Tablassert
Tablassert is a declarative, high-performance knowledge graph backend. It extracts knowledge assertions from tabular data and exports them as NCATS Translator-compliant KGX (Knowledge Graph Exchange) NDJSON.
What is Tablassert?
Tablassert transforms biomedical tabular data (Excel, CSV, TSV) into knowledge graphs through:
- Declarative YAML configuration - Define data transformations without code
- Entity resolution - Map text to biological entities (genes, diseases, chemicals) using comprehensive databases
- Multi-stage quality control - Exact matching, fuzzy matching, and BioBERT semantic validation
- KGX compliance - Outputs NCATS Translator-compatible NDJSON for node and edge files
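To make the output format concrete, here is a minimal sketch of what NDJSON node and edge files look like: one JSON object per line, with nodes carrying `id`, `name`, and `category`, and edges carrying `subject`, `predicate`, and `object`. This is not Tablassert's own writer. The helper names (`write_ndjson`, `read_ndjson`) and the `EX:` identifiers are made up for illustration, and real Translator edges usually carry additional provenance fields not shown here.

```python
import json
import tempfile
from pathlib import Path


def write_ndjson(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line, the layout NDJSON requires."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_ndjson(path: Path) -> list[dict]:
    """Read an NDJSON file back into a list of dictionaries."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Placeholder records: the "EX:" identifiers are hypothetical, not real CURIEs.
nodes = [
    {"id": "EX:gene-1", "name": "example gene", "category": ["biolink:Gene"]},
    {"id": "EX:disease-1", "name": "example disease", "category": ["biolink:Disease"]},
]
edges = [
    {
        "id": "EX:edge-1",
        "subject": "EX:gene-1",
        "predicate": "biolink:related_to",
        "object": "EX:disease-1",
    },
]

# Round-trip both files through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    nodes_path = Path(tmp) / "nodes.ndjson"
    edges_path = Path(tmp) / "edges.ndjson"
    write_ndjson(nodes_path, nodes)
    write_ndjson(edges_path, edges)
    loaded_nodes = read_ndjson(nodes_path)
    loaded_edges = read_ndjson(edges_path)

# Every edge endpoint should refer to a node present in the node file.
node_ids = {node["id"] for node in loaded_nodes}
dangling = [
    edge["id"]
    for edge in loaded_edges
    if edge["subject"] not in node_ids or edge["object"] not in node_ids
]
print(len(loaded_nodes), len(loaded_edges), dangling)
```

The dangling-endpoint check at the end is a cheap sanity test worth running on any exported graph before loading it elsewhere.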
Key Features
- Named Entity Recognition: Case-dependent, provenance-rich NER with taxonomic filtering
- Quality Control: Three-stage validation (exact → fuzzy → BERT embeddings)
- Biolink Compliance: Uses Biolink categories and predicates throughout
- Performance: Parallel processing with disk caching for expensive operations
- Reproducibility: UV-based development environment with deterministic builds
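The staged quality control listed above (exact, then fuzzy, then a semantic check) can be sketched in a few lines. This is an illustration of the general idea only, not Tablassert's implementation: it uses the standard library's `difflib` for fuzzy scoring, lowercases its input for simplicity, and stands in for the embedding stage with an optional `validate` callback. The `LOOKUP` table, the `resolve` function, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical synonym table mapping lowercase labels to identifiers.
LOOKUP = {
    "tp53": "NCBIGene:7157",
    "brca1": "NCBIGene:672",
}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def resolve(
    text: str,
    lookup: dict[str, str],
    fuzzy_threshold: float = 0.85,
    validate: Optional[Callable[[str, str], bool]] = None,
) -> tuple[Optional[str], str]:
    """Resolve text to an identifier, reporting which stage decided."""
    key = text.strip().lower()

    # Stage 1: exact match on the normalized label.
    if key in lookup:
        return lookup[key], "exact"

    # Stage 2: best fuzzy candidate, accepted only above the threshold.
    best = max(lookup, key=lambda candidate: similarity(key, candidate), default=None)
    if best is None or similarity(key, best) < fuzzy_threshold:
        return None, "unmatched"

    # Stage 3: optional semantic check (a stand-in for embedding validation).
    if validate is not None and not validate(text, best):
        return None, "rejected"

    return lookup[best], "fuzzy"


print(resolve("TP53", LOOKUP))
print(resolve("brca-1", LOOKUP))
print(resolve("insulin", LOOKUP))
print(resolve("brca-1", LOOKUP, validate=lambda text, candidate: False))
```

Returning the deciding stage alongside the identifier keeps a record of how each match was made, which is useful when auditing fuzzy matches later.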
Quick Start
# Install from PyPI (UV)
uv tool install tablassert
tablassert --help
# Install from PyPI (pip)
pip install tablassert
tablassert --help
# Install runtime-compatible Polars build
# (for CPUs without the required Polars instructions)
uv tool install "tablassert[rtcompat]"
# or
pip install "tablassert[rtcompat]"
tablassert --help
# Or install latest from GitHub main
uv tool install git+https://github.com/SkyeAv/Tablassert.git@main
tablassert --help
The rtcompat extra is defined in pyproject.toml. It installs a runtime-compatible
Polars build for CPUs that lack the instructions the default Polars build requires.
For development from source:
git clone https://github.com/SkyeAv/Tablassert.git
cd Tablassert
uv sync
# Run with your configuration
uv run tablassert build-knowledge-graph <config>
Documentation Sections
- Installation - Installation methods (PyPI, GitHub main, source development)
- CLI Reference - Command-line interface usage
- Tutorial - Step-by-step example with synthetic data
- Configuration - Graph and table configuration reference
- API Reference - Core functions documentation
Authors
- Skye Lane Goetz - Institute for Systems Biology, CalPoly SLO
- Gwênlyn Glusman - Institute for Systems Biology
- Jared C. Roach - Institute for Systems Biology
License
See repository for license information.