Utilities (utils)
The utils module provides utility functions for deterministic UUID generation and other helper functions.
namespace_uuid()
Generates deterministic UUIDs for KGX edge identifiers using UUID v3 (MD5-based namespacing).
Function Signature
def namespace_uuid(domain: Any, *values: list[Any]) -> str
Parameters
domain: Any
Domain string used to create the namespace UUID.
Converted to string internally. Common domains:
- "edges" - For knowledge graph edges
- "nodes" - For custom node IDs
- "tablassert" - For application-specific IDs
*values: list[Any]
Variable number of values to include in the UUID.
Values are:
1. Converted to strings
2. Filtered (None values removed)
3. Joined with tabs ("\t")
4. Hashed within the domain namespace
Return Value
Returns a string representation of UUID v3.
Format: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
How It Works
Step 1: Create domain namespace
domain_uuid = uuid3(UUID("00000000-0000-0000-0000-000000000000"), domain)
Each domain gets a unique namespace UUID.
Step 2: Join values
joined = "\t".join([str(v) for v in values if v])
Example: ("HGNC:1234", "biolink:treats", "MONDO:0005148") → "HGNC:1234\tbiolink:treats\tMONDO:0005148"
Step 3: Generate UUID within namespace
return str(uuid3(domain_uuid, joined))
Deterministic Behavior
Same inputs always produce same UUID:
uuid1 = namespace_uuid("edges", "HGNC:1", "treats", "MONDO:1")
uuid2 = namespace_uuid("edges", "HGNC:1", "treats", "MONDO:1")
assert uuid1 == uuid2 # Always True
Different inputs produce different UUIDs:
uuid1 = namespace_uuid("edges", "HGNC:1", "treats", "MONDO:1")
uuid2 = namespace_uuid("edges", "HGNC:1", "treats", "MONDO:2")
assert uuid1 != uuid2 # Always True
Different domains produce different UUIDs:
uuid1 = namespace_uuid("edges", "A", "B")
uuid2 = namespace_uuid("nodes", "A", "B")
assert uuid1 != uuid2 # Different namespaces
Use Case: KGX Edge IDs
Tablassert uses this function to generate edge identifiers:
edge_id = namespace_uuid(
"edges",
subject_curie,
predicate,
object_curie,
*qualifiers,
publication_id
)
Example:
edge_id = namespace_uuid(
"edges",
"HGNC:11998", # TP53
"biolink:associated_with",
"MONDO:0005148", # Type 2 diabetes
"PMC11708054" # Publication
)
# Returns: "uuid:a1b2c3d4-e5f6-7890-abcd-ef1234567890"
Benefits: - Reproducible: Same edge always gets same ID across runs - Collision-resistant: MD5 hash makes collisions extremely unlikely - Traceable: ID incorporates edge components (subject, predicate, object, provenance)
KGX Compliance
NCATS Translator KGX specification requires: - Edge IDs must be globally unique - Edge IDs should be deterministic when possible
namespace_uuid() satisfies both requirements:
- Unique: UUID v3 with domain namespacing
- Deterministic: Same inputs → same output
Example Usage
Simple case:
from tablassert.utils import namespace_uuid
# Generate edge ID
edge_id = namespace_uuid("edges", "subject", "predicate", "object")
print(edge_id) # "uuid:12345678-1234-1234-1234-123456789abc"
With qualifiers:
# Include anatomical context qualifier
edge_id = namespace_uuid(
"edges",
"HGNC:1",
"biolink:expressed_in",
"UBERON:0002048", # Lung
"anatomical_context",
"UBERON:0002048"
)
Filtering None values:
# Automatically removes None
edge_id = namespace_uuid("edges", "A", None, "B", None)
# Equivalent to: namespace_uuid("edges", "A", "B")
Performance
- O(1) time complexity
- No external dependencies (uses Python stdlib
uuid) - Minimal memory (temporary string allocation only)
Suitable for millions of ID generations per second.
Related Functions
basespace(domain) - Creates namespace UUID from domain
def basespace(domain: str) -> UUID:
namespace = UUID("00000000-0000-0000-0000-000000000000")
return uuid3(namespace, domain)
Used internally by namespace_uuid().
Next Steps
- Entity Resolution - Core entity mapping
- Tutorial - See UUIDs in action