Quick Start

Get up and running with Cobjectric in 5 minutes and start computing metrics on your data.

Installation

pip install cobjectric

Define Your Model

First, create a model by subclassing BaseModel with typed fields:

from cobjectric import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

Compute Fill Rate

Fill Rate measures how complete your data is: each field scores 1.0 when it is filled and 0.0 when it is missing, and the mean of those scores summarizes the whole object.

person = Person.from_dict({
    "name": "John Doe",
    "age": 30,
    # email is missing
})

result = person.compute_fill_rate()
print(result.fields.name.value)   # 1.0 (present)
print(result.fields.age.value)    # 1.0 (present)
print(result.fields.email.value)  # 0.0 (missing)
print(result.mean())              # 0.667 (2 out of 3 fields)
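
To make the arithmetic behind those numbers concrete, here is a minimal plain-Python sketch of the same idea. It does not use Cobjectric and is not its implementation; the helper name fill_rate and the dict-based record are assumptions made for illustration only.

```python
from statistics import mean

def fill_rate(record: dict, fields: list[str]) -> dict[str, float]:
    """Hypothetical helper: score each expected field 1.0 if present, else 0.0."""
    return {name: 1.0 if name in record else 0.0 for name in fields}

FIELDS = ["name", "age", "email"]

# Same input as above: email is missing
scores = fill_rate({"name": "John Doe", "age": 30}, FIELDS)

print(scores)                           # {'name': 1.0, 'age': 1.0, 'email': 0.0}
print(round(mean(scores.values()), 3))  # 0.667
```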

Compare Data Completeness with Fill Rate Accuracy

Fill Rate Accuracy compares two objects to check if their field states match (both filled or both missing). It's useful for validating data pipeline outputs.

# Data generated by your pipeline
got = Person.from_dict({"name": "John", "age": 30})  # email missing

# Expected data
expected = Person.from_dict({
    "name": "Jane",  # Different value (doesn't matter!)
    "age": 25,       # Different value (doesn't matter!)
    "email": "jane@example.com"
})

accuracy = got.compute_fill_rate_accuracy(expected)
print(accuracy.fields.name.value)   # 1.0 (both filled)
print(accuracy.fields.age.value)    # 1.0 (both filled)
print(accuracy.fields.email.value)  # 0.0 (got missing, expected filled)
print(accuracy.mean())              # 0.667

Key point: Fill Rate Accuracy only checks state (filled/missing), not actual values. The fact that name and age have different values doesn't affect the result.
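
The state-only comparison can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Cobjectric's code: the helper fill_rate_accuracy and the dict inputs are hypothetical, and the email address is a placeholder.

```python
def fill_rate_accuracy(got: dict, expected: dict, fields: list[str]) -> dict[str, float]:
    """Hypothetical helper: 1.0 when a field is filled in both or missing in both."""
    return {
        name: 1.0 if (name in got) == (name in expected) else 0.0
        for name in fields
    }

FIELDS = ["name", "age", "email"]
got = {"name": "John", "age": 30}  # email missing
expected = {"name": "Jane", "age": 25, "email": "jane@example.com"}

accuracy = fill_rate_accuracy(got, expected, FIELDS)
print(accuracy)  # {'name': 1.0, 'age': 1.0, 'email': 0.0}
```

Values are never compared here, only presence, which is why the differing names and ages still score 1.0.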

Validate Data Values with Similarity

To validate the values themselves, use Similarity, which compares field values and supports fuzzy matching. The example below uses identical inputs, so every field scores 1.0.

got = Person.from_dict({
    "name": "john doe",
    "age": 30,
    "email": "john.doe@example.com"
})

expected = Person.from_dict({
    "name": "john doe",
    "age": 30,
    "email": "john.doe@example.com"
})

similarity = got.compute_similarity(expected)
print(similarity.fields.name.value)   # 1.0 (exact match)
print(similarity.fields.age.value)    # 1.0 (exact match)
print(similarity.fields.email.value)  # 1.0 (exact match)
print(similarity.mean())              # 1.0
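
The example above only shows exact matches. To illustrate what a fuzzy score can look like, the sketch below uses Python's standard difflib module. Which algorithm Cobjectric actually applies is not stated here, so treat text_similarity as a hypothetical stand-in rather than the library's behavior.

```python
from difflib import SequenceMatcher

def text_similarity(a: str, b: str) -> float:
    """Hypothetical scorer: case-insensitive ratio in [0.0, 1.0] via difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(text_similarity("john doe", "john doe"))  # 1.0 (identical)
print(text_similarity("John Doe", "john doe"))  # 1.0 (identical after lowercasing)

# A near miss lands strictly between 0 and 1
score = text_similarity("jon doe", "john doe")
print(0.0 < score < 1.0)  # True
```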

Optional Fields

You can define optional fields using | None:

class Person(BaseModel):
    name: str
    email: str | None

person = Person(name="John", email=None)
print(person.fields.email.value)  # None

Handle Missing Values

If a required field is not provided or has an invalid type, its value is MissingValue. The example below uses the three-field Person from Define Your Model:

from cobjectric import MissingValue

person = Person(name="Jane", age="invalid")  # age is not an int
print(person.fields.age.value is MissingValue)  # True
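
A dedicated sentinel is useful because None is already a legitimate value for optional fields, so it cannot also mean "absent or invalid". The sketch below shows that pattern in plain Python. The names MISSING and coerce are hypothetical and do not reflect how Cobjectric implements MissingValue.

```python
class _Missing:
    """Hypothetical sentinel type standing in for the library's MissingValue."""

    def __repr__(self) -> str:
        return "MISSING"

MISSING = _Missing()

def coerce(raw: dict, schema: dict[str, type]) -> dict:
    """Keep values whose type matches the schema; mark everything else MISSING."""
    return {
        name: raw[name] if name in raw and isinstance(raw[name], expected) else MISSING
        for name, expected in schema.items()
    }

schema = {"name": str, "age": int, "email": str}
values = coerce({"name": "Jane", "age": "invalid"}, schema)

print(values["name"])              # Jane
print(values["age"] is MISSING)    # True (wrong type)
print(values["email"] is MISSING)  # True (not provided)
```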

Export to Pandas

You can export results to pandas Series and DataFrames for statistical analysis and visualization. Note: this requires installing the pandas extra (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "cobjectric[pandas]"

# Uses the three-field Person from Define Your Model
person1 = Person(name="John", age=30, email="john@example.com")
person2 = Person(name="Jane", age=25)  # email missing
person3 = Person(name="Bob", age=40, email="bob@example.com")

# Combine results
result1 = person1.compute_fill_rate()
result2 = person2.compute_fill_rate()
result3 = person3.compute_fill_rate()

collection = result1 + result2 + result3

# Export to pandas DataFrame
df = collection.to_dataframe()
print(df)
#    name  age  email
# 0   1.0  1.0    1.0
# 1   1.0  1.0    0.0
# 2   1.0  1.0    1.0

# Calculate statistics
print(collection.mean())  # {'name': 1.0, 'age': 1.0, 'email': 0.666...}

See the Pandas Export Guide for more details.
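
For readers who want to check the per-field means by hand, the same aggregation can be reproduced with the standard library alone. This is a sketch that assumes the three rows shown in the DataFrame output above; it does not call Cobjectric or pandas.

```python
from statistics import mean

# One dict per record, mirroring the rows of the DataFrame above
rows = [
    {"name": 1.0, "age": 1.0, "email": 1.0},
    {"name": 1.0, "age": 1.0, "email": 0.0},
    {"name": 1.0, "age": 1.0, "email": 1.0},
]

# Average each column across all records
column_means = {field: mean(row[field] for row in rows) for field in rows[0]}

print(column_means["name"])             # 1.0
print(column_means["age"])              # 1.0
print(round(column_means["email"], 3))  # 0.667
```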

Next Steps

API Reference

See the API Reference for complete documentation of all classes and functions.