Quick Start
Get up and running with Cobjectric in 5 minutes and start computing metrics on your data.
Installation
Define Your Model
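First, create a model by subclassing BaseModel with typed fields. The examples below assume a Person model with three fields: name (str), age (int), and email (str).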
Compute Fill Rate
Fill Rate measures how complete your data is: each field scores 1.0 if it is filled and 0.0 if it is missing.
person = Person.from_dict({
    "name": "John Doe",
    "age": 30,
    # email is missing
})
result = person.compute_fill_rate()
print(result.fields.name.value) # 1.0 (present)
print(result.fields.age.value) # 1.0 (present)
print(result.fields.email.value) # 0.0 (missing)
print(result.mean()) # 0.667 (2 out of 3 fields)
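To make the arithmetic behind these numbers concrete, here is a minimal plain-Python sketch of the fill-rate idea. It does not use Cobjectric and is not the library's implementation; the `MISSING` sentinel, the `FIELDS` tuple, and the `fill_rate` helper are hypothetical names introduced only for illustration.

```python
# Illustrative only: a plain-Python model of the fill-rate idea, not Cobjectric's code.
MISSING = object()  # hypothetical sentinel standing in for "field not provided"

FIELDS = ("name", "age", "email")


def fill_rate(data: dict) -> dict:
    """Score each expected field 1.0 if present in `data`, else 0.0."""
    return {
        field: 0.0 if data.get(field, MISSING) is MISSING else 1.0
        for field in FIELDS
    }


scores = fill_rate({"name": "John Doe", "age": 30})  # email is missing
mean_score = sum(scores.values()) / len(scores)

print(scores)                # {'name': 1.0, 'age': 1.0, 'email': 0.0}
print(round(mean_score, 3))  # 0.667
```

The mean is simply the number of filled fields divided by the number of declared fields, which is why two filled fields out of three give 0.667.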
Compare Data Completeness with Fill Rate Accuracy
Fill Rate Accuracy compares two objects to check if their field states match (both filled or both missing). It's useful for validating data pipeline outputs.
# Data generated by your pipeline
got = Person.from_dict({"name": "John", "age": 30}) # email missing
# Expected data
expected = Person.from_dict({
    "name": "Jane",  # Different value (doesn't matter!)
    "age": 25,  # Different value (doesn't matter!)
    "email": "jane.doe@example.com"
})
accuracy = got.compute_fill_rate_accuracy(expected)
print(accuracy.fields.name.value) # 1.0 (both filled)
print(accuracy.fields.age.value) # 1.0 (both filled)
print(accuracy.fields.email.value) # 0.0 (got missing, expected filled)
print(accuracy.mean()) # 0.667
Key point: Fill Rate Accuracy only checks state (filled/missing), not actual values. The fact that name and age have different values doesn't affect the result.
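The state-only comparison can be sketched in a few lines of plain Python. This is a hedged illustration of the concept, not Cobjectric's code: `fill_rate_accuracy` and `FIELDS` are hypothetical names, and plain dicts stand in for model instances.

```python
# Illustrative only: models the "same fill state" check, not Cobjectric's code.
FIELDS = ("name", "age", "email")


def fill_rate_accuracy(got: dict, expected: dict) -> dict:
    """Score 1.0 when a field is filled in both dicts or missing in both, else 0.0."""
    return {
        field: 1.0 if (field in got) == (field in expected) else 0.0
        for field in FIELDS
    }


got = {"name": "John", "age": 30}  # email missing
expected = {"name": "Jane", "age": 25, "email": "jane.doe@example.com"}

accuracy = fill_rate_accuracy(got, expected)
print(accuracy)  # {'name': 1.0, 'age': 1.0, 'email': 0.0}
print(round(sum(accuracy.values()) / len(accuracy), 3))  # 0.667
```

Because the check only asks whether each key is present on both sides, the differing name and age values never enter the score.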
Validate Data Values with Similarity
To validate the actual values as well, use Similarity, which compares field values and supports fuzzy matching.
got = Person.from_dict({
    "name": "john doe",
    "age": 30,
    "email": "john.doe@example.com"
})
expected = Person.from_dict({
    "name": "john doe",
    "age": 30,
    "email": "john.doe@example.com"
})
similarity = got.compute_similarity(expected)
print(similarity.fields.name.value) # 1.0 (exact match)
print(similarity.fields.age.value) # 1.0 (exact match)
print(similarity.fields.email.value) # 1.0 (exact match)
print(similarity.mean()) # 1.0
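The example above only exercises exact matches. As a rough illustration of what fuzzy matching means for a string field, the sketch below uses `difflib.SequenceMatcher` from the Python standard library. This is an assumption made for illustration: Cobjectric's own similarity functions may use a different algorithm and return different scores, and `field_similarity` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def field_similarity(got, expected) -> float:
    """Return 1.0 for equal values, a ratio in [0, 1] for strings, else 0.0."""
    if got == expected:
        return 1.0
    if isinstance(got, str) and isinstance(expected, str):
        # ratio() is 2*M/T: M matching characters over T total characters.
        return SequenceMatcher(None, got.lower(), expected.lower()).ratio()
    return 0.0


print(field_similarity("john doe", "john doe"))  # 1.0
print(field_similarity(30, 30))                  # 1.0
print(field_similarity(30, 31))                  # 0.0

# A near miss scores strictly between 0 and 1 instead of failing outright.
partial = field_similarity("john doe", "jon doe")
print(0.0 < partial < 1.0)  # True
```

A graded score like this is useful when pipeline output contains small typos or formatting differences that should count as mostly correct rather than wrong.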
Optional Fields
You can define optional fields using | None:
class Person(BaseModel):
    name: str
    email: str | None
person = Person(name="John", email=None)
print(person.fields.email.value) # None
Handle Missing Values
If a required field is not provided or has an invalid type, its value is set to MissingValue:
from cobjectric import MissingValue
person = Person(name="Jane", age="invalid") # age is not an int
print(person.fields.age.value is MissingValue) # True
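The behavior described above follows the common sentinel pattern. The following plain-Python sketch shows that pattern under stated assumptions: `_Missing`, `MISSING`, `SCHEMA`, and `load` are hypothetical names, and Cobjectric's MissingValue may be implemented differently.

```python
class _Missing:
    """Hypothetical sentinel type; not Cobjectric's MissingValue."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()

SCHEMA = {"name": str, "age": int}


def load(data: dict) -> dict:
    """Keep values whose type matches the schema; mark everything else MISSING."""
    return {
        field: data[field] if isinstance(data.get(field), expected_type) else MISSING
        for field, expected_type in SCHEMA.items()
    }


person = load({"name": "Jane", "age": "invalid"})  # age is not an int
print(person["name"])               # Jane
print(person["age"] is MISSING)     # True
print(load({})["name"] is MISSING)  # True (not provided)
```

A dedicated sentinel compared with `is` keeps "missing" distinct from None, which remains a legitimate value for optional fields.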
Export to Pandas
You can export results to pandas Series and DataFrames for statistical analysis and visualization. Note: this requires installing Cobjectric with its pandas extra.
person1 = Person(name="John", age=30, email="john.doe@example.com")
person2 = Person(name="Jane", age=25) # email missing
person3 = Person(name="Bob", age=40, email="bob@example.com")
# Combine results
result1 = person1.compute_fill_rate()
result2 = person2.compute_fill_rate()
result3 = person3.compute_fill_rate()
collection = result1 + result2 + result3
# Export to pandas DataFrame
df = collection.to_dataframe()
print(df)
# name age email
# 0 1.0 1.0 1.0
# 1 1.0 1.0 0.0
# 2 1.0 1.0 1.0
# Calculate statistics
print(collection.mean()) # {'name': 1.0, 'age': 1.0, 'email': 0.666...}
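The per-field statistics are column-wise averages over the rows of that table. The sketch below reproduces the calculation with the standard library only, using plain dicts as stand-ins for the per-object results; it illustrates the arithmetic and is not how Cobjectric or pandas computes it internally.

```python
from statistics import fmean

# One row of per-field fill-rate scores per object, matching the table above.
rows = [
    {"name": 1.0, "age": 1.0, "email": 1.0},
    {"name": 1.0, "age": 1.0, "email": 0.0},
    {"name": 1.0, "age": 1.0, "email": 1.0},
]

# Column-wise mean: average each field across all rows.
field_means = {field: fmean(row[field] for row in rows) for field in rows[0]}

print(field_means["name"])             # 1.0
print(round(field_means["email"], 3))  # 0.667
```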
See the Pandas Export Guide for more details.
Next Steps
- Learn more about Similarity for fuzzy matching and value comparison
- Explore Field Specifications to customize metric functions
- Check out Pre-defined Specs for optimized configurations
- See Examples for real-world scenarios
- Export results to pandas with Pandas Export for analysis and visualization
API Reference
See the API Reference for complete documentation of all classes and functions.