Validating File Data
pydantic is a great tool for validating data coming from various sources.
In this section, we will look at how to validate data from different types of files.
Note
If you're using any of the file formats below to parse configuration or settings, you might want to
consider the pydantic-settings library, which offers built-in
support for parsing this type of data.
JSON data
.json files are a common way to store key / value data in a human-readable format.
Here is an example of a .json file:
{
    "name": "John Doe",
    "age": 30,
    "email": "john@example.com"
}
To validate this data, we can use a pydantic model:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr  # EmailStr requires the optional email-validator package


json_string = pathlib.Path('person.json').read_text()
person = Person.model_validate_json(json_string)
print(repr(person))
#> Person(name='John Doe', age=30, email='john@example.com')
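To try this without creating person.json by hand, here is a self-contained sketch that first writes the file to a temporary directory. It also passes the raw bytes from Path.read_bytes() straight to model_validate_json, which accepts bytes as well as str. The email field is typed as a plain str here, an assumption made only so the sketch runs without the optional email-validator package.

```python
import pathlib
import tempfile

from pydantic import BaseModel, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    # Plain str instead of EmailStr so no optional dependency is needed.
    email: str


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / 'person.json'
    path.write_text('{"name": "John Doe", "age": 30, "email": "john@example.com"}')
    # model_validate_json accepts bytes, so no explicit decoding step is needed.
    person = Person.model_validate_json(path.read_bytes())

print(repr(person))
#> Person(name='John Doe', age=30, email='john@example.com')
```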
If the data in the file is not valid, pydantic will raise a ValidationError.
Let's say we have the following .json file:
{
    "age": -30,
    "email": "not-an-email-address"
}
This data is flawed for three reasons:
1. It's missing the name field.
2. The age field is negative.
3. The email field is not a valid email address.
When we try to validate this data, pydantic raises a ValidationError with all of the
above issues:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
try:
    person = Person.model_validate_json(json_string)
except ValidationError as err:
    print(err)
    """
    3 validation errors for Person
    name
      Field required [type=missing, input_value={'age': -30, 'email': 'not-an-email-address'}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.10/v/missing
    age
      Input should be greater than 0 [type=greater_than, input_value=-30, input_type=int]
        For further information visit https://errors.pydantic.dev/2.10/v/greater_than
    email
      value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email-address', input_type=str]
    """
A .json file often holds many records of the same type.
For example, you might have a list of people:
[
    {
        "name": "John Doe",
        "age": 30,
        "email": "john@example.com"
    },
    {
        "name": "Jane Doe",
        "age": 25,
        "email": "jane@example.com"
    }
]
In this case, you can validate the data against the list[Person] type:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, TypeAdapter


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


person_list_adapter = TypeAdapter(list[Person])

json_string = pathlib.Path('people.json').read_text()
people = person_list_adapter.validate_json(json_string)
print(people)
#> [Person(name='John Doe', age=30, email='john@example.com'), Person(name='Jane Doe', age=25, email='jane@example.com')]
Here we use TypeAdapter to validate a list of Person objects. TypeAdapter is a Pydantic construct for validating data against a single type, which does not have to be a BaseModel subclass.
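When one entry in the list is invalid, the error location starts with that entry's index, which makes it easy to find the offending record in a large file. A sketch using an inline JSON string in place of people.json, again without the email field:

```python
from pydantic import BaseModel, PositiveInt, TypeAdapter, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


person_list_adapter = TypeAdapter(list[Person])
json_string = '[{"name": "John Doe", "age": 30}, {"name": "Jane Doe", "age": -25}]'

locations = []
try:
    person_list_adapter.validate_json(json_string)
except ValidationError as err:
    # For list items, the first element of 'loc' is the index of the bad entry.
    locations = [e['loc'] for e in err.errors()]

print(locations)
#> [(1, 'age')]
```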
JSON lines files
Similar to validating a list of objects from a .json file, you can validate a list of objects from a .jsonl file.
.jsonl files are a sequence of JSON objects separated by newlines.
Consider the following .jsonl file:
{"name": "John Doe", "age": 30, "email": "john@example.com"}
{"name": "Jane Doe", "age": 25, "email": "jane@example.com"}
We can validate this data with a similar approach to the one we used for .json files:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


# Each line of a .jsonl file is a complete JSON document.
json_lines = pathlib.Path('people.jsonl').read_text().splitlines()
people = [Person.model_validate_json(line) for line in json_lines]
print(people)
#> [Person(name='John Doe', age=30, email='john@example.com'), Person(name='Jane Doe', age=25, email='jane@example.com')]
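Reading the whole file with read_text() is fine for small files. For large .jsonl files, or when one bad line should not abort the whole run, one option is to iterate over the file line by line and record which lines fail. A sketch, using io.StringIO in place of a real file and omitting the email field:

```python
import io

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


# io.StringIO stands in for open('people.jsonl') so the sketch is self-contained.
jsonl_file = io.StringIO(
    '{"name": "John Doe", "age": 30}\n'
    '\n'
    '{"name": "Jane Doe", "age": -25}\n'
)

people = []
bad_lines = []
# Iterating over the file object reads one line at a time.
for line_number, line in enumerate(jsonl_file, start=1):
    if not line.strip():
        continue  # skip blank lines between records
    try:
        people.append(Person.model_validate_json(line))
    except ValidationError as err:
        bad_lines.append((line_number, err.error_count()))

print(people)
#> [Person(name='John Doe', age=30)]
print(bad_lines)
#> [(3, 1)]
```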
CSV files
CSV is one of the most common file formats for storing tabular data.
To validate data from a CSV file, you can use the csv module from the Python standard library to load
the data and validate it against a Pydantic model.
Consider the following CSV file:
name,age,email
John Doe,30,john@example.com
Jane Doe,25,jane@example.com
Here's how we validate that data:
import csv

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


# The csv module documentation recommends opening files with newline=''.
with open('people.csv', newline='') as f:
    reader = csv.DictReader(f)
    people = [Person.model_validate(row) for row in reader]

print(people)
#> [Person(name='John Doe', age=30, email='john@example.com'), Person(name='Jane Doe', age=25, email='jane@example.com')]
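Note that csv.DictReader yields every value as a string, so age arrives as '30'. The example above works because Pydantic's default (lax) mode coerces numeric strings to integers. A short sketch of that behaviour, and of how passing strict=True rejects the same row:

```python
import csv
import io

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


# io.StringIO stands in for open('people.csv', newline='').
reader = csv.DictReader(io.StringIO('name,age\nJohn Doe,30\n'))
row = next(reader)
print(row)
#> {'name': 'John Doe', 'age': '30'}

# Lax mode (the default) converts the numeric string '30' to the int 30.
lax_person = Person.model_validate(row)
print(lax_person.age)
#> 30

# Strict mode refuses to convert str to int, so the same row fails.
strict_error_types = []
try:
    Person.model_validate(row, strict=True)
except ValidationError as err:
    strict_error_types = [e['type'] for e in err.errors()]
print(strict_error_types)
#> ['int_type']
```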
TOML files
TOML files are often used for configuration due to their simplicity and readability.
Consider the following TOML file:
name = "John Doe"
age = 30
email = "john@example.com"
Here's how we validate that data:
import tomllib  # part of the standard library since Python 3.11

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


# tomllib requires the file to be opened in binary mode.
with open('person.toml', 'rb') as f:
    data = tomllib.load(f)

person = Person.model_validate(data)
print(repr(person))
#> Person(name='John Doe', age=30, email='john@example.com')