Validating File Data
pydantic is a great tool for validating data coming from various sources.
In this section, we will look at how to validate data from different types of files.
!!! Note:
If you're using any of the file formats below to parse configuration or settings, consider using the pydantic-settings library, which offers built-in support for parsing this type of data.
JSON data
.json files are a common way to store key / value data in a human-readable format.
Here is an example of a .json file:
{
    "name": "John Doe",
    "age": 30,
    "email": "[email protected]"
}
To validate this data, we can use a pydantic model:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
person = Person.model_validate_json(json_string)
print(repr(person))
#> Person(name='John Doe', age=30, email='[email protected]')
If the data in the file is not valid, pydantic will raise a ValidationError.
Let's say we have the following .json file:
{
    "age": -30,
    "email": "not-an-email-address"
}
This data is flawed for three reasons:
1. It's missing the name field.
2. The age field is negative.
3. The email field is not a valid email address.
When we try to validate this data, pydantic raises a ValidationError with all of the above issues:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
try:
    person = Person.model_validate_json(json_string)
except ValidationError as err:
    print(err)
    """
    3 validation errors for Person
    name
      Field required [type=missing, input_value={'age': -30, 'email': 'not-an-email-address'}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.10/v/missing
    age
      Input should be greater than 0 [type=greater_than, input_value=-30, input_type=int]
        For further information visit https://errors.pydantic.dev/2.10/v/greater_than
    email
      value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email-address', input_type=str]
    """
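Besides printing the error, you can inspect it programmatically. The following is a minimal sketch, assuming Pydantic v2: it uses an inline string in place of the file contents and leaves out the email field so that it runs without the optional email validation dependency.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


# Inline stand-in for the contents of a flawed person.json file.
json_string = '{"age": -30}'

try:
    Person.model_validate_json(json_string)
except ValidationError as err:
    # errors() returns one dict per failure; 'loc' is a tuple path to the field.
    problems = {e['loc'][0]: e['type'] for e in err.errors()}
    error_count = err.error_count()

print(error_count)
#> 2
print(problems)
#> {'name': 'missing', 'age': 'greater_than'}
```

Mapping field names to error types like this can be handy when you want to report problems in your own format rather than relying on the default message.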
Often, a single .json file holds many records of the same type.
For example, you might have a list of people:
[
    {
        "name": "John Doe",
        "age": 30,
        "email": "[email protected]"
    },
    {
        "name": "Jane Doe",
        "age": 25,
        "email": "[email protected]"
    }
]
In this case, you can validate the data against the List[Person] type:
import pathlib
from typing import List

from pydantic import BaseModel, EmailStr, PositiveInt, TypeAdapter


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


person_list_adapter = TypeAdapter(List[Person])  # (1)!

json_string = pathlib.Path('people.json').read_text()
people = person_list_adapter.validate_json(json_string)
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
1. We use TypeAdapter to validate a list of Person objects.
On Python 3.9 and above, the built-in list can be used in place of typing.List:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, TypeAdapter


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


person_list_adapter = TypeAdapter(list[Person])  # (1)!

json_string = pathlib.Path('people.json').read_text()
people = person_list_adapter.validate_json(json_string)
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
1. We use TypeAdapter to validate a list of Person objects.
TypeAdapter is a Pydantic construct used to validate data against a single type.
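The type passed to TypeAdapter does not have to involve a model at all. As a small sketch (the name-to-age payload here is hypothetical, and the built-in dict generic assumes Python 3.9 or above), a JSON object can be validated directly against a dict type:

```python
from pydantic import PositiveInt, TypeAdapter, ValidationError

# Hypothetical payload: a JSON object mapping names to ages.
ages_adapter = TypeAdapter(dict[str, PositiveInt])

ages = ages_adapter.validate_json('{"John Doe": 30, "Jane Doe": 25}')
print(ages)
#> {'John Doe': 30, 'Jane Doe': 25}

try:
    ages_adapter.validate_json('{"John Doe": -30}')
except ValidationError as err:
    # For dict values, the error location is the offending key.
    bad_loc = err.errors()[0]['loc']
    print(bad_loc)
    #> ('John Doe',)
```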
JSON lines files
Similar to validating a list of objects from a .json file, you can validate a list of objects from a .jsonl file.
.jsonl files are a sequence of JSON objects separated by newlines.
Consider the following .jsonl file:
{"name": "John Doe", "age": 30, "email": "[email protected]"}
{"name": "Jane Doe", "age": 25, "email": "[email protected]"}
We can validate this data with a similar approach to the one we used for .json files:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_lines = pathlib.Path('people.jsonl').read_text().splitlines()
people = [Person.model_validate_json(line) for line in json_lines]
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
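The list comprehension above stops at the first invalid line and also fails on blank lines, since an empty string is not valid JSON. One possible variation, sketched here with an inline string standing in for the file contents and without the email field, skips blank lines and records which line numbers failed:

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


# Inline stand-in for pathlib.Path('people.jsonl').read_text()
jsonl_text = (
    '{"name": "John Doe", "age": 30}\n'
    '\n'
    '{"name": "Jane Doe", "age": -25}\n'
)

people = []
invalid_lines = {}  # line number -> number of validation errors on that line
for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
    if not line.strip():
        continue  # skip blank lines instead of failing on them
    try:
        people.append(Person.model_validate_json(line))
    except ValidationError as err:
        invalid_lines[line_number] = err.error_count()

print(people)
#> [Person(name='John Doe', age=30)]
print(invalid_lines)
#> {3: 1}
```

Whether to skip, collect, or abort on bad lines depends on the application; this sketch only illustrates the collecting approach.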
CSV files
CSV is one of the most common file formats for storing tabular data.
To validate data from a CSV file, you can use the csv module from the Python standard library to load the data and validate it against a Pydantic model.
Consider the following CSV file:
name,age,email
John Doe,30,[email protected]
Jane Doe,25,[email protected]
Here's how we validate that data:
import csv

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


with open('people.csv') as f:
    reader = csv.DictReader(f)
    people = [Person.model_validate(row) for row in reader]

print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
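Note that csv.DictReader yields every value as a string; the example works because Pydantic's default (lax) mode converts numeric strings such as '30' to int. The sketch below illustrates this with an in-memory CSV (a stand-in for people.csv, without the email column) and also records the source line of rows that fail validation:

```python
import csv
import io

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


# In-memory stand-in for a CSV file; the second data row has a bad age.
csv_text = 'name,age\nJohn Doe,30\nJane Doe,twenty-five\n'

people = []
failed_rows = []
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    try:
        people.append(Person.model_validate(row))
    except ValidationError:
        # line_num counts lines read so far, including the header line.
        failed_rows.append(reader.line_num)

print(people)
#> [Person(name='John Doe', age=30)]
print(failed_rows)
#> [3]
```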
TOML files
TOML files are often used for configuration due to their simplicity and readability.
Consider the following TOML file:
name = "John Doe"
age = 30
email = "[email protected]"
Here's how we validate that data:
import tomllib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


with open('person.toml', 'rb') as f:
    data = tomllib.load(f)

person = Person.model_validate(data)
print(repr(person))
#> Person(name='John Doe', age=30, email='[email protected]')
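The tomllib module is part of the standard library only from Python 3.11; on older versions, the third-party tomli package offers the same API. The following sketch assumes one of the two is available and shows how a TOML array of tables maps onto a nested model. The Team model and its title field are hypothetical, added for illustration, and the email field is left out.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

from pydantic import BaseModel, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt


class Team(BaseModel):  # hypothetical container model for illustration
    title: str
    people: list[Person]


# Inline stand-in for the contents of a .toml file.
toml_text = '''
title = "Example team"

[[people]]
name = "John Doe"
age = 30

[[people]]
name = "Jane Doe"
age = 25
'''

# tomllib.loads parses a str; tomllib.load expects a binary file object.
team = Team.model_validate(tomllib.loads(toml_text))
print(repr(team.people[1]))
#> Person(name='Jane Doe', age=25)
```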