Validating File Data
Pydantic is a great tool for validating data coming from various sources.
This section shows how to validate data loaded from several common file formats.
Note
If you're using any of the file formats below to parse configuration or settings, consider using the pydantic-settings library, which offers built-in support for parsing this type of data.
JSON data
.json files are a common way to store key / value data in a human-readable format.
Here is an example of a .json file:
{
    "name": "John Doe",
    "age": 30,
    "email": "[email protected]"
}
To validate this data, we can use a Pydantic model:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
person = Person.model_validate_json(json_string)
print(person)
#> name='John Doe' age=30 email='[email protected]'
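As a rough comparison, model_validate_json parses and validates the JSON text in one step, while json.loads followed by model_validate does the same work in two. The sketch below is illustrative only: it uses an inline string instead of a file, and it leaves out the email field so that it runs without the optional email-validator package. Both simplifications are assumptions, not part of the example above.

```python
import json

from pydantic import BaseModel, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt


# Inline stand-in for the contents of person.json (illustrative data).
raw = '{"name": "John Doe", "age": 30}'

# One step: Pydantic parses the JSON text and validates it.
from_json = Person.model_validate_json(raw)

# Two steps: parse with the standard library, then validate the resulting dict.
from_dict = Person.model_validate(json.loads(raw))

# Both routes produce equal model instances for this input.
print(from_json == from_dict)
#> True
```

When the input is already JSON text, the one-step form is usually the simpler choice; the two-step form is handy when the data has already been parsed elsewhere.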
If the data in the file is not valid, Pydantic will raise a ValidationError.
Let's say we have the following .json file:
{
    "age": -30,
    "email": "not-an-email-address"
}
This data is flawed for three reasons:
- It's missing the name field.
- The age field is negative.
- The email field is not a valid email address.
When we try to validate this data, Pydantic raises a ValidationError that reports all of the above issues:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
try:
    person = Person.model_validate_json(json_string)
except ValidationError as err:
    print(err)
    """
    3 validation errors for Person
    name
      Field required [type=missing, input_value={'age': -30, 'email': 'not-an-email-address'}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.10/v/missing
    age
      Input should be greater than 0 [type=greater_than, input_value=-30, input_type=int]
        For further information visit https://errors.pydantic.dev/2.10/v/greater_than
    email
      value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email-address', input_type=str]
    """
A single .json file often contains many records of the same type.
For example, you might have a list of people:
[
    {
        "name": "John Doe",
        "age": 30,
        "email": "[email protected]"
    },
    {
        "name": "Jane Doe",
        "age": 25,
        "email": "[email protected]"
    }
]
In this case, you can validate the data against the list[Person] type:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, TypeAdapter


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


person_list_adapter = TypeAdapter(list[Person])

json_string = pathlib.Path('people.json').read_text()
people = person_list_adapter.validate_json(json_string)
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
We use TypeAdapter to validate a list of Person objects. TypeAdapter is a Pydantic construct used to validate data against a single type.
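When one entry in such a list is invalid, the structured error data can help locate it. The following is a minimal sketch, assuming an inline JSON string in place of people.json and a Person model without the email field (to avoid the optional email-validator dependency). It reads the loc and type entries from ValidationError.errors() to find the failing item.

```python
from pydantic import BaseModel, PositiveInt, TypeAdapter, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


person_list_adapter = TypeAdapter(list[Person])

# Illustrative data: the second entry has a negative age.
raw = '[{"name": "John Doe", "age": 30}, {"name": "Jane Doe", "age": -25}]'

problems = []
try:
    person_list_adapter.validate_json(raw)
except ValidationError as err:
    # Each error dict carries a location tuple and an error type.
    # For list items, the location starts with the item's index.
    problems = [(e['loc'], e['type']) for e in err.errors()]

print(problems)
```

Here the location should point at index 1 and the age field, which is usually enough to report the offending record back to whoever produced the file.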
JSON lines files
Similar to validating a list of objects from a .json file, you can validate a list of objects from a .jsonl file.
.jsonl files are a sequence of JSON objects separated by newlines.
Consider the following .jsonl file:
{"name": "John Doe", "age": 30, "email": "[email protected]"}
{"name": "Jane Doe", "age": 25, "email": "[email protected]"}
We can validate this data with a similar approach to the one we used for .json files:
import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_lines = pathlib.Path('people.jsonl').read_text().splitlines()
people = [Person.model_validate_json(line) for line in json_lines]
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
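For larger .jsonl files, it may be preferable to process one line at a time and keep going when a single record is bad. The sketch below is one possible approach, not the only one: read_jsonl is a hypothetical helper, the input is an in-memory io.StringIO standing in for an open file, and the email field is omitted so the code does not need the optional email-validator package.

```python
import io

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


def read_jsonl(lines):
    """Validate an iterable of JSON lines (hypothetical helper).

    Returns the valid people plus (line number, error count) pairs
    for every line that failed validation. Blank lines are skipped.
    """
    people, failures = [], []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            people.append(Person.model_validate_json(line))
        except ValidationError as err:
            failures.append((lineno, err.error_count()))
    return people, failures


# Illustrative data: line 2 is blank and line 3 has a negative age.
sample = io.StringIO(
    '{"name": "John Doe", "age": 30}\n'
    '\n'
    '{"name": "Jane Doe", "age": -5}\n'
)
people, failures = read_jsonl(sample)
print(people)
print(failures)
```

Because the helper accepts any iterable of strings, an open file object can be passed in directly, so the whole file never has to be held in memory.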
CSV files
CSV is one of the most common file formats for storing tabular data.
To validate data from a CSV file, you can use the csv module from the Python standard library to load the data and validate it against a Pydantic model.
Consider the following CSV file:
name,age,email
John Doe,30,[email protected]
Jane Doe,25,[email protected]
Here's how we validate that data:
import csv

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


# newline='' is what the csv module documentation recommends for file objects.
with open('people.csv', newline='') as f:
    reader = csv.DictReader(f)
    people = [Person.model_validate(row) for row in reader]

print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
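One detail worth keeping in mind is that csv.DictReader yields every value as a string, so the age column only becomes an int because Pydantic's default (lax) mode coerces it. The sketch below illustrates this under a few assumptions: the CSV text is inline rather than read from people.csv, and the email column is left out to avoid the optional email-validator dependency.

```python
import csv
import io

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt


# Illustrative CSV text standing in for people.csv.
csv_text = 'name,age\nJohn Doe,30\nJane Doe,25\n'

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(type(rows[0]['age']).__name__)  # the raw value is a string

# Default (lax) validation converts the numeric string to an int.
people = [Person.model_validate(row) for row in rows]
print(type(people[0].age).__name__)

# Strict validation refuses the string, so it is a poor fit for raw CSV rows.
try:
    Person.model_validate(rows[0], strict=True)
    strict_ok = True
except ValidationError:
    strict_ok = False
print(strict_ok)
```

In practice this means models used for CSV input generally need lax validation, or explicit conversion of each column before validating.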
TOML files
TOML files are often used for configuration due to their simplicity and readability.
Consider the following TOML file:
name = "John Doe"
age = 30
email = "[email protected]"
Here's how we validate that data:
import tomllib  # part of the standard library from Python 3.11

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


# tomllib.load requires a file opened in binary mode.
with open('person.toml', 'rb') as f:
    data = tomllib.load(f)

person = Person.model_validate(data)
print(person)
#> name='John Doe' age=30 email='[email protected]'
YAML files
YAML (YAML Ain't Markup Language) is a human-readable data serialization format that is often used for configuration files.
Consider the following YAML file:
name: John Doe
age: 30
email: [email protected]
Here's how we validate that data:
import yaml  # provided by the third-party PyYAML package

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


with open('person.yaml') as f:
    data = yaml.safe_load(f)

person = Person.model_validate(data)
print(person)
#> name='John Doe' age=30 email='[email protected]'
XML files
XML (eXtensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
Consider the following XML file:
<?xml version="1.0"?>
<person>
    <name>John Doe</name>
    <age>30</age>
    <email>[email protected]</email>
</person>
Here's how we validate that data:
import xml.etree.ElementTree as ET

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


tree = ET.parse('person.xml').getroot()
data = {child.tag: child.text for child in tree}
person = Person.model_validate(data)
print(person)
#> name='John Doe' age=30 email='[email protected]'
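The same tag-to-text mapping can be applied to a document that holds several records. The sketch below assumes a hypothetical people root element wrapping multiple person elements, parses an inline string with ET.fromstring instead of a file, and omits the email field to avoid the optional email-validator dependency.

```python
import xml.etree.ElementTree as ET

from pydantic import BaseModel, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt


# Illustrative XML with a hypothetical <people> wrapper element.
xml_text = """<people>
    <person><name>John Doe</name><age>30</age></person>
    <person><name>Jane Doe</name><age>25</age></person>
</people>"""

root = ET.fromstring(xml_text)

# Build one dict per <person> element, then validate each dict.
people = [
    Person.model_validate({child.tag: child.text for child in node})
    for node in root.findall('person')
]
print(people)
#> [Person(name='John Doe', age=30), Person(name='Jane Doe', age=25)]
```

Element text is always a string, so, as with CSV, the age values rely on Pydantic's default coercion from numeric strings to integers.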
INI files
INI files are a simple configuration file format that uses sections and key-value pairs. They are commonly used in Windows applications and older software.
Consider the following INI file:
[PERSON]
name = John Doe
age = 30
email = [email protected]
Here's how we validate that data:
import configparser

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


config = configparser.ConfigParser()
config.read('person.ini')
person = Person.model_validate(config['PERSON'])
print(person)
#> name='John Doe' age=30 email='[email protected]'
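Two configparser behaviors are useful to know when validating INI data: option names are lowercased by default, and all values are read as strings. The sketch below shows both, assuming an inline INI string loaded with read_string instead of a file and a Person model without the email field (to avoid the optional email-validator dependency). Converting the section to a plain dict is a choice made here for easier inspection, not a requirement.

```python
import configparser

from pydantic import BaseModel, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt


# Illustrative INI text; note the capitalised option names.
ini_text = """
[PERSON]
Name = John Doe
Age = 30
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# Option names come back lowercased and every value is a string.
data = dict(config['PERSON'])
print(data)
#> {'name': 'John Doe', 'age': '30'}

person = Person.model_validate(data)
print(person)
#> name='John Doe' age=30
```

The default lowercasing means model field names should be lowercase as well, unless the parser's optionxform is changed.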