Validating File Data

Pydantic is a great tool for validating data coming from various sources. This section shows how to validate data loaded from several common file formats.

Note: If you're using any of the file formats below to parse configuration or settings, consider the pydantic-settings library, which offers built-in support for parsing this type of data.

JSON data

.json files are a common way to store key / value data in a human-readable format. Here is an example of a .json file:

{
    "name": "John Doe",
    "age": 30,
    "email": "[email protected]"
}

To validate this data, we can use a Pydantic model:

import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
person = Person.model_validate_json(json_string)
print(repr(person))
#> Person(name='John Doe', age=30, email='[email protected]')

If the data in the file is not valid, Pydantic raises a ValidationError. Let's say we have the following .json file:

{
    "age": -30,
    "email": "not-an-email-address"
}

This data is flawed for three reasons:

1. It's missing the name field.
2. The age field is negative.
3. The email field is not a valid email address.

When we try to validate this data, Pydantic raises a single ValidationError that reports all three issues:

import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_string = pathlib.Path('person.json').read_text()
try:
    person = Person.model_validate_json(json_string)
except ValidationError as err:
    print(err)
    """
    3 validation errors for Person
    name
      Field required [type=missing, input_value={'age': -30, 'email': 'not-an-email-address'}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.10/v/missing
    age
      Input should be greater than 0 [type=greater_than, input_value=-30, input_type=int]
        For further information visit https://errors.pydantic.dev/2.10/v/greater_than
    email
      value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email-address', input_type=str]
    """

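For programmatic handling, the same information is also available in structured form through the error's errors() and error_count() methods. The following is a minimal sketch rather than part of the original example. It assumes the flawed JSON is inlined as a string instead of being read from person.json, and it types email as plain str so the snippet runs without the optional email-validator package. As a result, only the name and age problems are reported here.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    # Assumption for this sketch: plain str instead of EmailStr,
    # so the optional email-validator dependency is not required.
    email: str


# Inline stand-in for the contents of the flawed person.json file.
json_string = '{"age": -30, "email": "not-an-email-address"}'

problems = []
error_count = 0
try:
    Person.model_validate_json(json_string)
except ValidationError as err:
    # errors() returns one dict per problem; 'loc' is the path to the
    # offending field and 'type' is a stable machine-readable identifier.
    problems = [(e['loc'], e['type']) for e in err.errors()]
    error_count = err.error_count()

print(error_count)
print(problems)
```

Matching on the type value (for example missing or greater_than) is usually more robust than parsing the human-readable message.
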
A .json file often contains many records of the same type. For example, you might have a list of people:

[
    {
        "name": "John Doe",
        "age": 30,
        "email": "[email protected]"
    },
    {
        "name": "Jane Doe",
        "age": 25,
        "email": "[email protected]"
    }
]

In this case, you can validate the data against a list of Person models, written as List[Person] (or list[Person] on Python 3.9 and later):

import pathlib
from typing import List

from pydantic import BaseModel, EmailStr, PositiveInt, TypeAdapter


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


person_list_adapter = TypeAdapter(List[Person])  # (1)!

json_string = pathlib.Path('people.json').read_text()
people = person_list_adapter.validate_json(json_string)
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
  1. We use TypeAdapter to validate a list of Person objects. TypeAdapter is a Pydantic construct used to validate data against a single type.

On Python 3.9 and later, the built-in list type can be used in place of typing.List:

import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt, TypeAdapter


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


person_list_adapter = TypeAdapter(list[Person])  # (1)!

json_string = pathlib.Path('people.json').read_text()
people = person_list_adapter.validate_json(json_string)
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
  1. We use TypeAdapter to validate a list of Person objects. TypeAdapter is a Pydantic construct used to validate data against a single type.
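
When one element of the list is invalid, the error location includes the index of the offending item, which helps trace the problem back to the file. The sketch below illustrates this under a few assumptions: the JSON is inlined rather than read from people.json, the example.com addresses are placeholders, and email is typed as plain str so the snippet does not need the optional email-validator package.

```python
from pydantic import BaseModel, PositiveInt, TypeAdapter, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: str  # assumption: plain str instead of EmailStr for this sketch


person_list_adapter = TypeAdapter(list[Person])

# Inline stand-in for people.json; the second record has a negative age.
json_string = """
[
    {"name": "John Doe", "age": 30, "email": "john@example.com"},
    {"name": "Jane Doe", "age": -25, "email": "jane@example.com"}
]
"""

bad_locations = []
try:
    person_list_adapter.validate_json(json_string)
except ValidationError as err:
    # Each 'loc' is a tuple: (index in the list, field name).
    bad_locations = [e['loc'] for e in err.errors()]

print(bad_locations)
```

Here the only reported location is index 1, field age, which points at the second record.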

JSON lines files

Similar to validating a list of objects from a .json file, you can validate a list of objects from a .jsonl file. .jsonl files are a sequence of JSON objects separated by newlines.

Consider the following .jsonl file:

{"name": "John Doe", "age": 30, "email": "[email protected]"}
{"name": "Jane Doe", "age": 25, "email": "[email protected]"}

We can validate this data with a similar approach to the one we used for .json files:

import pathlib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


json_lines = pathlib.Path('people.jsonl').read_text().splitlines()
people = [Person.model_validate_json(line) for line in json_lines]
print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
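
Real .jsonl files sometimes contain blank lines or a few bad records, and validating line by line makes it easy to report which line failed. The following is a rough sketch of that idea, not the library's prescribed approach. The helper name load_people is hypothetical, the file is written to a temporary directory so the snippet is self-contained, the example.com addresses are placeholders, and email is typed as plain str to avoid the optional email-validator dependency.

```python
import pathlib
import tempfile

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: str  # assumption: plain str instead of EmailStr for this sketch


def load_people(path):
    """Hypothetical helper: validate a .jsonl file one line at a time."""
    people, failures = [], []
    with open(path, encoding='utf-8') as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                people.append(Person.model_validate_json(line))
            except ValidationError as err:
                # Record the 1-based line number and how many errors it had.
                failures.append((line_number, err.error_count()))
    return people, failures


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / 'people.jsonl'
    path.write_text(
        '{"name": "John Doe", "age": 30, "email": "john@example.com"}\n'
        '\n'
        '{"name": "Jane Doe", "age": -25, "email": "jane@example.com"}\n',
        encoding='utf-8',
    )
    people, failures = load_people(path)

print(people)
print(failures)
```

Reading the file lazily also avoids holding the whole file in memory, which matters for large .jsonl exports.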

CSV files

CSV is one of the most common file formats for storing tabular data. To validate data from a CSV file, you can use the csv module from the Python standard library to load the data and validate it against a Pydantic model.

Consider the following CSV file:

name,age,email
John Doe,30,[email protected]
Jane Doe,25,[email protected]

Here's how we validate that data:

import csv

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


with open('people.csv', newline='') as f:
    reader = csv.DictReader(f)
    people = [Person.model_validate(row) for row in reader]

print(people)
#> [Person(name='John Doe', age=30, email='[email protected]'), Person(name='Jane Doe', age=25, email='[email protected]')]
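
Note that csv.DictReader yields every value as a string, so the example above relies on Pydantic's default (lax) mode to convert '30' into an integer. The sketch below makes that visible. It assumes the CSV text is inlined via io.StringIO instead of read from people.csv, uses a placeholder example.com address, and types email as plain str so the optional email-validator package is not needed.

```python
import csv
import io

from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: str  # assumption: plain str instead of EmailStr for this sketch


# Inline stand-in for people.csv.
csv_text = 'name,age,email\nJohn Doe,30,john@example.com\n'
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every CSV value arrives as a string, including the age column.
print(repr(rows[0]['age']))

# Lax mode (the default) coerces the numeric string to an int.
person = Person.model_validate(rows[0])
print(repr(person.age))

# Strict mode does not coerce, so the same row is rejected.
strict_rejected = False
try:
    Person.model_validate(rows[0], strict=True)
except ValidationError:
    strict_rejected = True
print(strict_rejected)
```

For CSV input, lax mode is therefore usually the practical choice, since the format itself carries no type information.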

TOML files

TOML files are often used for configuration due to their simplicity and readability.

Consider the following TOML file:

name = "John Doe"
age = 30
email = "[email protected]"

Here's how we validate that data, using the tomllib module from the Python standard library (available in Python 3.11 and later):

import tomllib

from pydantic import BaseModel, EmailStr, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: EmailStr


with open('person.toml', 'rb') as f:
    data = tomllib.load(f)

person = Person.model_validate(data)
print(repr(person))
#> Person(name='John Doe', age=30, email='[email protected]')
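
Unlike text-only formats, TOML carries type information, so the parser returns age as an integer before Pydantic sees it. The following sketch shows this. It assumes the TOML is inlined as a string and parsed with tomllib.loads, falls back to the third-party tomli backport on Python versions older than 3.11 (assuming it is installed), uses a placeholder example.com address, and types email as plain str to avoid the optional email-validator dependency.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport; assumed to be installed

from pydantic import BaseModel, PositiveInt


class Person(BaseModel):
    name: str
    age: PositiveInt
    email: str  # assumption: plain str instead of EmailStr for this sketch


# Inline stand-in for person.toml.
toml_text = """
name = "John Doe"
age = 30
email = "john@example.com"
"""

data = tomllib.loads(toml_text)

# The TOML parser already produced an int, not a string.
print(type(data['age']).__name__)

# Because the values are already typed, strict validation also succeeds.
person = Person.model_validate(data, strict=True)
print(repr(person))
```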