Why use Pydantic?¶
Today, Pydantic is downloaded many times a month and used by some of the largest and most recognisable organisations in the world.
It's hard to know why so many people have adopted Pydantic since its inception six years ago, but here are a few guesses.
Type hints powering schema validation¶
The schema that Pydantic validates against is generally defined by Python type hints.
Type hints are great for this since, if you're writing modern Python, you already know how to use them. Using type hints also means that Pydantic integrates well with static typing tools like mypy and pyright, and with IDEs like PyCharm and VS Code.
Example - just type hints
(This example requires Python 3.9+)
from typing import Annotated, Dict, List, Literal, Tuple

from annotated_types import Gt

from pydantic import BaseModel


class Fruit(BaseModel):
    name: str  # (1)!
    color: Literal['red', 'green']  # (2)!
    weight: Annotated[float, Gt(0)]  # (3)!
    bazam: Dict[str, List[Tuple[int, bool, float]]]  # (4)!


print(
    Fruit(
        name='Apple',
        color='red',
        weight=4.2,
        bazam={'foobar': [(1, True, 0.1)]},
    )
)
#> name='Apple' color='red' weight=4.2 bazam={'foobar': [(1, True, 0.1)]}
1. The name field is simply annotated with str; any string is allowed.
2. The Literal type is used to enforce that color is either 'red' or 'green'.
3. Even when we want to apply constraints not encapsulated in Python types, we can use Annotated and annotated-types to enforce constraints without breaking type hints.
4. I'm not claiming "bazam" is really an attribute of fruit, but rather to show that arbitrarily complex types can easily be validated.
Learn more
See the documentation on supported types.
Performance¶
Pydantic's core validation logic is implemented in a separate package, pydantic-core, where validation for most types is implemented in Rust.
As a result Pydantic is among the fastest data validation libraries for Python.
Performance Example - Pydantic vs. dedicated code
In general, dedicated code should be much faster than a general-purpose validator, but in this example Pydantic is roughly 3.5x faster than dedicated code when parsing JSON and validating URLs.
import json
import timeit
from urllib.parse import urlparse

import requests

from pydantic import HttpUrl, TypeAdapter

reps = 7
number = 100
r = requests.get('https://api.github.com/emojis')
r.raise_for_status()
emojis_json = r.content


def emojis_pure_python(raw_data):
    data = json.loads(raw_data)
    output = {}
    for key, value in data.items():
        assert isinstance(key, str)
        url = urlparse(value)
        assert url.scheme in ('https', 'http')
        output[key] = url


emojis_pure_python_times = timeit.repeat(
    'emojis_pure_python(emojis_json)',
    globals={
        'emojis_pure_python': emojis_pure_python,
        'emojis_json': emojis_json,
    },
    repeat=reps,
    number=number,
)
print(f'pure python: {min(emojis_pure_python_times) / number * 1000:0.2f}ms')
#> pure python: 5.32ms

type_adapter = TypeAdapter(dict[str, HttpUrl])
emojis_pydantic_times = timeit.repeat(
    'type_adapter.validate_json(emojis_json)',
    globals={
        'type_adapter': type_adapter,
        'HttpUrl': HttpUrl,
        'emojis_json': emojis_json,
    },
    repeat=reps,
    number=number,
)
print(f'pydantic: {min(emojis_pydantic_times) / number * 1000:0.2f}ms')
#> pydantic: 1.54ms

print(
    f'Pydantic {min(emojis_pure_python_times) / min(emojis_pydantic_times):0.2f}x faster'
)
#> Pydantic 3.45x faster
Unlike other performance-centric libraries written in compiled languages, Pydantic also has excellent support for customizing validation via functional validators.
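As a rough illustration of a functional validator, the sketch below attaches a plain Python function to a type with Annotated and AfterValidator. The names must_be_even and EvenInt are made up for this example and are not part of Pydantic.

```python
from typing import Annotated

from pydantic import AfterValidator, TypeAdapter, ValidationError


def must_be_even(value: int) -> int:
    # Runs after Pydantic's own int validation, so value is already an int.
    if value % 2 != 0:
        raise ValueError(f'{value} is not an even number')
    return value


# EvenInt is a hypothetical alias used only for this illustration.
EvenInt = Annotated[int, AfterValidator(must_be_even)]
even_adapter = TypeAdapter(EvenInt)

# In the default (lax) mode the numeric string is coerced to int first,
# then passed to must_be_even.
accepted = even_adapter.validate_python('42')

try:
    even_adapter.validate_python(3)
    rejected = False
except ValidationError:
    # The ValueError raised in must_be_even surfaces as a ValidationError.
    rejected = True

print(accepted, rejected)
#> 42 True
```

The core type check still runs in pydantic-core; only the extra rule is ordinary Python.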
Learn more
Samuel Colvin's talk at PyCon 2023 explains how pydantic-core works and how it integrates with Pydantic.
Serialization¶
Pydantic provides functionality to serialize a model in three ways:
- To a Python dict made up of the associated Python objects
- To a Python dict made up only of "jsonable" types
- To a JSON string
In all three modes, the output can be customized by excluding specific fields, excluding unset fields, excluding default values, and excluding None values.
Example - Serialization 3 ways
from datetime import datetime

from pydantic import BaseModel


class Meeting(BaseModel):
    when: datetime
    where: bytes
    why: str = 'No idea'


m = Meeting(when='2020-01-01T12:00', where='home')
print(m.model_dump(exclude_unset=True))
#> {'when': datetime.datetime(2020, 1, 1, 12, 0), 'where': b'home'}
print(m.model_dump(exclude={'where'}, mode='json'))
#> {'when': '2020-01-01T12:00:00', 'why': 'No idea'}
print(m.model_dump_json(exclude_defaults=True))
#> {"when":"2020-01-01T12:00:00","where":"home"}
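The example above does not exercise the option to exclude None values. A minimal sketch of it follows; Booking and its fields are hypothetical names chosen for this illustration.

```python
from typing import Optional

from pydantic import BaseModel


class Booking(BaseModel):  # hypothetical model, not from the Pydantic docs
    room: str
    note: Optional[str] = None


b = Booking(room='A1')

# exclude_none drops every field whose value is None, in dict and JSON output.
without_none = b.model_dump(exclude_none=True)
as_json = b.model_dump_json(exclude_none=True)

print(without_none)
#> {'room': 'A1'}
print(as_json)
#> {"room":"A1"}
```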
Learn more
See the documentation on serialization.
JSON Schema¶
JSON Schema can be generated for any Pydantic schema — allowing self-documenting APIs and integration with a wide variety of tools which support JSON Schema.
Example - JSON Schema
from datetime import datetime

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    zipcode: str


class Meeting(BaseModel):
    when: datetime
    where: Address
    why: str = 'No idea'


print(Meeting.model_json_schema())
"""
{
    '$defs': {
        'Address': {
            'properties': {
                'street': {'title': 'Street', 'type': 'string'},
                'city': {'title': 'City', 'type': 'string'},
                'zipcode': {'title': 'Zipcode', 'type': 'string'},
            },
            'required': ['street', 'city', 'zipcode'],
            'title': 'Address',
            'type': 'object',
        }
    },
    'properties': {
        'when': {'format': 'date-time', 'title': 'When', 'type': 'string'},
        'where': {'$ref': '#/$defs/Address'},
        'why': {'default': 'No idea', 'title': 'Why', 'type': 'string'},
    },
    'required': ['when', 'where'],
    'title': 'Meeting',
    'type': 'object',
}
"""
Pydantic generates JSON Schema version 2020-12, the latest version of the standard which is compatible with OpenAPI 3.1.
Learn more
See the documentation on JSON Schema.
Strict mode and data coercion¶
By default, Pydantic is tolerant to common incorrect types and coerces data to the right type; e.g. a numeric string passed to an int field will be parsed as an int.
Pydantic also has a strict=True mode, also known as "Strict mode", where types are not coerced and a validation error is raised unless the input data exactly matches the schema or type hint.
But strict mode would be pretty useless when validating JSON data, since JSON doesn't have types matching many common Python types like datetime, UUID or bytes.
To solve this, Pydantic can parse and validate JSON in one step. This allows sensible data conversion like RFC3339 (aka ISO8601) strings to datetime objects. Since the JSON parsing is implemented in Rust, it's also very performant.
Example - Strict mode that's actually useful
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Meeting(BaseModel):
    when: datetime
    where: bytes


m = Meeting.model_validate({'when': '2020-01-01T12:00', 'where': 'home'})
print(m)
#> when=datetime.datetime(2020, 1, 1, 12, 0) where=b'home'
try:
    m = Meeting.model_validate(
        {'when': '2020-01-01T12:00', 'where': 'home'}, strict=True
    )
except ValidationError as e:
    print(e)
    """
    2 validation errors for Meeting
    when
      Input should be a valid datetime [type=datetime_type, input_value='2020-01-01T12:00', input_type=str]
    where
      Input should be a valid bytes [type=bytes_type, input_value='home', input_type=str]
    """

m_json = Meeting.model_validate_json(
    '{"when": "2020-01-01T12:00", "where": "home"}'
)
print(m_json)
#> when=datetime.datetime(2020, 1, 1, 12, 0) where=b'home'
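To make the JSON case more concrete, here is a small sketch that passes strict=True to model_validate_json. It assumes the documented behaviour that strict JSON validation still accepts strings for types JSON cannot express (such as datetime) while refusing to coerce a string into an int. Reading is a hypothetical model used only here.

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Reading(BaseModel):  # hypothetical model for illustration
    taken_at: datetime
    value: int


# A datetime has no native JSON type, so a string is accepted even in strict mode.
ok = Reading.model_validate_json(
    '{"taken_at": "2020-01-01T12:00", "value": 3}', strict=True
)

try:
    # JSON does have a number type, so a string is not coerced to int in strict mode.
    Reading.model_validate_json(
        '{"taken_at": "2020-01-01T12:00", "value": "3"}', strict=True
    )
    string_int_rejected = False
except ValidationError:
    string_int_rejected = True

print(ok.taken_at, string_int_rejected)
```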
Learn more
See the documentation on strict mode.
Dataclasses, TypedDicts, and more¶
Pydantic provides four ways to create schemas and perform validation and serialization:
- BaseModel: Pydantic's own super class with many common utilities available via instance methods.
- pydantic.dataclasses.dataclass: a wrapper around standard dataclasses which performs validation when a dataclass is initialized.
- TypeAdapter: a general way to adapt any type for validation and serialization. This allows types like TypedDict and NamedTuple to be validated as well as simple scalar values like int or timedelta; all types supported can be used with TypeAdapter.
- validate_call: a decorator to perform validation when calling a function.
Example - schema based on TypedDict
from datetime import datetime

from typing_extensions import NotRequired, TypedDict

from pydantic import TypeAdapter


class Meeting(TypedDict):
    when: datetime
    where: bytes
    why: NotRequired[str]


meeting_adapter = TypeAdapter(Meeting)
m = meeting_adapter.validate_python(  # (1)!
    {'when': '2020-01-01T12:00', 'where': 'home'}
)
print(m)
#> {'when': datetime.datetime(2020, 1, 1, 12, 0), 'where': b'home'}
meeting_adapter.dump_python(m, exclude={'where'})  # (2)!

print(meeting_adapter.json_schema())  # (3)!
"""
{
    'properties': {
        'when': {'format': 'date-time', 'title': 'When', 'type': 'string'},
        'where': {'format': 'binary', 'title': 'Where', 'type': 'string'},
        'why': {'title': 'Why', 'type': 'string'},
    },
    'required': ['when', 'where'],
    'title': 'Meeting',
    'type': 'object',
}
"""
1. TypeAdapter for a TypedDict performing validation; it can also validate JSON data directly with validate_json.
2. dump_python to serialise a TypedDict to a Python object; it can also serialise to JSON with dump_json.
3. TypeAdapter can also generate JSON Schema.
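The other two entry points in the list above, pydantic.dataclasses.dataclass and validate_call, can be sketched in the same spirit. Room and schedule are hypothetical names invented for this illustration.

```python
from datetime import datetime

from pydantic import ValidationError, validate_call
from pydantic.dataclasses import dataclass


@dataclass
class Room:  # hypothetical dataclass, validated on initialisation
    name: str
    capacity: int


@validate_call
def schedule(room: Room, when: datetime) -> str:
    # Arguments are validated (and coerced in lax mode) before the body runs.
    return f'{room.name} booked for {when:%Y-%m-%d %H:%M}'


room = Room(name='Boardroom', capacity='12')  # '12' is coerced to the int 12
message = schedule(room, '2020-01-01T12:00')  # the string is parsed to a datetime
print(room.capacity, message)
#> 12 Boardroom booked for 2020-01-01 12:00

try:
    schedule(room, 'not a date')
    call_rejected = False
except ValidationError:
    call_rejected = True
```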
Customisation¶
Functional validators and serializers, as well as a powerful protocol for custom types, mean the way Pydantic operates can be customized on a per-field or per-type basis.
Customisation Example - wrap validators
"Wrap validators" are new in Pydantic V2 and are one of the most powerful ways to customize Pydantic validation.
from datetime import datetime, timezone

from pydantic import BaseModel, field_validator


class Meeting(BaseModel):
    when: datetime

    @field_validator('when', mode='wrap')
    def when_now(cls, input_value, handler):
        if input_value == 'now':
            return datetime.now()
        when = handler(input_value)
        # in this specific application we know tz naive datetimes are in UTC
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return when


print(Meeting(when='2020-01-01T12:00+01:00'))
#> when=datetime.datetime(2020, 1, 1, 12, 0, tzinfo=TzInfo(+01:00))
print(Meeting(when='now'))
#> when=datetime.datetime(2032, 1, 2, 3, 4, 5, 6)
print(Meeting(when='2020-01-01T12:00'))
#> when=datetime.datetime(2020, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
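Serialization can be customized per field in a similar way. The sketch below uses field_serializer to emit a Unix timestamp instead of the default datetime output. TimestampedMeeting and the choice of a timestamp are assumptions made for illustration only.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, field_serializer


class TimestampedMeeting(BaseModel):  # hypothetical model for illustration
    when: datetime

    @field_serializer('when')
    def when_as_timestamp(self, value: datetime) -> float:
        # Illustrative choice: serialize as seconds since the Unix epoch.
        return value.timestamp()


m = TimestampedMeeting(when=datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc))
dumped = m.model_dump()
print(dumped)
#> {'when': 1577880000.0}
```

Validation is unaffected: m.when is still a datetime, and only the dumped output changes.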
Learn more
See the documentation on validators, custom serializers, and custom types.
Ecosystem¶
At the time of writing there are 214,100 repositories on GitHub and 8,119 packages on PyPI that depend on Pydantic.
Some notable libraries that depend on Pydantic:
- huggingface/transformers: 107,475 stars
- tiangolo/fastapi: 60,355 stars
- hwchase17/langchain: 54,514 stars
- apache/airflow: 30,955 stars
- microsoft/DeepSpeed: 26,908 stars
- ray-project/ray: 26,600 stars
- lm-sys/FastChat: 24,924 stars
- Lightning-AI/lightning: 24,034 stars
- OpenBB-finance/OpenBBTerminal: 22,785 stars
- gradio-app/gradio: 19,726 stars
- pola-rs/polars: 18,587 stars
- mindsdb/mindsdb: 17,242 stars
- RasaHQ/rasa: 16,695 stars
- mlflow/mlflow: 14,780 stars
- heartexlabs/label-studio: 13,634 stars
- spotDL/spotify-downloader: 12,124 stars
- Sanster/lama-cleaner: 12,075 stars
- airbytehq/airbyte: 11,174 stars
- openai/evals: 11,110 stars
- matrix-org/synapse: 11,071 stars
- ydataai/ydata-profiling: 10,884 stars
- pyodide/pyodide: 10,245 stars
- tiangolo/sqlmodel: 10,160 stars
- lucidrains/DALLE2-pytorch: 9,916 stars
- pynecone-io/reflex: 9,679 stars
- PaddlePaddle/PaddleNLP: 9,663 stars
- aws/serverless-application-model: 9,061 stars
- modin-project/modin: 8,808 stars
- great-expectations/great_expectations: 8,613 stars
- dagster-io/dagster: 7,908 stars
- NVlabs/SPADE: 7,407 stars
- brycedrennan/imaginAIry: 7,217 stars
- chroma-core/chroma: 7,127 stars
- lucidrains/imagen-pytorch: 7,089 stars
- sqlfluff/sqlfluff: 6,278 stars
- deeppavlov/DeepPavlov: 6,278 stars
- autogluon/autogluon: 5,966 stars
- bridgecrewio/checkov: 5,747 stars
- bentoml/BentoML: 5,275 stars
- replicate/cog: 5,089 stars
- vitalik/django-ninja: 4,623 stars
- apache/iceberg: 4,479 stars
- jina-ai/discoart: 3,820 stars
- embedchain/embedchain: 3,493 stars
- skypilot-org/skypilot: 3,052 stars
- PrefectHQ/marvin: 2,985 stars
- microsoft/FLAML: 2,569 stars
- docarray/docarray: 2,353 stars
- aws-powertools/powertools-lambda-python: 2,198 stars
- NVIDIA/NeMo-Guardrails: 1,830 stars
- roman-right/beanie: 1,299 stars
- art049/odmantic: 807 stars
More libraries using Pydantic can be found at Kludex/awesome-pydantic.
Organisations using Pydantic¶
Some notable companies and organisations using Pydantic together with comments on why/how we know they're using Pydantic.
The organisations below are included because they match one or more of the following criteria:
- Using pydantic as a dependency in a public repository
- Referring traffic to the pydantic documentation site from an organization-internal domain - specific referrers are not included since they're generally not in the public domain
- Direct communication between the Pydantic team and engineers employed by the organization about usage of Pydantic within the organization
We've included some extra detail where appropriate and already in the public domain.
Adobe¶
adobe/dy-sql uses Pydantic.
Amazon and AWS¶
- powertools-lambda-python
- awslabs/gluonts
- AWS sponsored Samuel Colvin $5,000 to work on Pydantic in 2022
Anthropic¶
anthropics/anthropic-sdk-python uses Pydantic.
Apple¶
(Based on the criteria described above)
ASML¶
(Based on the criteria described above)
AstraZeneca¶
Multiple repos in the AstraZeneca GitHub org depend on Pydantic.
Cisco Systems¶
- Pydantic is listed in their report of Open Source Used In RADKit.
- cisco/webex-assistant-sdk
Comcast¶
(Based on the criteria described above)
Datadog¶
- Extensive use of Pydantic in DataDog/integrations-core and other repos
- Communication with engineers from Datadog about how they use Pydantic.
Facebook¶
Multiple repos in the facebookresearch GitHub org depend on Pydantic.
GitHub¶
GitHub sponsored Pydantic $750 in 2022
Google¶
Extensive use of Pydantic in google/turbinia and other repos.
HSBC¶
(Based on the criteria described above)
IBM¶
Multiple repos in the IBM GitHub org depend on Pydantic.
Intel¶
(Based on the criteria described above)
Intuit¶
(Based on the criteria described above)
Intergovernmental Panel on Climate Change¶
Tweet explaining how the IPCC use Pydantic.
JPMorgan¶
(Based on the criteria described above)
Jupyter¶
- The developers of the Jupyter notebook are using Pydantic for subprojects
- Through the FastAPI-based Jupyter server Jupyverse
- FPS's configuration management.
Microsoft¶
- DeepSpeed deep learning optimisation library uses Pydantic extensively
- Multiple repos in the microsoft GitHub org depend on Pydantic, in particular their
- Pydantic is also used in the Azure GitHub org
- Comments on GitHub show Microsoft engineers using Pydantic as part of Windows and Office
Molecular Science Software Institute¶
Multiple repos in the MolSSI GitHub org depend on Pydantic.
NASA¶
Multiple repos in the NASA GitHub org depend on Pydantic.
NASA are also using Pydantic via FastAPI in their JWST project to process images from the James Webb Space Telescope, see this tweet.
Netflix¶
Multiple repos in the Netflix GitHub org depend on Pydantic.
NSA¶
The nsacyber/WALKOFF repo depends on Pydantic.
NVIDIA¶
Multiple repos in the NVIDIA GitHub org depend on Pydantic.
Their "Omniverse Services" depends on Pydantic according to their documentation.
OpenAI¶
OpenAI use Pydantic for their ChatCompletions API, as per this discussion on GitHub.
Anecdotally, OpenAI use Pydantic extensively for their internal services.
Oracle¶
(Based on the criteria described above)
Palantir¶
(Based on the criteria described above)
Qualcomm¶
(Based on the criteria described above)
Red Hat¶
(Based on the criteria described above)
Revolut¶
Anecdotally, all internal services at Revolut are built with FastAPI and therefore Pydantic.
Robusta¶
The robusta-dev/robusta repo depends on Pydantic.
Salesforce¶
Salesforce sponsored Samuel Colvin $10,000 to work on Pydantic in 2022.
Starbucks¶
(Based on the criteria described above)
Texas Instruments¶
(Based on the criteria described above)
Twilio¶
(Based on the criteria described above)
Twitter¶
Twitter's the-algorithm repo, where they open sourced their recommendation engine, uses Pydantic.
UK Home Office¶
(Based on the criteria described above)