pycldf.orm

Object-oriented (read-only) access to CLDF data

To read ORM objects from a pycldf.Dataset, there are two generic methods:

pycldf.Dataset.objects()
pycldf.Dataset.get_object()

Both will return default implementations of the objects, i.e. instances of the corresponding class defined in this module. To customize these objects:

1. Subclass the default and specify the appropriate component (i.e. the table of the CLDF dataset which holds rows to be transformed to this type):

from pycldf.orm import Language

class Variety(Language):
    __component__ = 'LanguageTable'

    def custom_method(self):
        pass

2. Pass the class into the objects or get_object method.

In addition, module-specific subclasses of pycldf.Dataset provide more meaningful properties and methods, as shortcuts to the methods above. See ./dataset.html#subclasses-supporting-specific-cldf-modules for details.

Limitations:

- We only support foreign key constraints for CLDF reference properties targeting either a component’s CLDF id or its primary key. This is because CSVW does not support unique constraints other than the one implied by the primary key declaration.
- This functionality comes with the typical “more convenient API vs. less performance and bigger memory footprint” trade-off. If this becomes a problem, you might want to load your data into an SQLite database using the pycldf.db module and access it via SQL. Some numbers (to be interpreted relative to each other): reading ~400,000 rows from a ValueTable of a StructureDataset takes
  - ~2 secs with csvcut, i.e. only making sure it’s valid CSV,
  - ~15 secs iterating over pycldf.Dataset['ValueTable'],
  - ~35 secs iterating over pycldf.Dataset.objects('ValueTable').
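The SQL route can be illustrated with a minimal sketch that uses only Python's standard sqlite3 module. It does not use pycldf.db. The table name, the column names (modelled on the CSV headers of a ValueTable) and the rows are hypothetical, and they are not necessarily what pycldf.db creates. The point is that an aggregate query like this one works on the rows directly, without instantiating one ORM object per row.

```python
import sqlite3

# Hypothetical ValueTable rows: (ID, Language_ID, Parameter_ID, Value)
rows = [
    ('v1', 'l1', 'p1', '1'),
    ('v2', 'l2', 'p1', '2'),
    ('v3', 'l1', 'p2', '3'),
]

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE ValueTable '
    '(ID TEXT PRIMARY KEY, Language_ID TEXT, Parameter_ID TEXT, Value TEXT)')
conn.executemany('INSERT INTO ValueTable VALUES (?, ?, ?, ?)', rows)

# Count values per parameter in SQL instead of iterating over Python objects.
counts = dict(conn.execute(
    'SELECT Parameter_ID, COUNT(*) FROM ValueTable '
    'GROUP BY Parameter_ID ORDER BY Parameter_ID'))
print(counts)  # {'p1': 2, 'p2': 1}
conn.close()
```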

The Object base class

class pycldf.orm.Object(dataset, row)

Represents a row of a CLDF component table.

Subclasses of Object are instantiated when calling Dataset.objects or Dataset.get_object.

Variables:

dataset – Reference to the Dataset instance this object was loaded from.
data – An OrderedDict with a copy of the row the object was instantiated with.
cldf – The CLDF-specified properties of the row, keyed with CLDF terms and accessible as attributes (e.g. obj.cldf.value).
id – The value of the CLDF id property of the row.
name – The value of the CLDF name property of the row.
description – The value of the CLDF description property of the row.
pk – The value of the column specified as primary key for the table (may differ from id).

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

aboutUrl(col='id')

The table’s aboutUrl property, expanded with the object’s row as context.

Parameters: col (str) –
Return type: typing.Optional[str]
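CSVW URL properties such as aboutUrl are URI templates (RFC 6570). The function below is a rough sketch of what "expanded with the object’s row as context" means. It handles only the simplest template form, {name}, and substitutes percent-encoded values taken from a row dict. It is a hypothetical helper, not the csvw/pycldf implementation, which supports the full template syntax. The template and the row are made up.

```python
import re
from urllib.parse import quote

def expand_simple(template, context):
    """Expand {name} placeholders (RFC 6570 simple expansion) from a dict."""
    def substitute(match):
        value = context.get(match.group(1))
        if value is None:
            # Undefined variables expand to the empty string.
            return ''
        # Simple expansion percent-encodes everything but unreserved characters.
        return quote(str(value), safe='')
    return re.sub(r'\{([A-Za-z0-9_]+)\}', substitute, template)

row = {'ID': 'abc 1', 'Name': 'Example'}
print(expand_simple('https://example.org/languages/{ID}', row))
# https://example.org/languages/abc%201
```

The same idea applies to propertyUrl and valueUrl, which are also URI templates.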

all_related(relation)

CLDF reference properties can be list-valued. This method returns all related objects for such a property.

Parameters: relation (str) –
Return type: typing.Union[pycldf.util.DictTuple, list]

property component: str: Name of the CLDF component the object belongs to. Can be used to lookup the corresponding table via obj.dataset[obj.component_name()].

property key: tuple[int, str, str]: A key that is also unique across different Dataset instances.

propertyUrl(col='id')

The table’s propertyUrl property, expanded with the object’s row as context.

Parameters: col (str) –
Return type: typing.Optional[str]

property references: tuple[pycldf.sources.Reference, ...]

pycldf.sources.Reference instances associated with the object.

>>> obj.references[0].source['title']
>>> obj.references[0].fields.title
>>> obj.references[0].description  # The "context", typically cited pages
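In CLDF data, a reference is written as a source ID, optionally followed by its context in square brackets, e.g. Meier2005[3-7]. That context is what description holds. The following minimal sketch splits such a string. It uses a hypothetical parse_reference helper rather than pycldf’s own parser, and the source ID is made up.

```python
import re
from typing import NamedTuple, Optional

class Ref(NamedTuple):
    source_id: str
    context: Optional[str]  # e.g. cited pages; None if there are no brackets

REF_PATTERN = re.compile(r'^(?P<source_id>[^\[\]]+)(?:\[(?P<context>[^\]]*)\])?$')

def parse_reference(value):
    """Split 'SourceID[context]' into its parts."""
    match = REF_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f'not a valid reference: {value!r}')
    return Ref(match.group('source_id'), match.group('context'))

print(parse_reference('Meier2005[3-7]'))  # Ref(source_id='Meier2005', context='3-7')
print(parse_reference('Meier2005'))       # Ref(source_id='Meier2005', context=None)
```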

related(relation)

The CLDF ontology specifies several “reference properties”. This method returns the first related object specified by such a property.

Parameters: relation (str) – a CLDF reference property name.
Return type: typing.Optional[pycldf.orm.Object]
Returns: the related Object instance, or None.
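The difference between related and all_related can be sketched with plain dicts. Everything here is hypothetical: the rows, the property values and the target tables. The targets are indexed by CLDF id, which is one of the two foreign-key targets supported according to the Limitations above. The sketch mirrors the documented behaviour (first object vs. all objects), not pycldf’s code.

```python
# Hypothetical target tables, indexed by CLDF id.
LANGUAGES = {'l1': {'id': 'l1', 'name': 'Language One'}}
MEDIA = {'m1': {'id': 'm1'}, 'm2': {'id': 'm2'}}

def all_related(row, relation, target):
    """Return all target rows referenced by a (possibly list-valued) property."""
    ref = row.get(relation)
    if ref is None:
        return []
    ids = ref if isinstance(ref, list) else [ref]
    # A dangling reference raises KeyError here.
    return [target[i] for i in ids]

def related(row, relation, target):
    """Return the first referenced target row, or None."""
    objs = all_related(row, relation, target)
    return objs[0] if objs else None

value = {'id': 'v1', 'languageReference': 'l1', 'mediaReference': ['m1', 'm2']}
print(related(value, 'languageReference', LANGUAGES)['name'])          # Language One
print([m['id'] for m in all_related(value, 'mediaReference', MEDIA)])  # ['m1', 'm2']
```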

valueUrl(col='id')

The table’s valueUrl property, expanded with the object’s row as context.

Parameters: col (str) –
Return type: typing.Optional[str]

Component-specific object classes

class pycldf.orm.Borrowing(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Code(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Cognateset(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Cognate(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Contribution(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

property sentences: Returns the ordered sentences of a text in a TextCorpus.

class pycldf.orm.Entry(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Example(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

property alternative_translations: list[pycldf.orm.Example]: Returns alternative translations for the Example.

property igt: str: The example in a plain text interlinear glossed representation.

property text: Examples in a TextCorpus are interpreted as lines of a text, which in turn is the module-specific interpretation of a CLDF contribution.

class pycldf.orm.Form(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.FunctionalEquivalentset(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.FunctionalEquivalent(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Language(dataset, row)

Language objects correspond to rows in a dataset’s LanguageTable.

Language objects provide easy access to somewhat complex derivatives of the dataset’s info on the language, e.g. its speaker area as GeoJSON object.

>>> from pycldf import Dataset
>>> ds = Dataset.from_metadata('tests/data/dataset_with_media/metadata.json')
>>> lg = ds.get_object('LanguageTable', '1')
>>> lg.speaker_area_as_geojson_feature['geometry']['type']
'MultiPolygon'

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

property as_geojson_feature: None | dict[str, Any]: dict suitable for serialization as GeoJSON Feature object, with the point coordinate as geographic data.

See also

https://datatracker.ietf.org/doc/html/rfc7946#section-3.2
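The sketch below builds such a Feature from a (longitude, latitude) pair like the one returned by lonlat. RFC 7946 orders positions as [longitude, latitude]. The helper name and the properties are hypothetical. The Decimal values are converted to float because json.dumps cannot serialize Decimal.

```python
import json
from decimal import Decimal

def point_feature(lonlat, properties=None):
    """Build a GeoJSON Feature with Point geometry, or None without coordinates."""
    if lonlat is None:
        return None
    lon, lat = lonlat
    return {
        'type': 'Feature',
        # GeoJSON positions are [longitude, latitude]; floats keep it JSON-serializable.
        'geometry': {'type': 'Point', 'coordinates': [float(lon), float(lat)]},
        'properties': dict(properties or {}),
    }

feature = point_feature((Decimal('13.4'), Decimal('52.5')), {'name': 'Example'})
print(json.dumps(feature))
```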

glottolog_languoid(glottolog_api)

Get a Glottolog languoid associated with the Language.

Parameters: glottolog_api – pyglottolog.Glottolog instance or dict mapping glottocodes to pyglottolog.languoids.Languoid instances.
Returns: pyglottolog.languoids.Languoid instance or None.
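The two accepted argument types suggest a simple dispatch, sketched below with a hypothetical lookup_languoid helper (not pycldf’s code). A pre-built dict answers each lookup cheaply, which helps when resolving many languages. An API object, by contrast, is asked on every call. The sketch assumes the API object has a languoid(glottocode) method, as pyglottolog’s Glottolog class is understood to provide. FakeGlottolog is a stand-in used only for demonstration.

```python
from collections.abc import Mapping

def lookup_languoid(glottocode, glottolog_api):
    """Resolve a glottocode via a dict or an API object; None if unavailable."""
    if not glottocode:
        return None
    if isinstance(glottolog_api, Mapping):
        # Pre-built mapping from glottocode to languoid: plain dict lookup.
        return glottolog_api.get(glottocode)
    # Assumption: the API object offers a languoid(glottocode) method.
    return glottolog_api.languoid(glottocode)

class FakeGlottolog:
    """Stand-in for an API object, for demonstration only."""
    def languoid(self, glottocode):
        return f'languoid {glottocode}'

print(lookup_languoid('stan1295', {'stan1295': 'German'}))  # German
print(lookup_languoid('stan1295', FakeGlottolog()))         # languoid stan1295
print(lookup_languoid(None, {}))                            # None
```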

property lonlat: tuple[decimal.Decimal, decimal.Decimal] | None

Returns: (longitude, latitude) pair if coordinates are defined, else None.

property speaker_area: File | None: A pycldf.media.File object containing information about the speaker area of the language.

property speaker_area_as_geojson_feature: dict[str, Any] | None: dict suitable for serialization as GeoJSON Feature object, with a speaker area Polygon or MultiPolygon as geographic data.

See also

https://datatracker.ietf.org/doc/html/rfc7946#section-3.2

class pycldf.orm.Media(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Parameter(dataset, row)

The Parameter class provides support for interpreting a parameter’s string values as typed data and reading them accordingly. See Value below.

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

property columnSpec: Column | None: Turns a JSON column specification in a column value into a Column object.

concepticon_conceptset(concepticon_api)

Get a Concepticon conceptset associated with the Parameter.

Parameters: concepticon_api – pyconcepticon.Concepticon instance or dict mapping conceptset IDs to pyconcepticon.models.Conceptset instances.
Returns: pyconcepticon.models.Conceptset instance or None.

property datatype: Datatype | None: Turns a JSON datatype description in a column value into a Datatype object.

class pycldf.orm.Sense(dataset, row)

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

class pycldf.orm.Value(dataset, row)

Value objects correspond to rows in a dataset’s ValueTable.

While a Value’s string representation is typically available from the value column, i.e. as Value.cldf.value, the interpretation of this value may be dictated by other metadata:

- Categorical data will often describe possible values (aka “codes”) using a CodeTable. In this case, the associated Code object of a Value is available as Value.code.
- Typed data may use a columnSpec property in ParameterTable to specify how to read the string value.

>>> from csvw.metadata import Column
>>> from pycldf import StructureDataset
>>> cs = Column.fromvalue(dict(datatype=dict(base='integer', maximum=5), separator=' '))
>>> ds = StructureDataset.in_dir('.')
>>> ds.add_component('ParameterTable')
>>> ds.write(
...     ParameterTable=[dict(ID='1', ColumnSpec=cs.asdict())],
...     ValueTable=[dict(ID='1', Language_ID='l', Parameter_ID='1', Value='1 2 3')],
... )
>>> v = ds.objects('ValueTable')[0]
>>> v.cldf.value
'1 2 3'
>>> v.typed_value
[1, 2, 3]

Parameters:

dataset (pycldf.dataset.Dataset) –
row (collections.OrderedDict[str, typing.Any]) –

property typed_value: If a parameter includes information about the datatype of its values, this information is used here to convert the value accordingly.