pycldf.orm

Object oriented (read-only) access to CLDF data

To read ORM objects from a pycldf.Dataset, there are two generic methods:

  • pycldf.Dataset.objects()

  • pycldf.Dataset.get_object()

Both will return default implementations of the objects, i.e. instances of the corresponding class defined in this module. To customize these objects,

  1. subclass the default and specify the appropriate component (i.e. the table of the CLDF dataset which holds rows to be transformed to this type):

    from pycldf.orm import Language
    
    class Variety(Language):
        __component__ = 'LanguageTable'
    
        def custom_method(self):
            pass
    
  2. pass the class into the objects or get_object method.

In addition, module-specific subclasses of pycldf.Dataset provide more meaningful properties and methods, as shortcuts to the methods above. See ./dataset.html#subclasses-supporting-specific-cldf-modules for details.

Limitations:

  • We only support foreign key constraints for CLDF reference properties targeting either a component’s CLDF id or its primary key. This is because CSVW does not support unique constraints other than the one implied by the primary key declaration.

  • This functionality comes with the typical “more convenient API vs. less performance and bigger memory footprint” trade-off. If you run into problems with this, consider loading your data into a SQLite database using the pycldf.db module and accessing it via SQL. Some numbers (to be interpreted relative to each other): reading ~400,000 rows from a ValueTable of a StructureDataset takes

    • ~2 seconds with csvcut, i.e. only making sure it’s valid CSV

    • ~15 seconds iterating over pycldf.Dataset['ValueTable']

    • ~35 seconds iterating over pycldf.Dataset.objects('ValueTable')
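To illustrate what “access via SQL” buys you, here is a rough sketch that uses only the standard library (csv and sqlite3), not the pycldf.db module itself. The CSV content, the table layout, and the load_values helper are assumptions made for this example; pycldf.db may create a different schema.

```python
import csv
import io
import sqlite3

# Stand-in for the CSV file of a ValueTable (made-up rows).
VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,l1,p1,yes
2,l1,p2,no
3,l2,p1,yes
"""


def load_values(conn, csv_text):
    """Load CSV rows into a SQLite table (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE ValueTable ("
        "ID TEXT PRIMARY KEY, Language_ID TEXT, Parameter_ID TEXT, Value TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders are filled from each row dict.
    conn.executemany(
        "INSERT INTO ValueTable VALUES (:ID, :Language_ID, :Parameter_ID, :Value)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_values(conn, VALUES_CSV)

# Aggregation happens inside SQLite; no per-row Python objects are created.
counts = dict(
    conn.execute(
        "SELECT Parameter_ID, COUNT(*) FROM ValueTable GROUP BY Parameter_ID"
    )
)
print(counts)
```

The point of the design is that filtering and aggregation run in the database engine, so the cost of instantiating one object per row is avoided.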

The Object base class

class pycldf.orm.Object(dataset, row)

Represents a row of a CLDF component table.

Subclasses of Object are instantiated when calling Dataset.objects or Dataset.get_object.

Variables:
  • dataset – Reference to the Dataset instance this object was loaded from.

  • data – An OrderedDict with a copy of the row the object was instantiated with.

  • cldf – A dict with CLDF-specified properties of the row, keyed with CLDF terms.

  • id – The value of the CLDF id property of the row.

  • name – The value of the CLDF name property of the row.

  • description – The value of the CLDF description property of the row.

  • pk – The value of the column specified as primary key for the table. (May differ from id)

aboutUrl(col='id')

The table’s aboutUrl property, expanded with the object’s row as context.

Return type:

typing.Optional[str]
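What “expanded with the object’s row as context” means can be sketched roughly as follows. This is a simplified model using only the standard library, not the code pycldf or csvw actually run: it handles plain {Column} placeholders only, and the expand helper, the template, and the row are made up for illustration.

```python
import re
from urllib.parse import quote


def expand(template, row):
    """Fill {Column} placeholders in a URL template from a row (hypothetical helper)."""
    if template is None:
        # Models the Optional return type: no template, no URL.
        return None

    def substitute(match):
        value = row.get(match.group(1))
        # Missing values expand to nothing; everything else is percent-encoded.
        return '' if value is None else quote(str(value), safe='')

    return re.sub(r'\{([A-Za-z0-9_]+)\}', substitute, template)


row = {'ID': 'abc 1', 'Name': 'Example language'}
url = expand('https://example.org/languages/{ID}', row)
print(url)
```

The same idea applies to the propertyUrl and valueUrl properties of a table; only the template differs.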

all_related(relation)

CLDF reference properties can be list-valued. This method returns all related objects for such a property.

Parameters:

relation (str) – a CLDF reference property name.

Return type:

typing.Union[pycldf.util.DictTuple, list]

property component: str

Name of the CLDF component the object belongs to. Can be used to look up the corresponding table via obj.dataset[obj.component_name()].

propertyUrl(col='id')

The table’s propertyUrl property, expanded with the object’s row as context.

Return type:

typing.Optional[str]

property references: Tuple[Reference]

pycldf.Reference instances associated with the object.

>>> obj.references[0].source['title']
>>> obj.references[0].fields.title
>>> obj.references[0].description  # The "context", typically cited pages

related(relation)

The CLDF ontology specifies several “reference properties”. This method returns the first related object specified by such a property.

Parameters:

relation (str) – a CLDF reference property name.

Return type:

typing.Optional[pycldf.orm.Object]

Returns:

related Object instance.

valueUrl(col='id')

The table’s valueUrl property, expanded with the object’s row as context.

Return type:

typing.Optional[str]

Component-specific object classes

class pycldf.orm.Borrowing(dataset, row)

class pycldf.orm.Code(dataset, row)

class pycldf.orm.Cognateset(dataset, row)

class pycldf.orm.Cognate(dataset, row)

class pycldf.orm.Contribution(dataset, row)

class pycldf.orm.Entry(dataset, row)

class pycldf.orm.Example(dataset, row)

property text

Examples in a TextCorpus are interpreted as lines of text.

class pycldf.orm.Form(dataset, row)

class pycldf.orm.FunctionalEquivalentset(dataset, row)

class pycldf.orm.FunctionalEquivalent(dataset, row)

class pycldf.orm.Language(dataset, row)

FIXME: describe usage!

glottolog_languoid(glottolog_api)

Get a Glottolog languoid associated with the Language.

Parameters:

glottolog_api – pyglottolog.Glottolog instance or dict mapping glottocodes to pyglottolog.languoids.Languoid instances.

Returns:

pyglottolog.languoids.Languoid instance or None.

property lonlat
Returns:

(longitude, latitude) pair

class pycldf.orm.Media(dataset, row)

class pycldf.orm.Parameter(dataset, row)

concepticon_conceptset(concepticon_api)

Get a Concepticon conceptset associated with the Parameter.

Parameters:

concepticon_api – pyconcepticon.Concepticon instance or dict mapping conceptset IDs to pyconcepticon.models.Conceptset instances.

Returns:

pyconcepticon.models.Conceptset instance or None.

class pycldf.orm.Sense(dataset, row)

class pycldf.orm.Value(dataset, row)

Value objects correspond to rows in a dataset’s ValueTable.

While a Value’s string representation is typically available from the value column, i.e. as Value.cldf.value, the interpretation of this value may be dictated by other metadata.

  • Categorical data will often describe possible values (aka “codes”) using a CodeTable. In this case, the associated Code object of a Value is available as Value.code.

  • Typed data may use a columnSpec property in ParameterTable to specify how to read the string value.

>>> from csvw.metadata import Column
>>> from pycldf import StructureDataset
>>> cs = Column.fromvalue(dict(datatype=dict(base='integer', maximum=5), separator=' '))
>>> ds = StructureDataset.in_dir('.')
>>> ds.add_component('ParameterTable')
>>> ds.write(
...     ParameterTable=[dict(ID='1', ColumnSpec=cs.asdict())],
...     ValueTable=[dict(ID='1', Language_ID='l', Parameter_ID='1', Value='1 2 3')],
... )
>>> v = ds.objects('ValueTable')[0]
>>> v.cldf.value
'1 2 3'
>>> v.typed_value
[1, 2, 3]