pycldf.db

Functionality to load a CLDF dataset into a SQLite database.

To make the resulting SQLite database useful without access to the dataset's metadata, we use terms of the CLDF ontology for database objects as much as possible, i.e.

  • table names are component names (e.g. “ValueTable” for a table with propertyUrl http://cldf.clld.org/v1.0/terms.rdf#ValueTable),

  • column names are property names, prefixed with “cldf_” (e.g. a column with propertyUrl http://cldf.clld.org/v1.0/terms.rdf#id will be “cldf_id” in the database).
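The naming rules above can be sketched in a few lines. This is an illustrative sketch only, not the code pycldf uses: the helper names `term_name`, `table_name` and `column_name` are hypothetical, and the fallback to the table's url or the column's own name is an assumption based on the `custom.csv` example in this section.

```python
CLDF_TERMS = "http://cldf.clld.org/v1.0/terms.rdf"


def term_name(property_url):
    """Return the fragment of a CLDF ontology URL, or None for anything else."""
    if not property_url:
        return None
    base, _, fragment = property_url.partition("#")
    return fragment if base == CLDF_TERMS and fragment else None


def table_name(property_url, fallback):
    """Component name if the table maps to a CLDF component, else the fallback (assumed: the table url)."""
    return term_name(property_url) or fallback


def column_name(property_url, fallback):
    """'cldf_' + property name for CLDF properties, else the fallback (assumed: the column name)."""
    name = term_name(property_url)
    return "cldf_" + name if name else fallback


print(table_name(CLDF_TERMS + "#ValueTable", "values.csv"))
print(column_name(CLDF_TERMS + "#id", "ID"))
print(table_name(None, "custom.csv"))
```

Keeping the ontology lookup in one helper means tables and columns share the same test for "is this a CLDF term", so a custom table or column simply keeps its own name.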

This naming scheme also extends to automatically created association tables. That is, when a table specifies a list-valued foreign key, an association table is created to implement this many-to-many relationship. The name of the association table is the concatenation, joined by an underscore, of

  • the url properties of the tables in this relationship or of

  • the component names of the tables in the relationship.

E.g. a list-valued foreign key from the FormTable to the ParameterTable will result in an association table:

CREATE TABLE `FormTable_ParameterTable` (
  `FormTable_cldf_id` TEXT,
  `ParameterTable_cldf_id` TEXT,
  `context` TEXT,
  FOREIGN KEY(`FormTable_cldf_id`) REFERENCES `FormTable`(`cldf_id`) ON DELETE CASCADE,
  FOREIGN KEY(`ParameterTable_cldf_id`) REFERENCES `ParameterTable`(`cldf_id`) ON DELETE CASCADE
);

while a list-valued foreign key to a custom table may result in something like this:

CREATE TABLE `FormTable_custom.csv` (
  `FormTable_cldf_id` TEXT,
  `custom.csv_id` TEXT,
  `context` TEXT,
  FOREIGN KEY(`FormTable_cldf_id`) REFERENCES `FormTable`(`cldf_id`) ON DELETE CASCADE,
  FOREIGN KEY(`custom.csv_id`) REFERENCES `custom.csv`(`id`) ON DELETE CASCADE
);
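
To see how such an association table is used, the following sketch builds the `FormTable_ParameterTable` schema from above in an in-memory database with the standard `sqlite3` module and resolves the many-to-many relationship with a join. The `cldf_form` and `cldf_name` columns, the sample rows and the `context` value are made up for illustration; a database written by pycldf will contain whatever the dataset defines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and ON DELETE CASCADE) when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE `FormTable` (`cldf_id` TEXT PRIMARY KEY, `cldf_form` TEXT);
CREATE TABLE `ParameterTable` (`cldf_id` TEXT PRIMARY KEY, `cldf_name` TEXT);
CREATE TABLE `FormTable_ParameterTable` (
  `FormTable_cldf_id` TEXT,
  `ParameterTable_cldf_id` TEXT,
  `context` TEXT,
  FOREIGN KEY(`FormTable_cldf_id`) REFERENCES `FormTable`(`cldf_id`) ON DELETE CASCADE,
  FOREIGN KEY(`ParameterTable_cldf_id`) REFERENCES `ParameterTable`(`cldf_id`) ON DELETE CASCADE
);
""")

# Hypothetical sample data: one form linked to two parameters, one form linked to one.
conn.executemany("INSERT INTO FormTable VALUES (?, ?)", [("f1", "mano"), ("f2", "pie")])
conn.executemany(
    "INSERT INTO ParameterTable VALUES (?, ?)",
    [("p1", "HAND"), ("p2", "ARM"), ("p3", "FOOT")],
)
conn.executemany(
    "INSERT INTO FormTable_ParameterTable VALUES (?, ?, ?)",
    [("f1", "p1", "example"), ("f1", "p2", "example"), ("f2", "p3", "example")],
)

# Resolve the list-valued foreign key by joining through the association table.
pairs = conn.execute("""
    SELECT f.cldf_form, p.cldf_name
    FROM FormTable AS f
    JOIN FormTable_ParameterTable AS a ON a.FormTable_cldf_id = f.cldf_id
    JOIN ParameterTable AS p ON p.cldf_id = a.ParameterTable_cldf_id
    ORDER BY f.cldf_id, p.cldf_id
""").fetchall()

# Deleting a form removes its association rows because of ON DELETE CASCADE.
conn.execute("DELETE FROM FormTable WHERE cldf_id = 'f1'")
remaining = conn.execute("SELECT count(*) FROM FormTable_ParameterTable").fetchone()[0]
```
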
class pycldf.db.Database(dataset, **kw)[source]

Extend the functionality provided by csvw.db.Database by

  • providing consistent naming of schema objects according to CLDF semantics,

  • integrating sources into the DB schema.

association_table_context(table, column, fkey)[source]

Context for association tables is created by calling this method.

Note: If a custom value for the context column is created by overriding this method, select_many_to_many must be adapted accordingly, to make sure the custom context is retrieved when reading the data from the db.

Parameters
  • table

  • column

  • fkey

Returns

a pair (foreign key, context)

query(sql, params=None)[source]

Run sql on the database, returning the list of results.

Parameters

sql (str) –

Return type

list
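
As a rough picture of what "run sql, return the list of results" means, here is a stand-in written against the standard `sqlite3` module. It is not pycldf's implementation: `run_query` is a hypothetical helper, and passing the database file name explicitly is an assumption made to keep the sketch self-contained.

```python
import sqlite3
from contextlib import closing


def run_query(fname, sql, params=None):
    """Open the database, execute one statement and return all rows as a list of tuples."""
    with closing(sqlite3.connect(fname)) as conn:
        # Values are passed separately from the SQL text, so they are never
        # interpolated into the statement.
        cursor = conn.execute(sql, params or ())
        return list(cursor.fetchall())


rows = run_query(":memory:", "SELECT ? + 1 AS answer", (41,))
```

Passing values through `params` rather than formatting them into `sql` avoids quoting problems and SQL injection.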

static round_geocoordinates(item, precision=4)[source]

We round geo coordinates to precision decimal places.

See https://en.wikipedia.org/wiki/Decimal_degrees

Parameters
  • item

  • precision

Returns

item
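
A minimal sketch of this kind of rounding, assuming `item` is a dict-like row whose coordinates live under the keys `cldf_latitude` and `cldf_longitude` (names derived from the naming scheme above, not confirmed by this page). The function name `round_coordinates` and its `keys` parameter are hypothetical.

```python
def round_coordinates(item, precision=4, keys=("cldf_latitude", "cldf_longitude")):
    """Round the coordinate values of a row in place and return the row."""
    for key in keys:
        value = item.get(key)
        if value is not None:  # leave missing coordinates untouched
            item[key] = round(float(value), precision)
    return item


row = {"cldf_id": "berlin", "cldf_latitude": 52.5200066, "cldf_longitude": 13.404954}
rounded = round_coordinates(row)
```

Four decimal places of a degree correspond to roughly 11 metres at the equator, which is why a small default precision is usually enough for language locations.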

to_cldf(dest, mdname='cldf-metadata.json', coordinate_precision=4)[source]

Write the data from the db to a CLDF dataset according to the metadata in self.dataset.

Parameters
  • dest – Destination directory for the CLDF data.

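  • mdname – Name to use for the CLDF metadata file.

  • coordinate_precision – Number of decimal places to which geo coordinates are rounded (see round_geocoordinates).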

Return type

pathlib.Path

Returns

path of the metadata file

write(_force=False, _exists_ok=False, **items)[source]

Creates a db file with the core schema.

Parameters

_force – If True, an existing db file will be overwritten.

write_from_tg(_force=False, _exists_ok=False)[source]

Write the data from self.dataset to the database.

Parameters
  • _force (bool) –

  • _exists_ok (bool) –