oasis_archiv


Principle of Operation of the DBA


SQL statements generated by DB-Adapters queries the Daily Cache table. Building of the Daily Cache table is realised by the Cron demon [18] and consists of three stages: - gathering metadata, - grouping metadata according to objects they describe, - building the final shape of the Daily Cache table.

Metadata describe electronic art objects – artworks, provided by participating institutions. Each artwork (pictures, videos, etc.) has its unique identifier - origin_id. Artworks can have however multiple representations (multiple language versions for instance), called manifestations. Each manifestation is described by also unique oml_id. Metadata gathered during the first stage, stored in the temporary tables of the DBA-database, are then grouped according to the origin_id and oml_id fields separately. The third stage consists in generating the Daily Cache table as concatenation of the temporary tables. Temporary tables simplify accelerate and minimise failure of the Daily Cache building process. They give possibility to process metadata on the DB-Adapter side. Connection to the database server of partner is held only during the first stage.

Process of the Daily Cache generation is illustrated in the figure below.


[[Bild:]]

BU: Generation of the Daily Cache

Process of the Daily Cache generation can be supported by additional algorithms specified in the DB-Adapter config file. If sequence of Daily Cache generation is like it was indicated above (by createCacheTable() method for instance), then each stage can be realised by the separated task.

DB-Adapter – query processing

Searching for the electronic arts is supported by Content Based Indexing Extension module, which is designated to collect the additional information describing the database objects. The additional information is gathered using complex indexing techniques for electronic documents.

The Interface

The interface between Middleware and DB-Adapter consists of several PHP scripts. All scripts are places in the same path - the URL of DB-Adapter. Middleare stores this URL in it's database as "base_url" for each DB-Adapter. The scripts are listed below.

SCRIPT: query.php

Represents the interface for searching. DBA uses it's own cache

  • 0 no problem with copyright
  • 1 cache refresh per day (for example...)

Queries are case insensitive. The plus ("+") sign substitutes spaces. There is no WHERE clause - searching is executed in all fields unless specified in the Google like style: author:Vasulka. The list of WHERE-like query modifiers is defined elsewhere. The NOT modifier is implemented as the proceeding minus ("-") sign. It is also planned to implement OR modifier later on. Query example: some fulltext search mediatype:Audio mediatype:Video There is constructed a temporary table in the DBA database for every query. Some columns are just an extract from the daily generated table, the rest of columns is created on the fly.

Accepted parameters: • QUERY: the google-like query string • USERNAME: the username used by authorization module of the DB-Adapter (this is optional) • PASSWORD: the password used by authorization module of the DB-Adapter (this is optional) • LANGUAGE: DB-Adapter could use this value to generate language-specific values (this is optional) • NOCBMI: if the value is "1", values from indexing module will be excluded from result. (this is optional) NOTE: NOCBMI was implemented as oasis attribute "nocbmi".

The result are: • First line is the return code • Second line is the textual description • Third line is the identifier of created temporary table

Example: 0 Temporary table is ready tmp_table_123 Return codes: • 0: OK • 1: DB server down • 2: General error • >2: Some other error There is a check-ip functionality, that allows for access only from hosts specified in the configuration file. Authentication is assured by a list of known IP address.

SCRIPT: auth.php

Each search request could be extended with USERNAME and PASSWORD parameters. This allows DB-Adapter to alter results according to user privileges used in particular database. If the privileges are not supported by the database, DB-Adapter simply ignores the USERNAME/PASSWORD pair. Accepts parameters: • USERNAME • PASSWORD

The result are: • 0: access granted • 1: access denied • >1: Some other error

SCRIPT: getids.php

Accepts parameters: • TABLE: the temporary table created by DB-Adapter The result are: • first two lines contain the error code and reply message • next lines contain list of all origin_ids in the requested temporary table - one origin_id per line • It should be noted that each line must be terminated by new-line character especially the last line. • In case of error, following error-codes appear: 1. 1 Table name not specified 2. 2 DB connection failed 3. 3 DB query failed 4. 4 Could not open configuration file Example: 0 Result is ready 123123 234533 124242 568659 346347 567567

SCRIPT: cleanup.php

Drops the temporary table created by DB-Adapter. Accepts parameters: • TABLE: identifier of the temporary table generated by query.php The result are: • 0: Temporary table deleted successfully • 1: Temporary table does not exist. • 2: Could not delete temporary table. • 3: Table name not specified • 4: DB connection failed • 5: Could not open config file • >5: Some other error

SCRIPT: load.php (substituted by detail.php)

Loading of values of attributes on demand. Accepts parameters: • TABLE: identifier of the temporary table generated by query.php • OFFSET: offset in the result-set (optional, default 0) • SIZE: size of the result-set (optional, default infinite) • ATTRNAME: column in the temporary table • ID[]: list of origin_ids The result: • The result are Serialized PHP array with key as origin_id and value as value of the requested attribute. • In case of error, the result are serialized PHP string with error code and message. 1. 1 Wrong parameters 2. 2 DB connection failed 3. 3 DB query failed 4. 4 Could not open configuration file serialize("$errorCode $errorMessage"); Notes: • if the requested ATTRNAME does not exist in DBA's temporary table, script will return array with empty strings for each origin_id or 'MISSING-IN-DBA' string for debugging purposes. serialize(array('12345'=>)) or serialize(array('12345'=>'MISSING-IN-DBA')) • in case of multiple values, the multi- array is returned serialize(array('123'=>array('ISBN','ISSN','UMID'))) • The script can use g-zipped HTTP response if the client supports it. In PHP this is implemented like following example <?php

 ob_start('ob_gzhandler');
 echo 'this will be compressed transparently';
?>

SCRIPT: detail.php (after Marseille)

Loading values from all attributes of a given resource (improved version of load.php optimized for detail-view). Accepts parameters: • TABLE (optional): identifier of the temporary table generated by query.php. If the parameter is not specified, data will be taken directly from daily cache. • ORIGINID: unique identifier of the resource The result are: • first two lines contain the error code and reply message • next lines contain results - one value per line • each line obeys following format:ATTRNAME VALUE • if the value contains new-lines, are converted to \n or \r as well as backslashes are converted to \\ (PHP function addcslashes($string, "\n\r\\")) • In case of error, following error-codes appear. 1. 1 Wrong parameters 2. 2 DB connection failed 3. 3 DB query failed 4. 4 Could not open configuration file


Example

QUERY: http://foobar?ORIGINID=9675 REPLY: origin_id 12345 title foobar title language en language cz origin_id 23456 title This title\ncontains multiple\nlines title Yet another title language pl

SCRIPT: multiload.php

Loading multiple columns in a single request, suitable for big data transfers (improved version of load.php). Accepts parameters: • TABLE: identifier of the temporary table generated by query.php • ATTRNAME: comma separated list of names of attributes (columns of the temporary table) • COMPRESS: 1, if the result are compressed The result are: • first two lines contain the error code and reply message • next lines contain results - one value per line • each line obeys following format:ORIGINID ATTRNAME VALUE • if the value contains new-lines, it is converted to \n as wll as backslashes are converted to \\ (PHP function addcslashes($string, "\n\r\\")) • In case of error, following error-codes appear. 1. 1 Wrong parameters 2. 2 DB connection failed 3. 3 DB query failed 4. 4 Could not open configuration file Example QUERY: http://foobar?TABLE=tmp_1&ATTRNAME=origin_id,title,language DB-Adapter uses mechanism similar to the following SQL code: select origin_id, 'origin_id',origin_id from tmp_1 union select origin_id, 'title', title from tmp_1 union select origin_id, 'language', language from tmp_1 REPLY: 12345 origin_id 12345 12345 title foobar title 12345 language en 12345 language cz 23456 origin_id 23456 23456 title This title\ncontains multiple\nlines 23456 title Yet another title 23456 language pl

In the example above the multiload.php returned 2 resources, each resource contained 3 requested attributes and some of the attributes had multiple values. The 23456's title was an example of multi-line value escaped by \n. Notes: • While calling multiload.php, middleware doesn't send any ORIDIN_IDS to DB-Adapter because DBA returns full columns from temporary table. • if the requested ATTRNAME does not exist in DBA's temporary table, script returns array with empty strings for each origin_id or 'MISSING-IN-DBA' string for debugging purposes. • The script uses g-zipped HTTP response if the client supports it. In PHP this is implemented like following example

<?php
 ob_start('ob_gzhandler');
 echo 'this will be compressed transparently';
?>

SCRIPT: player.php Generates HTML code (web page with integrated media player) for playing the media from specified offset in the file/stream. Accepts parameters: • ORIGINID: the unique identification of artwork in remote database • OFFSETS: list of offsets in milliseconds delimited by comma e.g. 12,34,124 The result are: • HTML code with appropriate embed and object elements.

SCRIPT: institution.php Generates redirector URL for the artwork. If you want to redirect user's browser to another location use HTTP header "Location".

EXAMPLE PHP CODE:
<?php header('Location: http://www.google.com') ?>

Accepts parameters: • PROVENANCE: value of the attribute provenance • ORIGINID: value of the attribute origin_id The result are: • HTML page.

SCRIPT: logo.php Accepts parameters: • PROVENANCE: value of the attribute provenance • ORIGINID: value of the attribute origin_id • SIZE: possible values are: 1. small (ciant proposed 32x16) 2. medium (ciant proposed 64x32) 3. large (ciant proposed 128x64) The result are: • first line is the URL


Processing of the queries provided by the GUI interface consist of:

  • query the Daily Cache table,
  • query Content Based Indexing Extension,
  • integration of the results.

Queries are processed as it is indicated in the figure below.


[[Bild:]]

Processing of the queries

Processing of the queries starts from creation of object of the QueryManager class and execution of the query() method. Arguments passed to the query() method are as follow:

  • query,
  • array of attributes (which are compatible with the Oasis format),
  • array of factors which could define weight of the attributes.

First, column names indicated by the query modifiers are determined. Next, “free keywords” are selected – values that can occur in any column. Then the WHERE clause is formed and final sql statement (with keywords and conditions) is composed. This statement is used to query the Daily Cache table. Results retrieved from the Daily Cache table are saved to temporary table and supplemented by the statistical data:

  • weight of the result – calculated using the arrays of attributes and factors,
  • information about keywords which have been found.

The query() method returns a table of five elements:

  • error code (0 if response has been retrieved),
  • text message describing of error
  • name of the temporary table where the results are stored,
  • array of keywords,
  • additional flag

The last two elements are used to integrate the Daily Cache response with results retrieved from the Content Based Indexing Extension unit.

 

  Login