Principle of Operation of the DBA
SQL statements generated by DB-Adapters queries the Daily Cache table.
Building of the Daily Cache table is realised by the Cron demon [18] and consists of three stages:
- gathering metadata,
- grouping metadata according to objects they describe,
- building the final shape of the Daily Cache table.
Metadata describe electronic art objects – artworks, provided by participating institutions. Each artwork (pictures, videos, etc.) has its unique identifier - origin_id. Artworks can have however multiple representations (multiple language versions for instance), called manifestations. Each manifestation is described by also unique oml_id. Metadata gathered during the first stage, stored in the temporary tables of the DBA-database, are then grouped according to the origin_id and oml_id fields separately. The third stage consists in generating the Daily Cache table as concatenation of the temporary tables. Temporary tables simplify accelerate and minimise failure of the Daily Cache building process. They give possibility to process metadata on the DB-Adapter side. Connection to the database server of partner is held only during the first stage.
Process of the Daily Cache generation is illustrated in the figure below.
[[Bild:]]
BU: Generation of the Daily Cache
Process of the Daily Cache generation can be supported by additional algorithms specified in the DB-Adapter config file. If sequence of Daily Cache generation is like it was indicated above (by createCacheTable() method for instance), then each stage can be realised by the separated task.
DB-Adapter – query processing
Searching for the electronic arts is supported by Content Based Indexing Extension module, which is designated to collect the additional information describing the database objects. The additional information is gathered using complex indexing techniques for electronic documents.
The Interface
The interface between Middleware and DB-Adapter consists of several PHP scripts. All scripts are places in the same path - the URL of DB-Adapter. Middleare stores this URL in it's database as "base_url" for each DB-Adapter. The scripts are listed below.
SCRIPT: query.php
Represents the interface for searching. DBA uses it's own cache
- 0 no problem with copyright
- 1 cache refresh per day (for example...)
Queries are case insensitive. The plus ("+") sign substitutes spaces. There is no WHERE clause - searching is executed in all fields unless specified in the Google like style: author:Vasulka. The list of WHERE-like query modifiers is defined elsewhere. The NOT modifier is implemented as the proceeding minus ("-") sign. It is also planned to implement OR modifier later on.
Query example:
some fulltext search mediatype:Audio mediatype:Video
There is constructed a temporary table in the DBA database for every query. Some columns are just an extract from the daily generated table, the rest of columns is created on the fly.
Accepted parameters:
• QUERY: the google-like query string
• USERNAME: the username used by authorization module of the DB-Adapter (this is optional)
• PASSWORD: the password used by authorization module of the DB-Adapter (this is optional)
• LANGUAGE: DB-Adapter could use this value to generate language-specific values (this is optional)
• NOCBMI: if the value is "1", values from indexing module will be excluded from result. (this is optional)
NOTE: NOCBMI was implemented as oasis attribute "nocbmi".
The result are:
• First line is the return code
• Second line is the textual description
• Third line is the identifier of created temporary table
Example:
0
Temporary table is ready
tmp_table_123
Return codes:
• 0: OK
• 1: DB server down
• 2: General error
• >2: Some other error
There is a check-ip functionality, that allows for access only from hosts specified in the configuration file. Authentication is assured by a list of known IP address.
SCRIPT: auth.php
Each search request could be extended with USERNAME and PASSWORD parameters. This allows DB-Adapter to alter results according to user privileges used in particular database. If the privileges are not supported by the database, DB-Adapter simply ignores the USERNAME/PASSWORD pair.
Accepts parameters:
• USERNAME
• PASSWORD
The result are:
• 0: access granted
• 1: access denied
• >1: Some other error
SCRIPT: getids.php
Accepts parameters:
• TABLE: the temporary table created by DB-Adapter
The result are:
• first two lines contain the error code and reply message
• next lines contain list of all origin_ids in the requested temporary table - one origin_id per line
• It should be noted that each line must be terminated by new-line character especially the last line.
• In case of error, following error-codes appear:
1. 1 Table name not specified
2. 2 DB connection failed
3. 3 DB query failed
4. 4 Could not open configuration file
Example:
0
Result is ready
123123
234533
124242
568659
346347
567567
SCRIPT: cleanup.php
Drops the temporary table created by DB-Adapter.
Accepts parameters:
• TABLE: identifier of the temporary table generated by query.php
The result are:
• 0: Temporary table deleted successfully
• 1: Temporary table does not exist.
• 2: Could not delete temporary table.
• 3: Table name not specified
• 4: DB connection failed
• 5: Could not open config file
• >5: Some other error
SCRIPT: load.php (substituted by detail.php)
Loading of values of attributes on demand.
Accepts parameters:
• TABLE: identifier of the temporary table generated by query.php
• OFFSET: offset in the result-set (optional, default 0)
• SIZE: size of the result-set (optional, default infinite)
• ATTRNAME: column in the temporary table
• ID[]: list of origin_ids
The result:
• The result are Serialized PHP array with key as origin_id and value as value of the requested attribute.
• In case of error, the result are serialized PHP string with error code and message.
1. 1 Wrong parameters
2. 2 DB connection failed
3. 3 DB query failed
4. 4 Could not open configuration file
serialize("$errorCode $errorMessage");
Notes:
• if the requested ATTRNAME does not exist in DBA's temporary table, script will return array with empty strings for each origin_id or 'MISSING-IN-DBA' string for debugging purposes.
serialize(array('12345'=>)) or serialize(array('12345'=>'MISSING-IN-DBA'))
• in case of multiple values, the multi- array is returned
serialize(array('123'=>array('ISBN','ISSN','UMID')))
• The script can use g-zipped HTTP response if the client supports it. In PHP this is implemented like following example
<?php
ob_start('ob_gzhandler');
echo 'this will be compressed transparently';
?>
SCRIPT: detail.php (after Marseille)
Loading values from all attributes of a given resource (improved version of load.php optimized for detail-view).
Accepts parameters:
• TABLE (optional): identifier of the temporary table generated by query.php. If the parameter is not specified, data will be taken directly from daily cache.
• ORIGINID: unique identifier of the resource
The result are:
• first two lines contain the error code and reply message
• next lines contain results - one value per line
• each line obeys following format:ATTRNAME VALUE
• if the value contains new-lines, are converted to \n or \r as well as backslashes are converted to \\ (PHP function addcslashes($string, "\n\r\\"))
• In case of error, following error-codes appear.
1. 1 Wrong parameters
2. 2 DB connection failed
3. 3 DB query failed
4. 4 Could not open configuration file
Example
QUERY:
http://foobar?ORIGINID=9675
REPLY:
origin_id 12345
title foobar title
language en
language cz
origin_id 23456
title This title\ncontains multiple\nlines
title Yet another title
language pl
SCRIPT: multiload.php
Loading multiple columns in a single request, suitable for big data transfers (improved version of load.php).
Accepts parameters:
• TABLE: identifier of the temporary table generated by query.php
• ATTRNAME: comma separated list of names of attributes (columns of the temporary table)
• COMPRESS: 1, if the result are compressed
The result are:
• first two lines contain the error code and reply message
• next lines contain results - one value per line
• each line obeys following format:ORIGINID ATTRNAME VALUE
• if the value contains new-lines, it is converted to \n as wll as backslashes are converted to \\ (PHP function addcslashes($string, "\n\r\\"))
• In case of error, following error-codes appear.
1. 1 Wrong parameters
2. 2 DB connection failed
3. 3 DB query failed
4. 4 Could not open configuration file
Example
QUERY:
http://foobar?TABLE=tmp_1&ATTRNAME=origin_id,title,language
DB-Adapter uses mechanism similar to the following SQL code:
select origin_id, 'origin_id',origin_id from tmp_1
union select origin_id, 'title', title from tmp_1
union select origin_id, 'language', language from tmp_1
REPLY:
12345 origin_id 12345
12345 title foobar title
12345 language en
12345 language cz
23456 origin_id 23456
23456 title This title\ncontains multiple\nlines
23456 title Yet another title
23456 language pl
In the example above the multiload.php returned 2 resources, each resource contained 3 requested attributes and some of the attributes had multiple values. The 23456's title was an example of multi-line value escaped by \n.
Notes:
• While calling multiload.php, middleware doesn't send any ORIDIN_IDS to DB-Adapter because DBA returns full columns from temporary table.
• if the requested ATTRNAME does not exist in DBA's temporary table, script returns array with empty strings for each origin_id or 'MISSING-IN-DBA' string for debugging purposes.
• The script uses g-zipped HTTP response if the client supports it. In PHP this is implemented like following example
<?php
ob_start('ob_gzhandler');
echo 'this will be compressed transparently';
?>
SCRIPT: player.php
Generates HTML code (web page with integrated media player) for playing the media from specified offset in the file/stream.
Accepts parameters:
• ORIGINID: the unique identification of artwork in remote database
• OFFSETS: list of offsets in milliseconds delimited by comma e.g. 12,34,124
The result are:
• HTML code with appropriate embed and object elements.
SCRIPT: institution.php
Generates redirector URL for the artwork. If you want to redirect user's browser to another location use HTTP header "Location".
EXAMPLE PHP CODE:
<?php header('Location: http://www.google.com') ?>
Accepts parameters:
• PROVENANCE: value of the attribute provenance
• ORIGINID: value of the attribute origin_id
The result are:
• HTML page.
SCRIPT: logo.php
Accepts parameters:
• PROVENANCE: value of the attribute provenance
• ORIGINID: value of the attribute origin_id
• SIZE: possible values are:
1. small (ciant proposed 32x16)
2. medium (ciant proposed 64x32)
3. large (ciant proposed 128x64)
The result are:
• first line is the URL
Processing of the queries provided by the GUI interface consist of:
- query the Daily Cache table,
- query Content Based Indexing Extension,
- integration of the results.
Queries are processed as it is indicated in the figure below.
[[Bild:]]
Processing of the queries
Processing of the queries starts from creation of object of the QueryManager class and execution of the query() method. Arguments passed to the query() method are as follow:
- query,
- array of attributes (which are compatible with the Oasis format),
- array of factors which could define weight of the attributes.
First, column names indicated by the query modifiers are determined. Next, “free keywords” are selected – values that can occur in any column. Then the WHERE clause is formed and final sql statement (with keywords and conditions) is composed. This statement is used to query the Daily Cache table.
Results retrieved from the Daily Cache table are saved to temporary table and supplemented by the statistical data:
- weight of the result – calculated using the arrays of attributes and factors,
- information about keywords which have been found.
The query() method returns a table of five elements:
- error code (0 if response has been retrieved),
- text message describing of error
- name of the temporary table where the results are stored,
- array of keywords,
- additional flag
The last two elements are used to integrate the Daily Cache response with results retrieved from the Content Based Indexing Extension unit.
|