The (New) Design of the SymbolicData Data Collection

In recent years, ontology-based semantic representations of data have gained more and more importance. This is reflected in the current transformation project to bring the SymbolicData data collection into such a knowledge base format. The new format takes into account the Web Ontology concepts as proposed in the Recommendation for the OWL Web Ontology Language. We assume the reader is familiar with those concepts.

Overall Design

All SymbolicData examples within the SymbolicData Data Collection are uniquely addressed by their URI (Uniform Resource Identifier), which appears as the subject of an RDF sentence. The RDF tree starting at that node, the OWL-Resource Record for that example, contains most of the information about that example. Typically, the description of an example looks as follows:

     <http://symbolicdata.org/Data/Ideal/Bronstein-86>
       sd:createdAt "1999-03-26" ;
       sd:createdBy sdp:Graebe_HansGert ;
       sd:hasDegreeList "2,2,3" ;
       sd:hasDimension "1" ;
       sd:hasLengthsList "3,4,5" ;
       sd:hasVariables "x,y,z,r" ;
       sd:relatedPolynomialSystem <http://symbolicdata.org/Data/IntPS/Bronstein-86> ;
         a sd:Ideal .

where under the shortcode URI Ideal/Bronstein-86 we describe an ideal (type sd:Ideal, namespace prefix symbolicdata.org/Data/Ideal/) generated by a list of integer polynomials

   <http://symbolicdata.org/Data/IntPS/Bronstein-86> 
     sd:createdAt "1999-03-26" ;
     sd:createdBy sdp:Graebe_HansGert ;
     sd:relatedXMLResource <http://symbolicdata.org/XMLResources/IntPS/Bronstein-86.xml> ;
       a sd:IntegerPolynomialSystem .

given in the XML-Resource file under the predicate sd:relatedXMLResource (sd: is the shortcode for the namespace symbolicdata.org/Data/Model#). Some more predicate values are given; most of them are Literals (i.e., plain text in quotes), others are OWL-Resource URIs such as sdp:Graebe_HansGert, which points to a Person record in the project’s People Knowledge Base People.ttl. Please respect the naming conventions for URIs within the SymbolicData Project.
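The conventions used in the record above can be sketched in a few lines of Python. This is only an illustration, not part of the SymbolicData tools: the function names are hypothetical, the `http://` scheme in front of the sd: namespace is an assumption (the text gives the namespace without a scheme), and the sdp: prefix is left unexpanded because its namespace is not stated here.

```python
# Namespaces taken from the example record; the "http://" scheme on the
# model namespace is an assumption, the text only gives "symbolicdata.org/Data/Model#".
SD_DATA = "http://symbolicdata.org/Data/"
SD_MODEL = "http://symbolicdata.org/Data/Model#"

# Literal values copied from the Ideal/Bronstein-86 record shown above.
record = {
    "sd:hasDegreeList": "2,2,3",
    "sd:hasDimension": "1",
    "sd:hasLengthsList": "3,4,5",
    "sd:hasVariables": "x,y,z,r",
}


def expand_shortcode(shortcode):
    """Turn a shortcode URI such as 'Ideal/Bronstein-86' into the full URI."""
    return SD_DATA + shortcode


def expand_predicate(qname):
    """Expand an 'sd:' prefixed predicate name to a full URI (hypothetical helper)."""
    prefix, _, local = qname.partition(":")
    if prefix != "sd" or not local:
        raise ValueError(f"only sd: names are handled in this sketch: {qname!r}")
    return SD_MODEL + local


def int_list(literal):
    """Split a comma-separated literal such as '2,2,3' into integers."""
    return [int(item) for item in literal.split(",")]


def name_list(literal):
    """Split a comma-separated literal such as 'x,y,z,r' into names."""
    return literal.split(",")


if __name__ == "__main__":
    print(expand_shortcode("Ideal/Bronstein-86"))
    print(int_list(record["sd:hasDegreeList"]))
    print(name_list(record["sd:hasVariables"]))
```

Since the list-valued predicates are stored as plain Literals, any consumer has to split them itself; a full RDF library would only return the quoted string.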

This already describes the main concepts of the Data Collection:

Since we use agile concepts for the collection of Data, the Ontologies of the Knowledge Bases are described only informally in this wiki; see the Ontology Pages for the different Knowledge Bases.

The Scripts directory in the Repository contains best practice examples, contributed by SymbolicData users, of how to use the Data.

XML-Resources

XML-Resources are the smallest indivisible units of structured information handled by SymbolicData, e.g., systems of polynomials or geometry proof schemes. Each XML-Resource is stored in a single file within a subdirectory of the web readable http://www.symbolicdata.org/XMLResources and addressed by that web link as URI.

According to their structure, XML-Resources belong to different types. XML-Resources of a given type are collected as an XML-Resource Bundle in a named subdirectory of the web readable http://www.symbolicdata.org/XMLResources, e.g., INTPS. This name is also the name of the type of those records. The structure of XML-Resource records of a given type is defined by an XSchema; e.g., records of type INTPS must validate against the XSchema PolynomialSystems.xsd.
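The addressing scheme (one file per XML-Resource, grouped by type in a subdirectory) can be illustrated with a small Python sketch. The helper names are hypothetical, and the `.xml` suffix and base address are taken from the sd:relatedXMLResource link in the Bronstein-86 example rather than from a formal specification.

```python
import posixpath
from urllib.parse import urlparse

# Base address as it appears in the sd:relatedXMLResource link of the example record.
XML_BASE = "http://symbolicdata.org/XMLResources"


def xml_resource_uri(bundle, name, base=XML_BASE):
    """Build the URI of an XML-Resource from its bundle (type) and its name."""
    return f"{base.rstrip('/')}/{bundle}/{name}.xml"


def split_xml_resource_uri(uri):
    """Recover (bundle, name) from an XML-Resource URI."""
    path = urlparse(uri).path  # e.g. "/XMLResources/IntPS/Bronstein-86.xml"
    bundle = posixpath.basename(posixpath.dirname(path))
    name, ext = posixpath.splitext(posixpath.basename(path))
    if ext != ".xml" or not bundle or not name:
        raise ValueError(f"not an XML-Resource URI: {uri!r}")
    return bundle, name


if __name__ == "__main__":
    uri = xml_resource_uri("IntPS", "Bronstein-86")
    print(uri)
    print(split_xml_resource_uri(uri))
```

Validating a downloaded file against its XSchema would need an XML Schema aware library and is left out of this sketch.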

Each XSchema includes two generic XSchema:

We encourage storing small and medium sized items in the SymbolicData Repository. The XML-Resource concept also allows for decentralized storage, provided the items have permalinks on the web.

The following types of XMLResources are available:

PolynomialSystem

Systems of polynomials in expanded distributive form with integer coefficients and variable names matching the regexp [a-zA-Z][a-zA-Z0-9]* (type INTPS) or

Systems of polynomials in expanded distributive form with coefficients from a given base domain (mainly finite fields) and variable names matching the regexp [a-zA-Z][a-zA-Z0-9]* (type ModPS)

GeoProofScheme

Proof schemes from mechanized geometry theorem proving. The description uses a denested syntax based on functions specified in GeoCode.xml (to be fixed).
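The variable name restriction stated for both polynomial system types can be checked with the regular expression exactly as given in the text. The following Python sketch uses hypothetical function names; it is meant as an illustration of the rule, not as the project's validation code.

```python
import re

# Pattern for variable names as stated for the INTPS and ModPS types.
VARIABLE_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def is_valid_variable(name):
    """True if the whole name matches the variable name pattern."""
    return VARIABLE_NAME.fullmatch(name) is not None


def invalid_variables(names):
    """Return the names that violate the pattern, in their original order."""
    return [name for name in names if not is_valid_variable(name)]


if __name__ == "__main__":
    # The variable list of the Bronstein-86 example passes the check.
    print(invalid_variables("x,y,z,r".split(",")))
    # Underscores and leading digits are not allowed.
    print(invalid_variables(["x1", "x_1", "1x"]))
```

Using `fullmatch` rather than `match` matters here: `match` would accept `x_1` because its prefix `x` already satisfies the pattern.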

OWL-Resources

OWL-Resources store information describing the XML-Resources and also relational information about them according to OWL design principles. This resolves a main disadvantage of the old SymbolicData format and allows for a flexibly extendable design of relations.

This part has to be rewritten completely.

OWL-Resources Intermediate (deprecated)

This ontology is translated (by hand) into a couple of XSchema descriptions, one for each OWL class. Each record (an individual in OWL terminology) has a (human readable) identifier ‘id’ matching the regexp [a-zA-Z][a-zA-Z0-9_.-]*. Individuals are identified by their class and id. A typical description of a reference to a local OWLResource has the form

     <OWL xref="ZeroDim.example_7" class="INTPSAnnotation"/>

where ‘xref’ gives the ‘id’ of the referred individual and ‘class’ the name of an XSchema that describes its structure.

An OWL individual is stored in an XML file sdowl:class/id.xml with root element ‘class’ and mandatory root attributes ‘id’ (for the identifier), ‘createdBy’ (for the nickname of the creator within the SymbolicData team) and ‘createdAt’ (for the creation time).
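How such a reference maps to a file can be sketched in Python with the standard library XML parser. The function names are hypothetical. The sketch assumes that the id pattern is meant to allow identifiers of arbitrary length (a trailing `*`), since the example id ZeroDim.example_7 would not match otherwise, and it returns the path relative to whatever location the sdowl: prefix stands for.

```python
import re
import xml.etree.ElementTree as ET

# Assumed id pattern: a letter followed by any number of letters, digits, '_', '.', '-'.
ID_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_.-]*")

# Mandatory root attributes of an individual, as listed in the text.
MANDATORY_ATTRIBUTES = ("id", "createdBy", "createdAt")


def resolve_owl_reference(xml_text):
    """Parse an <OWL xref=... class=.../> reference.

    Returns (class name, id, path of the individual's file relative to sdowl:).
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != "OWL":
        raise ValueError(f"expected an OWL element, got {elem.tag!r}")
    xref = elem.get("xref")
    class_name = elem.get("class")
    if xref is None or class_name is None:
        raise ValueError("reference needs both 'xref' and 'class' attributes")
    if ID_PATTERN.fullmatch(xref) is None:
        raise ValueError(f"malformed id: {xref!r}")
    return class_name, xref, f"{class_name}/{xref}.xml"


def missing_root_attributes(xml_text):
    """List the mandatory root attributes that an individual's file lacks."""
    root = ET.fromstring(xml_text)
    return [attr for attr in MANDATORY_ATTRIBUTES if attr not in root.attrib]


if __name__ == "__main__":
    print(resolve_owl_reference('<OWL xref="ZeroDim.example_7" class="INTPSAnnotation"/>'))
    # A hypothetical, incomplete individual: createdBy and createdAt are missing.
    print(missing_root_attributes('<INTPSAnnotation id="ZeroDim.example_7"/>'))
```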

The following types of OWLResources are available (for a detailed description see Ontology.owl or the corresponding XSchema):

Annotation

The individuals contain annotation information, i.e., a text field ‘note’ together with links to related OWL or XML resources.

Contributor

The individuals contain information about the contribution and status of cooperation of a person within SymbolicData. Each contributor has a nickname that is used to attribute contributions to that person.

Ideal

A set of polynomials can define ideals in different rings if some of the variables are regarded as parameters. Different individuals associated to the same INTPSAnnotation individual contain information about different such settings and collect invariants about the ideal described by such a setting.

INTPSAnnotation

The individuals contain information about an associated XMLResource of type ‘IntegerPolynomialSystem’ that is related only to the polynomials as they are. It is used, e.g., to identify equal examples with different variable names.

Person

A hook to the GB-Publications Project (A. Zapletal). The individuals contain more detailed information about a person. Persons can be involved in different roles (e.g., contributor of data, author of a paper).

In the near future these data will be stored in full compliance with the OWL standard.

Tools and Scripts

In the first version, records of the database were stored internally as a special Perl data type ‘Record’ based on hashes of strings and manipulated by Perl tools that were developed within the SymbolicData project.

This approach has been completely superseded by the generic semantic web tools that are widely available nowadays. Hence we did not migrate these tools but started to collect best practice examples of tools from users. Other users can learn from these scripts and adapt them for their own purposes.

A first distinction is made by language (for the moment Perl and Python), and a second distinction is made between Service scripts (i.e., scripts manipulating the Data within SymbolicData) and Production scripts (i.e., scripts producing Data output for CAS computations). At the moment there is a bundle of Perl scripts that use the XML::DOM Perl module, mainly for service, and a Python script for production purposes.

We encourage all users to supply their own scripts.

The string management facilities of Perl are well suited for creating output in various formats. Forthcoming versions will also use XSLT-based technology.

For the evaluation of semantic aspects of records, SymbolicData has to cooperate with software capable of symbolic manipulation. In the first version we used Singular and MuPAD for such purposes. With more experience, an interface will be specified so that other CAS can also be used as the underlying Computer Algebra Engine in the future. (Work scheduled for the long run.)

Computations

To set up a trusted computation the user has to extract the digital data from the primary database, prepare them for input to the specified Computer Algebra Software, create the corresponding input file, start and monitor the computation, and evaluate the output file. This requires a great deal of flexibility, so SymbolicData provides only scripts from its users as best practice examples in the Compute section of the Scripts directory.
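The "prepare and create the input file" steps might look as follows for Singular, one of the CAS mentioned above. This is a minimal sketch and not one of the scripts from the Scripts directory: the function name, the choice of characteristic 0 with the degree reverse lexicographic ordering `dp`, and the sample polynomials are all assumptions made for illustration. Starting, monitoring and evaluating the computation are left out.

```python
def singular_input(variables, polynomials, ring_name="R", ideal_name="I"):
    """Create the text of a Singular input file (hypothetical helper).

    variables   -- list of variable names, e.g. taken from sd:hasVariables
    polynomials -- list of polynomials as strings in Singular syntax
    """
    lines = [
        # Polynomial ring over the rationals with ordering dp (an assumed default).
        f"ring {ring_name} = 0,({','.join(variables)}),dp;",
        f"ideal {ideal_name} = {', '.join(polynomials)};",
        # Compute a standard (Groebner) basis, then print it and its dimension.
        f"ideal G = std({ideal_name});",
        "G;",
        "dim(G);",
        "quit;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any SymbolicData record.
    text = singular_input(["x", "y", "z"], ["x^2+y^2-1", "x*y-z"])
    with open("example.sing", "w", encoding="utf-8") as handle:
        handle.write(text)
    print(text)
```

The dimension printed by such a run could then be compared with the value stored under a predicate like sd:hasDimension, which is one way to cross-check a record against an actual computation.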