Reforming RDF

A modest proposal for reforming RDF:

Version 0.1

The current version of RDF assumes that the "network" or "set of triples" model is primary, and that the textual embedding is just a way of serializing the graphical structure. The advantage of this approach is that it allows us to use the HTML convention of being able to put an anchor anywhere and point to it from somewhere else. Each fragment of the network has a context-independent status as an assertion. The disadvantage is that we can't assert a complex expression without asserting its parts.

I have an alternative proposal, which does little violence to most uses of RDF, but does away with the ability to assert an arbitrary piece of an expression just by referring to it.

The idea is to make the textual representation the primary one, and formalize its syntax and semantics using traditional logic-based tools. In addition, we introduce the notion of a bound variable.

The syntax is essentially the same as before, with a few amendments. The main one is that we dispense with reification, and simply allow formulas to be nested. Example:

   <rdf:Or>
      <rdf:Description about="#USPres2000">
	 <winner resource="#AG"/>
      </rdf:Description>
      <rdf:Description about="#USPres2000">
	 <winner resource="#GWB"/>
      </rdf:Description>
   </rdf:Or>

which asserts that either AG or GWB wins the US presidential election of 2000.

We have expressions for binding variables, such as

   <rdf:Forall var="name">
       ... < ... ref="name">
   </rdf:Forall>

Within the scope of a variable binder such as Forall, ref="name" is like resource="#name", except that the link "#name" and the resource pointer are global, whereas var and ref set up a purely local link. (So we can reuse them without fear of ambiguity.) Example:

   <rdf:Exists var="v">
       <rdf:Type resource="http://social.org/categ-ns#Vote"/>
       <rdf:And>
	  <Vote ref="v">
	     <inElection resource="#USPres2000"/>
	     <venue resource="http://www.us.gov/states/Florida"/>
	  </Vote>
	  <rdf:Not>
             <rdf:Description ref="v">
	        <counted resource="&rdf;true"/>
	     </rdf:Description>
	  </rdf:Not>
       </rdf:And>
   </rdf:Exists>

Descriptions can play three different roles:

1. When used with ID="...", they introduce a new constant and assert properties of it (possibly hypothetically; see below).
2. When used with var="...", they introduce a new variable; furthermore, the resulting Description becomes an abstract entity. Context determines what we are saying about it. See below.
3. When used with about="..." or ref="...", they don't introduce anything. They just assert further facts about an existing constant or variable.

With no explicit ID, var, or about, the description behaves like role 1, but there's no constant to refer to.

We interpret a Description/ID at the top level as declaring that an entity exists. That ID can be used as a resource, just as now:

   <rdf:Description ID="USPres2000">
      <candidates>
         <rdf:Bag>
	    <rdf:li resource="#GWB"/>
	    <rdf:li resource="#AG"/>
	    <rdf:li resource="#RN"/>
	    <rdf:li resource="#PB"/>
	 </rdf:Bag>
      </candidates>
   </rdf:Description>

   <rdf:Description ID="GWB">
        ...
   </rdf:Description>

Here's an example of role 2: To express the fact that a research group has applied for a grant from the Office of Naval Research, one could add the following to a description of the group:

    ...<rdf:Description>
          <rdf:Type const="ont:action"/>
	  <ont:verb resource="http://verbs.org/verbs/#apply_for"/>
	  <ont:time>recently</ont:time>
	  <ont:object>
	     <rdf:Description var="g">
	        <rdf:Type resource="http://social.org/categ-ns#Grant"/>
		<agency resource="http://www.onr.mil"/>
	     </rdf:Description>
	  </ont:object>
       </rdf:Description>

The outer Description here is type 1; the inner one is type 2. The outer one asserts (relative to the context) that an event exists. The inner one describes a grant, but says nothing about it. The fact that it occurs as the object of an "apply_for" action means that it describes the thing applied for.

The expression const="ont:action" uses a new notational idea, that of a constant. This is simply a name (in the namespace sense) used to refer to something. Perhaps RDF already allows this, but if so it isn't obvious. The usual rules for resolving name prefixes apply. In the example, I'm assuming that "action" is a type defined in the ontology that governs this bit of RDF.

One cannot point to a Description anywhere but inside the expression where it appears. (If it's at the top level of a file, then the "expression" encloses everything, so the Description can be referred to from anywhere.) The reason for this restriction is that the current context may express a hypothetical state of affairs. Example:

   <rdf:CounterfactualIf>
      <rdf:Description ID="SantaClaus">
         <rdf:Assertion>
            <rdf:Forall var="cg">
               <rdf:Forall var="xmas">
		  <rdf:Type const="Event"/>
	          <rdf:Forall var="gev">
		      <rdf:Type const="Event"/>
		      <rdf:If>
			 <rdf:Description ref="gev">
		              <verb resource="http://verbs.org/verbs/#give"/>
			      <object ref="cg"/>
			      <purpose resource="http://www.social.org/rituals#Xmas"/>
			 </rdf:Description>
			 <rdf:Description ref="gev">
			      <actor resource="#SantaClaus"/>
			 </rdf:Description>
		      </rdf:If>
		  </rdf:Forall>
	       </rdf:Forall>
	    </rdf:Forall>
	 </rdf:Assertion>
      </rdf:Description>
	 ...
   </rdf:CounterfactualIf>

The original RDF had only two kinds of terms: those that could be referred to inside angle brackets, and aggregates such as bags and sequences. We generalize the notation to allow arbitrary terms. E.g.,

   <rdf:Description about="#Fred">
       <irs:total_income>
          <rdf:Term op="arith:sum">
             <rdf:Term op="rdf:apply">
	        <rdf:Term resource="http://www.irs.gov/salary"/>
		<rdf:Term resource="#Fred"/>
	     </rdf:Term>
             <rdf:Term op="rdf:apply">
	        <rdf:Term resource="http://www.irs.gov/tips"/>
		<rdf:Term resource="#Fred"/>
	     </rdf:Term>
          </rdf:Term>
       </irs:total_income>
   </rdf:Description>

which says that Fred's total income is the sum of his salary and tips, as defined on certain IRS web pages. (The IRS is the U.S. national tax collector.)

There may be a need for pointers to pieces of RDF considered as data objects, or considered as places in cyberspace. We should introduce an alternative field, perhaps 'expID,' for this purpose. It can be used anywhere, not just in a Description:

     <rdf:Bag>
        ...
        <rdf:li expID="here">
	...
	</rdf:li>
        ...
     </rdf:Bag>

A reference to resource="#here" refers to this place in the file containing the expression (analogous to HTML name/hrefs). It may even be possible to assert that the expression at this expID is true, but in case it's not a well-formed assertion this should be an error. I will ignore expID's in the rest of this proposal.

Syntax and Semantics:

Here is a more systematic presentation of the syntax and semantics of the proposed notation. The basic notation is very simple and uniform, but we introduce abbreviations to make it more concise (and therefore a bit less systematic). The unabbreviated syntax is as follows:

formula      ::= assertion | description

assertion    ::= '<rdf:Assertion' opAttr? '>' propertyElt* '</rdf:Assertion>'

description  ::= '<rdf:Description var="' IDSymbol '"' opAttr? '>'
                 typeSpec? propertyElt*
                 '</rdf:Description>'

		 | '<rdf:Description ID="' IDSymbol '"' '>'
                    typeSpec? propertyElt*
                    '</rdf:Description>'
              
                 | '<rdf:Description about="' IDSymbol '"' '>'
                    propertyElt*
                   '</rdf:Description>'

propertyElt  ::= '<' propName refAttr '/>'
                 | '<' propName  '>' term '</' propName '>'
		 | '<' propName  '>' string '</' propName '>'

term         ::= '<rdf:Term var="' IDSymbol '"' opAttr '>'
                      typeSpec? argElt*
                  '</rdf:Term>'

		 | '<rdf:Term' opAttr '>'
                      argElt*
		   '</rdf:Term>'

                 | '<rdf:Term' refAttr '/>'
		 | formula

opAttr       ::= 'op="' IDSymbol '"'

typeSpec     ::= '<rdf:Type' refAttr? '/>'
                 | '<rdf:Op>' term '</rdf:Op>'

argElt       ::= '<' argName refAttr '/>'
                 | '<' argName '>' term '</' argName '>'
                 | '<' argName '>' string '</' argName '>'

refAttr      ::= 'resource="' URI '"' 
		 | 'ref="' IDSymbol '"'
		 | 'const="' IDSymbol '"'

The semantics are defined as usual in logic, by defining a function M(c, e) that specifies the meaning of expression e in variable-binding context c. The context c is a list of pairs of the form (var = val), and we add a new pair to it by writing c + [var = val]. We find the value of a variable v by evaluating lookup(c, v). We'll also assume that namespace names have meanings that can be found by lookup; each such meaning is some set-theoretic entity, such as a set of ordered pairs. We actually have two functions: MP(c, e), which gives the meaning of a proposition e, and MT(c, e), which gives the meaning of a term e. The meaning of a complex expression ultimately depends on the meanings of its atomic parts, which are the resources and constants. The semantics ignores the optional expID's that can attach to any bracketed expression, because those have a metasyntactic use that doesn't concern us here. MP is defined by the following equations:

MP(c, '<rdf:Assertion op=' o '>'
	  propertyElt1
	  propertyElt2 
	  ...
	  propertyEltK
      '</rdf:Assertion>')

    = lookup(c, o)(MP(c, propertyElt1),
		   MP(c, propertyElt2),
		   ...,
		   MP(c, propertyEltK))

If the opAttr is absent, then the operator defaults to And; that is, lookup(c, o) is taken to be the meaning of rdf:And.
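
The context operations and the Assertion equation can be sketched in Python. This is only an illustration under stated assumptions: the names extend, lookup, mp_assertion and OPS are hypothetical, the standard operators are stored as ordinary bindings, and plain truth values stand in for the meanings of the property elements.

```python
# Minimal sketch of the context c and the Assertion equation.
# All names here (extend, lookup, mp_assertion, OPS) are illustrative
# assumptions, not part of the proposal.

def extend(c, var, val):
    """c + [var = val]: return a new context with one more binding."""
    return [(var, val)] + c

def lookup(c, var):
    """Return the innermost binding of var in context c."""
    for name, val in c:
        if name == var:
            return val
    raise KeyError(var)

# Assumed meanings for two standard operators, stored as bindings.
OPS = [
    ("rdf:And", lambda *args: all(args)),
    ("rdf:Or", lambda *args: any(args)),
]

def mp_assertion(c, op, property_meanings):
    """MP of an Assertion whose propertyElts already have meanings.

    A missing opAttr (op is None) defaults to rdf:And.
    """
    operator = lookup(c, op if op is not None else "rdf:And")
    return operator(*property_meanings)

c = extend(OPS, "x", 1)
c = extend(c, "x", 2)        # an inner binder reusing the name x
inner_x = lookup(c, "x")     # the innermost binding wins
either = mp_assertion(c, "rdf:Or", [False, True])
both = mp_assertion(c, None, [False, True])
```

Because extend puts the new pair at the front, an inner binder can reuse a variable name without disturbing the outer binding, which is the local-scope behaviour claimed for var and ref.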

MP(c, '<rdf:Description var=' v 'op=' o '>'
           typeSpec
	   propertyElt1
	   propertyElt2
	   ...
	   propertyEltK
     '</rdf:Description>')
     
   = lookup(c, o)(lambda (v : MT(c, typeSpec))
		     MP(c + [* = v], propertyElt1)
		     and MP(c + [* = v], propertyElt2)
		     and ... and MP(c + [* = v], propertyEltK))

If the typeSpec is missing, it defaults to <rdf:Type const="Something">. If the op is missing, it defaults to identity. (That is, we just refer to the lambda-expression without asserting anything about it.) A typical op is rdf:Forall; more below.
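
As a rough illustration of how the op applies to the outside of the lambda-expression, here is a Python sketch restricted to finite domains. The representation is an assumption made for the example: a type denotes a finite collection, a typed lambda is the pair (domain, function), and each property element is modelled as a function from the subject to a truth value.

```python
# Sketch of the Description-with-var equation over a finite domain.
# Assumptions (not in the proposal): a type denotes a finite Python
# collection, and a typed lambda is the pair (domain, function).

def forall(typed_lambda):
    domain, body = typed_lambda
    return all(body(v) for v in domain)

def exists(typed_lambda):
    domain, body = typed_lambda
    return any(body(v) for v in domain)

def identity(typed_lambda):
    # Default op: return the abstraction itself, asserting nothing.
    return typed_lambda

def mp_description_var(op, domain, property_elts):
    """Conjoin the property elements under a lambda, then apply op."""
    body = lambda v: all(p(v) for p in property_elts)
    return op((domain, body))

small_ints = range(1, 6)
is_positive = lambda n: n > 0
is_even = lambda n: n % 2 == 0

all_positive = mp_description_var(forall, small_ints, [is_positive])
all_even = mp_description_var(forall, small_ints, [is_even])
some_even = mp_description_var(exists, small_ints, [is_positive, is_even])
abstraction = mp_description_var(identity, small_ints, [is_even])
```

With the identity op the result is just the abstraction, which an enclosing property can then use, as in the grant example.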

MP(c, '<rdf:Description ID=' v '>'
           typeSpec
	   propertyElt1
	   propertyElt2
	   ...
	   propertyEltK
     '</rdf:Description>')
     
   =   (v is of type MT(c, typeSpec))
       and MP(c + [* = v], propertyElt1)
       and ...
       and MP(c + [* = v], propertyEltK)

We don't allow an opAttr with an "ID" Description.

MP(c, '<rdf:Description about=' URI '>'
	  propertyElt1
	  ...
	  propertyEltK
       '</rdf:Description>')

    = MP(c + [* = denotation(URI)], propertyElt1)
      and ...
      and MP(c + [* = denotation(URI)], propertyEltK)

We don't allow either typeSpec or opAttr with an "about" Description.

MP(c, '<' propName refAttr '/>')

    = lookup(c, propName)(lookup(c, *),
                          MT(c, refAttr))

MP(c, '<' propName '>' term '</' propName '>')

    = lookup(c, propName)(lookup(c, *),
			  MT(c, term))

MP(c, '<' propName '>' string '</' propName '>')

    = lookup(c, propName)(lookup(c, *),
			  string)
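
The role of the special variable * in the property-element and "about" equations can be sketched as follows. This is a simplified model under assumptions of my own: a property denotes a set of (subject, object) pairs rather than a function, URIs denote themselves, and the single recorded fact is made up for the example.

```python
# Sketch of the propertyElt and 'about' Description equations.
# Assumptions: a property denotes a set of (subject, object) pairs,
# URIs denote themselves, and the fact recorded below is invented.

def lookup(c, name):
    for key, val in c:
        if key == name:
            return val
    raise KeyError(name)

def mp_property(c, prop_name, obj):
    """lookup(c, propName) applied to lookup(c, *) and the object."""
    relation = lookup(c, prop_name)
    return (lookup(c, "*"), obj) in relation

def mp_about(c, uri, property_elts):
    """Bind * to the described resource, then conjoin the properties."""
    inner = [("*", uri)] + c
    return all(mp_property(inner, name, obj) for name, obj in property_elts)

c = [("winner", {("#USPres2000", "#GWB")})]

gwb_wins = mp_about(c, "#USPres2000", [("winner", "#GWB")])
ag_wins = mp_about(c, "#USPres2000", [("winner", "#AG")])
either_wins = gwb_wins or ag_wins    # the rdf:Or example from earlier
```

The disjunction holds even though only one disjunct does, which is what asserting a complex expression without asserting its parts amounts to.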

MT is defined by the following equations:

MT(c, a) = MP(c, a) if a is an Assertion or Description

MT(c, '<rdf:Term var=' v 'op=' o '>'
            typeSpec
	    argElt1 
            ...
            argEltK
      '</rdf:Term>')

     = (lambda (v : MT(c, typeSpec))
          MT(c, o)({<label(argElt1), MT(c, argElt1)>,
                    ...,
                    <label(argEltK), MT(c, argEltK)>}))

where label('<' argName ...'>'...'</' argName '>') = argName. We assume that the meaning of operator o is a function from sets of ordered pairs to some domain. Note that the op applies to the inside of the lambda expression in this case, as opposed to the outside for the similar Description. If this is too inconsistent, we could change it.

MT(c, '<rdf:Term op=' o '>'
	    argElt1 
            ...
            argEltK
      '</rdf:Term>')

     = MT(c, o)({<label(argElt1), MT(c, argElt1)>,
	         ...,
		 <label(argEltK), MT(c, argEltK)>})

MT(c, '<' argName refAttr '/>')

    = MT(c, refAttr)

MT(c, '<' argName '>' term '</' argName '>')

    = MT(c, term)

MT(c, '<' argName '>' string '</' argName '>')

    = string

MT(c, 'ref=' v) = lookup(c,v)

MT(c, 'resource=' URI) = denotation(URI)

MT(c, 'const=' s) = lookup(c, s) [which searches the appropriate namespace]
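
The Term equations can be tried out on the income example with a small Python sketch. Everything specific here is an assumption for illustration: the labels "fn" and "arg" for rdf:apply, the sample figures, and the use of a list of pairs instead of a set, chosen so that two equal addends under the same label are both kept.

```python
# Sketch of the MT equation for '<rdf:Term op=...>'.
# Assumptions: the labels "fn" and "arg", the sample figures, and the
# use of a list of pairs instead of a set are all illustrative.

def arith_sum(pairs):
    """Assumed meaning of arith:sum: add up the argument values."""
    return sum(value for _label, value in pairs)

def rdf_apply(pairs):
    """Assumed meaning of rdf:apply: apply the 'fn' argument to 'arg'."""
    args = dict(pairs)
    return args["fn"](args["arg"])

def mt_term(op, labelled_args):
    """Apply the operator's meaning to the (label, meaning) pairs."""
    return op([(label, meaning) for label, meaning in labelled_args])

# Hypothetical denotations for the salary and tips resources.
salary = {"Fred": 30000}.get
tips = {"Fred": 2500}.get

total_income = mt_term(arith_sum, [
    ("rdf:Arg", mt_term(rdf_apply, [("fn", salary), ("arg", "Fred")])),
    ("rdf:Arg", mt_term(rdf_apply, [("fn", tips), ("arg", "Fred")])),
])
same_twice = mt_term(arith_sum, [("rdf:Arg", 5), ("rdf:Arg", 5)])
```

The last line shows why the sketch departs from a literal set of ordered pairs: as a set, the two identical pairs would collapse into one.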

Abbreviations:

We now introduce the abbreviation conventions:

If an Assertion uses a standard operator o (rdf:And, rdf:Or, etc.), you can just write <o> ... </o> instead of <rdf:Assertion op="o"> ... </rdf:Assertion>.

Similarly for operators in Descriptions and Terms. So you can write

    <rdf:Forall var="x">
        <rdf:Type const="rdf:Integer"/>
        ...
    </rdf:Forall>

instead of

    <rdf:Description var="x" op="rdf:Forall">
        <rdf:Type const="rdf:Integer"/>
        ...
    </rdf:Description>

And you can say <rdf:Bag> ... </rdf:Bag> instead of <rdf:Term op="rdf:Bag">...</rdf:Term>

We allow types to be arbitrary terms, but often they are constants or variables. We allow them to be moved inside the header element in that case. So the formula above can be made even shorter:
```
    <rdf:Forall var="x" type="rdf:Integer">
        ...
    </rdf:Forall>
```

If all the arguments of a Term or Assertion have the label rdf:Arg, then the labels can be omitted. So

   <rdf:Assertion op="rdf:And">
      <rdf:Arg>
         <rdf:Description about="http://www.flowers.org/roses">
	     <flower:color const="color:red"/>
	 </rdf:Description>
      </rdf:Arg>
      <rdf:Arg>
         <rdf:Description about="http://www.flowers.org/violets">
	     <flower:color const="color:blue"/>
	 </rdf:Description>
      </rdf:Arg>
   </rdf:Assertion>

can be abbreviated

   <rdf:And>
      <rdf:Description about="http://www.flowers.org/roses">
	  <flower:color const="color:red"/>
      </rdf:Description>
      <rdf:Description about="http://www.flowers.org/violets">
	  <flower:color const="color:blue"/>
      </rdf:Description>
   </rdf:And>
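
The abbreviation can be undone mechanically. The following Python sketch uses the standard xml.etree.ElementTree module to expand a top-level operator element back into the unabbreviated form. The namespace URIs and the list of standard operators are assumptions, since the proposal does not fix them, and the function does not descend into Descriptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the proposal does not fix one.
RDF = "http://example.org/reformed-rdf#"
STANDARD_OPS = {"And", "Or", "Not"}   # assumed list of standard operators

def expand(elem):
    """Rewrite <rdf:Op>...</rdf:Op> as <rdf:Assertion op="rdf:Op">,
    wrapping each child in <rdf:Arg>. Other elements are returned
    unchanged; operators nested inside Descriptions are not reached."""
    prefix = "{" + RDF + "}"
    local = elem.tag[len(prefix):] if elem.tag.startswith(prefix) else None
    if local not in STANDARD_OPS:
        return elem
    assertion = ET.Element(prefix + "Assertion", {"op": "rdf:" + local})
    for child in elem:
        arg = ET.SubElement(assertion, prefix + "Arg")
        arg.append(expand(child))
    return assertion

short = ET.fromstring(
    '<rdf:And xmlns:rdf="' + RDF + '"'
    ' xmlns:flower="http://example.org/flower#">'
    '<rdf:Description about="http://www.flowers.org/roses">'
    '<flower:color const="color:red"/></rdf:Description>'
    '<rdf:Description about="http://www.flowers.org/violets">'
    '<flower:color const="color:blue"/></rdf:Description>'
    '</rdf:And>'
)
full = expand(short)
```

A processor could run such a pass first and then apply the semantic equations to the unabbreviated tree only.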