Extending N3

A modest proposal for extending N3:

Version 0.2

Disclaimer

This document is really the joint work of Jonathan Borden, Pat Hayes, and me, Drew McDermott. But we are still arguing about many of the details, so I can't hold them responsible for the whole thing. The only reason I am giving this out now is the upcoming DAML meeting, where it may provide a framework for discussion. Contact me with questions, or post to www-rdf-logic@w3.org.

Introduction

It has become clear recently that RDF needs some extensions, but there are severe disagreements about exactly what that means, and exactly how "reification" fits in (or exactly what reification is).

However, the following consensus seems to be emerging:

To the extent possible, old RDF files should still be analyzable as sets of triples, each asserting the truth of an arc in the underlying RDF graph (i.e., a binary predication). Old RDF parsers should be compatible with new ones, in the sense that on an old-style file they will produce the same sort of graph structure internally.
In some circumstances, triples need to be augmented with at least a context marker indicating the circumstances under which they are asserted. (For example, Tim Berners-Lee's N3 processor uses "quads" internally.) The reason is to minimize the difference between a triple being asserted and a triple being mentioned, and in particular to avoid translating the latter into a description of the syntax of the triple.
There needs to be a clear separation between the logical syntax of the language, its representation in computers, and its representation in XML files.

What we are proposing here is a set of extensions to N3 that allow us to make progress on these issues without alienating too many people. We don't view the proposed extension as a "logical analysis" of RDF, or an "abstract syntax" for a logical language to be based on RDF. Instead, we view it as a bridge between a logical notation on the one hand, and the XML representation on the other. As will become clear, it is obvious how to translate any given logical syntax into this bridge notation, and likewise obvious how to translate the notation into XML. We need a name for this notation, and there's no obvious choice, so we will call it CTL (for "colored triples language").

The basic idea is to treat an RDF graph as a set of "named and scoped colored triples." Having triples be named allows us to talk about them; current RDF, for some reason, lets one refer only to nodes.

Scoping allows us to label a triple with a scope name, which spells out the "contexts" where it's true. Contexts are those things delimited by curly braces in N3. However, rather than use the word "context," we prefer to say "bundle," because the word "context" has too many other uses.

Finally, color allows us to have different kinds of triples, which play different syntactic and semantic roles. There are four colors: proposition, term, fragment, and bundle. We indicate color by playing with brackets. Propositions are indicated by ordinary parentheses: (loves john mary). Terms are indicated by putting plus signs inside the parens, as in (+pair john mary+). Fragments, which are used to extend the syntax in various ways, are indicated by putting periods inside the brackets, as in (.bvars x y.). Bundles are indicated with curly braces: {...} .

It may sound as if we are being disingenuous about the word "triple." We've added three new fields, so we may be accused of replacing triples with sextuples. But name and scope are not actually part of a triple. Two triples are considered equal if their colors and components are equal, and by "components" we just mean the subject, predicate, and object. The name and scope make statements about the triple. Color is a genuine addition to the language, but when omitted it defaults to "proposition," the only possibility in RDF 1.0.

Syntax of CTL

The syntax of CTL comes in the usual two flavors, unabbreviated and abbreviated. The unabbreviated syntax is very simple. A belief set is a set of expressions, each of which is either a colored triple or a bundle. (You may picture a belief set as a web page, or some other container of a bunch of statements that are all on the same footing.)

A colored triple has the format

[n=] (* p s b *) [_u]

where the brackets indicate optional material, in this case the name and scope. (Later we'll introduce square brackets in the N3 sense.) The name n, when present, is a name of some kind, which is defined to refer to this triple. (We've been treating these as simple identifiers, because we don't understand the full metaphysics of XML names.) The brackets (*...*) are colored brackets as explained above. The fields p, s, and b are more or less as in N3: s must be a resource, b must be a resource or literal, and p must be a name. However, we also allow s and b to be names of colored triples or bundles in this belief set. The scope u, when present, must be the name of a bundle in this belief set.

The other kind of element in a belief set is a bundle, indicated thus:

[n=] { n1 n2 ... }[_u]

where n1, n2, etc. are the names of colored triples or bundles in this belief set, and u is the name of a bundle.

If the scope marker on a triple or bundle is absent, the scope is the entire belief set. The belief set is treated as a bundle of all its elements, but we never make the braces around it explicit. We call this the global bundle in what follows. Triples whose scope is the global bundle are said to have global scope. The reserved scope marker global can be used to make global scope explicit.
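
To make the unabbreviated syntax concrete, here is a minimal sketch of colored triples and bundles as Python dataclasses. The names Triple, Bundle, and GLOBAL are ours, invented for this illustration, and are not part of CTL. The sketch shows only two things from the text: name and scope stay out of the equality test, and a missing scope marker means global scope.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

GLOBAL = "global"  # the reserved scope marker for the global bundle

@dataclass(frozen=True)
class Triple:
    color: str   # "proposition", "term", or "fragment"
    p: str       # predicate or function name
    s: str       # subject: a resource, or the name of a triple or bundle
    b: str       # object: a resource, a literal, or a name
    # Name and scope say something about the triple but are not part of it,
    # so they are excluded from equality and hashing.
    name: Optional[str] = field(default=None, compare=False)
    scope: str = field(default=GLOBAL, compare=False)

@dataclass(frozen=True)
class Bundle:
    members: Tuple[str, ...]   # names of triples or bundles
    name: Optional[str] = field(default=None, compare=False)
    scope: str = field(default=GLOBAL, compare=False)

named = Triple("proposition", "loves", "john", "mary", name="b1", scope="b0")
bare = Triple("proposition", "loves", "john", "mary")
term = Triple("term", "loves", "john", "mary")

print(named == bare)   # same color and components
print(named == term)   # different color
print(bare.scope)      # scope marker absent, so global
```

Representing every field as a string is a simplification; a real implementation would presumably distinguish resources, literals, and names.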

The unabbreviated syntax is pretty clumsy, and we will present a set of abbreviations shortly. But first we should explain what the proposed notation means, at least in simple cases. First, there are certain obvious restrictions on belief sets. No name can be defined more than once. There must not be any cycles in the name-containment graph. That is, there must not be a sequence of triples or bundles such that each contains the name of the one before it, and the first element of the sequence is the same as the last. If x is a triple or bundle whose name appears inside a bundle u, then x must have scope u. (Although we now have u mentioning x and x mentioning u, this does not count as a cycle, because scope markers don't count as part of the thing they hang off of.)
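
The three restrictions just listed can be checked mechanically. The following is a rough sketch, assuming a belief set is handed over as a list of (name, kind, parts, scope) tuples; that representation and the function name check_belief_set are assumptions made for this example. It does not check that every scope marker actually names a bundle.

```python
def check_belief_set(entries):
    """Return a list of problems found in an unabbreviated belief set.

    entries is a list of (name, kind, parts, scope) tuples, where kind is
    "triple" or "bundle" and parts holds (p, s, b) or the member names.
    """
    problems = []
    defined = {}
    for name, kind, parts, scope in entries:
        if name in defined:
            problems.append(f"{name} is defined more than once")
        defined[name] = (kind, parts, scope)

    # Names mentioned inside each element; for a triple only s and b count.
    refs = {}
    for name, (kind, parts, scope) in defined.items():
        inside = parts if kind == "bundle" else parts[1:]
        refs[name] = [x for x in inside if x in defined]

    # Anything named inside bundle u must carry scope u.
    for name, (kind, parts, scope) in defined.items():
        if kind == "bundle":
            for member in refs[name]:
                member_scope = defined[member][2]
                if member_scope != name:
                    problems.append(
                        f"{member} is in {name} but has scope {member_scope}")

    # Depth-first search for a cycle; scope markers are deliberately ignored.
    state = {}

    def on_cycle(name):
        if state.get(name) == "active":
            return True
        if state.get(name) == "done":
            return False
        state[name] = "active"
        found = any(on_cycle(m) for m in refs[name])
        state[name] = "done"
        return found

    if any(on_cycle(name) for name in defined):
        problems.append("cycle in the name-containment graph")
    return problems

ok = [("b0", "bundle", ("b1",), "global"),
      ("b1", "triple", ("loves", "john", "mary"), "b0")]
cyclic = [("a", "triple", ("p", "x", "b"), "global"),
          ("b", "triple", ("q", "a", "y"), "global")]
print(check_belief_set(ok))
print(check_belief_set(cyclic))
```

Note that b0 mentions b1 while b1 carries the scope marker b0, and the checker accepts this, because scope markers are not followed when looking for cycles.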

Semantics, part 1

If we forget fragments for a minute, then the semantics of the language is pretty straightforward. A propositional triple (p s b) denotes the proposition that relation p holds between s and b. We avoid the idea that the triple denotes either "truth" or "falsehood," because we want to say more complex things about a proposition (such as "Web page xyz claims it's true") than a truth-value reading would allow.

Term triples, of the form (+f s b+)_u, denote the value of the function f applied to s and b. E.g., (+pair 5 6+) denotes the ordered pair whose first element is 5 and second element is 6.

Finally, a bundle denotes the conjunction of all the triples it contains at its top level. This rule means that, if bundle 2 occurs inside bundle 1, none of the triples in bundle 2 is an automatic consequence of the truth of bundle 1. As before, we view the bundle as denoting a propositional object rather than a truth value.

Abbreviations

We can't complete and formalize these semantic ideas until we introduce fragments. But to allow less verbose examples, let's explain the abbreviations:

If the name of a propositional triple t_u occurs only in u (a bundle), then you can dispense with the name and replace its only occurrence with the triple itself. So the belief set
```
      b0 = {b1 ...}
      b1 = (loves john mary)_b0
```
can be transformed to
```
      b0 = {(loves john mary) ...}
```
Such a triple is said to have normal scope. Note that triples with no scope markers must be interpreted as having normal scope; whereas in the unabbreviated format, they are interpreted as having global scope.
If the name of a triple t_u occurs only in another triple with scope u, then you can replace the name occurrence with t, and omit the scope marker on t. So the belief set
```
       (children lucy b2)_u1
  b2 = (+pair fred ethel+)_u1
```
can be written (children lucy (+ pair fred ethel +))_u1.
A propositional triple (p s b) with normal scope can be written in the N3 notation
```
       s p b .
```
While we're at it, we can adopt N3's conventions for commas and semicolons. The fact that we use the period for two purposes should not cause any problems.
The N3 bracket notation can also be used, so that [* f b *] is short for (* f g b *), where g is a symbol not occurring anywhere else. (I.e., we assert the existence of the object, but say nothing about it except what's said by the stuff inside the brackets.) As before, the asterisks are placeholders for the colors, although we're not sure one would ever use this notation for nonpropositional triples.

With these abbreviations, a belief set asserting "The United Nations web site claims that vinegar production in Finland is climbing, but it's actually lower this year than last, and in any case is less than what Somalia produced this year" might be written:

     t1 = (+ production Finland vinegar +)
     <http://www.UN.org>
       claims { t1 change-status increasing. }.

     (+ value t1 2001 +)
       less-than (+ value t1 2000 +),
                 (+ value (+ production Somalia vinegar +) 2001 +).

(Yes, of course we should have real URIs instead of "Finland," "vinegar," etc., but they would obscure the point. At least we used one for the UN.)

Here is the unabbreviated version of the same belief set:

    t1 = (+ production Finland vinegar +)
    t2 = (claims <http://www.UN.org> u1)
    u1 = {t3}
    t3 = (change-status t1 increasing)_u1
    t4 = (less-than t9 t6)
    t5 = (less-than t9 t7)
    t6 = (+ value t1 2000 +)
    t7 = (+ value t8 2001 +)
    t8 = (+ production Somalia vinegar +)
    t9 = (+ value t1 2001 +)
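
The step from the abbreviated to the unabbreviated form is mechanical, and can be sketched in a few lines of Python. We assume here that a nested triple is given as a tuple (color, p, s, b); the function name unabbreviate and the generated names t1, t2, ... are ours. Reusing one name for structurally identical sub-triples is a choice made in this sketch, not something CTL requires, and bundles and scope markers are left out.

```python
from itertools import count

def unabbreviate(expr, table=None, counter=None):
    """Name every nested triple in expr.

    Returns (name, table), where table maps each generated name to a flat
    (color, p, s, b) tuple whose s and b are atoms or names.
    """
    if table is None:
        table, counter = {}, count(1)
    color, p, s, b = expr
    flat = [color, p]
    for part in (s, b):
        if isinstance(part, tuple):
            # A nested triple is replaced by its name.
            part, _ = unabbreviate(part, table, counter)
        flat.append(part)
    flat = tuple(flat)
    for name, existing in table.items():
        if existing == flat:
            return name, table   # this triple already has a name
    name = f"t{next(counter)}"
    table[name] = flat
    return name, table

finland = ("term", "production", "Finland", "vinegar")
claim = ("prop", "less-than",
         ("term", "value", finland, "2001"),
         ("term", "value", finland, "2000"))
top, table = unabbreviate(claim)
for name, triple in table.items():
    print(name, "=", triple)
```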

Fragments

The purpose of fragments is to allow us to represent more complex syntactic objects using triples. The simplest example is the handling of predicates and functions that take more than two arguments. We introduce the "function" etc for this purpose, so that (between Topeka (.etc NewYork SanFrancisco.)) asserts the ternary relationship "between(Topeka, NewYork, SanFrancisco)".

This example is a bit misleading, because we could if we wanted treat etc as a synonym for pair, and analyze between as a binary predicate between an object and a pair of objects. However, this approach will not always work. Consider a language that allows bound variables, as in (forall (x y z) ...). The expression (x y z) has no meaning. It makes sense only when it occurs as a fragment of a forall or some other variable binder. In other words, we can say formally what the semantics of (forall (vars)...) is, and the rule we adopt will refer to the meanings of the subexpressions of the forall, but (vars) is not one of those subexpressions.

Fragments solve the problem syntactically, but, as we shall see, not semantically. The idea is that anywhere you need a pseudo-expression that makes sense only from the point of view of an expression it occurs in, you use a fragment. The forall example would be written in CTL as

(forall (.bvars x (.etc y (.etc z (.).).).) ...)

using the abbreviation conventions to save us from writing out the names of all those fragment triples. (The details are discussed below.)

The use of fragments is to some extent an admission that at the CTL level we won't always be able to express the semantics of expressions that a web-based agent comes across. In the example, if the agent doesn't know what forall means in the namespace it comes from, then it won't be able to do much with expressions in which forall appears. However, we can limit the damage.

We introduce the idea of fragment flattening. This is an operation on belief sets that (conceptually) eliminates all etc triples, while "lengthening" the triples that mention them. The idea is that whenever the name of an etc fragment occurs in a triple, the arguments of the etc replace the occurrence of the fragment. So (in-line john (.etc mary (.etc fred ethel.).)) gets flattened to (in-line john mary fred ethel). This is not an abbreviation, but a mathematical operation we require to state the semantics formally.

It will also be handy to introduce an empty fragment, written (.). Its occurrences just get discarded in the flattening process, which means that some triples may get shorter. For instance, the unary predicate alive could be used as in the triple (alive john (.)), and this would get flattened to (alive john). While we're at it, let's abbreviate "(.).)" as "..)" .

The flattening process stops when all the etc triples and empty fragments are eliminated. If some fragments remain, then any triple referring to a fragment must be processed with a rule specific to its operator (e.g., forall), or marked as "unintelligible." And, of course, a triple that refers to an unintelligible triple must itself be marked as unintelligible. This rule applies to bundles as well, but there will be occasions when it is appropriate to extract the intelligible pieces of a bundle. They can be understood as being part of the bundle's content.
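
Flattening is easy to state as a recursive function. The sketch below assumes that expressions are nested Python tuples of the form (color, head, arg, ...), with colors written "prop", "term", and "frag", and with the empty fragment written ("frag",). These encodings, and the function names flatten and intelligible, are assumptions made for the illustration.

```python
EMPTY = ("frag",)   # stands for the empty fragment (.)

def flatten(expr):
    """Splice etc fragments into the triple that mentions them, and drop
    empty fragments, working from the innermost expression outward."""
    if not isinstance(expr, tuple) or expr == EMPTY:
        return expr
    color, head, *args = expr
    out = []
    for arg in args:
        arg = flatten(arg)
        if arg == EMPTY:
            continue                      # empty fragments are discarded
        if isinstance(arg, tuple) and arg[:2] == ("frag", "etc"):
            out.extend(arg[2:])           # the etc's arguments replace it
        else:
            out.append(arg)
    return (color, head, *out)

def intelligible(expr, known_ops=frozenset()):
    """After flattening, a triple that still mentions a fragment is
    unintelligible unless its operator has a rule of its own; so is any
    triple that mentions an unintelligible one."""
    if not isinstance(expr, tuple) or expr == EMPTY:
        return True
    color, head, *args = expr
    has_fragment = any(isinstance(a, tuple) and a[0] == "frag" for a in args)
    if has_fragment and head not in known_ops:
        return False
    return all(intelligible(a, known_ops) for a in args)

in_line = ("prop", "in-line", "john",
           ("frag", "etc", "mary", ("frag", "etc", "fred", "ethel")))
alive = ("prop", "alive", "john", EMPTY)
bvars = ("frag", "bvars", "x",
         ("frag", "etc", "y", ("frag", "etc", "z", EMPTY)))

print(flatten(in_line))
print(flatten(alive))
print(flatten(bvars))
```

The bvars fragment survives flattening, so a forall triple containing it is intelligible only to a processor that lists forall among its known operators.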

Semantics, part 2

The semantics of CTL is now easy to state. First, we flatten all the etc's out. Then we have the usual rules:

Atomic terms, such as URIs, refer to things in the world by a mapping that we assume is given for the purposes of this exercise. This includes predicates and functions.
A term or proposition (* x a1 ...aN *), where one of the aI is a fragment, has no meaning, unless the namespace manager for x's namespace (or some other agent) provides a rule.
A term or proposition (* p a1 ...aN *)_u, where none of the aI is a fragment, means the entity
p$(a1$, a2$, ..., aN$)

where x$ is the denotation of x; in particular, p$ is the relation or function denoted by p. (We really should express this in model-theoretic terms, but that would be a distraction at this point. Those who insist on that kind of thing should pretend we said, e.g., "true if and only if <s$, b$> is in the set p$," although that's too simple, because it makes propositional triples denote truth values.)
A bundle { n1, ..., nK} is the proposition that is true if and only if every triple among the nI with this bundle's scope is true.
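
For readers who like to see such rules run, here is a small sketch of them in Python. It adopts the simplified truth-value reading mentioned above, which we have already said is too simple, so treat it as an illustration only. Expressions are assumed to be already flattened and written as nested tuples (kind, head, arg, ...), with bundles written ("bundle", member, ...); the mapping interp stands in for the given assignment of meanings to atomic terms. All of these encodings are assumptions of the sketch.

```python
def denote(expr, interp):
    """Denotation of a flattened expression under the mapping interp."""
    if not isinstance(expr, tuple):
        return interp[expr]                  # atomic term: look it up
    kind, *rest = expr
    if kind == "bundle":
        # True iff every top-level propositional triple is true;
        # bundles nested inside this one are not opened.
        return all(denote(m, interp) for m in rest if m[0] == "prop")
    head, *args = rest
    if any(isinstance(a, tuple) and a[0] == "frag" for a in args):
        raise ValueError(f"no meaning without a rule for {head!r}")
    values = tuple(denote(a, interp) for a in args)
    if kind == "term":
        return interp[head](*values)         # f$(a1$, ..., aN$)
    return values in interp[head]            # <a1$, ..., aN$> in p$

interp = {
    "john": "John", "mary": "Mary", "5": 5, "6": 6,
    "pair": lambda x, y: (x, y),
    "loves": {("John", "Mary")},
}
pair = denote(("term", "pair", "5", "6"), interp)
outer = ("bundle",
         ("prop", "loves", "john", "mary"),
         ("bundle", ("prop", "loves", "mary", "john")))
print(pair)
print(denote(outer, interp))
```

The outer bundle comes out true even though the inner one contains a false triple, which matches the rule that the contents of a nested bundle are not consequences of the bundle that contains it.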

Using CTL as a Bridge

Here are the nice things about the CTL notation, in our opinion:

There is just one way to state a binary relationship, and it's exactly the traditional RDF/N3 way, as a triple.
Bundles give us the ability to mention sets of triples without asserting them; or, more precisely, to control the contexts in which they're asserted.
It is fairly obvious how to go from an arbitrary logical language to CTL, and from CTL to XML. Furthermore, the CTL-to-XML mapping will look just like the N3-to-XML mapping in cases where they agree on semantics.

Let's look at the last claim in more detail, starting with logic-to-CTL mapping. Suppose we wanted to express a universally quantified implication, such as "Everyone that liked 'The Mummy' will like 'The Mummy Returns'." In a logical language, it would be the straightforward

     (forall (x) (if (likes x mummy) (likes x mummy-returns)))

Here is what it would look like in CTL:

     (forall (.bvars x ..)
             (if {(.var x ..) likes mummy.}
                 {(.var x ..) likes mummy-returns. }))

Note that we have to make the antecedent and consequent of the implication separate bundles, neither of which is asserted. Also, we have to write (.var x (.).), abbreviated (.var x ..), because otherwise the semantics would have to see x as the name of something, which it isn't. Note that we do not have to put the if or forall inside its own bundle, because the semantics of the forall expression govern how they are understood. A processor that doesn't understand forall will just mark the whole thing as unintelligible.

The CTL-to-XML translation is even simpler. We treat triple and bundle names as ordinary IDs. Here's how they get attached to triples. In RDF M & S, we need merely change

[6] propertyElt    ::= '<' propName '>' value '</' propName '>'
                       | '<' propName resourceAttr '/>'

to allow optional name, color, and scope attributes to accompany the propName. Recall that a property elt is the "last two thirds" of a triple, so that what is treated in N3/CTL as an abbreviation

        a p1 b1;
          p2 b2.

is in fact the norm in the XML serialization:

   <Description about="URI for a">
       <p1>b1</p1>   
       <p2 resource="URI for b2"/>
   </Description>

Here we're assuming that b1 is a literal and b2 is a resource, just to show the variation. The part of the expression bounded by <p1> ... </p1> corresponds to the CTL triple (p1 a b1). The part bounded by <p2.../> corresponds to the CTL triple (p2 a b2).

To give names to the triples, we just write

   <Description about="URI for a">
       <p1 ID="t1">b1</p1>   
       <p2 ID="t2" resource="URI for b2"/>
   </Description>

so that they can be referred to as #t1 and #t2 elsewhere in the belief set.

Bundles are treated as containers in XML. (But see the discussion of containers, below.) The bundle { n1 ... nK } corresponds to the XML

   <rdf:Bundle>
      <rdf:li resource="...reference to triple or bundle n1 ..."/>
      ...
      <rdf:li resource="...reference to nK..."/>
   </rdf:Bundle>

(Do we get brownie points for saying rdf:?) We also allow the same abbreviations as in CTL itself, that if a bundle or triple name occurs in just one place, one can dispense with the name and put the actual named item in that place.

Given this CTL example:

   (if {(sent you check1)}
       {(lost post-office check1)})

here is one way to XMLify it:

   <Bundle ID="b1">
      <li resource="#t1"/>
   </Bundle>
   <Bundle ID="b2">
      <li resource="#t2"/>
   </Bundle>
   <Description about="URI for you">
       <sent resource="URI for check1" ID="t1" scope="#b1"/>
   </Description>
   <Description about="URI for post office">
       <lost resource="URI for check1" ID="t2" scope="#b2"/>
   </Description>
   <Description about="#b1">
       <if resource="#b2"/>
   </Description>

(The if looks backwards; perhaps implies would be clearer.) This version is closer to what a machine produces when it parses the XML, but abbreviations can make it more human-readable (without changing the internal representation):

   <Bundle ID="b1">
      <li>
         <Description about="URI for you">
            <sent resource="URI for check1"/>
         </Description>
      </li>
   </Bundle>
   <Bundle ID="b2">
      <li>
         <Description about="URI for post office">
            <lost resource="URI for check1"/>
         </Description>
      </li>
   </Bundle>
   <Description about="#b1">
      <if resource="#b2"/>
   </Description>

Note that we don't need the scope or ID flags on the propertyElts any more. (They will of course be present in the flattened-triples list for this belief set, and implicitly in any other internal representation.)
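
The unabbreviated XML form can be generated directly from named triples and bundles. The sketch below uses Python's standard xml.etree.ElementTree module; the helper names triple_to_xml and bundle_to_xml are ours, and the element and attribute names (Description, Bundle, li, about, resource, ID, scope) are simply those used in the examples above, without namespaces. The color attribute is omitted.

```python
import xml.etree.ElementTree as ET

def triple_to_xml(p, s, b, name=None, scope=None, literal=False):
    """Build a Description element holding one property element for (p s b)."""
    desc = ET.Element("Description", {"about": s})
    prop = ET.SubElement(desc, p)
    if literal:
        prop.text = b                 # a literal object goes in the body
    else:
        prop.set("resource", b)       # a resource object goes in an attribute
    if name is not None:
        prop.set("ID", name)
    if scope is not None:
        prop.set("scope", "#" + scope)
    return desc

def bundle_to_xml(name, members):
    """Build a Bundle element that refers to its members by name."""
    bundle = ET.Element("Bundle", {"ID": name})
    for member in members:
        ET.SubElement(bundle, "li", {"resource": "#" + member})
    return bundle

b1 = bundle_to_xml("b1", ["t1"])
t1 = triple_to_xml("sent", "URI for you", "URI for check1",
                   name="t1", scope="b1")
for element in (b1, t1):
    print(ET.tostring(element, encoding="unicode"))
```

ElementTree puts everything on one line, so the output will not be indented like the hand-written examples, but it carries the same elements and attributes.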

A Problem with Containers

We have been a bit vague about how bundles get reduced to triples. That's because there seems to be a problem with how containers in general get reduced to triples. In the M&S paper, the graph structure for a list is handled by introducing a name for the list and asserting that each element is an element. In the case at hand, one would get

     <Description ID="thelist">
        <rdf:type resource="list"/>
        <rdf:_1 resource="a"/>
        <rdf:_2 resource="b"/>
        ...
        <rdf:_26 resource="z"/>
     </Description>

The problem is that this doesn't quite say the same thing. It asserts that there is a list, and that the named elements are elements, but it doesn't say that they're the only elements. (Thanks to Ziv Hellman for bringing this to our attention.) The absurd possibility exists that someone else on the web might state that a completely different element belongs in http://www.w3.org/thisbeliefset.xml#thelist. It's not a matter of whether the interloper has the "right" to do this, or whether the maintainers of the belief set have a "duty" to ignore him; it's metaphysically absurd for something defined as the list of 26 elements a,b,...,z to have other elements.

CTL's term notation avoids this problem, because list or cons just builds a term; there's no name floating around for somebody to make trouble with.

The correct way to deal with this problem is probably to change the semantics of containers. For now, we only care about bundles, so we stipulate that the statements in a belief set about what's in a given bundle in that belief set are exhaustive.