<HTML>
<HEAD>
<TITLE>Some comments on SGML syntax</TITLE>
<NEXTID N="z1">
</HEAD>
<BODY>
<H1>Reform of SGML</H1>It is true that SGML was designed
from the standpoint of markup, that
is, annotations on text as to how
it should be formatted, rather than
as a language.  Here is my $.02 worth,
though I don't for a moment imagine that
ISO would really clean it up at this
stage (March 93). We consider an
incremental cleaning up of just a
few points of SGML syntax.
<H2>Clean up those brackets</H2>The problems of interpretation of
the space between two tags would be
removed if one had a delimiter (say
a semicolon) which meant "end of
tag, begin new tag". For some reason
an empty piece of text is used for
this in SGML!  This is like using
a null string, or often a newline
string, as a statement separator.<P>
Suppose one could then write
<PRE>			&lt;TAG1 ATTR ATTR2 ;
 			TAG2 ATTRSF SDF SDF>
 
</PRE>instead of
<PRE>			&lt;TAG1 ATTR ATTR2>
			&lt;TAG2 ATTRSF SDF SDF>

</PRE>Try this with your average DTD and
see how clean it looks! The result
looks like (what it should be) a
computing language with text as parameters.
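<P>
As a rough illustration, here is a minimal Python sketch of the
rewrite. The helper name join_adjacent_tags and the exact spacing
around the semicolon are my own assumptions, not part of any SGML
tool, and the sketch ignores complications such as a ">" inside a
quoted attribute value.

```python
import re


# Hypothetical helper: the name and the output spacing are assumptions.
def join_adjacent_tags(sgml):
    """Merge tags separated only by white space into one bracket pair."""
    # ">" + optional white space + "<" is the "empty piece of text" that
    # SGML uses to mean "end of tag, begin new tag"; replace it with the
    # proposed ";" delimiter. Tags separated by real text are untouched.
    return re.sub(r">\s*<", " ; ", sgml)


conventional = "<TAG1 ATTR ATTR2>\n<TAG2 ATTRSF SDF SDF>"
print(join_adjacent_tags(conventional))
```

Only the white-space-only gap between two tags is rewritten, so
a tag followed by real text keeps its own closing bracket.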
<H2>Free format</H2>Suppose white space were allowed between
the &lt; and the first character.  This
is unthinkable to the markup-minded
person who wants a &lt; by itself to
be an error but it looks SO much
nicer to a language-minded person:
<PRE>		&lt;SECTION LEVEL=2>
		&lt;STITLE ID=ABC>What Next?
		&lt;/STITLE>
		&lt;IDX>
		   &lt;FIG X=7 y=67  CAP="The solution">
		   Hello
		   &lt;/FIG>
		&lt;/IDX>

</PRE>would come out like
<PRE>		SECTION LEVEL=2;
		    STITLE ID=ABC  >What Next?&lt;
		 /STITLE;
		 IDX;
		   FIG X=7 y=67  CAP="The solution"
		   >Hello&lt;
		   /FIG;
		/IDX;

</PRE>It makes so much more sense to quote
the text instead of the markup when
there is much more markup than text.
 This way it can look like language
with embedded text or text with embedded
markup depending on which is predominant.
<H2>Unifying the quoting</H2>Now, the astute would realize that
the double quotes in the attribute
value CAP="The solution" are playing
basically the same role as the angle
brackets which are left around text,
and would suggest that they be made
equivalent.
<PRE>		SECTION LEVEL=2;
		    STITLE ID=ABC  >What Next?&lt;
		 /STITLE;
		 IDX;
		   FIG X=7 y=67  CAP=>The solution&lt;
		   >Hello&lt;
		   /FIG;
		/IDX;

</PRE>Now we have only one form of quoting
and we can easily distinguish between
markup and text because one is inside
and the other outside the quotes.
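<P>
To show how little machinery this needs, here is a minimal Python
sketch that separates the two. The function name
split_markup_and_text is hypothetical, and the sketch assumes that
quoted text never itself contains a "&lt;" character.

```python
import re


# Hypothetical helper: everything between ">" and "<" is quoted text,
# everything else is markup.
def split_markup_and_text(source):
    """Return a list of ("markup" | "text", content) pairs."""
    # With one capturing group, re.split alternates between the parts
    # outside the quotes (even positions) and inside them (odd positions).
    pieces = re.split(r">([^<]*)<", source)
    tokens = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            tokens.append(("text", piece))
        elif piece.strip():
            # Drop markup pieces that are only white space.
            tokens.append(("markup", piece.strip()))
    return tokens


for kind, content in split_markup_and_text(
    "FIG X=7 y=67  CAP=>The solution< >Hello< /FIG;"
):
    print(kind, repr(content))
```

A text piece that directly follows markup ending in "=" would be
an attribute value and any other text piece would be content, but
the sketch leaves that distinction to the caller.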
<H2>Mark the structure</H2>An independent point is a fundamental
bug in the language design which
makes it impossible to tell which
elements are empty without the DTD.
  In other words, the structure is
not apparent from the syntax. For
a "structured markup language", that's
pretty bad.<P>
 If you run Dynatext, for example,
all you have to do is tell it which
elements are empty and it can do
a good job without any DTD.  It should
really be possible to see the structure
at a low level. So I would suggest
some kind of opening symbol which
is mandatory on all element opening
tags, maybe a trailing / for symmetry
with the leading / of the closing
tag. For example:
<PRE>		SECTION/ LEVEL=2;
		    STITLE/ ID=ABC  >What Next?&lt;
		    /STITLE;
		   IDX/;
		      FIG/ X=7 y=67  CAP=>The solution&lt;
		         >Hello&lt;
		      /FIG;
		   /IDX;

</PRE>Now I can parse that and see that
I am missing a section end.<P>
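A check of that kind can be sketched in a few lines of Python.
The name unclosed_elements and the rule for what counts as an
element name are assumptions made for this illustration. The
sketch treats a word of the form NAME/ as an opening tag and
/NAME as a closing tag, and it needs no DTD.

```python
import re

OPEN = re.compile(r"([A-Za-z][A-Za-z0-9]*)/")   # NAME/ opens an element
CLOSE = re.compile(r"/([A-Za-z][A-Za-z0-9]*)")  # /NAME closes an element

SAMPLE = """
SECTION/ LEVEL=2;
    STITLE/ ID=ABC  >What Next?<
    /STITLE;
   IDX/;
      FIG/ X=7 y=67  CAP=>The solution<
         >Hello<
      /FIG;
   /IDX;
"""


# Hypothetical helper: returns the names of elements left open.
def unclosed_elements(source):
    # Quoted text (between ">" and "<") carries no structure: blank it.
    markup_only = re.sub(r">[^<]*<", " ", source)
    stack = []
    for word in re.split(r"[\s;]+", markup_only):
        opened = OPEN.fullmatch(word)
        closed = CLOSE.fullmatch(word)
        if opened:
            stack.append(opened.group(1))
        elif closed:
            name = closed.group(1)
            if not stack or stack[-1] != name:
                raise ValueError("unexpected /" + name)
            stack.pop()
        # Any other word is an attribute or an empty element: no nesting.
    return stack


print(unclosed_elements(SAMPLE))
```

For the sample above the only name left on the stack is SECTION,
which is the missing section end.<P>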
Of course real language people might
want to use a different concrete syntax:
<PRE>		{ section(level=2)
		    { stitle(id=abc)
		        "What Next?"
		    } stitle
		    { idx
		        { fig (x=7, y=67, cap="The solution")
			    "Hello"
		        } fig
		    } idx
	
</PRE>but we wouldn't like SGML not to
look like SGML, would we?  :-)
<ADDRESS><A
NAME="0" HREF="http://www.w3.org./hypertext/TBL_Disclaimer.html">Tim BL</A>
</ADDRESS></BODY>
</HTML>