Mail to the author
xavier at ultra-fluide.com

XSLT Semantic markup

Semantic markup using XSL transformation

Semark.xsl is a universal semantic markup tool for XML content.

Semark.xsl working principle

Semark.xsl marks up XML content using an external lexicon. The tool, which is an XSL transformation, searches the XML source content for occurrences of the words or expressions listed in the lexicon and wraps each occurrence in markup.

In the diagram above, the lexicon has the word "content" as an entry. Semark.xsl therefore looks for the word "content" in the XML source document and wraps each occurrence in the <a href="Link">content</a> tag. The type of markup applied, a/href in this example, is a parameter that can be set when running Semark.xsl. The attribute value, "Link" in this example, is a variable that the lexicon provides for each entry.
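The principle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Semark.xsl code itself, which is an XSL transformation. The function names (mark_up, _walk, _split), the dictionary used as a lexicon and the whole-word matching rule are assumptions made for this sketch.

```python
import re
import xml.etree.ElementTree as ET


def _split(text, pattern, lexicon, tag, attr):
    """Cut text at lexicon hits; return (leading text, new marked elements)."""
    lead, marked, pos = "", [], 0
    for hit in pattern.finditer(text):
        before = text[pos:hit.start()]
        if marked:
            marked[-1].tail = before      # text between two hits
        else:
            lead = before                 # text before the first hit
        el = ET.Element(tag, {attr: lexicon[hit.group(0)]})
        el.text = hit.group(0)
        marked.append(el)
        pos = hit.end()
    if marked:
        marked[-1].tail = text[pos:]      # text after the last hit
    else:
        lead = text                       # no hit: text is unchanged
    return lead, marked


def _walk(node, pattern, lexicon, tag, attr):
    """Rebuild the children of node, inserting markup inside its text."""
    rebuilt = []
    if node.text:
        node.text, marked = _split(node.text, pattern, lexicon, tag, attr)
        rebuilt.extend(marked)
    for child in list(node):
        _walk(child, pattern, lexicon, tag, attr)
        rebuilt.append(child)
        if child.tail:
            child.tail, marked = _split(child.tail, pattern, lexicon, tag, attr)
            rebuilt.extend(marked)
    node[:] = rebuilt


def mark_up(root, lexicon, tag="a", attr="href"):
    """Wrap every lexicon entry found in the text of root in <tag attr=...>."""
    if not lexicon:
        return root
    # Longest entries first, so expressions win over the words they contain.
    entries = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b")
    _walk(root, pattern, lexicon, tag, attr)
    return root


source = ET.fromstring("<p>XML content is <em>structured</em> content.</p>")
result = mark_up(source, {"content": "Link"})
print(ET.tostring(result, encoding="unicode"))
```

The tag and attr arguments play the role of the markup type parameter (a/href here), and the dictionary values play the role of the per-entry variable stored in the lexicon. The sketch does not check whether a word already sits inside the same kind of markup.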

If this brief explanation has not given you a clear idea of the topic, you can try the online demo. If you have no more time to spend right now, you can download the package and try it later on your own computer.

When to use Semark.xsl:

Its use will make a tedious task much easier when:

The advantages of Semark.xsl are even more apparent when working with content which changes regularly and/or is updated by different writers.

Given these criteria, websites (created using XHTML or which have XML content sources) can particularly benefit from Semark.xsl. With web content, many useful kinds of markup can be set up: acronym/title, abbr/title, a/href, strong, em... A span/class markup can also be created (although this has more to do with presentation than semantics).
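To make the element/attribute pairs listed above more concrete, here is a minimal Python sketch that builds the markup for one term. The helper name wrap and the sample values (the expansion of XML, the example.org address, the class name) are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def wrap(term, element, attribute=None, value=None):
    """Return term wrapped in <element>, with an optional attribute."""
    el = ET.Element(element)
    if attribute is not None:
        el.set(attribute, value)
    el.text = term
    return ET.tostring(el, encoding="unicode")


# The same term, marked up in several of the ways mentioned above.
print(wrap("XML", "acronym", "title", "Extensible Markup Language"))
print(wrap("XML", "a", "href", "https://example.org/xml"))  # hypothetical address
print(wrap("XML", "strong"))
print(wrap("XML", "span", "class", "keyword"))              # hypothetical class name
```

Pairs such as strong or em need no attribute, which is why the last two arguments are optional in this sketch.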

Here are some examples where Semark.xsl can be particularly useful:

Additional information about Semark's underlying concepts.

Tool

A tool is a system that takes over specific tasks in order to simplify the user's work. Specifically, Semark.xsl automates the markup process. It gives consistent, precise results and makes bulk processing possible.

Universal

Currently, a large share of electronic documents are saved in an XML-based format. XSL, the language used to code this program, is a standard available on all operating systems (Linux, Windows, Unix...) and in all environments (Java, PHP, .NET...). Large numbers of documents can therefore be processed regardless of the platform used.

XML content (structured documents)

XML is a meta language (a set of syntax rules) which makes structured documents possible. To create a structured document, the content must be divided up and distributed within a structure defined using an XML-based language. The structure is a tree which sets out groups, sub-groups, sub-sub-groups and so on, within which content may be stored.
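The tree of groups and sub-groups can be made visible with a short Python sketch that prints each element indented by its depth. The sample document (report, chapter, section, para) and the function name outline are hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(node, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical structured document: groups, sub-groups and sub-sub-groups.
document = ET.fromstring(
    "<report>"
    "<chapter><section><para>Some content</para></section></chapter>"
    "<chapter><para>More content</para></chapter>"
    "</report>"
)
print("\n".join(outline(document)))
```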

Semark.xsl is written in XSLT, a language which itself uses XML syntax and was specially designed to process XML documents. Semark.xsl is therefore able to handle any XML content.

Semantic markup

The semantics of a structured document are the sets of rules which define the structure of that document or type of document. Let's use this content as an example: "A marvellous story, written by Xxxx Yyyy is a book of 256 pages". In a structured version this content could be:

<book>
    <title>A marvellous story</title>
    <author>
      <firstname>Yyyy</firstname>
      <lastname>Xxxx</lastname>
    </author>
    <size>256</size>
</book>

As you can see, the structured content contains more information. Part of the information comes from the content itself, but the presence of the structure, and the fact that the content is organized within it, complete the information. For example, it is now clear that "A marvellous story" is the title of the book, which was not obvious from the basic content. Similarly, it was not clear which name was the author's first name, but the structure settles this as well.

On the other hand, using or understanding structured content requires knowledge about its meaning and organization. In this example, we need to know that <size> uses the page as a unit in order to understand that the book is 256 pages long.
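As a small illustration of both points, the Python sketch below reads the structured version of the book. Each piece of information is reached through its place in the structure, while the unit of <size> has to be supplied by the reader of the document; here that assumption is written down as a comment and a variable name.

```python
import xml.etree.ElementTree as ET

BOOK = """
<book>
    <title>A marvellous story</title>
    <author>
      <firstname>Yyyy</firstname>
      <lastname>Xxxx</lastname>
    </author>
    <size>256</size>
</book>
"""

book = ET.fromstring(BOOK)

# The structure says which piece of content plays which role.
title = book.findtext("title")
first_name = book.findtext("author/firstname")
last_name = book.findtext("author/lastname")

# The structure does NOT say what <size> measures. Treating it as a number
# of pages is outside knowledge about the semantics of this document type.
pages = int(book.findtext("size"))

print(f"{title}, by {first_name} {last_name}, {pages} pages")
```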

Given this example, it's easy to deduce that:

Semark.xsl is a semantic markup tool because:

The result is a document containing the same content and respecting the original semantic structure, but offering a slightly higher level of information. Semark.xsl takes this extra information from an external lexicon.
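One way to check the "same content" property is to compare the text of a document before and after markup. The Python sketch below does this on two hand-written fragments; both fragments and the helper name text_of are hypothetical, and the second fragment only stands in for what a markup pass might produce.

```python
import xml.etree.ElementTree as ET


def text_of(root):
    """Concatenate all the text of a document, ignoring the tags."""
    return "".join(root.itertext())


before = ET.fromstring("<p>Some content here.</p>")
after = ET.fromstring('<p>Some <a href="Link">content</a> here.</p>')

# The markup adds information (the link) but leaves the content untouched.
same_content = text_of(before) == text_of(after)
added_links = [a.get("href") for a in after.iter("a")]
print(same_content, added_links)
```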

Semantics and presentation

Web debates often highlight the opposition between a document's presentation and its semantics. Current thinking and technology indicate that it is useful to separate presentation and semantics in order to manage them each effectively.

Given this context, we should point out that this tool is not designed to handle presentation. Despite this, it is worth noting that:

To conclude, Semark.xsl by nature handles the semantics of documents, but there is nothing to prevent one from using it to define features beyond semantics.


Ultra-Fluide communication agency: 01 47 70 23 32 - contact at ultra-fluide.com - 44 rue Richer, 75009 Paris.