Xenon 1.0

The New Standard for Data

r1.1.19 edition, 11th November 2024
Gene Thomas
Planet Earth Software

Xenon is the best way to represent information:

Native support for arrays.
Native support for a graph structure: elements may have multiple parents.
Native support for types used in serializing program data.
Unambiguous choice of data structure.
Readable multiple line indented text.
Terse, efficient to write by hand.
Can be implemented to be blazingly fast or using a mode-less tokenizer.
The xenon document is named.

Entities

Xenon has three basic named types: objects, arrays and scalars.

Objects contain named fields and are written <Object-Name> fields... <$>.
e.g. A Person object with two fields: <Person> <Name=Fred> <Height=1.67> <$>

Arrays are heterogeneous and are written <<Array-Name> item <&> items... <$>>.
e.g. An array with two items: <<Names> Fredrick <&> Freddy <$>>

Scalars, single values, are written <Scalar-Name=Value>.
e.g. A size of 2,500: <Size=2,500>

The xenon document itself is a single named object, array or scalar. Each of the examples here is a valid document on its own, but could also be inserted inside an object.

Objects

Objects, in addition to the scalars shown above, may also contain other objects or arrays:

<Book>
    <Name=A Plan>
    <Author>
        <Name=Eric Harrison>
        <Mobile=+64 24 240 990>
    <$>
    <<Reviews>
        Fascinating.
    <&>
        Of interest.
    <&>
        Worth reading.
    <$>>
<$>

Objects stored in arrays have no name, so they are written as nameless objects, <> fields... <$>:

<<People>
    <>
        <Name=Fred>
        <Disposition=Friendly>
    <$>
<&>
    <>
        <Name=Jane>
        <Disposition=Aloof>
    <$>
<$>>

Only the fields themselves are required, so the <> and <$> are usually left out:

<<People>
    <Name=Fred>
    <Disposition=Friendly>
<&>
    <Name=Jane>
    <Disposition=Aloof>
<$>>

If an object in an array has no fields, the nameless object markers <> and <$> are required; otherwise the item would represent the empty string. The following array has one object with no fields, then an object with two:

<<Phenomena>
    <> <$>
<&>
    <Name=Aurora>
    <Color=Green>
<$>>

Scalars

Xenon is very readable when indented. Leading and trailing newlines and spacing are removed from scalars, as is the indentation:

<<Poem>
    I read some xenon.
    I was happy from then on.
<$>>

The example above extracts to “I read some xenon.\r\nI was happy from then on.” (using C language style \r for carriage return and \n for line feed; xenon uses this convention itself, see Escaping. Internet/Windows line endings are assumed). The text is not “\r\n    I read some xenon.\r\n    I was happy from then on.\r\n” as written. This processing makes documents more readable.

Named scalars are processed in the same way. If the text has more than one line, all of its lines should be placed below the line with the name, leaving the first line empty, e.g.

<Description=
    A large leafy
    deciduous tree>

The example above extracts to “A large leafy\r\ndeciduous tree”.

The minimum indent of any line is removed from all lines. Also, if the text begins with spacing (space or tab characters, possibly none) followed by a newline, that spacing and newline are removed. If the scalar is in an array and the text ends with a newline followed only by spacing, that newline and spacing are also removed.

If the first line is not just spacing followed by a newline, it is left intact, including any leading spacing. The following example is not “well formed xenon”, so although it parses, the pattern should be avoided:

<<Notes> Remember to
    smile
<$>>

The example above extracts to “ Remember to\r\nsmile”.

To clarify:

<<An Array>
<&>
<$>>

Both of the items in this array evaluate to the empty string “”.

In the case where all of the lines are indented a | is used to specify the indenting, e.g. <<Story> | A cat walked across the path <$>> The example above extracts to “ A cat walked\r\n across the path”.

Similarly in named scalars the text is intuitively processed. <Label= | A useful description> The example above also extracts to “ A useful\r\n description”.

It is an error for any of the lines below the | to start on or before the column of the |, and such a document shall be rejected: <Report= | The sailing could have been better>

The algorithm for text processing is specified in the Text Processing Algorithm section below.

Arrays

Arrays nested directly inside arrays have no name, so the name is left out, giving <<>:

<<To-Do-Lists>
    <<>
        Parse document
    <&>
        Write summary
    <$>>
<&>
    Go on holiday
<$>>

Here an array of “Parse document” and “Write summary” is inside the array called To-Do-Lists, along with the “Go on holiday” item.

An array of one empty item is not an empty array. The following shows an array with one item, the empty string “”. <<Comments> <$>>

Here is an array with no items. <<Faults$$>>

As with populated arrays, one leaves out the name when the empty array is within another array. The following shows an array called Records with an initial sub array with no items and a subsequent sub array with one item, “24,000”. <<Records> <<$$>> <&> <<> 24,000 <$>> <$>>
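As a sketch of how the object and array rules above fit together, the following Python writer turns nested dicts (objects), lists (arrays) and strings (scalars) into indented xenon. The function names (to_xenon, write_named, write_array, write_item, esc) are hypothetical and not part of any xenon library. It assumes array items have no leading or trailing spacing, which the scalar text processing would otherwise remove, and it escapes control characters so every scalar stays on one line.

```python
from __future__ import annotations

from typing import Union

Value = Union[str, dict, list]

# Characters that must be escaped outside comments (see Escaping).
SPECIAL = set("<>=$&#@:;|\\%!")
CONTROL = {"\r": "\\r", "\n": "\\n", "\t": "\\t"}


def esc(text: str) -> str:
    """Escape special characters; CR, LF and tab become \\r, \\n and \\t."""
    return "".join("\\" + ch if ch in SPECIAL else CONTROL.get(ch, ch) for ch in text)


def write_named(name: str, value: Value, indent: str = "") -> list[str]:
    """Lines for a named entity: a whole document or a field of an object."""
    if isinstance(value, str):
        return [f"{indent}<{esc(name)}={esc(value)}>"]
    if isinstance(value, dict):
        lines = [f"{indent}<{esc(name)}>"]
        for field, v in value.items():
            lines += write_named(field, v, indent + "    ")
        return lines + [f"{indent}<$>"]
    return write_array("<<" + esc(name), value, indent)


def write_array(opener: str, items: list, indent: str) -> list[str]:
    """Named arrays open with <<Name, nameless nested arrays with <<."""
    if not items:
        return [f"{indent}{opener}$$>>"]  # <<Faults$$>> or <<$$>>
    lines = [f"{indent}{opener}>"]
    for i, item in enumerate(items):
        if i:
            lines.append(f"{indent}<&>")
        lines += write_item(item, indent + "    ")
    return lines + [f"{indent}<$>>"]


def write_item(item: Value, indent: str) -> list[str]:
    """Array items are nameless."""
    if isinstance(item, str):
        return [f"{indent}{esc(item)}"]
    if isinstance(item, list):
        return write_array("<<", item, indent)
    if not item:
        return [f"{indent}<> <$>"]  # an object with no fields keeps <> and <$>
    lines: list[str] = []
    for field, v in item.items():  # otherwise only the fields are written
        lines += write_named(field, v, indent)
    return lines


def to_xenon(name: str, value: Value) -> str:
    return "\r\n".join(write_named(name, value))
```

For example, to_xenon("People", [{"Name": "Fred"}, {"Name": "Jane"}]) produces the People array shown under Objects, with the <> and <$> left out.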

Graph Structure

Xenon documents can be graphs/networks rather than just trees: items can have more than one parent. This is required for serialization, where the data structure may share items or even contain loops. Items may be labeled with an id, e.g. #24, and referred to later using a reference, e.g. @24. References are always scalars, even if the entity being referred to is an object or array.

<Person>
    <Name=Bonnie>
    <Spouse#jack-smith>
        <Name=Jack>
    <$>
    <Doctor=@jack-smith>
<$>

A person whose spouse and doctor are the same person, Jack.

Within arrays the id is specified before the value, followed by a ;, e.g. #48;. The following shows a list of persons with Eric Barton in the list twice:

<<Persons>
    #eric;
    <Name=Eric Barton>
    <Occupation=Xenoneer>
<&>
    @eric;
<$>>
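When serializing, a writer has to discover which items are reachable more than once (shared, or part of a loop) so that it can give them an #id at their first occurrence and write @id references afterwards. A minimal sketch, assuming the in-memory data is made of Python dicts and lists and that identity (not equality) defines sharing; find_shared is a hypothetical helper, and it only tracks containers, although xenon also allows ids on scalars.

```python
from __future__ import annotations


def find_shared(root: object) -> set[int]:
    """Return the id()s of dicts/lists reachable more than once from root.

    Each container is expanded only once, so loops terminate.
    """
    seen: set[int] = set()
    shared: set[int] = set()
    stack = [root]
    while stack:
        value = stack.pop()
        if not isinstance(value, (dict, list)):
            continue  # scalars are written by value in this sketch
        key = id(value)
        if key in seen:
            shared.add(key)  # second path to the same container
            continue
        seen.add(key)
        stack.extend(value.values() if isinstance(value, dict) else value)
    return shared
```

For the Bonnie example above, the dict for Jack is found twice (as Spouse and as Doctor), so it would be written once with #jack-smith and then referenced with @jack-smith.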

Types

To support serialization, entities can be labeled with types. For example, when a field is declared as a base class of the object being serialized, we need to know which class was actually stored. Types are stored in the same place as #ids, using :type after the name, or :type; before the item in arrays. The following example shows a Vehicle field holding a Car object, and a Pets array holding a Dog and a Fish. The fish also has an id. The #id and :type can be in either order. Types are not used to identify the Recommended Format used to encode a scalar.

<Household>
    <Vehicle:HouseholdApp.Car,HouseholdApp>
        <Transmission=Manual>
        <Make=Toyota>
    <$>
    <<Pets>
        :HouseholdApp.Dog,HouseholdApp;
        <Name=Fido>
        <Breed=Alsatian>
    <&>
        #nemo:HouseholdApp.Fish,HouseholdApp;
        <Name=Nemo>
        <Container=Tank>
    <$>>
<$>
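Since the #id and :type may appear in either order, a reader has to split the text of a tag such as Vehicle:HouseholdApp.Car,HouseholdApp, or an array prefix such as #nemo:HouseholdApp.Fish,HouseholdApp (without its ;), into its parts. The sketch below is hypothetical: split_header is not from any xenon library, and it assumes the text has already been cut out from between the < and > (or before the ;) and contains no escaped # or : characters.

```python
import re
from typing import Optional, Tuple

# name, then any number of #id / :type parts
HEADER = re.compile(r"([^#:]*)((?:[#:][^#:]+)*)")
MARKER = re.compile(r"([#:])([^#:]+)")


def split_header(text: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'name#id:type' (either order, both optional) into (name, id, type)."""
    m = HEADER.fullmatch(text)
    if m is None:
        raise ValueError(f"malformed header: {text!r}")
    name, rest = m.groups()
    ident: Optional[str] = None
    type_: Optional[str] = None
    for marker, value in MARKER.findall(rest):
        if marker == "#":
            if ident is not None:
                raise ValueError("more than one #id")
            ident = value
        else:
            if type_ is not None:
                raise ValueError("more than one :type")
            type_ = value
    return name, ident, type_
```

An empty name is returned for array prefixes, which have no name.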

Document Object Model

To aid understanding of the data structure, a Document Object Model is presented. This is not the grammar of the language but a data model.

document := named-entity
named-entity := named-object | named-array | named-scalar
named-object := NamedObject(name, object)
named-array := NamedArray(name, array)
named-scalar := NamedScalar(name, scalar)
object := Object(type?, id?, field*)
array := Array(type?, id?, item*)
scalar := Scalar(type?, id?, value)
field := named-entity
item := object | array | scalar
name := CHAR+
id := CHAR+
type := CHAR+
value := CHAR*

After references are resolved items such as scalars may have multiple parents, e.g. a NamedScalar and an Array.
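The model above maps naturally onto a few record types. The Python dataclasses below are a hypothetical sketch; for brevity, NamedObject, NamedArray and NamedScalar are folded into one Named class whose entity is an Object, Array or Scalar. A resolved reference is simply the same Python object appearing under two parents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Scalar:
    value: str = ""  # value := CHAR*, may be empty
    type: Optional[str] = None
    id: Optional[str] = None


@dataclass
class Object:
    fields: list[Named] = field(default_factory=list)
    type: Optional[str] = None
    id: Optional[str] = None


@dataclass
class Array:
    items: list[Item] = field(default_factory=list)
    type: Optional[str] = None
    id: Optional[str] = None


Item = Union[Object, Array, Scalar]


@dataclass
class Named:
    """NamedObject, NamedArray or NamedScalar, depending on the entity."""

    name: str
    entity: Item

    def __post_init__(self) -> None:
        if not self.name:  # name := CHAR+
            raise ValueError("a name may not be the empty string")
```

Bonnie's spouse and doctor from the Graph Structure example would be one Object instance referenced by two Named fields.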

Comments

Comments prefixed with % may appear where whitespace may, between <> bracketed tokens. The comment is terminated with a newline. Comments may not appear inside scalars. Special characters do not need to be escaped within comments.

<Person>
    <Name=Allan Smith>
    <<Friends>
        % my best friend
        <Name=Manuel Jones>
        <Mobile=+64 24 99 24 90>
    <&>
        <Name=Freida Smith>
        <Mobile=024 444 346>
    <$>>
<$>
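Because % must be escaped everywhere outside comments, any unescaped % starts a comment. That makes it possible to drop comments in a pre-pass before tokenizing, keeping the terminating newline so the surrounding whitespace is unchanged. A hypothetical sketch (strip_comments is not from any xenon library; a real tokenizer would more likely skip comments as it goes):

```python
def strip_comments(text: str) -> str:
    """Remove % comments, keeping the newline (\\r\\n or \\n) that ends each."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            out.append(text[i:i + 2])  # an escape: copy \ and the next character
            i += 2
        elif ch == "%":
            end = text.find("\n", i)
            if end == -1:
                break  # the comment runs to the end of the document
            if text[end - 1] == "\r":
                end -= 1  # keep an Internet/Windows \r\n intact
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Inside a comment nothing is treated as an escape, matching the rule that special characters need not be escaped there.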

Escaping

\ is used to escape characters. The following characters must be escaped everywhere except inside comments: < > = $ & # @ : ; | \ % !. Line feed (“newline”) is represented with \n, carriage return with \r and tab with \t. Unicode characters can be escaped with \u{X...}, where X... is one to six hexadecimal digits, e.g. \u{1F62D} for a loudly crying face 😭: he has had to write in a markup language that is too verbose. The character following the \ must be lower case; the hex digits of \u{X...} should be upper case but may be lower case. The values in \u{X...} must be in the valid Unicode range \u{0} to \u{10FFFF}, excluding the surrogates D800 to DFFF. For example:

<Details=The two lines\r\nmade I \u{1F60A}>

is equivalent to:

<Details=
    The two lines
    made I 😊>
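A minimal sketch of escaping and unescaping scalar text in Python, following the rules above. The function names are hypothetical; the writer escapes control characters rather than writing them literally, and the reader rejects code points outside the permitted range and a \u that is not followed by {hex digits}.

```python
import re

SPECIAL = "<>=$&#@:;|\\%!"
CONTROL = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}
UNCONTROL = {"n": "\n", "r": "\r", "t": "\t"}
# \u{X...} with one to six hex digits, or \ followed by any single character
ESCAPE = re.compile(r"\\(?:u\{([0-9A-Fa-f]{1,6})\}|(.))", re.DOTALL)


def escape(text: str) -> str:
    return "".join("\\" + ch if ch in SPECIAL else CONTROL.get(ch, ch) for ch in text)


def unescape(text: str) -> str:
    def replace(m: re.Match) -> str:
        if m.group(1) is not None:  # \u{X...}
            code = int(m.group(1), 16)
            if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                raise ValueError(f"invalid code point \\u{{{m.group(1)}}}")
            return chr(code)
        ch = m.group(2)
        if ch == "u":
            raise ValueError("\\u must be followed by {hex digits}")
        return UNCONTROL.get(ch, ch)  # \n, \r, \t, or the character itself

    return ESCAPE.sub(replace, text)
```

For instance, unescaping the single-line Details value above yields the same string as the multi-line form after text processing.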

Recommended Formats

Scalars may represent many different types of information. To facilitate interoperability between implementations, xenon defines how various kinds of information should be represented. Xenon libraries convert these to and from types in the host language, and shared formats keep the libraries compatible. Implementations may support additional types.

Booleans

Booleans are stored as true or false. Processors must decode upper or lower case, e.g. False.

<Happy=true>
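A small Python sketch of the boolean format, with hypothetical function names. It assumes "upper or lower case" means any letter case is accepted on input, while output uses the lower case shown in the example.

```python
def parse_bool(text: str) -> bool:
    """Decode a xenon boolean in any letter case, e.g. False or TRUE."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def format_bool(value: bool) -> str:
    """Booleans are written in lower case."""
    return "true" if value else "false"
```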

Integers

Integers should be stored with commas for readability. Processors must decode commas. Processors must handle signed 32 bit numbers. Processors should handle signed 64 bit numbers.

<Count=30,000>
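A Python sketch of the integer format, with hypothetical function names. Output uses comma grouping; as an assumption, input accepts commas anywhere between digits rather than insisting on groups of three, and values outside the signed 64 bit range are rejected.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+(?:,[0-9]+)*")


def format_int(value: int) -> str:
    return f"{value:,}"  # e.g. 30000 -> "30,000"


def parse_int(text: str) -> int:
    if not INTEGER.fullmatch(text):
        raise ValueError(f"not an integer: {text!r}")
    value = int(text.replace(",", ""))
    if not -2**63 <= value < 2**63:  # signed 64 bit range
        raise OverflowError(f"{text} does not fit in 64 bits")
    return value
```

The regular expression is checked first because Python's int() alone would also accept forms such as "1_000" or surrounding whitespace.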

Real numbers

Real numbers should be stored with commas for readability. Processors must decode commas. Implementations must handle ieee 754 64 bit double precision floating point numbers including the special values infinity ∞, negative infinity -∞, negative zero -0, and not a number NaN. Numbers may use the e notation, e.g. 4.2957e24 for 4.2957 × 10²⁴. Processors may handle ieee 754 32 bit single precision numbers. An implementation must interpret both upper and lower case e.

<<Results>
    1,414,213.562
<&>
    ∞
<&>
    2.415e10
<&>
    NaN
<$>>
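A Python sketch of decoding the real number format (Python's float is an ieee 754 double). parse_real is a hypothetical name, and it assumes the special values are written exactly as ∞, -∞ and NaN, as in the example above.

```python
import math
import re

REAL = re.compile(r"[+-]?[0-9]+(?:,[0-9]+)*(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
SPECIAL_REALS = {"∞": math.inf, "-∞": -math.inf, "NaN": math.nan}


def parse_real(text: str) -> float:
    """Decode a real number with optional commas and e notation."""
    if text in SPECIAL_REALS:
        return SPECIAL_REALS[text]
    if not REAL.fullmatch(text):
        raise ValueError(f"not a real number: {text!r}")
    return float(text.replace(",", ""))  # "-0" gives negative zero
```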

Dates and Times

Dates and timestamps must be in a subset of Iso 8601 (see rfc 3339) formats. A Z suffix denotes utc. The seconds may have no fractional part, or one or more decimal places. The date may be given on its own. An implementation must interpret both upper and lower case T and Z but should output upper case. As : must be escaped, the colons in times are written \:.

<<Timestamps>
    2026-09-24T16\:45\:22.5383742
<&>
    2026-10-04T18\:25\:12Z
<&>
    2026-04-02
<$>>
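A Python sketch of reading these values once the \: escapes have been removed. parse_timestamp is a hypothetical name; it assumes the subset allows only a Z suffix or no suffix (no numeric offsets such as +13:00), and because Python's datetime stores microseconds, extra decimal places are truncated.

```python
import re
from datetime import date, datetime, timezone
from typing import Union

TIMESTAMP = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"(?:[Tt]([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]+))?([Zz])?)?"
)


def parse_timestamp(text: str) -> Union[date, datetime]:
    """Parse an unescaped date or timestamp, e.g. 2026-10-04T18:25:12Z."""
    m = TIMESTAMP.fullmatch(text)
    if m is None:
        raise ValueError(f"not a xenon date or timestamp: {text!r}")
    year, month, day, hour, minute, second, fraction, zulu = m.groups()
    if hour is None:
        return date(int(year), int(month), int(day))  # date only
    micro = int((fraction or "").ljust(6, "0")[:6])  # truncate to microseconds
    return datetime(
        int(year), int(month), int(day), int(hour), int(minute), int(second),
        micro, tzinfo=timezone.utc if zulu else None,
    )
```

A timestamp without Z is returned as a naive datetime, leaving its interpretation to the application.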

Guids/Uuids

Guids should be shown in lower case with hyphens, e.g.:

<Id=aa512e8e-cf97-445e-ac10-cb5a5ea3ef63>

An implementation must interpret both upper and lower case.
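A Python sketch using the standard uuid module. The function names are hypothetical; the regular expression is an assumption that restricts input to the hyphenated 8-4-4-4-12 form, since uuid.UUID on its own also accepts braces, a urn:uuid: prefix and unhyphenated text.

```python
import re
import uuid

GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def parse_guid(text: str) -> uuid.UUID:
    """Accept upper or lower case, but only the hyphenated form."""
    if not GUID.fullmatch(text):
        raise ValueError(f"not a xenon guid: {text!r}")
    return uuid.UUID(text)


def format_guid(value: uuid.UUID) -> str:
    return str(value)  # str() gives lower case with hyphens
```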

Binary

Binary data is stored in Base64 as per rfc 4648.

<Image=eOG0h+m04bS/ybQNCg\=\=>

Note that the padding (=) must be escaped in xenon.
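A Python sketch with the standard base64 module; the function names are hypothetical. Only = needs escaping, as the other Base64 characters (letters, digits, + and /) are not special in xenon.

```python
import base64


def format_binary(data: bytes) -> str:
    """Base64 as per rfc 4648, with the = padding escaped for xenon."""
    return base64.b64encode(data).decode("ascii").replace("=", "\\=")


def parse_binary(text: str) -> bytes:
    """Unescape the padding, then decode, rejecting non-alphabet characters."""
    return base64.b64decode(text.replace("\\=", "="), validate=True)
```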

Null

In xenon we use a :type of null to specify this special value, i.e. :null.

<Spouse:null=>

Names

Xenon is flexible with the characters allowed in entity names. Any character is allowed but some such as < or line feed must be escaped using \, e.g. \< or \n for line feed (new line). The name may not be the empty string.

Specifics

Documents must be utf-8 and should have a byte order mark. Newlines should be Internet/Windows style \r\n but may be Unix \n. Tabs should not be used; in indenting they expand to the next tab stop, every eight columns, as in Windows and Unix terminals. A tab’s width in spaces is 8 - ((column - 1) % 8), where the column of the tab is numbered from 1 and % is the remainder (modulus). Spacing is limited to the space and tab characters; no other characters are treated as whitespace.
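The tab width formula above, as a small Python sketch (tab_width and expand_tabs are hypothetical names):

```python
def tab_width(column: int) -> int:
    """Spaces a tab occupies when it starts at column (numbered from 1)."""
    return 8 - ((column - 1) % 8)


def expand_tabs(line: str) -> str:
    """Expand tabs to spaces with a tab stop every eight columns."""
    out = []
    column = 1
    for ch in line:
        if ch == "\t":
            width = tab_width(column)
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For a single line this gives the same result as Python's built-in str.expandtabs(8).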

Text Processing Algorithm

The algorithm for processing multiple lines of text is:

The first line is never unindented. If it is spacing (tabs and spaces) followed by a newline it is removed, otherwise it is left intact. Well formed xenon, as output by a xenon library, never has non-spacing text on the first line of indented output.
Tabs are expanded to spaces such that there is a tab stop every 8 columns, as in Windows and Unix terminals.
The text is unindented by the indentation of the line with the least indenting.
If all lines to be unindented are spacing, just the newlines are preserved.
If the scalar being unindented is in an array and the item ends with a newline followed only by spacing, that newline and spacing are removed.
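The steps above can be sketched in Python as follows. process_text is a hypothetical name; it takes the raw text of one scalar (between its markup), accepts \r\n or \n and outputs Internet/Windows line endings. It makes two assumptions the text leaves open: only the leading spacing of each line is tab-expanded, and spacing-only lines are ignored when finding the least indented line and become empty (otherwise the empty last line before <$>> would prevent any unindenting).

```python
import re

NEWLINE = re.compile(r"\r?\n")
SPACING = re.compile(r"[ \t]*")


def expand_leading(line: str) -> str:
    """Expand tabs in the leading spacing (tab stop every 8 columns)."""
    rest = line.lstrip(" \t")
    return line[: len(line) - len(rest)].expandtabs(8) + rest


def process_text(raw: str, in_array: bool = False) -> str:
    lines = NEWLINE.split(raw)
    # The first line is never unindented: drop it if it is only spacing
    # followed by a newline, otherwise keep it verbatim.
    if len(lines) > 1 and SPACING.fullmatch(lines[0]):
        head, body = [], lines[1:]
    else:
        head, body = lines[:1], lines[1:]
    # In arrays, a trailing newline followed only by spacing is removed.
    ends_blank = bool(body) and SPACING.fullmatch(body[-1]) is not None
    if in_array and ends_blank and len(head) + len(body) > 1:
        body = body[:-1]
    # Unindent by the smallest indent among lines that are not just spacing.
    body = [expand_leading(line) for line in body]
    indents = [len(line) - len(line.lstrip(" "))
               for line in body if not SPACING.fullmatch(line)]
    margin = min(indents, default=0)
    body = ["" if SPACING.fullmatch(line) else line[margin:] for line in body]
    return "\r\n".join(head + body)
```

Applied to the Poem, Description and Notes examples in the Scalars section, this sketch reproduces the extracted strings given there.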

Implementations

A high performance C#/.Net de/serializing implementation exists as does a JavaScript Antlr version. If you are interested in implementing xenon please email Gene Thomas.

Closing Remarks

Xenon is terser than json and has the advantages of xml.

Xenon stands for eXtensible Efficient Network Object Notation. Extensible, it is easy to define new languages built upon xenon. Efficient in terms of characters per unit of information, it is terse. Network, the data may be a network/graph, not just a tree. Object Notation meaning xenon is a notation for defining objects, arrays and scalars.

For more information see xenondata.org.

A. Index of Markup

`<`	Starts a markup sequence, terminated with `>` or `>>`.
`<name=value>`	A scalar setting name to value.
`<name>`	The start of an object, fields follow terminated with `<$>`.
`<<name>`	The start of an array. Items are delimited by `<&>` and terminated with `<$>>`.
`<$>`	The end of an object.
`<$>>`	The end of an array.
`<<name$$>>`	An empty named array.
`<<$$>>`	An empty array item (in an array).
`<&>`	Separates items in an array.
`<><$>`	An empty object in an array, i.e. one with no fields. The `<>` and `<$>` are omitted when an object in an array has fields.
`#id`	An identifier to be referred to from another part of the document using a reference `@id`.
`@id`	A reference to another part of the document marked with `#id`.
`:type`	The type of the entity.
`\|text`	An indented scalar.
`no markup`	In an array an item with no markup is a scalar.
`#id;` `:type;` `#id:type;` `:type#id;` `@ref;`	; terminates the id and/or type before an array item and references in an array.
`% comment`	A comment, terminated by a new line.
`\c`	Escape the next character, i.e. remove the special meaning; or specify a line feed (newline) `\n`, carriage return `\r` or tab `\t`. The following characters must be escaped outside comments `<>=$&#@:;\|\%!`
`\u{X...}`	A Unicode value escape sequence, e.g. `\u{1F60E}` for 😎.
`!`	Reserved for the future, must be escaped as `\!`.

B. References

Internet Rfc 2119: Key words for use in Rfcs to Indicate Requirement Levels