
Abstract Syntax Tree as JSON

Engelbert Niehaus edited this page May 3, 2018 · 22 revisions

In wtf_wikipedia, an Abstract Syntax Tree (AST) is a tree representation of the MediaWiki syntax of an article's source text (wiki markup). The data of the wiki article is stored in a JSON structure, which is valuable for managing the extracted content elements. The AST can be represented as JSON as well: each node of the tree denotes a content element (e.g. paragraph, header/title, image, mathematical expression) occurring in the source text downloaded via the MediaWiki API with wtf.fetch(...). The syntax tree is "abstract" in the sense that it does not represent any particular output format in detail (e.g. HTML, LaTeX, Markdown). The content of the AST nodes is encoded in wiki markup, and the tree structure can be used to generate the different output formats by applying the appropriate node handler (for titles, images, sentences, ...) to each node of the AST.
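The idea of one node handler per content element and per output format can be sketched as follows. This is only an illustration, not an existing wtf_wikipedia feature: the names `htmlHandlers`, `latexHandlers`, `escapeHtml` and `render` are hypothetical, and the nodes are assumed to carry `type`, `value` and `children` fields as in the example on this page.

```javascript
// Hypothetical handler tables: one table per output format,
// one function per node type. Each handler receives the node and
// the already rendered children (an array of strings).
const htmlHandlers = {
  paragraph: (node, parts) => `<p>${parts.join(" ")}</p>`,
  sentence: (node) => escapeHtml(node.value),
  title: (node) => `<h2>${escapeHtml(node.value)}</h2>`,
  math: (node) => `<span class="math">${escapeHtml(node.value)}</span>`
};

// Note: this sketch does not escape LaTeX special characters.
const latexHandlers = {
  paragraph: (node, parts) => parts.join(" "),
  sentence: (node) => node.value,
  title: (node) => `\\section{${node.value}}`,
  math: (node) => `\\[ ${node.value} \\]`
};

// Minimal HTML escaping for text content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a single AST node by rendering its children first and then
// passing the results to the handler registered for the node type.
function render(node, handlers) {
  const children = Array.isArray(node.children) ? node.children : [];
  const parts = children.map((child) => render(child, handlers));
  const handler = handlers[node.type];
  if (!handler) {
    throw new Error(`No handler for node type "${node.type}"`);
  }
  return handler(node, parts);
}
```

Supporting a further output format then means adding another handler table, while the tree and the `render` function stay unchanged.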

Similar to programming languages, abstract syntax trees can be derived from concrete syntax trees, which are traditionally generated by parsing a given string that conforms to a defined grammar.
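To make the step from source string to tree concrete, here is a toy parser for a very small, assumed subset of wiki markup: blocks separated by blank lines, `== Heading ==` lines, and `<math>...</math>` blocks. It is not the wtf_wikipedia parser, it skips the concrete-syntax-tree stage and builds AST nodes directly, and the function name `parseWikiText` is hypothetical.

```javascript
// Toy parser: turns a tiny subset of wiki markup into AST nodes
// of the form { type, value, children }.
function parseWikiText(source) {
  const ast = [];
  // Blocks are separated by one or more blank lines.
  for (const block of source.split(/\n\s*\n/)) {
    const text = block.trim();
    if (text === "") continue;

    // "== Heading ==" (any number of "=" signs, balanced on both sides)
    const heading = text.match(/^(=+)\s*(.+?)\s*\1$/);
    // "<math>...</math>" spanning the whole block
    const math = text.match(/^<math>([\s\S]*?)<\/math>$/);

    if (heading) {
      ast.push({ type: "title", value: heading[2], children: [] });
    } else if (math) {
      ast.push({ type: "math", value: math[1].trim(), children: [] });
    } else {
      // Naive sentence split: break after ".", "!" or "?" followed by whitespace.
      const sentences = text.split(/(?<=[.!?])\s+/).map((s) => ({
        type: "sentence",
        value: s,
        children: []
      }));
      ast.push({ type: "paragraph", value: "", children: sentences });
    }
  }
  return ast;
}
```

Real wiki markup (templates, links, nested lists, inline math) needs a far more careful grammar; the sketch only shows how grammar rules map to node types.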

Example of an AST

The following example is currently not an available feature of wtf_wikipedia. It could serve as a basis for generating export formats that will never be implemented in wtf_wikipedia itself.

Wiki Page >-> wtf_wikipedia.js >-> AST >-> ast2odf.js >-> Open Document Format

{
    "language": "en",
    "domain": "wikiversity",
    "article": "Water",
    "ast": [
        {
            "type": "paragraph",
            "value": "",
            "children": [
                {
                    "type": "sentence",
                    "value": "My first sentence.",
                    "children": []
                },
                {
                    "type": "sentence",
                    "value": "My Second sentence.",
                    "children": []
                }
            ]
        },
        {
            "type": "title",
            "value": "My Title",
            "children": []
        },
        {
            "type": "math",
            "value": "\\sum_{k=1}^{n} k^2",
            "children": []
        }
    ]
}
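A structure like the one above can be processed with a plain depth-first traversal. The following is a minimal sketch under the assumption that every node has the `type`, `value` and `children` fields shown in the example; the helpers `walk` and `outline` are hypothetical names and not part of wtf_wikipedia.

```javascript
// Depth-first traversal over a list of AST nodes.
// Tolerates `children` being null or missing.
function walk(nodes, visit, depth = 0) {
  for (const node of nodes || []) {
    visit(node, depth);
    walk(node.children, visit, depth + 1);
  }
}

// A shortened copy of the example document. Note the doubled backslash:
// inside a JSON or JavaScript string a literal "\" is written as "\\".
const doc = {
  language: "en",
  domain: "wikiversity",
  article: "Water",
  ast: [
    {
      type: "paragraph",
      value: "",
      children: [
        { type: "sentence", value: "My first sentence.", children: [] },
        { type: "sentence", value: "My Second sentence.", children: [] }
      ]
    },
    { type: "title", value: "My Title", children: [] },
    { type: "math", value: "\\sum_{k=1}^{n} k^2", children: [] }
  ]
};

// Build an indented outline of the tree, one line per node.
function outline(document) {
  const lines = [];
  walk(document.ast, (node, depth) => {
    const label = node.value ? `${node.type}: ${node.value}` : node.type;
    lines.push("  ".repeat(depth) + label);
  });
  return lines;
}

console.log(outline(doc).join("\n"));
```

A converter such as the ast2odf.js step in the pipeline above could use the same traversal and replace the outline callback with format-specific output.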