Lagarto properties

Lagarto has two sets of configuration properties: one for LagartoParser and one for LagartoDOMBuilder.

LagartoParser configuration


Disabled by default, this property enables calculation of element positions: line, column (approximate) and offset. Enabling it slows down processing.


When enableConditionalComments is set to true (the default), the Lagarto parser detects IE conditional comments. If you know that your HTML does not contain any, you can disable this property to gain some speed. If it is disabled and a conditional comment is found, Lagarto will either report an error for revealed conditional comment tags or treat downlevel-hidden conditional comments as regular comments.


Special tags (in Lagarto) are: style, script and xmp. The parseSpecialTagsAsCdata flag determines whether the content of special tags is parsed; in other words, whether it is treated as PCDATA or as a CDATA section. When the content is not parsed, it is taken as is, so characters like < or & are allowed. Otherwise, the content has to be correctly encoded.

LagartoDOMBuilder configuration

LagartoDOMBuilder is the Lagarto DOM builder. It has a few additional properties that can be used to read and generate HTML, XHTML or XML content.


The caseSensitive flag defines whether tag names and attribute names are case sensitive.


Regardless of content type, this flag simply defines whether the resulting DOM tree should contain comments.


The voidTags property holds the list of void element names. By default, it contains all HTML5 void elements. If set to null, there are no void elements.


When an element is a void element, the selfCloseVoidTags flag defines whether it can be self-closed or whether it should have the standard end tag.


The impliedEndTags flag enables the rules for implicit end tags. A number of tags do not require a closing tag in valid HTML (body, li, dd, dt, p, td, tr,…). When this flag is on, these tags are implicitly closed if needed and no error or warning is reported.

This feature slows down parsing somewhat. If you close all your tags explicitly, consider switching it off to improve performance.
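The idea of implied end tags can be illustrated with a small, self-contained sketch. It is a toy model, not Lagarto's implementation: the class name, the token representation and the rule table are all assumptions made for illustration, and the rule table covers only a simplified subset of the HTML rules.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy model of implied end tags, not Lagarto code.
class ImpliedEndTagsSketch {

    // Assumed, simplified rule table: an open element (key) is implicitly
    // closed when one of the listed start tags (value) follows it.
    static final Map<String, Set<String>> CLOSED_BY = Map.of(
        "li", Set.of("li"),
        "dt", Set.of("dt", "dd"),
        "dd", Set.of("dt", "dd"),
        "p",  Set.of("p"),
        "tr", Set.of("tr"),
        "td", Set.of("td", "th"));

    // Tokens are either "<name>", "</name>" or plain text.
    static String closeImplied(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();   // stack of open element names
        StringBuilder out = new StringBuilder();
        for (String t : tokens) {
            if (t.startsWith("</")) {
                String name = t.substring(2, t.length() - 1);
                if (open.contains(name)) {
                    // Close elements still open inside the one being closed.
                    while (!open.peek().equals(name)) {
                        out.append("</").append(open.pop()).append('>');
                    }
                    open.pop();
                }
                out.append(t);
            } else if (t.startsWith("<")) {
                String name = t.substring(1, t.length() - 1);
                String top = open.peek();
                // A new start tag may implicitly end the element on top of the stack.
                if (top != null && CLOSED_BY.getOrDefault(top, Set.of()).contains(name)) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.push(name);
                out.append(t);
            } else {
                out.append(t);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <ul><li>one</li><li>two</li></ul>
        System.out.println(closeImplied(
            List.of("<ul>", "<li>", "one", "<li>", "two", "</ul>")));
    }
}
```

The extra bookkeeping (a stack lookup on every start and end tag) hints at why such rules add some parsing overhead.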


The ignoreWhitespacesBetweenTags flag is used in XML mode to ignore all whitespace content between two start tags or between two end tags. Whitespace content between a start tag and an end tag is still preserved.
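The rule can be made concrete with a short sketch. It follows a literal reading of the description above and is not taken from Lagarto: the class, the method names and the token list are hypothetical, and whitespace between an end tag and a following start tag is simply kept, since the description does not cover that case.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a literal reading of the whitespace rule, not Lagarto code.
class WhitespaceBetweenTagsSketch {

    static boolean isStartTag(String t) {
        return t.startsWith("<") && !t.startsWith("</");
    }

    static boolean isEndTag(String t) {
        return t.startsWith("</");
    }

    // Tokens are either "<name>", "</name>" or plain text.
    static List<String> dropIgnorable(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean blankText = !t.startsWith("<") && t.trim().isEmpty();
            if (blankText && i > 0 && i < tokens.size() - 1) {
                String prev = tokens.get(i - 1);
                String next = tokens.get(i + 1);
                boolean twoStarts = isStartTag(prev) && isStartTag(next);
                boolean twoEnds = isEndTag(prev) && isEndTag(next);
                if (twoStarts || twoEnds) {
                    continue;   // whitespace that is not content: ignore it
                }
            }
            kept.add(t);        // tags, real text, and start/end whitespace stay
        }
        return kept;
    }

    public static void main(String[] args) {
        // The blank between <b> and </b> is kept; the other two blanks are dropped.
        System.out.println(dropIgnorable(
            List.of("<a>", "\n  ", "<b>", " ", "</b>", "\n", "</a>")));
    }
}
```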


When enabled, LagartoDOMBuilder will collect all errors during parsing and provide them via the getErrors() method.


The conditionalCommentExpression property specifies the expression for IE conditional comments that evaluates to true. By default, it is "if !IE", meaning the HTML code will be treated as if read by a non-IE browser. If set to null, all conditional comments will be part of the DOM tree.


Holds an instance of LagartoNodeHtmlRenderer that builds the HTML string from the DOM tree. You can tweak the renderer further or provide your own implementation. For example, you can set the case of tag and attribute names, or modify how some attributes are rendered.

Predefined parsing modes

There are three predefined parsing modes: HTML, XHTML and XML. They can be easily set by calling enableXxxMode() on LagartoDOMBuilder. These methods configure the builder to work with HTML, XHTML or XML code. Besides these three modes, there are some additional modes that can be turned on. Here are the details.

HTML mode (default)

    ignoreWhitespacesBetweenTags = false;       // collect all whitespaces
    caseSensitive = false;                      // HTML is case insensitive
    parseSpecialTagsAsCdata = true;             // script and style tags are CDATA
    voidTags = HTML5_VOID_TAGS;                 // list of void tags
    selfCloseVoidTags = false;                  // don't self close void tags
    impliedEndTags = true;                      // some tags end is implied
    enableConditionalComments = true;           // enable IE conditional comments
    conditionalCommentExpression = "if !IE";    // treat HTML as non-IE browser

XHTML mode

    ignoreWhitespacesBetweenTags = false;       // collect all whitespaces
    caseSensitive = true;                       // XHTML is case sensitive
    parseSpecialTagsAsCdata = false;            // all tags are parsed the same
    voidTags = HTML5_VOID_TAGS;                 // list of void tags
    impliedEndTags = false;                     // no implied tag ends
    selfCloseVoidTags = true;                   // self close void tags
    enableConditionalComments = true;           // enable IE conditional comments
    conditionalCommentExpression = "if !IE";    // treat XHTML as non-IE browser

XML mode

    ignoreWhitespacesBetweenTags = true;        // ignore whitespaces that are no content
    caseSensitive = true;                       // XML is case sensitive
    parseSpecialTagsAsCdata = false;            // all tags are parsed the same
    voidTags = null;                            // there are no void tags
    selfCloseVoidTags = false;                  // don't self close empty tags (can be changed!)
    impliedEndTags = false;                     // no implied tag ends
    enableConditionalComments = false;          // disable IE conditional comments
    conditionalCommentExpression = null;        // don't use

You can further change these predefined modes by setting individual flags.
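The relationship between a preset and an individual override can be sketched with a plain stand-in class. Everything here is hypothetical and only mirrors the flag names and values listed above; it is not the Lagarto API, and the void tag list is an illustrative assumption that may differ from Lagarto's own HTML5_VOID_TAGS.

```java
import java.util.Set;

// Hypothetical stand-in for the builder configuration, not the Lagarto API.
class ParsingModeSketch {

    // Illustrative list; Lagarto's own HTML5_VOID_TAGS may differ.
    static final Set<String> HTML5_VOID_TAGS = Set.of(
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr");

    boolean ignoreWhitespacesBetweenTags;
    boolean caseSensitive;
    boolean parseSpecialTagsAsCdata;
    Set<String> voidTags;
    boolean selfCloseVoidTags;
    boolean impliedEndTags;
    boolean enableConditionalComments;
    String conditionalCommentExpression;

    // Arguments follow the order of the fields above.
    ParsingModeSketch(boolean ignoreWs, boolean caseSensitive, boolean specialAsCdata,
                      Set<String> voidTags, boolean selfClose, boolean impliedEnd,
                      boolean conditionalComments, String ccExpression) {
        this.ignoreWhitespacesBetweenTags = ignoreWs;
        this.caseSensitive = caseSensitive;
        this.parseSpecialTagsAsCdata = specialAsCdata;
        this.voidTags = voidTags;
        this.selfCloseVoidTags = selfClose;
        this.impliedEndTags = impliedEnd;
        this.enableConditionalComments = conditionalComments;
        this.conditionalCommentExpression = ccExpression;
    }

    static ParsingModeSketch html() {
        return new ParsingModeSketch(false, false, true, HTML5_VOID_TAGS, false, true, true, "if !IE");
    }

    static ParsingModeSketch xhtml() {
        return new ParsingModeSketch(false, true, false, HTML5_VOID_TAGS, true, false, true, "if !IE");
    }

    static ParsingModeSketch xml() {
        return new ParsingModeSketch(true, true, false, null, false, false, false, null);
    }

    public static void main(String[] args) {
        ParsingModeSketch config = xml();   // start from a preset...
        config.selfCloseVoidTags = true;    // ...then override a single flag
        System.out.println(config.selfCloseVoidTags + " " + config.caseSensitive); // prints "true true"
    }
}
```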

HTML-Plus mode

The default HTML mode does not change the order of the nodes. However, the HTML5 specification has some rules under which nodes are moved around the DOM. For example, tags that appear inside a table but outside of the table-specific tags are moved in front of the table. There are also special rules on which orphan tags may be closed and the scope in which they can be closed.

LagartoDOMBuilder offers an HTML-Plus mode for building the tree, with some additional rules turned on. These rules require some additional processing.

Debug mode

You can easily turn debugging on or off with enableDebug() and disableDebug(). Debugging means that errors are collected and positions are calculated.

Void and self-closing elements

(X)HTML modes are aware of void elements. The Lagarto DOM builder tries to follow the specification and, therefore:

  • void elements in HTML mode are not self-closed
  • void elements in XHTML mode are self-closed

Regular content elements are closed with an end tag, even when they are empty.

XML mode is not aware of void elements. By default, empty elements are closed by a closing tag, but that can be changed to self-closing tags.
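These rules can be summarized in a small rendering sketch. It is a hypothetical helper, not Lagarto's renderer. It assumes that a void element which is not self-closed is written as a bare start tag such as `<br>`, that self-closed tags are written without a space before the slash, and that in XML mode (a null void tag list) the self-closing flag applies to every empty element.

```java
import java.util.Set;

// Illustrative only: how an empty element might be written out per mode.
class EmptyElementSketch {

    static String renderEmpty(String name, Set<String> voidTags, boolean selfClose) {
        if (voidTags == null) {
            // XML-like mode: no void elements, any empty element may be self-closed.
            return selfClose ? "<" + name + "/>" : "<" + name + "></" + name + ">";
        }
        if (voidTags.contains(name)) {
            // Void element: self-closed (XHTML) or left as a bare start tag (HTML).
            return selfClose ? "<" + name + "/>" : "<" + name + ">";
        }
        // Regular content element: always gets an end tag, even when empty.
        return "<" + name + "></" + name + ">";
    }

    public static void main(String[] args) {
        Set<String> voidTags = Set.of("br", "hr", "img");          // shortened, illustrative list
        System.out.println(renderEmpty("br", voidTags, false));   // HTML mode:  <br>
        System.out.println(renderEmpty("br", voidTags, true));    // XHTML mode: <br/>
        System.out.println(renderEmpty("div", voidTags, true));   // <div></div>
        System.out.println(renderEmpty("item", null, false));     // XML default: <item></item>
        System.out.println(renderEmpty("item", null, true));      // XML, flag changed: <item/>
    }
}
```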

Fixing malformed content

Lagarto DOM tries to handle malformed content in a user-friendly way. How errors are fixed also depends on the parsing mode. Every error is logged as a warning.

Lagarto DOM is not strict about the content and cannot be used for validation.

Using predefined modes

Here is how predefined parsing modes can be used.

    import jodd.lagarto.dom.Document;
    import jodd.lagarto.dom.LagartoDOMBuilder;

    LagartoDOMBuilder lagartoDOMBuilder = new LagartoDOMBuilder();
    Document doc = lagartoDOMBuilder.enableHtmlMode().parse(content);