
Jerry is jQuery in Java: a fast and concise Java library that simplifies parsing, traversing and manipulating HTML documents. Jerry is designed to change the way you parse HTML content.
Jerry belongs to the jodd-lagarto module and depends on SLF4J.
import static jodd.jerry.Jerry.jerry;
...
Jerry doc = jerry(html);
doc.$("div#jodd p.neat").css("color", "red").addClass("ohmy");
We tried to keep the Jerry API as close to jQuery as possible; in some cases you can simply copy jQuery code and paste it into Java! Of course, there are some differences, because the code runs in a different environment.
In Java there is no document context as there is in browsers and JavaScript, so we need to create one first. To do that, simply pass the HTML content to the Jerry static factory method. That creates a root Jerry set containing the Document root node of the parsed DOM tree.
In the background, Jerry invokes the Lagarto DOM parser to parse the content and build the DOM tree.
You can use most standard CSS selectors and most of the jQuery CSS selector extensions. CSS selectors are supported by CSSelly.
As Jerry speaks Java, some parts of the API differ to make it more Java-friendly. For example, the css() method accepts an array of property/value pairs, and not a single string.
jerry(html).$("tr:last").css("background-color", "yellow", "fontWeight", "bolder");
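To illustrate the calling convention, here is a minimal, self-contained sketch of how a varargs method can consume alternating property/value arguments. The class `CssPairs` and the method `toStyle` are hypothetical and not part of Jerry; the sketch only models the shape of the call, not how Jerry stores or serializes styles internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CssPairs {

    // Hypothetical helper, NOT Jerry's code: folds alternating
    // property/value arguments into a style attribute string.
    public static String toStyle(String... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected property/value pairs");
        }
        // LinkedHashMap keeps insertion order; a repeated property overwrites its value.
        Map<String, String> styles = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            styles.put(pairs[i], pairs[i + 1]);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : styles.entrySet()) {
            sb.append(e.getKey()).append(':').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toStyle("background-color", "yellow", "font-weight", "bolder"));
    }
}
```

A varargs signature keeps the call on one line without requiring the caller to build a map first, which is roughly the trade-off the css() call above makes.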
Similarly, the each() method receives a callback instance, so it is not as fluent as in JavaScript ;):
// collects the text of all selected options
final StringBuilder str = new StringBuilder();
doc.$("select option:selected").each(new JerryFunction() {
    public boolean onNode(Jerry $this, int index) {
        str.append($this.text()).append(' ');
        return true;
    }
});
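Because the callback has a single method, on Java 8 or later this style of callback can usually be written as a lambda. The following self-contained sketch models the pattern without depending on Jerry: `NodeCallback` and `each` are hypothetical stand-ins, and the rule that returning `false` stops the loop is an assumption of this sketch, since the text above does not say what the return value means.

```java
import java.util.Arrays;
import java.util.List;

public class EachSketch {

    // Hypothetical stand-in for a single-method callback such as JerryFunction.
    @FunctionalInterface
    public interface NodeCallback {
        boolean onNode(String node, int index);
    }

    // Calls the callback for each item in order.
    // Assumption of this sketch: returning false stops the iteration.
    // Returns the number of items the callback was invoked on.
    public static int each(List<String> nodes, NodeCallback callback) {
        int visited = 0;
        for (int i = 0; i < nodes.size(); i++) {
            visited++;
            if (!callback.onNode(nodes.get(i), i)) {
                break;
            }
        }
        return visited;
    }

    // Joins items with a trailing space until stopAt is reached.
    public static String joinUntil(List<String> nodes, String stopAt) {
        StringBuilder str = new StringBuilder();
        each(nodes, (node, index) -> {
            if (node.equals(stopAt)) {
                return false; // stop early
            }
            str.append(node).append(' ');
            return true; // keep going
        });
        return str.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinUntil(Arrays.asList("one", "two", "three"), "three"));
    }
}
```

The lambda can append to `str` because the variable itself is never reassigned; only the StringBuilder it refers to is mutated.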
As Jerry is all about static manipulation of HTML content, jQuery methods and selectors related to dynamic activity are not supported. This includes animations, Ajax calls, and selectors that depend on CSS definitions.
Jerry internally uses the Lagarto DOM builder to parse input content and to produce HTML code. The builder is configurable and supports predefined parsing modes for HTML, XHTML and XML.
By default, Jerry uses the builder in HTML parsing mode. Here is an example of how to select a predefined parsing mode explicitly:
Jerry jerry = jerry().enableHtmlMode().parse(html);
To configure it further, we can use the following snippet:
JerryParser jerryParser = Jerry.jerry();
jerryParser.getDOMBuilder().setIgnoreComments(true);
Jerry jerry = jerryParser.parse(xhtml);
For all details about configuration and parsing modes, see the documentation for Lagarto, the parser used by Jerry.
The following examples depend on the content of live web pages. At some point these pages may change in a way that makes the examples stop working.
The site allmusic.com shows new music releases in the right column. Here is simple code that downloads the page, parses it and prints all releases to the console:
public class AllMusicNewReleases {
    public static void main(String[] args) throws IOException {
        // download the page super-efficiently
        File file = new File(SystemUtil.getTempDir(), "allmusic.html");
        NetUtil.downloadFile("http://allmusic.com", file);
        // create Jerry, i.e. document context
        Jerry doc = Jerry.jerry(FileUtil.readString(file));
        // parse
        doc.$("div#new_releases div.list_item").each(new JerryFunction() {
            public boolean onNode(Jerry $this, int index) {
                System.out.println("-----");
                System.out.println($this.$("div.album_title").text());
                System.out.println($this.$("div.album_artist").text().trim());
                return true;
            }
        });
    }
}
Nice :)
Let's remove the toolbar from the Google page and replace the Google logo image with simple HTML text.
public class ChangeGooglePage {
    public static void main(String[] args) throws IOException {
        // download the page super-efficiently
        File file = new File(SystemUtil.getTempDir(), "google.html");
        NetUtil.downloadFile("http://google.com", file);
        // create Jerry, i.e. document context
        Jerry doc = Jerry.jerry(FileUtil.readString(file));
        // remove div for toolbar
        doc.$("div#mngb").detach();
        // replace logo with html content
        doc.$("div#lga").html("<b>Google</b>");
        // produce clean html...
        String newHtml = doc.html();
        // ...and save it to file system
        FileUtil.writeString(
            new File(SystemUtil.getTempDir(), "google2.html"),
            newHtml);
    }
}
Easy peasy!
To demonstrate the power of Jerry, we created a little Facebook bot just for fun :) The task was to create a bot that logs in to a Facebook account, lists friend suggestions and sends a few 'Add friend' requests. To see how, read it here.
Jerry can be used to parse XML files, too! We needed to parse Maven POM files in order to display dependencies on our download page. First, we had to enable the XML mode of the Lagarto parser:
Jerry.JerryParser jerryParser = new Jerry.JerryParser();
jerryParser.enableXmlMode();
Jerry doc = jerryParser.parse(FileUtil.readString(pomFile));
and then we can access the content via CSS selectors, for example:
Jerry dependencies = doc.$("dependencies dependency");
dependencies.each(new JerryFunction() {
    @Override
    public boolean onNode(Jerry $this, int index) {
        // skip test dependencies
        if ($this.$("scope").text().equals("test")) {
            return true;
        }
        String artifactId = $this.$("artifactId").text();
        String optionalStr = $this.$("optional").text();
        // process
        return true;
    }
});
You can find the complete code here.
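For comparison only, here is a sketch of the same filtering step written against the JDK's built-in DOM API instead of Jerry. This is a swapped-in technique, not how Jerry or the download page code works; the class `PomDependencies` and its methods are hypothetical names introduced for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PomDependencies {

    // Returns the trimmed text of the first descendant element with the
    // given tag name, or an empty string if there is none.
    static String firstText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // Collects the artifactId of every dependency whose scope is not "test".
    public static List<String> nonTestArtifacts(String pomXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse POM content", e);
        }
        NodeList deps = doc.getElementsByTagName("dependency");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            // skip test dependencies
            if ("test".equals(firstText(dep, "scope"))) {
                continue;
            }
            result.add(firstText(dep, "artifactId"));
        }
        return result;
    }
}
```

The selector-based version above expresses the same traversal in a single `$("dependencies dependency")` call, which is the convenience Jerry is aiming for.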
Jerry can be used very nicely in Groovy! Rob Fletcher uses it in Groovy tests like this (snippet):
import static jodd.jerry.Jerry.jerry as $
def dom = $(output)
def labels = dom.find('label')
labels*.attr('for') == ['foo_hours', 'foo_minutes', 'foo_seconds']
labels*.text() == ['\u00a0hours ', '\u00a0minutes ', '\u00a0seconds ']
labels.every {
    it.find(':input').attr('id') == it.attr('for')
}
For more details, check the unit tests in grails-joda-time.