Sunday, September 07, 2008

Loading a page with JavaScript programmatically - HttpClient x HttpUnit

Hi Guys

You might know that I really enjoy surfing, both as a sport to practise myself (although since moving to England I haven't done as much as I could or should) and as a sport to watch. I like following the WCT, the main, closed championship with the 45 best surfers in the world, and its qualifier, the WQS. In my spare time I was putting together a simple app to load the current rankings and then, based on the state of any competitions still in progress, generate an updated ranking.

I thought of using HttpClient to load the current rankings page, but the page uses JavaScript, with document.writeln('...') calls, to write the table, and HttpClient doesn't run JavaScript.
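
To illustrate the problem: a plain HTTP fetch of such a page returns the script source, not the table it would write. The sketch below is only a rough workaround under strong assumptions, not what the rest of this post does. It assumes every document.write/document.writeln call takes a single single-quoted string literal with no concatenation or variables. The class name WritelnExtractor and the sample markup in main are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WritelnExtractor {

    // Matches document.write('...') and document.writeln('...') where the
    // argument is one single-quoted literal (assumption: no concatenation).
    private static final Pattern WRITE_CALL = Pattern.compile(
            "document\\.write(ln)?\\(\\s*'((?:[^'\\\\]|\\\\.)*)'\\s*\\)");

    /** Concatenates the literals passed to document.write/writeln. */
    public static String extract(String pageSource) {
        StringBuilder html = new StringBuilder();
        Matcher m = WRITE_CALL.matcher(pageSource);
        while (m.find()) {
            html.append(unescape(m.group(2)));
            if (m.group(1) != null) {
                html.append('\n'); // writeln appends a newline
            }
        }
        return html.toString();
    }

    // Simplified unescaping: drops the backslash and keeps the next character,
    // which covers \' and \\ but does not interpret sequences such as \n.
    private static String unescape(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length()) {
                out.append(literal.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical page source, standing in for what a plain fetch returns.
        String source = "<script>\n"
                + "document.writeln('<table>');\n"
                + "document.writeln('<tr><td>1</td><td>Surfer A</td></tr>');\n"
                + "document.writeln('</table>');\n"
                + "</script>";
        System.out.print(extract(source));
    }
}
```

This breaks as soon as the script builds its strings dynamically, which is why a client that actually executes the JavaScript is the more general approach.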

  • HttpUnit

Then I remembered that I had done some website client automation with HttpUnit in the past, so I decided to use it.

  • HttpUnit Maven dependency
The first problem is adding it as a Maven dependency. The latest release, 1.7 (from March 2008), is not available in the main ibiblio Maven 2 repository. I found it in the java.net Maven 2 repository. To configure Maven 2 to use it, add a settings.xml to the .m2 folder in your home directory with the following content:

<settings>
  <profiles>
    <profile>
      <id>myprofile</id>
      <repositories>
        <repository>
          <id>java.net-maven2-repository</id>
          <url>http://download.java.net/maven/2/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>myprofile</activeProfile>
  </activeProfiles>
</settings>


The following XML needs to be added to your POM to import the right libraries into the project:

<dependency>
  <groupId>javanettasks</groupId>
  <artifactId>httpunit</artifactId>
  <version>1.7</version>
  <scope>compile</scope>
</dependency>

  • Missing something?
OK, I re-ran mvn eclipse:eclipse to regenerate my Eclipse project files, then set up a Java snippet that makes a simple request for a web page and prints it. That should show my ranking, since HttpUnit states on this page that it supports "document.writeln".
First problem: it throws a java.lang.NoClassDefFoundError: org/mozilla/javascript/Scriptable. After a bit more research, I found the library that provides that class, Mozilla Rhino, and added it to my pom.xml:


<dependency>
  <groupId>rhino</groupId>
  <artifactId>js</artifactId>
  <version>1.6R5</version>
  <scope>compile</scope>
</dependency>


  • Error during JavaScript processing:
I regenerated the project from the POM again and refreshed it in Eclipse, only to run into a further problem:

RHINO USAGE WARNING: Missed Context.javaToJS() conversion:
Rhino runtime detected object [Ljava.lang.Object;@110003 of class [Ljava.lang.Object; where it expected String, Number, Boolean or Scriptable instance. Please check your code for missing Context.javaToJS() call.
java.net.ConnectException: Connection refused: connect

I noticed in the HttpUnit 1.7 release notes that the Rhino library was upgraded to version 1.6R5. I tried changing to that version, but it didn't help much.

Depending on the page I try to load, I get a different message. For example, for the fotolog.net site:

failed: org.mozilla.javascript.EcmaError: TypeError: Cannot find function attachEvent in object [object Window]. (httpunit#10(eval)#1)

Any suggestion to make this work?

My resolved classpath (created by maven) is:
httpunit-1.7.jar
js-1.6R5.jar
jtidy-4aug2000r7-dev.jar
junit-3.8.1.jar
nekohtml-0.9.5.jar
servlet-api-2.4.jar
xercesImpl-2.4.0.jar

I'm using the following code to test:

import com.meterware.httpunit.GetMethodWebRequest;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;

public class Main {
    public static void main(String[] args) throws Exception {
        // Rankings page whose table is written by document.writeln calls
        String URL = "http://www.aspworldtour.com/2008/ratings.asp?rView=w&rpage=menwqs&rRat=menwqs1&rNav=Men";

        // WebConversation plays the role of the browser session
        WebConversation wc = new WebConversation();
        WebRequest req = new GetMethodWebRequest(URL);

        // Fetching the response is where HttpUnit runs the page's scripts
        WebResponse resp = wc.getResponse(req);

        // Print the resulting page text
        String text = resp.getText();
        System.out.println(text);
    }
}

UPDATE:
I then tried HtmlUnit (as Mrs.Gredler suggested) on the same site, and this is the error I get instead:
Exception in thread "main" ======= EXCEPTION START ======== Exception class=[org.mozilla.javascript.WrappedException] com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Connection refused: connect (http://ad.outsidehub.com/st?ad_type=ad&ad_size=728x90&section=337979#1) (http://oascentral.surfline.com/RealMedia/ads/adstream_mjx.ads/www.aspworldtour.com/1552025455@Top?#8)

2 comments:

Mrs.Gredler said...

Give HtmlUnit a try. It's in the central Maven repo, all of the transitive dependencies will take care of themselves, and it has much better JavaScript support than HttpUnit.

test account3 said...

I am also getting the following error for the following code:

org.mozilla.javascript.EcmaError: TypeError: Cannot find function attachEvent in object [object Image].

CODE:

WebConversation wc = new WebConversation();
WebRequest req = new GetMethodWebRequest("http://www.google.com");
WebResponse resp = wc.getResponse(req);

any help?