Sunday, June 14, 2009

Using Apache's HttpClient on Google App Engine

If you, like me, have tried to use Google's URL Fetch Java API on the Google App Engine, you've probably been disappointed. Sure, it's a small, clean API, but it's totally featureless. The most advanced thing it supports seems to be ... well, it can follow redirects automatically. Wow. Cookies? Authentication? Forget it.

In ESXX, I use Apache's HttpClient 4, and it works really well. Wouldn't it be nice if you could use HttpClient on the App Engine too? Well, now you can. All it takes is a custom connection manager that converts the final requests and feeds them into the URL Fetch service, and then feeds the responses back into HttpClient.

You can have a look at the implementation here. It's just two classes: GAEConnectionManager, a ClientConnectionManager, and GAEClientConnection, a ManagedClientConnection.
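The adapter idea works because HttpClient drives a connection in steps (sendRequestHeader(), then sendRequestEntity() for entity-enclosing requests such as POST, then the receive calls), whereas URL Fetch takes one complete request and returns one complete response. Below is a minimal, JDK-only model of that buffering idea, not the actual GAEClientConnection code: FetchRequest, FetchResponse and Fetcher are hypothetical stand-ins for the URL Fetch service, the method names merely echo HttpClient's connection interface with simplified signatures, and issuing the fetch lazily on the first receiveResponse() call is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model: buffer a request that arrives in pieces, then send it as ONE fetch.
 * FetchRequest, FetchResponse and Fetcher are hypothetical stand-ins for URL Fetch.
 */
public class BufferedFetchConnection {
    public static final class FetchRequest {
        public final String method;
        public final String url;
        public final List<String[]> headers; // {name, value} pairs; names may repeat
        public final byte[] payload;

        public FetchRequest(String method, String url, List<String[]> headers, byte[] payload) {
            this.method = method;
            this.url = url;
            this.headers = headers;
            this.payload = payload;
        }
    }

    public static final class FetchResponse {
        public final int status;
        public final byte[] content;

        public FetchResponse(int status, byte[] content) {
            this.status = status;
            this.content = content;
        }
    }

    /** Hypothetical one-shot transport: one complete request in, one complete response out. */
    public interface Fetcher {
        FetchResponse fetch(FetchRequest request);
    }

    private final Fetcher fetcher;
    private final List<String[]> headers = new ArrayList<>();
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private String method;
    private String url;
    private FetchResponse response;

    public BufferedFetchConnection(Fetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Starts a new request: remember the request line and headers, drop any old state. */
    public void sendRequestHeader(String method, String url, List<String[]> headers) {
        this.method = method;
        this.url = url;
        this.headers.clear();
        this.headers.addAll(headers);
        body.reset();
        response = null;
    }

    /** Buffers the body instead of streaming it; a null entity is simply ignored. */
    public void sendRequestEntity(byte[] entity) {
        if (entity != null) {
            body.write(entity, 0, entity.length);
        }
    }

    /** Performs the single fetch the first time the response is needed, then reuses it. */
    public FetchResponse receiveResponse() {
        if (response == null) {
            response = fetcher.fetch(
                    new FetchRequest(method, url, new ArrayList<>(headers), body.toByteArray()));
        }
        return response;
    }
}
```

Buffering the whole body in memory gives up streaming, but that matches a one-shot service whose request payload is a complete byte array anyway.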

PS. ESXX now runs really well on GAE. Timers, the http, https, mailto, jdbc (in-memory H2) and data URI protocols, and HTML parsing: yep, it all works! Only the dns and ldap protocols are non-functional (and they will probably never work). Check out trunk from Subversion to try it yourself. Build using ant gae-war.

Updated 2009-12-11: The URIs were double-encoded and query parameters did not work at all. Thanks for pointing this out, Thibaut!

Updated 2010-08-08: The URIs lacked a colon before the optional port number. Thanks for pointing this out, Nello! Also, I changed the license for the two files to LGPLv3.

36 comments:

Anonymous said...

I tried to compile your code in my GAE project, but I get the following errors. Am I missing something?

java.net.InetAddress is not supported by Google App Engine's Java runtime environment

java.net.Socket is not supported by Google App Engine's Java runtime environment

javax.net.ssl.SSLSession is not supported by Google App Engine's Java runtime environment

Leviticus said...

Did you attach the connection manager to your HttpClient? (See getHttpClient() in http://svn.berlios.de/wsvn/esxx/trunk/src/org/esxx/js/protocol/HTTPHandler.java)

For the GAE code path, the connection manager has already been set using setConnectionManager() from http://svn.berlios.de/wsvn/esxx/trunk/jee/org/esxx/ESXXServlet.java.

Roger said...

Was wondering if this somehow could be used to make HtmlUnit work on GAE (which uses HttpClient). I know, not your project heh.

I tried to use HtmlUnit on GAE to visit a page (https://) and it freaked out badly, hah.

I even did this (which I read about all over the place for HttpClient/SSL, though I have no clue what it does):

Protocol easyhttps = new Protocol("https", new EasySSLProtocolSocketFactory(), 443);
Protocol.registerProtocol("https", easyhttps);

Anonymous said...

Hi,

Great post, thanks for sharing that.
Your connection factory works very well.

I have one problem with cookies that I'd like to ask about.

After a GET request, the cookie list is empty, although there are some cookies when I run the same code in the development environment.

I will appreciate any help with that.

Leviticus said...

Roger, I'm afraid I don't know much about HtmlUnit, despite both ESXX and HtmlUnit being based on Rhino.

However, I think the socket factory approach is probably too low-level for GAE, which doesn't even allow sockets to be created, much less connected.

You need to inject a connection manager instead and work at the connection level, like I did.

Leviticus said...

Anonymous, just to make sure, I tried the following code in ESXX:

function handleGet(req) {
  let url = new URI("http://martin.blom.org/cgi-bin/cookie.sh");
  url.load();
  url.load();
  return url.jars.toSource();
}

When invoked, the function returns a list of cookies, and the cookies are also set correctly by the second load() request (I dumped the traffic with tcpdump on my end).

So you're probably missing something. Or could it be that the default cookie store in HttpClient is not GAE-safe...? Remember, the devkit does not enforce the class whitelist.

(In ESXX, I'm using a custom cookie store, so maybe that's why my code works but not yours?)

Jessica Mong said...

Hi Leviticus

What I do is:
1)
DefaultHttpClient client = new DefaultHttpClient(new GAEConnectionManager(), new BasicHttpParams());

2) Send a GET request.

3)
client.getCookieStore().getCookies().size() returns 0

I created a custom cookie store and it works for me now. Thanks for the advice.

Jessica.

Jay Colson said...

BerliOS looks to be down. Do you have the example source code anywhere else?

thx!

Leviticus said...

Jay, BerliOS seems to be working again. If not, drop me a mail and I'll send the files.

Peter said...

I think I was like Anonymous in his comment above: I was initially confused when I dropped your adapter classes (GAEConnectionManager.java and GAEClientConnection.java) into my project in Eclipse and got a bunch of compile errors. To work around this, I put those classes into a separate jar that I referenced from my project. I explain this a bit more in my blog.
Thanks very much for posting this code!

Lydonchandra said...

Very, very nice! It worked the first time! Thanks heaps!

Lydonchandra said...

Hi, is there a way to follow a link in the response?

Leviticus said...

Lydonchandra, you need to run the result through an HTML parser and then extract the links from the parsed document.

This, of course, was one of the main reasons I wrote ESXX in the first place. Getting a list of links from a web page really should not have to be any more difficult than this:

let uri = new URI("http://www.example.com/");
let doc = uri.load("text/html");
let ref = doc..a.@href;

That's JavaScript + E4X, in case you don't recognize the syntax. If only things were that easy in Java ... :-) Anyway, I'd suggest a good HTML parser, such as HtmlCleaner, which I use in ESXX, and an XPath expression to fetch the links.
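Following that suggestion, a minimal JDK-only sketch of the XPath step might look like the class below. It assumes the page has already been cleaned into well-formed markup (for example by HtmlCleaner, whose API is not shown here) without a DOCTYPE or default namespace; LinkExtractor and extractLinks are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LinkExtractor {
    /** Returns the href of every <a> element, in document order. Input must be well-formed. */
    public static List<String> extractLinks(String wellFormedHtml) throws Exception {
        // The default factory is not namespace-aware, so plain "//a" matches unprefixed elements.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormedHtml)));

        // Same idea as doc..a.@href in E4X: select every href attribute of every <a>.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

        List<String> links = new ArrayList<>();
        for (int i = 0; i < hrefs.getLength(); i++) {
            links.add(hrefs.item(i).getNodeValue());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p><a href=\"/one\">1</a> <a name=\"x\">no href</a></p>"
                + "<div><a href=\"http://www.example.com/two\">2</a></div></body></html>";
        System.out.println(extractLinks(page)); // prints [/one, http://www.example.com/two]
    }
}
```

Relative hrefs such as /one come back exactly as written; resolving them against the page URL, for instance with java.net.URI.resolve(), is a separate step.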

Lydonchandra said...

Thanks Leviticus.
Now I want more :D
I am trying to use HttpUnit on GAE instead, and as you might know, I got the URLStreamHandler error, because HttpUnit uses URLStreamHandler to process JavaScript and https.

Do you know how to use URLFetch in place of URLStreamHandler?

Leviticus said...

No, sorry.

Thibaut said...

It's very useful!

However, there seems to be a bug that prevents using it for URIs that include a query string.

In GAEClientConnection#sendRequestHeader
URI uri = new URI(host.getSchemeName(), null, host.getHostName(), host.getPort(),
request.getRequestLine().getUri(), null, null);


should split the query string from request.getRequestLine().getUri() and pass it as the next parameter, as otherwise the ? character is escaped, which results in 404 responses.

Thibaut said...

actually it's even worse/simpler.

The way it's currently done results in escaping twice (once in httpclient, once in GAEClientConnection through the construction of the URL from URI fragments).

Considering the inputs here are ALREADY escaped by HttpClient, there's no need to escape them again, and the following seems to solve my issue:

URI uri = new URI(host.getSchemeName()+"://"+ host.getHostName() +((host.getPort()==-1)?"": host.getPort()) +requestUri);

Leviticus said...

Silly me. Thanks a lot for finding this, Thibaut!

I have now pushed a fix to svn.

lonikar said...

Your classes work for HttpGet but not for HttpPost. I get a NullPointerException in GAEClientConnection.sendRequestEntity: request.getEntity().writeTo(baos);

Is there any way to make this work for POST?

lonikar said...

Never mind. I added the following code:

if (request.getEntity() != null) {
    request.getEntity().writeTo(baos);
}

This makes it work. I was not sure initially that this change would make POST work.

Thanks a lot for your initial solution.

Leviticus said...

lonikar, I didn't know it was allowed to call sendRequestEntity with a null entity, but indeed it is and AbstractHttpClientConnection does check for this condition.

Thanks for the fix (which I have now committed)!

Anonymous said...

Hi,

Firstly, thanks for this code. You have helped me immensely!

Secondly, you have an error in GAEClientConnection.sendRequestHeader where you set up the URI. Your current implementation does not provide a ":" before the port number if one is used.

I fixed it like this:

URI uri = new URI(host.getSchemeName() + "://" + host.getHostName()
+ ((host.getPort() == -1) ? "" : (":" + host.getPort()))
+ request.getRequestLine().getUri());

Lastly, I'm not sure what is up with your SVN site, but all the code is covered with weird artifacts like '1.5.0/docs/api/java/lang/IllegalArgumentException.html">' on almost every line!

I'm on a Mac using Firefox and Safari - maybe it's ok on Windows?

Thanks again for the code!

Nello
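Both URI bugs discussed in this thread can be reproduced with plain java.net.URI, so here is a minimal sketch (RequestUris, toAbsolute and doubleEncoded are hypothetical names). The multi-argument constructor quotes every '%' and every '?' it finds in the path argument, which double-encodes HttpClient's already-escaped request URI and turns the query string into part of the path. The single-argument constructor parses the string as-is, so the fix only has to insert the ':' before an explicit port. It assumes the request-line URI is a path starting with '/', not an absolute URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RequestUris {
    /** Joins scheme, host and optional port with an already-encoded request-line URI. */
    public static URI toAbsolute(String scheme, String host, int port, String requestUri)
            throws URISyntaxException {
        // -1 means "no explicit port"; otherwise the port needs its ':' separator.
        String authority = (port == -1) ? host : host + ":" + port;
        // Single-argument constructor: parses as-is, existing %XX escapes and '?' survive.
        return new URI(scheme + "://" + authority + requestUri);
    }

    /** The problematic pattern: the multi-argument constructor quotes '%' and '?' in the path. */
    public static URI doubleEncoded(String scheme, String host, int port, String requestUri)
            throws URISyntaxException {
        return new URI(scheme, null, host, port, requestUri, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        String requestUri = "/search?q=a%20b";
        System.out.println(toAbsolute("http", "example.com", 8080, requestUri));
        // prints http://example.com:8080/search?q=a%20b
        System.out.println(doubleEncoded("http", "example.com", 8080, requestUri));
        // prints http://example.com:8080/search%3Fq=a%2520b
    }
}
```

With port -1 no colon is added, which is the same (port == -1) check used in the fix above.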

hebnern said...

I also ran into the same issue as Nello, and made the same fix.

Also, I was wondering about the licensing for these files. Since they are GPL, inclusion in a non-GPL application would require that the sources for the entire application be made available. Would you be willing to release these files under a more permissive license like for instance the Apache License so that people can use them in closed source projects?

Leviticus said...

Thanks for pointing this out, Nello. Fix checked in.

Sigh, that URI line has caused way too much trouble ... Hope that was the last one!

About the broken syntax highlighting: it seems to be a WebSVN issue at BerliOS. I've switched the links to ViewVC for now.

Leviticus said...

hebnern: Fair enough, as long as the bug fixes keep coming in. Some flattr would be nice too ... :-)

I've changed the license of the two files in question to LGPLv3. Should be compatible with most other licenses out there.

Anonymous said...

Thanks for this code,
it is very useful.
Now I have a small question:
can these two Java files be used with HttpClient 4 to fetch https:// URLs?
Thanks

Anonymous said...

I have the same problem.
I use HttpClient 4 on GAE.
Code:
httpclient = new DefaultHttpClient(new GAEConnectionManager(), params);
HttpGet hp = new HttpGet("https://XXXXXXXXXXX");
httpclient.execute(hp);
But an exception occurred:
"The API package 'urlfetch' or call 'Fetch()' was not found."
If I use
httpclient = new DefaultHttpClient();
in my environment instead,
it runs.
Any ideas?

Anonymous said...

Awesome!

Guy said...

Thanks for the code.

I'm trying to use it with HttpClient 4.1, and it seems to work only with HttpGet requests.

When I'm trying to use HttpPost:

HttpPost httpPost = new HttpPost(new URI("https://registration.xxx.com/login.fcc"));

List<NameValuePair> nvps = new ArrayList<NameValuePair>();
nvps.add(new BasicNameValuePair("USER", user));
nvps.add(new BasicNameValuePair("PASSWORD", password));

try {
httpPost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}

HttpResponse response = itsHttpClient.execute(httpPost);

I'm getting this GAE exception:

Caused by: java.security.AccessControlException: access denied (java.net.SocketPermission registration.xxx.com resolve)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
at java.security.AccessController.checkPermission(AccessController.java:546)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166)
at java.lang.SecurityManager.checkConnect(SecurityManager.java:1031)
at java.net.InetAddress.getAllByName0(InetAddress.java:1146)
at java.net.InetAddress.getAllByName(InetAddress.java:1084)
at java.net.InetAddress.getAllByName(InetAddress.java:1020)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:242)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:130)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:562)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at com.robin.client.ClientFormLogin.login(ClientFormLogin.java:85)

When I'm using the same code for HttpGet request it is working with no problem.

Am I missing something?

Guy said...

To add to the above comment, this is how I initialize the HttpClient:

HttpParams httpParams = new BasicHttpParams();
ClientConnectionManager connectionManager = new GAEConnectionManager();
DefaultHttpClient itsHttpClient = new DefaultHttpClient(connectionManager, httpParams);

Leviticus said...

Guy, I'll have a look this week.

M. Maksin said...

Thanks, man, that made Ektorp work under GAE

manish said...

Great stuff. Do you want to mavenize this project and perhaps even upload it to a repository? hint, hint :)

I've attached a pom.xml for your convenience:


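<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.esxx.js.protocol</groupId>
  <artifactId>gae-connection-manager</artifactId>
  <version>1.0.0</version>
  <name>GAEConnectionManager</name>
  <description>The GAEConnectionManager as given in http://esxx.blogspot.com/2009/06/using-apaches-httpclient-on-google-app.html. Based on the advice given in http://peterkenji.blogspot.com/2009/08/using-apache-httpclient-4-with-google.html. All license and copyright information is in the source code.</description>

  <licenses>
    <license>
      <name>LGPLv3</name>
      <url>http://www.gnu.org/licenses/lgpl-3.0-standalone.html</url>
    </license>
  </licenses>

  <properties>
    <gae.version>1.3.8</gae.version>
    <httpclient.version>4.0.1</httpclient.version>
    <maven.compiler.plugin.version>2.3.2</maven.compiler.plugin.version>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven.compiler.plugin.version}</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>com.google.appengine</groupId>
      <artifactId>appengine-api-1.0-sdk</artifactId>
      <version>${gae.version}</version>
      <type>jar</type>
      <scope>compile</scope>
      <optional>true</optional>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>${httpclient.version}</version>
      <optional>true</optional>
    </dependency>
  </dependencies>
</project>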

samir said...

Some of the referenced classes are deprecated. Can you please update these classes to use the latest APIs?

Leviticus said...

Samir et al,

My donation jar is awfully empty.