If you, like me, have tried to use Google's URL Fetch Java API on the Google App Engine, you've probably been disappointed. Sure, it's a small, clean API, but it's totally featureless. The most advanced thing it supports seems to be ... well, it can follow redirects automatically. Wow. Cookies? Authentication? Forget it.
In ESXX, I use Apache's HttpClient 4, and it works really well. Wouldn't it be nice if you could use HttpClient on the App Engine? Well, now you can. All it takes is a custom connection manager that converts the final requests, feeds them into the URL Fetch service, and then feeds the responses back into HttpClient.
You can have a look at the implementation here. It's just two classes, one ClientConnectionManager and one ManagedClientConnection class.
PS. ESXX now runs really well on GAE. Timers, the http, https, mailto, jdbc (in-memory H2) and data URI protocols and HTML parsing, yep, works! Only the dns and ldap protocols are non-functional (they will probably never work). Check out master from Git to try it yourself. Build using ant gae-war.
Updated 2009-12-11: The URIs were double-encoded and query parameters did not work at all. Thanks for pointing this out, Thibaut!
Updated 2010-08-08: The URIs lacked a colon before the optional port number. Thanks for pointing this out, Nello! Also, I changed the license for the two files to LGPLv3.
Updated 2014-11-18: Updated links to Git repo (moved to Github from Berlios).
40 comments:
I tried to compile your code in my GAE project but I get the following errors
am I missing something ?
java.net.InetAddress is not supported by Google App Engine's Java runtime environment
java.net.Socket is not supported by Google App Engine's Java runtime environment
javax.net.ssl.SSLSession is not supported by Google App Engine's Java runtime environment
Did you attach the connection manager to your HttpClient? (See getHttpClient() in http://svn.berlios.de/wsvn/esxx/trunk/src/org/esxx/js/protocol/HTTPHandler.java)
For the GAE code path, the connection manager has already been set using setConnectionManager() from http://svn.berlios.de/wsvn/esxx/trunk/jee/org/esxx/ESXXServlet.java.
Was wondering if this somehow could be used to make HtmlUnit work on GAE (which uses HttpClient). I know, not your project heh.
I tried to use htmlunit on GAE, to go visit a page (https://) and it freaked out badly hah.
I even did this (which I read all over for HttpClient/SSL, though I have no clue what it does):
Protocol easyhttps = new Protocol("https", new EasySSLProtocolSocketFactory(), 443);
Protocol.registerProtocol("https", easyhttps);
Hi,
Great post, thanks for sharing that.
Your connection factory works very well.
I have one problem with cookies that I'd like to ask about.
I do a GET request and then the cookie list is empty. There are some cookies when running the sample code in a development environment.
I will appreciate any help with that.
Roger, I'm afraid I don't know much about HtmlUnit, despite both ESXX and HtmlUnit being based on Rhino.
However, the socket factory approach is probably too low-level for GAE, which doesn't even allow sockets to be created, much less connected.
You need to inject a connection manager instead, and work on the connection level, like I did.
Anonymous, just to make sure, I just tried the following code in ESXX:
function handleGet(req) {
let url = new URI("http://martin.blom.org/cgi-bin/cookie.sh");
url.load();
url.load();
return url.jars.toSource();
}
When invoked, a list of cookies is returned by the function, and the cookies are also set correctly by the second load() request (I dumped the traffic with tcpdump on my end).
So you're probably missing something. Or could it be that the default cookie store in HttpClient is not GAE safe ...? Remember, the devkit does not enforce the class whitelist.
(In ESXX, I'm using a custom cookie store, so maybe that's why my code works but not yours?)
Hi Leviticus
What I do is:
1)
DefaultHttpClient client = new DefaultHttpClient(new GAEConnectionManager(), new BasicHttpParams());
2)send GET request
3)
client.getCookieStore().getCookies().size()=0
I created a custom cookie store and it works for me now. Thanks for the advice.
Jessica.
berlios looks to be down, do you have the example source code anywhere else?
thx!
Jay, Berlios seems to be working again. If not, drop me a mail and I'll send the files.
I think I was like Anonymous in his comment above: I was initially confused when I dropped your adapter classes (GAEConnectionManager.java and GAEClientConnection.java) into my project in eclipse and got a bunch of compile errors. To work around this, I put those classes into a separate jar that I referenced from my project. I explain this a bit more in my blog.
Thanks very much for posting this code!
very very nice! it works first time! thanks heaps!
hi, is there a way to follow a link on the response?
Lydonchandra, you need to run the result through an HTML parser and then extract the links from the parsed document.
This, of course, was one of the main reasons I wrote ESXX in the first place. Getting a list of links from a web page really should not have to be any more difficult than this:
let uri = new URI("http://www.example.com/");
let doc = uri.load("text/html");
let ref = doc..a.@href;
That's JavaScript + E4X, in case you don't recognize the syntax. If only things were that easy in Java ... :-) Anyway, I'd suggest a good HTML parser, such as HtmlCleaner, which I use in ESXX, and an XPath expression to fetch the links.
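For what it's worth, here is a minimal sketch of the parser-plus-XPath approach using only the JDK's built-in XML classes. It assumes the input is already well-formed XHTML; real-world HTML would first have to go through a lenient parser such as HtmlCleaner. LinkExtractor and extractLinks are made-up names for illustration, not part of ESXX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the href attribute of every <a> element.
final class LinkExtractor {
    static List<String> extractLinks(String xhtml) {
        try {
            // Parse the (assumed well-formed) markup into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            // Select all href attributes of <a> elements, in document order.
            NodeList hrefs = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//a/@href", doc, XPathConstants.NODESET);

            List<String> links = new ArrayList<>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                links.add(hrefs.item(i).getNodeValue());
            }
            return links;
        } catch (Exception e) {
            // Malformed input or a parser configuration problem.
            throw new IllegalStateException("Could not extract links", e);
        }
    }
}
```

Relative links come back exactly as written in the document, so they would still need to be resolved against the page's URI before being fetched.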
Thanks Leviticus.
Now I want more :D
I am trying to use HttpUnit in GAE instead, and as you might know, I got the URLStreamHandler error, because HttpUnit uses URLStreamHandler to process javascript and https.
Do you know how to use URLFetch in place of URLStreamHandler ??
No, sorry.
It's very useful !
However, there seems to be a bug that prevents using it for URIs that include a queryString.
In GAEClientConnection#sendRequestHeader
URI uri = new URI(host.getSchemeName(), null, host.getHostName(), host.getPort(),
request.getRequestLine().getUri(), null, null);
should split the queryString from request.getRequestLine().getUri() and feed it to the next parameter, as otherwise the ? character is escaped and results in 404 responses
actually it's even worse/simpler.
The way it's currently done results in escaping twice (once in httpclient, once in GAEClientConnection through the construction of the URL from URI fragments).
Considering the inputs here are ALREADY escaped by HttpClient, there's no need to escape again and the following seems to be solving my issue
URI uri = new URI(host.getSchemeName()+"://"+ host.getHostName() +((host.getPort()==-1)?"": host.getPort()) +requestUri);
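A small JDK-only sketch of the double-escaping problem described above. The class and method names (UriEscapingDemo, viaComponents, viaString) and the example host are made up for illustration; this is not the actual GAEClientConnection code.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriEscapingDemo {
    // Multi-argument constructor: requestUri is treated as an unescaped path,
    // so characters such as '?' and '%' are quoted (again).
    static URI viaComponents(String scheme, String host, int port, String requestUri) {
        try {
            return new URI(scheme, null, host, port, requestUri, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single-string form: the already-escaped request URI is parsed as-is,
    // so the query string stays a query string.
    static URI viaString(String scheme, String host, int port, String requestUri) {
        return URI.create(scheme + "://" + host + ":" + port + requestUri);
    }
}
```

For a request URI such as /search?q=a%20b, the first variant should yield a path containing %3F and %2520 and no query component at all, while the second keeps q=a%20b as the raw query.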
Silly me. Thanks a lot for finding this, Thibaut!
I have now pushed a fix to svn.
Your classes work for HttpGet but not for HttpPost. I get a NullPointerException in GAEClientConnection.sendRequestEntity: request.getEntity().writeTo(baos);
Is there any way to make this work for POST?
Never mind. I added the following code:
if(request.getEntity() != null)
request.getEntity().writeTo(baos);
This makes it work. I was not sure initially that this change would make POST work.
Thanks a lot for your initial solution.
lonikar, I didn't know it was allowed to call sendRequestEntity with a null entity, but indeed it is and AbstractHttpClientConnection does check for this condition.
Thanks for the fix (which I have now committed)!
Hi,
Firstly, thanks for this code. You have helped me immensely!
Secondly, you have an error in GAEClientConnection.sendRequestHeader where you set up the URI. Your current implementation does not provide a ":" before the port number if one is used.
I fixed it like this:
URI uri = new URI(host.getSchemeName() + "://" + host.getHostName()
+ ((host.getPort() == -1) ? "" : (":" + host.getPort()))
+ request.getRequestLine().getUri());
Lastly, I'm not sure what is up with your SVN site, but all the code is covered with weird artifacts like '1.5.0/docs/api/java/lang/IllegalArgumentException.html">' on almost every line!
I'm on a Mac using Firefox and Safari - maybe it's ok on Windows?
Thanks again for the code!
Nello
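The port handling from the fix above can be pulled into a tiny helper and checked on its own. This is only a sketch using the JDK; RequestUriBuilder and build are hypothetical names, and -1 is assumed to mean "no explicit port", as in the snippet above.

```java
import java.net.URI;

final class RequestUriBuilder {
    // Builds scheme://host[:port]requestUri, adding ":port" only when a port is set.
    static URI build(String scheme, String host, int port, String requestUri) {
        String authority = (port == -1) ? host : host + ":" + port;
        // requestUri is assumed to be already escaped, so it is appended unchanged.
        return URI.create(scheme + "://" + authority + requestUri);
    }
}
```

Without the colon, the port digits are simply glued onto the host name, so the request would go to a host that does not exist.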
I also ran into the same issue as Nello, and made the same fix.
Also, I was wondering about the licensing for these files. Since they are GPL, inclusion in a non-GPL application would require that the sources for the entire application be made available. Would you be willing to release these files under a more permissive license like for instance the Apache License so that people can use them in closed source projects?
Thanks for pointing this out, Nello. Fix checked in.
Sigh, that URI line has caused way too much trouble ... Hope that was the last one!
About the broken syntax highlighting: Seems to be a WebSVN issue at Berlios. Switched links to ViewVC for now.
hebnern: Fair enough, as long as the bug fixes keep coming in. Some flattr would be nice too ... :-)
I've changed the license of the two files in question to LGPLv3. Should be compatible with most other licenses out there.
Thanks for this code.
It is very useful.
Now I have a small question:
can these two Java files be used with HttpClient 4 to fetch https:// URLs?
thanks
I have the same problem.
I use HttpClient 4 on GAE.
Code:
httpclient = new DefaultHttpClient(new GAEConnectionManager(), params);
HttpGet hp = new HttpGet("https://XXXXXXXXXXX");
httpclient.execute(hp);
But an exception occurs:
"The API package 'urlfetch' or call 'Fetch()' was not found."
But if I use
httpclient = new DefaultHttpClient();
in my environment,
it runs.
Any ideas?
Awesome!
Thanks for the code.
I'm trying to use it with HttpClient 4.1, and it seems to work only with HttpGet requests.
When I'm trying to use HttpPost:
HttpPost httpPost = new HttpPost(new URI("https://registration.xxx.com/login.fcc"));
List<NameValuePair> nvps = new ArrayList<NameValuePair>();
nvps.add(new BasicNameValuePair("USER", user));
nvps.add(new BasicNameValuePair("PASSWORD", password));
try {
httpPost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
HttpResponse response = itsHttpClient.execute(httpPost);
I'm getting GAE exception:
Caused by: java.security.AccessControlException: access denied (java.net.SocketPermission registration.xxx.com resolve)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
at java.security.AccessController.checkPermission(AccessController.java:546)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166)
at java.lang.SecurityManager.checkConnect(SecurityManager.java:1031)
at java.net.InetAddress.getAllByName0(InetAddress.java:1146)
at java.net.InetAddress.getAllByName(InetAddress.java:1084)
at java.net.InetAddress.getAllByName(InetAddress.java:1020)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:242)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:130)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:562)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at com.robin.client.ClientFormLogin.login(ClientFormLogin.java:85)
When I'm using the same code for HttpGet request it is working with no problem.
Am I missing something?
To add to the above comment, this is how I initialize the HttpClient:
HttpParams httpParams = new BasicHttpParams();
ClientConnectionManager connectionManager = new GAEConnectionManager();
DefaultHttpClient itsHttpClient = new DefaultHttpClient(connectionManager, httpParams);
Guy, I'll have a look this week.
Thanks, man, that made Ektorp work under GAE
Great stuff. Do you want to mavenize this project and perhaps even upload it to a repository? hint, hint :)
I've attached a pom.xml for your convenience:
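<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.esxx.js.protocol</groupId>
  <artifactId>gae-connection-manager</artifactId>
  <version>1.0.0</version>
  <name>GAEConnectionManager</name>
  <description>The GAEConnectionManager as given in http://esxx.blogspot.com/2009/06/using-apaches-httpclient-on-google-app.html. Based on the advice given in http://peterkenji.blogspot.com/2009/08/using-apache-httpclient-4-with-google.html. All license and copyright information is in the source code.</description>
  <licenses>
    <license>
      <name>LGPLv3</name>
      <url>http://www.gnu.org/licenses/lgpl-3.0-standalone.html</url>
    </license>
  </licenses>
  <properties>
    <gae.version>1.3.8</gae.version>
    <httpclient.version>4.0.1</httpclient.version>
    <maven.compiler.plugin.version>2.3.2</maven.compiler.plugin.version>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven.compiler.plugin.version}</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>com.google.appengine</groupId>
      <artifactId>appengine-api-1.0-sdk</artifactId>
      <version>${gae.version}</version>
      <type>jar</type>
      <scope>compile</scope>
      <optional>true</optional>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>${httpclient.version}</version>
      <optional>true</optional>
    </dependency>
  </dependencies>
</project>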
Some of the referenced classes are deprecated. Can you please update these classes to use the latest ones?
Samir et al,
My donation jar is awfully empty.
Hi Leviticus,
I just ran into your blog post about getting Apache HttpClient 4 running on Google App Engine. Looks like great work! I would be really interested in having a look at the source code. Unfortunately, the Berlios SVN server does not exist anymore. Do you still have the code somewhere else?
Thanks,
Ralph
I've updated the links to point to Github.
Hmm it seems like your website ate my first comment (it was extremely long) so I guess I'll just sum it up what I submitted and say, I'm thoroughly enjoying your blog. I too am an aspiring blog writer but I'm still new to the whole thing. Do you have any tips for rookie blog writers? I'd really appreciate it.
Hey very nice blog!