So, this week on Distilled Derivatives, some more laziness. And also, exceptions. Everyone loves them, really useful things for letting you move error handling into a single location that they are. Now and again though, their misuse can really get in the way of getting something done.
Retrieve the content of the first functioning url
Given n
urls in some ordered collection, get the content of the first
“working” url, where working is defined to mean “has http code 200 and
non empty response content”. Once you’ve found it, put the response body
of the url into some local file.
This should be easy, thought I. How wrong I was.
First problem
All the java http libraries throw IOException
. Ok, that’s not too bad,
we can get past that with only a little evil. Here’s our first go.
def withUrlConnection() : String = {
for (url <- urls) {
try
{
val actualUrl = new URL(url)
val connection = actualUrl.openConnection().asInstanceOf[HttpURLConnection]
return stringFromHttpUrlConnection(connection)
}
catch {
case e: IOException => {}
}
}
throw new RuntimeException("all failed")
}
This isn’t too bad – we’re only teetering on the edge of breaking the
“don’t use exception for control flow” rules, rather than falling into
the abyss. We can simply take the resulting String
and write it to a
file (implementation elided). Oh, and in case you were curious, here’s
the implementation of stringFromHttpURLConnection
(we’ll need it again
later).
N.B I’ve completely elided any closing of any connection everywhere in this article, because like the technique I’m trying to describe, I am lazy.
val stringFromHttpUrlConnection: (HttpURLConnection) => String = { connection : HttpURLConnection =>
Source.fromInputStream(connection.getInputStream).getLines().mkString("n")
}
The problem worsened, however, when some kind soul tried to help out by creating a function like the following:
Problem 2 – an unhelpful helper
def withUnhelpfulFunction(path: String) {
var done = false
for (url <- urls) {
try {
if (!done) {
downloadFile(url, path)
done = true
}
} catch {
case e: IOException => {}
}
}
}
def downloadFile(url: String, path: String) {
// lobs url response body into path on success, IOException if not. Implementation too boring to include.
}
At first glance this function looks like it might save us some code, but once you start talking about failure or multiple urls, we’re screwed. It’d be perfect if we wanted a one shot download that noisily failed, and that’s probably the use case the original author had in mind. In general the approach the function takes is a bit evil – it’s performing IO and letting us know we succeeded by not throwing an exception. That’s not the world’s greatest signal.
So, two quite ugly solutions with exceptions. Is there another way?
My hobby: getting lazy evaluation to write my program for me
Functional programming has given us some excellent tools to help decompose this problem, and there’s a neat lazy evaluation trick that can get us out of jail here. What do we need? A function that will:
- synchronously resolve a url to the response body
- include the possibility of failure in its return type
Roughly, this is equivalent to URL -> Maybe String
(or
Option[String]
, in Scala land). Try as I might, I can not find a scala
or java http client library that follows this approach.
Dispatch looks terrific
but is also asynchronous, and I don’t really want to talk about futures
in this post. Handily, numerous other sandbox projects I’ve tinkered
with have already forced me into writing my own, so I have a handy
function lying around.
def withHttpLib() : String = {
val requestor: HttpRequestor = new InsecureHttpRequestor()
val results = urls.map(url => requestor.performGet(url, stringFromHttpUrlConnection))
results.find(_.wasSuccessful()).getOrElse(throw new RuntimeException("all failed")).getResult
}
Look ma, no for or if! Even better, this approach appears much shorter, however, this is mostly because much of the responsibility has moved into the little http library. Let’s walk through the execution of this program.
A little secret
val urls = Stream("www.digihippo.net/home.html",
"www.digihippo.net/index.php",
"www.digihippo.net/foo.bar")
The structure the urls are sitting in is actually Scala’s excellent
Stream
abstraction. By default, operations on Stream
s are applied
lazily. For example, if we invoked map
on a particular instance of
Stream
, execution of the function we pass in is deferred until someone
tries to extract an element from the Stream
. In our earlier examples,
we use Scala’s for comprehension to iterate across urls
. In our
favourite implementation, you can see that we’ve started applying map
to it instead.
N.B These won’t actually work if you do new URL(url)
on them. They’re
in the current format because my library doesn’t yet work out whether to
connect securely or not from the passed in url String
.
What’s this http library doing then?
val requestor: HttpRequestor = new InsecureHttpRequestor()
The requestor
here is an object from my http library. It’s insecure
because it doesn’t deal with https
(it doesn’t lack confidence), just
so you know. To execute a given request, one provides a url, and a
response decoder – the requestor returns you a type parametrized Http
object, which may contain the thing you wanted to decode, or an error.
What’s happening underneath? Well, something like this:
- Attempt to get url
- On failure, return Http.failure() with some helpful details
- On success, attempt to parse using provided response decoder
- If response decoder succeeds, return whatever it decoded
- If it failed, return Http.failure() with some helpful details
Hold on though – why does our decoder get the opportunity to return any
sort of Http
? Why doesn’t it just attempt a parse and give up? Well,
we have to really think about what makes any given http request a
failure. The definition in this post is not necessary universal –
sometimes we expect a 404
, or a 500
, and still wish to parse
something from the resulting connection.
As such, at the moment, the decoder can examine any aspect of the
created HttpURLConnection
and decide for itself, if it wants to, that
this request was a failure. Our decoder does no such thing though – it
just attempts to grab the response body and whack it into a string. The
library will helpfully catch any exception resulting from playing with
HttpURLConnection
and return you an unsuccessful Http
instance with
the error information stashed inside.
What is this line really doing?
val results = urls.map(url => requestor.performGet(url, stringFromHttpUrlConnection))
So, hands up – who thinks we’re firing off http requests as we invoke
map
here? No-one? Good. Glad to see people are keeping up. What we’re
really doing here is creating a new Stream
, which at the moment is
actually a collection of functions that, when called, will return an
Http[String]
. So this line actually has no side effects, we’re really
just defining how to compute the next elements of results, rather than
computing them.
This is where the magic happens
results.find(_.wasSuccessful()).getOrElse(throw new RuntimeException("all failed")).getResult
Now, the density of this line is considerably higher than the others. If that’s a problem for you, you might be a n00b. So, what the hell is happening?
Well, that find
invocation would be better named findFirst
. It
traverses results
, examining each element to see if it was a success.
That means it needs to compute the element! So, find
actually does
this:
- Execute function to obtain element. N.B this finally executes our map function
- Examine result to see if it was successful
- If not, repeat with next element
So, in fact, just by defining results
in the way that we have means
that find
expresses the control flow of our application for us. One
could legitimately criticize the first two solutions for repeating this
standard library code, and, in fact, functional languages go out of
their way to make extracting this sort of legwork easier. If you’re
wondering why this is important, measure how many times you write
try/catch/finally in java (or similar) next week.
The remainder of the line does the necessarily evil to fulfil something
like the original contract, where we wanted a String
that we could
later write out to a file. We need to extract the actual value from the
find
result (find
actually gives us back Option[Http[String]]
,
because there’s no guarantee any of our results
matched), throwing if
none were successful, and then we need to extract the String
out of
the found successful request.
An apology
Purist functional programmers will likely be groaning at this point. The
side effects of our little getter don’t happen at the top of our
program, necessarily, and the usage of Option
(and the Http
type,
which is pretty similar to Option
) is not particularly idiomatic
(hitting get
like functions on monadic types is frowned
upon).
Some side notes
Interestingly this very same function was implemented as part of a bash build script elsewhere, and it embodies a lot of the lessons that we’ve seen in the code here. Here’s a cut down equivalent.
#!/bin/bash
function noisy_curl() {
echo "issuing request to get $1"
return `curl -s -f $1`
}
noisy_curl http://www.digihippo.net/main.html ||
noisy_curl http://www.digihippo.net/index.php ||
noisy_curl http://www.digihippo.net/site.html
This is, given how much effort we’ve gone to set up the same effect in
Scala, embarrassingly trivial. One can start to see the elegance of the
unix way of doing one simple task, and writing the result to stdout,
while also indicating success with a return code. Here we use the -f
flag to tell curl
to ‘fail’ if we get a suitably erroneous looking
response code, and then just set up a series of curl
invocations such
that the first success short circuits the others. One starts to agree
with all
those
people
on
the
internet who
think UNIX pipelining looks a helluva lot like composition as seen in
functional programs.
We’ve come a long way baby
So let’s stop (for now). We could certainly expand on our example by
doing something more interesting that putting that result String
into
a file. We could consider approaches where we never parse the whole
String
but instead hold on to the InputStream
instead. These deserve
to be posts in their own right though, so we will not discuss them
further here.
What did we learn?
- Lazy evaluation allows you to elide boring control flow from your code. More generally, a functional approach can help you separate the valuable parts of your code from arduous syntactic toil
- Wrapping exceptions into return types at the point where they are thrown makes life so much easier
- Communicating errors in exceptions is really only good for one shot calling patterns
…and, again, more generally, exceptions are vastly overused. How often
per day do you get a broken web page when browsing? Is it really
exceptional? Are browsers dealing with 404 or 500 via an exception path?
I seriously hope not. Exception abuse seems particularly prevalent in
java, perhaps because so many libraries use IOException
as their IO
alarm, and possibly also because the language makes it so difficult to
work with something like the Http
type. Once again, I find myself
thinking that Java 8 really cannot arrive soon enough.
Moral of the story
Stop throwing all that stuff around and relax, dude.