James Aylett: Recent diary entries

  1. Monday, 9 Jun 2008: Thoughts on Widefinder
  2. Monday, 9 Jun 2008: Widefinder: Pretty Graphs
  3. Friday, 6 Jun 2008: Exporting emails from Outlook
  4. Wednesday, 4 Jun 2008: Claiming the evil namespace
  5. Monday, 26 May 2008: iPlayer problems
  6. Sunday, 18 May 2008: Google, the Fast Follower
  7. Monday, 7 Apr 2008: URI Posterity
  8. Monday, 7 Apr 2008: PNG Weirdness
  9. Friday, 28 Mar 2008: Idiots
  10. Monday, 17 Mar 2008: Advertising tech

Thoughts on Widefinder

Published at
Monday 9th June, 2008
Tagged as
  • Scaling
  • Widefinder
  • Parallelism

Last year, Tim Bray ran a mini-investigation, based on the idea of parallelising traditionally linear tasks. He wrote a simple web logfile analyser, without any fancy tricks, and watched other people beat its performance in various ways, providing colour commentary along the way. (Should that be color commentary, given that we don't actually have that phrase in British English?) He called it Wide Finder, and the results, although somewhat unscientific because of the constraints he was under, showed the best elapsed-time performance, on a multicore T5120, to be some three orders of magnitude better than Tim's linear Ruby implementation, with about an order of magnitude more code. The T5120, as Tim pointed out, is the shape of the future, both in the data centre and on the desktop; it doesn't matter who your processor designer of choice is, these things are in scale-out rather than scale-up mode for at least the next few years.

Now, he wants to do it again, only better: Wide Finder 2 gives people the opportunity to write their own faster version of Tim's linear code in whatever way they want, and to run it on a T2000. Tim is concerned with a balance of complexity and performance; complexity is mainly being measured in LOC, which is probably reasonable to get things going. The crucial idea is that we need techniques for taking advantage of modern multi-core, multi-threaded processors that don't require everyone to be an expert in concurrency and multiprocessing. This could prove interesting.

There are three things that I think we should try to shed some light on after WF-1. If you look back over the results, several use memory mapping to reduce the I/O overhead as much as possible, and then have multiple workers go over the space of the file, in chunks, either with OS threads or processes as workers, or something managed by the VM of the language itself. Either you have lots of little chunks, or you have as many chunks as workers, which I'd guess is less strain on the data pre-fetch scheduler in the operating system. Whatever, we're talking about data decomposition with independent data (as in: the processing of each individual log line is independent of other log lines or the results of their processing). This is the easiest kind of data decomposition. So: three things we can investigate from here.

  1. Can we do data decomposition in the independent data case automatically?
  2. For the kinds of problems that we know data decomposition with independent data works well, can we come up with better approaches?
  3. Are there reasonable kinds of problems for which data decomposition is either much too complicated or simply not applicable?

I'll tackle my thoughts on each of them separately, in that order. Most of the stuff I've done is on the first one, as this seems the most interesting to me.

Can we do data decomposition in the independent data case automatically?

There are two ways I can think of for doing this, one of which isn't strictly automatic but is more applicable. This is assuming that all your data comes from the same place to start off with, and that everything's running on one machine, although I'm pretty sure both of those could be lifted with a bit of cleverness (the setup becomes harder, but the user code should remain the same).

Cheating

The first way is kind of cheating. Assume we're dealing with a single loop over the data in the linear case, ie a single reduction. If you write your loop using a line iterator or generator, you can put it all in a function which takes a line generator and some sort of accumulator; a library can then drive your function and take care of making it work in parallel. Let's work through an example in Python to see how this might work. I'm not going to do the WF-2 example like this yet, because it's close to 100 lines long. Let's just calculate the arithmetic mean of a list of integers. This is pretty easy to do in the linear style.

import sys
f = open(sys.argv[1])
n = 0; s = 0
for line in f.readlines():
    try:
        i = int(line)
        s += i
        n += 1
    except:
        # ignore invalid lines
        pass

print (float(s) / n)

So we change things around so we're not driving the loop any more.

import sys, parallel

def processor(lines, driver):
    acc = driver.get_accumulator()
    for line in lines:
        try:
            i = int(line)
            acc.acc('s', i)
            acc.acc('n', 1)
        except:
            pass
    return acc

result = parallel.process(sys.argv[1], processor)
print (float(result['s']) / result['n'])

This is very similar in terms of both plain LOC and complexity, although of course there's stuff hiding in the parallel module. For a linear implementation, that's another 50-odd lines of code; for a fairly simple parallel implementation using mmap and a pipe-fork approach, it's over 100. It's tedious code, and not as efficient as it could be; my aim isn't to build this for real, but to nudge towards an existence proof (really an existence hunch, I guess). I won't show the whole thing here for that reason, and because the point is that it's infrastructure: it gets written once so the user code doesn't have to worry about it. Mine is basically Sean O'Rourke's WF-1 winner, translated to Python.
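That said, a rough sketch gives the shape of the thing. This is not the real module, just a minimal guess at the pipe-fork approach described above, with the process/driver/accumulator names taken from the examples and just enough behaviour to drive the two small examples on this page; a production version would want mmap, error handling and smarter chunking.

import os, pickle

class Accumulator(dict):
    # a dict that knows how to accumulate into missing keys -
    # roughly Ruby's Hash.new(0)
    def acc(self, key, amount):
        self[key] = self.get(key, 0) + amount

class Driver(object):
    def get_accumulator(self):
        return Accumulator()

def _chunk(path, start, end, processor):
    # each worker owns the byte range (start, end]: it skips the partial
    # line at the front (the previous worker finishes that one) and reads
    # whole lines until it has passed its end point
    f = open(path, 'rb')
    f.seek(start)
    if start:
        f.readline()
    lines = []
    while f.tell() <= end:
        line = f.readline()
        if not line:
            break
        lines.append(line)
    f.close()
    return processor(lines, Driver())

def process(path, processor, workers=8):
    size = os.path.getsize(path)
    workers = max(1, min(workers, size))   # never more workers than bytes
    step = size // workers
    children = []
    for i in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()                    # Unix only, of course
        if pid == 0:                       # child: one chunk, result down the pipe
            os.close(read_fd)
            end = size if i == workers - 1 else (i + 1) * step
            partial = _chunk(path, i * step, end, processor)
            os.write(write_fd, pickle.dumps(dict(partial)))
            os._exit(0)
        os.close(write_fd)
        children.append((pid, read_fd))
    total = Accumulator()                  # parent: fold the partial results together
    for pid, read_fd in children:
        data = b''
        while True:
            buf = os.read(read_fd, 65536)
            if not buf:
                break
            data += buf
        os.close(read_fd)
        os.waitpid(pid, 0)
        for key, value in pickle.loads(data).items():
            total.acc(key, value)
    return total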

Forking a load of children costs time, and moving partial results around and accumulating them takes time; that can overwhelm the advantage of running on multiple cores when you don't have a large data set. For the simple arithmetic mean, you need larger files than I could be bothered with to show an improvement; doing a similar job of counting Unix mbox From_ lines, the parallel version across eight cores was about three times faster than the linear version. I haven't tried to optimise at all, so it's likely I'm doing something dumb somewhere.

import sys, parallel

def processor(lines, driver):
    acc = driver.get_accumulator()
    for line in lines:
        try:
            if line[0:5]=='From ':
                acc.acc('n', 1)
        except:
            pass
    return acc

result = parallel.process(sys.argv[1], processor)
print result['n']

Some empirical data, for the Unix mbox job: running on a machine with dual quad-core 1.6GHz Xeons, averaging over 10 consecutive runs from warm (so the roughly 1GB file should already have been in memory somewhere), we get the following. (Note that GNU grep(1) does it in a little over two seconds; this is always going to be a toy example, so don't read too much into it.)

Graph of run time against processes

The red line is the mark for my naive linear implementation; its main advantage is that a lot of the work is being done in the interpreter, in C, rather than in my code, in Python. This makes a big difference - the parallel version seems to be doing about 2.5 times as much work. It's worth noting that this machine is not completely unloaded; it's multi-user and also acts as a web and mail server, amongst other things, so beyond J=6 we're seeing slightly more flaky numbers. Back-of-the-envelope monitoring, however, suggests that we don't start getting interference from other processes on the box before we start seeing interference from the job's own workers, at J=9, when we run out of cores (although the system does a reasonably good job of keeping things going from there up).

Note that there's some interesting discussion going on around WF-2 about how to scale some of these techniques up to gigabytes of input in one go; Alex Morega in particular has been using a broadly similar approach in Python and hit some interesting snags, and I urge you to read his write-up for details. Either we'll come up with new idioms that don't have these problems, or we'll improve the things we depend on (languages, libraries, VMs, operating systems...) to allow us to work better at scale without radically changing the way we code. (I'm sure there are some great approaches to these problems that I haven't heard of - with luck, WF-2 will bring them to a wider group of people.)

Enough on this. Hopefully I've convinced you that this is entirely feasible; it's a matter of someone writing a good library to drive it all.

Not cheating

In order to be more automatic, we need to convert the original linear style into the parallel style entirely programmatically. To my knowledge, you can't do this in Python. In fact, many languages, even dynamic languages, don't want you dicking around directly with live code constructs, which limits the applicability of this idea somewhat. However it's entirely possible to imagine that we could, in some theoretical super-Python, write the following and have the parallel module rewrite the process function appropriately.

import sys, parallel

def process(file):
    f = open(file)
    result = { 'n': 0, 's': 0 }
    for line in f.readlines():
        try:
            i = int(line)
            result['s'] += i
            result['n'] += 1
        except:
            pass
    f.close()
    print (float(result['s']) / result['n'])

parallel.apply(process, sys.argv[1])

In complexity for the programmer, this is like a halfway house between the linear and parallel approaches above, with the nice advantage that it doesn't remotely look parallel except for the last line.

I'm certain it's possible because the only things that enter the loop can either be considered immutable (in the scope of the loop) or are the file object f (which we only ever call readlines() on); the only thing that escapes the loop is a dictionary, and within the loop we only ever accumulate into that dictionary. It's not beyond the bounds of programming ability to spot this situation and convert it into the cheating version above (although it's probably beyond my programming ability, and certainly is in Python). In fact, we could probably write a parallel.invoke() or something which is given a Python module and parallelises that, at which point we've got a (limited) automatic parallel Python interpreter. Again, provided you can mutate Python code on the fly.

A question which arises, then, is this: given the constraints of only being able to parallelise loops over quanta of data (for instance, iterating over log lines, or 32 bit words, in a file), with immutable inputs and a number of dictionary outputs, how large is the problem space we can solve? This is actually question three on the original list, so I'll leave it for now.

Another is whether we can lift any of these restrictions, the main one being accumulation. (And I'm counting subtraction as accumulation, for these purposes.) Assuming data independence, there aren't actually many alternatives to accumulation: I can only think of multiplication, which is the same as accumulation anyway, give or take some logarithms. So I'm guessing the answer to this question is an unwelcome "no"; however I'm probably wrong about that. You can do an awful lot with accumulation, though.
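To make the logarithm point concrete, a product can be expressed in the same accumulate-only style by summing logs and exponentiating at the end. This is a sketch against the same hypothetical parallel module as above; it ignores zeroes and negative values for brevity.

import math, sys, parallel

def processor(lines, driver):
    acc = driver.get_accumulator()
    for line in lines:
        try:
            acc.acc('logprod', math.log(int(line)))
        except:
            pass   # also skips values where log() isn't defined
    return acc

result = parallel.process(sys.argv[1], processor)
print math.exp(result['logprod'])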

For the kinds of problems that we know data decomposition with independent data works well, can we come up with better approaches?

I'll be very disappointed if the answer to this isn't "yes". However I don't think WF-2 is necessarily going to show us much here by being almost tuned to this kind of approach. I'm probably wrong about this as well, though, and new ways of thinking about these kinds of problem would be great.

It's not clear to me, because I don't understand the languages well enough, whether all the techniques that were used in WF-1 with JoCaml and Erlang are covered by data decomposition (beyond things like optimising the matcher). Even if there aren't any that go beyond it, there are undoubtedly lessons to be learned from how you structure your code in those languages to approach these problems. This partly falls under the previous question: if we can't automatically parallelise our old ways of doing things, then we want new idioms where we can.

Are there reasonable kinds of problems for which data decomposition is either much too complicated or simply not applicable?

The simple answer is "yes", but that's in general. WF-2 is concerned with what most programmers will have to deal with. Here, I wonder if the answer might be "no". You can do an awful lot without sacrificing data independence. I calculated the arithmetic mean earlier, but for instance you can do standard deviation as well, providing you defer most of the computation until the end, and are prepared to unwind the formula. Generally speaking, we don't seem to actually do much computation in computing, at least these days. I think this means that for most people, beyond this one technique, you don't have to worry about parallelism too much at the moment, because anything that isn't linear to start off with (like, say, web serving) is already scaling pretty well on modern processors. So if we can use data decomposition automatically in some common cases, we dodge the bullet and keep up with Moore's Law for a few more years.
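For instance, here's the standard deviation done in the same style as the arithmetic mean above, against the same hypothetical parallel module: the workers only ever accumulate a count, a sum and a sum of squares, and the formula is unwound at the end as sqrt(mean of squares minus square of the mean). A sketch, not production code.

import math, sys, parallel

def processor(lines, driver):
    acc = driver.get_accumulator()
    for line in lines:
        try:
            x = int(line)
            acc.acc('n', 1)       # count
            acc.acc('s', x)       # sum
            acc.acc('ss', x * x)  # sum of squares
        except:
            pass
    return acc

result = parallel.process(sys.argv[1], processor)
mean = float(result['s']) / result['n']
print math.sqrt(float(result['ss']) / result['n'] - mean * mean)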

Dependent data decomposition

Data decomposition continues to be useful even with related data; however you start getting into I/O or memory model issues if your data set starts off all jumbled up, as it is in WF-2, because you've got to shuffle it all to get each piece to the right worker. For instance, if you want to do logfile analysis and care about sessions, it becomes difficult to do anything without arranging for all the data within a session to go to the same worker. (Not impossible, but in the context of making it easy on programmers, I think we can ignore that case.) In most cases, you're better off filtering your data once into the right buckets, ideally at the point you collect it; you can of course parallelise the filtering, but if you're doing it at the same time as your final processing, you're moving lots of data around. The only situation I can think of where it's not going to help to filter in advance is if you need to do different processing runs over the data and the dependencies differ between runs, resulting in a different data decomposition each time. On the other hand, I can't think of a good example of this; I'm sure they exist, but I'm less sure that regular programmers run into them.
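As an illustration of the "filter once into buckets" idea, something like the following splits a logfile so that every line for a given client lands in the same bucket file, each of which can then be processed independently. The bucket count, the filenames and the assumption that the client is the first whitespace-separated field are all just for the sake of the example.

import sys

BUCKETS = 8
outputs = [open("bucket-%d.log" % i, "w") for i in range(BUCKETS)]
for line in open(sys.argv[1]):
    fields = line.split(None, 1)
    if not fields:
        continue                      # skip blank lines
    # all lines for one client hash to the same bucket
    outputs[hash(fields[0]) % BUCKETS].write(line)
for out in outputs:
    out.close()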

Note that in my parallel module above, for dealing with an independent data problem, I made a silent assumption that it's cheaper to not move much data around between the parts of your system; so it's far better for the controlling part of the system to tell a worker to attack a given byte range than it is to actually give it that data. This is surely not true in the general case, which is great news for dependent data decomposition. Given suitable VM infrastructure, or an appropriate memory model, this shouldn't actually be a problem; it's just that for most languages on most platforms at the moment, it seems to be. On the other hand, once you scale beyond a single machine, you really want the data you're working on to be close to the processing; a large part of the magic of Hadoop seems to be about this, although I haven't looked closely.

First results for WF-2

First results for WF-2 are beginning to come in. Ray Waldin set the bar using Scala, taking Tim's thousands of minutes down to tens. By now the leaders are running in minutes - note that if the ZFS pool can pull data at 150MB/s, as the Bonnie run showed with a 100G test, then the fastest the whole data set can come off disk is a little under five minutes; we're seeing results close to that already.

I'll start posting timings from my runs once I get it up to the complete data set; I'm also looking at how efficiently the workers get used as the number of workers and the size of the input file vary, since Tim has conveniently provided live data at various sizes. So this might take a few days; and there's a chance my times will be embarrassingly bad, meaning I might just not publish :-)

A final point

There's a great book called Patterns For Parallel Programming, by Mattson, Sanders and Massingill (if you're not in Europe you may prefer to get it from Amazon.com). It has a lot more detail and useful thoughts on parallelism than I could ever come up with: although I have some experience with scaling and data processing, these guys do it all the time.

Widefinder: Pretty Graphs

Published at
Monday 9th June, 2008
Tagged as
  • Scaling
  • Widefinder
  • Parallelism

This is about Wide Finder 2; I'm trying to apply the technique I discussed earlier, where we aim for minimal changes in logic compared to the benchmark, and put as much of the parallelism as possible into a library. I'm not the only person working in this fashion; I think Eric Wong's approach similarly allows different data processors to share the same system (it even supports multiple languages, pulling it all together using GNU make), and I'm sure there are others.

I don't have any full results yet, because I ran into problems with the 32 bit Python (and forgot that Solaris 10 comes with a 64 bit one handily hidden away, sigh). However I do have some pretty graphs. These are interesting anyway, so I thought I'd show them: they are timing runs for a spread of worker numbers between 1 and 128, working on 1k, 10k, 100k, 1m and 10m lines of sample data. The largest is about 1.9G, so it still happily fits within memory on the T2000, but this is somewhat irrelevant, because the fastest I'm processing data at the moment is around 20M/s, which is the Bonnie figure for byte-by-byte reads, not for block reads. We should be able to run at block speeds, so we're burning CPU heavily somewhere we don't need to.

Graph of run time against processes

At J=128, all the lines are trending up, so I stopped bothering to go any higher. Beyond the tiny cases, everything does best at J=32, so I'll largely concentrate on that from now on. Update: this is clearly not the right way of approaching it. Firstly, the fact that I'm not using the cores efficiently (shown by hitting neither maximum CPU use per process nor maximum I/O throughput from the disk system) means that I'm CPU bound, not I/O bound, so of course using all the cores will give me better results. Secondly, Sean O'Rourke showed that reading from 24 points in the file rather than 32 performed better, and suggested that fewer still would be an improvement. So I need to deal with the speed problems that are preventing my actually using the maximum I/O throughput, and then start looking at optimal J. (Which doesn't mean that the rest of this entry is useless; it's just only interesting from a CPU-bound point of view, ie: not Wide Finder.)

You'll see that we seem to be scaling better than linearly. The following graph shows that more clearly.

Graph of M/s against processes

The more we throw at it, the faster we process it. My bet is that ZFS takes a while to really get into the swing of things; the more you fire linear reads at it, the more it expects you to ask for more. At some point we'll stop gaining from that. Of course, that analysis is probably wrong given we shouldn't be I/O-bound at this point.

Graph of run time against data size

(Larger because it's fiddly to read otherwise.) Note that for each J line, from its inflection point upwards it's almost a straight line. It's not perfect, but we're not being hit by significant additional cost. None of this should really be a surprise: processing a single logline is effectively constant time, no matter how many workers we have. The things that change as we increase either J or the total number of lines processed are the number and size of the partial result sets that we have to collapse, and other admin; but those differences seem to be drowned out in the noise.

I'd say there's something I'm doing wrong, or vastly inefficiently, that's stopping us getting better use out of the cores on this machine. That's also more interesting than optimising the actual processing of the loglines.

Finally, the code. No commentary; this is really just Tim's version in Python, but using the parallel driver. I made the final sort stability independent of the underlying map implementation (Hash in Ruby, dict in Python), but that should be it; so far it's given me the same results, modulo the sorting change.

import re, sys, parallel

def top(dct, num=10):
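    # maintain a running list of up to num keys, kept sorted smallest count
    # first; 'last' is the smallest key currently retained, and only keys
    # that beat it get considered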
    keys = []
    last = None
    def sorter(k1,k2):
        if k2==None:
            return 1
        diff = cmp(dct[k1], dct[k2])
        if diff==0:
            return cmp(k2,k1)
        else:
            return diff
    for key in dct.keys():
        if sorter(key, last)>0:
            keys.append(key)
            keys.sort(sorter)
            if len(keys)>num:
                keys = keys[1:]
            last = keys[0]
    keys.reverse()
    return keys

hit_re = re.compile(r'^/ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+$')
ref_re = re.compile(r'^\"http://www.tbray.org/ongoing/')

def report(label, hash, shrink = False):
    print "Top %s:" % label
    if shrink:
        fmt = " %9.1fM: %s"
    else:
        fmt = " %10d: %s"
    for key in top(hash):
        if len(key) > 60:
            pkey = key[0:60] + "..."
        else:
            pkey = key
        if shrink:
            print fmt % (hash[key] / 1024.0 / 1024.0, pkey)
        else:
            print fmt % (hash[key], pkey)
    print

def processor(lines, driver):
    u_hits = driver.get_accumulator()
    u_bytes = driver.get_accumulator()
    s404s = driver.get_accumulator()
    clients = driver.get_accumulator()
    refs = driver.get_accumulator()

    def record(client, u, bytes, ref):
        u_bytes.acc(u, bytes)
        if hit_re.search(u):
            u_hits.acc(u, 1)
            clients.acc(client, 1)
            if ref !='"-"' and not ref_re.search(ref):
                refs.acc(ref[1:-1], 1) # lose the quotes

    for line in lines:
        f = line.split()
        if f[5]!='"GET':
            continue
        client, u, status, bytes, ref = f[0], f[6], f[8], f[9], f[10]
        # puts "u, #{u}, s, #{status}, b, #{bytes}, r, #{ref}"
        if status == '200':
            record(client, u, int(bytes), ref)
        elif status == '304':
            record(client, u, 0, ref)
        elif status == '404':
            s404s.acc(u, 1)
    return [u_hits, u_bytes, s404s, clients, refs]

(u_hits, u_bytes, s404s, clients, refs) = parallel.process(sys.argv[1], processor)

print "%i resources, %i 404s, %i clients\n" % (len(u_hits), len(s404s), len(clients))

report('URIs by hit', u_hits)
report('URIs by bytes', u_bytes, True)
report('404s', s404s)
report('client addresses', clients)
report('referrers', refs)

81 LOC, compared to Tim's 78, although that's disingenuous to an extent because of the parallel module, and in particular the code I put in there that's equivalent to Hash#default in Ruby (although that itself is only three lines). Note however that you can drive the entire thing linearly by replacing the parallel.process line with about three lines, provided you've got the accumulator class (19 LOC). In complexity, I'd say it's the same as Tim's, though.

Exporting emails from Outlook

Published at
Friday 6th June, 2008
Tagged as
  • Outlook
  • Email
  • Export
  • WIP

When I left Tangozebra last year, I had various folders of emails that I needed to take with me. I did what seemed to be the sensible thing of exporting them as Outlook .pst files, copied them onto a machine that was going with me, and thought no more about it.

Then, when I needed them, of course, I couldn't open them. I have Outlook 2002 on my machine at home, but these needed Outlook 2007. Fortunately, there's a demo version you can download and play with for 60 days - long enough to get the data off, but not long enough to just keep them all in Outlook. So I was looking for a way of exporting emails. Outlook actually has a way of doing this, although it's not really practical for the thousands of important emails I've accumulated over the years; the export feature isn't in the demo anyway, so it's somewhat moot.

I scrobbled around the internet for a bit, finally chancing upon a tutorial and sample script for exporting data from Outlook using Python. It uses the built-in email-as-text export feature of Outlook, which frankly is pretty unappealing, lacking as it does most of the headers, and in particular useful things like email addresses. Also, their script outputs emails as individual files, which again is unhelpful: I just want an mbox per folder.

So I wrote an Outlook email reaper. It's happily exported about 4G of emails, although it's a long way from perfect. See the page above for more details.
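The general shape is roughly this - a simplified sketch rather than the real thing, assuming pywin32 and a configured Outlook profile, and with far cruder header handling than a real exporter needs:

import mailbox, win32com.client

namespace = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

def reap(folder, path):
    box = mailbox.mbox(path + ".mbox")
    for item in folder.Items:
        try:
            msg = mailbox.mboxMessage()
            msg['From'] = item.SenderEmailAddress or "unknown@localhost"
            msg['Subject'] = item.Subject
            msg.set_payload(item.Body.encode('utf-8'))
            box.add(msg)
        except Exception:
            pass       # calendar entries and the like won't have these fields
    box.flush()
    for sub in folder.Folders:
        reap(sub, path + "." + sub.Name)

for folder in namespace.Folders:
    reap(folder, folder.Name)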

Claiming the evil namespace

Published at
Wednesday 4th June, 2008
Tagged as
  • Presentations
  • Evil
  • Laziness

One of the crazy ideas that came up at South By Southwest this year was the general application of evil to presentations. The idea is not entirely unlike Battledecks (but practical rather than entertaining), and the reasoning behind it is threefold.

It took a bit of time to get up and running, partly because I wanted to be absolutely scrupulous in how I was using other people's images: they must be public, and must be licensed appropriately. However I'm now happy to announce evilpresentation, a simple tool for creating presentations using the power of Flickr, random number generators, web monkeys, and so forth.

In the process of doing this, I of course had to 'claim' a machine tag namespace: evil: is for evil things. Currently, we just have evil:purpose= for the presentation system, but I'm sure someone will come up with some other evil uses in future. Evil is fun.

Mark Norman Francis and Gareth Rushgrove helped come up with the idea, or at least kept on ordering margaritas with me around; I can't remember which (see above, under margaritas).

iPlayer problems

Published at
Monday 26th May, 2008
Tagged as
  • URI design
  • BBC
  • Failure modes

I generally like the BBC's iPlayer; it's not great, but it seems to work. However today I decided I'd watch "Have I Got News For You", based on Paul's accidental involvement. Two little problems.

Firstly, the hostname iplayer.bbc.co.uk doesn't exist. Google has made me expect that this stuff should just work; that's not a huge problem, though, because Google itself told me where the thing actually was. However, having iPlayer on a separate hostname would be a really smart idea, because iPlayer requires JavaScript: using a different hostname plays nicely with the Firefox NoScript plugin, and that just strikes me as a good idea.

The real problem came when I searched on the iPlayer site. Search for "have i got news for you", and you get a couple of results back. Click on the one you want, and you get sent to http://www.bbc.co.uk/iplayer/page/item/b00bdp78.shtml?q=have+i+got+news+for+you&start=1&scope=iplayersearch&go=Find+Programmes&version_pid=b00bdp5g, which turned out to be a 404 page. They try to make the error page helpful, but since this is a link generated by their own site, it doesn't help me very much.

So I thought "that's annoying", and was close to firing up a BitTorrent client instead when I wondered if their URI parser was decoding the query string and then getting confused by all the +-encoded spaces. http://www.bbc.co.uk/iplayer/page/item/b00bdp78.shtml?q=yousuck&start=1&scope=iplayersearch&go=Find+Programmes&version_pid=b00bdp5g, for instance, worked perfectly. (Which doesn't help you very much, as iPlayer URIs seem to be session-restricted or something.)

Google, the Fast Follower

Published at
Sunday 18th May, 2008
Tagged as
  • Hack
  • Google
  • Get real, please

Wow! It’s amazing! You can use Google spreadsheets to calculate stuff! Thank heavens we have Matt Cutts and Google App Hacks to teach us stuff that Excel users have been doing for two decades.

Okay, Google: it’s time to wake up now.

URI Posterity

Published at
Monday 7th April, 2008
Tagged as
  • Idiocy
  • Posterity

So I'm having trouble writing a widescreen DVD; I suspect what I actually need to do is upgrade to the all-singing, all-dancing Adobe CS3 Production Premium, which includes Encore and should be able to do everything I want and more. (I don't want much. Honestly.) Before paying lots of money though, I did the "sensible thing" and tried various things I didn't have to pay for, either because they're free or because they're already on my computer.

In the process of doing this, I fired up something that came with one of my DVD writers, probably in the last twelve months. I got the following error box:

Screenshot of error message saying that some unintelligibly long URI cannot be loaded

This is a perfect example of why URI design is important. Had the program been looking for a URI like http://liveupdate.cyberlink.com/product/PowerProducer;version=3.2, say, then there's a reasonable chance that URI would still have worked. That hostname doesn't provide a website (although the root URI returns an HTML document typed as application/octet-stream), just a service. Make the URI simple and predictable, and you won't have this problem.

Of course, you should also catch errors. And present them usefully ("I cannot check for updates at this time - perhaps this version is too old and no longer supported?"). But hey, that's experience design, which is nothing to do with the point of this post. (And, it seems, nothing to do with the creation of Cyberlink's PowerProducer product.)

In a telling coda, the URI for the PowerProducer page doesn't really look like it'll last that long either: http://www.cyberlink.com/multi/products/main_3_ENU.html. Sigh.

PNG Weirdness

Published at
Monday 7th April, 2008
Tagged as
  • PNG
  • Compression
  • Doesn't that just make you go 'Oooh'?

So for my previous entry I had to create an image. Screenshot, paste into Photoshop, save as PNG. Done. Now a thumbnail: save-for-web, 50% scale, PNG. Done.

Erm.

-rw-rw-r-- 1 james james 16K 2008-04-07 13:42 error-message-large.png
-rw-rw-r-- 1 james james 33K 2008-04-07 13:43 error-message-thumb.png

Something's not quite right here: the half-sized thumbnail is taking up twice the space.

error-message-large.png: PNG image data, 1001 x 126, 8-bit/color RGB, non-interlaced
error-message-thumb.png: PNG image data, 500 x 63, 8-bit/color RGBA, non-interlaced

Okay, so maybe it's the colour space - RGBA is storing more data. Not, you know, twice as much data, but 32 bit rather than 24 bit is going to hurt in some way. So I go back in, resize the image in Photoshop, and save as PNG using the same options as for the large one.

-rw-rw-r-- 1 james james 26K 2008-04-07 13:45 error-message-small.png
error-message-small.png: PNG image data, 500 x 63, 8-bit/color RGB, non-interlaced

Okay, so reducing the colour space to 24 bit does what we expect: reduces the filesize by a quarter. It still doesn't explain why the original is so much smaller. Okay, well Photoshop also comes with ImageReady, so perhaps that can help.

Input           Dimensions   Input size   Optimised size
large           1001x126     16K          15.29K
thumb           500x63       32.1K        32.12K
small           500x63       25.7K        25.36K
large resized   500x63       ~4K          29.04K

The last row is the interesting one: when resizing in ImageReady, it generates a 4K image... then optimises it to 29K.

For what it's worth, GIMP doesn't do any better. Anyone have any idea what's going on, or is it just pixies?
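If anyone wants to poke at it outside Photoshop, something like this (a sketch using PIL, with the filenames from the listing above) reproduces the resize step with a forced 24-bit output, so its size can be compared against the Photoshop and ImageReady numbers:

from PIL import Image

img = Image.open("error-message-large.png")
half = (img.size[0] // 2, img.size[1] // 2)
thumb = img.resize(half, Image.ANTIALIAS)
# force 24-bit RGB (no alpha) and let PIL's optimiser have a go
thumb.convert("RGB").save("error-message-pil.png", optimize=True)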

Idiots

Published at
Friday 28th March, 2008
Tagged as
  • Idiocy
  • Arrogance

I'll make this quick. Could the idiots in my life please leave?

Okay, so I'm stunningly arrogant, but I really don't like it when my imagination outstrips reality to such a staggering degree. It's one thing to dream of flying cars, but quite another to think of things that are both technologically and economically viable and still wake up and discover they don't exist. Idiots: get to it. In the meantime I have to replace your shit with stuff that works, or find some way of chilling out. Neither should be necessary.

Advertising tech

Published at
Monday 17th March, 2008
Tagged as
  • Google
  • Advertising
  • Technology

John Battelle makes a good point about (a) chasing Google and (b) the key to actually getting somewhere in the advertising market. Of course, this could be considered to be exactly the same as saying "it’s all very well coming up with funky new technology, but does it actually solve a real problem?". For various reasons I’m convinced there is still a huge amount of potential for new tech to offer value to the advertising industry. However I suspect it’s all in the hidden layer behind the scenes (back office, basically): the technology we have for delivering adverts right now is so far from being used to its full potential that it does seem a little crazy to be trying to build yet more of it.
