2007-02-04, 18:38 - General
Having moved from flickr to SmugMug I was faced with the prospect of moving all my flickr photos across. A quick look at flickr reveals there is no easy way to get your photos back (not even a "Zip & Download" for galleries), so rather than spend a weekend clicking picture by picture I decided to move my stuff across in bulk, programmatically. Here's how I did it ...
General Approach
I wanted to preserve the titles of each of the pictures, but I was happy to dump everything from flickr into a single 'import' gallery on SmugMug and then use SmugMug's civilised organising features to get everything into the galleries I wanted.
Looking at flickr's Terms of Use (which also points out that the terms change when moving to a Yahoo! Id) I notice that users "must not modify, adapt or hack Flickr.com" -- so thoughts of a grand adapting web app for flickr (with a 'copy to SmugMug' button) went by the wayside, and instead I opted for a command-line application.
As a programming language I chose Groovy, partly because learning a new programming language on the fly makes things more interesting.
Preliminaries
Groovy runs on the Java platform, but a couple of third-party libraries are helpful to keep things nice: the Jakarta Commons HttpClient provides an excellent higher-level abstraction for dealing with the network interaction, and Elliotte Rusty Harold's XOM is a lovely library for dealing with XML without too much syntactic cruft.
Logging In
First up, we need to get a couple of sessions going: one with flickr and one with SmugMug. Here's some code (I'm not posting the whole thing -- but am happy to share if anyone's interested. Oh, and I hope the variable names make it 'self documenting').
First, to log into flickr:
// Login to flickr by POSTing to the "old skool" form target
def post = new PostMethod( flickrLoginUrl )
post.addParameter( "email", flickrUsername )
post.addParameter( "password", flickrPassword )
def status = flickrClient.executeMethod( post )
println "Flickr login, status=$status"
And then, to SmugMug ...
// Login to SmugMug over SSL using their REST API
// Trailing + signs stop Groovy ending the statement at each line break
def smLoginUrl = this.smugMugApiUrlStub +
"method=smugmug.login.withPassword" +
"&APIKey=" + smugMugApiKey +
"&EmailAddress=" + smugMugUsername +
"&Password=" + smugMugPassword
def get = new GetMethod( smLoginUrl )
smClient.executeMethod( get )
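One thing the concatenation above glosses over is that an e-mail address or password containing characters such as "&", "=" or a space would corrupt the query string. Here is a minimal sketch of percent-encoding each parameter in plain Java (which Groovy can call directly). The class name, the helper name and the sample values are all made up for illustration; only the parameter names come from the snippet above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original script.
class QueryStringSketch {

    // Joins name/value pairs with '&', percent-encoding every name and value.
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("method", "smugmug.login.withPassword");
        params.put("EmailAddress", "me@example.com"); // placeholder value
        params.put("Password", "p&ss w=rd");          // placeholder value
        System.out.println(buildQuery(params));
    }
}
```

The result can be appended to the API URL stub in place of the hand-built string.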
It's interesting to note the difference of approach here. When using flickr, it appears the username and password are sent as plain text - yikes! SmugMug's REST API, OTOH, is exposed through an SSL layer.
Also note, that in best REST fashion, the login GET to SmugMug responds with an XML document. We can then use XOM to extract the SessionID from this response -- we'll need it later to interact with SmugMug.
// Build a XOM XML document from the returned bytes
def doc = new Builder( false ).
build( new ByteArrayInputStream( get.responseBody ) )
get.releaseConnection()
// Use XPath to get the SM SessionID
sessionId = doc.query( "/rsp/SessionID" ).get( 0 ).value
println "SmugMug session ID is " + sessionId
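If XOM isn't to hand, the same extraction can be sketched with nothing but the XML and XPath classes that ship with the JDK. The class name and the sample response below are invented for illustration; the only thing taken from the code above is the "/rsp/SessionID" path.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original script.
class SessionIdSketch {

    // Parses the response bytes and returns the text of /rsp/SessionID,
    // or an empty string if the element is absent.
    static String extractSessionId(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("/rsp/SessionID", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        // Made-up response body shaped to match the XPath used in the post.
        String sample = "<rsp><SessionID>abc123</SessionID></rsp>";
        System.out.println(extractSessionId(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```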
Enumerating the flickr photos
The approach here is to enumerate the sets, and then for each set to enumerate the photos. From the "Your sets" page, I was hoping to get hold of the XML and pull out the URLs for each of the sets' own pages.
It was here I got a nasty shock - the flickr pages aren't XHTML. They're not even (according to the W3C Markup Validation Service) valid HTML!
So, time to observe, and scrape with a regexp:
/*
* Given the bytes of a flickr "sets" page, this
* iterates over each set ...
*/
static void processAllSets( byte[] page )
{
String s = new String( page )
def pat = /a class="Seta" href="([^"]+)" title="([^"]+)"/
def matcher = ( s =~ pat )
println "Found $matcher.count sets"
for( index in 0 ..< matcher.count ) // half-open range: no iterations when nothing matched
{
def setUrl = matcher[ index ][ 1 ]
def setTitle = matcher[ index ][ 2 ]
println "Got set at $setUrl, entitled '$setTitle'. Processing ..."
processSet( setUrl, setTitle )
}
}
Note Groovy's nice syntax for handling regexps.
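For comparison, here is roughly what the same scrape looks like in plain Java, where the pattern has to be written as an escaped string and the matches are walked with find(). This is only a sketch: the class name and the HTML fragment in main are invented, and the pattern is simply copied from the Groovy above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original script.
class SetLinkScraper {

    // Same pattern as the Groovy version: group 1 is the href, group 2 the title.
    static final Pattern SET_LINK =
            Pattern.compile("a class=\"Seta\" href=\"([^\"]+)\" title=\"([^\"]+)\"");

    // Returns one {url, title} pair per matching link, in page order.
    static List<String[]> findSets(String html) {
        List<String[]> sets = new ArrayList<>();
        Matcher m = SET_LINK.matcher(html);
        while (m.find()) {
            sets.add(new String[] { m.group(1), m.group(2) });
        }
        return sets;
    }

    public static void main(String[] args) {
        // Made-up fragment in the shape the pattern expects.
        String html = "<a class=\"Seta\" href=\"/photos/me/sets/1/\" title=\"Holiday\">";
        for (String[] set : findSets(html)) {
            System.out.println(set[0] + " -> " + set[1]);
        }
    }
}
```

A while loop over find() also copes naturally with a page that contains no sets at all.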
Now we've got a URL for each set we can (using a regexp again) pick out the title and thumbnail URL of every photo on that set's page:
/*
* Given a flickr set, this iterates over each of the photos in it
*/
static void processSet( String setUrl, String setTitle )
{
def get = new GetMethod( "http://flickr.com" + setUrl )
def status = flickrClient.executeMethod( get )
println "Got setUrl, status $status"
String s = new String( get.responseBody )
// get the photo titles and thumbnail URLs
def pat = /title="([^"]+)" class="thumb_link"[^>]+><img src="([^"]+)/
def matcher = ( s =~ pat )
println "Found $matcher.count photos in set"
for( i in 0 ..< matcher.count ) // half-open range: no iterations for an empty set
{
def photoTitle = matcher[ i ][ 1 ]
def photoThumbUrl = matcher[ i ][ 2 ]
println "Got thumbnail at $photoThumbUrl, for photo entitled '$photoTitle'. Processing ..."
processPhoto( photoThumbUrl, photoTitle )
}
}
With this we'll end up with the URL of each of our photo thumbnails, and its caption. But it's not the thumbnails we want to upload to SmugMug; it's our full-size original pictures. Luckily, flickr appears to follow a naming convention so that by changing the "_s.jpg" to "_o.jpg" in our URLs, we can synthesise the URL of the original photo.
/*
* Given a flickr photo, this copies it to SmugMug
*/
static void processPhoto( String photoThumbUrl, String photoTitle )
{
def photoUrl = photoThumbUrl.replace( "_s.jpg", "_o.jpg" )
def get = new GetMethod( photoUrl )
def status = this.flickrClient.executeMethod( get )
def byte[] raw = get.responseBody
This gives us (in the byte array called raw) our original image data. Next we have to generate an MD5 checksum, as this is required by the SmugMug upload mechanism. It's here that having Java on tap comes in very handy ...
def md = MessageDigest.getInstance( "MD5" )
md.update( raw )
def digestBytes = md.digest()
def checksum = ""
for( index in 0 .. digestBytes.length - 1 )
{
checksum += Integer.
toString( ( digestBytes[ index ] & 0xff ) + 0x100, 16 ).
substring( 1 )
}
println "Content length is: "+ raw.length
println "MD5 is: " + checksum
println "Starting POST ..."
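The checksum step can also be pulled out into a small standalone helper. The sketch below is plain Java and uses String.format for the hex conversion instead of the add-0x100-and-substring trick; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original script.
class Md5HexSketch {

    // Returns the MD5 digest of the data as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes do not print as ffffffxx.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```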
Finally, we're ready to do the upload. Again, the REST way allows us to do this by POSTing our raw image data to SmugMug with the headers correctly set:
def put = new PostMethod( smugMugUploadUrl )
put.addRequestHeader( "Content-Length", "" + raw.length )
put.addRequestHeader( "Content-MD5", checksum )
put.addRequestHeader( "X-Smug-SessionID", this.sessionId )
put.addRequestHeader( "X-Smug-Version", "1.1.1" )
put.addRequestHeader( "X-Smug-ResponseType", "REST" )
put.addRequestHeader( "X-Smug-AlbumID", this.smugMugUploadGalleryId )
put.addRequestHeader( "X-Smug-Caption", photoTitle )
put.setRequestBody( new ByteArrayInputStream( raw ) )
smClient.executeMethod( put )
println "POST complete"
Et voilà! Automatic transfer of images from flickr to SmugMug.
The transfer process takes a while, as every image needs to get downloaded to, and then uploaded from, the client machine. It would be nice if SmugMug allowed pictures to be uploaded by URL (thereby bypassing the need to route the data through client machines with their measly domestic bandwidth) -- but maybe this opens up too much opportunity for abuse.
Conclusions
This does the trick, but for flickr users with larger collections (multi-page sets, etc), some more code will be required. It might well be worth investigating flickr's own API for a more robust approach ...
In general though, I wish flickr had provided a better way for getting photos back in bulk - it would have made life a lot easier.

2007-02-04, 08:23 - General
flickr has told all its users that they will shortly need a Yahoo! Id in order to access the service. I've had Yahoo! Ids in my time, and my memory of them is not good. What's more, having forked out for a flickr Pro account a while back, this unilateral change of flickr's terms of service leaves a bad taste.
So, over to SmugMug. That's better...
The challenge now is whether it's possible to automate the transfer of my galleries from flickr to SmugMug. SmugMug exposes a REST API for uploading, but the flickr side exposes nothing official for downloading. Hmmm - with a bit of screen scraping and jiggery-pokery, I wonder whether this is possible ...
Away from the screen, I took my daughter to a party at Wicken Fen and the light was just amazing, instantly making me regret that I had not brought my camera. So, remembering Ken Rockwell's advice that the camera doesn't matter, I got some snaps with a mobile phone. Here's one hosted (as it happens) on SmugMug. As to the equipment not mattering, hmmm -

2006-08-06, 23:10 - General
My arrival in Montreal was somewhat marred by the fact that British Airways managed to lose my luggage. Which on a direct flight from Heathrow takes some doing. Fulminating at the BA desk elicited a £35 ex-gratia payment, but with no fresh clothes, or sponge bag contents, the beginning of this trip is somewhat grungy.
Despite this, excellent progress was made at today's WG1 meeting, where Jeni Tennison was able to attend to go through the latest draft of DTLL. With all substantive points now settled, all that remains is for me to prepare a revised document and we are on track for a final candidate draft (FCD) text.
There are two significant changes to DTLL as compared to the last draft.
The first is that the 1:1 relationship between a DTLL document and a Namespace (for its declared datatypes) has been relaxed and brought into line with RELAX NG's more liberal approach. DTLL instances will now be able to declare a bunch of datatypes from different Namespaces.
The second is that when parsing values using regular expressions, DTLL processors no longer build a mini XML document behind the scenes, but instead merely a set of bound variables. This should make implementation somewhat simpler (though, having already done the work on this I felt - perhaps rather unreasonably - that this was a feature worth preserving).
During the lunch break I made a quick visit to a department store for fresh sets of clothes and toiletries, since the online baggage tracker revealed that my suitcase was still 'being traced'.
After lunch we discussed DSDL Parts 8, 7 and 9, and our view is that all of these texts will now be nearing their final form in or before September 2006. So it looks likely a January WG1 meeting will be necessary to resolve ballot comments received and move them towards the final stages of their standards status.
2006-07-22, 14:01 - General
The Propaganda Tiles are a number of exotic abstract graphics that can be seamlessly tiled to form backgrounds. I am particularly keen on using them for desktop backgrounds to avoid plain solid desktops (too boring) and photos of one's children (a bit naff). Mind you, some of the Propaganda Tiles are somewhat psychedelic, so careful choice is required. The tiles were originally created by Bowie J Poag and released under the GNU GPL. But finding them on the Web has been difficult of late.
But to my joy, I have discovered a mirror of the complete set here.
One of the odder things about them is the peculiarly evocative names of the tiles, such as plastic-dinner-plate-1.jpg and lowdown-popcorn-1.jpg.
Me? I've just chosen the-alias-line-1.jpg. Smart.
2006-07-09, 10:16 - General
While browsing my favourite newsgroup, rec.music.classical.recordings, this morning I came across a link to a YouTube video of Arturo Benedetti Michelangeli playing Chopin's 1st Ballade. The combination of my favourite Chopin composition and a pianist whose performances are always interesting proved irresistible, especially since the recording through which I've got to know the piece, Artur Rubinstein's on RCA, has – though fabulous – made me think that this is music that is, as Artur Schnabel put it, ‘better than it could be performed’.

Rubinstein's late 1950s Chopin studio recordings of the Scherzos and Ballades
And the video was tantalising, with Michelangeli showing all the executive perfection, good taste and narrative sense that is expected of him. The one disappointment was the sound quality (obviously one doesn't expect high fidelity through YouTube) and so I was prompted to look to see if I could buy a CD with his performance in decent sound.
Some googling revealed that there was a modern recording in print from DGG, but on looking at the CD cover I was surprised to see this was a recording I already owned. Somehow I had overlooked the existence of the Ballade recording on this disc! (maybe because I bought it principally to hear Michelangeli's Mazurka performances).

Michelangeli's 1972 studio Chopin recital disc
The CD recording is of very high fidelity (though a little dry) and presents a similarly conceived – though a little less austere – performance in fuller tonal splendour. This is a first Ballade to rank in my affections alongside that of my Rubinstein disc.
Somehow though, I feel the perfect 1st Ballade recording is out there somewhere; though equally part of me knows searching for it is fruitless.
Hmmm, I wonder what Sviatoslav Richter's recordings of it are like …