jLuger.de - Screen scraping images with PhantomJS

PhantomJS is a headless, WebKit-based browser, which makes it a great tool for unit testing with a real-world browser engine. It can even take screenshots of the pages it renders. A screenshot is nice for seeing that there is an error, but access to the DOM is far more helpful. That is no problem, since PhantomJS has an API for saving content to disk. While you are at it, you may also want to save the page's images so that the copy is complete. That shouldn't be a big deal: the images have already been downloaded and are displayed in the page. Well, it turned out to be a big deal.

First I learned that there is no direct way to get the image data from an img element. You need to create a canvas, draw the image into it, and then read the image data from the canvas. See this Stack Overflow question. But this is only the first half of the job.

The next thing is that you are working in two contexts. The first is PhantomJS itself: your script normally runs there, and there you can save data to disk. The second is the page. You enter it by calling page.evaluate with a function as the argument. That function runs inside the page, so it can access the page's DOM but cannot write to disk. You therefore have to obtain the image data in the second context and write it to a file in the first.
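
To make the boundary between the two contexts a bit more concrete, here is a minimal sketch of a function that is meant to be handed to page.evaluate. The name describeImages and the 'img' selector are made up for illustration. As far as the PhantomJS documentation describes it, whatever goes into or comes out of page.evaluate has to be simple, JSON-serializable data, so the sketch copies plain values out of the DOM nodes instead of returning the nodes themselves.

```javascript
// Hypothetical helper, intended to run inside the page (second context).
// It may touch the DOM, but it returns only plain, JSON-serializable
// values, because DOM nodes cannot cross into the PhantomJS context.
function describeImages() {
    var images = document.querySelectorAll('img');
    var result = [];
    for (var i = 0; i < images.length; i++) {
        // Copy simple values out of each node.
        result.push({
            src: images[i].src,
            width: images[i].width,
            height: images[i].height
        });
    }
    return result;
}

// Assumed usage in a PhantomJS script where `page` is already loaded:
//   var info = page.evaluate(describeImages);
//   info.forEach(function (img) { console.log(img.src); });
```

The function does not use any variables from the surrounding script. That is deliberate: it is executed in the page, where the variables of the PhantomJS script are not visible.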

Some code as an example:
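var fs = require('fs'); // PhantomJS file system module
// Second context: runs inside the page and can reach the DOM.
var myData = page.evaluate(function () {
    var result = {data: undefined, error: undefined};
    var image = document.querySelector('table table td a img');
    if (image === null) {
        result.error = "Image not found";
        return result;
    }
    // Draw the image into a canvas to get at its data.
    var canvas = document.createElement('canvas');
    canvas.width = image.width;
    canvas.height = image.height;
    var context = canvas.getContext('2d');
    context.drawImage(image, 0, 0);
    result.data = canvas.toDataURL("image/png");
    return result;
});
// First context: back in PhantomJS, where writing to disk is possible.
if (myData.error) {
    console.log(myData.error);
    phantom.exit(1);
} else {
    var file = "test.png";
    // Skip the 22-character prefix "data:image/png;base64," and decode the rest.
    fs.write(file, atob(myData.data.substring(22)), "wb");
}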
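
The substring(22) call relies on the data URL starting with exactly "data:image/png;base64,". A minimal sketch of a more defensive variant, assuming a hypothetical helper named splitDataUrl that is not part of PhantomJS, could look for the ";base64," marker instead of counting characters:

```javascript
// Hypothetical helper: split a base64 data URL into its MIME type and
// its base64 payload. Returns null if the input is not a base64 data URL.
function splitDataUrl(dataUrl) {
    var marker = ';base64,';
    var pos = dataUrl.indexOf(marker);
    if (dataUrl.indexOf('data:') !== 0 || pos === -1) {
        return null;
    }
    return {
        // Everything between "data:" and the marker, e.g. "image/png".
        mimeType: dataUrl.substring(5, pos),
        // Everything after the marker is the encoded image data.
        base64: dataUrl.substring(pos + marker.length)
    };
}
```

In the PhantomJS context the result could then be decoded with atob(parts.base64) before it is written to disk. A null result is worth checking for: a canvas with a width or height of zero yields the data URL "data:,", which has no base64 payload at all.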