README (text version)
I bet you recognise the following situation. You have some paper material you need to scan because you want to copy it, mail it, or something like that. It could be papers from school, sheet music, part of a book, a whole book, etc. Whatever it is, there's a lot of it. So you take a deep breath and prepare yourself for a couple of extremely dull hours behind your PC.
While you're in this infinite prescan / select / scan / save loop, you think ahead to stage two. Maybe, if you're a bit of a multitasking person, you're even working on stage two already. Which is, of course, rotating all scanned material upright. Yes, this is important. No matter how small the angle, it will be distracting.
Then, when you are finally done, what you have is a large (huge) pile of images, probably named in some clever way so they can be viewed in alphabetical order. None of it is as sharp as the original material, but it will have to do. If you want to copy it you're only halfway, and you still have the painful task of printing each individual file ahead of you. If, on the other hand, you want to save it or mail it, you probably want to put it in a compressed archive, just to keep it all together. Of course it will have to be extracted to be viewed again, but it's the only option you have. You think.
To summarize, what you'd really want is this:
Now that you clearly remember this long-cherished wish, here it is: a dream come true…
Actually, this is only the third item of the short wishlist. The other two have existed for some time, but they clearly needed point three to be really useful. Let me explain by repeating the above situation, the better way.
This one is easy. The key here is to forget about scan regions, so you can do without a GUI. SANE comes with an excellent command-line frontend, scanimage, which puts you in control of all your scanner's options. Just take a minute to figure out the command-line arguments you need and enter a simple bash loop like this:
for i in $(seq -w 999); do
    read -p 'press enter to continue'
    scanimage --mode Gray --resolution 300 > "page$i.pnm"
done
Break out of it with control-c. This way you'll scan all that paper in no time.
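If you'd rather not end the session with control-c, the same loop can be wrapped in a small function that stops when you type q. This is only a sketch: the name scan_pages and its arguments are made up for this example, and the scanimage options are simply the ones from the loop above, so adjust them to your scanner.

```shell
# scan_pages PREFIX [COMMAND...]
# Hypothetical helper, not part of SANE. Scans one page per Enter press
# into PREFIX001.pnm, PREFIX002.pnm, ... and stops on "q" or end of input.
# COMMAND defaults to the scanimage call from the loop above.
scan_pages() {
    local prefix=$1 i answer
    shift
    [ $# -gt 0 ] || set -- scanimage --mode Gray --resolution 300
    for i in $(seq -w 999); do
        printf 'page %s: Enter to scan, q to quit ' "$i" >&2
        read -r answer || break
        [ "$answer" = q ] && break
        # Remove the half-written file if the scan command fails.
        "$@" > "$prefix$i.pnm" || { rm -f "$prefix$i.pnm"; return 1; }
    done
}
```

Call it as "scan_pages page" to get the same page001.pnm, page002.pnm, ... as before. Any other command that writes an image to standard output can be passed after the prefix.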
For this use a bitmap tracer like autotrace or potrace, whichever you like. These programs transform a bitmap image (like the ones you just scanned) into a smooth and sharp postscript (eps) image. Personally I prefer the latter (don't let that influence you) so I'll continue this example with potrace. It takes a single command to turn all these scanned images into postscript. However, I strongly advise you to first preprocess these images with mkbitmap, a highpass filter that comes with potrace. This will significantly improve the final postscript outcome. So, two commands:
mkbitmap *.pnm
potrace *.pbm
Easy, right?
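If you want the run to stop at the first page that fails, the two commands can also be applied file by file. A minimal sketch: trace_pages is a name I made up, and it assumes the scans end in .pnm as in the scanning step. The -o option of mkbitmap and the --eps and -o options of potrace only spell out the output names and format explicitly.

```shell
# trace_pages FILE.pnm...
# Hypothetical helper: runs mkbitmap and then potrace on each scan,
# producing FILE.pbm and FILE.eps next to it. Stops at the first failure.
trace_pages() {
    local pnm base
    for pnm in "$@"; do
        base=${pnm%.pnm}
        mkbitmap -o "$base.pbm" "$pnm" || return 1
        potrace --eps -o "$base.eps" "$base.pbm" || return 1
    done
}
```

Used as "trace_pages page*.pnm", this should leave one eps file per scanned page.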
At this point you're left with a large bunch of eps files. The images may be sharp but they're not rotated, they all have a different placement and they are not joined in a single file. So that's where epsjoin comes in. It's a gtk2 / gnome app that builds a single postscript file from all these loose eps files by putting each on its own page. Each eps can be individually rotated and translated to correct scanning imperfections. As this happens at postscript level it doesn't involve raster operations, which means no loss of quality no matter how many rotations you need to get it perfect. The resulting postscript can of course be directly viewed or printed. Even better, it can be opened again to add, remove, reorder or modify images.
So much for the introduction. I'll go into detail now.
With epsjoin you can create a postscript document from a collection of eps images. Each image is put on a separate page, individually rotated and translated. The translating procedure is a bit unconventional, designed to make it very easy to position the individual images in a consistent way throughout the document. Each image has its own coordinate system (the local system) which can be translated and rotated. The document page also has a coordinate system (the global system) that can be translated and scaled. The final placement of each eps image is determined by making the two coordinate systems coincide. Both are controlled from epsjoin's preview window, which can be either in local or in global mode. The mode is switched with the spacebar.
In local mode the image's coordinate system is represented by two semi-transparent perpendicular lines. The origin of the coordinate system is the intersection of these lines. It is up to you to choose a useful origin that can be found on all images. An example of such a point that may be found on each page is the page number. The origin is set by moving the two perpendicular lines with the mouse, as follows:
Another origin that can often be used is the intersection of the vertical line along the left margin and the horizontal line through the page number. This is a more useful point than simply the page number because this way the angle is automatically set as well.
When the origins are set, use the spacebar to switch to global mode. A semi-transparent rectangle will replace the two lines to mark the page boundaries. In this mode the global image scale and the location of the chosen origin on the page are set, again using the mouse:
The second mouse button lets you modify the local origin while in global mode, which can be useful when the origin point is not found on a certain image. In that case (for instance a title page) the origin can be set based on the page boundaries. Cycle through the images to confirm that they all have the correct placement. Now all you have to do is save the document and you're done.
Some notes about the other (initial) window. The menubar controls the postscript document (open / save / export). The four buttons at the bottom control the eps images (add / remove / reorder). If no image is selected, new images are appended at the end; otherwise they are inserted before the selected image. To deselect an image, just close the preview window.
A small two-page postscript demo created with epsjoin can be downloaded below. You'll see the scanning imperfections are a bit exaggerated. The screenshots of epsjoin in action come from this same demo. They show the process described above, using the left margin / page number as origin.
The following is optional reading, for the interested.
The created postscript document is very simple. In the document prolog
a single TRANSFORM function is defined with the three parameters
that define the global coordinate system: scale s, horizontal
translation u and vertical translation v. These parameters are
read from this function declaration when opening the file. If it is
not found in the document's prolog section (ended with %%EndProlog)
you'll get a warning and the global parameters are not changed.
/TRANSFORM { <s> <u> <v> translate dup scale
neg rotate neg exch neg exch translate } bind def
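To peek at the global parameters of a saved document without opening it, something like the following should do. It is a rough sketch, not what epsjoin itself runs: read_global_params is a made-up name, and the pattern assumes the definition is laid out exactly as shown above, with numbers in place of <s>, <u> and <v>.

```shell
# read_global_params FILE.ps
# Hypothetical helper: prints "s u v" taken from the TRANSFORM definition
# in the prolog. Stops reading at %%EndProlog and warns if nothing is found.
read_global_params() {
    local params
    params=$(sed -n \
        -e '/^%%EndProlog/q' \
        -e 's|^/TRANSFORM { *\([^ ]*\) \([^ ]*\) \([^ ]*\) translate dup scale.*|\1 \2 \3|p' \
        "$1") || return 1
    if [ -z "$params" ]; then
        echo "warning: no TRANSFORM definition in the prolog of $1" >&2
        return 1
    fi
    printf '%s\n' "$params"
}
```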
The function modifies the postscript transformation matrix according
to the three parameters that define the image's local coordinate
system: horizontal translation x, vertical translation y and angle
a. Every included eps image is preceded by a call to this function,
which itself is preceded by gsave to save the current matrix.
<x> <y> <a> TRANSFORM
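Read as a formula, this call places a point (p_x, p_y) of the eps at page position (X, Y) as written below. This follows from the usual postscript rule that the operator applied to the matrix last acts on the drawing first, with the angle a in degrees as rotate expects. The names p_x, p_y, X and Y are mine; the others are the parameters described above.

```latex
\begin{pmatrix} X \\ Y \end{pmatrix}
=
\begin{pmatrix} u \\ v \end{pmatrix}
+ s
\begin{pmatrix} \cos a & \sin a \\ -\sin a & \cos a \end{pmatrix}
\begin{pmatrix} p_x - x \\ p_y - y \end{pmatrix}
```

Filling in (p_x, p_y) = (x, y) gives (X, Y) = (u, v): the local origin lands exactly on the global one, which is the "coinciding" of the two coordinate systems.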
After the eps is drawn the matrix is restored with grestore and the
page is shown with showpage. The eps code is enclosed by
%%BeginDocument and %%EndDocument. This is the standard way of
including (eps) documents, which means that many postscript documents
created by other programs (such as dvips) can be opened in epsjoin as
well. Of course they will not contain the TRANSFORM call, so you will
have to work your way through some warning messages, but it works. If
a TRANSFORM line is missing, the three individual parameters are
left at their defaults.
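Put together, the part of the document that belongs to one page could be written out by a few lines of shell. This is a sketch based only on the description above: emit_page is a made-up name, and the exact order and spelling of the lines in a real epsjoin file may differ (in particular the name after %%BeginDocument: is my assumption).

```shell
# emit_page X Y A FILE.eps
# Hypothetical helper: prints the postscript for one page, i.e. the eps
# wrapped in gsave / TRANSFORM ... grestore / showpage as described above.
emit_page() {
    local x=$1 y=$2 a=$3 eps=$4
    printf 'gsave\n%s %s %s TRANSFORM\n' "$x" "$y" "$a"
    printf '%%%%BeginDocument: %s\n' "$eps"
    cat "$eps" || return 1
    printf '%%%%EndDocument\ngrestore\nshowpage\n'
}
```

A complete document would still need the header and the prolog with the TRANSFORM definition in front of these pages.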
I used to state here that there was a problem with eps files created by jpeg2eps. This was never the case. I had jpeg2eps mixed up with another program, jpeg2ps, which was in Debian at that time but doesn't seem to be anymore. Moreover, there seem to be several different programs named "jpeg2ps", so it wouldn't be fair to say there is a problem with that either. Just keep in mind that not all eps files work the way they should. I'm pretty sure the problem lies with the eps code in such cases, simply because the described postscript format is too simple to cause trouble. Still, I don't know postscript well enough to be certain.
I have tried jpeg2eps and it works just fine. In the latest epsjoin version the preview window supports color and greyscale images to better handle these eps files. This makes jpeg2eps a valid alternative to bitmap tracers such as potrace and autotrace. Referring to the introduction, this approach skips step two (have it just as clean and sharp as the original). In return you'll usually get a smaller document and better support for images (as opposed to text). What remains, of course, is an easy way to rotate and scale the separate images and join them.