| 2011/12/31 |
phira | howdy, we're doing server-side conversion of pdfs into images (jpeg), we're using gs right now. Are there other options that are likely to be significantly faster? mupdf? | 01:04.13 |
ray_laptop | phira: The startup overhead time of Ghostscript can be avoided by using GS in a 'server' mode where you set the output file name and then tell GS to interpret an input PDF | 01:41.55 |
phira | nods | 01:42.38 |
| thanks for that :) | 01:42.51 |
ray_laptop | phira: for multiple page files this doesn't help much, but for usage where the PDFs only have one or two pages, it can be significant | 01:43.22 |
phira | we have mostly multiple pages sadly | 01:43.44 |
| but still, good to know. | 01:43.52 |
ray_laptop | phira: OK | 01:44.04 |
| probably not much you can do. GS and MuPDF are similar, but can differ in performance for some types of files. If your files are all similar, then benchmark them. | 01:45.51 |
| phira: BTW, if you discover any files that are particularly slow (or memory intensive) with one or the other, we'd like to know (and get the file to test with) | 01:47.11 |
phira | ok, we will do | 01:47.38 |
| do you know of any online saas style services that do pdf to image conversion with an api? | 01:48.07 |
ray_laptop | phira: thanks | 01:48.13 |
| phira: Nope -- I don't track SAAS services | 01:48.48 |
phira | sure just thought one might have stumbled in here, all good we'll keep trucking along and see what comes up :) | 01:49.09 |
ray_laptop | (but I suspect that if there are "free" ones, they use GS) | 01:49.14 |
phira | not too worried about free | 01:49.27 |
| the main problem we have is speed - we have someone upload a pdf and we need to convert it pretty quickly and deal with spikes in demand | 01:49.53 |
| which could mean us having to deploy a cluster of systems to do it, which I'd rather avoid for obvious reasons | 01:50.11 |
| so if we paid someone else to deal with the headache that'd be fine. | 01:50.24 |
ray_laptop | and if they charge for them and it isn't obvious what converter is used, we'd like to know -- there are occasional violators of GPL. | 01:50.51 |
phira | nods | 01:51.23 |
ray_laptop | phira: we use a cluster to convert files for our 'regression' tests. The perl scripts that do this adaptive scatter-gather of testing can probably be adapted to your needs. Please see gs/toolbin/localcluster | 01:53.57 |
phira | oh neat, thanks | 01:54.17 |
ray_laptop | the "servers" poll the "master" to see if there are any jobs to run. The action performed sometimes just collects an MD5 sum to detect differences, but the 'bmpcmp' mode of test collects raster image files | 01:55.50 |
| since the master dispatches, the servers sort of 'self balance' in that a "killer" job only ties up that server | 01:57.14 |
| since it doesn't request other jobs while it is busy | 01:57.49 |
phira | yepyep | 01:57.59 |
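The self-balancing behaviour described above (an idle server asks for the next job, a busy one asks for nothing) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the localcluster Perl scripts: the function name `simulate_pull`, the job costs and the worker count are all hypothetical.

```python
import heapq

def simulate_pull(costs, n_workers):
    """Simulate workers that each pull the next job only when they are idle.

    costs: list of job durations, in queue order (hypothetical numbers).
    Returns (assignments, makespan), where assignments maps a worker id
    to the list of job indices it ended up running.
    """
    # Heap of (time the worker becomes free, worker id).
    free_at = [(0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    assignments = {w: [] for w in range(n_workers)}
    for job, cost in enumerate(costs):
        # The worker that becomes idle first is the one that polls next.
        t, w = heapq.heappop(free_at)
        assignments[w].append(job)
        heapq.heappush(free_at, (t + cost, w))
    makespan = max(t for t, _ in free_at)
    return assignments, makespan

if __name__ == "__main__":
    # One "killer" job followed by four quick ones, two workers.
    print(simulate_pull([10, 1, 1, 1, 1], 2))
```

With one slow job at the head of the queue, the slow job occupies a single worker while the other worker drains the rest of the queue, which is the effect described in the conversation.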
ray_laptop | to watch it work see http://www.ghostscript.com/regression/ | 01:59.17 |
| every time someone commits a change to our 'git' repo, it runs >50K jobs to see what changed. As mentioned above, we usually only capture the MD5, but can capture the raster image | 02:01.06 |
| phira: I hope this helps. BTW, the regression system invokes GS each time -- not in "server" mode where multiple input files produce output files. | 02:02.51 |
| let us (me) know if you need hints on processing multiple input files into separate output files. | 02:04.34 |
| wife calls... back in a bit. | 02:04.50 |
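For readers wondering what the 'server' mode mentioned above might look like, here is a minimal sketch, assuming a `gs` binary on the PATH: one long-lived Ghostscript process is fed PostScript on stdin that sets the output file and then runs each input PDF. The helper names (`ps_string`, `build_ps_job`, `convert_all`), the file names and the 150 dpi resolution are illustrative assumptions, not anything given in the conversation. Newer Ghostscript releases restrict file access by default, so the invocation may need adjusting for your version.

```python
import subprocess

def ps_string(text):
    """Quote text as a PostScript string literal, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"

def build_ps_job(jobs):
    """Build the PostScript fed to one long-lived gs process.

    jobs: iterable of (input_pdf, output_pattern) pairs, e.g.
    ("a.pdf", "a-%03d.jpg"); the %03d is expanded by gs to the page number.
    """
    lines = []
    for pdf, out in jobs:
        # Point the current device at a new output file, then interpret the PDF.
        lines.append("<< /OutputFile %s >> setpagedevice" % ps_string(out))
        lines.append("%s run" % ps_string(pdf))
    lines.append("quit")
    return "\n".join(lines) + "\n"

def convert_all(jobs, resolution=150):
    """Start gs once and convert every job in the same process."""
    # "-" makes gs read its program from stdin instead of a named file.
    argv = ["gs", "-q", "-dNOPAUSE", "-sDEVICE=jpeg", "-r%d" % resolution, "-"]
    subprocess.run(argv, input=build_ps_job(jobs).encode(), check=True)
```

Calling `convert_all([("a.pdf", "a-%03d.jpg"), ("b.pdf", "b-%03d.jpg")])` would start Ghostscript once for both files. As noted in the conversation, the saving is the startup overhead, so it matters most when the PDFs are short.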
noobirc | happy new year! | 11:41.05 |
Robin_Watts | happy new year everyone. | 19:09.56 |