Log of #mupdf at irc.freenode.net.

20200430
ator paulgardiner: ah, yes. I see the issue. the multi-language string writing commit broke combed fields. will fix.09:18.45 
paulgardiner ator: great thanks09:21.57 
ator paulgardiner: do you need this in the release?09:29.45 
pedr0 hi all. I am fiddling with mutool run - is it possible to print out or somehow loop over a Buffer obtained through readStream() instructions by instructions ?09:32.19 
ator what do you mean with "instructions by instructions"?09:34.00 
paulgardiner ator: I'd imagine we do.09:35.20 
ator paulgardiner: right. I'm pretty close to a fix, there are some bugs in it I can't quite figure out when mixing languages.09:36.39 
  probably Tj resets some positioning state :/09:36.48 
  switching fonts in the middle messes it up09:36.57 
  paulgardiner: I'm pretty confident I'll have a fix for you today09:37.23
paulgardiner Magic09:37.30 
ator paulgardiner: eh, yes. it helps to use the right operator. Tj is not the same as TJ :)09:38.45 
  paulgardiner: see tor/release branch09:42.11 
  give that a test and review, please09:42.22 
kiwi_66 Hello, I need to massage PDFs for reading on e-readers so that pages that contain complicated layout look OK.09:54.07 
  An alternative to the so-so job done by Poppler to convert a PDF to HTML (and then on to EPUB), is to turn those problematic pages from text to picture, and merge everything back into a full PDF.09:54.11 
  It works, but those "picture" pages are much smaller than the "text" pages. Is there a way to crop the margins when turning PDFs into PNGs?09:54.16
  https://postimg.cc/gallery/Z96s6r309:54.20 
  I used the following commands:09:54.24 
  #Loop: Turn all fifty pages into individual PDFs09:54.28 
  mutool clean -g input.pdf 1.pdf 109:54.28 
  mutool clean -g input.pdf 2.pdf 209:54.28 
  etc.09:54.28 
  #Loop: Convert problematic pages from PDF to PNG09:54.32 
  #213DPI, 758px width, 1024px height09:54.32 
  mutool draw -r 213 -w 758 -h 1024 -o 13.png input.pdf 1309:54.32 
  mutool draw -r 213 -w 758 -h 1024 -o 34.png input.pdf 3409:54.32 
  etc.09:54.32 
  #Loop: Convert PNG files into PDFs09:54.36 
  mutool convert -O compress -F pdf -o 13.pdf 13.png09:54.36 
  mutool convert -O compress -F pdf -o 34.pdf 34.png09:54.36 
  etc.09:54.36 
  #Merge all individual PDFs (untouched + turned into pictures) into single PDF09:54.38 
  #TODO Find way to build list and pass it on to merge09:54.38 
  mutool merge -o new.pdf -O compress 1.pdf 2.pdf etc.09:54.38 
  Thank you.09:54.41 
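The TODO above (building the file list for the merge step) could be handled by generating the command lines in a loop. This is a minimal sketch that only assembles strings from the commands quoted above; `buildCommands`, the page count of 50, and the picture pages 13 and 34 are illustrative, and running the commands is left to whatever shell or script you prefer.

```javascript
// Build the mutool command lines for the workflow described above.
// pageCount and picturePages are example inputs, not values from a real run.
function buildCommands(pageCount, picturePages) {
	var commands = [];
	var parts = [];
	for (var page = 1; page <= pageCount; ++page) {
		var pdf = page + ".pdf";
		if (picturePages.indexOf(page) >= 0) {
			// problematic page: rasterize to PNG, then wrap the PNG in a PDF
			var png = page + ".png";
			commands.push("mutool draw -r 213 -w 758 -h 1024 -o " + png + " input.pdf " + page);
			commands.push("mutool convert -O compress -F pdf -o " + pdf + " " + png);
		} else {
			// ordinary page: extract it untouched
			commands.push("mutool clean -g input.pdf " + pdf + " " + page);
		}
		// remember every per-page PDF, in page order, for the final merge
		parts.push(pdf);
	}
	commands.push("mutool merge -o new.pdf -O compress " + parts.join(" "));
	return commands;
}

var commands = buildCommands(50, [13, 34]);
```

The last entry of `commands` is the merge line with all fifty per-page PDFs in order.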
ator kiwi_66: margin cropping is not trivial, you may need to resort to imagemagick or some other tool to scan or crop the bitmaps10:01.02 
kiwi_66 That's what I suspected while reading the MuPDF manual. Thanks for the confirmation.10:01.37 
ator having a PNG with white borders adds very little to the file size, plain colored areas compress very well10:01.43 
kiwi_66 I don't mind the file size, but I wanted to see if the presentation could be improved so that those "picture" pages look closer to the "text" pages.10:02.29 
kens If you're really desperate you could use Ghostscript to determine the bounding box of the content in the original PDF, then some PostScript trickery to set the imaging 'window' and have GS render the PDF into that window. Assuming the white areas are genuinely unmarked, that would get rid of the white space in the image files. But it's non-trivial10:04.30
kiwi_66 Will do. Thank you.10:04.57 
ator huh. we have a bbox device, but no way to call it from mutool :/10:08.00 
kens Hmm, sounds like you need to add it, surely that shouldn't be hard ?10:08.19
  It sounds like it might be useful to me10:08.37 
  Obviously post-release :-)10:08.49 
ator kens: yeah. I've got a commit ready already :)10:14.19 
kens Wow that *was* fast10:14.29 
ator kens: question though is what format is most useful10:14.30 
  what does GS output?10:14.43 
kens IIRC Ghostscript dumps it out in PostScript form, I doubt that's terribly useful10:14.51 
  %%BoundingBox: llx lly urx ury10:15.07
ator I made it dump it in some XML format. not the easiest to handle, but it fits with everything else we dump.10:15.13 
kens XML makes sense10:15.22 
ator <page bbox="llx lly urx ury" mediabox="llx lly urx ury" />10:15.28 
kens Seems reasonable, should be parseable by any reasonable XML parser, and it makes sense to a human reader10:15.49
ator easy enough to crack with awk or sed if you need to since it's line based10:15.51 
kens So the page bbox is the bounding box of the marks ?10:16.20 
ator yeah. the 'bbox' is the bbox of the marks10:16.30 
  and I put the mediabox in there as well, because why not?10:16.46 
kens Fair enough then it looks fine to me, you might ask Robin for an opinion10:16.49 
ator yeah. I'll get Robin_Watts to review it.10:16.59 
kens MediaBox is always useful, but obviously for GS we can't know that the input is PDF10:17.17 
ator maybe bbox should be contentbox or drawbox or markbox or something more evocative than 'bbox'10:17.57 
kens Boundingbox then :-)10:18.18 
ator $ mutool draw -Fbbox input.pdf10:18.40 
kens markedboundsbox?10:18.46 
ator ENAMETOOLONG :)10:18.56 
kens Yeah anything truly descriptive is10:19.09 
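Since the proposed output is one `<page ... />` element per line, a regular expression is enough to pull the boxes out. The sketch below assumes exactly the attribute layout quoted above (the format was still under review at this point and may have changed); `parsePageBoxes`, `margins`, and the sample numbers are made up for illustration.

```javascript
// Parse one line of the proposed output, e.g. <page bbox="..." mediabox="..." />.
function parsePageBoxes(line) {
	var boxes = {};
	var attr = /(\w+)="([^"]*)"/g;
	var m;
	while ((m = attr.exec(line)) !== null) {
		// each attribute value is four numbers: llx lly urx ury
		boxes[m[1]] = m[2].split(/\s+/).map(Number);
	}
	return boxes;
}

// Margins between the marks and the page edge, as [left, bottom, right, top].
function margins(boxes) {
	var b = boxes.bbox, m = boxes.mediabox;
	return [b[0] - m[0], b[1] - m[1], m[2] - b[2], m[3] - b[3]];
}

// Made-up numbers for illustration only.
var sample = '<page bbox="72 90 523 770" mediabox="0 0 595 842" />';
var sampleMargins = margins(parsePageBoxes(sample));
```

The margins give the amount that could be cropped from each side of the rendered page.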
kiwi_66 Using convert+CBZ to turn a 9,5MB PDF into pictures generates a 640MB PDF output.10:42.09 
  "-O -compress" makes no difference.10:42.13 
  Is there a way to reduce the file size? "number of bits of antialiasing", "resolution", "colorspace", etc.10:42.17 
kens lower resolution would be my bet10:42.40 
  But basically, that's what happens when you render vectors to bitmaps10:43.01 
kiwi_66 Using "convert -O resolution=100 -o temp.cbz", it's even bigger (700MB) than without (640MB; Original: 9MB)10:59.19 
kens I don't know what the default resolution would have been10:59.51
kiwi_66 How do I find this info?11:00.38 
kens If it was (for example) 96 dpi then yes, 100 dpi will produce a larger set of bitmaps and bigger output11:00.39 
  convert is imagemagick, isn't it ? I've no idea how to tell what its default resolution is11:01.46 
kiwi_66 I'll read up on resolution, thanks11:02.12 
kens Ah, apparently it's 72 dpi11:02.18
kiwi_66 No, it's mutool's convert11:02.22 
kens Oh sorry, wrong convert11:02.29 
  Then I don't know, but presumably you could find out from the code11:02.54 
pedr0 ator: sorry for the delay. For instance I want to scan the stream searching for a specific instruction, that is, stream's instructions such as 'Tj' etc etc11:02.57 
  I can get the stream - in a Buffer - I am not sure how to use such object to read the stream instruction by instruction11:04.12 
ator pedr0: you'd have to write a tokenizer yourself, the Buffer is just an array of bytes11:04.39 
pedr0 oks11:04.44 
  can I build a string from it ?11:05.08 
kens ator what's the default resolution of mutool convert ? I can't find it anywhere11:05.12 
ator for (i = 0; i < buffer.length; ++i) buffer[i] accesses all the bytes11:05.27 
  you'd have to build a string from those, or tokenize directly and build up temporary strings using String.fromCharCode(buffer[i])11:06.42 
  kens: 72dpi I think11:06.52 
pedr0 is that buffer a JS object or is it a custom object part of the mutool environment ?11:06.53 
  oks - I get it11:07.08 
ator it is a custom object wrapping a fz_buffer11:07.09 
pedr0 thanks for your help11:07.18 
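A tokenizer along the lines ator describes might look like the sketch below. It relies only on what is stated above (indexing the Buffer with `buffer[i]`, `buffer.length`, and `String.fromCharCode`), so it works on a plain array of byte values too. It is deliberately simplified: inline images and `<<` dictionaries are not handled, and `tokenize` and `toBytes` are hypothetical helper names.

```javascript
// Simplified content stream tokenizer (sketch only).
// `bytes` is anything with .length and numeric indexing.
function tokenize(bytes) {
	var tokens = [];
	var i = 0, n = bytes.length;
	function isSpace(c) { return c === 0 || c === 9 || c === 10 || c === 12 || c === 13 || c === 32; }
	function isDelim(c) { return "()<>[]{}/%".indexOf(String.fromCharCode(c)) >= 0; }
	while (i < n) {
		if (isSpace(bytes[i])) { i++; continue; }
		var ch = String.fromCharCode(bytes[i++]);
		var s = ch;
		if (ch === "%") {
			// comment: skip to end of line
			while (i < n && bytes[i] !== 10 && bytes[i] !== 13) i++;
			continue;
		}
		if (ch === "(") {
			// literal string: balanced parentheses, backslash escapes the next byte
			var depth = 1;
			while (i < n && depth > 0) {
				var d = String.fromCharCode(bytes[i++]);
				s += d;
				if (d === "\\" && i < n) s += String.fromCharCode(bytes[i++]);
				else if (d === "(") depth++;
				else if (d === ")") depth--;
			}
		} else if (ch === "<") {
			// hex string: read up to and including the closing '>'
			while (i < n) {
				var h = String.fromCharCode(bytes[i++]);
				s += h;
				if (h === ">") break;
			}
		} else if (ch !== "[" && ch !== "]") {
			// name, number, or operator: read until whitespace or a delimiter
			while (i < n && !isSpace(bytes[i]) && !isDelim(bytes[i])) s += String.fromCharCode(bytes[i++]);
		}
		tokens.push(s);
	}
	return tokens;
}

// Helper for trying the tokenizer without a real stream.
function toBytes(str) {
	var out = [];
	for (var k = 0; k < str.length; ++k) out.push(str.charCodeAt(k));
	return out;
}

var streamTokens = tokenize(toBytes("BT /F1 12 Tf (a Tj b) Tj [(A) -120 (B)] TJ ET"));
var tjCount = streamTokens.filter(function (t) { return t === "Tj"; }).length;
```

Because string literals are kept as single tokens, a `Tj` that appears inside the text being shown is not mistaken for the operator.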
kens ator thanks, that would explain the increase in size that kiwi_66 experienced with resolution of 100 then :-)11:07.21 
ator kens: ah yes, missed that bit of the conversation. I'm 99% certain it's 72dpi, and if not 72 then 9611:08.17 
kens Either would explain the increase in size.11:08.31 
kiwi_66 Is there another setting besides "resolution" to have "mutool convert" build smaller files?11:08.32 
ator my bad, it is actually 96 dpi11:08.55 
kens Well you could render to grayscale, that would reduce the size, while discarding all the colour11:09.00 
  96 was my guess :-)11:09.08 
ator fz_parse_draw_options() reveals my lie!11:09.18 
  opts->x_resolution = 96;11:09.27 
kens But resolution is the killer, because it's in each direction, so doubling it quadruples the output size11:09.34
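As a rough check of the numbers in this exchange: the pixel count of a rendered page grows with the square of the resolution ratio, since the resolution applies in both directions. The sketch below only does that arithmetic; `pixelScale` is an illustrative name, and compressed file sizes will not follow the pixel count exactly.

```javascript
// Factor by which the pixel count changes when the resolution changes.
function pixelScale(oldDpi, newDpi) {
	var ratio = newDpi / oldDpi;
	// width and height both scale by `ratio`
	return ratio * ratio;
}

var doubled = pixelScale(96, 192); // doubling the resolution
var bumped = pixelScale(96, 100);  // the 96 dpi default versus resolution=100
```

Doubling the resolution gives four times the pixels, and going from 96 to 100 dpi gives about 8.5% more, which is roughly in line with the growth from 640MB to 700MB reported above.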
ator kiwi_66: you could convert to grayscale or monochrome11:10.02 
  -O colorspace=mono or colorspace=gray11:10.17 
kens monochrome would reduce it a lot11:10.28 
ator with the obvious loss of color and anti-aliasing11:10.31 
kiwi_66 my e-reader is only b&w :-)11:10.47 
ator how many gray levels?11:10.55 
kens Then you may as well have monochrome11:10.57 
kiwi_66 16 shades11:11.16 
kens Oh, that's gray scale, not monochrome, but still, not many grays11:11.30
kiwi_66 entry level (but strong screen, thankfully for bike rides)11:12.03 
kens Does mutool convert do 4-bit output ?11:12.10 
kens suspects not11:12.23 
kiwi_66 With "-O colorspace=gray", I go from 640MB to 214MB :-)11:14.37 
ator mutool convert -o out%d.pbm -O colorspace=mono,graphics=aa0 input.pdf11:15.04 
  you should get a bunch of pbm files that you can convert to PDF and get very small files11:15.16 
kiwi_66 ie. convert PDF to PBM, and then on to PDB?11:17.01 
  PDF11:17.09 
ator yeah. rasterize the PDF to black-and-white images, then wrap those into a new PDF11:17.28
kiwi_66 can mutool do this, or should I look at ImageMagick etc. ?11:17.46 
ator it's like PNG but will become smaller if it's black and white11:17.48 
  just use 'pbm' as the suffix rather than 'png'11:17.59 
kiwi_66 ok11:18.03 
paulgardiner That looks to work thanks Tor11:20.33 
ator paulgardiner: great!11:20.44 
kiwi_66 mutool convert -o out%d.pbm -O colorspace=mono,graphics=aa0 input.pdf 72-7411:25.52 
  mutool merge -o output.pdf out1.pbm out2.pbm out3.pbm11:25.56 
  error: cannot recognize version marker11:26.06 
  warning: trying to repair broken xref11:26.06 
  warning: repairing PDF document11:26.06 
  error: invalid key in dict11:26.06 
  error: no objects found11:26.06 
  error: aborting process from uncaught error!11:26.07 
ator merge only takes PDF as input12:01.51
  zip output.cbz *.pbm; mutool convert -o output.pdf output.cbz12:02.14 
  or convert the pbm to pdf before merging12:03.11 
  like you did with PNG files earlier12:03.20 
kiwi_66 thx12:06.12 
malc_ ator: just hit a previously unseen warning while building with clang and -Weverything - http://tpaste.us/gk5d18:41.56 
ator Robin_Watts: ^ maybe something with the header file cleanups gone wrong?18:58.34 
sebras Robin_Watts: ator: http://git.ghostscript.com/?p=user/sebras/mupdf.git;a=commitdiff;h=15a4819739aa387031f3a4af074c1da7ff7dcb70 and http://git.ghostscript.com/?p=user/sebras/mupdf.git;a=commitdiff;h=05301a828419fb95e2d06ee01dc23c814330760b21:08.39 
  both appear to cluster well.21:08.44 
malc_ sebras: tack21:52.08 