Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2018/03/07)20180308 
Robin_Watts n35xdxb0: There is no config file for MuPDF.00:30.18 
  You'd need to change the C code. It's not hard.00:30.35 
n35xdxb0 Robin_Watts: and then recompile?01:36.36 
sebras tor8: I found the reasons for the issues in 69909410:21.01 
  tor8: I have an ugly patch that appears to fix the immediate issues on sebras/master10:21.19 
tor8 sebras: ohhh!10:22.26 
  or maybe we should just detect JPX and refuse to decompress?10:22.41 
sebras tor8: we could, but should we? after all the user has supplied -d in order to decompress them.10:23.13 
tor8 that's a lot less useful for images than content streams and fonts10:23.38 
  I would almost consider a JPX image to be a 'resource' blob that shouldn't be decompressed10:24.30 
  and almost the same with JPEG10:24.39 
sebras tor8: what about flated images?10:24.53 
tor8 since those things are self contained files if you just extract the raw compressed data10:24.56 
sebras tor8: ah, I see where you are coming from now.10:25.12 
tor8 flated images aren't self contained -- they have no information about colorspace and size like JPEG and JPX do10:25.21 
  which is why we have a special case for extracting JPEG images in 'mutool extract' (though that ought to be extended to also detect JPX I think)10:25.58 
sebras tor8: yeah, I see that now.10:26.00 
  tor8: vim source/tools/pdfextract.c oh!10:26.31 
  tor8: yes, that code should probably handle JPX too.10:27.25 
  tor8: I'll whip up a patch.10:31.17 
  tor8: so the extract code actually calls pdf_load_image() which converts the JPX images to decoded images at load time.10:59.39 
  tor8: hence mutool extract has no way of knowing that the image is compressed.10:59.57 
tor8 pdf_load_image loads a 'compressed buffer'11:00.09 
sebras tor8: not for JPXs.11:00.31 
tor8 oh... yes, you're right.11:00.55 
sebras tor8: pdf_load_jpx() calls fz_new_image_from_pixmap() after having loaded the uncompressed JPX.11:01.09 
  tor8: but for all other image formats you are correct! :)11:01.32 
tor8 we have to do that because we don't have access to the w/h in the PDF dictionary11:01.40 
  nor the colorspace11:01.49 
sebras tor8: right, I wonder if it would suffice to decode just the headers though..?11:02.03 
  tor8: that way we could postpone the decoding of actual image data until a later point.11:02.16 
tor8 yes. this might be the cause for Robin_Watts, jogux, paulgardiner's slow to cancel files11:03.03 
  because the decoding happens on the image loading thread, not in the rendering thread where it happens for all other image formats11:03.24 
sebras tor8: right, if the images they are decoding are JPXs, do we know?11:03.43 
tor8 does who know?11:08.38 
sebras tor8: has anyone mentioned if the issues are specifcally with PDFs containing JPX-images?11:09.18 
tor8 oh right, it migth have been big jpegs. I'm not sure on the details.11:09.48 
  if we add a JPX info header-only decoding (and it's faster than full decoding) then we could load the header in pdf_load_image_imp and stick the data in a compressed buffer in pdf_load_jpx_imp11:10.12 
  or not bother with a fz_compressed_buffer and have a special class of fz_image for jpx that has the data11:10.44 
  in pdf_load_jpx we currently unconditionally call fz_load_jpx11:11.51 
  but we could rejig that call to happen in the fz_image.get_pixmap callback instead and do the rest based on fz_load_jpx_info11:12.14 
  or you can just point Robin_Watts at this and let him do it :)11:12.23 
  I'm not 100% convinced of the benefits11:12.39 
  if you use JPX you deserve whatever punishment you get...11:12.51 
Robin_Watts tor8: The file that exhibited "slow to cancel" was jpeg, not jpeg2k.11:13.09 
sebras Robin_Watts: wait until you have a file using j2k. :)11:13.38 
Robin_Watts We hold the compressed data in an fz_compressed_buffer, so 1) we can decode without hitting the file, and 2) so we can get the original data back (for most formats at least) in PDF writing.11:14.30 
  sebras: I'm sure JPX will be worse, yes.11:14.45 
sebras tor8: seems like fz_load_jpx_info() costs about 1/10th of fz_load_jpx() on the JPXs from the file from 69909411:24.59 
  tor8: so it might be worth doing.11:25.04 
  with openjpeg.11:26.27 
jogux morning all. could someone review http://git.ghostscript.com/?p=user/joseph/mupdf-ios-viewer.git;a=commitdiff;h=a5e7da5fcc2c3e196f05dda50b90829392ab97c0;hp=087aef041c257325cf16ee6a83432aeac2304972 please?11:32.03 
  (it's trivial_)11:33.26 
tor8 jogux: LGTM.11:35.23 
sebras jogux: lgtm.11:35.27 
tor8 sebras: though it might just end upp costing us 1/16th extra time to draw jpx images11:35.59 
sebras tor8: because we need to move data into memory and then decode it.11:36.45 
tor8 yeah, the question is how slow is 'fz_load_jpx_info+fz_load_jpx' on the same data, as compared to just 'fz_load_jpx'11:37.57 
jogux thank you tor8/sebras11:41.40 
malc_ mupdf's (muhtml's) lack of any warnings/errors when fed something it has problem with is distressing :(11:42.34 
sebras tor8: when I retest on other files fz_load_jpx_image and fz_load_jpx have roughly the same execution time.11:46.52 
  tor8: so that would mean that we would only get slower.11:47.28 
  I wonder why the j2k libraries are not faster at doing this decoding.11:47.50 
tor8 they probably don't consider the use case of only loading the header info11:51.19 
  or the APIs don't expose 'load a header' function11:51.34 
sebras tor8: ehm, seems like it might be us that are silly in jpx_read_image().11:51.54 
  we actually decode the image when only needing the headers.. :-/11:52.17 
  now I get JPX info: .001902s versus JPX decode: 5.012461s11:52.43 
tor8 all we save is needing to unpack the decompressed image into a fz_pixmap11:52.57 
sebras tor8: in pdf_load_image_imp() for non-JPX we check imagemask and forcemask and if either is true we set n == 1 (leaving colorspace == NULL) and ignore the number of components according to the colorspace. but for JPX we in pdf_load_jpx_imp() call fz_convert_pixmap() to convert it to grayscale. seems like we are not handling the two cases in the same way.12:30.26 
  why wouldn't we decode the image accoridng to the colorspcae and then convert to grayscale for non-JPX like we do for JPX?12:30.50 
tor8 you mean for softmasks?12:31.43 
  that would probably be good. I'm not sure we've ever seen any files that aren't grayscale softmasks other than JPX (which are usually pretty crappily constructed)12:32.23 
sebras ah.12:33.03 
  tor8: I guess that's as a good reason as any. :)12:33.41 
  tor8: btw, you asked me to ping you about the commit at the top of sebras/mater.13:09.15 
  master13:09.17 
tor8 sebras: in "Do not warn..." wouldn't a simple if (obj) { ... old stuff with if (!pdf_is_stream) ... } bracketing if-statement be cleaner?13:42.32 
  Fix 699086 LGTM13:42.48 
  and the scissor stack one also LGTM (I think I've reviewed those two before)13:43.03 
  let me have a look at the crypto/md5 stuff now13:43.19 
sebras tor8: I was worried about the indentation depth. but now i see that build_filter() indents just as deep further down. I'll fix the if that you mentioned above.13:44.24 
tor8 indentation depth takes a (very much) back seat to clear logic :)13:45.36 
sebras sebras/master updated.13:46.13 
  tor8: the key thing (pun intended) with the crypto one is that there's an MD5 sum being computed, so using more than 16 bytes is an issue.13:46.52 
  tor8: also note that this issue was brought up by a fuzzed file.13:47.09 
tor8 sebras: see page 126 of pdfref1713:50.28 
  algorithm 3.3, step 4: "Create an RC4 encryption key using the first n bytes of the output from the final MD5 hash."13:51.00 
  note 'n' not 1613:51.06 
sebras tor8: so the values of crypt->v == 5 but crypt->r == 413:51.10 
tor8 we should probably check that n<=16 and throw if not13:51.19 
  pdf_compute_user_password uses the same logic13:54.02 
  so that should also verify that n <=1613:54.10 
  (or just clamp it to 0..16 and let the file fail when it can't match the keys and fail to decrypt)13:55.03 
sebras fz_mini(n, 16) seems apt.13:55.25 
tor8 fz_clampi(n, 0, 16) seems better, since it could also be negative...13:55.42 
sebras no it cant.13:56.06 
  the only time we read crypt->length from the file is if crypt->r == 2 or 413:56.20 
  ehm.. v13:56.33 
  crypt->v13:56.35 
  and we check if length < 40 || length > 128 and throw13:56.48 
  in all other cases length is == 40 or == 25613:56.59 
tor8 ah, right13:57.04 
  then fz_mini seems apt (or if (n>16) throw()13:57.32 
sebras though the logic in pdf_compute_user_password() has _much_ simpler.13:57.34 
  I'll rerrange pdf_authenticate_owner_password() similarly I think.13:58.00 
malc_ sebras: has(?) _much_ simpler... is?13:58.15 
 Forward 1 day (to 2018/03/09)>>> 
ghostscript.com #ghostscript
Search: