IRC Logs

Log of #ghostscript at irc.freenode.net.

2016/05/11
Ruslan Hi there. Is anyone from the MuPDF devs here?09:52.43 
Robin_Watts Yup.09:57.09 
Ruslan Robin_Watts: I'm trying to copy one pdf page from one document to another. Is it possible with mupdf?10:01.28 
tor8 Ruslan: It is possible.10:01.52 
  if you know a little bit of javascript you can write a quick script to do it with mutool10:02.21 
Ruslan_ tor8: Is there a clean C way ?10:03.39 
tor8 Ruslan: you can use mutool merge; but that will lose the table of contents and annotations10:04.11 
Ruslan_ i need to do this in code. Without any external binaries.10:04.49 
  Currently I tried this. Load page with pdf_page_load10:05.20 
tor8 Ruslan_: a pdf_page is bound to the document it is created with; you need to use a lower level access10:05.38 
  Ruslan_: you need to get the pdf_obj* for the page from the source document; copy that object into the destination document, then insert the copy of the object in the page tree10:06.40 
  pdf_lookup_page_obj to find the pdf_obj from the source document10:06.52 
  pdf_graft_object to copy the page object from the source to the destination document10:07.14 
  and pdf_insert_page to insert the grafted object into the list of pages in the destination document10:07.34 
Ruslan_ tor8: thank you for your help10:08.00 
tor8 that is a bit of a simplification though; if you want to do a good job you should create a new page object for the destination and copy the Contents, Resources, and MediaBox entries using pdf_graft_object instead10:08.40 
  Ruslan_: look in source/tools/pdfmerge.c for some example code10:09.12 
Robin_Watts utterly fails to reproduce ray's mutool hangs :(10:24.23 
  Even running ray's binaries with ray's command line on ray's machine.10:25.14 
tor8 Robin_Watts: a handful of various commits on tor/master for review when you got a minute11:00.09 
Robin_Watts Looking.11:02.04 
  tor8: It seems a shame to move from unsigned int to int in the alloc stuff.11:03.30 
tor8 Robin_Watts: the mujs commit?11:03.54 
Robin_Watts Would be better if we moved to size_t in the alloc stuff everywhere.11:03.55 
  tor8: yeah.11:04.00 
tor8 I just got fed up with signed/unsigned conversion warnings everywhere. it's insane!11:04.29 
  so in order to keep my sanity, I made mujs use signed integers *everywhere* (except in a few very specific places where the VM uses unsigned integer arithmetic on doubles)11:05.50 
Robin_Watts tor8: Well, arguably we should be using unsigned ints everywhere where it comes to allocation sizes.11:06.03 
  unsigned ints, or size_t's.11:06.14 
  It'd be a pain to get it right, but we'd be better in the long run.11:06.33 
tor8 Robin_Watts: in truth, to be perfectly good citizens, we should be using size_t for all array indices and loop counters11:06.34 
  because compilers make a mess of things when generating 64-bit code using 32-bit integers11:07.01 
  having to sign extend them into 64-bit registers and crap11:07.11 
Robin_Watts tor8: We should be using unsigned ints, certainly.11:07.35 
tor8 IMO they should have just bitten the bullet and made 'int' be 64-bit on 64-bit platforms and we wouldn't be having this optimizing compilers abusing undefined behaviour mess in the first place11:07.47 
  Robin_Watts: have you seen these gems? http://blog.regehr.org/archives/767 11:08.07 
  Robin_Watts: I thought I'd gotten all the signed/unsigned stuff correct in mujs, but going through that code and fixing it I still spotted half a dozen mistakes :(11:09.24 
Robin_Watts I don't follow Winner #1.11:12.12 
  Yes, it reads a[32].11:12.24 
  yes, that's out of bounds and therefore undefined.11:12.35 
  It's not unexpected though.11:12.39 
tor8 what the compiler does is a bit unexpected though!11:13.02 
Robin_Watts Compilers do screwy things.11:13.47 
  IMAO, for C compilers to try to mandate anything about left shifts other than "it'll do what the underlying architecture does", seems potty.11:14.31 
  The point of C is that it's close to the metal. That's both its boon and its bane.11:14.57 
  Returning to your commits...11:16.57 
  So we only have long options now?11:17.04 
tor8 Robin_Watts: this is for the '-O' option to the non-mudraw tools that can write pdf files11:17.36 
  like mutool merge, etc11:18.05 
  I have not touched pdfclean.11:18.27 
Robin_Watts But for consistency we should, arguably. And only having long options there would be bad.11:19.14 
tor8 it would be nice to have the same options for mudraw and pdfclean; but both of those tools have a lot of baggage.11:19.16 
Robin_Watts tor8: at the very least the usage messages should list the options.11:19.41 
tor8 and I've got a lot of muscle memory for how to invoke pdfclean11:19.42 
Robin_Watts Me too, which is why I'd be loath to change it.11:19.54 
  rather than -Ofoo,bar,baz I'd rather see --foo --bar --baz11:20.16 
tor8 at the moment it's only mutool convert, create and merge that take the new options11:20.28 
Robin_Watts so we can follow the standard unix conventions of -f, --foo being short and long things that do the same thing.11:20.58 
tor8 Robin_Watts: this is modeled more along the lines of passing linker options to the compiler with -Wl11:21.42 
Robin_Watts tor8: Yeah, but that's a different kinda case.11:22.11 
  -W<destination flag>,<option>11:22.25 
tor8 the thing is, we might conceivably want to pass *different* sets of -O options to a tool11:22.26 
Robin_Watts The destination flag says where to send the flag.11:22.40 
tor8 and we don't necessarily know which options are available, because it depends on the destination11:23.05 
Robin_Watts so -O<destination flag>,<long option>11:23.09 
  or --<long option> if you want it to go everywhere.11:23.26 
tor8 and also as a string to pass to fz_new_document_writer("foo.pdf", options)11:23.55 
Robin_Watts I like the idea of adding long options. I dislike this way of doing it, at first sight.11:24.07 
tor8 they're not generic long options though. they're parameters to be piped through to the output device.11:24.55 
  pdf.save("out.pdf", "pretty,ascii,compress-images,compress-fonts") in javascript for example11:25.40 
Robin_Watts I get the idea behind using strings for options. (Much as I dislike strings for options, it makes sense for javascript)11:26.45 
tor8 strings also make sense for command line tools :)11:27.17 
Robin_Watts yes.11:27.35 
tor8 for the C interface, we still have the pdf_write_options struct for pdf writing11:27.41 
Robin_Watts so having a single point that converts from strings to a struct makes sense.11:27.56 
tor8 and a pdf_parse_write_options that parses from the string11:28.16 
  and a generic fz_has_option to read a value from an option string11:28.31 
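A minimal sketch of the "single point that converts from strings to a struct" idea, assuming a comma-separated option string like the one in the javascript example. Every name here (`demo_write_options`, `demo_has_option`, `demo_parse_write_options`) is hypothetical and stands in for the real pdf_write_options, fz_has_option and pdf_parse_write_options, whose actual signatures and behaviour are not shown in this log.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option struct; NOT MuPDF's pdf_write_options. */
typedef struct {
    int pretty;
    int ascii;
    int compress_images;
    int compress_fonts;
} demo_write_options;

/* Return 1 if `key` appears as a complete comma-separated entry in `opts`.
 * A NULL or empty option string contains no entries. */
static int demo_has_option(const char *opts, const char *key)
{
    size_t klen;

    assert(key != NULL);
    klen = strlen(key);
    while (opts && *opts) {
        const char *end = strchr(opts, ',');
        size_t len = end ? (size_t)(end - opts) : strlen(opts);

        /* Compare whole entries so "compress" does not match "compress-images". */
        if (len == klen && memcmp(opts, key, klen) == 0)
            return 1;
        opts = end ? end + 1 : NULL;
    }
    return 0;
}

/* The one place that turns the string form into the struct form. */
static void demo_parse_write_options(demo_write_options *out, const char *opts)
{
    assert(out != NULL);
    out->pretty = demo_has_option(opts, "pretty");
    out->ascii = demo_has_option(opts, "ascii");
    out->compress_images = demo_has_option(opts, "compress-images");
    out->compress_fonts = demo_has_option(opts, "compress-fonts");
}
```

With this shape, a command line tool, a script binding and a C caller can all share the same parser: the first two hand over a string, the last one can fill in the struct directly.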
Robin_Watts In "Fix double free and memory leak", you remove fz_free(ctx, wri) from fz_close_document_writer.11:29.08 
  Where is the other fz_free of that then ?11:29.16 
  Oh, in fz_drop_document_writer. I see.11:29.41 
tor8 yes.11:29.48 
Robin_Watts Reference counting devices... OK...11:30.13 
tor8 I disliked having to add that11:30.25 
Robin_Watts but then I wonder if we should have a debug thing that checks that a device is not being used from 2 places at the same time.11:30.35 
tor8 but making writer.begin_page return a device necessitated it for the language bindings11:30.47 
Robin_Watts or even not a debug thing.11:30.57 
  We could have a 'in_use' thing that we use the alloc lock to inc/dec and throw if we find it in_use on entry.11:31.29 
tor8 isn't taking and releasing a lock on every device call going to slow things down?11:31.58 
Robin_Watts Similarly, we should possibly take the FZ_FILE lock whenever we access the document.11:32.01 
  tor8: in single threaded stuff, not at all.11:32.13 
  In multi-threaded stuff that's well behaved, barely at all.11:32.28 
  If we're worried about performance, only do it on DEBUG builds.11:34.49 
tor8 Robin_Watts: even locking the device around individual device calls can go horribly wrong11:35.01 
Robin_Watts tor8: Cos of recursion?11:35.21 
tor8 we'd have to lock around clip/popclip and begin*/end* pairs11:35.21 
Robin_Watts No, I don't think so.11:35.31 
  In fact, definitely not.11:36.06 
tor8 it makes no sense to call a device from multiple threads; the question is how much effort do we spend trying to coddle people from the effects of doing stuff we say DON'T DO THAT!11:36.17 
Robin_Watts tor8: On the contrary.11:36.28 
tor8 or am I misunderstanding what you're trying to accomplish with this lock?11:36.42 
Robin_Watts Consider that I have a lump of code that throws out various callbacks.11:36.53 
  On those callbacks I want to make device calls into (say) a pdf writer.11:37.15 
  I can't know what threads those callbacks are coming from.11:37.37 
  but what I can know is that if 2 of them try to call into the same device at the same time, all bets are off.11:37.59 
  There are the 3 rules of MuPDF multi-threading.11:38.14 
tor8 wouldn't you want higher granularity around those calls then?11:38.26 
  say lock the pdf writer device for exclusive access, do what I need, then release it11:38.40 
Robin_Watts 1) "Only use a context in a single thread at a time"11:38.46 
  2) "Only use a document in a single thread at a time"11:38.57 
  3) "Only use a device in a single thread at a time"11:39.10 
  if you break 1, your program falls in a heap pretty soon.11:39.41 
  If you break 2 (like ferter was doing) it can be confusing. It would be much nicer to spot it and assert with a clear message.11:41.10 
  Likewise 3.11:41.18 
  Arguably we can fix 2 using the FILE lock (which is defined, but not currently used).11:41.49 
tor8 Robin_Watts: not for all input file types though11:42.12 
Robin_Watts The only problem with that is that that would serialise access to ALL documents, even if we had several open at once.11:42.40 
  tor8: In what way not for all input file types?11:42.48 
tor8 using the FILE lock around the fz_archive for zip files could save you from some of it; but the other input formats don't hit the underlying file quite as often11:42.54 
Robin_Watts tor8: I was thinking we'd take the FILE lock on an fz_run_page.11:43.16 
tor8 Robin_Watts: yeah; it looks like we would want to add per-object mutexes for this kind of stuff11:43.26 
Robin_Watts or an fz_run_page_contents.11:43.36 
  tor8: Yeah, and I'd like to avoid that.11:43.44 
tor8 and IMO that is outside our responsibility at the moment11:43.47 
Robin_Watts I think I'd be happy enough to add a DEBUG thing that asserts if someone breaks it.11:43.59 
tor8 serializing and synchronizing your threads is up to you as long as you follow the 3 rules you listed above11:44.05 
Robin_Watts That way we're saying "don't do it", and we're giving reasonable checks to ensure that people don't.11:44.24 
tor8 Robin_Watts: a simple volatile int in_use field that we set, check, and abort on in fz_document and fz_page and fz_device?11:45.07 
Robin_Watts I might knock something up for your consideration later.11:45.11 
tor8 no need to lock, just detect if it's being abused11:45.26 
Robin_Watts tor8: an fz_lock, read/set, fz_unlock, assert on the value, yes.11:45.31 
  If we don't lock, then the read may be stale on truly multi-threaded systems.11:45.54 
tor8 or lock, to avoid race conditions11:45.57 
Robin_Watts yeah.11:46.22 
tor8 yeah, I think having an in_use sanity check for devices, docs and pages could be useful. still, guarded by ifdef DEBUG I think; no need to have development checks in for all code11:47.18 
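A rough sketch of the in_use sanity check described above: lock, read and set the flag, unlock, then act on the value that was read. It assumes a plain pthread mutex in place of the library's own locking, and all names (`demo_guard`, `demo_guard_enter`, `DEMO_ENTER`, ...) are made up for illustration. The goal is detection of misuse, not serialisation, so the lock is held only while the flag is touched.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical guard that would live inside a device, document or page. */
typedef struct {
    pthread_mutex_t lock; /* stands in for the library's own lock */
    int in_use;
} demo_guard;

static void demo_guard_init(demo_guard *g)
{
    assert(g != NULL);
    pthread_mutex_init(&g->lock, NULL);
    g->in_use = 0;
}

/* Lock, read and set the flag, unlock. Returns 1 if the caller now owns
 * the object, 0 if it was already in use on entry. */
static int demo_guard_enter(demo_guard *g)
{
    int was_in_use;

    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    was_in_use = g->in_use;
    g->in_use = 1;
    pthread_mutex_unlock(&g->lock);
    return !was_in_use;
}

/* Clear the flag under the same lock so other threads see the update. */
static void demo_guard_leave(demo_guard *g)
{
    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    g->in_use = 0;
    pthread_mutex_unlock(&g->lock);
}

/* Only DEBUG builds pay for the check; release builds compile it away. */
#ifdef DEBUG
#define DEMO_ENTER(g) do { if (!demo_guard_enter(g)) abort(); } while (0)
#define DEMO_LEAVE(g) demo_guard_leave(g)
#else
#define DEMO_ENTER(g) ((void)0)
#define DEMO_LEAVE(g) ((void)0)
#endif
```

Each guarded call would be wrapped in a DEMO_ENTER / DEMO_LEAVE pair. Reading the flag under the lock is what avoids the stale read mentioned above on truly multi-threaded systems.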
Robin_Watts tor8: For the page range stuff, can we pass in numpages too?11:47.26 
  tor8: agreed.11:47.31 
  cos then we can cope with reverse,evens,odds etc at some point.11:47.58 
  revens, rodds, booklet11:48.25 
tor8 Robin_Watts: N is num pages; so "1-N" gets you all pages11:48.29 
  and "N-1" gets you all pages, reversed11:48.37 
Robin_Watts Oh, so you already do!11:48.47 
tor8 odds and evens and booklets are more troublesome11:49.27 
Robin_Watts tor8: I'd be tempted by a map_over_page_range(range, fn, arg) thing.11:50.02 
  where that calls fn(arg, x) on each x in the range in turn.11:50.35 
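One way the map_over_page_range(range, fn, arg) idea could look, as a sketch only. It assumes a comma-separated list of pages or "a-b" spans in which N means the last page, so "1-N" walks forwards and "N-1" walks backwards. The names, the extra num_pages parameter, the clamping of out-of-range pages and the -1 error return are all assumptions, not the behaviour of the code under review.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (demo_page_fn)(void *arg, int page);

/* Small collector callback used to demonstrate the mapper. */
typedef struct { int pages[64]; int count; } demo_page_list;

static void demo_collect(void *arg, int page)
{
    demo_page_list *list = arg;
    if (list->count < 64)
        list->pages[list->count++] = page;
}

/* Parse one endpoint: 'N' (last page) or a decimal number.
 * Returns the position after the endpoint, or NULL on a syntax error. */
static const char *demo_parse_endpoint(const char *s, int num_pages, int *out)
{
    char *end;

    if (*s == 'N') {
        *out = num_pages;
        return s + 1;
    }
    if (*s < '0' || *s > '9')
        return NULL;
    *out = (int)strtol(s, &end, 10);
    return end;
}

static int demo_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Call fn(arg, x) for each page x in the range, in order.
 * Returns 0 on success, -1 on a malformed range string. */
static int demo_map_over_page_range(const char *range, int num_pages,
                                    demo_page_fn *fn, void *arg)
{
    const char *s = range;

    assert(range != NULL && fn != NULL && num_pages > 0);
    while (*s) {
        int a, b, step, x;

        s = demo_parse_endpoint(s, num_pages, &a);
        if (!s)
            return -1;
        b = a;
        if (*s == '-') {
            s = demo_parse_endpoint(s + 1, num_pages, &b);
            if (!s)
                return -1;
        }
        a = demo_clamp(a, 1, num_pages);
        b = demo_clamp(b, 1, num_pages);
        step = (a <= b) ? 1 : -1; /* a > b means a reversed span */
        for (x = a; x != b + step; x += step)
            fn(arg, x);
        if (*s == ',')
            s++;
        else if (*s)
            return -1;
    }
    return 0;
}
```

Odds, evens and booklet orderings could then be added as extra syntax inside the mapper without touching any of the callers.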
tor8 Robin_Watts: yeah, that sounds like the next version of this code :)11:51.06 
Robin_Watts ok, so none of my comments there were showstoppers.11:51.30 
  Look good to me then.11:51.36 
tor8 Robin_Watts: though, once you start getting into wanting lots of control, I feel we'd be better served by pointing people to mutool run11:51.43 
Robin_Watts Yeah, but odds,evens and reverse (and potentially rodds,revens) are normal enough to warrant it, I think.11:54.07 
tor8 odds and evens, forward and reverse: agreed11:54.32 
  if you know the number of pages it's easy enough as mutool clean in.pdf out.pdf $(seq -s, 1 2 100)11:55.45 
Robin_Watts tor8: That's not easy :)11:59.12 
  Doubly not easy on windows :)11:59.24 
tor8 bah, windows! ;)11:59.45 
  I'm sure there's a powershell equivalent11:59.53 
  I've just never bothered to learn powershell12:00.01 
Robin_Watts Well, git is one of the first things I install on all windows boxes, just to get git bash.12:01.17 
  but that'll be even easier soon when the whole "run any user mode linux binary" thing appears and we can use normal bash.12:01.53 
  tor8: How would you feel about us supporting greyscale pixmaps with no alpha in mupdf?12:02.43 
tor8 I would love for us to make the alpha channel optional in all pixmaps (gray and color)12:03.22 
  it does mean exploding a lot of the plotting functions12:03.37 
  if you're going to do it, please add and use an explicit 'stride' field in the pixmaps too! that's been bothering me quite a while.12:04.07 
Robin_Watts For RGB, RGBA makes a lot of sense, because 4 byte access to pixels helps a lot.12:04.25 
  I don't care enough about CMYK performance.12:04.44 
tor8 Robin_Watts: yes. for CMYK having CMYKA hurts a lot, but I don't think we care enough.12:04.51 
Robin_Watts but greyscale performance, having just G rather than GA would be a win.12:05.06 
tor8 I think if we make alpha optional for gray, having it optional everywhere makes sense12:05.43 
  when writing to PPM or PNG we can save the alpha-plane stripping step when writing if we never create the A in the destination pixmap12:06.20 
Robin_Watts tor8: Well, for PNG we'd want to allow for alpha saving, right?12:07.05 
tor8 I'm fine with not having optimized versions of plotters for RGB (only for RGBA) and make our device create RGBA buffers by default12:07.11 
Robin_Watts SOT has build time cleverness to include/exclude plotters as required.12:08.07 
  so for devices that have a 565 screen we build one set, for 12 bit screens another, for 555 another, etc.12:08.31 
tor8 wolfenstein 3d back in the day generated specialized machine code for its plotters; like an early JIT12:09.17 
  and software opengl plotters also generate specialized code at runtime12:09.43 
Robin_Watts Yeah, Acorns full motion video system Replay would do that for desktop playback.12:09.58 
  (well, it did it for normal playback too, but the desktop playback code was particularly funky)12:10.18 
  So fz_new_pixmap etc, would all need to sprout an int alpha flag, I guess.12:11.23 
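A sketch of what a pixmap with an optional alpha channel and an explicit stride might look like. This is not fz_pixmap or fz_new_pixmap: the struct, the field names and the constructor arguments are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pixmap with optional alpha and an explicit row stride. */
typedef struct {
    int w, h;
    int n;      /* bytes per pixel: colorants plus alpha if present */
    int alpha;  /* 1 if the last byte of each pixel is alpha, else 0 */
    int stride; /* bytes from the start of one row to the next */
    unsigned char *samples;
} demo_pixmap;

/* colorants is 1 for gray, 3 for RGB, 4 for CMYK. */
static demo_pixmap *demo_new_pixmap(int colorants, int w, int h, int alpha)
{
    demo_pixmap *pix;

    assert(colorants > 0 && w > 0 && h > 0);
    pix = malloc(sizeof *pix);
    if (!pix)
        return NULL;
    pix->w = w;
    pix->h = h;
    pix->alpha = alpha ? 1 : 0;
    pix->n = colorants + pix->alpha;
    /* Tightly packed here; an explicit stride also allows padded rows. */
    pix->stride = w * pix->n;
    pix->samples = calloc((size_t)h, (size_t)pix->stride);
    if (!pix->samples) {
        free(pix);
        return NULL;
    }
    return pix;
}

/* Address of pixel (x, y), computed from stride rather than w * n. */
static unsigned char *demo_pixel(demo_pixmap *pix, int x, int y)
{
    return pix->samples + (size_t)y * pix->stride + (size_t)x * pix->n;
}

static void demo_drop_pixmap(demo_pixmap *pix)
{
    if (pix) {
        free(pix->samples);
        free(pix);
    }
}
```

A gray pixmap without alpha then costs one byte per pixel instead of two, and plotters that index rows through stride keep working if rows are later padded or a pixmap aliases part of a larger buffer.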
tor8 security paranoia makes code generation a bit of a no-no on mobile platforms (and some desktops) these days :(12:11.26 
  ray_pc: yeah.12:11.31 
  Robin_Watts: yeah. (bad autocomplete)12:11.37 