| <<<Back 1 day (to 2016/05/10) | 20160511 |
Ruslan | Hi there. Is anyone here from the MuPDF devs? | 09:52.43 |
Robin_Watts | Yup. | 09:57.09 |
Ruslan | Robin_Watts: I'm trying to copy one PDF page from one document to another. Is that possible with MuPDF? | 10:01.28 |
tor8 | Ruslan: It is possible. | 10:01.52 |
| if you know a little bit of javascript you can write a quick script to do it with mutool | 10:02.21 |
Ruslan_ | tor8: Is there a clean C way ? | 10:03.39 |
tor8 | Ruslan: you can use mutool merge; but that will lose the table of contents and annotations | 10:04.11 |
Ruslan_ | i need to do this in code. Without any external binaries. | 10:04.49 |
| Currently I tried this: load the page with pdf_page_load | 10:05.20 |
tor8 | Ruslan_: a pdf_page is bound to the document it is created with; you need to use a lower level access | 10:05.38 |
| Ruslan_: you need to get the pdf_obj* for the page from the source document; copy that object into the destination document, then insert the copy of the object in the page tree | 10:06.40 |
| pdf_lookup_page_obj to find the pdf_obj from the source document | 10:06.52 |
| pdf_graft_object to copy the page object from the source to the destination document | 10:07.14 |
| and pdf_insert_page to insert the grafted object into the list of pages in the destination document | 10:07.34 |
Ruslan_ | tor8: thank you for your help | 10:08.00 |
tor8 | that is a bit of a simplification though; if you want to do a good job you should create a new page object for the destination and copy the Contents, Resources, and MediaBox entries using pdf_graft_object instead | 10:08.40 |
| Ruslan_: look in source/tools/pdfmerge.c for some example code | 10:09.12 |
Robin_Watts | utterly fails to reproduce ray's mutool hangs :( | 10:24.23 |
| Even running ray's binaries with ray's command line on ray's machine. | 10:25.14 |
tor8 | Robin_Watts: a handful of various commits on tor/master for review when you got a minute | 11:00.09 |
Robin_Watts | Looking. | 11:02.04 |
| tor8: It seems a shame to move from unsigned int to int in the alloc stuff. | 11:03.30 |
tor8 | Robin_Watts: the mujs commit? | 11:03.54 |
Robin_Watts | Would be better if we moved to size_t in the alloc stuff everywhere. | 11:03.55 |
| tor8: yeah. | 11:04.00 |
tor8 | I just got fed up with signed/unsigned conversion warnings everywhere. it's insane! | 11:04.29 |
| so in order to keep my sanity, I made mujs use signed integers *everywhere* (except in a few very specific places where the VM uses unsigned integer arithmetic on doubles) | 11:05.50 |
Robin_Watts | tor8: Well, arguably we should be using unsigned ints everywhere where it comes to allocation sizes. | 11:06.03 |
| unsigned ints, or size_t's. | 11:06.14 |
| It'd be a pain to get it right, but we'd be better off in the long run. | 11:06.33 |
tor8 | Robin_Watts: in truth, to be perfectly good citizens, we should be using size_t for all array indices and loop counters | 11:06.34 |
| because compilers make a mess of things when generating 64-bit code using 32-bit integers | 11:07.01 |
| having to sign extend them into 64-bit registers and crap | 11:07.11 |
Robin_Watts | tor8: We should be using unsigned ints, certainly. | 11:07.35 |
tor8 | IMO they should have just bitten the bullet and made 'int' be 64-bit on 64-bit platforms and we wouldn't be having this optimizing compilers abusing undefined behaviour mess in the first place | 11:07.47 |
| Robin_Watts: have you seen these gems? http://blog.regehr.org/archives/767 | 11:08.07 |
| Robin_Watts: I thought I'd gotten all the signed/unsigned stuff correct in mujs, but going through that code and fixing it I still spotted half a dozen mistakes :( | 11:09.24 |
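The kind of mistake being described can be shown in a few lines. This is a minimal sketch, not code from mujs; the function names are made up for illustration. It shows two classic traps: unsigned subtraction wrapping around at zero, and a signed value being converted to unsigned before a comparison.

```c
#include <limits.h>

/* With unsigned arithmetic, 0 - 1 wraps around to the maximum value
 * instead of going negative. This is well defined, but rarely intended. */
unsigned int items_before_last_unsigned(unsigned int len)
{
    return len - 1; /* len == 0 yields UINT_MAX */
}

/* The same computation with a signed int gives -1 for len == 0, which
 * a loop condition such as "i < n" then treats as "no iterations". */
int items_before_last_signed(int len)
{
    return len - 1;
}

/* Mixed comparison: the int is converted to unsigned int first, so a
 * negative value compares as a very large one. The cast spells out
 * what "i < len" does implicitly when i is int and len is unsigned. */
int less_than_mixed(int i, unsigned int len)
{
    return (unsigned int)i < len;
}
```

Calling less_than_mixed(-1, 10) reports that -1 is not less than 10, which is the sort of surprise that the compiler's signed/unsigned warnings are trying to flag.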
Robin_Watts | I don't follow Winner #1. | 11:12.12 |
| Yes, it reads a[32]. | 11:12.24 |
| yes, that's out of bounds and therefore undefined. | 11:12.35 |
| It's not unexpected though. | 11:12.39 |
tor8 | what the compiler does is a bit unexpected though! | 11:13.02 |
Robin_Watts | Compilers do screwy things. | 11:13.47 |
| IMAO, for C compilers to try to mandate anything about left shifts other than "it'll do what the underlying architecture does", seems potty. | 11:14.31 |
| The point of C is that it's close to the metal. That's both its boon and its bane. | 11:14.57 |
| Returning to your commits... | 11:16.57 |
| So we only have long options now? | 11:17.04 |
tor8 | Robin_Watts: this is for the '-O' option to the non-mudraw tools that can write pdf files | 11:17.36 |
| like mutool merge, etc | 11:18.05 |
| I have not touched pdfclean. | 11:18.27 |
Robin_Watts | But for consistency we should, arguably. And only having long options there would be bad. | 11:19.14 |
tor8 | it would be nice to have the same options for mudraw and pdfclean; but both of those tools have a lot of baggage. | 11:19.16 |
Robin_Watts | tor8: at the very least the usage messages should list the options. | 11:19.41 |
tor8 | and I've got a lot of muscle memory for how to invoke pdfclean | 11:19.42 |
Robin_Watts | Me too, which is why I'd be loath to change it. | 11:19.54 |
| rather than -Ofoo,bar,baz I'd rather see --foo --bar --baz | 11:20.16 |
tor8 | at the moment it's only mutool convert, create and merge that take the new options | 11:20.28 |
Robin_Watts | so we can follow the standard unix conventions of -f, --foo being short and long things that do the same thing. | 11:20.58 |
tor8 | Robin_Watts: this is modeled more along the lines of passing linker options to the compiler with -Wl | 11:21.42 |
Robin_Watts | tor8: Yeah, but that's a different kinda case. | 11:22.11 |
| -W<destination flag>,<option> | 11:22.25 |
tor8 | the thing is, we might conceivably want to pass *different* sets of -O options to a tool | 11:22.26 |
Robin_Watts | The destination flag says where to send the flag. | 11:22.40 |
tor8 | and we don't necessarily know which options are available, because it depends on the destination | 11:23.05 |
Robin_Watts | so -O<destination flag>,<long option> | 11:23.09 |
| or --<long option> if you want it to go everywhere. | 11:23.26 |
tor8 | and also as a string to pass to fz_new_document_writer("foo.pdf", options) | 11:23.55 |
Robin_Watts | I like the idea of adding long options. I dislike this way of doing it, at first sight. | 11:24.07 |
tor8 | they're not generic long options though. they're parameters to be piped through to the output device. | 11:24.55 |
| pdf.save("out.pdf", "pretty,ascii,compress-images,compress-fonts") in javascript for example | 11:25.40 |
Robin_Watts | I get the idea behind using strings for options. (Much as I dislike strings for options, it makes sense for javascript) | 11:26.45 |
tor8 | strings also make sense for command line tools :) | 11:27.17 |
Robin_Watts | yes. | 11:27.35 |
tor8 | for the C interface, we still have the pdf_write_options struct for pdf writing | 11:27.41 |
Robin_Watts | so having a single point that converts from strings to a struct makes sense. | 11:27.56 |
tor8 | and a pdf_parse_write_options that parses from the string | 11:28.16 |
| and a generic fz_has_option to read a value from an option string | 11:28.31 |
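The "single point that converts from strings" idea can be sketched as a small lookup over a comma-separated option string. The helper below is hypothetical and is not MuPDF's fz_has_option; its name, signature, and the key=value form are assumptions made for illustration. The option names in the usage note are the ones quoted in the log.

```c
#include <string.h>

/* Hypothetical helper (not MuPDF's fz_has_option): look up 'key' in a
 * comma-separated option string such as "pretty,ascii,resolution=300".
 * Returns 1 if the option is present, 0 otherwise. For an option of
 * the form key=value, *val points at the value inside 'opts' (it runs
 * up to the next ',' or the end of the string); otherwise *val is NULL.
 * 'val' itself may be NULL if the caller does not need the value. */
int has_option(const char *opts, const char *key, const char **val)
{
    size_t klen = strlen(key);
    const char *p = opts;

    if (val)
        *val = NULL;
    while (p && *p)
    {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "key" as a whole entry, or "key=" as a prefix. */
        if (len >= klen && memcmp(p, key, klen) == 0 &&
            (len == klen || p[klen] == '='))
        {
            if (val && len > klen)
                *val = p + klen + 1;
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}
```

With the string "pretty,ascii,compress-images,compress-fonts", has_option(opts, "ascii", NULL) returns 1, while a lookup for "compress" returns 0 because only whole entries match. A struct-filling parser would call such a helper once per field.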
Robin_Watts | In "Fix double free and memory leak", you remove fz_free(ctx, wri) from fz_close_document_writer. | 11:29.08 |
| Where is the other fz_free of that then ? | 11:29.16 |
| Oh, in fz_drop_document_writer. I see. | 11:29.41 |
tor8 | yes. | 11:29.48 |
Robin_Watts | Reference counting devices... OK... | 11:30.13 |
tor8 | I disliked having to add that | 11:30.25 |
Robin_Watts | but then I wonder if we should have a debug thing that checks that a device is not being used from 2 places at the same time. | 11:30.35 |
tor8 | but making writer.begin_page return a device necessitated it for the language bindings | 11:30.47 |
Robin_Watts | or even not a debug thing. | 11:30.57 |
| We could have an 'in_use' thing that we use the alloc lock to inc/dec and throw if we find it in_use on entry. | 11:31.29 |
tor8 | isn't taking and releasing a lock on every device call going to slow things down? | 11:31.58 |
Robin_Watts | Similarly, we should possibly take the FZ_FILE lock whenever we access the document. | 11:32.01 |
| tor8: in single threaded stuff, not at all. | 11:32.13 |
| In multi-threaded stuff that's well behaved, barely at all. | 11:32.28 |
| If we're worried about performance, only do it on DEBUG builds. | 11:34.49 |
tor8 | Robin_Watts: even locking the device around individual device calls can go horribly wrong | 11:35.01 |
Robin_Watts | tor8: Cos of recursion? | 11:35.21 |
tor8 | we'd have to lock around clip/popclip and begin*/end* pairs | 11:35.21 |
Robin_Watts | No, I don't think so. | 11:35.31 |
| In fact, definitely not. | 11:36.06 |
tor8 | it makes no sense to call a device from multiple threads; the question is how much effort do we spend trying to coddle people from the effects of doing stuff we say DON'T DO THAT! | 11:36.17 |
Robin_Watts | tor8: On the contrary. | 11:36.28 |
tor8 | or am I misunderstanding what you're trying to accomplish with this lock? | 11:36.42 |
Robin_Watts | Consider that I have a lump of code that throws out various callbacks. | 11:36.53 |
| On those callbacks I want to make device calls into (say) a pdf writer. | 11:37.15 |
| I can't know what threads those callbacks are coming from. | 11:37.37 |
| but what I can know is that if 2 of them try to call into the same device at the same time, all bets are off. | 11:37.59 |
| There are the 3 rules of MuPDF multi-threading. | 11:38.14 |
tor8 | wouldn't you want higher granularity around those calls then? | 11:38.26 |
| say lock the pdf writer device for exclusive access, do what I need, then release it | 11:38.40 |
Robin_Watts | 1) "Only use a context in a single thread at a time" | 11:38.46 |
| 2) "Only use a document in a single thread at a time" | 11:38.57 |
| 3) "Only use a device in a single thread at a time" | 11:39.10 |
| if you break 1, your program falls in a heap pretty soon. | 11:39.41 |
| If you break 2 (like ferter was doing) it can be confusing. It would be much nicer to spot it and assert with a clear message. | 11:41.10 |
| Likewise 3. | 11:41.18 |
| Arguably we can fix 2 using the FILE lock (which is defined, but not currently used). | 11:41.49 |
tor8 | Robin_Watts: not for all input file types though | 11:42.12 |
Robin_Watts | The only problem with that is that that would serialise access to ALL documents, even if we had several open at once. | 11:42.40 |
| tor8: In what way not for all input file types? | 11:42.48 |
tor8 | using the FILE lock around the fz_archive for zip files could save you from some of it; but the other input formats don't hit the underlying file quite as often | 11:42.54 |
Robin_Watts | tor8: I was thinking we'd take the FILE lock on an fz_run_page. | 11:43.16 |
tor8 | Robin_Watts: yeah; it looks like we would want to add per-object mutexes for this kind of stuff | 11:43.26 |
Robin_Watts | or an fz_run_page_contents. | 11:43.36 |
| tor8: Yeah, and I'd like to avoid that. | 11:43.44 |
tor8 | and IMO that is outside our responsibility at the moment | 11:43.47 |
Robin_Watts | I think I'd be happy enough to add a DEBUG thing that asserts if someone breaks it. | 11:43.59 |
tor8 | serializing and synchronizing your threads is up to you as long as you follow the 3 rules you listed above | 11:44.05 |
Robin_Watts | That way we're saying "don't do it", and we're giving reasonable checks to ensure that people don't. | 11:44.24 |
tor8 | Robin_Watts: a simple volatile int in_use field that we set, check, and abort on in fz_document and fz_page and fz_device? | 11:45.07 |
Robin_Watts | I might knock something up for your consideration later. | 11:45.11 |
tor8 | no need to lock, just detect if it's being abused | 11:45.26 |
Robin_Watts | tor8: an fz_lock, read/set, fz_unlock, assert on the value, yes. | 11:45.31 |
| If we don't lock, then the read may be stale on truly multi-threaded systems. | 11:45.54 |
tor8 | or lock, to avoid race conditions | 11:45.57 |
Robin_Watts | yeah. | 11:46.22 |
tor8 | yeah, I think having an in_use sanity check for devices, docs and pages could be useful. still, guarded by ifdef DEBUG I think; no need to have development checks in for all code | 11:47.18 |
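A rough sketch of such an in_use sanity check follows. The discussion proposes fz_lock, read/set, fz_unlock, then assert on the value; to stay self-contained this sketch swaps in a C11 atomic exchange for the lock, which gives the same "read and set in one step" guarantee. The type and function names are hypothetical, and the error is returned rather than asserted so the caller can decide how to report it.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a device, document or page. */
typedef struct
{
    atomic_int in_use;
} guarded_object;

/* Mark the object as in use. Returns 0 on success, or -1 if another
 * caller is already inside. A DEBUG build could turn the -1 into an
 * assert or a thrown error with a clear message. */
int guard_enter(guarded_object *obj)
{
    /* atomic_exchange stores 1 and returns the previous value as a
     * single operation, so two threads cannot both see 0 here. */
    if (atomic_exchange(&obj->in_use, 1) != 0)
        return -1;
    return 0;
}

/* Clear the flag. Only a caller whose guard_enter succeeded should
 * call this; a caller that got -1 never owned the object. */
void guard_leave(guarded_object *obj)
{
    atomic_store(&obj->in_use, 0);
}
```

Wrapping each entry point in guard_enter and guard_leave detects abuse without serialising callers, which matches the "no need to lock, just detect" intent. In a real build the whole thing would presumably sit behind an ifdef DEBUG, as suggested above.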
Robin_Watts | tor8: For the page range stuff, can we pass in numpages too? | 11:47.26 |
| tor8: agreed. | 11:47.31 |
| cos then we can cope with reverse,evens,odds etc at some point. | 11:47.58 |
| revens, rodds, booklet | 11:48.25 |
tor8 | Robin_Watts: N is num pages; so "1-N" gets you all pages | 11:48.29 |
| and "N-1" gets you all pages, reversed | 11:48.37 |
Robin_Watts | Oh, so you already do! | 11:48.47 |
tor8 | odds and evens and booklets are more troublesome | 11:49.27 |
Robin_Watts | tor8: I'd be tempted by a map_over_page_range(range, fn, arg) thing. | 11:50.02 |
| where that calls fn(arg, x) on each x in the range in turn. | 11:50.35 |
tor8 | Robin_Watts: yeah, that sounds like the next version of this code :) | 11:51.06 |
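The map_over_page_range idea could look roughly like the sketch below. It is not the code under review: the parser, the clamping behaviour and the collect_page callback are assumptions for illustration. It handles only what the log mentions, namely comma-separated entries, a-b ranges, and N standing for the number of pages, so "N-1" walks the pages in reverse.

```c
#include <stdlib.h>

typedef void (page_fn)(void *arg, int page);

/* Parse one page number: "N" means the last page, anything else is
 * read as a decimal number. Returns a pointer past what was consumed. */
static const char *parse_page(const char *s, int num_pages, int *page)
{
    char *end;
    if (*s == 'N')
    {
        *page = num_pages;
        return s + 1;
    }
    *page = (int)strtol(s, &end, 10);
    return end;
}

static int clamp_page(int x, int num_pages)
{
    if (x < 1) return 1;
    if (x > num_pages) return num_pages;
    return x;
}

/* Call fn(arg, x) for each page x in 'range', in order. Page numbers
 * are clamped to 1..num_pages. Returns the number of calls made. */
int map_over_page_range(const char *range, int num_pages, page_fn *fn, void *arg)
{
    const char *s = range;
    int count = 0;

    if (num_pages < 1)
        return 0;
    while (*s)
    {
        int a, b, step, x;

        s = parse_page(s, num_pages, &a);
        b = a;
        if (*s == '-')
            s = parse_page(s + 1, num_pages, &b);
        a = clamp_page(a, num_pages);
        b = clamp_page(b, num_pages);

        /* Walk forwards or backwards depending on the order given. */
        step = (a <= b) ? 1 : -1;
        for (x = a; x != b + step; x += step)
        {
            fn(arg, x);
            count++;
        }
        if (*s != ',')
            break; /* end of string, or something unexpected */
        s++;
    }
    return count;
}

/* Example callback: append each visited page to a small list. */
typedef struct { int pages[64]; int len; } page_list;

void collect_page(void *arg, int page)
{
    page_list *list = arg;
    if (list->len < 64)
        list->pages[list->len++] = page;
}
```

Odds, evens and booklet orderings would then be a matter of adding keywords to the parser, with the callback interface left unchanged.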
Robin_Watts | ok, so none of my comments there were showstoppers. | 11:51.30 |
| Looks good to me then. | 11:51.36 |
tor8 | Robin_Watts: though, once you start getting into wanting lots of control, I feel we'd be better served by pointing people to mutool run | 11:51.43 |
Robin_Watts | Yeah, but odds,evens and reverse (and potentially rodds,revens) are normal enough to warrant it, I think. | 11:54.07 |
tor8 | odds and evens, forward and reverse: agreed | 11:54.32 |
| if you know the number of pages it's easy enough as mutool clean in.pdf out.pdf $(seq -s, 1 2 100) | 11:55.45 |
Robin_Watts | tor8: That's not easy :) | 11:59.12 |
| Doubly not easy on windows :) | 11:59.24 |
tor8 | bah, windows! ;) | 11:59.45 |
| I'm sure there's a powershell equivalent | 11:59.53 |
| I've just never bothered to learn powershell | 12:00.01 |
Robin_Watts | Well, git is one of the first things I install on all windows boxes, just to get git bash. | 12:01.17 |
| but that'll be even easier soon when the whole "run any user mode linux binary" thing appears and we can use normal bash. | 12:01.53 |
| tor8: How would you feel about us supporting greyscale pixmaps with no alpha in mupdf? | 12:02.43 |
tor8 | I would love for us to make the alpha channel optional in all pixmaps (gray and color) | 12:03.22 |
| it does mean exploding a lot of the plotting functions | 12:03.37 |
| if you're going to do it, please add and use an explicit 'stride' field in the pixmaps too! that's been bothering me quite a while. | 12:04.07 |
Robin_Watts | For RGB, RGBA makes a lot of sense, because 4 byte access to pixels helps a lot. | 12:04.25 |
| I don't care enough about CMYK performance. | 12:04.44 |
tor8 | Robin_Watts: yes. for CMYK having CMYKA hurts a lot, but I don't think we care enough. | 12:04.51 |
Robin_Watts | but greyscale performance, having just G rather than GA would be a win. | 12:05.06 |
tor8 | I think if we make alpha optional for gray, having it optional everywhere makes sense | 12:05.43 |
| when writing to PPM or PNG we can save the alpha-plane stripping step when writing if we never create the A in the destination pixmap | 12:06.20 |
Robin_Watts | tor8: Well, for PNG we'd want to allow for alpha saving, right? | 12:07.05 |
tor8 | I'm fine with not having optimized versions of plotters for RGB (only for RGBA) and make our device create RGBA buffers by default | 12:07.11 |
Robin_Watts | SOT has build time cleverness to include/exclude plotters as required. | 12:08.07 |
| so for devices that have a 565 screen we build one set, for 12 bit screens another, for 555 another, etc. | 12:08.31 |
tor8 | wolfenstein 3d back in the day generated specialized machine code for its plotters; like an early JIT | 12:09.17 |
| and software opengl plotters also generate specialized code at runtime | 12:09.43 |
Robin_Watts | Yeah, Acorn's full motion video system Replay would do that for desktop playback. | 12:09.58 |
| (well, it did it for normal playback too, but the desktop playback code was particularly funky) | 12:10.18 |
| So fz_new_pixmap etc, would all need to sprout an int alpha flag, I guess. | 12:11.23 |
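To make the two requests concrete (an optional alpha channel and an explicit stride), here is a small sketch of what such a pixmap could look like. It is a hypothetical structure, not MuPDF's fz_pixmap, and the 4-byte row alignment is chosen only to show a stride that differs from width times bytes per pixel.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pixmap, not MuPDF's fz_pixmap. */
typedef struct
{
    int w, h;
    int n;            /* bytes per pixel: colorants plus optional alpha */
    int alpha;        /* 1 if the last byte of each pixel is alpha */
    ptrdiff_t stride; /* bytes from one row to the next, >= w * n */
    unsigned char *samples;
} pixmap;

/* Allocate a zeroed pixmap. 'colorants' is 1 for gray, 3 for RGB,
 * 4 for CMYK; 'alpha' says whether to add an alpha byte per pixel.
 * Returns NULL on bad arguments or allocation failure. */
pixmap *new_pixmap(int w, int h, int colorants, int alpha)
{
    pixmap *pix;

    alpha = alpha ? 1 : 0;
    if (w <= 0 || h <= 0 || colorants < 0 || colorants + alpha < 1)
        return NULL;
    pix = malloc(sizeof *pix);
    if (!pix)
        return NULL;
    pix->w = w;
    pix->h = h;
    pix->alpha = alpha;
    pix->n = colorants + alpha;
    /* Round each row up to a multiple of 4 bytes (illustrative only). */
    pix->stride = ((ptrdiff_t)w * pix->n + 3) & ~(ptrdiff_t)3;
    pix->samples = calloc((size_t)h, (size_t)pix->stride);
    if (!pix->samples)
    {
        free(pix);
        return NULL;
    }
    return pix;
}

/* Address of pixel (x, y): rows are found via stride, never w * n. */
unsigned char *pixel_at(pixmap *pix, int x, int y)
{
    return pix->samples + (ptrdiff_t)y * pix->stride + (ptrdiff_t)x * pix->n;
}

void drop_pixmap(pixmap *pix)
{
    if (pix)
    {
        free(pix->samples);
        free(pix);
    }
}
```

A gray pixmap without alpha gets n == 1, so a 5-pixel row occupies 5 bytes padded to a stride of 8, where the gray-plus-alpha equivalent needs 10 bytes padded to 12. Any plotter written against pixel_at-style addressing keeps working if the row padding changes.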
tor8 | security paranoia makes code generation a bit of a no-no on mobile platforms (and some desktops) these days :( | 12:11.26 |
| Robin_Watts: yeah. | 12:11.37 |