| 20180905 |
persmule | Hi. Currently there is a workaround inside pdf-font.c for PDFs ill-produced by S22PDF, allowing mupdf to display those ill-produced PDFs, but is there any way to write the usable font descriptor back to the PDF files, in order to fix them? | 08:27.42 |
tor8 | persmule: you mean so you can pass the file around to other PDF tools that don't share the same workaround? | 08:31.00 |
persmule | Yes. | 08:31.21 |
tor8 | it's easy enough to write a 'mutool run' script if it's just a handful of files that you need to fix | 08:31.29 |
| well, 'easy' is stretching it a bit | 08:32.09 |
| in PDF there are two main flavors of fonts -- simple fonts with 256 characters, and CID fonts with more characters | 08:33.01 |
| the S22PDF generator writes the structures for a simple font with 256 characters and says that it uses ASCII encoding but then writes multi-byte character codes using windows codepage 936 | 08:34.19 |
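To make the mismatch tor8 describes concrete, here is a minimal sketch in plain JavaScript (no MuPDF API involved). It splits the same byte string two ways: one code per byte, as a simple font would, and one code per CP936 (GBK) character, as the S22PDF output presumably intends. The function names are illustrative, and the lead-byte range 0x81 to 0xFE is the general CP936 rule, not something taken from pdf-font.c.

```javascript
// Read a string's bytes the way a simple font does: one code per byte.
function splitAsSimpleFont(bytes) {
	return bytes.slice();
}

// Read the same bytes as CP936 (GBK): a byte in 0x81..0xFE starts a
// two-byte code; any other byte is a single-byte code.
function splitAsCP936(bytes) {
	var codes = [];
	var i = 0;
	while (i < bytes.length) {
		var b = bytes[i];
		if (b >= 0x81 && b <= 0xFE && i + 1 < bytes.length) {
			codes.push((b << 8) | bytes[i + 1]);
			i += 2;
		} else {
			codes.push(b);
			i += 1;
		}
	}
	return codes;
}

// "A", then the two bytes D6 D0 (one double-byte character), then "B".
var sample = [0x41, 0xD6, 0xD0, 0x42];
var simpleCodes = splitAsSimpleFont(sample); // four separate codes
var cp936Codes = splitAsCP936(sample);       // three codes, the middle one two bytes wide
```

A viewer that trusts the simple-font declaration sees four unrelated glyph codes here, which is why the font structure itself has to change rather than just the font program.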
persmule | I once tried "mutool clean -s", but it is unable to write the fix to file. | 08:43.57 |
| tor8: You mean I can do the fix by simply adding a CID font to the PDFDocument object created from an S22PDF-produced file? | 09:00.25 |
tor8 | persmule: you need to change the font descriptor object structure from being a simple font to being a cid font | 09:06.21 |
| with an appropriate encoding | 09:06.29 |
persmule | tor8: By reproducing what is done in pdf-font.c via JS code? | 09:07.40 |
sebras | tor8: I have a few commits on sebras/master as well as an implementation of the StructuredTextWalker thingy at the top of sebras/wip | 09:50.13 |
| tor8: I did notice that the colorspace name might be clipped to 24 characters. so that one might not be perfect, but at least it illustrates what I want to do. | 09:56.24 |
tor8 | persmule: something like this script: http://ix.io/1m0S | 10:10.40 |
persmule | tor8: Thanks. | 10:14.20 |
avih | tor8: hmm... the getopt thingy also fails on osx. (in addition to two mingw setups and alpine linux). it basically only works on ubuntu (debian?) | 10:15.51 |
persmule | tor8: The js api of mutool seems not well documented, e.g. the property Font#Encoding cannot be found on https://mupdf.com/docs/manual-mutool-run.html . | 10:16.57 |
tor8 | persmule: it assumes an intimate knowledge of the PDF reference | 10:17.32 |
| addCJKFont returns a PDF object representing the font | 10:17.42 |
| font.Encoding is poking at the internal PDF object properties | 10:18.06 |
persmule | function addCJKFont is not documented either. | 10:18.31 |
tor8 | persmule: that much is true! | 10:18.46 |
| sebras: the colorspace thing will need to be rebased on top of "Use colorspace type enum instead of magic profile names." | 10:37.55 |
sebras | tor8: can do. | 10:39.17 |
| tor8: apart from that, are you happy with it? | 10:39.26 |
tor8 | wow, color management in GIF ... that's rather strange | 10:41.34 |
| sebras: but yes, apart from that I'm happy with the commits on sebras/master | 10:42.03 |
sebras | tor8: yes, it was in one of the application extensions that we used to ignore. | 10:42.09 |
tor8 | sebras: why not use the to_Rect and to_Matrix utility functions? | 10:45.50 |
| in the text walker stuff | 10:45.56 |
| and I'd remember the last font object so you only create a new font wrapper when the font changes | 10:47.00 |
| sebras: if you're happy with the commits on tor/master (up to the missing fz_var declarations) I can push that | 10:48.42 |
sebras | tor8: I should probably be using to_Rect_safe() since I'm not handling fitz exceptions. | 10:48.52 |
tor8 | I added a few minor changes and documentation additions to murun | 10:48.57 |
| sebras: one of them anyway :) | 10:49.30 |
sebras | tor8: should you be mentioning scriptPath in docs/manual-mutool-run.html? | 10:50.30 |
| you do add scriptArgs there so add both to not get persmule chasing you in the future. :) | 10:50.54 |
tor8 | true, will add. | 10:51.18 |
persmule | sebras: most existing js scripts for mutool use argv. | 10:52.26 |
tor8 | persmule: sebras: I learned just the other day that Mozilla's SpiderMonkey js shell puts the script path and arguments in scriptPath and scriptArgs | 10:56.00 |
| so I'm matching that behaviour | 10:56.05 |
| as that's what the plain 'mujs' shell also does | 10:56.21 |
sebras | tor8: yes, I know. I can't see a problem with that. | 10:56.38 |
tor8 | other than breaking existing scripts (which I expect there to be not too many of in the wild) | 10:56.56 |
sebras | tor8: I was just worried that the docs and the software were being inconsistent. :) | 10:56.57 |
persmule | At least on the version I use, scriptArgs does not exist yet. | 10:58.16 |
sebras | persmule: tor8 has not pushed the change to the main repository yet. :) | 10:58.39 |
tor8 | persmule: in the plain 'mujs' shell or 'mutool run'? | 10:58.40 |
| persmule: use argv[1] instead of scriptArgs[0] if you want to run the example script I pasted earlier on non-bleeding-edge mupdf | 10:59.19 |
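A script that has to run on both older and newer builds could pick whichever convention is present. The sketch below is one hedged way to do that in plain JavaScript. The helper name `getScriptArgs` is hypothetical, and it assumes, based only on the argv[1] / scriptArgs[0] correspondence mentioned above, that argv[0] holds the script path on older versions.

```javascript
// Return the script's own arguments regardless of which convention the
// host shell uses. `env` is the global object, or a stand-in for testing.
function getScriptArgs(env) {
	if (env.scriptArgs !== undefined)
		return env.scriptArgs;      // newer convention: arguments only
	if (env.argv !== undefined)
		return env.argv.slice(1);   // older convention: assumed argv[0] is the script
	return [];
}

// Stand-in environments for the two conventions.
var newStyle = getScriptArgs({ scriptPath: "fix.js", scriptArgs: ["in.pdf", "out.pdf"] });
var oldStyle = getScriptArgs({ argv: ["fix.js", "in.pdf", "out.pdf"] });
```

Both calls yield the same two-element argument list, so the rest of the script can index it without caring which MuPDF version is running it.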
persmule | tor8: I have done in that way. | 10:59.55 |
| My version is 1.13.0. | 11:00.39 |
| tor8: This fix is quite useful, since mupdf cannot print. If some S22PDF-produced files are going to be printed, they should be fixed first before being fed into programs capable of printing PDFs, e.g. poppler frontends. | 11:04.17 |
sebras | tor8: once you add scriptPath to the docs go ahead and merge. then I'll go next with the ICC-stuff given that you are happy with my rebase..? | 11:04.50 |
tor8 | sebras: done. | 11:07.33 |
sebras | tor8: is it only me or do you also get updates in platform/java/mupdf_native.h due to PDFWidget? | 12:11.44 |
tor8 | sebras: hm? the PDFWidget stuff hasn't gone in yet, has it? | 12:20.58 |
sebras | I can't find it in the git log. what on earth is going on?! | 12:22.26 |
tor8 | sebras: remnants from fred's forms2? | 12:24.40 |
| IIRC the javah tool is finicky. | 12:24.50 |
| maybe it tries to keep old stuff in the header and not regenerate everything? | 12:25.11 |
sebras | there were some uncommitted files in a directory, yes. | 12:25.31 |
| tor8: why did you remove public from the interface members? https://docs.oracle.com/javase/tutorial/java/IandI/interfaceDef.html seems to indicate that without them the access rights are incorrect..? | 12:28.18 |
| or rather, unreachable. | 12:28.31 |
persmule | tor8: Could you make the script you just provided for me a part of the future release of mupdf? | 12:29.18 |
| tor8: Since S22PDF seems disastrously popular in China, a Free software solution to fix S22PDF-produced PDFs is eagerly needed. | 12:31.22 |
| tor8: as stated in https://bugs.ghostscript.com/show_bug.cgi?id=691457 , in which S22PDF's problem was first detected. | 12:35.08 |
| tor8: Earlier solutions all depend on proprietary software. | 12:36.42 |
| tor8: Since it is your work, it had better be published by you, not me, and I believe to publish it as a part of future release of mupdf is the best way. | 12:38.28 |
| tor8: Besides, I have fixed a minor bug of the script in http://ix.io/1m1i , since not all pages have fonts referenced. | 12:41.24 |
| tor8: http://ix.io/1m1j is better, which fixes the pdf in place by default unless the second parameter is given. | 12:43.49 |
tor8 | persmule: I can put the script in the docs/examples directory | 12:49.20 |
persmule | tor8: That is just what I want. Thanks. | 12:49.48 |
sebras | tor8: StructuredTextWalker.beginLine() does not supply fz_stext_line_s->dir | 12:51.25 |
tor8 | persmule: you may know better which of the sun/hei/kai/fang/li fonts should be serif/sans-serif style | 12:51.41 |
| sebras: true. do you think it should? | 12:51.52 |
sebras | tor8: I'm not sure. is it meant as an internal field? | 12:52.25 |
persmule | tor8: There is no correspondence, only similarity. | 12:52.40 |
tor8 | persmule: I know ... the PDF format has a flag for 'serif' style only though so we have to shoehorn it in | 12:53.17 |
persmule | tor8: I do know that hei and fang are similar to sans-serif. | 12:57.33 |
sebras | tor8: if you use StructuredText you are presumably mostly interested in the logical order of characters (for searching/marking). so perhaps the dir is not as useful there as it is when you need to figure out in what order to insert the characters into the stext in the first place. perhaps it is best left out as an internal field. | 12:58.27 |
tor8 | sebras: you can infer the direction for each character from the quad | 12:59.22 |
sebras | sure, the question is: would a consumer need it. the more I think about this, the more I believe they wouldn't. | 13:00.19 |
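For a consumer that does want a direction, tor8's point about inferring it from the quad can be sketched as follows. This is an illustration only: it assumes a quad is given as eight numbers in the order [ulx, uly, urx, ury, llx, lly, lrx, lry] and that the baseline runs along the lower edge. `directionFromQuad` is a hypothetical helper, not a MuPDF function.

```javascript
// Estimate the writing direction of a character from its quad.
// Assumed layout: [ulx, uly, urx, ury, llx, lly, lrx, lry].
function directionFromQuad(q) {
	var dx = q[6] - q[4]; // lower-left to lower-right, x component
	var dy = q[7] - q[5]; // lower-left to lower-right, y component
	var len = Math.sqrt(dx * dx + dy * dy);
	if (len === 0)
		return [0, 0];    // degenerate quad: no usable direction
	return [dx / len, dy / len];
}

// An upright character box, and the same kind of box rotated a quarter turn.
var horizontal = directionFromQuad([0, 0, 10, 0, 0, 12, 10, 12]);
var rotated = directionFromQuad([0, 0, 0, 10, -12, 0, -12, 10]);
```

The result is a unit vector along the lower edge, so `horizontal` points along the x axis and `rotated` along the y axis.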
persmule | tor8: and song is similar to serif fonts in the West. | 13:00.20 |
| tor8: Kai and Li are ambiguous. Many systems count them as serif fonts, since their strokes vary in width, unlike the sans-serif-like Hei and Fang, which have constant stroke width. | 13:13.43 |
tor8 | sebras: I suspect not | 13:14.40 |
| persmule: thanks. that's roughly what I thought as well. | 13:15.23 |
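The classification agreed on above can be written down as a small lookup. This is a sketch under stated assumptions: the style keys are illustrative names rather than MuPDF identifiers, Kai and Li are put on the serif side because that is how persmule says many systems treat them, and only the Serif bit (bit 2, value 2, of the PDF font descriptor Flags entry) is computed.

```javascript
var FLAG_SERIF = 1 << 1; // "Serif" is bit 2 of the font descriptor Flags entry

// Serif-or-not for each style, following the discussion above.
var STYLE_IS_SERIF = {
	song: true,
	kai: true,   // ambiguous, often counted as serif
	li: true,    // ambiguous, often counted as serif
	hei: false,
	fang: false
};

// Return the Serif flag bit to set for a given style name.
function serifFlagFor(style) {
	if (!STYLE_IS_SERIF.hasOwnProperty(style))
		throw new Error("unknown style: " + style);
	return STYLE_IS_SERIF[style] ? FLAG_SERIF : 0;
}
```

A real font descriptor would OR this bit with whatever other flags apply; the sketch leaves those out because the log does not discuss them.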
sebras | tor8: in that case I feel "jni: Add StructurexTextWalker interface." seems usable. I added the BlockWalker to be able to retain the getBlocks() interface so as to avoid complaints about changing the API. | 13:16.01 |
| I'd rather remove it, but hey... | 13:16.13 |
tor8 | persmule: http://git.ghostscript.com/?p=user/tor/mupdf.git;a=blob;f=docs/examples/fix-s22pdf.js;h=4a2789ec58bb496ff38066e20b9db50dd3945f6b;hb=2d512ffd1a9d19faabe38b3bbb419b5950a8105b | 13:17.13 |
persmule | tor8: Thank you very much. | 13:19.15 |
tor8 | sebras: yeah, but I suspect fred may be using it | 13:19.18 |
| possibly ask him first? | 13:19.27 |
| sebras: we should probably add StructuredText.snapSelection(Point a, Point b, int mode) | 13:20.39 |
sebras | tor8: that's why I retained the interface. | 13:20.42 |
tor8 | which calls fz_snap_selection | 13:20.44 |
sebras | can do. | 13:20.56 |
tor8 | and then maybe he can use that instead of cooking his own | 13:21.13 |
sebras | tor8: seems like StructuredText.snapSelection() would need to return a struct of a quad and the a and b points seeing as fz_snap_selection() actually modifies the a and b points to snap to the correct positions. | 13:32.23 |
tor8 | sebras: you would modify the a and b input points | 13:34.09 |
| i.e. the java Point objects would be in-out as well | 13:37.06 |
sebras | tor8: I'm daft. I forgot that functions may change java objects. :) | 14:04.24 |
| tor8: do we want to add a link to where to get S22PDF in the script? | 14:44.59 |
| tor8: I tried searching for it but came up empty | 14:45.11 |
| persmule: where do I get S22PDF? | 14:46.28 |
tor8 | sebras: I don't know that it's still used... it was extremely popular a long time ago and there are a lot of broken files out there already. | 14:58.23 |
sebras | I see. | 15:02.21 |
tor8 | sebras: "StructurexTextWalker" typo otherwise sebras/master LGTM | 15:05.43 |
sebras | nice! | 15:06.05 |
moolc | ghostscript package for my distro was updated _3_ times in the last 24 hours... amazing | 15:15.43 |
sebras | moolc: there has been a number of security bugs fixed and a new release just the other day. | 15:16.34 |
moolc | sebras: yes, i know, but _three_ bloody times | 15:17.04 |
persmule | sebras: It may have been popular before pdf printers were introduced to M$WIN, just as a disastrous influenza. | 15:32.45 |
sebras | tor8 (for the logs): I tracked down a memory leak in the ICC loading code and noticed what I believe to be a typo taking sizeof(ptr). in any event it clusters fine. | 20:37.12 |
mojca | I'm looking for some hints about how to avoid "platform/gl/gl-main.c:1677:16: error: use of undeclared identifier 'GLUT_ACTION_ON_WINDOW_CLOSE'" | 20:52.49 |
| I'm trying to compile 1.13.0 on macOS 10.13 | 20:53.04 |