| 2012/07/03 |
vtorri | hey | 06:04.39 |
ghostbot | hi, vtorri | 06:04.39 |
vtorri | with mupdf, is there a way to check if the file is a pdf one, without loading the whole file ? | 06:05.13 |
| some kind of "preload" function | 06:05.25 |
| asking again :) | 07:56.31 |
| with mupdf, is there a way to check if the file is a pdf one, without loading the whole file ? | 07:56.32 |
| some kind of "preload" function | 07:56.35 |
kens | vtorri I'm not an expert, but I don't believe MuPDF will 'load the whole file' if it's not a PDF. | 07:57.14 |
vtorri | and if it's a PDF one, it will load it entirely ? | 07:57.48 |
kens | In order to be a valid PDF file it must contain %PDF within the first 1024 bytes of the file IIRC | 07:57.48 |
vtorri | ha | 07:58.04 |
| i ask because of a doc viewer i'm writing | 07:58.32 |
kens | In order to find the xref it is *required* to go to the end of the file. If you know a way to get to the end of the file without having the whole file, I'd be interested to hear it ;-) | 07:58.33 |
vtorri | it has several backends | 07:58.42 |
| and i would like to load the module only if the file corresponds to the module | 07:59.12 |
kens | Sadly not all PDF files follow the rules. | 07:59.29 |
vtorri | so, for optimisation, i would like some kind of "preload" function | 07:59.30 |
| arg | 07:59.34 |
| i'm doomed | 07:59.39 |
kens | Because Adobe Acrobat is 'flexible' in assuming files you load are PDF files, PDF producers are very lax about what they create | 08:00.05 |
vtorri | too bad :) | 08:00.56 |
| i'll just use a "preferred" module, based on the extension, if there is one | 08:01.31 |
kens | Well I think you can legitimately search for %PDF in the first 1024 bytes of the file, and assume it's not a PDF if you don't find that. | 08:01.32 |
vtorri | i'll ask tor what to search exactly | 08:02.00 |
| maybe he will give some hints or advice about what to do exactly | 08:02.20 |
| haaa, here he comes | 08:10.03 |
| tor8: hey | 08:10.07 |
| tor8: question: | 08:10.17 |
| i would like to optimize a doc viewer that can render pdf with mupdf | 08:10.42 |
| what i would like to do is some kind of "preload" function that would detect that a file is a PDF one, without loading the whole file | 08:11.20 |
| kens told me that i should search for %PDF in the first 1024 bytes of the file | 08:11.48 |
| is it the best way to achieve what i want to do ? | 08:12.02 |
kens | vtorri see implementation notes 13 & 14 in the 1.7 PDF Reference Manual (p 1102) | 08:21.26 |
tor8 | vtorri: yeah, that sounds like the best approach | 08:24.03 |
kens | In particular look at implementation note 14 which has an alternate form of the header, accepted by Acrobat, which I wasn't aware of. | 08:25.02 |
| "Acrobat viewers also accept a header of the form | 08:25.02 |
| %!PS-Adobe-N.n PDF-M.m" | 08:25.02 |
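A minimal sketch of the header check discussed above, assuming the file is read in binary mode and that only the first 1024 bytes are inspected. The function names and the regular expression are illustrative and are not part of MuPDF; as kens notes, not all PDF producers follow the rules, so a negative result is a heuristic rather than proof.

```python
import re

# Window size taken from the discussion above (first 1024 bytes).
HEADER_WINDOW = 1024

# Alternate header form quoted from implementation note 14.
_PS_STYLE_HEADER = re.compile(rb"%!PS-Adobe-\d+\.\d+ PDF-\d+\.\d+")


def looks_like_pdf(data: bytes) -> bool:
    """Return True if a PDF-style header appears in the first HEADER_WINDOW bytes."""
    head = data[:HEADER_WINDOW]
    return b"%PDF-" in head or _PS_STYLE_HEADER.search(head) is not None


def file_looks_like_pdf(path: str) -> bool:
    """Read only the start of the file, never the whole thing."""
    with open(path, "rb") as handle:
        return looks_like_pdf(handle.read(HEADER_WINDOW))
```

A viewer with several backends could call `file_looks_like_pdf` before loading the PDF module and fall back to the file extension when the check fails.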
tor8 | vtorri: if mupdf opens a valid pdf file, we load only select bits of it at launch | 08:25.03 |
| if it's a broken pdf file, we may end up parsing the whole file in one go to patch up the broken index | 08:25.40 |
| kens: odd, I've never seen that header before either | 08:27.06 |
kens | :-) | 08:27.13 |
chrisl | Hah, so much for "The text rendering mode has no effect on text displayed in a Type 3 font"...... | 08:31.09 |
kens | Huh ? | 08:31.47 |
| I take it Acrobat does when it's not a bitmap font ? | 08:32.00 |
chrisl | Not exactly - the colour used to draw the glyph is influenced by the tr mode. It looks like *any* stroking mode causes the glyph to be drawn in the stroke color | 08:33.01 |
| But Acrobat also (tries to?) apply the clipping tr modes, too...... | 08:34.09 |
kens | That's just bizarre.... | 08:34.18 |
vtorri | tor8: so opening the pdf with mupdf is kind of light ? | 08:34.18 |
tor8 | vtorri: if it is a well formed PDF, it is a light operation | 08:34.41 |
| if it is a badly formed PDF, then it's a very heavy operation | 08:34.51 |
vtorri | hmm | 08:34.57 |
| ok | 08:35.02 |
tor8 | and if it's not a PDF at all, also a heavy operation until we give up | 08:35.05 |
vtorri | i guess that i can't have much better | 08:35.14 |
chrisl | kens: ironically, the test file that shows this was created by "Jaws PDF Library" :-) | 08:35.20 |
kens | ROFL | 08:35.34 |
| Probably it just inherited it from a previous PDF file, but it's impossible to know | 08:35.54 |
tor8 | well, you could refuse to open files that are obviously not PDF, or obviously broken. but mupdf tries very hard to accept broken files. | 08:36.18 |
chrisl | No, it looks like it's been hand hacked to roll through all the modes with a t3 font | 08:36.26 |
kens | Hmm it sounds like a file I may have created, this all sounds terribly familiar | 08:36.55 |
chrisl | kens: it's comparefiles/pdf-t3-simple.pdf | 08:37.13 |
kens | That sounds like it's mine, let me quickly look | 08:37.32 |
| Yes, I'm pretty sure I made that one | 08:38.12 |
chrisl | Well, pretty much everybody seems to get different output for it | 08:38.29 |
kens | I think different versions of Acrobat display it differently too | 08:38.48 |
| Acrobat X looks 'correct', all the text is blue, no strokes, no clipping | 08:39.08 |
chrisl | What do you get in AcroX? | 08:39.09 |
kens | 6 lines of blue square, blue triangle C one blank line in the middle | 08:39.30 |
chrisl | Ah, Acro9 has blue, red, red, blank, blue/green, red/green, red/green, blank/green | 08:40.19 |
kens | Network is having trouble today | 08:41.29 |
| Acrobat X looks right, other versions look wrong | 08:41.51 |
| And I'm pretty sure that's my test file. I think I was trying to investigate what Acrobat did. | 08:42.22 |
chrisl | Could you send me the Acro X output? It should be fairly easy to get our output the same | 08:42.30 |
kens | OK one second | 08:42.38 |
chrisl | Shame I just spent half an hour getting our output to match Acro 9 :-( | 08:42.50 |
kens | Oh that's bizarre | 08:42.57 |
| I reopened the file and it's different..... | 08:43.09 |
| Now it's applying the clipping, it didn't before | 08:43.21 |
| ROFL | 08:43.47 |
| If I open the file by double-clicking it displays one way, if I use the 'open' dialog, it displays differently.... | 08:44.13 |
| You can't make this stuff up ;-) | 08:44.28 |
chrisl | Oh, that's just..... <sigh> Adobe all over, really..... | 08:44.45 |
kens | chrisl tiff file on its way to you | 08:52.13 |
chrisl | kens: thanks - very different from Acro 9, and inconsistent with itself - I guess I'll try to match the all blue one, since that actually matches the spec..... | 08:59.43 |
kens | Yes, I think that we should match the spec, especially since the most recent version of Acrobat (mostly) does | 09:09.27 |
| How they manage to get different results depending on how you open the file escapes me.... | 09:09.53 |
chrisl | Because they are Adobe, and defeating logical reasoning is their forte...... | 09:15.40 |
| kens: although, interestingly, even your "best" output from Acro X doesn't match the spec - it's still honouring tr mode 3 | 09:48.19 |
Robin_Watts | chrisl: I had to do some stuff in mupdf recently to match type3 fonts that used stroking etc. | 09:54.39 |
| I don't remember it being to do with tr, but it may be related. | 09:54.52 |
kens | chrisl, yes you're quite right, I didn't consider that | 09:55.03 |
Robin_Watts | previously we always used to cache the bitmaps produced from a type 3 font, but you can't do that if they rely on the color set in the environment in which they are called. | 09:55.25 |
| So, I now check for things like color being set before it is used; if it isn't set, we don't cache the bitmap. | 09:55.54 |
| and we draw it fresh each time. | 09:56.09 |
chrisl | Robin_Watts: such glyphs should begin with a d0 call - instead of d1 | 09:56.15 |
Robin_Watts | I can't remember the details offhand, but I suspect that these weren't that simple. | 09:56.50 |
chrisl | kens: of course, there's no reason not to honour tr mode 3 (unlike the other modes) | 09:56.59 |
kens | Yes, that's the only one that makes any sense | 09:57.13 |
chrisl | Robin_Watts: "A glyph description that begins with the d1 operator should not execute any operators that set the color (or other color-related parameters) in the graphics state; any use of such operators is ignored." | 09:58.32 |
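A rough sketch of how a renderer might tell the two kinds of Type 3 glyph description apart, assuming the glyph's content stream is already decoded and that whitespace splitting is good enough to find the first `d0` or `d1` operator. The function name is hypothetical and this is not how MuPDF implements it; Robin's actual fix (linked below) checks for colour being set before use, which is a different test.

```python
def first_width_operator(glyph_stream: bytes):
    """Return "d0" or "d1" for the first width operator found, or None.

    Per the quoted spec text, a glyph starting with d1 describes shape
    only, so colour operators inside it are ignored.
    """
    for token in glyph_stream.split():
        if token == b"d0":
            return "d0"
        if token == b"d1":
            return "d1"
    return None
```

Under these assumptions a `d1` glyph is a pure shape, so a cached bitmap mask could be reused and tinted at draw time, while a `d0` glyph may need to be drawn fresh if it depends on colour from its environment.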
kens | D'oh, forgot to write an else clause, no wonder I'm getting funny results. | 09:59.43 |
Robin_Watts | Bug 692745 | 10:13.02 |
| http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commit;f=pdf/pdf_interpret.c;h=2c836b57d5295b47655988cf8deaffda731e1c3c | 10:13.08 |
| It's just started to rain, so it's clearly time for me to run :( | 10:14.31 |
chrisl | Robin_Watts: well, it seems to me that could result in wrong rendering - albeit in really stupid files...... | 10:47.12 |
Robin_Watts | paulgardiner: 69 Gig in fact :) | 12:13.42 |
paulgardiner | Eeek | 12:13.53 |
Robin_Watts | I've got the interesting stuff (for you) down onto a single layer blu-ray though, burning now. | 12:14.03 |
paulgardiner | Great. Thanks | 12:14.14 |
| I'll pop over as soon as it's done if that's ok | 12:15.13 |
Robin_Watts | no problem. | 12:15.25 |
| sebras: I'm going to look at that shading bug now (just checking in, in case you had looked) | 12:25.29 |
sebras | Robin_Watts: go ahead. | 12:34.35 |
jen_ | I use GS for merging PDF files, is this process lossless? | 12:43.45 |
kens | Depends what you mean | 12:43.58 |
jen_ | so, file1.pdf +file2.pdf is merged to file3.pdf. Then file 1.pdf + file3.pdf = file4.pdf, etc. What happens to quality of file1 after many runs. | 12:44.20 |
kens | Probably nothing much | 12:44.33 |
| But best answer is 'don't do that' | 12:44.42 |
jen_ | file1 image in the final merged file. | 12:44.45 |
kens | Ghostscript fully interprets each PDF file to marking operations, and then regenerates a *brand new* PDF file from the marking operations. | 12:45.12 |
| There is no correspondence between the contents of the input and output, except that the marking operations should have the same result | 12:45.41 |
| Modulo specific command line options which may downsample images etc. | 12:46.10 |
jen_ | thanks kens , great answers. | 12:47.17 |
kens | NP | 12:48.16 |
marcosw_ | kens: A customer asks: "is there support for converting PCL to PDF/A directly?" | 13:49.49 |
kens | marcosw : you can do the conversion, but you can't add some of the field stuff | 13:58.00 |
| Actually colour conversions would be a potential problem, but PCL is only RGB so as long as you used an RGB ICC profile it should be OK | 13:59.56 |
| It might work but I'd have to think about it for a minute | 14:01.08 |
chrisl | kens: what edition of Acro X do you have? | 14:01.12 |
kens | pro I think | 14:01.20 |
chrisl | ta | 14:01.26 |
kens | yes, pro | 14:02.26 |
marcosw_ | kens: thx. | 14:02.33 |
kens | marcosw No it won't work because we add some stuff via a pdfmark, which won't (obviously) work in PCL | 14:04.53 |
| We could probably modify the PCL interpreter and pdfwrite to do it though | 14:05.10 |
Robin_Watts | AH, I see the problem with this shading. | 14:08.06 |
| tor8: ping | 14:14.23 |
| tor8: Did you ever get a chance to look at my patch on master? Another tiny one there now. | 14:14.45 |
tor8 | Robin_Watts: I asked yesterday if memsetting the out buffer in fz_predict_tiff wouldn't be faster | 14:49.27 |
| Robin_Watts: mubusy fix lgtm | 14:50.03 |
Robin_Watts | tor8: it might be, but then I'd have to know how big the outbuffer was. | 14:50.14 |
| (sorry, obviously missed that comment yesterday) | 14:50.28 |
tor8 | Robin_Watts: the 'len' argument, but hm I wonder if there aren't more fishy problems with the tiff predictor | 14:51.38 |
| it doesn't use the len argument! | 14:51.43 |
| Robin_Watts: I'm dithering about how to solve the Symbol font problems | 14:52.13 |
| in a way, I don't want to add specific workarounds for this specific font. it will work if we look for and find system fonts instead of using only the built in ones, but really it's the file's fault for not embedding the odd fonts or providing proper file descriptors and encodings. | 14:54.24 |
henrys | paulgardiner:found form test files here https://live.gnome.org/Evince/Forms/ ... toward the bottom of the page. | 14:54.34 |
Robin_Watts | tor8: Right. I'm not feeling any huge pressure to try to fix any of these files if they are actually broken (and lots are). | 14:55.26 |
tor8 | I have a simple generic workaround which will make it pick the built in symbol font but not apply the synthetic bold/italic, by providing alias names like we do for the other built in fonts. | 14:55.28 |
paulgardiner | henrys: Oh yes. That looks useful | 14:55.53 |
tor8 | and if you compare appendix H.3 in the pdf reference for the "standard type1 fonts" -- the list of valid aliases got halved in pdfref15 as compared to earlier specs | 14:56.02 |
Robin_Watts | but, if there are cheap fixes we can do that make us look better compared to gs and acrobat, then it's worth considering. | 14:56.21 |
tor8 | Robin_Watts: right. well, the easy fix that I like will get the right glyph out, just in the "wrong" style as compared with finding the real font on disk. | 14:56.50 |
Robin_Watts | but font substitution is an area I'm trying to stay out of, so I'll bow to your decision here. | 14:56.52 |
henrys | I don't know about having these meetings during the tour de france ;-) but let's get started. | 14:59.13 |
Robin_Watts | henrys: The bloke in the yellow jumper always wins. | 14:59.48 |
henrys | I can't believe I work with the 4 europeans that don't watch the tour. | 15:00.18 |
| paulgardiner:probably important for those irs forms to work correctly. | 15:00.50 |
chrisl | henrys: I watch some of the Tour - but it makes me cringe when they fall..... | 15:01.08 |
henrys | you'd cringe a lot this year | 15:01.32 |
Robin_Watts | paulgardiner: Just pushed your calculation stuff. | 15:02.43 |
paulgardiner | Robin_Watts: thanks | 15:02.55 |
chrisl | henrys: Also, they're always picking on the British guys..... | 15:03.02 |
Robin_Watts | paulgardiner: What item does the validation stuff fall under ? | 15:03.57 |
| 2? | 15:04.11 |
paulgardiner | Yeah | 15:04.21 |
Robin_Watts | All other things being equal, I'd like to vote for that being prioritised, as it affects the testing. I don't know how others feel. | 15:05.13 |
paulgardiner | That includes reporting the constraints to the app and checking on input | 15:05.26 |
| Robin_Watts: I have no objections, and I'd like to see the changes to the tests that you are planning. | 15:06.04 |
henrys | paulgardiner:so was it difficult to write the utility functions? Are those examples where we don't have a spec? | 15:06.11 |
paulgardiner | henrys: no real problems so far | 15:07.10 |
henrys | Robin_Watts:yes I agree with that. | 15:07.10 |
Robin_Watts | (for the benefit of the others that haven't been privy to paul and I talking on the phone) it's the reporting of constraints bit that interests me - it would enable me to generate mjs test files smarter, so we try and put numbers into number fields and dates into dates etc. That in turn would show off the calculation stuff better - at the moment we get lots of 'NaN' in the tests. | 15:07.12 |
paulgardiner | henrys: When I thought about it more after the meeting, I could see it would be strange to take the trouble to write all our own C only to try to then steal their javascript. | 15:08.35 |
tor8 | Robin_Watts: paulgardiner: is that possible (detecting constraint type) with a simple string search or do we need a mudraw-v8 for it? | 15:08.43 |
paulgardiner | tor8: Some of the constraints are held as flags in the dictionaries | 15:09.28 |
| I was thinking we'd restrict ourselves to those for now. | 15:09.50 |
| Although we could notice cases where particular javascript functions are used: the formatting code is often a single call to a utility fn | 15:10.36 |
tor8 | paulgardiner: okay | 15:11.42 |
paulgardiner | ... hmmm, actually I may be misremembering that constraints like "digits only" are held as flags. I'll need to take another look at the spec. | 15:11.48 |
| Yes, it may make sense to do pattern matching on the javascript formatting code sooner rather than later. It looks easy enough. | 15:13.48 |
henrys | anything else to discuss it looks like we can make this a short meeting? It looks like you should have the complete spreadsheet example by next meeting. That will be great. | 15:15.35 |
Robin_Watts | I have nothing else. | 15:15.55 |
henrys | BTW US folks will probably be out tomorrow. | 15:16.04 |
paulgardiner | henrys: Actually that looks to be working. I just had the sense of a test around the wrong way | 15:16.12 |
henrys | oh super | 15:16.56 |
Robin_Watts | paulgardiner: There is a new patch on your forms branch? Does that need testing/review ? | 15:17.34 |
paulgardiner | henrys: there is a new problem just come up: the strings coming back from the js are utf8, and sometimes straying outside ascii. I think at some stage I'll need to make the appearance stream synthesis respect the utf8 chars correctly | 15:17.49 |
| Robin_Watts: yeah, would be handy | 15:18.08 |
| henrys: a case of it I've seen so far is \u20ac used in formatting an amount in euros | 15:19.08 |
Robin_Watts | paulgardiner: The bytes that go into a pdf string are allowed to be any value from 0..255 (and indeed they have to be to allow for encryption). | 15:19.39 |
henrys | paulgardiner:I wouldn't expect to see a lot of that on form input ... | 15:20.05 |
Robin_Watts | If we are getting top bit chars back from javascript that when decoded give us values in the 0-255 range, that's fine. | 15:20.41 |
paulgardiner | Robin_Watts: I'm imagining that I can put the utf8 in the value, but when I generate the appearance, I'll need to find the right gid or cid (whatever) for the utf8 chars | 15:20.43 |
| We're getting 3-byte encodings back from js | 15:21.12 |
Robin_Watts | If we're getting >=256, then we need to generate hexstrings or something? | 15:21.28 |
paulgardiner | It's still coded in strings of 8-bit values | 15:22.04 |
Robin_Watts | Hmm. All strings in pdf are a series of bytes. | 15:22.06 |
paulgardiner | I don't think there's a problem handling the value, but generating the appearance requires going from utf8 to the font index. | 15:22.59 |
Robin_Watts | ok, I'll bow to your knowledge here. | 15:23.18 |
paulgardiner | ... I think... I may be misunderstanding.... | 15:23.29 |
| Robin_Watts: Don't do that! I was hoping you'd say either "Yes that's right" or "No, you should do it like this" :-) | 15:24.08 |
henrys | I thought there was a bit more to pdf strings than bytes going to the manual | 15:24.28 |
Robin_Watts | Nah, if you remember, whenever I got near fonty stuff in the Picsel PDF stuff I left it all to you :) | 15:24.33 |
| henrys: I just checked the manual :) but it's a bad manual, so another set of eyes would be good. | 15:25.09 |
| So, if we close the meeting early, I can talk paul through driving the cluster? | 15:25.44 |
tor8 | paulgardiner: you could use UTF16 for the appearance stream strings and set up the font descriptor correctly for it | 15:25.45 |
henrys | Robin_Watts:sounds good | 15:25.55 |
tor8 | paulgardiner: or if you want to reuse the existing ones, you get into encoding madness | 15:26.00 |
paulgardiner | tor8: Ah right. That might be necessary if the font is large | 15:26.33 |
henrys | meeting closed if tor8 is good. | 15:27.08 |
tor8 | paulgardiner: I think I tried to push you in that direction in London, to recreate new fonts and font descriptors from scratch | 15:27.10 |
paulgardiner | tor8: Yes I remember that... even though I was singing "la la la I can't hear you" with my fingers in my ears. | 15:28.22 |
tor8 | paulgardiner: well, now you know why :) encodings in pdf are crazy stuff. | 15:28.51 |
| paulgardiner: when loading the form value we should probably run it through the ToUnicode cmap to get a utf-8 string to start with | 15:30.31 |
| and then replace the fonts with our own fonts if they aren't compatible (i.e. not using a standard unicode encoding) | 15:31.06 |
| and recreate the appearance stream using our own fonts from the utf-8 string | 15:31.21 |
paulgardiner | I wondered whether the value would already be in UTF8 | 15:31.22 |
tor8 | paulgardiner: I don't think PDF can represent UTF-8 in the text objects | 15:31.46 |
paulgardiner | I mean the value held under the V item of the field dict | 15:31.55 |
tor8 | or rather, the CMap machinery for decoding utf-8 | 15:32.03 |
| if it's in a dictionary, it can be either PDF Doc Encoding or UTF16. see pdf_to_utf8. | 15:32.43 |
paulgardiner | The value is sort of held twice, once under V which is just the value as a string, and again under AP which is the graphics commands to draw the text. | 15:33.18 |
tor8 | yeah, so the V can be either pdfdocencoding or utf16, and the AP can be in whatever encoding the font descriptor uses | 15:33.45 |
paulgardiner | The stuff under V would presumably have to be unicode because there is nowhere to look up the encoding | 15:33.56 |
| Oh ok So it's pdfdocencoding or utf16 | 15:34.30 |
tor8 | and pdf_to_utf8 takes a fz_obj string and returns a utf-8 char* | 15:34.56 |
| by guessing either pdfdocenc or unicode | 15:35.08 |
paulgardiner | Handy | 15:35.32 |
tor8 | going the other way, for the AP, is what my ranting about font descriptors is all about :) | 15:35.52 |
paulgardiner | Is guessing necessary. Is there nothing that tells you which it is? | 15:35.54 |
tor8 | paulgardiner: unicode always has a BOM :) | 15:36.08 |
paulgardiner | Right | 15:36.40 |
tor8 | so you'd need to turn a utf-8 char* back into unicode with a BOM for writing it out as a fz_obj in the V entry | 15:37.17 |
| I don't think we have a function for that yet | 15:37.31 |
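The missing direction described above (UTF-8 back to "unicode with a BOM") can be sketched as follows. This assumes the `V` string is written as big-endian UTF-16 prefixed with the byte-order mark `FE FF`; the function names are hypothetical and do not correspond to any existing MuPDF function.

```python
import codecs


def utf8_to_pdf_utf16(utf8: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16BE with a leading BOM."""
    text = utf8.decode("utf-8")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def has_utf16_bom(raw: bytes) -> bool:
    """The 'guess' on the reading side: a BOM means UTF-16, otherwise PDFDocEncoding."""
    return raw.startswith(codecs.BOM_UTF16_BE)
```

The euro sign mentioned earlier arrives from JavaScript as the three UTF-8 bytes `E2 82 AC` and would be stored as `FE FF 20 AC`.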
Robin_Watts | paulgardiner: ping me when you and tor8 finish. | 15:40.21 |
paulgardiner | Robin_Watts: sure | 15:40.38 |
| tor8: I guess we don't want to do all this unnecessarily, so would we scan first for top-bit-set chars? | 15:41.39 |
tor8 | paulgardiner: premature optimization and all that, for the V field. for the AP I think it may make sense to check the encoding and strings both before deciding to replace the fonts. | 15:42.49 |
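For reference, the scan paulgardiner proposes is tiny. A sketch, assuming the value is held as UTF-8 bytes (the function name is made up for illustration):

```python
def has_top_bit_chars(utf8: bytes) -> bool:
    """True if any byte is outside 7-bit ASCII, i.e. has its top bit set."""
    return any(byte >= 0x80 for byte in utf8)
```

As tor8 says, this is probably premature optimisation for the `V` field, but the same test could feed into the decision about whether the appearance stream needs replacement fonts.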
paulgardiner | Also makes it a pig to debug. I was for a while coding everything as hex with the same effect | 15:44.33 |
| So what goes in the appearance strings here (xxxxxxx) Tj ? | 15:46.19 |
| Would that depend on the font encoding, or are you saying that would also be either unicode or pdfdocenc? | 15:46.48 |
tor8 | paulgardiner: the (xxxxx) would depend on the font encoding | 15:51.40 |
| paulgardiner: so I think the easiest way to get that done is to make new fonts with known encodings than trying to use the old font objects with potentially really broken stuff | 15:52.26 |
paulgardiner | Right. Yeah, thought so. As Robin said, I implemented most of the font encoding stuff for Picsel's viewer, but it's a long time ago. | 15:52.42 |
Robin_Watts | So, pdf_buffer_cat_pdf_string could take a font encoding argument, and convert if required as it catted ? | 15:52.46 |
paulgardiner | Robin_Watts: Nice. | 15:53.08 |
tor8 | we do have a fair bit of encodings parsed up in the pdf_fontdesc struct | 15:53.11 |
| making a reverse mapping from unicode back down to that should work, but then there's the issue of what to do if a character is missing :) | 15:53.33 |
paulgardiner | But that would be a broken file | 15:54.04 |
Robin_Watts | Omit it? That's all the renderer would do. | 15:54.11 |
paulgardiner | I'd have thought any font used in a form would have well defined unicode mappings | 15:54.22 |
tor8 | paulgardiner: well, consider a form where the fonts are only ascii encoded and someone tries to enter a funny character like é. | 15:54.44 |
Robin_Watts | paulgardiner: Bzzt. Expecting sanity from Adobe. Docked 5 points. | 15:54.47 |
tor8 | we could expect form generators to do the sane thing, but we ought to check what adobe does | 15:55.22 |
paulgardiner | That's twice in half an hour. I was expecting them to allow utf8 in strings rather than a difficult to determine choice of pdfdocenc or unicode wit | 15:56.19 |
tor8 | for the encoding we've got ways in the pdf_font_desc struct to map from font encoding to glyph id, and from font encoding to unicode. doing a reverse lookup could be potentially very slow and awkward. | 15:56.23 |
| and then you have to get them out into the right multi-byte crap too | 15:57.05 |
| all of which is possible, but icky | 15:57.29 |
paulgardiner | tor8: But don't you need to do that when creating your new font. Presumably it has to be based on the old one (but with a different encoding) so as to look correctly | 15:57.53 |
tor8 | paulgardiner: no, you really need to do this in pdf_buffer_cat_pdf_string | 15:58.29 |
| take a utf-8 character in, reverse look up the font encoding from the fontdesc struct, and figure out the multi-byte encoding to use to put in the buffer | 15:59.02 |
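A simplified sketch of the reverse lookup tor8 describes, assuming a single-byte font encoding that is available as a plain code-to-Unicode table. The table and function names are hypothetical; the real fontdesc struct and multi-byte encodings are more involved. Missing characters are omitted, as Robin suggests above.

```python
def build_reverse_map(code_to_unicode: dict) -> dict:
    """Invert a font's code -> character table once, so lookups are cheap."""
    reverse = {}
    for code, char in sorted(code_to_unicode.items()):
        # If several codes map to the same character, keep the lowest code.
        reverse.setdefault(char, code)
    return reverse


def encode_for_font(text: str, reverse: dict) -> bytes:
    """Map text to single-byte font codes, dropping characters the font lacks."""
    return bytes(reverse[char] for char in text if char in reverse)
```

Building the inverted table once per font avoids the slow per-character search tor8 is worried about, at the cost of extra memory per font.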
paulgardiner | Sorry. I'm not explaining what I mean well. | 15:59.48 |
henrys | on to the next meeting? | 16:00.02 |
tor8 | paulgardiner: ah, right. I misread. yes, you can avoid all that if you make your own font and fontdescriptor | 16:00.20 |
paulgardiner | tor8, Robin_Watts catch you tomorrow | 16:00.21 |
Robin_Watts | night. | 16:00.40 |
henrys | Robin_Watts:so we gave up on regular clusterpush for windows and have something entirely different? | 16:00.41 |
paulgardiner | tor8: but with your own font it may not look right, unless it's based on the old, and then you need to process its encoding | 16:01.01 |
Robin_Watts | henrys: You can use clusterpush.pl if you have cygwin set up. | 16:01.18 |
paulgardiner | really must go. cyl | 16:01.25 |
tor8 | paulgardiner: well, we could pick the nearest of the base 14 fonts and hope for the best :) | 16:01.26 |
| paulgardiner: cya | 16:01.29 |
Robin_Watts | (as you need cygwin for rsync, and even then it'll only work if you're lucky). | 16:01.40 |
henrys | Robin_Watts:oh okay. | 16:02.01 |
Robin_Watts | So I came up with a mechanism that uses git to transfer to casper, and then does a normal clusterpush from casper. | 16:02.11 |
henrys | Robin_Watts:right. | 16:02.53 |
Robin_Watts | So, meeting time ? | 16:03.11 |
henrys | yes, giving mvrhel a few minutes. | 16:03.30 |
Robin_Watts | oh, waiting for mvrhel right. | 16:03.34 |
henrys | oh he's here. | 16:03.42 |
| phone call that I have to take go ahead without me. | 16:04.28 |
| i'm back | 16:05.21 |
mvrhel | oh I am here | 16:05.26 |
| sorry | 16:05.28 |
henrys | ray_work? | 16:05.37 |
| texted ray | 16:06.59 |
| chrisl:what's the progress of the font integration now? | 16:07.37 |
chrisl | Working on the UFST, and getting things going with MT fonts - freetype is working, but not well tested yet | 16:08.13 |
| I haven't done the artificial boldening stuff yet, either | 16:08.33 |
henrys | chrisl:I think that can be safely skipped the first round, just use the old stuff? | 16:09.16 |
chrisl | I have to move the old stuff around, but yes, that's my plan | 16:09.38 |
henrys | alexcher:you said you were going to make a public branch of the mupdf parser with gs? | 16:10.31 |
| tor8:now that I have you trapped, any more thoughts about using your viewer for the other languages? | 16:11.09 |
alexcher | henrys: yes, I remember that I need to do it. I still need to make the first version more usable. | 16:12.14 |
tor8 | henrys: I doubt it'll be worth the effort. it means having two back ends for all the gui stuff, and complicates all the code for it. | 16:12.39 |
henrys | alexcher:it doesn't have to work. | 16:12.49 |
alexcher | henrys: OK | 16:13.06 |
Robin_Watts | tor8: does it? I believe henrys is suggesting that we do ANY_FORMAT -> pdf then view the pdf. | 16:13.41 |
tor8 | alexcher: henrys: you can push to a 'user' git repo, which is visible but not in the gold repo. like we do with mupdf. | 16:13.44 |
henrys | tor8:what Robin_Watts said. | 16:14.01 |
tor8 | Robin_Watts: if it's a 'convert to pdf in a forked thread then open with mupdf' then yeah, sure, no biggie. but it means having to find a ghostscript installation :) | 16:14.19 |
| :( I mean | 16:14.38 |
henrys | tor8:I am assuming we'd use the api and not fork. | 16:15.00 |
| exactly to avoid "finding" | 16:15.43 |
tor8 | henrys, alexcher: if you create a git clone in ~/repos/ghostpdl.git it'll show up on git.ghostscript.com (just look at tor, sebras, robin, paulg or chrisl's ~/repos/ directories for an example) | 16:16.00 |
| henrys: right. | 16:16.08 |
Robin_Watts | It's akin to how acrobat calls distiller on postscript input files. | 16:16.16 |
norbertj | henrys: hello, did you get my mail on the optional truetype loading (just checking)? | 16:16.28 |
tor8 | Robin_Watts: like what apple's preview does on postscript files too | 16:16.35 |
henrys | yes norbertj and I passed it on to chrisl who is actually doing our new font integration with freetype. | 16:17.02 |
chrisl | is not sure we should be looking to Adobe and Apple for inspiration on best practices........ | 16:17.19 |
norbertj | perfect. Will see what you think of it. | 16:17.51 |
| have to eat.. | 16:18.03 |
henrys | chrisl:I've sort of studied the alternative and that looks like the right way to go. Streamed languages like postscript and PCL are really not appropriate for viewing apps. | 16:18.19 |
| certainly open to other suggestions. | 16:18.37 |
| kens:did you have anything for the meeting, I know you like to leave on time? | 16:18.57 |
chrisl | henrys: I do agree with the approach, I'm just not keen on using "it's how Adobe does it" as an argument in its favour! | 16:19.16 |
Robin_Watts | Any sane alternative is going to involve us rendering to some intermediate displaylist format and then 'viewing' that. Using pdf as that alternative seems reasonable. | 16:19.24 |
kens | henrys the only thing I have is to say that I'm on holiday Thurs Friday this week and Monday Tuesday next week | 16:19.52 |
| I will have intermittent email but don't plan to be on irc | 16:20.16 |
| So if you want me, email me :-) | 16:20.25 |
henrys | tor8:I'd like a definite okay from you because it affects business decisions so give it some thought and let me know. | 16:20.27 |
rayjj_ | Robin_Watts: right, PDF effectively becomes the high level display list | 16:21.06 |
henrys | kens:okay and that reminds me -- the US folks will be celebrating our independence from you tomorrow ;-) | 16:21.27 |
tor8 | henrys: it's not a very nice experience (apple and adobe's running distill jobs before opening), compared with something like gv back in the 90's where you could view ps files with dsc comments instantly | 16:21.51 |
chrisl | henrys: the only problem, from a business perspective, is that it only really partially showcases GS - it's more a showcase for mupdf (which we also want, but....) | 16:22.04 |
Robin_Watts | And it means we get all our bugs in one easy to use package. | 16:22.27 |
chrisl | Robin_Watts: not really, too many differences between high level devices, and rendering :-( | 16:23.07 |
Robin_Watts | Presumably we need language switch to be in a reasonable state then ? | 16:23.10 |
rayjj_ | chrisl: the advantages of GS for some uses is _not_ (IMHO) as a viewer | 16:23.11 |
henrys | tor8:with this we get support for all the languages with text search. | 16:23.12 |
tor8 | so I think my biggest concern is, do we have an appetizing api to render a page at a time with postscript and pcl? | 16:23.23 |
Robin_Watts | tor8: -dFirstPage -dLastPage ? | 16:23.44 |
kens | tor8 pdfwrite can use %d now | 16:23.46 |
tor8 | henrys: yeah, going via pdfwrite does get us everything we want, except performance :) | 16:24.01 |
Robin_Watts | oh, what kens said is much better. | 16:24.02 |
rayjj_ | having a single viewer that works for PCL and PS seems to be worth having | 16:24.20 |
tor8 | does -dFirstPage work on PS? | 16:24.31 |
henrys | tor8:I don't know about that %d in the background should do pretty well kens? | 16:24.34 |
kens | henrys, see above | 16:24.42 |
| tor8 yes -dFirstPage also works with PS I believe | 16:25.06 |
rayjj_ | but an important thing is to be able to show the first page BEFORE 'distilling' the entire input file | 16:25.13 |
Robin_Watts | tor8: Ignore -dFirstPage etc. Just generate the whole file as a series of pdf files. | 16:25.28 |
tor8 | kens: okay, that'd improve matters or at least give us more options | 16:25.29 |
kens | If we use %d then the first file will appear when the first page is completed | 16:25.33 |
rayjj_ | tor8: -dFirstPage (and -dLastPage) only works with PDF | 16:25.36 |
Robin_Watts | pdfwrite using %d and then open the first page while the later ones are still processing. | 16:25.50 |
tor8 | right. so %d it'd have to be. | 16:25.54 |
henrys | yes we definitely want %d | 16:25.56 |
alexcher | -dFirstPage _doesn't_ work with PS. | 16:26.20 |
rayjj_ | I agree with ken -- that emitting each page as a separate PDF makes sense | 16:26.20 |
henrys | we should be able to really outperform Adobe and Apple with that approach (on PS) | 16:26.45 |
rayjj_ | it's possible (using an EndPage proc with setpagedevice) to skip leading pages, but it still has to do all the work for all previous pages (then just throws it away). Doesn't save much time | 16:28.04 |
tor8 | henrys: it's not impossible, and it's an isolated task to hand off to robin or someone else to add if I'm too laz^H^H^Hbusy | 16:28.07 |
henrys | I don't know it seems Robin_Watts has a pretty full plate, but maybe so. | 16:29.07 |
| tor8:the build might be a hassle. | 16:29.20 |
tor8 | henrys: well, anyone really. I know we're keeping Robin busy :) | 16:29.26 |
henrys | maybe a fork for the first go would be a lot easier. | 16:29.42 |
Robin_Watts | henrys: At the moment I have an empty plate, but I'm standing in front of a large buffet of broken/slow files from customer 394. | 16:30.00 |
tor8 | henrys: we have to solve build issues for the gs bridge too, hopefully something can be learned from that. fork and exec is probably easiest (and reduces the download footprint) | 16:30.01 |
Robin_Watts | How fast you want them eaten determines my business. | 16:30.09 |
kens | the viewer will have to be smart: normally it has an N-page PDF file, in this case it will have N 1-page PDF files. It will have to know. | 16:30.11 |
Robin_Watts | or busyness :) | 16:30.21 |
rayjj_ | henrys: why would the build be a hassle ? wouldn't we just fire off a process ? | 16:30.34 |
Robin_Watts | rayjj_: I assumed henrys meant "build" as in "build of the viewer binary" | 16:31.02 |
tor8 | kens: we have a fz_document abstraction that already hides pdf and xps differences, making that one expose the same api but with multiple 1-page pdf files should be no more than a day's job. | 16:31.12 |
kens | well that's good news anyway :-) | 16:31.28 |
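Editor's sketch of the idea tor8 describes above: one document object that exposes the usual "count pages / load page" interface but is backed by N single-page files. This is not MuPDF's fz_document API; the class name `MultiFileDocument`, its methods, and the `open_page` callback are all hypothetical, chosen only to illustrate the shape of such a wrapper.

```python
class MultiFileDocument:
    """Present N single-page files as one N-page document (hypothetical names)."""

    def __init__(self, paths, open_page):
        # paths: one file per page, in page order.
        # open_page: caller-supplied callable mapping a path to a page object.
        self._paths = list(paths)
        self._open_page = open_page
        self._cache = {}

    def count_pages(self):
        return len(self._paths)

    def load_page(self, number):
        """Return the page object for a zero-based page number."""
        if not 0 <= number < len(self._paths):
            raise IndexError("page %d out of range" % number)
        if number not in self._cache:
            # Open each backing file lazily, the first time its page is requested.
            self._cache[number] = self._open_page(self._paths[number])
        return self._cache[number]
```

A viewer written against `count_pages` and `load_page` would not need to know whether the pages came from one file or many.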
henrys | rayjj:it would be a hassle to build in gs and for now it is simpler to fork a process. | 16:31.43 |
tor8 | rayjj_: we were considering using gs as a library not external process. | 16:31.53 |
kens | Forking a process means we don't have to have a language switch :-) | 16:32.01 |
Robin_Watts | Yes, the lack of a functioning language switch library build is a problem. | 16:32.42 |
rayjj_ | tor8: but we'd (probably) want it to run asynchronously in a separate thread, so having it run as a process isn't any worse (and may be preferred) | 16:32.49 |
tor8 | rayjj_: indeed. | 16:33.02 |
Robin_Watts | (Well, we do have a functioning language switch lib build, cos I've used it, but it's not ideal) | 16:33.09 |
chrisl | Depends on your definition of "functional"...... | 16:33.39 |
henrys | Robin_Watts:I should look at 394 priority I need to talk to miles, do you have some sort of ballpark estimate for the work you know about? | 16:34.06 |
rayjj_ | the only thing about running as a process is knowing when the temp-%d.pdf is complete | 16:34.08 |
chrisl | rayjj_: when the next one appears | 16:34.31 |
Robin_Watts | henrys: Let me summarise the meeting... | 16:34.32 |
henrys | when temp-%d+1 is complete. | 16:34.35 |
kens | When the process exits or the next file turns up | 16:34.49 |
henrys | s/complete/started | 16:34.53 |
Robin_Watts | they are using mupdf v0.9, and broadly they seem happy with it, except in some cases where performance isn't great, probably largely because of floating point. | 16:35.05 |
rayjj_ | chrisl: currently the output pdf is created when the page starts, but the file is empty. | 16:35.06 |
Robin_Watts | They gave us several hundred problem files (some where we give different results to "ImageMagick" (i.e. gs), some where we are slow) | 16:35.33 |
chrisl | rayjj_: yes, so when the file for page two appears, page one is ready to use | 16:35.33 |
rayjj_ | chrisl: then pdfwrite reads its temp files and gradually builds the pdf when the page is closed | 16:35.41 |
Robin_Watts | v1.0 solves about half of these problems (based on the sample I've looked at) | 16:35.49 |
rayjj_ | chrisl: yes, that would work. | 16:35.58 |
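Editor's sketch of the readiness rule agreed above: run gs as a separate process with pdfwrite and a %d output name, and treat page N as usable only once the file for page N+1 has appeared or the process has exited. The helper names `build_gs_command` and `ready_pages` and the `temp-%d.pdf` naming are assumptions for illustration; the extra trailing blank-page file mentioned later in this log is not handled here.

```python
import re


def build_gs_command(input_path, out_pattern="temp-%d.pdf"):
    """Ghostscript argv that writes one PDF per page via %d in the output name."""
    return ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
            "-sOutputFile=" + out_pattern, input_path]


def ready_pages(filenames, process_exited, prefix="temp-", suffix=".pdf"):
    """Return the sorted page numbers whose files are safe to open.

    A page's file is created (empty) when the page starts, so the newest
    file is only trusted once the gs process has exited.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix) + r"$")
    pages = sorted(int(m.group(1)) for m in map(pattern.match, filenames) if m)
    if process_exited:
        return pages
    # Still running: the highest-numbered file may be incomplete.
    return pages[:-1]
```

A viewer could poll the output directory, pass the listing to `ready_pages`, and open page 1 while later pages are still being written.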
henrys | Robin_Watts:well bugs can fairly be split between you and tor8 right? | 16:36.00 |
Robin_Watts | right. at the moment I'm running through the files looking for which ones have problems. | 16:36.25 |
| They plan to try to move to 1.0, and in the process pass back to us performance optimisations they have made. | 16:36.48 |
| I think a lot of those are going to be in the form of avoiding (or reducing) floating point. | 16:37.12 |
| They have some thirdparty lib opts which they want to give us for us to pass back to lib maintainers. | 16:37.31 |
rayjj_ | kens: does pdfwrite in %d mode still emit one extra (empty) output file, or do you delete it ? | 16:37.48 |
kens | ryajj that's a good question | 16:38.04 |
| ray_work | 16:38.09 |
| WHoever :-) | 16:38.14 |
henrys | so it seems to make sense to hold off on any performance changes and just fix bugs ... I guess you're doing that. | 16:38.20 |
Robin_Watts | So, at the moment, I'm not under any pressure, but when they start feeding us stuff, the workload may get heavier. | 16:38.23 |
kens | I have to say I don't know the answer | 16:38.23 |
Robin_Watts | henrys: indeed. | 16:38.27 |
| Sadly, they are using an ARM9 (no FP unit), and they don't appear to have a profiler on it. | 16:38.44 |
| I've passed them some code that I've used to do profiling on the ARM9 before, but I haven't heard anything back. | 16:39.01 |
| It's extremely likely that any profiling I do on windows (or on the beagleboard) will be completely useless as it will be so massively skewed that it will be meaningless. | 16:39.40 |
rayjj_ | kens: pdfwrite does still create one extra PDF that is just a blank page (not an empty file) e.g. annots.pdf creates 7 pdf's | 16:40.13 |
henrys | so work on the viewer and when they come back drop the viewer, there is really no schedule for the viewer, that is if you want to work on it. | 16:40.13 |
Robin_Watts | so yes, I'm just fixing bugs at the moment. | 16:40.14 |
| If tor8 needs me to do stuff on the viewer, just say. | 16:40.38 |
henrys | mvrhel:are you okay with bugs, swamped as usual it appears? | 16:41.04 |
| tor8, Robin_Watts:I assume optimizing for no fp would be a big task in mupdf, isn't it? | 16:42.23 |
mvrhel | henrys: I am doing good. I am working through a few xps transparency things to optimize the group size. I will have a big speed up with a couple files from this | 16:42.44 |
Robin_Watts | henrys: Yes, it would be a major upheaval. | 16:42.47 |
tor8 | henrys: yeah, we really do assume floating point is everywhere | 16:42.55 |
| it's not the 90's anymore... but sadly not everyone agrees. | 16:43.09 |
Robin_Watts | It might be feasible to introduce a level in the draw stuff below which everything goes to fixed point. | 16:43.13 |
henrys | tor8, Robin_Watts:I'm wondering if we shouldn't straight up with them about that. | 16:43.25 |
mvrhel | henrys: then I have one minor issue related to icc profiles and then some features for the one customer that I want to get in before the release | 16:43.40 |
| when are we doing the freeze? | 16:43.46 |
tor8 | much of the low level rasterization work is done in integer or fixed point math, and some more bits could be pushed down | 16:43.56 |
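Editor's sketch of what "fixed point math" means in this discussion: values are stored as integers scaled by a power of two, so multiplication needs only integer operations and a shift, which suits a CPU without an FP unit. The 16.16 format and the function names are assumptions for illustration, not MuPDF's actual representation.

```python
FRAC_BITS = 16          # assumed 16.16 format: 16 integer bits, 16 fraction bits
ONE = 1 << FRAC_BITS    # the fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to 16.16 fixed point (rounded to nearest)."""
    return int(round(x * ONE))


def fixed_mul(a, b):
    """Multiply two fixed-point values using only integer ops."""
    # The raw product carries 32 fraction bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


def to_float(a):
    """Convert a 16.16 fixed-point value back to a float."""
    return a / ONE
```

The conversions to and from float would sit at the boundary Robin_Watts mentions, with everything below it staying in integers.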
mvrhel | or I guess we dont do a freeze anymore | 16:43.57 |
Robin_Watts | henrys: I think they are aware of the fact that FP is a problem, and I don't think they expect us to rework to fixed. | 16:44.12 |
mvrhel | but when is the candidate tagged? | 16:44.13 |
tor8 | in which case, raw clock speed may be good enough for the remaining floating point bits | 16:44.16 |
henrys | mvrhel:late july or so. | 16:44.21 |
chrisl | mvrhel: we haven't even talked about a target release date, yet....... | 16:44.23 |
Robin_Watts | tor8: 200MHz. | 16:44.35 |
mvrhel | early august sounds better to me.... | 16:44.41 |
henrys | chrisl: August ;-) | 16:44.43 |
tor8 | Robin_Watts: okay, not so much then... | 16:44.47 |
| I remember when Quake 1 came out and required a FP unit to run... | 16:45.09 |
chrisl | So, if we're aiming | 16:45.12 |
Robin_Watts | I think we should wait to see some profiles (or the best approximation to that that we can get) before panicking about optimisations. | 16:45.25 |
henrys | Robin_Watts:okay | 16:45.38 |
chrisl | mvrhel, henrys: so if we're aiming for early August, then I'd want to do an rc around the 1st | 16:45.54 |
henrys | chrisl:are we freezing a week before the rc? | 16:46.34 |
| or at the rc? | 16:46.40 |
mvrhel | at the rc I hope | 16:46.45 |
| that only leaves 4 weeks | 16:46.49 |
chrisl | I usually just ask people to be sensible with their commits in the run up to the rc | 16:47.08 |
mvrhel | I will try to be | 16:47.36 |
Robin_Watts | muhahah | 16:47.44 |
henrys | mvrhel:it seems late for new features, we could do a snapshot release for the customer. | 16:48.16 |
mvrhel | well let me try to get them in at least 2 weeks out. | 16:48.48 |
henrys | mvrhel:okay | 16:48.58 |
| way past meeting end time, back to the salt mine. | 16:50.03 |
chrisl | henrys, mvrhel: I'll send a mail round tomorrow stating the plan - we should also ping tkamppeter and check if he has a driver for the release...... | 16:50.11 |
mvrhel | ok . brb | 16:50.11 |
kens | Time for me to go then, night all | 16:58.41 |
tkamppeter | chrisl, what do you mean with whether I have a driver for the release? | 17:14.42 |
chrisl_away | tkamppeter: are there any Ubuntu related deadlines or freezes we need to worry about in the run-up to 9.06? | 17:21.27 |
tkamppeter | chrisl_away, important is to have 9.06 ready some days before Feature Freeze, Aug 23. See https://wiki.ubuntu.com/QuantalQuetzal/ReleaseSchedule | 17:35.57 |
chrisl_away | tkamppeter: Okay, our plans fit okay with that. It would probably be wise if you consider starting to take snapshots for early testing soonish..... | 17:37.27 |
| Have to go now.... | 17:37.56 |
Robin_Watts | aargh. tor8, you about ? | 18:13.10 |
| and sebras if interested. | 18:13.17 |
tor8 | Robin_Watts: briefly here | 19:12.35 |
Robin_Watts | S'OK. Sorted it. | 19:12.47 |
henrys | tor8:so are we going with muview for a monicker? | 19:26.30 |
tor8 | henrys: we could, I have no strong opinion either way | 19:27.09 |
henrys | just curious if you had something planned I don't feel strongly either. | 19:27.57 |
Robin_Watts | tor8: Another patch for you to look at on my master branch. | 19:54.11 |
| no hurry. | 19:54.24 |
dawagenaar | I have written a small patch for mupdf to partially implement comment #33 in bug #691330. I am new to mupdf development and I have never used IRC before. Can somebody introduce me to basic etiquette here and also tell me what the appropriate method is for submitting a patch for discussion? Thanks a lot! | 20:42.03 |
bapt | hi | 21:54.46 |
| the checksum of ghostscript-9.05.tar.bz2 seems to have changed, has it been rerolled? | 21:55.21 |
Robin_Watts | dawagenaar: Hi. | 22:57.51 |
| Let me just have a look to see what you're on about :) | 22:58.01 |
| ok, so coming on here and talking to us is a great start. | 22:59.41 |
| You can either attach the patch to the bug, or (probably better in this case as that bug is a large and sprawling thing) make a new enhancement bug and attach the patch to it. Give full details of what it does and why it's needed, and we can then look it over. | 23:00.51 |
| If you add a new bug, then put a note on the existing bug pointing to it. | 23:01.10 |
| The MuPDF developers are on european time (don't know where you are based), so be prepared for delays on irc. we do check the logs though, so you should get an answer to any question when we get back. | 23:02.21 |
dawagenaar | Robin_Watts: Thanks for your advice. I will create a new ticket for discussion of this patch. | 23:50.05 |