| 20160203 |
sebras | tor8: there is a new tentative patch over at sebras/master. it doesn't implement the full width property support, but it is a simple start. at least the file in question is now displaying properly. | 00:45.24 |
| tor8: I also took a stab at soft hyphens, just to familiarize myself with the epub code. I think I might be on to something, but there is still something about those glue nodes that I don't fully get. | 00:46.13 |
Robin_Watts | sebras: hyphens definitely need to be thought about in the context of shaping. | 09:40.46 |
| Morning tor8. | 11:04.10 |
| tor8, sebras: Am I right in thinking that glue nodes don't have content, or position or styles etc? | 11:06.08 |
| In fact, couldn't we represent a glue node by a bit on a node that says: "And this is followed by a glue node"? | 11:06.48 |
tor8 | Robin_Watts: glue nodes should be able to have different widths (for hairline spaces, em-spaces, etc) | 11:21.50 |
Robin_Watts | tor8: OK. | 11:22.11 |
tor8 | Robin_Watts: about Java, I added the JavaDevice to get around some trouble I had with constructors and finalizers but I've managed to solve things without adding the extra class now | 11:22.37 |
Robin_Watts | So every node could have a width for the glue node that follows it, where width = 0 means no node ? | 11:22.41 |
| tor8: Ah, cool. | 11:22.55 |
tor8 | Robin_Watts: also non-breaking spaces that still need to be adjusted for width | 11:23.05 |
| I was hoping to be able to get the actual java exception message out and throw that in fz_throw_java | 11:23.29 |
Robin_Watts | tor8: So, width and a flag to say whether they are non-breaking or not? | 11:24.08 |
tor8 | and about the style, I too prefer having the opening brace for java functions on a separate line, but this is the conventional java style | 11:24.22 |
| Robin_Watts: yeah. that should save us a bunch of nodes. | 11:24.40 |
Robin_Watts | I have 9 bits left in the flagword :) | 11:24.51 |
tor8 | :) | 11:24.58 |
| then go for it! | 11:25.03 |
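The idea above is to fold each glue node into the node before it, recording only a glue width (0 meaning "no glue") and a non-breaking flag. A minimal sketch of that idea follows. It is hypothetical and not mupdf's actual flow node: the `flow_node` struct, its fields, and both helpers are invented for illustration, and widths are plain floats in arbitrary units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flow node that carries the glue following it, instead of the
 * glue being a separate node. A glue_width of 0 means "no glue follows". */
typedef struct {
    const char *text;
    float width;          /* advance width of the node's own content */
    float glue_width;     /* width of the following glue (space, em-space, ...) */
    int glue_nonbreaking; /* 1 for a non-breaking space: still adjustable, no break */
} flow_node;

/* A line may be broken after n only if n is followed by breakable glue. */
static int can_break_after(const flow_node *n)
{
    assert(n != NULL);
    return n->glue_width > 0 && !n->glue_nonbreaking;
}

/* Natural width of nodes[0..count): content plus the glue between nodes.
 * The glue after the last node is excluded, since it is dropped at a break. */
static float natural_width(const flow_node *nodes, size_t count)
{
    float w = 0;
    for (size_t i = 0; i < count; i++) {
        w += nodes[i].width;
        if (i + 1 < count)
            w += nodes[i].glue_width;
    }
    return w;
}
```

One trade-off of using width 0 as "no glue" is that a zero-width breakable space can no longer be expressed; a separate "has glue" bit would be needed for that.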
| I did notice you added some 8-bit bitfields, can't those just be unsigned chars? | 11:25.31 |
Robin_Watts | They could, but I'm not convinced that compilers are smart enough to pack bitfields into chars/shorts rather than ints | 11:28.54 |
tor8 | http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-141270.html#381 | 11:28.56 |
| Robin_Watts: ah, right. so you're filling it up until you hit 32 bits. | 11:29.27 |
Robin_Watts | tor8: yeah. | 11:29.38 |
| well, I have 32bits in my head as the amount to try to stay within. | 11:30.01 |
| scripts take just over 7 bits. | 11:30.16 |
| lang isn't done yet, and I need to remove the fallback one. | 11:30.34 |
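To make the bitfield point concrete: with `unsigned int` bitfields, common compilers (GCC, Clang, MSVC) pack consecutive fields into one 32-bit unit, and a value set of "just over 7 bits" (more than 128 scripts) needs an 8-bit field. The field names and widths below are hypothetical. Bitfield layout is implementation-defined, so the size check reflects those compilers' usual behaviour rather than a guarantee of the C standard.

```c
#include <assert.h>

/* Hypothetical packed flag word for a flow node; widths are illustrative. */
typedef struct {
    unsigned int type : 3;
    unsigned int visible : 1;
    unsigned int bidi_level : 7;
    unsigned int markup_dir : 2;
    unsigned int script : 8;           /* "just over 7 bits" of scripts needs 8 */
    unsigned int glue_nonbreaking : 1;
    /* 22 of 32 bits used in this sketch; the rest are spare */
} node_flags;

/* C11 check of the usual GCC/Clang/MSVC packing into one unsigned int. */
_Static_assert(sizeof(node_flags) == sizeof(unsigned int), "flags exceed one word");

/* Number of bits needed to store every value in 0..max_value. */
static unsigned bits_needed(unsigned max_value)
{
    unsigned bits = 0;
    while (max_value) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```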
| As to the paren style, we don't follow recommendations anywhere else :) | 11:31.01 |
| brace style, I mean. | 11:31.07 |
| given that we push to a new line for if () {, I don't see why we don't do the same for int function(...) { too. | 11:31.42 |
| cos ultimately you want to be able to easily look back to check stuff matches. | 11:31.58 |
tor8 | our C code uses the 'allman' brace style, but for java the convention is more like K&R. personally I like fewer mostly blank lines just so I can add real blank lines so they stand out better to separate code into paragraphs of sorts | 11:33.27 |
Robin_Watts | Crap. Stupid windows vs2015 runtime has a bug. | 11:34.06 |
| _read, given a single carriage return in text mode, returns 0. | 11:34.33 |
tor8 | a "\r" not "\r\n" ? | 11:39.22 |
Robin_Watts | The _read implementation checks for \r\n and converts it to \n | 11:39.57 |
| but in this case, it gets just \r, converts it to \n, and forgets that it read it (so returns 0 bytes read) | 11:40.21 |
| It's a clear change in behaviour from previous VS builds. | 11:40.38 |
| and it means the gs code detects it as being EOF. | 11:40.53 |
| and closes the console down. | 11:41.11 |
| I have a hacky fix. | 11:41.25 |
tor8 | eww. nasty. | 11:58.15 |
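Robin's hacky fix is not shown in the log. As an illustration of the behaviour a text-mode read should have, here is a sketch of CR LF translation done by hand (for instance on data read in binary mode): a lone CR still counts as one byte, so a read that returned data never looks like end of file. `translate_newlines` and its carry-over flag are hypothetical; they are not part of the MSVC runtime or of Ghostscript.

```c
#include <assert.h>
#include <stddef.h>

/* Translate CR LF to LF in place. A lone CR becomes LF and is still counted,
 * so a buffer holding just "\r" yields 1 byte, not 0 (which a caller would
 * take to mean end of file). *pending_cr carries state between calls: if the
 * previous buffer ended in CR, a leading LF here completes that CR LF pair
 * and is dropped. Returns the number of bytes left in buf. */
static size_t translate_newlines(char *buf, size_t n, int *pending_cr)
{
    size_t out = 0;
    assert((buf != NULL || n == 0) && pending_cr != NULL);
    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        if (*pending_cr) {
            *pending_cr = 0;
            if (c == '\n')
                continue; /* second half of a CR LF split across reads */
        }
        if (c == '\r') {
            buf[out++] = '\n';
            *pending_cr = 1;
        } else {
            buf[out++] = c;
        }
    }
    return out;
}
```

End of file should be decided from the raw byte count of the underlying read, not from the translated count, which can legitimately be 0 when a buffer holds only the LF of a split CR LF pair.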
Robin_Watts | tor8: I might look at simon91's atof thing now ? | 12:52.47 |
torand | Robin_Watts: sure | 13:13.10 |
sebras | torand: Robin_Watts: I have been toying around with letting glue nodes be used for hyphens. | 13:56.16 |
Robin_Watts | sebras: So you'd insert glue nodes in the middle of words? | 13:56.45 |
sebras | I'm not sure if that is a good idea. I'm just playing with it to understand it a bit better. | 13:56.46 |
| Robin_Watts: indeed. | 13:56.50 |
Robin_Watts | sebras: How do you know where it is safe to insert a hyphen? | 13:57.13 |
sebras | Robin_Watts: with content.text set to "-", and I'm thinking that maybe it would only be shown if it is at the end of the line. | 13:57.22 |
| Robin_Watts: actually in my case the original text contains &shy; which is a soft hyphen. | 13:57.43 |
torand | sebras: my original idea was to put soft hyphens as break nodes | 13:57.56 |
| with alternate text content for broken and unbroken | 13:58.06 |
sebras | Robin_Watts: they are normally not shown, unless they are used as a line breaking point at the end of the line and then they are rendered as normal hyphens. | 13:58.12 |
Robin_Watts | sebras: Gotcha. | 13:58.28 |
sebras | torand: ah! yeah, that is another idea. | 13:58.33 |
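tor8's alternative treats each soft hyphen as a break point with two renderings: nothing when the line is not broken there, and a visible hyphen when it is. The sketch below shows that rule on a UTF-8 string, assuming U+00AD is encoded as the bytes C2 AD. `render_soft_hyphens` is a hypothetical helper, not part of mupdf's EPUB layout code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* U+00AD SOFT HYPHEN is the two bytes C2 AD in UTF-8. */
#define SHY_BYTE0 0xC2
#define SHY_BYTE1 0xAD

/* Render a word containing soft hyphens. If break_at < 0 the word is not
 * broken and every soft hyphen is dropped. Otherwise the line is broken at
 * the break_at'th soft hyphen (counting from 0): that one becomes "-" plus a
 * newline and the rest are dropped. out needs strlen(word) + 1 bytes.
 * Returns 0 if break_at names a soft hyphen that does not exist, else 1. */
static int render_soft_hyphens(const char *word, int break_at, char *out)
{
    int seen = 0, ok = break_at < 0;
    size_t o = 0;
    assert(word != NULL && out != NULL);
    for (size_t i = 0; word[i] != '\0'; i++) {
        if ((unsigned char)word[i] == SHY_BYTE0 &&
            (unsigned char)word[i + 1] == SHY_BYTE1) {
            if (seen == break_at) {
                out[o++] = '-';   /* visible only where the line breaks */
                out[o++] = '\n';
                ok = 1;
            }
            seen++;
            i++;                  /* skip the second byte of U+00AD */
            continue;
        }
        out[o++] = word[i];
    }
    out[o] = '\0';
    return ok;
}
```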
| tor8: welcome back! | 13:58.37 |
| tor8: I'll try to use the break nodes instead in that case. | 13:59.01 |
| tor8: but at least now I understand the layout functions a bit better. :) | 13:59.18 |
Robin_Watts | The 'interesting' question for hyphens is whether we can support hyphenating text that is not marked with &shy; | 13:59.23 |
| auto- | 13:59.41 |
tor8 | Robin_Watts: okay, updated java commit that does away with the JavaDevice and rethrows java exceptions using the exception.toString() string to fitz | 13:59.42 |
Robin_Watts | matically. | 13:59.44 |
| tor8: will look in just a mo. | 14:00.02 |
sebras | Robin_Watts: to do that we'd need to know the language and how hyphenation works in that language, no..? | 14:00.12 |
tor8 | Robin_Watts: we could, given the presence of lang="" html attributes, preprocess the text using the TeX hyphenation tables to insert &shy; characters | 14:00.19 |
Robin_Watts | tor8: That would be a reasonable approach. | 14:00.37 |
| I did wonder about just encoding hyphenatable positions into the text as an illegal unicode value. | 14:01.06 |
tor8 | the "problem" is we'll need to insert some sort of penalty calculation -- when do we prefer to break a word in the middle or leave extra whitespace instead | 14:01.16 |
| breaking at every possible hyphenation point is going to be ugly, there needs to be some tweakable value | 14:02.02 |
Robin_Watts | So rather than having "notif" "ic" "ation" we'd have "notifXicAation" | 14:02.04 |
tor8 | all as one flow node? | 14:02.29 |
Robin_Watts | Yes. | 14:02.36 |
| It means we need to be smart about splitting the last flow node on a line. | 14:02.55 |
tor8 | my original thoughts when designing the flow nodes was to be ["notif"] + ["-"/""] + "ic" + ["-"/""] + ["ation"] | 14:03.12 |
Robin_Watts | but it avoids needlessly exploding the number of nodes. | 14:03.19 |
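Robin's single-node encoding can be sketched like this: keep the word's runes in one array, mark each hyphenation opportunity with a value that is not a valid Unicode code point, and only split the node when it is the last one on a line and overflows. `HYPH_MARK`, `find_hyphen_split`, and the one-unit-per-rune width model are assumptions made for illustration; a real implementation would use shaped advances and strip the markers before drawing.

```c
#include <assert.h>
#include <stddef.h>

/* Marker stored at each permissible hyphenation point. 0x110000 is one past
 * the largest Unicode code point, so it cannot be confused with real text. */
#define HYPH_MARK 0x110000

/* Choose where to split one flow node so that the text before the split,
 * plus a visible '-', fits in max_width. Every real rune is taken to be
 * 1 unit wide, standing in for shaped glyph advances.
 * Returns the index of the last HYPH_MARK that fits, or -1 if none does. */
static int find_hyphen_split(const int *runes, size_t n, int max_width)
{
    int width = 0, best = -1;
    assert(runes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (runes[i] == HYPH_MARK) {
            if (width + 1 <= max_width) /* +1 for the hyphen glyph */
                best = (int)i;
        } else {
            width++;
        }
    }
    return best;
}
```

For "notif|ic|ation" and a remaining width of 7 this picks the first mark ("notif-", 6 units), because "notific-" would need 8.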
Robin_Watts | is called for lunch by the boss. bbs. | 14:03.47 |
tor8 | the nodes are there to eventually allow for a shortest-path search to optimize the line breaks | 14:03.57 |
Robin_Watts | tor8: You envisage more than a simple greedy algorithm eventually then ? | 14:04.33 |
tor8 | I do. I hope to get the TeX layout algorithm in there eventually. | 14:05.00 |
Robin_Watts | I'm not sure how well the TeX algorithm fits with the unicode stuff and shaping. But that's ignorance on my part. | 14:05.41 |
tor8 | and do small typographic tweaks like hanging punctuation (in essence kerning . and , against the margin) | 14:05.42 |
Robin_Watts | bbs. | 14:06.01 |
tor8 | the tex algorithm is just picking the line breaks and hyphenation points to minimize the number of hyphenations and lines with exceeding amounts of stretching for justification | 14:06.53 |
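To illustrate the difference from a greedy breaker, here is a small dynamic-programming line breaker in the spirit of what tor8 describes. It considers every break point, scores each line by its squared leftover space, and adds a tweakable penalty for breaking at a hyphenation point. This is a simplified minimum-raggedness algorithm, not the full Knuth-Plass algorithm used by TeX (no stretch/shrink glue and no extra demerits for consecutive hyphens), and `piece`, `break_lines`, and `HYPHEN_WIDTH` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define MAX_PIECES 64
#define HYPHEN_WIDTH 1

/* A word fragment. ends_at_hyphen is 1 if the next piece belongs to the same
 * word (a hyphenation point follows), 0 if an inter-word space follows.
 * The last piece must have ends_at_hyphen == 0. Widths are non-negative. */
typedef struct {
    int width;
    int ends_at_hyphen;
} piece;

/* A line costs the square of its leftover space (free on the last line),
 * plus hyphen_penalty if it ends at a hyphenation point. next[i] receives
 * the first piece of the line after a line starting at piece i. Returns the
 * minimal total cost, or -1 if no breaking fits within measure. */
static long break_lines(const piece *p, int n, int space, int measure,
                        long hyphen_penalty, int *next)
{
    long cost[MAX_PIECES + 1];
    assert(n >= 0 && n <= MAX_PIECES);
    cost[n] = 0;
    for (int i = n - 1; i >= 0; i--) {
        long best = LONG_MAX;
        int len = 0;
        for (int j = i; j < n; j++) {
            if (j > i && !p[j - 1].ends_at_hyphen)
                len += space;                /* glue between words */
            len += p[j].width;
            if (len > measure)
                break;                       /* adding pieces only grows len */
            int line_len = len + (p[j].ends_at_hyphen ? HYPHEN_WIDTH : 0);
            if (line_len > measure)
                continue;                    /* no room for the hyphen here */
            long slack = measure - line_len;
            long c = (j == n - 1) ? 0 : slack * slack;
            if (p[j].ends_at_hyphen)
                c += hyphen_penalty;
            if (cost[j + 1] != LONG_MAX && c + cost[j + 1] < best) {
                best = c + cost[j + 1];
                next[i] = j + 1;
            }
        }
        cost[i] = best;
    }
    return cost[0] == LONG_MAX ? -1 : cost[0];
}
```

With the words "aaa", "bb|bbb", "cc", a space of 1 and a measure of 7, a hyphen penalty of 50 yields "aaa / bbbbb / cc" (cost 20), while a penalty of 5 yields "aaa bb- / bbb cc" (cost 5): the penalty is exactly the tweakable value that trades hyphenation against ragged lines.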
Robin_Watts | tor8: fair enough. | 14:40.27 |
| tor8: Couple of commits on robin/master | 16:23.45 |
tor8 | 'effectievely' typo in commit message | 16:29.11 |
Robin_Watts | Fixed. | 16:30.11 |
tor8 | blank line between fz_runetochar and fz_runelen looks like it has disappeared | 16:30.28 |
| not sure why you broke the comment into two lines, are you using an 80-column editor? | 16:31.00 |
| have not read strtof.c but other than my comments above LGTM | 16:31.28 |
Robin_Watts | tor8: I am not using an 80 char editor, but I try to keep comments to shorter than 72 chars. | 16:31.36 |
tor8 | I dislike broken comments (because of the extra '*' at the beginning of the line) | 16:32.08 |
| sometimes I wish we could just use '//' comments | 16:32.36 |
| for the last comment -- do we really care that much about matching adobe's arguably broken behaviour? | 16:33.19 |
| s/comment/commit/ | 16:33.24 |
Robin_Watts | oh, that comment change was simon :) | 16:33.51 |
kens | Customers complain if we don't :-( | 16:33.58 |
tor8 | kens: this is the insane number overflow parsing | 16:34.22 |
kens | Yeah, and like I said :-) | 16:34.31 |
tor8 | kens: bah! stupid customers! | 16:34.40 |
| kens: too bad I can't just wish them to go away ;) | 16:34.48 |
Robin_Watts | tor8: There are files in our test suite that go horribly wrong without it. | 16:34.49 |
kens | The reason I changed GS's handling was to match (better) Adobe | 16:34.50 |
tor8 | Robin_Watts: kens: okay, fair enough. it stays then I guess. | 16:35.03 |
kens | doesn't like it one bit either..... | 16:35.14 |
Robin_Watts | Ok, so I'm going to look at the gs tiling problem bug. | 16:35.31 |
| tor8: Thanks for the review. | 16:35.40 |
mvrhel_laptop | tor8: finally getting going on the pdfwrite stuff | 16:35.50 |
Robin_Watts | Let me know when the java stuff is in a state to be picked up again. | 16:35.53 |
kens | I'm writing documentation, I hate writing documentation :-( | 16:35.59 |
mvrhel_laptop | tor8: so I will get it to share the image handling that I addec | 16:36.05 |
| added | 16:36.07 |
Robin_Watts | actually, better go and get the dog back from the vets. | 16:36.09 |
tor8 | Robin_Watts: downloading the latest android sdk's, ndk's and studio... taking AGES due to slow pipes from google | 16:36.14 |
Robin_Watts | As I dropped him at the vets yesterday, I spied an interesting figure on the computer screen: "Billings this period: 1944 quid" | 16:36.55 |
| I told myself that that couldn't possibly be right, but I'm beginning to fear it might be... | 16:37.37 |
tor8 | Robin_Watts: if that's the figure for January, then I can't imagine he's going to stay open much longer... | 16:38.08 |
| mvrhel_laptop: cool! | 16:38.15 |
mvrhel_laptop | tor8: do we want to do anything with the type0 font stuff? | 16:38.30 |
Robin_Watts | tor8: No. That's the figure that they've billed *me*, I think. | 16:39.06 |
tor8 | Robin_Watts: ouch. expensive dogs. | 16:39.17 |
| mvrhel_laptop: define "do anything" | 16:39.22 |
Robin_Watts | dog was free. dog upkeep not free. | 16:39.33 |
tor8 | Robin_Watts: just like kids, from what I've heard... | 16:39.48 |
mvrhel_laptop | tor8: is there anything that I need to do with pdfwrite and the type0 work I did. Right now it works for pdfcreate | 16:39.49 |
Robin_Watts | It's like the whole printer/printer toner thing :) | 16:39.49 |
mvrhel_laptop | tor8: But I was not sure if there was any other uses for it | 16:40.02 |
| tor8: At some point, it would be nice to add the capability via an api to add in an image or text to a page in an existing document | 16:40.36 |
| that would be useful in gsview | 16:40.39 |
| tor8: I suspect I should be able to leverage what I have for that | 16:40.52 |
tor8 | mvrhel_laptop: ah! well, we should look into using it for pdfwrite. I believe pdfwrite's current font handling is not up to task when it comes to outputting text. | 16:41.05 |
| now that you have the ability to create cid fonts, everything that comes into pdfwrite should come out as a cid font. we'll tackle type3 fonts later I think. | 16:41.50 |
mvrhel_laptop | tor8: ok. I will work on the images first and then the font | 16:42.10 |
tor8 | mvrhel_laptop: yes, having an api to add stuff to pages in existing documents can be done two ways. | 16:42.24 |
| one: interpret the page through the device interface and use pdfwrite to recreate the graphics, and then add drawing commands to the end of that | 16:42.48 |
| two: use the 'pdf_processor' interface to clean up the syntax, and then add extra pdf graphics commands to the end | 16:43.12 |
| the latter is probably better in the long run; it doesn't change the actual contents of the page by reinterpreting them | 16:43.33 |
mvrhel_laptop | tor8: yes. that sounds good to me | 16:43.42 |
tor8 | so most of the file is going to be intact, with the existing fonts and encodings etc. less risk for trouble. | 16:43.53 |
mvrhel_laptop | not sure what you mean by "clean up the syntax" though | 16:43.55 |
tor8 | and the 'sanitize' mode to the pdf_processor will make sure to balance the q/Q push states so we can safely add stuff on top at the end | 16:44.13 |
mvrhel_laptop | ah ok | 16:44.20 |
tor8 | mvrhel_laptop: right. so we have another interface that sits between the pdf content stream parser and device interface | 16:44.52 |
| we currently have three implementations of this interface | 16:45.08 |
| the interface is basically just a bag of function pointers, one per content stream operator | 16:45.33 |
| so we have functions for 'q' and 'Q' and moveto and lineto and fill etc | 16:45.46 |
mvrhel_laptop | ok. I recall seeing that | 16:46.03 |
tor8 | one of the implementations does the pdf interpretation and turns these commands into calls to the device interface | 16:46.09 |
| one of them just printf's the commands back out to a buffer | 16:46.26 |
| so if we use the second one, we can get a syntax pretty-printed content stream | 16:46.48 |
kens | tor8 you may not want to turn incoming non-CIDFonts into CIDFonts, or did I misunderstand you above ? | 16:47.09 |
tor8 | kens: you understood correctly. I think turning everything into an Identity-H CIDFont is the road to least complexity. | 16:47.34 |
kens | Its less complex true, but..... | 16:47.45 |
tor8 | except Type3 fonts, they'll need to be special cased. | 16:47.47 |
| mvrhel_laptop: the code for this processing interface is in pdf-op-run.c and pdf-op-buffer.c | 16:48.10 |
kens | If the incoming text is in a regular Font, and uses ASCII character codes, then it's searchable in Acrobat. But if you turn it into a CIDFont, then it isn't | 16:48.19 |
tor8 | for the two implementations I've mentioned so far | 16:48.20 |
mvrhel_laptop | tor8: ok great. what is the third interface? | 16:48.31 |
tor8 | kens: even if we create a ToUnicode table? | 16:48.34 |
kens | OK, if you create a ToUnicode then it's fine | 16:48.45 |
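kens's point is that text stays searchable in Acrobat if an Identity-H font comes with a ToUnicode CMap. As a sketch of what emitting the mapping part of such a CMap involves, the function below writes `bfchar` entries for every CID that has a Unicode value, in blocks of at most 100 entries (the per-section limit in Adobe's CMap format). The function name and the `to_unicode` array layout are hypothetical; the CMap header, `codespacerange`, and trailer are omitted, and only BMP code points are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the bfchar sections of a ToUnicode CMap for an Identity-H font,
 * whose character codes are 2-byte CIDs. to_unicode[cid] holds a BMP code
 * point, or 0 if the CID has no mapping; only nonzero entries are written.
 * Returns the number of entries written. */
static int write_bfchar(FILE *out, const unsigned short *to_unicode, int ncids)
{
    int total = 0, written = 0, cid = 0;
    assert(out != NULL && (to_unicode != NULL || ncids == 0));
    for (int i = 0; i < ncids; i++)
        if (to_unicode[i])
            total++;

    while (written < total) {
        int block = total - written < 100 ? total - written : 100;
        fprintf(out, "%d beginbfchar\n", block);
        for (int k = 0; k < block; cid++) {
            if (!to_unicode[cid])
                continue; /* unmapped CID: skip it */
            fprintf(out, "<%04X> <%04X>\n", (unsigned)cid, (unsigned)to_unicode[cid]);
            k++;
        }
        fprintf(out, "endbfchar\n");
        written += block;
    }
    return total;
}
```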
tor8 | mvrhel_laptop: that's the fancy implementation -- pdf-op-filter.c | 16:48.46 |
| it tracks state changes and omits redundant ones | 16:49.03 |
| so you get a minimized content stream out | 16:49.10 |
kens | has an enhancement bug somewhere to create ToUnicode CMaps in pdfwrite if we don't have one and think we can. | 16:49.48 |
tor8 | so the pdf_new_filter_processor takes another processor as an argument and forwards the "optimized" calls to it | 16:50.02 |
mvrhel_laptop | kens: you can borrow what I wrote for mupdf | 16:50.03 |
kens | mvrhel_laptop : That's not the problem :-) | 16:50.14 |
mvrhel_laptop | tor8: ok that makes sense | 16:50.17 |
tor8 | so if you chain the filter and the buffer processor together you can get a nicely formatted and somewhat cleaned up content stream out | 16:50.39 |
kens | GS already has loads of code to write ToUnicode; the problem is identifying the condition where such a thing is sensible | 16:50.42 |
tor8 | this is what the '-s' flag to mutool clean does | 16:50.45 |
| so if we're editing pages to add images on top, etc. what I think we could do is run the page through said filters and then just concatenate on the graphics commands to draw the image. | 16:51.22 |
| after the 'pdf-op-filter' pass, the q and Q's should be balanced and the graphics state is back to the initial defaults | 16:51.55 |
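The processor design described above (one function pointer per operator, a buffer implementation that prints operators back out, and a filter that drops redundant state changes and forwards the rest) can be sketched compactly. This is not mupdf's `pdf_processor`: the struct, the five operators, and the `buffer_processor`/`filter_processor` names are hypothetical, and the filter only tracks line width and q/Q nesting. It does show why chaining a filter into a buffer yields a balanced, minimized stream that new drawing commands can simply be appended to.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical processor: one function pointer per content stream operator. */
typedef struct processor processor;
struct processor {
    void (*op_q)(processor *p);
    void (*op_Q)(processor *p);
    void (*op_w)(processor *p, float linewidth);
    void (*op_re)(processor *p, float x, float y, float w, float h);
    void (*op_f)(processor *p);
};

/* Buffer implementation: prints each operator back out as PDF syntax. */
typedef struct {
    processor super; /* first member, so a processor * can be cast back */
    char out[512];
    size_t len;
} buffer_processor;

static void emit(processor *p, const char *fmt, ...)
{
    buffer_processor *b = (buffer_processor *)p;
    size_t room = sizeof b->out - b->len; /* always at least 1 */
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(b->out + b->len, room, fmt, ap);
    va_end(ap);
    if (n > 0) /* on truncation, stop at the end of the buffer */
        b->len += (size_t)n < room ? (size_t)n : room - 1;
}
static void buf_q(processor *p) { emit(p, "q\n"); }
static void buf_Q(processor *p) { emit(p, "Q\n"); }
static void buf_w(processor *p, float lw) { emit(p, "%g w\n", lw); }
static void buf_re(processor *p, float x, float y, float w, float h) { emit(p, "%g %g %g %g re\n", x, y, w, h); }
static void buf_f(processor *p) { emit(p, "f\n"); }

static void init_buffer(buffer_processor *b)
{
    b->super = (processor){ buf_q, buf_Q, buf_w, buf_re, buf_f };
    memset(b->out, 0, sizeof b->out);
    b->len = 0;
}

/* Filter: omits redundant line widths and unmatched Q, forwards the rest. */
#define MAX_DEPTH 16
typedef struct {
    processor super;
    processor *chain;    /* next processor in the chain */
    float lw[MAX_DEPTH]; /* current line width at each q nesting level */
    int depth;
} filter_processor;
#define FILTER(p) ((filter_processor *)(p))

static void flt_q(processor *p)
{
    filter_processor *f = FILTER(p);
    assert(f->depth + 1 < MAX_DEPTH);
    f->lw[f->depth + 1] = f->lw[f->depth];
    f->depth++;
    f->chain->op_q(f->chain);
}
static void flt_Q(processor *p)
{
    filter_processor *f = FILTER(p);
    if (f->depth == 0)
        return; /* unmatched Q: drop it */
    f->depth--;
    f->chain->op_Q(f->chain);
}
static void flt_w(processor *p, float lw)
{
    filter_processor *f = FILTER(p);
    if (f->lw[f->depth] == lw)
        return; /* no change in state: omit */
    f->lw[f->depth] = lw;
    f->chain->op_w(f->chain, lw);
}
static void flt_re(processor *p, float x, float y, float w, float h) { FILTER(p)->chain->op_re(FILTER(p)->chain, x, y, w, h); }
static void flt_f(processor *p) { FILTER(p)->chain->op_f(FILTER(p)->chain); }

static void init_filter(filter_processor *f, processor *chain)
{
    f->super = (processor){ flt_q, flt_Q, flt_w, flt_re, flt_f };
    f->chain = chain;
    f->lw[0] = 1.0f; /* PDF's initial line width */
    f->depth = 0;
}

/* Close any q left open, so appended content starts from the initial state. */
static void finish_filter(filter_processor *f)
{
    while (f->depth > 0)
        flt_Q(&f->super);
}
```

After `finish_filter`, an overlay's own operators (for example a q ... Q group drawing an image) can be appended to the buffer without inheriting a modified graphics state.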
| kens: yeah, given garbage in we're going to have garbage out. I expect this will fail in some cases; I think mvrhel is building the ToUnicode from the actual font cmaps | 16:52.50 |
mvrhel_laptop | tor8: ok. I believe I follow that. So is the added content stored with the existing content in a pdfobj at that point? | 16:52.59 |
tor8 | so if those are missing, we're in trouble | 16:53.01 |
kens | tor8 fonts don't have CMaps, only CIDFonts :-) | 16:53.10 |
| But it shouldn't be any worse than the original anyway | 16:53.28 |
tor8 | mvrhel_laptop: I haven't got that far. the current code just gives you the primitive operations; nothing is tied into automatically updating pdf_obj's and associated streams. | 16:53.40 |
mvrhel_laptop | tor8: sorry | 16:53.50 |
tor8 | kens: lowercase truetype cmaps, sorry for the confusion | 16:53.58 |
kens | :-) | 16:54.05 |
tor8 | kens: or glyph names | 16:54.06 |
mvrhel_laptop | gawd. more font syntax goofiness | 16:54.18 |
kens | Yeah glyph names was my approach, we have the Adobe Glyph List already in GS | 16:54.25 |
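For fonts without usable cmaps, the glyph-name route kens mentions maps names through the Adobe Glyph List, plus the `uniXXXX` naming convention from the AGL specification (four uppercase hex digits, surrogates excluded, anything after the first '.' ignored). The sketch below uses a tiny hand-picked subset of the list and handles only single `uniXXXX` groups (no ligature names with underscores, no `uXXXX` form); `glyph_name_to_unicode` is a hypothetical helper, not Ghostscript's or mupdf's API.

```c
#include <assert.h>
#include <string.h>

/* A tiny subset of the Adobe Glyph List, for illustration only. */
static const struct { const char *name; unsigned code; } agl_subset[] = {
    { "space", 0x0020 }, { "comma", 0x002C }, { "period", 0x002E },
    { "A", 0x0041 }, { "a", 0x0061 }, { "fi", 0xFB01 },
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1; /* uniXXXX names use uppercase hex only */
}

/* Map a glyph name to a Unicode code point, or return 0 if unknown.
 * Any suffix after the first '.' (e.g. "A.sc") is ignored. */
static unsigned glyph_name_to_unicode(const char *name)
{
    assert(name != NULL);
    size_t len = strcspn(name, ".");
    if (len == 7 && strncmp(name, "uni", 3) == 0) {
        unsigned v = 0;
        for (int i = 3; i < 7; i++) {
            int d = hexval(name[i]);
            if (d < 0)
                return 0;
            v = v * 16 + (unsigned)d;
        }
        return (v >= 0xD800 && v <= 0xDFFF) ? 0 : v; /* no surrogates */
    }
    for (size_t i = 0; i < sizeof agl_subset / sizeof agl_subset[0]; i++)
        if (strlen(agl_subset[i].name) == len &&
            strncmp(agl_subset[i].name, name, len) == 0)
            return agl_subset[i].code;
    return 0;
}
```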
tor8 | mvrhel_laptop: I took a quick glance through your latest commits. | 16:56.59 |
| It looks like you suffer the same affliction as Robin and most other people who use syntax coloring... | 16:57.20 |
mvrhel_laptop | what is that | 16:57.45 |
tor8 | you don't put blank lines before a new section is introduced with a comment, for example in pdf_add_cid_to_unicode at the "Now output non-zero entries" comment I would've put a blank line before | 16:58.10 |
mvrhel_laptop | tor8: ah ok | 16:59.02 |
tor8 | I possibly overdo the blank lines personally; I think about code in paragraphs. and it helps me navigate with my editor, there are quick short cuts to skip to the next/prev blank line | 16:59.08 |
mvrhel_laptop | I can see where without color that would be harder to see | 16:59.14 |
tor8 | I haven't used syntax coloring in over a decade; I find it extremely distracting. | 16:59.38 |
mvrhel_laptop | and I can understand the editor usefulness | 16:59.39 |
| I love it | 16:59.44 |
| I will fix these | 16:59.50 |
tor8 | turn it off for a few weeks; code formatting will only improve once you don't have coloring as a crutch ;) | 17:00.19 |
mvrhel_laptop | what about at line 2032 | 17:00.34 |
| do you put a blank line before that comment? | 17:00.47 |
| the line before it is a { | 17:00.53 |
tor8 | mvrhel_laptop: just a sec, I'm looking in gitk | 17:01.01 |
| mvrhel_laptop: nah, the '{' on a line of its own is enough of a paragraph separator for me | 17:02.07 |
mvrhel_laptop | tor8: ok good. oh also a question. for pdfcreate do we want to add a command line option to use either the simple / or type 0? | 17:02.23 |
tor8 | I'd vote for simple, or distinguish using a different command line flag to add the font? | 17:02.50 |
| the use case for the tool is to hand craft simple pdf files; and there the simple fonts would be easier to work with | 17:03.27 |
mvrhel_laptop | yes | 17:03.32 |
| tor8: ok. great. thanks for all the details about the options after the pdf content stream parser. I will take a look at this once I get pdfwrite changes in place | 17:05.04 |
| maybe around that time we can merge this work in? | 17:05.12 |
| perhaps after the release though | 17:05.23 |
| I worry about the testing for pdfwrite | 17:05.29 |
| do we currently have any regression testing in place for it? | 17:05.42 |
tor8 | mvrhel_laptop: yes. I wouldn't worry much, I certainly hope we don't have anybody actually using our pdfwrite yet. | 17:05.48 |
| I'm not sure about testing, Robin would know | 17:06.02 |
mvrhel_laptop | gsview uses it a lot | 17:06.07 |
| or at least I use it quite a bit with gsview | 17:06.13 |
tor8 | mvrhel_laptop: ah! | 17:06.20 |
mvrhel_laptop | mainly for expanding content | 17:06.23 |
tor8 | good :) | 17:06.24 |
| we have a user! | 17:06.32 |
mvrhel_laptop | anytime I have a pdf file that I need to fool with I open with gsview and save as expanded pdf | 17:06.50 |
tor8 | mvrhel_laptop: typedef struct resource_tables_s resource_tables that publicly visible struct really needs to have our pdf_ namespace prefix | 17:07.25 |
mvrhel_laptop | tor8: ok. that makes sense | 17:07.39 |
tor8 | same with res_table and res_search_fn | 17:07.57 |
mvrhel_laptop | ok | 17:08.07 |
tor8 | if structs and functions are local to a file, I don't mind skipping the prefix but we shouldn't pollute the namespace with externally visible symbols | 17:08.46 |
mvrhel_laptop | right. | 17:08.56 |
tor8 | I'll probably go and make you rename every single function once we're ready to merge. we haven't been perfectly consistent in our naming, | 17:09.54 |
| but I have tried to make some effort to clean up some of the places lately, as you've no doubt run into and had to fix your code for. | 17:10.24 |
| I'm sure you've already looked at docs/naming.txt ... I ought to add some more examples and motivations to that document. | 17:11.09 |
| and I'm sure there are things I've forgot to put in it in the first place | 17:11.30 |
mvrhel_laptop | tor8: yes. no problem. I will take another look at that document also. | 17:11.47 |
| tor8: tbh I may not have read naming.txt before or it may have been a long time ago. Just been copying the style that I saw. This clears up a few things... | 17:17.46 |
tor8 | mvrhel_laptop: the pdf_obj function naming does not follow the style guide | 17:18.29 |
| Robin prefers functions of the noun_verb kind (object oriented) whereas I prefer verb_noun (it reads better, and I prefer functional programming to o.o.) | 17:19.19 |
| and we've had verb_noun since the beginning even though sometimes it would be clearer with noun_verb | 17:20.12 |
| conversion functions are generally of the x = x_from_y(y) form; it's easier to keep track of what's what with the alliteration of variable names and the function name parts | 17:21.32 |
| x = y_to_x(y) is just awkward, IMO | 17:21.46 |
mvrhel_laptop | certainly it is good to pick one and do it for all | 17:22.08 |
Robin_Watts | mvrhel_laptop: You get used to it. tor has just completely renamed everything in the java stuff I did :) | 17:24.25 |
| I note that we now have to_Matrix rather than fz_matrix_to_Matrix. | 17:24.42 |
mvrhel_laptop | tor8: added the pdf prefix to the visible structures. Question for you: should pdf_resource_table_free avoid the "free" wording, since it is not really rc'd? I see in naming.txt you say that word is reserved for rc schemes. I could change it to release | 17:33.52 |
| just trying to reduce the amount of rewrite as I move forward | 17:34.32 |
Robin_Watts | keep/drop are the rc words. | 17:36.12 |
| free seems fine to me, I think. | 17:37.03 |
mvrhel_laptop | ok | 17:40.04 |
| bbiab | 17:52.04 |
tor8 | mvrhel_laptop: pdf_drop_resource_table is the preferred naming | 22:22.33 |
| Robin_Watts: we use drop for both free and refcounted things. saves us having to rename (and remember which is which) should we change | 22:23.13 |
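The naming rules above (pdf_ prefix on anything externally visible, unprefixed static helpers, verb_noun names, and keep/drop used whether or not a type is reference counted) can be illustrated with a small sketch. The type and functions are hypothetical, modelled on the names in the discussion rather than taken from mupdf, and the real library's functions also take a context argument, omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* Externally visible type, so it carries the pdf_ prefix. */
typedef struct pdf_resource_table_s {
    int refs;
    int *entries;
} pdf_resource_table;

static int tables_freed = 0; /* instrumentation for this example only */

/* File-local helper: static, so it needs no namespace prefix. */
static void destroy_table(pdf_resource_table *t)
{
    free(t->entries);
    free(t);
    tables_freed++;
}

pdf_resource_table *pdf_new_resource_table(int size)
{
    pdf_resource_table *t;
    assert(size > 0);
    t = malloc(sizeof *t);
    if (!t)
        return NULL;
    t->entries = calloc((size_t)size, sizeof *t->entries);
    if (!t->entries) {
        free(t);
        return NULL;
    }
    t->refs = 1;
    return t;
}

/* keep: take another reference and hand back the same pointer. */
pdf_resource_table *pdf_keep_resource_table(pdf_resource_table *t)
{
    if (t)
        t->refs++;
    return t;
}

/* drop: release one reference and free the table with the last one. A type
 * without refcounting would simply free here under the same name, so callers
 * are unaffected if a type later gains or loses refcounting. */
void pdf_drop_resource_table(pdf_resource_table *t)
{
    if (t && --t->refs == 0)
        destroy_table(t);
}
```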