IRC Logs

Log of #ghostscript at irc.freenode.net.

2016/02/03
sebras tor8: there is a new tentative patch over at sebras/master. it doesn't implement the full width property support, but it is a simple start. at least the file in question is now displaying properly.00:45.24 
  tor8: I also took a stab at soft hyphens, just to familiarize myself with the epub code. I think I might be on to something, but there is still something about those glue nodes that I don't fully get.00:46.13 
Robin_Watts sebras: hyphens definitely need to be thought about in the context of shaping.09:40.46 
  Morning tor8.11:04.10 
  tor8, sebras: Am I right in thinking that glue nodes don't have content, or position or styles etc?11:06.08 
  In fact, couldn't we represent a glue node by a bit on a node that says: "And this is followed by a glue node"? 11:06.48 
tor8 Robin_Watts: glue nodes should be able to have different widths (for hairline spaces, em-spaces, etc)11:21.50 
Robin_Watts tor8: OK.11:22.11 
tor8 Robin_Watts: about Java, I added the JavaDevice to get around some trouble I had with constructors and finalizers but I've managed to solve things without adding the extra class now11:22.37 
Robin_Watts So every node could have a width for the glue node that follows it, where width = 0 means no node ?11:22.41 
  tor8: Ah, cool.11:22.55 
tor8 Robin_Watts: also non-breaking spaces that still need to be adjusted for width11:23.05 
  I was hoping to be able to get the actual java exception message out and throw that in fz_throw_java11:23.29 
Robin_Watts tor8: So, width and a flag to say whether they are non-breaking or not?11:24.08 
tor8 and about the style, I too prefer having the opening brace for java functions on a separate line, but this is the conventional java style11:24.22 
  Robin_Watts: yeah. that should save us a bunch of nodes.11:24.40 
Robin_Watts I have 9 bits left in the flagword :)11:24.51 
tor8 :)11:24.58 
  then go for it!11:25.03 
  I did notice you added some 8-bit bitfields, can't those just be unsigned chars?11:25.31 
Robin_Watts They could, but I'm not convinced that compilers are smart enough to pack bitfields into chars/shorts rather than ints11:28.54 
tor8 http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-141270.html#381 11:28.56
  Robin_Watts: ah, right. so you're filling it up until you hit 32 bits.11:29.27 
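A minimal sketch of the idea discussed above: store the width of the glue that follows a node, plus a non-breaking flag, in spare bits of a 32-bit flagword, with width 0 meaning "no glue". The names (`sketch_node`, `GLUE_WIDTH_MASK`, `GLUE_NOBREAK`) and the 8+1 bit split are assumptions made for illustration; they are not taken from the MuPDF source. Explicit masks are used instead of bitfields so that the packing does not depend on the compiler.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define GLUE_WIDTH_MASK 0xFFu     /* bits 0-7: width of the glue that follows; 0 = none */
#define GLUE_NOBREAK    (1u << 8) /* bit 8: the glue that follows is non-breaking */

typedef struct
{
	uint32_t flags; /* the remaining bits belong to other (unspecified) fields */
} sketch_node;

/* Record the glue that follows this node, leaving all other bits untouched. */
static void sketch_set_glue(sketch_node *n, unsigned width, int nobreak)
{
	n->flags &= ~(GLUE_WIDTH_MASK | GLUE_NOBREAK);
	n->flags |= (width & GLUE_WIDTH_MASK);
	if (nobreak)
		n->flags |= GLUE_NOBREAK;
}

static unsigned sketch_glue_width(const sketch_node *n)
{
	return n->flags & GLUE_WIDTH_MASK;
}

static int sketch_has_glue(const sketch_node *n)
{
	return sketch_glue_width(n) != 0;
}

/* A line may break after this node only if breakable glue follows it. */
static int sketch_can_break_after(const sketch_node *n)
{
	return sketch_has_glue(n) && !(n->flags & GLUE_NOBREAK);
}
```

With this layout a non-breaking space is simply a node whose following glue has a non-zero width and the `GLUE_NOBREAK` bit set, so it can still be stretched for justification without becoming a break candidate.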
Robin_Watts tor8: yeah.11:29.38 
  well, I have 32bits in my head as the amount to try to stay within.11:30.01 
  scripts take just over 7 bits.11:30.16 
  lang isn't done yet, and I need to remove the fallback one.11:30.34 
  As to the paren style, we don't follow recommendations anywhere else :)11:31.01 
  brace style, I mean.11:31.07 
  given that we push to a new line for if () {, I don't see why we don't do the same for int function(...) { too.11:31.42 
  cos ultimately you want to be able to easily look back to check stuff matches.11:31.58 
tor8 our C code uses the 'allman' brace style, but for java the convention is more like K&R. personally I like fewer mostly blank lines just so I can add real blank lines so they stand out better to separate code into paragraphs of sorts11:33.27 
Robin_Watts Crap. Stupid windows vs2015 runtime has a bug.11:34.06 
  _read, given a single return in text mode returns 0.11:34.33 
tor8 a "\r" not "\r\n" ?11:39.22 
Robin_Watts The _read implementation checks for \r\n and converts it to \n11:39.57 
  but in this case, it gets just \r, converts it to \n, and forgets that it read it (so returns 0 bytes read)11:40.21 
  It's a clear change in behaviour from previous VS builds.11:40.38 
  and it means the gs code detects it as being EOF.11:40.53 
  and closes the console down.11:41.11 
  I have a hacky fix.11:41.25 
tor8 eww. nasty.11:58.15 
Robin_Watts tor8: I might look at simon91's atof thing now ?12:52.47 
torand Robin_Watts: sure13:13.10 
sebras torand: Robin_Watts: I have been toying around with letting glue nodes be used for hyphens.13:56.16 
Robin_Watts sebras: So you'd insert glue nodes in the middle of words?13:56.45 
sebras I'm not sure if that is a good idea. I'm just playing with it to understand it a bit better.13:56.46 
  Robin_Watts: indeed.13:56.50 
Robin_Watts sebras: How do you know where it is safe to insert a hyphen?13:57.13 
sebras Robin_Watts: with content.text set to "-", and I'm thinking that maybe it would only show if it is at the end of the line.13:57.22
  Robin_Watts: actually in my case the original text contains &shy; which is a soft hyphen.13:57.43
torand sebras: my original idea was to put soft hyphens as break nodes13:57.56 
  with alternate text content for broken and unbroken13:58.06 
sebras Robin_Watts: they are normally not shown, unless they are used as a line breaking point at the end of the line and then they are rendered as normal hyphens.13:58.12
Robin_Watts sebras: Gotcha.13:58.28 
sebras torand: ah! yeah, that is another idea.13:58.33
  tor8: welcome back!13:58.37 
  tor8: I'll try to use the break nodes instead in that case.13:59.01 
  tor8: but at least now I understand the layout functions a bit better. :)13:59.18 
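A rough sketch of the "break node with alternate text" idea mentioned above, assuming a greedy layout that measures width in characters. A break node carries one string for the unbroken case and one for the broken case, so a soft hyphen is `""` / `"-"` and an ordinary space is `" "` / `""`. All names here (`sk_node`, `sk_layout`, and so on) are hypothetical and are not the epub layout code.

```c
#include <string.h>

typedef enum { SK_WORD, SK_BREAK } sk_type;

typedef struct
{
	sk_type type;
	const char *text;   /* SK_WORD: the word; SK_BREAK: text when the line is not broken here */
	const char *broken; /* SK_BREAK only: text left at the end of the line when broken here */
} sk_node;

/* Append s to out without overflowing a buffer of cap bytes. */
static void sk_append(char *out, size_t cap, const char *s)
{
	size_t used = strlen(out);
	strncat(out, s, cap - used - 1);
}

/* Greedy layout, measuring width in characters. Lines are joined with '\n'. */
static void sk_layout(const sk_node *nodes, int n, int max_width, char *out, size_t cap)
{
	int start = 0;

	out[0] = 0;
	while (start < n)
	{
		int width = 0, best = -1, i, j;

		for (i = start; i < n; i++)
		{
			/* Remember the last break that fits; take the first one regardless,
			 * so that an over-long word still ends its line somewhere. */
			if (nodes[i].type == SK_BREAK &&
				(best < 0 || width + (int)strlen(nodes[i].broken) <= max_width))
				best = i;
			width += (int)strlen(nodes[i].text);
			if (width > max_width && best >= 0)
				break;
		}
		if (i == n)
			best = n; /* the rest fits on this line (or cannot be broken) */
		for (j = start; j < best; j++)
			sk_append(out, cap, nodes[j].text);
		if (best < n)
		{
			sk_append(out, cap, nodes[best].broken);
			sk_append(out, cap, "\n");
		}
		start = best + 1;
	}
}
```

Feeding it `notif`, soft hyphen, `ic`, soft hyphen, `ation`, space, `sent` with a width of 8 breaks at the second soft hyphen and shows the hyphen there; with a width of 20 nothing breaks and neither hyphen is visible.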
Robin_Watts The 'interesting' question for hyphens is whether we can support hyphenating text that is not marked with &shy;13:59.23
  auto-13:59.41 
tor8 Robin_Watts: okay, updated java commit that does away with the JavaDevice and rethrows java exceptions using the exception.toString() string to fitz13:59.42 
Robin_Watts matically.13:59.44 
  tor8: will look in just a mo.14:00.02 
sebras Robin_Watts: to do that we'd need to know the language and how hyphenation works in that language, no..?14:00.12 
tor8 Robin_Watts: we could, given the presence of lang="" html attributes, preprocess the text using the TeX hyphenation tables to insert &shy; characters14:00.19 
Robin_Watts tor8: That would be a reasonable approach.14:00.37 
  I did wonder about just encoding hyphenatable positions into the text as an illegal unicode value.14:01.06 
tor8 the "problem" is we'll need to insert some sort of penalty calculation -- when do we prefer to break a word in the middle or leave extra whitespace instead14:01.16 
  breaking at every possible hyphenation point is going to be ugly, there needs to be some tweakable value14:02.02 
Robin_Watts So rather than having "notif" "ic" "ation" we'd have "notifXicAation"14:02.04 
tor8 all as one flow node?14:02.29 
Robin_Watts Yes.14:02.36 
  It means we need to be smart about splitting the last flow node on a line.14:02.55 
tor8 my original thoughts when designing the flow nodes was to be ["notif"] + ["-"/""] + "ic" + ["-"/""] + ["ation"]14:03.12 
Robin_Watts but it avoids needlessly exploding the number of nodes.14:03.19 
Robin_Watts is called for lunch by the boss. bbs.14:03.47 
tor8 the nodes are there to eventually allow for a shortest-path search to optimize the line breaks14:03.57 
Robin_Watts tor8: You envisage more than a simple greedy algorithm eventually then ?14:04.33 
tor8 I do. I hope to get the TeX layout algorithm in there eventually.14:05.00 
Robin_Watts I'm not sure how well the TeX algorithm fits with the unicode stuff and shaping. But that's ignorance on my part.14:05.41 
tor8 and do small typographic tweaks like hanging punctuation (in essence kerning . and , against the margin)14:05.42 
Robin_Watts bbs.14:06.01 
tor8 the tex algorithm is just picking the line breaks and hyphenation points to minimize the number of hyphenations and lines with exceeding amounts of stretching for justification14:06.53 
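To make the trade-off concrete, here is a heavily simplified sketch of optimizing line breaks over the whole paragraph with a tweakable hyphenation penalty. It is not the real TeX algorithm (no stretchable glue, no TeX demerits formula): the cost of a line is just the square of its unused width, the last line is free, and each hyphenated break adds `hyphen_penalty`. Widths are in characters, and the names `tb_chunk` and `tb_break` are hypothetical.

```c
#include <limits.h>

#define TB_MAX 64
#define TB_INF (INT_MAX / 2)

typedef struct
{
	int width;  /* width of this chunk of text */
	int hyphen; /* opportunity after it: 1 = soft hyphen (width 1 only if broken),
	             * 0 = space (width 1 only if not broken) */
} tb_chunk;

/* Dynamic programming over break positions: best[j] is the lowest cost of
 * setting chunks 0..j-1 as complete lines. Returns the total cost, or -1 if
 * no layout fits. Sets break_after[i] to 1 where a line ends after chunk i. */
static int tb_break(const tb_chunk *c, int n, int line_width, int hyphen_penalty, int *break_after)
{
	int best[TB_MAX + 1], prev[TB_MAX + 1];
	int a, j;

	if (n < 1 || n > TB_MAX)
		return -1;
	best[0] = 0;
	prev[0] = 0;
	for (j = 1; j <= n; j++)
	{
		int last = (j == n);
		int end_hyphen = (!last && c[j - 1].hyphen);
		int width = 0;

		best[j] = TB_INF;
		prev[j] = 0;
		/* Try each possible first chunk a for a line ending with chunk j-1. */
		for (a = j - 1; a >= 0; a--)
		{
			int total, slack, cost;

			width += c[a].width;
			if (a < j - 1 && !c[a].hyphen)
				width += 1; /* unbroken space between chunk a and a+1 */
			total = width + end_hyphen;
			if (total > line_width)
				break; /* starting even earlier can only be wider */
			if (best[a] >= TB_INF)
				continue;
			slack = line_width - total;
			cost = best[a] + (last ? 0 : slack * slack) + (end_hyphen ? hyphen_penalty : 0);
			if (cost < best[j])
			{
				best[j] = cost;
				prev[j] = a;
			}
		}
	}
	if (best[n] >= TB_INF)
		return -1;
	for (j = 0; j < n; j++)
		break_after[j] = 0;
	for (j = n; j > 0; j = prev[j])
		break_after[j - 1] = 1;
	return best[n];
}
```

For the chunks `the`, `notif` (soft hyphen), `ication` at a line width of 12, a penalty of 10 chooses "the notif-" / "ication", while a penalty of 100 prefers to leave whitespace: "the" / "notification".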
Robin_Watts tor8: fair enough.14:40.27 
  tor8: Couple of commits on robin/master16:23.45 
tor8 'effectievely' typo in commit message16:29.11 
Robin_Watts Fixed.16:30.11 
tor8 blank line between fz_runetochar and fz_runelen looks like it has disappeared16:30.28 
  not sure why you broke the comment into two lines, are you using an 80-column editor?16:31.00 
  have not read strtof.c but other than my comments above LGTM16:31.28 
Robin_Watts tor8: I am not using an 80 char editor, but I try to keep comments to shorter than 72 chars.16:31.36 
tor8 I dislike broken comments (because of the extra '*' at the beginning of the line)16:32.08
  sometimes I wish we could just use '//' comments16:32.36 
  for the last comment -- do we really care that much about matching adobe's arguably broken behaviour?16:33.19 
  s/comment/commit/16:33.24 
Robin_Watts oh, that comment change was simon :)16:33.51 
kens Customers complain if we don't :-(16:33.58 
tor8 kens: this is the insane number overflow parsing16:34.22 
kens Yeah, and like I said :-)16:34.31 
tor8 kens: bah! stupid customers!16:34.40 
  kens: too bad I can't just wish them to go away ;)16:34.48 
Robin_Watts tor8: There are files in our test suite that go horribly wrong without it.16:34.49 
kens The reason I changed GS's handling was to match (better) Adobe16:34.50
tor8 Robin_Watts: kens: okay, fair enough. it stays then I guess.16:35.03 
kens doesn't like it one bit either.....16:35.14 
Robin_Watts Ok, so I'm going to look at the gs tiling problem bug.16:35.31 
  tor8: Thanks for the review.16:35.40 
mvrhel_laptop tor8: finally getting going on the pdfwrite stuff 16:35.50 
Robin_Watts Let me know when the java stuff is in a state to be picked up again.16:35.53 
kens I'm writing documentation, I hate writing documentation :-(16:35.59
mvrhel_laptop tor8: so I will get it to share the image handling that I addec16:36.05 
  added16:36.07 
Robin_Watts actually, better go and get the dog back from the vets.16:36.09 
tor8 Robin_Watts: downloading the latest android sdk's, ndk's and studio... taking AGES due to slow pipes from google16:36.14 
Robin_Watts As I dropped him at the vets yesterday, I spied an interesting figure on the computer screen: "Billings this period: 1944 quid"16:36.55
  I told myself that that couldn't possibly be right, but I'm beginning to fear it might be...16:37.37 
tor8 Robin_Watts: if that's the figure for january, then he's not going to stay open much longer I can't imagine...16:38.08 
  mvrhel_laptop: cool!16:38.15 
mvrhel_laptop tor8: do we want to do anything with the type0 font stuff?16:38.30 
Robin_Watts tor8: No. That's the figure that they've billed *me*, I think.16:39.06 
tor8 Robin_Watts: ouch. expensive dogs.16:39.17 
  mvrhel_laptop: define "do anything"16:39.22 
Robin_Watts dog was free. dog upkeep not free.16:39.33 
tor8 Robin_Watts: just like kids, from what I've heard...16:39.48 
mvrhel_laptop tor8: is there anything that I need to do with pdfwrite and the type0 work I did. Right now it works for pdfcreate16:39.49 
Robin_Watts It's like the whole printer/printer toner thing :)16:39.49 
mvrhel_laptop tor8: But I was not sure if there was any other uses for it16:40.02 
  tor8: At some point, it would be nice to add the capability via an api to add in an image or text to a page in an existing document16:40.36 
  that would be useful in gsview16:40.39 
  tor8: I suspect I should be able to leverage what I have for that16:40.52 
tor8 mvrhel_laptop: ah! well, we should look into using it for pdfwrite. I believe pdfwrite's current font handling is not up to task when it comes to outputting text.16:41.05 
  now that you have the ability to create cid fonts, everything that comes into pdfwrite should come out as a cid font. we'll tackle type3 fonts later I think.16:41.50 
mvrhel_laptop tor8: ok. I will work on the images first and then the font16:42.10 
tor8 mvrhel_laptop: yes, having an api to add stuff to pages in existing documents can be done two ways.16:42.24 
  one: interpret the page through the device interface and use pdfwrite to recreate the graphics, and then add drawing commands to the end of that16:42.48 
  two: use the 'pdf_processor' interface to clean up the syntax, and then add extra pdf graphics commands to the end16:43.12 
  the latter is probably better in the long run; it doesn't change the actual contents of the page by reinterpreting them16:43.33 
mvrhel_laptop tor8: yes. that sounds good to me16:43.42 
tor8 so most of the file is going to be intact, with the existing fonts and encodings etc. less risk for trouble.16:43.53 
mvrhel_laptop not sure what you mean by "clean up the syntax" though16:43.55 
tor8 and the 'sanitize' mode to the pdf_processor will make sure to balance the q/Q push states so we can safely add stuff on top at the end16:44.13 
mvrhel_laptop ah ok16:44.20 
tor8 mvrhel_laptop: right. so we have another interface that sits between the pdf content stream parser and device interface16:44.52 
  we currently have three implementations of this interface16:45.08 
  the interface is basically just a bag of function pointers, one per content stream operator16:45.33 
  so we have functions for 'q' and 'Q' and moveto and lineto and fill etc16:45.46 
mvrhel_laptop ok. I recall seeing that16:46.03 
tor8 one of the implementations does the pdf interpretation and turns these commands into calls to the device interface16:46.09 
  one of them just printf's the commands back out to a buffer16:46.26 
  so if we use the second one, we can get a syntax pretty-printed content stream16:46.48 
kens tor8 you may not want to turn incoming non-CIDFonts into CIDFonts, or did I misunderstand you above ?16:47.09 
tor8 kens: you understood correctly. I think turning everything into an Identity-H CIDFont is the road to least complexity.16:47.34 
kens Its less complex true, but.....16:47.45 
tor8 except Type3 fonts, they'll need to be special cased.16:47.47 
  mvrhel_laptop: the code for this processing interface is in pdf-op-run.c and pdf-op-buffer.c16:48.10 
kens If the incoming text is in a regular Font, and uses ASCII character codes, then its searchable in Acrobat. But if you turn it into a CIDFont, then it isn't16:48.19 
tor8 for the two implementations I've mentioned so far16:48.20 
mvrhel_laptop tor8: ok great. what is the third interface?16:48.31 
tor8 kens: even if we create a ToUnicode table?16:48.34 
kens OK if you create a ToUnicode then its fine16:48.45
tor8 mvrhel_laptop: that's the fancy implementation -- pdf-op-filter.c16:48.46 
  it tracks state changes and omits redundant ones16:49.03 
  so you get a minimized content stream out16:49.10 
kens has an enhancement bug somewhere to create ToUnicode CMaps in pdfwrite if we don't have one and think we can.16:49.48 
tor8 so the pdf_new_filter_processor takes another processor as an argument and forwards the "optimized" calls to it16:50.02 
mvrhel_laptop kens: you can borrow what I wrote for mupdf16:50.03 
kens mvrhel_laptop : That's not the problem :-)16:50.14 
mvrhel_laptop tor8: ok that makes sense16:50.17 
tor8 so if you chain the filter and the buffer processor together you can get a nicely formatted and somewhat cleaned up content stream out16:50.39
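A toy sketch of the processor arrangement described above: an interface that is a bag of function pointers, a "buffer" implementation that prints operators back out, and a "filter" implementation that forwards to a chained processor while keeping q/Q balanced. Everything here is hypothetical (the `sk_` names, the single `op_other` standing in for the many real operators, and the exact balancing policy); it is not the code in pdf-op-buffer.c or pdf-op-filter.c.

```c
#include <stdio.h>
#include <string.h>

typedef struct sk_processor sk_processor;
struct sk_processor
{
	void (*op_q)(sk_processor *proc);
	void (*op_Q)(sk_processor *proc);
	void (*op_other)(sk_processor *proc, const char *text); /* stand-in for all other operators */
	void (*finish)(sk_processor *proc);
};

/* Buffer processor: writes each operator back out as text. */
typedef struct
{
	sk_processor super; /* must be first so the pointers can be cast */
	char buf[256];
} sk_buffer_processor;

static void buf_emit(sk_processor *proc, const char *s)
{
	sk_buffer_processor *p = (sk_buffer_processor *)proc;
	size_t used = strlen(p->buf);
	snprintf(p->buf + used, sizeof p->buf - used, "%s\n", s);
}
static void buf_q(sk_processor *proc) { buf_emit(proc, "q"); }
static void buf_Q(sk_processor *proc) { buf_emit(proc, "Q"); }
static void buf_finish(sk_processor *proc) { (void)proc; }

static void sk_init_buffer_processor(sk_buffer_processor *p)
{
	p->super.op_q = buf_q;
	p->super.op_Q = buf_Q;
	p->super.op_other = buf_emit;
	p->super.finish = buf_finish;
	p->buf[0] = 0;
}

/* Filter processor: forwards to 'chain', dropping stray Q's and closing open q's. */
typedef struct
{
	sk_processor super;
	sk_processor *chain;
	int depth;
} sk_filter_processor;

static void flt_q(sk_processor *proc)
{
	sk_filter_processor *p = (sk_filter_processor *)proc;
	p->depth++;
	p->chain->op_q(p->chain);
}
static void flt_Q(sk_processor *proc)
{
	sk_filter_processor *p = (sk_filter_processor *)proc;
	if (p->depth == 0)
		return; /* a Q with no matching q is dropped */
	p->depth--;
	p->chain->op_Q(p->chain);
}
static void flt_other(sk_processor *proc, const char *text)
{
	sk_filter_processor *p = (sk_filter_processor *)proc;
	p->chain->op_other(p->chain, text);
}
static void flt_finish(sk_processor *proc)
{
	sk_filter_processor *p = (sk_filter_processor *)proc;
	while (p->depth > 0)
		flt_Q(proc); /* close anything the page left open */
	p->chain->finish(p->chain);
}

static void sk_init_filter_processor(sk_filter_processor *p, sk_processor *chain)
{
	p->super.op_q = flt_q;
	p->super.op_Q = flt_Q;
	p->super.op_other = flt_other;
	p->super.finish = flt_finish;
	p->chain = chain;
	p->depth = 0;
}
```

Once the filter has finished, the save/restore stack is balanced, so extra drawing commands (for example an image placement) can be appended to the buffered stream without inheriting leftover state from the page.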
kens GS already has loads of code to write ToUnicode, its identifying the condition where such a thing is sensible16:50.42
tor8 this is what the '-s' flag to mutool clean does16:50.45 
  so if we're editing pages to add images on top, etc. what I think we could do is run the page through said filters and then just concatenate on the graphics commands to draw the image.16:51.22 
  after the 'pdf-op-filter' pass, the q and Q's should be balanced and the graphics state is back to the initial defaults16:51.55 
  kens: yeah, given garbage in we're going to have garbage out. I expect this will fail in some cases; I think mvrhel is building the ToUnicode from the actual font cmaps16:52.50 
mvrhel_laptop tor8: ok. I believe I follow that. So is the added content stored with the existing content in a pdfobj at that point?16:52.59 
tor8 so if those are missing, we're in trouble16:53.01 
kens tor8 fonts don't have CMaps, only CIDFonts :-)16:53.10 
  But it shouldn't be any worse than the original anyway16:53.28 
tor8 mvrhel_laptop: I haven't got that far. the current code just gives you the primitive operations; nothing is tied into automatically updating pdf_obj's and associated streams.16:53.40
mvrhel_laptop tor8: sorry16:53.50 
tor8 kens: lowercase truetype cmaps, sorry for the confusion16:53.58 
kens :-)16:54.05 
tor8 kens: or glyph names16:54.06 
mvrhel_laptop gawd. more font syntax goofiness 16:54.18 
kens Yeah glyph names was my approach, we have the Adobe Glyph List already in GS16:54.25
tor8 mvrhel_laptop: I took a quick glance through your latest commits.16:56.59 
  It looks like you suffer the same affliction as Robin and most other people who use syntax coloring...16:57.20 
mvrhel_laptop what is that16:57.45 
tor8 you don't put blank lines before a new section is introduced with a comment, for example in pdf_add_cid_to_unicode at the "Now output non-zero entries" comment I would've put a blank line before16:58.10 
mvrhel_laptop tor8: ah ok16:59.02 
tor8 I possibly overdo the blank lines personally; I think about code in paragraphs. and it helps me navigate with my editor, there are quick short cuts to skip to the next/prev blank line16:59.08 
mvrhel_laptop I can see where without color that would be harder to see16:59.14 
tor8 I haven't used syntax coloring in over a decade; I find it extremely distracting.16:59.38 
mvrhel_laptop and I can understand the editor usefulness16:59.39
  I love it16:59.44 
  I will fix these16:59.50 
tor8 turn it off for a few weeks; code formatting will only improve once you don't have coloring as a crutch ;)17:00.19 
mvrhel_laptop what about at line 2032 17:00.34
  do you put a blank line before that comment?17:00.47 
  the line before it is a {17:00.53 
tor8 mvrhel_laptop: just a sec, I'm looking in gitk17:01.01 
  mvrhel_laptop: nah, the '{' on a line of its own is enough of a paragraph separator for me17:02.07 
mvrhel_laptop tor8: ok good. oh also a question. for pdfcreate do we want to add a command line option to use either the simple / or type 0?17:02.23 
tor8 I'd vote for simple, or distinguish using a different command line flag to add the font?17:02.50 
  the use case for the tool is to hand craft simple pdf files; and there the simple fonts would be easier to work with17:03.27 
mvrhel_laptop yes17:03.32 
  tor8: ok. great. thanks for all the details about the options after the pdf content stream parser. I will take a look at this once I get pdfwrite changes in place17:05.04 
  maybe around that time we can merge this work in?17:05.12 
  perhaps after the release though17:05.23 
  I worry about the testing for pdfwrite17:05.29 
  do we currently have any regression testing in place for it?17:05.42 
tor8 mvrhel_laptop: yes. I wouldn't worry much, I certainly hope we don't have anybody actually using our pdfwrite yet.17:05.48 
  I'm not sure about testing, Robin would know17:06.02 
mvrhel_laptop gsview uses it a lot17:06.07 
  or at least I use it quite a bit with gsview17:06.13 
tor8 mvrhel_laptop: ah!17:06.20 
mvrhel_laptop mainly for expanding content17:06.23 
tor8 good :)17:06.24 
  we have a user!17:06.32 
mvrhel_laptop anytime I have a pdf file that I need to fool with I open with gsview and save as expanded pdf17:06.50 
tor8 mvrhel_laptop: typedef struct resource_tables_s resource_tables; that publicly visible struct really needs to have our pdf_ namespace prefix17:07.25
mvrhel_laptop tor8: ok. that makes sense17:07.39 
tor8 same with res_table and res_search_fn17:07.57 
mvrhel_laptop ok17:08.07 
tor8 if structs and functions are local to a file, I don't mind skipping the prefix but we shouldn't pollute the namespace with externally visible symbols17:08.46 
mvrhel_laptop right. 17:08.56 
tor8 I'll probably go and make you rename every single function once we're ready to merge. we haven't been perfectly consistent in our naming,17:09.54 
  but I have tried to make some effort to clean up some of the places lately. as you've no doubt ran into and had to fix your code for.17:10.24 
  I'm sure you've already looked at docs/naming.txt ... I ought to add some more examples and motivations to that document.17:11.09 
  and I'm sure there are things I've forgot to put in it in the first place17:11.30 
mvrhel_laptop tor8: yes. no problem. I will take another look at that document also.17:11.47 
  tor8: tbh I may not have read naming.txt before or it may have been a long time ago. Just been copying the style that I saw. This clears up a few things... 17:17.46 
tor8 mvrhel_laptop: the pdf_obj function naming does not follow the style guide17:18.29 
  Robin prefers functions of the noun_verb kind (object oriented) whereas I prefer verb_noun (it reads better, and I prefer functional programming to o.o.)17:19.19 
  and we've had verb_noun since the beginning even though sometimes it would be clearer with noun_verb17:20.12 
  conversion functions are generally of the x = x_from_y(y) form, easier to keep track of what's what with the alliteration of variable names and the function name parts17:21.32
  x = y_to_x(y) is just awkward, IMO17:21.46 
mvrhel_laptop certainly it is good to pick one and do it for all17:22.08 
Robin_Watts mvrhel_laptop: You get used to it. tor has just completely renamed everything in the java stuff I did :)17:24.25 
  I note that we now have to_Matrix rather than fz_matrix_to_Matrix.17:24.42 
mvrhel_laptop tor8: added the pdf prefix to the visible structures. question for you. should pdf_resource_table_free not use the "free" wording since it is not really rc'd? I see in naming.txt you say that that word is reserved for rc schemes. I could change it to release17:33.52
  just trying to reduce the amount of rewrite as I move forward17:34.32 
Robin_Watts keep/drop are the rc words.17:36.12 
  free seems fine to me, I think.17:37.03 
mvrhel_laptop ok17:40.04 
  bbiab17:52.04 
tor8 mvrhel_laptop: pdf_drop_resource_table is the preferred naming22:22.33 
  Robin_Watts: we use drop for both free and refcounted things. saves us having to rename (and remember which is which) should we change22:23.13 
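A small sketch of the keep/drop convention mentioned above, using verb_noun names. The type `sk_table` and the counter `sk_live_tables` are invented for illustration and are not MuPDF API. The point is that callers always write `drop`, whether or not the object happens to be reference counted today.

```c
#include <stdlib.h>

typedef struct
{
	int refs;
} sk_table;

static int sk_live_tables = 0; /* instrumentation for this sketch only */

/* Create a table holding one reference. Returns NULL on allocation failure. */
static sk_table *sk_new_table(void)
{
	sk_table *t = malloc(sizeof *t);
	if (!t)
		return NULL;
	t->refs = 1;
	sk_live_tables++;
	return t;
}

/* Take another reference; returns its argument so calls can be nested. */
static sk_table *sk_keep_table(sk_table *t)
{
	if (t)
		t->refs++;
	return t;
}

/* Give up one reference; the table is freed when the last one goes. */
static void sk_drop_table(sk_table *t)
{
	if (t && --t->refs == 0)
	{
		free(t);
		sk_live_tables--;
	}
}
```

If such a type later stopped being reference counted, `sk_drop_table` could simply free it directly and no call site would need renaming, which is the saving described above.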