| | 2011/11/30 |
robin_watts_mac | chrisl_x100e: ping? | 00:13.43 |
chrisl_x100e | robin_watts_mac: pong | 00:14.29 |
robin_watts_mac | you up for food? | 00:14.42 |
chrisl_x100e | sure - something light, I reckon | 00:14.57 |
robin_watts_mac | yeah. 2 mins? | 00:15.08 |
chrisl_x100e | Okay, see you then | 00:15.20 |
Alasdair_ | I have an issue with jbig2dec... is anyone there? | 07:27.34 |
| Hello? | 07:31.23 |
chrisl_x100e | Looks like Ken's flight left pretty much on time - I don't know what flight Tor is supposed to be on from Heathrow. | 13:06.04 |
robin_watts_mac | chrisl_x100e: Tor's flight landed at 11:38. | 13:08.04 |
| next one leaves at 14:15 | 13:08.12 |
| heathrow reporting no delays. | 13:08.20 |
| so he should be fine. | 13:08.24 |
chrisl_x100e | That's an AA flight? | 13:08.27 |
robin_watts_mac | BA codesharing with AA, yes. | 13:08.37 |
| AA filed for Chapter 11 the other day. | 13:08.44 |
chrisl_x100e | I saw that - the experts expect "no change in service", shame, it could do with improving...... | 13:09.24 |
| I think the independent flight info site I looked at reckoned on a 20 minute delay for Tor's second flight, but still looking positive! | 13:10.23 |
robin_watts_mac | 20 minute delay to *second* flight is good :) | 13:10.54 |
chrisl_x100e | Yeah, fingers crossed the status doesn't change...... | 13:11.49 |
AlecTaylor | hi | 20:36.42 |
ghostbot | privet | 20:36.42 |
AlecTaylor | Are Page numbers stored in a special dictionary inside an XObject? | 20:37.11 |
sebras | AlecTaylor: no, page numbers are drawn on the page using the same content stream commands used to draw any text. | 20:38.40 |
| AlecTaylor: except in certain circumstances where page numbers may be pictures or line art (e.g. vector graphics). | 20:39.16 |
| AlecTaylor: this is what is visible on the page when rendered. | 20:39.36 |
AlecTaylor | So not in a dictionary at all? | 20:40.21 |
| Would it be wrong to say "page numbers from the page stream"? | 20:40.36 |
sebras | AlecTaylor: then you have the page names that appear in Adobe Reader's page-number box. those I believe may be stored in a dictionary somewhere, mostly because these sometimes use roman numerals... | 20:40.43 |
AlecTaylor | Yeah, that's what I'm asking about | 20:41.00 |
| You wouldn't happen to know the specifics of where and how they're stored would you? | 20:41.21 |
sebras | AlecTaylor: so you are not concerned with what is drawn on the page? | 20:41.36 |
AlecTaylor | sebras: I am very much concerned about making the two equal | 20:41.54 |
| :) | 20:41.55 |
AlecTaylor | will then offer this as an extension to MuPDF :P | 20:42.04 |
sebras | AlecTaylor: no, not at this moment. if those page-numbers are stored somewhere then I guess it is possible to find in the pdf spec. but I have never been bothered to look it up. :) | 20:42.42 |
AlecTaylor | :S! | 20:42.52 |
sebras | AlecTaylor: may I ask why you are concerned about making the two equal? is it just a personal gripe of yours or is there an application where it does indeed matter? | 20:44.10 |
AlecTaylor | Table of contents, readability, logical-structure coherence, accessibility, citations | 20:55.56 |
sebras | AlecTaylor: since you have been asking about this before I looked up the dictionary that I mentioned before. the strings that appear in Adobe Reader's page-number box are referred to as page labels, and you can read about them in chapter 8.3.1 of PDF spec 1.7. table 3.25 refers to the tree of page labels. note though, that this has nothing to do with what is actually drawn on the page itself. | 20:56.29 |
AlecTaylor | has successfully extracted page numbers out into XML :) | 20:57.00 |
| (now I just need to figure out how to put them back in :P) | 20:57.12 |
| sebras: did my previous answer answer your question? | 20:57.23 |
sebras | AlecTaylor: but can you do that for any PDF? what if the entire page in a PDF is covered by a picture? consider e.g. PDFs of scanned books (which may or may not be searchable, i.e. contain hidden text)... | 20:58.26 |
| AlecTaylor: I believe you should try to understand more about the problem domain before you attack the problem, otherwise you risk investing a lot of time in something that might not be worthwhile doing. or only get a solution to your problem that works for a handful of cases. | 21:00.31 |
AlecTaylor | sebras: I have read over 20 articles on this issue | 21:02.53 |
| sebras: It is standard to employ an OCR step | 21:03.20 |
sebras | AlecTaylor: yup, you would need to do something like that. OCR is an interesting topic on its own, but I doubt that it makes sense to add to a PDF viewer, only for the purpose of renumbering pages. | 21:06.35 |
AlecTaylor | No no, I was thinking the pdfclean would be the item to extend | 21:07.01 |
| :) | 21:07.02 |
sebras | AlecTaylor: ok, so let's continue with my previous example of PDFs of scanned books. let's say that you render the pages, do the OCR step, and figure out what text on the page is the page number. do you then propose to alter those pixels, or just put a big white box on top of those and use some embedded font to draw the corrected page number in that very same spot? | 21:10.20 |
| and by pixels I'm thinking of the pixel in the image embedded inside the PDF. | 21:10.52 |
AlecTaylor | 3.6.2 | 21:11.42 |
| sebras: Nope! - No need to edit the page itself, just what is shown in the PDF reader | 21:12.16 |
sebras | AlecTaylor: just so we are using the same terminology: what you want to change is the contents of the box saying "1 of 10" in this screenshot? http://goo.gl/kYS4b | 21:15.13 |
| oh, and yes 3.6.2 is indeed the subchapter describing the document catalog of a PDF where there is a reference to the page labels. | 21:16.28 |
| AlecTaylor: are we referring to the same thing? | 21:27.33 |
mvrhel_laptop | robin_watts_mac: what are you all doing for dinner tonight? | 21:54.12 |
AlecTaylor | sebras: Sorry, little off and on with IRC, working in a VirtualBox VM, and IRC is on my host | 22:03.55 |
| and yes, we're on the same page (pun intended!) | 22:04.21 |
sebras | AlecTaylor: ok, then I recommend that you take a look at chapter 8.3.1 and 3.6.2 to find out where those things are located in a PDF. | 22:13.15 |
AlecTaylor | Cheers, I will do that :) | 22:28.04 |
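Note: the page labels discussed above can be illustrated with a rough sketch. It assumes the /PageLabels number tree from the document catalog has already been read and flattened into a plain dict mapping the first page index of each range to its label dictionary (keys S, P and St, as in the PDF spec). The names `ranges`, `page_label`, `to_roman` and `to_letters` are hypothetical, and none of this is MuPDF or pdfclean API. It only shows how a viewer might turn a page index into the string in its page-number box.

```python
from bisect import bisect_right

def to_roman(n):
    """Uppercase roman numeral for a positive integer."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

def to_letters(n):
    """A..Z for 1..26, then AA..ZZ for 27..52, and so on."""
    index = n - 1
    return chr(ord("A") + index % 26) * (index // 26 + 1)

def page_label(ranges, page_index):
    """Label for a zero-based page index.

    `ranges` is an assumed, already-flattened form of the number tree:
    {first_page_index: {"S": style, "P": prefix, "St": start}}.
    """
    starts = sorted(ranges)
    pos = bisect_right(starts, page_index) - 1
    if pos < 0:
        # Assumption: fall back to plain decimal numbering when no range applies.
        return str(page_index + 1)
    start = starts[pos]
    entry = ranges[start]
    prefix = entry.get("P", "")
    style = entry.get("S")
    if style is None:
        # No numbering style: the label is just the prefix.
        return prefix
    number = entry.get("St", 1) + (page_index - start)
    if style == "D":
        text = str(number)
    elif style == "R":
        text = to_roman(number)
    elif style == "r":
        text = to_roman(number).lower()
    elif style == "A":
        text = to_letters(number)
    elif style == "a":
        text = to_letters(number).lower()
    else:
        raise ValueError("unknown numbering style: %r" % style)
    return prefix + text

# Front matter in lowercase roman, body in decimal, appendix as A-8, A-9, ...
ranges = {0: {"S": "r"}, 4: {"S": "D"}, 7: {"S": "D", "P": "A-", "St": 8}}
labels = [page_label(ranges, i) for i in range(9)]
```

Changing what the reader shows then comes down to rewriting the entries of that tree, which leaves the page content streams (and any scanned images) untouched.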
mvrhel_laptop | robin_watts_mac: I think we are going out to eat at big pink. you could probably get a milk shake there.... | 22:59.05 |
AlecTaylor | Just come down here for a milkshake, it'll only take you 23 hours :P | 23:00.43 |
| (but it's worth it!) | 23:00.54 |
robin_watts_mac | mvrhel_laptop: ooh, where's that? | 23:01.10 |
| we're meeting scott/miles and going to south beach at 7ish... | 23:01.27 |
kens | Where ? | 23:01.33 |
robin_watts_mac | Hey, kens. | 23:01.33 |
kens | Just got in, where are you meeting ? | 23:01.51 |
robin_watts_mac | Lobby presumably. | 23:02.00 |
kens | OK, just got time to shower then :-) | 23:02.11 |
robin_watts_mac | We'll speak to scott and sort a time/place and say on here. | 23:02.25 |
kens | OK I'll leave it connected, thanks | 23:02.38 |
chrisl_x100e | Is Miles in then? | 23:02.46 |
kens | runs off to shower | 23:02.47 |
robin_watts_mac | chrisl_x100e: He's in Miami, may not be at hotel yet. | 23:03.09 |
chrisl_x100e | Great | 23:03.41 |
kens | Not boding well on the WiFi.... | 23:10.34 |
robin_watts_mac | Plan - meet in lobby at 7. | 23:45.01 |
kens | OK thanks, see you there | 23:45.11 |