| <<<Back 1 day (to 2020/08/12) | Fwd 1 day (to 2020/08/14)>>> | 20200813 |
pedr0 | I've a question regarding the fz_quad struct. are 'ul, ur, ll, lr' upper-left upper-right lower-left, lower-right ? | 11:34.12 |
Robin_Watts_ | they are the images of those points after transformation, yes. | 11:37.20 |
| (at least, that's what I believe) | 11:37.34 |
malc_ | a confidence inspiring comment from one of the authors.. | 11:40.42 |
Robin_Watts_ | I didn't code any quad stuff :) | 11:41.55 |
| (and Acrobat disagrees with the spec in at least one case in terms of the ordering of points used for highlight regions) | 11:42.34 |
malc_ | excuses excuses :) | 11:42.59 |
| ouch | 11:43.07 |
Robin_Watts_ | (so it's just possible that the quads might actually be in an unexpected order, hence my caution) | 11:43.07 |
pedr0 | :-) | 11:44.02 |
| I see, but the coordinate space is (0,0) bottom left of the plane/page | 11:45.04 |
malc_ | pedr0: innocent questions only appear to be that on a very shallow inspection, look deeper and the con of worms presents itself | 11:45.12 |
pedr0 | :-)) | 11:45.53 |
malc_ | con... sigh... can... | 11:46.38 |
malc_ | crawls back under his rock | 11:46.57 |
pedr0 | eheh | 11:47.10 |
Robin_Watts_ | pedr0: For PDF, yes. For mupdf, no :) | 11:47.45 |
| basically, you shouldn't assume anything about the position of the quads. | 11:48.22 |
| If the page is rotated, then obviously they'll be completely messed up. | 11:48.39 |
| All you need to know is that they are 4 positions on the page that are the corners of the highlights. | 11:49.07 |
kens | IIRC rectangles have to specify two opposite corners, they need not be lower left and upper right | 11:49.25 |
| In PDF | 11:49.35 |
Robin_Watts_ | kens: Right. But the image of those rectangles after transformation may not be axis aligned. Hence why corners of such rectangles are given as quads. | 11:50.40 |
kens | Which is why Ghostscript's PDF interpreter has code specifically for 'normalising' rectangles | 11:50.47 |
| OK I wasn't certain if the question was PDF or MuPDF | 11:51.24 |
Robin_Watts_ | See page 634 of pdf_reference17.pdf | 11:51.25 |
| It's both a PDF and a MuPDF thing. | 11:51.39 |
pedr0 | I see, but given the stext 'view' is there an easy way to know the coordinates *on the screen* of a given <char> ? | 12:08.42 |
Robin_Watts_ | Let's talk about a concrete example. | 12:09.22 |
| platform/win32/debug/mutool.exe draw -o - -F stext ../MyTests/pdf_reference17.pdf 1 | 12:10.43 |
| That gives me an stext dump for page 1 of pdf_reference17.pdf - I believe you have the same file, yes? | 12:11.10 |
pedr0 | one jiff | 12:11.20 |
| I have the book, a hard copy of a previous version .. | 12:11.52 |
Robin_Watts_ | ok, so fetch http://ghostscript.com/~robin/pdf_reference17.pdf and you have the file too :) | 12:12.17 |
pedr0 | thanks, doing it | 12:12.43 |
| ok, here I am | 12:14.39 |
| got it, right in front of me | 12:14.52 |
Robin_Watts_ | So, for the first character on the page, a 'P', I get: | 12:14.55 |
| <char quad="119.94 61.226316 146.772 61.226316 119.94 115.898319 146.772 115.898319" x="119.94" y="103.898319" color="#000000" c="P"/> | 12:15.04 |
pedr0 | Yap | 12:15.09 |
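A minimal Python sketch of reading one such `<char>` element, in case it helps to follow the numbers discussed next. The function name `parse_char` is hypothetical, and the corner order ul, ur, ll, lr is an assumption inferred from the values in the example line, not something stated in the log.

```python
import xml.etree.ElementTree as ET

# The <char> line quoted above, copied verbatim from the stext dump.
CHAR_XML = (
    '<char quad="119.94 61.226316 146.772 61.226316 '
    '119.94 115.898319 146.772 115.898319" '
    'x="119.94" y="103.898319" color="#000000" c="P"/>'
)

def parse_char(xml_text):
    """Return the character, its origin and its four quad corners."""
    elem = ET.fromstring(xml_text)
    nums = [float(v) for v in elem.get("quad").split()]
    # Pair the eight numbers up as (x, y) points.
    corners = list(zip(nums[0::2], nums[1::2]))
    # ASSUMPTION: the corners are emitted in the order ul, ur, ll, lr.
    names = ("ul", "ur", "ll", "lr")
    return {
        "c": elem.get("c"),
        "origin": (float(elem.get("x")), float(elem.get("y"))),
        "quad": dict(zip(names, corners)),
    }

info = parse_char(CHAR_XML)
print(info["c"], info["origin"], info["quad"]["ul"])
```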
Robin_Watts_ | So, the 'origin' for that character is at the x/y position. 119.94, 103.898319 | 12:15.38 |
| The origin being (for latin fonts at least), the left hand side on the baseline of the char. | 12:16.01 |
pedr0 | relative to the page ? | 12:16.15 |
kens | Fonts have the origin at bottom left, even for non-Latin glyphs | 12:16.24 |
pedr0 | x/y of the box describing the page or the glyph ? | 12:16.37 |
Robin_Watts_ | I haven't mentioned any box yet. | 12:17.10 |
pedr0 | sorry | 12:17.15 |
Robin_Watts_ | The x,y of the origin of the glyph is at the point I said. | 12:17.39 |
| This is in mupdf coordinates, with (0,0) being the top left of the page, and y increasing downwards. | 12:18.28 |
| the page being 531 wide by 666 down. | 12:19.16 |
| OK so far? | 12:19.18 |
pedr0 | yes | 12:19.21 |
Robin_Watts_ | So, now let's look at the quad. | 12:19.31 |
| The quad gives 4 points. | 12:19.43 |
| If you draw the convex hull of those 4 points, you'll get a box that encloses the glyph. | 12:20.04 |
| If you're happy to ignore non-axis aligned glyphs (or page rotations), then you can find the bbox for the glyph by taking the union of those 4 points. | 12:21.33 |
| Does that make sense? | 12:21.52 |
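The "union of those 4 points" can be sketched in Python as a min/max over the corners. The helper name `quad_bbox` is hypothetical, and as noted above the result only hugs the glyph when the quad is axis aligned (no glyph or page rotation).

```python
def quad_bbox(points):
    """Smallest axis-aligned box (x0, y0, x1, y1) containing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# The four corners of the 'P' quad from the stext dump above.
p_quad = [
    (119.94, 61.226316),
    (146.772, 61.226316),
    (119.94, 115.898319),
    (146.772, 115.898319),
]
print(quad_bbox(p_quad))
```

Because only min and max are taken, the order in which the four corners are listed does not matter here.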
pedr0 | Let me try to digest this a little bit | 12:24.25 |
| the coordinate of the quad, are they MUPDF or 'PDF' ? | 12:25.01 |
| I think they are MUPDF, but I am double checking | 12:25.14 |
| No, I am confused, but it isn't your explanation; it's that I am not sufficiently acquainted with the matter. | 12:30.31 |
| The reason why I am banging my head is that I am drawing a PDF using PDF.js on a canvas, fair enough, I set the scale to be 1, hence not scaled. Then I draw the quads on top of it and I can tell that the positions don't match at all | 12:45.28 |
| Although I am starting to be skeptical about the rendering as well. | 12:46.32 |
Robin_Watts | Sorry, powercut. | 13:17.33 |
| all the coords are mupdf, obviously. Mixing them within a file would be mad. | 13:17.50 |
pedr0 | :-) | 13:22.04 |
| I found the problem, it all makes sense. I still don't get why a rotation messes the quads up as I thought they would 'include' the rotation. But I definitely ignorant on the matter and I don't need to understand everything in a single shot | 13:23.34 |
| *I am* | 13:23.48 |
Robin_Watts | "messes the quads up" ? | 13:25.04 |
| I suspect that 'ul' 'll' etc are in terms of the pre-transformed text objects. | 13:26.07 |
| after the transformation, the upper left corner may not be the upper left corner any more. | 13:26.28 |
| (in terms of the position on the page) | 13:26.41 |
ator | Robin_Watts: that's why I named them ul ur ll lr because adobe can't keep the numeric ordering of them straight | 13:36.05 |
| mupdf uses its own coordinate system that it shares with all the possible input formats | 13:36.22 |
| pdf.js uses the pdf coordinate system | 13:36.28 |
| usually the translation between them is trivial, but rotation and UserUnit can easily mess you up | 13:36.44 |
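For the trivial case, the translation is just a flip of the y axis. A minimal Python sketch, assuming an unrotated page, a UserUnit of 1, and a page box whose lower-left corner sits at the PDF origin; the function name `mupdf_to_pdf` is hypothetical, and the page height of 666 is the figure quoted earlier in the log.

```python
def mupdf_to_pdf(x, y, page_height):
    """Map a top-left-origin, y-down point to a bottom-left-origin, y-up point.

    ASSUMPTIONS: no page rotation, UserUnit of 1, and the page box starts
    at (0, 0) in PDF space. None of these hold in general.
    """
    return (x, page_height - y)

# Origin of the 'P' from the stext dump, on a page that is 666 units tall.
pdf_x, pdf_y = mupdf_to_pdf(119.94, 103.898319, 666.0)
print(pdf_x, pdf_y)
```

The same function maps back the other way, since flipping twice returns the original point. With rotation or a non-default UserUnit a full matrix transform would be needed instead.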
pedr0 | Robin_Watts: does the same reasoning apply to x,y ? | 14:00.47 |
| They are coordinates before the text-object transformation | 14:01.06 |
Robin_Watts | pedr0: All coords given are post transformation. | 14:01.19 |
pedr0 | Ah, I see. You meant the 'names' are in terms of the pre-transformed object | 14:02.11 |
| ul, ll .. | 14:02.15 |
Robin_Watts | yes. | 14:02.15 |
pedr0 | Thanks :-) | 14:02.20 |