[gs-bugs] [Bug 692308] New: improve extracting text in right-to-left alphabets

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Tue Jun 28 14:42:17 UTC 2011


http://bugs.ghostscript.com/show_bug.cgi?id=692308

           Summary: improve extracting text in right-to-left alphabets
           Product: MuPDF
           Version: unspecified
          Platform: PC
               URL: http://code.google.com/p/sumatrapdf/issues/detail?id=1466
        OS/Version: Windows 7
            Status: NEW
          Severity: normal
          Priority: P4
         Component: mupdf
        AssignedTo: tor.andersson at artifex.com
        ReportedBy: zeniko at gmail.com
         QAContact: gs-bugs at ghostscript.com


Adobe Reader is much more successful than MuPDF at extracting text, e.g. from
http://www.ice.gov/doclib/sevis/pdf/sevis_arabic_fs.pdf (one of the first
results from http://www.google.com/search?q=arabic+ext%3Apdf ). This seems
partly due to dev_text not expecting RtL text and inserting too many
unintended line breaks, and partly due to differences in Unicode normalization.
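
To illustrate the two issues mentioned above, here is a minimal Python sketch.
It is not MuPDF code and does not reflect how dev_text works internally. The
function name visual_run_to_logical is hypothetical, and the sketch assumes a
run made up purely of right-to-left glyphs (no digits or Latin text) delivered
in left-to-right visual order, with Arabic presentation forms as they are often
stored in PDFs. Under those assumptions, reversing the run and then applying
NFKC normalization recovers the base letters in reading order.

```python
import unicodedata


def visual_run_to_logical(glyphs: str) -> str:
    """Convert a purely right-to-left run from visual to logical order.

    Assumption: `glyphs` holds one code point per glyph, ordered left to
    right as drawn on the page, and contains only RTL characters.
    """
    # Reverse first, so that a ligature such as LAM WITH ALEF is still a
    # single code point and later expands in the correct letter order.
    logical = glyphs[::-1]
    # NFKC folds Arabic presentation forms (U+FB50..U+FEFF) to base letters.
    return unicodedata.normalize("NFKC", logical)


# "salaam" as drawn: MEEM isolated, LAM WITH ALEF final, SEEN initial.
visual = "\uFEE1\uFEFC\uFEB3"
logical = visual_run_to_logical(visual)
print([f"U+{ord(c):04X}" for c in logical])
```

A real extractor would need the full Unicode bidirectional algorithm for mixed
runs; plain reversal is only adequate for the simplified case assumed here.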


