Search results
poppler-utils is a collection of command-line utilities built on Poppler's library API for managing PDF files and extracting their contents: pdfattach – adds a new embedded file (attachment) to an existing PDF; pdfdetach – extracts embedded documents from a PDF; pdffonts – lists the fonts used in a PDF
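As a rough illustration of how two of these utilities might be driven from a script, the sketch below builds the command lines for pdffonts and for pdfdetach with its -list option, and runs them only if the tool is installed. The helper names poppler_command and run_poppler and the file name are hypothetical; pdfattach is left out because it takes a different argument order.

```python
import shutil
import subprocess

# Tools covered by this sketch; both accept "[options] PDF-file".
SUPPORTED_TOOLS = {"pdffonts", "pdfdetach"}


def poppler_command(tool, pdf_path, *options):
    """Build the argument list for a poppler-utils call (hypothetical helper)."""
    if tool not in SUPPORTED_TOOLS:
        raise ValueError(f"unsupported tool: {tool}")
    return [tool, *options, pdf_path]


def run_poppler(tool, pdf_path, *options):
    """Run the tool and return its standard output, or None if it is not installed."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        poppler_command(tool, pdf_path, *options),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # "example.pdf" is a placeholder file name.
    print(run_poppler("pdffonts", "example.pdf"))
    print(run_poppler("pdfdetach", "example.pdf", "-list"))
```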
As with Adobe Acrobat, Nitro PDF Pro's reader is free; unlike Adobe's free reader, however, Nitro's free reader allows PDF creation (via a virtual printer driver, by specifying a filename in the reader's interface, or by dragging and dropping a file onto Nitro PDF Reader's Windows desktop icon), and Ghostscript is not needed. PagePlus: Proprietary: No
Dictionary Builder is a Rust program that can parse XML dumps and extract entries in files; Scripts for parsing Wikipedia dumps – Python-based scripts for parsing sql.gz files from Wikipedia dumps; parse-mediawiki-sql – a Rust library for quickly parsing the SQL dump files with minimal memory allocation
Sigil is free, open-source editing software for e-books in the EPUB format. As a cross-platform application, Sigil is distributed for the Windows, macOS, Haiku and Linux platforms under the GNU GPL license. Sigil supports code-based editing of EPUB files, as well as the import of HTML and plain text files.
The hOCR format is most commonly used to make searchable PDF files or as extracted metadata for a PDF file. To create a searchable PDF file, we can combine a scanned document image with the .hocr file for that image. The following open source tools can be used to achieve that.
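To give a sense of what a .hocr file contributes, here is a minimal sketch that pulls each recognized word and its bounding box out of hOCR markup using only the Python standard library. It assumes words are marked as span elements with class ocrx_word and a title attribute containing a bbox property, with no spans nested inside a word; the names parse_bbox, HocrWordParser and extract_words are hypothetical, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for elements whose class includes ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "ocrx_word" in (attr_map.get("class") or "").split():
            self._in_word = True
            self._bbox = parse_bbox(attr_map.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no span is nested inside a word span.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr_markup):
    parser = HocrWordParser()
    parser.feed(hocr_markup)
    return parser.words


if __name__ == "__main__":
    sample = (
        "<span class='ocrx_word' title='bbox 10 20 50 40; x_wconf 95'>Hello</span> "
        "<span class='ocrx_word' title='bbox 60 20 110 40'>world</span>"
    )
    for text, bbox in extract_words(sample):
        print(text, bbox)
```

The word positions recovered this way are what allow an invisible text layer to be aligned with the scanned image in a searchable PDF.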
Sumatra PDF is a free and open-source document viewer that supports many document formats including: Portable Document Format (PDF), Microsoft Compiled HTML Help (CHM), DjVu, EPUB, FictionBook (FB2), MOBI, PRC, Open XML Paper Specification (OpenXPS, OXPS, XPS), and Comic Book Archive file (CB7, CBR, CBT, CBZ). [3]
Layout analysis software that divides scanned documents into zones suitable for OCR; graphical interfaces to one or more OCR engines; software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
The plain text format doesn't support DRM or formatting options (such as different fonts, graphics or colors). It has excellent portability as it is the simplest e-book encoding possible; a plain text file contains only ASCII or Unicode text (text files with UTF-8 or UTF-16 encoding are also popular for languages other than English). Almost all ...
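The difference between these encodings can be seen in a short sketch that encodes the same string as ASCII, UTF-8 and UTF-16 and reports the byte count for each. The function name encoded_sizes and the sample string are illustrative assumptions only.

```python
def encoded_sizes(text):
    """Return the encoded byte length per encoding, or None if encoding fails."""
    sizes = {}
    for encoding in ("ascii", "utf-8", "utf-16"):
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            # ASCII cannot represent characters outside its 128-character range.
            sizes[encoding] = None
    return sizes


if __name__ == "__main__":
    # Sample text with two non-ASCII characters (i with diaeresis, e with acute).
    print(encoded_sizes("na\u00efve caf\u00e9"))
    print(encoded_sizes("plain"))
```

Python's "utf-16" codec prepends a two-byte byte order mark, which is included in the reported size.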