Search results
Default PDF and file viewer for GNOME; replaces GPdf. Supports addition and removal (since v3.14) of basic text note annotations. CUPS: Apache License 2.0: No No No Yes. Printing system that can render any document to a PDF file, so any Linux program with print capability can produce PDF files. Pdftk: GPLv2: No Yes Yes
poppler-utils is a collection of command-line utilities built on Poppler's library API for managing PDF files and extracting their contents: pdfattach – adds a new embedded file (attachment) to an existing PDF; pdfdetach – extracts embedded documents from a PDF; pdffonts – lists the fonts used in a PDF
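As a rough illustration of how the three utilities named above might be driven from a script, here is a minimal Python sketch. The helper names and the file names are hypothetical, and the argument order (in particular `pdfattach input.pdf attachment output.pdf` and `pdfdetach -list`) is taken from the tools' manual pages as best understood here, so it should be checked against `man pdfattach` and `man pdfdetach` on the target system.

```python
import shutil
import subprocess
from typing import List, Optional


def pdffonts_cmd(pdf: str) -> List[str]:
    """Build the argv for listing the fonts used in a PDF."""
    return ["pdffonts", pdf]


def pdfdetach_list_cmd(pdf: str) -> List[str]:
    """Build the argv for listing embedded files (assumes the -list option)."""
    return ["pdfdetach", "-list", pdf]


def pdfattach_cmd(src_pdf: str, attachment: str, dst_pdf: str) -> List[str]:
    """Build the argv for embedding `attachment` into a copy of `src_pdf`."""
    return ["pdfattach", src_pdf, attachment, dst_pdf]


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Run the command and return its stdout, or None if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # "report.pdf" is a placeholder file name for illustration only.
    output = run_if_available(pdffonts_cmd("report.pdf"))
    print(output if output is not None else "pdffonts is not installed")
```

Building the argument list separately from running it keeps the sketch usable on machines where poppler-utils is not installed.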
Layout analysis software that divides scanned documents into zones suitable for OCR; graphical interfaces to one or more OCR engines; software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
Sumatra PDF is a free and open-source document viewer that supports many document formats including: Portable Document Format (PDF), Microsoft Compiled HTML Help (CHM), DjVu, EPUB, FictionBook (FB2), MOBI, PRC, Open XML Paper Specification (OpenXPS, OXPS, XPS), and Comic Book Archive file (CB7, CBR, CBT, CBZ). [3]
Sigil supports code-based editing of EPUB files, as well as the import of HTML and plain text files. [2] [3] A companion application, PageEdit, allows WYSIWYG editing of EPUB files. Sigil has been developed by Strahinja Val Marković and others since 2009. From July 2011 to June 2015 John Schember was the lead developer. [4]
Dictionary Builder is a Rust program that can parse XML dumps and extract entries into files; Scripts for parsing Wikipedia dumps – Python-based scripts for parsing sql.gz files from Wikipedia dumps; parse-mediawiki-sql – a Rust library for quickly parsing the SQL dump files with minimal memory allocation
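To give a feel for what "parsing an XML dump and extracting entries" involves, the following is a small Python sketch using the standard library's streaming parser. It is not how any of the tools listed above work internally; the sample data is a simplified, hypothetical stand-in for a dump, and the function name `iter_entries` is invented for illustration. Real dumps typically declare an XML namespace, in which case the tag names would need to be namespace-qualified.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple

# Simplified, hypothetical dump: no XML namespace, two tiny pages.
SAMPLE_DUMP = b"""<mediawiki>
  <page><title>apple</title><revision><text>A fruit.</text></revision></page>
  <page><title>pear</title><revision><text>Another fruit.</text></revision></page>
</mediawiki>"""


def iter_entries(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) pairs one page at a time without loading the whole file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title", default="")
            text = elem.findtext("revision/text", default="")
            yield title, text
            # Drop the page's children so memory use stays low on large dumps.
            elem.clear()


entries = list(iter_entries(io.BytesIO(SAMPLE_DUMP)))
for title, text in entries:
    print(f"{title}: {text}")
```

Streaming with `iterparse` matters here because full dumps can be far larger than available memory.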
The plain text format doesn't support DRM or formatting options (such as different fonts, graphics or colors). It has excellent portability as it is the simplest e-book encoding possible; a plain text file contains only ASCII or Unicode text (text files with UTF-8 or UTF-16 encoding are also popular for languages other than English). Almost all ...
Template filling: Extracting a fixed set of fields from a document, e.g. extract perpetrators, victims, time, etc. from a newspaper article about a terrorist attack. Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks.
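Template filling as described above can be sketched with a toy rule-based extractor. The example below is only an illustration under strong assumptions: the field names, the regular expressions, and the sample sentence are all hypothetical, and practical systems generally rely on trained models rather than hand-written patterns.

```python
import re
from typing import Dict, Optional

# Hypothetical fixed set of fields, each with one naive pattern.
FIELD_PATTERNS = {
    "time": re.compile(r"\bon (\w+day)\b"),
    "location": re.compile(r"\bin ([A-Z][a-z]+)\b"),
}


def fill_template(document: str) -> Dict[str, Optional[str]]:
    """Return one template: every field is present, set to None if not found."""
    template: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(document)
        template[field] = match.group(1) if match else None
    return template


article = "A storm caused flooding in Springfield on Tuesday."
print(fill_template(article))
```

The template always has the same keys, which is what distinguishes template filling from event extraction, where the number of output templates depends on the document.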