Books, being a valuable source of knowledge and learning, have always been searched for on the Web.
Information Extraction (IE) can be very tricky when applied to digitized books for extracting structure and layout information, including the table of contents (TOC). For this purpose, several IE methods have been devised. These include using book layout analysis for extracting the TOC; using resurgence software for detecting different parts of books by considering typographical positions and book content, instead of the TOC, to detect parts, chapters, sections, and pages; using rule-based methods for extracting the TOC from books that have TOC pages, and SVM-based methods for books without TOC pages; and using layout analysis to identify the TOC and other functional regions, including chapters, paragraphs, and notes. Dejean and Meunier used four methods for extracting book structure: (i) detecting and parsing TOC pages, (ii) parsing index pages, (iii) using classical methods for TOC detection, and (iv) using trailing page whitespace methods.

This paper aims to provide a workflow optimization method for publishing houses and for electronic book (e-book) research in the field of digital publishing. Based on studies of publishing houses in Beijing, the present conversion workflow is illustrated using a functional modeling methodology. The workflow is then analyzed using the 5W1H (why, who, what, where, when, how) methodology and optimized with the ECRSI (Eliminate, Combine, Rearrange, Simplify, and Increase) principles. To validate the optimization effect, the workflows before and after optimization are generated and implemented in the ExtendSim® simulation software. The simulation results show that, under similar circumstances, both the quantity and the quality of the products improve after optimization, which indicates that the optimization method is effective. This research introduces traditional IE analytical techniques to the workflow optimization of e-book conversion. Compared with most other methods used to optimize workflows, this method is simpler, more efficient, and more suitable for e-book format conversion.

Browse and select the folder of your choice. The generated PDF files will be placed into this folder.
Reorder the files using the UP and DOWN arrows on the right-hand side of the window. If you are merging all EPUB files into one PDF file, it will contain the eBooks in the specified sequence.
Specify settings for all files by clicking on the icon at the bottom of the window. The newly displayed settings panel allows you to enter details such as the filename, the title for the PDF document, and publisher information. It is also possible to protect the generated PDF documents by entering a password in this panel.