[ dspace-Patches-2234659 ] Add support for DjVu-documents

SourceForge.net
Patches item #2234659, was opened at 2008-11-07 17:06
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=319984&aid=2234659&group_id=19984

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Serhij Dubyk (dubyk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add support for DjVu-documents

Initial Comment:
Hello All

This patch is based on
http://mailman.mit.edu/pipermail/dspace-general/2007-May/001513.html

For DSpace 1.5.0+, the following steps are needed (before compilation):

1) Install the djvutxt utility (package djvulibre); on Debian:
   apt-get install djvulibre-bin

2) Edit [dspace-source]/dspace/config/dspace.cfg, text block "### Media Filter / Format Filter plugins",
and add DjVu support in 3 places:

   filter.plugins = ... \
                DjVu Text Extractor

   plugin.named.org.dspace.app.mediafilter.FormatFilter = ... \
  org.dspace.app.mediafilter.DjVuFilter =  DjVu Text Extractor

   filter.org.dspace.app.mediafilter.DjVuFilter.inputFormats = DjVu

3) Edit [dspace-source]/dspace/config/registries/bitstream-formats.xml
and add the following:

  <bitstream-type>
          <mimetype>image/vnd.djvu</mimetype>
          <short_description>DjVu</short_description>
          <description>DjVu</description>
          <support_level>1</support_level>
          <internal>false</internal>
          <extension>djvu</extension>
          <extension>djv</extension>
  </bitstream-type>

4) Create the file [dspace-source]/dspace-api/src/main/java/org/dspace/app/mediafilter/DjVuFilter.java
with the following content:

/*
DjVuFilter.java
 Version: 0.1
 DSpace version: 1.4.2 beta
 Author: Ivan Penev
 e-mail: inpenev at gmail.com
*/

package org.dspace.app.mediafilter;

import java.io.InputStream;
import java.io.FileInputStream;
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.BufferedOutputStream;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.File;
 
/**
 * This class provides a media filter for processing files of type DjVu.
 * <p>The current implementation uses a program called
 <code>djvutxt</code>, which extracts the text layer from a previously
 OCR-ed DjVu file and saves it into a UTF-8 text document. The program
 is distributed with the <code>djvulibre</code> package which is freely
 available under the GPL license from <a
 href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
 for both Unix and Windows operating systems. Hence, for the media
 filter to work it is required that <code>djvutxt</code> is a valid
 command (in the working environment).</p>
*/

public class DjVuFilter extends MediaFilter
{
 /**
  * Get a filename for a newly created filtered bitstream.
  *
  * @param sourceName
  * name of source bitstream
  * @return filename generated by the filter - for example, document.djvu
  * becomes document.djvu.txt
 */

 public String getFilteredName(String sourceName)
 {
  return sourceName + ".txt";
 }
 
 /**
  * Get the name of the bundle in which this filter will store its generated bitstreams.
  *
  * @return "TEXT"
 */
 public String getBundleName()
 {
  return "TEXT";
 }
 
 /**
  * Get name of the bitstream format returned by this filter.
  *
  * @return "Text"
 */

 public String getFormatString()
 {
  return "Text";
 }
 
 /**
  * Get a string describing the newly-generated bitstream.
  *
  * @return "Extracted text"
 */

 public String getDescription()
 {
  return "Extracted text";
 }
 
 /**
  * Get a bitstream filled with the extracted text from a DjVu bitstream.
  * <p>The bitstream supplied as a parameter is written to a DjVu
  file on the file system (in the working directory), and the system
  command <code>djvutxt</code> is called on the latter to produce a
  UTF-8 text file containing the extracted text. The file is then copied
  to a bitstream. Finally, the auxiliary files are removed from the file
  system, and the generated bitstream is returned as a result.</p>
  * <p>WARNING! Write access to the working directory is needed for
  this method to operate! No exception handling provided!</p>
  *
  * @param source
  * input stream
  *
  * @return result of filter's transformation, written out to a bitstream
 */

 public InputStream getDestinationStream(InputStream source) throws Exception
 {
  /* Some convenience initializations. */
  final String cmd = "djvutxt";
  final String fileName = "aux";
  final String djvuFileName = fileName + ".djvu";
  final String txtFileName = fileName + ".txt";
 
  /* Store input bitstream to auxiliary DjVu file. */
  File djvuFile = streamToFile(source, djvuFileName);
 
  /* Invoke external command djvutxt with appropriate arguments
   to do the actual job... */
  final String[] cmdArray = {cmd, djvuFileName, txtFileName};
  Process p = Runtime.getRuntime().exec(cmdArray);
  /* ...and wait for it to terminate */
  p.waitFor();
 
  /* Copy extracted text from file to an independent bitstream,
   and optionally print the text to standard output. */
  File txtFile = new File(txtFileName);
  InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);
 
  /* Then remove auxiliary files...*/
  djvuFile.delete();
  txtFile.delete();
  /* ...and return resulting bitstream. */
  return dest;
 }
 
 /**
  * Write given input stream to a file on the file system.
  * <p>WARNING! No exception handling!</p>
  *
  * @param inStream input stream
  * @param fileName name of the file to be generated
  *
  * @return <code>File</code> object associated with the generated file
  *
  * @throws Exception
 */

 private File streamToFile(InputStream inStream, String fileName)
 throws Exception
 {
  /* Data will be read from input stream in chunks of size e.g. 4KB. */
  final int chunkSize = 4096;
  byte[] byteArray = new byte[chunkSize];
 
  /* Open the stream for buffered reading. */
  InputStream bufInStream = new BufferedInputStream(inStream);
 
  /* Create the file if it does not exist yet (an existing file is
   overwritten by the output stream opened below)
   to store the supplied bitstream... */
  File file = new File(fileName);
  file.createNewFile();
  /* ...and associate a buffered output stream with it. */
  OutputStream bufOutStream = new BufferedOutputStream(new
  FileOutputStream(file));
 
  /* Copy data from input stream to newly generated file. */
  int readBytes = -1;
  while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
  bufOutStream.write(byteArray, 0, readBytes);
 
  /* Stop transactions to the file system... */
  bufOutStream.close();
  /* ...and return result. */
  return file;
 }
 
 /**
  * Produce input stream from a given file on the file system.
  * <p>WARNING! No exception handling!</p>
  *
  * @param file <code>File</code> object associated with the given file
  * @param verbose if true, also print the file contents to standard output
  *
  * @return input stream containing the data read from file
  *
  *@throws Exception
 */

 private InputStream fileToStream(File file, boolean verbose) throws Exception
 {
  /* Open the stream for reading. */
  InputStream inStream = new FileInputStream(file);
 
  /* Allocate necessary memory for data buffer. */
  byte[] byteArray = new byte[(int)file.length()];
 
  /* Load file contents into buffer. */
  inStream.read(byteArray);
 
  /* And immediately close transactions with the file system. */
  inStream.close();
 
  /* If required to send the retrieved data to standard output... */
  if (verbose)
  {
   /* Open the file again, but this time handle it as a character stream... */
   BufferedReader bufReader = new BufferedReader(new FileReader(file));
   /* ...then print its contents line by line to the standard output... */
   String lineOfText = null;
   while ((lineOfText = bufReader.readLine()) != null)
   System.out.println(lineOfText);
   /* ...and close connection to the file. */
   bufReader.close();
  }
 
  /* Finally, generate and return input stream containing desired data. */
  return new ByteArrayInputStream(byteArray);
  }
 
 }
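
The filter above writes to the fixed names aux.djvu and aux.txt in the working directory and does not check the exit status of djvutxt. A minimal sketch of a more defensive variant of the same steps (copy the stream to a file, run the command, read the result, clean up) might look as follows. It assumes Java 7+ (java.nio.file), which is newer than the Java that DSpace 1.5 targets. The class name DjvuTextSketch and the method extract are hypothetical and are not part of DSpace or of this patch. The command is passed in as a parameter so that any program taking "<input file> <output file>" arguments, as djvutxt does here, can be used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of DSpace or of the patch. */
final class DjvuTextSketch {

    /**
     * Runs "command inputFile outputFile" on a copy of the source stream and
     * returns the contents of the output file as an in-memory stream.
     */
    static InputStream extract(InputStream source, String command)
            throws IOException, InterruptedException {
        // Unique temporary names avoid collisions between concurrent runs.
        Path in = Files.createTempFile("djvufilter", ".djvu");
        Path out = Files.createTempFile("djvufilter", ".txt");
        try {
            // Store the input bitstream in the temporary DjVu file.
            Files.copy(source, in, StandardCopyOption.REPLACE_EXISTING);

            // Send the child's stdout and stderr to this process's output so
            // that the child cannot block on a full pipe buffer.
            Process p = new ProcessBuilder(command, in.toString(), out.toString())
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                    .start();

            // Treat a non-zero exit status as a failure instead of ignoring it.
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException(command + " exited with status " + exit);
            }

            // Read the whole result before the temporary files are deleted.
            return new ByteArrayInputStream(Files.readAllBytes(out));
        } finally {
            Files.deleteIfExists(in);
            Files.deleteIfExists(out);
        }
    }
}
```

Inside getDestinationStream this would be called as DjvuTextSketch.extract(source, "djvutxt"). If the command is missing, ProcessBuilder.start() throws an IOException, so the failure is reported instead of producing an empty text bitstream.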

5) Compilation/recompilation
   cd [dspace-source]/dspace/dspace-1.5.0-src-release/dspace/
   mvn package

6) Install; or, when recompiling for an existing installation, edit the working bitstream-formats.xml and dspace.cfg as above and replace dspace-api-1.5.0.jar in the folders webapps/jspui/WEB-INF/lib/, lib/, webapps/lni/WEB-INF/lib/, webapps/oai/WEB-INF/lib/, webapps/xmlui/WEB-INF/lib/ with the compiled [dspace-source]/dspace-api/target/dspace-api-1.5.0.jar

7) Don't forget to restart Tomcat and run
   /usr/share/dspace/bin/filter-media
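
Because the filter relies on djvutxt being a valid command in the environment in which filter-media runs (step 1), it may help to check for it before running the filter. The following is only a sketch under stated assumptions: the class name CommandLookup and the method find are hypothetical, Java 8+ is assumed for Optional, and the lookup is a plain directory scan of a PATH-style string that does not handle the ".exe" suffix used on Windows.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of DSpace. */
final class CommandLookup {

    /**
     * Looks for an executable file called name in the directories listed in
     * pathValue (directories separated by File.pathSeparator).
     */
    static Optional<Path> find(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Calling CommandLookup.find("djvutxt", System.getenv("PATH")) returns the first matching executable, or an empty Optional if djvutxt is not installed or not on the PATH of the user running the check.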

With best regards
 Serhij Dubyk

_______________________________________________
Dspace-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/dspace-devel