Harvesting EPrints repository from DSpace

amgciadev
Hi all,

We are running a DSpace repository and would like to harvest metadata from an EPrints repository.

We have successfully configured a collection that harvests its content from an external source, in this case the OAI feed of an EPrints repository (selecting a specific OAI set). DSpace connects to the EPrints OAI server without problems, but the records it retrieves contain empty metadata. We are experiencing the same issue as described here.

We have looked at the DSpace code and located the cause. DSpace’s harvester module first requests the list of supported metadata formats to find the metadata prefix that the target repository uses for the selected metadata schema (http://www.openarchives.org/OAI/2.0/oai_dc). Internally, it iterates through the results of the ListMetadataFormats query and returns the metadataPrefix of the first metadataFormat that matches the selected schema.

In this case, EPrints advertises more than one metadata format for the OAI_DC schema, under the prefixes oai_bibl and oai_dc:


<metadataFormat>
  <metadataPrefix>oai_bibl</metadataPrefix>
  <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
  <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
<metadataFormat>
  <metadataPrefix>oai_dc</metadataPrefix>
  <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
  <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>


DSpace’s harvester therefore selects the first matching metadataPrefix, oai_bibl, for which EPrints returns records with no metadata.
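The first-match behaviour described above can be sketched in Python as follows. This is a simplified illustration, not DSpace’s actual (Java) harvester code. It assumes that the comparison is made against each format’s metadataNamespace element. The function name first_matching_prefix is hypothetical, and the OAI-PMH envelope around the two quoted formats is abridged.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"            # OAI-PMH envelope namespace
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"  # namespace both formats declare

# Abridged ListMetadataFormats response containing the two formats quoted
# above (assumption: responseDate/request elements omitted for brevity).
RESPONSE = f"""<OAI-PMH xmlns="{OAI_NS}">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_bibl</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>{OAI_DC_NS}</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>{OAI_DC_NS}</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def first_matching_prefix(xml_text, namespace):
    """Hypothetical helper: return the metadataPrefix of the first
    metadataFormat whose metadataNamespace equals `namespace`, else None."""
    root = ET.fromstring(xml_text)
    for fmt in root.iter(f"{{{OAI_NS}}}metadataFormat"):  # document order
        ns = (fmt.findtext(f"{{{OAI_NS}}}metadataNamespace") or "").strip()
        if ns == namespace:
            # Stop at the first hit; later formats are never considered.
            return (fmt.findtext(f"{{{OAI_NS}}}metadataPrefix") or "").strip()
    return None


print(first_matching_prefix(RESPONSE, OAI_DC_NS))  # oai_bibl
```

Any strategy that stops at the first namespace match depends on the order in which the server lists its formats. Here, the oai_bibl entry is listed first, so it shadows oai_dc.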

Has anyone experienced the same issue, and if so, how did you solve it? Any help or suggestions would be much appreciated. Ideally we would like to avoid customising DSpace just to fix this, so I’m wondering whether there is an easy fix in the EPrints configuration, e.g. disabling oai_bibl or something similar.

Many thanks and kind regards,
Agustina

--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Re: Harvesting EPrints repository from DSpace

Franziska Ackermann
Hi,

What worked for me was the solution described at the link you sent.
After the file "OAI_Bibliography.pm" was removed from the EPrints repository, DSpace could harvest the records via oai_dc.
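To confirm that a change like this has taken effect, you can fetch the repository’s ListMetadataFormats response (the OAI endpoint with ?verb=ListMetadataFormats). Then check whether any namespace is still advertised under more than one prefix. The Python sketch below runs that check on two hand-built responses. The first mirrors the formats quoted in the question, and the second shows how the list would presumably look once OAI_Bibliography.pm is gone. Both build_response and ambiguous_namespaces are hypothetical helpers, and fetching the live response is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_response(prefixes):
    """Hypothetical helper: a minimal ListMetadataFormats response in which
    every listed prefix declares the oai_dc schema and namespace."""
    formats = "".join(
        f"<metadataFormat><metadataPrefix>{p}</metadataPrefix>"
        f"<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>"
        f"<metadataNamespace>{OAI_DC_NS}</metadataNamespace></metadataFormat>"
        for p in prefixes
    )
    return f'<OAI-PMH xmlns="{OAI_NS}"><ListMetadataFormats>{formats}</ListMetadataFormats></OAI-PMH>'


def ambiguous_namespaces(xml_text):
    """Map each namespace advertised under more than one prefix to those prefixes."""
    by_ns = defaultdict(list)
    root = ET.fromstring(xml_text)
    for fmt in root.iter(f"{{{OAI_NS}}}metadataFormat"):
        ns = (fmt.findtext(f"{{{OAI_NS}}}metadataNamespace") or "").strip()
        prefix = (fmt.findtext(f"{{{OAI_NS}}}metadataPrefix") or "").strip()
        by_ns[ns].append(prefix)
    return {ns: ps for ns, ps in by_ns.items() if len(ps) > 1}


before = build_response(["oai_bibl", "oai_dc"])  # as in the original question
after = build_response(["oai_dc"])               # assumed state after removal
print(ambiguous_namespaces(before))  # {'http://www.openarchives.org/OAI/2.0/oai_dc/': ['oai_bibl', 'oai_dc']}
print(ambiguous_namespaces(after))   # {}
```

An empty result means a first-match harvester can only resolve the oai_dc namespace to one prefix.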

Best regards,
Franziska


In reply to the message sent by [hidden email] on 25.02.2016 at 17:12.