DSpace production environment

Seth Robbins
Hi,

We're using DSpace 5 with a custom-themed XMLUI. We're still using the old Lucene search (because of some legacy customizations) and an old version of PostgreSQL (8.4, since campus IT, who runs our production environment, isn't supporting anything more recent). We're working to upgrade to Discovery and PostgreSQL 9.5, so that may help alleviate some issues.
We also have a fairly large archive of ~90k items.
Lately, we've come up against multiple situations where non-search-engine crawlers or other automated users hit us with a large number of requests (most recently with spikes around 100 requests/second). This consistently brings down the server and I end up having to block the offending IP addresses. It happens every few weeks.
Usually I find errors like java.lang.OutOfMemoryError: unable to create new native thread or org.postgresql.util.PSQLException: Connection rejected: could not fork new process for connection: Resource temporarily unavailable.
So it seems like we're running up against OS-level limits rather than running out of JVM heap memory.
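For anyone following along, the quick checks I've been running look roughly like this (a sketch assuming Tomcat runs as a dedicated "tomcat" user; adjust to your setup):

# PID of the Tomcat JVM
PID=$(pgrep -u tomcat -f java | head -n 1)

# Limits the running process actually has (ulimit changes only apply after a restart)
cat /proc/$PID/limits

# Open file descriptors and native threads currently in use by that process
ls /proc/$PID/fd | wc -l
ls /proc/$PID/task | wc -l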

I've been logging the number of PostgreSQL processes, open files, httpd processes, etc., and these all get very high whenever DSpace crashes with the errors above.
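Concretely, the logging is just a small script run every minute from cron, along these lines (process and file names are illustrative; "httpd" would be "apache2" on Debian/Ubuntu):

#!/bin/sh
# log-counts.sh (sketch) -- append process/FD counts for later graphing
PID=$(pgrep -u tomcat -f java | head -n 1)
echo "$(date -Is)" \
  "postgres=$(ps -C postgres --no-headers | wc -l)" \
  "httpd=$(ps -C httpd --no-headers | wc -l)" \
  "tomcat_fds=$(ls /proc/$PID/fd 2>/dev/null | wc -l)" \
  "tomcat_threads=$(ls /proc/$PID/task 2>/dev/null | wc -l)" >> /var/log/dspace-counts.log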
I still haven't gotten to the bottom of the problem, but it has raised some questions about how others run DSpace in production:

First off, should I be able to handle a load of 100 requests per second? We usually see between 2 and 10 requests/second, with occasional spikes up to around 50. What load should the system be prepared to handle?
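(For a rough number on a given box, something like Apache Bench can replay comparable load against a test instance; the URL and concurrency below are just placeholders:)

# 5,000 requests at 50 concurrent clients against an item page on a test box
# (placeholder URL -- don't point this at production)
ab -n 5000 -c 50 http://dspace-test.example.edu/xmlui/handle/123456789/1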

As I said, campus IT runs our production VM and I'll need to coordinate with them to change system parameters like ulimits, but it seems like the problem could be related to these being too low.
Have others who run Linux environments had to raise their system ulimits?
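The kind of change I have in mind is something like the following (assuming Tomcat runs as a dedicated "tomcat" user; the numbers are only illustrative, not recommendations):

# /etc/security/limits.d/tomcat.conf -- raise open-file and process/thread limits
tomcat  soft  nofile  16384
tomcat  hard  nofile  16384
tomcat  soft  nproc   4096
tomcat  hard  nproc   4096

As I understand it, pam_limits only applies to login sessions, so for a service the equivalent would be LimitNOFILE=/LimitNPROC= in a systemd unit, or a ulimit call in whatever script starts Tomcat.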

I've noticed that the problem seems to occur when we have a lot of requests for pages rather than bitstreams. Is it possible that the XSLT processing is causing a backlog? Have others experimented with XSLT processors besides Xalan?
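One rough way to compare processors outside the webapp would be to time the same transform on the command line; this is only approximate since the theme stylesheets expect Cocoon's DRI input, and the file and jar names below are placeholders:

# Time the transform with Xalan (jar names/paths depend on your distribution) ...
time java -cp "xalan.jar:serializer.jar" org.apache.xalan.xslt.Process -IN item-page.xml -XSL theme.xsl -OUT /dev/null

# ... and with Saxon-HE for comparison (saxon9he.jar downloaded separately)
time java -jar saxon9he.jar -s:item-page.xml -xsl:theme.xsl -o:/dev/null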

I've also never seen the number of PostgreSQL processes go down, so if I leave the system running for a week or so without restarting, the number of open files and PostgreSQL connections gets very large. I know there have been perennial issues with the connection pool, but I'm wondering whether these have been resolved with the move to Hibernate, or whether this will be resolved when we upgrade PostgreSQL.
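For reference, these are the knobs I assume are in play: the pool settings in dspace.cfg and the backend cap in postgresql.conf (the values shown are just the sort of defaults I'd expect, not our actual configuration):

# dspace.cfg -- connection pool settings (DSpace 5.x names):
# maximum pooled connections, ms to wait for a free one, idle connections to keep
db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 10

# postgresql.conf -- hard cap on server-side backends across all clients
max_connections = 100

As far as I understand it, each deployed webapp (XMLUI, OAI, REST, etc.) keeps its own pool unless a shared JNDI pool is configured, so the server-side total can be several multiples of db.maxconnections.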

Thanks,
Seth Robbins

--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Re: DSpace production environment

Alan Orth-2
Hi, Seth.

Regarding ulimits, we definitely started having problems with this a few years ago. Since then we have been adding the following to the end of /etc/default/tomcat7 on our Ubuntu-based hosts:

# raise limit for tomcat (default 4096) to avoid
# "java.net.SocketException: Too many open files"
ulimit -n 16384

Also, I recently upgraded our development and production servers to bigger machines and have increased PostgreSQL's shared_buffers to 10% of system RAM. The PostgreSQL wiki recommends setting it to 25%, but I figure the DSpace workload needs most of its memory for the JVM heap, and Solr actually benefits from any "unused" memory that the Linux kernel uses for file caches, so it's good to leave some RAM free.
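Concretely, on a host with 32 GB of RAM (an example figure, not necessarily our spec) that works out to roughly:

# postgresql.conf -- ~10% of a 32 GB host
shared_buffers = 3200MB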

Cheers,
