Discussion:
Possible Memory issue
Rabatscher Michael
2008-06-25 10:37:11 UTC
Hi all!

I'm using an Apache module and want to implement file uploads.
The files are quite big (up to 100MB) and many concurrent
uploads must be possible. The problem is that the WebBroker
module is quite memory intensive, and there are many analysis
modules that are memory intensive too.

I found out that the Apache module first copies the whole file
into a string (fContent), which is then parsed by the
ContentParser (a derivative of Matlus's content parser). This
parser in turn moves the content into a memory stream, so the data
is held at least twice, which can lead to memory problems.

Has anyone written a more intelligent parser, or can I reduce the
memory footprint by using memory-mapped files (instead of the
fContent variable) for large data chunks?
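
To sketch what I mean by that second option (completely made-up
names, and it assumes the raw request has already been spooled to a
file on disk rather than handed over as a string):

uses
  Windows, SysUtils;

procedure ParseMappedRequest(const FileName: string);
var
  hFile, hMap: THandle;
  pData: PAnsiChar;
  Size: Cardinal;
begin
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
    nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Size := GetFileSize(hFile, nil);
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      pData := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
      if pData = nil then
        RaiseLastOSError;
      try
        // hand pData^ (Size bytes) to the parser instead of fContent;
        // the OS pages the file in and out, no second copy is built
      finally
        UnmapViewOfFile(pData);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;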

kind regards
Mike
Dan Downs
2008-06-25 16:23:06 UTC
I ran into the same thing. It gets extra fun when memory fragmentation
comes into play and you're trying to allocate two or three 100MB chunks
at once. FastMM helped greatly with this, but with all the traffic and
other caching it's not hard for memory allocations to get spread out
enough that they chop those larger free blocks down.

I started by modifying the Matlus parser to use a temp file stream class
I wrote, which automatically cleans up the temp file on free, to help
reduce the memory footprint. But the whole initial request still sits in
a single string variable.
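
The idea is just a TFileStream that removes its own backing file.
Something along these lines (a from-scratch sketch, not my actual
class):

uses
  SysUtils, Classes;

type
  // A TFileStream that deletes its backing file when it is freed.
  TTempFileStream = class(TFileStream)
  private
    FFileName: string;
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;
  end;

constructor TTempFileStream.Create(const AFileName: string);
begin
  FFileName := AFileName;
  inherited Create(AFileName, fmCreate);
end;

destructor TTempFileStream.Destroy;
begin
  inherited Destroy;              // close the handle first...
  SysUtils.DeleteFile(FFileName); // ...then remove the file from disk
end;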

In the end I moved the file uploading out to a CGI app (a sketch
follows the list below).

This solved a few problems:

- Each process gets its own 2GB virtual memory space, so it's not
sharing Apache's.

- Our main web server has 8GB of RAM that can now be more readily used.

- No more out-of-memory exceptions.

- Each upload can handle larger files, which is nice for us since we
support batch uploading of 5 files at once. They're usually smaller
50KB-3MB files, but every once in a while someone will try to upload a
100MB MPEG.

- The upload routine always had a ReturnUrl param saying where to
redirect after the upload, so the move didn't break anything.

- There really was zero gain in having this logic inside the DSO.
Uploads don't happen all that frequently and the user has to wait
anyway, so what's maybe an extra second of overhead from having it in a
CGI app.
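
For what it's worth, the heart of the CGI is just copying stdin to disk
in fixed-size chunks so memory stays flat no matter the upload size. A
rough skeleton (multipart parsing and the real ReturnUrl handling left
out, names made up):

program UploadCgi;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils, Classes;
var
  Remaining, Got: Integer;
  Buf: array[0..65535] of Byte;
  Input: THandleStream;
  Dest: TFileStream;
begin
  // the web server tells us how many bytes to expect on stdin
  Remaining := StrToIntDef(
    SysUtils.GetEnvironmentVariable('CONTENT_LENGTH'), 0);
  Input := THandleStream.Create(GetStdHandle(STD_INPUT_HANDLE));
  Dest := TFileStream.Create('upload.tmp', fmCreate);
  try
    while Remaining > 0 do
    begin
      if Remaining > SizeOf(Buf) then
        Got := Input.Read(Buf, SizeOf(Buf))
      else
        Got := Input.Read(Buf, Remaining);
      if Got <= 0 then
        Break; // client went away
      Dest.WriteBuffer(Buf, Got);
      Dec(Remaining, Got);
    end;
  finally
    Dest.Free;
    Input.Free;
  end;
  // minimal CGI response: bounce the browser back to the caller
  Writeln('Status: 302 Found');
  Writeln('Location: /uploaded.html');
  Writeln;
end.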

DD
Rabatscher Michael
2008-06-27 13:21:39 UTC
Post by Dan Downs
I ran into the same thing. It gets extra fun when memory fragmentation
comes into play and you're trying to allocate two or three 100MB chunks
at once. FastMM helped greatly with this, but with all the traffic and
other caching it's not hard for memory allocations to get spread out
enough that they chop those larger free blocks down.
I started by modifying the Matlus parser to use a temp file stream class
I wrote, which automatically cleans up the temp file on free, to help
reduce the memory footprint. But the whole initial request still sits in
a single string variable.
In the end I moved the file uploading out to a CGI app.
That's a very good idea!!!

I think I will do something similar ;)

kind regards
Mike
Dan Downs
2008-06-27 13:50:58 UTC
Yeah, I messed around with trying to fix the DSO for far too long until
the light finally flickered on. I think this was during the time I was
trying to give up coffee; I've since fixed that problem too.


There are a couple of helper functions that I use in my
TDanTempFileStream class that wrap up API calls. Once I find out where I
got them (I think JCL), if I can, I'll post them in the attachments.
Even in a CGI I didn't see any reason not to continue dumping the parsed
file data to disk.
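
Until then, here's roughly the shape of them (from memory, not the
actual JCL code - they just wrap GetTempPath/GetTempFileName):

uses
  Windows, SysUtils;

// Ask Windows for a unique, freshly created file in the temp directory.
function CreateTempFile(const Prefix: string): string;
var
  TempPath: array[0..MAX_PATH] of Char;
  TempName: array[0..MAX_PATH] of Char;
begin
  if GetTempPath(MAX_PATH, TempPath) = 0 then
    RaiseLastOSError;
  // uUnique = 0 makes Windows pick the name and create the empty file
  if GetTempFileName(TempPath, PChar(Prefix), 0, TempName) = 0 then
    RaiseLastOSError;
  Result := TempName;
end;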

Off to make a pot,
DD
Rabatscher Michael
2008-06-30 08:55:42 UTC
Post by Dan Downs
Yeah, I messed around with trying to fix the DSO for far too long until
the light finally flickered on. I think this was during the time I was
trying to give up coffee; I've since fixed that problem too.
Perhaps I have to give it up too to get some enlightenment, but I think
that would be hard ;)
Post by Dan Downs
There are a couple of helper functions that I use in my
TDanTempFileStream class that wrap up API calls. Once I find out where I
got them (I think JCL), if I can, I'll post them in the attachments.
That would be great!!
Post by Dan Downs
Even in a CGI I didn't see
any reason not to continue dumping the parsed file data to disk.
Makes perfect sense. Otherwise a few concurrent big file uploads would
eat all system resources and leave nothing for the server.
Post by Dan Downs
Off to make a pot,
DD