Very Long Delay in Queue

Generic questions on MailCleaner usage. Please check if your question cannot find a more specific topic below before posting here !<br>
This section is now READ ONLY, please choose the section below that fit your question !

Moderators: FlorianB, Pascal, bourgeois, mentor

rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Very Long Delay in Queue

Postby rebus » Tue Nov 28, 2006 6:21 pm

I've noticed that hundreds of messages are stacking up in the Filtering queue, and are never being forwarded to the Outgoing queue. Currently there are messages in "waiting" status anywhere from 8 minutes to more than 5 days old.

In this queue I see numerous messages to my own mailbox, which I know (a) the destination mail server is running, (b) my account exists, and (c) there is no size restriction on my mailbox.

Stopping and starting the Filtering service seems to cause some messages to break free and enter the Outgoing queue, but obviously something is wrong.

What might be the cause of this?
olivier
Posts: 1348
Joined: Thu Jan 01, 1970 1:00 am
Contact:

Postby olivier » Wed Nov 29, 2006 9:17 am

what does the engine log gives you ?
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Wed Nov 29, 2006 2:56 pm

Ouch... I have to plead temporary ignorance. I'm assuming the logs are stored in /var/mailcleaner/log but which specific logs do I need to look at?

I'm very good at reading raw SMTP session logs, but I'm not sure what I'm looking for here. I'm currently reading /var/mailcleaner/log/mailscanner/infolog but am not seeing anything that stands out as a problem.

I'm also noticing this trend. http://gnaaste.pair.com/basic001/spools-day-full.png Around 14:00 yesterday is when I restarted all of the MailCleaner services, and that's when the spools emptied out and mail was delivered. You can see it starting to build up again around 16:00 and taking about 2 hours to empty around 550 messages.
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Wed Nov 29, 2006 3:00 pm

Following up to my previous post a few minutes ago... I am seeing a lot of this in the 'infolog' file:

Code: Select all

Nov 29 01:01:28 localhost MailScanner[7510]: New Batch: Found 19 messages waiting
Nov 29 01:01:28 localhost MailScanner[7510]: New Batch: Scanning 1 messages, 2478 bytes

Code: Select all

Nov 29 01:01:58 localhost MailScanner[7510]: New Batch: Found 24 messages waiting
Nov 29 01:01:58 localhost MailScanner[7510]: New Batch: Scanning 2 messages, 6694 bytes
Nov 29 01:01:58 localhost MailScanner[7510]: Spam Checks: Starting

Code: Select all

Nov 29 01:02:59 localhost MailScanner[7510]: New Batch: Found 37 messages waiting
Nov 29 01:02:59 localhost MailScanner[7510]: New Batch: Scanning 13 messages, 88521 bytes


Lots of messages are found when MailCleaner sweeps the spool, but it begins scanning only a few at a time. Is this normal, or could that be the first clue to the source of the problem?

Here is an infolog entry for one message that is showing 15m in the filtering spool:

Code: Select all

Nov 29 10:02:28 localhost MailScanner[30368]: Message 1GpQnG-0002ry-Gv from 127.0.0.1 (bounce-318@novoforce.com) to customerdomain.com is not spam, SpamAssassin (not cached, score=2.502, required 5, BAYES_50 0.00, HTML_MESSAGE 0.00, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.50, RAZOR2_CHECK 0.50)

It appears to have been scanned, but then just sits there and does not move to the outbound spool.
User avatar
jordant
Posts: 218
Joined: Sat Sep 16, 2006 2:48 pm
Location: Perth, Western Australia

Postby jordant » Wed Nov 29, 2006 3:51 pm

/var/mailcleaner/log/exim_stage1/mainlog is the incoming smtp engine
/var/mailcleaner/log/exim_stage4/mainlog is the outgoing smtp engine
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Wed Nov 29, 2006 4:35 pm

Right, that much I knew. But there are 3 stages, are there not? (Incoming, Filtering, Outbound) In our case the mail is jamming up in the Filtering queue. Incoming works fine. Mail that hits the Outbound queue it is delivered normally.

The problem is messages can sit for a very long time (hours, days) in the middle (Filtering) queue.
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Wed Nov 29, 2006 7:50 pm

Problem is getting worse. There are nearly 4,000 messages jammed up in the Filtering queue that are not making it into the Outbound queue.

http://gnaaste.pair.com/basic001/spools-day-full2.png
n0lqu
Posts: 55
Joined: Tue Nov 21, 2006 6:17 pm

Memory full?

Postby n0lqu » Wed Nov 29, 2006 8:32 pm

If you check your Statistics graphs, does it look like your memory is exhausted and/or your swap memory is being used more? Your system might be doing too much disk swapping because there is not enough memory. Check the other graphs and see if anything unusual appears in any of the others around the time the spool started growing.

We are running it on a dual processor system and a week ago our filtering spool started climbing up into the thousands. The CPU graph, which normally shows 200% (probably to reflect the 2 CPUs) started showing only 100% total CPU usage, making me think one of the CPUs stopped responding for some reason. I also noticed the memory running full, so I took the system down, tried to add memory (didn't have compatible memory on hand though), then when I brought the system back up it was running at 200% CPU again and was able to work through the huge backlog pretty quickly. It's run fine ever since, with 200% CPU.
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Wed Nov 29, 2006 8:45 pm

P4 3.0 (hyperthreaded) therefore 2 "virtual" CPUs, which causes MailCleaner stats to show 200%. With that said, max CPU for user processes was 58%, average 17%, and current 45%, so it's using only about 1/4 the available CPU (58 out of 200%).

Memory is 2 GB: 231 MB buffered, 721 MB cached, 67 MB free, and 1957 MB used.

Oddly, Swap used is showing 0 MB.

Also, I just applied the lastest patch and here are the 2 error messages I've received so far when trying to view the filtering queue:

Code: Select all

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 811633 bytes) in /usr/mailcleaner/www/classes/view/Template.php on line 222

-- and --

Code: Select all

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 483109 bytes) in /usr/mailcleaner/www/classes/system/Spooler.php on line 89
jim
Posts: 12
Joined: Thu Oct 05, 2006 5:45 pm
Location: Iowa, USA

Postby jim » Wed Nov 29, 2006 11:17 pm

Go into the web interface, click on status, and click the stop button for "incoming" to stop inbound mail. This will stop inbound connections and give the filtering que a chance to clear out. This takes a while, but will work.

A cron script of some sort must turn the incoming MTA back on after a while, so keep an eye on it and when it turns on, click the stop button again until the filtering que goes empty. This will take a LONG time.

I ran in to this problem, and this is why i posted the question about order of RBL, Greylist, spamassassin question (see http://forum.mailcleaner.org/viewtopic.php?t=223 )

I think the server is simply being overloaded with junk connections that should be being blacklisted by the RBL's immediately. Everyone of these is first greylisted, then handled by Spamassin, and it's overloading the box.

How many emails have been processed in your current 24 hour period, and how many hours does that encompass?

This is a major usability problem that needs to be addressed.
olivier
Posts: 1348
Joined: Thu Jan 01, 1970 1:00 am
Contact:

Postby olivier » Thu Nov 30, 2006 12:01 am

how many messages are you processing per day ?

A full stock MailCleaner based on a decent hardware (P4, 2Ghz, 2GB RAM), will process up to 50'000 messages per day without a glitch.
With some customizations and tunings, you can go up to 150'000 or even 200'000 messages per days.
Of course, these are real mails. If you enable callout and greylisting, you can have even bigger numbers, but I don't like these, because it's easy to refuse large amount of mail.. what is more difficult is to really filter them.

As I said Jim, the goal of MailCleaner is to reduce the risk of false positives. Refusing mails based on a third party RBL is just not possible.

Now, apart from this discussion, if you don't have more than 50k and a correct hardware, there is surely something else.
Are you sure the DNS is working correctly ? check your /etc/resolv.conf file.
You can also try to disable pyzor, razor and DCC. They can be performance killers when not correctly responding.
You can also try to debug MailScanner, by putting "Debug" and "debug Spamassassin" directive to yes in /usr/mailcleaner/etc/mailscanner/MailScanner.conf_template, then stop and start it in the console (not in the gui). You'll show plenty of information and will maybe find out what slow down the thing.
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Thu Nov 30, 2006 1:48 am

Server is P4/3.0 GHz, 2 GB RAM, 36 GB SCSI disk.

We have another gateway server further upstream that handles the greylisting and RBLs, to remove that load from MailCleaner.

But for now, all MailCleaner does is Callout, SpamAssassin, Razor, Pyzor, DCC, and ClamAV scans.

This server is only in limited production (just a few domains) at this time and only handles about 10,000 messages per day. The CPU and system load statistics show the machine has very low load.
olivier
Posts: 1348
Joined: Thu Jan 01, 1970 1:00 am
Contact:

Postby olivier » Thu Nov 30, 2006 1:55 am

this really sound like network latency. Check your dns settings, and try to disable Pyzor, DCC and Razor, in order.
rebus
Posts: 141
Joined: Thu Sep 14, 2006 3:29 pm
Location: FL - USA

Postby rebus » Thu Nov 30, 2006 2:14 am

Another newsletterbounce message is stuck in the Filtering queue as "Frozen". Here is what the log shows for that message ID.

Code: Select all

2006-11-29 20:00:38 1GpUD0-0000Jp-T3 == newsletterbounce@domain.com R=system_aliases defer (-1): failed to expand "${lookup{$local_part}lsearch{/etc/aliases}}": failed to
open /etc/aliases for linear search: No such file or directory

Code: Select all

2006-11-29 20:53:58 1GpUD0-0000Jp-T3 Spool file is locked (another process is handling this message)

Code: Select all

2006-11-29 20:56:47 1GpUD0-0000Jp-T3 ** newsletterbounce@domain.com: Unknown user


The last message makes the least sense because the box is valid, plus I confirmed that this server CAN deliver to the destination by doing an SMTP session via Telnet manually from the MailCleaner server, and the message was delivered to the destination mailbox successfully.
olivier
Posts: 1348
Joined: Thu Jan 01, 1970 1:00 am
Contact:

Postby olivier » Thu Nov 30, 2006 9:58 am

try to:

Code: Select all

> touch /etc/aliases


did you installed all patches in order ?

Return to “General help”

Who is online

Users browsing this forum: No registered users and 1 guest