socket_recv() hangs?


xOR
Veteran Member
Join Date: Jun 2006
Location: x-base.info
Old 07-02-2006 , 14:38   socket_recv() hangs?
Reply With Quote #1

searching the forum shows me that this has been posted before, and the cause back then was that the people having this problem didn't check socket_change() before calling socket_recv().
i do, and did so right from the start. here is my code snippet:

Code:
if (socket_change(g_naServerSockets[nServerCount], 1))
{
    // initialize our receive buffer
    for (nClearCounter = 0; nClearCounter < MAX_INFO_LEN; nClearCounter++)
        sRcvBuf[nClearCounter] = 0
    socket_recv(g_naServerSockets[nServerCount], sRcvBuf, MAX_INFO_LEN - 1)
    if (nCheckMethod == 1)
    {


i have added a log_amx() with a unique message directly before and after socket_recv() and the last message i see in console is the one that i placed before socket_recv() so there is no doubt that this is the place where the server hangs.
the weird thing is that all 4 of my CS 1.5 servers have been running fine for days with this code, while my one 1.6 test machine hangs at random, as do the 1.6 servers of other testers. once it ran for a whole night, and another time it hung 30 minutes after start.
the 1.5 servers use AMXX 1.71 and the 1.6 servers use 1.60 and 1.71.

i can only think of 3 reasons for this:
  • socket_change() is returning true although there is nothing new to receive and calling socket_recv() hangs the server
  • i misunderstood how the timeout works and should set it to a higher value
  • socket_recv has a bug and sometimes hangs even when there are changes

if you want to have a look at the complete plugin code get it from this thread. then just search for socket_recv in the source, as it is only used once within the code.

thanks in advance for any help and suggestions.




EDIT: oh and i just remembered that some time ago the original code had "while socket_change()...". i changed it to "if", because from time to time socket_change didn't stop returning true and the while loop became endless.
this would support my first suggestion that socket_change sometimes just returns true when it shouldn't.
__________________
Got more than one HL1 (CS, DoD, NS, TS, TFC, HLDM...) server? Check:

Last edited by devicenull; 07-05-2006 at 22:02.
Hawk552
AMX Mod X Moderator
Join Date: Aug 2005
Old 07-02-2006 , 20:49   Re: socket_recv() hangs
Reply With Quote #2

The only thing I have to suggest is that you probably shouldn't use 1 as a timeout, use the default instead.

Other than that your code looks normal to me.
jtp10181
Veteran Member
Join Date: May 2004
Location: Madison, WI
Old 07-02-2006 , 21:46   Re: socket_recv() hangs
Reply With Quote #3

This has some good examples of how to use sockets

http://cvs.thekingpin.net/cvsweb.cgi...erverquery.sma

see if that helps you any.
xOR
Veteran Member
Join Date: Jun 2006
Location: x-base.info
Old 07-03-2006 , 04:57   Re: socket_recv() hangs
Reply With Quote #4

jtp10181 thanks for the example but i don't see any difference from my socket handling - it even uses the same timeout value

Hawk552 thanks for checking the code. i think my next steps will be to use the default timeout and if this doesn't help i will write a smaller proof-of-concept plugin. if this hangs the server as well then i will submit this to the bug section.

oh and dominion just pointed out that this plugin hangs the server as well - i will check the code and see if it could be the same problem there.

EDIT:
the code is the same there, down to the timeout value. i will post there to see whether the author has heard of the same problems already.
Last edited by xOR; 07-03-2006 at 05:08.
jtp10181
Veteran Member
Join Date: May 2004
Location: Madison, WI
Old 07-03-2006 , 10:19   Re: socket_recv() hangs
Reply With Quote #5

sounds like something might have gotten changed in the sockets module... maybe we need to update our code or maybe it's a bug. Did you figure out what circumstances cause the server to hang?

Quote:
socket_change() is returning true although there is nothing new to receive and calling socket_recv() hangs the server
That seems to be the problem?
xOR
Veteran Member
Join Date: Jun 2006
Location: x-base.info
Old 07-03-2006 , 12:03   Re: socket_recv() hangs
Reply With Quote #6

i already know from what i read in this forum that calling socket_recv() when there is no data to receive hangs the server so let's take this as a fact (search for socket_recv and you'll find those posts).

the next fact is that i noticed that socket_change() sometimes keeps on returning true no matter how often you receive. i have made a counter in the while loop (while socket_change()...) called nEndlessProtection that gets incremented each run and when it was 500 it would break the loop. this endless loop break was triggered! i am quite sure that it didn't receive 500 packets then and i know that it would never have stopped.


if we add both it seems like this is exactly what's happening:
socket_change() returns true although there is no data and then socket_recv() is called and hangs the server, because there is nothing to receive.
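
The nEndlessProtection counter described above is a bounded drain loop. Here is a minimal C sketch of the idea at the BSD-socket level, assuming (not confirmed in this thread) that the module's natives map onto calls like these. `drain_bounded` is a hypothetical name. The cap guarantees termination, and the non-blocking read means an empty queue ends the loop instead of hanging it.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read and discard pending datagrams, but never
 * loop more than max_reads times. Returns the number of successful
 * reads, or -1 on a hard error. */
int drain_bounded(int fd, int max_reads)
{
    char buf[2048];
    int reads = 0;

    while (reads < max_reads) {
        /* Non-blocking read: returns at once if nothing is queued. */
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* queue is empty, normal exit */
            return -1;          /* real socket error */
        }
        reads++;                /* one datagram consumed */
    }
    return reads;
}
```

With a cap of 500 as in the post, hitting the cap would mean 500 datagrams really were queued, because a non-blocking read on an empty queue breaks out instead of counting.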
Last edited by xOR; 07-03-2006 at 12:08.
jtp10181
Veteran Member
Join Date: May 2004
Location: Madison, WI
Old 07-03-2006 , 13:54   Re: socket_recv() hangs
Reply With Quote #7

I did some testing with my plugin. I cannot get it to hang at all. I am getting the same results with both 1.75 and 1.71

Here are the IP:PORT pairs I tested with
Code:
70.86.71.106 27015 (valid)
70.86.71.106 27017 (valid)
75.85.158.4 27015 (made up, invalid)
192.168.1.10 27016 (my computer but no server on this port)
70.86.71.106 27016 (our machine from above, but no server on this port)
Here is my output:
Code:
************************** socket_open 70.86.71.106 27015
************************** socket_change FAIL #0 70.86.71.106 27015
************************** socket_open 70.86.71.106 27017
************************** socket_change FAIL #0 70.86.71.106 27017
************************** socket_open 75.85.158.4 27015
************************** socket_change FAIL #0 75.85.158.4 27015
************************** socket_open 192.168.1.10 27016
************************** socket_change PASS #0 192.168.1.10 27016
************************** socket_open 70.86.71.106 27016
************************** socket_change FAIL #0 70.86.71.106 27016
************************** socket_change PASS #1 70.86.71.106 27015     m70.86.71.106:27015
************************** socket_change PASS #1 70.86.71.106 27017     m70.86.71.106:27017
************************** socket_change FAIL #1 75.85.158.4 27015
************************** socket_change FAIL #1 70.86.71.106 27016
************************** socket_change FAIL #2 75.85.158.4 27015
************************** socket_change FAIL #2 70.86.71.106 27016
************************** socket_change FAIL #3 75.85.158.4 27015
************************** socket_change FAIL #3 70.86.71.106 27016
************************** socket_change FAIL #4 75.85.158.4 27015
L 07/03/2006 - 12:53:01: [serverquery.amxx] No data retrieved from socket connection with 75.85.158.4:27015
************************** socket_change FAIL #4 70.86.71.106 27016
L 07/03/2006 - 12:53:01: [serverquery.amxx] No data retrieved from socket connection with 70.86.71.106:27016
for some reason "192.168.1.10 27016" gets a blank packet, and the other two get no response (socket_change keeps returning false). Not sure why it's doing that, but it's still not hanging; my computer is just returning a blank packet.
xOR
Veteran Member
Join Date: Jun 2006
Location: x-base.info
Old 07-03-2006 , 16:57   Re: socket_recv() hangs
Reply With Quote #8

yesterday i did everything i could to get a good test case. remember, the server had already crashed before. it had the plugin running with a server check frequency of 60 seconds. it was checking 7 servers, 4 of them Dominion's servers and 3 of them IPs that don't run a server, so they are down. it didn't crash for a whole night.
one day later i was frustrated and restarted the server. 4 hours later or so it crashed.
now yesterday i set the check frequency to 5 seconds. so now 7 servers are checked every 5 seconds. guess what: the server ran stable the whole night.

so as long as you haven't tested for at least 48 hours continuously you can't say it works. this is exactly what makes this a debugging hell.
at times i even believed the crash problem had magically fixed itself, or that i and the other people who had crashes were all hallucinating.
but then after almost 48 hours, the next crash. and then sometimes a crash 30 minutes after a server restart, and so on...
there is just ZERO system behind this.

btw i don't know whether i receive a blank packet somewhere but that really doesn't matter. there was a log_amx() directly after socket_recv() that was never called. so i can code what i want after this line, it won't be of any help.

anyway, thank you for investing time in trying to find the problem. i am really starting to run out of ideas; the only thing left to try is whether a different timeout value would help, as suggested by Hawk. again, the problem with this is that i could test it now, think it works after 3 days without a crash, and then on day 4 have the server hang again. i wish i knew a way to provoke a crash so i could run faster tests.
jtp10181
Veteran Member
Join Date: May 2004
Location: Madison, WI
Old 07-03-2006 , 18:57   Re: socket_recv() hangs
Reply With Quote #9

is the server crashing... or is it locking up and getting stuck? There is a difference. Servers crash, it's a fact of life, but they should not lock up completely. The only other thing I can think of is that the sockets are not getting freed correctly, and after so many build up it makes the server puke. Try printing the socket numbers out; they should always stay the same and not increment. Every time a socket is closed, that number should be reused.

-----

After testing mine with printing socket numbers I realized I had a leak. If an attempt was aborted because no data was found (socket_change was false 5 times) I was not using socket_close. Anytime you have a valid socket number and abort or are done with the socket you need to use socket_close.
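
The leak check described above relies on descriptor reuse: when every socket is closed, the next open gets the same number back. Here is a minimal C sketch at the BSD-socket level, assuming the module's socket numbers behave like POSIX file descriptors (the thread suggests this but does not confirm it). `poll_and_close` is a hypothetical helper that closes the socket on every exit path, including the "no data after N tries" path that was missed.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket, poll it up to max_tries
 * times, and close it on every path. Returns the descriptor number
 * that was used (handy for logging), or -1 if the open failed. */
int poll_and_close(int max_tries, long timeout_usec)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        fd_set readable;
        struct timeval tv;

        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        tv.tv_sec = timeout_usec / 1000000;
        tv.tv_usec = timeout_usec % 1000000;

        if (select(fd + 1, &readable, NULL, NULL, &tv) > 0)
            break;              /* data pending; a real caller reads here */
    }

    close(fd);                  /* runs whether or not data ever arrived */
    return fd;
}
```

Logging the returned number on each call gives the check suggested above: with no leak it stays constant across calls, while a climbing number points to a path that skips the close.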
Last edited by jtp10181; 07-03-2006 at 19:05.
xOR
Veteran Member
Join Date: Jun 2006
Location: x-base.info
Old 07-03-2006 , 20:20   Re: socket_recv() hangs
Reply With Quote #10

it is completely locking up, so the automatic server restart isn't triggered either. it just freezes with the last message displayed being my debug message before socket_recv(); the one after it is never displayed.

my code as it currently is (and was right from the start) always closes the socket as long as the socket number is > 0 (meaning it was opened correctly), no matter whether socket_change() returned true or not.

but you might still have a point there with the socket numbers. maybe i have some leak in a different way somewhere. printing the socket numbers is really a good hint. i will try this tomorrow, too tired now
All times are GMT -4. The time now is 22:11.