WebMite Watchdog Timeout Revisit

Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 228
Posted: 11:10pm 07 Jun 2024

G'day all  
So, this niggling problem goes back a while for me, and it seems others have had the same or a similar issue.
It's been driving me nuts, so I thought I'd dig deeper and try to understand what's going on... My apologies for any toes I may have stepped on; that was not my intention!
(I guess this is more a post for matherp, since it involves the MM source code. Sorry for messing with it.)

To recap: I've had lots of mysterious watchdog resets, lost WiFi connections and stalled WebMite instances, plus the occasional error message that made no sense.

So, I jumped in the deep end and installed Visual Studio Code, Pico-SDK and the MM source code.
The only debugging I could do was to add lots of printf calls, printing to the MMCC to show me where the code was running and what the variable values were at the time. All good, but a looooong process!
Anyway, long story short, I've pinned my issue down to a while loop in MMtcpserver.c:
//while(!(state->sent_len[pcb]==state->total_sent[pcb] )|| Timer4==0){
   CheckAbort();
//}

If I comment out the while loop as shown (leaving CheckAbort() to run as it should), none of my previous problems come back... No matter how hard I try to beat the WM into submission, I can't get it to fault. It also seems to answer HTTP requests much more quickly.
I realise the while loop is part of the error checking, making sure everything that should be sent has actually been sent, but this is the quick fix my skill level allows.
Everything else seems to work fine, but I suspect I've broken something along the way.
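Read literally, the condition as quoted, `!(sent_len==total_sent) || Timer4==0`, stays true once `Timer4` reaches zero, even after every byte has been acknowledged, so nothing in the condition itself would ever end the loop. Whether the released source really uses `||` there or the line was retyped for the post isn't clear from the thread. The sketch below just compares the quoted condition with a hypothetical bounded form (`&&` with a non-zero countdown), using plain ints in place of the MMBasic state and assuming `Timer4` is a countdown that stops at zero.

```c
#include <stdbool.h>

/* Loop condition exactly as quoted in the post:
 *   while(!(sent_len == total_sent) || Timer4 == 0)
 * Returns true while the loop would keep spinning.
 * timer4 is assumed to be a countdown that stops at 0. */
bool quoted_keeps_looping(int sent_len, int total_sent, int timer4)
{
    return !(sent_len == total_sent) || timer4 == 0;
}

/* Hypothetical bounded form: keep waiting only while data is still
 * outstanding AND the countdown has not yet expired. */
bool bounded_keeps_looping(int sent_len, int total_sent, int timer4)
{
    return sent_len != total_sent && timer4 != 0;
}
```

With everything sent and the countdown at zero, `quoted_keeps_looping(10, 10, 0)` is still true, while the bounded form exits both when the data is all sent and when the countdown runs out.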

As I say, it's more a post for Peter.
I have ZERO experience in C/C++, so I don't really know what I'm doing apart from bumbling around to 'see what happens'...
I used the latest releases of VSC and CMake, TinyUSB 0.16.0 and MM source 5.09.00RC3.
Apart from commenting out the while loop, the only other change I've made is to the version string, where I added 'MyVer 1.0' so I know it's my modified build when I run an 'option list' command.

With that, I'll run it as is for a few days and see what happens. Fingers crossed!
John
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 9110
Posted: 07:56am 09 Jun 2024

John: Thanks for your continuing work on this - it is really appreciated.
Your "fix" is likely to create other issues (buffer overrun), but it gives me another place to look. I've made a minor change in RC5, using a new timer in case Timer4 was getting overwritten, but I'll keep playing. I'm testing with Geoff's watering controller software, which has now been running for 11 hours since I made the timer change. I'll post on this thread when I have something more to report.
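The worry with a shared countdown like `Timer4` is that some other routine reloading or zeroing it can make this wait end early or never end. One way to sidestep that, sketched here as an assumption rather than what RC5 actually does, is to give the wait its own deadline computed from a free-running millisecond clock that nothing else resets. `wait_until_done`, `fake_link` and the fake clock are hypothetical names for a PC-side test; on a Pico the clock could come from the SDK's `time_us_64()` divided by 1000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: done() reports whether the wait condition is met,
 * now_ms() returns a free-running millisecond clock that nothing resets. */
typedef bool (*done_fn)(void *ctx);
typedef uint64_t (*clock_fn)(void *ctx);

/* Wait until done(ctx) is true or timeout_ms has elapsed.
 * The deadline is a local variable, so no other code can overwrite it.
 * Returns true on success, false on timeout. */
bool wait_until_done(done_fn done, clock_fn now_ms, void *ctx, uint64_t timeout_ms)
{
    const uint64_t deadline = now_ms(ctx) + timeout_ms;
    while (!done(ctx)) {
        if (now_ms(ctx) >= deadline)
            return false;
        /* In firmware, background polling (e.g. CheckAbort()) would go here. */
    }
    return true;
}

/* --- Simulated link for testing on a PC (hypothetical) --- */
typedef struct {
    uint64_t clock;     /* simulated milliseconds */
    int sent_len;       /* bytes acknowledged so far */
    int total_sent;     /* bytes queued */
    int acks_per_poll;  /* bytes acknowledged per poll (0 = stalled link) */
} fake_link;

/* Each clock read advances simulated time by 1 ms. */
static uint64_t fake_now(void *ctx)
{
    fake_link *l = ctx;
    return l->clock++;
}

/* Each poll acknowledges acks_per_poll bytes, then reports completion. */
static bool fake_done(void *ctx)
{
    fake_link *l = ctx;
    l->sent_len += l->acks_per_poll;
    if (l->sent_len > l->total_sent)
        l->sent_len = l->total_sent;
    return l->sent_len == l->total_sent;
}
```

A link acknowledging 10 bytes per poll finishes a 100-byte send well inside a 1000 ms budget, while a stalled link returns false once the simulated clock reaches its own deadline, regardless of what any global timer is doing.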

If you want to try a different fix, to double the investigation rate, please build a version with the attached (untested) code and see how it runs:

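void checksent(void *arg, int fn, int pcb){
   TCP_SERVER_T *state = (TCP_SERVER_T*)arg;
   int loopcount=1000000;   // give up after this many polls instead of relying on Timer4

   // wait until sent_len catches up with total_sent for this connection
   while(state->sent_len[pcb]!=state->total_sent[pcb]  && loopcount ){
       loopcount--;
       CheckAbort();
   }
   if(loopcount==0){        // timed out: close any open file, then the connection
       if(fn)ForceFileClose(fn);
       tcp_server_close(state, pcb) ;
       MMPrintString("Warning: LWIP send data timeout\r\n");
   }
}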
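To see how `checksent()` behaves without hardware, the firmware pieces it calls can be stubbed out on a PC. In the sketch below, `TCP_SERVER_T` is cut down to the two arrays the function reads plus two simulation-only fields, the `CheckAbort()` stub stands in for lwIP acknowledging a few bytes per poll, and `tcp_server_close()` just records which connection was closed; none of these match the real firmware definitions. The loop is the same as the posted one, with `ForceFileClose` dropped, a return value added, and the warning printed with John's connection number.

```c
#include <stdio.h>

#define MAX_CONN 4          /* hypothetical connection count */
#define LOOPCOUNT 1000000   /* same poll budget as the posted code */

/* Cut-down stand-in for the firmware's TCP_SERVER_T. */
typedef struct {
    int sent_len[MAX_CONN];    /* bytes acknowledged so far */
    int total_sent[MAX_CONN];  /* bytes queued for sending */
    int ack_rate[MAX_CONN];    /* simulation only: bytes acked per poll */
    int closed[MAX_CONN];      /* simulation only: set when closed */
} TCP_SERVER_T;

static TCP_SERVER_T *g_state;  /* lets the stub below reach the state */

/* Stub: simulates acknowledgements arriving for every connection. */
static void CheckAbort(void)
{
    for (int i = 0; i < MAX_CONN; i++) {
        g_state->sent_len[i] += g_state->ack_rate[i];
        if (g_state->sent_len[i] > g_state->total_sent[i])
            g_state->sent_len[i] = g_state->total_sent[i];
    }
}

/* Stub: records the close instead of touching lwIP. */
static void tcp_server_close(TCP_SERVER_T *state, int pcb)
{
    state->closed[pcb] = 1;
}

/* Same loop as the posted checksent(); returns 1 on timeout, 0 otherwise. */
int checksent(void *arg, int pcb)
{
    TCP_SERVER_T *state = (TCP_SERVER_T *)arg;
    int loopcount = LOOPCOUNT;

    while (state->sent_len[pcb] != state->total_sent[pcb] && loopcount) {
        loopcount--;
        CheckAbort();
    }
    if (loopcount == 0) {
        tcp_server_close(state, pcb);
        printf("Warning: LWIP send data timeout connection no. %d\r\n", pcb + 1);
        return 1;
    }
    return 0;
}
```

A healthy connection drains and stays open, while a stalled one is closed after the poll budget runs out without affecting the others. One quirk the stubs make visible: if the very last poll completes the send, `loopcount` is already 0 and the function still reports a timeout; testing `sent_len != total_sent` after the loop, rather than `loopcount`, would avoid that.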

Edited 2024-06-09 18:53 by matherp
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 228
Posted: 08:14pm 09 Jun 2024

Wow! I think you've nailed it  

I wouldn't have suspected the timer because I never saw the original timeout error. I also notice you slightly changed the while statement from
!(state->sent_len[pcb]==state->total_sent[pcb])

to
state->sent_len[pcb]!=state->total_sent[pcb]
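For what it's worth, those two spellings are equivalent in C: `!(x == y)` and `x != y` both yield 1 when the values differ and 0 when they are equal, for any integer operands. So the change in behaviour comes from the rest of the condition (`|| Timer4==0` replaced by `&& loopcount`) rather than from how the comparison is written. A quick exhaustive check over a small range:

```c
#include <stdbool.h>

/* Returns true if !(a == b) and a != b agree for every pair in [lo, hi].
 * Keep hi well below INT_MAX so the loop counters cannot overflow. */
bool spellings_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (!(a == b) != (a != b))
                return false;
    return true;
}
```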

which was where my suspicions pointed, though I don't have the smarts to understand it. The WM seemed to get 'confused' about which lengths it was comparing when there were a lot of connections open. Previously, if I clicked the browser's load/cancel/reload/cancel/reload (etc.) as fast as I could, the WM would go into meltdown and the watchdog would fire.
With your updated fix, I can do the same reload/cancel (etc.), but the WM seems able to keep track of what it's comparing, and the timeout warning now pops up as I'd expect.
The WM seems to sort through all the data, answer or close the connections in an orderly manner, and keep running. Perfect!

I've made a couple of small changes:
printf("Warning: LWIP send data timeout connection no. %d\r\n",pcb+1);
//MMPrintString("Warning: LWIP send data timeout\r\n");

...to see which connection has timed out, and then, in the ProcessWeb function:
//error("No response to request from connection no. %",i+1);
printf("No response to request from connection no. %d\r\n",i+1);

...only to avoid dropping back to the prompt when error() is called (maybe not a good idea, but at least it keeps the program running).
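The reason swapping `error()` for `printf()` keeps things running is, presumably, that an interpreter error unwinds the whole BASIC program back to the prompt, while `printf()` just logs and carries on. The sketch below models that with standard `setjmp`/`longjmp`; `fatal_error`, `process_request` and `run_connections` are hypothetical names, and the longjmp-to-prompt behaviour is an assumption about how MMBasic's `error()` works, not a copy of it.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf prompt;        /* hypothetical: where the prompt loop resumes */
static int requests_handled;  /* requests that ran to completion */

/* Hypothetical fatal handler: report, then unwind straight to the prompt. */
static void fatal_error(const char *msg, int conn)
{
    printf("Error: %s %d\r\n", msg, conn);
    longjmp(prompt, 1);
}

/* Simulated ProcessWeb step: connection index 1 never answers. */
static void process_request(int conn, int use_fatal)
{
    if (conn == 1) {
        if (use_fatal)
            fatal_error("No response to request from connection no.", conn + 1);
        else
            printf("No response to request from connection no. %d\r\n", conn + 1);
        return;
    }
    requests_handled++;
}

/* Serve connections 0..n-1; returns how many completed. */
int run_connections(int n, int use_fatal)
{
    requests_handled = 0;
    if (setjmp(prompt) != 0)
        return requests_handled;  /* arrived via longjmp: the rest is skipped */
    for (int i = 0; i < n; i++)
        process_request(i, use_fatal);
    return requests_handled;
}
```

With four connections, the log-and-continue path completes three of them, while the fatal path stops after the first and abandons the rest. The trade-off is that with `printf()` the BASIC program never learns the request failed.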

The other problem that popped up in my build was the dreaded "[CYW43] do_ioctl(2, 263, 16): timeout" issue, but I realised I was using RC3 source, so your 'fix' for that in RC4 was missing.

My early conclusion is that it's fixed, so I'd urge anyone else with mysterious watchdog problems to try the RC5 version and see what happens.
Thanks for looking into this, Peter. I'll keep you posted on the long-term results... and as a side note, going through your source code, I'm REALLY impressed by what you've done with the C/C++ code to make these little boards work!
Edited 2024-06-10 06:15 by Malibu
John
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 228
Posted: 05:32am 13 Jun 2024

So, a quick update on the proceedings...
Using the code Peter asked me to try on my RC3 build, with only a small modification, it has been pretty stable so far.
It's been running for approximately 70 hours (probably more if I include a few coding sessions). I've had a few faults along the way, mainly:
Internal error in tcp_server_recv - Attempting recovery
Warning: LWIP send data timeout connection no. 1

all of which were handled in code and were recoverable, plus a single one each of:
do_ioctl: got unexpected packet
[CYW43] do_ioctl(2, 263, 16): timeout

... that stopped the firmware and froze the heartbeat LED. I suspect that's because I don't have Peter's RC4 'fix' for that problem.
That's across 700+ external HTTP requests to the WebMite.

Last night there was a watchdog reset with no obvious cause, but that's been the only WD reset so far. As it's the only one, I don't think it's really a problem, or at least it's an acceptable rate.

I suspect the troubles might be solved. (Note that it's the C code above I used, not RC5: RC5, which has similar code to RC3, still makes it easy to get the WD to fire, which tells me it may not have been Timer4.)

Anyway, just a quick update, but I'll keep testing and see how it gets on for a few more days.
Edited 2024-06-13 15:34 by Malibu
John
 