Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.
|
Forum Index : Microcontroller and PC projects : Error : [CYW43] do_ioctl(2, 263, 16): timeout
Page 1 of 2 | |||||
Author | Message | ||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
I've been chasing the above problem on a WebMite WiFi connection for a few months now, trying to figure out what's going on and the meaning behind it. It seems there's a few other people on the forum with the same problem. My brain has gone mushy over it, and after about my 700th theory of what's going on (Well, maybe not 700, but it sure seems like it!) I'll spare everyone the theories, guesses and stabs-in-the-dark I've done on this, but I've got an idea that might be plausible (Every other idea never seemed to 'feel' right, and they ended up being dead-ends... this one 'feels' right) So, here's a screen grab of a packet I sometimes get from the WWW and is processed by the WebMite, answered with a standard 404 response. ...and a few days later... Exactly the same packet, but something I hadn't seen before is the [NUL] characters in it... The top line is the PRINT statement to send the string to MMCC - it's not part of the packet. (note that this packet didn't trip up the WebMite, but you can see that the Process Time is 8500mS and 5500mS seconds. The process time printed is how long the WebInterrupt code takes to process that particular packet, from first entry to sending the 404 reply. Most packets take around 500mS) So, I did a search on [NUL] in HTTP code and came up with this text - Other sites say that a [NUL] char can produce 'unpredictable results' (Exactly what I seem to be having) Which started me on theory #701, and something I'm testing at the moment. I'll need to wait a few days running the current code until the error happens and figure out what came through, then modify the code to try and trap any [NUL] chars to see if that fixes anything... Probably a week or so for testing, so in the meanwhile, I'm looking for some clarity of thought from others here... Is my theory #701 a valid idea? Am I missing something obvious? Any other suggestions? Ideas? John |
||||
matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 9111 |
John This sounds like a VERY promising avenue. I've been running Geoff's watering controller now for many weeks and sometimes it runs for 2 or 3 days without interruption, other times it may reset twice or three times a day. I've added mm.info(uptime) to the main html page so I can easily see how long it has been running. It definitely feels like something external is triggering the resets and NUL characters in code that is doing string handling are clearly bad news for C. Where in the firmware are you looking to correct the spurious NUL characters? Edited 2024-04-27 17:36 by matherp |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
Thanks for the sanity check, it's been sending me bonkers! I've also run Geoff's controller at times (not actually controlling anything) and it gets the occasional hiccup that mostly triggers a watchdog reset. If it sheds more light on the subject... For my testing as it is right now, I wrote some VB code to send a string (not html code, just a string "Hits " + an increasing number) that is answered with the standard 404. Right now, it has run 10 hours today with 11382 packets sent, at random times (0.6 to 6 seconds time span) with no problems. I'll leave it run overnight and check tomorrow. From the public WWW, I've had 25 correctly formatted hits, printed to MMCC, also with no problems. I won't know the true result until I check the saved log on the WebMite and decipher any spurious characters that might come up. If you want, I can post the resulting file if it helps... Agreed. The ONLY time that there's a problem, it comes from the WWW (ie: external sources) and the NUL char is the only thing I can point my finger at. I also read a few times that NUL chars are a problem for strings in C language. I don't know, I have NO idea how C works (My hat's off to you with what you can do in C) I was just planning to trap the NUL chars in the MMbasic routine itself. Maybe replace the chars with a space or other printable char, but more likely just reject it all together. I would have no idea where to start in modifying the firmware, so that's off the table for me As a side note, probably related... I tried using UDP after the frustrations of TCP and managed to send 65000+ VB messages perfectly. I don't know if it's related, but might be a clue. John |
||||
matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 9111 |
John: Please try this version - it should replace any NUL characters with space PicoMiteWeb.zip If you ever get the error message "Error: WTF" please let me know. It is there to trap something that should never happen but could be related Edited 2024-04-27 18:49 by matherp |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
Thanks Peter, I've flashed the WebMite just now and set everything exactly the same as previous. I'll run it for as long as possible and see where it leads... I can confirm that the 'test' NULs sent were replaced with SPACEs This morning, 30 mins or so after I got up, the MM code received a packet the same as I posted above. The heartbeat was still winking (so the code was running), but the VB code was caught in a loop here ...which in my code, tells me that the previous packet sent was never replied to, and so the winsock wasn't closed (7 means Socket Connected) That would make sense if the WM was stuck in a bottle-neck trying to respond to the spurious packet prior. (Edit: Probably not... It was because I didn't do a whole lot of error trapping in the VB code and ran into a conflict. My fault... Me bad...) Anyway, I'll keep you posted - Thanks for the great work on the M.Mites Out of curiosity, what conditions would result in my most common expression lately - "Error: WTF"? Edited 2024-04-28 16:19 by Malibu John |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
Sadly, [NUL] turned out to be another dead-end (At least as far as the timeout error is concerned) I was running as usual, doing some unrelated fiddling on the computer listening to the click of the relay going on and off (So I know there's a new http request come in) I had this packet come in that caught my eye - but the WebMite kept running without a worry Maybe... 3 minutes later, there was silence from the relay which usually spells trouble. Sure enough, the heartbeat was off, no response to the VB packets and nothing new showing on the MMCC screen. (The timeout error wasn't showing either) Out of curiosity, I clicked the "Run" button on the MMCC and the heartbeat LED came on, and the "[CYW43] do_ioctl(2, 263, 16): timeout" message appeared on the screen. Click Run again, the led went off and the error message showed again... Click again... led went on, error showed... ... and so on for the 70'ish times I clicked the run button... Every time I would click run, the heartbeat led changes state and the error is printed. I checked the saved logs and there was no indication of any problems. I ended up re-powering the WM and now it's up and working again, so I'm back to square one. {I did manage to work out how to fire the watchdog timer by sending http packets. I'll need to churn it around in my head for a bit and I'll post soon} John |
||||
led-bloon Senior Member Joined: 21/12/2014 Location: AustraliaPosts: 205 |
Google mstshash - there are plenty of hits for it, like this: mstshash It would appear to be a common problem. Perhaps dispose of these type packets early and see if that fixes it? led Edited 2024-04-29 16:48 by led-bloon Miss you George |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
G'day led, Interesting link - pretty much what I thought the packet would be; some sort of bot. A plan I have is to add a function to check if there's any chars under Chr(32) to reject the packet. I think the problem is deeper than that though, but it's another option I haven't tried yet Yeah, I'm pretty much in the same school there. I would have thought with all the ISP's passing along the packet data over WWW, at least one of them would think 'This is rubbish... I'll delete it' Right now, I've jumped down the rabbit hole and looking into the CYW43 driver C-Code. (Just because I still have a few loose hairs left on my head, so I may as well lose them too ) John |
||||
matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 9111 |
This is very interesting. The timeout message is part of the Pico SDK and uses printf rather than the normal MMBasic text output routines and it looks like printf is blocking. I've changed the printf to a normal non-blocking message. Please test and let's see what happens PicoMiteWeb.zip |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
Thanks, I've just nuked and flashed the WebMite Coincidentally, as I was reading your post, the WM froze and had the same symptoms as above. That was after 11610 hits from the VB program. [Edit] Which tells me that it's probably not limited only to WWW packets. I'm running on TCP port 1111, so anything on port 80, doesn't get forwarded. I'll check this new version in the AM and report back in the PM tomorrow (ugggh! it's a work day tomorrow ) Edited 2024-04-29 17:59 by Malibu John |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
So, the latest update... I re-wrote the vb code because the error trapping in the previous version was a bit non-existent. It's much better now I ran the vb code and the WM code tonight when I got home from work. Here's the results: 12013 hits to the WM from VB code at 500mS intervals over approximately 2 1/2 hours until it trapped a fault. The VB code process list as printed to a text box that shows each stage as it happens (Top line on the right is start time, bottom line is the fault time, and 'Number' is the WinSock error number...) - This is the print to MMCC... Which shows that the WM received the data, but couldn't (or wasn't able to) respond. The VB connection is/was still open, so it was waiting for a reply, but for some reason the WM got stuck inside a brain-freeze and couldn't send the data out. I'll triple check all my code, but I'm pretty sure it's pretty much bullet-proof There's no sign of the '[CYW43] do_ioctl(2, 263, 16): timeout' as yet, but this could be part of the symptoms... and, as it is right now, the output relay is ON (it gets turned on at interrupt entry and turned off at exit time) and the heartbeat LED is on solid. If I hit ^C then run, run, run... the heartbeat changes state on each 'run'. I have also disabled the Watchdog, so when a fault occurs, it freezes up. Otherwise this condition would result in firing the WD timer. I will try and refine the vb code tomorrow and see if I can squeeze another byte of information out of the data/info. Any thoughts? John |
||||
OA47 Guru Joined: 11/04/2012 Location: AustraliaPosts: 926 |
I have a project that is crashing at least twice a day. The program locks up without error message until a key is pressed then I get the [CYW43] do_ioctl(2, 263, 16): timeout and a power reset is the only way I can get the program running again. I have a 5 second watchdog running in the main loop and ON ERROR SKIP prior to the WEB calls. Is there any other error trapping I can do to keep the program running? 0A47 |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6098 |
I have been running Peters latest beta on a test system. It locks up and does a reboot most days. My system that is doing real work is back to an early beta and is very reliable. The method I find that gives the best service is twofold. 1 Run the hardware watchdog. 2 put a loop counter in the main program loop. The counter is reset during a normal interrupt. If the counter gets too high, the interrupts are not triggering so do a CPO restart. DO IF tickTime THEN dotick ' reset the loop counter and do the background work here k$ = INKEY$ IF k$ <>"" THEN SELECT CASE k$ CASE "L","l" ptt_ok = 1-ptt_ok PIN(ptt_inhibit) = ptt_ok CASE "X","x" OPTION AUTORUN OFF WATCHDOG HW OFF syslog "Quit" Quit = 1 END SELECT ENDIF INC loopCount IF loopCount > 15000 THEN syslog " Loop failed - RESTARTING" CPU RESTART ENDIF IF checkNTP THEN updateNTP LOOP UNTIL Quit The theory is If the CPU timers are running, the watchdog will detect a stalled program If the timers have stopped, the loop keeps going without any interrupts so the counter gets to the panic setting and reboots. This requires NO pause statements in the main loop. Jim Edited 2024-05-01 13:45 by TassyJim VK7JH MMedit MMBasic Help |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
Jim, AO47 - thanks for your input I currently have the watchdog turned off because I want to freeze the MMCC screen (and WM) where the problem happens - I'm trying to avoid a reset and keep an 'as-is' state. There's a few times where a problem happens where the heartbeat stops. If the WD fires, it can sometimes clear it and the HB starts again, so I miss the lock-up conditions. Normally I'll have the HB at 20 seconds. For MOST times, a full power reset is the only way to do a 'clean' recovery - By clean, I mean everything goes back to as it should be. For example, it was locked up when I got home from work with the dreaded timeout message, so (long story short) I did ^C and clicked run and everything started working again. Regular heartbeat, relay clicking on and off. Hit ^C, the WM went to a 'normal' stopped state, but the heartbeat was not regular. on-off-on-off-on-on-off. After a full repower, it's happy again and doing exactly as intended. I'm stumped and I'm out of ideas. I've looked at the C-code source for the CYW43 driver and found where the message probably gets fired, but knowing nothing about C, I struggle with the workings of it. This here - CYW43_WARN("do_ioctl(%u, %u, %u): timeout\n", (unsigned int)kind, (unsigned int)cmd, (unsigned int)len); There's a lot of discussion out there on the timeout problem and most people follow it up with a "do_ioctl: got unexpected packet" error as well - which is something I haven't seen. (I have seen a few "Internal error in tcp_server_recv - Attempting recovery", but they seem to be handled OK, I suspect by MMbasic and not the Pico SDK. There's too many symptoms, at irregular times resulting in unpredictable behaviour for me to have a "GOTCHYA" moment... I dunno, I think I'm beaten on this one Edit: And to shed some observations on the watchdog firing, I can fire it at my pleasure. If a http request gets sent to the WM from VB, the interrupt triggers. IF I close the vb down before the interrupt sub has finished (ie: connection is lost) the WM has nowhere to reply to, so sits in a kind of 'limbo' waiting for a connection that is listening. It will fire the watchdog, 100% sure of it. So the questions is, what happens when a legitimate WWW request comes in and drops out during the interrupt routine? Edited 2024-05-01 16:06 by Malibu John |
||||
OA47 Guru Joined: 11/04/2012 Location: AustraliaPosts: 926 |
Thanks Jim, Out of curiosity which version is this? 0A47 |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6098 |
I don't push it too hard and leave the important things to non-webmite devices. WebMite MMBasic Version 5.07.08b10 OPTION SYSTEM SPI GP10,GP11,GP12 OPTION CPUSPEED (KHz) 250000 OPTION LCDPANEL VIRTUAL_C OPTION WIFI xxxxxx, ***********, RTVmon OPTION TCP SERVER PORT 80, 10000 OPTION TELNET CONSOLE ON OPTION SDCARD GP13 > and just in case you don't have a copy WebMiteV5.07.08b10.zip It is old so missing a lot but works for my uses. Jim VK7JH MMedit MMBasic Help |
||||
Malibu Senior Member Joined: 07/07/2018 Location: AustraliaPosts: 228 |
Sorry Jim, a silly question - what's the 10000 in this line? The manual says... Launches a TCP server on port 'n' during every restart of the WebMite. Typically HTTP servers use port 80. USE "OPTION TCP SERVER PORT 0" to disable When the server is running it can respond to up to MM.INFO(MAX CONNECTIONS) The only parameter have found is for the actual port number (80 for a web server) or did I miss something (As I usually do)? John |
||||
Pluto Guru Joined: 09/06/2017 Location: FinlandPosts: 357 |
Found this in my own notes, couldn't however find it on the thebackshed: OPTION TCP SERVER PORT n, timeout Pluto |
||||
JohnS Guru Joined: 18/11/2011 Location: United KingdomPosts: 3801 |
hmm, I found a quote here Other than that? John |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6098 |
Pluto is correct. It is a timeout which defaults to 5 seconds (5000mS) You can set it between 1 and 20 seconds. If you are serving up large images a longer timeout is needed. I can't find the reference to it but it will be here somewhere in the webmite beta threads. Something that I missed when checking the manuals. Jim VK7JH MMedit MMBasic Help |
||||
Page 1 of 2 |
Print this page |