Forum Index : Microcontroller and PC projects : WEBmite - The day time stood still
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
WEBmite - The day time stood still

Not a game but a mysterious bug which has consumed the last two weeks of my time.

I have deployed a WEBmite to monitor a radio repeater that lives in my garage. VK7RTV for those that way inclined. The WEBmite works well, most of the time. Occasionally the TIME stops advancing and my SETTICK no longer fires. The WATCHDOG no longer triggers either. The main program continues to function.

I have tried changing the power supply from a switch-mode to linear, both with plenty of grunt. I have discounted any RF interference. I have another WEBmite running as a test on my desk and it also has the same problem.

My latest thought was that it's temperature sensitive, as we had a cold snap and the garage where the main unit resides is not heated. I tried running at maximum clock speed and enclosed it in an insulated package to raise the cpu temperature. Currently idling at 32 degrees, up from ~20 a few days ago. I think the temperature has some bearing but it is not the whole answer. The data sheet reckons the pico is good for -20 and more recently -40.

To monitor the bug, I count the number of main loops for each second and raise a panic if it becomes obvious that the clock has stopped.

I did have this problem some time ago but I can't find out what version of MMBasic we had last time. I thought the problem had been fixed but it might just have been summer arriving.

What can cause some of the internal clocks to fail?

Test system options: I have tried turning all but the essential options off with no change.

OPTION LIST
WebMite MMBasic Version 5.07.08b6
OPTION SYSTEM SPI GP10,GP11,GP12
OPTION AUTORUN ON
OPTION CPUSPEED (KHz) 250000
OPTION LCDPANEL VIRTUAL_C
OPTION WIFI ******, ********, PICO*******
OPTION TCP SERVER PORT 80, 1000
OPTION UDP SERVER PORT 6802
OPTION TELNET CONSOLE ON
OPTION SDCARD GP13

Both my main system and the test machine on my desk failed once last night, at different times.
Test code:

' RTV monitor kw\WEBmite
OPTION EXPLICIT
OPTION DEFAULT INTEGER
OPTION AUTORUN ON
DIM ticktime, loopcount, cputemp!
DIM startTime$, olddate$
SETTICK 1000, tick
WATCHDOG 12000
ON ERROR SKIP 2
WEB NTP 10, "10.1.1.52"
'WEB tcp interrupt gotrequest
IF olddate$ <> DATE$ THEN
  ON ERROR SKIP
  WEB NTP 10, "10.1.1.52"
ENDIF
syslog "Started"
startTime$ = TIME$
PRINT CHR$(27)+"[2J"
dotick
DO
  IF ticktime THEN dotick
  INC loopcount
  IF (loopcount MOD 60) = 0 THEN cputemp! = PIN(temp)
  IF (loopcount MOD 8000) =0 THEN
    syslog "Loop failed "
    PRINT "Loop failed at ";TIME$
  ENDIF
  IF LOOPCOUNT > 24000 THEN
    syslog "Loop failed "+STR$(cputemp!,3,1)
    CPU RESTART
  ENDIF
LOOP

SUB tick
  ticktime = 1
END SUB

SUB dotick
  ticktime = 0
  WATCHDOG 12000
  PRINT CHR$(27)+"["+STR$(0)+";"+STR$(0)+"H"
  PRINT TIME$;" ";DATE$; " Started at ";startTime$
  PRINT "Loop = ";loopcount;" Temp= ";STR$(cputemp!,3,1)
  loopcount = 0
END SUB

SUB syslog txt$
  OPEN "syslog.txt" FOR APPEND AS #1
  PRINT #1, DATE$+" "+TIME$+" "+txt$
  CLOSE #1
END SUB

Jim
Edited 2023-06-30 07:35 by TassyJim
VK7JH MMedit MMBasic Help |
||||
damos Regular Member Joined: 15/04/2016 Location: AustraliaPosts: 63 |
Has anyone determined the radio range of the WebMites? Without an external antenna I am not expecting much, but I wonder whether basic tricks, like mounting the WebMite in an enclosure where the RF part just happens to be at the focal point of a large stainless cooking bowl, are effective. How do they compare to the cheap UHF modules for range?

My brother has a 40 acre property and I am thinking of using them as an option to control various solar powered devices on the property.

I believe the Pi people are also talking about releasing an Ethernet version, which will be good as it means Power over Ethernet will be a good option for this sort of thing. In the meantime this isn't a huge issue, as there are Power over Ethernet to serial adapters that can be used. |
||||
DrifterNL Regular Member Joined: 27/09/2018 Location: NetherlandsPosts: 58 |
@TassyJim
I had something similar happen. While testing software I wanted to set the time manually, accidentally entered a wrong value and pressed Enter. I got the error message and the pico seemed to lock up; even the heartbeat LED stopped. The software has a watchdog timer. I waited a bit and then pressed a random key on my keyboard and the pico came back to life, except that the heartbeat LED stayed off and the time stayed locked up. I power cycled the pico and it worked again.

edit: WebMite V5.07.07

@damos
You would probably be better off using a few HC-12 modules with correct antennas.

Edited 2023-07-01 21:31 by DrifterNL
Floating Point Keeps Sinking Me! Back To Integer So I Don't Get Injured. |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
I am glad that I am not the only one who has seen the strange clock stopping situation. I put a small heater under the pico and that brought the cpu temperature up to 50 degrees. Last 24 hours had 9 failures. So I conclude that it was not temperature related. Running out of ideas to test. Jim VK7JH MMedit MMBasic Help |
||||
Plasmamac Guru Joined: 31/01/2019 Location: GermanyPosts: 554 |
I will try it tonight, @Jim. Plasma |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
My latest thought is to try to minimise the number of calls to functions that briefly stop the timer ticks. These are TIME$ and DATE$, as well as PULSE, if I am reading the source code correctly.

I will also change my test board to make much greater use of the TIME related functions, in an attempt to present some code that fails more frequently. It is not very easy to chase bugs that only appear rarely.

My reasoning is that the network features running in the background may be able to interrupt an MMBasic command that has paused the timer, and so prevent the timer from being turned back on. I will not be surprised if I have to scrap that thought after a few more days of testing.

Jim
VK7JH MMedit MMBasic Help |
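As a rough sketch of what minimising those calls could look like: read TIME$ and DATE$ once per SETTICK period into variables and reuse the cached strings everywhere else. The names now$, today$ and refreshClock are hypothetical and are not taken from the program above. This only reduces how often the clock functions are called; the loop-count check from the test code is still needed to detect a stopped tick.

```basic
' Sketch only: cache the clock strings so TIME$ and DATE$ are
' each read once per second instead of on every use.
OPTION EXPLICIT
OPTION DEFAULT INTEGER
DIM ticktime
DIM now$, today$            ' hypothetical cache variables

SETTICK 1000, tick
refreshClock                ' fill the cache before the first tick arrives

DO
  IF ticktime THEN
    ticktime = 0
    refreshClock            ' the only place TIME$ and DATE$ are read
    PRINT now$; " "; today$
  ENDIF
  ' ... the rest of the main loop uses now$ and today$ ...
LOOP

SUB tick
  ticktime = 1
END SUB

SUB refreshClock
  now$ = TIME$
  today$ = DATE$
END SUB
```

If the failure rate drops with this change, that would support the idea that the clock functions are involved; if it does not, the calls are probably not the trigger.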
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
This code failed 3 times in its first hour. I blame it on the excessive use of TIME$ and DATE$.

' RTV monitor kw\WEBmite
OPTION EXPLICIT
OPTION DEFAULT INTEGER
OPTION AUTORUN ON
DIM ticktime, loopcount, cputemp!
DIM startTime$, olddate$, dummy$
SETTICK 1000, tick
WATCHDOG 12000
ON ERROR SKIP 2
WEB NTP 10, "10.1.1.52"
'WEB tcp interrupt gotrequest
IF olddate$ <> DATE$ THEN
  ON ERROR SKIP
  WEB NTP 10, "10.1.1.52"
ENDIF
syslog "Started"
startTime$ = TIME$
PRINT CHR$(27)+"[2J"
dotick
DO
  IF ticktime THEN dotick
  INC loopcount
  IF LOOPCOUNT >15000 THEN
    syslog "Loop failed "+STR$(PIN(temp),3,1)
    CPU RESTART
  ENDIF
  dummy$=MID$(TIME$,1,6)+MID$(DATE$,1,5)+TIME$+DATE$
LOOP

SUB tick
  ticktime = 1
END SUB

SUB dotick
  ticktime = 0
  WATCHDOG 12000
  PRINT CHR$(27)+"["+STR$(0)+";"+STR$(0)+"H"
  PRINT TIME$;" ";DATE$; " Started at ";startTime$
  PRINT "Loop = ";loopcount;" Temp= ";STR$(PIN(temp),3,1)
  loopcount = 0
END SUB

SUB syslog txt$
  OPEN "syslog.txt" FOR APPEND AS #1
  PRINT #1, DATE$+" "+TIME$+" "+txt$
  CLOSE #1
END SUB

Any OPTIONs should be OK to see the bug, but these are the ones I am using:

OPTION LIST
WebMite MMBasic Version 5.07.08b6
OPTION SYSTEM SPI GP10,GP11,GP12
OPTION AUTORUN ON
OPTION CPUSPEED (KHz) 250000
OPTION LCDPANEL VIRTUAL_C
OPTION WIFI ******, ********, PICO*******
OPTION TCP SERVER PORT 80, 1000
OPTION UDP SERVER PORT 6802
OPTION TELNET CONSOLE ON
OPTION SDCARD GP13

Jim
VK7JH MMedit MMBasic Help |
||||
NPHighview Senior Member Joined: 02/09/2020 Location: United StatesPosts: 200 |
Not sure that TIME$ and DATE$ are the source of your problem. My analog clock program (on GitHub) uses TIME$ at least once per second, updates the clock once an hour from the NTP server, and uses DATE$ at least once an hour. No crashes, even over some weeks.

Live in the Future. It's Just Starting Now! |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
Not the source of the problem, but an easy way to work the pico in a way that demonstrates the problem.

One big difference between your program and mine is that I use SETTICK and you use SYNC. I haven't played with SYNC and am not sure if it slows down the response to web requests too much. I will have to do some experimenting, but I would like to get to the cause of the lockups first.

Jim
VK7JH MMedit MMBasic Help |
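For comparison with the SETTICK approach, a minimal sketch of pacing a loop with SYNC. This is an illustration only, not NPHighview's program. It assumes the form described in the PicoMite manual as I understand it: SYNC with a time argument (in microseconds by default) sets up a repeating clock, and a bare SYNC waits for the current period to expire. Check the manual for the exact argument forms before relying on it.

```basic
' Sketch only: pace the main loop with SYNC instead of SETTICK.
SYNC 1000000              ' assumed: set up a repeating 1 second clock (microseconds)
DO
  PRINT TIME$; " "; DATE$ ' work done once per period
  SYNC                    ' wait here until the current period expires
LOOP
```

Because the bare SYNC blocks until the period expires, anything the program itself has to do for a web request would wait until the next pass, which is the concern about response time raised above. A shorter period would reduce that delay at the cost of running the loop body more often.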
||||
Andrew_G Guru Joined: 18/10/2016 Location: AustraliaPosts: 847 |
Hi Jim, I've come late to the party but I can't see where 'olddate$' is set. It looks to me that it will call NTP every cycle? (Edit: No it doesn't but is it doing what you want?) Cheers, Andrew Edited 2023-07-03 09:03 by Andrew_G |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
No it is not doing anything useful. It does do what it's supposed to do in the 'real' program. This is just a cut down version to force an error with the timer stopping. It is much easier to debug if the bug happens frequently instead of once or twice a day. Jim VK7JH MMedit MMBasic Help |
||||
Andrew_G Guru Joined: 18/10/2016 Location: AustraliaPosts: 847 |
Yep - understood. Andrew |
||||
TheMonkeys Regular Member Joined: 15/12/2022 Location: AustraliaPosts: 59 |
I've noticed the same thing. I am currently on V5.070806, and I think the problem started when I updated from 5.070805.

It's not just a Date$/Time$ thing. I have my own heartbeat that utilises Timer, and I spotted that it stopped working, too.

Dim integer rd,gn,bl ' Cortex
Dim integer cort=0 ' first LED on the RGB LED bar
main:
rd=Val(Field$(Time$,1,":"))*5: gn=Val(Field$(Time$,2,":"))*2: bl=Val(Field$(Time$,3,":"))*2 ' Time of Day
If (Timer Mod 1000) < 100 Then ' Cortex Stable
  RGBled(cort,rd,gn,bl) ' H,M,S
Else
  RGBled(cort,rd/8,gn/8,bl/8) ' h,m,s
EndIf
...some other code
goto main

RGBled is a sub that wraps Bitbang ws2812. The LED stopped blinking. The above code should still blink the LED, even if the "clock" stops.

Cheers, Chris. |
||||
TheMonkeys Regular Member Joined: 15/12/2022 Location: AustraliaPosts: 59 |
A followup: I set up a logging script on my machine to track the failures.

From the log: it hung at around 02:50-ish

08-07-2023 02:50:08 up for 16h 17m 15s

From the console:

Watchdog timeout
PICOE6614103E71 connecting to WiFi...
Connected 192.168.0.102
Starting TCP server at 192.168.0.102 on port 2040
ntp address 162.159.200.123
got ntp response: 08/07/2023 04:09:26
Legion is GO!!!
WebmiteOmnibusWebserver 12 1.6.1 07 Jul 2023 11:20
04:10:02 1:parsing clock.pi 0.23s

Secondly: it froze @ ~ 09:55

09:40:09 1:parsing clock.pi 0.23s
PICOE6614103E71 connecting to WiFi...
Connected 192.168.0.102
Starting TCP server at 192.168.0.102 on port 2040
ntp address 162.159.200.123
got ntp response: 08/07/2023 09:56:53
Legion is GO!!!
WebmiteOmnibusWebserver 12 1.6.1 07 Jul 2023 11:20
09:57:11 1:parsing state.pi 0.23s

By "froze" I mean that it stopped responding. The heartbeats (both mine and the real one) both froze. The restart was achieved by unplugging the power.

My thought is that the WiFi chip locks up, and this has some effect on the clock/timer (Date$, Time$ and Timer). Potentially, the clock slows down rather than stopping, as the 04:09 watchdog (which happened while I was asleep) took about two hours, rather than the 13 seconds to which it was set, to kick in.

Hope this helps, Chris. |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
With my setup, MMBasic V5.07.08b4 works well and passes all my testing without any issues. b5 and later all fail with loss of heartbeat and other clocks. This is when the only item configured is the WiFi connection.

I can continue with my deployment using beta 4 but will keep trying to track down the culprit on my test unit.

Jim
VK7JH MMedit MMBasic Help |
||||
Mixtel90 Guru Joined: 05/10/2019 Location: United KingdomPosts: 6798 |
Could this be a cpu speed issue? Is it being overclocked? If so, all bets are off. Mick Zilog Inside! nascom.info for Nascom & Gemini Preliminary MMBasic docs & my PCB designs |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
I have spent the last 2+ weeks testing. Tried all firmware versions from a few 5.07.07 betas to the latest 5.07.08 beta. 5.07.08b4 and earlier work, later ones fail. Tried all speeds, 3 different picos, 2 different routers, numerous versions of the test basic code, numerous OPTION settings. Always doing a full reset and clean-out between firmware versions. I would love to be proved wrong, but for now, I will stay on beta 4. Jim VK7JH MMedit MMBasic Help |
||||
matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 9129 |
Jim Please try the attached. I've reverted to the older sdk. The update may have happened around b4 PicoMiteWeb.zip |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
Thanks Peter, Up and running. I will let you know how it goes over the next few hours. Jim VK7JH MMedit MMBasic Help |
||||
TassyJim Guru Joined: 07/08/2011 Location: AustraliaPosts: 6100 |
Looking good after 2 hours. Let's see what overnight brings us.

Jim
VK7JH MMedit MMBasic Help |
||||