TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 09:34pm 29 Jun 2023

WEBmite - The day time stood still

Not a game but a mysterious bug which has consumed the last two weeks of my time.

I have deployed a WEBmite to monitor a radio repeater that lives in my garage. VK7RTV for those that way inclined.
The WEBmite works well, most of the time.
Occasionally the TIME stops advancing and my SETTICK no longer fires.
The WATCHDOG no longer triggers either.
The main program continues to function.

I have tried changing the power supply from a switch-mode to linear, both with plenty of grunt.
I have discounted any RF interference.
I have another WEBmite running as a test on my desk and it also has the same problem.

My latest thought was it's temperature sensitive as we had a cold snap and the garage where the main unit resides is not heated.

I tried running at maximum clock speed and enclosed it in an insulated package to raise the cpu temperature. Currently idling at 32 degrees, up from ~20 a few days ago.

I think the temperature has some bearing but it is not the whole answer. The data sheet reckons the pico is good for -20 and more recently -40.

To monitor the bug, I count the number of main loops for each second and raise a panic if it becomes obvious that the clock has stopped.

I did have this problem some time ago but I can't find out what version of MMBasic we had last time.
I thought the problem had been fixed but it might just have been summer arriving.

What can cause some of the internal clocks to fail?


Test system options:
I have tried turning all but the essential options off with no change.
OPTION LIST
WebMite MMBasic Version 5.07.08b6
OPTION SYSTEM SPI GP10,GP11,GP12
OPTION AUTORUN ON
OPTION CPUSPEED (KHz) 250000
OPTION LCDPANEL VIRTUAL_C
OPTION WIFI ******, ********, PICO*******
OPTION TCP SERVER PORT 80, 1000
OPTION UDP SERVER PORT 6802
OPTION TELNET CONSOLE ON
OPTION SDCARD GP13


Both my main system and the test machine on my desk failed once last night. At different times.

Test code:
 ' RTV monitor kw\WEBmite
 OPTION EXPLICIT
 OPTION DEFAULT INTEGER
 OPTION AUTORUN ON
 
 DIM ticktime, loopcount, cputemp!
 DIM startTime$, olddate$
 
 SETTICK 1000, tick
 WATCHDOG 12000
 ON ERROR SKIP 2
 WEB NTP 10, "10.1.1.52"
 'WEB tcp interrupt gotrequest
 
 IF olddate$ <> DATE$ THEN
   ON ERROR SKIP
   WEB NTP 10, "10.1.1.52"
 ENDIF
 syslog "Started"
 startTime$ = TIME$
 PRINT CHR$(27)+"[2J"
 dotick
 DO
   IF ticktime THEN dotick
   INC loopcount
   IF (loopcount MOD 60) = 0 THEN cputemp! = PIN(temp)
   IF (loopcount MOD 8000) =0 THEN
     syslog "Loop failed "
     PRINT "Loop failed at ";TIME$
   ENDIF
   IF LOOPCOUNT > 24000 THEN
     syslog "Loop failed "+STR$(cputemp!,3,1)
     CPU RESTART
   ENDIF
 LOOP
 
SUB tick
 ticktime = 1
END SUB
 
SUB dotick
 ticktime = 0
 WATCHDOG 12000
 PRINT CHR$(27)+"["+STR$(0)+";"+STR$(0)+"H"
 PRINT TIME$;"  ";DATE$; "  Started at  ";startTime$
 PRINT "Loop = ";loopcount;"  Temp= ";STR$(cputemp!,3,1)
 loopcount = 0
END SUB
 
SUB syslog txt$
 OPEN "syslog.txt" FOR APPEND AS #1
 PRINT #1, DATE$+" "+TIME$+" "+txt$
 CLOSE #1
END SUB


Jim
Edited 2023-06-30 07:35 by TassyJim
VK7JH
MMedit   MMBasic Help
 
damos
Regular Member

Joined: 15/04/2016
Location: Australia
Posts: 63
Posted: 11:36pm 29 Jun 2023

Has anyone determined the radio range of the webmites? Without an external antenna I am not expecting much, but I wonder whether basic tricks like mounting the webmite in an enclosure where the RF part just happens to be at the focal point of a large stainless cooking bowl are effective. How do they compare to the cheap UHF modules for range?

My brother has a 40 acre property and I am thinking of using them as an option to control various solar powered devices on the property.

I believe the Pi people are also talking about releasing an Ethernet version, which will be good as it means Power over Ethernet will be a good option for this sort of thing. In the meantime this isn't a huge issue, as there are Power over Ethernet to serial adapters that can be used.
 
DrifterNL

Regular Member

Joined: 27/09/2018
Location: Netherlands
Posts: 58
Posted: 11:29am 01 Jul 2023

@TassyJim

I had something similar happen.
While testing software I wanted to manually set the time and accidentally entered a wrong value and pressed enter.
I got the error message and the pico seemed to lock up, even the heartbeat led stopped.
The software has a watchdog timer.
I waited a bit and then randomly pressed a key on my keyboard and the pico came back to life, except...
the heartbeat led stayed off and the time stayed locked up.
I power cycled the pico and it worked again.
edit: WebMite V5.07.07

@damos

You would probably be better off using a few HC-12 with correct antennas.
Edited 2023-07-01 21:31 by DrifterNL
Floating Point Keeps Sinking Me!
Back To Integer So I Don't Get Injured.
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 09:27pm 01 Jul 2023

I am glad that I am not the only one who has seen the strange clock stopping situation.

I put a small heater under the pico and that brought the cpu temperature up to 50 degrees.
Last 24 hours had 9 failures. So I conclude that it was not temperature related.

Running out of ideas to test.

Jim
VK7JH
MMedit   MMBasic Help
 
Plasmamac

Guru

Joined: 31/01/2019
Location: Germany
Posts: 554
Posted: 12:42am 02 Jul 2023

try it this night @Jim
Plasma
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 05:24am 02 Jul 2023

My latest thought is to try and minimise the number of calls to functions that briefly stop the timer ticks.
These are TIME$ and DATE$, as well as PULSE, if I am reading the source code correctly.

I will also change my test board to make much greater use of the TIME related functions in an attempt to present some code that fails more frequently.
It is not very easy to chase bugs that only appear rarely.

My reasoning is that I think it is possible for the network features running in the background to interrupt an MMBasic command that has paused the timer, and so prevent the timer from being turned back on.

I will not be surprised if I have to scrap that thought after a few days more testing.

Jim
VK7JH
MMedit   MMBasic Help
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 06:53am 02 Jul 2023

This code failed 3 times in its first hour.
I blame it on the excessive use of TIME$ and DATE$
 ' RTV monitor kw\WEBmite
 OPTION EXPLICIT
 OPTION DEFAULT INTEGER
 OPTION AUTORUN ON
 
 DIM ticktime, loopcount, cputemp!
 DIM startTime$, olddate$, dummy$
 
 
 SETTICK 1000, tick
 WATCHDOG 12000
 ON ERROR SKIP 2
 WEB NTP 10, "10.1.1.52"
 'WEB tcp interrupt gotrequest
 
 IF olddate$ <> DATE$ THEN
   ON ERROR SKIP
   WEB NTP 10, "10.1.1.52"
 ENDIF
 syslog "Started"
 startTime$ = TIME$
 PRINT CHR$(27)+"[2J"
 dotick
 DO
   IF ticktime THEN dotick
   INC loopcount
   
   IF LOOPCOUNT >15000 THEN
     syslog "Loop failed "+STR$(PIN(temp),3,1)
     CPU RESTART
   ENDIF
   dummy$=MID$(TIME$,1,6)+MID$(DATE$,1,5)+TIME$+DATE$
 LOOP
 
SUB tick
 ticktime = 1
END SUB
 
SUB dotick
 ticktime = 0
 WATCHDOG 12000
 PRINT CHR$(27)+"["+STR$(0)+";"+STR$(0)+"H"
 PRINT TIME$;"  ";DATE$; "  Started at  ";startTime$
 PRINT "Loop = ";loopcount;"  Temp= ";STR$(PIN(temp),3,1)
 loopcount = 0
END SUB
 
SUB syslog txt$
 OPEN "syslog.txt" FOR APPEND AS #1
 PRINT #1, DATE$+" "+TIME$+" "+txt$
 CLOSE #1
END SUB


Any OPTIONs should be OK to see the bug, but these are the ones I am using:
OPTION LIST
WebMite MMBasic Version 5.07.08b6
OPTION SYSTEM SPI GP10,GP11,GP12
OPTION AUTORUN ON
OPTION CPUSPEED (KHz) 250000
OPTION LCDPANEL VIRTUAL_C
OPTION WIFI ******, ********, PICO*******
OPTION TCP SERVER PORT 80, 1000
OPTION UDP SERVER PORT 6802
OPTION TELNET CONSOLE ON
OPTION SDCARD GP13


Jim
VK7JH
MMedit   MMBasic Help
 
NPHighview

Senior Member

Joined: 02/09/2020
Location: United States
Posts: 200
Posted: 04:18pm 02 Jul 2023

Not sure that TIME$ and DATE$ are the source of your problem.
My analog clock program (on GitHub) uses TIME$ at least once per second, updates the clock once an hour from the NTP server, and uses DATE$ at least once an hour.
No crashes, even over some weeks.
Live in the Future. It's Just Starting Now!
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 10:50pm 02 Jul 2023

  NPHighview said  Not sure that TIME$ and DATE$ are the source of your problem.


Not the source of the problem, but an easy way to work the pico in a way that demonstrates the problem.

One big difference between your program and mine is, I use SETTICK and you use SYNC

I haven't played with SYNC and am not sure whether it slows down the response to web requests too much. I will have to do some experimenting, but I would like to get to the cause of the lockups first.

Jim
VK7JH
MMedit   MMBasic Help
 
Andrew_G
Guru

Joined: 18/10/2016
Location: Australia
Posts: 847
Posted: 10:59pm 02 Jul 2023

Hi Jim,
I've come late to the party but I can't see where 'olddate$' is set. It looks to me that it will call NTP every cycle?
(Edit: No it doesn't but is it doing what you want?)

Cheers,

Andrew
Edited 2023-07-03 09:03 by Andrew_G
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 11:13pm 02 Jul 2023

  Andrew_G said  Hi Jim,
I've come late to the party but I can't see where 'olddate$' is set. It looks to me that it will call NTP every cycle?
(Edit: No it doesn't but is it doing what you want?)

Cheers,

Andrew


No it is not doing anything useful.

It does do what it's supposed to do in the 'real' program. This is just a cut down version to force an error with the timer stopping.
It is much easier to debug if the bug happens frequently instead of once or twice a day.

Jim
VK7JH
MMedit   MMBasic Help
 
Andrew_G
Guru

Joined: 18/10/2016
Location: Australia
Posts: 847
Posted: 11:17pm 02 Jul 2023

Yep - understood. Andrew
 
TheMonkeys

Regular Member

Joined: 15/12/2022
Location: Australia
Posts: 59
Posted: 06:12am 05 Jul 2023

I've noticed the same thing.
Currently on V5.070806; I think the problem started when I updated from 5.070805.

It's not just a Date$/Time$ thing. I have my own heartbeat that utilises Timer, and I spotted that it stopped working, too.


Dim integer rd,gn,bl ' Cortex
Dim integer cort=0 ' first LED on the RGB LED bar

main:
rd=Val(Field$(Time$,1,":"))*5: gn=Val(Field$(Time$,2,":"))*2: bl=Val(Field$(Time$,3,":"))*2 ' Time of Day
If (Timer Mod 1000) < 100 Then ' Cortex Stable
 RGBled(cort,rd,gn,bl) ' H,M,S
Else
 RGBled(cort,rd/8,gn/8,bl/8) ' h,m,s
EndIf
...some other code
goto main

RGBled is a sub that wraps Bitbang ws2812.

The LED stopped blinking. The above code should still blink the LED, even if the "clock" stops.
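
One way to narrow down which clock sources are affected is to check them against something that does not depend on any timer, such as a plain loop counter. What follows is only a minimal sketch, not code from anyone's program in this thread. The constant CHECK_LOOPS and its value are assumptions and would need tuning so that one check interval spans several seconds on your hardware, since TIME$ only changes once a second.

```basic
' Sketch: report which clock sources have stopped advancing.
' CHECK_LOOPS is a hypothetical value; tune it so that one
' interval spans several seconds at your CPU speed.
OPTION EXPLICIT
CONST CHECK_LOOPS = 200000

DIM INTEGER loops
DIM FLOAT lastTimer   ' float, so a stalled TIMER compares exactly equal
DIM lastTime$

lastTimer = TIMER
lastTime$ = TIME$
DO
  INC loops
  IF loops >= CHECK_LOOPS THEN
    ' The loop count does not depend on any timer, so it keeps
    ' advancing even if the timers themselves have stopped.
    IF TIMER = lastTimer THEN PRINT "TIMER has not advanced"
    IF TIME$ = lastTime$ THEN PRINT "TIME$ has not advanced"
    lastTimer = TIMER
    lastTime$ = TIME$
    loops = 0
  ENDIF
LOOP
```

If both messages appear together, that would suggest a common timebase has stalled rather than TIME$ alone. If CHECK_LOOPS is set too low, TIME$ will be reported as stalled when it is not.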

Cheers,
Chris.
 
TheMonkeys

Regular Member

Joined: 15/12/2022
Location: Australia
Posts: 59
Posted: 04:29am 08 Jul 2023

A followup:
I set up a logging script on my machine to track the failures.
From the log: it hung at around 02:50-ish

08-07-2023 02:50:08 up for 16h 17m 15s

from the console:

Watchdog timeout
PICOE6614103E71 connecting to WiFi...
Connected 192.168.0.102
Starting TCP server at 192.168.0.102 on port 2040
ntp address 162.159.200.123
got ntp response: 08/07/2023 04:09:26
Legion is GO!!! WebmiteOmnibusWebserver 12 1.6.1 07 Jul 2023 11:20
04:10:02 1:parsing clock.pi 0.23s


Secondly: it froze @ ~ 09:55

09:40:09 1:parsing clock.pi 0.23s
PICOE6614103E71 connecting to WiFi...
Connected 192.168.0.102
Starting TCP server at 192.168.0.102 on port 2040
ntp address 162.159.200.123
got ntp response: 08/07/2023 09:56:53
Legion is GO!!! WebmiteOmnibusWebserver 12 1.6.1 07 Jul 2023 11:20
09:57:11 1:parsing state.pi 0.23s

By "Froze" I mean that it stopped responding. The heartbeats (both mine and the real one) both froze. The restart was realised by un-plugging the power.

My thought is that the WiFi chip locks up, and this has some effect on the clock/timer (Date$, Time$ and Timer). Potentially, the clock slows down rather than stopping, as the 04:09 watchdog (which happened while I was asleep) took about two hours - rather than the 13 seconds to which it was set - to kick in.

Hope this helps,

Chris.
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 04:58am 08 Jul 2023

With my setup, MMBasic V5.07.08b4 works well and passes all my testing without any issues.
b5 and later all fail with loss of heartbeat and other clocks. This is when the only item configured is the WiFi connection.

I can continue with my deployment using beta4 but will keep trying to track down the culprit on my test unit.

Jim
VK7JH
MMedit   MMBasic Help
 
Mixtel90

Guru

Joined: 05/10/2019
Location: United Kingdom
Posts: 6798
Posted: 06:01am 08 Jul 2023

Could this be a cpu speed issue? Is it being overclocked? If so, all bets are off.
Mick

Zilog Inside! nascom.info for Nascom & Gemini
Preliminary MMBasic docs & my PCB designs
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 06:51am 08 Jul 2023

  Mixtel90 said  Could this be a cpu speed issue? Is it being overclocked? If so, all bets are off.


I have spent the last 2+ weeks testing.

Tried all firmware versions from a few 5.07.07 betas to the latest 5.07.08 beta.
5.07.08b4 and earlier work, later ones fail.

Tried all speeds, 3 different picos, 2 different routers, numerous versions of the test basic code, numerous OPTION settings.
Always doing a full reset and clean-out between firmware versions.

I would love to be proved wrong, but for now, I will stay on beta 4.

Jim
VK7JH
MMedit   MMBasic Help
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 9129
Posted: 07:13am 08 Jul 2023

Jim

Please try the attached. I've reverted to the older sdk. The update may have happened around b4


PicoMiteWeb.zip
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 07:33am 08 Jul 2023

Thanks Peter,
Up and running. I will let you know how it goes over the next few hours.

Jim
VK7JH
MMedit   MMBasic Help
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6100
Posted: 09:33am 08 Jul 2023

Looking good after 2 hours.
Let's see what overnight brings us.

Jim
VK7JH
MMedit   MMBasic Help
 