mDNS sudden death #7262

Closed
jjsuwa opened this issue May 2, 2020 · 15 comments

@jjsuwa
Contributor

jjsuwa commented May 2, 2020

MCVE:
Simple WiFiSTA + mDNS, and LED blink as coalmine canary :)

  • Hardware: [Generic ESP8266 (ESP-WROOM-02 breakout PCB w/DTR reset circuitry)]
  • Core Version: commit 4e3a4b6 (latest @ the moment)
  • [Edited 3] CPU Frequency: 160 MHz
  • [Edited 3] lwIP Variant: v2 Lower Memory (no features)
  • [Edited 2] Espressif FW: nonos-sdk 2.2.1+119 (191122)
#include <ESP8266WiFi.h>
#include <ESP8266mDNS.h>

#define AP_SSID       "YOUR-SSID"
#define AP_PASSWORD   "YOUR-PASSWORD"
#define MDNS_HOSTNAME "WiFiSta"

#define LED_PIN 4
#define DIVIDER 10000

void setup() {
  pinMode(LED_PIN, OUTPUT);
  WiFi.persistent(false);
  WiFi.mode(WIFI_OFF);
  Serial.begin(115200);
  delay(250);

  WiFi.mode(WIFI_STA);
  Serial.print(F("\n\n\n"
                 "WiFi(STA): SSID=\"" AP_SSID "\".\n"
                 "WiFi(STA): connecting"));
  WiFi.begin(F(AP_SSID), F(AP_PASSWORD));
  for (; ; delay(500)) {
    if (WiFi.status() == WL_CONNECTED) {
      Serial.printf_P(PSTR(", done.\n"
                           "WiFi(STA): IP address=%s/%s.\n"
                           "mDNS: hostname=\"" MDNS_HOSTNAME ".local\".\n"), WiFi.localIP().toString().c_str(), WiFi.subnetMask().toString().c_str());
      MDNS.begin(F(MDNS_HOSTNAME));
      break;
    }
    Serial.print('.');
  }
}

void loop() {
  MDNS.update();

  // visible canary
  static unsigned int counter = DIVIDER;
  if (--counter == 0) {
    counter = DIVIDER;
    digitalWrite(LED_PIN, digitalRead(LED_PIN) == 0);
  }
}

Symptom:

  • After the WiFi connection is established, the 1st mDNS response is almost always fine.
  • But later queries often go unanswered, with no sign of an error, especially after some interval (a few minutes or more).
  • [Edited 1] Once this happens, it does not seem to recover.

Additional Info:

  • Regardless of above, LED blink doesn't stop (loop() lives).
  • Same as above, ping w/dot-decimal-notation responds (both WiFi and IP echo live).
  • Debug output (e.g. CORE+WIFI+HTTP_UPDATE+UPDATER+OTA+OOM+MDNS) gives no clue about this...
@mikekgr

mikekgr commented May 2, 2020

{ Continuing the related discussion from https://gitter.im/esp8266/Arduino }
As @d-a-v suggested, I just finished testing "OTA-mDNS-SPIFFS", which ships as an mDNS library example. I noticed the same bad behavior: when the ESP8266 D1 mini starts, it appears correctly in Bonjour for about 8 minutes, then it is lost and never comes back. This is the same as in my sketch, where I initially found the problem...

edit from maintainer: ref

@d-a-v
Collaborator

d-a-v commented May 3, 2020

@jjsuwa @mikekgr
I have been running the two above tests without issues for several minutes (I'll let them run and update if something happens).
It may not be an issue with mDNS but with NONOS-SDK FW.

Latest release 2.7.0 is using NONOS-SDK v2.2.1+100 (2019-07-03).
You may try with "Legacy 2.2.1" which was previously shipped, or with more recent ones: 2.2.1(2019-11-22) is the latest.
This list is available in arduino IDE menus when the generic board is chosen.
The current default version was chosen based on user reports.

You may also add WiFi.setSleepMode(WIFI_NONE_SLEEP); (just in case / for the test / to be sure).

I am running the gitter sketch as-is, and the above one with this added code in the end of setup():

    auto hService = MDNS.addService(0, "itworks", "tcp", 58266);
    if (hService)
    {
        if ((!MDNS.addServiceTxt(hService, "readme", "0xdeep")))
        {
            MDNS.removeService(hService);
            hService = 0;
        }
    }

This code allows me to run this bash command on Linux with avahi:

edit: with cache flush

#!/bin/bash
srv=_itworks._tcp

c=0
while true; do
    echo ""
    date
    c=$((c+1))
    echo $c
    echo
    avahi-browse -t -r $srv
    sudo avahi-daemon --kill # flush mDNS cache, automatically restarted
    sleep 10
done

(replace _itworks by _arduino for the OP example)

Both are running flawlessly for 1770 seconds (gitter) and 770 seconds (OP).
edit: restarted with cache flush and still running after 2180s (resp. 1700s)

@jjsuwa
Contributor Author

jjsuwa commented May 3, 2020

I think 10 second intervals are not enough to reproduce the issue. (needs 1 min+)

And none of the following resolves the issue:

  • Erasing all flash contents
  • using NONOS-SDK v2.2.1+100 (190703) or NONOS-SDK v2.2.1 (legacy)
  • adding WiFi.setSleepMode(WIFI_NONE_SLEEP);
  • 80MHz CPU Frequency

Repetitive mDNS resolve one-liner for Windows Command Prompt:
for /L %a in (0,0,1) do @(ping -4 -n 1 target.local & timeout /nobreak 60)

C:\Users\Administrator>for /L %a in (0,0,1) do @(ping -4 -n 1 WiFiSta.local & timeout /nobreak 60)

Pinging WiFiSta.local [192.168.2.20] with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=5ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 5ms, Average = 5ms

Waiting for  0 seconds, press CTRL+C to quit ...

Pinging WiFiSta.local [192.168.2.20] with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=50ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 50ms, Maximum = 50ms, Average = 50ms

Waiting for  0 seconds, press CTRL+C to quit ...

Pinging WiFiSta.local [192.168.2.20] with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=5ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 5ms, Average = 5ms

Waiting for  0 seconds, press CTRL+C to quit ...
Ping request could not find host WiFiSta.local. Please check the name and try again.

Waiting for  0 seconds, press CTRL+C to quit ...
Ping request could not find host WiFiSta.local. Please check the name and try again.

Waiting for  0 seconds, press CTRL+C to quit ...
Ping request could not find host WiFiSta.local. Please check the name and try again.

Waiting for 57 seconds, press CTRL+C to quit ...

However, dot-decimal-form ping is still working.

C:\Users\Administrator>ping 192.168.2.20

Pinging 192.168.2.20 with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=4ms TTL=255
Reply from 192.168.2.20: bytes=32 time=1ms TTL=255
Reply from 192.168.2.20: bytes=32 time=2ms TTL=255
Reply from 192.168.2.20: bytes=32 time=3ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 4ms, Average = 2ms

@d-a-v
Collaborator

d-a-v commented May 3, 2020

interval edited
result edited
@jjsuwa

I modified my scan-from-linux script:

  • an mdns request is made
  • hostname (not IP) is extracted
  • cache is flushed
  • 5 pings with 40 secs space are sent (200 secs between loops)
  • loop

I let it run for a while.
It has run 15 loops of 200 secs for both of them (the script is provided in case someone can test/improve it)
mdnscan.zip

Can you tell which is the last working core version ?

@jjsuwa
Contributor Author

jjsuwa commented May 3, 2020

@d-a-v

Can you tell which is the last working core version ?

Going back to r2.6.3 (commit 3d128e5), it seems fine.
Both Linux w/avahi service query and Windows w/Bonjour hostname resolve work well for me.

Advancing to #7025 (commit 7b0fa35)... seems OK.
#7042 (commit a8515a7)... OK,
#7216 (commit e5f4514)... OK,
#7217 (commit 77b82a0)... failed!

And then, going back to the latest again... of course reproduces the issue.
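
The narrowing above was done by stepping forward through the history by hand. As a rough illustration only, the same search can be written as a binary search for the first failing commit. This is a minimal sketch in plain C++, assuming the commits are ordered oldest to newest and that the failure persists once it appears. `firstBadCommit` and the `isBad` predicate are hypothetical names; in practice the predicate is a manual flash-and-observe test, not a function call.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first commit for which
// isBad() is true, assuming every commit before it is good and every
// commit from it onward is bad. Returns commits.size() if none is bad.
std::size_t firstBadCommit(
    const std::vector<std::string>& commits,
    const std::function<bool(const std::string&)>& isBad) {
    std::size_t lo = 0;
    std::size_t hi = commits.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (isBad(commits[mid])) {
            hi = mid;      // first bad commit is at mid or earlier
        } else {
            lo = mid + 1;  // first bad commit is after mid
        }
    }
    return lo;
}
```

Called with the five commits listed above (3d128e5, 7b0fa35, a8515a7, e5f4514, 77b82a0) and a predicate that fails only on 77b82a0, the helper returns the index of 77b82a0.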

@d-a-v
Collaborator

d-a-v commented May 3, 2020

Thanks. So #7217 would be the hidden-to-me issue ?

Well. We have planned to make another mDNS update that will allow a single instance to work for all interfaces. I guess it is time to try it now for a 2.7.1 bugfix release.

@mikekgr

mikekgr commented May 3, 2020

I can confirm that, in my case too, the proposed solution works fine. Normal mDNS functionality is back. Many thanks to @jjsuwa, @d-a-v and all the wonderful people who work hard, for free, on this ESP8266 Arduino Core. I am continuing the testing, but all seems fine.

@d-a-v
Collaborator

d-a-v commented May 3, 2020

@BbIKTOP we are going to merge #7266 because #7217 causes issues.

I am anyway going to try a change that would hopefully fit with everyone.

devyte pushed a commit that referenced this issue May 3, 2020
#7266)

workaround for #7262 (reverts #7217)

Co-authored-by: Takayuki 'January June' Suwa <[email protected]>
@devyte
Collaborator

devyte commented May 3, 2020

PR #7266 is merged as a temporary workaround. That means that the issue addressed by #7217 is now present again.
This is being kept open to track a solution that meets both cases.

@BbIKTOP
Contributor

BbIKTOP commented May 3, 2020

@BbIKTOP we are going to merge #7266 because #7217 causes issues.

I am anyway going to try a change that would hopefully fit with everyone.

Do you already understand how it is possible? I just cannot imagine.

@d-a-v
Collaborator

d-a-v commented May 3, 2020

I still haven't understood why it works with me/you/some and not others.
Anyway having one instance per interface is quite a nonsense on such small architecture.
I am trying something with multicast over all interfaces, with a single instance.

@reaper7
Contributor

reaper7 commented May 3, 2020

with this commit bf718c3 programming via OTA is again possible (devices do not disappear after a while)

@BbIKTOP
Contributor

BbIKTOP commented May 3, 2020

I still haven't understood why it works with me/you/some and not others.
Anyway having one instance per interface is quite a nonsense on such small architecture.
I am trying something with multicast over all interfaces, with a single instance.

Yes, a single instance is what I asked for from the very beginning.
Although it's quite strange that it can cause any problems. I'd like to understand how it is possible, but I cannot reproduce it.

@jjsuwa
Contributor Author

jjsuwa commented May 4, 2020

In my environment, when listening on m_netif->ip_addr, MDNSResponder::_callProcess() is never called back.
When listening on IP4_ADDR_ANY, it is.
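
One possible reading of this observation, offered purely as an assumption and not confirmed anywhere in this thread, is that the listener's bound address is compared against the packet's destination address. mDNS queries are sent to the IPv4 multicast group 224.0.0.251, which is never equal to the interface's own unicast address. The toy model below only illustrates that idea. `toyAccepts`, `ipv4`, `kAny` and `kMdnsGroup` are hypothetical names, and the matching rule is a simplification, not lwIP's or ESP8266mDNS's actual code.

```cpp
#include <cstdint>

// Pack four octets into one host-order 32-bit value (illustration only).
constexpr std::uint32_t ipv4(std::uint8_t a, std::uint8_t b,
                             std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

constexpr std::uint32_t kAny = 0;                           // wildcard, in the spirit of IP4_ADDR_ANY
constexpr std::uint32_t kMdnsGroup = ipv4(224, 0, 0, 251);  // mDNS IPv4 multicast group

// ASSUMED rule for this toy model: a listener accepts a packet if it is
// bound to the wildcard address, or if its bound address equals the
// packet's destination address.
constexpr bool toyAccepts(std::uint32_t boundAddr, std::uint32_t destAddr) {
    return boundAddr == kAny || boundAddr == destAddr;
}
```

Under this assumed rule, a listener bound to 192.168.2.20 rejects a query addressed to 224.0.0.251, while a wildcard listener accepts it. That would match the behavior reported here, but it does not explain why other setups work.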

The scenario I assume:

  1. MDNS.begin(...) advertises to other mDNS servers.
  2. the other ones can cache the presence of the ESP8266mDNS service for some short period.
  3. an mDNS client multicasts a request soon after.
  4. ESP8266mDNS does not respond, but other mDNS servers (if any exist) respond instead.
  5. as time passes, the cached ESP8266mDNS info expires.
  6. now, nobody knows...
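
The scenario above can be sketched as a small host-side C++ model: resolution succeeds while either the device answers or some other host still holds a fresh cached record, and fails once the cache entry has expired. This is an illustration of the assumed scenario only, not ESP8266mDNS or avahi code. `ToyMdnsCache` and `clientCanResolve` are hypothetical names, and the TTL is a parameter chosen by the caller, not a value taken from this thread.

```cpp
#include <map>
#include <string>

// Toy model of an mDNS cache held by another host on the LAN.
class ToyMdnsCache {
public:
    // Record that `name` was announced at time `now` (seconds) and stays
    // valid for `ttl` seconds. The TTL is an assumption of the caller.
    void announce(const std::string& name, long now, long ttl) {
        expiry_[name] = now + ttl;
    }

    // A name resolves from the cache only while its record has not expired.
    bool resolves(const std::string& name, long now) const {
        const auto it = expiry_.find(name);
        return it != expiry_.end() && now < it->second;
    }

private:
    std::map<std::string, long> expiry_;  // name -> absolute expiry time (s)
};

// Resolution as seen by a client in the assumed scenario: it succeeds if
// the device itself answers, or if a cached record is still fresh.
bool clientCanResolve(bool deviceAnswers, const ToyMdnsCache& cache,
                      const std::string& name, long now) {
    return deviceAnswers || cache.resolves(name, now);
}
```

With a silent device and, say, an assumed TTL of 120 seconds, a lookup 60 seconds after the announcement still succeeds and a lookup at 180 seconds fails, which matches the "works at first, then disappears" pattern described in this issue.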

@devyte
Collaborator

devyte commented May 6, 2020

As discussed internally, this issue no longer applies because the troublesome commit was backed out.
The original issue that resulted in that troublesome commit is #7213, and it has been reopened. Tracking will continue there.
Closing.

@devyte devyte closed this as completed May 6, 2020