
Robust contacts - backup first before saving and fallback to backup file #1447

Open
weebl2000 wants to merge 3 commits into meshcore-dev:dev from weebl2000:robust-contacts


@weebl2000 commented Jan 22, 2026

Contributor

Problem: Contacts sometimes partially disappear on Heltec v4 and Heltec Wireless Tracker v1.2. This is likely caused by power loss or reset during a direct overwrite of the contacts file.

Solution: Atomic-style write pattern for saveContacts:

  1. Write contacts to a temporary file (/contacts3.tmp)
  2. Flush to ensure data is on flash
  3. Rename existing /contacts3 to /contacts3.bak
  4. Rename /contacts3.tmp to /contacts3
  5. Remove /contacts3.bak (no longer needed until next save)

If the write fails mid-save, the .tmp file is removed and the original contacts file is left untouched.
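
The five steps and the failure path above can be sketched on a host machine as follows. This is a minimal illustration using the C++ standard library (`std::filesystem`, `std::ofstream`), not the firmware's filesystem API and not the PR's actual code; the function name `saveContactsAtomic` and the `dir` parameter are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Host-side sketch of the save sequence. `dir` stands in for the flash
// filesystem root; the function name is hypothetical.
bool saveContactsAtomic(const fs::path& dir, const std::string& payload) {
    const fs::path primary = dir / "contacts3";
    const fs::path tmp     = dir / "contacts3.tmp";
    const fs::path bak     = dir / "contacts3.bak";
    std::error_code ec;

    {
        // Step 1: write everything to the temporary file.
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
        // Step 2: flush so buffered data is handed to the filesystem.
        out.flush();
        if (!out) {
            out.close();
            fs::remove(tmp, ec);  // failed write: drop .tmp, primary untouched
            return false;
        }
    }

    // Step 3: keep the previous contents as a backup, if there are any.
    if (fs::exists(primary, ec)) {
        fs::rename(primary, bak, ec);
        if (ec) {
            fs::remove(tmp, ec);
            return false;
        }
    }

    // Step 4: promote the temporary file to the primary name.
    fs::rename(tmp, primary, ec);
    if (ec) return false;  // the backup still exists, so a load can fall back to it

    // Step 5: the backup is not needed again until the next save.
    fs::remove(bak, ec);
    return true;
}
```

At no point in this sequence is the only copy of the contacts open for writing: either the old primary, the backup, or the fully written temporary file exists on disk.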

On load, if /contacts3 is missing or empty, loadContacts falls back to /contacts3.bak automatically. This covers the edge case where power is lost between steps 3 and 4.

Platform guard: Only enabled on devices with sufficient flash for the temporary file (~2x contacts storage during save). Disabled on NRF52 unless EXTRAFS or QSPIFLASH is defined.
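
A compile-time guard along these lines would express that rule. `EXTRAFS` and `QSPIFLASH` are the names given above; `NRF52_PLATFORM` as the platform macro and `CONTACTS_ATOMIC_SAVE` as the feature flag are assumptions made for this sketch, not names taken from the PR.

```cpp
// Assumed names: NRF52_PLATFORM (platform macro) and CONTACTS_ATOMIC_SAVE
// (hypothetical feature flag). EXTRAFS and QSPIFLASH come from the description.
#if defined(NRF52_PLATFORM) && !defined(EXTRAFS) && !defined(QSPIFLASH)
  #define CONTACTS_ATOMIC_SAVE 0  // internal flash only: no room for a second copy
#else
  #define CONTACTS_ATOMIC_SAVE 1  // enough space for roughly 2x contacts storage
#endif

// Lets callers branch on the guard without repeating the preprocessor logic.
constexpr bool atomicSaveEnabled() { return CONTACTS_ATOMIC_SAVE != 0; }
```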


Build firmware: Build from this branch

@weebl2000 weebl2000 changed the base branch from main to dev January 22, 2026 07:42
@weebl2000 weebl2000 force-pushed the robust-contacts branch 3 times, most recently from 9778b43 to 06dc6f5 Compare February 11, 2026 04:13
@weebl2000

Contributor Author

I think what actually caused the contacts to go missing in the first place was the nodeinfo files, which were written regardless of auto-add contacts.

@winnieXY


Hey, I'm working on something similar for the repeater. However, as far as I know the rename is atomic, so in my eyes there is no need to create a .bak file.


syssi commented Jun 3, 2026

Contributor

This fixes a real and reproducible data loss path. Confirmed on a Seeed Xiao S3 WIO (ESP32-S3), env Xiao_S3_WIO_companion_radio_wifi, with ~350 contacts, related to #1519.

The current saveContacts() opens /contacts3 with openWrite("w"), which truncates the file to 0 bytes before the rewrite even starts. If anything goes wrong during the rewrite of ~53 KB (350 * 152 bytes), the contact list is gone. On my device the trigger was a nearly full SPIFFS (1312 of 1404 KB used, see #1519/#1601): the truncate frees the old file, the rewrite then fails on the full, fragmented filesystem, and /contacts3 is left at 0 bytes. At the next boot loadContacts reads it as empty (loaded=0) and the first sync makes the loss permanent. So power loss is not the only trigger; a failing write causes the same wipe.
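
The failure mode and the size arithmetic can be reproduced on a host with the standard library. This sketch assumes that opening a `std::ofstream` with `std::ios::trunc` is a fair stand-in for the truncating open described above; it does not use the firmware's file API, and the names below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Per-contact record size quoted in the comment above.
constexpr std::size_t kContactRecordBytes = 152;

constexpr std::size_t contactsFileBytes(std::size_t count) {
    return count * kContactRecordBytes;
}
static_assert(contactsFileBytes(350) == 53200, "350 contacts occupy 53200 bytes");

// Simulates a direct overwrite where the rewrite fails right after the open:
// the truncating open has already discarded the old contents.
std::uintmax_t sizeAfterFailedOverwrite(const fs::path& file) {
    {
        std::ofstream out(file, std::ios::binary | std::ios::trunc);
        // Nothing is written here: this stands in for the failed rewrite.
    }
    return fs::file_size(file);  // 0 bytes: the previous contacts are gone
}
```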

I implemented the same atomic tmp + rename pattern independently and it stops the wipe:

saveContacts: OK wrote=350 contacts
reboot
loadContacts: file_size=53200 bytes, loaded=350

Contacts now survive the reboot. The extra .bak retention and the load time fallback to .bak in this PR are a good addition. They cover the power loss window between renaming /contacts3 to .bak and renaming .tmp to /contacts3, which a plain tmp + rename does not fully cover. Worth pairing with #1601 so an already full device first gets enough free space for the temporary file.

@weebl2000 weebl2000 force-pushed the robust-contacts branch 2 times, most recently from d3ab801 to 16e98d9 Compare June 6, 2026 19:57
weebl2000 added a commit to weebl2000/meshcore that referenced this pull request Jun 6, 2026
… + minor

Changes vs dev_plus:
- meshcore-dev#1727 hardware-CAD redesign: cad_enabled pref (slot 293), 'cad on/off' CLI cmd,
  RSSI interference (int.thresh) + separate hardware-CAD gate in RadioLibWrappers.
  DEVIATION (per request): cad_enabled defaults to 1 (CAD ENABLED) on
  repeater/room/sensor; companion getCADEnabled() returns true.
  companion getInterferenceThreshold() aligned to PR (0; RSSI int.thresh disabled
  until currentRSSI() is fixed). Fixed a merge-dup of performChannelScan() decl.
- meshcore-dev#2001: 'af' getter ftoa -> ftoa3
- meshcore-dev#1896: ESP32 RTC fallback seed literal -> RTC_TIME_MIN
- meshcore-dev#1925: rak11200 P_LORA_DIO_1 via platformio build flag (not header #define)

No-ops (verified already-aligned in dev_plus, or intentionally absent):
meshcore-dev#1677 AEAD, meshcore-dev#1686 short-sleeps (=> meshcore-dev#1347 subset), meshcore-dev#2674 be-more-patient(+backoff),
meshcore-dev#1447 robust-contacts (absent), meshcore-dev#2537 fix-serial-bursts (absent).
Wessel Nieboer and others added 3 commits June 14, 2026 15:36
The fs->remove("/contacts3.bak") before the rename sequence creates a
vulnerability window: if power is lost right after removing the backup
but before the rename completes, both the backup and primary file could
be lost. The remove is unnecessary since rename() on both SPIFFS and
LittleFS replaces the target if it already exists.
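
The replace-on-rename behaviour the commit message relies on can be illustrated on a host. This sketch uses `std::filesystem::rename`, which replaces an existing regular file; it does not exercise SPIFFS or LittleFS, whose behaviour is as stated in the commit message and is not verified here. The function name is hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Reads a whole file into a string.
static std::string slurp(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Renames the primary file over an existing backup without removing the
// backup first, then checks that the backup now holds the primary's data.
bool renameReplacesTarget(const fs::path& dir) {
    const fs::path primary = dir / "contacts3";
    const fs::path bak     = dir / "contacts3.bak";
    { std::ofstream f(bak, std::ios::binary); f << "stale backup"; }
    { std::ofstream f(primary, std::ios::binary); f << "current contacts"; }

    std::error_code ec;
    fs::rename(primary, bak, ec);  // no fs::remove(bak) beforehand
    if (ec) return false;
    return !fs::exists(primary) && slurp(bak) == "current contacts";
}
```

Skipping the explicit remove means there is never a moment where neither the stale backup nor the new one exists.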