Add an auto-generated unicode character category file #4605

Merged
tnull merged 1 commit into lightningdevkit:main from TheBlueMatt:2026-05-unicode-autogen
May 18, 2026

@TheBlueMatt

Collaborator

1a01b5a added detection of unicode format characters in PrintableString, but used a hard-coded table which may eventually become out of date.

Here we switch to an auto-generated table, include all General_Category Other characters, and also ban unallocated code points.
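
As a rough illustration of what generating such a table can involve, the sketch below collapses `UnicodeData.txt`-style lines (`code;name;category;...`) into inclusive ranges of code points whose `General_Category` starts with `C`. The function name `other_category_ranges` is hypothetical, the input is assumed to be sorted by code point, and unassigned code points are not handled here. This is not the generator script from this PR.

```rust
/// Collects inclusive `(first, last)` ranges of code points whose
/// General_Category starts with 'C' from `UnicodeData.txt`-style lines.
/// Hypothetical helper for illustration; assumes input sorted by code point.
fn other_category_ranges(unicode_data: &str) -> Vec<(u32, u32)> {
    let mut ranges: Vec<(u32, u32)> = Vec::new();
    let mut pending_first: Option<u32> = None;
    for line in unicode_data.lines() {
        let mut fields = line.split(';');
        let (code, name, category) = match (fields.next(), fields.next(), fields.next()) {
            (Some(c), Some(n), Some(g)) => (c, n, g),
            _ => continue, // skip blank or malformed lines
        };
        let code = match u32::from_str_radix(code, 16) {
            Ok(c) => c,
            Err(_) => continue,
        };
        // Large blocks are listed as a `<..., First>` / `<..., Last>` pair.
        if name.ends_with(", First>") {
            pending_first = Some(code);
            continue;
        }
        let start = if name.ends_with(", Last>") {
            pending_first.take().unwrap_or(code)
        } else {
            code
        };
        if !category.starts_with('C') {
            continue;
        }
        match ranges.last_mut() {
            // Extend the previous range when this entry is adjacent to it.
            Some(last) if last.1 + 1 == start => last.1 = code,
            _ => ranges.push((start, code)),
        }
    }
    ranges
}
```

Emitting merged ranges rather than individual code points keeps the generated file small, since most `Other` characters sit in contiguous blocks.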

Finally, CI validates that the file is kept up to date.

Written by Claude
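
For readers unfamiliar with range-table lookups, here is a minimal sketch of how a sorted table of inclusive code point ranges can be queried with a binary search. The table `OTHER_RANGES_EXCERPT` is a tiny hand-picked excerpt and the helper `in_ranges` is a hypothetical name; neither is taken from the generated `unicode.rs` in this PR.

```rust
use core::cmp::Ordering;

/// Illustrative excerpt only; a full generated table would be far larger.
/// Ranges must be sorted and non-overlapping for the binary search to work.
const OTHER_RANGES_EXCERPT: &[(u32, u32)] = &[
    (0x0000, 0x001F), // Cc: C0 controls
    (0x007F, 0x009F), // Cc: DEL and C1 controls
    (0x00AD, 0x00AD), // Cf: soft hyphen
    (0x200B, 0x200F), // Cf: zero-width and directional marks
    (0x202A, 0x202E), // Cf: bidi embedding/override controls
    (0xE000, 0xF8FF), // Co: BMP private use area
];

/// Returns true if `c` falls inside any inclusive range in `table`.
fn in_ranges(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the code point
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the code point
            } else {
                Ordering::Equal // code point is inside this range
            }
        })
        .is_ok()
}
```

A call such as `in_ranges(OTHER_RANGES_EXCERPT, '\u{202E}')` returns `true`, while ordinary letters fall between ranges and return `false`.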

@ldk-reviews-bot

ldk-reviews-bot commented May 7, 2026


👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

TheBlueMatt requested a review from tnull May 7, 2026 18:47
TheBlueMatt force-pushed the 2026-05-unicode-autogen branch from b6f8c03 to bd75483 May 7, 2026 18:49
Comment on lines +36 to +38
let is_other = is_unicode_general_category_other(c);
let is_unassigned = is_unicode_general_category_unassigned(c);
let c = if c.is_control() || is_other || is_unassigned {

Collaborator

Nit: c.is_control() is now fully redundant — it checks Cc (Control), which is already covered by is_unicode_general_category_other (see 0x0000..=0x001F and 0x007F..=0x009F in unicode.rs). The old code needed it because is_format_char only covered Cf, but the new function covers all of Cc / Cf / Cs / Co.

Not a bug (the || short-circuits harmlessly), but it's potentially confusing because it suggests is_other doesn't handle control characters.

Suggested change
- let is_other = is_unicode_general_category_other(c);
- let is_unassigned = is_unicode_general_category_unassigned(c);
- let c = if c.is_control() || is_other || is_unassigned {
+ let c = if is_unicode_general_category_other(c) || is_unicode_general_category_unassigned(c) {
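
The redundancy claim can be checked mechanically. The sketch below uses a small stand-in table and hypothetical helpers (`is_other_sketch`, `sanitize`); it is not the code in `string.rs` or `unicode.rs`. It assumes only that the table contains the two `Cc` ranges quoted in the comment above, and replaces matching characters with U+FFFD.

```rust
/// Stand-in for a generated table; only a few ranges are listed here.
const OTHER_SKETCH: &[(u32, u32)] = &[
    (0x0000, 0x001F), // Cc
    (0x007F, 0x009F), // Cc
    (0x202A, 0x202E), // Cf: bidi embedding/override controls
];

/// Hypothetical check: true if `c` is inside any range of the stand-in table.
fn is_other_sketch(c: char) -> bool {
    let cp = c as u32;
    OTHER_SKETCH.iter().any(|&(lo, hi)| (lo..=hi).contains(&cp))
}

/// Replaces every character flagged by `is_other_sketch` with U+FFFD.
/// Note there is no separate `is_control()` test.
fn sanitize(s: &str) -> String {
    s.chars()
        .map(|c| if is_other_sketch(c) { char::REPLACEMENT_CHARACTER } else { c })
        .collect()
}

/// Returns true if every `char` for which `is_control()` holds is already
/// flagged by the table, i.e. the extra `is_control()` check adds nothing.
fn control_is_subset_of_other() -> bool {
    (0u32..=0x10FFFF)
        .filter_map(char::from_u32)
        .filter(|c| c.is_control())
        .all(is_other_sketch)
}
```

Since `char::is_control` matches exactly the `Cc` category, any table that includes both `Cc` ranges makes the extra call a no-op, which is the point of the nit.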

@ldk-claude-review-bot

ldk-claude-review-bot commented May 7, 2026

Collaborator

Review Summary

New issue found

  • .github/workflows/check_unicode.yml:25: `gh issue create` runs without deduplication; it will create a new issue every day the file is out of date. build.yml already demonstrates the correct pattern (check for an existing open issue and comment on it instead).

Correction to prior review

My three prior comments about the YAML workflow file (missing jobs: key, missing checkout step, fi indentation) were incorrect — the file structure is fine as written. I misread the diff indentation. Apologies for the noise.

Prior comment still applicable

  • lightning-types/src/string.rs:38: `c.is_control()` remains redundant with `is_unicode_general_category_other`, which already covers all Cc characters.

@codecov

codecov Bot commented May 7, 2026


Codecov Report

❌ Patch coverage is 92.23301% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.15%. Comparing base (1a01b5a) to head (65e8cc8).
⚠️ Report is 66 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning-types/src/unicode.rs | 92.18% | 40 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4605      +/-   ##
==========================================
+ Coverage   86.09%   86.15%   +0.06%     
==========================================
  Files         157      158       +1     
  Lines      108828   109323     +495     
  Branches   108828   109323     +495     
==========================================
+ Hits        93694    94189     +495     
+ Misses      12519    12518       -1     
- Partials     2615     2616       +1     

| Flag | Coverage Δ |
|---|---|
| tests | 86.15% <92.23%> (+0.06%) ⬆️ |

tnull left a comment

Contributor

Alright, seems maybe a bit excessive, but fine by me. No need to check this on every CI run though, let's just move it to a weekly cronjob?

Comment thread .github/workflows/build.yml Outdated

tnull left a comment

Contributor

Looks good, please squash.

Comment thread .github/workflows/check_unicode.yml
Comment thread .github/workflows/check_unicode.yml
Comment thread .github/workflows/check_unicode.yml Outdated
@tnull

tnull commented May 8, 2026

Contributor

Claude's reviews seem legit though

TheBlueMatt force-pushed the 2026-05-unicode-autogen branch from 2de5954 to 65e8cc8 May 8, 2026 19:44
@TheBlueMatt

Collaborator Author

Squashed and fixed claude's comments:

$ git diff-tree -U2 2de5954d9 65e8cc8d5
diff --git a/.github/workflows/check_unicode.yml b/.github/workflows/check_unicode.yml
index c22a4e3b29..a01add3f81 100644
--- a/.github/workflows/check_unicode.yml
+++ b/.github/workflows/check_unicode.yml
@@ -5,4 +5,5 @@ on:
     - cron: '42 3 * * *'

+jobs:
   check-unicode:
     runs-on: ubuntu-latest
@@ -10,4 +11,6 @@ on:
       issues: write
     steps:
+      - name: Checkout source code
+        uses: actions/checkout@v4
       - name: Check unicode file state
         env:
@@ -21,3 +24,3 @@ on:
             BODY="The unicode character listing is out of date, see $RUN_URL"
             gh issue create --title "$TITLE" --body "$BODY"
-        fi
+          fi

TITLE="Unicode listing out of date: ${{ github.workflow }}"
RUN_URL="https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
BODY="The unicode character listing is out of date, see $RUN_URL"
gh issue create --title "$TITLE" --body "$BODY"

Collaborator

Bug: No deduplication — this creates a new issue every day the file is out of date. Unicode updates are infrequent (roughly annual), but if nobody regenerates the file promptly, this will spam the issue tracker with identical issues.

build.yml already has the right pattern (lines 380-391): check for an existing open issue with a label, and comment on it instead of creating a duplicate. This workflow should do the same.

Minimal fix sketch:

          LABEL="unicode outdated"
          gh label create "$LABEL" 2>/dev/null || true
          EXISTING=$(gh issue list --label "$LABEL" --state open --json number --jq '.[0].number // empty')
          if [ -n "$EXISTING" ]; then
            gh issue comment "$EXISTING" --body "$BODY"
          else
            gh issue create --title "$TITLE" --label "$LABEL" --body "$BODY"
          fi

Collaborator Author

Eh, that takes effort; this is fine, we expect to respond quickly.

tnull merged commit 71fdc27 into lightningdevkit:main May 18, 2026
23 of 24 checks passed
@TheBlueMatt

Collaborator Author

Backported to 0.1 in #4680.

@TheBlueMatt

Collaborator Author

Backported to 0.2 in #4683.

TheBlueMatt added a commit that referenced this pull request Jun 19, 2026
v0.1.10 - Jun 18, 2026 - "Loupe de Loupe"

API Updates
===========

 * `DefaultMessageRouter` will now always generate blinded message paths that
   provide no privacy (where our node is the introduction node) for nodes with
   public channels. This works around an issue which will appear for any nodes
   with LND peers that enable onion messaging - such peers will refuse to
   forward BOLT 12 messages from unknown third parties, which most BOLT 12
   payers rely on today (#4647).
 * Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder`
   now maps 0-amounts to an amount of `None` (#4324).

Bug Fixes
=========

 * Async `ChannelMonitorUpdate` persistence operations which complete, but are
   not marked as complete in a persisted `ChannelManager` prior to restart,
   followed immediately by a block connection and then another restart could
   result in some channel operations hanging, leading to force-closures (#4377).
 * If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are
   still being completed asynchronously, further channel updates (e.g.
   forwarding another payment) are pending and the node restarts, the channel
   could have become stuck (#4520).
 * The presence of unconfirmed transactions actually no longer causes
   `ElectrumSyncClient` to spuriously fail to sync (#4590).
 * `FilesystemStore::list_all_keys` will no longer fail if there are stale
   intermediate files lying around from a previous unclean shutdown (#4618).
 * When forwarding an HTLC while in a blinded path with proportional fees over
   200%, LDK will no longer spuriously allow a forward that pays us 1 msat too
   little in fees (#4697).
 * Fixed a rare case where a channel could get stuck on reconnect when using
   both async `ChannelMonitorUpdate` persistence and async signing (#4684).
 * `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where
   `ChannelManager::abandon_payment` was called before the payment ultimately
   completes anyway (#4651).
 * Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some
   full `ChannelMonitor`s to disk several times per block (#4544).
 * `OMDomainResolver` now correctly accounts for failed queries when rate
   limiting, ensuring we continue to respond to queries after failures (#4591).
 * Calling `ChannelManager::send_payment_with_route` without a `route_params`
   and with an invalid `Route` will no longer panic (#4707).
 * `lightning-custom-message`'s handling of `peer_connected` events now ensures
   that sub-handlers will see a `peer_disconnected` event if a different
   sub-handler refused the connection by `Err`ing `peer_connected` (#4595).
 * Incomplete MPP keysend payments will no longer see their HTLCs held until
   expiry (#4558).
 * `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a
   BOLT 12 `Offer` which allows any quantity up to a bound (#4667).
 * `lightning-custom-message` handlers that return `Ok(None)` when asked to
   deserialize a message in their defined range no longer cause panics (#4709).
 * Several spurious debug assertions were fixed (#4537, #4618).

Security
========

0.1.10 fixes a sanitization issue and several denial-of-service vulnerabilities.
 * `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an
   invoice which set an explicit public key, rather than relying on public key
   recovery. This method is called from `payment_parameters_from_invoice` and
   `payment_parameters_from_variable_amount_invoice` (#4717).
 * Maliciously-crafted unpayable invoices which have overflowing feerates will
   no longer cause an `unwrap` failure panic (#4716).
 * `possiblyrandom` did not properly generate random data except when it was
   explicitly configured to. By default this meant LDK was vulnerable to
   various HashDoS attacks (#4719).
 * `OMNameResolver` will no longer panic when looking up payment instructions
   which include unicode characters at the start of a TXT record (#4718).
 * `PrintableString` did not properly sanitize unicode format characters,
   allowing an attacker to corrupt the rendering of logs or UI (#4593, #4605).
 * RGS data is now limited in how large of a graph it is able to cause a client
   to store in memory. Note that RGS data is still considered a DoS vector in
   general and you should only use semi-trusted RGS data (#4713).
 * Counterparty-provided strings in failure messages are no longer logged in
   full, reducing the ability of such a counterparty to spam our logs (#4714).
 * Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer
   cause us to allocate large amounts of memory (#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.