Add an auto-generated unicode character category file #4605
|
👋 Thanks for assigning @tnull as a reviewer! |
Force-pushed from b6f8c03 to bd75483.
    let is_other = is_unicode_general_category_other(c);
    let is_unassigned = is_unicode_general_category_unassigned(c);
    let c = if c.is_control() || is_other || is_unassigned {
Nit: `c.is_control()` is now fully redundant: it checks Cc (Control), which is already covered by `is_unicode_general_category_other` (see `0x0000..=0x001F` and `0x007F..=0x009F` in `unicode.rs`). The old code needed it because `is_format_char` only covered Cf, but the new function covers all of Cc / Cf / Cs / Co.
Not a bug (the `||` short-circuits harmlessly), but it's potentially confusing because it suggests `is_other` doesn't handle control characters.
Suggested change:
    - let is_other = is_unicode_general_category_other(c);
    - let is_unassigned = is_unicode_general_category_unassigned(c);
    - let c = if c.is_control() || is_other || is_unassigned {
    + let c = if is_unicode_general_category_other(c) || is_unicode_general_category_unassigned(c) {
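To make the redundancy concrete, here is a minimal sketch, not the PR's actual `unicode.rs`. It assumes a table of sorted, inclusive code point ranges; `OTHER_RANGES`, `is_other` and `Printable` are hypothetical names, and apart from the two Cc ranges quoted in the review the entries are a small hand-picked sample of Cf, Cs and Co. Since Rust's `char::is_control` is true only for Cc, a table that contains both Cc ranges already catches every character `is_control` would.

```rust
use std::cmp::Ordering;
use std::fmt;

// Hypothetical excerpt of a generated table: inclusive ranges sorted by start.
// Only the two Cc ranges come from the review; the rest is an illustrative sample.
const OTHER_RANGES: &[(u32, u32)] = &[
    (0x0000, 0x001F), // Cc: C0 controls
    (0x007F, 0x009F), // Cc: DEL and C1 controls
    (0x00AD, 0x00AD), // Cf: soft hyphen
    (0x200B, 0x200F), // Cf: zero-width characters and directional marks
    (0x202A, 0x202E), // Cf: bidi embedding and override controls
    (0xD800, 0xDFFF), // Cs: surrogates (never a valid Rust `char`)
    (0xE000, 0xF8FF), // Co: BMP private use area
];

/// Binary search over the sorted range table.
fn is_other(c: char) -> bool {
    let cp = c as u32;
    OTHER_RANGES
        .binary_search_by(|&(start, end)| {
            if end < cp {
                Ordering::Less
            } else if start > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// Hypothetical stand-in for a sanitizing string wrapper.
struct Printable<'a>(&'a str);

impl fmt::Display for Printable<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        use fmt::Write;
        for c in self.0.chars() {
            // No separate `c.is_control()` check: Cc is already in the table.
            let c = if is_other(c) { char::REPLACEMENT_CHARACTER } else { c };
            f.write_char(c)?;
        }
        Ok(())
    }
}
```

With this sample table, formatting `Printable("a\u{0007}b")` replaces the BEL character with U+FFFD even though `is_control` is never called.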
Review Summary
New issue found
Correction to prior review
My three prior comments about the YAML workflow file (missing
Prior comment still applicable
|
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4605      +/-   ##
==========================================
+ Coverage   86.09%   86.15%   +0.06%
==========================================
  Files         157      158       +1
  Lines      108828   109323     +495
  Branches   108828   109323     +495
==========================================
+ Hits        93694    94189     +495
+ Misses      12519    12518       -1
- Partials     2615     2616       +1
|
tnull left a comment:
Alright, seems maybe a bit excessive, but fine by me. No need to check this on every CI run though, let's just move it to a weekly cronjob?
tnull left a comment:
Looks good, please squash.
|
Claude's reviews seem legit though |
1a01b5a added detection of unicode format characters in `PrintableString`, but used a hard-coded table which may eventually become out of date. Here we switch to an auto-generated table, include all `General_Category` `Other` characters, and also ban unallocated code points. Finally, CI validates that the file is kept up to date. Written by Claude
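As an illustration of what generating such a table can involve, the sketch below collects ranges of `General_Category` `Other` code points (Cc, Cf, Cs, Co) from text in the `UnicodeData.txt` layout, where each line starts with `code;name;category`. This is an assumption about the approach, not the generator used in this PR, and `other_ranges` is a hypothetical name. Unassigned code points do not appear in that file at all, so banning them would need a separate pass over the gaps, which is not shown here.

```rust
/// Collects inclusive ranges of code points whose General_Category is
/// Cc, Cf, Cs or Co from `UnicodeData.txt`-style input.
/// Hypothetical helper for illustration only.
fn other_ranges(unicode_data: &str) -> Vec<(u32, u32)> {
    let mut ranges: Vec<(u32, u32)> = Vec::new();
    // True right after a "<..., First>" line, which opens a compressed block.
    let mut in_block = false;
    for line in unicode_data.lines() {
        let mut fields = line.split(';');
        let (code, name, category) = match (fields.next(), fields.next(), fields.next()) {
            (Some(code), Some(name), Some(category)) => (code.trim(), name, category),
            _ => continue, // skip blank or malformed lines
        };
        let cp = match u32::from_str_radix(code, 16) {
            Ok(cp) => cp,
            Err(_) => continue,
        };
        if !matches!(category, "Cc" | "Cf" | "Cs" | "Co") {
            in_block = false;
            continue;
        }
        // Extend the previous range when this code point is adjacent to it,
        // or when it is the "<..., Last>" line closing an open block.
        let extends_last = match ranges.last() {
            Some(&(_, end)) => cp == end + 1 || (in_block && name.ends_with(", Last>")),
            None => false,
        };
        if extends_last {
            if let Some(last) = ranges.last_mut() {
                last.1 = cp;
            }
        } else {
            ranges.push((cp, cp));
        }
        in_block = name.ends_with(", First>");
    }
    ranges
}
```

The resulting vector is sorted and non-overlapping as long as the input is in code point order, so it could be emitted directly as a constant range table and checked in CI by regenerating it and comparing against the committed file.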
Force-pushed from 2de5954 to 65e8cc8.
|
Squashed and fixed Claude's comments:
$ git diff-tree -U2 2de5954d9 65e8cc8d5
diff --git a/.github/workflows/check_unicode.yml b/.github/workflows/check_unicode.yml
index c22a4e3b29..a01add3f81 100644
--- a/.github/workflows/check_unicode.yml
+++ b/.github/workflows/check_unicode.yml
@@ -5,4 +5,5 @@ on:
- cron: '42 3 * * *'
+jobs:
check-unicode:
runs-on: ubuntu-latest
@@ -10,4 +11,6 @@ on:
issues: write
steps:
+ - name: Checkout source code
+ uses: actions/checkout@v4
- name: Check unicode file state
env:
@@ -21,3 +24,3 @@ on:
BODY="The unicode character listing is out of date, see $RUN_URL"
gh issue create --title "$TITLE" --body "$BODY"
- fi
+ fi |
    TITLE="Unicode listing out of date: ${{ github.workflow }}"
    RUN_URL="https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
    BODY="The unicode character listing is out of date, see $RUN_URL"
    gh issue create --title "$TITLE" --body "$BODY"
Bug: No deduplication — this creates a new issue every day the file is out of date. Unicode updates are infrequent (roughly annual), but if nobody regenerates the file promptly, this will spam the issue tracker with identical issues.
build.yml already has the right pattern (lines 380-391): check for an existing open issue with a label, and comment on it instead of creating a duplicate. This workflow should do the same.
Minimal fix sketch:
    LABEL="unicode outdated"
    gh label create "$LABEL" 2>/dev/null || true
    EXISTING=$(gh issue list --label "$LABEL" --state open --json number --jq '.[0].number // empty')
    if [ -n "$EXISTING" ]; then
        gh issue comment "$EXISTING" --body "$BODY"
    else
        gh issue create --title "$TITLE" --label "$LABEL" --body "$BODY"
    fi
Eh, that takes effort; this is fine, we expect to respond quickly.
|
Backported to 0.1 in #4680. |
|
Backported to 0.2 in #4683. |
v0.1.10 - Jun 18, 2026 - "Loupe de Loupe"

API Updates
===========

* `DefaultMessageRouter` will now always generate blinded message paths that provide no privacy (where our node is the introduction node) for nodes with public channels. This works around an issue which will appear for any nodes with LND peers that enable onion messaging - such peers will refuse to forward BOLT 12 messages from unknown third parties, which most BOLT 12 payers rely on today (#4647).
* Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder` now maps 0-amounts to an amount of `None` (#4324).

Bug Fixes
=========

* Async `ChannelMonitorUpdate` persistence operations which complete, but are not marked as complete in a persisted `ChannelManager` prior to restart, followed immediately by a block connection and then another restart could result in some channel operations hanging, leading to force-closures (#4377).
* If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are still being completed asynchronously, further channel updates (e.g. forwarding another payment) are pending and the node restarts, the channel could have become stuck (#4520).
* The presence of unconfirmed transactions no longer causes `ElectrumSyncClient` to spuriously fail to sync (#4590).
* `FilesystemStore::list_all_keys` will no longer fail if there are stale intermediate files lying around from a previous unclean shutdown (#4618).
* When forwarding an HTLC while in a blinded path with proportional fees over 200%, LDK will no longer spuriously allow a forward that pays us 1 msat too little in fees (#4697).
* Fixed a rare case where a channel could get stuck on reconnect when using both async `ChannelMonitorUpdate` persistence and async signing (#4684).
* `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where `ChannelManager::abandon_payment` was called before the payment ultimately completes anyway (#4651).
* Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some full `ChannelMonitor`s to disk several times per block (#4544).
* `OMDomainResolver` now correctly accounts for failed queries when rate limiting, ensuring we continue to respond to queries after failures (#4591).
* Calling `ChannelManager::send_payment_with_route` without a `route_params` and with an invalid `Route` will no longer panic (#4707).
* `lightning-custom-message`'s handling of `peer_connected` events now ensures that sub-handlers will see a `peer_disconnected` event if a different sub-handler refused the connection by `Err`ing `peer_connected` (#4595).
* Incomplete MPP keysend payments will no longer see their HTLCs held until expiry (#4558).
* `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a BOLT 12 `Offer` that allows any quantity up to a bound (#4667).
* `lightning-custom-message` handlers that return `Ok(None)` when asked to deserialize a message in their defined range no longer cause panics (#4709).
* Several spurious debug assertions were fixed (#4537, #4618).

Security
========

0.1.10 fixes a sanitization issue and several denial-of-service vulnerabilities.

* `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an invoice which set an explicit public key, rather than relying on public key recovery. This method is called from `payment_parameters_from_invoice` and `payment_parameters_from_variable_amount_invoice` (#4717).
* Maliciously-crafted unpayable invoices which have overflowing feerates will no longer cause an `unwrap` failure panic (#4716).
* `possiblyrandom` did not properly generate random data except when it was explicitly configured to. By default this meant LDK was vulnerable to various HashDoS attacks (#4719).
* `OMNameResolver` will no longer panic when looking up payment instructions which include unicode characters at the start of a TXT record (#4718).
* `PrintableString` did not properly sanitize unicode format characters, allowing an attacker to corrupt the rendering of logs or UI (#4593, #4605).
* RGS data is now limited in how large of a graph it is able to cause a client to store in memory. Note that RGS data is still considered a DoS vector in general and you should only use semi-trusted RGS data (#4713).
* Counterparty-provided strings in failure messages are no longer logged in full, reducing the ability of such a counterparty to spam our logs (#4714).
* Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer cause us to allocate large amounts of memory (#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.