Stable sort for school escorting#1085
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves determinism and cross-platform consistency in the school escorting model by (1) enforcing stable sorting where ordering affects assignment/processing and (2) standardizing key ID/counter columns to 64-bit integers to avoid Windows 32-bit int pitfalls.
Changes:
- Add
kind="stable"to keysort_valuescalls to make tie-breaking deterministic. - Replace platform-native
astype(int)casts withastype("int64")for tour/person-related integer columns. - Apply stable ordering before downstream computations (e.g., pure escort tour construction and escort bundle ordering).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| activitysim/abm/models/util/school_escort_tours_trips.py | Uses int64 for ID-like columns and applies stable sorting when ordering impacts pure escort tour processing. |
| activitysim/abm/models/school_escorting.py | Applies stable sorts in participant numbering and bundle ordering; switches chauffeur-related numeric fields to int64. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
@americalexander Is this PR motivated by an actual finding of non-reproducibility, or just having discovered the theoretical potential of such? |
|
@jpn-- this is an actual problem, we found this independently when looking into unexpected differences in EET scenarios, let's discuss this in today's engineering meeting. I think it would be good to have a test here that shows how this manifests and how the new code fixes it. |
|
As Jan mentioned, this was something we discovered in practice. CCing @dhensle |
janzill
left a comment
There was a problem hiding this comment.
I suggest we use explicit sorting keys. With stable sort we might see differences when using different number of processes or adding additional population (while keeping existing household and person ids stable).
|
|
||
| chaperones["chaperone_num"] = ( | ||
| chaperones.sort_values("chaperone_weight", ascending=False) | ||
| chaperones.sort_values("chaperone_weight", ascending=False, kind="stable") |
There was a problem hiding this comment.
| chaperones.sort_values("chaperone_weight", ascending=False, kind="stable") | |
| chaperones.sort_values(["chaperone_weight", "person_id"], ascending=[False, True]) |
As discussed in ActivitySim/meeting-notes#114, using person_id as explicit sort key makes this independent of the row order of chaperones.
| ) | ||
| escortees["escortee_num"] = ( | ||
| escortees.sort_values("age", ascending=True).groupby("household_id").cumcount() | ||
| escortees.sort_values("age", ascending=True, kind="stable") |
There was a problem hiding this comment.
| escortees.sort_values("age", ascending=True, kind="stable") | |
| escortees.sort_values(["age", "person_id"], ascending=[True, True]) |
Same as chaperones just above.
| school_time_cols = [ | ||
| "time_home_to_school" + str(i) for i in range(1, NUM_ESCORTEES + 1) | ||
| ] | ||
| bundles["outbound_order"] = list(bundles[school_time_cols].values.argsort() + 1) |
There was a problem hiding this comment.
Does this (lines 283-286) need ordering as well?
# Use a stable sort so that escortees with identical home-to-school times
# (e.g. siblings attending the same school) are always ordered by child
# number rather than by the non-deterministic tie-break of the default
# quicksort. This keeps child pickup/dropoff order reproducible.
bundles["outbound_order"] = list(
bundles[school_time_cols].values.argsort(kind="stable") + 1
)
bundles["inbound_order"] = list(
(-1 * bundles[school_time_cols]).values.argsort(kind="stable") + 1
) # inbound gets reverse order
|
|
||
| pe_tours = pe_tours.sort_values(by=["household_id", "person_id", "start"]) | ||
| pe_tours = pe_tours.sort_values( | ||
| by=["household_id", "person_id", "start"], kind="stable" |
There was a problem hiding this comment.
| by=["household_id", "person_id", "start"], kind="stable" | |
| by=["household_id", "person_id", "start", "bundle_id"] |
I suggest we use bundle_id to make this independent of row order in pe_tours.
| + escort_bundles.groupby("household_id").cumcount() | ||
| + 1 | ||
| ) | ||
| escort_bundles.sort_values( |
There was a problem hiding this comment.
This would make the sorting independent of dataframe order:
escort_bundles.sort_values(
by=["household_id", "school_escort_direction", "bundle_id"],
ascending=[True, False, True],
inplace=True,
)
|
I also had AI generate some tests for this, see if you think it is of value @americalexander. The following is the additional code that was added to test_school_escorting_utils.py: |
Address multiple locations where the school escorting model does not sort in a stable manner. Also use 64-bit integers for tour and person IDs.
Changes
Stable Sorting
Added
kind=""stable""tosort_valuescalls throughout the school escorting model to ensure deterministic, reproducible results regardless of the initial row ordering of input data. Without stable sorting, ties in the sort key can be broken arbitrarily, leading to non-reproducible outputs across runs or platforms.Affected locations:
determine_escorting_participants: stable sort of chaperones bychaperone_weight(descending) when assigningchaperone_num, and stable sort of escortees byage(ascending) when assigningescortee_numschool_escorting: stable sort of school escort tours byhousehold_idandschool_escort_directionbefore further processingcreate_pure_school_escort_tours: stable sort of pure escort tours byhousehold_id,person_id, andstarttime64-bit Integer Types
Replaced
.astype(int)with.astype(""int64"")for tour IDs, person IDs, and related integer columns. Using the platform-nativeinttype can produce 32-bit integers on Windows, which risks integer overflow for large ID values and inconsistent behavior across operating systems.Affected locations:
chauf_numandchauf_idincreate_school_escorting_bundles_tablejoin_attributestour_idincreate_chauf_escort_tripsnum_escorteesinprocess_tours_after_escorting_model