Skip to content

Support hourly data#54

Open
SarahAlidoost wants to merge 33 commits into
mainfrom
support_hourly
Open

Support hourly data#54
SarahAlidoost wants to merge 33 commits into
mainfrom
support_hourly

Conversation

@SarahAlidoost

@SarahAlidoost SarahAlidoost commented Jun 12, 2026

Copy link
Copy Markdown
Member

closes #50

This PR:

  • adds support for hourly data
  • refactor code to improve efficiency
  • adds instructions for dkrz jupyter hub

Issues found:

@SarahAlidoost

Copy link
Copy Markdown
Member Author

@meiertgrootes and @rogerkuou I implemented the support for hourly data and ran the notebook on dkrz jupyter notebook.

  • When using hourly data, the input shape grows quickly e.g. one month: (31×24, 160, 400) = (T, H, W). For longer periods, this becomes too large for memory, even infeasible. Therefore, we cannot train on continuous multi-month sequences. Instead, we should group each month in the batch dimension, e.g. (2, 31×24, 160, 400) = (B, T, H, W) for two months, see issue Add support for patching in time in dataset #62 .
  • I refactored parts of the code to reduce CPU data overhead. Training is now faster, but performance is still limited by CPU constraints. We should moving to GPU, but the code must first be debugged for GPU usage, see issue Add support GPU in the model #30 .
  • We could run the "validation" in parallel with the training loop to improve performance, but it is currently not a bottleneck. The inference (forward call) is fast, but the loss.backward in train mode is the challenge.
  • In the example notebook, training was only run for 50 epochs, so the results might not be valid. They should not be used to evaluate the performance of the model in SST prediction.

@SarahAlidoost SarahAlidoost marked this pull request as ready for review June 26, 2026 15:28
@SarahAlidoost SarahAlidoost mentioned this pull request Jul 1, 2026

@meiertgrootes meiertgrootes left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice implementation, including the elegant support for both CPU ad GPU. Some minor comments, but non-blocking for the merge.

In particular, with the envisaged move of months to batch dimension, it would be good to consider dropping the month positional encoding.

Another low priority point (maybe before final relase) is the use of daily_* for the input values and monthly_* for target. the monthly is fine, but the daily can also be hourly. Maybe we can consider renaming somehow. Clearly not crucial

Comment thread climanet/dataset.py
daily_da, monthly_da, time_dim=time_dim
)

# Convert to numpy once — all __getitem__ calls use these

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably update tis comment as the conversion format is now a torch tensor rather than numpy

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment thread climanet/dataset.py
# Precompute the NaN mask before filling NaNs
# daily_mask: True where NaN (i.e. missing ocean data, not land)
self.daily_nan_mask = np.isnan(self.daily_np) # (M, T=31, H, W)
self.daily_nan_mask = torch.isnan(self.daily_t) # (M, T=31, H, W)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding the _t suffix to tensors for clarity as you've done above is a really good idea. Should we adopt that consistently, i.e. also for the daily_nan_mask? Or do you think that is not needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, it is fixed.

Comment thread climanet/predict.py
@@ -27,16 +24,14 @@ class VideoEncoder(nn.Module):
https://arxiv.org/abs/2203.12602
"""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have chosen to get rid of dropout regularization throughout? Could you expand on why in the PR?
This will also propagate to the projection NN used for spatial embeddings. likely not a significant issue.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. To fix the performance and investigate the bottlenecks, I used the PyTorch profiler with record_function. One thing that stood out was the large number of aten::dropout and aten::bernoulli calls, under the class VideoEncoder. I originally added dropout to help with overfitting. But we are now using torch.optim.AdamW and a validation dataset during training. I tested both keeping and removing the dropout in this class. Removing it improved the runtime without affecting the results, so I decided to remove it here. I left the dropout in some other classes because they didn't show up as bottlenecks in the profiler results.

max_days: Maximum length of the temporal dimension to precompute
encodings for. Default is 31, which is sufficient for a month of
daily data.
max_months: Maximum number of months (temporal patches) to precompute

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have dropped the day encodings in favor of doy and hod sensitive cyclical encodings. We have kept the explicit month encoding, which will provide varying encodings depending on the length of the sequence in months used and/or will potentially provide the same encodings for different months if a different startig point in the sequnce of months is used.
Should we consider dropping this or/and adding a month-of-year feature to the cyclical encoding module?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. As you suggested we can implement month-of-year. This issue is related to PR #64.

Comment thread climanet/st_encoder_decoder.py
)

def forward(self, x, M, T, H, W, time_features, padded_days_mask=None):
# Pre-compute and register as buffer — auto-moves with .to(device/dtype)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

seq = seq + temp_emb # add temporal embeddings
seq = seq + pe_months[None, None, :, None, :] # add month PE
temp_emb = self.time_embed(time_features)
pe_months = self.pe_months_cache[:M]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above. In addition, if we move to folding months into the batch dimension, adding a month-of-year to the cyclic time embedding would preserve the information in a simple manner.

Comment thread docs/levante_usage.md
source $HOME/.local/bin/env
```

2. Create a new conda environment and install ipykernel and climanet:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this isn't a conda environment, but a virtual environment crreated (and managed) by uv.

The reason to be pedantic about this, is that otherwise people might try to use (exisiting) conda installs to modify the environment.

It is possible to use uv with conda, but that requires it's own specific setup (and is not recommended afaik)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for hourly data

2 participants