Support hourly data by SarahAlidoost · Pull Request #54 · ESMValGroup/ClimaNet

SarahAlidoost · 2026-06-12T15:12:41Z

closes #50

This PR:

adds support for hourly data
refactors code to improve efficiency
adds instructions for the DKRZ JupyterHub

Issues found:

the dataset should support patching in time to add months to the batch dimension, see Add support for patching in time in dataset #62
the code should be checked for GPU device and dtype handling, see Add support GPU in the model #30 ---> draft PR Fix gpu #63

SarahAlidoost · 2026-06-26T15:28:00Z

@meiertgrootes and @rogerkuou I implemented the support for hourly data and ran the notebook on the DKRZ JupyterHub.

When using hourly data, the input shape grows quickly, e.g. one month is (31×24, 160, 400) = (T, H, W). For longer periods the input no longer fits in memory, so we cannot train on continuous multi-month sequences. Instead, we should group months along the batch dimension, e.g. (2, 31×24, 160, 400) = (B, T, H, W) for two months, see issue Add support for patching in time in dataset #62.
I refactored parts of the code to reduce CPU data overhead. Training is now faster, but performance is still limited by the CPU. We should move to GPU, but the code must first be debugged for GPU usage, see issue Add support GPU in the model #30.
We could run the "validation" in parallel with the training loop to improve performance, but it is currently not a bottleneck. The inference (forward call) is fast, but the loss.backward in train mode is the challenge.
In the example notebook, training was only run for 50 epochs, so the results might not be valid. They should not be used to evaluate the performance of the model in SST prediction.
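
To make the memory argument above concrete, here is a minimal sketch of the arithmetic. It assumes float32 storage and months padded to 31 days; `tensor_bytes` is a hypothetical helper, not part of ClimaNet.

```python
import math


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense array of this shape (float32 assumed by default)."""
    return math.prod(shape) * bytes_per_element


H, W = 160, 400
HOURS_PER_DAY = 24
MAX_DAYS = 31  # assumption: every month is padded to 31 days

one_month = (MAX_DAYS * HOURS_PER_DAY, H, W)              # (T, H, W)
two_months_batched = (2, MAX_DAYS * HOURS_PER_DAY, H, W)  # (B, T, H, W)
one_year_continuous = (12 * MAX_DAYS * HOURS_PER_DAY, H, W)  # one long sequence

for name, shape in [
    ("one month", one_month),
    ("two months, batched", two_months_batched),
    ("one year, continuous", one_year_continuous),
]:
    print(f"{name}: shape={shape}, {tensor_bytes(shape) / 1e6:.1f} MB")
```

The raw input size is the same whether months are stacked in B or concatenated in T. The difference is that batching keeps the per-sample sequence length fixed at 31×24 steps, so months can be processed a few at a time instead of as one long sequence.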

meiertgrootes

Very nice implementation, including the elegant support for both CPU and GPU. Some minor comments, but non-blocking for the merge.

In particular, with the envisaged move of months to batch dimension, it would be good to consider dropping the month positional encoding.

Another low-priority point (maybe before the final release) is the use of daily_* for the input values and monthly_* for the target. The monthly naming is fine, but the daily input can now also be hourly. Maybe we can consider renaming it somehow. Clearly not crucial.

meiertgrootes · 2026-07-02T12:49:29Z

+                daily_da, monthly_da, time_dim=time_dim
+            )

        # Convert to numpy once — all __getitem__ calls use these


We should probably update this comment, as the conversion target is now a torch tensor rather than a numpy array.

meiertgrootes · 2026-07-03T09:04:27Z

        # Precompute the NaN mask before filling NaNs
        # daily_mask: True where NaN (i.e. missing ocean data, not land)
-        self.daily_nan_mask = np.isnan(self.daily_np)  # (M, T=31, H, W)
+        self.daily_nan_mask = torch.isnan(self.daily_t)  # (M, T=31, H, W)


Adding the _t suffix to tensors for clarity, as you've done above, is a really good idea. Should we adopt that consistently, i.e. also for the daily_nan_mask? Or do you think that is not needed?

agreed, it is fixed.
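
For readers following the thread, a small sketch of the pattern in the diff: compute the NaN mask before filling, and carry the `_t` suffix on every tensor. The name `daily_nan_mask_t`, the toy data, and the fill value 0.0 are assumptions for illustration, not necessarily what the PR uses.

```python
import torch

# Toy stand-in for the dataset tensor, shape (M, T, H, W) = (1, 1, 2, 2)
daily_t = torch.tensor([[[[1.0, float("nan")], [3.0, 4.0]]]])

# Compute the mask first: once NaNs are filled they can no longer be detected.
daily_nan_mask_t = torch.isnan(daily_t)  # bool tensor, True where data is missing

# Fill NaNs afterwards (0.0 is an assumed fill value)
daily_filled_t = torch.nan_to_num(daily_t, nan=0.0)

print(daily_nan_mask_t.shape, daily_nan_mask_t.dtype)
```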

meiertgrootes · 2026-07-03T09:31:04Z

@@ -27,16 +24,14 @@ class VideoEncoder(nn.Module):
    https://arxiv.org/abs/2203.12602
    """



You have chosen to get rid of dropout regularization throughout? Could you expand on why in the PR?
This will also propagate to the projection NN used for spatial embeddings. Likely not a significant issue.

Good point. To fix the performance and investigate the bottlenecks, I used the PyTorch profiler with record_function. One thing that stood out was the large number of aten::dropout and aten::bernoulli calls, under the class VideoEncoder. I originally added dropout to help with overfitting. But we are now using torch.optim.AdamW and a validation dataset during training. I tested both keeping and removing the dropout in this class. Removing it improved the runtime without affecting the results, so I decided to remove it here. I left the dropout in some other classes because they didn't show up as bottlenecks in the profiler results.
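
The profiling approach described above can be sketched roughly as follows. `ToyEncoder` and the label `toy_encoder_forward` are hypothetical stand-ins; the real VideoEncoder is much larger, and the operators that dominate will differ.

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile, record_function


class ToyEncoder(nn.Module):
    """Hypothetical miniature block with a dropout layer, for illustration only."""

    def __init__(self, dim=32, p=0.1):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.drop = nn.Dropout(p)

    def forward(self, x):
        # Label this region so it shows up as its own row in the profiler table
        with record_function("toy_encoder_forward"):
            return self.drop(self.proj(x))


model = ToyEncoder().train()  # dropout is only active in train mode
x = torch.randn(8, 16, 32)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    out = model(x)
    out.sum().backward()

# Operators sorted by total CPU time; dropout-related ops appear here if they are costly
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
op_names = {evt.key for evt in prof.key_averages()}
```

Running the same script with the dropout layer removed and comparing the two tables is one way to reproduce the keep-versus-remove comparison mentioned in the reply.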

meiertgrootes · 2026-07-03T09:45:02Z

-            max_days: Maximum length of the temporal dimension to precompute
-            encodings for. Default is 31, which is sufficient for a month of
-            daily data.
            max_months: Maximum number of months (temporal patches) to precompute


We have dropped the day encodings in favor of doy- and hod-sensitive cyclical encodings. We have kept the explicit month encoding, which gives varying encodings depending on the length of the month sequence used, and can give the same encoding to different months if the sequence starts at a different month.
Should we consider dropping this and/or adding a month-of-year feature to the cyclical encoding module?

Good point. As you suggested we can implement month-of-year. This issue is related to PR #64.
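
A minimal sketch of the difference discussed in this thread, using only the standard library. The function names are hypothetical, and the sin/cos form is an assumed choice for a month-of-year feature, not the project's confirmed implementation.

```python
import math


def month_of_year_features(month):
    """Map a calendar month (1..12) to (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)


def sequence_positions(months):
    """Position-based indices 0, 1, 2, ... regardless of the calendar month."""
    return {month: idx for idx, month in enumerate(months)}


def chord(m1, m2):
    """Euclidean distance between the cyclical encodings of two months."""
    (s1, c1), (s2, c2) = month_of_year_features(m1), month_of_year_features(m2)
    return math.hypot(s1 - s2, c1 - c2)


seq_a = [1, 2, 3]  # January to March
seq_b = [3, 4, 5]  # March to May

# Position-based: March is index 2 in seq_a but index 0 in seq_b
print(sequence_positions(seq_a)[3], sequence_positions(seq_b)[3])

# Cyclical: March maps to the same features whichever sequence it appears in
print(month_of_year_features(3))

# December and January are as close as January and February
print(chord(12, 1), chord(1, 2))
```

Because the cyclical feature depends only on the calendar month, it would also survive folding months into the batch dimension, where the position within a sequence is no longer available.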

meiertgrootes · 2026-07-03T10:09:38Z

        )

-    def forward(self, x, M, T, H, W, time_features, padded_days_mask=None):
+        # Pre-compute and register as buffer — auto-moves with .to(device/dtype)


see comment above
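
As background for the code comment in this diff, a short sketch of how a registered buffer follows `.to(...)`. `MonthPE`, its sinusoidal table, and the `forward(M)` signature are assumptions for illustration; only the buffer name `pe_months_cache` is taken from the diff.

```python
import math

import torch
from torch import nn


class MonthPE(nn.Module):
    """Hypothetical module that precomputes a sinusoidal table as a buffer."""

    def __init__(self, max_months=12, dim=8):
        super().__init__()
        position = torch.arange(max_months, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim)
        )
        pe = torch.zeros(max_months, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # A buffer is not a trainable parameter, but it is saved in state_dict
        # and is moved or cast together with the module.
        self.register_buffer("pe_months_cache", pe)

    def forward(self, M):
        # Slice the first M rows of the precomputed table
        return self.pe_months_cache[:M]


module = MonthPE()
module_f64 = MonthPE().to(torch.float64)  # the buffer is cast along with the module

print(module.pe_months_cache.dtype, module_f64.pe_months_cache.dtype)
```

Note that slicing with `[:M]` is exactly what makes the encoding depend on sequence position rather than on the calendar month.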

meiertgrootes · 2026-07-03T10:11:21Z

-        seq = seq + temp_emb  # add temporal embeddings
-        seq = seq + pe_months[None, None, :, None, :]  # add month PE
+        temp_emb = self.time_embed(time_features)
+        pe_months = self.pe_months_cache[:M]


See above. In addition, if we move to folding months into the batch dimension, adding a month-of-year feature to the cyclical time embedding would preserve the information in a simple manner.

meiertgrootes · 2026-07-03T10:17:20Z

+source $HOME/.local/bin/env
+```
+
+2. Create a new conda environment and install ipykernel and climanet:


Technically this isn't a conda environment, but a virtual environment created (and managed) by uv.

The reason to be pedantic about this is that otherwise people might try to use (existing) conda installs to modify the environment.

It is possible to use uv with conda, but that requires its own specific setup (and is not recommended afaik).

SarahAlidoost added 30 commits June 10, 2026 09:03

add a util function for hourly data

e491bcf

fix minor docstrings

8066c9a

Merge branch 'main' into support_hourly

bd9c771

fix encoder_decoder

3874ba6

update nb

0dd47b0

remove unused argument from model

615a43a

remove setting device from the model class

2e1a57f

remove unused arg

ee0626e

refcator dataset

a9d519d

fix a bug

33befb3

refactor the model

87405cc

bring nb from main

977ff25

bring nb from main

74bd3a9

Merge branch 'main' into support_hourly

6b33b05

add doc about jupyter hub

84c0603

remove some of dropouts

146cb48

remove permute

5e8707e

Merge branch 'main' into support_hourly

d08c9d7

make batch tensor

85d403d

set batch device

21ce834

fix the title of the histogram

7452cd7

remove to dtype in forward

afc34ed

refactor model

34096e1

refactor temporal agg

0ccb9ef

refactor encoder

9cdc177

use checkpoint in forward of the model

b1d1428

improve predict and train

1e7f9bd

improve encoder

832634e

improve train

ee82421

add notebook for hourly data

d65eca2

SarahAlidoost marked this pull request as ready for review June 26, 2026 15:28

SarahAlidoost requested review from meiertgrootes and rogerkuou June 26, 2026 15:28

SarahAlidoost mentioned this pull request Jul 1, 2026

Fix gpu #63

Draft

SarahAlidoost added 3 commits July 1, 2026 09:48

fix dataset tests

207fdac

run ruff

31be9d1

add device to nbs

1081822

SarahAlidoost mentioned this pull request Jul 1, 2026

Data time subsetting #64

Draft

meiertgrootes approved these changes Jul 3, 2026
