Setting Up DRUF on CKAN 2.11.3 datapusher 3.1.0a #331

@a5dur

Setting Up DRUF on CKAN 2.11.3 — Issues & Workarounds

Notes from setting up the DRUF (Dataset Resource Upload First) workflow with DataPusher+ on a CKAN 2.11.3 instance.
Posted here so that other developers who hit the same problems can get past them faster.


Environment

| Component | Version |
|---|---|
| CKAN | 2.11.3 (stable release) |
| DataPusher+ | latest (main / 0e46e10) |
| Python | 3.10 |
| Prefect | 3.x (as required by DP+) |
| ckanext-scheming | latest |

Issue 1: Prefect Worker Fails to Start — BooleanOptionalAction ImportError

What happened

After installing DataPusher+ and its requirements, attempting to start the Prefect worker immediately crashed with:

ImportError: cannot import name 'BooleanOptionalAction' from 'argparse'

Full traceback:

Traceback (most recent call last):
  File ".../pydantic_settings/env_settings.py", line ..., in ...
    from argparse import BooleanOptionalAction
ImportError: cannot import name 'BooleanOptionalAction' from 'argparse'
  (/path/to/venv/lib/python3.10/site-packages/argparse.py)

Why it happens

pydantic-settings (a Prefect dependency) imports BooleanOptionalAction, which was added to the Python standard library in Python 3.9. On Python 3.10 this should be available without issue.

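The problem is that some older pip packages distribute a standalone argparse module and install argparse.py directly into site-packages. In a default interpreter layout the standard library precedes site-packages on sys.path, so such a file is normally ignored. It shadows the stdlib argparse only when the site-packages directory ends up ahead of the standard library on the search path, for example because it is listed in PYTHONPATH or a .pth file reorders sys.path. The traceback above shows that this is what happened here: argparse was loaded from site-packages, and that legacy copy predates BooleanOptionalAction, so the import fails even on Python 3.10.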

You can confirm this is your issue:

python3 -c "import argparse; print(argparse.__file__)"
# If it prints a site-packages path instead of a stdlib path, you have the problem.
# Normal (correct) output looks like: /usr/lib/python3.10/argparse.py
# Problem output looks like: /path/to/venv/lib/python3.10/site-packages/argparse.py
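The same check can be scripted, for example as a pre-flight step before starting the Prefect worker. This is a minimal sketch, not part of DataPusher+: the helper names `is_shadowing_path` and `argparse_origin` are hypothetical, and the check only looks for a `site-packages` or `dist-packages` component in the resolved path.

```python
import importlib.util
from pathlib import PurePath
from typing import Optional


def is_shadowing_path(origin: Optional[str]) -> bool:
    """Return True if a module file lives under site-packages or dist-packages."""
    if not origin:
        return False
    parts = PurePath(origin).parts
    return "site-packages" in parts or "dist-packages" in parts


def argparse_origin() -> Optional[str]:
    """Ask the import system which file argparse resolves to."""
    spec = importlib.util.find_spec("argparse")
    return spec.origin if spec else None


if __name__ == "__main__":
    origin = argparse_origin()
    if is_shadowing_path(origin):
        print(f"argparse is shadowed by {origin}")
    else:
        print(f"argparse resolves to {origin}")
```

Run it with the same interpreter (and the same PYTHONPATH) that launches the worker, otherwise the result may not reflect the failing environment.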

Fix

Remove (or rename) the shadowing argparse.py from site-packages:

# Find it
find /path/to/your/venv -name "argparse.py" -path "*/site-packages/*"

# Remove it (or rename to .bak if you want to be cautious)
rm /path/to/your/venv/lib/python3.10/site-packages/argparse.py

After removing, verify the stdlib version is now loaded:

python3 -c "import argparse; print(argparse.__file__)"
# Should now print something like: /usr/lib/python3.10/argparse.py
python3 -c "from argparse import BooleanOptionalAction; print('OK')"
# Should print: OK
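For the cautious variant mentioned above (rename rather than delete), the step can be sketched in Python. The function name `quarantine_argparse` is hypothetical; the demo below runs against a throwaway temporary directory so that it changes nothing in a real environment.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def quarantine_argparse(site_dirs: Iterable[str]) -> List[Path]:
    """Rename any top-level argparse.py in the given directories to argparse.py.bak."""
    renamed = []
    for directory in site_dirs:
        candidate = Path(directory) / "argparse.py"
        if candidate.is_file():
            backup = candidate.with_name("argparse.py.bak")
            candidate.replace(backup)  # overwrites an older .bak if one exists
            renamed.append(backup)
    return renamed


if __name__ == "__main__":
    # Demo only: create a fake site-packages directory with a stray argparse.py.
    with tempfile.TemporaryDirectory() as fake_site:
        (Path(fake_site) / "argparse.py").write_text("# legacy copy\n")
        for path in quarantine_argparse([fake_site]):
            print(f"renamed to {path.name}")
```

In a real environment you would pass the interpreter's own directories, for instance the list returned by `site.getsitepackages()`, and then rerun the verification commands above.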

Preventive fix for Docker / CI builds

If you are building in Docker, add a cleanup step after all pip install commands to remove any stray argparse.py that may have been deposited by a transitive dependency:

# Defensive cleanup: remove any legacy argparse.py that shadows the stdlib.
# pydantic-settings (Prefect dep) needs BooleanOptionalAction (stdlib 3.9+).
RUN find /usr/local/lib /usr/lib -name "argparse.py" -path "*/site-packages/*" -delete || true

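The || true is purely defensive: find already exits 0 when nothing matches, which is the normal case, so the guard only matters if find itself reports an error (for example, a search root that does not exist in the image).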


Issue 2: DRUF Does Not Redirect to Metadata Form After Resource Upload

What happened

After getting the Prefect worker running and enabling DRUF in ckan.ini:

ckanext.datapusher_plus.enable_druf = true
ckanext.datapusher_plus.enable_form_redirect = true

The workflow partially works — clicking "Add Dataset" correctly skips the metadata form and goes straight to the resource upload page. But after uploading the file and clicking "Add", the user lands on the dataset read page (/dataset/{name}) instead of the metadata edit form (/dataset/{id}/edit).

CKAN request log:

302 POST /dataset/{id}/resource/new
302 GET  /dataset/{id}
200 GET  /dataset/{dataset-name}

Expected:

302 POST /dataset/{id}/resource/new
302 GET  /dataset/{id}/edit
200 GET  /dataset/{id}/edit

The dataset ends up published with only the auto-generated placeholder name (e.g. dataset97689) and no real metadata filled in.

Root cause: IFormRedirect does not exist in CKAN 2.11.3

DataPusher+ implements CKAN's IFormRedirect plugin interface to intercept the post-save redirect and point it at the metadata edit form. In plugin.py:

try:
    p.implements(p.IFormRedirect)
except (ImportError, AttributeError):
    # IFormRedirect not available in this CKAN version
    pass

The resource_save_redirect method (the one that would return the dataset.edit URL) is only ever called if IFormRedirect is registered — which requires the interface to exist in the CKAN version being used.

IFormRedirect does not exist in any stable CKAN release as of 2.11.3. It was introduced on the unreleased branch 7778-iformredirect. The DataPusher+ README links to this branch under the "Enabling DRUF" section:

"To enable DRUF you need DRUF compatible ckan version": https://github.com/ckan/ckan/tree/7778-iformredirect

Because the try/except silently swallows the AttributeError, CKAN starts without error and no warning is logged. The redirect hook simply never fires. CKAN's default post-resource-save redirect logic takes over — which sends the user to the read page.
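Because nothing is logged, it can help to check for the interface explicitly at startup. The sketch below is an illustration, not DataPusher+ code: `interface_available` is a hypothetical helper, and it assumes the interface is exposed as an attribute of the plugins module (consistent with the AttributeError handled in the snippet above). Stand-in objects are used so that the example runs without CKAN installed.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger(__name__)


def interface_available(plugins_module, name: str = "IFormRedirect") -> bool:
    """Return True if the module exposes the named interface; log a warning if not."""
    if hasattr(plugins_module, name):
        return True
    log.warning(
        "%s is not available in this CKAN build; the DRUF redirect hook will not run.",
        name,
    )
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Stand-ins for ckan.plugins on a stable release and on a DRUF-compatible build.
    stable_build = SimpleNamespace(IResourceController=object)
    druf_build = SimpleNamespace(IFormRedirect=object)
    print(interface_available(stable_build))  # logs the warning
    print(interface_available(druf_build))
```

In a real deployment you would pass the imported `ckan.plugins` module instead of the stand-ins.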

How CKAN's redirect logic works (relevant context)

In ckan/views/resource.py, after a resource is saved, the redirect destination is determined by the value of a save hidden field posted with the form:

| save field value | What CKAN does |
|---|---|
| go-metadata | Activates the package (state=active), redirects to dataset read page |
| go-dataset | Redirects to dataset edit page (the metadata form) |
| go-dataset-complete | Redirects to dataset read page |
| again | Redirects to resource new page (add another resource) |

CKAN's standard "Add" button in package/snippets/resource_form.html uses value="go-dataset-complete" — which goes to the read page. That is what fires when DRUF's IFormRedirect hook does not exist to intercept it.

Additionally, the DRUF-created package has state='active' (set in druf_view.py). Because the package is not in draft state, CKAN renders the new_resource_not_draft.html template rather than the wizard-stage version — so there is no "Finish" button, only "Add" (go-dataset-complete).
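The dispatch on the save field can be modelled in a few lines. This is a simplified sketch of the behaviour listed in the table above, not CKAN's actual view code: `redirect_endpoint` is a hypothetical function, and treating every unlisted value like `again` is an assumption made for this sketch.

```python
def redirect_endpoint(save_action: str, package_type: str = "dataset") -> str:
    """Map the posted save value to the endpoint the user is redirected to."""
    if save_action == "go-metadata":
        # On this path CKAN also sets the package state to active before redirecting.
        return f"{package_type}.read"
    if save_action == "go-dataset":
        return f"{package_type}.edit"
    if save_action == "go-dataset-complete":
        return f"{package_type}.read"
    # "again": back to the resource form to add another resource.
    return f"{package_type}_resource.new"


if __name__ == "__main__":
    for action in ("go-metadata", "go-dataset", "go-dataset-complete", "again"):
        print(f"{action:>20} -> {redirect_endpoint(action)}")
```

The only value that leads to the edit form is `go-dataset`, which is why the workaround below swaps the button value rather than touching the view.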

Workaround for CKAN 2.11.3

Since IFormRedirect is unavailable, the workaround is to:

  1. Have druf_view.py pass a ?druf=1 query parameter on its redirect to the resource upload page.
  2. Override package/snippets/resource_form.html in your CKAN extension to detect ?druf=1 and swap the "Add" button (go-dataset-complete) for a "Save & Fill Metadata" button (go-dataset) — which CKAN already routes to dataset.edit.

Step 1 — druf_view.py patch:

# In ckanext/datapusher_plus/druf_view.py, change line 46 from:
return h.redirect_to('dataset_resource.new', id=pkg['id'])

# to:
return h.redirect_to('dataset_resource.new', id=pkg['id'], druf='1')

In Flask/CKAN, url_for kwargs that are not part of the URL rule are appended as query string parameters. This produces /dataset/{id}/resource/new?druf=1.
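To make the query-string behaviour concrete without a running CKAN, here is a small stand-alone model of it using only the standard library. It is not Flask's implementation: `build_url` is a hypothetical helper and the rule string is illustrative.

```python
import re
from urllib.parse import urlencode


def build_url(rule: str, **values: str) -> str:
    """Fill <name> placeholders in the rule; append leftover values as a query string."""
    used = set()

    def fill(match: re.Match) -> str:
        name = match.group(1)
        used.add(name)
        return str(values[name])

    path = re.sub(r"<(\w+)>", fill, rule)
    extras = {key: value for key, value in values.items() if key not in used}
    return f"{path}?{urlencode(extras)}" if extras else path


if __name__ == "__main__":
    rule = "/dataset/<id>/resource/new"
    print(build_url(rule, id="abc"))
    print(build_url(rule, id="abc", druf="1"))
```

Here `id` matches a placeholder in the rule and becomes part of the path, while `druf` matches nothing and ends up after the `?`.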

Step 2 — Template override (in your CKAN extension):

Create templates/package/snippets/resource_form.html inside your CKAN extension:

{% ckan_extends %}

{% block add_button %}
  {% if request.args.get('druf') %}
    <button class="btn btn-primary" name="save" value="go-dataset" type="submit">{{ _('Save & Fill Metadata') }}</button>
  {% else %}
    {{ super() }}
  {% endif %}
{% endblock %}

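{% ckan_extends %} makes the override inherit from the next template with the same path in the template search order, which here is CKAN's own resource_form.html snippet, so {{ super() }} renders the stock "Add" button. request.args is the Flask query string dict and is available in all CKAN Jinja2 templates.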
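One detail of the `{% if request.args.get('druf') %}` test is that it checks truthiness, not the literal value 1. The sketch below models that decision with the standard library only; `save_value_for` is a hypothetical helper, and `parse_qs` stands in for Flask's `request.args`.

```python
from urllib.parse import parse_qs


def save_value_for(query_string: str) -> str:
    """Pick the submit button's save value the way the template's if-test would."""
    args = parse_qs(query_string)  # blank values are dropped by default
    druf = args.get("druf", [None])[0]
    # Any non-empty string is truthy, so "0" selects the DRUF button as well.
    return "go-dataset" if druf else "go-dataset-complete"


if __name__ == "__main__":
    for qs in ("", "druf=1", "druf=0", "druf="):
        print(f"{qs!r:>10} -> {save_value_for(qs)}")
```

If you would rather have only `?druf=1` trigger the swap, compare against the string `'1'` in the template instead of relying on truthiness.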

How the full flow works after the workaround:

  1. User clicks "Add Dataset" → DRUF view (POST /resource-first/new) creates placeholder package
  2. druf_view.py redirects to GET /dataset/{id}/resource/new?druf=1
  3. CKAN renders new_resource_not_draft.html → loads resource_form_snippet
  4. Your extension's resource_form.html override is served; request.args.get('druf') is '1'
  5. "Save & Fill Metadata" button (value="go-dataset") rendered instead of "Add"
  6. User uploads file and clicks "Save & Fill Metadata"
  7. Form POSTs to /dataset/{id}/resource/new with save=go-dataset
  8. CKAN saves the resource, hits elif save_action == 'go-dataset': → redirects to GET /dataset/{id}/edit
  9. User lands on the metadata edit form

Note: ?druf=1 does not need to be in the form's action URL. The save field value in the POST body is what drives the redirect, not the URL. The query parameter is only needed during the GET (template render) to know which button to show.

Proper fix (awaiting CKAN merge)

The proper solution is for IFormRedirect to be merged into a stable CKAN release. Track the branch here: https://github.com/ckan/ckan/tree/7778-iformredirect. Once it lands in a stable release, the p.implements(p.IFormRedirect) call in plugin.py will succeed, resource_save_redirect will be called, and the workaround above will no longer be needed. Leaving it in place should be harmless: with ?druf=1 the template still renders the go-dataset button, and both that button and the resource_save_redirect hook point at the metadata edit form.


Summary

| # | Issue | Root Cause | Workaround |
|---|---|---|---|
| 1 | Prefect worker crashes with BooleanOptionalAction ImportError | Legacy argparse.py in site-packages shadows stdlib | Remove site-packages/argparse.py; add defensive find ... -delete to Docker build |
| 2 | DRUF sends user to read page instead of metadata form | IFormRedirect not in CKAN 2.11.3 (only on unreleased branch 7778-iformredirect) | Pass ?druf=1 in druf_view.py redirect; override resource_form.html to swap button to go-dataset |

Both workarounds are non-destructive and compatible with future CKAN versions that do implement IFormRedirect.
