Tom Kaitchuck edited this page Aug 3, 2015
The Python MapReduce library can be used for complete map-shuffle-reduce pipelines only. It does not have the ability to run a map-only job.
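For readers new to the model, the three stages of a map-shuffle-reduce pipeline can be sketched in plain Python. This is only an illustration of the idea, not the library's API: the function names (`word_count_map`, `shuffle`, `word_count_reduce`, `run_pipeline`) are hypothetical, and everything runs in a single process rather than across App Engine workers.

```python
from collections import defaultdict


def word_count_map(line):
    """Map stage: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle stage: group all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def word_count_reduce(key, values):
    """Reduce stage: combine the grouped values into one result per key."""
    return key, sum(values)


def run_pipeline(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = (pair for line in lines for pair in word_count_map(line))
    grouped = shuffle(mapped)
    return dict(word_count_reduce(key, values)
                for key, values in sorted(grouped.items()))
```

Calling `run_pipeline` with a list of text lines returns a dictionary that maps each word to its count. In a real job the shuffle stage is what lets the reduce function see every value for a key, even when those values were emitted by different mappers.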
Features and capabilities
The App Engine adaptation of Google's MapReduce model is optimized for the needs of the App Engine environment, where resource quota management is a key consideration. This release of the MapReduce API provides the following features and capabilities:
Automatic sharding for faster execution, allowing you to use as many workers as you need to get your results faster
Standard data input readers for iterating over blob and datastore data
Standard output writers
Status pages to let you see how your jobs are running
Processing rate limiting to slow down your mapper functions and space out the work, helping you avoid exceeding your resource quotas
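The sharding and rate-limiting features listed above can be pictured with a small standalone sketch. It assumes a simple round-robin split and a fixed minimum interval between mapper calls, which may differ from what the library actually does; `split_into_shards`, `RateLimiter`, and `run_shard` are hypothetical names, not part of the MapReduce API.

```python
import time


def split_into_shards(items, shard_count):
    """Distribute items round-robin across shard_count lists."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for index, item in enumerate(items):
        shards[index % shard_count].append(item)
    return shards


class RateLimiter(object):
    """Space out calls so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.time, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


def run_shard(shard, mapper, limiter):
    """Apply mapper to every item in one shard, honoring the rate limit."""
    results = []
    for item in shard:
        limiter.wait()
        results.append(mapper(item))
    return results
```

In this sketch each shard would be handed to a separate worker, so more shards means more parallelism, while the limiter trades speed for a steadier use of quota.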
To build the library and the demo application, run the provided build script as follows from within the python sub-directory of the source tree:
Check the box that says "Sign in as Administrator" and click Login.
Then go to http://localhost:8080 to interact with the demo application, which lets you run three different MapReduce jobs over a zip archive of text files.
The Python MapReduce code uses PyPI for dependency management. The build script runs the pip commands for you. If you prefer, you may run pip directly using the following command: