I've been doing a lot with IBM Cloud Functions recently. So far I've only been passing simple query string values or JSON-encoded data. But what if you want to upload an actual binary file? The usual way, in the web world, is to use the multipart/form-data content type.
But how does Apache OpenWhisk (which IBM Cloud Functions is based on) handle this?
If you set the function to raw type, rather than getting the JSON-decoded values passed to your function as a dictionary, you get a dictionary that gives you the raw data. The data itself is base64 encoded in __ow_body and the headers are in __ow_headers.
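To make that concrete, here is a minimal sketch of what such a dictionary might look like and how the body can be turned back into bytes. The boundary value, the body contents and the raw_body helper name are made up for illustration; only the __ow_body and __ow_headers keys come from the platform.

```python
from base64 import b64decode, b64encode

# Hypothetical example of the dictionary a raw function receives.
# The boundary and body below are invented for illustration.
example_body = b'--XYZ\r\nContent-Disposition: form-data; name="id"\r\n\r\nfoo\r\n--XYZ--\r\n'
example_args = {
    "__ow_headers": {"content-type": "multipart/form-data; boundary=XYZ"},
    "__ow_body": b64encode(example_body).decode("ascii"),
}


def raw_body(args):
    """Return the request body as bytes by undoing the base64 encoding."""
    return b64decode(args["__ow_body"])
```

Calling raw_body(example_args) gives back the original multipart bytes, ready to be handed to a multipart parser.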
There are several ways to process this in Python. The requests_toolbelt package has a MultipartDecoder class. But I'm going to use the parser from the cgi library that is included in Python itself:
from cgi import parse_multipart, parse_header
from io import BytesIO
from base64 import b64decode

def main(args):
    # Split the content type from its parameters (including the boundary)
    c_type, p_dict = parse_header(args['__ow_headers']['content-type'])
    decoded_string = b64decode(args['__ow_body'])
    p_dict['boundary'] = bytes(p_dict['boundary'], "utf-8")
    p_dict['CONTENT-LENGTH'] = len(decoded_string)
    form_data = parse_multipart(BytesIO(decoded_string), p_dict)
    # Do something with the data. In this simple example
    # we will just return a dict with the part name and
    # length of the content of that part
    ret = {}
    for key, value in form_data.items():
        ret[key] = len(value[0])
    return ret
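The parse_multipart function expects to find an entry called CONTENT-LENGTH in the parameter dictionary it is given, so we need to add that to p_dict before passing it to the parser. It also expects the boundary as bytes rather than a string, hence the conversion.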
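One caveat: the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the code above only works on runtimes that still ship it. As a possible alternative, here is a sketch that uses the standard library's email parser instead. This is a swapped-in technique rather than what I deployed, and the function name parse_form_with_email is made up for illustration.

```python
from base64 import b64decode
from email.parser import BytesParser


def parse_form_with_email(args):
    """Parse a raw multipart/form-data request into {part name: bytes}.

    Sketch only: assumes the body is base64 encoded in __ow_body and the
    content type (with its boundary) is in __ow_headers, as described above.
    """
    content_type = args['__ow_headers']['content-type']
    body = b64decode(args['__ow_body'])
    # The email parser needs a Content-Type header in front of the body
    # so that it knows which boundary separates the parts.
    raw = b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n' + body
    message = BytesParser().parsebytes(raw)
    if not message.is_multipart():
        return {}
    parts = {}
    for part in message.get_payload():
        name = part.get_param('name', header='content-disposition')
        # decode=True returns the part content as bytes
        parts[name] = part.get_payload(decode=True)
    return parts
```

Unlike parse_multipart, this returns a single bytes value per part name, so repeated field names would overwrite each other. That is fine for a simple upload like this one, but worth knowing before relying on it.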
To create this cloud function, we need to set the --web raw flag to tell OpenWhisk not to try to parse the payload as JSON for us:
% ibmcloud fn action create upload upload.py --web raw
We can then get the URL for it:
% ibmcloud fn action get upload --url
ok: got action upload
https://eu-gb.functions.appdomain.cloud/api/v1/web/1d0ffa5a-835d-4c40-ac80-77ca4a35f028/upload
And we can then call it using cURL and upload some data to it. In this case we are passing both a simple string value (foo) and the contents of a binary WAV audio file:
% curl -F id=foo -F audio=@test.wav https://eu-gb.functions.appdomain.cloud/api/v1/web/1d0ffa5a-835d-4c40-ac80-77ca4a35f028/upload.json
{
  "audio": 3840102,
  "id": 3
}
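If you want to exercise the function without deploying it, you can build the same kind of request by hand. The following is a rough sketch of a helper that assembles a multipart/form-data body and wraps it in the raw-mode dictionary shape described earlier. The build_raw_args name, the default boundary and the application/octet-stream content type for file parts are all assumptions for illustration, not anything cURL or OpenWhisk guarantees.

```python
from base64 import b64encode


def build_raw_args(fields, boundary="localtestboundary"):
    """Build a raw-mode args dict carrying a multipart/form-data body.

    fields maps a part name to either bytes (a plain value) or a
    (filename, bytes) tuple (a file upload). Hypothetical test helper.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        disposition = f'Content-Disposition: form-data; name="{name}"'
        part_headers = []
        if isinstance(value, tuple):
            filename, value = value
            disposition += f'; filename="{filename}"'
            part_headers.append(b"Content-Type: application/octet-stream")
        lines.append(delimiter)
        lines.append(disposition.encode("utf-8"))
        lines.extend(part_headers)
        lines.append(b"")  # blank line separates part headers from content
        lines.append(value)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    return {
        "__ow_headers": {
            "content-type": f"multipart/form-data; boundary={boundary}"
        },
        "__ow_body": b64encode(body).decode("ascii"),
    }
```

Something like build_raw_args({"id": b"foo", "audio": ("test.wav", wav_bytes)}) can then be passed straight to main() in a local Python session, where wav_bytes is whatever test data you have to hand.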
It seems that IBM Cloud Functions has a limit of 5MB for the payload. I guess for anything larger than this you'd be better off uploading it to IBM Cloud Object Storage (COS) and then passing the URL of that uploaded item to the cloud function.
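A client could make that decision before sending anything. Here is a small sketch of such a check. The exact limit (taken here as 5 * 1024 * 1024 bytes), the safety margin for multipart headers and the function name are all assumptions, since I have only observed the limit rather than confirmed it.

```python
# Assumed values: the limit is an observation, not a documented figure,
# and the margin is a rough allowance for multipart headers and boundaries.
PAYLOAD_LIMIT = 5 * 1024 * 1024
MULTIPART_MARGIN = 4096


def fits_in_request(size_bytes, limit=PAYLOAD_LIMIT, margin=MULTIPART_MARGIN):
    """Return True if a file of size_bytes should fit in a direct upload."""
    return size_bytes + margin <= limit
```

For the WAV file above (3840102 bytes) this returns True. For anything where it returns False, the file would go to COS first and only its URL would be sent to the function.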