Tuesday, August 28, 2018

Time to talk about Google App Engine (flex and standard), PHP, and Google Cloud Storage.

I've been doing some experimenting using GAE to make a service like Cloudinary.  It's pretty simple to do using Imagick, which comes pre-installed in GAE.  But here are some discoveries:

1) GAE flex sounds like GAE standard, but it's not.  With php55 the two are quite different.  With php72 now launched in standard, it's closer to flex: you need a front controller, and your app.yaml is simpler.
2) GAE standard scales really quickly, to something like 10k instances in 1 to 2 minutes.  Flex takes a LOT longer, think 30 minutes.
3) GAE can use Imagick, but Imagick is slow and CPU intensive: fetching an image from a GCS bucket and rendering it cropped or resized is a 1-2 second operation.
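For reference on point 1, here's roughly what the php72 standard app.yaml looks like; this is a sketch, and the handler details will vary by app:

```yaml
# php72 standard: a minimal app.yaml is enough. All requests are
# routed to the front controller (index.php at the app root by
# default), rather than mapping each URL to a script as in php55.
runtime: php72
```

Compare that with php55 standard, where every route had to be mapped to a script under `handlers:` in app.yaml.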
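For point 3, a minimal sketch of the Imagick path.  The resize math is real and runnable; the Imagick/GCS usage in the comments is an assumption (placeholder bucket and object names), shown the way it works on GAE where the gs:// stream wrapper makes a bucket readable like a filesystem:

```php
<?php
// fit_within(): the resize math — scale (w, h) to fit inside a
// (boxW, boxH) bounding box while preserving aspect ratio.
function fit_within(int $w, int $h, int $boxW, int $boxH): array {
    $scale = min($boxW / $w, $boxH / $h);
    return [max(1, (int) round($w * $scale)),
            max(1, (int) round($h * $scale))];
}

// Hypothetical GAE usage (bucket/object names are placeholders).
// Imagick doesn't use PHP stream wrappers itself, so read the bytes
// via file_get_contents() and hand them to readImageBlob():
//
//   $img = new Imagick();
//   $img->readImageBlob(file_get_contents('gs://my-bucket/photos/original.jpg'));
//   [$tw, $th] = fit_within($img->getImageWidth(),
//                           $img->getImageHeight(), 800, 800);
//   $img->thumbnailImage($tw, $th);
//   header('Content-Type: image/jpeg');
//   echo $img->getImageBlob();
```

It's that whole fetch-decode-resize-encode round trip, per request, that eats the 1-2 seconds of CPU.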

You could use App Engine's Images API, but for PHP it's limited to the getImageServingUrl functions, unless you want to write protobuf interfaces yourself.
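On php55 standard that looks something like this.  It's a sketch only, since it runs solely inside the GAE runtime (which bundles the SDK), and the bucket path is a placeholder:

```php
<?php
// php55 GAE runtime only — the SDK class isn't available elsewhere.
use google\appengine\api\cloud_storage\CloudStorageTools;

// Returns a serving URL; size and crop are then controlled with URL
// suffixes like =s800 or =s800-c, so no per-request CPU is spent in
// your instance doing the resize.
$url = CloudStorageTools::getImageServingUrl(
    'gs://my-bucket/photos/original.jpg',
    ['secure_url' => true]
);
```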

In Python it's way more useful: you can do all the usual transforms, and it's not CPU intensive on your App Engine instance, meaning you don't scale out to 10k instances.  However, it's still not as fast as GCE, as below.

So instead, I used GCE, because:

1) you can use libvips instead of Imagick: 400ms for image resizing and serving, compared to 1-2 seconds with GAE and Imagick, or 800ms with GAE and the Images API
2) you can use preemptible instances 
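The libvips path on the GCE box is roughly this; a sketch that shells out to vipsthumbnail from PHP (vipsthumbnail must be installed, the paths are placeholders, and the exact flags can vary by vips version):

```php
<?php
// Sketch: resize via vipsthumbnail (libvips) from PHP. Assumes
// vipsthumbnail is on the PATH; input/output paths are placeholders.
$in  = '/tmp/original.jpg';
$out = '/tmp/resized.jpg';

// --size 800x800 fits within the box, preserving aspect ratio
$cmd = sprintf('vipsthumbnail %s --size 800x800 -o %s',
               escapeshellarg($in), escapeshellarg($out));
exec($cmd, $output, $status);

if ($status === 0) {
    header('Content-Type: image/jpeg');
    readfile($out);
}
```

libvips streams the image rather than decoding the whole thing into memory the way ImageMagick does, which is where most of the speedup comes from.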

The savings are huge: by my estimates, $400k+ per year between GAE and GCE.

Two other findings, about PHP and the Google stream wrappers.  They are cool; you can do things like:

if (file_exists("gs://bucket/foo/bar"))

which is cool, right?  Not really, once you dig in.  Under the hood, that single call makes two bucket requests, because the wrapper also tests whether the object is writable, even if your operation doesn't care about writes.  It's more efficient to use cURL, test the contents returned, and then reuse those contents later in your code.

This became more apparent for me when using a GLB with nodes around the world: users hitting nodes in Australia, backed by a multi-region bucket in the US, were a lot slower than the same users hitting a node in the US.  This is due to the multiple transcontinental bucket calls.  Using cURL and a single call eliminated the performance difference.
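A sketch of the single-call pattern, assuming a publicly readable object on the storage.googleapis.com endpoint.  The bucket and object names are placeholders, and a real app would add auth headers:

```php
<?php
// Build the public GCS URL for an object (path segments get
// percent-encoded, but slashes are kept as separators).
function gcs_url(string $bucket, string $object): string {
    return 'https://storage.googleapis.com/' . $bucket . '/'
         . str_replace('%2F', '/', rawurlencode($object));
}

// Fetch the object once with cURL. A 4xx (e.g. 404) means "doesn't
// exist" for our purposes; on success we keep the body for later use.
// One bucket round trip, instead of the wrapper's two.
function gcs_fetch(string $bucket, string $object): ?string {
    $ch = curl_init(gcs_url($bucket, $object));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FAILONERROR    => true,  // treat HTTP >= 400 as failure
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : $body;
}

// $contents = gcs_fetch('my-bucket', 'foo/bar');
// if ($contents !== null) { /* use $contents; no second call needed */ }
```

The existence check and the read collapse into one request, which is exactly what fixed the Australia-to-US latency gap.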
