App Engine flexible Python 3.6 to App Engine standard 3.7

I decided to move my cryptocurrency forecasting platform from the Python 3.6 App Engine flexible (Docker) environment to the Python 3.7 App Engine standard environment.

The benefits of this will be a more scalable infrastructure and cost savings on hosting... (hopefully)

The setup is bleeding edge, as the Google Cloud SDK still runs under Python 2. Here is how I run the dev server with Python 2 while Python 3 is the project's environment:

pyenv local 3.7.0  
# run with old python2.7
/Users/lee.penkman/code/someplace/env/bin/python /Users/lee.penkman/Downloads/google-cloud-sdk/bin/dev_appserver.py --host 127.0.0.1 .

I deployed a separate version that unexpectedly took down the main site. It couldn't find gunicorn workers the way it could locally with an entrypoint: section in the app.yaml file, and it was also complaining about not being able to connect to the database. I quickly rolled back.
The outage happened because I forgot the --no-promote flag, which deploys a new version without promoting it to handle any traffic, e.g. (gcloud needs Python 2.7):

pyenv shell 2.7.8

gcloud app deploy --project PROJECT_ID --version VERSION_ID --no-promote

--no-promote everybody!
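
To make the flag harder to forget, the deploy could be wrapped in a small script. This is only a sketch: the helper names build_deploy_command and deploy are hypothetical, and the code simply assembles the same gcloud arguments shown above with --no-promote always included.

```python
import subprocess


def build_deploy_command(project_id, version_id):
    """Return the gcloud argument list for a deploy that receives no traffic."""
    return [
        "gcloud", "app", "deploy",
        "--project", project_id,
        "--version", version_id,
        "--no-promote",  # never route traffic to a freshly deployed version
    ]


def deploy(project_id, version_id):
    # check=True raises CalledProcessError if gcloud exits with a non-zero status.
    subprocess.run(build_deploy_command(project_id, version_id), check=True)
```

Calling deploy("PROJECT_ID", "VERSION_ID") would then run the same command as above, and promoting the version stays a separate, deliberate step.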

The logs showed errors because the separate version didn't have any instances, so I had to add the following to app.yaml:

instance_class: F4  
automatic_scaling:  
  max_pending_latency: 7.5s
  min_idle_instances: 1

The next error was:

"The request failed because the instance could not start successfully"

I looked in Stackdriver, but it doesn't yet support python37:

"Stackdriver Debugger is not set up for the python37 runtime on GAE"

I tried the new beta deploy command with --verbosity=info:
gcloud beta app deploy . --project=bitbank-nz --version=v2 --no-promote --verbosity=info

but it didn't really say much more than what was being uploaded to Google Cloud Storage, which was the only part of the deployment that was working.

I noticed my version of Flask was older than the one used in the examples. Updating all the requirements and redeploying gave a better error message in the logs:

"Exceeded soft memory limit of 128 MB with 163 MB after servicing 0 requests total. Consider setting a larger instance class in app.yaml"

Bumping the memory gives:

"Exceeded soft memory limit of 256 MB with 325 MB after servicing 0 requests total. Consider setting a larger instance class in app.yaml."

It shouldn't really need that much, but it can load a lot of data at times... I bumped it up to 1 GB for now.
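
Since the limit was exceeded "after servicing 0 requests", the memory is going at startup rather than per request. One way to get a feel for how much the Python side allocates during startup is the standard library's tracemalloc module. This is a rough sketch: load_startup_data is a hypothetical stand-in for whatever the app loads when it starts, and tracemalloc only counts Python-level allocations, so the number will likely differ from what App Engine reports.

```python
import tracemalloc


def measure_peak_mb(func):
    """Run func() and return (result, peak Python-allocated memory in MB)."""
    tracemalloc.start()
    try:
        result = func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def load_startup_data():
    # Hypothetical stand-in for the data the app loads when it starts.
    return [float(i) for i in range(200_000)]


if __name__ == "__main__":
    _data, peak_mb = measure_peak_mb(load_startup_data)
    print("peak Python allocations: %.1f MB" % peak_mb)
```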

I deployed a version without gunicorn in front of it, which gave more insight into what was going wrong:

"Read only file system error"

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
    worker.init_process()
  File "/env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 126, in init_process
    self.load_wsgi()
  File "/env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
    return self.load_wsgiapp()
  File "/env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/env/lib/python3.7/site-packages/gunicorn/util.py", line 352, in import_app
    __import__(module)
  File "/srv/webapp/app.py", line 5, in <module>
    from models.featureset import AlgorithmConfig
  File "/srv/models/featureset.py", line 6, in <module>
    import config
  File "/srv/config.py", line 8, in <module>
    'collector.log', maxBytes=(1048576 * 5), backupCount=7
  File "/opt/python3.7/lib/python3.7/logging/handlers.py", line 147, in __init__
    BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
  File "/opt/python3.7/lib/python3.7/logging/handlers.py", line 54, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "/opt/python3.7/lib/python3.7/logging/__init__.py", line 1041, in __init__
    StreamHandler.__init__(self, self._open())
  File "/opt/python3.7/lib/python3.7/logging/__init__.py", line 1070, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
OSError: [Errno 30] Read-only file system: '/srv/collector.log'

Bitbank shares Python 3 code with the backend forecaster process, and the setup/logging code was shared too. The app didn't need the backend processes' file-based logging while in App Engine mode, so I skipped it there, which fixed this error.
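
A minimal sketch of what skipping the file-based logging can look like. The function names and the on_app_engine flag are hypothetical, not the actual Bitbank code; only the RotatingFileHandler arguments are taken from the traceback above. It assumes that writing to stderr is enough on App Engine, where the filesystem is read-only.

```python
import logging
import logging.handlers
import sys


def make_log_handler(on_app_engine, log_path="collector.log"):
    """Pick a handler that does not touch the filesystem on App Engine."""
    if on_app_engine:
        # The filesystem is read-only there, so log to stderr instead.
        return logging.StreamHandler(sys.stderr)
    # Backend processes keep the rotating log file (5 MB per file, 7 backups).
    return logging.handlers.RotatingFileHandler(
        log_path, maxBytes=(1048576 * 5), backupCount=7
    )


def configure_logging(on_app_engine, log_path="collector.log"):
    """Attach the chosen handler to the root logger and return it."""
    handler = make_log_handler(on_app_engine, log_path)
    logging.getLogger().addHandler(handler)
    return handler
```

The shared config module would then call configure_logging once, passing whatever check the app uses to decide it is running on App Engine.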

I ran into a strange deploy problem when uploading to Google Cloud Storage:

File upload done.  
Updating service [default]...failed.  
ERROR: (gcloud.beta.app.deploy) Error Response: [3] The following errors occurred while copying files to App Engine:  
File https://storage.googleapis.com/staging.bitbank-nz.appspot.com/651b41e9b8d40d3c9c9d74a2b1e2b14b562b1c9d failed with: Conflicting SHA1 sum for file. Expected "972d366c_d8a7fc7d_d09336ff_0e0b205f_2fb0e78d" but received "651b41e9_b8d40d3c_9c9d74a2_b1e2b14b_562b1c9d".

Details: [  
  [
    {
      "@type": "type.googleapis.com/google.rpc.ResourceInfo",
      "description": "Conflicting SHA1 sum for file. Expected \"972d366c_d8a7fc7d_d09336ff_0e0b205f_2fb0e78d\" but received \"651b41e9_b8d40d3c_9c9d74a2_b1e2b14b_562b1c9d\".",
      "resourceName": "https://storage.googleapis.com/staging.bitbank-nz.appspot.com/651b41e9b8d40d3c9c9d74a2b1e2b14b562b1c9d",
      "resourceType": "file"
    }
  ]
]

So I had to delete 651b41e9b8d40d3c9c9d74a2b1e2b14b562b1c9d in Google Cloud Storage. I'm not sure how I got into this situation; you'd think that if the hashes didn't match, it would treat the new file as new and go with that.
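
If you want to find out which local file a staged object corresponds to, you can hash your files yourself. This sketch assumes, as the error message suggests but I have not confirmed, that the object name in the staging bucket is the SHA1 hex digest of the file contents. The helper names are made up for illustration.

```python
import hashlib


def sha1_hex(path, chunk_size=1024 * 1024):
    """Return the SHA1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def grouped(hex_digest, width=8):
    """Format a digest in the underscore-separated style of the error message."""
    return "_".join(
        hex_digest[i:i + width] for i in range(0, len(hex_digest), width)
    )
```

Comparing grouped(sha1_hex(path)) for each deployed file against the "Expected" and "received" values in the error should show which file the conflict is about.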

[CRITICAL] WORKER TIMEOUT (pid:106)

After many failed deploys, I found that production environment variables like SERVER_SOFTWARE were not making it through.

For now I had to insert this hack in the main file, which isn't great:

import os

# hack to say we are in GAE standard
os.environ['SERVER_SOFTWARE'] = 'Google App Engine/'
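
A less invasive option might be to check more than one variable instead of overwriting SERVER_SOFTWARE. The sketch below assumes the Python 3.7 standard runtime sets GAE_ENV to "standard"; I believe that is documented, but treat it as an assumption and verify it in your own deployment. The function name is hypothetical.

```python
import os


def in_app_engine_standard(environ=None):
    """Best-effort check for running on App Engine standard."""
    environ = os.environ if environ is None else environ
    # Assumption: the python37 runtime exposes GAE_ENV=standard.
    if environ.get("GAE_ENV", "").startswith("standard"):
        return True
    # Fall back to the older SERVER_SOFTWARE convention.
    return environ.get("SERVER_SOFTWARE", "").startswith("Google App Engine/")
```

Passing a dict instead of relying on os.environ makes the check easy to exercise locally.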

As it wasn't performing well on memory, I noticed it had been spinning up as many as 6 pricey instances. My requests were backing up, which showed that my containers were not responding in time, since App Engine autoscales based on how long requests wait in the queue.

(Screenshot: instances)

I noticed there were too many gunicorn workers, so I added this flag:
--workers=4
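
The same setting can also live in a gunicorn config file, which is plain Python. A minimal sketch, assuming the platform passes the listening port in a PORT environment variable (8080 is just a fallback I picked for local runs):

```python
# gunicorn.conf.py
import os

# Fixed number of worker processes, equivalent to --workers=4.
workers = 4

# Listen on the port given in the PORT environment variable,
# falling back to 8080 when it is not set (assumption for local runs).
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")
```

It would be picked up with gunicorn -c gunicorn.conf.py MODULE:CALLABLE, where MODULE:CALLABLE stands for your own WSGI app.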

(Screenshot: Python memory usage drop)

The memory usage immediately dropped to around 300 MB :)

On a side note, you do have to be careful with gunicorn/web server workers in Python, as a lot of operations don't cooperate with green threading and cooperative multitasking. For example, a single worker can get tied up waiting for the file system or the database to respond and not be able to handle other requests. Ideally the app would use something like green threads (eventlet/gevent) and database connection pooling to cut down on connections to the Postgres database.
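
To illustrate the pooling idea, here is a small bounded pool built only on the standard library. It is a sketch, not what Bitbank runs: SimplePool and the connect factory are hypothetical, and in practice a driver's own pool (psycopg2 ships one in psycopg2.pool, for example) would normally be the better choice.

```python
import queue
import threading
from contextlib import contextmanager


class SimplePool:
    """Hand out at most max_size connections and reuse them between requests."""

    def __init__(self, connect, max_size):
        self._connect = connect  # hypothetical factory returning a new connection
        self._max_size = max_size
        self._idle = queue.LifoQueue(maxsize=max_size)
        self._created = 0
        self._lock = threading.Lock()

    def _acquire(self, timeout):
        try:
            return self._idle.get_nowait()  # reuse an idle connection if any
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:
                conn = self._connect()
                self._created += 1
                return conn
        # Pool is exhausted: wait for a connection to be returned.
        # Raises queue.Empty if nothing comes back within the timeout.
        return self._idle.get(timeout=timeout)

    @contextmanager
    def connection(self, timeout=None):
        conn = self._acquire(timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back
```

With each worker holding one pool of a few connections, the total number of open Postgres connections is bounded by workers times max_size, however many requests are queued.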

Overall the migration was not without pain, but the app is now running on 1/8 of the hardware on the App Engine standard environment, which also has a free tier. We will see how it goes on memory. Hopefully it's more scalable this way, but unfortunately App Engine doesn't autoscale on memory, only on the number of instances, so you can still drop off a performance cliff: if you have a memory leak, your process will be restarted a lot.

From Google App Engine standard, it would be great to see some way of more closely replicating the production environment, like we have with the Docker-based flexible environment. It should sandbox the app to the memory that's defined in your app.yaml, not support writing to the filesystem, and support autoscaling or monitoring with tools similar to what's available in App Engine prod. It will also be great when they autoscale on memory.

Thanks,
Lee Penkman.
Founder BitBank.nz.
Keep up to date with us on Facebook, Twitter @bitbanknz
Follow me on twitter at @LeeLeePenkman