Understand Flask (1)

Aug 10, 2019

Previous answers already give a nice overview of what goes on in the background of Flask during a request. If you haven't read it yet, I recommend @MarkHildreth's answer before reading this one. In short, each HTTP request is handled in its own context (a thread), which is why a thread-local facility is needed: it lets objects such as request and g be accessible globally across threads while keeping their request-specific state. Furthermore, while processing an HTTP request Flask can emulate additional requests from within, hence the need to store their respective contexts on a stack. Flask also allows multiple WSGI applications to run alongside each other within a single process, and more than one can be called to action during a request (each request creates a new application context), hence the need for a context stack for applications. That's a summary of what was covered in previous answers.

My goal now is to complement our current understanding by explaining how Flask and Werkzeug do what they do with these context locals. I simplified the code to enhance the understanding of its logic, but if you get this, you should be able to easily grasp most of what's in the actual source (werkzeug.local and flask.globals).

Let's first understand how Werkzeug implements thread Locals.

Local

When an HTTP request comes in, it is processed within the context of a single thread. As an alternative means of spawning a new context during an HTTP request, Werkzeug also allows the use of greenlets (a sort of lighter "micro-thread") instead of normal threads. If you don't have greenlets installed, it falls back to using threads. Each of these threads (or greenlets) is identifiable by a unique id, which you can retrieve with the module's get_ident() function. That function is the starting point of the magic behind request, current_app, url_for, g, and other such context-bound global objects.

try:
    from greenlet import get_ident
except ImportError:
    # Python 3 location; on Python 2 this function lived in the `thread` module
    from threading import get_ident
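
To make the role of get_ident() concrete, here is a minimal sketch using only the standard library (so the thread fallback, not greenlets). The names `record_ident`, `idents` and `barrier` are made up for this illustration. A barrier keeps both worker threads alive at the same time, because the interpreter is allowed to reuse the id of a thread that has already exited.

```python
import threading

barrier = threading.Barrier(2)
idents = {}


def record_ident(label):
    # Each thread stores the id that get_ident() reports for itself.
    idents[label] = threading.get_ident()
    # Wait until the other worker has recorded its id too, so that both
    # threads are alive simultaneously and their ids cannot be recycled.
    barrier.wait()


threads = [
    threading.Thread(target=record_ident, args=(label,))
    for label in ("worker-1", "worker-2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_ident = threading.get_ident()
print(idents, main_ident)
```

The two workers and the main thread each report a different id, and that id is all a thread Local needs as a storage key.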

Now that we have our identity function, we can know which thread we're on at any given time, and we can create what's called a thread Local: a contextual object that can be accessed globally, but whose attributes resolve to their value for the current thread. For example:

# globally
local = Local()

# ...

# on thread 1
local.first_name = 'John'

# ...

# on thread 2
local.first_name = 'Debbie'

Both values are present on the globally accessible Local object at the same time, but accessing local.first_name within the context of thread 1 will give you 'John', whereas it will return 'Debbie' on thread 2.

How is that possible? Let's look at some (simplified) code:

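class Local(object):
    def __init__(self):
        # bypass our own __setattr__ (defined below), which would otherwise
        # recurse while looking up the not-yet-existing `storage` attribute
        object.__setattr__(self, 'storage', {})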

    def __getattr__(self, name):
        context_id = get_ident() # we get the current thread's or greenlet's id
        contextual_storage = self.storage.setdefault(context_id, {})
        try:
            return contextual_storage[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        context_id = get_ident()
        contextual_storage = self.storage.setdefault(context_id, {})
        contextual_storage[name] = value

    def __release_local__(self):
        context_id = get_ident()
        self.storage.pop(context_id, None)

local = Local()

From the code above we can see that the magic boils down to get_ident() which identifies the current greenlet or thread. The Local storage then just uses that as a key to store any data contextual to the current thread.
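
As a sanity check, here is a self-contained sketch of the same idea that you can run with plain threads. It is a trimmed variant of the simplified class above, not Werkzeug's actual implementation, and the `worker`, `seen` and `barrier` names are invented for the demo.

```python
import threading
from threading import get_ident


class Local(object):
    def __init__(self):
        # Set `storage` without going through our own __setattr__.
        object.__setattr__(self, "storage", {})

    def __getattr__(self, name):
        try:
            return self.storage[get_ident()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self.storage.setdefault(get_ident(), {})[name] = value


local = Local()
seen = {}
barrier = threading.Barrier(2)


def worker(first_name):
    local.first_name = first_name
    # Make sure both threads have written before either one reads back.
    barrier.wait()
    seen[first_name] = local.first_name


threads = [threading.Thread(target=worker, args=(n,)) for n in ("John", "Debbie")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned first_name, so it sees nothing.
main_has_value = hasattr(local, "first_name")
print(seen, main_has_value)
```

Each thread reads back the name it wrote itself, even though both wrote to the same global `local`, and the main thread finds no `first_name` at all.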

You can have multiple Local objects per process, and request, g, current_app and others could simply have been created like that. But that's not how it's done in Flask: these are not technically Local objects, but more accurately LocalProxy objects. What's a LocalProxy?

LocalProxy

A LocalProxy is an object that queries a Local to find another object of interest (i.e. the object it proxies to). Let's take a look to understand:

class LocalProxy(object):
    def __init__(self, local, name=None):
        # `local` here is either an actual `Local` object, that can be used
        # to find the object of interest, here identified by `name`, or it's
        # a callable that can resolve to that proxied object.
        # We go through `object.__setattr__` because this class overrides
        # `__setattr__` further down to forward to the proxied object.
        object.__setattr__(self, 'local', local)
        # `name` is an identifier that will be passed to the local to find the
        # object of interest. It is not needed when `local` is a callable.
        object.__setattr__(self, 'name', name)

    def _get_current_object(self):
        # if `self.local` is truly a `Local` it means that it implements
        # the `__release_local__()` method which, as its name implies, is
        # normally used to release the local. We simply look for it here
        # to identify which is actually a Local and which is rather just
        # a callable:
        if hasattr(self.local, '__release_local__'):
            try:
                return getattr(self.local, self.name)
            except AttributeError:
                raise RuntimeError('no object bound to %s' % self.name)

        # if self.local is not actually a Local it must be a callable that
        # would resolve to the object of interest. It takes no arguments.
        return self.local()

    # Now for the LocalProxy to perform its intended duties i.e. proxying 
    # to an underlying object located somewhere in a Local, we turn all magic
    # methods into proxies for the same methods in the object of interest.
    @property
    def __dict__(self):
        try:
            return self._get_current_object().__dict__
        except RuntimeError:
            raise AttributeError('__dict__')

    def __repr__(self):
        try:
            return repr(self._get_current_object())
        except RuntimeError:
            return '<%s unbound>' % self.__class__.__name__

    def __bool__(self):
        try:
            return bool(self._get_current_object())
        except RuntimeError:
            return False

    # ... etc etc ... 

    def __getattr__(self, name):
        if name == '__members__':
            return dir(self._get_current_object())
        return getattr(self._get_current_object(), name)

    def __setitem__(self, key, value):
        self._get_current_object()[key] = value

    def __delitem__(self, key):
        del self._get_current_object()[key]

    # ... and so on ...

    __setattr__ = lambda x, n, v: setattr(x._get_current_object(), n, v)
    __delattr__ = lambda x, n: delattr(x._get_current_object(), n)
    __str__ = lambda x: str(x._get_current_object())
    __lt__ = lambda x, o: x._get_current_object() < o
    __le__ = lambda x, o: x._get_current_object() <= o
    __eq__ = lambda x, o: x._get_current_object() == o

    # ... and so forth ...

Now to create globally accessible proxies you would do

# this would happen some time near application start-up
local = Local()
request = LocalProxy(local, 'request')
g = LocalProxy(local, 'g')

and then, early over the course of a request, you would store some objects inside the local that the previously created proxies can access, no matter which thread we're on:

# this would happen early during processing of an http request
local.request = RequestContext(http_environment)
local.g = SomeGeneralPurposeContainer()

The advantage of using LocalProxy objects as the globally accessible objects, rather than making them Locals themselves, is that it simplifies their management. You need only a single Local object to create many globally accessible proxies. At the end of the request, during cleanup, you simply release the one Local (i.e. you pop the context_id from its storage) and don't bother with the proxies: they're still globally accessible and still defer to the one Local to find their object of interest for subsequent HTTP requests.

# this would happen some time near the end of request processing
release_local(local) # aka local.__release_local__()
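
To see the whole life cycle in one place (create the proxy, bind an object, release the local), here is a stripped-down sketch. It is not Werkzeug's code: this LocalProxy only forwards attribute reads, and `SimpleNamespace(path="/hello")` is just a stand-in for a real request object.

```python
from threading import get_ident
from types import SimpleNamespace


class Local(object):
    def __init__(self):
        object.__setattr__(self, "storage", {})

    def __getattr__(self, name):
        try:
            return self.storage[get_ident()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self.storage.setdefault(get_ident(), {})[name] = value

    def __release_local__(self):
        self.storage.pop(get_ident(), None)


class LocalProxy(object):
    # Minimal variant: forwards attribute reads only.
    def __init__(self, local, name):
        self._local = local
        self._name = name

    def _get_current_object(self):
        try:
            return getattr(self._local, self._name)
        except AttributeError:
            raise RuntimeError("no object bound to %s" % self._name)

    def __getattr__(self, name):
        return getattr(self._get_current_object(), name)


def is_unbound(proxy):
    # Helper for the demo: True when the proxy has nothing to resolve to.
    try:
        proxy.path
    except RuntimeError:
        return True
    return False


local = Local()
request = LocalProxy(local, "request")  # created once, before any request

unbound_before = is_unbound(request)
local.request = SimpleNamespace(path="/hello")  # early in the request
path_during = request.path
local.__release_local__()  # end of the request
unbound_after = is_unbound(request)
print(unbound_before, path_during, unbound_after)
```

The proxy object itself never changes. Only what the Local holds for the current thread changes, which is why the proxy can be created once at start-up and reused for every request.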

To simplify the creation of a LocalProxy when we already have a Local, Werkzeug implements the Local.__call__() magic method as follows:

class Local(object):
    # ... 
    # ... all the same stuff as before goes here ...
    # ... 

    def __call__(self, name):
        return LocalProxy(self, name)

# now you can do
local = Local()
request = local('request')
g = local('g')

However, if you look in the Flask source (flask.globals) that's still not how request, g, current_app and session are created. As we've established, Flask can spawn multiple "fake" requests (from a single true http request) and in the process also push multiple application contexts. This isn't a common use-case, but it's a capability of the framework. Since these "concurrent" requests and apps are still limited to run with only one having the "focus" at any time, it makes sense to use a stack for their respective context. Whenever a new request is spawned or one of the applications is called, they push their context at the top of their respective stack. Flask uses LocalStack objects for this purpose. When they conclude their business they pop the context out of the stack.

LocalStack

This is what a LocalStack looks like (again the code is simplified to facilitate understanding of its logic).

class LocalStack(object):

    def __init__(self):
        self.local = Local()

    def push(self, obj):
        """Pushes a new item to the stack"""
        rv = getattr(self.local, 'stack', None)
        if rv is None:
            self.local.stack = rv = []
        rv.append(obj)
        return rv

    def pop(self):
        """Removes the topmost item from the stack, will return the
        old value or `None` if the stack was already empty.
        """
        stack = getattr(self.local, 'stack', None)
        if stack is None:
            return None
        elif len(stack) == 1:
            release_local(self.local) # this simply releases the local
            return stack[-1]
        else:
            return stack.pop()

    @property
    def top(self):
        """The topmost item on the stack.  If the stack is empty,
        `None` is returned.
        """
        try:
            return self.local.stack[-1]
        except (AttributeError, IndexError):
            return None

Note from the above that a LocalStack is a stack stored in a local, not a bunch of locals stored on a stack. This implies that although the stack is globally accessible it's a different stack in each thread.
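
Here is a runnable sketch of that behaviour, reusing the simplified classes from this post (so not Werkzeug's exact code). The strings pushed on the stack are placeholders for real context objects.

```python
import threading
from threading import get_ident


class Local(object):
    def __init__(self):
        object.__setattr__(self, "storage", {})

    def __getattr__(self, name):
        try:
            return self.storage[get_ident()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self.storage.setdefault(get_ident(), {})[name] = value

    def __release_local__(self):
        self.storage.pop(get_ident(), None)


class LocalStack(object):
    def __init__(self):
        self.local = Local()

    def push(self, obj):
        stack = getattr(self.local, "stack", None)
        if stack is None:
            self.local.stack = stack = []
        stack.append(obj)
        return stack

    def pop(self):
        stack = getattr(self.local, "stack", None)
        if stack is None:
            return None
        elif len(stack) == 1:
            # Last item: drop this thread's storage altogether.
            self.local.__release_local__()
            return stack[-1]
        else:
            return stack.pop()

    @property
    def top(self):
        try:
            return self.local.stack[-1]
        except (AttributeError, IndexError):
            return None


contexts = LocalStack()
contexts.push("outer request")
contexts.push("inner request")
top_while_nested = contexts.top

# Another thread looks at the same global object but sees its own, empty stack.
other_thread_top = []
t = threading.Thread(target=lambda: other_thread_top.append(contexts.top))
t.start()
t.join()

first_popped = contexts.pop()
top_after_inner = contexts.top
second_popped = contexts.pop()
top_when_empty = contexts.top
```

The nested push makes the inner context the current one, popping it restores the outer one, and the second thread never sees either of them.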

Flask doesn't have its request, current_app, g, and session objects resolve directly to a LocalStack. Instead it uses LocalProxy objects that wrap a lookup function (instead of a Local object), which finds the underlying object on the LocalStack:

_request_ctx_stack = LocalStack()
def _find_request():
    top = _request_ctx_stack.top
    if top is None:
        raise RuntimeError('working outside of request context')
    return top.request
request = LocalProxy(_find_request)

def _find_session():
    top = _request_ctx_stack.top
    if top is None:
        raise RuntimeError('working outside of request context')
    return top.session
session = LocalProxy(_find_session)

_app_ctx_stack = LocalStack()
def _find_g():
    top = _app_ctx_stack.top
    if top is None:
        raise RuntimeError('working outside of application context')
    return top.g
g = LocalProxy(_find_g)

def _find_app():
    top = _app_ctx_stack.top
    if top is None:
        raise RuntimeError('working outside of application context')
    return top.app
current_app = LocalProxy(_find_app)

All these are declared at application start-up, but do not actually resolve to anything until a request context or application context is pushed to their respective stack.

If you're curious to see how a context is actually inserted in the stack (and subsequently popped out), look in flask.app.Flask.wsgi_app(), which is the point of entry of the WSGI app (i.e. what the web server calls and passes the HTTP environment to when a request comes in), and follow the creation of the RequestContext object all the way through to its subsequent push() into _request_ctx_stack. Once pushed to the top of the stack, it's accessible via _request_ctx_stack.top. Here's some abbreviated code to demonstrate the flow:

So you start an app and make it available to the WSGI server...

app = Flask(*config, **kwconfig)

# ...

Later an http request comes in and the WSGI server calls the app with the usual params...

app(environ, start_response) # aka app.__call__(environ, start_response)

This is roughly what happens in the app...

class Flask(object):

    # ...

    def __call__(self, environ, start_response):
        return self.wsgi_app(environ, start_response)

    def wsgi_app(self, environ, start_response):
        ctx = RequestContext(self, environ)
        ctx.push()
        try:
            # process the request here, raise an error if any,
            # and return the Response
            response = self.full_dispatch_request()
            return response(environ, start_response)
        finally:
            ctx.pop()

    # ...

and this is roughly what happens with RequestContext...

class RequestContext(object):

    def __init__(self, app, environ, request=None):
        self.app = app
        if request is None:
            request = app.request_class(environ)
        self.request = request
        self.url_adapter = app.create_url_adapter(self.request)
        self.session = self.app.open_session(self.request)
        if self.session is None:
            self.session = self.app.make_null_session()
        self.flashes = None

    def push(self):
        _request_ctx_stack.push(self)

    def pop(self):
        _request_ctx_stack.pop()
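
The push / try / finally / pop shape is what guarantees the stack is cleaned up even when a view blows up. Below is a toy sketch of just that pattern. A plain list stands in for _request_ctx_stack, and the `wsgi_app` function here takes a path and a view callable, which is an invented signature for the demo and not Flask's.

```python
stack = []  # stand-in for _request_ctx_stack (no thread-locality here)


class RequestContext(object):
    def __init__(self, path):
        self.path = path

    def push(self):
        stack.append(self)

    def pop(self):
        stack.pop()


def wsgi_app(path, view):
    # Hypothetical, simplified entry point: push, run the view, always pop.
    ctx = RequestContext(path)
    ctx.push()
    try:
        return view()
    finally:
        ctx.pop()


def ok_view():
    # While the view runs, the current context is on top of the stack.
    return "seen " + stack[-1].path


def failing_view():
    raise ValueError("boom")


ok_result = wsgi_app("/ok", ok_view)

try:
    wsgi_app("/fail", failing_view)
    failure_propagated = False
except ValueError:
    failure_propagated = True

depth_after = len(stack)
print(ok_result, failure_propagated, depth_after)
```

The exception still reaches the caller, but the context has already been popped by the time it does, so nothing leaks into the next request handled by the same thread.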

Say a request has finished initializing. The lookup for request.path from one of your view functions would then go as follows:

1. Start from the globally accessible LocalProxy object request.
2. To find its underlying object of interest (the object it's proxying to), it calls its lookup function _find_request() (the function it registered as its self.local).
3. That function queries the LocalStack object _request_ctx_stack for the top context on the stack.
4. To find the top context, the LocalStack object first queries its inner Local attribute (self.local) for the stack property that was previously stored there.
5. From the stack it gets the top context,
6. and top.request is thus resolved as the underlying object of interest.
7. From that object we get the path attribute.
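
The lookup steps above can be traced in a small end-to-end sketch. Everything here is a simplified stand-in: the classes are cut down to what the lookup needs (this pop() does not release the local, and the proxy forwards attribute reads only), and the SimpleNamespace objects play the role of request contexts.

```python
from threading import get_ident
from types import SimpleNamespace


class Local(object):
    def __init__(self):
        object.__setattr__(self, "storage", {})

    def __getattr__(self, name):
        try:
            return self.storage[get_ident()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self.storage.setdefault(get_ident(), {})[name] = value


class LocalStack(object):
    def __init__(self):
        self.local = Local()

    def push(self, obj):
        stack = getattr(self.local, "stack", None)
        if stack is None:
            self.local.stack = stack = []
        stack.append(obj)

    def pop(self):
        stack = getattr(self.local, "stack", None)
        return stack.pop() if stack else None

    @property
    def top(self):
        stack = getattr(self.local, "stack", None)
        return stack[-1] if stack else None


class LocalProxy(object):
    def __init__(self, lookup):
        self._lookup = lookup

    def __getattr__(self, name):
        return getattr(self._lookup(), name)


_request_ctx_stack = LocalStack()


def _find_request():
    top = _request_ctx_stack.top
    if top is None:
        raise RuntimeError("working outside of request context")
    return top.request


request = LocalProxy(_find_request)

outer = SimpleNamespace(request=SimpleNamespace(path="/outer"))
inner = SimpleNamespace(request=SimpleNamespace(path="/inner"))

_request_ctx_stack.push(outer)
path_outer = request.path
_request_ctx_stack.push(inner)  # an emulated request from within
path_inner = request.path

# The same lookup spelled out by hand, following steps 4 to 7:
stack = _request_ctx_stack.local.stack
manual_path = stack[-1].request.path

_request_ctx_stack.pop()
path_back = request.path
_request_ctx_stack.pop()

try:
    request.path
    outside_error = None
except RuntimeError as exc:
    outside_error = str(exc)
```

The same global `request` name yields "/outer", then "/inner", then "/outer" again as contexts are pushed and popped, and raises once the stack is empty.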

So we've seen how Local, LocalProxy, and LocalStack work, now think for a moment of the implications and nuances in retrieving the path from:

- a request object that would be a simple globally accessible object.
- a request object that would be a local.
- a request object stored as an attribute of a local.
- a request object that is a proxy to an object stored in a local.
- a request object stored on a stack, that is in turn stored in a local.
- a request object that is a proxy to an object on a stack stored in a local. <- this is what Flask does.