Building a unified data layer

by Ben on 18th April 2018

A look at how we helped a client bridge the gap between multiple data sources and provide a simple interface through which developers can consume a single, unified set of user data.

Objective

An organisation that we were working with wanted to provide better personalised experiences for their users; however, the only data sources they had available were non-standard, undocumented and incomplete.

The long-term solution would be to overhaul the data services within the organisation; however, we took on the challenge of helping them deliver a more tactical solution that would provide:

  • A usable solution within the required timescales
  • A proof of concept to feed into a larger data overhaul project

As is the case in many larger organisations, a number of systems hold different pieces of data about customers or users. These systems have often grown organically and tend to primarily serve the needs of the department in which they live.

As developers and designers, when we approach a solution from a user-centric perspective, the organisational structure, backend systems and processes often do not align with the way that the user wants to engage with the website or application. So we find ourselves having to bridge this gap in the code. This task can be confusing and time-consuming, and often introduces additional business logic into the mix.

The aim was to create a unified data layer that would sit across the existing data sources, encapsulate any additional business logic and provide a clean and easy-to-use interface for web developers.

Our approach

At first we considered building the solution as an HTTP API that could be consumed by different technology stacks; however, since the immediate work was focussed around a specific set of developers and a specific technology stack, we decided to create a PHP package as a first step. After all, the longer-term goal was to help better inform a web service project across the organisation.

The available data sources included server variables, session cookies and proprietary JSON web services. The authentication methods varied too, from using a Guzzle CookieJar to pass on session cookies, to a custom Apache auth module, to IP filtering.

The organisation already had a way for certain users to ‘log in’ or ‘alias’ as another user, in order to have that data come through instead of their own, and the web service endpoints needed to be different depending on the environment in which the application was running. We also needed to build in ways to emulate these things when working and testing applications locally.

The first thing we did was decide on what attributes we wanted the User model to have. Once this was decided, we then needed to work out, for each attribute, how we would get the value(s) for it. In theory this isn’t too difficult; however, once we started to introduce all the different variations and layers of data and logic, we realised we’d need to come up with some clean abstractions to keep things clear and maintainable.
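
To make that concrete, here’s a rough sketch of the shape of User model we’re describing; the attribute names are illustrative rather than the actual fields we used.

```php
<?php

// Sketch only – the attribute names here are illustrative, not the
// organisation's actual fields.
class User
{
    /** @var string */
    private $id;

    /** @var string */
    private $displayName;

    /** @var string|null */
    private $email;

    public function __construct(string $id, string $displayName, ?string $email)
    {
        $this->id = $id;
        $this->displayName = $displayName;
        $this->email = $email;
    }

    public function getId(): string
    {
        return $this->id;
    }

    public function getDisplayName(): string
    {
        return $this->displayName;
    }

    public function getEmail(): ?string
    {
        return $this->email;
    }
}
```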

We came up with the idea of Data Providers. We’d have a Data Provider for the ‘Authentic’ user data and a Data Provider for the ‘Alias’ user data. By keeping these two data sets separate it meant we’d always have access to both. We also came up with the idea of creating a ‘Current’ user, which would deliver the Alias data if it was available and the Authentic data if it wasn’t. This was a nice convenience that saved developers from having to keep checking which one to use (whilst still allowing them to access either specifically).
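
As a rough illustration of how the ‘Current’ convenience works, assuming a simple DataProvider interface (the names here are ours for the sketch, not the package’s actual API):

```php
<?php

// Hypothetical interface for the sketch – each provider returns the raw
// attribute values it knows about, or null if it has nothing.
interface DataProvider
{
    public function getAttributes(): ?array;
}

// The 'Current' provider prefers Alias data when it exists and falls back
// to the Authentic data otherwise.
class CurrentDataProvider implements DataProvider
{
    private $authentic;
    private $alias;

    public function __construct(DataProvider $authentic, DataProvider $alias)
    {
        $this->authentic = $authentic;
        $this->alias = $alias;
    }

    public function getAttributes(): ?array
    {
        return $this->alias->getAttributes() ?? $this->authentic->getAttributes();
    }
}
```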

So each Data Provider would return a value for each of the User model attributes we had defined. We also decided that the Data Provider would be responsible for returning local config values if the application was being run on a local development server. This would allow developers to emulate both the Authentic and the Alias data layers.
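
Roughly, the local-emulation idea looks something like this; the environment check, config keys and the $client collaborator (standing in for whatever fetches the real data, which we come to below) are assumptions made for the illustration:

```php
<?php

// Sketch of local emulation: when the app runs on a local development
// server, the provider returns values from a config file instead of
// hitting the real data sources.
class AuthenticDataProvider
{
    private $client;
    private $localConfig;
    private $environment;

    public function __construct($client, array $localConfig, string $environment)
    {
        $this->client = $client;
        $this->localConfig = $localConfig;
        $this->environment = $environment;
    }

    public function getAttributes(): ?array
    {
        if ($this->environment === 'local') {
            // Developers can emulate the Authentic layer via config.
            return $this->localConfig['authentic_user'] ?? null;
        }

        return $this->client->fetchUserData();
    }
}
```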

We then needed to deal with how we would get the data from the various data sources and convert it into a usable format.

In the case of a cookie, it’s pretty straightforward to read the values directly; however, in the case of a web service we needed to set up Clients. A Client simply encapsulates the logic required to send a request to a web service and receive a response. We also built cache wrappers for each Client so that we could pass the requests through a caching layer and reduce the number of requests being made to the endpoints.
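
Sketching the idea with Guzzle for the HTTP call and a PSR-16 style cache, it looks something like the following; the service name, endpoint, cache key and TTL are invented for the example:

```php
<?php

use GuzzleHttp\Client as HttpClient;
use Psr\SimpleCache\CacheInterface;

// A Client encapsulates the request/response handling for one web service.
// The endpoint path below is a placeholder, not a real URL.
class ProfileServiceClient
{
    private $http;

    public function __construct(HttpClient $http)
    {
        $this->http = $http;
    }

    public function fetchUserData(string $userId): array
    {
        $response = $this->http->request('GET', '/profile/' . $userId);

        return json_decode((string) $response->getBody(), true);
    }
}

// The cache wrapper decorates a Client so repeated calls within the TTL
// never reach the endpoint. Key format and TTL are illustrative.
class CachedProfileServiceClient
{
    private $inner;
    private $cache;

    public function __construct(ProfileServiceClient $inner, CacheInterface $cache)
    {
        $this->inner = $inner;
        $this->cache = $cache;
    }

    public function fetchUserData(string $userId): array
    {
        $key = 'profile.' . $userId;

        $cached = $this->cache->get($key);
        if ($cached !== null) {
            return $cached;
        }

        $data = $this->inner->fetchUserData($userId);
        $this->cache->set($key, $data, 300); // cache for five minutes

        return $data;
    }
}
```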

The next issue we faced was converting the raw data into more usable values. For this we decided to build simple Transformers: each one takes a raw data set and transforms its structure and values into something more usable.
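
A Transformer in this spirit might look something like this; the raw field names are made up to show the reshaping rather than taken from the real services:

```php
<?php

// Illustrative Transformer: turns a raw web-service payload into the
// attribute names and types our User model expects. Field names are
// invented for the example.
class ProfileTransformer
{
    public function transform(array $raw): array
    {
        return [
            'id'          => (string) ($raw['USR_ID'] ?? ''),
            'displayName' => trim(($raw['FIRST_NM'] ?? '') . ' ' . ($raw['LAST_NM'] ?? '')),
            'email'       => strtolower($raw['EMAIL_ADDR'] ?? '') ?: null,
        ];
    }
}
```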

This worked great for some of the data; however, we kept coming across situations where we’d need to introduce logic to compute a value based on other data, rather than simply tidy up the way the data was being output. For this we felt we needed a slightly different pattern to keep things clear. In the end we came up with Resolvers: each one takes one or more sets of data and returns a single, computed value.
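
By way of illustration, a Resolver might combine a couple of data sets to compute one value; the ‘dashboard access’ rule below is a made-up example of the kind of business logic we mean:

```php
<?php

// Illustrative Resolver: computes a single value from more than one set of
// data. The rule itself is invented to show the shape of the pattern.
class CanAccessDashboardResolver
{
    /**
     * @param array $profile     transformed profile data
     * @param array $permissions transformed permissions data
     */
    public function resolve(array $profile, array $permissions): bool
    {
        $isStaff = ($profile['type'] ?? null) === 'staff';
        $hasRole = in_array('dashboard', $permissions['roles'] ?? [], true);

        return $isStaff && $hasRole;
    }
}
```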

At this point we had our Data Providers pulling in data from the various sources, transforming and resolving it into usable values, and returning those values for each User model attribute. The final task was to map the values onto our domain models and return them from a User repository.
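
Carrying over the DataProvider and User sketches from above, the repository ends up looking roughly like this:

```php
<?php

// Continuing the earlier sketches: the repository asks a Data Provider for
// raw attribute values and maps them onto the User domain model.
class UserRepository
{
    private $provider;

    public function __construct(DataProvider $provider)
    {
        $this->provider = $provider;
    }

    public function getCurrentUser(): ?User
    {
        $attributes = $this->provider->getAttributes();

        if ($attributes === null) {
            return null;
        }

        return new User(
            $attributes['id'],
            $attributes['displayName'],
            $attributes['email']
        );
    }
}
```

From an application developer’s point of view, getting hold of the current user then becomes a single call, e.g. $user = $repository->getCurrentUser();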

In addition to the code, we also developed a simple GUI for the package that makes it easy to see the various data layers for the current user, which helps with debugging and testing.

Summary

We were really happy with how this piece of work turned out. Not only has it created a simple and consistent way for developers to work with user data, it has also provided a model for business discussion and creative thinking to pivot around.

We’ve also been able to update the business logic in the Resolvers to adapt to organisational change without having to update the specific implementation details in the applications that use it. This has been well received by developers, testers and stakeholders and has really shown some of the value that it delivers.

On a technical note, a couple of highlight areas for us are:

  • Abstracting into Transformers and Resolvers made it really easy to set up Unit Tests around these critical processes (see the sketch after this list)
  • Creating validators on our domain models, which log errors if they are given unexpected values for any of the attributes, has proven really useful for debugging data-related issues
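
As a flavour of the first point, a unit test around the illustrative Resolver from earlier needs no HTTP or session stubbing at all; it simply feeds in arrays and asserts on the result:

```php
<?php

use PHPUnit\Framework\TestCase;

// Sketch of a unit test around the hypothetical Resolver shown earlier.
// Because Resolvers only receive plain data, no HTTP or session stubbing
// is needed.
class CanAccessDashboardResolverTest extends TestCase
{
    public function testStaffWithDashboardRoleCanAccess(): void
    {
        $resolver = new CanAccessDashboardResolver();

        $profile     = ['type' => 'staff'];
        $permissions = ['roles' => ['dashboard']];

        $this->assertTrue($resolver->resolve($profile, $permissions));
    }

    public function testNonStaffCannotAccess(): void
    {
        $resolver = new CanAccessDashboardResolver();

        $this->assertFalse(
            $resolver->resolve(['type' => 'public'], ['roles' => ['dashboard']])
        );
    }
}
```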

Work with us

If you enjoyed this blog post and you’re interested in working with us, drop us a message and we’d be happy to have a chat.