In the early 2000s, Funda, an online platform for home seekers, buyers, and sellers, changed the Dutch real estate market forever. Today, 98% of the Dutch population knows Funda, and 86% use it when looking for a house. Funda’s mission is to bring together supply and demand. The company wants to be there at one of the most memorable moments in everyone’s life: the purchase or sale of a home! With around 73 million visits per month, the amount of data at the company’s disposal is immense. But how do you turn that data into insights that can enrich the user experience?
Funda & Data
The importance of data to Funda is evident from the fact that data-driven decision-making is one of its values. “However,” Spiros Kouloumpis, Head of Data at Funda, adds, “becoming a data-driven organization remained a North Star because we were suffering from infrastructure issues. Most of our data was on-premise and dispersed across the organization. With no direct access to data, exploration was challenging and time-consuming for data scientists, as it required the manual collection of data. Due to a lack of standardization, they had to reinvent the wheel for each request, and product teams kept on hitting a wall.”
So, Funda expanded the data team and set itself three goals to derive more value from data:
2. Move data to the cloud
3. Democratize data
A Self-Service Data Cloud Platform
Funda has a longstanding relationship with Xebia. So, when the company realized it needed a cloud data platform but didn’t have the resources to build it, Spiros sat down with Niels (CTO, Xebia Data and AI). “I shared our ideas, like running Airflow on Kubernetes, and our specific requirements for the to-be-built platform. Niels saw the potential, and we set up a team to start working on it. Two of Xebia’s experts, Anibal Kolker and Roman Ivanov, formed a sub-data-team with its own cycle and stand-ups for six weeks. Later, Xebia’s Guillermo Sánchez Dionis joined as an Analytics Engineer to help us move from data to insights. While Xebia taught us new things, our experts were mostly responsible for aligning and sharing what we were doing with the rest of the business,” Spiros elaborates.
The Collaboration in a Nutshell:
- Funda asked Xebia to build a cloud data platform based on specific requirements and to deliver a small Proof of Concept (PoC) with three objectives: the platform needed to be able to execute a dbt job, a Spark job, and a Python job.
- By including a PoC in the Scope of Work, Funda had proof that the data platform worked and met its requirements.
- When finalizing the assignment, Xebia handed over the roadmap documentation and organized sessions to teach others how to operate the platform.
- Funda took over and started building on the data platform.
The Technical Specs
The main objective of Funda was to create a single Kubernetes-based data runtime where jobs are scheduled and data is transformed, so that everyone could use the data for their own purposes. This runtime needed to be set up in Microsoft Azure. The team migrated Funda’s use cases from the Hadoop platform and old BI server to Google Cloud Platform, where the data itself now lives, and integrated best practices around dbt (data build tool) and data quality. Data Engineers and Data Scientists can now create new pipelines without working on the underlying cloud infrastructure. Data is accessible via an SQL interface, fulfilling the “self-service” criterion. Teams can access various data marts depending on their role, and creating new use cases is easy and quick.
Guillermo, Analytics Engineer at Xebia Data and AI, specifies: “Data is now both replicated from Funda’s systems and micro-batched from different event streams into BigQuery. However, other surrounding services were deployed in Microsoft Azure, where most of Funda’s applications reside. In Azure, the team set up Airflow on Azure Kubernetes Service (AKS) to orchestrate all of Funda’s data pipelines. These pipelines could run in three different setups, all of them running on AKS: (1) Python scripts, (2) Spark jobs, and (3) dbt transformations. The data pipelines platform is implemented with a GitOps methodology, where all code resides in Git and is automatically deployed to Kubernetes by the Flux GitOps agent. For self-service reporting, Google Data Studio was chosen as the preferred tool for its easy out-of-the-box integration with BigQuery. Cloud infrastructure management was implemented with infrastructure-as-code principles using Terraform.”
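To make the three setups more concrete, here is a minimal sketch in plain Python of how an orchestrator such as Airflow decides the run order of a pipeline that mixes them. It is not Funda’s actual code and does not use the Airflow API; the `Task` class, the `run_order` helper, and the task names are all hypothetical, and only the three setup names come from the text above.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

# The three setups named in the text; everything else below is hypothetical.
SETUPS = {"python", "spark", "dbt"}


@dataclass(frozen=True)
class Task:
    name: str
    setup: str            # one of SETUPS
    upstream: tuple = ()  # names of tasks that must finish first

    def __post_init__(self):
        if self.setup not in SETUPS:
            raise ValueError(f"unknown setup: {self.setup!r}")


def run_order(tasks):
    """Return task names in an order that respects upstream dependencies."""
    graph = {task.name: set(task.upstream) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical pipeline: load with Python, aggregate with Spark, model with dbt.
pipeline = [
    Task("load_events", "python"),
    Task("sessionize", "spark", upstream=("load_events",)),
    Task("build_marts", "dbt", upstream=("sessionize",)),
]

print(run_order(pipeline))
```

In a real deployment the orchestrator would also start a container for each task on the cluster; the sketch only covers the dependency ordering that makes a mixed Python, Spark, and dbt pipeline behave as one unit.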
A New Way of Working
The new platform also introduced a new, more flexible, more robust way of working. Spiros shares: “We don’t have to understand the new data platform in detail; as end-users, all we need to know is how to use it. Xebia implemented the platform and trained us on it. We also changed our way of working. For instance, before, we didn’t have a production environment to run batch jobs. So, whenever we had to create a new data application, we needed to figure it out from scratch. Now, we know that whenever we build something and need to run it, we ‘Dockerize’ it, and it will run. Before, deploying and scheduling were challenging and caused significant delays. Today, creating an application is done in no time!”
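As an illustration of what “Dockerize it, and it will run” can mean on Kubernetes, the sketch below builds a minimal `batch/v1` Job manifest for a container image. The source does not say how Funda’s platform launches containers, so this is a generic example under that assumption; the `job_manifest` helper, the image name, and the arguments are hypothetical.

```python
import json


def job_manifest(name, image, args=()):
    """Build a minimal Kubernetes batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # no automatic retries in this sketch
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


# Hypothetical image name and arguments, for illustration only.
manifest = job_manifest(
    "journey-model",
    "registry.example.com/journey-model:1.0.0",
    args=["--date", "2021-01-01"],
)
print(json.dumps(manifest, indent=2))
```

The point of the design is that the application team only supplies an image and its arguments; the cluster, networking, and scheduling stay the platform’s responsibility.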
Guillermo adds his experience: “Every consultant wants to understand the needs of the end-users as soon as possible to start adding value as quickly as possible. Funda really helped me become familiar with its way of working. I gained insight into how data was behaving, made recommendations for testing, trained the team and any new hires, and worked closely with the marketing team. This collaboration was very fruitful. Some of its requests we could even deliver on the same day. That’s way ahead of what I’ve seen in other businesses.”
“When we make an impact for end-users, it means making a difference for almost everyone in the Netherlands.” –Spiros Kouloumpis, Head of Data at Funda.
Using a new, centralized, self-service Cloud data platform has enabled Funda to:
- Decrease time to production of projects by more than 50%.
- Introduce Data Democratization within the organization; provide internal stakeholders, like marketing, advertising, and product analytics, with easy access to data to create new target audiences or understand even more about the customer journey.
- Improve the focus on benefits for its end-users.
Spiros: “We were able to create an ML model that predicts where our users are in the journey: looking, orientating, viewing, bidding, or buying. We want to personalize this, but without easily accessible data, that process was slow. Now that we have a centralized cloud data platform, we can deploy products extremely fast and very frequently.”
Finally, Spiros shares a piece of advice for companies in a similar process: “Create a data platform that’ll enable your use cases. Define them first and work backwards. Don’t over-engineer it. Keep it simple and pragmatic. Make sure the platform can do what you want it to do: answer your questions and scale. Also, understand the importance of data engineers. Having a data team is great, but without engineers, what data will you work on? First, focus on creating data sets. Data Scientists aren’t magicians; you need to create the infrastructure and understand data engineering first.”