zalando aws case study

Cloud native technologies power radical agility at Europe’s leading online fashion platform

With plans to further expand its original e-commerce site to include new services and products, Zalando embarked on a radical transformation in 2015, resulting in autonomous self-organizing teams. The technology department began rewriting its applications to be cloud-ready and moving its infrastructure from on-premise data centers to Amazon Web Services (AWS). “We saw the pain teams were having with infrastructure and CloudFormation on AWS,” says Henning Jacobs, Head of Developer Productivity. “There’s still too much operational overhead for the teams and compliance.” To provide better support, cluster management was brought into play.

The company now runs its Docker containers on AWS using Kubernetes orchestration.

With the old infrastructure “it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck,” says Jacobs. “Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs on the Linux kernel. This makes a lot of people pretty happy. The engineers love autonomy.”

By the numbers

3,000+ production deployments per week

Team of 9 runs 140+ Kubernetes clusters

Wait time for a stack to come up with AWS went from 10-15 minutes to seconds

When Henning Jacobs arrived at Zalando in 2010, the company was just two years old with 180 employees running an online store for European shoppers to buy fashion items.

“It started as a PHP e-commerce site which was easy to get started with, but was not scaling with the business’s needs,” says Jacobs, Head of Developer Productivity at Zalando.

At that time, the company began expanding beyond its German origins into other European markets. Fast-forward to today and Zalando now has more than 14,000 employees, 3.6 billion Euro in revenue for 2016, and operates across 15 countries. “With growth in all dimensions, and constant scaling, it has been a once-in-a-lifetime experience,” he says.

Not to mention a unique opportunity for an infrastructure specialist like Jacobs. Just after he joined, the company began rewriting all their applications in-house. “That was generally our strategy,” he says. “For example, we started with our own logistics warehouses but at first you don’t know how to do logistics software, so you have some vendor software. And then we replaced it with our own because with off-the-shelf software you’re not competitive. You need to optimize these processes based on your specific business needs.”

In parallel to rewriting their applications, Zalando had set a goal of expanding beyond basic e-commerce to a platform offering multi-tenancy, a dramatic increase in assortments and styles, same-day delivery and even your own personal online stylist.

The need to scale ultimately led the company on a cloud native journey. As did its embrace of a microservices-based software architecture that gives engineering teams more autonomy and ownership of projects. “This move to the cloud was necessary because in the data center you couldn’t have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app,” Jacobs says.

Zalando began moving its infrastructure from two on-premise data centers to the cloud, requiring the migration of older applications for cloud-readiness. “We decided to have a clean break,” says Jacobs. “Our Amazon Web Services infrastructure was set up like so: Every team has its own AWS account, which is completely isolated, meaning there’s no ‘lift and shift.’ You basically have to rewrite your application to make it cloud-ready even down to the persistence layer. We bravely went back to the drawing board and redid everything, first choosing Docker as a common containerization, then building the infrastructure from there.”

The company decided to hold off on orchestration at the beginning, but as teams were migrated to AWS, “we saw the pain teams were having with infrastructure and CloudFormation on AWS,” says Jacobs.

Zalando’s 200+ autonomous engineering teams could decide what technologies to use and operate their own applications using their own AWS accounts. This setup proved to be a compliance challenge. Even with strict rules-of-play and automated compliance checks in place, engineering teams and IT-compliance were overburdened addressing compliance issues. “Violations appear for non-compliant behavior, which we detect when scanning the cloud infrastructure,” says Jacobs. “Everything is possible and nothing enforced, so you have to live with violations (and resolve them) instead of preventing the error in the first place. This means overhead for teams, and overhead for compliance and operations. It also takes time to spin up new EC2 instances on AWS, which affects our deployment velocity.”
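The contrast Jacobs draws between detecting violations by scanning and preventing the error in the first place can be sketched in a few lines of Python. This is a hypothetical illustration, not Zalando's tooling: the `Resource` type, the "owner tag" rule, and the `scan` and `admit` functions are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical cloud resource; only the fields the example needs."""
    name: str
    tags: dict = field(default_factory=dict)


def is_compliant(resource: Resource) -> bool:
    # Assumed rule for illustration: every resource must carry an "owner" tag.
    return bool(resource.tags.get("owner"))


def scan(resources: list) -> list:
    """Detective control: the resources already exist; report the violations."""
    return [r.name for r in resources if not is_compliant(r)]


def admit(resource: Resource, inventory: list) -> None:
    """Preventive control: reject a non-compliant resource before it is created."""
    if not is_compliant(resource):
        raise ValueError(f"rejected {resource.name}: missing owner tag")
    inventory.append(resource)


if __name__ == "__main__":
    existing = [Resource("db", {"owner": "team-a"}), Resource("cache")]
    print(scan(existing))  # violations that someone now has to resolve

    inventory = []
    admit(Resource("api", {"owner": "team-b"}), inventory)
    try:
        admit(Resource("worker"), inventory)
    except ValueError as err:
        print(err)  # the violation never reaches the inventory
```

With `scan`, the non-compliant resource exists until a team cleans it up; with `admit`, it is never created, which is the overhead reduction the quote points at.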

The team realized they needed to “leverage the value you get from cluster management,” says Jacobs. When they first looked at Platform as a Service (PaaS) options in 2015, the market was fragmented; but “now there seems to be a clear winner. It seemed like a good bet to go with Kubernetes.”

The transition to Kubernetes started in 2016 during Zalando’s Hack Week, where participants deployed their projects to a Kubernetes cluster. From there, 60 members of the tech infrastructure department were onboarded, and then engineering teams were brought on one at a time. “We always start by talking with them and make sure everyone’s expectations are clear,” says Jacobs. “Then we conduct some Kubernetes training, which is mostly training for our CI/CD setup, because the user interface for our users is primarily through the CI/CD system. But they have to know fundamental Kubernetes concepts and the API. This is followed by a weekly sync with each team to check their progress. Once they have something in production, we want to see if everything is fine on top of what we can improve.”

At the moment, Zalando is running an initial 40 Kubernetes clusters with plans to scale for the foreseeable future. Once Zalando began migrating applications to Kubernetes, the results were immediate. “Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API,” says Jacobs. “The self-healing infrastructure provides a frictionless experience with higher-level abstractions built upon low-level best practices. We envision all Zalando delivery teams will run their containerized applications on a state-of-the-art reliable and scalable cluster infrastructure provided by Kubernetes.”
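The idea behind a declarative API with self-healing behavior can be illustrated with a toy reconciliation loop. The sketch below is not Kubernetes code and uses no Kubernetes library; the `reconcile` and `apply` functions and the replica-count dictionaries are assumptions chosen to show the principle: the user states the desired end state, and a controller works out the steps to reach it.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map an application name to a replica count.
    """
    actions = []
    for app, want in sorted(desired.items()):
        have = actual.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    # Anything running that is no longer declared gets stopped.
    for app in sorted(set(actual) - set(desired)):
        actions.append(("stop", app, actual[app]))
    return actions


def apply(actions: list, actual: dict) -> dict:
    """Carry out the actions and return the resulting state."""
    new_state = dict(actual)
    for verb, app, count in actions:
        delta = count if verb == "start" else -count
        new_state[app] = new_state.get(app, 0) + delta
        if new_state[app] == 0:
            del new_state[app]
    return new_state


if __name__ == "__main__":
    desired = {"shop": 3, "search": 2}
    actual = {"shop": 1, "legacy": 2}

    actual = apply(reconcile(desired, actual), actual)
    print(actual == desired)

    # Simulate a crashed replica; the same loop repairs it.
    actual["shop"] -= 1
    print(reconcile(desired, actual))
```

Because the caller only ever edits `desired`, the same loop handles first deployment, scaling, and recovery after a failure, which is the property the "self-healing" description refers to.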

With the old on-premise infrastructure “it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck,” says Jacobs. “Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs in the Linux kernel. This makes a lot of people pretty happy. The engineers love the autonomy.”

There were a few challenges in Zalando’s Kubernetes implementation. “We are a team of seven people providing clusters to different engineering teams, and our goal is to provide a rock-solid experience for all of them,” says Jacobs. “We don’t want pet clusters. We don’t want to have to understand what workload they have; it should just work out of the box. With that in mind, cluster autoscaling is important. There are many different ways of doing cluster management, and this is not part of the core. So we created two components to provision clusters, have a registry for clusters, and to manage the whole cluster life cycle.”

Jacobs’s team also worked to improve the Kubernetes-AWS integration.

Plus, “there are still a lot of best practices missing,” says Jacobs. The team, for example, recently solved a pod security policy issue. “There was already a concept in Kubernetes but it wasn’t documented, so it was kind of tricky,” he says. The large Kubernetes community was a big help in resolving the issue. To help other companies start down the same path, Jacobs compiled his team’s learnings in a document called Running Kubernetes in Production.

In the end, Kubernetes made it possible for Zalando to introduce and maintain the new products the company envisioned to grow its platform. “The fashion advice product used Scala, and there were struggles to make this possible with our former infrastructure,” says Jacobs. “It was a workaround, and that team needed more and more support from the platform team, just because they used different technologies. Now with Kubernetes, it’s autonomous. Whatever the workload is, that team can just go their way, and Kubernetes prevents other bottlenecks.”

Looking ahead, Jacobs sees Zalando’s new infrastructure as a great enabler for other things the company has in the works, from its new logistics software, to a platform feature connecting brands, to products dreamed up by data scientists. “One vision is if you watch the next James Bond movie and see the suit he’s wearing, you should be able to automatically order it, and have it delivered to you within an hour,” says Jacobs. “It’s about connecting the full fashion sphere. This is definitely not possible if you have a bottleneck with everyone running in the same data center and thus very restricted. You need infrastructure to scale each autonomous team’s idea.”

For other companies considering this technology, Jacobs says he wouldn’t necessarily advise doing it exactly the same way Zalando did. “It’s okay to do so if you’re ready to fail at some things,” he says. “You need to set the right expectations. Not everything will work. Rewriting apps and this type of organizational change can be disruptive. The first product we moved was critical. There were a lot of dependencies, and it took longer than expected. Maybe we should have started with something less complicated, less business critical, just to get our toes wet.”

But once they got to the other side “it was clear for everyone that there’s no big alternative,” Jacobs adds. “The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years. Zalando Technology benefits from migrating to Kubernetes as we are able to leverage our existing knowledge to create an engineering platform offering flexibility and speed to our engineers while significantly reducing the operational overhead. We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey.”

The journey to an agile organization at Zalando

Zalando, founded in 2008, publicly listed, and Berlin-based, is Europe’s leading online fashion platform. It is one of the platforms around which the online fashion sector has coalesced. Part of the Zalando strategy is to digitize fashion, moving the organization from a digital store to an online platform. From 2015 to 2017, it increased its digital infrastructure to 1,700 employees, up from 800, and the digital effort now represents one in seven (14 percent) of its staff.

Here, Eric Bowman, Zalando’s VP Engineering, talks to McKinsey’s Stephanie Cadieux and Miriam Heyn about his experience of the digital expansion and how it carries over to agile working at the front line.

McKinsey: First, what is your definition of agility?

Eric Bowman: Agility is just being able to respond quickly to changing business contexts. Beneath that, it’s being able to work well in parallel and to minimize the number of decision or alignment choke points. Imagine walking through a thicket of brambles. Agility is about cutting off the thorns of the brambles so that we can change direction quickly (Exhibit 1). There are so many ways that teams can get snagged: on technology, on organizational structure, on culture, on mind-sets. In a thicket, agility is about removing thorns!

McKinsey: What are the main benefits you have seen from agility at Zalando?

Eric Bowman: As a company, the benefit of agility is that almost everything that you do has the potential to start a flywheel: once it gets spinning, it can spin faster, and it’s harder to slow down. It becomes possible to grow teams and their impact, to compete in new industries, and to get more comfortable starting from scratch on new problems. In general, agility makes it much easier to compete.

Agility at Zalando enabled us to keep scaling our product vision, our platform vision, the scope of our business, and the extent to which we deliver value, impact, and satisfaction to our customers. And agility is not just great for the company. It’s also great for the company’s customers and for our employees. Customers get new features faster and experience dynamism in the business. Employees are generally much more engaged because they spend less time aligning and more time solving problems.

McKinsey: What was the situation at Zalando when you arrived in 2014?

McKinsey: Can you describe Zalando’s journey?

McKinsey: Can you explain radical agility?

McKinsey: This strategy requires the right structures; what were the lessons there?

Eric Bowman: Most company structures are set up to optimize consensus or the decision-making process. But agility really requires flexibility around how decisions are made. And so, people who become comfortable maybe making decisions in one way often have to change their thinking to make decisions in a different way. Finding the right organizational structure where people have real end-to-end ownership can be counterintuitive. But once that’s in place and people are comfortable with frequent restructuring to solve real problems that come up, it’s a necessary ingredient to being agile as a company.

McKinsey: What are those structures for end-to-end ownership?

Eric Bowman: There are three: first is what we call dedicated ownership, a holistic leadership role where one person is accountable for the work of a team or a group of teams that have been assembled to achieve some kind of end-to-end ownership. It usually combines business, product, and technology oversight. So, it’s really pulling lots of strands together: dedicated ownership can then lead a multidisciplinary team toward a common goal.

Here is an example: in online clothing retail, sizing is one of the main drivers of any return rate for us, [and that’s] a very important problem to solve. We want to make sure that what people order will fit them. One of our first senior and specialized dedicated owners was in clothes sizing. She was able to have people in the warehouse, data scientists, and commercial website people working for her; we were able to scoop together into an org all the people who needed to solve this problem. 

The second is prioritization, a big part of traditional agility but even more important for agility at scale. Actually, at Zalando, we started by prioritizing everything as highest priority. And this very quickly led to a situation where we had too much work in progress. The projects that were most likely to fail were not necessarily projects that were complex but projects that touched many teams. We evolved a simple prioritization model to focus on the customer, on company priorities, and on local priorities; this was an incredible unlocking mechanism, allowing people to make decisions without needing to align. Simultaneously through that, we managed to significantly reduce work in progress.

And the third is the right scope of work (rightsizing) for employees. Here we were pushing against the idea that the only way to succeed is by having a large organization or having broad scope. We knew that depth is also extremely important, and we had to make people comfortable with the idea that sometimes their scope (but not their rewards) needs to be reduced in order to allow them to go deeper. It’s extremely important as you scale, because every topic becomes increasingly complex.

McKinsey: How did you remove structures that no longer work? What was your experience at Zalando?

Eric Bowman: Well, in general, changing your IT architecture is horrible. It is very often about decisions that are hard to change, so people are reluctant to sacrifice them. But it’s often necessary periodically to basically start again with a new approach for a new scale. It’s a necessary part of growth, particularly to unlock that parallelism I mentioned. When you have a small number of people, working in parallel slows you down. But not in a larger organization. Very often what works at a small scale works against you at a larger scale. We think of these soon-to-go structures as sacrificial architectures. We must not be afraid to abandon what has worked well in the past, in terms of the architecture of a business or a technology stack.

McKinsey: OK, what were or are the effects on people that go along with these structures? Are any unique to agile operations?

Eric Bowman: We found that as we scaled, there was what we call the first-time leader problem, where people who were not necessarily on a leadership track were suddenly put in leadership positions. We had leaders who could not say no to the business or could not say no to their team. So we introduced a shared leadership model at the team level: one leader for about every 12 people, which was my goal. Shifting to agile means that leaders really have to change their mind-set, typically away from a purely functional view to that much more holistic view. Very often, this kind of transformation means that many leaders have less scope as you scale. And this can be quite uncomfortable at first. But very often, less scope is actually a sign of success.

Another impact was how we ran performance management, an incredibly important facet of radical agility. Making sure that people are fairly compensated and that they are hearing the feedback that they really need to hear are both really important dimensions of how we build trust.

And here is the key: high levels of trust. Trust is a friction area: if it’s missing, almost everyone is wasting time trying to understand what people really meant, what they’re trying to say. But establishing trust is a great simplifier, in terms of all of the interactions that people have day to day. If people are known to be straightforward and authentic, then you can communicate simply and clearly. And alignment becomes much, much easier and cheaper. With the right level of trust, a problem becomes like a cry for support, not an opportunity for emotion.

McKinsey: What else do high levels of trust enable you to do in terms of managing responsibilities and accountabilities?

Eric Bowman: At the heart of all this is a trinity of trust, accountability, and autonomy. Here’s how I think of them: we achieve accountability and autonomy through the dedicated ownership model, and both rest on trust. Accountability is something that cannot be shared (unlike responsibility). We want people to make decisions. But we need them to be accountable for the outcome of those decisions in a very nonblaming way; that is where trust fits in. We also want them to be autonomous. But autonomy without accountability is called vacation. And when things go wrong and people have made questionable decisions, we have a process of review: “Well, how did you get there, really?” So autonomy is learned and earned. [That’s] all on the basis that good judgment comes from experience and experience comes from poor judgment.

McKinsey: What are the processes that serve both your structure and your people?

Eric Bowman: First comes unblocking, essential for achieving parallelism. In a large organization, there are many things that cause people to block. Truly unblocking means going through them bit by bit: structuring the organization, looking at individual workflows, how alignment happens, how meetings happen. It requires looking at every part of what we do, finding every thorn (as it were), and removing it, so that people can move quickly to make decisions and have impact.

Second is deciding what processes to keep, improve, and standardize. For us, this meant standardizing the rituals, what is expected from a team or from a dedicated owner, and how we measure success, so that expectations are consistent and people can, for example, move to different parts of the company. By rituals I mean the cadence of meeting, reporting, and assessing. Rituals provide a constant source of energy for teams, which is very useful for setting expectations. And they allow teams both to talk about their failures and how they’ve learned from them, and also to celebrate their successes.

But these rituals are also important for scaling leadership. Giving people a regular forum to demonstrate leadership, show leadership, and to show their growing mastery of different topics is one of the most important factors for scaling leadership at a large organization.

We have rituals for product reviews, for operational performance, and for project steering. Bringing people together on a regular cycle in a positive and constructive atmosphere exposes people to different ways of thinking and different parts of the business. It also makes people very comfortable, because it’s almost like clockwork. And what’s expected of them becomes increasingly clear over time. So they get better at understanding what the key expectations are and how they can improve.

McKinsey: How about that process of setting the expectations?

Eric Bowman: We use KPIs [key performance indicators] because they are one of the most important aspects for achieving agility at scale. But here is where this differs from other approaches. The agile way is to map the work that people do to an ROI [return on investment] model and to grow the concept of KPIs to make sure that those KPIs are proving or disproving that ROI model. The ROI model is incredibly important for helping teams prioritize what are the most important things to do and what has the biggest impact on our customers and the business.

McKinsey: Let’s return to technology; how does radical agility work in your part of Zalando?

Eric Bowman: One of the challenges in enabling the parallel working I mentioned is finding what are the right things for us to standardize on. And this comes down to how we standardize on technology, how we standardize on process, and how we standardize on rituals. Finding the right balance so that we can continue, for example, to advance our technology stack, while still having the right standard so that the system keeps working in the presence of change, is extremely difficult and takes time and a lot of alignment.

Essentially, in tech we’ve tried to get away from the traditional matrix model. Almost every start-up that has a tech team grows that tech team separately. There comes a point where the alignment becomes too difficult and you have to integrate all relevant concerns under a single leader. And then that leader has to learn how to lead those teams. This is not a new idea, basically. We all know you have to have multidisciplinary teams. But each team must be able to decide for itself. This is multidisciplinary teams at scale.

McKinsey: So how do you impose and maintain technical standards and compliance?

Eric Bowman: Part of enabling agility for us is getting everybody used to choosing the best tech tool for the job. Whether it’s an internal tool or not, we’re well set up to be very flexible in that regard. We can tolerate duplication and overlap. It’s more important that we succeed as a business than that we say, “Don’t duplicate certain parts of our tech stack.”

We have plenty of autonomy built into our tech organization. For example, teams don’t necessarily have to use the infrastructure that we provide. We would like them to. We think that will save them money, it will make them go faster, and all these different things. But, for example, we can acquire a company and find that they’ve opted out of everything. It’s fairly common for them to pick and choose what they will opt into in terms of our existing tech stack, which processes matter. So we use “inner source,” the internal company version of open source, where we prepare teams to expect that other people will come along and make changes to their systems. It also raises the bar for some of the hygiene factors around how they work. They have to have good documentation. They have to have good automated testing. And, in general, they have to be more comfortable with more people reading their code.

McKinsey: And what are the lessons you can draw from this journey into agility?

Eric Bowman: What scales is principles. At Zalando, our principles are enabling parallelism, unblocking or minimizing the frictions, and finding the right size of work for each person.

Actually, we’re less prescriptive about which agile methodology works. We are interested in creating these end-to-end organizations within the company that have the right accountability and autonomy models and the right incentive alignment to really make sure that we can move in parallel.

Stephanie Cadieux is a consultant in McKinsey’s Chicago office, and Miriam Heyn is a partner in the Berlin office.

Zalando Selects AWS as Its Preferred Cloud Provider

Europe’s largest online fashion and lifestyle platform uses AWS machine learning technology to innovate faster and redefine the online shopping experience

SEATTLE--(BUSINESS WIRE)--Nov. 24, 2020-- Today, Amazon Web Services (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), announced that Zalando has selected AWS as its preferred cloud provider and is going all-in on AWS for machine learning, running all of its machine learning workloads on the world’s leading cloud. Zalando, which offers lifestyle and fashion products to more than 35 million customers across 17 countries, will use AWS’s machine learning services to innovate faster and offer a more personalized online shopping experience. In addition to machine learning, Zalando is using the breadth and depth of AWS technologies in analytics, compute, database, networking, serverless, storage, and more to transform the company into a more data-driven organization, optimizing critical business functions such as supply chain management, pricing, marketing, and customer care.

AWS’s machine learning services will enable Zalando to continuously improve the customer experience by reducing the time it takes to design, launch, and scale new features for its e-commerce platform. Using Amazon SageMaker to build, train, and deploy machine learning models quickly, and Amazon EMR to capture, store, and analyze large volumes of data, Zalando’s engineering teams can use customer purchase data to create personalized shopping features like individual product and size recommendations and to predict a customer’s future outfit preferences. Zalando is also using AWS machine learning to offer more personalized recommendations based on style preferences or a brand’s ethical practices, predict when items are in stock for more accurate package delivery and return times, and forecast real-time availability of the latest fashion trends. Working with AWS, Zalando can develop and implement new customer applications faster, such as creating digital avatars that allow customers to virtually try on clothes, and delivering a customer experience that enables shoppers to see which outfits fit them best without trying them on physically.

Zalando is using a wide portfolio of AWS services to drive operational efficiencies and track business performance in near real-time with data and insights. Zalando used AWS Lake Formation to create a data lake running on Amazon Simple Storage Service (Amazon S3) to securely enable its developer teams to collaborate more effectively on projects across different service lines. In addition to its data lake, Zalando also combines data from its internal SAP workloads, including accounting, supply chain management, and e-commerce platforms, with AWS’s analytics portfolio, including AWS Glue, Amazon Redshift, and Amazon Athena to produce transactional and analytical data reports that track business performance in real time. These insights help Zalando’s Size & Fit team to significantly reduce size-related returns by predicting how a garment’s fit is impacted by the material or stretch and making size recommendations that match the customer’s fit preferences. Additionally, by migrating its SAP workloads to AWS, Zalando has reduced IT management time by more than 30 percent.

“Working with AWS, we’ve created a next-generation machine learning platform that enables all of our data scientists and developers to collaborate better and work more efficiently with teams across the company,” said Rodrigue Schaefer, Vice President Digital Foundation at Zalando. “This platform is enabling us to continuously improve the customer experience by rapidly reducing the time it takes to design and implement state-of-the-art personalization tools and new product features. From better stock management to a quicker returns process, we’ve also used a range of additional AWS services to drive operational efficiencies at all stages of the customer journey.”

“As one of the largest online retailers in Germany and Europe, it is exciting to see Zalando go all-in on AWS for its machine learning provider to support their growth and innovation strategy. With millions of customers across Europe, Zalando is a great example of a company that is using machine learning to build, test, and introduce new and more personalized features at scale,” said Klaus Bürg, General Manager for Amazon Web Services EMEA SARL in Germany, Austria, and Switzerland. “By using the full portfolio of AWS in conjunction with machine learning, Zalando will become a more data-driven, flexible, and scalable organization that leverages customer insights to reimagine the online shopping experience.”

About Amazon Web Services

For 14 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. AWS offers over 175 fully featured services for compute, storage, databases, networking, analytics, robotics, machine learning and artificial intelligence (AI), Internet of Things (IoT), mobile, security, hybrid, virtual and augmented reality (VR and AR), media, and application development, deployment, and management from 77 Availability Zones (AZs) within 24 geographic regions, with announced plans for 15 more Availability Zones and five more AWS Regions in India, Indonesia, Japan, Spain, and Switzerland. Millions of customers, including the fastest-growing startups, largest enterprises, and leading government agencies, trust AWS to power their infrastructure, become more agile, and lower costs. To learn more about AWS, visit aws.amazon.com.

About Amazon

Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. Customer reviews, 1-Click shopping, personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle Direct Publishing, Kindle, Fire tablets, Fire TV, Amazon Echo, and Alexa are some of the products and services pioneered by Amazon. For more information, visit  www.amazon.com/about  and follow  @AmazonNews .

About Zalando

Zalando is Europe’s leading online platform for fashion and lifestyle. Founded in Berlin in 2008, we bring head-to-toe fashion to around 35 million active customers in 17 markets, offering clothing, footwear, accessories and beauty. The assortment of international brands ranges from world famous names to local labels. Our platform is a one-stop fashion destination for inspiration, innovation and interaction. As Europe’s most fashionable tech company, we work hard to find digital solutions for every aspect of the fashion journey: for our customers, partners and every valuable player in the Zalando story. Our goal is to become the starting point for fashion.


View source version on  businesswire.com :  https://www.businesswire.com/news/home/20201124005277/en/

Amazon.com, Inc. Media Hotline [email protected] www.amazon.com/pr

Source: Amazon Web Services Inc.


Deploying Kubernetes on AWS with CloudFormation and Ubuntu

zalando-incubator/kubernetes-on-aws


WORK IN PROGRESS

This repo contains configuration templates to provision Kubernetes clusters on AWS using Cloud Formation and Ubuntu Linux.

Many values are parameterized and values are not always visible. We're focusing on solving our own, specific/Zalando use case. However, we are open to ideas from the community at large about potentially turning this idea into a project that provides universal/general value to others. Please contact us via our Issues Tracker with your thoughts and suggestions.

Configuration in this repository initially was based on kube-aws, but now depends on four components which aren't all yet open sourced:

  • Cluster Registry to keep desired cluster states (e.g. used config channel and version)
  • Cluster Lifecycle Manager to provision the cluster's Cloud Formation stack and apply Kubernetes manifests for system components
  • Cluster Lifecycle Controller that handles rolling updates from inside the cluster, for example node termination
  • Authnz Webhook to validate OAuth tokens and authorize access
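
The split between a registry that holds desired state and a manager that applies it can be illustrated with a minimal sketch. This is not the actual Cluster Registry or Cluster Lifecycle Manager code; the class, field, and function names below are hypothetical and only mirror the "config channel and version" idea from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    """Desired or observed state of one cluster (field names are illustrative)."""
    cluster_id: str
    channel: str   # configuration channel, e.g. "stable"
    version: str   # configuration version on that channel


def clusters_needing_update(desired, observed):
    """Return IDs of clusters whose observed state differs from the desired one.

    `desired` plays the role of the registry; `observed` is what is running.
    Clusters missing from `observed` are treated as not yet provisioned.
    """
    observed_by_id = {c.cluster_id: c for c in observed}
    return [
        d.cluster_id
        for d in desired
        if observed_by_id.get(d.cluster_id) != d
    ]


desired = [
    ClusterState("cluster-a", "stable", "v2"),
    ClusterState("cluster-b", "beta", "v3"),
    ClusterState("cluster-c", "stable", "v2"),
]
observed = [
    ClusterState("cluster-a", "stable", "v2"),   # up to date
    ClusterState("cluster-b", "beta", "v2"),     # behind on version
]                                                # cluster-c is not provisioned yet

print(clusters_needing_update(desired, observed))
```

A real lifecycle manager would go on to apply the Cloud Formation stack and manifests for each returned cluster; the sketch stops at deciding which clusters need work.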

Learn more about Zalando's cloud native journey by reading the Zalando Case Study on kubernetes.io. See our Running Kubernetes in Production on AWS document for details on the setup.

  • Highly available master nodes (ASG) behind ELB
  • Worker Auto Scaling Group with node pools support
  • Flannel overlay networking
  • Cluster autoscaling (using cluster-autoscaler )
  • Kubernetes DNS with node-local dnsmasq as daemonset and CoreDNS resolver for cluster.local domain running in the same pod.
  • Route53 DNS integration via External DNS
  • AWS IAM integration via kube2iam , AWS OIDC IAM
  • Standard components are installed: dashboard, node exporter, kube-state-metrics, see also cluster/manifests directory
  • Webhook authentication and authorization (roles "ReadOnly", "PowerUser", "Manual", "Emergency", "Administrator")
  • Emergency access via the internal emergency-access-service, which grants the roles "Manual" and "Emergency" under the four-eyes principle with audit logging
  • Log shipping via Scalyr
  • Full Ingress support with ALB/NLB and TLS integration via kube-ingress-aws-controller and HTTP routing via skipper
  • Enhanced usability with managed stacks and blue green deployments via stackset-controller and skipper
  • Fabric API Gateway , which can be used in combination with stackset-controller
  • Static Egress IPs to route through NAT Gateways with Elastic IPs via kube-static-egress-controller
  • Horizontal Pod Autoscaling with scaling by request per second, SQS queue size or others via kube-metrics-adapter
  • Vertical Pod Autoscaling to scale for example Prometheus
  • EFS support
  • GPU support
  • ETCD backup via Kubernetes cronjob and etcdctl snapshot and upload to S3
  • Monitoring via Prometheus and OpenTracing
  • Fully automated cluster updates via Cluster Lifecycle Manager
  • Automated downscaling for test clusters with kube-downscaler
  • Fallback node pools
  • Spot node pool integration
  • Automated PDB creation with pdb-controller

Notes

  • Node and user authentication is done via tokens (using the webhook feature)
  • SSL client-cert authentication is disabled
  • Many values are hardcoded
  • Secrets (e.g. shared token) are not KMS-encrypted in the cluster
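
To make the "scaling by request per second" item above more concrete, here is a minimal sketch of the replica calculation that the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric). It is not taken from kube-metrics-adapter; the function name and the default bounds are assumptions, and the real controller also applies a tolerance and stabilization logic that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10):
    """Replica count for a per-pod average metric such as requests per second.

    Implements ceil(current_replicas * current_value / target_value) and
    clamps the result to [min_replicas, max_replicas]. The bounds are
    illustrative defaults, not values from this repository.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods each serving 150 requests/s against a target of 100 requests/s per pod
print(desired_replicas(4, 150, 100))
# Load drops to 20 requests/s per pod; min_replicas keeps two pods running
print(desired_replicas(4, 20, 100, min_replicas=2))
```

The same formula applies to an external metric such as SQS queue length; only the source of `current_value` changes.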

Assumptions

  • The AWS account has one or more hosted zones in Route53 including a proper SSL cert (you can use the free ACM service)
  • The VPC has at least one public subnet per AZ (either AWS default VPC setup or public subnet named "dmz-<REGION>-<AZ>")
  • The VPC is in region eu-central-1 or eu-west-1
  • etcd cluster is available via DNS discovery (SRV records) at etcd.<YOUR-HOSTED-ZONE>
  • OAuth Token Info is available to validate user tokens
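
The naming assumptions above can be written down as a small helper, shown here as a sketch only. The function names are hypothetical, the exact form of the `<AZ>` part is not specified in this README (the example passes "a" purely for illustration), and the SRV record names follow etcd's documented DNS discovery convention (`_etcd-server._tcp.<domain>`, or `_etcd-server-ssl._tcp.<domain>` for TLS) applied to the `etcd.<YOUR-HOSTED-ZONE>` domain.

```python
# Regions named in the assumptions list above.
SUPPORTED_REGIONS = ("eu-central-1", "eu-west-1")


def is_supported_region(region):
    """True if the VPC region matches one of the assumed regions."""
    return region in SUPPORTED_REGIONS


def expected_public_subnet_name(region, az):
    """Public subnet name following the "dmz-<REGION>-<AZ>" pattern."""
    return f"dmz-{region}-{az}"


def etcd_discovery_domain(hosted_zone):
    """Domain under which the etcd cluster is assumed to be discoverable."""
    return f"etcd.{hosted_zone.rstrip('.')}"


def etcd_srv_record(hosted_zone, tls=False):
    """SRV record name used by etcd DNS discovery for the given hosted zone."""
    service = "_etcd-server-ssl" if tls else "_etcd-server"
    return f"{service}._tcp.{etcd_discovery_domain(hosted_zone)}"


print(is_supported_region("eu-west-1"))
print(expected_public_subnet_name("eu-central-1", "a"))
print(etcd_srv_record("example.org"))
```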

Directory Structure

  • cluster/cluster.yaml: Cloud Formation template files for the cluster (will be applied by Cluster Lifecycle Manager )
  • cluster/config-defaults.yaml: Default values for different kinds of use, which can be overridden by values from our cluster-registry (will be applied by Cluster Lifecycle Manager)
  • cluster/etcd-cluster.yaml: Senza Cloud Formation to deploy ETCD
  • cluster/manifests: Kubernetes manifests for system components (will be applied by Cluster Lifecycle Manager )
  • cluster/node-pools: Cloud Formation template files and userdata (cloud-init) for ContainerLinux node-pools (will be applied by Cluster Lifecycle Manager )
  • docs: extracts from internal Zalando documentation.
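
The relationship between cluster/config-defaults.yaml and per-cluster values from the registry amounts to a layered override. The sketch below shows that idea with plain dictionaries; it is not how Cluster Lifecycle Manager is implemented, and the function name and every configuration key are made up for illustration.

```python
def effective_config(defaults, overrides):
    """Merge per-cluster overrides on top of default values.

    Returns the merged mapping plus the sorted list of default keys
    that were actually changed by an override.
    """
    merged = dict(defaults)
    merged.update(overrides)
    overridden = sorted(
        key for key in overrides
        if key in defaults and overrides[key] != defaults[key]
    )
    return merged, overridden


# Hypothetical keys; the real ones live in cluster/config-defaults.yaml.
defaults = {
    "node_pool_min_size": "1",
    "autoscaler_enabled": "true",
    "log_level": "info",
}
overrides = {"node_pool_min_size": "3", "team_label": "checkout"}

config, overridden = effective_config(defaults, overrides)
print(config)
print(overridden)
```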


IMC Newsdesk


Zalando picks AWS for machine-learning services

  • December 8, 2020
  • Steve Rogerson


German lifestyle and fashion retailer Zalando will use AWS’s machine-learning services to innovate faster and offer a more personalised online shopping experience.

In addition to machine learning, the firm is using AWS technologies in analytics, compute, database, networking, serverless, storage and more to transform the company into a more data-driven organisation, optimising critical business functions such as supply chain management, pricing, marketing and customer care.

The machine-learning services will enable Zalando to improve the customer experience by reducing the time it takes to design, launch and scale features for its ecommerce platform. Using SageMaker to build, train and deploy machine-learning models quickly, and Amazon EMR to capture, store and analyse large volumes of data, Zalando’s engineering teams can use customer purchase data to create personalised shopping features such as individual product and size recommendations and to predict a customer’s future outfit preferences.

“Working with AWS, we’ve created a next-generation machine-learning platform that enables all of our data scientists and developers to collaborate better and work more efficiently with teams across the company,” said Rodrigue Schaefer, vice president at Zalando. “This platform is enabling us to continuously improve the customer experience by rapidly reducing the time it takes to design and implement state-of-the-art personalisation tools and new product features. From better stock management to a quicker returns process, we’ve also used a range of additional AWS services to drive operational efficiencies at all stages of the customer journey.”

Zalando is also using AWS machine learning to offer more personalised recommendations based on style preferences or a brand’s ethical practices, predict when items are in-stock for more accurate package delivery and return times, and forecast real-time availability of the latest fashion trends. Working with AWS, Zalando can develop and implement customer applications faster, such as creating digital avatars that allow shoppers to try on clothes virtually, and delivering a customer experience that enables shoppers to see which outfits fit them best without trying them on physically.

Zalando is using a wide portfolio of AWS services to drive operational efficiencies and track business performance in near real time with data and insights. It used AWS Lake Formation to create a data lake running on Amazon Simple Storage Service (Amazon S3) to enable its developer teams to collaborate more effectively and securely on projects across different service lines.

In addition to its data lake, Zalando also combines data from its internal SAP workloads, including accounting, supply chain management and ecommerce platforms, with AWS’s analytics portfolio, including AWS Glue, Amazon Redshift and Amazon Athena to produce transactional and analytical data reports that track business performance in real time.

These insights help Zalando’s size-and-fit team reduce size-related returns by predicting how a garment’s fit is impacted by the material or stretch and making size recommendations that match the customer’s fit preferences. Additionally, by migrating its SAP workloads to AWS, Zalando has reduced IT management time by more than 30 per cent.

“As one of the largest online retailers in Germany and Europe, it is exciting to see Zalando go all-in on AWS for its machine-learning provider to support their growth and innovation strategy,” said Klaus Bürg, general manager for AWS in Germany, Austria and Switzerland. “With millions of customers across Europe, Zalando is a great example of a company that is using machine learning to build, test and introduce new and more personalised features at scale. By using the full portfolio of AWS in conjunction with machine learning, Zalando will become a more data-driven, flexible, and scalable organisation that leverages customer insights to reimagine the online shopping experience.”

Founded in Berlin in 2008, Zalando brings head-to-toe fashion to around 35 million active customers in 17 markets, offering clothing, footwear, accessories and beauty. The assortment of international brands ranges from world famous names to local labels.

• Mercado Libre, headquartered in Argentina and Brazil, has selected AWS as its primary cloud provider to transform the company into a data-driven organisation, improve user experiences, accelerate the launch of services, and support its regional expansion. Mercado Libre is the largest online commerce and payments provider in Latin America and connects businesses across 18 countries with more than 76 million active users.



Superior stability, fine-grained security, enterprise support, and cost savings

Zalando SE wanted to extract more value from its growing AWS Amazon Simple Storage Service (Amazon S3) data lake. Starburst Enterprise provided buyers and business analysts with a more efficient way to extract value from this distributed data to launch its Customer 360 program.


The decision to deploy Starburst Enterprise was made simpler because it has proven to be a reliable, fast, and stable query engine for S3 data lakes.

Alberto Miorin

Engineering Lead

Zalando SE, Europe’s leading online platform for fashion and lifestyle, boasts $5B in annual revenue, largely from selling brand-name clothing and footwear online. The company has thousands of business and marketing analysts who need a fast, simple, reliable way to query data with MicroStrategy. When Zalando decided to transition from legacy data warehouses to an Amazon S3 cloud data lake, it wanted to give its buyers and business analysts a more efficient way to extract value from this distributed data. Zalando deployed Starburst Enterprise for superior stability, fine-grained security, cost savings, and more.

Making smart, analytics-based business decisions is more important than ever in this uncertain global economy. However, Zalando had siloed data warehouse platforms, and the business intelligence and analytics teams couldn’t query data without costly and time-consuming ETL.

When Zalando decided to transition from legacy data warehouses to an Amazon S3 cloud data lake, the company also needed to give its buyers and business analysts a more efficient way to extract value from this distributed data in order to launch its Customer 360 program.

Zalando uses Spark for its data transformation and data science activities — to build and train the machine learning models that drive its recommendation engines. The company also has thousands of business and marketing analysts who need a fast, simple, reliable way to query Zalando’s data through MicroStrategy. 

The company’s initial choice was Trino, the world’s fastest distributed SQL query engine. But the open source deployment couldn’t deliver the efficiency, fine-grained security, and enterprise-grade features Zalando required.

After Zalando chose Amazon S3 as the place to build its data lake, it selected Starburst Enterprise for its security, enterprise support, and superior performance. 

Starburst provides Zalando with:

  • Enterprise support – including 24/7/365 support from true Trino experts.
  • Platform stability & performance – Starburst deploys fully tested, stable releases that work from day one.
  • Security & GDPR – Starburst’s fine-grained security and role-based access control features meet GDPR requirements.
  • Starburst Enterprise and Spark – the company is able to use both solutions in complementary functions, with Spark preparing the data and Starburst Enterprise serving the data with fine-grained access control.

“We can discover, access, and analyze data in our data lake with our preferred tools, and leverage it for business intelligence and data science,” says Miorin. “This streamlined workflow helps our executives make the right decisions on time, and fosters innovation through machine learning.”

As the fully supported, production-tested, enterprise-grade distribution of open source Trino, Starburst Enterprise is constantly adding connectors, demonstrating Starburst’s ongoing commitment to provide fast access to customer data no matter where it resides.


Starburst enables Zalando to scale its Amazon Elastic Compute Cloud (EC2) resources up during peak usage, and scale down at other times to save on costs, giving the company unprecedented flexibility.

“Auto-scaling with graceful shutdown was another attractive feature,” Miorin notes. “We are saving 50% on our AWS compute costs.”

The business intelligence and analytics teams can now join and access data without costly and time-consuming ETL.

“We have gotten very good feedback from our users. With Starburst, queries are now faster, and users can see the difference,” says Miorin. “I cannot share the number, but our daily queries are increasing over time. We have observed that users are gravitating toward Starburst.” 

With AWS and Starburst Enterprise, Zalando has the fast, affordable, secure data access the company needs to maintain its position as Europe’s e-commerce leader in a difficult economy.



Zalando Runs Mission-Critical Workloads on Spot Instances with Spot


The Challenge

Zalando, a European e-commerce company, follows a platform approach, offering fashion and lifestyle products to customers in 17 European markets. For companies operating at Zalando's scale, growth comes with one common caveat: increased cloud infrastructure costs. In their journey of continuously optimizing and controlling cloud costs across more than 200 cloud accounts and thousands of developers, they also explored the option of using Amazon EC2 Spot Instances.

Once Zalando decided to leverage EC2 Spot Instances, they faced a new challenge: provisioning, operating and, more importantly, guaranteeing availability for their applications when running on Spot. They were concerned mainly with two things:

  • The two-minute warning before a Spot termination is not enough for the vast majority of their applications.
  • Losing an entire cluster at once, leaving critical services unavailable or with degraded performance.

Meeting Spot

The SRE team at Zalando looked for ways to use EC2 Spot Instances across all of their environments. They began searching for a solution that could reduce costs, minimize operations and be rolled out easily across their 200+ business units. One option was to develop a solution in house; another was to look for a third-party platform that already did this. This is when they found Elastigroup by Spot and decided to try it, mainly because of three features that appealed to them:

  • Termination prediction up to 15 minutes in advance, allowing them to run more complex environments without the risk of downtime
  • Ability to automatically fall back to On-Demand instances
  • Abstracting the complexity of using multiple instance types and sizes, which decreases the risk of losing instances and increases the chances of obtaining them
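
The last two ideas in the list above, spreading across several instance types and falling back to On-Demand, can be sketched as a simple capacity plan. This is an illustration of the general technique only, not Elastigroup's algorithm or API; the function name and the availability numbers are hypothetical.

```python
def plan_capacity(needed, spot_available, instance_types):
    """Fill `needed` instances from Spot pools, then fall back to On-Demand.

    `instance_types` is ordered by preference. `spot_available` maps an
    instance type to how many Spot instances are assumed obtainable now.
    Any shortfall is requested On-Demand using the most preferred type.
    """
    plan = {}
    remaining = needed
    for itype in instance_types:
        if remaining == 0:
            break
        take = min(remaining, spot_available.get(itype, 0))
        if take:
            plan[("spot", itype)] = take
            remaining -= take
    if remaining:
        plan[("on-demand", instance_types[0])] = remaining
    return plan


# Hypothetical snapshot of Spot availability per instance type.
available = {"m5.large": 4, "m5a.large": 3, "c5.large": 0}
print(plan_capacity(10, available, ["m5.large", "m5a.large", "c5.large"]))
```

Drawing from several Spot pools means a capacity shortage in one instance type removes only part of the fleet, which speaks to the concern about losing an entire cluster at once.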

Testing the waters with Spot using a complex environment

When Zalando decided to start a POC with Spot, they deliberately selected a use case more complex than simple stateless applications behind a load balancer in order to truly test the platform. They decided to try Spot's support for stateful workloads and run Cassandra nodes on Elastigroup. “We decided to see without any investment of time on our side, just to test if they can deliver what they promise,” said Luis Mineiro, Site Reliability Engineer at Zalando.

Spot Elastigroup allows customers to persist the instance’s storage and network configuration and its state. When working with Cassandra clusters, each node is identified by an IP address so it’s crucial that the nodes will maintain their configuration even during a Spot replacement. With this capability, and while experiencing high-availability and long instance lifetime, Zalando were able to confirm this solution was reliable for them and to roll it out to their entire company.

Seamless integration with existing CI/CD tools

At Zalando, there are more than 200 independent teams, each using different tooling for provisioning and deployments, including CloudFormation. Finding a tool that could integrate seamlessly with their existing tools, without requiring teams to change the way they work, was at the top of their list.

Luckily, Spot natively supports AWS CloudFormation and allowed Zalando to provision and manage everything programmatically in the exact same way their developers and teams are used to. With a short implementation time, they were able to easily change their CFN templates to work directly with Elastigroup.

Deploying Spot across 200 teams in production

After a successful POC in which Zalando proved the cost savings, ease of use and reliability of Elastigroup, they started to implement it across all of their teams. Spot was able to support their scale of 200 development teams managing over 300 AWS accounts and to integrate automatically with their SSO. Thanks to Elastigroup, Zalando can focus more on creating quality applications for their customers and spend less time worrying about the underlying infrastructure.

Zalando

Zalando SE is a Germany-based online shoe and fashion retailer and one of the most innovative fashion platforms in Europe. The company offers a portfolio of women's, men's and children's clothing. Its assortment comprises a range of shoes, clothes, accessories, beauty products and sports goods from more than 1,500 brands.


Case study: Europe's Leading Online Fashion Platform Gets Radical with Cloud Native

Zalando, Europe's leading online fashion platform, has experienced exponential growth since it was founded in 2008. In 2015, with plans to further expand its original e-commerce site to include new services and products, Zalando embarked on a radical transformation resulting in autonomous self-organizing teams. This change required an infrastructure that could scale with the growth of the engineering organization. Zalando's technology department began rewriting its applications to be cloud-ready and started moving its infrastructure from on-premise data centers to the cloud. Orchestration wasn't considered at first, but as teams migrated to Amazon Web Services (AWS), "we saw the pain teams were having with infrastructure and Cloud Formation on AWS," says Henning Jacobs, Head of Developer Productivity. "There's still too much operational overhead for the teams and compliance." To provide better support, cluster management was brought into play.

The company now runs its Docker containers on AWS using Kubernetes orchestration.

With the old infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs on the Linux kernel. This makes a lot of people pretty happy. The engineers love autonomy."

"It started as a PHP e-commerce site which was easy to get started with, but was not scaling with the business's needs," says Jacobs, Head of Developer Productivity at Zalando.

At that time, the company began expanding beyond its German origins into other European markets. Fast-forward to today and Zalando now has more than 14,000 employees, 3.6 billion Euro in revenue for 2016 and operates across 15 countries. "With growth in all dimensions, and constant scaling, it has been a once-in-a-lifetime experience," he says.

Not to mention a unique opportunity for an infrastructure specialist like Jacobs. Just after he joined, the company began rewriting all their applications in-house. "That was generally our strategy," he says. "For example, we started with our own logistics warehouses but at first you don't know how to do logistics software, so you have some vendor software. And then we replaced it with our own because with off-the-shelf software you're not competitive. You need to optimize these processes based on your specific business needs."

In parallel to rewriting their applications, Zalando had set a goal of expanding beyond basic e-commerce to a platform offering multi-tenancy, a dramatic increase in assortments and styles, same-day delivery and even your own personal online stylist.

The need to scale ultimately led the company on a cloud-native journey. As did its embrace of a microservices-based software architecture that gives engineering teams more autonomy and ownership of projects. "This move to the cloud was necessary because in the data center you couldn't have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app," Jacobs says.

Zalando began moving its infrastructure from two on-premise data centers to the cloud, requiring the migration of older applications for cloud-readiness. "We decided to have a clean break," says Jacobs. "Our Amazon Web Services infrastructure was set up like so: Every team has its own AWS account, which is completely isolated, meaning there's no 'lift and shift.' You basically have to rewrite your application to make it cloud-ready even down to the persistence layer. We bravely went back to the drawing board and redid everything, first choosing Docker as a common containerization, then building the infrastructure from there."

The company decided to hold off on orchestration at the beginning, but as teams were migrated to AWS, "we saw the pain teams were having with infrastructure and Cloud Formation on AWS," says Jacobs.

Zalando's 200+ autonomous engineering teams decided what technologies to use and could operate their own applications using their own AWS accounts. This setup proved to be a compliance challenge. Even with strict rules-of-play and automated compliance checks in place, engineering teams and IT compliance were overburdened with addressing compliance issues. "Violations appear for non-compliant behavior, which we detect when scanning the cloud infrastructure," says Jacobs. "Everything is possible and nothing enforced, so you have to live with violations (and resolve them) instead of preventing the error in the first place. This means overhead for teams, and overhead for compliance and operations. It also takes time to spin up new EC2 instances on AWS, which affects our deployment velocity."

The team realized they needed to "leverage the value you get from cluster management," says Jacobs. When they first looked at Platform as a Service (PaaS) options in 2015, the market was fragmented; but "now there seems to be a clear winner. It seemed like a good bet to go with Kubernetes."

The transition to Kubernetes started in 2016 during Zalando's Hack Week, where participants deployed their projects to a Kubernetes cluster. From there, 60 members of the tech infrastructure department were onboarded, and then engineering teams were brought on one at a time. "We always start by talking with them and make sure everyone's expectations are clear," says Jacobs. "Then we conduct some Kubernetes training, which is mostly training for our CI/CD setup, because the user interface for our users is primarily through the CI/CD system. But they have to know fundamental Kubernetes concepts and the API. This is followed by a weekly sync with each team to check their progress. Once they have something in production, we want to see if everything is fine on top of what we can improve."

At the moment, Zalando is running an initial 40 Kubernetes clusters with plans to scale for the foreseeable future. Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs. "The self-healing infrastructure provides a frictionless experience with higher-level abstractions built upon low-level best practices. We envision all Zalando delivery teams will run their containerized applications on a state-of-the-art reliable and scalable cluster infrastructure provided by Kubernetes."

With the old on-premise infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs in the Linux kernel. This makes a lot of people pretty happy. The engineers love the autonomy."

There were a few challenges in Zalando's Kubernetes implementation. "We are a team of seven people providing clusters to different engineering teams, and our goal is to provide a rock-solid experience for all of them," says Jacobs. "We don't want pet clusters. We don't want to have to understand what workload they have; it should just work out of the box. With that in mind, cluster autoscaling is important. There are many different ways of doing cluster management, and this is not part of the core. So we created two components to provision clusters, have a registry for clusters, and to manage the whole cluster life cycle."

Jacobs's team also worked to improve the Kubernetes-AWS integration. "There are still a lot of best practices missing," says Jacobs. The team, for example, recently solved a pod security policy issue. "There was already a concept in Kubernetes but it wasn't documented, so it was kind of tricky," he says. The large Kubernetes community was a big help in resolving the issue. To help other companies start down the same path, Jacobs compiled his team's learnings in a document called Running Kubernetes in Production.

In the end, Kubernetes made it possible for Zalando to introduce and maintain the new products the company envisioned to grow its platform. "The fashion advice product used Scala, and there were struggles to make this possible with our former infrastructure," says Jacobs. "It was a workaround, and that team needed more and more support from the platform team, just because they used different technologies. Now with Kubernetes, it's autonomous. Whatever the workload is, that team can just go their way, and Kubernetes prevents other bottlenecks."

Looking ahead, Jacobs sees Zalando's new infrastructure as a great enabler for other things the company has in the works, from its new logistics software, to a platform feature connecting brands, to products dreamed up by data scientists. "One vision is if you watch the next James Bond movie and see the suit he's wearing, you should be able to automatically order it, and have it delivered to you within an hour," says Jacobs. "It's about connecting the full fashion sphere. This is definitely not possible if you have a bottleneck with everyone running in the same data center and thus very restricted. You need infrastructure to scale each autonomous team's idea."

For other companies considering this technology, Jacobs says he wouldn't necessarily advise doing it exactly the same way Zalando did. "It's okay to do so if you're ready to fail at some things," he says. "You need to set the right expectations. Not everything will work. Rewriting apps and this type of organizational change can be disruptive. The first product we moved was critical. There were a lot of dependencies, and it took longer than expected. Maybe we should have started with something less complicated, less business critical, just to get our toes wet."

But once they got to the other side "it was clear for everyone that there's no big alternative," Jacobs adds. "The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years. Zalando Technology benefits from migrating to Kubernetes as we are able to leverage our existing knowledge to create an engineering platform offering flexibility and speed to our engineers while significantly reducing the operational overhead. We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey."

AIX | AI Expert Network

  • September 23, 2023
  • AI Case Studies

Case Study: Zalando’s Innovative Use of AI

Zalando, a leading European online platform for fashion and lifestyle, has consistently been at the forefront of innovation since its inception in 2008. Founded by business school friends Robert Gentz and David Schneider, the German-based company serves 17 European markets with an extensive selection of 400,000 products from over 2,000 brands. The retailer’s customer base has grown to 27 million, thanks in part to its early and sustained commitment to leveraging artificial intelligence (AI) and machine learning technologies. Most recently, Zalando announced the upcoming launch of a ChatGPT-powered fashion assistant aimed at transforming the online shopping experience.

Key Takeaways

  • Zalando has been using AI and machine learning to personalize the shopping experience and better understand customer needs.
  • The company launched an Algorithmic Fashion Companion (AFC) that uses AI to recommend outfits, driving business metrics like basket size.
  • Financially, Zalando reported an 87% surge in Q2 2023’s adjusted EBIT, despite a challenging retail environment.
  • The upcoming ChatGPT-powered fashion assistant aims to revolutionize the way customers interact with Zalando’s platform, making it more intuitive and personalized.
  • AI also finds applications in Zalando’s logistics, supply chain, and even size recommendations, showcasing the technology’s versatility.

Deep Dive: Zalando’s Innovative Use of AI

Zalando has adopted a customer-centric approach to implementing AI, with the primary aim of improving the user experience and boosting sales. The company has focused on addressing customer needs by offering hyper-personalized experiences, whether that’s through outfit recommendations or size advice. With teams that include 120 researchers, Zalando invests in technology to deepen its understanding of customer preferences and the broader fashion landscape.

Implementation

The Algorithmic Fashion Companion (AFC) was one of Zalando’s earlier implementations, a digital outfit recommendation tool based on a customer’s wish list or prior purchases. Adjustments by human stylists ensured that the algorithm remained aligned with current fashion trends. More recently, Zalando announced its plans to introduce a ChatGPT-powered fashion assistant. This assistant will help customers navigate the online marketplace by interpreting their natural language queries, making the shopping experience more intuitive. Furthermore, Zalando has employed AI in back-end operations, from supply chain logistics to offering AI-powered size recommendations based on customer photos.

AI implementation has been fruitful for Zalando. For example, the AFC has been a driver in increasing basket sizes by 40%. Financially, the company’s Q2 2023 report showed a strong adjusted EBIT increase of 87%, partially attributable to its strategic focus on AI and technology. AI is also influencing Zalando’s partnerships, drawing in brands like lululemon and Lancôme.

Challenges and Barriers

Despite its successes, Zalando faces challenges in fully leveraging AI. One limitation is the AI’s inability to account for the influence of external factors like celebrities and social trends on customer preferences. There is also a data privacy concern as the company collects more personalized information from consumers. In terms of workforce, Zalando had to cut 250 marketing and communications professionals as AI increasingly automates these roles.

Future Outlook

Zalando’s future in AI appears promising, with plans to iterate and scale their ChatGPT-powered fashion assistant. The company aims to expand the assistant’s capabilities to offer fashion and beauty advice and potentially to integrate more personalized recommendations based on customer data. They are also keen on driving innovation in size and fit, aiming to solve this e-commerce challenge at scale.

Zalando has set a high standard in using AI to enhance the customer experience and improve its business metrics. Through a judicious mix of machine learning algorithms and human expertise, the company has succeeded in significantly influencing customer behavior and improving its financial standing. While challenges remain, especially concerning data privacy and the role of human employees, Zalando’s proactive approach to embracing technology makes it a case study worth emulating. The firm’s focus on constant learning and adaptation suggests a robust capability to harness AI’s full potential in the future.

Case Studies

How we drive results for our partners.

  • Steve Madden Influencer Event
  • MQ Marqet Ad Manager Campaign
  • Athleta - The Power of She
  • PURELEI - EOSS
  • ASICS - Nature Bathing
  • QS by s.Oliver x BVG
  • Buratti - Ad Manager Campaign
  • GAP - Sharing Pride
  • UGG - Feels like UGG
  • ELC - Unlock the Magic
  • SNOCKS - Ad Manager Campaign
  • Origins - Cyber Week 2022
  • Levi's 501® Live Show
  • ALDO x L'Oréal
  • adidas Originals
  • Tommy Hilfiger
  • Estée Lauder Companies
  • Polo Ralph Lauren

Zalando: Building one of Europe’s leading data-driven companies

About Zalando

Founded in 2008, Zalando is a leading European online fashion platform, connecting more than 24 million active customers with 2,000 leading brands.

Leading online fashion retailer Zalando combines data from multiple sources using BigQuery as the bedrock of its data-driven business.

Google Cloud results

  • Combines metrics from diverse sources for advanced reporting and insights
  • Generates same-day results for A/B testing to speed development
  • Democratizes data access with Looker Studio dashboards

Zalando is one of the most remarkable success stories in European online retail. Founded in 2008, the company today connects more than 24 million active customers with clothing, shoes, and accessories from 2,000 brands. Following a phase of international expansion, Zalando is now active in 17 countries, employing more than 15,000 people. In 2017, not yet a decade old, the company posted annual revenues of €4.5 billion.

In 2016, as the scope, scale, and complexity of Zalando’s multinational operations increased, the company looked to anchor its activities in comprehensive data analytics, as Jorge Ramos, Team Lead Digital Analytics at Zalando, explains: “There were still some areas in the company that had trouble accessing all the data that they required. There's a famous saying, ‘If it takes two weeks to get an answer, in the end you stop asking questions.’ We were concerned that could happen at Zalando. We had over-complex tooling that meant data didn't flow as fast as we needed, and some teams were starting to make decisions that were not really based on data, or at least, not on all of the data that they would like to have.”

To supply the framework necessary for a truly data-driven organization, Zalando supplemented Google Analytics 360 Suite with a solution based on BigQuery. After exporting Google Analytics 360 data to BigQuery, Jorge and his team made it available throughout the company with dashboards on Looker Studio.

“We believe that Zalando has been one of the most data-driven organizations in Europe in the last decade,” says Jorge. “In a company of our size, quantitative data is essential to make decisions, and BigQuery was the best and, in some senses, the only option to work with data at our scale.”

Combining data from across the organization in BigQuery

Online retail operations rely on accurate, rapid analytics to optimize customer experiences and meet KPIs. With more than 200 million monthly visits on mobile browsing for upwards of 300,000 products, Zalando used Google Analytics 360 to stream details ranging from customer lifetime value, to best- and worst-selling products, and pricing estimation models. In a single month, Zalando’s global activity could generate as many as 30 billion Google Analytics hits. To process that kind of volume, Zalando looked for a way to handle exports at scale.

“We were spending a lot of time taking care of those exports,” says Jorge. “We had to make sure that they happened at the right time, with no data loss, and at speed. The tools we were using proved to be quite complicated for end users who were not as experienced as our main analysts. Those were the main reasons why we switched to Google Analytics 360 as our primary analytics tool. We wanted to bring this data to every corner of the organization, with a very powerful tool that initial users can actually use.”

For Jorge, BigQuery was an obvious next step. “BigQuery is the best solution available for working with raw data at the level of granularity and accuracy we need. The Google Analytics 360 interface is good, but was not able to answer all of the questions that people may have, and that’s where BigQuery kicks in, delivering much more advanced analysis that we then plug into Looker Studio for dashboards that make big datasets digestible for anyone at Zalando.”

Zalando uses BigQuery to combine Google Analytics 360 data with information from other data sources, too, such as social media APIs or measurements of page performance from Zalando’s bespoke infrastructure, which show website loading speed for every single page view with absolute granularity. By bringing those datasets together, Zalando teams can understand the impact of a slow or fast website on commercial KPIs, conversions, and other metrics. “That’s only possible because of BigQuery,” adds Jorge. “Not only because of how easy it is to stream data into it, but also because of how easy it is to combine it once it’s in there, and then make fast calculations with huge volumes of data. Crucially, we can do that without sampling, which could otherwise mean losing 50% of our data – something we cannot afford to do.”
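The kind of combination described above, joining per-page-view performance measurements with behavioral data to see how site speed relates to conversions, can be sketched in a few lines. This is a minimal illustration in plain Python, not Zalando's pipeline and not BigQuery SQL; the field names (`view_id`, `converted`), the sample rows, and the 2,000 ms threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical analytics rows: one per page view, with a conversion flag.
page_views = [
    {"view_id": "v1", "converted": True},
    {"view_id": "v2", "converted": False},
    {"view_id": "v3", "converted": False},
    {"view_id": "v4", "converted": False},
    {"view_id": "v5", "converted": True},
    {"view_id": "v6", "converted": True},
]

# Hypothetical performance measurements: load time in milliseconds per page view.
load_times_ms = {"v1": 800, "v2": 900, "v3": 3100, "v4": 2500, "v5": 1200, "v6": 4000}


def conversion_by_speed(views, load_times, threshold_ms=2000):
    """Join the two datasets on view_id and return the conversion rate per speed bucket."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [conversions, views]
    for view in views:
        load = load_times.get(view["view_id"])
        if load is None:
            continue  # no performance measurement recorded for this view
        bucket = "fast" if load < threshold_ms else "slow"
        totals[bucket][0] += int(view["converted"])
        totals[bucket][1] += 1
    return {bucket: conv / n for bucket, (conv, n) in totals.items()}


rates = conversion_by_speed(page_views, load_times_ms)
```

Because every page view is joined rather than a sample, the resulting rates reflect the full dataset, which is the property Jorge highlights.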

Because BigQuery is a managed service, it requires minimal maintenance, beyond its basic configuration, meaning Jorge and his team can spend time on adding value elsewhere. “Now, for example, we are working to import certain fields from our own databases into BigQuery to make it more efficient,” says Jorge. “We are looking to combine business events, such as parcel deliveries or order acknowledgements, with the user behavior that we track on our websites with Google Analytics 360.” Combining those two sources in BigQuery means Zalando can create highly complex reports at speed, delivering sophisticated insights that add value and make the company more competitive.

“We generate tens of billions of Google Analytics hits per month at Zalando,” says Jorge. “That amount of data just isn’t manageable with a more traditional database system. It needs something very advanced, like BigQuery, and thanks to the seamless integration between BigQuery and Google Analytics 360, it was easy to set up.”

Different Google products for different levels of expertise

Zalando employees use Google Workspace for communications, and for the first year of using BigQuery, the main reporting from the analytics solution went to Sheets, running scripts with the BigQuery API. “Something we really enjoyed in the early days of using BigQuery was the easy integration with Sheets,” says Jorge. “We could call the BigQuery API from Sheets and easily build dashboards in a very short amount of time.”

At the time, Looker Studio was in an earlier stage of development, and as it became more advanced, Zalando switched to using it for dashboards. Now Looker Studio dashboards are updated daily from BigQuery for reference across the company.

“Our central idea is to have different Google products for different user profiles, to democratize data and bring it to every corner of the organization,” says Jorge. “For hardcore analysts used to working with big volumes of data, BigQuery is just the perfect solution. The Google Analytics 360 interface is great for marketing, middle management, and other functions, and then on top of that we have Looker Studio connected to BigQuery. Those Looker Studio dashboards are consumed by the whole company, especially C-Suite executives and senior management. They provide an up-to-date, consolidated, single point of truth for our core KPIs.”

Exploring machine learning to leverage analytics

Using BigQuery for analytics means Zalando can generate insights much more rapidly than in the past. “One example of that is in our A/B testing,” says Jorge. “We do hundreds of A/B tests, and the analysis used to be really time consuming. With BigQuery we can do it in a couple of hours, and as soon as a test is over, we can have results the same day and decide how to take things further. That’s been a change in the way we work.”
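The source does not say which statistical method Zalando uses to analyze its A/B tests. As one common possibility, the sketch below applies a two-proportion z-test to conversion counts from a control and a variant; the function name and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns the z statistic and the two-sided p-value under the
    normal approximation with a pooled conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1,000 control visitors convert, 150 of 1,000 variant visitors.
z_score, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the difference in conversion rates is unlikely to be noise; the aggregated counts that feed such a test are what a warehouse query would produce once a test ends.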

Now Jorge and the Zalando team are exploring the use of Google AI and machine learning tools to leverage analytics data, looking to create predictive models for conversions, churn, and other key metrics.

“We are really happy with Google Cloud support,” says Jorge. “As soon as we raise concerns or needs, Google addresses them. Google acts less like a vendor and more like a partner. Instead of selling products, they're invested in finding solutions together, and that's something we really appreciate at Zalando.”
