alphalist Blog

How to Create Simulations for Reinforcement Learning


How can you accurately simulate situations to prepare autonomous vehicles for urban environments? In a recent CTO podcast, Maria Meier (CTO of Phantasma Labs) shared what goes into creating simulations, which are a key aspect of reinforcement learning. She also shared the technical challenges they face in creating simulations for autonomous vehicles and how the simulation landscape as a whole is being standardised. This is a written version of her podcast for those who prefer to read.

Table of Contents
  • What is Reinforcement Learning
  • The Role of the Simulator in Reinforcement Learning
  • How Simulations are Used in the Autonomous Vehicle Industry
    • How the Simulator Works
    • The Challenges of Simulating Pedestrian Behaviour in an Urban Environment
      • The Effects of Geography and Culture in Traffic Patterns
      • What Makes a Human-like Trajectory?
    • Tackling the Challenge as a DeepTech Company
  • Standards in the Simulation Industry
    • OpenDrive
    • OSI (Open Simulation Interface - formerly Open Sensor Interface)

What is Reinforcement Learning

Reinforcement Learning is a machine learning technique that trains models by rewarding good decisions and/or punishing bad ones. Until now, it hasn’t been as popular as other machine learning paradigms because the resources for effective reinforcement learning were not available. One core requirement for Reinforcement Learning is a simulator: not only is it impractical to set up trial-and-error physical environments (how many cars do you want to crash?), but one also needs a way to measure the model’s progress through benchmarking and tests. However, with simulators becoming more accessible, reinforcement learning is going to become more popular.
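To make the "reward good decisions, punish bad ones" idea concrete, here is a minimal sketch using tabular Q-learning, one common reinforcement learning algorithm. The corridor task, the reward values and all names are assumptions made up for illustration; they do not come from the podcast.

```python
import random

# Hypothetical toy task: an agent on a 5-cell corridor starts in cell 0.
# Reaching the last cell earns +1 (reward); every other step costs 0.01 (mild punishment).
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: nudge each estimate toward reward + discounted future value."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally, otherwise take the action currently believed best.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Best known action in every non-terminal cell after training.
greedy_policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q_table[s][i])]
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

Note that the `step` function here is already a tiny simulator: the agent can fail as often as it likes at no real-world cost.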

The Role of the Simulator in Reinforcement Learning

An environment, built in a simulator, is the core aspect of Reinforcement Learning. It provides virtual yet realistic scenarios to train the model on and also allows you to track progress. It is a great way to train AI for a situation we don’t want to recreate in reality. One example of a simulator is OpenAI Gym, which many researchers use to build environments, and whose standard environments they use to benchmark how their algorithms perform. But many of the standard environments offered are 2D, and you still need to do a lot of work. Therefore there is a huge field for creating simulators that work in advanced cases. For example, a simulator for autonomous driving will need to be 3D and physically realistic. Not only can pedestrians not go beyond a certain speed, but all pedestrians are different. This is why there are companies like Phantasma that create specialised simulations for particular use cases.
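As a rough illustration of what "an environment" means in practice, the sketch below follows the reset/step convention popularised by Gym, but it is written in plain Python and does not use the Gym library. The `CrossingEnv` class, its physics and its reward values are all hypothetical simplifications.

```python
import random


class CrossingEnv:
    """Hypothetical 1D toy environment: a car approaches a crossing at distance 0."""

    def __init__(self, start_distance=30.0, speed=10.0, dt=0.1, seed=None):
        self.start_distance = start_distance  # metres from the crossing
        self.start_speed = speed              # metres per second
        self.dt = dt                          # seconds per simulation step
        self.rng = random.Random(seed)

    def reset(self):
        """Start a new episode and return the first observation."""
        self.distance = self.start_distance
        self.speed = self.start_speed
        # The pedestrian steps onto the crossing at a random time (assumed range).
        self.pedestrian_time = self.rng.uniform(0.5, 2.0)
        self.t = 0.0
        return self._observe()

    def _observe(self):
        pedestrian_visible = self.t >= self.pedestrian_time
        return (self.distance, self.speed, pedestrian_visible)

    def step(self, brake):
        """Advance one tick; `brake` is True or False. Returns (obs, reward, done)."""
        if brake:
            # Assumed constant deceleration of 8 m/s^2.
            self.speed = max(0.0, self.speed - 8.0 * self.dt)
        self.distance -= self.speed * self.dt
        self.t += self.dt
        pedestrian_on_road = self.t >= self.pedestrian_time
        if self.distance <= 0.0 and pedestrian_on_road:
            return self._observe(), -100.0, True  # collision: heavy penalty
        if self.speed == 0.0 or self.distance <= 0.0:
            return self._observe(), 1.0, True     # stopped safely or cleared the crossing
        return self._observe(), 0.0, False


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def cautious(obs):
    """Hand-written policy: brake as soon as the pedestrian is visible."""
    return obs[2]


def never_brake(obs):
    """Hand-written policy that ignores the pedestrian."""
    return False


print(run_episode(CrossingEnv(seed=1), cautious))
print(run_episode(CrossingEnv(seed=1), never_brake))
```

A learning algorithm would replace the hand-written policies, and the total reward per episode is the kind of number you would track to benchmark progress.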

How Simulations are Used in the Autonomous Vehicle Industry

Training autonomous vehicles to drive on highways is much easier than training them to drive in urban areas, because of pedestrian traffic. This is why much work has been done on highway driving and less on urban driving. In fact, some models are almost ready for highway use, but a human would need to take over on urban roads. The entire field of urban driving is in pre-development. How will a car recognise a pedestrian and know when to stop for them? One way this is being tackled is trajectory prediction, which predicts where the pedestrian is likely to be in the next five seconds so that the car can respond accordingly. But any new form of autonomous mobility (self-driving car or delivery robot) is extremely hard to test safely in the real world when people are involved. Say you are a self-driving car engineer at some car company and you want to test whether your automated braking system is going to work. You cannot ask your colleague or your neighbour, “Hey, can you cross in front of my car so I can test if the brake is going to work?” That would be unethical.
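The simplest possible form of trajectory prediction is to assume the pedestrian keeps walking at their current speed and heading. The sketch below shows that baseline over a five-second horizon; production systems typically use learned models instead, and the function names, the coordinate convention and the lane bounds here are assumptions for illustration only.

```python
def predict_constant_velocity(track, horizon_s=5.0, step_s=1.0):
    """Extrapolate a pedestrian track assuming speed and heading stay constant.

    `track` is a list of (t, x, y) observations in seconds and metres.
    Returns a list of predicted (t, x, y) points covering the horizon.
    """
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    dt = t1 - t0
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    steps = int(round(horizon_s / step_s))
    return [(t1 + k * step_s, x1 + vx * k * step_s, y1 + vy * k * step_s)
            for k in range(1, steps + 1)]


def enters_lane(prediction, lane_y_min, lane_y_max):
    """Return the first predicted time at which the pedestrian is in the lane, or None."""
    for t, _, y in prediction:
        if lane_y_min <= y <= lane_y_max:
            return t
    return None


# A pedestrian observed twice, walking at 1 m/s toward a lane spanning y = 0 to 3.5 m.
observed = [(0.0, 0.0, -4.0), (1.0, 0.0, -3.0)]
prediction = predict_constant_velocity(observed)
print(enters_lane(prediction, 0.0, 3.5))
```

The car's planner could use the returned time to decide whether it needs to brake now. The baseline fails exactly where real pedestrians are interesting: when they stop, turn or start running.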

There are crash test dummies, but those are not diverse. For example, the dummy might represent a grown-up male, but what happens as soon as you have someone who deviates from that, like a child, an old person or someone who looks completely different? So the industry has started to use simulations very heavily.

(This is good, as the best way to train an autonomous vehicle is through a combination of simulations and data collection. Too many companies just collect data, but that is a very ‘brute force’ approach.)

But when we looked at those simulators, we realised that while they were great at showing the world from a specific perspective, the pedestrians and the cyclists were extremely rule-following. The pedestrians remained on the sidewalk the whole time. They crossed only when the light was green. But nowhere in the world - not even in Switzerland - do people follow the rules 100% of the time. There needs to be a simulator that generates all the scenarios of pedestrian traffic that a typical car might be exposed to. This is where Phantasma comes in - we enhance existing simulators with realistic VRUs (vulnerable road users) to test self-driving functions in a realistic way. We are a plugin that gives existing simulators human-like behaviour, if you will. We provide really realistic behaviour, so the model will be exposed to someone who is taking more risks or someone who might be drunk, people of all age groups etc.

How the Simulator Works

Let's say someone wants to test an automated braking system. (In this case we are not talking about a fully autonomous car.) In the simulation, you have a simple map where a car is taking a right turn and, shortly after the right turn, there's a pedestrian crossing the road. The engineer wants to understand: “Is my car going to brake in time? The pedestrian is coming from behind another car, so it's hard to perceive that pedestrian.” The simulation then provides a pedestrian who crosses the road in many diverse ways, with different types of trajectories, different speeds and different risk profiles. The engineer could set up the simulation to run, say, 2000 times and then ask, “How many times (this will sound funny) did I kill the pedestrian? Did I hurt the pedestrian?” They will get a report. It's almost like running a gigantic set of unit tests and seeing what is going wrong. Then the engineer will adapt their own algorithm and test again to see whether they are now better prepared for these specific behaviour types. That is why a frequent requirement in the industry is to have repeatable tests on a deterministic pedestrian, so the engineer can understand specific behaviours, e.g. “Is my brake working or not?”

This is a very common case. But you can also zoom out, and then you have a bigger map with pedestrians spawning across the map. The pedestrians that appear go to different goal points along the way, producing different interactions with drivers or autonomous vehicles. It really looks like a game. So when people walk by my desk, they might think I am playing Sims or GTA :)
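The "gigantic set of unit tests" idea from the braking example can be sketched as a batch of seeded runs. Everything below is a hypothetical simplification: the risk profiles, the distance ranges and the stopping-distance physics are assumptions, not Phantasma's actual model. The point is that a fixed seed gives a deterministic pedestrian, so a changed braking algorithm can be re-tested against exactly the same 2000 crossings.

```python
import random
from collections import Counter

# Hypothetical risk profiles: the distance (m) to the car at which each type steps out.
PROFILES = {"cautious": (15.0, 40.0), "average": (8.0, 30.0), "risky": (3.0, 15.0)}


def run_scenario(seed, car_speed=10.0, brake_decel=8.0, reaction_s=0.3):
    """Simulate one occluded-crossing run; the seed fully determines the pedestrian."""
    rng = random.Random(seed)
    profile = rng.choice(sorted(PROFILES))
    gap = rng.uniform(*PROFILES[profile])
    # Stopping distance = distance covered while reacting + braking distance v^2 / (2a).
    stopping = car_speed * reaction_s + car_speed ** 2 / (2 * brake_decel)
    outcome = "collision" if stopping > gap else "stopped_in_time"
    return profile, outcome


def run_batch(n_runs=2000, **car):
    """Run seeds 0..n_runs-1 and count (profile, outcome) pairs."""
    return Counter(run_scenario(seed, **car) for seed in range(n_runs))


def count_collisions(report):
    """Total number of runs that ended in a collision, across all profiles."""
    return sum(n for (_, outcome), n in report.items() if outcome == "collision")


baseline = run_batch()
improved = run_batch(reaction_s=0.1)  # same pedestrians, faster-reacting brake logic
for name, report in (("baseline", baseline), ("improved", improved)):
    print(f"{name}: {count_collisions(report)} collisions in {sum(report.values())} runs")
```

Because the report is keyed by profile, the engineer can also see which behaviour types still cause trouble after a change.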

The Challenges of Simulating Pedestrian Behaviour in an Urban Environment

The problem that we are solving is quite hard because, at the end of the day, what we would like to have is a generalised behaviour model of human beings for different parts of the world and even different locations. A pedestrian in a roundabout, for example, will walk extremely differently than one at an intersection. There are also crowded cities and less crowded cities.

The Effects of Geography and Culture in Traffic Patterns

Furthermore, traffic of both cars and pedestrians differs from country to country and city to city. If you go to the UK, people behave differently. They drive on the left side of the road. People have to look the other way, and you cannot just mirror your simulation and flip everything around. That is unfortunately not possible. In India you have many more people, different types of actors on the road, and people use different modes of transport. So it's an incredibly hard problem. We have the ambition to cover different geographical areas, and our customers want a generalised model that they can just drop in wherever in the world they might be and that will just work. But we are not there yet. We will probably have to geofence autonomous vehicle models until we get to that stage of DeepTech.

What Makes a Human-like Trajectory?

We are also still working out what makes a trajectory human-like. Do we look at a trajectory just with kinematic metrics, such as speed values, or is there a behavioural component as well? In truth it's all these things. Then we talk about different types of people, e.g. an old lady, someone using a wheelchair, someone who is blind and someone returning from Oktoberfest who is incredibly drunk. There's incredible diversity in humankind, which is beautiful - but it requires a lot of work to turn this into a technical product.
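The kinematic side of that question is the easier one to put into code. The sketch below computes a few simple speed and acceleration summaries for a trajectory and checks them against thresholds. The thresholds are illustrative assumptions for a walking pedestrian, not validated values, and passing this check says nothing about the behavioural component.

```python
import math


def kinematic_metrics(trajectory):
    """Compute simple kinematic summaries for a list of (t, x, y) points."""
    times = [p[0] for p in trajectory]
    # Speed on each segment between consecutive points.
    speeds = [math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
              for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:])]
    # Acceleration between neighbouring segments, using the time between segment midpoints.
    accels = [(speeds[i + 1] - speeds[i]) / ((times[i + 2] - times[i]) / 2)
              for i in range(len(speeds) - 1)]
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_speed": max(speeds),
        "max_abs_accel": max((abs(a) for a in accels), default=0.0),
    }


def looks_kinematically_plausible(metrics, max_speed=3.0, max_accel=3.0):
    """Check against assumed limits (m/s and m/s^2) for a walking pedestrian."""
    return metrics["max_speed"] <= max_speed and metrics["max_abs_accel"] <= max_accel


walk = [(0.0, 0.0, 0.0), (1.0, 1.2, 0.0), (2.0, 2.5, 0.0), (3.0, 3.9, 0.0)]
jump = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 9.0, 0.0)]  # 8 m in one second

print(looks_kinematically_plausible(kinematic_metrics(walk)))
print(looks_kinematically_plausible(kinematic_metrics(jump)))
```

A check like this can rule out physically impossible trajectories, but a trajectory can pass it and still not look like anything a real person would do.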

Tackling the Challenge as a DeepTech Company

As a DeepTech company, our development cycles are just incredibly long. We cannot just start and sell something right out of the gate. We have to start small and go in pieces. Our product still requires a lot of R&D work. Typically, to solve a machine learning problem you have to test and validate your solution against some metrics. But how do you define a metric to quantify whether a trajectory that your model produced is human-like or not? That's a very difficult thing to do. Even skimming research papers, there's no definite answer here. So you have to come at it from different viewpoints. That is why we still have an R&D department in our company and are continuously researching, because it's a safety-critical application at the end of the day, and our mission is to be safe here.

Standards in the Simulation Industry

In the simulation world, it's not common to have a one-size-fits-all simulator. It therefore makes sense to have standards so that specialised simulators can work together. The industry is working on that, but so far two come to mind.

  • OpenDrive: This is a standard that makes maps and locations much more manageable. OpenDrive is just an XML-based file format which you can use to describe a map (how the lanes should be; where the traffic signs and traffic lights are). On top of that you have something called OpenScenario, where you can describe dynamic things that are happening on an OpenDrive map. In the simulation industry, I see that many more tools are now able to read OpenDrive files. This is a nice thing, because if you have an OpenDrive file, you can generate a 3D map from it and then work with that. This way, if we integrate with another simulator that uses OpenDrive, we can immediately populate our own simulator on our end with that 3D map.

  • OSI (Open Simulation Interface - formerly Open Sensor Interface): This describes how we can couple two or more different simulators together. Say you have a simulator that is great at rendering specific sensors. Then you have another simulator (perhaps the main simulator), called the environment simulator, that displays the world in 3D. You might have our simulator that provides the human component. So you have to integrate them. Wouldn't it be nice to have a standard way of exchanging information in the simulation world as well? It works like an API, but if you couple two simulations together, you can exchange information every four milliseconds and go back and forth, back and forth. There is quite a lot happening on the technical side.
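To give a feel for the map format mentioned in the OpenDrive bullet, the sketch below parses a heavily simplified map fragment with Python's standard XML parser. The fragment is loosely modelled on OpenDrive-style element names and is an assumption for illustration: a real file contains a header, road geometry and much more, and the exact schema should be taken from the official specification.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment: one road with driving lanes, sidewalks and a signal.
SAMPLE_MAP = """
<OpenDRIVE>
  <road name="Main Street" length="120.0" id="1" junction="-1">
    <lanes>
      <laneSection s="0.0">
        <left>
          <lane id="1" type="driving" level="false"/>
          <lane id="2" type="sidewalk" level="false"/>
        </left>
        <center>
          <lane id="0" type="none" level="false"/>
        </center>
        <right>
          <lane id="-1" type="driving" level="false"/>
          <lane id="-2" type="sidewalk" level="false"/>
        </right>
      </laneSection>
    </lanes>
    <signals>
      <signal s="100.0" t="-4.0" id="10" name="TrafficLight" dynamic="yes" orientation="+"/>
    </signals>
  </road>
</OpenDRIVE>
"""


def summarise_map(xml_text):
    """Return one dict per road with its lanes and signals."""
    root = ET.fromstring(xml_text)
    roads = []
    for road in root.iter("road"):
        lanes = [(int(lane.get("id")), lane.get("type")) for lane in road.iter("lane")]
        signals = [(sig.get("name"), float(sig.get("s"))) for sig in road.iter("signal")]
        roads.append({
            "name": road.get("name"),
            "length_m": float(road.get("length")),
            "lanes": lanes,
            "signals": signals,
        })
    return roads


roads = summarise_map(SAMPLE_MAP)
# Sidewalk lanes are where a pedestrian simulator would spawn its actors.
sidewalks = [lane_id for lane_id, lane_type in roads[0]["lanes"] if lane_type == "sidewalk"]
print(roads[0]["name"], sidewalks, roads[0]["signals"])
```

Because the map is plain structured text, any tool that understands the format can rebuild the same road network, which is what makes sharing maps between simulators practical.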

I'm looking forward to seeing some of these standards develop, because for a small company like ours it would be amazing not to have to adapt to all the different simulators that are out there, but rather to just provide a simulator and implement that standard interface. I think we will see some nice developments there in the next two years. OpenDrive seems to be becoming more user-friendly, which wasn’t always the case.
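The appeal of a standard interface can be sketched as follows: if every simulator implements the same `step` method over a shared state message, a coordinator can couple them in lock-step without knowing anything about their internals. This is not the OSI API. The `WorldState` message, the `Simulator` protocol and both simulator classes are hypothetical stand-ins, and only the four-millisecond exchange interval is taken from the text above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class WorldState:
    """Hypothetical shared message; real interface messages are far richer."""
    time_ms: int = 0
    car_x: float = 0.0          # metres along the road
    pedestrian_y: float = -2.0  # metres from the lane edge


class Simulator(Protocol):
    """The one method every coupled simulator has to implement."""
    def step(self, state: WorldState, dt_ms: int) -> WorldState: ...


class VehicleSim:
    """Stand-in for an environment simulator that moves the car."""
    def __init__(self, speed_mps: float = 10.0):
        self.speed_mps = speed_mps

    def step(self, state: WorldState, dt_ms: int) -> WorldState:
        state.car_x += self.speed_mps * dt_ms / 1000.0
        return state


class PedestrianSim:
    """Stand-in for a plugin that provides the human component."""
    def __init__(self, speed_mps: float = 1.5):
        self.speed_mps = speed_mps

    def step(self, state: WorldState, dt_ms: int) -> WorldState:
        state.pedestrian_y += self.speed_mps * dt_ms / 1000.0
        return state


def co_simulate(simulators, duration_ms: int, dt_ms: int = 4) -> WorldState:
    """Advance all simulators in lock-step, passing the shared state back and forth."""
    state = WorldState()
    for _ in range(duration_ms // dt_ms):
        for sim in simulators:
            state = sim.step(state, dt_ms)
        state.time_ms += dt_ms
    return state


final = co_simulate([VehicleSim(), PedestrianSim()], duration_ms=1000)
print(final)
```

With this shape, adding a third simulator (say, one that renders sensors) means adding one more object to the list rather than writing a custom adapter for every pairing.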


We are about to see an explosion in the use of Reinforcement Learning as simulators become more readily available. There is a need for specialised simulators - for example, in autonomous vehicles there needs to be a simulator that handles the human behaviour component of pedestrians in urban settings. Furthermore, standards are emerging that will help integrate specialised simulations so models can be trained in the most comprehensive way.

Maria Meier

CTO @ Phantasma

Maria is the CTO at Phantasma, which she also co-founded. Through her work, Maria is working towards enabling safe interaction between self-driving cars and vulnerable road users. Throughout her career, Maria has aimed to promote technology and the adoption of AI in Germany. She has gained a number of certifications and licenses, enhancing her skills in fields such as Machine Learning and Big Data.