What We've Learned to Control

I’m giving a keynote address at the virtual IFAC congress this July, and I submitted an abstract that forces me to reflect on the current state of research at the intersection of machine learning and control. 2020 is a particularly appropriate year for such reflection. On a personal level, I’ve been working in this space for about half a decade and wrote a blog series on the topic two years ago, so the timing seemed ideal. For the broader community, 2020 happens to be the year we were promised fleets of self-driving cars. Of course, for a myriad of reasons, we’re nowhere close to achieving that goal. Full self-driving has been a key motivator of work in learning-enabled autonomous systems, and it’s important to note this example as a marker of how difficult the problems in this space really are.

The research community has come to terms with this difficulty and has committed itself to addressing the many pressing challenges. Over the last year I attended several great meetings on this topic, including an NSF-funded workshop at UW, a plenary session at ITA, a workshop on intersections of learning, control, and optimization at IPAM, and the second annual conference on Learning for Dynamics and Control. There is clearly a ton of enthusiasm from researchers in many different disciplines, and we’re seeing fascinating results mixing techniques from machine learning, computer science, and control. Obviously, I’m going to be leaving out many incredible papers, but, focusing on the theoretical end of the spectrum, I’d highlight new work on policy optimization that demonstrates how simple optimization techniques can efficiently solve classic, nonconvex control-design problems, and work connecting regret minimization and adaptive control that provides nonasymptotic bounds.

This work has been very useful for establishing language to bridge communication between the diverse research camps interested in the space of learning, dynamics and automation. But, as I’ll discuss shortly, I’d argue that it hasn’t provided many promising approaches to improving large-scale autonomy. That’s ok! These problems are incredibly difficult and aren’t going to be solved by wishing for them to be solved. But I also think it might be worth taking a moment to reflect on which problems we are working on. It’s always a bit too easy for theorists to focus on improving technical results and lose sight of why someone proved those results in the first place.

As an illustrative example, my research group spent a lot of time studying the Linear Quadratic Regulator (LQR). The point of this work was initially to establish baselines: LQR has a closed-form solution when the model is known, so we wanted to understand how different algorithms might perform when the underlying model was unknown. It turns out that if you are willing to collect enough data, the best thing you can do for LQR is estimate the dynamical model and then exploit this model as if it were the truth. This so-called “certainty equivalent control” is what practitioners have been doing since the mid-60s to fly satellites and solve other optimal control problems. Proving this result required a bunch of new mathematical insights that established connections between high-dimensional statistics and automatic control theory. But it did not bring us closer to solving new challenges in robotics or autonomous systems. Our work merely showed that what the controls community had been doing for 50 years was already about as good as we could do for this important baseline problem.
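To make the certainty-equivalence recipe concrete, here is a minimal sketch with made-up dynamics and noise levels: excite the system with random inputs, fit the model by least squares, then solve the Riccati equation for the estimated model as if the estimate were the truth.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Toy example: certainty-equivalent LQR for an unknown linear system
# x_{t+1} = A x_t + B u_t + w_t. The dynamics below are invented.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Step 1: collect data by exciting the system with random inputs.
T = 500
X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))
for t in range(T):
    w = 0.01 * rng.normal(size=2)
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + w

# Step 2: least-squares estimate of [A, B] from the trajectory.
Z = np.hstack([X[:-1], U])                       # regressors (x_t, u_t)
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# Step 3: design the LQR controller as if the estimate were true.
Q, R = np.eye(2), np.eye(1)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
# Certainty-equivalent control law: u_t = -K x_t
```

With enough data the estimated model is close to the truth, and the resulting controller stabilizes the real system; quantifying exactly how much data is "enough" is what the nonasymptotic analysis mentioned above is about.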

So what are the ways forward? Are there things that theory-minded folks can work on short term that might help us understand paths towards improving learning systems in complex feedback loops? Let me suggest a few challenges that I see as both very pressing, but also ones where we might be able to make near-term progress.

Machine Learning is still not reliable technology

At the aforementioned IPAM meeting, Richard Murray gave a fantastic survey of the standards of reliability imposed in aerospace engineering. Go watch it! I don’t want to spoil it for you, but his discussion of Ram Air Turbines is gripping. Richard covers what is needed to reach the sorts of reliability we’d like in autonomous systems. Unfortunately, having 88.5% Top-1 accuracy on ImageNet—while a stunning achievement—doesn’t tell us how to get to systems with failure rates on the order of 1 in a billion. As Boeing has shown, cutting corners on autonomous system safety standards has horrible, tragic consequences.

How can we make machine learning more robust? How can we approach the failure rates needed for safe, reliable autonomy? And how can we establish testing protocols to assure we have such low failure rates?

Prediction systems in feedback loops

One particular aspect that I think is worth considering is how supervised learning systems can function as “sensors” in feedback loops. Even if you know everything about a dynamical system, when you observe the state via an estimator generated by a learned component, it’s not clear how to best take action on this observation. Most classic control and planning assumes that your errors in state-estimation are Gaussian or nicely uniformly bounded. Of course, the errors from machine learning systems are neither of these (I recommend checking out the crazy videos of the confusion that comes out of Tesla Autopilot’s vision systems). How to properly characterize the errors of machine learning systems for control applications seems like a useful, understudied problem. Using off-the-shelf machine learning analysis, you’d have to densely sample all of the possible scenarios in advance in order to guarantee the sort of uniform error bounds desired by control algorithms. This isn’t practical, and, indeed, it’s clear that this sort of sensor characterization is not needed to make reasonable demos work. Though it’s a bit mundane, I think a huge contribution lies in understanding how much data we need to quantify the uncertainty in learned perception components. It’s still not clear to me if this is a machine learning question or a closed-loop design question, and I suspect both views of the problem will be needed to make progress.
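As a toy illustration of the sampling question, here is what a purely empirical, conformal-style calibration of a learned “sensor” looks like. Everything here is invented: the ground truth, the perception module, and its heavy-tailed error model are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_state(scenario):
    # stand-in for logged ground truth in a scenario
    return np.sin(scenario)

def learned_sensor(scenario):
    # stand-in for a learned perception module: accurate in most
    # scenarios, with rare, large, decidedly non-Gaussian failures
    err = rng.normal(scale=0.05)
    if rng.random() < 0.01:
        err += rng.normal(scale=1.0)
    return true_state(scenario) + err

# Calibrate: measure errors on n held-out scenarios and take the
# empirical quantile at the desired coverage level.
n, coverage = 2000, 0.99
scenarios = rng.uniform(0, 2 * np.pi, size=n)
errors = np.array([abs(learned_sensor(s) - true_state(s)) for s in scenarios])
bound = np.quantile(errors, coverage)
print(f"empirical {coverage:.0%} error bound: {bound:.3f}")
```

Note what this does and does not buy you: the bound only certifies errors on scenarios distributed like the ones you sampled. A uniform bound over all scenarios, of the kind classic control analysis wants, is exactly where the dense-sampling requirement bites.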

Why are we all sleeping on model predictive control?

I remain baffled by how consistently model predictive control (MPC) is underappreciated. We’ll commonly see the same task at the same meeting, done once on a robot using some sort of deep reinforcement learning and once using model predictive control, and the disparity in performance is stark. It’s like the difference between watching an Olympic-level sprinter and me jogging in my neighborhood with a set of orthotics.

Here’s an example from the IPAM workshop. Martin Riedmiller presented work at DeepMind to catch a ball in a cup:

This system uses two cameras, has a rather large “cup” (it’s a wastepaper basket), and yet still takes 3 days to train on the robot. Francesco Borrelli presented a different approach. Using only a single camera, simple Newtonian physics, and MPC, they were able to achieve this performance on the standard-sized “ball-in-a-cup” toy:

If you only saw these two videos, I can’t fathom why you would invest all of your assets into deep RL. I understand there are still a lot of diehards out there, and I know this will offend them. But I want to make a constructive point: many theorists are spending a lot of time studying RL algorithms, but few in the ML community are analyzing MPC and why it’s so successful. We should rebalance our allocation of mental resources!

Now, while the basic idea of MPC is very simple, the theory gets hairy very quickly. It definitely takes some time and effort to learn how to prove convergence of MPC protocols. I’d urge the MPC crowd to connect more with the learning theory crowd to see if common ground can be found to better understand how MPC works and how we might push its performance even further.
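For readers who haven’t seen it, the basic receding-horizon idea fits in a few lines. Here is a minimal unconstrained sketch for a double integrator; real MPC would add the state and input constraints that make it so useful in practice, turning each step into a quadratic program.

```python
import numpy as np

# Double integrator x_{t+1} = A x_t + B u_t (position, velocity).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
H = 20  # planning horizon

# Prediction matrices: x_{k+1} = A^{k+1} x_0 + sum_{j<=k} A^{k-j} B u_j.
Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(H)])
Gamma = np.zeros((2 * H, H))
for k in range(H):
    for j in range(k + 1):
        Gamma[2 * k:2 * k + 2, j] = (np.linalg.matrix_power(A, k - j) @ B).ravel()

def mpc_step(x, u_penalty=0.1):
    # Minimize sum_k |x_k|^2 + u_penalty * u_k^2 over the horizon,
    # then apply only the FIRST input (that's the receding horizon).
    lhs = Gamma.T @ Gamma + u_penalty * np.eye(H)
    rhs = -Gamma.T @ (Phi @ x)
    u = np.linalg.solve(lhs, rhs)
    return u[0]

# Simulate the closed loop: re-plan at every step from the new state.
x = np.array([1.0, 0.0])
for _ in range(100):
    u = mpc_step(x)
    x = A @ x + (B * u).ravel()
print("final state:", x)  # driven toward the origin
```

The re-planning at every step is what makes MPC robust in practice: model errors and disturbances get corrected by the next solve rather than compounding over the whole trajectory.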

Perhaps we should stop taking cues from AlphaGo?

One of the grand goals in RL is to use function approximation algorithms to estimate value functions. The conventional wisdom asserts that the world is a giant Markov Decision Process and once you have its value function, you can just greedily maximize it and you’ll win at life. Now, this sort of approach clearly doesn’t work for robots, and I’m perplexed by why people still think it will work at all. Part of the motivation is that this approach was used to solve Go. But at some point I think we all have to come to terms with the fact that games are not the real world.

Now, I’d actually argue that RL does work in the real world, but it’s in systems that most people don’t actively think of as RL systems. Greedy value function estimation and exploitation is literally how all internet revenue is made. Systems simply use past data to estimate value functions and then choose the action that maximizes the value at the next step. Though seldom described as such, these are instances of the “greedy contextual bandit” algorithm, and this algorithm makes tech companies tons of money. But many researchers have also pointed out that this algorithm leads to misinformation, polarization, and radicalization.
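The greedy contextual bandit is easy to state. Here is a sketch with a made-up linear reward model and a made-up logging policy: fit a value estimate per action from logged data, then always act greedily on the estimate, with no exploration at all.

```python
import numpy as np

rng = np.random.default_rng(2)
n_actions, d = 3, 5
true_theta = rng.normal(size=(n_actions, d))  # unknown reward weights

def reward(context, action):
    # invented linear reward model with a little noise
    return true_theta[action] @ context + 0.1 * rng.normal()

# Logged data from a uniformly random historical policy.
logs = []
for _ in range(5000):
    c = rng.normal(size=d)
    a = rng.integers(n_actions)
    logs.append((c, a, reward(c, a)))

# "Estimate the value function": ridge regression per action.
theta_hat = np.zeros((n_actions, d))
for a in range(n_actions):
    C = np.array([c for c, aa, r in logs if aa == a])
    R = np.array([r for c, aa, r in logs if aa == a])
    theta_hat[a] = np.linalg.solve(C.T @ C + np.eye(d), C.T @ R)

def greedy_policy(context):
    # exploit: argmax of predicted value, never explore
    return int(np.argmax(theta_hat @ context))
```

That’s the whole algorithm. The feedback effects come from what the sketch leaves out: in deployment, the greedy policy determines which data gets logged next, so the system keeps reinforcing whatever its current value estimates favor.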

Everyone tries to motivate RL by the success of AlphaGo, but they should be using the success of Facebook and Google instead. If they did, I think it would be much clearer why RL is a terrifying and dangerous technology, one whose limitations we desperately need to understand so that we can build safer tools.

Lessons from the 70s about optimal control

I have one set of ideas along these lines that, while I think it is important, I am still having a hard time articulating. Indeed, I might take a few blog posts to work through my thoughts on this, but let me close this post with a teaser of discussions to come. As I mentioned above, optimal control was a guiding paradigm for a variety of control applications in the 60s and 70s. During this time, it seemed like there might even be hidden benefits to a full-on optimization paradigm: though you’d optimize a single, simple objective, you would often get additional robustness guarantees for free. However, it turned out that this was very misleading and that there were no guarantees of robustness even for simple optimal control problems. This shouldn’t be too surprising: if you devote a lot of resources to one objective, you are likely neglecting some other objective. But showing how and why these fragilities arise is quite delicate. It’s not always obvious how you should be devoting your resources.

Trying to determine how to allocate engineering resources to balance safety and performance is the heart of “robust control.” One thing I’m fascinated by moving forward is whether any of the early developments in robust control might transfer over to a new kind of “robust ML.” Unfortunately for all of us, robust control is a rather encrypted literature. There is a lot of mathematics, but often no clear statement of why we study particular problems or what the fundamental limits of feedback are. While diligent young learning theorists have been scouring classic control theory textbooks for insights, these books don’t always articulate what we can and cannot do, or which problems control theory might help solve. We still have a lot of work to do in communicating what we know and what problems remain challenging. I think it would be useful for control theorists to think about how best to communicate the fundamental concepts of robust control. I hope to take up this challenge in the next few months on this blog.

I’d like to thank Sarah Dean, Horia Mania, Nik Matni, and Ludwig Schmidt for their helpful feedback on this post. I’d also like to thank John Doyle for several inspiring conversations about robustness in optimal control and on the encrypted state of the control theory literature.