Motivation & Goal
Manual testing of Android apps is time-consuming, repetitive, and doesn’t scale well. Even scripted tests break frequently when UIs change. As a research associate at AIFB (KIT), I explored whether a reinforcement learning agent could autonomously test Android apps by learning to interact with them the way a user would, without being explicitly told what to do.
Approach & Architecture
The solution was built on android_env - DeepMind's RL environment that bridges the Android Emulator and RL agents - together with Stable-Baselines3 for training.
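As a rough sketch of how the pieces fit together: the emulator-backed environment is loaded from a task definition, adapted to the Gym interface Stable-Baselines3 expects, and handed to a PPO learner. All paths, the AVD name, and the task file below are placeholders for a local setup, and the exact `android_env.load` signature and wrapper names vary between android_env versions.

```python
import android_env
from android_env import wrappers
from stable_baselines3 import PPO

# Load the emulator-backed environment from a task definition.
# Paths, AVD name and task file are placeholders for the local setup.
env = android_env.load(
    avd_name="pixel_2_api_30",
    android_avd_home="~/.android/avd",
    android_sdk_root="~/Android/Sdk",
    emulator_path="~/Android/Sdk/emulator/emulator",
    adb_path="~/Android/Sdk/platform-tools/adb",
    task_path="tasks/app_under_test.textproto",
)

# android_env speaks dm_env, so it is adapted to the Gym interface that
# Stable-Baselines3 expects; further wrappers (discretizing touch actions,
# reducing the observation to the preprocessed screenshot) are omitted here.
env = wrappers.GymInterfaceWrapper(env)

# With the full observation dict, SB3's MultiInputPolicy applies; once the
# observation is reduced to pixels, a CnnPolicy can be used instead.
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
```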
Key components:
- Agent: Trained with Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C).
- Observation space: Initially raw pixel screenshots, later downscaled and preprocessed to speed up training.
- Reward signal: Based on the discovery of new screens - each observed screen was embedded and compared via cosine similarity against a vector database of previously seen screens, and the agent was rewarded for reaching novel states (see the sketch after this list).
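The sketch below illustrates the novelty reward as a Gym-style wrapper. It is a simplification: `embed_fn` stands in for the pretrained image encoder used to embed screenshots, the embeddings are kept in a plain in-memory list rather than the vector database used in the project, and the threshold value is illustrative.

```python
import gymnasium as gym
import numpy as np


class NoveltyRewardWrapper(gym.Wrapper):
    """Rewards the agent for reaching screens unlike anything seen so far.

    embed_fn is assumed to map an observation to an embedding vector; in the
    project this came from a pretrained image encoder, and embeddings were
    stored in a vector database instead of the in-memory list used here.
    """

    def __init__(self, env, embed_fn, novelty_threshold=0.9):
        super().__init__(env)
        self.embed_fn = embed_fn
        self.novelty_threshold = novelty_threshold
        self.seen = []  # embeddings of screens already visited

    def step(self, action):
        # The underlying task reward is ignored; exploration drives learning.
        obs, _, terminated, truncated, info = self.env.step(action)
        emb = np.asarray(self.embed_fn(obs), dtype=np.float32)
        emb = emb / (np.linalg.norm(emb) + 1e-8)

        if not self.seen:
            reward, is_novel = 1.0, True
        else:
            # Cosine similarity to the closest previously seen screen.
            max_sim = float((np.stack(self.seen) @ emb).max())
            is_novel = max_sim < self.novelty_threshold
            reward = 1.0 if is_novel else 0.0

        if is_novel:
            self.seen.append(emb)
        return obs, reward, terminated, truncated, info
```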
I also built custom components to address the cold-start problem: the feature extractor of a pretrained YOLOv5 model was integrated into the agent's policy network, and a wrapper that penalized clicks far away from visible UI elements sped up early training (sketched below).
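A minimal sketch of that penalty wrapper, under two assumptions not in the original: `detect_fn` is a hypothetical callable returning the normalized (x, y) centers of visible UI elements (in the project these came from YOLOv5 detections on the current screenshot), and the action is assumed to contain a normalized (x, y) touch position.

```python
import gymnasium as gym
import numpy as np


class ClickPenaltyWrapper(gym.Wrapper):
    """Penalizes taps that land far from any detected UI element."""

    def __init__(self, env, detect_fn, max_penalty=0.5):
        super().__init__(env)
        self.detect_fn = detect_fn      # hypothetical UI-element detector
        self.max_penalty = max_penalty  # penalty scale for distant taps
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        # Detect UI elements on the screen the agent acted upon.
        centers = np.asarray(self.detect_fn(self._last_obs), dtype=np.float32)
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._last_obs = obs

        tap = np.asarray(action[:2], dtype=np.float32)
        if centers.size > 0:
            # Subtract a penalty proportional to the distance to the
            # nearest visible UI element.
            dist = float(np.linalg.norm(centers - tap, axis=1).min())
            reward -= self.max_penalty * dist
        return obs, reward, terminated, truncated, info
```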
Challenges & Limitations
Emulator speed was a major bottleneck - it limited parallelism and rollout throughput, ultimately making real progress hard on the available hardware.
The agent also had difficulty generalizing across apps, which suggests the need for hierarchical or curriculum learning in future versions.
Learnings
This project gave me hands-on experience applying RL in non-game, real-world settings:
- Designing reward functions for sparse and indirect objectives
- Using Gym-compatible wrappers and modifying training pipelines