Motivation
In the field of automated mobile app testing, training AI models to identify UI elements is critical. However, the standard approach of manually drawing bounding boxes around buttons, icons, and other elements is time-consuming, error-prone, and fundamentally unscalable. I saw an opportunity to automate this process entirely, creating a system that could generate a vast, high-quality dataset with perfectly accurate ground-truth labels, freeing up human resources and accelerating research.
Approach
My solution was a multi-part system that fully automated the data generation and training workflow.
I identified that simple bounding boxes (used by models like YOLO) are insufficient for UI elements, which can be non-rectangular, rotated, or partially obscured. Therefore, I opted for pixel-perfect segmentation masks to capture the precise shape of each element.
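A toy illustration of the gap (the grid and shape here are my own, not from the thesis): for an L-shaped element, even a tight axis-aligned bounding box labels many background pixels as "element", while a pixel mask does not.

```kotlin
// Toy pixel grid, illustrative only: an L-shaped UI element whose tight
// axis-aligned bounding box spans the whole 6x6 grid.
fun main() {
    val h = 6
    val w = 6
    // L-shape: a 6x2 vertical bar plus a 2x6 horizontal bar at the bottom.
    val mask = Array(h) { r -> BooleanArray(w) { c -> c < 2 || r >= 4 } }

    val maskArea = mask.sumOf { row -> row.count { it } }
    val boxArea = h * w  // the bounding box of the L covers every pixel

    // The box mislabels 16 background pixels that the mask excludes.
    println("mask area = $maskArea, box area = $boxArea")
}
```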
I developed a custom Android application in Kotlin that acted as an automated data-generation pipeline:
- Programmatically generate randomized layouts using standard Material Design components.
- Capture a screenshot of the complete UI.
- Systematically generate a perfect segmentation mask for each interactive element by rendering it in solid black against a pure white background.
The entire process was orchestrated by a remote server, allowing thousands of image-mask pairs to be generated without any human intervention.
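The per-element masking step can be sketched as follows. This is a simplified stand-in: in the real pipeline the target Android View is drawn in solid black on a pure white canvas and saved as an image, whereas here a Boolean grid plays the role of that black/white bitmap, and `UiElement` and `renderMask` are illustrative names of my own.

```kotlin
// Simplified stand-in for the mask-rendering step: a Boolean grid stands
// in for the black-on-white bitmap produced by rendering a single View.
data class UiElement(val x: Int, val y: Int, val width: Int, val height: Int)

fun renderMask(screenW: Int, screenH: Int, target: UiElement): Array<BooleanArray> {
    val mask = Array(screenH) { BooleanArray(screenW) }        // all white
    for (r in target.y until minOf(target.y + target.height, screenH)) {
        for (c in target.x until minOf(target.x + target.width, screenW)) {
            mask[r][c] = true                                  // element pixels: black
        }
    }
    return mask
}

fun main() {
    val m = renderMask(8, 8, UiElement(2, 2, 3, 3))
    println(m.sumOf { row -> row.count { it } })  // 9
}
```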
To take full advantage of the pixel-perfect masks, I chose Mask R-CNN, a powerful instance segmentation model. This was a deliberate step up from simpler object detection models such as YOLO, tailored to the precision the task required. Finally, I trained the Mask R-CNN model on the synthetically generated dataset and evaluated its performance on real-world applications to test its generalization capabilities.
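Mask R-CNN's training targets pair each instance mask with a bounding box and class label. A small helper, sketched here under the assumption that masks arrive as binary grids (the function and names are mine, not from the thesis), can derive the box directly from the mask so the two annotations never drift apart:

```kotlin
// Derive the tight bounding box (x0, y0, x1, y1, inclusive) from a binary
// mask, as needed when packing synthetic masks into Mask R-CNN-style
// training targets. Returns null for an empty mask.
fun maskToBox(mask: Array<BooleanArray>): IntArray? {
    var x0 = Int.MAX_VALUE; var y0 = Int.MAX_VALUE
    var x1 = -1; var y1 = -1
    for (r in mask.indices) for (c in mask[r].indices) if (mask[r][c]) {
        if (c < x0) x0 = c
        if (c > x1) x1 = c
        if (r < y0) y0 = r
        if (r > y1) y1 = r
    }
    return if (x1 < 0) null else intArrayOf(x0, y0, x1, y1)
}

fun main() {
    // Element pixels occupy rows 2..4 and columns 3..6.
    val mask = Array(8) { r -> BooleanArray(8) { c -> r in 2..4 && c in 3..6 } }
    println(maskToBox(mask)?.joinToString())  // 3, 2, 6, 4
}
```

Deriving the box from the mask (rather than storing both) keeps the synthetic annotations internally consistent by construction.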
Results and Academic Recognition
The project was a resounding success and validated the automated approach.
- Academic Excellence: The thesis received the highest possible grade (1.0) and was praised by my professor for its depth and innovation.
- Conference Submission: On my advisor's recommendation, the research was written up as a formal paper and submitted to an international software testing conference (AST 2023). While it was ultimately rejected, the submission itself is a testament to the work's quality.
- High-Fidelity Model: The resulting model demonstrated strong performance in identifying and segmenting UI elements in real-world apps, proving that a model trained purely on synthetic data could generalize effectively.
Learnings
In this project I learned:
- How to design and build robust, automated data generation pipelines.
- The power of synthetic data to solve real-world ML problems and overcome the limitations of manual labeling.
- The architectural differences and practical trade-offs between object detection (YOLO) and instance segmentation (Mask R-CNN).
- The formal process of academic research, writing, and submission at a conference.