Study on fighter pilots and drone swarms sheds light on the dynamics of trust within human-machine teams

(U.S. Air Force photo/R. Nial Bradshaw)

In a new study published in the Journal of Cognitive Engineering and Decision Making, researchers from the U.S. Air Force, Leidos, and Booz Allen Hamilton have taken a significant leap in understanding the dynamics of trust within human-machine teams, specifically in the context of military operations involving unmanned aerial vehicles (UAVs).

This research illuminates how fighter pilots’ trust in one component of a technological system can affect their trust in the system as a whole—a phenomenon known as the pull-down effect. Crucially, the study finds that experienced pilots can differentiate between reliable and unreliable UAVs, suggesting that the pull-down effect can be mitigated, thereby enhancing mission performance and reducing cognitive workload.

Trust between humans and machines is a pivotal factor in the successful deployment of autonomous systems, especially in high-stakes environments like military operations. It is considered a cornerstone of effective human-machine interaction, influencing operators’ reliance on technology.

Prior research has shown that trust in automation is directly linked to system reliability. However, when multiple autonomous systems are involved, as in the case of UAV swarms, judging reliability becomes significantly more complex. This complexity introduces the risk of the pull-down effect, where trust in all system components is reduced due to the unreliability of a single element.

“I was interested in conducting this research because the pull-down effect is a phenomenon based in heuristic responding that has relevance to the U.S. Air Force,” explained study author Joseph B. Lyons, the senior scientist for Human-Machine Teaming at the Air Force Research Laboratory and co-editor of Trust in Human-Robot Interaction.

“In the Air Force, we need to understand how humans respond to human-machine interactions and, in this case, if perceptions of one technology can propagate to others, that is something we need to account for when fielding novel technologies. Also, this is a topic that has only been done in laboratory settings, so it was not clear if the observed effects would translate into more Air Force relevant tasks with actual operators.”

To investigate this phenomenon, the researchers employed a highly immersive cockpit simulator to create a realistic operational environment for the participants. The study involved thirteen experienced fighter pilots, both retired and currently active, with extensive flight hours in 4th- and 5th-generation Air Force fighter platforms such as the F-16 and F-35.

Participants flew a series of six flight scenarios, four of which included an unreliable UAV that exhibited errors, while the other two featured only reliable UAVs. Each pilot made 24 UAV observations in total, with a mix of reliable and unreliable UAVs designed to closely simulate real-world operational conditions.

The simulation environment was designed to reflect the complexities of managing UAV swarms, requiring pilots to monitor and control multiple UAVs (referred to as Collaborative Combat Aircraft, or CCAs) simultaneously. The scenarios tasked pilots with monitoring four CCAs for errors, communicating any unusual behaviors, and selecting one CCA for a mission-critical strike on a ground target, all while managing their cognitive workload and maintaining situational awareness.

Contrary to what might have been expected based on previous research, the study revealed that the presence of an unreliable UAV did not significantly diminish the trust that experienced fighter pilots placed in other, reliable UAVs within the same system. This suggests that experienced operators, such as the fighter pilots participating in this study, are capable of nuanced trust evaluations, effectively distinguishing between the reliability of individual system components.

This finding challenges the assumption underpinning the pull-down effect — that the unreliability of one component can tarnish operators’ trust in the entire system. Instead, the pilots demonstrated what can be described as a component-specific trust strategy, suggesting that their expertise and familiarity with operational contexts enable them to make more discerning judgments about technology.

Moreover, the study found a significant increase in cognitive workload associated with the unreliable UAV compared to the reliable ones. This was an expected outcome, since dealing with unreliable system components demands more mental effort and closer monitoring from human operators.

Yet, the researchers observed that higher trust in UAVs corresponded with lower reported cognitive workload, hinting at the potential for trust to mitigate the cognitive demands placed on operators by unreliable technology.

“After reading about this study, people should take away a couple things,” Lyons told PsyPost. “First, we found no evidence that negative experiences with one technology contaminate perceptions of other similar technologies in realistic scenarios with actual operators (in this case fighter pilots). While this is interesting, it also represents one study and thus requires replication in other settings. Second, people should take away the idea that theories and concepts should be tested in realistic domains with non-student samples wherever possible.”

Interestingly, although the researchers introduced heterogeneity into the UAV systems (through different naming schemes and suggested capability differences) to see whether it could mitigate the pull-down effect, they found no significant evidence that these measures influenced the pilots’ trust evaluations. This suggests that the pilots’ ability to maintain trust in specific, reliable components was not necessarily enhanced by these attempts at system differentiation.

“I was surprised that our manipulation of different asset types did not seem to have any bearing on the pilots’ attitudes or behaviors,” Lyons said. “However, and as noted in the manuscript, it is possible that the scenario did not pull out the need (or affordance) for this asset heterogeneity as much as it should have. I think this is an area that is ripe for additional research.”

However, the study is not without its limitations. The small sample size and the specific context of military aviation may limit the generalizability of the findings. Furthermore, the researchers acknowledge the need for future studies to explore the mitigating effects of factors such as system heterogeneity and operator training on the pull-down effect.

“There are always caveats with any scientific work,” Lyons said. “This was just one study, with a pretty small sample size. These findings need to be replicated across other samples and other domains of interest. Never put all of your eggs into the basket of just one study.”

This research opens up new avenues for enhancing the design and deployment of autonomous systems in military operations, ensuring that trust calibration is finely tuned to the demands of high-stakes environments.

“Within the Air Force, we seek to understand how to build effective human-machine teams,” Lyons told PsyPost. “A significant part of that challenge resides in understanding why, when, and how humans form, maintain, and repair trust perceptions with advanced technologies. The types of technologies we care about are diverse, and we care about the gamut of Airmen and Guardians across the Air and Space Force.”

“I feel that the heuristics we use in making trust-related judgements are underestimated and underrepresented in the literature,” he added. “This is a great topic for academia to advance our collective knowledge.”

The study, “Is the Pull-Down Effect Overstated? An Examination of Trust Propagation Among Fighter Pilots in a High-Fidelity Simulation,” was authored by Joseph B. Lyons, Janine D. Mator, Tony Orr, Gene M. Alarcon, and Kristen Barrera.

© PsyPost