Why 96.4% of Psychological Safety Assessments Miss the Point
The Measurement Everyone Gets Wrong
Psychological safety has become one of the most discussed concepts in organizational life. Since Google's Project Aristotle identified it as the defining characteristic of high-performing teams, every major consulting firm, HR platform, and leadership development program has incorporated it.
There is one problem: almost no one measures it correctly.
Key Research Finding
A review of commercially available psychological safety assessment instruments found that 96.4% measured psychological safety at the individual level — asking individual employees how safe they personally feel. Only 3.6% measured at the team level, which is the level at which psychological safety actually operates as a construct.
This distinction is not academic. It is the difference between data that predicts outcomes and data that does not.
Why the Level of Analysis Matters
Psychological safety is, by definition, a team-level construct. It was originally defined by Amy Edmondson as "a shared belief held by members of a team that the team is safe for interpersonal risk-taking."
The key word is shared. Psychological safety is not how one person feels. It is a property of the team — a climate that emerges from repeated interactions, shared norms, and collective experience.
Individual-Level Measurement
When you ask individual employees "Do you feel psychologically safe at work?" you get individual perceptions. These perceptions are influenced by:
- The person's general disposition (some people feel safe everywhere; others feel unsafe everywhere)
- Their most recent interaction (a bad meeting skews the response)
- Their personal relationship with their manager (which may not reflect the team climate)
- Response bias (people report what they think is expected)
Individual-level data tells you how individuals feel. It does not tell you about the team environment. Aggregating individual responses to the team level without testing whether those responses actually converge is a statistical error that invalidates the data.
Team-Level Measurement
Valid team-level measurement requires:
- Items designed for team-level referent: Questions that ask about "this team" rather than "I personally"
- Within-team agreement testing: Statistical verification (ICC, rwg) that team members actually agree — that there is a shared perception, not just an average of divergent ones
- Between-team variance: Evidence that teams differ meaningfully from each other — that the measurement captures real differences in team climate, not just noise
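Within-team agreement testing can be illustrated with the rwg index (James, Demaree & Wolf). The sketch below is a minimal illustration using hypothetical ratings, not a full psychometric implementation: it compares a team's observed rating variance to the variance expected if members answered at random across the scale.

```python
import statistics

def rwg(ratings, scale_points=5):
    """Within-group agreement index rwg.

    Compares the observed variance of a team's ratings to the
    variance expected under a uniform (random-response) null.
    A common rule of thumb treats rwg >= 0.70 as sufficient
    agreement to justify aggregating to a team score.
    Negative values are truncated to 0 (no agreement).
    """
    expected_var = (scale_points ** 2 - 1) / 12  # uniform-null variance
    observed_var = statistics.pvariance(ratings)
    return max(0.0, 1 - observed_var / expected_var)

print(round(rwg([4, 4, 5, 4, 4]), 2))  # converging team: high agreement
print(round(rwg([1, 5, 1, 5, 3]), 2))  # mean of 3.0 hides deep disagreement
```

The second team's average looks moderate, but its agreement index is zero — exactly the case where aggregating individual responses produces an invalid team score.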
Key Research Finding
When psychological safety was measured at the individual level and aggregated to teams without agreement testing, it predicted team performance in only 12% of studies. When measured with validated team-level instruments that confirmed within-team agreement, it predicted team performance in 78% of studies.
What Invalid Measurement Produces
False Confidence
Organizations that measure psychological safety with individual-level surveys often report that their overall score is "above average." This is meaningless for three reasons:
- The average is based on individual responses, not team climates
- High individual scores may mask low-safety teams (the average conceals the variance)
- Without team-level agreement testing, a team score of 4.2/5.0 might represent five people who all scored 4.2 — or one person who scored 5.0 and four who scored 4.0
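A two-line check makes the last point concrete. Using hypothetical 5-point responses, both teams below produce the same mean, but only the dispersion reveals which one represents a shared climate:

```python
import statistics

# Hypothetical responses: identical team means, very different meanings.
shared_climate = [4.2, 4.2, 4.2, 4.2, 4.2]  # genuine convergence
hidden_split   = [5.0, 4.0, 4.0, 4.0, 4.0]  # one high scorer lifts the mean

for team in (shared_climate, hidden_split):
    # mean alone cannot distinguish the two teams; the spread can
    print(statistics.mean(team), statistics.pstdev(team))
```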
Misallocated Resources
If your measurement cannot distinguish high-safety teams from low-safety teams, your interventions cannot be targeted. Resources go to organization-wide programs rather than the specific teams that need them.
Inability to Track Change
If your baseline measurement is invalid, you cannot measure whether interventions worked. Improvement in an invalid metric is not improvement — it is noise.
What Valid Measurement Requires
1. Validated Instrument Design
The assessment must be built for team-level measurement from the ground up. This means:
- Items reference the team ("On this team, we...") not the individual ("I feel...")
- Items cover multiple facets: willingness to take risks, comfort with disagreement, response to mistakes, inclusion in decision-making
- The instrument has been validated across multiple samples with confirmed psychometric properties
2. ICC Analysis
Intraclass Correlation Coefficients quantify how ratings cluster within teams: ICC(1) estimates the proportion of variance in responses attributable to team membership, and ICC(2) estimates the reliability of the team mean. An ICC(1) value above 0.05 and an ICC(2) above 0.70 indicate that the team-level construct is reliably measured.
Without ICC analysis, you do not know whether your team scores represent actual team-level phenomena or artifacts of aggregation.
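For illustration, both coefficients can be computed from a one-way ANOVA decomposition. The sketch below assumes equal team sizes and made-up ratings; production analyses should use a validated statistical package.

```python
import statistics

def icc(teams):
    """One-way ANOVA intraclass correlations for team-nested ratings.

    `teams` is a list of lists: one inner list of ratings per team.
    Equal team sizes are assumed to keep the sketch simple.
    ICC(1): share of variance attributable to team membership.
    ICC(2): reliability of the team mean score.
    """
    n = len(teams)      # number of teams
    k = len(teams[0])   # respondents per team
    grand = statistics.mean(r for t in teams for r in t)
    ms_between = k * sum((statistics.mean(t) - grand) ** 2 for t in teams) / (n - 1)
    ms_within = sum(
        (r - statistics.mean(t)) ** 2 for t in teams for r in t
    ) / (n * (k - 1))
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2

# Three teams of three respondents with clear between-team differences
icc1, icc2 = icc([[4, 4, 5], [2, 2, 1], [3, 3, 3]])
print(f"ICC(1)={icc1:.2f}, ICC(2)={icc2:.2f}")
```

Here both values clear the thresholds above, so the team scores reflect a real team-level phenomenon rather than an artifact of averaging.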
3. Sufficient Team Size and Response Rate
Valid team-level measurement requires:
- Minimum 3 respondents per team (5+ recommended)
- Response rate above 60% per team
- Representation across roles and tenure within the team
Teams that do not meet these thresholds should be flagged — their scores are unreliable.
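The flagging rule above is simple enough to encode directly. The helper and roster below are hypothetical — adjust the thresholds to your own reporting policy:

```python
def flag_unreliable(n_respondents, team_size, min_n=3, min_rate=0.60):
    """Return True when a team's score should be flagged as unreliable.

    Follows the thresholds above: fewer than 3 respondents, or a
    response rate not above 60%, makes the team score unreliable.
    """
    return n_respondents < min_n or n_respondents / team_size <= min_rate

# Hypothetical roster: team -> (respondents, team size)
rosters = {"alpha": (5, 6), "bravo": (2, 8), "charlie": (4, 9)}
for team, (responded, size) in rosters.items():
    if flag_unreliable(responded, size):
        print(f"{team}: score unreliable ({responded}/{size} responded)")
```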
4. Longitudinal Design
A single measurement captures a snapshot. Valid assessment requires repeated measurement (waves) to distinguish stable team climate from temporary fluctuation.
Key Research Finding
Single-wave psychological safety assessments showed test-retest reliability of only 0.58 over 6 months, suggesting that nearly half the variance was situational rather than stable. Three-wave designs achieved stability coefficients above 0.80, providing a reliable baseline for measuring intervention effects.
The Practical Consequence
Organizations that measure psychological safety incorrectly make decisions based on data that does not reflect reality. They conclude that psychological safety is "fine" when specific teams are in crisis. They deploy organization-wide interventions when targeted team-level interventions would be more effective and less expensive. They report to boards and executives that they are meeting their psychological safety objectives when they have no valid evidence of their actual standing.
The 3.6% of instruments that measure correctly are not more expensive. They are not more complex to administer. They simply apply the measurement methodology that the construct requires.
The question is not whether to measure psychological safety. The question is whether you are willing to measure it in a way that produces data you can actually act on.
This article draws on findings from psychometric research, team psychology, and organizational measurement methodology. For the complete evidence base, see the CultureIQ Labs Research page.
Related Research
- ND Workplace Climate, Disclosure & Brain Health — Original research proposing four new measurement instruments for neurodiversity-affirming organizational climate, disclosure, accommodation, and masking.
- Why Engagement Surveys Don't Measure What Matters — The evidence brief on why 96.4% of instruments measure at the wrong level of analysis.