Increasingly, policymakers are turning to behavioral science for insights about how to improve citizens’ decisions and outcomes. However, these insights can only inform policy insofar as their effects are comparable; unfortunately, different intervention ideas are typically tested across different samples on different outcomes over different time intervals. Here we introduce the “mega-study,” a massive field experiment in which the effects of many different interventions are compared in the same population on the same objectively measured outcome for the same duration. In a mega-study targeting physical exercise among 61,293 members of an American fitness chain, 30 scientists from 15 different U.S. universities worked in small, independent teams to design a total of 54 different four-week digital programs (or “interventions”) encouraging exercise. We show that 45% of these interventions significantly boosted weekly gym visits, by 9 to 27%; the top-performing intervention offered micro-rewards for returning to the gym after a missed workout. Although only 8% of interventions produced behavior change that remained significant and measurable after the four-week intervention period ended, in aggregate we detect carry-over effects that are proportionally similar to those measured in prior research. Forecasts by impartial judges failed to predict which interventions would be most effective, underscoring the utility of mega-studies for improving the evidentiary value of behavioral science.