Program

Monday, 29 October 2018

09:00 – 09:30 Registration
09:30 – 10:45 TUTORIAL – Jan Sprenger
An Introduction to the Philosophy of Statistics
10:45 – 11:10 Coffee break (25min)
11:10 – 11:50 Silvia Ivani, Matteo Colombo and Leandra Bucher
Uncertainty in Science: A Study on the Role of Non-Cognitive Values in the Assessment of Inductive Risk
11:50 – 12:30 Kåre Letrud and Sigbjørn Hernes
Affirmative citation distortion in scientific myth debunking: a three-in-one case study
12:30 – 13:10 Olmo van den Akker, Marcel van Assen, Marjan Bakker and Jelte Wicherts
How do researchers interpret the outcomes of multiple studies?
13:10 – 14:45 Lunch (95min)
14:45 – 16:00 KEYNOTE – Anna Dreber
Replications and Prediction Markets
16:00 – 16:15 Coffee break (15min)
16:15 – 16:55 Henk Kiers and Jorge Tendeiro
Replacing Frequentist Statistics by Bayesian Statistics in Teaching and Research Practice
16:55 – 17:35 Barbara Osimani
Too good to be true: Bayesian models of reliability, bias and random error
17:35 – 19:00 POSTER SESSION (with drinks and bites)

    • Gerdien van Eersel, Gabriela Koppenol-Gonzalez and Julian Reiss
      The Average: Still in the Running to be Human Sciences’ Top Model? Extrapolation on the Basis of Latent Classes
    • Fayette Klaassen
      What are prior probabilities and why do we need them?
    • Vera Heininga, Bobby Stuijfzand, Jojanneke Bastiaansen, Tineke Oldehinkel, Wolf Vanpaemel, Francis Tuerlinckx, Richard Artner, Alice Mason and Marcus Munafò
      Justification of analytical choices is good; transparency is better
    • Helene Speyer
      The animal within
    • Richard Artner, Francis Tuerlinckx and Wolf Vanpaemel
      Reproducibility: The elephant in the room?
    • Stephanie Debray
      Pseudosciences, Scientific Errors and Fraud-science: what are the differences?
    • Aline Claesen, Wolf Vanpaemel and Francis Tuerlinckx
      Preregistration: Comparing dream to reality
19:30 – 21:30 Workshop Dinner at Restaurant Zondag

 

Tuesday, 30 October 2018

09:30 – 10:45 TUTORIAL – Daniël Lakens
An Introduction to (Mis)applied Statistics
10:45 – 11:20 Coffee break (35min)
11:20 – 12:00 Femke Truijens and Mattias Desmet
‘The data’ and the replication crisis in psychology. A qualitative case analysis of validity of data collection in psychotherapy research
12:00 – 13:15 KEYNOTE – Richard Morey
Shaky Foundations: Statistics in the Scientific Reform Movement
13:15 – 14:45 Lunch (90min)
14:45 – 15:25 Daniel Auker-Howlett
Error, Probability, and Evidence Assessment in Medicine
15:25 – 16:05 Rink Hoekstra, Richard Morey and Eric-Jan Wagenmakers
Improving the interpretation of confidence and credible intervals
16:05 – 16:20 Coffee break (15min)
16:20 – 17:35 KEYNOTE – Jacob Stegenga
The New Problem of Old Evidence: P-hacking & Pre-analysis Plans

 

Presentation and poster abstracts in alphabetical order


Author: Olmo van den Akker, Marcel van Assen, Marjan Bakker and Jelte Wicherts (Tilburg University)

Title: How do researchers interpret the outcomes of multiple studies?

Abstract: We investigated how researchers assess the validity of a theory when they are presented with the results of multiple studies that all test that theory. We find that researchers’ belief in the theory increases with the number of significant outcomes and that replication type and the respondent’s role do not affect response patterns. In addition, we look at individual researcher data and find that only a handful of participants use the normative approach of Bayesian inference and that the majority of participants use vote counting approaches. These results highlight that researchers make structural errors when assessing papers with multiple outcomes.
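As an aside for readers unfamiliar with the contrast drawn here, the sketch below is our own illustration, not the authors’ survey materials: it compares a simple vote-counting rule with Bayesian updating over a set of study outcomes, under assumed values for power, the significance level and the prior.

    # Illustrative sketch only (not the authors' method): contrast vote counting
    # with Bayesian updating over n studies that all test the same theory.
    # The power, alpha and prior values below are assumptions.
    from math import comb

    def posterior_theory_true(k, n, power=0.8, alpha=0.05, prior=0.5):
        """P(theory true | k of n studies significant), assuming independent studies."""
        like_true = comb(n, k) * power**k * (1 - power)**(n - k)
        like_false = comb(n, k) * alpha**k * (1 - alpha)**(n - k)
        return prior * like_true / (prior * like_true + (1 - prior) * like_false)

    def vote_count(k, n):
        """Naive rule: accept the theory only if most studies are significant."""
        return k > n / 2

    for k in range(5):
        print(k, vote_count(k, 4), round(posterior_theory_true(k, 4), 3))

Under these assumed numbers, two significant results out of four already push the posterior above 0.9, whereas the majority rule would still reject the theory; it is this kind of divergence between the two approaches that the study probes.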


Author: Daniel Auker-Howlett (University of Kent)

Title: Error, Probability, and Evidence Assessment in Medicine

Abstract: This paper argues that an incomplete proposal by the clinical evidence assessor, the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) framework, to rate the ‘Quality of Evidence’ using probabilities cannot be interpreted as using Frequentist probabilities. GRADE links the evaluation of implementation errors in clinical trials to probability judgements about the evidence. I evaluate Frequentist accounts of probability and inference and find them insufficient to account for the way GRADE links error and probability. The conclusions of this paper provide a counter-example to accounts of Frequentist post-experimental evidence assessment, and a guide for how GRADE can complete its proposal.


Author: Richard Artner, Francis Tuerlinckx and Wolf Vanpaemel (KU Leuven)

Title: Reproducibility: The elephant in the room?

Abstract: We investigated the reproducibility (i.e. the reproduction of statistical findings using the exact same models and raw data) of 46 articles published in 2012 in three different APA journals. First, we identified the main results of each article, defined as the a priori hypotheses of primary interest that are mentioned in the abstract. Next, we closely followed the method section of each article and attempted to recalculate the main statistics of each main result using the raw data of that article. This talk will summarize the results and discuss the difficulties we encountered.


Author: Aline Claesen, Wolf Vanpaemel and Francis Tuerlinckx (KU Leuven)

Title: Preregistration: Comparing dream to reality

Abstract: Preregistration is one suggested method for gaining more confidence in psychological research. Several journals have implemented badges provided by the Center for Open Science, including a preregistration badge, to reward authors for open research practices. As a result, preregistered studies might inspire more confidence than studies without preregistration. To investigate whether this increased confidence is justified, we examined 23 articles with a preregistration badge published in Psychological Science between February 2015 and November 2017. Results and implications will be discussed, which will hopefully guide us towards what can be considered good or bad practice regarding preregistration.


Author: Stephanie Debray (Université de Lorraine)

Title: Pseudosciences, Scientific Errors and Fraud-science: what are the differences?

Abstract: James Ladyman (2013) uses the concept of ‘bullshit’ introduced by Harry G. Frankfurt (2005) to question the relation between the concepts of ‘pseudoscience’, ‘bad-science’, ‘fraud-science’ and ‘non-science’. Even though the concept of ‘pseudoscience’ cannot be fully associated with the concept of ‘bullshit’, his article allows us (1) to explain why the scientist’s behaviour and the social aspect of scientific practice seem to be essential components if we want to move forward on the demarcation question, and (2) to suggest a definition of pseudoscience that takes these elements into account.


Author: Anna Dreber (Stockholm School of Economics)

Title: Replications and prediction markets

Abstract: In this talk we will discuss the recent replication project on social science experiments published in Nature and Science in 2010–2015 (Camerer et al. 2018, Nature Human Behaviour), as well as the potential of prediction markets to help us understand the replicability of scientific results.


Author: Gerdien van Eersel (Erasmus University Rotterdam), Gabriela Koppenol-Gonzalez (Erasmus University Rotterdam) and Julian Reiss (Durham University)

Title: The Average: Still in the Running to be Human Sciences’ Top Model? Extrapolation on the Basis of Latent Classes

Abstract: In the human sciences, experimental research is used to establish causal relationships. However, the extrapolation of these results to the target population can be problematic. To facilitate extrapolation, we propose to use the statistical technique Latent Class Regression Analysis in combination with the analogical reasoning theory for extrapolation (Guala 2005). This statistical technique can identify latent classes that differ in the effect of X on Y. In order to extrapolate by means of analogical reasoning, one can characterize the latent classes by a combination of features, and then compare these features to features of the target.


Author: Vera Heininga (KU Leuven), Bobby Stuijfzand (Statscape LTD London), Jojanneke Bastiaansen (The University Medical Center), Tineke Oldehinkel (The University Medical Center), Wolf Vanpaemel (KU Leuven), Francis Tuerlinckx (KU Leuven), Richard Artner (KU Leuven), Alice Mason (University of Bristol) and Marcus Munafò (University of Bristol)

Title: Justification of analytical choices is good; transparency is better

Abstract: There are many possible ways to model statistical relationships. Typically, researchers justify why one specific model is the best model to answer their research question. Although justification is good, transparency may be better (Simmons, Nelson and Simonsohn, 2011). Transparently discussing findings across different alternatives provides not only a robustness check, but also the opportunity for others to question analytical choices (Heininga et al., 2015; Silberzahn et al., 2017). With the aim of facilitating researchers who would like to be more transparent, the present study evaluates a newly developed multiverse tool.


Author: Rink Hoekstra (University of Groningen), Richard Morey (Cardiff University) and Eric-Jan Wagenmakers (University of Amsterdam)

Title: Improving the interpretation of confidence and credible intervals

Abstract: Confidence intervals (CIs) are often endorsed as a useful alternative to the frequently criticized significance test. However, neither students nor researchers find them easy to interpret. This may be understandable, given how complicated the interpretation of CIs can be, but it seems indicative of a statistical education that is suboptimal. This is underscored by an analysis of introductory statistics textbooks, which shows an alarming frequency of incorrect interpretations of CIs. Apparently, statistical education is not optimally effective. Subsequently, we discuss constructive suggestions to improve education, with the goal of eventually improving students’ understanding, despite the haziness in many textbooks.


Author: Silvia Ivani (Tilburg University), Matteo Colombo (Tilburg University) and Leandra Bucher (University of Wuppertal)

Title: Uncertainty in Science: A Study on the Role of Non-Cognitive Values in the Assessment of Inductive Risk

Abstract: Philosophers of science have called the chance of being wrong when assessing scientific hypotheses ‘inductive risk’, and have argued that dealing with inductive risks requires appealing to non-cognitive values (e.g. moral and political values). The basic idea is that scientists can legitimately rely on non-cognitive values to set appropriate evidential standards in the face of inductive risks. In this paper, we present an empirical study of the relationship between human reasoning, inductive risk, and non-cognitive values. Our study focused on whether people’s sex, race, and political values can reliably predict their assessment of cases of inductive risk.


Author: Henk Kiers and Jorge Tendeiro (University of Groningen)

Title: Replacing Frequentist Statistics by Bayesian Statistics in Teaching and Research Practice

Abstract: Recognizing the need for a transition from frequentist statistics to Bayesian statistics, we discuss what kind of Bayesian analysis should be advocated, what implications this can have for science and for the communication of scientific results, and what challenges may be encountered in reforming teaching and research practice along these lines.


Author: Fayette Klaassen (Utrecht University)

Title: What are prior probabilities and why do we need them?

Abstract: Updating knowledge is a key part of scientific research. Updating starts at the level of theories that explain the world around us. In order to test and update these theories, hypotheses are formulated that express (parts of) such a theory. The hypotheses in turn define a set of parameters and a statistical model that describes the data, so that it is ‘testable’. In order to update knowledge about a theory, the probabilities of the hypotheses have to be updated, and consequently the parameter distributions too. In this talk the meaning and relevance of prior probabilities are discussed.
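For concreteness, the updating step mentioned here can be written generically (this is our sketch of standard Bayesian conditioning, not material from the talk) as Bayes’ rule over a set of competing hypotheses $H_1, \dots, H_k$ given data $D$:

    P(H_i \mid D) = \frac{P(D \mid H_i)\, P(H_i)}{\sum_{j=1}^{k} P(D \mid H_j)\, P(H_j)},

where the prior probabilities $P(H_i)$ are exactly the quantities whose meaning and relevance the talk addresses.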


Author: Kåre Letrud and Sigbjørn Hernes (Inland Norway University of Applied Sciences)

Title: Affirmative citation distortion in scientific myth debunking: a three-in-one case study

Abstract: In this paper we shall argue that erroneous or unsubstantiated ideas, once ingrained in academic publishing and debate, become shielded from criticism due to a phenomenon we shall call ‘affirmative citation distortion’: among the publications that cite a work making a substantial case against an old and oft-published claim, the majority affirm the claim. Furthermore, we theorize that affirmative citation distortion could make critical efforts counter-productive by contributing to the continued proliferation of the idea.


Author: Richard Morey (Cardiff University)

Title: Shaky foundations: statistics in the scientific reform movement

Abstract: One of the hallmarks of the current reform movement in science is that it critiques the statistical behaviour and understanding of scientists. They are right to critique: statistical understanding in science is often shockingly poor, and behaviour based on poor understanding is often correspondingly poor. Science is rife with publication bias, cherry picking, “p-hacking”, failure to check assumptions, simplistic use of data visualisations and statistical tests, and other issues. The reformers purport to correct flawed understanding; moreover, much of the reform movement is based on statistical arguments for change. But do the reformers have good statistical arguments themselves? Often, the answer is “no”. I will explore statistical confusion in the reform movement itself and its potential ramifications.


Author: Barbara Osimani (Università Politecnica delle Marche)

Title: Too good to be true: Bayesian models of reliability, bias and random error

Abstract: Bovens and Hartmann’s (2003) results concerning the failure of the variety of evidence thesis (VET) run against the “too-good-to-be-true” intuitions that underpin suspicion of bias for considerably long series of reports from the same instrument. We developed a model in which the instrument may either be reliable but affected by random error, or systematically biased towards delivering positive reports (but non-deterministically so). The VET fails in our model too, but the area of failure is considerably smaller and affects borderline cases. Furthermore, we explain counterintuitive results in Bovens and Hartmann, according to which the area of VET failure grows as the number of consistent reports increases.


Author: Helene Speyer (Mental Health Center Copenhagen)

Title: The animal within

Abstract: Meta-research should include individuals with lived experience. This case report seeks to illustrate the power of conflicts of interest, told from a first-person perspective. To grasp this power, it is crucial to recognize that inside every researcher there is an animal, struggling to survive. Researchers have the means, motive and opportunity to commit questionable research conduct. Means, as we all strive to survive in the system of incentives. The motive is obvious: sex, fame and money. The opportunity will always be there, as preregistration cannot remove all degrees of freedom.


Author: Jacob Stegenga (University of Cambridge)

Title: The New Problem of Old Evidence: P-hacking & Pre-analysis Plans

Abstract: P-hacking involves the manipulation of experimental methods and data to find statistically significant results. Many claim that p-hacking is epistemically suspicious, especially in the medical and social sciences, yet scientific methods that amount to p-hacking routinely contribute to legitimate discoveries. The problem with p-hacking is usually articulated from a frequentist perspective. In this paper we articulate the epistemic peril of p-hacking using three resources from philosophy: predictivism, Bayesian confirmation theory, and model selection theory. We use these resources to defend a nuanced position on p-hacking: p-hacking is sometimes, but not always, epistemically pernicious, and we articulate the precise conditions under which this is so. This requires a novel understanding of Bayesianism, since a standard criticism of Bayesian confirmation theory is that it cannot accommodate the influence of biased methods. A methodological device widely used to mitigate the peril of p-hacking is a pre-analysis plan. Some say that following a pre-analysis plan is epistemically meritorious while others deny this, and in practice pre-analysis plans are often violated. We use the formal groundwork developed in the first half of the paper to resolve this debate, offering a modest defence of the use of pre-analysis plans. Further, we argue that pre-analysis plans can be epistemically relevant even if the plan is not strictly followed.


Author: Femke Truijens and Mattias Desmet (Ghent University)

Title: ‘The data’ and the replication crisis in psychology. A qualitative case analysis of validity of data collection in psychotherapy research

Abstract: The replication crisis in psychological science was revealed by re-analysis of ‘the data’. Explanations often focus on method, ranging from selective publication to erroneous or even fraudulent research conduct. We explore a more basic empirical explanation: we scrutinize the validity of ‘the data’ themselves, as data have to be valid to allow for validity in every step of evidence generation. We use a qualitative case study to discuss how patients’ stories are translated into quantitative data in psychotherapy research. We discuss the epistemic consequences of the validity issues we found and argue for the need for a concept of ‘validity of data collection’.