So, I have to start this one with a disclaimer: committing scientific misconduct is no joke. It is grossly unethical, no exceptions. (As a scientist, I actually have strong feelings about this.) It's more like... political extremism is also not funny (it harms people) but extremist propaganda can very well be... You know. Laughing instead of crying, and all that. Because I am a natural cynic, I conceptually enjoy creative ways of manipulation, logical fallacies, and lying with statistics, while still condemning them in practice. I have, at one conference, played a drinking game involving obviously shitty statistical methods on posters - please, consider it an act of self-defense (the other players and I were merely trying to preserve our mental health) rather than acceptance of bad scientific practice. When done intentionally, this shit needs to be called out (and the perpetrator ousted from the scientific community).

That being said... It's important to be aware. Let's enjoy some good old scientific dishonesty.

Fake data

This is the one that I guess most people think of first when they hear "scientific misconduct", but perpetrators who overdo it are usually caught very quickly (that is, as soon as anyone tries to reproduce their experiment, which will happen if the results sound interesting/spectacular enough) and end up ruining their careers over nothing.

Example: "We measured the visible light spectrum of the sun and it is green."

Obviously, there is no way in hell the presented data are real. Also, obviously, it is unlikely anything as blatant as this will pass peer review in the first place. (Heck, even real studies with unexpected results have difficulties getting published, because serious journals seriously dislike that kind of risk! Science is extremely conservative!) But more subtle manipulations, like adding a few convenient data points to an otherwise real data set? That happens and is notoriously hard to prove.

Plagiarism

Used to be encountered quite commonly in the wild, but is seriously endangered.

Example: "Our experimental results show for the first time that the sun is yellow."

As the authors of this study are not the first ones to point out that the sun is yellow, this is scientific misconduct: the authors take someone else's result and pass it off as their own. To be fair, in this particular case it is kind of difficult to cite correctly, because we do not know who was the first person to mention that the sun is yellow. (Some stone age person, most likely.) However, it is possible to deal with these issues honestly.

Fortunately, these days, shit like this gets caught more often, because of search engines, automated plagiarism checks, and AI. Most people don't even try anymore: the risk of being caught even decades later with new technology is simply too high. Scientists generally know this.

Wrong citation

I have no idea why this is as common as it is, seeing how it is very easy to catch, yet somehow... Is it because the peer reviewers generally don't bother checking the cited literature?

Example: "Our collected data confirm the results of [Bleh et al., 2023] that the sun is a giant egg yolk in the sky."

The main issue with this is not just that the presented data are wrong... No, that may be a genuine measurement error. What makes it scientific misconduct is that Bleh et al. never made any such statement about the composition of the sun. They merely stated that the visual appearance of the sun is similar to that of an egg yolk!

...yeah, this is the kind of shit that leads to a lot of yelling, thrown drinks at conferences, and/or lifetime feuds between scientists. Bleh is not going to be happy about being misquoted with something that has the potential of damaging their standing in the field!

Hiding your methods

This is extremely common in studies of topics with commercial relevance: people (try to) publish data without fully disclosing their methods.

Example: "We measured that the sun is yellow."

The result is not necessarily wrong (it is plausible enough and agrees with the majority of the literature in the field), but the authors do not disclose what exactly they measured, how they measured it, and so on. No one will be able to reproduce their study! Of course, good peer reviewers check manuscripts for this shit, but if it is an experimental method they do not use themselves, they will not necessarily notice a missing step along the way.

Bullshitting with statistics


With the caveat that scientists are not necessarily good at math, and genuine mistakes happen... This is used intentionally often enough to deserve a place on this list.

Example: "We determined the perception of the sun's color by asking three random strangers in the street. The first two answered yellow, the last one answered green. We did a statistical evaluation (here is the box plot) and conclude from this that the sun is perceived on average as a greenish yellow, shifting towards a greener perception during the course of the day."

AAAAARGH! I mean, this has several things that should get the authors fired (but likely will not, because enough scientists are shitty at math). Let's try to list them... Actually, there is so much wrong here that I will likely even forget to list some serious issues with this data treatment. It is that horrible. XD

- Low N. If we are going to run a survey and do a statistical evaluation, we really need to ask more than three people.
- Doing a meaningful box plot from three values is essentially impossible (the quartiles are little more than the raw data points), so whatever is in that plot, it is not an honest summary of the data the authors actually gathered. (I wish I were making this up, but I have encountered a box plot of three values in the wild. Yes, at the very same conference with the drinking game.)
- I am not certain about the typical way of dealing with survey data, but I am quite certain that the decision to use the average value here... needs some discussion. ;)
- Failure to identify outliers in your statistic. (Here: you know, the person who was being sarcastic.)
- Bullshit construction of a trend. (One vodka for every unjustified use of linear regression, please!)
- Failure to discuss uncertainty.
- Failure to compare to previously published (and, one can assume, drastically different) datasets.
- Failure to correct for known factors that could skew the result. (Like... where did the authors do this survey? Next to a kindergarten? By the front entrance of the Colorblindness Association headquarters? Near the museum of modern art? Demographics matter.)
- ...

I'm sure there is even more wrong with this, but my brain is shutting down. There are many other ways of bullshitting with statistics (heck, there are whole books about the topic), but... Let's just conclude that 97.7 ±5 % of statistics have some kind of problem, and move on.
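
Just to make the point concrete, here is a minimal sketch of the example survey above - entirely my own toy code with invented numbers; coding the answers as hue values and reaching for scipy's linregress are illustrative assumptions, not anything from a real study:

import numpy as np
from scipy import stats

# The three answers from the example, coded as rough hue values
# (60 = yellow, 120 = green) - already a questionable modelling choice.
time_of_day = np.array([9.0, 12.0, 17.0])   # hours at which the strangers were asked
hue = np.array([60.0, 60.0, 120.0])         # yellow, yellow, green

print(f"N = {hue.size}, mean hue = {hue.mean():.0f}  ('greenish yellow', apparently)")

# The "shifting towards green during the day" trend: a linear regression
# through three points, with a single degree of freedom and huge uncertainty.
fit = stats.linregress(time_of_day, hue)
print(f"slope = {fit.slope:.1f} hue/hour, p = {fit.pvalue:.2f}, stderr = {fit.stderr:.1f}")

The output looks impressively quantitative, which is exactly the problem.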

Data cherrypicking

To some extent, confirmation bias is normal, and scientists are not free from it. Most scientists are also aware of this phenomenon and consciously work against it. However, some scientists - especially those peddling fringe hypotheses generally considered bogus by the vast majority of their field - love to indulge. They show only the data that support their pet hypothesis, only cite other authors whose conclusions agree with their own, and so on.

Example: "We measured that the sun is green; look at our lovely measurement. We threw out all the other data because we did not like the results. This is in perfect agreement with the spectral measurements collected by [Bloergh et al., 2018] as well as the color perception survey conducted by [Blip and Blup, 2022]."

Even if the mentioned authors were indeed cited more or less correctly (which they are not: if you remember, the faked spectral data really said green, but the color survey study actually arrived at an averaged "greenish yellow", so there is a wrong citation as well), the core problem here is the deliberate omission of all those other authors who came to the drastically different conclusion that the sun is yellow. Similar tactics are also often used to create false balance (that is, to make it sound as if some fringe hypothesis were just as accepted in the field as that other hypothesis which only, you know, 99.9% of scientists for some reason consider much more valid). Of course, in real life, it is usually done more subtly, intent is difficult to prove, and I have never seen anyone reprimanded for this shit despite it being ubiquitous.

Citing things you really should not cite

This is not always easy to catch. Sure, you should not cite anything that does not meet scientific standards (like publications that are not peer-reviewed), but in practice, there are a few grey areas, especially in mathematics, where some groundbreaking stuff first appeared in letters or nonscientific magazines without that making it "uncitable". You should not cite outdated research (though what that means in the individual case is debatable and varies between scientific disciplines: math may still be considered valid a few centuries after first publication because if it's correct it's correct; medicine generally not, unless modern science confirms the old stuff, and even then you should cite the newer stuff that actually meets scientific standards!) or refuted research (though, again, when there are differing scientific standpoints, whether something is "refuted" or merely "still debated" is the source of much... debate XD). You should not cite anything that is not a scientific study at all.

Example: "The sun is hiding behind a tree [Raposo, 1992]."

The problem: while "Mr. Golden Sun" is arguably a lovely children's song, what it is not is a peer-reviewed scientific study. (It is also a wrong citation, because the song is traditional and only popularized by Raposo, but, whatever. XD)

Even if it were a scientific study, it could arguably be considered outdated (because astrophysics is a fast-moving field of science, so a 1992 paper is ancient and, while not totally uncitable because of its age, probably only citable if it a) had been a key publication at the time and b) was still considered valid and a basis of ongoing research) as well as refuted beyond any reasonable doubt (because, while I am not active in that field, I am certain the vast majority of astrophysicists are going to strongly disagree and back that disagreement up with solid data). We could also start criticizing the methodology... Never mind.

Unfortunately, in practice, most cases are not as clear as this one. While this is probably one of the most common forms of scientific misconduct (often in combination with cherrypicking), deceptive intent is notoriously hard to prove (after all, who is to say the authors did not consider that ancient paper still highly relevant to the field?), the whole issue is notoriously controversial (and good for mudslinging between scientific rivals) with tons of corner cases, and it is almost never punished.

Awkward situation: I once had to cite something from 1850, in a publication about computational modelling (which is generally thought to be a fast-moving field, too). Of course, it was exactly the mathematical approach I needed, it was unrefuted (and merely, um, not used much since back then), and claiming it as my own invention would have been plagiarism, so there really was no choice... But there was a lot of incredulity (that this was really necessary), and people (including myself) laughed. A lot.

Inferring causation from correlation

To be fair, this one is more often committed by journalists talking about science than by the scientists themselves, but... It happens, and because the problem is one of the first things science students learn, if it appears in an actual study, it is usually done intentionally.

Example: "The observed rooster crowed every morning. Right afterwards, the sun went up. Therefore, the rooster crowing calls the sun from the horizon every morning."


NO. JUST NO.

There are, of course, some publications playing with spurious correlations by taking really well-correlated datasets that have nothing to do with each other. For example: Storks deliver babies. These things, when done for teaching rather than deception, are really very funny and I adore them. :)
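
If you want to see how little a high correlation proves on its own, here is a tiny sketch with made-up numbers (my own toy example, not any real stork or birth statistics): two series that merely both grow over time end up almost perfectly correlated.

import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2020)

# Two invented, slightly noisy series that both simply increase over time.
stork_pairs = 100 + 3 * (years - 2000) + rng.normal(0, 2, years.size)
newborns = 5000 + 150 * (years - 2000) + rng.normal(0, 80, years.size)

r = np.corrcoef(stork_pairs, newborns)[0, 1]
print(f"Pearson r = {r:.2f}")   # close to 1, yet storks still do not deliver babies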

Circular logic

Usually done when elderly authors realize that, with science advancing, their pet hypothesis - you know, the one they based most of their career on - cannot be backed up with real data and is likely wrong, but there is too much to lose (job, reputation, ...) to admit it.

Example: "We know that the sun is a giant egg yolk in the sky because it is yellow. It is yellow because egg yolks are yellow."

You have no idea how often this shit appears! Also, the higher an author's reputation in the field, the more likely they are to get away with it, because people do not like to challenge authorities, and scientists are people. I have heard some very circular logic from some very famous scientists!

Overinterpretation of results

Look... Almost all scientists are guilty of this to some extent, not least because the system forces them to. If you want a career (and more funding for future studies), your work needs to be cited by many other scientists. If your results are more spectacular, more people will read (and hopefully also cite) your paper. (Yes, publication titles are usually clickbait these days!) So, to some extent, stressing your research's great importance for the whole field (heck, for the whole of humanity) is not only accepted but enforced. (Those who do not do it do not get funded. It is really as simple as that!) It only becomes outright misconduct when you draw conclusions from your data that you really cannot draw.

Example: "Our observations show that the sun looks like a giant egg yolk. Because egg yolks are valuable food, this implies it can contribute a lot towards ending world hunger. More research into the composition of the sun and its potential nutritional value is needed. Give me funding for the next five years!"


Of course, the boundary between confidence in the relevance of your results and grandiose overstatements is occasionally blurred. ;) However, if you know that your data only means the sun looks like an egg yolk but not that it is one... Yeah, misconduct, absolutely.

Undisclosed conflict of interest

These days, this is rarer than people generally think, and accusations of it (every time a scientist produces a result that other people don't like) are much more prevalent than real cases. And yet...

Example: "There is a lot of aura feedback between the giant egg yolk in the sky and egg yolks on the ground. To optimize the known positive effects of sunbathing as evidenced by [..., ..., ...], you should therefore combine it with eating lots of egg yolks."


There are several problems with this: the reliance on a principle generally considered to be nonsense by the majority of the scientific community (aura feedback), the data cherrypicking (citing only authors talking about positive health effects of sunbathing while omitting the skin cancer risk), presenting a controversial fringe hypothesis like "the sun is a giant egg yolk in the sky" as universally accepted fact, and combining results with a recommendation of specific behavior, which scientists are not supposed to do. And yet, the largest ethical issue is an omission.

Let's say this study was funded by a large company selling poultry and egg products. That's still legal, by the way, as long as the sponsors do not demand a certain result (and, when in doubt, data manipulation) in exchange for funding. (This is, of course, a slippery slope: kind of like donations for political parties are legal, but bribing politicians in exchange for their support of specific policies is not. There is such a thing as an implicit bribe, which is why studies sponsored by certain companies may not quite get the authors dragged in front of an ethics board but will definitely make other scientists reluctant to cite them unless there is independent verification of the results.) The problem is getting this kind of funding and failing to disclose it (or even deliberately hiding it).

In order to avoid excuses like "oh, I totally forgot" or "oh, I had no idea you wanted to know this", most journals these days demand a declaration of funding and conflicts of interest, where you have to disclose who gave you money as well as whether you yourself stand to make money from something related to your research (as would be the case if, say, one of the authors or their immediate family members owns a company that sells eggs, or happens to be the author of a cookbook titled "The Best Egg Yolk Recipes"). Authors trying to deceive the public about these things are thus caught in a direct lie in writing, which makes identifying (and punishing) these cases easy enough.

And the sad thing is, I probably forgot some.

Date: 2024-07-30 05:30 am (UTC)
From: [personal profile] rdm
Sadly, my exhaustive and thorough investigation of the literature (I did one Google search, and looked at the first page) has only found research ethics bingo.
https://bingobaker.com/view/4113208
