This network mapping can also identify a particular strategy used by bad actors: splitting their edit histories across multiple accounts to evade detection. The editors put in the effort to build reputation and status within the Wikipedia community, mixing legitimate page edits with more politically sensitive ones.
“The main message I took away from all of this is that the main danger is not vandalism. It’s entryism,” Miller says.
If the theory is correct, however, it also means that it could take years of work for state actors to mount a disinformation campaign capable of slipping by unnoticed.
“Russian influence operations can be quite sophisticated and go on for a long time, but it’s not clear to me whether the benefits would be that great,” O’Neil says.
Governments often have blunter instruments at their disposal. Over the years, authoritarian leaders have blocked the site, taken its governing organization to court, and arrested its editors.
Wikipedia has been fighting inaccuracies and false information for 21 years. One of the longest-running disinformation attempts went on for more than a decade, after a group of ultranationalists gamed Wikipedia’s administrator rules to take control of the Croatian-language community, rewriting history to rehabilitate the country’s fascist leaders of the Second World War. The platform has also been vulnerable to “reputation management” efforts aimed at embellishing the biographies of powerful people. Then there are outright hoaxes. In 2021, it was discovered that a Chinese Wikipedia editor had spent years writing 200 fictional articles on the history of medieval Russia, complete with imaginary states, aristocrats, and battles.
To combat this, Wikipedia has developed an intricate set of rules, governing bodies, and public discussion forums, run by a self-organizing and self-governing community of 43 million registered users worldwide.
Nadee Gunasena, chief of staff and executive communications at the Wikimedia Foundation, says the organization “welcomes insights into the Wikimedia model and our projects,” particularly in the area of disinformation. But Gunasena also notes that the research covers only part of the article’s edit history.
“Wikipedia’s content is protected through a combination of machine learning tools and rigorous human supervision by volunteer editors,” says Gunasena. All content, including the history of every article, is public, and sourcing is vetted for neutrality and reliability.
The fact that the research focused on bad actors who had already been found and rooted out could also show that Wikipedia’s system is working, O’Neil adds. But while the study didn’t produce a “smoking gun,” it could prove invaluable to Wikipedia: “The study is really a first attempt to describe suspicious editing behavior so that we can use those signals to find it elsewhere,” says Miller.
Victoria Doronina, a member of the Wikimedia Foundation’s board of trustees and a molecular biologist, says Wikipedia has historically been targeted by coordinated attacks by “cabals” that aim to distort its content.
“While individual editors act in good faith, and a combination of different viewpoints allows neutral content to be created, off-wiki coordination of a specific group allows it to skew the narrative,” she says. If Miller and his fellow researchers are right in identifying state strategies for influencing Wikipedia, the next battle on the horizon could be “Wikimedians versus state propaganda,” Doronina adds.
The behavior of the bad actors analyzed, Miller says, could be used to build models that detect disinformation and to gauge how vulnerable the platform is to the kinds of systematic manipulation that have been exposed on Facebook, Twitter, YouTube, Reddit, and other large platforms.
The English-language edition of Wikipedia has 1,026 administrators monitoring over 6.5 million pages, the most articles of any edition. Tracking down bad actors has mostly relied on someone reporting suspicious behavior, but much of that behavior may not be visible without the right tools. In data science terms, Wikipedia data is difficult to analyze because, unlike a tweet or a Facebook post, a Wikipedia article exists in many versions of the same text.
As Miller explains, “a human brain simply cannot sift through hundreds of thousands of edits across hundreds of thousands of pages to see what the patterns look like.”