  • Writing Tips

Can Plagiarism Checkers Detect Translated Text?


3-minute read

  • 6th August 2023

If you’re a student or researcher, then you know how important it is to properly credit your sources and avoid plagiarism. But what should you do when your research includes translated text?

In this post, we’ll discuss plagiarism across more than one language, whether plagiarism checkers can detect translated text, and how to avoid plagiarism in your work. Keep reading for more.

What Is Cross-Language Plagiarism?

Cross-language plagiarism, also known as multilingual plagiarism, occurs when someone takes content from a source in one language and translates it into another language without proper attribution or citation (i.e., presenting it as their original work). This can include self-plagiarism – using translations of your own previously published work.

While any form of plagiarism is serious and a breach of ethics, plagiarism is often unintentional. To avoid unintentionally plagiarizing someone else’s work, it’s a good idea to use an online plagiarism detector, regardless of your field of study.

What Is a Plagiarism Checker?

Plagiarism checkers, like those offered by Turnitin or EasyBib, are a great way to check your work for plagiarized text before publication. They maintain a vast database of previously published content, including academic papers, articles, and websites. The tool compares the submitted text to the database to find potential matches.

Typically, plagiarism detectors will provide an originality score or percentage, indicating how much of the submitted text is considered original and how much is potentially plagiarized. Some may include additional information about the matched sources, such as the author or publication year.
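As a rough, hypothetical illustration of this matching step (commercial checkers use far more sophisticated indexing and fuzzy matching than this toy sketch), you can think of the comparison as measuring how many word sequences in a submission also appear in a known source:

```python
def ngram_similarity(submitted, source, n=3):
    """Share of the submitted text's word n-grams that also occur in the
    source text. A deliberately simplified stand-in for a checker's
    matching step, not any vendor's actual algorithm."""
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    submitted_grams = ngrams(submitted)
    if not submitted_grams:
        return 0.0
    return len(submitted_grams & ngrams(source)) / len(submitted_grams)

# A rough "originality" percentage is then the share that did NOT match:
similarity = ngram_similarity(
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox sleeps all day",
)
originality = round((1 - similarity) * 100)  # → 71
```

A translated passage defeats this word-level matching, which is why cross-language detection needs the extra machinery described below.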

If you use a plagiarism checker for your work, will it detect translated text? The answer – probably. If the translated text matches any previously published content in the same or another language, it will likely be flagged as potential plagiarism.

Some advanced plagiarism detection tools can recognize content that has been translated from one language to another. These tools use algorithms to identify similarities between the original text and its translated version. They can detect paraphrased or reworded content, even if it’s in a different language. Some plagiarism checkers even employ contextual analysis to determine whether a translated text is a legitimate adaptation or a case of plagiarism. They take into consideration factors such as the overall structure of the text and the presence of original ideas or insights.


The effectiveness of plagiarism checkers can vary depending on how advanced the detection method is, and the specific techniques used in the translation. If the translation is significantly different from the original text or if it includes substantial changes and additions, it may not always trigger plagiarism detection.

How to Avoid Plagiarism

To avoid plagiarism (even unintentional plagiarism) when writing a research paper or essay, you can:

●  Summarize, paraphrase, and add your insights when incorporating ideas from a source.

●  Obtain permission to use published texts when necessary.

●  Use an online plagiarism checker.

●  Properly cite quoted and paraphrased sources, including translated sources, in the text and on a separate references page if necessary.

To ensure all your sources are properly cited according to your required referencing style, why not have your work professionally proofread? At Proofed, we can check that your work meets the proper citation guidelines. Send in your free sample today and see for yourself!



Strategies to help university students avoid plagiarism: a focus on translation as an intervention strategy

Marga Stander

2018, Journal of Further and Higher Education

Comparative Literature

Statement on Plagiarism in Translation

Please read the following statement carefully to make sure you understand what constitutes plagiarism in a translation assignment. You may find it difficult to distinguish between your own translation and that of other translators. Plagiarism is often the result of ignorance rather than of an intent to cheat; once you know what the rules are, you are much less likely to break them by mistake.

There are multiple resources that you may use in preparing your translation, such as dictionaries (online or in books), online translation tools (e.g. Google Translate, Babelfish), translation software (e.g. Trados), community-source assistance (e.g. listservs, online forums, discussion groups), and existing translations (online or in print). While you are encouraged to use these helpful tools where relevant, it is important to acknowledge the sources you have used, and to recognize that they cannot replace your own work.

If you are confused or uncertain about how to acknowledge your sources, please consult first with the faculty member who gave you the assignment. For further questions or concerns, you can also make an appointment with the Translation Advisor.

If there is reason to believe that a passage in a translation assignment has been adopted verbatim from another source, you may be asked to complete a new translation of the same passage in your own words, or to translate another paragraph from the same text in the presence of an instructor.

If a translation assignment has been completely or substantially adopted from another source, you will receive a failing grade on your assignment and the instructor will follow LSA procedures for academic misconduct. For more information about plagiarism, consult the following links:

LSA Policies on Academic Integrity and Misconduct

UM Library Resources for Understanding Academic Integrity

Sweetland Center for Writing: Beyond Plagiarism

Please understand that, in the intellectual community of the University, plagiarism is a form of theft. If you are ever unsure if any part of your work might be plagiarism, there is a simple rule of thumb: if in doubt, acknowledge your sources!

LSA - College of Literature, Science, and The Arts - University of Michigan



Published: 19 December 2018

Paraphrasing tools, language translation tools and plagiarism: an exploratory study

Felicity M. Prentice and Clare E. Kinden

International Journal for Educational Integrity, volume 14, article 11 (2018)


In a recent unit of study in an undergraduate Health Sciences pathway course, we identified a set of essays which exhibited similarity of content but demonstrated the use of bizarre and unidiomatic language. One of the distinct features of the essays was the inclusion of unusual synonyms in place of expected standard medical terminology.

We suspected the use of online paraphrasing tools, but were also interested in investigating the possibility of the use of online language translation tools. In order to test the outputs of these tools, we used as a seed document a corpus of text which had been provided to the students as the prompt for the essay. This document was put through six free online paraphrasing tools and six separate iterative language translations through the online Google Translate™ tool.

The results demonstrated that free online paraphrasing tools did not identify medical terminology as standardised or accepted nomenclature and substituted synonyms, whereas Google Translate™ largely preserved medical terminology.

We believe that textual indicators such as the absence of standard discipline-based terminology may be of assistance in the identification of machine paraphrased text.

Introduction

Imagine you are reading a student’s essay and are confronted with the following sentence:

A situation that can give resistance and additionally generally safe for botches, and that inspects choices without assaulting the pride and nobility of the individual influencing them, to will prompt better natural decisions.

In an assessment task set for first year undergraduate Health Science students in a pathway program, an alarming proportion of submitted work, nearly 10%, demonstrated linguistic contortions similar to the example given. This led us to consider the following questions:

Were students using online paraphrasing tools to manipulate work which was written in English and which had not been authored by them?

Were students who had English as an Additional Language (EAL) composing work in their first language and then translating this through online language translation tools?

Are there indicators which can identify the use of on-line paraphrasing tools?

All examples of unusual writing provided in this article are indicative of the nature of the student writing encountered but have been altered to retain anonymity while preserving the features of the linguistic anomalies.

While standards of English expression may vary considerably in work submitted by students, it is becoming more common to encounter essays which display standards of writing well below that which is expected of students studying in Higher Education. When the student is from an English as an Additional Language (EAL) background, poor expression in written work has been attributed to lack of facility with the language, clumsy patchwriting, or the use of an online translation tool, such as Google Translate™ (n.d.) (https://translate.google.com.au). Mundt and Groves (2016) contend that when students use an online translation tool to convert their own work from their first language into English this may be considered demonstrative of poor academic practice, as they are not actively developing English language skills. However, as the original work is the result of the student’s own intellectual merit, it is contentious as to whether this qualifies as academic misconduct. In the case of the submissions we received there was reasonable suspicion that the text had not been subject to a language translation tool but had been reengineered by an English-to-English paraphrasing tool. This called into question the source of the original English text, and suggested there was evidence of a genuine breach of academic integrity.

Rogerson and McCarthy (2017) reported that their initial awareness of paraphrasing tools was through a casual comment by a student. In our case, the serendipitous discovery of online paraphrasing tools was made when one of the authors was following an online forum discussing cheating methods. Prior to this revelation, our assumptions as to the origin of incomprehensible student writing had been more naïve, our explanations being focussed around patchwriting and LOTE-to-English translation tools. However, when encountering the extent of the use of inappropriate synonyms in essays submitted for this particular assessment task, we were moved to examine the text more closely. A review of one or two essays rapidly escalated to the identification of a cluster of essays which bore remarkable similarity in the use of peculiar language, and in particular the inclusion of bizarre synonyms for standard recognised terminology within the health sciences discipline. Further to this, there was significant similarity in the structure of the essays, where the information, and even in-text citations, were provided in an identical sequence. In some cases, the Turnitin® (n.d.) similarity index identified a match between a number of essays, but other suspicious works resulted in an index of 0%. It became clear that paraphrasing tools were probably being used and that students were colluding to paraphrase each other’s essays.

The literature is replete with the lamentations of academics who feel that pursuing academic misconduct forces them into the role of detective. Collecting evidence, analysing scenarios, motives and prior offences, and operating in a quasi-judicial, if not criminological, paradigm does not sit well within the cultural norms of academia (Brimble and Stevenson-Clarke 2006; Burke and Sanney 2018; Coren 2011; Keith-Spiegel et al. 1998; Sutherland-Smith 2005; Thomas and De Bruin 2012). Our experiences seemed to resonate so clearly with this sentiment that we felt a profound urge to recreate a television crime show, with essays taped to the wall connected by string, surrounded by tacked-up maps and photographs of the suspects.

The breakthrough came when an essay was so alarmingly absurd that we were able to trace the origin to another student’s essay. The assessment task was to analyse and discuss a scenario regarding a young Indigenous man’s experiences in the Australian Health Care System.

One student included in their essay a description of a Computerised Axial Tomography (CAT) scan which had been plagiarised from a Wikipedia page. However, in transcribing how images were taken from various angles, they had misspelled the word ‘angles’ as ‘angels’. This spelling error had not caused concern, however work submitted by another student provided evidence that there was a curious literary connection between the essays. In this case the second student reported that the CAT Scan images were taken from various ‘Blessed Messengers’.

It was apparent that the second student had used a paraphrasing tool to ‘spin’, that is, to apply synonym substitution, to the essay obtained from their colleague.

Given the poor standard of the output, why would a student resort to using paraphrasing tools? Paraphrasing is a complex and demanding task, requiring students to demonstrate not only understanding of the meaning and purpose of the text, but also the linguistic facility to restate this meaning in new and original words, specifically in the discourse of Academic English (Shi 2006). This task is difficult enough when performed in a first language, and the challenge is magnified when the student is from a non-English-speaking background (Bretag 2007; Carroll 2015; Correa 2011; Handa and Power 2005; Marshall and Garry 2006).

Bretag (2007) describes two aspects of the acquisition of a second language. Basic interpersonal communication skills can be developed in approximately two years; however, it is estimated to take five to ten years to develop the cognitive academic linguistic proficiency which is necessary to function in an academic learning environment. Patchwriting is when students attempt to paraphrase a source by substituting synonyms in passages while retaining too closely the voice of the original writer (Jamieson 2015). This may be classified as an intermediary stage of the development of academic linguistic proficiency, representing a form of non-prototypical plagiarism (Pecorari 2003). As such, it may not be a deliberate or intentional breach of academic conduct. In students with EAL, the acquisition of the linguistic facility to represent the meaning of a text without resorting to reproducing the author’s actual words may take more than the few months that our students have been studying at an English-speaking university. However, in the cases under consideration, students did not attempt to manually re-engineer text in order to paraphrase but used an online paraphrasing tool to alter the entire corpus of the text. The original source text could be identified in many cases by a recognition of some structural features, for example, the reproduction of the scenario provided to the students.

Original: One day, while Doug was out walking, he felt lightheaded and then lost consciousness and fell to the ground. He was brought to the Emergency Department of a major hospital by ambulance for assessment and investigation.

Post paraphrasing tool: While one day on his walk Doug he felt bleary eyed and lost awareness and fell onto the ground. He was conveyed to the Emergency Department of the healing facility for significant appraisals and tests.

In some cases the original source was taken from the internet, notably Wikipedia, but in one instance the student lifted and paraphrased text taken directly from a file sharing site. The student did not provide an in-text citation; however, the original source was identified by the student including the file sharing website address in the reference list. This has been referred to as illicit paraphrasing (Curtis and Vardanega 2016), and actions such as this may call into question the level of intentionality to deceive. The inclusion of a reference, albeit from an inappropriate source, may suggest the student was attempting to participate in the expectations of academic practice. Less generously, it may be assumed that copying material directly from a file sharing site, using a paraphrasing tool to deceive Turnitin® (n.d.), and then submitting the work, even with a hopeful inclusion in the reference list, demonstrated an intentional breach of academic integrity.

Patchwriting

Strategic word substitution has always been a feature of students’ attempts at paraphrasing, which Howard defined as patchwriting,

Copying from a source text and then deleting some words, altering grammatical structures, or plugging in one synonym for another. (Howard 1999, p.xvii, in Jamieson 2015)

While patchwriting by students has been characterised as poor academic practice, it is also seen as a preliminary effort to become familiar with the discourse of academic writing (Pecorari 2003).

In the essays considered in this exploratory study, we encountered examples of English expression which indicated that the EAL student was struggling to develop fluency, for example:

Doug leaves his home and move far away from his family to the city. There he have house with an unknown people and he have feeling of loneliness and unhappy. He is not able to get the job and had very small income. He was usually sad and feel bad in himself. It is all these factors lead to a poor health.

We were also able to recognise patchwriting in text that had been appropriated from multiple sources, and these incidents were usually identified by Turnitin® (n.d.) and exemplified by a ‘rainbow’ of colours in the similarity report demonstrating different sources. However, in the essays under investigation the text demonstrated the inclusion of synonyms resulting in writing which was largely unintelligible. Further to this, there had been no manipulation of the syntax of the sentences, which heightened the unidiomatic nature of the writing. Whereas in patchwriting synonyms are manually substituted by the student, online paraphrasing tools achieve this through an automatic function, and thus the question arises, as posited by Rogerson and McCarthy (2017), as to whether the use of online paraphrasing tools transcends patchwriting to become what Walker describes as illicit paraphrasing (in Pecorari 2003, p.9).

Expected medical terminology

One of the most obvious issues we encountered in the essays was the use of synonyms for standard medical terminology. Standardised nomenclature and terminology are employed throughout health care to avoid ambiguity in documentation and communication. This provides the interface for meaningful and appropriate communication of medical, nursing and allied health information regarding patient care, and is an essential element of safety and standardisation in care (Pearson and Aromataris 2009). In addition, this terminology is used for medical information classification, and has been raised as a priority area in the introduction of electronic health records to ensure interoperability across systems and health disciplines (Monsen et al. 2010). The importance of employing correct and predictable terminology has been identified as paramount in avoiding adverse outcomes:

Current research indicates that ineffective communication among health care professionals is one of the leading causes of medical errors and patient harm. (Dingley et al. 2008, p.1)

Therefore, the acquisition and correct contextual application of medical terminology is a fundamental part of learning in health sciences. Students are exposed to this terminology throughout their studies, and in the case of the assessment task under scrutiny, students were provided a scenario, or enquiry prompt, which included the standard discipline-based terminology (see Appendix). The lack of standard medical terminology and the inclusion of unusual synonyms for this terminology was a significant feature of the essays. In the event that students were exhibiting difficulties with English expression, or were manually substituting synonyms as seen in patchwriting, it would be expected that the standard terminology would be preserved. This led us to suspect, and subsequently investigate, online paraphrasing tools.

Paraphrasing tools

Spinning is a technique used to produce a new document, or documents, from an original text source by replacing words in such a way as to retain the overall meaning of the text, while avoiding machine-based text matching tools used to identify plagiarism. Machine-based paraphrasing tools were developed to enable text spinning as a way of improving website rankings in Google search results and are part of a suite of search engine optimisation (SEO) techniques referred to as Black Hat marketing (Lancaster and Clarke 2009; Rogerson and McCarthy 2017; Zhang et al. 2014).

In web-based marketing the goal is to get the highest ranked place in a Google search index.

The Google search engine identifies and calculates the frequency of links between, and website traffic to, each website and ranks sites on the search results accordingly. In Black Hat marketing, the aim is to create sites including blogs, articles and webpages which provide multiple links to the target page, thus ensuring optimisation of the search engine results and a higher overall ranking (Bailey 2018).

Google search engines use word matching software which can recognise duplicate text, and penalties are applied where this has been detected, hence the need to create paraphrasing tools which will instantly produce duplicate text material which cannot be detected. These paraphrasing tools were designed to hoodwink word matching software but were not intended to emulate human generated text. It is apparent that students are now using these tools to spin text from numerous original sources with the aim to deceive word matching software such as Turnitin® (n.d.).

The free online automated paraphrasing tools rely principally on synonym substitution without altering the overall syntax of the sentence, resulting in language which is unidiomatic at best, incomprehensible at worst.

When Rogerson & McCarthy published in 2017, they reported that a simple Google search for paraphrasing tools resulted in over 550,000 hits. Our search in 2018 demonstrated a proliferation of paraphrasing sites resulting in over 3,320,000 hits. Cursory examination revealed that many are duplicate sites with the same tool offered under different names. Of greater concern is the increased juxtaposition of advertisements and links to essay purchasing services. Anticipating the vulnerability of the student, some sites offer a free paraphrasing tool but ensure the output is extremely poor.

For example, when the following sentence taken from the assessment scenario:

One day, while Doug was out walking, he felt lightheaded and then lost consciousness and fell to the ground. He was brought to the Emergency Department of a major hospital by ambulance for assessment and investigation.

is entered into free online paraphrasing tools, the following results were obtained:

Plagiarisma (http://plagiarisma.net/spinner.php): Brace girl, stretch Doug was at large peripatetic, he felt lightheaded and fit lost consciousness and fell to the ground. He was debasement to the Danger Diversify of a chief sanatorium by ambulance for weight and criticism.

Rephraser (https://www.rephraser.net/instant-paraphrasing-tool/): One sidereal day, while Doug was out walk, he felt lightheaded and then lost knowingness and downslope to the pulverization. He was brought to the Emergency Department of a major hospital by ambulance for assessment and probe.

This word salad is used to entice students into contract cheating, that is, outsourcing the assessment task to be completed by a third party (Lancaster and Clarke 2006). The sites provide a link to an essay writing service, in one case with a curiously poorly worded advertisement stating:

Aren’t satisfied with the results? But what to expect from the tool? Hire an expert for a quality rewording! Only $8.39/page. (Paraphrasing Online, https://www.paraphrasingonline.com)

Paraphrasing tools work by creating an intermediate text referred to as “spintax”, where a number of synonyms are provided for each selected word, for example the phrase:

the junior doctor in the rehabilitation centre prepared a discharge summary

is transformed into the intermediary spintax:

the {understudy specialist | lesser specialist | lesser pro} in the {recovery fixate | recovery focus | rebuilding centre} prepared a {release rundown | release report | blueprint}.

Based on a number of parameters, words can be substituted at varying rates within a sentence; however, the process is non-deterministic. Therefore, for the purpose of Black Hat marketing, this provides a vast number of permutations for the creation of articles which are sufficiently different from each other to evade detection by word matching software (Bailey 2018). This explains why students using paraphrasing tools may generate apparently different essays from a single seed document.
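The spintax expansion step described above can be sketched in a few lines of Python. This is a minimal, hypothetical implementation for illustration, not the code of any actual spinning tool; real tools add nesting, phrase-level replacement, and substitution-rate controls:

```python
import random
import re

# Matches one innermost {option | option | ...} group.
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")

def expand_spintax(text, rng=random):
    """Expand spintax by replacing each {a | b | c} group with one
    randomly chosen option, left to right, until none remain."""
    while True:
        match = SPIN_GROUP.search(text)
        if match is None:
            return text
        choice = rng.choice([opt.strip() for opt in match.group(1).split("|")])
        text = text[:match.start()] + choice + text[match.end():]

# The spintax example from the text above:
spintax = ("the {understudy specialist | lesser specialist | lesser pro} in the "
           "{recovery fixate | recovery focus | rebuilding centre} prepared a "
           "{release rundown | release report | blueprint}")
print(expand_spintax(spintax))
```

Because each group is resolved by an independent random choice, repeated runs over the same seed document yield different surface texts, which is exactly why spun essays from one source can evade word matching.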

To create the spintax, a bank of potentially alternative terms is held in a synonym dictionary, which may be local to the paraphrasing tool, or held in cloud storage (Shahid et al. 2017; Zhang et al. 2014). In their study, Zhang et al. (2014) were able to access this dictionary and reverse engineer two paraphrasing tools (Plagiarisma and The Best Spinner) to establish which words are subject to synonym substitution, referred to as ‘mutables’, and which words do not appear in the synonym dictionary and thus would not be included in the spintax, referred to as ‘immutables’. This approach, referred to as DSpin, relies on comparing the unchanged text, or immutables, located within the spun text to the original text (Zhang et al. 2014). The match of immutable terms between documents (spun and original) will provide evidence of the source of the text. We became interested in the concept of immutable words and how these may be used to identify documents that had been machine paraphrased.
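The immutable-word comparison at the heart of this idea can be sketched as follows. The synonym dictionary and sentences here are illustrative stand-ins, not data from the DSpin study, and the real method reverse engineers each tool's actual dictionary:

```python
def immutable_match(original, suspect, synonym_dictionary):
    """Fraction of the original's immutable words (words absent from the
    spinner's synonym dictionary) that survive unchanged in the suspect
    text. A simplified sketch of the DSpin comparison."""
    suspect_words = set(suspect.lower().split())
    immutables = [w for w in original.lower().split()
                  if w not in synonym_dictionary]
    if not immutables:
        return 0.0
    return sum(w in suspect_words for w in immutables) / len(immutables)

# Hypothetical mutables: words the spinner would substitute.
mutables = {"lightheaded", "consciousness", "awareness"}
original = "doug felt lightheaded and lost consciousness"
spun = "doug felt bleary eyed and lost awareness"
print(immutable_match(original, spun, mutables))  # → 1.0
```

A high score means the suspect text preserves exactly those words the tool could not substitute, which is evidence that it was spun from the original.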

The paraphrasing tools that require a fee-based subscription provide a large number of parameters to manipulate the output, including the contents of the dictionary, the maximum number of synonyms used, the replacement frequency, and the replacement of both single words and short phrases (Shahid et al. 2017). In this study we assumed that students were accessing the fee-free versions of online paraphrasing tools, whose output is less subject to control; as a result, more words are treated as mutables and synonym substitution is less discretionary.

As medical terminology is fundamental to the discourse of health sciences, it would be reasonable to classify these words as preferentially immutable . However, the paraphrasing tools do not have the capacity to recognise the significance and importance of these terms, and thus they are within the synonym dictionary as mutables and subject to synonym substitution.

Students in this unit of study are exposed to medical terminology throughout the curriculum, and it is emphasised that these terms are fundamental to the discourse and required for communication in health sciences. Hyland ( 2006 ) notes that becoming a member of a discourse community involves “learning to use language in disciplinary approved ways” (p.38). They are expected to use these terms, and it is clear in the rubric and marking guides that the assessment is aligned to the objective of the acquisition of this specialised language. The scenario provided in this assessment was rich and replete with the terminology, and there was ample opportunity for imitation and reproduction of the writing style and nomenclature. Therefore, the absence of the recognised terminology and the inclusion of unidiomatic and contextually invalid synonyms was particularly obvious to the readers.

Method of analysis

Identifying the use of paraphrasing tools

It could be argued that the use of synonyms, in particular archaic or unidiomatic words and phrases, is a clear indicator that machine generated paraphrasing has been used. For example, in the papers submitted by students where the use of paraphrasing tools was suspected, the term aboriginal man was substituted with autochthonic person , the hospital became the mending office , the rehabilitation centre the recovery fixate , and the discharge summary the release precis .

In order to investigate the extent to which paraphrasing tools substituted recognised and expected medical terms for unusual synonyms, we selected three essays which we had identified as particularly unusual. We did not know the provenance of these essays, although there was structural evidence that they might have arisen from a single seed document which was an essay submitted by one student in the current cohort.

Table  1 shows the variation from the expected nomenclature.

Comparing online language translation and paraphrasing tools

Prior to learning of the existence of online paraphrasing tools, we had assumed that students were authoring work in their first language and then using online translation tools to convert the text to English. Perhaps the most notable freely available online translation tool, Google Translate™, was launched in 2006 using a statistical machine translation engine to translate text from one language, via English, into the target language. In 2016 Google implemented a Neural Machine Translation engine, which has provided a more sophisticated and accurate output (Le and Schuster 2016). Given the idiomatic nature of language, errors may still occur where a word is translated into a synonym that is not contextually valid.

To investigate the possibility that students had used Google Translate™, the scenario provided as the enquiry-based learning prompt was used as a seed document to ascertain the changes which might occur when paraphrasing tools and Google Translate™ were employed. The scenario ( Appendix ) was put through a number of paraphrasing tools, and in each case the standard medical terminology was consistently changed. When the scenario was put through Google Translate™, the terminology was changed only rarely.

The scenario document was subjected to iterative language translation (Day et al. 2016). The text was entered into Google Translate™ for translation into a language other than English, and this translation was copied and re-entered into a refreshed Google Translate™ page for translation back into English. The target languages used were Arabic, Punjabi, Hindi, Chinese (Simplified), Chinese (Traditional) and Vietnamese. These languages were chosen because they represent the principal first languages of the EAL students enrolled in this subject.
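The round-trip procedure is mechanical enough to script. The sketch below assumes a generic `translate(text, src, dst)` callable; the study itself used the Google Translate™ web page by hand, and real automation would need a translation API, so the stand-in translator here exists only to make the two passes visible:

```python
def round_trip(text: str, target_lang: str, translate) -> str:
    """One iteration of iterative language translation:
    English -> target language -> back to English."""
    intermediate = translate(text, "en", target_lang)
    return translate(intermediate, target_lang, "en")

# Toy stand-in that tags the text instead of translating it, so the
# two passes are visible without calling an external service.
def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"

print(round_trip("the discharge summary", "vi", fake_translate))
# [vi->en] [en->vi] the discharge summary
```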

The translations were of a generally good quality, displaying minor errors in tense and pronoun gender, but could be easily comprehended. The most accurate translations were Chinese (Simplified and Traditional) and Vietnamese, and the highest number of errors occurred in Arabic, Hindi and Punjabi. In the latter languages there were more substitutions for standardised health terms (Table  2 ).

The original scenario was then put through six paraphrasing tools selected as the top entries generated by a Google search using the term ‘paraphrasing tools’ . This technique follows that used by Rogerson and McCarthy ( 2017 ) based on the assumption that students would use a similar search strategy and select the sites listed at the top of the search results (Table  3 ).

It was not known whether these sites were using the same paraphrasing tool; however, given the multiple outputs available through non-discriminatory synonym substitution, there was ample opportunity for diverse output.

The output texts were analysed for synonym substitution of recognised and expected medical terminology, and this was compared to the outputs from the iterative language translation through Google Translate™. This technique was used for convenience, as the intention was to gain an overall impression of the extent to which medical terms were substituted by paraphrasing tools compared with Google Translate™. As can be seen from Table 4, the proportion of substituted terms was significantly different. For the 21 standard medical terms, there were 73 synonyms from the paraphrasing tools and 7 alternative terms from Google Translate™. Blank spaces in the table indicate that no alternative term was generated by Google Translate™.
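Expressed as average substitutions per seed term, the counts reported from Table 4 make the contrast plain (a back-of-the-envelope calculation on the figures above):

```python
terms = 21       # standard medical terms in the seed scenario
para_subs = 73   # synonym substitutions across the six paraphrasing tools
gt_subs = 7      # alternative terms produced by Google Translate

print(round(para_subs / terms, 2))  # 3.48 substitutions per term
print(round(gt_subs / terms, 2))    # 0.33 substitutions per term
```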

Although it is not within the scope of this brief exploratory study to state that there is a measurable difference in synonym substitution between paraphrasing tools and Google Translate™, the above results give a general indication of the observable differences.

When determining whether there is a potential breach of academic integrity, it is important to distinguish between extremely poor English skills, the use of a LOTE-to-English translation device, and the generation of text through a paraphrasing tool. Carter and Inkpen (2012, p.49) note that “Machine translated text often seems to be intuitively identifiable by proficient speakers of a language”. If a student has used paraphrasing tools to alter a text to evade detection of plagiarism, then that act of evasion suggests that plagiarism has occurred. Word matching software such as Turnitin® (n.d.) has proven valuable in identifying replication of text from other sources. However, the very purpose of paraphrasing tools is to deceive software developed to detect plagiarism, and it is apparent that to date this strategy has been successful (Lancaster and Clarke 2009; Rogerson and McCarthy 2017; Shahid et al. 2017). Consequently, the burden of detection remains with the human reader, who must become increasingly adept at spotting stylistic variations and other flags relating to mechanisms used to avoid detection (Gillam et al. 2010).

The method of detection we suggest, identifying the absence of expected nomenclature such as discipline based terminology, could be considered an extrinsic analysis of the text. The expected immutables of recognised medical terms are substituted with synonyms, and thus treated by the paraphrasing tools as mutables . The paraphrased text is compared to an ideal or external text, that is, the text containing the medical terminology which was expected by the assessor. Shahid et al. ( 2017 ) propose a method of intrinsic analysis of paraphrased text through stylometric analysis:

We observe that style, language, grammatical constructs, and certain linguistic expressions in spun documents deviate from a human author because spinning software introduce artefacts in their output which are specific to a text spinner. (p. 5)

The technique described in their study involves the application of a number of algorithms to a selected text, which can lead to identification of the source text. This level of analysis is not currently available to academic staff seeking to identify plagiarism committed through the use of paraphrasing tools. However, Turnitin® (n.d.) is developing an Authorship Investigation tool which will use stylometric and forensic linguistic analysis to provide measurement parameters indicative of the authorship of a text (https://www.turnitin.com/solutions/authorship-investigation). Where there is suspicion that contract cheating has occurred, the Authorship Investigation tool will use examples of previous work submitted by a student to ascertain the similarity of stylistic features to the work under suspicion. The premise is that a stylometric ‘fingerprint’ of the student’s literary style and expression can be used for comparison with submissions which may have been outsourced to another author. It is anticipated that this tool will be useful in determining whether a submission has hallmarks which distinguish it from other pieces of writing by the student, but it will not be possible to identify the author of the outsourced work.

In this exploratory study we identified linguistic features of spun text which indicated the use of paraphrasing tools. However, we were reliant on the curious case of the blessed messengers to point towards collusion. This was achieved through close collaboration by the marking staff, and until techniques for reverse engineering of paraphrased text become more widely available, “What ultimately leads to determinations of plagiarism is considerable manual analysis and subjective judgement” (Bretag and Mahmud 2009 , p.54).

Students, and in particular those from an EAL background, experience significant challenges in conforming to academic conventions such as paraphrasing. The availability of free online paraphrasing tools may appear to them as a realistic solution to these challenges despite the word salad which is created by these tools. Whereas EAL students who write original work in their first language and then use online translation tools to convert this to English may be demonstrating poor academic practice, it can be argued that the submitted work is a result of their own intellectual endeavours. Unfortunately, students who use paraphrasing tools to spin text from undisclosed sources, thus evading word matching software, have committed an overt act of academic dishonesty.

In academic writing in the health science discipline, there is an expectation that standard medical terminology will be used. We noted the absence of this terminology in the students’ submissions and investigated the outputs of both paraphrasing tools and Google Translate™. We noted that paraphrasing tools are significantly more likely to substitute inappropriate synonyms for accepted medical nomenclature, whereas Google Translate™ largely preserved these terms intact.

When paraphrasing tools have been applied to a text, the output is frequently of such poor quality as to render it unintelligible. We also noted the following features: the language generated is notable for unidiomatic words and phrases; expected vocabulary, such as standard medical terminology, is usually substituted with inappropriate synonyms; and word-matching software, such as Turnitin® (n.d.), may not match the re-engineered text to its source, producing a low similarity index that is not indicative of the actual level of plagiarism.

When online translation tools, such as Google Translate™, are used to convert text from a language other than English into English, discipline-specific nomenclature, such as standard medical terminology, is less likely to be changed than it is by paraphrasing tools.

This study demonstrates that there are a number of distinct features which can be identified in the text generated by paraphrasing tools. Awareness of these features will assist in the process of detecting plagiarism. While the emphasis should be on supporting students to develop the skills required to paraphrase appropriately, identifying linguistic markers which provide evidence of the use of paraphrasing tools will be of benefit in the overall management of breaches of academic integrity.

Abbreviations

CAT: Computerised Axial Tomography Scan

EAL: English as an Additional Language

ED: Emergency Department

LOTE: Language other than English

References

Bailey J (2018) A brief history of article spinning. March 8, 2018 Plagiarism Today. https://www.plagiarismtoday.com/2018/03/08/a-brief-history-of-article-spinning/ . Accessed 15 Aug 2018

Bretag T (2007) The emperor's new clothes: yes, there is a link between English language competence and academic standards. People Place 15(1):13


Bretag T, Mahmud S (2009) A model for determining student plagiarism: electronic detection and academic judgement. J Univ Teach Learn Pract 6(1):49–60

Brimble M, Stevenson-Clarke P (2006) Managing academic dishonesty in Australian universities: implications for teaching, learning and scholarship. Account Account Perform 12(1):32–63

Burke D, Sanney K (2018) Applying the fraud triangle to higher education: ethical implications. J Leg Stud Educ 35(1):5–43


Carroll J (2015) Making decisions on management of plagiarism cases where there is a deliberate attempt to cheat. In: Bretag T (ed) Handbook of academic integrity. Springer, Singapore, pp 567–622

Carter D, Inkpen D (2012) Searching for poor quality machine translated text: learning the difference between human writing and machine translations. In: Canadian conference on artificial intelligence. Springer, Heidelberg, pp 49–60

Coren A (2011) Turning a blind eye: faculty who ignore student cheating. J Acad Ethics 9(4):291–305

Correa M (2011) Academic dishonesty in the second language classroom: instructors’ perspectives. Mod J Lang Teach Methods 1(1):65–79

Curtis GJ, Vardanega L (2016) Is plagiarism changing over time? A 10-year time-lag study with three points of measurement. High Educ Res Dev 35(6):1167–1179

Day S, Williams H, Shelton J, Dozier G (2016) Towards the development of a Cyber Analysis & Advisement Tool (CAAT) for mitigating de-anonymization attacks. Paper presented at the 27th modern artificial intelligence and cognitive science conference, Dayton, Ohio, USA, April 22–23, 2016.

Dingley C, Daugherty K, Derieg MK, Persing R (2008) Improving patient safety through provider communication strategy enhancements. In: Advances in Patient Safety: New Directions and Alternative Approaches, vol 3. Agency for Healthcare Research and Quality, USA

Gillam L, Marinuzzi J, Ioannou P (2010) Turnitoff-defeating plagiarism detection systems. Subject Centre for Information and Computer Sciences . Available via http://epubs.surrey.ac.uk/790662/2/HEA-ICS_turnitoff.pdf . Accessed 16 Sept 2018

Google Translate™ (n.d.) https://translate.google.com.au . Accessed 16 Aug 2018

Handa N, Power C (2005) Land and discover! A case study investigating the cultural context of plagiarism. J Univ Teach Learn Pract 2(3):8

Hyland K (2006) English for academic purposes. An advanced resource book. Routledge, London

Howard RM (1999) Standing in the shadow of giants: Plagiarists, authors, collaborators (No. 2). Greenwood Publishing Group.

Jamieson S (2015) Is it plagiarism or patchwriting? Toward a nuanced definition. In: Bretag T (ed) Handbook of academic integrity. Springer, Singapore

Keith-Spiegel P, Tabachnick B, Whitley B, Washburn J (1998) Why professors ignore cheating: opinions of a national sample of psychology instructors. Ethics Behav 8(3):215–227

Lancaster T, Clarke R (2006) Eliminating the successor to plagiarism? Identifying the usage of contract cheating sites. In: Proceedings of 2nd international plagiarism conference

Lancaster T, Clarke R (2009) Automated essay spinning–an initial investigation, Paper presented at the 10th annual conference of the subject Centre for Information and Computer Sciences

Le Q, Schuster M (2016) A neural network for machine translation, at production scale. Google AI Blog. Available via https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

Marshall S, Garry M (2006) NESB and ESB students’ attitudes and perceptions of plagiarism. Int J Educ Integr 2(1):26–37

Monsen K, Honey M, Wilson S (2010) Meaningful use of a standardized terminology to support the electronic health record in New Zealand. Appl Clin Inform 1(4):368

Mundt K, Groves M (2016) A double-edged sword: the merits and the policy implications of Google translate™ in higher education. Eur J High Educ 6(4):387–401

Paraphrasing Online (n.d.) https://www.paraphrasingonline.com . Accessed 14 Aug 2018

Paraphrasing Tool (n.d.) https://paraphrasing-tool.com . Accessed 14 Aug 2018

Pearson A, Aromataris E (2009) Patient safety in primary healthcare: a review of the literature. Australian Commission on Safety and Quality in Health Care, Adelaide

Pecorari D (2003) Good and original: plagiarism and patchwriting in academic second-language writing. J Second Lang Writ 12(4):317–345

Plagiarisma (n.d.) http://plagiarisma.net/spinner.php . Accessed 14 Aug 2018

PrePostSEO (n.d.) https://www.prepostseo.com/free-online-paraphrasing-tool . Accessed 14 Aug 2018

Rewriter Tools (n.d.) https://www.rewritertools.com/paraphrasing-tool . Accessed 14 Aug 2018

Rogerson AM, McCarthy G (2017) Using internet based paraphrasing tools: original work, patchwriting or facilitated plagiarism? Int J Educ Integr 13(1):2

SEOMagnifier (n.d.) https://seomagnifier.com/online-paraphrasing-tool Accessed 14 Aug 2018

Shahid U, Farooqi S, Ahmad R, Shafiq Z, Srinivasan P, Zaffar F (2017) Accurate Detection of Automatically Spun Content via Stylometric Analysis. In: Data Mining (ICDM), 2017 IEEE International Conference. IEEE, New Orleans

Shi L (2006) Cultural backgrounds and textual appropriation. Lang Aware 15(4):264–282

Sutherland-Smith W (2005) Pandora’s box: academic perceptions of student plagiarism in writing. J Engl Acad Purp 4(1):83–95

Thomas A, De Bruin G (2012) Student academic dishonesty: what do academics think and do, and what are the barriers to action? Afr J Bus Ethics 6(1):13–24

Turnitin® (n.d.), Introducing Authorship Investigation. https://www.turnitin.com/solutions/authorship-investigation . Accessed 21 Sept 2018

Zhang Q, Wang DY, Voelker GM (2014) DSpin: detecting automatically spun content on the web. NDSS https://doi.org/10.14722/ndss.2014.23004


Funding

No funding was sought or provided for this study.

Availability of data and materials

No data or materials outside of those presented in the manuscript are held by the authors.

Author information

Authors and Affiliations

La Trobe College Australia, Sylvia Walton Building, Bundoora, VIC, 3086, Australia

Felicity M. Prentice & Clare E. Kinden


Contributions

FP 80%. CK 20%. Both authors have read and approved the final manuscript.

Corresponding author

Correspondence to Felicity M. Prentice .

Ethics declarations

Authors’ information

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Scenario for assessment task

Doug is a nineteen year old aboriginal man who has had Type I Diabetes Mellitus since he was 13. Doug was born in a small town in a remote area of Victoria. Despite not completing Year 8, he decided to move away from his family to the city. He has not been able to get a job and has very little income. He misses his family, friends and community, but is determined that they not find out that he is unhappy. Doug has a history of homelessness but has been living for the past 2 weeks in a share house with people he doesn’t know well. Doug does not see the same doctor for his diabetes , he visits many different clinics , depending on where he is living at the time.

As a consequence of the fall, he sustained a head injury which resulted in severe and persistent headaches, loss of coordination and difficulty with walking. In the Emergency Department (ED) a number of tests were undertaken (including a CAT scan , blood sugar test and full examination). It was identified that prior to the fall he had experienced an episode of ‘ insulin shock ’.

Following an 8 h stay in the Emergency department , Doug was transferred to the Neurology ward for assessment and monitoring. In addition to the medical records available in the central electronic filing system, a member of the ED team provided a ‘ handover ’ to the Nurse Unit Manager of the Neurology Ward .

It was when he was on this ward that the nursing staff identified that Doug has limited knowledge of his diabetes including where to access support and advice, and how to monitor his glucose levels and adjust his insulin dose properly. A team consisting of nurses , the ward physiotherapist , a social worker , and a neurologist met on three occasions to discuss Doug’s case. They used the information from the Emergency Department admission , the assessments undertaken by the team of health professionals , and included Doug in all their decisions. They identified his issue with Diabetes management, but as this was a short admission to the Neurology Ward , they did not have the resources to follow this up. After a 4 day stay in the Neurology ward , the healthcare team decided that Doug would benefit from being transferred to a rehabilitation centre . A junior doctor who had just joined the ward was given the task of writing the discharge summary .

Doug was taken by patient transport to a rehabilitation centre which was not part of the acute hospital , but an independently run organisation. The brief discharge summary was sent with Doug describing the initial head injury and noting the need for ongoing therapy to assist his co-ordination and walking. While in the rehabilitation centre , Doug was assessed by the physiotherapist , occupational therapist , doctor , and of course the nursing staff who monitored Doug daily. They did not seek any additional information from the acute hospital , only using the discharge summary as a basis for Doug’s care. They did not formally meet, but they each wrote notes in Doug’s medical record .

On day six of his admission to the rehab centre , the Nurse Unit Manager observed Doug confidently walking in the ward corridor by himself. As a very experienced Rehab Nurse she decided that Doug could be discharged home based on his ability to independently toilet and ambulate. In addition, she was under considerable pressure by the senior management of the Rehab Centre to discharge patients to free up beds. Without consulting the other staff, the Nurse Unit Manager informed Doug that he was to be discharged the following day as he now appeared fine and had no consequences from his “little bump on the head.”

Doug was discharged the next day and returned to the house he was sharing. None of his housemates had even realised he had been away. Five days following his discharge home, Doug was again admitted to the ED by ambulance, having suffered a fall at home while trying to descend the stairs from the second floor where his bedroom was located. He fractured his left tibia as a result of the fall. He told the ED staff that he had not been eating well, but that he had still injected his usual insulin dose just prior to the fall.

(Bold and italics provided by authors to highlight standard medical terminology).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article.

Prentice, F.M., Kinden, C.E. Paraphrasing tools, language translation tools and plagiarism: an exploratory study. Int J Educ Integr 14 , 11 (2018). https://doi.org/10.1007/s40979-018-0036-7


Received : 28 September 2018

Accepted : 15 November 2018

Published : 19 December 2018

DOI : https://doi.org/10.1007/s40979-018-0036-7


  • Online language translation
  • Medical terminology

International Journal for Educational Integrity

ISSN: 1833-2595



Compilatio

Translation = plagiarism?


Manon is writing her thesis. She wondered: “I found an interesting text in a language other than English. If I translate it, should I mention the source?”

Summary: In which language should I search on the web? Are you free in your translation? Why cite others’ sources?

1. In which language should I search on the web?

The search for information is becoming international!

Since the arrival of the Internet, knowledge sharing has become easier, and automatic translators have made that knowledge much more accessible. Some languages are richer than others in terms of web content; for example, English is the most common language on the web and accounts for more than half of all Internet data.

"English accounting for 54 percent of the top 10 million websites." " Two Worlds: Languages IRL and Online " from the website statista.com

However, depending on the subject matter, it is a good strategy to explore these different avenues and languages in order to have an adequate overview of the subject you wish to address.


2. Are you free in the translation?

The information, once found, remains to be translated. The translation may deviate slightly from the original text to improve comprehension. If you are not bilingual, you can use translation applications available through open-access sites such as Google Translate or DeepL Translator. Word-for-word translation is not always accurate; if you have any doubts, you can quote the sentence in its original language in quotation marks, credit the author, and explain its meaning. You can also paraphrase the information you found in your own words, without quoting the text but still mentioning the author.

You are therefore free to translate information found in foreign languages. However, it is important to cite your sources .

3. Why cite others’ sources?

For further reflection

When the reader is interested in a subject, they often want to deepen their knowledge . The reader of your work must have the opportunity to do so with the sources cited and your well-constructed bibliography .

"Readers can use the list of references to  explore a subject further and enrich their understanding . Indeed, readers can easily and quickly find the references for the works they would like to explore." " An effective bibliography: great, but how? " from Compilatio.

Real and concrete information

Where did this information come from? A magician’s hat? From your girlfriend? Or a specific source? The source allows the reader to validate the information presented and gives a real and concrete dimension to what you’ve written, making you more credible to the reader.

Expert position  

You have not created all the content of your work, and that is a fact. However, you did some research that led you to a deep understanding of the subject. This research is what makes you an expert on the presented subject, and it should be showcased.

"Have you considered this? By making the choice not to plagiarise and to provide your own ideas,  you yourself become an "inventor" " " Why is plagiarism prohibited? What are my incentives to respect copyright? " from Compilatio

Quiet conscience  

Doing a job that requires a lot of time and investment is always a little stressful. Save yourself the added stress of wondering whether your proofreader will realise that some parts have been plagiarised: cite your sources. Build an effective bibliography and check your work with Compilatio Studium, the plagiarism detection and writing assistance software. You’ll set your mind at ease, and will be better for it in the long run.

Specific questions during defense  

When it comes time for the oral defense of your thesis, dissertation, or assignment, the teachers will ask you specific questions . If you master your subject, it will only be a formality. But, if you have plagiarized an idea, that's when the teachers will expose you (if they haven't done so before). They will ask you for explanations to ensure that you fully understand what you have described in your brief.

"If you are not certain that you have fully grasped the author's concept, don't incorporate it in your project. You will have a hard time  putting it into your own words  and your teacher will quickly sense that you are uncomfortable with the theory, then will probably  ask you questions specifically on this point ." " The power of paraphrasing " Compilatio

Tribute to the original author  

Whether in their own language or not, the original author has made an effort to reflect and write and it is fair and honest to mention it to pay tribute to them. Furthermore, the law protects authors.

According to article L122-4 of the Intellectual Property Code: "Any representation or reproduction, in whole or in part, made without the consent of the author or his successors or assigns is unlawful. The same shall apply to translation, adaptation or transformation, arrangement or reproduction by any art or process whatsoever."

To answer Manon, translating without citing the original author is considered plagiarism and is called translingual plagiarism.

Sources: “Two Worlds: Languages IRL and Online”, Statista.com, 19/02/2019. Consulted on 17/03/2022. “Translation Plagiarism: burning issue in modern plagiarism detection”, Medium.com. Consulted on 17/03/2022. Further reading: “An effective bibliography: great, but how?”, Compilatio, 17/02/2022. “The power of Paraphrasing”, Compilatio, 08/03/2022. “Quotation rules to avoid plagiarism: how to properly cite your sources”, Compilatio, 28/01/2022.


A Primer on Plagiarism: Resources for Educators in China

Gregory C. Gray

Duke Kunshan University (China), Duke University (USA), and Duke-National University of Singapore (Singapore).

Laura K. Borkenhagen

Duke University.

Nancy S. Sung

National Science Foundation and formerly Head of the National Science Foundation’s Beijing Office, from 2014-2018.

Shenglan Tang

Global Health Research Center at Duke Kunshan University and Professor at Duke University.

Introduction

In the past 20 years, China has experienced rapid development in scientific publication (Van Noorden, 2016). In 2016, more than 470,000 scientific articles were published by Chinese researchers, bringing China to the top of the list of countries with the largest number of published articles (National Science Board, 2018). At the same time, commitment to scientific ethics in China has come into question, as China also leads the list of countries with the highest proportion of retractions (Ataie-Ashtiani, 2017). Among reasons for retractions, plagiarism is a particular concern (Ataie-Ashtiani, 2017; Lei & Zhang, 2017; Mack, 2016; Qiu, 2015; Van Noorden, 2016).

A 2010 survey of Chinese researchers found that more than half of the respondents felt that academic misconduct was a serious problem in China, with the majority focusing upon plagiarism or inappropriate authorship (Liao et al., 2017). When given the same survey five years later, these perceptions were essentially unchanged (Liao et al., 2017). Lei and Zhang found that the retraction ratio in China increased more than three-fold during the past two decades, with two large peaks in 2010 and 2015 (Lei & Zhang, 2017). About three quarters of all retractions were due to misconduct, and 41% of misconduct was due to plagiarism (Lei & Zhang, 2017) (Table 1).

Table 1 is adapted from Lei & Zhang (2017).

Plagiarism in China has ranged anywhere from a few copied sentences to misappropriation of entire documents. Wei Yang, Former President of the National Natural Science Foundation of China in Beijing, reported instances where documents such as grant proposals had been found for sale on the internet ( Qiu, 2015 ). Publications from China have also been criticized for their lack of references. Among Chinese articles retracted in 2016, the number of references per article was often below 10, which is often interpreted as an indication of lower quality work ( Ataie-Ashtiani, 2017 ).

Consequently, this has led to a bad reputation for Chinese researchers as a whole, with some journal editors admitting a prejudice when reviewing work from unfamiliar Chinese authors. This reputation is further damaged by the observation that 36.6% of Chinese author retractions during the last two decades are from repeat offenders with more than five retractions due to “fraud, plagiarism, or faked peer review” (Lei & Zhang, 2017). Senior scientific scholars are calling for multi-faceted interventions that range from setting up the Committee on Publishing Ethics to media exposure, with education in ethics as the centerpiece of strategies for change (Qiu, 2015).

As senior researchers and educators at a US/Chinese institution, Professors Gray (guarantor) and Tang have had their scientific works plagiarized by Chinese researchers. As the former senior research administrator for the US National Science Foundation efforts in China, Dr. Sung has witnessed the devastating impact of plagiarism on the reputation of Chinese researchers and their institutions. The aims of this review are to summarize the characteristics of different types of plagiarism, to offer suggestions for incorporating ethical training into Chinese educators’ curricula, as well as to point to various available scientific ethics training resources.

Types of Plagiarism

Plagiarism has been defined in multiple ways. The US Department of Health and Human Services Office of Research Integrity (1994) defines plagiarism as “the theft or misappropriation of intellectual property and the substantial unattributed textual copying of another’s work”. Merriam-Webster (2018) defines plagiarize as “to steal and pass off (the ideas or words of another) as one’s own; to use (another’s production) without crediting the source; to commit literary theft; present as new and original an idea or product derived from an existing source.” This stealing can involve not only words but also processes, results, and images (Office of the President, 2000). Types of plagiarism range from the unintentional to the misappropriation of large bodies of text. A number of forms of plagiarism are discussed below and summarized in Table 2.

Table 2. Forms and characteristics of plagiarism.

Direct or word plagiarism

The most common and familiar form of plagiarism is copying of another’s words without appropriate acknowledgement, direct or word plagiarism. The severity of direct plagiarism ranges from copying a series of words to copying an entire manuscript. One standard for this form of plagiarism is if an author uses six consecutive words from another’s work without using quotation marks, even if a reference has been cited ( Masic, 2014 ). Reuse of graphics, figures, and photographs without the original publisher’s written permission and cited source also falls under this category.
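The six-consecutive-words standard lends itself to a mechanical check. The sketch below is a simplification with invented example strings (real detection tools also handle punctuation, paraphrase, and citation context): it extracts word six-grams from two texts and reports the ones they share verbatim.

```python
def shared_ngrams(text_a, text_b, n=6):
    """Return the word n-grams (default: six consecutive words)
    that appear verbatim in both texts -- a crude proxy for the
    six-consecutive-words standard for direct plagiarism."""
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(text_a) & ngrams(text_b)

source = "the quick brown fox jumps over the lazy dog every day"
suspect = "we saw the quick brown fox jumps over the lazy dog too"
matches = shared_ngrams(source, suspect)
# Any non-empty result flags a run of six or more copied words.
```

Note that a shared six-gram only flags candidate text for review; as the gray areas discussed below show, a human must still judge whether quotation marks and citation were required.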

While direct plagiarism may seem like an easily recognized offense, there are still gray areas. For example, in scientific manuscripts, copying results and discussion sections is considered one of the most egregious forms of plagiarism; however, there are differing opinions regarding whether an author has committed plagiarism if he or she copies text from the methods section of his or her own previous report ( Debnath, 2016 ; Vitse & Poland, 2012 ).

Direct plagiarism is especially common when a paper is written in a language other than the author’s native language (Higgins, Lin, & Evans, 2016) and has been made easier in recent decades by the widespread availability of online text (Pechenik, 2010). Anecdotally, one coauthor of this report has had large sections of his and his graduate students’ published reports abstracted, compiled, and reported as a new work by a former collaborating Chinese scientist, who denied the offense upon confrontation. Only after confronting the Chinese author’s academic dean and two journal chief editors was the plagiarized work retracted.

Idea plagiarism

A foundation for scientific advancement is building upon previous scientific findings and ideas. However, if an author presents others’ ideas, designs, models, processes, etc. without citations, then this is considered idea plagiarism. This can occur through copying previously, publicly reported ideas, as well as through informal exposure to novel ideas. When compared to other forms of plagiarism (such as direct plagiarism), idea plagiarism is more difficult to prove, especially if a record of presenting the idea has not been previously archived (Debnath, 2016; Vitse & Poland, 2012). Regrettably, the threat of idea plagiarism may inhibit scientists from discussing their work with other scientific groups lest another team beat them to peer-reviewed publication and assume credit for the novel idea.

Self-plagiarism

Self-plagiarism involves an author or authors presenting previously reported work as new; this may include portions of a previous report or republication of an entire article (Vitse & Poland, 2012). A chief motivator for duplicate publication is to increase publication counts for career advancement. While using text from a previous report might be forgiven in describing a component of a similar method, an author should cite, and often gain publisher permission for, any redundantly reported research results.

As a rule of thumb, the overlap between an author’s or team of authors’ publications should be no more than one-third (World Association of Medical Editors, 2018). Additionally, if an author has previously reported preliminary or partial data in a scientific forum (e.g., an international conference), he or she should cite those previous presentations in the more complete scientific manuscript submitted to a scientific journal for publication (Anderson & Steneck, 2011). Such previous preliminary reports can lead to conflicts in copyright permissions, which scientific journals may need to explore.
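The one-third rule of thumb prescribes no formula, but a rough self-check is possible by measuring what share of a new manuscript's word n-grams already appear in the earlier publication. The sketch below is illustrative only; the trigram choice and the example texts are assumptions, not part of any editorial guidance.

```python
def overlap_fraction(new_text, prior_text, n=3):
    """Fraction of the new text's word n-grams (trigrams here)
    that already appear in a prior publication -- one crude way
    to gauge textual overlap against the one-third rule of thumb."""
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    new, prior = ngrams(new_text), ngrams(prior_text)
    return len(new & prior) / len(new) if new else 0.0

prior = "we measured viral titers in serum samples collected weekly"
new = "we measured viral titers in nasal swabs collected daily from volunteers"
fraction = overlap_fraction(new, prior)  # 3 of 9 trigrams reused
```

Here the shared opening phrase accounts for exactly one-third of the new text's trigrams; a result at or above that level would suggest citing, and possibly seeking permission for, the reused passage.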

Translation plagiarism

Similar to direct plagiarism, another form of plagiarism is to translate novel data or ideas from one language to another, representing it as unique and one’s own creation without crediting the original work. For example, one co-author of this article, together with his colleagues, published a Chinese book based on a nationwide health service survey in the early 1990s. A Chinese student studying at a UK university used the data from the book to write up his PhD dissertation, without citing or mentioning the original work. Another coauthor of this report drafted a detailed scientific proposal to a US granting agency, sharing it with a Chinese collaborator. Later it was learned that the Chinese collaborator engaged his student in translating the document into Chinese for submission to a Chinese funding agency without notifying or crediting the original author. Translational plagiarism may also occur during the peer-review of grant proposals or scientific manuscripts when a reviewing scientist plagiarizes the original text and ideas from a reviewed work.

Source plagiarism

Source plagiarism occurs when authors do not read original sources of new scientific information and instead glean information from those sources by reading and crediting only secondary sources. The problem with source plagiarism is the possible misinterpretation of the original report by reviewing only the secondary source’s interpretation. This can result in the spreading of inaccurate information.

Hence, when citing novel research, one should be careful to review and credit the original published work ( Mohan, Shetty, Shetty, & Pandya, 2015 ). If the original resource is not clearly cited, the authors could be criticized for concealing the original data source and for making it difficult for future researchers to recognize and benefit from the original researchers’ work ( Mohan et al., 2015 ). When gaining access to an original source is a barrier or when referencing the ideas proposed by the secondary source, it is imperative that the secondary source be cited properly as a review.

Ghost and guest writing

Ghost writing occurs when someone contributes significantly to manuscript development but is not credited for their work ( Citrome, 2017 ). Conversely, guest or honorary authorship occurs when someone is recognized as an author but has not met criteria for authorship. Often guest authors are recognized for their administrative or other governance authority for the research work or included due to their scientific accomplishments simply to enhance the probability of publication of the manuscript in a desired journal ( Citrome, 2017 ).

A secondary form of guest authorship is the inappropriate designation of multiple authors as first authors or as senior authors with an indication “that these authors contributed equally” when they clearly could not have contributed equally to the work. This seems especially unethical, and even a ridiculous impossibility, when the “equally contributing” authors number more than two. However, many institutions in China drive such unethical multiple recognition by only valuing the first and senior authorship positions in a scientific manuscript. Ghost and guest authorship can often be fueled by a “blurring of lines” when it comes to what defines a contribution worthy of authorship (Rohwer, Young, Wager, & Garner, 2017).

Who is Plagiarizing and Why?

Students, researchers, and professors in any country have the potential to plagiarize, though some may have more motive or be more susceptible to unintentional plagiarism than others. Students and young researchers may experience greater pressure to be productive publishers in pursuit of a degree or career advancement, leading to less regard to ethical guidelines ( Debnath, 2016 ; Rohwer et al., 2017 ).

While working in China, we have learned that some Chinese degree programs will not permit a student to graduate until the student publishes a first-author peer-reviewed manuscript in a scientific journal of relatively high impact. Students under such pressure are also often more naïve to some forms of plagiarism, making them more likely to commit an offense unintentionally (Armstrong, 1993; Debnath, 2016). One study of Chinese college students enrolled in English-as-a-second-language classes found that while all of these students understood the word “plagiarism,” less than 50% had received formal instruction on plagiarism (Zhang, 2014). This illustrates the importance of proper scientific ethics training early in an academic career.

While students and young researchers may be thought to have the motive to plagiarize, more experienced researchers have faced similar pressures and also been found guilty of plagiarism (Dyer, 2016; National Post, 2012; Retraction Watch, 2018). Over the past decade, more and more Chinese universities and research institutions have used the number of peer-reviewed publications and their impact factor scores as important criteria when considering promotion of faculty and researchers. Also, financial incentives are frequently offered to the first and senior authors of manuscripts published in high-impact journals, especially in leading institutions. These incentives are often not trivial, with some institutions offering bonus payments of up to $165,000 for a single publication (The Economist, 2018). Consequently, there is much competitive pressure to win publication in high-impact English journals, and sometimes this pressure outweighs caution regarding ethical guidelines.

Ignorance of some of the nuances of plagiarism extends beyond students to more senior Chinese researchers. For instance, one study based on in-depth interviews with Chinese researchers found that many did not see a problem with reusing methods descriptions or other text without citations (Li, 2013). Ultimately, researchers at any level may be the perpetrators of plagiarism, whether through ignorance, the unethical behavior of a trusted team member, or simple malfeasance.

While researchers in the U.S. also face competitive pressures, with their career advancement also depending upon high-impact publications, they are more likely than Chinese researchers to have had specific training on research misconduct such as plagiarism. For example, both the U.S. National Institutes of Health (2018) and the National Science Foundation (2018) require that institutions receiving grants develop instruction in the Responsible Conduct of Research (RCR), covering plagiarism as one of many other topics. The Natural Science Foundation of China, which is the primary funder of basic science research in China, does not have such requirements of its grantees. Formal programs at universities in China tend to be limited to covering issues of data security and do not attend to issues such as responsible conduct of research, including plagiarism.

Another cultural issue is the Chinese notion of “imitating the master”: the sense that one learns from and honors a great artist by copying his technique, rather than by innovating. This cultural value, intended as a form of respect, does not justify or excuse the disproportionate level of retractions among Chinese-authored publications. It may, however, diminish the seriousness of plagiarism in the minds of Chinese students in the absence of formal training in scientific ethics.

Why is Plagiarism Harmful?

Simply put, plagiarism is an act of denial of due credit. In general, researchers earn favorable recognition through their published works. Plagiarism, if not corrected, undermines this merit-based system by inappropriately rewarding plagiarizing authors for the honest labor of other authors who developed the original work ( Armstrong, 1993 ; Mohan et al., 2015 ). Hence, plagiarism may be viewed as a form of stealing, corruption, or other malfeasance directly attacking the scientific community which relies upon ethical scientific behavior.

Plagiarism also wastes the time and energy of scientific journal staff, panels of peer-reviewers, and the scientific community in general who must often rely upon journals to publish new knowledge to push their research fields forward. Plagiarism can also impact the success and value of ethical scientists in that review and publication of their works may have to be unnecessarily postponed while the journal and peer reviewers evaluate the plagiarized manuscript ( Anderson & Steneck, 2011 ).

Perhaps worst of all, plagiarism has potential to skew a body of research by reporting published data multiple times ( Anderson & Steneck, 2011 ). This could alter conclusions made in reviews and meta-analyses, and ultimately could influence evidence-based decisions made by clinicians, scholars, and many other professionals.

Strategies for Reducing Plagiarism

Research ethics training.

The first defense against plagiarism is appropriately educating students, researchers, and research faculty regarding the various types of plagiarism and the consequences of such scientific ethics violations. One study published in 2017 suggested that institutional policies on plagiarism in China lack adequate guidelines for ethics education and training (Hu & Sun, 2017). Students, researchers, and research faculty must be very familiar with the requirements to correctly cite previously published or unpublished works and with how quotation marks are to be used when exact text is extracted.

It is also important that researchers understand how to distinguish original ideas or conclusions in previous work from their own thoughts (Pechenik, 2010). Pechenik’s guide offers many suggestions for avoiding plagiarism in the early note-taking and outline-drafting stages of a written work. Keeping a record of where thoughts or text originated, as well as distancing oneself from the original body of work, can help reduce unintentional plagiarism (Pechenik, 2010). Further, it is important to consult primary literature as much as possible and avoid citing reviews or secondary sources (Price, 2014). Reviewing primary literature not only removes the temptation of using someone else’s summarizing thoughts but also helps preserve the findings of the original works. A list of resources for such scientific research ethics training is available in the supplemental material (Table 3).

Table 3. Sources for ethical training on plagiarism.

The second defense in preventing plagiarism is to strengthen ethics training by appealing to the moral fabric and integrity of researchers through multiple value systems, any one of which may register as important to an individual. In China, this could mean appealing to researchers to avoid the stain of corruption that undermines the Communist Party’s success, appealing to the teachings of Confucius to avoid dishonoring one’s family through public humiliation from an ethics violation, or appealing to one’s sense of right and wrong based upon the Judeo-Christian commandment against stealing. Whatever the effective strategy, there seems to be a need to compel Chinese research professionals to avoid crossing the moral “redline” of plagiarism, even under the high pressures of the scientific work environment.

University and institution policies

Plagiarism prevention can also be employed at higher levels through written policies at academic and scientific institutions that would aggressively discourage misconduct. In academia, students, researchers, and faculty should be warned that writing assignments will often be screened with plagiarism detection software, and when detected, violators will be punished following concrete written policies. Examples of violations should be publicly communicated such that individuals and their institutions are concerned.

A recognition of serious consequences for plagiarism can be a powerful deterrent to bad scientific behavior. Many institutions in the USA, Canada, and European countries have written policies regarding punishment for acts of plagiarism, while only a few universities and institutions in China have clearly developed similar policies. One would think that having such no-plagiarism-will-be-tolerated policies, and enforcing them, would give such Chinese institutions a strong stamp of quality that their faculty might acknowledge in manuscript submissions.

As for preventing ghost writing and guest authorship, researchers should be pointed to international guidelines for authorship such as that drafted by the International Committee of Medical Journal Editors ( International Committee of Medical Journal Editors, 2018 ). Minor contributors to a project should only be recognized in the acknowledgements. In addition to adhering to the scientific ethics code themselves, all co-authors on an article need to be vigilant to protect their reputation by ensuring that their colleagues haven’t intentionally or unintentionally plagiarized data. Authors may also wish to screen their draft manuscripts prior to journal submission to avoid any potential embarrassments.

Academic journal policies

Recognition of plagiarism can occur at the time of manuscript submission through manuscript software screening, through peer-review, or after publication. Recognition of plagiarism after publication is clearly more painful for the plagiarizing authors and also for the journal; it embarrasses all involved, especially if formal retractions are demanded by the original authors or the original publishing body. Recognition of plagiarism before publication may not have the same negative impact since the journal may choose to simply reject the manuscript, not wishing to confront the authors or to notify the authors’ institutional administrators. This “no action” response compounds the problem, however. If the authors are not confronted and are ignorant of their ethical violation, they may simply submit the manuscript to another journal, continuing the ethical problem. If the authors are confronted and in denial, they may again choose to push the problem down the road to another journal.

One co-author on this paper was asked to review the same paper submitted sequentially to two different journals, having pointed out to the first that the authors had failed to properly cite sources of data. A better deterrent would be for the journal to explain in the instructions to authors that suspected plagiarized manuscripts will be rejected, and the authors, their institutional leadership, and if egregious, the funding sources will be formally notified. In the notification the journal may indicate a willingness to reexamine the manuscript should the suspected plagiarism be mitigated. This type of journal response would likely inhibit intentional plagiarism and nudge institutions to make sure their researchers had appropriate ethical training.

Research publications are very important. They can have profound impact upon health policy, clinical interventions, and future funded research. As China’s scientific enterprise moves toward center stage in the world, it is imperative that appropriate attention be paid to training its scientists to avoid plagiarism and to adopt global standards for research integrity.

  • China now tops the list of countries with the largest annual number of scientific publications. At the same time, China also leads the list of countries with the highest proportion of scientific publication retractions.
  • The rise in this academic misconduct in China has given Chinese researchers a bad reputation and likely led to lower manuscript acceptance rates in academic journals.
  • Plagiarism can be thwarted by strengthening ethics training for students and researchers, as well as implementing penalties for plagiarism offenses in universities, research institutions, and academic journals.

Gregory C. Gray is a senior researcher and educator in the field of One Health.

Laura K. Borkenhagen coordinates many studies in Southeast Asia and China for the Duke One Health team.

The views expressed in this paper do not necessarily represent the views of the National Science Foundation or the US Government.


  • Anderson MS, & Steneck NH (2011). The problem of plagiarism . Urologic Oncology , 29 ( 1 ), 90–94. doi: 10.1016/j.urolonc.2010.09.013 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Armstrong JD (1993). Plagiarism - what is it, whom does it offend, and how does one deal with it . American Journal of Roentgenology , 161 ( 3 ), 479–484. [ PubMed ] [ Google Scholar ]
  • Ataie-Ashtiani B (2017). Chinese and Iranian scientific publications: fast growth and poor ethics . Science and Engineering Ethics , 23 ( 1 ), 317–319. doi: 10.1007/s11948-016-9766-1 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Citrome L (2017). Authorship: Musings about guests and ghosts . International Journal of Clinical Practice , 71 ( 7 ). doi: 10.1111/ijcp.12986 [ PubMed ] [ Google Scholar ]
  • Debnath J (2016). Plagiarism: A silent epidemic in scientific writing - reasons, recognition and remedies . Medical Journal Armed Forces India , 72 ( 2 ), 164–167. doi: 10.1016/j.mjafi.2016.03.010 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dyer O (2016). Peer reviewer stole article and published it as his own . BMJ , 355 , i6768. doi: 10.1136/bmj.i6768 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Higgins JR, Lin F-C, & Evans JP (2016). Plagiarism in submitted manuscripts: incidence, characteristics and optimization of screening—case study in a major specialty medical journal . Research Integrity and Peer Review , 1 ( 1 ), 13. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hu G, & Sun X (2017). Institutional policies on plagiarism: the case of eight Chinese universities of foreign languages/international studies . System , 66 , 56–68. [ Google Scholar ]
  • International Committee of Medical Journal Editors. (2018). Defining the role of authors and contributors . Retrieved from http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html . Accessed 14 November 2018.
  • Lei L, & Zhang Y (2017). Lack of improvement in scientific integrity: an analysis of WoS retractions by Chinese researchers (1997–2016) . Science and Engineering Ethics . doi: 10.1007/s11948-017-9962-7 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Li Y (2013). Text-based plagiarism in scientific writing: what Chinese supervisors think about copying and how to reduce it in students’ writing . Science and Engineering Ethics , 19 ( 2 ), 569–583. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Liao Q-J, Zhang Y-Y, Fan Y-C, Zheng M-H, Bai Y, Eslick GD, … He H (2017). Perceptions of Chinese biomedical researchers towards academic misconduct: A comparison between 2015 and 2010 . Science and Engineering Ethics , 1–17. [ PubMed ] [ Google Scholar ]
  • Mack C (2016). Plagiarism . Journal of Micro/Nanolithography, MEMS, and MOEMS , 15 ( 4 ). doi: 10.1117/1.JMM.15.4.040101 [ CrossRef ] [ Google Scholar ]
  • Masic I (2014). Plagiarism in scientific research and publications and how to prevent it . Materia Socio-Medica , 26 ( 2 ), 141–146. doi: 10.5455/msm.2014.26.141-146 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mohan M, Shetty D, Shetty T, & Pandya K (2015). Rising from plagiarising . Journal of Maxillofacial and Oral Surgery , 14 ( 3 ), 538–540. doi: 10.1007/s12663-014-0705-x [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • National Institutes of Health. Responsible conduct of research (RCR) training and applying for an NIH training grant . Retrieved from https://www.training.nih.gov/rcr_and_applying_for_nih_training_grants . Accessed 14 November 2018.
  • National Post. (2012). Top Canadian scientist and award-winning student caught in ‘blatant plagiarism’ of text . Retrieved from https://nationalpost.com/news/canada/university-of-waterloo-researchers-issue-retraction-and-apology-after-using-u-s-experts-text-and-information . Accessed 14 November 2018.
  • National Science Board. (2018). Science and engineering indicators 2018 . Retrieved from Alexandria, VA: National Science Foundation; https://www.nsf.gov/statistics/indicators/ . Accessed 14 November 2018. [ Google Scholar ]
  • National Science Foundation. Responsible conduct of research (RCR) . Retrieved from https://www.nsf.gov/bfa/dias/policy/rcr.jsp . Accessed 14 November 2018. [ Google Scholar ]
  • Office of the President. (2000). Office of Science and Technology Policy: Federal policy on research misconduct . Federal register , 65 ( 235 ), 76260–76264. [ Google Scholar ]
  • Pechenik JA (2010). A short guide to writing about biology : New York: Longman. [ Google Scholar ]
  • Price B (2014). Avoiding plagiarism: guidance for nursing students . Nursing Standard , 28 ( 26 ), 45–51; quiz 52. doi: 10.7748/ns2014.02.28.26.45.e8514 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Qiu J (2015). Safeguarding research integrity in China . National Science Review , 2 ( 1 ), 122–125. doi: 10.1093/nsr/nwv002 [ CrossRef ] [ Google Scholar ]
  • Retraction Watch. Plagiarism costs author five papers in five different journals . Retrieved from http://retractionwatch.com/2017/06/28/plagiarism-costs-author-five-papers-five-different-journals/ . Accessed 14 November 2018.
  • Rohwer A, Young T, Wager E, & Garner P (2017). Authorship, plagiarism and conflict of interest: views and practices from low/middle-income country health researchers . BMJ open , 7 ( 11 ), e018467. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • The Economist. (2018). Tsinghua University may soon top the world league in science research . Retrieved from https://www.economist.com/china/2018/11/17/tsinghua-university-may-soon-top-the-world-league-in-science-research . Accessed 17 December 2018.
  • US Department of Health & Human Services Office of Research Integrity. (1994). ORI Policy on Plagiarism . Retrieved from https://ori.hhs.gov/ori-policy-plagiarism . Accessed 14 November 2018.
  • Van Noorden R (2016). China by the numbers . Nature News , 534 ( 7608 ), 452. doi: 10.1038/534452a [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vitse CL, & Poland GA (2012). Plagiarism, self-plagiarism, scientific misconduct, and VACCINE: protecting the science and the public . Vaccine , 30 ( 50 ), 7131–7133. doi: 10.1016/j.vaccine.2012.08.053 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Merriam-Webster. (2018). Definition of plagiarize . Retrieved from https://www.merriam-webster.com/dictionary/plagiarizing . Accessed 14 November 2018.
  • World Association of Medical Editors. Self plagiarism of textbook chapters . Retrieved from http://www.wame.org/about/self-plagiarism-of-textbook-chapters . Accessed 14 November 2018.
  • Zhang C (2014). Plagiarism in their own words: what Chinese and American students say about academic dishonesty . Chinese Journal of Applied Linguistics , 37 ( 3 ), 373–391. [ Google Scholar ]
  • Research article
  • Open access
  • Published: 27 July 2020

Testing of support tools for plagiarism detection

  • Tomáš Foltýnek,
  • Dita Dlabolová,
  • Alla Anohina-Naumeca,
  • Salim Razı,
  • Július Kravjar,
  • Laima Kamzola,
  • Jean Guerrero-Dib,
  • Özgür Çelik &
  • Debora Weber-Wulff

International Journal of Educational Technology in Higher Education, volume 17, Article number 46 (2020)


There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a widespread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

Introduction

Teddi Fishman, former director of the International Centre for Academic Integrity, has proposed the following definition of plagiarism: “Plagiarism occurs when someone uses words, ideas, or work products, attributable to another identifiable person or source, without attributing the work to the source from which it was obtained, in a situation in which there is a legitimate expectation of original authorship, in order to obtain some benefit, credit, or gain which need not be monetary” (Fishman, 2009, p. 5). Plagiarism constitutes a severe form of academic misconduct. In research, plagiarism is one of the three “cardinal sins” known as FFP: fabrication, falsification, and plagiarism. According to Bouter, Tijdink, Axelsen, Martinson, and ter Riet (2016), plagiarism is one of the most frequent forms of research misconduct.

Plagiarism constitutes a threat to the educational process because students may receive credit for someone else’s work or complete courses without actually achieving the desired learning outcomes. Similarly, academics may be rewarded for work that is not their own. Plagiarism may also distort meta-studies, which draw conclusions from the number or percentage of papers that confirm or refute a certain phenomenon. If some of these papers are plagiarized, the number of actual experiments is lower than it appears, and the conclusions of the meta-study may be incorrect.

There can also be other serious consequences for the plagiarist. The cases of politicians who had to resign in the aftermath of a publicly documented plagiarism case are well known, not only in Germany (Weber-Wulff, 2014 ) and Romania (Abbott, 2012 ), but also in other countries. Scandals involving such high-profile persons undermine citizens’ confidence in democratic institutions and trust in academia (Tudoroiu, 2017 ). Thus, it is of great interest to academic institutions to invest the effort both in plagiarism prevention and in its detection.

Foltýnek, Meuschke, and Gipp ( 2019 ) identify three important concerns in addressing plagiarism:

Similarity detection methods that, for a given suspicious document, are expected to identify possible source document(s) in a (large) repository;

Text-matching systems that maintain a database of potential sources, employ various detection methods, and provide an interface to users;

Plagiarism policies that are used for defining institutional rules and processes to prevent plagiarism or to handle cases that have been identified.
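At their core, the similarity detection methods named in the first concern compare documents by looking for shared word sequences. As a rough illustration only (not the algorithm of any specific system tested here), overlap can be measured with word n-grams and a Jaccard score:

```python
def ngrams(text, n=3):
    """Split a text into its set of overlapping word n-grams, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b, n=3):
    """Jaccard overlap of word n-gram sets: 1.0 = identical, 0.0 = disjoint."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

Production systems layer indexing, fingerprinting, and text normalization on top of such a comparison in order to scale it to large repositories; this sketch only shows the underlying idea of matching shared word sequences.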

This paper focuses on the second concern. Users and policymakers expect what they call plagiarism detection software, which is more accurately referred to as text-matching software, to use state-of-the-art similarity detection methods. The expected output is a report in which all passages that are identical or similar to other documents are highlighted, together with links to and information about the potential sources. To determine how the source was changed and whether a particular case constitutes plagiarism or not, an evaluation by a human being is always needed, as many reported results are inconclusive or problematic. The output of such a system is often used as evidence in a disciplinary procedure. Therefore, both the clarity of the report and the trustworthiness of its content are important for the efficiency and effectiveness of institutional processes.

There are dozens of such systems available on the market, both free and paid services. Some can be used online, while others need to be downloaded and used locally. Academics around the globe are naturally interested in the question: How far can these systems reach in detecting text similarities and to what extent are they successful? In this study, we will look at the state-of-the-art text-matching software with a focus on non-English languages and provide a comparison based on specific criteria by following a systematic methodology.

The investigation was conducted by nine members of the European Network for Academic Integrity (ENAI) in the working group TeSToP (Testing of Support Tools for Plagiarism Detection). There was no external funding available; access to the various systems was provided to the research group free of charge by the companies marketing the support tools.

The paper is organized as follows. The next section provides a detailed survey of related work. This is followed by a specification of the methodology used to carry out the research and then a description of the systems used in the research. After reporting on the results acquired, discussion and conclusion points are given.

Survey of related work

Since the beginning of this century, considerable attention has been paid, not only to the problem of plagiarism, but also to text-matching software that is widely used to help find potentially plagiarized fragments in a text. There are plenty of scientific papers that postulate in their titles that they offer a classification, a comparative study, an overview, a review, a survey, or a comparison of text-matching software tools. There are, however, issues with many of the papers. Some, such as Badge and Scott ( 2009 ) and Marjanović, Tomašević, and Živković ( 2015 ), simply refer to comparative tests performed by other researchers with the aim of demonstrating the effectiveness of such tools. Such works could be useful for novices in the field who are not familiar with such automated aides, but they are meaningless for those who want to make an informed choice of a text-matching tool for specific needs.

Many research works offer only a primitive classification of text-matching software tools into several categories or classes. Others provide a simple comparative analysis based on functional features. These are usually built on a description of the tools as given on their official websites (e.g. Nahas, 2017; Pertile, Moreira, & Rosso, 2016), a categorization given in another source (e.g. Chowdhury & Bhattacharyya, 2016; Lukashenko, Graudina, & Grundspenkis, 2007; Urbina et al., 2010), or a study of the corresponding literature combined with educated guesswork (e.g. Lancaster & Culwin, 2005). These types of research give a good insight into the broad scope of the functional features, focus, accessibility, and shortcomings of text-matching software. However, they are still insufficient for guiding the selection of a tool, as they do not evaluate and compare the performance of software systems and their usability from the viewpoint of end-users.

The most frequently mentioned categorizations are as follows:

Software that checks text-based documents, source code, or both (Chowdhury & Bhattacharyya, 2016 ; Clough, 2000 ; Lancaster & Culwin, 2005 ; Lukashenko et al., 2007 );

Software that is free, private, or available by subscription (Chowdhury & Bhattacharyya, 2016 ; Lancaster & Culwin, 2005 ; Lukashenko et al., 2007 ; Nahas, 2017 ; Pertile et al., 2016 ; Shkodkina & Pacauskas, 2017 ; Urbina et al., 2010 ; Vandana, 2018 );

Software that is available online (web-based) or can be installed on a desktop computer (Lancaster & Culwin, 2005 ; Marjanović et al., 2015 ; Nahas, 2017 ; Pertile et al., 2016 ; Shkodkina & Pacauskas, 2017 ; Vandana, 2018 );

Software that operates intra-corpally, extra-corpally, or both (Lancaster & Culwin, 2005 ; Lukashenko et al., 2007 ; Marjanović et al., 2015 ).

Additionally, some researchers include unconventional comparative criteria. Pertile et al. ( 2016 ) indicate if a tool can make a citation analysis, a content analysis, structural analysis, or a paraphrase analysis. Lancaster and Culwin ( 2005 ) take into account the number of documents that are processed together to generate a numeric value of similarity and the computational complexity of the methods employed to find similarities. McKeever ( 2006 ) classifies text-matching software tools into search-based systems, systems performing linguistic analysis, software based on collusion detection, and systems for detecting software plagiarism.

Shkodkina and Pacauskas ( 2017 ) have defined 28 comparison criteria that are divided into four categories: affordability, material support, functionality, and showcasing. They compared three tools based on the criteria defined. However, it is not clear how the comparison was actually conducted, whether only by studying information available on product websites, or by trying out each tool. The descriptive part of their comparison does not contain references to information sources. Moreover, the set of criteria includes the ability of a tool to recognize different types of plagiarism (such as paraphrasing, translation, obfuscation, or self-plagiarism) and there are no indications of how these criteria were evaluated.

Shynkarenko and Kuropiatnyk ( 2017 ) have not compared available text-matching software tools, but they have defined more than 30 requirements for the development of such tools based on the analysis of other authors’ works and the documentation of tools. They provide a comparison between the 27 tools mentioned in their paper and the defined requirements.

It is rather surprising that, given the variety of research work on text-matching software, only a few studies address the performance and usability of these tools. Moreover, some of them do not make a comparative analysis of performance, but mainly check the working principles and capabilities of the tools by testing them on different kinds of submissions. At the end of the last century, Denhart (1999) published a lively discussion of his check of three systems. He uploaded to the systems his senior thesis and a mini-essay made up of randomly selected, slightly paraphrased sentences from four well-known authors. He found problems with properly quoted material and an inability to find many plagiarized sentences in the mini-essay. He also mentioned poor usability for one of the systems that otherwise had quite good performance results.

Culwin and Lancaster ( 2000 ) used a more carefully constructed text and checked four tools operating at that time using six sentences: four original sentences from two famous works widely available on the web, one paraphrased sentence from an essay available on a free essay site, and an original sentence from a newly indexed personal website. They checked the performance of tools and described if the text was found or not and at which sites. They also addressed some usability problems of systems for tutors and students.

Maurer, Kappe, and Zaka (2006) checked three tools in relation to verbatim plagiarism, paraphrasing, tabular information processing, translation plagiarism, image/multimedia processing, reference validity checking, and the possibility to exclude or select sources. Although they do not describe the experiments in detail, there is evidence that they used a prepared sample of texts. These included a paragraph from proceedings that was paraphrased using a simple automatic word replacement tool, text compiled from documents available on the internet, tabular information, and text in languages with special characters. They conclude that the tools work reasonably well when the plagiarized text is available on the internet or in other electronic sources. However, text-matching software fails to match paraphrasing plagiarism, plagiarism based on documents that are not available electronically, and translation plagiarism. The tools also do not do well when processing tabular information and special characters.

Vani and Gupta ( 2016 ) used a small text fragment from the abstract of a scientific article and modified it based on four main types of obfuscation: verbatim plagiarism, random obfuscation, translation obfuscation, and summary obfuscation. Using the prepared text sample, they checked three tools and found that tools fail to find translation and summary obfuscations.

Křížková, Tomášková, and Gavalec (2016) made a comparative analysis of five systems in two test series that used the same eight articles. The first test used the articles without any modifications; the second used articles manually modified by reordering words in the text. Their analysis consisted mainly of the percentage of plagiarism found and the time the systems spent checking the articles. They then applied multi-criteria decision-making to choose the best system. However, there is no clear indication of the goal of the comparison, of the plagiarism already present in each of the articles, or of how much of the plagiarism found by the systems matched it. They also addressed usability via a criterion of “additional support” that includes the possibility to edit text directly on the website, multilingual checking, availability of extensive information about plagiarism, etc.

Bull, Collins, Coughlin, and Sharp (2001) used a well-planned methodology and checked five systems identified through a survey of academic staff from the higher education sector. They compared many functional features and also tested the performance of the tools. The evaluation criteria included, among other things, the clarity of the reports, the clarity of the instructions, the possibility to print the results, and the ease of interpreting the results, all of which relate to the usability of the tools. To test performance, they used eleven documents from six academic disciplines, grouped into four categories according to the type of plagiarized material: essays from online essay banks, essays with verbatim plagiarism from the internet, essays written in collusion with others but with no internet material included, and essays written in collusion and containing some copied internet material. They tested the documents over a period of 3 months. In the end, they concluded that the tools were “effective in identifying the types of plagiarism that they are designed to detect” (Bull et al., 2001, p. 5). However, not all tools performed well in their experiments, and they also reported some anomalies in their results.

Chaudhuri (2008) examined one particular tool using 50 plagiarized papers from many different sources (freely available databases, subscription databases, open access journals, open sources, search engines, etc.) in different file formats. The researcher found that the tool is unable to match papers from subscription databases or articles from open access journals, and cannot properly process cited and quoted material.

Luparenko (2014) tested 22 tools selected as popular ones based on an analysis of the scientific literature and web sources. She considered many criteria related to functional specification (such as type, availability of a free trial mode, need for mandatory registration on a website, number of users that have access to the program, database, acceptable file formats, etc.) and also checked the performance of the tools using one scientific paper in Ukrainian and another in English. Moreover, the checking was done using three different methods: pasting the text into a field on the website, uploading a file, and submitting the URL of the article. She measured the checking time, evaluated the quality of the reports provided by the tools, and reported the percentage of unique text found in each of the articles.

The Croatian researchers Birkić, Celjak, Cundeković, and Rako (2016) tested four tools that are widely used in Europe and can be deployed at the national and institutional levels. They compared criteria such as the existence of an API (application programming interface), the possibility to integrate the tool as a plug-in for learning management systems, database scope, size of the user community, and others. The researchers tested the tools using two papers for each type of submission: journal articles, conference papers, master’s and doctoral theses, and student papers. However, they did not include different types of plagiarism, and they evaluated the checking process with a focus on quote recognition, tool limitations, and interface intuitiveness.

Kakkonen and Mozgovoy ( 2010 ) tested eight systems using 84 test documents from several sources (internet, electronically unpublished books or author’s own prepared texts, paper mills) that contained several types of plagiarism: verbatim copying, paraphrasing (e.g. adding more spaces, making intentional spelling errors, deleting or adding commas, replacing words by synonyms, etc.) and applying technical tricks. The technical tricks included the use of homoglyphs, which involve substituting similar-looking characters from different alphabets, and adding a character in a white-colored font as an empty space or including text as images. The authors provided a very detailed description of the experiments conducted and used a well-planned methodology. Their findings include problems with submissions from a paper mill, difficulties in identification of synonymous and paraphrased text, as well as finding the source for text obfuscated by technical tricks.
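The homoglyph trick mentioned above swaps Latin letters for look-alike characters from other alphabets, so that visually identical strings no longer compare equal and escape matching. A minimal sketch of a countermeasure, using a deliberately tiny illustrative mapping (a real checker would need a full confusables table, such as the Unicode confusables data):

```python
# Illustrative subset only: a few Cyrillic/Greek characters that render
# like Latin letters. This is NOT a complete mapping.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u03bf": "o",  # Greek ο
}

def normalize_homoglyphs(text):
    """Map look-alike characters back to their Latin counterparts
    before running any text comparison."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
```

A system that normalizes submissions this way before matching will see "pаper" (with a Cyrillic а) as identical to "paper", defeating this particular disguising technique.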

However, the most methodologically sound comparisons were conducted by Debora Weber-Wulff and her team (Weber-Wulff, Möller, Touras, & Zincke, 2013 ) between 2004 and 2013 (see http://plagiat.htw-berlin.de/start-en/ ). In their last testing experiment in 2013, the researchers compared 15 tools that were selected based on previous comparisons. The testing set contained both plagiarized and original documents in English, German, and Hebrew. The test set included various types of plagiarism from many different sources. They found serious problems with both false positives and false negatives, as well as usability problems such as many clicks needed for simple tasks, unclarity of reports, or language issues.

To summarize the related work discussed in this section, the available studies on text-matching software:

rarely address the evaluation of performance and usability of such tools, but mostly include a simple overview of their functional features, primitive categorization, or trivial comparisons;

infrequently provide justification for the selection of tools based on well-defined reasons but often only mention the popularity of the tools;

seldom use a well-planned scientific methodology and a well-considered corpus of texts in cases in which they evaluate the performance of the tools;

do not report “explicitly on experimental evaluations of the accuracy and false detection rates” (Kakkonen & Mozgovoy, 2010 , p. 139).

Taking into account the rapid changes in the field (some tools are already out of service, others have been continuously improved, and new tools are emerging) the need for comparative studies that in particular test the performance of the tools is constant. McKeever ( 2006 , p. 159) also notes that “ with such a bewildering range of products and techniques available, there is a compelling need for up-to-date comparative research into their relative effectiveness ”.

Methodology

The basic premise of this software test is that the actual usage of text-matching software in an educational setting is to be simulated. The following assumptions were made based on the academic experience of some members of the testing group before preparing for the test:

Students tend to plagiarize using documents found on the internet, especially Wikipedia.

Some students attempt to disguise their plagiarism.

Very few students use advanced techniques for disguising plagiarism (for example, homoglyphs).

Most plagiarizing students do not copy an entire submission from a single source, but use multiple sources.

Instructors generally have many documents to test at one time.

There are legal restrictions on instructors submitting student work.

In some situations, the instructor only reviews the reports; submission is done either by the students themselves or by a teaching assistant.

Instructors do not have much time to spend on reviewing reports.

Reports must be stored in printed form in a student’s permanent record if a sanction is levied.

Universities wish to know how expensive the use of a system will be on a yearly basis.

Not all of these assumptions could be put to the test; for example, most systems would not tell us how much they charge, as this is negotiated on an individual basis with each institution.

Testing documents

In order to test the systems, a large collection of intentionally plagiarized documents in eight different languages was prepared: Czech, English, German, Italian, Latvian, Slovak, Spanish, and Turkish. The documents used various sources (Wikipedia, online articles, open access papers, student theses available online) and various plagiarism techniques (copy & paste, synonym replacement, paraphrase, translation). Various disguising techniques (white characters, homoglyphs, text as image) were used in additional documents in Czech. The testing set also contained original documents to check for possible false positives and a large document to simulate a student thesis.

One of the vendors noted in pre-test discussions that they perceived Turnitin’s exclusive access to publishers’ databases as an unfair advantage for that system. As we share this view and did not want to distort the results, documents with restricted access were deliberately not included.

All testing documents were prepared by TeSToP team members or their collaborators. All of them were obliged to adhere to the written guidelines. As a result, each language set contained at least these documents:

a Wikipedia article in a given language with 1/3 copy & paste, 1/3 with manual synonym replacement, and 1/3 manual paraphrase;

4–5 pages from any publicly available source in a given language with 1/3 copy & paste, 1/3 with manual synonym replacement, and 1/3 manual paraphrase;

translation of the English Wikipedia article on plagiarism detection, half using Google Translate and half translated manually;

an original document, i.e. a document which is not available online and has not been submitted previously to any text-matching software;

a multi-source document in three variations, once as a complete copy & paste, once with manual synonym replacement and once as a manual paraphrase.

The basic multi-source document was created as a combination from five different documents following the pattern ABCDE ABCDE ABCDE ABCDE ABCDE, where each letter represents a chunk of text from a specific source. Each chunk was one paragraph (approx. 100 words) long. The documents were taken from Wikipedia, open access papers, and online articles. In some languages, there were additional documents included to test specific features of the systems.
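The ABCDE interleaving described above can be sketched as a small helper; the paragraph lists in the example are hypothetical placeholders for the actual source texts:

```python
def build_multisource(chunks_by_source):
    """Interleave one paragraph per source per round: ABCDE ABCDE ...

    chunks_by_source: a list of lists, one inner list of paragraph
    strings per source (in the study: five sources, five rounds each).
    """
    rounds = zip(*chunks_by_source)  # round 1 = first chunk of each source, etc.
    return "\n\n".join(chunk for rnd in rounds for chunk in rnd)
```

For example, three sources with two paragraphs each produce the order A1 B1 C1 A2 B2 C2, matching the repeating pattern used for the test documents.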

Table 1 gives an overview of the testing documents and naming convention used for each language set.

Some language sets contained additional documents. Since many Slovak students study at Czech universities and the Czech and Slovak languages are very similar, a translation from Slovak to Czech was included in the Czech set and vice versa. There is also a significant Russian minority in Latvia so that a translation from Russian to Latvian was also included. The German set contained a large document with known plagiarism to test the usability of the systems, but it is not included in the coverage evaluation.

The documents were prepared in PDF, DOCX, and TXT formats. By default, the PDF version was uploaded. If a system did not accept that format, DOCX was used; if DOCX was not supported, TXT was used. Some systems do not allow uploading documents at all, so the text was copied and pasted from the TXT file. Permission to use the sources in this manner was obtained from all original authors. The permissions were either implicit (e.g. a Creative Commons license) or explicit consent obtained from the author.
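The submission fallback (PDF, then DOCX, then TXT, then copy & paste) amounts to a simple preference search; the set of formats a given system accepts is, of course, system-specific, and the format names here are just illustrative labels:

```python
# Preference order used in the test: PDF first, then DOCX, then TXT;
# if a system accepts no uploads at all, fall back to copy & paste.
PREFERENCE = ["pdf", "docx", "txt"]

def choose_submission(supported_formats):
    """Pick the first preferred format a system accepts."""
    for fmt in PREFERENCE:
        if fmt in supported_formats:
            return fmt
    return "copy-paste"
```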

Testing process

Between June and September 2018, we contacted 63 system vendors. Of these, 20 agreed to participate in the testing. Three systems had to be excluded because they do not consider online sources, and one because its web interface has a limit of 150 words. In the next stage, the documents were submitted to the systems by authorized TeSToP members at a time unknown to the vendor. System default parameters were used at all times; if values such as the minimum word run were discernible, they were recorded. After submission of the documents, one system withdrew from testing. Thus 15 systems were tested using documents in eight languages.

For evaluation, the following aspects were considered:

coverage: How much of the known plagiarism was found? How did the system deal with the original text?

usability: How smooth was the testing process itself? How understandable are the reports? How expensive is the system? Other usability aspects.

To perform the coverage evaluation, the results were meticulously reviewed in both the online interface and the PDF reports, if available. Since the reported percentages of similarity do not convey the actual extent of plagiarism and may even be misleading, a different evaluation metric was used. Coverage was evaluated by awarding 0–5 points for each test case, according to the amount of text similarity detected:

5 points: all or almost all text similarity detected;

4 points: a major portion;

3 points: more than half;

2 points: half or less;

1 point: a very minor portion;

0 points: one sentence or less.

For original work that produced false positives, the scale was reversed. Two or three team members independently examined each report for a specific language and discussed cases in which they did not agree. In some cases, it was difficult to assign points from the above-mentioned categories, especially for the systems which show only the matches found and hide the rest of the documents. If the difference between evaluators was not higher than 1 point, the average was taken. The interpretation of the above-mentioned scale was continuously discussed within the whole team.
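The scoring procedure might be captured roughly as follows. Note that the numeric band edges are our own reading of the verbal scale (“almost all”, “a major portion”, and so on), not exact cut-offs stated by the authors:

```python
def coverage_points(fraction_found, is_original=False):
    """Map the share of known text similarity found to the 0-5 scale.

    The band edges below are an illustrative interpretation of the
    verbal scale, not the authors' exact thresholds. For original
    documents the scale is reversed: flagging nothing scores 5.
    """
    if fraction_found >= 0.95:
        points = 5   # all or almost all similarity detected
    elif fraction_found >= 0.75:
        points = 4   # a major portion
    elif fraction_found > 0.5:
        points = 3   # more than half
    elif fraction_found >= 0.25:
        points = 2   # half or less
    elif fraction_found > 0.02:
        points = 1   # a very minor portion
    else:
        points = 0   # one sentence or less
    return 5 - points if is_original else points

def combine(p1, p2):
    """Average two evaluators' scores when they differ by at most 1 point;
    larger differences were resolved by discussion, not averaging."""
    if abs(p1 - p2) > 1:
        raise ValueError("difference > 1 point: discuss and re-evaluate")
    return (p1 + p2) / 2
```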

To perform a usability evaluation, we designed a set of qualitative measures stemming from available literature (e.g. Badge & Scott, 2009 ; Chowdhury & Bhattacharyya, 2016 ; Hage, Rademaker, & van Vugt, 2010 ; Martins, Fonte, Henriques, & da Cruz, 2014 ) and our experience. There were three major areas identified:

Testing process;

Test results;

Other aspects.

Two independent team members assessed all systems on all criteria, awarding a point only if a criterion was satisfied and no points otherwise. They then discussed all differences together with a third team member in order to reach a consensus as far as possible. If agreement was not possible, half a point was awarded. Likewise, if a system offered a given functionality but the three researchers testing the systems were unable to find it without detailed guidance, 0.5 points were awarded.
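The usability scoring rule can be sketched for a single criterion as follows (a simplified reading of the procedure described above):

```python
def usability_score(rater_a, rater_b, consensus_after_discussion=None):
    """Score one usability criterion for one system.

    rater_a, rater_b: 1 if the rater judged the criterion satisfied, 0 if not.
    consensus_after_discussion: the agreed value (0 or 1) reached with a
    third team member, or None if no agreement was possible.
    Returns 0.5 when the raters disagree and no consensus is reached
    (the same value used when a feature exists but is hard to find).
    """
    if rater_a == rater_b:
        return float(rater_a)
    if consensus_after_discussion is not None:
        return float(consensus_after_discussion)
    return 0.5
```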

Our testing took place between November 2018 and May 2019. During this time, we tested both coverage and usability. An additional test of the multi-source documents took place between August and November 2019. Since the present research did not benefit from any funding, the researchers were expected to fulfil their institutional workloads during the research period. Given the size of the project team, spread across various countries, we could make significant progress only during semester breaks, which explains the length of the testing process. It should be noted that we tested what the systems offered at the time of data collection, using the features allowed by the access given to us by the vendors.

The methodology was sent to all vendors so that they were informed about the aim of the testing and other aspects of the process. The vendors were informed about the categories of our testing (coverage criteria and usability criteria), as well as the fact that we planned to use documents in multiple languages.

Since the analysis and interpretation of the data are quite sensitive, we approached this phase with the utmost care. As suggested by Guba and Lincoln (1989), member checking is an effective technique for establishing trustworthiness in qualitative studies. Therefore, having analyzed and reported the data, we sent a preprint of the results to the vendors. Team members closely evaluated the issues raised by the vendors. Not all of them could be addressed in this paper, but as many as possible were incorporated. These rigorous efforts to establish the validity of the results and the reliability of the study further delayed its publication.

Overview of systems

In this section, a brief description is given of all of the web-based systems involved in the test. The information presented here was provided by the companies operating the systems, either on their websites or upon request by telephone or email, using the list of questions documented in Weber-Wulff (2019). Links to the main pages of the systems can be found in the Appendix.

The Akademia system presents itself as an anti-plagiarism system. It is intended for use at all levels of educational institutions and also for commercial institutions. The primary focus is on the region of Kosovo and Albania. The system was introduced in 2018. It is run by the company Sh.PK Academy Platform located in Pristina, Kosovo (Innovation Centre Kosovo, 2018 ).

Copyscape declares itself to be a plagiarism checker. Its primary aim is to provide a tool for owners of websites to check whether their original content has been used by others. It also provides a service of regular checks and email alerts. Copyscape, which started in 2004 (Greenspan, 2019), is operated by a private company, Indigo Stream Technologies Ltd., which is apparently based in Gibraltar. It does not have its own database but uses Google services to crawl the web.

Docol©c describes itself as a system for finding “similarities between text documents on the Internet” (Docol©c, 2019). It is intended for institutional use and focuses on German-speaking countries. According to the company, the system is used by more than 300 educational institutions in Austria, Germany, and Switzerland, plus around 20 universities worldwide, and is integrated into the conference systems EDAS and OpenConf. Docol©c is operated by a private company, Docol©c UG (haftungsbeschränkt) & Co KG, based in Germany. It was developed in 2004–2005 at the University of Braunschweig and was originally intended for personal use only; in 2006, it became commercially available. It uses MS Bing services to crawl the web and enables its customers to connect and browse their own databases. The license costs depend on the number of pages to be scanned per year and per institution.

DPV is part of the Slovenian National Portal of Open Science, which also provides an academic repository. The project, which is supported by the Slovenian higher education institutions, started in 2013. The detection software was developed by researchers from Slovenian universities. The operation of the system is partially funded by the European Union from the European Regional Development Fund and the Ministry of Education, Science and Sport (Ojsteršek et al., 2014).

Dupli Checker presents itself as a plagiarism checker. It is a free tool, but each search is limited to 1,000 characters. It does not focus on any specific users or purposes. There is no information on the website about who operates it, and we were not able to obtain this information when we asked directly via email. The website offers a variety of tools, such as a paraphrasing tool and many search engine optimization (SEO) and other website management tools. Additionally, according to a statement on their website, they “have delivered over 1,000,000 pages of high-quality content which attracts large amounts of traffic to [their] client’s websites” (Dupli Checker, 2019), so it appears that they also offer a copywriting service.

The system intihal.net is operated by a Turkish private company, Asos Eğitim Bilişim Danışmanlik, and focuses on the Turkish language. According to direct information from its representatives, the system has been operating since approximately 2017 and is used by 50 Turkish universities.

PlagAware is operated by a German private company, PlagAware Unternehmergesellschaft (haftungsbeschränkt). PlagAware states that it has 39,000 active users and focuses on a wide range of customers: universities, schools, and businesses, which are offered institutional licenses, and individuals, who can use purchased credits for individual document checks. They promise to compare submissions against 10 billion online documents as well as reference texts provided by the user.

Plagiarism Software is operated by a private company based in Pakistan. They target all types of individual users and claim to have 500,000 users. According to information from a company representative, they started in approximately 2014 (on their website they claim seven years of experience, which would date back to 2012), and they use search engine services to crawl the web. They offer five pricing levels that differ according to the amount of content being compared.

PlagiarismCheck.org presents itself as a plagiarism checking tool. It is operated by a company based in the United Kingdom and has been on the market since 2011. Since around 2017, they have been focusing on the B2B market. They state that they have more than 77,000 users in 72 countries. They use MS Bing for online searches, and for the English language, the representatives claim to be able to perform synonym detection. They provide three levels of institutional licenses.

PlagScan presents itself as a plagiarism checker. It is operated by the German company PlagScan GmbH and was launched in 2009. They state that they have more than 1,500 organizations as customers. Although they focus on higher education, high schools, and businesses, PlagScan is also available to single users. They search the internet using MS Bing, as well as published academic articles, their so-called “Plagiarism Prevention Pool”, and optionally a customer’s own database. PlagScan offers multiple pricing plans for each type of customer, as well as options for a free trial.

StrikePlagiarism.com presents itself as a plagiarism detection system, operated by the Polish company Plagiat.pl. It provides its services to over 500 universities in 20 countries. Apart from universities, it is also used by high schools and publishers. They state that they are market leaders in Poland, Romania, and Ukraine. In 2018, they signed a Memorandum of Cooperation with the Ukrainian Ministry of Education and Science. The software searches in multiple databases and aggregators.

Turnitin was founded in 1999 by four students and grew into an internationally known company. In 2014, they acquired the Dutch system Ephorus and “joined forces” (Ephorus, 2015). In 2019, they were themselves taken over by a US investment company, Advance (Turnitin, 2019). Focusing on institutional users only, they are used by 15,000 institutions in 150 countries. Turnitin uses its own crawler to search the web, including an archive of all previously indexed web pages (Turnitin, n.d.). Turnitin further compares the texts against published academic articles, their own database of all assignments ever submitted to the system, and optionally institutional databases. They are also developing many additional software tools for educators to use in teaching and giving feedback.

Unicheck, which declares itself to be a plagiarism checker, was launched in 2014 under the name Unplag; the brand name changed in 2017. It is operated by the company UKU Group Ltd, registered in Cyprus (Opencorporates, 2019). It is used by 1,100 institutions in 69 countries, and apart from institutional users (high schools, higher education institutions, and businesses), they also offer their services for personal use. The pricing plans differ according to the type of user. Unicheck compares documents with web content and open-access sources and, for business customers, also with their private library. They also claim to perform homoglyph (similar-looking character replacement) detection.
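Homoglyph disguise replaces characters with visually identical ones from other scripts (e.g., the Latin letter a with the Cyrillic а) so that literal text matching fails. As a rough illustration of what such detection involves (a minimal sketch of our own, not Unicheck’s actual implementation), a checker can normalize suspect characters before matching:

```python
# Map common homoglyphs back to their Latin look-alikes.
# Illustrative only; real systems use much larger Unicode tables.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u03bf": "o",  # Greek omicron
}

def normalize_homoglyphs(text: str) -> str:
    """Replace known homoglyphs so disguised text matches its source again."""
    return text.translate(str.maketrans(HOMOGLYPHS))

def contains_homoglyphs(text: str) -> bool:
    """Flag text containing characters from the homoglyph table."""
    return normalize_homoglyphs(text) != text
```

Normalizing the submitted text before comparison makes a copy disguised in this way match its source again.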

Urkund, which presents itself as a fully automated text-recognition system for the detection, handling, and prevention of plagiarism (Urkund, 2019), was founded in 1999. It is currently owned by a private equity fund, Procuritas Capital Investors VI, located in Stockholm. They claim to be a leader in the Nordic countries and to have clients in 70 countries worldwide, mainly academic institutions and high schools, including over 800 of Sweden’s high schools. They crawl the web “with the aid of an ordinary search engine” (Urkund, 2019) and also compare the documents with student submissions to the system.

Viper presents itself as a plagiarism checker. It was founded in 2007. Viper focuses on all types of customers; the pricing is based on a pay-as-you-go principle. Currently, it is owned by All Answers Limited (2019), which, according to the information on its website, gives the impression of being an essay mill. It is interesting to trace how Viper’s use of uploaded content has changed on their “Terms and conditions” page. In 2016, the page stated “[w]hen you scan a document, you agree that 9 months after completion of your scan, we will automatically upload your essay to our student essays database which will appear on one of our network of websites so that other students may use it to help them write their own essays” (Viper, 2016). The time span was later shortened to 3 months (Viper, 2019a). These paragraphs have been removed from the current version of the page (Viper, 2019b). On a different page, it is noted that “when you scan your work for plagiarism using Viper Premium it will never be published on any of our study sites” (Viper, 2019c). In email communication, Viper claims that they do not use any essay without the author’s explicit consent.

Coverage results

This section discusses how much of the known text similarity was found by the systems. As they have various strengths and weaknesses, it is not possible to boil the results down to a single number that could easily be compared. Rather, the focus is on different aspects, which are discussed in detail below. All tables in this section show the averages of the evaluation; therefore, the maximum possible score is 5 and the minimum possible score is 0. Boldface indicates the maximum value achieved in each row, answering the question of which system performed best for that specific criterion. All values are shaded from red (worst) to dark green (best), with yellow being intermediate.

Language comparison

Table 2 shows the aggregated results of the language comparisons based on the language sets. It can be seen that most of the systems performed better for English, Italian, Spanish, and German, whereas the results for Latvian, Slovak, Czech, and Turkish are generally poorer. The only system that found a Czech student thesis from 2010, which is publicly available from a university webpage, was StrikePlagiarism.com. The Slovak paper in an open-access journal was not found by any of the systems. Urkund was the only system that found an open-access book in Turkish. It is worth noting that the Turkish system, intihal.net, did not find this Turkish source.

Unfortunately, our testing set did not contain documents in Albanian or Slovenian, so we were not able to evaluate the potential strengths of the national systems (Akademia and DPV). Due to restrictions on our account, it was not possible for us to process the Italian language in Akademia, although that should now be possible.

There are interesting differences between the systems depending on the language. PlagScan performed best on the English set, Urkund on the Spanish, Slovak, and Turkish sets, PlagAware on the German set, and StrikePlagiarism.com on the Czech set. Three systems (PlagiarismCheck.org, PlagScan, and StrikePlagiarism.com) achieved the same maximum score for the Italian set.

Besides the individual languages, we also evaluated language groups according to a standard linguistic classification, that is, Germanic (English and German), Romanic (Italian and Spanish), and Slavic (Czech and Slovak). Table 3 shows the results for these language subgroups. The systems achieved better, and mutually comparable, results for the Germanic and Romanic languages. The results for the Slavic languages are noticeably worse.

Types of plagiarism sources

This subsection discusses the differences between various types of sources, with the results given in Table 4. The testing set contained Wikipedia extracts, open-access papers, student theses, and online documents such as blog posts. The systems generally yielded the best results for Wikipedia sources. The scores between the systems vary due to their ability to detect paraphrased Wikipedia articles. Urkund scored best for Wikipedia, Turnitin found the most open-access papers, StrikePlagiarism.com scored best in the detection of student theses, and PlagiarismCheck.org gave the best result for online articles.

Since it is assumed that Wikipedia is an important source for student papers, the Wikipedia results were examined in more detail. Table 5 summarizes the results from 3 × 8 single-source documents (one article per language) and Wikipedia scores from multi-source documents containing one-fifth of the text taken from the current version of Wikipedia. In general, most of the systems are able to find similarities to the text that has been copied and pasted from Wikipedia.

Over a decade ago, Bretag and Mahmud (2009, p. 53) wrote:

“The text-matching facility in electronic plagiarism detection software is only suited to detect ‘word-for-word’ or ‘direct’ plagiarism and then only in electronic form. The more subtle forms of plagiarism, plus all types of plagiarism from paper-based sources, are not able to be detected at present.”

Technological progress, especially in software development, advances rapidly. It is commonly expected that text matching, in the sense of finding both exact text matches and paraphrased ones, should be a trivial task today. The testing results do not confirm this.

The results in Table 5 are quite surprising and indicate that the systems are insufficient. The performance on plagiarism from Wikipedia disguised by synonym replacement was generally poorer, and almost no system was able to satisfactorily identify manual paraphrase plagiarism. This is surely due to both the immense number of potential sources and the combinatorial explosion of potential changes to a text.
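To illustrate why exact copies are easy to find while paraphrases are not, consider a minimal word-trigram overlap check, the basic idea behind fingerprint-style text matching (a sketch of our own, not the method of any tested system):

```python
def ngrams(text: str, n: int = 3) -> set:
    """Return the set of word n-grams of a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(doc: str, source: str, n: int = 3) -> float:
    """Fraction of the document's n-grams that also occur in the source."""
    doc_grams = ngrams(doc, n)
    if not doc_grams:
        return 0.0
    return len(doc_grams & ngrams(source, n)) / len(doc_grams)

source = "plagiarism is the use of another person's work without attribution"
copied = "plagiarism is the use of another person's work without attribution"
paraphrased = "using someone else's text and not crediting them is plagiarism"
```

A verbatim copy scores 1.0, while a paraphrase with identical meaning can share no trigrams at all, which is why paraphrase detection requires semantic rather than purely lexical comparison.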

Plagiarism methods

The same aggregation as was done in Table 5 for Wikipedia was also done over all 16 single-source and eight multi-source documents. Not only copy & paste, synonym replacement, and manual paraphrase were examined, but also translation plagiarism.

Translations were done from English into all the other languages, as well as from Slovak to Czech, from Czech to Slovak, and from Russian to Latvian. The results, shown in Table 6, confirm that the software performs worse on synonym replacement and manual paraphrase plagiarism.

As has been shown in other investigations (Weber-Wulff et al., 2013), translation plagiarism is very seldom picked up by software systems. The systems' worst performance in this test was indeed on translation plagiarism, with one notable exception: Akademia. This system is the only one that performs semantic analysis and allows users to choose the translation language. Unfortunately, its database is much smaller than those of the other systems with respect to the languages of our testing. However, the performance drop between copy & paste and translation plagiarism is much smaller for Akademia than for the other systems.

Given the very poor performance of the systems on translation plagiarism, it did not make sense to distinguish between Google Translate and manual translation. The vast majority of the systems did not find text translated by either method.

Some systems found a match in the translation from Slovak to Czech; however, this was due to words that are identical in both languages. For the other languages, examination of the system outputs for translation plagiarism revealed that the only matches the systems found in translated documents were in the references. Matches in the references might be an indicator of translation plagiarism, although, of course, if two papers cite the same source and follow the same style guide, the reference entries will be written in identical form. This is an important result for educators.

Single-source vs. multi-source documents

One scenario considered in the test was a text compiled from short passages taken from multiple sources. This seems much closer to a real-world setting, in which plagiarism of a whole document is less likely, whereas ‘patchwriting’ or ‘compilation’ is a frequent strategy of student writers, especially second-language student writers (Howard, 1999, p. 117ff). Surprisingly, some systems performed differently in these two scenarios (see Table 7). To remove bias caused by the different types of sources, the Wikipedia-only portions were also examined in isolation (see Table 8); the results are consistent in both cases.

Usability results

The usability of the systems was evaluated using 23 objective criteria, divided into three groups related to the workflow process, the presentation of the results, and additional aspects. Points were assigned based on the researchers’ findings during a specific period of time.

Workflow process usability

The first criteria group relates to the usability of the systems’ workflow process. It was evaluated using the following questions:

Is it possible to upload and test multiple documents at the same time?

Does the system ask to fill in metadata for documents?

Does the system use original file names for the report?

Is there any word limit for the document testing?

Does the system display text in the chosen language only?

Can the system process large documents (for example, a bachelor thesis)?

The results are summarized in Table 9. With respect to the workflow process, five systems were assigned the highest score in this category, and the scores of only five systems were equal to or less than 3. The most widely supported features are the processing of large documents (13 systems), as well as displaying text in the chosen language and having no word limits (12 systems each). Uploading multiple documents is a less supported feature, which is unfortunate, as it is very important for educational institutions to be able to test several documents at the same time.

Result presentation usability

The presentation and understandability of the results reported by the systems were evaluated in a second usability criteria group. Since the systems cannot determine plagiarism, the results must be examined by one or more persons in order to determine whether plagiarism is present and a sanction warranted. It must be possible to download the result reports and to locate them again in the system. Some systems rename the documents, assigning internal numbers to them, which makes it extremely difficult to find a report again. Many systems have different formats for online and downloadable reports. It would be useful for report review if the system kept the original formatting and page numbers of the document being analyzed in order to ease the load of evaluation.

It is assumed that the vast majority of universities require the evidence of found similarities to be documented in the report so that it can be printed out for a student’s permanent record. This evidence is examined at a disciplinary hearing by other members of the committee, who may not have access to the system.

The results related to the presentation group are summarized in Table 10 and all criteria are listed below:

Reports are downloadable.

Results are saved in the user’s account and can be reviewed afterwards.

Matched passages are highlighted in the online report.

Matched passages are highlighted in the downloaded report (offline).

Evidence of similarity is demonstrated side-by-side with the source in the online report.

Evidence of similarity is demonstrated side-by-side with the source in the downloaded report.

Document formatting is not changed in the report.

Document page numbers are shown in the report.

The report is not spoiled by false positives.

None of the systems was able to achieve the highest score in the usability group related to the test results. Two systems (PlagScan and Urkund) support almost all features, but six systems support half or fewer of the features. The most supported features are the possibility to download result reports and the highlighting of matched passages in the online report. Less supported features are the side-by-side presentation of evidence in the downloaded and online reports, as well as keeping the document formatting.

Other usability aspects

Besides the workflow process and the result presentation, there are also other system usability aspects worth evaluating, specifically:

System costs are clearly stated on the system homepage.

Information about a free system trial version is advertised on the webpage.

The system can be integrated into a learning management system via an API.

The system can be integrated with the Moodle platform.

The system provides call support.

The call support is provided in English.

English is properly used on the website and reports.

There are no external advertisements.

In order to test the call support, telephone numbers were called from a German university telephone during normal European working hours (9:00–17:00 CET/GMT+1). A checklist (Weber-Wulff, 2019) was used to guide the conversation if anyone answered the phone. English was used as the language of communication, even when calling German companies. For intihal.net, the call was not answered, but it was returned an hour later. The person answering for StrikePlagiarism.com did not speak English but found someone who did; he refused, however, to give information in English and insisted that email support be used. Plagiarism Software publishes a number in Saudi Arabia but returned the call from a Pakistani number. PlagAware only has an answering machine taking calls but responds to emails. The woman answering the Turnitin number kept repeating that all information could be found on the web pages and insisted that, since this was not a customer calling, the sales department should be contacted. Each of these systems was awarded half a point. Akademia apparently published a wrong number on their web page, as the person answering the phone only spoke a foreign language. When we did not reach anyone by phone and thus could not assess their ability to speak English, we assigned 0 points for this criterion.

As shown in Table 11, only PlagiarismCheck.org and Unicheck fulfilled all criteria. Five systems supported less than half of the defined features. The most commonly fulfilled criteria were the proper use of English and the absence of external advertisements. Problematic areas were unclearly stated system costs, unclear integration with Moodle, and the lack of call support in English.

In the majority of previous research on the testing of text-matching tools, the main focus has been on coverage. The problem with most of these studies is that they approach coverage from only one perspective: they aim only at measuring the overall coverage performance of the detection tools. The present study approaches coverage from four perspectives: language-based coverage, language subgroup-based coverage, source-based coverage, and disguising technique-based coverage. It also includes a usability evaluation.

It must be noted that both the coverage and the usability scores are based on work that was done with potentially older versions of the systems. Many companies have responded to say that they are now able to deal with various issues. This is good, but we can only report on what we saw when we evaluated the systems. If any part of the evaluation were to be repeated, it would have to be repeated for all systems. It should be noted that similar responses have come from vendors for all of Weber-Wulff’s tests, such as Weber-Wulff et al. (2013).

It must also be noted that the selection of usability criteria and their weights reflects the personal experience of the project team. We are fully aware that different institutions may have different priorities. To mitigate this limitation, we have published all usability scores, allowing for calculations using individual weights.
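For example, an institution could recompute a system's usability score with its own priorities roughly as follows (the criterion names, scores, and weights here are invented for illustration):

```python
# Hypothetical per-criterion scores for one system (1 = criterion met),
# as they might be read off the published tables; names are invented.
scores = {
    "downloadable_reports": 1,
    "side_by_side_view": 0,
    "moodle_integration": 1,
    "call_support_english": 0,
}

# Example priorities of an institution that values Moodle integration most.
weights = {
    "downloadable_reports": 3,
    "side_by_side_view": 2,
    "moodle_integration": 4,
    "call_support_english": 1,
}

def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average in [0, 1]; 1 means every weighted criterion is met."""
    total = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total
```

With these invented numbers, the system would score 0.7 for this institution, even though it misses half the criteria, because the criteria it meets carry most of the weight.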

Language-based coverage

With respect only to language-based coverage, the performance of the tools for eight languages was evaluated in order to determine which tools yield the best results for each particular language. The results showed that the best-performing tools with respect to coverage alone are (three systems tied for Italian):

PlagAware for German,

PlagScan for English and Italian,

PlagiarismCheck.org for Italian and Latvian,

StrikePlagiarism.com for Czech and Italian, and

Urkund for Slovak, Spanish, and Turkish.

It is worth noting that, overall, the text-matching tools tested yield better results for widely spoken languages. In the literature, language-based similarity detection mainly revolves around identifying plagiarism among documents in different languages; to our knowledge, no study has been conducted specifically on the coverage of multiple languages. In this respect, these findings offer valuable insights. As for the language subgroups, the tested text-matching tools work best for Germanic and Romanic languages, while the results are not satisfactory for Slavic languages.

Source-based coverage testing

Source-based coverage testing was performed using four types of sources: Wikipedia, open-access papers, a student thesis, and online articles. For many students, Wikipedia is the starting point for research (Howard & Davies, 2009) and can thus be regarded as one of the primary sources for plagiarists. Since the Wikipedia database is freely available, it is expected that Wikipedia texts should be easily identifiable. Testing the tools with Wikipedia texts demonstrates the fundamental ability to catch text matches.

Three articles per language were created for all eight languages, each of which was made using a different disguising technique (copy & paste, synonym replacement, and manual paraphrase). The best-performing tools for the sources tested over all languages were:

PlagiarismCheck.org for online articles,

StrikePlagiarism.com for the student thesis (although this may be because the student thesis was in Czech),

Turnitin for open-access papers and

Urkund for Wikipedia.

Since Wikipedia is assumed to be a widely used source, it was worth investigating the Wikipedia texts more deeply. The results revealed that the majority of tools are successful at detecting similarity with text copied and pasted from Wikipedia, with the exceptions of intihal.net, DPV, and Dupli Checker. However, a considerable drop was observed for synonym replacement texts in all systems except Urkund, PlagiarismCheck.org, and Turnitin, which yielded promising results. This replicates the result of the study of Weber-Wulff et al. (2013), in which Urkund and Turnitin were found to have the best results among 16 tools.

As for the paraphrased texts, all systems fell short of catching similarity at a satisfactory level. PlagiarismCheck.org was the best-performing tool for paraphrased texts compiled from Wikipedia. Overall, Urkund was the best-performing tool at catching similarity in Wikipedia texts created with all three disguising techniques.

One aspect of Wikipedia sources that is not adequately addressed by the text-matching software systems is the proliferation of Wikipedia copies on the internet. As discussed in Weber-Wulff et al. (2013), this can lead to the appearance of many small text matches instead of one large one. In particular, this can happen if the copy of the ever-changing Wikipedia in the database of the software system is relatively old and the copies on the internet are from newer versions. A careless teacher may draw false conclusions if they focus only on the quantity of Wikipedia similarities in the report.

Disguising technique-based coverage

The next dimension of coverage testing is disguising technique-based coverage. In this phase, documents were created using copy & paste, synonym replacement, paraphrase, and translation techniques. For copy & paste documents, all systems achieved acceptable results except DPV, intihal.net, and Dupli Checker. Urkund was the best tool at catching similarity in copy & paste texts. The success of some of the tested tools at catching similarity in copy & paste texts has also been validated by other studies, for example, Turnitin (Bull et al., 2001; Kakkonen & Mozgovoy, 2010; Maurer et al., 2006; Vani & Gupta, 2016) and Docol©c (Maurer et al., 2006).

For synonym replacement texts, the best-performing tools on copy & paste texts continued their success with a slight decline in scores, except for PlagiarismCheck.org, which yielded better results for synonym replacement texts than for copy & paste texts. Plagiarism Software and Viper showed the sharpest declines in their scores for synonym replacement. Urkund and PlagiarismCheck.org were the best tools in this category.

For paraphrased texts, none of the systems was able to provide satisfactory results. However, PlagiarismCheck.org, Urkund, PlagScan, and Turnitin scored somewhat better than the other systems. PlagScan (Křížková et al., 2016) and Turnitin (Bull et al., 2001) have also scored well on paraphrased texts in other studies.

For translated texts, none of the systems was able to detect the translation plagiarism, with the exception of Akademia, which offers users an option to check for potential translation plagiarism. The systems detected translation plagiarism mainly in the references, not in the texts. This is similar to previous research findings, and the situation has not improved since then; for example, Turnitin and Docol©c have previously been shown not to be effective at detecting translation plagiarism (Maurer et al., 2006). To increase the chances of detecting translation plagiarism, paying extra attention to matches in the reference entries should be encouraged, since matches from the same source can be a significant indicator of translation plagiarism. However, it should be noted that some systems may omit matches with the reference entries by default.

Multi-source coverage testing

In the last phase of coverage testing, we tested the ability of the systems to detect similarity in documents compiled from multiple sources. It is assumed that plagiarised articles contain text taken from multiple sources (Sorokina, Gehrke, Warner, & Ginsparg, 2006). This type of plagiarism requires additional effort to identify. If a system is able to find all of the similarity in documents compiled from multiple sources, this is a significant indicator of its coverage performance.

The multi-source results show that Urkund, the best-performing system for single-source documents, shares the top score with PlagAware for multi-source documents, while Dupli Checker, DPV, and intihal.net yielded very unsatisfactory results. Surprisingly, only two systems (Akademia and Unicheck) demonstrated a sharp decline for multi-source documents, whereas the performance of ten systems actually improved. This shows that the systems perform better at catching short fragments in a multi-source text than a whole document taken from a single source.

As for the general testing, the results are highly consistent with the Wikipedia results, which supports the validity of the single-source and multi-source testing. Again, for single-source documents, Urkund obtained the highest score, while PlagAware is the best-performing system for multi-source documents. Dupli Checker, DPV, and intihal.net obtained the lowest scores in both categories. Most of the systems demonstrated better performance for multi-source documents than for single-source ones. This is most probably explained by each system's chances of having access to a source: if one source was missing from the tool's database, it had no chance of identifying the text match, whereas the use of multiple sources gave the tools multiple chances of identifying at least one of them. This points quite clearly to the issue of false negatives: even if a text-matching tool does not identify a source, the text can still be plagiarized.

Overall coverage performance

Based on the total coverage performance, calculated as an average of the scores for each testing document on a scale of 0 (worst) to 5 (best), we can divide the systems into four categories (sorted alphabetically within each category):

Useful systems – overall score in [3.75, 5.0]:

There were no systems in this category.

Partially useful systems – overall score in [2.5, 3.75):

PlagAware, PlagScan, StrikePlagiarism.com, Turnitin, Urkund

Marginally useful systems – overall score in [1.25, 2.5):

Akademia, Copyscape, Docol©c, PlagiarismCheck.org, Plagiarism Software, Unicheck, Viper

Unsuited for academic institutions – overall score in [0, 1.25):

Dupli Checker, DPV, intihal.net
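The categorization above amounts to simple interval checks on the overall score; as a sketch:

```python
def category(score: float) -> str:
    """Map an overall coverage score (0 to 5) onto the four categories used above."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score >= 3.75:
        return "useful"
    if score >= 2.5:
        return "partially useful"
    if score >= 1.25:
        return "marginally useful"
    return "unsuited for academic institutions"
```

Note that the boundaries are closed at the bottom of each interval, matching the bracket notation in the list above.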

The second evaluation focus of the present study is usability. The results can be interpreted in two ways, from either a system-based perspective or a feature-based one, since some users may prioritize a particular feature over others. In the system-based usability evaluation, Docol©c, DPV, PlagScan, Unicheck, and Urkund were able to meet all of the specified criteria. PlagiarismCheck.org, Turnitin, and Viper missed only one criterion each (PlagiarismCheck.org dropped the original file names, and both Turnitin and Viper insisted on extensive metadata being filled in).

From the feature-based perspective, the ability to process large documents, the absence of word limits, and displaying text only in the chosen language were the features most supported by the systems. Unfortunately, uploading multiple documents at the same time was the least supported feature. This is odd, because it is an essential feature for academic institutions.

A similar usability evaluation was conducted by Weber-Wulff et al. (2013), who created a 27-item usability checklist and evaluated 16 systems. Their checklist includes criteria similar to those of the present study, such as storing reports, side-by-side views, and an effective support service. The two studies have eight systems in common. In Weber-Wulff et al. (2013), the top three systems were Turnitin, PlagAware, and StrikePlagiarism.com, while in the present study Urkund, StrikePlagiarism.com, and Turnitin are the best scorers. Copyscape, Dupli Checker, and Docol©c were the worst-scoring systems in both studies.

Another similar study (Bull et al., 2001) addressed the usability of five systems, including Turnitin. The researchers set usability criteria and rated the systems against them by assigning stars out of five. In that evaluation, Turnitin was given five stars for the clarity of reports, five stars for user-friendliness, five stars for the layout of reports, and four stars for ease of interpretation.

The similarity reports are the end products of the testing process and serve as crucial evidence for decision-makers such as honour boards or disciplinary committees. Since affected students may ask courts to review a decision, the evidence must be clear: the offending text and a potential source should be shown in a synoptic (side-by-side) style, with metadata such as page numbers included to ease verifiability. Thus, the similarity reports generated were the focus of the usability evaluation.

However, none of the systems managed to meet all of the stated criteria. PlagScan (no side-by-side layout in the offline report) and Urkund (did not keep the document formatting) scored seven out of eight points. They were closely followed by Turnitin and Unicheck which missed two criteria (no side-by-side layout in online or offline reports).

The features supported most were downloadable reports and some sort of highlighting of the text match in the online reports. Two systems, Dupli Checker and Copyscape, do not provide downloadable reports to the users. The side-by-side layout was the least supported feature. While four systems offer side-by-side evidence in their online reports, only one system (Urkund) supports this feature in the offline report. It can be argued that the side-by-side layout is an effective way to make a contrastive analysis in deciding whether a text match can be considered plagiarism or not, but this feature is not supported by most of the systems.

Along with the uploading process and the understandability of reports, we also aimed to address certain features that would be useful in academia. Eight criteria were included in this area:

clearly stated costs,

the offer of a free trial,

integration to an LMS (Learning Management System) via API,

Moodle integration (as this is a very popular LMS),

availability of support by telephone during normal European working hours (9–15),

availability of support by telephone in English,

proper English usage on the website and in the reports, and

no advertisements for other products or companies.

The qualitative analysis in this area showed that only PlagiarismCheck.org and Unicheck achieved a top score. PlagScan scored seven points out of eight, followed by PlagAware (6.5 points), StrikePlagiarism.com (6.5 points), and Docol©c and Urkund (6 points each). Akademia (2 points), DPV (2 points), Dupli Checker (3 points), intihal.net (3 points), and Viper (3 points) did not obtain satisfactory results.

Proper English usage was the most widely supported feature in this category, followed by the absence of external advertisements. The least supported feature was clearly stated system costs; only six systems fulfilled this criterion. While it is understandable that a company wants to charge as much as it can get from a customer, it is in the customer’s interest to be able to compare the total cost of use per year up front, before diving into extensive tests.

In order to calculate the overall usability score, the categories were weighted based on their impact on usability. In this respect, the interpretation of the reports was considered to have the most impact, since similarity reports can be highly misleading (as also noted by Razı, 2015) when they are not clear enough or lack adequate features. Thus, the scores from this category were weighted threefold. The workflow process criteria were weighted twofold, and the other criteria were weighted by one. The maximum weighted score was thus 47. Based on these numbers, we classified the systems into four categories (the boundaries for the categories were 35, 23, and 11):

Useful systems: Docol©c, PlagScan, Turnitin, Unicheck, Urkund;

Partially useful systems: DPV, PlagAware, PlagiarismCheck.org, StrikePlagiarism.com, Viper;

Marginally useful systems: Akademia, Dupli Checker, Copyscape, intihal.net, Plagiarism Software.

Unsuited for academic institutions: -

Please note that these categories are quite subjective, as both our evaluation criteria and the weightings are subjective. For other use cases, the criteria might be different.
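The weighting and classification described above can be sketched as follows. This is a minimal illustration: the function names and the example point values are ours, while the weights (3/2/1), the maximum weighted score of 47, and the boundaries 35, 23, and 11 come from the study.

```python
def weighted_usability(report_pts: float, workflow_pts: float, other_pts: float) -> float:
    # Report-interpretation criteria are weighted threefold, workflow criteria
    # twofold, and the remaining criteria single-weighted (maximum: 47).
    return 3 * report_pts + 2 * workflow_pts + other_pts

def usability_band(total: float) -> str:
    # Category boundaries used in the study: 35, 23, and 11.
    if total >= 35:
        return "useful"
    if total >= 23:
        return "partially useful"
    if total >= 11:
        return "marginally useful"
    return "unsuited for academic institutions"
```

For example, a hypothetical system scoring 8 report points, 5 workflow points, and 6 other points would receive a weighted total of 40 and land in the "useful" band.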

Combined coverage

Combining the results for coverage and usability on a two-dimensional graph yields Fig. 1. In this section, the details of coverage and usability are discussed together.

Fig. 1 Coverage and usability combined. X-axis: total score for coverage; Y-axis: total weighted score for usability

Coverage is the primary limitation of a web-based text-matching tool (McKeever, 2006 ) and the usability of such a system has a decisive influence on the system users (Liu, Lo, & Wang, 2013 ). Therefore, Fig. 1 presents a clear portrayal of the overall effectiveness of the systems. Having determined their criteria related to the coverage and usability of a web-based text-matching tool, clients can decide which system works best in their settings. Vendors are given an idea about the overall effectiveness of their systems among the tools tested. This diagram presents an initial blueprint for vendors to improve their systems and the direction of improvement.

One important result visible in this diagram is that the systems’ usability performance is relatively better than their coverage performance (see Fig. 1). As for coverage, the systems demonstrated at best only average performance. Thus, the systems tested fall short of meeting coverage expectations: they are useful in the sense that they find some text similarity that can be considered plagiarism, but they do not find all such similarity, and they also suffer from false positives.

Conclusions and recommendations

This study is the output of an intensive two-year collaboration and systematic effort by scholars from a number of different European countries. Despite the lack of external funding, the enthusiasm-driven team performed a comprehensive test of web-based text-matching tools with the aim of offering valuable insights to academia, policymakers, users, and vendors. Our results reflect the state of the art of text-matching tools between November 2018 and November 2019. Testing text-matching tools is not a new endeavour; however, previous studies have generally fallen short of providing satisfactory results. This study tries to overcome the problems and shortcomings of previous efforts. It compares 15 tools using two main criteria (coverage and usability), analyzing testing documents in eight languages, compiled from several sources, and using various disguising techniques.

A summary of the most important findings includes the following points:

Some systems work better for a particular language or language family. Coverage of sources written in major languages (English, German, and Spanish) is in general much better than coverage of minor language sources (Czech or Slovak).

The systems’ performance varies according to the source of the plagiarized text. For instance, most systems are good at finding similarity to current Wikipedia texts, but not as good for open access papers, theses, or online articles.

The performance of the systems also differs depending on the disguising technique used. It is only partially satisfactory for synonym replacement and quite unsatisfactory for paraphrased and translated texts. Considering that patchwriting, which includes synonym replacement and sentence re-arranging, is a common technique used by students, vendors should work to remedy this shortcoming.

The systems appear to be better at catching similarity in multi-source documents than single-source ones, although the test material was presented in blocks and not mixed on a sentence-by-sentence level.

From the usability perspective, this study clearly shows how important the similarity reports are and how user-friendly (or not) each system’s testing process is. Users can see which features are supported by the systems and which are not, and vendors can benchmark their features against other systems.

Based on our results, we offer the following recommendations for the improvement of the systems, although we realize that some of these are computationally impossible:

Detect more types of plagiarism, particularly those coming from synonym replacement, translation, or paraphrase. Some semantic analysis results look promising, although their use will increase the amount of time needed for evaluating a system.

Clearly identify the source location in the report, do not just report “Internet source” or “Wikipedia”, but specify the exact URL and date stored so that an independent comparison can be done.

Clearly identify the original sources of plagiarism when a text has been found similar to a number of different sources. For example, if both Wikipedia and another page that has copied or used text from Wikipedia turn up as potential sources, the system should show both, listing Wikipedia first because it is more likely to be the real source of plagiarism. Once Wikipedia has been determined to be a potential source, that particular article should be closely compared with the suspect text to see if more was taken from it.

Avoid asking users to enter metadata (for example, author, title, and/or subject) in the system along with the text or file as mandatory information. It is good to have this feature available, but it should not be mandatory.

Lose the single number that purports to quantify the amount of similarity. It does not, and it is misused by institutions as a decision-maker. Plagiarism is multi-dimensional and must be judged by an expert, not a machine. For example, a system could instead report the number of matching word sequences found, the longest one, the average length of the sequences, the number of apparent synonym substitutions, etc.
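The kind of multi-metric reporting suggested above can be sketched with standard sequence-matching machinery. This is a minimal illustration under our own assumptions (function name, parameters, and whitespace tokenization are ours, not any vendor’s API): it reports the number, longest length, and average length of matching word sequences between a suspect text and a source.

```python
from difflib import SequenceMatcher

def match_statistics(suspect: str, source: str, min_len: int = 3) -> dict:
    """Report matching word sequences of at least min_len words
    between two texts, instead of a single similarity percentage."""
    a, b = suspect.split(), source.split()
    # get_matching_blocks() yields non-overlapping matching runs plus a
    # zero-length terminal block, which the min_len filter discards.
    blocks = [m for m in SequenceMatcher(None, a, b).get_matching_blocks()
              if m.size >= min_len]
    sizes = [m.size for m in blocks]
    return {
        "sequences": len(sizes),
        "longest": max(sizes, default=0),
        "average": sum(sizes) / len(sizes) if sizes else 0.0,
    }
```

For instance, comparing "the quick brown fox jumps over the lazy dog" with "a quick brown fox jumps over a sleepy dog" yields one matching sequence of five words, which is far more informative to a human reviewer than a bare percentage.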

Design useful reports and documentation. They must be readable and understandable both online and when printed as a PDF. Special care should be taken with printed forms, as they will become part of a student’s permanent record. Reports must show users the suspected text match side by side with the possible sources of plagiarism, highlighting the text that appears similar.

Distinguish false positives from real plagiarism. Many false positives occur because of commonly used phrases within the context or language employed, or from ignoring variant quotation styles (German or French quotation marks, or indentation).

A number of important points for educators need to be emphasized:

Despite being able to find a good deal of text overlap, the systems do not determine plagiarism . There is a prevalent misconception about these tools: in the literature, most studies use the term ‘plagiarism detection tools’. However, plagiarism and similarity are very different concepts. What these tools promise is to find overlapping text in the document examined. Overlapping text does not always indicate plagiarism; thus, the decision about whether plagiarism is present should never be taken on the basis of a similarity percentage alone. The similarity reports of these tools must be inspected by an experienced human being, such as a teacher or an academic, because all systems suffer from false positives (correctly quoted material counted as similarity) and false negatives (potential sources that were not found).

Translation plagiarism can sometimes be spotted through a number of matches in the references, since reference entries are typically left untranslated.

Another problem related to these tools is the risk of their possible cooperation with essay mills: technically, a company can store uploaded documents and share them with third parties, and the ‘Terms and Conditions’ of some tools clearly state this. Uploading documents to such websites can violate ethical norms and laws, and teachers may face legal consequences. Thus, users must be skeptical about the credibility of a tool before uploading any documents to retrieve a similarity report.

It is necessary to obtain the legal consent of students before uploading their work to third parties. Since this legal situation can be different from country to country or even from university to university, make sure that the relevant norms are being respected before using such systems.

Because of European data privacy laws, higher education institutions in the EU must make certain that companies storing material use only servers located in the EU.

Teachers must make sure that they do not violate a non-disclosure agreement by uploading student work to the text-matching software.

Detecting plagiarism happens far too late in the writing process. It is necessary to institute institution-wide efforts to prevent academic misconduct and to develop a culture of excellence and academic integrity. This encourages genuine learning and shows how academic communication can be done right, instead of focusing on policing and sanctioning.

Considering the number of participating systems, the number of testing documents, and the language variety, this paper describes the largest test of text-matching tools conducted to date. We hope the results will be useful both for educators and for policymakers who decide which system to use at their institution. We plan to repeat the test in three years to see whether any improvements can be observed.

Acknowledgements

We are deeply indebted to the contributions made to this investigation by the following persons:

● Gökhan Koyuncu and Nil Duman from the Canakkale Onsekiz Mart University (Turkey) uploaded many of the test documents to the various systems;

● Jan Mudra from Mendel University in Brno (Czechia) contributed to the usability testing, and performed the testing of the Czech language set;

● Caitlin Lim from the University of Konstanz (Germany) contributed to the literature review;

● Pavel Turčínek from Mendel University in Brno (Czechia) prepared the Czech language set;

● Esra Şimşek from the Canakkale Onsekiz Mart University (Turkey) helped in preparing the English language set;

● Maira Chiera from University of Calabria (Italy) prepared the Italian language set;

● Styliani Kleanthous Loizou from the University of Nicosia (Cyprus) contributed to the methodology design;

● We wish to especially thank the software companies that provided us access to their systems free of charge and patiently extended our access as the testing took much more time than originally anticipated.

● We also wish to thank the companies that sent us feedback on an earlier version of this report. We are not able to respond to every issue raised, but are grateful for them pointing out areas that were not clear.

Additional information

A pre-print of this paper has been published on Arxiv: http://arxiv.org/abs/2002.04279

This research did not receive any external funding. HTW Berlin provided funding for openly publishing the data and materials.

Author information

Authors and affiliations.

Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00, Brno, Czechia

Tomáš Foltýnek & Dita Dlabolová

University of Wuppertal, Wuppertal, Germany

Tomáš Foltýnek

Riga Technical University, Riga, Latvia

Alla Anohina-Naumeca & Laima Kamzola

Canakkale Onsekiz Mart University, Çanakkale, Turkey

Slovak Centre for Scientific and Technical Information, Bratislava, Slovakia

Július Kravjar

Universidad de Monterrey, Mexico, Mexico

Jean Guerrero-Dib

Balikesir University, Balikesir, Turkey

Özgür Çelik

University of Applied Sciences HTW Berlin, Berlin, Germany

Debora Weber-Wulff


Contributions

TF managed the project and performed the overall coverage evaluation. DD communicated with the companies that are providing the systems. AAN and LK wrote the survey of related work. SR and ÖÇ wrote the discussion and conclusion. LK and DWW performed the usability evaluation. DWW designed the methodology, made all the phone calls, and improved the language of the final paper. All authors meticulously evaluated the similarity reports of the systems and contributed to the whole project. All authors read and approved the final manuscript. The contributions of others who are not authors are listed in the acknowledgements.

Corresponding author

Correspondence to Tomáš Foltýnek .

Ethics declarations

Competing interests.

Several authors are involved in the organization of the regular conference series Plagiarism across Europe and Beyond , which receives funding from Turnitin, Urkund, PlagScan, and StrikePlagiarism.com. One team member received a Turnitin Global Innovation Award in 2015. These facts did not influence the research in any phase.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These are the main contact URLs for the 15 systems evaluated.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Foltýnek, T., Dlabolová, D., Anohina-Naumeca, A. et al. Testing of support tools for plagiarism detection. Int J Educ Technol High Educ 17 , 46 (2020). https://doi.org/10.1186/s41239-020-00192-4

Download citation

Received : 13 February 2020

Accepted : 16 March 2020

Published : 27 July 2020

DOI : https://doi.org/10.1186/s41239-020-00192-4


  • Text-matching software
  • Software testing
  • Plagiarism detection tools
  • Usability testing


IISER Pune: Plagiarising of thesis plus alleged sexual, mental harassment — the ordeal of this PhD scholar

This is what happened

(The story was originally published on February 28, 2023)

A PhD scholar from IISER Pune attempted suicide in 2022 due to alleged plagiarism of her thesis. Her sister, with whom she presently resides, claims that her health is still delicate and alleges that, despite this, the institute has not taken any action. Last month, the All India Research Scholars' Association (AIRSA) wrote to Rakesh Ranjan, Additional Secretary (TE) and Research Coordination, IISERs, seeking an appointment to discuss the scholar's case.

"We would like to bring your kind attention to the matter of harassment (of all sorts including sexual harassment) and atrocities done with a brilliant female student at IISER, Pune (running under the Ministry of Education, Govt of India), forcing the innocent female scholar to attempt suicide," the letter states.

"Multiple representations were sent to the Ministry of Education, Govt of India including the Hon’ble Minister of Education in the last six months beginning September 2022 onwards to save her life, career, and dignity but nothing was done. The authorities starting from the institution up to the highest level are protecting the culprits through all means," the document alleges further.


Dr Lal Chandra Vishwakarma, President of AIRSA said, "We have not received any reply to our communication yet. But we want the government to take note. There are thousands of similar harassment incidents happening in a majority of institutions. Only a handful of them is reported because most of the scholars are afraid of the authorities."

He added that AIRSA has been asking the government to put some mechanism in place in higher education institutions to keep such incidents in check. On February 17, the organisation held a nationwide protest over various issues, chief among them a hike in stipends and the harassment of scholars in institutes; the IISER Pune case was referred to in its press release.

The narrative

The PhD scholar's sister, speaking on condition of anonymity, says that her sister was pursuing an integrated PhD in Mathematics under Dr Anindya Goswami at IISER Pune and has been living with her since the outbreak of COVID in 2020. The scholar presented her thesis work from home to Goswami through online meetings, which were also attended by an MSc student, Shristi Gupta. When Gupta presented her MSc thesis, the sister alleges, it copied the scholar's work entirely.

Dr Goswami had previously made sexual advances toward the scholar, which she had refused, she alleged in her statement to the Deputy Commissioner of Police (DCP), adding that the plagiarised thesis was the consequence. She also told the institute several times that she did not feel safe with her supervisor. "Just because my sister refused to compromise her modesty, she was treated in this way. It is highly unfair," she stated. However, fearing that Dr Goswami would damage her career if she complained, the scholar started working on a fresh thesis, which was ready for publication by July 2022.

"However, Gupta's name was once more put out as the author of this thesis. When questioned by my sister, Dr Goswami stated that he wanted to launch Gupta academically. Her request for a new supervisor was also denied by the institute. Unable to take it in, my sister attempted suicide," alleges a tearful sister of the scholar. She lodged an FIR and also mailed a detailed complaint to the director of IISER Pune. But she further alleges that no action was taken. The scholar's degrees are pending at the institute, she said.

IISER Pune's reply

When questioned by EdexLive , IISER Pune denied all the claims. An official from the institute, who wishes to remain anonymous, shared IISER Pune's statement on the incident, published on January 12 this year. "Despite the best efforts of the institute towards redressal, it is unfortunate that the student and her family could not be convinced of the institute’s response and have chosen to file an FIR against the supervisor and the IISER Pune administration," the statement reads.

According to the document, after receipt of the complaint, "The institute's academic ethics committee conducted an inquiry into the complaint. The committee found the PhD and master's research work to be collaborative in nature and have not found evidence for plagiarism as alleged. The committee found that the supervisor did not ensure proper sharing of credit between two collaborating PhD and master's students, and that the master's thesis did not acknowledge the PhD student's academic contribution."

"Thus, commensurate with the lapse, the supervisor has been disallowed from admitting any new thesis students in his research group for a year, and the Master's thesis has been asked to be modified in order to acknowledge the PhD student's academic contribution," the statement reads further. It also notes that an independent faculty member was assigned the work of collaborating between the scholar and Dr Goswami. The scholar was additionally offered counsellor support. 

But the sister claims that these measures were inadequate: the scholar did not require counselling, nor was the published thesis collaborative work. Meanwhile, asked about the sexual harassment complaints, the official from IISER Pune stated that the institute had only received a complaint of plagiarism.


Plagiarism Across Languages and Cultures: A (Forensic) Linguistic Analysis

  • Living reference work entry
  • First Online: 26 December 2018

Rui Sousa-Silva

A considerable volume of research into plagiarism has been conducted in recent years, most of it focused on educational approaches. Other studies, however, have attempted to establish, especially from a forensic linguistic perspective, the extent to which linguistic analyses like those used in forensic contexts could help determine the degree of plagiarism in written assignments. Most of these focused on the role of the linguist as a forensic consultant and/or expert dealing with attorneys and court cases, and rarely, if ever, have they applied linguistic research to academic plagiarism. Indeed, plagiarism analysis has traditionally focused on determining the uniqueness of a suspect text while disregarding important cross-cultural circumstances. This chapter discusses plagiarism as a cross-cultural/cross-linguistic phenomenon. It examines the perceptions of higher education students and lecturers/tutors in two different countries in order to assess, firstly, whether speakers from different countries share the same concept of plagiarism or, on the contrary, have different perceptions. Secondly, based on these perceptions, it asks whether a distinction needs to be made between judgments of intentional and unintentional instances of plagiarism. Thirdly, it discusses the potential role of the linguist in demonstrating the alleged plagiarist’s intention, and the corresponding ethical implications. The chapter ends by arguing that a cross-cultural analysis, combined with an understanding of the legal context, is crucial in detecting and analyzing plagiarism.


Al-Marashi, I. (2002). Iraq’s security and intelligence network: A guide and analysis. Middle East Review of International Affairs, 6 (3).

Google Scholar  

Angèlil-Carter, S. (2000). Stolen language?: Plagiarism in writing . Harlow: Longman.

Ascensão, J. d. O. (1992). Direitos de Autor e Direitos Conexos . Coimbra: Coimbra Editora.

Cohen, S. (1972). Folk devils and moral panics . Oxon/New York: Routledge.

Coulthard, M. (2004). Author identification, idiolect and linguistic uniqueness. Applied Linguistics, 25 (4), 431–447.

Article   Google Scholar  

Coulthard, M., & Johnson, A. (2007). An introduction to forensic linguistics: Language in evidence . London/New York: Routledge.

Book   Google Scholar  

Coulthard, M., Johnson, A., Kredens, K., & Woolls, D. (2010). Four forensic linguists’ responses to suspected plagiarism. In M. Coulthard & A. Johnson (Eds.), The Routledge handbook of forensic linguistics (pp. 523–538). Milton Park, Abingdon/New York: Routledge.

Chapter   Google Scholar  

Eiras, H., & Fortes, G. (2010). Dicion{á}rio de Direito Penal e Processo Penal . Lisboa: Quid Juris.

Finnis, J. (1991). Intention and side-effects. In R. G. Frey & C. W. Morris (Eds.), Liability and responsibility: Essays in law and morals (pp. 32–64). Cambridge: Cambridge University Press.

Garner, B. A. (2009). Black’s law dictionary (9th ed.). St. Paul: West.

Glendinning, I. (2014). Impact of policies for plagiarism in higher education across Europe – Results of the Project . Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 63 (1):207–216

Goldstein, P. (2003). Copyright’s highway: From Gutenberg to the celestial jukebox . Stanford: Stanford University Press.

Howard, R. M. (1995). Plagiarisms, authorships, and the academic death penalty. College English, 57 (7), 788–806. https://doi.org/10.2307/378403 .

Howard, R. M., & Robillard, A. E. (2008). Plagiarisms. In R. M. Howard & A. E. Robillard (Eds.), Pluralizing plagiarism: Identities, contexts, pedagogies (pp. 1–7). Portsmouth: Boynton/Cook.

Jameson, D. A. (1993). The ethics of plagiarism: How genre affects writers’ use of source materials. Bulletin of the Association for Business Communication, 56 (2), 18.

Johnson, A. (1997). Textual kidnapping – A case of plagarism among three student texts? The International Journal of Speech, Language and the Law, 4 (2), 210–225.

Kress, G. (2000). Multimodality. In Multiliteracies: Literacy learning and the design of social futures . London and New York: Routledge.

Kress, G., & van Leeuwen, T. (2006). Reading images: The grammar of visual design. Oxon and New York: Routledge. https://doi.org/10.1017/CBO9781107415324.004

Mota-Ribeiro, S., & Pinto-Coelho, Z. (2011). Para além da superfície visual: os anúncios publicitários vistos à luz da semiótica social. Representações e discursos da heterossexualidade e de género. Comunicação e Sociedade, 19, 227–246.

Partridge, L., & West, J. (2003). Plagiarism: Perceptions and occurrence amongst transnational postgraduate students in the Graduate School of Education. In H. Marsden, M. Hicks, & A. Bundy (Eds.), Educational integrity: Plagiarism and other perplexities, Proceedings of the 1st Australasian Integrity Conference, 21–22 November (pp. 149–154). Adelaide: University of South Australia.

Pecorari, D. (2008). Academic writing and plagiarism: A linguistic analysis. London: Continuum.

Scollon, R. (1994). As a matter of fact: The changing ideology of authorship and responsibility in discourse. World Englishes, 13(1), 33–46.

Scollon, R. (1995). Plagiarism and ideology: Identity in intercultural discourse. Language in Society, 24, 1–28.

Sousa-Silva, R. (2012). Legitimated plagiarism: An investigation of textual borrowing in official documents. In A. A. C. Teixeira (Ed.), Interdisciplinary insights on fraud and corruption – 1st OBEGEF conference booklet. Porto: Universidade do Porto.

Sousa-Silva, R. (2013). Detecting plagiarism in the forensic linguistics turn. Unpublished PhD thesis. Birmingham: Aston University.

Sousa-Silva, R. (2014). Detecting translingual plagiarism and the backlash against translation plagiarists. Language and Law/Linguagem e Direito, 1(1), 70–94.

Sutherland-Smith, W. (2005). Pandora’s box: Academic perceptions of student plagiarism in writing. Journal of English for Academic Purposes, 4(1), 83–95. https://doi.org/10.1016/j.jeap.2004.07.007

Turell, M. T. (2004). Textual kidnapping revisited: The case of plagiarism in literary translation. The International Journal of Speech, Language and the Law, 11(1), 1–26.

Turell, M. T. (2007). Plagio y traducción literaria. Vasos Comunicantes, 37(1), 43–54.

Turell, M. T. (2008). Plagiarism. In J. Gibbons & M. T. Turell (Eds.), Dimensions of forensic linguistics (Vol. 9, pp. 265–299). Oxford: John Benjamins.

Turell, M. T. (2013). Presidential address. In Proceedings of the 3rd European conference of the International Association of Forensic Linguists on the theme of “Bridging the gaps between language and the law”. Porto: Universidade do Porto – Faculdade de Letras.

Williams, K., & Carroll, J. (2009). Referencing and understanding plagiarism. Basingstoke: Palgrave Macmillan.

Woolls, D. (2003). Better tools for the trade and how to use them. Forensic Linguistics, 10(1), 102–112. https://doi.org/10.1558/sll.2003.10.1.102

Woolls, D., & Coulthard, M. (1998). Tools for the trade. International Journal of Speech, Language and the Law, 5(1), 33–57. http://www.equinoxjournals.com/ojs/index.php/IJSLL/article/view/508/3884

Acknowledgments

This work was partially supported by Grants SFRH/BD/47890/2008 and SFRH/BPD/100425/2014 from FCT – Fundação para a Ciência e a Tecnologia, Portugal, co-financed by POPH/FSE.

Author information

Authors and affiliations

Universidade do Porto – Faculdade de Letras/CLUP, Porto, Portugal

Rui Sousa-Silva

Corresponding author

Correspondence to Rui Sousa-Silva .

Editor information

Editors and affiliations

Department of Geography, University of Kentucky, Lexington, KY, USA

Stanley D Brunn

Deutscher Sprachatlas, Marburg University, Marburg, Hessen, Germany

Roland Kehrein

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Sousa-Silva, R. (2019). Plagiarism Across Languages and Cultures: A (Forensic) Linguistic Analysis. In: Brunn, S., Kehrein, R. (eds) Handbook of the Changing World Language Map. Springer, Cham. https://doi.org/10.1007/978-3-319-73400-2_191-1

DOI: https://doi.org/10.1007/978-3-319-73400-2_191-1

Received: 27 September 2018

Accepted: 27 September 2018

Published: 26 December 2018

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-73400-2

Online ISBN: 978-3-319-73400-2

COMMENTS

  1. Translation Plagiarism

    The extent of translation plagiarism in the body of published research literature is difficult to assess. Plagiarism in all its varieties is commonly listed with falsification and fabrication to constitute the three major classes of research misconduct, and together the three are the primary reasons why journals issue retractions (Marcus and Oransky 2017; Teixeira da Silva and Dobránszki 2017).

  2. academic writing

    Further, when you submit your manuscript, some institutions use commercial plagiarism detection software by default (mine does). If I created this type of software, I would include features to detect translated plagiarism. Second question: please see the previous answer. Third question: in my thesis there is a load of equations ...

  3. translation

    Using Google Translate or another machine translation engine is absolutely plagiarism. Translating a text from one language to another requires work and mental energy, and here one is relying on an algorithm to do it (rather than doing it oneself). If one submits a text translated by a machine without citing that machine, it is work being passed off as ...

  4. Can Plagiarism Checkers Detect Translated Text?

    The effectiveness of plagiarism checkers can vary depending on how advanced the detection method is and on the specific techniques used in the translation. If the translation is significantly different from the original text, or if it includes substantial changes and additions, it may not always trigger plagiarism detection.

  5. Is it plagiarism to translate a research paper and use it without

    A good translation is not a mechanical process but constitutes a creative work in itself and therefore is considered a work with its own rights (at least in German copyright law). Again, this doesn't change the fact that you have to obtain permission of the copyright holder of the original work for publishing your translation.

  6. PDF ON TRANSLATED PLAGIARISM IN ACADEMIC DISCOURSE

    Research on translated plagiarism in academic discourse has so far focused on textual plagiarism and on cross-language, translation-based plagiarism (Sousa-Silva ...

  7. Strategies to help university students avoid plagiarism: a focus on

    2.3 Translation as an intervention strategy. Although this study does not look at translation per se, it does focus on translation from the original English source text to the home language and back to English (known as back translation) as an interim strategy to improve comprehension and paraphrasing of the text and thereby avoid plagiarism ...

  8. Statement on Plagiarism in Translation

    Please read the following statement carefully, to make sure you understand what constitutes plagiarism in a translation assignment. You may find it difficult to distinguish between your own translation and that of other translators. Plagiarism is often the result of ignorance rather than of an intent to cheat; once you know what the rules are ...

  9. PDF Plagiarism Across Languages and Cultures: A (Forensic ...

    violation should be a type of plagiarism or, on the contrary, of academic dishonesty or misconduct. Another terminological imprecision is the one involving the concept of “plagiarized.” Usually, both experts and laypeople refer to the copy as the “plagiarized” material.

  10. Translation Plagiarism: a Modern Day Concern

    Translation plagiarism is undoubtedly a modern-day concern that might also cause trouble for you if you don’t act fast. In this article, we will tell you all about this modern phenomenon and how you can check and remove it. The concept of copying a translation might be new for most of you, and in this post we go into all the ...

  11. Is it self-plagiarism if I publish a translated version of my ...

    If I have an article published in one language, but I want to publish it in another language, what should I do to avoid plagiarism? If I translate my paper by myself and publish it in a journal in a different language, then, is it plagiarism? If I want to avoid plagiarism, but still want to communicate my research in a different language, what should I do?

  12. Paraphrasing tools, language translation tools and plagiarism: an

    In a recent unit of study in an undergraduate Health Sciences pathway course, we identified a set of essays which exhibited similarity of content but demonstrated the use of bizarre and unidiomatic language. One of the distinct features of the essays was the inclusion of unusual synonyms in place of expected standard medical terminology. We suspected the use of online paraphrasing tools, but ...

  13. plagiarism

    Your student worked hard and produced an excellent thesis; Your student plagiarised a thesis from another language; In either case, this warrants extra attention from you. The first step would be to discuss the work with him further. If he was able to discuss said work intelligently, this would be an indication that he did write the thesis.

  14. Translation = plagiarism?

    Beware of plagiarism by translation. When searching for a thesis, dissertation or assignment, information may come from different languages.

  15. A Primer on Plagiarism: Resources for Educators in China

    Translation plagiarism. Similar to direct plagiarism, another form of plagiarism is to translate novel data or ideas from one language to another, representing them as unique and as one’s own creation without crediting the original work. For example, one co-author of this article, together with his colleagues, published a Chinese book based on a ...

  16. Reassessing Academic Plagiarism

    It also encompasses translation plagiarism, ... PhD thesis, London School of Economics and Political Science. Kruse, K. M. (2000). White flight: Resistance to desegregation of neighborhoods, schools and businesses in Atlanta, 1946-1966 (PhD thesis, 2000). Ithaca, New York: Cornell University.

  17. Testing of support tools for plagiarism detection

    He uploaded his senior thesis and a mini-essay made up of randomly selected sentences from four well-known authors with some slight paraphrasing to the systems. He found problems with properly quoted material and the inability to find many plagiarized sentences in the mini-essay. ... Translation plagiarism can sometimes be found by a number of ...

  18. PDF PLAGIARISM

    Presenting text, digital work (e.g. computer code or programs), music, video recordings or images copied with only minor changes from sources such as the internet, books, journals or any other media, without due acknowledgement; (...)”. (Collery, 2020, emphasis added by Thorsten Beck)

  19. Translation Plagiarism: burning issue in modern plagiarism ...

    Translation plagiarism is a relatively new term. It means that individuals copy the works of others as their own, even when they take the written content and translate it from one language into ...

  20. Text Recycling / Self-Plagiarism in NPS Theses and Dissertations

    Yes. Consider asking your advisor about placing your thesis on “hold” until the status of your submitted paper is determined. Holds are requested on the Thesis Release and Approval Form. If the journal accepts your article after your thesis has been published, cite your thesis in the article. Talk to the publisher about author

  21. Plagiarism Detection: Methodological Approaches

    Plagiarism detection is an area of expertise of forensic linguistics that investigates suspicious text similarity. The expert linguist examines texts to gather evidence as to the relationship of dependence or independence between the suspicious pair of texts (Butters, 2008, 2012; Coulthard et al., 2010; Guillén-Nieto, 2020b; Sousa-Silva, 2014, 2015; Turell, 2004, 2008; Woolls, 2010, 2012).

  22. IISER Pune PhD Scholar Alleges Plagiarism and Harassment: A Detailed

    22 Apr 2024, 8:18 am. (The story was originally published on February 28, 2023.) A PhD scholar from IISER Pune attempted suicide in 2022 due to alleged plagiarism of her thesis. Her sister, with whom she presently resides, claims that her health is still delicate and alleges that, despite this, the institute has not taken any action.

  23. Plagiarism Across Languages and Cultures: A (Forensic ...

    New plagiarism cases hit the news on a daily basis; hence, as previously stated (Sousa-Silva 2013, p. 56), the media plays a dual role as “carrier” and “producer” of moral panics, as argued by Cohen. As more plagiarism cases are reported in the news, fear of others’ plagiarizing increases, and so does the number of cases reported.
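Several of the excerpts above describe the same underlying mechanism: a checker fingerprints a submitted text into overlapping word n-grams and scores their overlap with indexed sources. The sketch below is purely illustrative; the trigram size and Jaccard measure are assumptions for demonstration, not any vendor's actual method.

```python
# Illustrative n-gram overlap check; real systems use far larger
# indexes and more robust fingerprinting.

def ngrams(text, n=3):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two n-gram sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

source = "the quick brown fox jumps over the lazy dog near the river"
copied = "the quick brown fox jumps over the lazy dog near the river bank"
unrelated = "plagiarism detection compares submitted text against indexed sources"

print(jaccard(ngrams(source), ngrams(copied)))     # ~0.91: heavy overlap, flagged
print(jaccard(ngrams(source), ngrams(unrelated)))  # 0.0: no shared trigrams
```

Production checkers typically add normalization and fingerprint selection (e.g., winnowing) so that near-verbatim passages still match even after light paraphrasing.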
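On cross-language detection specifically, the excerpts agree that a translation step must precede the similarity comparison. Below is a toy translate-then-compare pipeline; the word-for-word glossary is a hypothetical stand-in for real machine translation, and the bigram measure is an assumption for illustration only.

```python
# Toy cross-language check: normalize the suspect text into the source
# language, then score n-gram overlap. The glossary is a hypothetical
# stand-in for a machine translation engine.

GLOSSARY = {"o": "the", "gato": "cat", "senta": "sits",
            "no": "on the", "tapete": "mat"}

def translate(text):
    """Word-for-word gloss into English; unknown words pass through."""
    return " ".join(GLOSSARY.get(w, w) for w in text.lower().split())

def bigrams(text):
    words = text.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}

def cross_language_similarity(source_en, suspect):
    """Translate the suspect text, then compute bigram Jaccard overlap."""
    a, b = bigrams(source_en), bigrams(translate(suspect))
    return len(a & b) / len(a | b) if a | b else 0.0

score = cross_language_similarity("the cat sits on the mat",
                                  "O gato senta no tapete")  # 1.0 for this toy pair
```

A pipeline like this degrades quickly with free (non-literal) translation, which is consistent with the observation above that substantially reworked translations may escape detection.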