[How to Write English Papers] No. 85: "Managing Study Data and Supporting Documentation (Part 4): Replication Files"

October 27, 2022, 15:22

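No. 84 covered "Managing Study Data and Supporting Documentation (Part 3): Data Validation and Custom-Developed Software".

The topic of No. 85 (this installment) is "Managing Study Data and Supporting Documentation (Part 4): Replication Files".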
 
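Parts 1 to 3 of this article explained the importance of archiving your data and documenting your research, and offered tips on setting up your project files and validating your data.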
 
Part 4 is the final installment of this series.
It discusses how to prepare "replication files", which are intended to help future researchers.
 
Ideally, you would be able to create a replication folder (directory), share it with your colleagues, and have them analyze the data correctly.
 
Please read on for the article.

Managing your study data and the supporting documentation. Part 4: Replication Files
By Geoffrey Hart

In part 1, part 2, and part 3 of this series, I described the importance of archiving your data and documenting your research, and provided suggestions on how to set up your project files and validate your data. In this final part, I'll discuss how to prepare your "replication files". The goal of creating these files is to help future researchers repeat your research or include your results in a meta-analysis. Your replication files should include enough information that another researcher understands what you did sufficiently well that they could repeat the steps you performed and obtain similar results. They should also provide enough information that another researcher could repeat your data analysis to obtain identical results. You can think of these files as the “public” versions of the “private” files that you store in the research directories I described in previous articles. Ideally, structure the replication folder (directory) so that you can simply give your colleagues a copy of the whole folder and be confident that they will be able to analyze the data correctly.
 
Note: This series of articles is based on the following book, but with the details modified for non-psychological research: Berenson, K.R. 2018. Managing Your Research Data and Documentation. American Psychological Association. 105 p. including index. (https://www.apa.org/pubs/books/4313048)
 
Replication information should include only data that you are legally or ethically allowed to share. For any data that is proprietary or confidential, include only the subset of the data that you can legitimately share. For proprietary information, your employer’s intellectual property manager can provide detailed guidance. For human data, where privacy is important, store the identities of the participants separately from their data so that nobody who uses the dataset can identify the individuals unless an institutional review board or other authority determines that it’s necessary to identify individuals.
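
As a minimal sketch of how identities might be kept separate from the data, the Python example below splits each record into a private key table and a shareable data table that are linked only by a random participant ID. The field names (`name`, `email`, `score`), the function name `split_identities`, and the sample records are assumptions made for illustration; an actual study should follow the rules set by its institutional review board.

```python
import secrets


def split_identities(records, identifying_fields=("name", "email")):
    """Split records into a private key table and a shareable data table.

    Each participant receives a random ID. Identifying fields are written
    only to the key table, which should be stored outside the replication
    folder.
    """
    key_table = []   # keep private
    shareable = []   # may be placed in the replication folder
    for record in records:
        # Random ID that is not derived from the participant's identity.
        participant_id = secrets.token_hex(8)
        key_row = {"participant_id": participant_id}
        data_row = {"participant_id": participant_id}
        for field_name, value in record.items():
            if field_name in identifying_fields:
                key_row[field_name] = value
            else:
                data_row[field_name] = value
        key_table.append(key_row)
        shareable.append(data_row)
    return key_table, shareable


# Hypothetical example records.
records = [
    {"name": "A. Example", "email": "a@example.org", "score": 12},
    {"name": "B. Example", "email": "b@example.org", "score": 17},
]
key_table, shareable = split_identities(records)
print(shareable)  # each row holds only participant_id and score
```

Removing names and e-mail addresses is not always enough to prevent identification (for example, rare combinations of age and location can identify a person), so treat this only as a starting point.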
 
Replication information should include more than your own data. It should also include source data that you downloaded from a database or obtained from a colleague. This ensures that the same data will be available if you or a future researcher needs that data. If you want to share a colleague’s data, always ask for their permission before you make their data available. In many cases, it’s safer if future researchers obtain the data directly from the researchers who created it, along with any necessary explanations.
 
Note: Develop a procedure that will ensure that if you modify a source data file that will also be included in the replication folder, you will remember to update the copy in the replication folder so that the two copies remain identical. For example, before you send replication files to a colleague, compare their modification dates with those of your original files. If the file in the Replication Files folder is older, determine whether it's necessary to replace it with the newer version.
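
One way to automate this check is sketched below in Python. Instead of comparing dates, which can change when a file is copied, the sketch compares the contents of each file by means of a SHA-256 checksum. The function names and the folder names (`originals`, `replication`) are assumptions for illustration, and the sketch assumes that both folders use the same relative file paths.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_stale_copies(original_dir, replication_dir):
    """List files in replication_dir that differ from the original.

    Files are matched by relative path. Files without a counterpart in
    original_dir are ignored.
    """
    original_dir = Path(original_dir)
    replication_dir = Path(replication_dir)
    stale = []
    for copy in sorted(replication_dir.rglob("*")):
        if not copy.is_file():
            continue
        relative = copy.relative_to(replication_dir)
        original = original_dir / relative
        if original.is_file() and file_digest(original) != file_digest(copy):
            stale.append(relative.as_posix())
    return stale


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    originals = Path(tmp) / "originals"
    replication = Path(tmp) / "replication"
    originals.mkdir()
    replication.mkdir()
    (originals / "data.csv").write_text("id,value\n1,10\n")
    (replication / "data.csv").write_text("id,value\n1,9\n")  # out of date
    (originals / "notes.txt").write_text("same text")
    (replication / "notes.txt").write_text("same text")
    stale = find_stale_copies(originals, replication)

print(stale)  # ['data.csv']
```

A checksum reveals only that two files differ, not which one is newer, so you still need to decide which version is correct before replacing anything.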
 
It's also appropriate to include methodology documents, such as copies of blank data-entry forms or (for human studies) participant recruitment forms. If the replication folder is intended solely for your own use, include final versions of grant applications that you can reuse, with appropriate modification, in future funding applications. If a colleague provided a copy of one of their papers that contains detailed methodology that you used in your study, save that paper; if you downloaded the paper from a journal site, store it outside the Replication directory so that you protect these copyrighted materials by not sharing them with other researchers. Instead, provide a document that lists all manuscripts that you consulted, and a link to the Web page where the manuscript can be obtained. The advantage of retaining the downloaded copy is that you won’t have to go looking for it again when you perform your next study; the advantage of the file that contains the links is that your colleagues can obtain their own copy of the file with a single click.
 
Journal articles are well designed to provide an overview of the steps an author followed in their research, but they generally assume that readers are already familiar with details of the methods. As a result, they rarely provide full details of a procedure, and their structure is inefficient when it comes time for you to actually use that method in your own research. For example, the Methods section rarely starts with a list of materials and lab instruments you will need to gather together before you can perform an analysis. Moreover, you will often see phrases such as “using the method of Hart (2021)” instead of a description of the steps that author actually followed. In long or complex studies, an author may cite a dozen or more external documents that readers of the published paper must obtain and read before they can repeat the experiment. Why not make life easier for all future researchers who use your methods by providing all necessary details of those methods in the form of a laboratory manual or similar resource?
 
For example, to make your documented methods easy to use, rewrite them like a recipe in a cookbook:
  1. Start with a complete list of ingredients (e.g., laboratory chemicals) and tools (e.g., a gas chromatograph; 50 Erlenmeyer flasks, each with a volume of 200 mL). Researchers can obtain these materials before they begin their analysis rather than discovering halfway through a laboratory procedure that they lack a crucial chemical or tool.
  2. Next, describe each step of the analysis as a numbered sentence presented in exactly the same order you would follow to perform the procedure. In effect, you are transforming the journal paper’s high-level overview of the analysis into the equivalent of a step-by-step laboratory procedures manual.
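
If you keep your methods in a structured form, a short script can produce this recipe layout for you. The Python sketch below is only an illustration: the `Protocol` class, its field names, and the two procedure steps are hypothetical, and the materials are taken from the example in the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """A method written as a recipe: materials first, then numbered steps."""
    title: str
    materials: list = field(default_factory=list)
    steps: list = field(default_factory=list)

    def render(self):
        """Return the protocol as plain text."""
        lines = [self.title, "", "Materials and tools:"]
        lines += [f"  - {item}" for item in self.materials]
        lines += ["", "Procedure:"]
        # Steps are numbered in the order in which they are performed.
        lines += [f"  {n}. {step}" for n, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


protocol = Protocol(
    title="Example analysis protocol (hypothetical)",
    materials=["gas chromatograph", "50 Erlenmeyer flasks (200 mL each)"],
    steps=["Label each flask.", "Calibrate the gas chromatograph."],
)
text = protocol.render()
print(text)
```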
 
Carefully note any lessons you learned from previous analyses or from the present analysis. For example, provide details of mistakes to avoid (e.g., types of bias), things you could do to make the work easier (e.g., specific software settings), ways to improve the validity of your data (e.g., triangulation, suggestions for dealing with skewed data distributions), and problems to avoid in the experimental design or data analysis (e.g., specify the minimum sample size that is likely to be required to achieve statistical significance). Describe each problem by explaining its cause, how to avoid that cause, and what to do if you’re unable to avoid that cause. Provide the most important of these lessons in the Methods section of your paper so that future researchers can benefit from your problems and solutions even if they don’t ask for your replication files. In particular, describe any procedures you developed to minimize errors in data entry and analysis.
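
As an illustration of the sample size point, the Python sketch below applies a widely used normal-approximation formula for comparing the means of two groups: n per group = 2(z_(1-alpha/2) + z_(power))^2 / d^2, where d is the standardized effect size. The choice of this formula, the function name, and the default values (alpha = 0.05, power = 0.80) are assumptions for illustration; the right calculation depends on your study design, so consult a statistician or a dedicated power-analysis tool for real work.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Uses the normal approximation; effect_size is the standardized
    difference between the group means.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


n = sample_size_per_group(0.5)
print(n)  # 63
```

Because this approximation ignores the extra uncertainty in small samples, a calculation based on the t distribution gives a slightly larger number.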
 
For any research method, it’s essential to ask a colleague who is unfamiliar with your specific research to test your description to reveal any implicit assumptions you made that should be made explicit (e.g., "why did you do that?"), to identify any missing steps, and to ensure that there is no question about what must be done in each step. If a protocol will be repeated over many years and the original researchers may have retired or moved to a different research institute by the time a new group of graduate students arrives, ask the graduate students to review the protocol to ensure that they understand it. Then ask them to follow the protocol at least once under standard conditions to ensure that they get the same results you obtained under those conditions; any error suggests that your instructions may contain flaws that must be corrected.
 
Note: It’s useful to create a “Read Me First” document that explains all the contents of the Replication directory (what information is present, where to look for each type of information, and how to use the information when you find it).
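
A starting point for such a document can be generated automatically, as in the following Python sketch, which lists every file in the folder beside a short description that you supply. The function name `build_manifest`, the output layout, and the example file names are assumptions for illustration; the explanations of how to use each file still need to be written by hand.

```python
import tempfile
from pathlib import Path


def build_manifest(replication_dir, descriptions):
    """Return 'Read Me First' text that lists each file with a description.

    descriptions maps a relative path (with forward slashes) to a note.
    """
    root = Path(replication_dir)
    lines = ["READ ME FIRST", "", "Contents of this replication folder:"]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        note = descriptions.get(relative, "(no description yet)")
        lines.append(f"  {relative}: {note}")
    return "\n".join(lines)


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "raw.csv").write_text("id,value\n")
    (Path(tmp) / "methods.txt").write_text("Step 1 ...")
    manifest = build_manifest(
        tmp, {"data/raw.csv": "Raw measurements, one row per sample"}
    )

print(manifest)
```

Files that lack a description are flagged so that you can see which entries still need an explanation.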
 
For large datasets, you may find it inconvenient or difficult to provide the data to other researchers, particularly if the data is sufficiently important that it will be reused by many future researchers. Answering hundreds of requests to provide your data takes time that would be better spent on your research. To simplify the task of sharing your data, use a public data repository, so that other researchers can access your data without bothering you. The best data repository varies among journals. For example, many genetics journals ask authors to save gene sequences in a location such as the DNA Database of Japan (https://www.ddbj.nig.ac.jp/index-e.html), whereas journals that specialize in Arabidopsis genetics may ask you to use The Arabidopsis Information Resource (https://www.arabidopsis.org/). Science asks its authors to use a non-profit publicly accessible site such as Dryad (https://datadryad.org/stash), Dataverse (https://dataverse.org/), or Zenodo (https://zenodo.org/).
 

Data and file formats

What is the best format for storing data? There’s some debate. Some authors recommend saving all files in PDF format because the contents of the files cannot be easily altered (i.e., can’t be modified by mistake) and because this format is likely to remain readable for a long time. Though that’s a reasonable proposal, it suffers from a significant problem: file formats become popular for many years, then disappear with little warning. (This happened recently with Adobe’s Flash software, which is no longer supported by most computer operating systems.) In more than 35 years of working with computers, I’ve seen many extremely popular word processor formats such as WordStar and WordPerfect 5 become obsolete, making files stored in these formats unreadable by the software that replaced them (e.g., Microsoft Word). In addition, it’s difficult to extract information in a usable format from a PDF file if the file contains a complicated layout (e.g., tables with multiple columns of text). For these reasons, “text” (Unicode) format files, which include only Unicode characters, are a better choice. For the foreseeable future, virtually any program will be able to read text files, so the data stored in such files should remain accessible for a long time.
 
Where formatting is important, as in the case of data tables, you have two main options. First, for tabular data such as the contents of an Excel spreadsheet that does not contain calculation formulas, you can save the data in comma-delimited or tab-delimited format. In these formats, the software adds a comma or a tab character between consecutive values and a “hard return” character to mark the end of each row of data. (Don’t use comma-delimited format for data such as text that contains punctuation! There are ways to accomplish this, but it’s more work than necessary and there’s still a significant risk of errors.) Most spreadsheets and databases can read these files easily.
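
The Python sketch below shows one way to write and read tab-delimited data with the standard `csv` module. The sample rows are invented for illustration. Note that the comment field contains a comma and a semicolon, which cause no problem because the tab character is the delimiter.

```python
import csv
import io

# Invented example rows; the first row holds the column names.
rows = [
    ["sample_id", "site", "comment"],
    ["S1", "North plot", "Cloudy, light rain; re-measured"],
    ["S2", "South plot", "No issues"],
]

# Write the rows as tab-delimited text. An in-memory buffer keeps the
# example self-contained; for a real file, use
# open(path, "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()

# Read the text back and confirm that nothing was lost or altered.
restored = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
print(restored == rows)  # True
```

If a value ever contains the delimiter itself, the `csv` module encloses that value in quotation marks when writing and removes them again when reading, but a reader that uses other software must handle the quotation marks in the same way.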
 
Second, for information that requires more formatting, consider the HTML format, since it offers two significant advantages over competing formats: it is stored in text format, so it can be easily read by almost all software, and it contains formatting tags (the words inside the < > brackets) that define the meaning of a group of information (e.g., a paragraph, a heading, a table cell) or that specify its formatting. It’s easy to search for and replace these tags if that becomes necessary. For data that relies more heavily on a structure, the closely related XML format may be a better choice. It’s still a text format, but the structure is more rigorously controlled than in an HTML document.
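
To illustrate how structured data can be stored in XML while remaining plain text, the following Python sketch writes one record with the standard `xml.etree.ElementTree` module and then reads it back. The element and attribute names (`dataset`, `sample`, `site`, `height_cm`) and the values are assumptions chosen for illustration, not a recommended schema.

```python
import xml.etree.ElementTree as ET

# Build a small document: one dataset that contains one sample.
root = ET.Element("dataset", attrib={"name": "example"})
sample = ET.SubElement(root, "sample", attrib={"id": "S1"})
ET.SubElement(sample, "site").text = "North plot"
ET.SubElement(sample, "height_cm").text = "12.5"

# Serialize to text; the result can be opened in any text editor.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Read the text back and extract the numeric values.
parsed = ET.fromstring(xml_text)
heights = [float(s.findtext("height_cm")) for s in parsed.findall("sample")]
print(heights)  # [12.5]
```

Because each value is labeled by its tag, a future reader can tell what a number means without needing the software that created the file.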
 
Note: Research data is increasingly moving beyond text and numbers to include sound and graphics. Because these formats are frequently replaced or upgraded (e.g., to support greater compression to reduce file size, to include metadata), the best solution if you want these types of data to remain available for a decade or more may be to add an annual reminder in your computer’s calendar software that you should open the files in your current software and save them again in one of the newer formats provided by that software.
 

Final thoughts

To learn more about archiving your data and methods, consult the Teaching Integrity in Empirical Research (https://www.projecttier.org/) project, whose goal is to provide “guidance to students conducting quantitative research to help ensure that their work is transparent and reproducible”. If their guidelines are not directly applicable to your field of research, perhaps you could work with colleagues to create specific guidelines for your field. Future researchers will be grateful.
 



