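[How to Write English Papers] No. 79: "Analyzing your data (part 3): presenting your data"
March 11, 2021, 14:43
No. 78 covered "Analyzing your data (part 2): statistical analysis".
The theme of No. 79 (this installment) is "Analyzing your data (part 3): presenting your data".
Parts 1 and 2 of this three-part article described how to explore your data to see what you have discovered, and how to analyze the data rigorously to confirm your preliminary understanding.
This final installment, part 3, explains how to present your data so that readers can see what you have discovered, and how to convince them that your interpretation is correct.
The primary goal here is to present data that support your conclusions, and to choose a sequence of results that builds a compelling argument for those conclusions.
Think of this process as organizing your thoughts and your argument persuasively, not as describing the mathematical and statistical methods used in your analysis (those details belong in the Methods section).
Now, please read the article.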
Analyzing your data (part 3 of 3): presenting your data
By Geoffrey Hart
In parts 1 and 2 of this three-part article, I described how to explore your data to see what you’ve discovered and how to rigorously analyze the data to confirm your preliminary interpretations. In this concluding part, I’ll discuss how to present your data to show your readers what you’ve discovered and convince them your interpretation is correct. Here, the primary goal is to present data that support your conclusions, and to choose a sequence of results that creates a compelling argument in favor of your conclusions. Think of this as organizing your thoughts and your argument in a persuasive way, not as describing the mathematical and statistical methods used in your analysis; those details will be present in the Methods section.
Standardizing data
Note: Although terminology varies, normalization usually refers to transformations that are intended to produce a normal distribution, whereas standardization is intended to account for different initial values in different treatments, regardless of their statistical distribution.
In part 2 of this article, I discussed some problems that result from transforming your data. A less problematic form of transformation is to express results as a proportion of some base value, such as the value in a control or the initial value in a time series, rather than examining only the raw data. The analysis then changes from a comparison of sample means to a comparison of changes in those means. The changes may be based on a difference (i.e., you calculate the final value minus the original value), a proportion (you divide all values by the original value), or the proportional change in values (you subtract the original value from the current value, and divide that difference by the original value). One popular technique is to use z-scores, which transform all values into a number of standard deviations from the mean. Other standardizations include expressing values per unit area, per unit mass, or per capita.
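As a rough illustration of the standardizations described above, the following Python sketch computes the difference, the proportion, the proportional change, and z-scores for a short series. The function names and sample values are hypothetical, the first value is assumed to be the base value, and the z-scores assume the sample standard deviation (n - 1 denominator); your field may follow a different convention.

```python
from statistics import mean, stdev


def difference(values, base):
    """Each value minus the base value."""
    return [v - base for v in values]


def proportion(values, base):
    """Each value divided by the base value."""
    return [v / base for v in values]


def proportional_change(values, base):
    """(value - base) / base, i.e. the change relative to the base value."""
    return [(v - base) / base for v in values]


def z_scores(values):
    """Number of sample standard deviations of each value from the mean."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1); an assumption, not prescribed by the article
    return [(v - m) / s for v in values]


# Hypothetical time series; the first observation is treated as the base value.
series = [20.0, 22.0, 25.0, 30.0]
base = series[0]

print("difference:         ", difference(series, base))
print("proportion:         ", proportion(series, base))
print("proportional change:", proportional_change(series, base))
print("z-scores:           ", [round(z, 3) for z in z_scores(series)])
```

Whichever form you choose, report it explicitly, because each one answers a slightly different question about the same raw data.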
Note: Always provide the standard deviation or standard error with every mean, or a box plot that shows the variation around a median, so that readers will understand the magnitude of the variation in your results. Present the sample size to provide additional insights into that variation.
Standardization is a powerful way to clarify changes in a data series and differences between treatments because it accounts for factors that might bias your interpretation of those changes, such as differences in the initial value. However, as is the case in any transformation of data, you must remember to account for the consequences of the transformation. The raw values and standardized values have different meanings. For example, if only a small percentage of a region’s farmland becomes degraded due to an unsustainable agricultural practice, the percentage suggests that the impacts are not serious. The proportion is, after all, small. But if this degradation occurs over a very large agricultural area, the total degraded area is large and important. Conversely, what seems like a large proportional change based on the transformed data may prove to be unimportant in practice. For example, if the survival rate for a plant disease increases from 1 in 1000 to 2 in 1000, the increase is (2 − 1)/1 = 1.00 = 100%. However, that increase has little practical significance for farmers, and is as likely to result from random chance as from a successful treatment. An increase from 100 in 1000 to 200 in 1000 represents exactly the same proportional change (100%), but the survival of 100 additional individuals is more likely to be important.
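To make the survival example concrete, here is a minimal Python sketch. It assumes a hypothetical design that the example itself does not specify: two equal groups of 1000 plants (a control and a treated group), compared with a one-sided Fisher's exact test, which is only one of several tests you might choose. The function names are illustrative.

```python
from math import comb


def proportional_change(before, after):
    """Change relative to the starting value: (after - before) / before."""
    return (after - before) / before


def one_sided_exact_p(survivors_treated, survivors_control, group_size):
    """Probability of seeing at least this many survivors in the treated group
    if the treatment had no effect (one-sided Fisher's exact test).

    Assumes two groups of equal size; this is an illustrative design only.
    """
    total_survivors = survivors_treated + survivors_control
    total_plants = 2 * group_size
    all_splits = comb(total_plants, group_size)
    upper = min(total_survivors, group_size)
    p = 0.0
    for k in range(survivors_treated, upper + 1):
        # Hypergeometric probability that exactly k survivors fall in the treated group.
        p += (comb(total_survivors, k)
              * comb(total_plants - total_survivors, group_size - k)
              / all_splits)
    return p


change_small = proportional_change(1, 2)      # 1 in 1000 -> 2 in 1000
change_large = proportional_change(100, 200)  # 100 in 1000 -> 200 in 1000
p_small = one_sided_exact_p(2, 1, 1000)
p_large = one_sided_exact_p(200, 100, 1000)

print(f"proportional change: {change_small:.0%} and {change_large:.0%}")
print(f"p (2 vs 1 survivors):     {p_small:.3f}")
print(f"p (200 vs 100 survivors): {p_large:.2e}")
```

Under these assumptions, both comparisons show the same 100% proportional change, but the first is entirely compatible with chance (a probability of about 0.5), whereas the second is not.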
Like any transformation, standardizing your data loses some information and changes the nature of the data you’re looking at. Keep those changes in mind as you decide how to present your interpretation.
Evaluating non-significant results
Sometimes a specific experimental design fails to reveal a significant difference between treatments. Ask yourself why. For example, researchers who don’t review the literature to learn the expected magnitude of the variation before they design their experiment often choose a too-small sample size, leading to high variation in the results that obscures the differences. Alternatively, budget and time constraints may force you to use a too-small sample. In that case, you may need to present your study as exploratory, with the goal of increasing understanding of the study system so you can design a better experiment for your subsequent research.
Researchers prefer positive (i.e., statistically significant) results because journals have a strong bias against reporting negative results (i.e., a lack of statistical significance). However, negative results can be very important, as in the case of a medicine that produces no beneficial effect. If you designed your experiment well, have carefully controlled your selection of the study population, have obtained a large dataset, have validated your data by repeatedly calibrating your instruments against lab standards, and have replicated your results, you can be more confident that the negative result is real. For subjective data, such as the data generated by many sociology and psychology studies, asking a colleague to classify the results to see whether they agree with your classification increases confidence in the classification results. Where interpretations differ, you can discuss the difference and try to design a criterion that makes the classification more objective. With luck, that criterion will help you to agree about the correct classification.
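One way to put a number on how well you and a colleague agree is Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The article does not prescribe this statistic; it is offered here only as a common option, and the function name and labels in this Python sketch are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must classify the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on how often each rater used each category.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten interview responses.
yours = ["pos"] * 5 + ["neg"] * 5
colleague = ["pos"] * 4 + ["neg"] * 5 + ["pos"]

kappa = cohen_kappa(yours, colleague)
print(f"kappa = {kappa:.2f}")
```

Here the two raters agree on 8 of 10 items, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate. The items on which you disagree are the ones to discuss when you design a more objective criterion.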
Additional confidence can be provided using an experimental design based on triangulation. If two methods of measuring the same variable agree, the probability that a negative result is an error rather than a true lack of difference is much lower. For example, you could calculate the area of a leaf using a digital caliper and an empirically derived relationship between length, width, and area, or you could scan the leaf and use software to calculate its area. Similarly, if analyses of two different aspects of the same process lead to the same conclusion, that also reduces the likelihood that the lack of significance is an error. For example, if you measure the effect of activation of a gene using both the RNA produced by the gene and proteins produced by translation of that RNA, and both results show no change in the study system in response to that gene expression, you can be more confident that activation of the gene produced no significant effect.
In extreme cases, negative results may even reverse the conclusions you reached in previous research. This can be a very good thing if it improves understanding of your subject. As an example, see Sager (2020). Of course, if you want to replace the prevailing understanding of a phenomenon with a new understanding, you’ll need strong evidence, and lots of it, to convince everyone. Tell readers what additional research will be required to support your proposed new description.
Presenting datasets clearly and consistently
Help your readers follow your description of the data by choosing a criterion for judging a result’s importance. Statistical significance is one obvious criterion, but significant results may not be meaningful in practice, as in the example of proportional changes that I described earlier in this article. Choose an appropriate characteristic of the data you are describing. For example, when you discuss the vectors for the variables in a redundancy analysis or principal-coordinates ordination, you can limit your description to only the vectors that are longer than a certain threshold length and that lie at an angle of <30° from the axis. Other vectors may be significant, but their correlation with the axis will be weaker, and that means you can omit those vectors from your discussion. The criterion you choose tells you which results you should focus on, which is particularly important when you can’t discuss every result (e.g., in a large, multi-variable dataset).
Next, choose an efficient sequence to work through the data in a figure or table.
For example:
- In a linear regression analysis, describe the trend for each regression line separately. For example, y increased continuously with increasing x in treatment 1, but decreased continuously in the control. Next, examine the differences between each pair of lines. Treatment 1 may have values less than those in treatment 2 up to a certain point, then achieve higher values subsequently.
- In a table that presents multiple variables for each treatment, describe each variable, one at a time, to show how that variable differs among the treatments. Then repeat this process for the next variable and the next one until you reach the end of the variables.
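The selection criterion described at the start of this section (keep only ordination vectors that exceed a threshold length and lie within 30° of an axis) can be sketched in Python as follows. The variable names, coordinates, and threshold are hypothetical, and the sketch assumes a two-dimensional ordination in which the angle is measured to the nearer of the two axes, ignoring the vector's direction along that axis.

```python
import math


def select_vectors(vectors, min_length, max_angle_deg=30.0):
    """Return the names of vectors longer than min_length that lie within
    max_angle_deg of the nearer ordination axis.

    vectors maps a variable name to its (axis 1, axis 2) coordinates.
    """
    selected = []
    for name, (x, y) in vectors.items():
        length = math.hypot(x, y)
        if length <= min_length:
            continue  # too short to be worth describing
        # Angle between the vector and axis 1, ignoring direction along the axis.
        angle_to_axis1 = math.degrees(math.atan2(abs(y), abs(x)))
        angle_to_axis2 = 90.0 - angle_to_axis1
        if min(angle_to_axis1, angle_to_axis2) < max_angle_deg:
            selected.append(name)
    return selected


# Hypothetical vector coordinates from an ordination biplot.
biplot_vectors = {
    "pH": (0.9, 0.2),          # long, close to axis 1
    "moisture": (0.5, 0.5),    # long, but 45 degrees from both axes
    "nitrogen": (0.1, 0.05),   # close to axis 1, but short
    "elevation": (-0.2, 0.8),  # long, close to axis 2
}

to_discuss = select_vectors(biplot_vectors, min_length=0.4)
print(to_discuss)
```

State the threshold length and angle in your Methods section so that readers know why some significant vectors are not discussed.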
Constraining your presentation
Be cautious about extrapolating beyond the range of your data. Your data often describe only a small portion of the total range of possible values for a variable. If that total range is much larger, extending your interpretation beyond the range of your data is risky. For example, the beneficial response to a drug often increases with increasing dosage, right up to the point that the drug reaches toxic levels in the patient. Even if your intuitive knowledge of the situation suggests the behavior does not change for smaller or larger values, explain why you believe your assumption is valid, and suggest any cautions that are required if someone tries to extrapolate beyond your data.
Acknowledgments
I’m grateful for the reality check on my statistical descriptions provided by Dr. Julian Norghauer (https://www.statsediting.com/about.html). Any errors in this article are my sole responsibility.
Reference
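Sager, W.W. 2020. Massif redo. Scientific American May 2020:48-53.
Free newsletter registration
We will continue to deliver useful content for people who write papers in English about once every two weeks, so please register using the form above to make sure you don't miss anything.
It is free, of course.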
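Back issues
No. 1: Using if, in case, and when correctly: expressing degrees of certainty accurately in English
No. 2: English expressions for "装置" (device or apparatus)
No. 3: Understanding the nuances of auxiliary verbs correctly: expressing "was able to" and "was not able to"
No. 4: Expressing "using": the difference between by and with
No. 5: The pronoun it and the demonstrative pronouns this and that in technical English: their differences and usage
No. 6: Using verbs of cause and effect correctly, part 1: cause → effect
No. 7: Using verbs of cause and effect, part 2: effect → cause
No. 8: Beware of overusing and misusing the passive voice
No. 9: Avoiding top-heavy English sentences
No. 10: How to express modifiers placed before a noun
No. 11: Effective expression using the passive voice
No. 12: Using the conjunction that to express apposition
No. 13: English expressions for "技術" (technology or technique)
No. 14: English expressions for "特別に" (specially)
No. 15: Using the possessive apostrophe + s (’s)
No. 16: Expressions for "that is" and "in other words"
No. 17: Expressing dimensions and weight
No. 18: Using the preposition of: Part 1
No. 19: Using the preposition of: Part 2
No. 20: English expressions for objects and substances
No. 21: Preferring one-word verbs to phrasal verbs
No. 22: Infinitives and gerunds: Part 1
No. 23: Choosing between infinitives and gerunds: Part 2
No. 24: Expressing reasons
No. 25: Generic expressions (including the use of a and the)
No. 26: English expressions for "研究開発" (research and development)
No. 27: "Are values from 0 to 1 singular or plural?"
No. 28: "Tense: using present-tense verbs"
No. 29: Using conjunctive adverbs such as then, however, therefore, and for example
No. 30: The commonly misused using and based on: participial constructions
No. 31: Expressing ratios and proportions (ratio, rate, proportion, percent, percentage)
No. 32: How to Write English Papers: compilation
No. 33: Quality Review Issue No. 23: The tense of report and show
No. 34: Quality Review Issue No. 24: How should Japanese-language papers be listed in the references?
No. 35: Quality Review Issue No. 25: Common mistakes when spelling out abbreviations
No. 36: Quality Review Issue No. 26: Whether to insert a space before % and ℃
No. 37: Quality Review Issue No. 27: Should an article be added when nouns of the same kind appear in sequence?
No. 38: Quality Review Issue No. 22: Adverb usage that Japanese writers are especially likely to get wrong
No. 39: Quality Review Issue No. 21: Differences among previous, preceding, earlier, and similar expressions
No. 40: Quality Review Issue No. 20: The difference between using XX and by XX
No. 41: Quality Review Issue No. 19: Choosing among verbs such as increase, rise, and surge
No. 42: Quality Review Issue No. 18: Using the passive voice in papers
No. 43: Quality Review Issue No. 17: What is the difference between Compared with and Compared to?
No. 44: Reported about, Approach to: is the preposition necessary?
No. 45: Choosing among think, propose, suggest, consider, and believe
No. 46: Quality Review Issue No. 14: Problematic prepositions scientific writing: by, through, and with (the three prepositions)
No. 47: Quality Review Issue No. 13: Modifying a noun from before versus from after
No. 48: Quality Review Issue No. 13: Singular they
No. 49: Quality Review Issue No. 12: Subtle differences in nuance among study, investigation, and research
No. 50: Since and Because: is there a difference in usage?
No. 51: When to use Figure 1 versus Fig.1
No. 52: Present tense or past tense when the text includes equations?
No. 53: Quality Review Issue No. 8: The difference between By 2020 and up to 2020
No. 54: Quality Review Issue No. 7: high-accuracy data or High accurate data? Using hyphens in compound adjectives
No. 55: Experimental design
No. 56: References
No. 57: Data analysis
No. 58: Expressing emphasis
No. 59: Writing papers for collaborative research
No. 60: Abbreviations in papers
No. 61: Choosing the right article
No. 62: Capitalization
No. 63: Choosing among dashes
No. 64: The difficulty of choosing English words
No. 65: Past tense and active voice
No. 66: "The curse of knowledge"
No. 67: "Citing the literature, part 1"
No. 68: "Citing the literature, part 2"
No. 69: "Preparing figures and tables for journals"
No. 70: "Drawing conclusions: the difference between the Abstract and the Conclusions"
No. 71: "Research ethics, part 1: research design and data reporting"
No. 72: "Research ethics, part 2: don't waste readers' time"
No. 73: "Typing symbols and special characters"
No. 74: "Be careful with linear regression"
No. 75: "Avoiding plagiarism"
No. 76: Considering the impact of your research results
No. 77: "Analyzing your data (part 1): exploring your data"
No. 78: "Analyzing your data (part 2): statistical analysis"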