[How to Write English Papers] No. 57: Analyzing Your Data

August 23, 2018, 10:00

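First, a recap of the previous installment.

No. 56 covered the following topic:
References
References are a vital part of writing a paper.
They provide information to the readers who pick up your paper: they identify the earlier research you drew on and establish the credibility of your own paper's content.
That installment covered the main reasons for citing the literature.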

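Now for this installment.
The topic of No. 57 is data analysis.
As a continuation of "No. 55: Experimental design", it covers points to watch out for when you analyze your data.
Once you have designed your experiment and collected your data, how should you analyze those data?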

-Points to note when analyzing data-
・Normality and data transformations
・What happens if you incorrectly assume linearity
・Think carefully before you analyze!

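Finally, the article adds some further thoughts on "triangulation", which was introduced in No. 55.

Each point is explained using examples such as selection pressure, water, and coins.
If you choose the wrong analysis method, you may fail to obtain the results you should have obtained. We hope this article helps you analyze your data appropriately and get the most out of your research.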

Analyzing your data
By Geoff Hart

 Experimental design can be the hardest part of any research project. That’s unfortunate, because if you design your study poorly, you’ll collect data that are at best difficult to analyze or at worst meaningless. But once you have collected your data, you face an additional challenge: how to analyze them. In this article, I’ll build on my previous article on designing effective research by describing a few common problems you should avoid when you analyze your research data.

Normality and data transformations

 Researchers often believe that data must be normally distributed before they can analyze the data. If the sample size is relatively small, this is indeed a requirement for some familiar statistical tests, such as significance tests based on Pearson’s correlation coefficient or analysis of variance (ANOVA). This is why you’ll often see researchers describe how they transformed their data (e.g., using the logarithm of the measured values) before testing for significance. But in some cases, such as general linear models, it’s more important that the error term (the residuals) be distributed normally. For example, see the paper by Marc Kéry and Jeff Hatfield (2003, Bulletin of the Ecological Society of America 84(2):92–94).
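 The difference between normality of the raw data and normality of the residuals can be illustrated with a minimal sketch. The two groups, their means, the sample sizes, and the use of excess kurtosis as a rough indicator of shape are all assumptions chosen for illustration; they are not taken from the paper cited above, and a formal normality test would normally replace the kurtosis check.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for normal data,
    strongly negative for two well-separated peaks."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


rng = random.Random(57)  # fixed seed so the sketch is repeatable

# Hypothetical data: two treatment groups with very different means,
# each with normally distributed error (standard deviation 1).
control = [rng.gauss(0.0, 1.0) for _ in range(500)]
treated = [rng.gauss(10.0, 1.0) for _ in range(500)]

# The pooled raw response has two peaks, so it is clearly non-normal.
pooled = control + treated

# The residuals (each value minus its own group mean) are what the
# model's normality assumption actually refers to.
control_mean = statistics.fmean(control)
treated_mean = statistics.fmean(treated)
residuals = ([v - control_mean for v in control]
             + [v - treated_mean for v in treated])

raw_kurtosis = excess_kurtosis(pooled)
residual_kurtosis = excess_kurtosis(residuals)

print(f"excess kurtosis of raw data:  {raw_kurtosis:.2f}")
print(f"excess kurtosis of residuals: {residual_kurtosis:.2f}")
```

 In this sketch the raw response looks badly non-normal only because the group means differ; once the model accounts for the groups, the residuals are close to normal and no transformation is needed.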
 In addition to being unnecessary in many cases, transformation of the raw data can create problems. The most obvious—and therefore the easiest to forget—is that if the original data were not normally distributed, the transformed data conceal that lack of normality. When a statistical distribution is strongly skewed (non-normal), that may reveal the action of an important physical or biological phenomenon. An example might be the variable selection pressure caused by fishing, which preferentially eliminates the largest individuals in a population.
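 As a rough illustration of how a transformation can hide skewness, the sketch below generates hypothetical right-skewed measurements (a log-normal sample, which is an assumption made purely for illustration) and compares the sample skewness before and after a logarithmic transformation.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: near 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


rng = random.Random(57)  # fixed seed so the sketch is repeatable

# Hypothetical strongly right-skewed measurements (log-normal sample).
raw = [rng.lognormvariate(0.0, 0.8) for _ in range(2000)]

# The logarithmic transformation that many papers apply before testing.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"skewness of raw data:         {raw_skew:.2f}")
print(f"skewness of transformed data: {log_skew:.2f}")
```

 If only the transformed values are examined or reported, the strong skew of the original measurements disappears from view, so it is worth recording the shape of the raw distribution before transforming it.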
 Transforming data to produce a normal distribution is not appropriate if the real-world phenomenon cannot reasonably be expected to have a normal distribution. For example, biological sex more closely approximates a bimodal distribution (one with two dominant peaks: male and female). Trying to transform data on the distribution of sexes to create a normal (unimodal) distribution would be illogical and misleading. Moreover, closer examination often reveals intermediate values (in this case, intersex individuals who have properties between those of the two dominant sexes), and in psychological research, the transformation would conceal important phenomena such as the difference between gender (one’s perceived sex) and biological (chromosomal) sex.
 The most important point about testing for normality is not related to normality itself, but rather to the fact that most statistical tests depend on specific assumptions and have specific requirements. If your data don’t meet those requirements, you must either choose a different test or transform the data in a way that doesn’t distort their meaning. Many researchers believe that their statistical software will warn them if a given test is inappropriate for their data. This belief is often wrong. Before you use any statistical test, learn whether your software will confirm that the test is valid, or whether you must manually confirm that the test is appropriate.

(Incorrectly) assuming linearity

 Another common problem relates to regression analysis. Many researchers choose simple linear regression for scatterplots that show some degree of curvature (i.e., evidence of a nonlinear relationship). Linear regression has an additional problem: it assumes that the values of a variable can increase or decrease infinitely, with no maximum or minimum value and no asymptote. For some physical processes, this may be a reasonable assumption; for most physical processes and all biological processes, this assumption is illogical and incorrect. As a result, even a strong and highly significant linear regression can be misleading.
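 A small sketch can show how a linear fit to curved data looks convincing yet misleads. The saturating curve, its asymptote of 100, and the sampled range of x are hypothetical values chosen only for illustration; the least-squares formulas are the standard ones for simple linear regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


ASYMPTOTE = 100.0  # hypothetical maximum that the response can never exceed


def saturating(x):
    """Hypothetical nonlinear process that levels off at ASYMPTOTE."""
    return ASYMPTOTE * x / (x + 5.0)


xs = [float(x) for x in range(11)]  # observed range: x = 0 to 10
ys = [saturating(x) for x in xs]

slope, intercept, r_squared = fit_line(xs, ys)

# Residuals that are negative at both ends and positive in the middle
# reveal curvature that the straight line cannot capture.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Extrapolating the line far beyond the data ignores the asymptote.
extrapolated = intercept + slope * 100.0
actual = saturating(100.0)

print(f"R^2 of the linear fit: {r_squared:.2f}")
print(f"residuals at x = 0, 5, 10: "
      f"{residuals[0]:.1f}, {residuals[5]:.1f}, {residuals[10]:.1f}")
print(f"linear prediction at x = 100: {extrapolated:.1f} (actual: {actual:.1f})")
```

 The fit explains most of the variance, but the systematic pattern in the residuals and the impossible extrapolated value both show that the linear model does not describe the underlying process.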
 A further problem is that even when a linear relationship is valid for a specific range of conditions, the relationship may change dramatically outside that range. Among physical processes, consider water: if you apply heat to liquid water, the temperature will increase at a rate defined by the water’s specific heat capacity (about 4.2 kJ/(kg·K)). However, once the water freezes or boils, its heat capacity changes drastically, and predicting temperature changes using the linear relationship derived for liquid water will produce dramatically incorrect results. Among biological processes, consider an organism’s population growth rate. Most organisms exhibit linear and exponential (nonlinear) growth during different growth stages or at different population densities. You can’t predict nonlinear population growth using linear regression, and vice versa.
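 The water example can be sketched as a piecewise relationship. The specific heat capacity of 4.2 comes from the text; the boiling point of 100 °C and the latent heat of vaporization of about 2257 kJ/kg are assumed textbook values at standard atmospheric pressure, and the function names are hypothetical.

```python
SPECIFIC_HEAT_LIQUID = 4.2          # kJ/(kg·K), value given in the text
BOILING_POINT = 100.0               # °C, assumed: standard atmospheric pressure
LATENT_HEAT_VAPORIZATION = 2257.0   # kJ/kg, assumed textbook value at 1 atm


def linear_prediction(start_temp, heat_kj, mass_kg=1.0):
    """Temperature predicted by blindly extending the liquid-water line."""
    return start_temp + heat_kj / (mass_kg * SPECIFIC_HEAT_LIQUID)


def piecewise_temperature(start_temp, heat_kj, mass_kg=1.0):
    """Temperature of water that starts as a liquid below its boiling point.

    Only covers heating up to the point where all of the water has vaporized.
    """
    heat_to_boil = mass_kg * SPECIFIC_HEAT_LIQUID * (BOILING_POINT - start_temp)
    if heat_kj <= heat_to_boil:
        # Liquid range: the linear relationship is valid here.
        return linear_prediction(start_temp, heat_kj, mass_kg)
    if heat_kj <= heat_to_boil + mass_kg * LATENT_HEAT_VAPORIZATION:
        # Boiling: added heat changes the phase, not the temperature.
        return BOILING_POINT
    raise ValueError("all water has vaporized; steam needs its own relationship")


# Add 1000 kJ to 1 kg of water that starts at 20 °C.
naive = linear_prediction(20.0, 1000.0)
realistic = piecewise_temperature(20.0, 1000.0)

print(f"linear extrapolation: {naive:.1f} °C")
print(f"piecewise model:      {realistic:.1f} °C")
```

 Within the liquid range the two functions agree, but beyond it the linear extrapolation predicts a temperature far above the boiling point while the water is in fact still boiling.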
 These examples demonstrate the importance of thinking about your data’s physical meaning before you determine how to analyze the data. In many cases, it will be necessary to use nonlinear regression (e.g., a sigmoidal, S-shaped curve with asymptotes) or piecewise regression (with different linear or nonlinear equations for different ranges of data). Always critically consider the meaning of your data instead of blindly relying on the statistical results.
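 To make the idea of a sigmoidal curve with an asymptote concrete, the sketch below compares a logistic growth curve with a purely exponential one. The logistic equation is a common textbook model of population growth, used here as an assumption; the carrying capacity, initial population, and growth rate are hypothetical values.

```python
import math

CARRYING_CAPACITY = 1000.0   # hypothetical upper asymptote (K)
INITIAL_POPULATION = 10.0    # hypothetical starting population (N0)
GROWTH_RATE = 0.5            # hypothetical intrinsic growth rate (r)


def exponential(t):
    """Unbounded exponential growth: N0 * exp(r * t)."""
    return INITIAL_POPULATION * math.exp(GROWTH_RATE * t)


def logistic(t):
    """Sigmoidal (logistic) growth that levels off at the carrying capacity."""
    ratio = (CARRYING_CAPACITY - INITIAL_POPULATION) / INITIAL_POPULATION
    return CARRYING_CAPACITY / (1.0 + ratio * math.exp(-GROWTH_RATE * t))


# Early in the growth phase the two models are nearly indistinguishable,
# but later the exponential model overshoots the asymptote enormously.
for t in (2, 10, 20):
    print(f"t = {t:2d}: exponential = {exponential(t):10.1f}, "
          f"logistic = {logistic(t):7.1f}")
```

 A model fitted only to the early data would therefore look valid there and still fail badly once the population approaches its asymptote, which is why the range of the data matters as much as the quality of the fit.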

Think carefully before you analyze!

 Statistical analysis and data presentation are far more complex than I can discuss in a short series of articles. Unfortunately, many researchers don’t learn these skills thoroughly as undergraduates, and have no time in graduate school to improve their understanding. As a result, they rely on emulating sometimes-flawed examples in the research literature. If you’re uncertain about your design or your analysis, ask an expert (a statistician) for help. Statisticians can also help you analyze messy data that you might be unable to analyze on your own.

Additional thoughts on triangulation

 In my previous article on experimental design, I wrote about "triangulation" as a way to validate hypotheses using datasets for two or more parameters. What you’re doing is describing a single phenomenon from two or more perspectives by formulating hypotheses that describe different aspects of the phenomenon. For example, consider coins. We know from experience that one side of the coin presents its denomination (i.e., amount and currency type) and the other side has a symbol of the nation that created the coin. If we examine one side of a round metal object and see a denomination, we can hypothesize that it is a coin and that the other side will have a national symbol—but we can only be confident of this if we actually examine the other side.

Acknowledgment

 I thank Dr. Julian Norghauer (https://www.statsediting.com/) for suggestions about additional design and analysis topics, and for providing a reality check on what I’ve written.

***

 Geoffrey Hart is a Canadian science editor with more than 30 years of experience. His goal in writing these articles is to help you write more efficiently and communicate the importance of your research more successfully. If there’s a topic you want him to cover or a question you want him to answer, please contact World Translation Services to make this request.



