Section author: Danielle J. Navarro and David R. Foxcroft
Histograms
Let’s begin with the humble histogram. Histograms are one of the simplest
and most useful ways of visualising data. They make most sense when you have an
interval or ratio scale variable (e.g., the afl.margins variable from the
aflsmall_finalists data set that we used in Descriptive statistics) and what
you want to do is get an overall
impression of the variable. Most of you probably know how histograms work,
since they’re so widely used, but for the sake of completeness I’ll describe
them. All you do is divide up the possible values into bins and then count
the number of observations that fall within each bin. This count is referred
to as the frequency of the bin (rescaled so that the bars have a total area
of 1, it is called the density) and is displayed as a vertical bar. The
afl.margins variable contains 33 games in which the winning margin was
less than 10 points, and it is this fact that is represented by the height of
the leftmost bar in the histogram that we showed earlier in Descriptive
statistics (Fig. 20). With these earlier graphs we used an advanced
plotting package in R which, for now, is beyond the capability of jamovi. But
jamovi gets us close, and drawing this histogram in jamovi is pretty
straightforward. Open up the Plots options under Exploration → Descriptives
and click the Histogram check box, as shown in Fig. 21. jamovi defaults to
labelling the y-axis as density and the x-axis with the variable name. The
bins are selected automatically, and, unlike the previous Fig. 20, there is no
scale or count information on the y-axis. But this does not matter too
much because, after all, what we are really interested in is our impression
of the shape of the distribution: is it roughly normal, or is it skewed, or
unusually peaked or heavy-tailed (kurtosis)? Our first impressions of these
characteristics come from drawing a histogram.
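
The binning-and-counting step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how jamovi draws its histograms: the `margins` values are made up (they are not the afl.margins data), and the function name `histogram_counts` and the bin width of 10 are assumptions chosen for the example.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Map each bin's left edge to the number of values in [edge, edge + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up winning margins for illustration only (NOT the afl.margins data).
margins = [3, 8, 12, 15, 19, 27, 31, 33, 45, 56, 9, 22]
bin_width = 10

counts = histogram_counts(margins, bin_width)

# Density rescales each count so that the bar areas sum to 1.
densities = {edge: c / (len(margins) * bin_width) for edge, c in counts.items()}

# Print a crude text histogram: one '#' per observation in the bin.
for edge, c in counts.items():
    print(f"{edge:>3} to {edge + bin_width:>3}: {'#' * c} count={c} density={densities[edge]:.4f}")
```

The frequency and density versions of the histogram have exactly the same shape; only the labelling of the y-axis differs, which is why the missing count scale in the jamovi plot does not change our impression of the distribution.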
One additional feature that jamovi provides is the ability to plot a density curve. You can do this by clicking the Density check box under the Plots options (and unchecking Histogram), which gives the plot shown in Fig. 22. A density plot visualises the distribution of the data over a continuous interval or time period. It is a variation of the histogram that uses kernel smoothing to plot the values, which yields a smoother distribution by smoothing out the noise. The peaks of a density plot help show where the values are concentrated over the interval. An advantage density plots have over histograms is that they are better at revealing the shape of the distribution, because they are not affected by the number of bins used (the bars of a typical histogram). A histogram made up of only 4 bins would not give as distinguishable a distribution shape as a histogram with 20 bins. With density plots, however, this is not an issue.
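
To make the idea of kernel smoothing concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Python. It is an illustration under stated assumptions rather than a description of jamovi's implementation: the data are made up, the helper name `gaussian_kde` is hypothetical, and the bandwidth of 8 is picked by hand.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x by averaging
    Gaussian bumps of width `bandwidth` centred on each observation."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density

# Made-up winning margins for illustration only (NOT the afl.margins data).
margins = [3, 8, 12, 15, 19, 27, 31, 33, 45, 56, 9, 22]
density = gaussian_kde(margins, bandwidth=8.0)  # bandwidth is an assumed value

# Evaluate the curve on a grid of step 1 and check that its area is close to 1.
grid = range(-40, 121)
area = sum(density(x) for x in grid)

for x in range(0, 61, 10):
    print(f"x={x:>2}  density={density(x):.4f}")
print(f"approximate area under the curve: {area:.4f}")
```

Although a density curve has no bins, the bandwidth plays a comparable role in this sketch: a very small value gives a spiky curve and a very large value flattens it, so the choice still influences the shape you see.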
Although this image would need a lot of cleaning up to make a good presentation graphic (i.e., one you would include in a report), it nevertheless does a pretty good job of describing the data. In fact, the big strength of a histogram or density plot is that (properly used) it shows the entire spread of the data, so you can get a pretty good sense of what the data look like. The downside of histograms is that they are not very compact. Unlike some of the other plots I'll talk about, it is hard to cram 20-30 histograms into a single image without overwhelming the viewer. And of course, if your data are nominal scale, then histograms are useless.