<- c("readxl", "dplyr", "janitor", "tidyr", "ggplot2", "lme4", "marginaleffects", "lmerTest")
pkgs install.packages(pkgs)
Fish heart activity
In this part of the fish heart activity we will use R to
- summarise your fish heart measurements
- visualise the measurements, and
- do some statistical testing.
We begin by installing some packages

pkgs <- c("readxl", "dplyr", "janitor", "tidyr", "ggplot2", "lme4", "marginaleffects", "lmerTest")
install.packages(pkgs)

When finished, we load those packages
library("readxl") # to import data from Excel workbooks
library("dplyr") # for data wrangling
library("janitor") # for data cleaning
library("tidyr") # for moar data wrangling
library("ggplot2") # for data visualisation
library("marginaleffects") # for a evaluating model estimates
library("lme4") # for a proper model
library("lmerTest") # for _reasons_
(Note we do not show the messages that get printed when the packages are loaded.)
Log in to Posit cloud
Go to <posit.cloud> and log in
Then click on New Project in the top right of the screen
Give a name to your project, say Dyr og væv fish heart, while the R session is being deployed on a virtual computer.
Start a new script
Click the small + icon in the top left and select R Script, or use File > New File > R Script.

Save this script: click the small “disk” icon or use File > Save. Give the file the name fish-heart-analysis.R (note the .R extension, which is case sensitive, and contains no spaces).
Download the data to RStudio
We will start by downloading the data
download.file("https://bit.ly/fish-weights", "fish-weights.xlsx")
This will download the Excel workbook we created from your data during the break
It will create a file in your project called fish-weights.xlsx
Read the data into R
Next we need to read the data into R itself; currently the data are just sitting in an Excel workbook on a computer in the cloud.
fish_meta <- read_excel("fish-weights.xlsx", "fish-meta") |>
  mutate(fish_number = as.character(fish_number))
fish_weight <- read_excel("fish-weights.xlsx", "fish-weight")
We can view the data by typing the object name at the console. (The <- used above is the assignment operator, AKA the “arrow”.)

fish_meta
# A tibble: 14 × 4
fish_number body_mass_g total_length_cm fork_length_cm
<chr> <dbl> <dbl> <dbl>
1 14 455. 32 29
2 4 370. 29.6 27
3 1 505. 33.5 31
4 5 537. 35.8 32
5 6 513. 33 30.5
6 3 510 32 30
7 9 725. 37.7 34.7
8 13 591. 34.5 31.5
9 10 460 33.5 30.5
10 8 615. 34.2 37.7
11 2 474. 34 30.5
12 11 773. 38 33.8
13 12 621. 35 32.2
14 7 770. 39 35.5
fish_weight
# A tibble: 65 × 2
fish_number observation
<dbl> <dbl>
1 14 0.334
2 14 0.339
3 14 0.343
4 14 0.329
5 14 0.330
6 14 0.330
7 4 0.309
8 4 0.306
9 4 0.306
10 1 0.475
# ℹ 55 more rows
This is a data frame, basically R’s version of an Excel sheet
- the columns are the variables,
- the rows are the observations,
- each variable is of the same length (number of elements)
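If you want a quick overview of a data frame, a few handy functions are shown below (glimpse() comes from dplyr; the other two are base R):

nrow(fish_meta)    # number of rows (observations)
ncol(fish_meta)    # number of columns (variables)
glimpse(fish_meta) # compact summary of each variable and its type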
Merging data
We join the heart weight measurements to the fish metadata using the shared fish_number column, renaming the observation column to weight as we go

fish <- fish_weight |>
  mutate(fish_number = as.character(fish_number)) |>
  left_join(fish_meta, by = join_by(fish_number)) |>
  rename(weight = observation)
Summarise your own technical replicates
Now we can do a simple data summary: filter the fish heart data to leave only your own observations, then compute the mean of your replicates
<- "1" ## <- put your fish number in
my_fish |>
fish filter(fish_number == my_fish) |>
summarise(avg_weight = mean(weight))
# A tibble: 1 × 1
avg_weight
<dbl>
1 0.457
The |> symbol is known as the pipe; when you see it in code, read it as meaning “and then…”. The filter() step in the pipeline filters the full data set to include only the selected pair’s data. Then we use summarise() to create a summary of that pair’s data, computing the average weight using the mean() function.
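As a minimal illustration, these two lines do the same thing; the pipe simply passes its left-hand side in as the first argument of the function on its right:

mean(fish$weight)
fish$weight |> mean()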
In words, then, we can describe what we did as:
- assign my pair’s fish number to the object my_fish, then
- take the fish data, and then
- filter it to keep only my pair’s data, and then
- summarise the remaining data by computing the average weight of my technical replicates.
We can compute an estimate of the uncertainty in this average weight (as an estimate of the weight of the average fish heart) using the standard error:
\widehat{\sigma}_{\overline{\text{weight}}} = \frac{\widehat{\sigma}}{\sqrt{n}}
(Note the typo in the video — the denominator should be \sqrt{n}; one of my cats was crying at the office door to get in!)
We can modify the pipeline we just used to also compute the standard error of the average weight of your fish heart. Copy the code you wrote above, paste a new version of it, and then edit the summarise() line so that the code looks like
<- "1" ## <- put your pair number in
my_fish |>
fish filter(fish_number == my_fish) |>
summarise(avg_weight = mean(weight),
std_err = sd(weight) / sqrt(n()))
# A tibble: 1 × 2
avg_weight std_err
<dbl> <dbl>
1 0.457 0.00892
Summarise each pair’s technical replicates
We can use almost the same code to compute the average for each pair’s data and the associated standard errors
fish |>
  group_by(fish_number) |>
  summarise(avg_weight = mean(weight),
            std_err = sd(weight) / sqrt(n()))
# A tibble: 14 × 3
fish_number avg_weight std_err
<chr> <dbl> <dbl>
1 1 0.457 0.00892
2 10 0.367 0.00423
3 11 0.561 0.00347
4 12 0.512 0.00307
5 13 0.536 0.00599
6 14 0.334 0.00234
7 2 0.297 0.00480
8 3 0.374 0.00437
9 4 0.307 0.00105
10 5 0.460 0.00742
11 6 0.564 0.00308
12 7 0.634 0.00694
13 8 0.455 0.00220
14 9 0.659 0.00364
Note that this time we do not need the filter() step in the pipeline; instead we replace it with a group_by() step. The summarise() step remains the same.
Visualise the data
Next we can plot the data. For this we will use ggplot() from the ggplot2 package.
fish |>
  ggplot(aes(x = fish_number, y = weight)) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.5)
We add a little random noise in the x-axis direction to show the individual data points better, and we set some transparency via the alpha argument (alpha is the name given to transparency when we refer to colours), which helps where points overlap.
We can add the means and standard errors, that we computed earlier, to the plot. Go back and copy the code block where we computed the means and standard errors for each pair’s data, and paste a new copy below the code for the plot. Then modify the first line so we assign the output to a new object:
avg_fish_wt <- fish |> # <--- change this line
  group_by(fish_number) |>
  summarise(avg_weight = mean(weight),
            std_err = sd(weight) / sqrt(n()))
One way to visualise this data is to use a confidence interval, the definition of which is a little technical. For a 95% confidence interval
if we were to repeat the exercise 100 times, collecting new data each time, on average 95% of the intervals we create will contain the true value
A simple rule of thumb that we can use to create a 95% interval is to compute an upper and lower limit such that
- the upper limit is the mean plus 2 times the standard error
- the lower limit is the mean minus 2 times the standard error
We do this operation in the next line
avg_fish_wt <- avg_fish_wt |>
  mutate(lwr_ci = avg_weight - (2 * std_err),
         upr_ci = avg_weight + (2 * std_err))
Now we can replot the data using the same code as before (copy and paste a new version of it), but we add an additional layer
fish |>
  ggplot(aes(x = fish_number, y = weight)) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.5) +
  geom_pointrange(data = avg_fish_wt,
                  aes(y = avg_weight, ymin = lwr_ci, ymax = upr_ci),
                  colour = "red",
                  fatten = 2)
Body condition
We can get an idea of the body condition of the fish by plotting body_mass_g against fork_length_cm
<- labs(x = "Fork length (cm)", y = "Body mass (g)")
fish_bc_labs |>
fish_meta ggplot(aes(x = fork_length_cm, y = body_mass_g)) +
geom_point() +
fish_bc_labs
Describe in a few words the relationship between fork length and body mass.
Are there any fish that do not fit the general pattern?
How might you describe the condition of this fish?
We can describe this relationship using a linear model. A linear model is the name we give to a model that is linear in its parameters, not one that (necessarily) describes a straight line, i.e.

\hat{y}_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots
Here, the parameters are the \beta_j and they are only involved in additions or multiplications, they are not found in powers (exponents). \beta_0 is the model intercept, while the other \beta_j are estimates of by how much the response, y, changes if we increase the covariate (here x_1 or x_2 depending on which \beta_j we are looking at) by one (1) unit, holding all other covariates at some fixed value(s).
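To illustrate, here is a small simulated sketch (the data are made up purely for illustration): a model with a squared term describes a curve, yet it is still a linear model because the \beta_j enter additively.

set.seed(42)
sim <- tibble(x = runif(50, 0, 10)) |>
  mutate(y = 2 + 0.5 * x - 0.3 * x^2 + rnorm(50))
quad_m <- lm(y ~ x + I(x^2), data = sim) # a curved fit, but linear in the betas
coef(quad_m) # three betas: intercept, x, and x^2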
For the body condition relationship, our model will be
\widehat{\mathtt{body\_mass\_g}}_i = \beta_0 + \beta_1 \times \mathtt{fork\_length\_cm}_i
The hat over body_mass_g is used to indicate the fitted or predicted values of body mass, while the subscript i references the individual fish.
A fuller description of the model would be
\begin{align*} \mathtt{body\_mass\_g}_i & \sim \text{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 \times \mathtt{fork\_length\_cm}_i \end{align*}
where we are stating that
each fish’s body mass is an observation from a normal distribution with its own mean (\mu_i), determined by the fork length of the fish, and a shared standard deviation \sigma.
What we mean is, for example: for a fish with a fork_length_cm of 30cm, we would expect its body mass to be a value from a normal distribution with a mean of 498.78g and a standard deviation of 73.16g. The observed body_mass_g for this fish is 510g.
Where did these numbers come from? They come from the linear model described above that I fitted to the data. Let’s all fit this model now using R
fish_bc <- lm(body_mass_g ~ fork_length_cm, data = fish_meta)
We use the lm() function to fit a linear model where we assume the response is conditionally distributed normal (Gaussian). The first argument to lm() is the formula for the model we want to fit. The variable to the left of the ~ is the response, the variable we are modelling, while we put the covariates (independent variables) that we use to model the response on the right of the ~, separated by +. As we only have one covariate in our model, there is only one variable named on the right of the ~. We have to tell R where to find the variables, which we do with the data argument.
The estimated values for \beta_0 and \beta_1 are called the model coefficients. We can access them via the coef() function
coef(fish_bc)
(Intercept) fork_length_cm
-587.31093 36.20308
We can ignore the intercept (\beta_0) here; it’s not useful as it gives the expected body_mass_g of a fish with a fork_length_cm equal to 0, which doesn’t make any biological sense! (This should flag a problem with this entire model that we’ll come to at the end of this document.) The value for \beta_1, labelled fork_length_cm, is by how much the estimated body_mass_g of a fish would change if its fork_length_cm was increased by 1cm.
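We can use these coefficients to reproduce the prediction quoted earlier for a fish with a fork_length_cm of 30:

b <- coef(fish_bc)
unname(b["(Intercept)"] + b["fork_length_cm"] * 30) # ~498.78

This is the mean of the normal distribution in the earlier example; the standard deviation, 73.16g, is the residual standard error reported by summary() below.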
We can test whether this magnitude of change in body_mass_g for a unit increase in fork_length_cm is statistically interesting using summary()
summary(fish_bc)
Call:
lm(formula = body_mass_g ~ fork_length_cm, data = fish_meta)
Residuals:
Min 1Q Median 3Q Max
-162.245 -32.862 -5.881 41.732 136.247
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -587.31 232.69 -2.524 0.026710 *
fork_length_cm 36.20 7.28 4.973 0.000324 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 73.16 on 12 degrees of freedom
Multiple R-squared: 0.6733, Adjusted R-squared: 0.6461
F-statistic: 24.73 on 1 and 12 DF, p-value: 0.0003236
In the Estimate column we see the same estimates of the coefficients (rounded slightly) that we saw using coef(fish_bc). The column labelled Std. Error is the standard error of the estimates; it is a measure of our uncertainty in the estimates, similar to the standard error of the mean that we computed previously.
What we mean by statistically interesting is that an estimate is unlikely to have been observed if the true value of the estimate was equal to 0. This is a hypothesis, and in and of itself it is not scientifically interesting; it is called the null hypothesis because it represents an expectation of no effect (hence “null”) of fork_length_cm on the body_mass_g of our fish.
To test this null hypothesis we need a test statistic. For math reasons, the typical test statistic in a Normal linear regression model is a t statistic. It is computed as
t_{\mathtt{fork\_length\_cm}} = \frac{\beta_{\mathtt{fork\_length\_cm}}}{\text{SE}_{\mathtt{fork\_length\_cm}}}\;.
If we plug in the values from the table we get
t_{\mathtt{fork\_length\_cm}} = \frac{36.2}{7.28} = 4.973 \;.
Given this value for the test statistic, we compute the probability with which we would observe a value at least as large as |t_{\mathtt{fork\_length\_cm}}| under the null hypothesis of no effect. We do this using the sampling distribution of the test statistic under the null hypothesis. In this case the sampling distribution is a t distribution with degrees of freedom equal to the residual degrees of freedom of the model. For our model, the residual degrees of freedom is 12; we have 14 observations and estimated two parameters (the intercept, \beta_0, and the effect of fork_length_cm, \beta_{\mathtt{fork\_length\_cm}}) from these data.
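To see where these numbers come from, we can compute the test statistic and p value by hand from the fitted model; a sketch using the coefficient table stored in the model summary:

coefs <- coef(summary(fish_bc))
t_stat <- coefs["fork_length_cm", "Estimate"] /
  coefs["fork_length_cm", "Std. Error"]
t_stat # ~4.97
2 * pt(abs(t_stat), df = df.residual(fish_bc), lower.tail = FALSE) # ~0.00032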
This probability is shown in the column labelled Pr(>|t|). We can interpret this value as indicating that if we re-ran your fish weight exercise 10,000 times, we would expect to see an estimated effect of fork_length_cm as large as 36.2 only about 3 times if there were no relationship between body_mass_g and fork_length_cm.
3 times out of 10,000 experiments is quite a rare event. Hence we would conclude that this result is statistically interesting. Often you’ll hear such a result described as statistically significant. In the terminology of hypothesis testing, we would conclude that the null hypothesis of no effect is unlikely to be true given the data we observed, and thus reject the null hypothesis.
Note that we compute the p value by assuming that the true effect of fork_length_cm is equal to 0. Hence the p value tells us nothing about the probability that the effect is equal to 0 (we said it was equal to 0 when we did the test!), and it tells us nothing about the probability that the alternative hypothesis is true (we assumed the null hypothesis was true when running the test!).
Why all the above is as it is will be explained during the course Cellen II next semester. We’ll also explain the general idea behind sampling distributions etc., and what all the other numbers in the output from summary() mean.
Notice that we have considered whether the result is statistically interesting. We took this approach because we wanted to explain the typical statistical output you will encounter and how it is computed.
A far more interesting question is whether the result is biologically interesting. You can only answer that using domain expertise.
To me, a change in body mass of ~36g for a 1cm increase in fork length seems like a biologically interesting increase in body mass. But, I’m not a vet 😉
As a final task in this section, let’s visualise the estimated relationship
plot_predictions(fish_bc, by = "fork_length_cm") +
geom_point(data = fish_meta, aes(x = fork_length_cm, y = body_mass_g)) +
fish_bc_labs
The plot shows:
- the observed data,
- the estimated regression line (black line), and
- the 95% confidence interval around the estimated regression line (shaded band)
The confidence interval represents the uncertainty in the estimated regression line:
if we repeated the exercise 100 times, collecting new data each time and fitting a model to each new data set and computing a confidence interval on the estimated regression line for each data set, 95 of those intervals would contain the true value (regression slope).
Heart mass vs body mass
Now we will plot the data for heart weight vs body mass
<- labs(x = "Body mass (g)", y = "Heart mass (g)")
fish_labs |>
fish ggplot(aes(y = weight, x = body_mass_g)) +
geom_point() +
fish_labs
We can investigate the relationship between these two variables using another linear model. Again, for now anyway, we’ll assume each fish heart mass is normally distributed with a mean that depends on body_mass_g. Hence the model we fit is

heart_m <- lm(weight ~ body_mass_g, data = fish)
View the model summary
summary(heart_m)
Call:
lm(formula = weight ~ body_mass_g, data = fish)
Residuals:
Min 1Q Median 3Q Max
-0.098958 -0.041448 0.005705 0.040112 0.155942
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.539e-03 3.705e-02 -0.257 0.798
body_mass_g 8.251e-04 6.462e-05 12.769 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05747 on 63 degrees of freedom
Multiple R-squared: 0.7213, Adjusted R-squared: 0.7169
F-statistic: 163 on 1 and 63 DF, p-value: < 2.2e-16
Find the estimated change in heart mass for an increase in body mass of 1g.
Is this estimate statistically interesting?
The estimated change in heart mass is ~0.00083g (~0.83mg) for a 1g increase in body mass.
This result is statistically interesting because under the null hypothesis of no effect, we would have observed the data we did (or the estimated effect we did) with probability much less than 0.0000001, i.e. much less than 1 in a million.
The actual p value is too small to reliably compute on a computer, which doesn’t do math properly but instead uses a system called floating point arithmetic. Hence R reports the p value as being <2.2e-16, which is 2.2 \times 10^{-16}, a very, very, very small number.
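In fact, 2.2 \times 10^{-16} is the machine epsilon, roughly the smallest relative difference that double-precision floating point numbers can represent, and R prints any p value smaller than this as <2.2e-16. You can see the value with:

.Machine$double.eps # ~2.220446e-16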
Draw the estimated regression line with the data superimposed.
plot_predictions(heart_m, by = "body_mass_g") +
  geom_point(data = fish, aes(x = body_mass_g, y = weight)) +
  fish_labs
Reality check
Everything you did today should come with a very big health warning on it. We have had to simplify things considerably because
- the observations in the fish heart mass model are not independent — you collected between 3 and 6 observations per fish heart,
- it is unlikely that the true distributions of the two response variables we looked at are normal.
Why does independence matter?
The models we used assumed each observation is a unique, independent observation. Instead, you took between 3 and 6 replicate measurements per fish. The values you observed for the mass of a single heart will be more similar to one another than if you’d weighed fish hearts from the same number of different fish.
The end result is that our model looks much better than it should because it assumes we have 65 independent observations when in fact we only observed the mass of a heart from each of 14 fish.
Hence the residual degrees of freedom in the heart_m model are much larger than they would be if we’d accounted for this.
One way to proceed would be to take the average of the replicated values for each fish and then fit the same model that we did. This would have the right residual degrees of freedom, but…
- it would be throwing away data, and
- as we have different numbers of replicates per fish, the averages have different variances, and our model assumes that the variances of the data are equal
But, regardless, let’s try that. First, compute the average fish heart mass per fish; note that we did this last week, and it is included above, but I repeat that code here, with one change:
avg_heart_mass <- fish |>
  group_by(fish_number) |>
  summarise(avg_mass_g = mean(weight),
            std_err = sd(weight) / sqrt(n())) |>
  left_join(fish_meta, by = join_by(fish_number))
Now we can fit our model
avg_mass_m <- lm(avg_mass_g ~ body_mass_g, data = avg_heart_mass)
and summarise it
summary(avg_mass_m)
Call:
lm(formula = avg_mass_g ~ body_mass_g, data = avg_heart_mass)
Residuals:
Min 1Q Median 3Q Max
-0.094003 -0.044656 0.000969 0.035199 0.141467
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0028845 0.0835083 0.035 0.973013
body_mass_g 0.0008176 0.0001445 5.659 0.000106 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.06406 on 12 degrees of freedom
Multiple R-squared: 0.7274, Adjusted R-squared: 0.7047
F-statistic: 32.03 on 1 and 12 DF, p-value: 0.0001057
The results of this model are comparable to the ones we observed before, but they are less precise because we have fewer observations.
Technically, we should fit a weighted regression, because each of the 14 averages has a different precision (variance) because it is the result of averaging a different number of replicates.
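For completeness, here is a sketch of what that weighted fit could look like. Note the n_reps column (the number of replicates per fish) is new; it is not in the avg_heart_mass object we made above:

avg_heart_mass_w <- fish |>
  group_by(fish_number) |>
  summarise(avg_mass_g = mean(weight),
            n_reps = n()) |> # number of replicates per fish
  left_join(fish_meta, by = join_by(fish_number))
## var(mean) = sigma^2 / n, so weights proportional to n are appropriate
wtd_mass_m <- lm(avg_mass_g ~ body_mass_g, data = avg_heart_mass_w,
                 weights = n_reps)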
A better way to proceed that doesn’t throw out data is to use a mixed effects model. With this model we can use all the fish heart mass measurements while accounting for the fact that the data are clustered (grouped) by fish_number. This model will also allow us to estimate the amount of variation within and between fishes in terms of their heart masses.
We can fit the equivalent mixed effects model using
mass_mixed <- lmer(weight ~ body_mass_g + (1 | fish_number),
                   data = fish)
and summarise it with
summary(mass_mixed)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: weight ~ body_mass_g + (1 | fish_number)
Data: fish
REML criterion at convergence: -314.6
Scaled residuals:
Min 1Q Median 3Q Max
-1.62784 -0.47680 -0.08279 0.64778 2.38491
Random effects:
Groups Name Variance Std.Dev.
fish_number (Intercept) 0.004070 0.06380
Residual 0.000114 0.01068
Number of obs: 65, groups: fish_number, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.816e-03 8.348e-02 1.200e+01 0.034 0.973650
body_mass_g 8.176e-04 1.444e-04 1.200e+01 5.661 0.000105 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
body_mass_g -0.979
The output labelled Fixed effects: largely resembles the output for avg_mass_m (it would be effectively identical if we had the same number of replicates per fish), and nothing here would change our conclusion that the effect of body_mass_g on weight (fish heart mass) is statistically interesting. But do note the much larger p value now compared to the one we got earlier when we assumed the data were independent.
The output labelled Random effects: contains information about how much variation there is within individual fish and between fishes. The row labelled fish_number is the estimate of the variation between fishes, while the row labelled Residual is the variation within a fish.
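One way to summarise these two variances is the intraclass correlation: the proportion of the total variance that is due to differences between fishes. A sketch:

vc <- as.data.frame(VarCorr(mass_mixed))
vc$vcov[vc$grp == "fish_number"] / sum(vc$vcov) # ~0.97

So roughly 97% of the variation in the measurements is between fishes, rather than between replicate weighings of the same heart.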
As an aside: last week we computed an estimate of the average mass of a fish heart. We might get a better estimate if we recomputed it using the mixed model:
mass_mixed_2 <- lmer(weight ~ 1 + (1 | fish_number),
                     data = fish)
fixef(mass_mixed_2)
(Intercept)
0.465407
which we can compare with the simple estimate we get by averaging all the data:
fish |>
  summarise(avg_mass = mean(weight))
# A tibble: 1 × 1
avg_mass
<dbl>
1 0.455
Why does the normal thing matter?
We assumed that each response value in our models is normally distributed with its mean given by the estimated value from the model. The normal distribution describes data that are
- continuous, and
- can take any value, positive or negative, from negative infinity (-∞) to positive infinity (+∞)
Our fish heart masses and body masses certainly meet the first criterion, but the second allows for fish that couldn’t possibly exist.
We cannot observe a fish heart mass or body mass that is less than — or even equal to — 0g (technically 🤓 there is some lower mass limit below which we cannot go because our balances are not sensitive enough). Hence at 0g mass, we have no fish and there is zero variance in the mass. If the variance in fish heart mass decreases to zero as the heart mass approaches 0, the variance must increase to some extent as the heart mass increases away from 0.
We would call this non-constant variance or heteroscedasticity. This won’t stop us fitting the normal linear model — we just did it! — but it will make the inference — deciding if something was statistically interesting or not — invalid. How badly invalid? We don’t know without fitting models as it depends on many things.
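One informal way to look for non-constant variance is to plot the residuals of heart_m against its fitted values and look for a funnel shape (increasing spread from left to right); a quick sketch:

tibble(fitted = fitted(heart_m), residual = resid(heart_m)) |>
  ggplot(aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted values", y = "Residuals")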
One solution to this problem is to fit a model where we don’t assume the data are conditionally distributed normal. An option would be to use a Gamma or a log-normal distribution for example.
The Gamma and log-normal distributions look like this (for some values of the parameters of these distributions):
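We can draw versions of them ourselves; the parameter values below are chosen purely for illustration:

tibble(x = seq(0.01, 2, length.out = 200)) |>
  mutate(Gamma = dgamma(x, shape = 5, rate = 10),
         `log-normal` = dlnorm(x, meanlog = log(0.5), sdlog = 0.4)) |>
  pivot_longer(-x, names_to = "distribution", values_to = "density") |>
  ggplot(aes(x = x, y = density, colour = distribution)) +
  geom_line() +
  labs(x = "Mass (g)", y = "Density")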
Notice how these distributions don’t extend into negative territory and they are skewed. Both of these distributions would be better choices for fish heart mass and fish body mass.
If we couple this with the need to account for the non-independence of our data, we would need a model called a generalized linear mixed model (GLMM). This model is easy (LOL) to fit, in this case using the Gamma distribution
## rescale the variables to be more similar to one another
fish <- fish |>
  mutate(heart_mass_mg = weight * 1000,
         body_mass_mg = body_mass_g * 1000,
         z_body_mass_mg = scale(body_mass_mg))

mass_glmm <- glmer(heart_mass_mg ~ z_body_mass_mg + (1 | fish_number),
                   data = fish, family = Gamma(link = "log"))
summary(mass_glmm)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: Gamma ( log )
Formula: heart_mass_mg ~ z_body_mass_mg + (1 | fish_number)
Data: fish
AIC BIC logLik deviance df.resid
538.2 546.9 -265.1 530.2 61
Scaled residuals:
Min 1Q Median 3Q Max
-1.49110 -0.44028 -0.07036 0.39538 1.49969
Random effects:
Groups Name Variance Std.Dev.
fish_number (Intercept) 0.005692 0.07545
Residual 0.001475 0.03841
Number of obs: 65, groups: fish_number, 14
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 6.12043 0.05225 117.129 < 2e-16 ***
z_body_mass_mg 0.19634 0.04628 4.243 2.21e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
z_bdy_mss_m -0.040
To understand the output, interpret it, and check that all its assumptions are met requires quite a few more statistics classes than you have currently taken. But this is the kind of model we would expect you to be able to fit, use, and interpret by the end of your degree.