[R] R Style Guide by Hadley Wickham

This post is a translated summary of 'The tidyverse style guide' by Hadley Wickham.

9. Error messages

An error message should start with a general statement of the problem and then give a concise description of what went wrong. Consistent use of punctuation and formatting makes errors easier to parse.

This guide is currently almost entirely aspirational; most of the bad examples come from existing tidyverse code. - Hadley Wickham

9.1 Problem statement

Every error message should start with a general statement of the problem. It should be concise, but informative. (This is hard!)

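Provide as much useful information as you can, but keep each sentence very simple so that it can be localised and translated. "A Localization Horror Story: It Could Happen To You" is a good summary of the pitfalls of localising error messages. You may not support localised messages right now, but you should make it as easy as possible to do so in the future.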

Ideally, each sentence should contain a single phrase and mention only one variable quantity. To avoid complex sentences, prefer presenting the information in a bullet list. Start with the contextual information and finish with the information about the faulty user input.

Prefix these two kinds of bullet with ℹ and ✖ respectively when UTF-8 is available (coloured blue and red when colour is available); otherwise use the ASCII * character.
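The prefix rule can be sketched in base R as follows. The helper name `format_bullets()` and its arguments are hypothetical and not part of any tidyverse package; the sketch only checks `l10n_info()` for UTF-8 support and leaves colour out entirely.

```r
# Hypothetical helper: contextual information first, problems last.
format_bullets <- function(info = character(), problems = character(),
                           utf8 = isTRUE(l10n_info()[["UTF-8"]])) {
  info_prefix <- if (utf8) "\u2139" else "*"     # information sign
  problem_prefix <- if (utf8) "\u2716" else "*"  # heavy multiplication x
  c(
    # The length guards avoid paste() turning an empty input into "* ".
    if (length(info) > 0) paste(info_prefix, info),
    if (length(problems) > 0) paste(problem_prefix, problems)
  )
}

bullets <- format_bullets(
  info = "There are 26 elements.",
  problems = "You've tried to subset element 100.",
  utf8 = FALSE
)
writeLines(bullets)
```

Passing `utf8 = FALSE` forces the ASCII fallback, which makes the output predictable on any console.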

If the cause of the problem is clear, use "must":

dplyr::nth(1:10, "x")
#> Error: `n` must be a numeric vector:
#> ✖ You've supplied a character vector.

dplyr::nth(1:10, 1:2)
#> Error: `n` must have length 1
#> ✖ You've supplied a vector of length 2.

Clear-cut causes typically involve incorrect types or lengths.
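A minimal sketch of the "must" pattern for type and length checks, written in base R. `check_n()` is a hypothetical validator, not the code behind `dplyr::nth()`, and it uses the ASCII `*` bullet for portability.

```r
# Hypothetical validator for an argument `n` that must be a single number.
check_n <- function(n) {
  if (!is.numeric(n)) {
    stop(
      "`n` must be a numeric vector:\n",
      "* You've supplied a ", typeof(n), " vector.",
      call. = FALSE
    )
  }
  if (length(n) != 1) {
    stop(
      "`n` must have length 1:\n",
      "* You've supplied a vector of length ", length(n), ".",
      call. = FALSE
    )
  }
  invisible(n)
}

# Capture the messages instead of aborting the script.
msg_type <- tryCatch(check_n("x"), error = conditionMessage)
msg_length <- tryCatch(check_n(1:2), error = conditionMessage)
cat(msg_type, msg_length, sep = "\n\n")
```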

If you cannot state what was expected, use "can't":

mtcars %>% pull(b)
#> Error: Can't find column `b` in `.data`.

as_vector(environment())
#> Error: Can't coerce `.x` to a vector.

purrr::modify_depth(list(list(x = 1)), 3, ~ . + 1)
#> Error: Can't find specified `.depth` in `.x`.

The problem statement should use sentence case and end with a full stop.

Use stop(call. = FALSE), rlang::abort(), or Rf_errorcall(R_NilValue, ...) to avoid cluttering the error message with the name of the function that generated it. That information is often not useful, and it is easy to get from traceback() or an IDE.
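The effect of `call. = FALSE` can be seen by inspecting the condition object. The function `f()` below is a hypothetical example that signals the same message both ways.

```r
# Hypothetical function that signals the same error in two ways.
f <- function(quiet) {
  if (quiet) {
    stop("Can't find column `b` in `.data`.", call. = FALSE)
  }
  stop("Can't find column `b` in `.data`.")
}

with_call <- tryCatch(f(quiet = FALSE), error = function(e) e)
without_call <- tryCatch(f(quiet = TRUE), error = function(e) e)

# By default the condition records the calling function, which R
# prints as "Error in f(...) :". With call. = FALSE the call is NULL.
print(conditionCall(with_call))
print(conditionCall(without_call))
```

The message text is identical in both cases; only the recorded call differs.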

Use simple sentences for the ℹ and ✖ bullet items.

Keep each sentence short and give it its own bullet:

# Good
vec_slice(letters, 100)
#> Must index an existing element:
#> ℹ There are 26 elements.
#> ✖ You've tried to subset element 100.

# Bad
vec_slice(letters, 100)
#> Must index an existing element.
#> There are 26 elements and you've tried to subset element 100.

Contextual information should come first:

# Good
vec_slice(letters, 100)
#> Must index an existing element:
#> ℹ There are 26 elements.
#> ✖ You've tried to subset element 100.

# Bad
vec_slice(letters, 100)
#> Must index an existing element:
#> ✖ You've tried to subset element 100.
#> ℹ There are 26 elements.

9.2 Error location

Do your best to show the location, name, and/or content of the problematic component. The goal is to make it as easy as possible for the user to find and fix the problem.

# Good
map_int(1:5, ~ "x")
#> Error: Each result must be a single integer:
#> ✖ Result 1 is a character vector.

# Bad
map_int(1:5, ~ "x")
#> Error: Each result must be a single integer

(Pinpointing the exact problem is often not easy. You may need to pass extra arguments around so that error messages generated at a lower level can point to the original source. For frequently used functions, the effort is usually worth it.)
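One way to pass that extra context down is an additional argument on the low-level checker. The sketch below is an illustration under stated assumptions: `check_single_integer()`, its `what` argument, and `map_int_sketch()` are hypothetical names and do not reflect how purrr is implemented.

```r
# Hypothetical low-level checker: `what` carries the user-facing label
# so the message can name the offending component.
check_single_integer <- function(x, what) {
  if (!is.integer(x) || length(x) != 1) {
    stop(
      "Each result must be a single integer:\n",
      "* ", what, " has type ", typeof(x),
      " and length ", length(x), ".",
      call. = FALSE
    )
  }
  invisible(x)
}

# Hypothetical caller: forwards the position of each result.
map_int_sketch <- function(xs, f) {
  out <- integer(length(xs))
  for (i in seq_along(xs)) {
    res <- f(xs[[i]])
    check_single_integer(res, what = paste("Result", i))
    out[[i]] <- res
  }
  out
}

msg <- tryCatch(
  map_int_sketch(1:5, function(x) "x"),
  error = conditionMessage
)
cat(msg, "\n")
```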

If the source of the error is unclear, do not offer an opinion about it; that way you avoid pointing the user in the wrong direction:

# Good
pull(mtcars, b)
#> Error: Can't find column `b` in `.data`.

tibble(x = 1:2, y = 1:3, z = 1)
#> Error: Columns must have consistent lengths: 
#> ✖ Column `x` has length 2
#> ✖ Column `y` has length 3

# Bad: implies one argument at fault
pull(mtcars, b)
#> Error: Column `b` must exist in `.data`

pull(mtcars, b)
#> Error: `.data` must contain column `b`

tibble(x = 1:2, y = 1:3, z = 1)
#> Error: Column `x` must be length 1 or 3, not 2

If there are multiple issues, or an inconsistency is revealed across several arguments or items, prefer a bullet list:

# Good
purrr::reduce2(1:4, 1:2, `+`)
#> Error: `.x` and `.y` must have compatible lengths:
#> ✖ `.x` has length 4
#> ✖ `.y` has length 2

# Bad: harder to scan
purrr::reduce2(1:4, 1:2, `+`)
#> Error: `.x` and `.y` must have compatible lengths: `.x` has length 4 and 
#> `.y` has length 2

If the list of issues is long, truncate it so that only the first few are shown:

# Good
#> Error: NAs found at 1,000,000 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
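A truncated message of this kind could be built as in the sketch below. `na_locations_message()` and its `max_shown` argument are hypothetical, and pluralisation of "locations" is deliberately left out.

```r
# Hypothetical helper: report NA positions, showing only the first few.
na_locations_message <- function(x, max_shown = 10) {
  locs <- which(is.na(x))
  shown <- paste(head(locs, max_shown), collapse = ", ")
  if (length(locs) > max_shown) {
    shown <- paste0(shown, ", ...")
  }
  paste0(
    "NAs found at ", format(length(locs), big.mark = ","),
    " locations: ", shown
  )
}

msg <- na_locations_message(rep(NA, 1000))
cat(msg, "\n")
```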

To pluralise an error message correctly, use ngettext(). See the notes in ?ngettext for some of the difficulties involved in translating correctly into other languages.
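A short illustration of `ngettext()` from base R, which picks the singular or plural string from a count. The wrapper `problem_count_message()` and the message wording are hypothetical.

```r
# Hypothetical helper: choose singular or plural wording from a count.
problem_count_message <- function(n) {
  sprintf(
    ngettext(n, "Found %d problem.", "Found %d problems."),
    n
  )
}

one <- problem_count_message(1)
many <- problem_count_message(3)
cat(one, many, sep = "\n")
```

Keeping the whole sentence inside each `ngettext()` string, rather than pasting an "s" onto a word, is what leaves room for a translator to supply proper plural forms.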

9.3 Hints

If the source of the error is clear and common, consider providing a hint about how to fix it.

Prefix hints with ℹ when UTF-8 is available (coloured blue when colour is available).

dplyr::filter(iris, Species = "setosa")
#> Error: Filter specifications must not be named.
#> ℹ Did you mean `Species == "setosa"`?

ggplot2::ggplot(ggplot2::aes())
#> Error: Can't plot data with class "uneval". 
#> ℹ Did you accidentally provide the results of aes() to the `data` argument?

Hints should always end in a question mark. They matter most when the place where the error surfaces is far from its root cause:

# Bad
mean[[1]]
#> Error in mean[[1]] : object of type 'closure' is not subsettable

# BETTER
mean[[1]]
#> Error: Can't subset a function.

# BEST
mean[[1]]
#> Error: Can't subset a function.
#> ℹ Have you forgotten to define a variable named `mean`?

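Good hints are hard to write because, as noted above, you want to avoid steering the user in the wrong direction. In general, avoid writing a hint unless the problem is common and you can easily find a recurring pattern of incorrect usage, for example by searching StackOverflow.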
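As an illustration of a hint for a common mistake, the sketch below detects a named argument passed to a filter-like function and suggests `==`. `check_filter_args()` is a hypothetical helper written in base R; it is not how dplyr implements this check.

```r
# Hypothetical check for a filter()-like function: a named argument
# usually means the caller wrote `=` where `==` was intended.
check_filter_args <- function(...) {
  # Capture the arguments unevaluated, together with their names.
  dots <- eval(substitute(alist(...)))
  named <- names(dots)
  if (is.null(named) || all(named == "")) {
    return(invisible(TRUE))
  }
  i <- which(named != "")[[1]]
  suggestion <- paste(named[[i]], "==", deparse(dots[[i]]))
  stop(
    "Filter specifications must not be named.\n",
    "* Did you mean `", suggestion, "`?",
    call. = FALSE
  )
}

msg <- tryCatch(
  check_filter_args(Species = "setosa"),
  error = conditionMessage
)
cat(msg, "\n")
```

The hint is phrased as a question and only appears for this one well-known pattern, which keeps it from misleading users in other situations.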

9.4 Punctuation

Errors should be written in sentence case and should end with a full stop.

Bullets should be formatted the same way. Capitalise the first word unless it is an argument or column name.

Prefer the singular in problem statements:

# Good
map_int(1:2, ~ "a")
#> Error: Each result must be coercible to a single integer:
#> ✖ Result 1 is a character vector.

# Bad
map_int(1:2, ~ "a")
#> Error: Results must be coercible to single integers: 
#> ✖ Result 1 is a character vector

If you can detect multiple problems, list up to five of them. This lets the user fix several problems in one pass without being overwhelmed by many errors that may share the same cause.

# BETTER
map_int(1:10, ~ "a")
#> Error: Each result must be coercible to a single integer:
#> ✖ Result 1 is a character vector
#> ✖ Result 2 is a character vector
#> ✖ Result 3 is a character vector
#> ✖ Result 4 is a character vector
#> ✖ Result 5 is a character vector
#> ... and 5 more problems
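The five-problem cap can be sketched as a small formatting helper. `format_problems()` and `max_shown` are hypothetical names, the ASCII `*` bullet is used for portability, and the trailing line is not pluralised for the case of exactly one extra problem.

```r
# Hypothetical helper: show at most `max_shown` problems, then a count.
format_problems <- function(problems, max_shown = 5) {
  if (length(problems) == 0) {
    return(character())
  }
  shown <- paste("*", head(problems, max_shown))
  extra <- length(problems) - max_shown
  if (extra > 0) {
    shown <- c(shown, paste("... and", extra, "more problems"))
  }
  shown
}

problems <- paste("Result", 1:10, "is a character vector")
out <- format_problems(problems)
writeLines(out)
```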

- Pick a natural connector between the problem statement and the error location. Depending on the context, this may be ", not", ";", or ":".
- Surround argument names in backticks (e.g. `x`). Use "column" to distinguish columns from arguments (e.g. Column `x`). Avoid "variable", because its meaning is ambiguous.
- Ideally, each component of an error message should be less than 80 characters wide. Do not add manual line breaks to long error messages; they will not display correctly if the console is narrower, or much wider, than expected. Instead, use bullets to break the error up into shorter logical components.

9.5 Before and after

Here are some more examples gathered from around the tidyverse.

dplyr::filter(mtcars, cyl)
#> BEFORE: Argument 2 filter condition does not evaluate to a logical vector 
#> AFTER:  Each argument must be a logical vector:
#> * Argument 2 (`cyl`) is an integer vector.

tibble::tribble("x", "y")
#> BEFORE: Expected at least one column name; e.g. `~name` 
#> AFTER:  Must supply at least one column name, e.g. `~name`.

ggplot2::ggplot(data = diamonds) + ggplot2::geom_line(ggplot2::aes(x = cut))
#> BEFORE: geom_line requires the following missing aesthetics: y
#> AFTER:  `geom_line()` must have the following aesthetics: `y`.

dplyr::rename(mtcars, cyl = xxx)
#> BEFORE: `xxx` contains unknown variables
#> AFTER:  Can't find column `xxx` in `.data`.

dplyr::arrange(mtcars, xxx)
#> BEFORE: Evaluation error: object 'xxx' not found.
#> AFTER:  Can't find column `xxx` in `.data`.