Anomaly Detection for OTT-Serverlog Data (2022 Seoultech Data Mining Team Project)

yureutaejin/OTT-Serverlog-anomaly-detection

2022 Data Mining Team Project

analysis report

Please read the README together with the code.
<Repository layout>
final_result_ipynb => final implementation code and visualization code
dataset => raw dataset
wd => working folders, divided by R&R (roles and responsibilities)
temp_result => DataFrames saved temporarily during the work
final_result_answer => submitted labeled datasets


Index

1. introduction

  • Analysis topic
  • Analysis background
  • Data and overall pipeline

2. Data EDA & Preprocessing

  • Checking for missing values
  • Changing the data index
  • Data describe and visualization
  • Correlation analysis
  • Missing-value visualization
  • Linear interpolation
  • Time-based interpolation
  • Train / Test split
  • Multivariate imputation
  • KNN imputation
  • Fail ratio per server / per-server data split
  • Autocorrelation visualization
  • Stationarity visualization
  • ADF Test
  • Per-server correlation coefficients

3. Modeling

Unsupervised Learning

  • K_Means
  • Isolation Forest

Neural Network

  • LSTM-AE

4. Result

5. Conclusion & Discussion


1. introduction

Analysis topic

Detect the points in time at which anomalies occur, based on the trend in the number of real-time OTT service users.

Analysis background

With streaming and OTT services multiplying rapidly, delivering service stably and without interruption matters a great deal, both to providers and to the consumer experience.

What degrades the stability of real-time streaming services is mostly server and network overload. In particular, when users flood in at a specific time, traffic surges with them and service delivery is disrupted. As COVID-19 dragged on, network traffic grew substantially, and streaming companies were hit especially hard. South Korea has well-built network infrastructure and has had no problems so far, but overseas companies such as Netflix are doing everything they can to prevent network overload, even at the cost of user satisfaction, by lowering default streaming quality and slowing download speeds.

We therefore detect and label the points in time at which anomalies occur, based on the trend in the number of real-time OTT service users. By studying the resulting patterns, the analysis could be used to lower streaming quality or slow download speeds only during the periods when requests surge.

Data and overall pipeline

Data description

Data source: AIFactory, AI Hackathon for Network Intelligence

24 months of transaction data, collected at 5-minute intervals from 13 types of media server, are provided.

File name description:

INFO: server for product subscription/cancellation, terms agreement, purchases, and point lookup

LOGIN: server for login, identity verification, and PIN management

MENU: server that provides the initial menu and the channel category menu

STREAM: server for VOD streaming

Data column description. *Note that the columns provided differ slightly by server type (see the baseline code for details).

Timestamp: has the format [YYYYMMDD_HHmm(a)-HHmm(b)];

the collection range runs from HH:mm (a) to HH:mm (b) on the date YYYY-MM-DD

Server: category of the collecting server (see the file name description)

Request: number of service requests made within the collection range

Success: number of successful service requests within the collection range

Fail: number of failed service requests within the collection range

Session: number of media streaming sessions at the time of collection
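
A minimal sketch of how the Timestamp format described above could be parsed, using only the Python standard library. The function name `parse_window` and the sample value are hypothetical, and rolling the end time into the next day when a window crosses midnight is an assumption that the dataset description does not state.

```python
from datetime import datetime, timedelta

def parse_window(stamp):
    """Split 'YYYYMMDD_HHmm-HHmm' into (start, end) datetimes."""
    day, span = stamp.split("_")
    start_hm, end_hm = span.split("-")
    start = datetime.strptime(day + start_hm, "%Y%m%d%H%M")
    end = datetime.strptime(day + end_hm, "%Y%m%d%H%M")
    if end <= start:
        # Assumption: a window such as 2355-0000 ends on the following day.
        end += timedelta(days=1)
    return start, end

start, end = parse_window("20170101_0000-0005")  # hypothetical sample value
print(start, end)  # 2017-01-01 00:00:00 2017-01-01 00:05:00
```

Using the window start as the index would give one row per 5-minute interval, which is what the continuity check in the preprocessing step relies on.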

If even one of the servers is anomalous at a given time, that time is treated as anomalous overall.

Pipeline

  1. EDA & Preprocessing

  2. modeling (kmeans, isolation forest, LSTM-AE)

  3. model run

  4. result visualization

  5. result score (check answer score by F2 score evaluation in AIFACTORY)

  6. adjust hyperparameter and re-score

(For the unsupervised models, searching in a loop took too long and exceeded the IDE's memory, so outlier_fraction was chosen by borrowing the LSTM-AE result.)


2. Data EDA & Preprocessing

  • Checking for missing values => the timestamp index is continuous, so there is no problem there, but every column contains missing values

  • Data describe and visualization => plotting the continuous series shows points where Request and Fail rise sharply

  • Correlation analysis => the correlation coefficients of the fail data are relatively low.

  • Missing-value visualization

The Info and Login data contain many missing values.

The sparkline on the right summarizes the general shape of the data completeness and marks the rows with the maximum and minimum nullity in the dataset.

The dendrogram correlates variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap. It uses a hierarchical clustering algorithm (provided by scipy) to bin variables against one another by their nullity correlation (measured as binary distance). At each step of the tree, the variables are split according to the combination that minimizes the distance of the remaining clusters. The more monotone the set of variables, the closer their total distance, and their average distance (the y-axis), is to zero.

To interpret the graph, read it from the top down. Cluster leaves that are linked at a distance of zero fully predict one another's presence: one variable may always be empty when the other is filled, or both may always be filled, or both always empty. In this particular example, the dendrogram glues together the variables that are required and therefore present in every record.

  • Linear interpolation

Missing values were treated as 0.

  • Time-based interpolation ⇒ consecutive NaN values are not interpolated. ⇒ Values are filled in proportion to elapsed time.

Result of time-based interpolation
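
To make "filled in proportion to elapsed time" concrete, here is a small pure-Python sketch. It is not the notebook's code: the function name `interpolate_by_time` and the `max_gap` parameter are hypothetical, and leaving longer runs of missing values untouched is an assumption chosen to mirror the behaviour noted above.

```python
from datetime import datetime

def interpolate_by_time(times, values, max_gap=1):
    """Fill None entries linearly in elapsed time between the nearest known values."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        if right - left - 1 > max_gap:
            continue  # assumption: longer runs of missing values stay unfilled
        span = (times[right] - times[left]).total_seconds()
        for i in range(left + 1, right):
            weight = (times[i] - times[left]).total_seconds() / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled

# The missing value sits 5 minutes into a 20-minute gap, so it gets 25% of the rise.
times = [datetime(2017, 1, 1, 0, m) for m in (0, 5, 20)]
print(interpolate_by_time(times, [10.0, None, 50.0]))  # [10.0, 20.0, 50.0]
```

With evenly spaced 5-minute rows this reduces to ordinary linear interpolation; the time weighting only matters when rows are unevenly spaced.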

  • Train / Test split ⇒ 2017 is used as Train and 2018 as Test
  • Multivariate imputation

The IterativeImputer class models each feature that has missing values as a function of the other features and uses that estimate for imputation. It works in an iterated round-robin fashion: at each step one feature column is designated as the output y and the other feature columns are treated as the inputs X. A regressor is fit on (X, y) for the known y and is then used to predict the missing values of y. This is done for each feature in turn and repeated for max_iter imputation rounds; the result of the final round is returned.

Negative values were returned.

  • KNN imputation

With this method, Request = Success + Fail comes out correctly. K-NN (k nearest neighbours) is a simple algorithm used for classification; it uses 'feature similarity' to find the K most similar (nearest) data points.
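
The idea of filling a gap from the K most similar rows can be sketched in pure Python as follows. This is an illustration, not the notebook's implementation: the function name `knn_impute`, the toy rows (Request, Success, Fail), and the choice of a plain mean over neighbours are all assumptions.

```python
import math

def knn_impute(rows, k=2):
    """Fill None cells with the column mean over the k nearest complete rows."""
    # Assumes at least one row has no missing values.
    complete = [r for r in rows if None not in r]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Distance is measured only on the columns observed in the incomplete row.
        nearest = sorted(
            complete,
            key=lambda c: math.sqrt(sum((c[j] - row[j]) ** 2 for j in observed)),
        )[:k]
        result.append([
            v if v is not None else sum(n[j] for n in nearest) / len(nearest)
            for j, v in enumerate(row)
        ])
    return result

# Columns: Request, Success, Fail (toy values).
rows = [[100, 90, 10], [120, 108, 12], [110, None, 11], [400, 360, 40]]
print(knn_impute(rows, k=2)[2])  # [110, 99.0, 11]
```

In this toy data the filled value happens to satisfy Request = Success + Fail; a neighbour average does not guarantee that in general, so the identity is worth checking after imputation.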

2017 (Train)

2018 (Test)

  • Fail ratio per server

⇒ If a column name contains "request", the success column that follows it is divided by the request column to give a success-ratio column. Where request is 0 the division produces NaN, so fillna(0) is applied.
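
A minimal sketch of the ratio step described above, written in pure Python rather than with DataFrame operations. The function name `success_ratio` and the sample counts are hypothetical; replacing an undefined ratio with 0 mirrors the fillna(0) mentioned in the text.

```python
import math

def success_ratio(request, success):
    """Return success/request per row, with 0.0 where the ratio is undefined."""
    ratios = []
    for req, ok in zip(request, success):
        # A zero or missing request count leaves the ratio undefined;
        # the README replaces those cases with 0, mirrored here.
        if req is None or ok is None or req == 0 or math.isnan(req) or math.isnan(ok):
            ratios.append(0.0)
        else:
            ratios.append(ok / req)
    return ratios

print(success_ratio([100, 0, 50], [90, 0, 50]))  # [0.9, 0.0, 1.0]
```

Note that a 0 produced this way is indistinguishable from a period in which every request failed, which may matter when the ratio is later used as a model feature.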

  • Autocorrelation visualization ⇒ because this is time-series data, the possible correlation between consecutive error terms is visualized

ex)

See the final_pipeline ipynb; ex) visualization example

  • Stationarity visualization ⇒ uses the mean and variance

ex)

See the final_pipeline ipynb; ex) visualization example
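
One common way to inspect stationarity with the mean and variance is to compute them over a sliding window and check whether they drift. The sketch below assumes that approach; the function name `rolling_stats`, the window size, and the two toy series are hypothetical and do not come from the notebook.

```python
from statistics import mean, pvariance

def rolling_stats(series, window):
    """Return (mean, variance) for each full window of the series."""
    return [
        (mean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(len(series) - window + 1)
    ]

flat = [5, 6, 5, 6, 5, 6, 5, 6]   # mean and variance stay constant
trend = [1, 2, 3, 4, 5, 6, 7, 8]  # mean drifts upward: not stationary

print(rolling_stats(flat, 4))
print(rolling_stats(trend, 4))
```

Plotting the two columns over time gives the visual check; a formal test such as ADF then confirms or rejects what the plot suggests.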

  • ADF Test

A statistical test for stationarity. A series is stationary if it keeps following the same distribution as time passes, and non-stationary if it does not. The ADF test was run for each variable.

As a result, the variables passed the ADF test.

  • Per-server correlation coefficients

Because success and request are highly correlated, three features are extracted: request, fail, and ratio.

ex)

See the final_pipeline ipynb; ex) correlation example


3. Modeling

K-means

simple concept

Clustering is performed, and objects that are not assigned to a cluster are treated as outliers. Anomaly Score by K-Means Clustering-based Anomaly Detection (KMC): ① absolute distance: A anomaly score (a1) = B anomaly score (b1); ② relative distance: A anomaly score (a1/a2) < B anomaly score (b1/b2)

outliers_fraction → hyperparameter. The expected number of anomalous rows comes out the same for every column, and the minimum value among them is set as the threshold.

outliers_fraction has to be adjusted while checking the score against the answer data.

Computing the distance between each point and its cluster

Because the labels are split into only two classes, n_clusters is fixed at 2.
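
The thresholding step can be sketched as follows, assuming the centroids have already been fitted (for example by K-means with n_clusters fixed at 2). The function name `kmeans_outliers`, the toy points, and the toy centroids are hypothetical; the notebook's own code may differ.

```python
import math

def kmeans_outliers(points, centroids, outliers_fraction):
    """Flag the outliers_fraction of points farthest from their nearest centroid."""
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    n_outliers = int(outliers_fraction * len(points))
    if n_outliers == 0:
        return [False] * len(points)
    # The smallest of the n largest distances serves as the threshold.
    threshold = sorted(distances, reverse=True)[n_outliers - 1]
    # Ties at the threshold are all flagged, so slightly more than n may be marked.
    return [d >= threshold for d in distances]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
centroids = [(0.3, 0.3), (10, 10.5)]  # assumed to come from an earlier K-means fit
print(kmeans_outliers(points, centroids, 0.2))
# [False, False, False, False, False, True]
```

Because the threshold is derived from outliers_fraction, the number of flagged rows is fixed in advance, which is why the fraction has to be tuned against the scored answers.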

Isolation Forest

simple concept

An unsupervised, tree-based algorithm for anomaly detection.

It is based on the structure of a regression decision tree.

It uses the idea that a regression tree divides the space by recursive binary splitting.

(number of partitions needed to separate and isolate a point = root node ~)

Normal data ⇒ many recursive binary splits

Anomalous data ⇒ relatively fewer splits

That is, the shallower the depth, the closer a point is judged to be to anomalous data.

Advantage: compared with clustering-based anomaly detection algorithms, it needs far less computation and can produce a robust model.

outliers_fraction → hyperparameter. The expected number of anomalous rows comes out the same for every column, and the minimum value among them is set as the threshold.

outliers_fraction has to be adjusted while checking the score against the answer data.

As with clustering, outliers_fraction (contamination) has to be specified.
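
The claim that anomalous points need fewer splits can be illustrated with a deliberately simplified, one-dimensional isolation tree. This sketch is not a full Isolation Forest: it omits subsampling, random feature selection, and the score normalization, and the names `path_length` and `mean_path_length` are hypothetical.

```python
import random

def path_length(x, data, rng, depth=0):
    """Number of random splits needed to isolate x within data (one tree)."""
    if len(data) <= 1 or min(data) == max(data):
        return depth
    split = rng.uniform(min(data), max(data))
    # Keep only the side of the split that x falls on.
    side = [v for v in data if (v < split) == (x < split)]
    return path_length(x, side, rng, depth + 1)

def mean_path_length(x, data, n_trees=200, seed=0):
    """Average the isolation depth of x over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(x, data, rng) for _ in range(n_trees)) / n_trees

data = [0.1 * i for i in range(10)] + [100.0]  # ten dense points and one far away
print(mean_path_length(100.0, data))  # short path: isolated almost immediately
print(mean_path_length(0.5, data))    # longer path: sits inside the dense group
```

In a library implementation the contamination parameter then decides how short a path must be before a point is labeled anomalous.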

result

See the final_pipeline ipynb; ex) example K-means result

Number of K-means outliers

See the final_pipeline ipynb; ex) example Isolation Forest result

Number of Isolation Forest (IF) outliers

LSTM-AE

Reference paper

LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection (ICML 2016)

(Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, Gautam Shroff)

LSTM?

A model designed to solve the problem of Long-Term Dependencies in the basic RNN (Vanilla RNN) architecture.

In an RNN, when the relevant information is far from the point where it is used, the gradient gradually shrinks during backpropagation and learning ability drops sharply.

Instead of the single neural network layer of an RNN, an LSTM consists of 4 layers and a cell state, so the gradient propagates comparatively well even as the number of iterations grows (that is, even after the state has been carried for a long time).

h(t) ⇒ vector for the short-term state

c(t) ⇒ vector for the long-term state

input_gate ⇒ selects the information to keep in the cell state (1)

tanh_layer ⇒ layer that updates the information that passed the input_gate

forget_gate ⇒ selects the information to discard (0)

cell state update ⇒ applies the input_gate and forget_gate updates

output_gate ⇒ decides which information to emit as output
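
For reference, the gates listed above correspond to the standard LSTM update equations. The weight matrices W, the biases b, the concatenation [h_{t-1}, x_t], and the candidate state \tilde{c}_t are conventional notation introduced here, not symbols used in this README.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{tanh layer} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{short-term state}
\end{aligned}
```

Because the sigmoid outputs lie between 0 and 1, a gate value near 1 keeps information and a value near 0 discards it, which is what the (1) and (0) annotations in the list refer to.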

RNN, LSTM comparison

input gate, forget gate

cell state

Auto encoder?

An Auto Encoder is trained so that the model's output becomes close to its input.

  1. Mapping Layer (Encoder) ⇒ The encoder sends the input data to the Bottleneck Layer, compressing the input information into a lower dimension.
  2. Bottleneck Layer
  3. Demapping Layer (Decoder) ⇒ The decoder restores the compressed input information to the original input data.
  4. Output Layer

Since the goal of the Auto-Encoder is to predict data identical to its input, the difference between the predicted values coming out of the Output Layer and the actual values is defined as the Loss Function used for training; this Loss Function is called the Reconstruction Error.

If the Reconstruction Error exceeds the threshold, the point is classified as an anomaly; otherwise it is classified as normal.
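
The decision rule above can be sketched without any neural network code by treating the reconstructions as given. The function names, the sample windows, and the threshold value are hypothetical; in the project the reconstructions would come from the trained LSTM-AE.

```python
def reconstruction_errors(inputs, outputs):
    """Mean squared error between each input window and its reconstruction."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
        for x, y in zip(inputs, outputs)
    ]

def flag_anomalies(errors, threshold):
    """Label a window anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]

inputs = [[1.0, 2.0, 3.0], [1.0, 2.0, 30.0]]
outputs = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]  # stand-in for autoencoder output
errors = reconstruction_errors(inputs, outputs)
print(errors)                       # [0.0, 243.0]
print(flag_anomalies(errors, 1.0))  # [False, True]
```

The second window contains a spike the model did not reproduce, so its error is large; a model trained mostly on normal traffic reconstructs normal windows well and unusual ones poorly.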

Neural Network construct

The autoencoder is built from LSTM layers.

L1 L2 ⇒ encoder

L3 ⇒ bottleneck

L4 L5 ⇒ decoder

(In the figure, L2, 3, 4, 5 are mislabeled as L3, 4, 5, 6.)

hyperparameter ⇒ optimizer: adam, loss_func: MSE, activation func: relu, epoch: 20, batch size: 32

A threshold is set from the reconstruction error ⇒ anomalies are identified

result

Two runs were carried out in parallel: one with min-max scaling and one without.

MAE decreases as the epochs progress (with min-max scaling)

prediction - minmax_scaling

MAE decreases as the epochs progress (without scaling)

prediction - no_scaling

See the final_pipeline ipynb; ex) example LSTM-AE result


4. Result

The prediction results are combined per model ⇒ if even one server shows an anomaly at a given point on the timeline, that whole point is judged to be an anomaly.

Because the task is to produce the labels themselves, the results cannot be evaluated separately on our side.

The scoring function of the AIFactory competition was used instead (it relies on an internal answer-label dataset).

Results are evaluated with the F2-score (a metric that favours Recall over Precision).
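
For reference, the F-beta score is (1 + β²)·P·R / (β²·P + R), and F2 is the case β = 2. The sketch below computes it from confusion counts; the function name `fbeta` and the example counts are hypothetical, and the competition's exact scoring code is not known.

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision 0.5, recall 0.8: F2 rewards the high recall more than F1 does.
print(round(fbeta(8, 8, 2, beta=2.0), 4))  # 0.7143
print(round(fbeta(8, 8, 2, beta=1.0), 4))  # 0.6154
```

Under this metric a missed anomaly costs more than a false alarm, so a detector that over-flags slightly can still score well.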

LSTM-AE reached a reasonable score. The run without scaling (lstm_not_scaling) gave better results than the run with min-max scaling.
Applying the outlier ratio obtained from LSTM-AE to K-means and Isolation Forest therefore brought a large improvement.


5. Conclusion & Discussion

Visualization

(The 3D images require too much memory ⇒ only the code is provided, separately, in final_result_visualization_compare.ipynb)

From top to bottom: LSTM, Isolation Forest, K-means results

info_data_test
login_1_data_test
login_2_data_test
login_3_data_test
login_4_data_test
login_5_data_test
menu_1_data_test
menu_2_data_test
menu_3_data_test
menu_4_data_test

The Isolation Forest and K-means results show similar distributions.

The LSTM AutoEncoder, however, judges anomalies in a different way and so produces a different distribution.

Discussion

If the three methods give the same result, the value is 0 or 3; otherwise it is 1 or 2.
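
The 0 to 3 value can be read as the number of methods that flagged a timestamp. A minimal sketch under that reading, with a hypothetical function name and made-up labels:

```python
def vote_counts(lstm, iforest, kmeans):
    """Sum the three 0/1 labels per timestamp: 0 or 3 means all methods agree."""
    return [a + b + c for a, b, c in zip(lstm, iforest, kmeans)]

votes = vote_counts([0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 1, 1])
print(votes)  # [0, 3, 2, 1]

# Share of timestamps on which all three methods agree.
agreement = sum(v in (0, 3) for v in votes) / len(votes)
print(agreement)  # 0.5
```

Timestamps with a count of 1 or 2 are the ones worth inspecting, since they show where the methods disagree.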

IF vs Kmeans, IF vs LSTM, Kmeans vs LSTM

Kmeans - IF

IF - LSTM

Kmeans - LSTM

On the graphs of the points LSTM labeled normal, the distributions looked similar, but the resulting labels often differed.

In particular, when LSTM, which detected the fewest anomalies, was compared with the other algorithms, K-means disagreed with LSTM more often than the other method did.

Conclusion

With the machine-learning algorithms, we found that the results are sensitive to the outlier-fraction setting (a hyperparameter). Although they train far faster than LSTM-AE, finding a good value can take a long time if a suitable outlier fraction is not known in advance.

We also found that even when K-means and Isolation Forest are given the same outlier fraction, many points are labeled differently (about half are classified differently).

Each anomaly detection algorithm has clear strengths and weaknesses.

The appropriate algorithm has to be chosen according to conditions such as whether the data are multivariate, whether labels exist, and the computational cost.

The significance of this project is that LSTM-AE, a type of neural network, set the anomaly threshold automatically through training, a threshold that is hard to choose with the unsupervised methods, and so offered a criterion for judging, to some extent, what is anomalous even without anomaly labels. As follow-up work, the labeling results and their patterns could be studied so that streaming quality is lowered or download speed slowed only during periods when requests and the Fail ratio surge.

At present KNN is used in the preprocessing stage; we plan to study more recent techniques and apply them to missing values and bad data rows (for example rows where request is 0, a problem on the data provider's side), and to increase the number of epochs to improve the results.
