Posts in the 'We Are Developers' category (page 4)

We Are Developers

[gradle] Failing to delete a specific directory with Gradle on Windows. 2020.02.20
[java] About GC. 2020.02.19
[Git] When added, modified, removed files show up despite .gitignore 2020.02.19
[kubernetes] Installing minikube on a Mac. 2020.02.18
[elasticsearch] refresh, flush, optimize API. Comparison with Lucene. 2020.01.27
[elasticsearch] nested type, nested type aggregation. 2020.01.11
[elasticsearch] Understanding fielddata and doc_values. 2020.01.11
Try Storybook when developing Web UI 2019.12.20

[gradle] Failing to delete a specific directory with Gradle on Windows.

2020. 2. 20. 11:02

Running gradlew run was taking too long, so I force-killed the command.
After that, neither gradlew run nor gradlew clean worked.

# The error message looked roughly like this.
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':distribution:run'.
> Unable to delete directory 'xxxxxxxx'
  Failed to delete some children. This might happen because a process has files open or has its working directory set in the target directory.
  - ......

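# Fix.
Press Win+R,
type resmon and press Enter to open Resource Monitor.
On the CPU tab, find the running process that is holding files in the directory named in the message above, and end it.
In my case, ending java.exe was enough;
after that, gradlew clean succeeded.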

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)

[java] About GC.

2020. 2. 19. 08:42

To operate distributed storage that runs on the JVM, I want to firm up my understanding of GC.

I found a very accessible write-up and summarized it here for reference.
https://d2.naver.com/helloworld/1329

1. GC. General Garbage Collection.
Java divides memory into two areas:
the Young area and the Old area.

Young generation area

Old generation area

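Q) What happens when an object in the Old area references an object in the Young area?
A) The Old area keeps a card table made up of 512-byte chunks.
Whenever an object in the Old area references an object in the Young area, that reference is recorded in the card table.
When a GC of the Young area runs, it does not check the references of every object in the Old area;
it only scans this card table to decide what is subject to GC.
The card table is maintained with a write barrier.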
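
To make the card table idea concrete for myself, here is a minimal sketch in Python. It is a toy model, not HotSpot's implementation: the class name, the addresses, and the `target_is_young` flag are all assumptions made for illustration. Only the 512-byte card size and the write barrier idea come from the text above.

```python
CARD_SIZE = 512  # bytes of old-generation memory covered by one card (from the text)


class CardTable:
    """Toy card table: one dirty flag per 512-byte chunk of the old generation."""

    def __init__(self, old_gen_bytes):
        num_cards = (old_gen_bytes + CARD_SIZE - 1) // CARD_SIZE
        self.dirty = [False] * num_cards

    def write_barrier(self, old_object_address, target_is_young):
        # Runs on every reference store made by an old-generation object.
        if target_is_young:
            self.dirty[old_object_address // CARD_SIZE] = True

    def cards_to_scan(self):
        # A young-generation GC scans only these cards, not the whole old generation.
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]


table = CardTable(old_gen_bytes=4096)             # 8 cards
table.write_barrier(1030, target_is_young=True)   # address 1030 falls in card 2
table.write_barrier(3000, target_is_young=False)  # old -> old store, not recorded
print(table.cards_to_scan())                      # prints [2]
```

The point of the design is that the cost of a Young GC depends on the number of dirty cards rather than on the size of the Old area.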

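2. Structure of the Young area
The Young area is divided into one Eden space and two Survivor spaces,
and most newly created objects are placed in Eden.

In the minor GC procedure (live objects move from Eden into one Survivor space, then between the two Survivor spaces, and eventually into the Old area), one of the two Survivor spaces must always remain empty.
If both Survivor spaces hold data, or if both show zero usage, the system is not in a normal state.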

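Note. bump-the-pointer and TLABs (Thread-Local Allocation Buffers)
These are techniques the HotSpot VM uses for faster memory allocation.
bump-the-pointer tracks the last object allocated in the Eden space.
When a new object is created, only that most recently added object needs to be checked, to see whether the new object fits after it.
But what about a multithreaded environment?
To stay thread-safe, storing objects from multiple threads into Eden requires a lock,
and lock contention degrades performance.
TLABs address this by giving each thread its own small chunk of Eden to allocate from.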
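
A rough sketch of the two ideas in Python, under my own assumptions: the class names, sizes, and addresses are hypothetical, and real TLAB sizing and refill in HotSpot are far more involved. It only shows that allocation is a pointer bump, and that each thread bumps inside its own chunk.

```python
class BumpAllocator:
    """Allocates from a contiguous range by advancing ("bumping") a single pointer."""

    def __init__(self, start, size):
        self.top = start          # address right after the last allocated object
        self.end = start + size

    def allocate(self, nbytes):
        if self.top + nbytes > self.end:
            return None           # out of space; a real VM would trigger a GC or refill
        address = self.top
        self.top += nbytes
        return address


class Eden:
    """Shared Eden space; handing out a TLAB is the only step that would need a lock."""

    def __init__(self, size, tlab_size):
        self.shared = BumpAllocator(0, size)
        self.tlab_size = tlab_size

    def new_tlab(self):
        start = self.shared.allocate(self.tlab_size)
        return None if start is None else BumpAllocator(start, self.tlab_size)


eden = Eden(size=1024, tlab_size=256)
tlab_a = eden.new_tlab()   # private chunk for thread A: [0, 256)
tlab_b = eden.new_tlab()   # private chunk for thread B: [256, 512)
print(tlab_a.allocate(16), tlab_a.allocate(32), tlab_b.allocate(8))  # prints 0 16 256
```

Because each thread only touches its own `top` pointer, ordinary allocations need no synchronization; contention is limited to the moment a thread asks Eden for a fresh chunk.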

3. GC for the Old area
The available collectors are Serial GC, Parallel GC, Parallel Old GC, Concurrent Mark & Sweep (CMS) GC, and G1 GC.

The GC methods available differ by Java version;
I will only look at CMS and G1 GC here.

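3-1. CMS GC
It resembles Serial GC, except that most of the collection work proceeds concurrently while application threads keep running.
In the initial mark step, it finds only the live objects closest to the class loader.
In the concurrent mark step, it follows and checks the objects referenced by the objects just confirmed as live.
In the remark step, it checks objects that were newly added or whose references were cut during the concurrent mark step.
In the concurrent sweep step, it cleans up the garbage.
The mark and sweep steps run concurrently, that is, while other threads are executing.
Stop-the-world pauses occur only in the initial mark and remark steps, so they are short!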
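
The mark and sweep logic underneath those steps can be sketched as follows. This is a deliberately simplified, single-threaded version that I wrote for illustration: it is not concurrent, has no remark step, and the object names and sizes are made up. It only shows what "mark live objects from the roots, then sweep the rest" means.

```python
def mark(roots, references):
    """Return every object reachable from the roots (the live set)."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(references.get(obj, []))
    return live


def sweep(heap, live):
    """Drop everything that was not marked; survivors are not moved (no compaction)."""
    return {obj: size for obj, size in heap.items() if obj in live}


heap = {"a": 16, "b": 32, "c": 8, "d": 64}        # object -> size in bytes
references = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live = mark(roots=["a"], references=references)   # c and d only reference each other
heap = sweep(heap, live)
print(sorted(heap))                               # prints ['a', 'b']
```

Note that `sweep` leaves the survivors where they are, which is why a collector built this way ends up with fragmented free space unless a separate compaction runs.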

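It does have drawbacks, though:
* it uses more memory and CPU than other GC methods, and
* it does not provide a compaction step by default.

Therefore,
* CMS GC should be adopted only after careful review.
* When memory is heavily fragmented and compaction has to run, the stop-the-world time can be longer than with other GCs,
* so check how often and for how long compaction runs.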

The diagrams can be found in Oracle's technology section below.
https://www.oracle.com/java/technologies/

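3-2. G1 GC
Set aside the Young and Old layout described so far: G1 allocates objects into the cells of a large grid and runs GC on those cells.
The article I followed describes it as faster than any of the GC methods above, and G1 GC has been officially included since JDK 7.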

--------------

That wraps up this morning's GC study.
Later I should look through the JVM options on the Oracle site.

[Git] When added, modified, removed files show up despite .gitignore

2020. 2. 19. 07:45

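I had clearly told .gitignore to ignore node_modules, yet $ git status still listed changes under it.
This happens when the files were already tracked before the ignore rule was added, because .gitignore only applies to untracked files. In that case, reset git's cache (the index) as follows.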

$ git rm -r --cached .
$ git add .
$ git commit -m "fixed untracked files"

[kubernetes] Installing minikube on a Mac.

2020. 2. 18. 23:26

1. Install brew.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

2. Install VirtualBox: download the macOS package from the VirtualBox site below and install it.
https://www.virtualbox.org/wiki/Downloads

3. Install minikube

brew install minikube

# If you hit a permission error, change ownership with the command below.
sudo chown -R [:username] /usr/....

4. Start minikube

minikube start

Now the kubectl command is available!

minikube needs a hypervisor, which is why VirtualBox was installed.
A hypervisor is a logical platform for running multiple operating systems on one computer at the same time.

[elasticsearch] refresh, flush, optimize API. Comparison with Lucene.

2020. 1. 27. 11:45

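In Elasticsearch, shards are managed through the refresh, flush, and optimize API operations.
All of these are really Lucene operations, but the two projects use different terms, so it is worth laying them out side by side.

Lucene	Elasticsearch
flush	refresh
commit	flush
merge	optimize API

In Lucene, data is processed through an in-memory buffer.
When changes arrive, a segment is created, cached in the system cache, and only later synchronized to disk.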

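1. Lucene flush = Elasticsearch refresh.
- When a segment is created, it is cached in the kernel's system cache and becomes readable.
- Through Lucene's ReOpen() function, it becomes readable by an IndexSearcher.
- Updated documents are picked up by ReOpen() at a regular interval.

- By default, every shard in an es cluster performs a refresh once per second.
- Refresh means reloading the index: after a refresh, newly added data becomes searchable.
- For bulk indexing, disabling refresh by setting the refresh interval (index.refresh_interval) to -1 speeds up indexing.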
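
As a small sketch of that last point, the snippet below only builds the request bodies in Python. It assumes they would be sent to the index settings endpoint (PUT /<index>/_settings) of whatever index is being loaded; the helper name is my own and no client library is involved.

```python
import json


def refresh_settings(interval):
    """Body for the index settings API; "-1" disables periodic refresh."""
    return {"index": {"refresh_interval": interval}}


before_bulk = refresh_settings("-1")   # turn periodic refresh off while bulk indexing
after_bulk = refresh_settings("1s")    # restore the 1-second default afterwards
print(json.dumps(before_bulk))         # prints {"index": {"refresh_interval": "-1"}}
```

Restoring the interval after the bulk load matters: while refresh is disabled, newly indexed documents do not become searchable on their own.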

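2. Lucene commit = Elasticsearch flush.
- This is the operation that calls fsync() to physically write to disk.
- Because flush exists, there is no need to commit every time; commit runs at a regular interval.
- A Lucene flush happens before the write to disk, so if the system fails after a flush but before a commit, data can be lost.

- An es flush performs a Lucene commit and starts a new translog.
- * The translog does not exist in Lucene; it is a special file provided for shard failure recovery.
- * A shard first records every change made to it in the translog, then calls the underlying Lucene.
- * The translog file grows over time. When a Lucene commit completes, the translog contents up to the commit point are deleted.
- * The translog exists to guard against data that sits in the kernel system cache being lost before it is synced to disk.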
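- In es, flush runs automatically. The source I followed gives a default of once every 5 seconds, though that figure may correspond to the translog sync interval (index.translog.sync_interval) rather than to flush itself, so check the documentation for your version. The flush interval can be tuned through the API, but that is not recommended.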
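
To check my understanding of how refresh, flush, and the translog fit together, here is a toy model in Python. Everything in it (class name, lists standing in for files and segments) is an assumption for illustration; it is not how Elasticsearch or Lucene are implemented, only the order of operations described above.

```python
class ToyShard:
    """Toy model of a shard: translog first, then an in-memory buffer, then segments."""

    def __init__(self):
        self.translog = []     # durable log of operations since the last commit
        self.buffer = []       # indexed but not yet searchable
        self.searchable = []   # visible to search after refresh (Lucene flush)
        self.committed = []    # fsynced to disk by flush (Lucene commit)

    def index(self, doc):
        self.translog.append(doc)   # record the change first
        self.buffer.append(doc)     # then hand it to Lucene

    def refresh(self):
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def flush(self):
        self.refresh()
        self.committed = list(self.searchable)
        self.translog.clear()       # everything up to the commit point is dropped

    def recover_after_crash(self):
        # Memory-only state is lost; replay the translog on top of the last commit.
        self.buffer = []
        self.searchable = list(self.committed)
        for doc in self.translog:
            self.buffer.append(doc)
        self.refresh()


shard = ToyShard()
shard.index("doc1")
shard.flush()
shard.index("doc2")
shard.refresh()                 # doc2 is searchable but not committed
shard.recover_after_crash()
print(shard.searchable, shard.translog)  # prints ['doc1', 'doc2'] ['doc2']
```

In this model doc2 survives the crash only because it is still in the translog, which is the role the translog plays between commits.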

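3. Lucene merge = Elasticsearch optimize API
- To improve search performance, it merges the segments that are searched, reducing the segment count.
- With fewer segments to search, fewer lookups are needed and search performance improves.
- It involves a commit, so it is expensive.

- In es, the force merge API (_forcemerge, the successor to the optimize API) can be used to force a Lucene merge.
- It merges many fragmented segments.
- Even without forcing it, merging runs periodically in the background.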

[elasticsearch] nested type, nested type aggregation.

2020. 1. 11. 08:24

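Elasticsearch supports the nested type.
An array of objects can be stored inside a document, and mapping that field as nested lets each object in the array be indexed and queried independently.

As in the example from the es guide, user objects can be stored as an array.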

{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith",
      "score" : 90
    },
    {
      "first" : "Alice",
      "last" :  "White",
      "score" : 100
    }
  ]
}

I added a score field of my own.
So how do you get the sum of score keyed by "group", a field that sits outside the nested area?

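Use the reverse_nested aggregation, as below.
Note the order: write the nested aggregation first, then place reverse_nested and the aggregation on fields outside the nested scope inside it. A metric aggregation such as sum cannot hold sub-aggregations, so reverse_nested sits next to it rather than under it.

  "aggs": {
    "user": {
      "nested": {
        "path": "user"
      },
      "aggs": {
        "score": {
          "sum": {
            "field": "user.score"
          }
        },
        "group_by_key": {
          "reverse_nested": {},
          "aggs": {
            "group_sum_score": {
              "terms": {
                "field": "group"
              }
            }
          }
        }
      }
    }
  }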
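
For the question posed above (the sum of score per group), one arrangement that yields a per-group sum directly is a terms aggregation on the outside with the nested aggregation inside it. The sketch below builds that request body as a Python dict and also computes the same number locally from the sample document. It assumes `group` is mapped as a keyword and `user` as nested; the aggregation names (`by_group`, `users`, `score_sum`) and the helper function are hypothetical.

```python
import json
from collections import defaultdict

# Sample document from the post (with the "score" key spelled consistently).
doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith", "score": 90},
        {"first": "Alice", "last": "White", "score": 100},
    ],
}

# Alternative request body: bucket by the top-level field, then step into the nested docs.
query = {
    "size": 0,
    "aggs": {
        "by_group": {
            "terms": {"field": "group"},
            "aggs": {
                "users": {
                    "nested": {"path": "user"},
                    "aggs": {"score_sum": {"sum": {"field": "user.score"}}},
                }
            },
        }
    },
}


def sum_scores_by_group(docs):
    """Local equivalent of the aggregation, to show the number it should produce."""
    totals = defaultdict(int)
    for d in docs:
        for user in d["user"]:
            totals[d["group"]] += user["score"]
    return dict(totals)


print(json.dumps(query, indent=2))
print(sum_scores_by_group([doc]))   # prints {'fans': 190}
```

reverse_nested is the tool for the opposite direction: when the buckets are built on a nested field and a field of the parent document is needed inside them.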

https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html

[elasticsearch] Understanding fielddata and doc_values.

2020. 1. 11. 08:02

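Elasticsearch has two structures called fielddata and doc_values; they are core concepts worth understanding.
They are fundamentally Lucene concepts, so it helps to read up on stored fields and doc values in Lucene.

When mapping data in Elasticsearch, there are the keyword type and the text type.
The keyword type is used for exact matching, and the text type for analyzed matching.
A text field is split into multiple terms by an analyzer (for example, morphological analysis) and then indexed into the inverted index,
while a keyword field is indexed into the inverted index as-is.
* An inverted index stores which documents contain each keyword.

Search asks "which documents contain this keyword?", so the inverted index makes it fast.
Patterns such as sorting, aggregation, and accessing field values, however, ask "what is the value of this field in this document?", so they need a data structure keyed by document that holds the field values, not the inverted index. That data structure is fielddata.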

key	value
doc1	a:1, b:4, c:7
doc2	a:2, b:5, c:8
doc3	a:3, b:6, c:9
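
The difference between the two access patterns can be sketched in a few lines of Python. The documents and terms below are made up for illustration, and plain dicts stand in for the real index structures; the sketch only shows that the document-keyed view is the inverted index turned around.

```python
from collections import defaultdict

# Hypothetical documents: each holds the terms of one analyzed field.
docs = {
    "doc1": ["quick", "brown", "fox"],
    "doc2": ["quick", "dog"],
    "doc3": ["brown", "dog"],
}

# Inverted index: term -> documents containing it (what search needs).
inverted = defaultdict(list)
for doc_id, terms in docs.items():
    for term in terms:
        inverted[term].append(doc_id)

# fielddata-style structure: document -> terms, rebuilt by "uninverting" the index.
uninverted = defaultdict(list)
for term, doc_ids in inverted.items():
    for doc_id in doc_ids:
        uninverted[doc_id].append(term)

print(inverted["quick"])            # which documents contain "quick"?
print(sorted(uninverted["doc3"]))   # which terms does doc3 hold?
```

Building `uninverted` means walking the whole inverted index and keeping the result in memory, which gives a feel for why this kind of structure is costly.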

fielddata, however, is an in-memory structure and consumes a lot of heap memory. Once a field is loaded onto the heap, it stays there for the lifetime of the segment, so it is an expensive process.

fielddata is available on text fields, but because of that cost it is disabled by default.
If you need it, set fielddata=true on the field, and watch memory usage.

For keyword fields, doc_values are available: an on-disk data structure that improves on fielddata's in-memory design.
doc_values store data in a column-oriented fashion, as below, which makes sorting, aggregation, and similar operations more efficient.

key	doc1	doc2	doc3
a	1	2	3
b	4	5	6
c	7	8	9
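
The two tables hold the same values in different layouts. A short Python sketch of the transposition, using the table values above (dicts and lists are only stand-ins for the real on-disk format):

```python
# Row-oriented view (document -> field values), as in the first table.
rows = {
    "doc1": {"a": 1, "b": 4, "c": 7},
    "doc2": {"a": 2, "b": 5, "c": 8},
    "doc3": {"a": 3, "b": 6, "c": 9},
}

# Column-oriented view (field -> one value per document), as in the second table.
doc_ids = sorted(rows)
columns = {
    field: [rows[doc_id][field] for doc_id in doc_ids]
    for field in ("a", "b", "c")
}

# Sorting or aggregating on one field touches a single contiguous column.
print(columns["b"], sum(columns["b"]))   # prints [4, 5, 6] 15
```

With the column layout, an aggregation on field b reads only the b column and skips a and c entirely.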

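So keyword and text differ not only in whether they are analyzed but also in whether data structures such as fielddata and doc_values are used, which makes appropriate data mapping and option settings important.

* The explanation above is compiled from the helpful es guide plus separate reading on Lucene.
* Reading the es guide closely and then looking up the corresponding Lucene material seems to make it easier to understand.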

https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html

Try Storybook when developing Web UI

2019. 12. 20. 22:28

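When UI code is always wired into the internal logic, it becomes hard to maintain. Storybook isolates the UI so it can be built on its own. Create the UI components, then use them, and you're done!

The workflow: install it with yarn, run your own Storybook locally, and create and manage the UI component by component. When Storybook first starts, it opens in the browser.

If you build several UIs per component, they are organized as stories. A story holds the code you wrote, and if pressing a button triggers an action, it shows up in the Actions panel.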

https://storybook.js.org/docs/basics/introduction/

To learn Storybook, follow the tutorial below.

https://www.learnstorybook.com/

Airbnb also builds and runs its own Storybook, as follows.

https://airbnb.io/react-dates/?path=/story/drp-calendar-props--open-up

Building the UI component by component like this seems like a good way to reduce dependencies within a project!

더블리의 12층