
How Leading Companies Structure Their Data Analytics Platforms (Glossary Roundup)


This is a glossary for reading the material below on the data platforms and analytics of three companies: Gunosy, @cosme (istyle), and ZOZO.

データ分析基盤Developers Night #3〜データを活かす基盤の作り方〜 (Data Analytics Platform Developers Night #3: How to Build a Platform That Puts Data to Work)
https://techplay.jp/eventreport/740541?fbclid=IwAR1hclqLTvdMwfdUFbBIlfYqeLjV2szmJHsUcSikkkfF115dLDyYywZ3bxQ

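The days when the server side meant Apache, MySQL, and CGI feel very far away now. Big data has made things far more complex, which is what makes it fun. I plan to reorganize these terms into categories later. For now, this is a beta-style list of notes.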

ETL
https://www.ashisuto.co.jp/eai_blog/article/201811_etl.html
https://ja.wikipedia.org/wiki/Extract/Transform/Load
https://boxil.jp/mag/a2415/
https://it-koala.com/etl-1220

Digdag
https://tech.griphone.co.jp/2018/12/09/advent-calendar-20181209/
https://dev.classmethod.jp/server-side/understanding-digdag-architecture-and-concepts/
https://qiita.com/hiroysato/items/d0fe5e2d88c267413a82

Embulk
https://speakerdeck.com/civitaspo/embulknizu-rinai5tufalsekoto
https://techlog.voyagegroup.com/entry/2017/07/31/173839
https://qiita.com/hiroysato/items/397f36c4838a0a93e352
https://qiita.com/tashiro_gaku/items/f7fa0f1a99c759d947a7

Vertica
https://www.ashisuto.co.jp/product/category/database/vertica/
http://vertica-tech.ashisuto.co.jp/about-vertica/
https://gihyo.jp/dev/serial/01/database-vertica/0001

Amazon Athena
https://aws.amazon.com/jp/athena/
https://docs.aws.amazon.com/ja_jp/athena/latest/ug/what-is.html
https://qiita.com/miyasakura_/items/174dc73f706e8951dbdd

Redis
https://aws.amazon.com/jp/redis/
https://qiita.com/gold-kou/items/966d9a0332f4e110c4f8
https://future-architect.github.io/articles/20190821/
https://ja.wikipedia.org/wiki/Redis

Sidekiq
https://qiita.com/nysalor/items/94ecd53c2141d1c27d1f
https://system.blog.uuum.jp/entry/2017/10/17/110000
https://qiita.com/zaru/items/8385fdddbd1be25fe370
http://shiro-16.hatenablog.com/entry/2015/10/12/192412
https://re-engines.com/2017/12/20/rails%E3%81%A7sidekiq%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E3%81%BF%E3%81%9F/

DRE
https://gunosiru.gunosy.co.jp/entry/gunosytechlab_dre
https://devops.cioapplicationseurope.com/cxoinsights/data-reliability-engineering-tackling-the-data-quality-problem-nid-1485.html
https://tech.mercari.com/entry/2015/11/18/153421

ETL Workflow
https://www.abhishek-tiwari.com/etl-workflow-modeling/
https://support.treasuredata.com/hc/en-us/sections/360000316848-ETL-and-Workflow-Tools
https://dev.classmethod.jp/cloud/aws/20190621-aws-glue-workflow/#ct-1

Apache Airflow
https://airflow.apache.org/
https://dev.classmethod.jp/tool/airflow-getting-started/
https://dev.classmethod.jp/server-side/what-is-airflow/
https://qiita.com/chan-p/items/526bbed95fdc73142c59
https://www.slideshare.net/takemikami/apache-airflow-73211709
https://medium.com/programming-soda/apache-airflow%E3%81%A7%E3%82%A8%E3%83%B3%E3%83%89%E3%83%A6%E3%83%BC%E3%82%B6%E3%83%BC%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%91%E3%82%A4%E3%83%97%E3%83%A9%E3%82%A4%E3%83%B3%E3%82%92%E6%A7%8B%E7%AF%89%E3%81%99%E3%82%8B-part2-1bc31fd872c8

Amazon EMR
https://aws.amazon.com/jp/emr/
https://docs.aws.amazon.com/ja_jp/emr/latest/ManagementGuide/emr-what-is-emr.html

Amazon EC2 Spot Fleet
https://dev.classmethod.jp/cloud/aws/auto-scaling-for-ec2-spot-fleets/
https://docs.aws.amazon.com/ja_jp/AWSEC2/latest/UserGuide/spot-fleet.html
http://www.mpon.me/entry/2018/01/24/224609
https://qiita.com/f96q/items/28d3c2dd7ad55bf06747

CTAS (CREATE TABLE AS SELECT) in Amazon Athena
https://dev.classmethod.jp/cloud/aws/amazon-athena-support-ctas/
https://docs.aws.amazon.com/ja_jp/athena/latest/ug/ctas.html
https://docs.aws.amazon.com/ja_jp/athena/latest/ug/ctas-examples.html

Apache Kafka
https://www.redhat.com/ja/topics/integration/what-is-apache-kafka
https://kafka.apache.org/
https://qiita.com/sigmalist/items/5a26ab519cbdf1e07af3
https://www.slideshare.net/hadoopxnttdata/hadoop-spark-conference-japan-2019nttdata

Apache Spark
https://qiita.com/Hiroki11x/items/4f5129094da4c91955bc
https://www.slideshare.net/hadoopxnttdata/apache-spark-for-beginners-ntt-data-saruta-spark-conference-japan-2016
https://www.slideshare.net/hadoopxnttdata/apache-spark-nttdata-devsummit2016
https://www.atmarkit.co.jp/ait/articles/1608/24/news014.html
https://spark.apache.org/

PySpark
https://speakerdeck.com/chie8842/pythondeda-liang-detachu-li-pysparkwoyong-itadetachu-li-tofen-xi-falsekihon
https://www.slideshare.net/iktakahiro/pyspark1
https://qiita.com/mkyz08/items/0c1d8fa47179933c3a56
https://ohke.hateblo.jp/entry/2018/09/15/230000
https://dev.classmethod.jp/tool/pyspark-with-jupyter-on-mac/

Elasticsearch
https://aws.amazon.com/jp/elasticsearch-service/the-elk-stack/what-is-elasticsearch/
https://www.elastic.co/jp/what-is/elasticsearch
https://qiita.com/nskydiving/items/1c2dc4e0b9c98d164329
https://ja.wikipedia.org/wiki/Elasticsearch
https://dev.classmethod.jp/etc/es-01/

HBase
https://ja.wikipedia.org/wiki/Apache_HBase
https://www.publickey1.jp/blog/10/hbasenosql.html
https://thinkit.co.jp/article/11882

Hive
https://ja.wikipedia.org/wiki/Apache_Hive
https://www.idcf.jp/words/hive.html
https://www.gixo.jp/blog/12453/

Presto
https://qiita.com/haramiso/items/122d4ea0e5660e0b4e41
https://tug.red/entry/2014/07/10/150250/
https://www.idcf.jp/words/presto.html

Fluentd
https://www.fluentd.org/
https://knowledge.sakura.ad.jp/1336/
https://qiita.com/ritorut18/items/4230ec6b524be15ede01
https://dev.classmethod.jp/cloud/aws/fluentd-settings-with-some-os-logs/
https://abicky.net/2017/10/23/110103/

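Yarn (the links below cover the JavaScript package manager, which is a separate tool from Apache Hadoop YARN)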
https://yarnpkg.com/lang/ja/
https://qiita.com/lelouch99v/items/c97ff951ca31298f3f24
https://www.webprofessional.jp/yarn-vs-npm/
https://qiita.com/jigengineer/items/c75ca9b8f0e9ce462e99

Data formats (columnar formats)
https://engineer.retty.me/entry/columnar-storage-format
https://speakerdeck.com/chie8842/karamunahuomatutofalsekihon-2
https://www.publickey1.jp/blog/11/post_175.html
https://techblog.yahoo.co.jp/entry/20190924753251/
https://ja.wikipedia.org/wiki/%E5%88%97%E6%8C%87%E5%90%91%E3%83%87%E3%83%BC%E3%82%BF%E3%83%99%E3%83%BC%E3%82%B9%E7%AE%A1%E7%90%86%E3%82%B7%E3%82%B9%E3%83%86%E3%83%A0
https://qiita.com/koijima/items/0eed4272c97af4198886

Apache Avro
https://tech.mercari.com/entry/2019/05/20/115839
https://bufferings.hatenablog.com/entry/2017/06/25/142430

ORC
https://orc.apache.org/
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc?hl=ja
https://qiita.com/n_Sekiguchi/items/05e67d9a06d1fd3258c6
https://docs.aws.amazon.com/ja_jp/athena/latest/ug/columnar-storage.html

Parquet
https://dev.classmethod.jp/cloud/aws/amazon-athena-using-parquet/
https://qiita.com/TaigoKuriyama/items/cedcc9436f4456191601

Hadoop HDFS
https://oss.nttdata.com/hadoop/hadoop.html
http://e-words.jp/w/HDFS.html
https://dev.classmethod.jp/hadoop/hadoop-advent-calendar-03-hdfs/
https://www.slideshare.net/techblogyahoo/hdfs-136527062
https://blog.cloudera.co.jp/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges-2e0ce893723

Kafka Connect
https://www.clear-code.com/blog/2018/2/26.html
https://www.slideshare.net/keigosuda/apache-kafka-kafka-connect-etl-70167024

Looker
https://www.ksk-anl.com/product/looker/
https://dev.classmethod.jp/business/business-analytics/looker-overview/
https://enterprisezine.jp/dbonline/detail/12264

Arm Treasure Data
https://www.treasuredata.co.jp/
https://www.treasuredata.co.jp/learn/why-treasure-data/
https://jp.techcrunch.com/2018/08/03/arm-treasure-data/

Power BI
https://powerbi.microsoft.com/ja-jp/
https://powerbi.microsoft.com/ja-jp/what-is-power-bi/
https://www.cloudtimes.jp/dynamics365/blog/what-is-powerbi.html

Facebook Prophet
https://qiita.com/japanesebonobo/items/a7309bc07c59c11bee7b
https://www.kazukiio.com/entry/2018/08/23/134214
https://haltaro.github.io/2018/07/22/summer-prophet
https://data.gunosy.io/entry/change-point-detection-prophet

Matrix Factorization
https://qiita.com/ysekky/items/c81ff24da0390a74fc6c
https://qiita.com/michi_wkwk/items/52660778ad6a900965ee
https://ameblo.jp/principia-ca/entry-10980281840.html
https://medium.com/@catindog/pytorch%E3%81%A7%E3%82%88%E3%82%8A%E6%B7%B1%E3%81%84matrix-factorization-9eda184a7f66

Redash (formerly Re:dash)
https://redash.io/
https://seleck.cc/614
https://cloudpack.media/38436



Thank you for reading to the end.