Sending a large amount of Box data to GCS

Background

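I have a lot of data in Box and want to send it to Google Cloud Storage on Google Cloud Platform (hereafter GCP). Each file is fairly large for plain text, ranging from tens to hundreds of MB, and there are likely hundreds to thousands of them. Transfers happen monthly at most.

Downloading the Box data to my machine and then uploading it again would both be a pain, and while looking for an alternative I came across rclone. With a bigger budget I would have liked to use trocco, but it is rather pricey, so I gave up on that.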

Conclusion

With rclone, running the following command transfers a file without consuming local storage.

$ rclone copy box:org/file.txt gcs:dst/

This is nice. The rest of this post describes the setup process.
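Since the background mentions hundreds to thousands of files, the same command can point at a folder instead of a single file. The following is a minimal sketch, not something I ran for this post: the helper name copy_folder and the paths box:org/exports and gcs:dst/exports are hypothetical, while --dry-run and --progress are standard rclone flags. The RCLONE variable is only there so the binary can be swapped out.

```shell
#!/bin/sh
# Binary to call; override RCLONE to point at a different rclone build.
RCLONE=${RCLONE:-rclone}

# copy_folder SRC DST [run]
# Hypothetical helper. Without "run" it only previews the transfer
# (--dry-run copies nothing); with "run" it performs the copy and
# shows live progress.
copy_folder() {
  src=$1
  dst=$2
  if [ "${3:-}" = "run" ]; then
    "$RCLONE" copy --progress "$src" "$dst"
  else
    "$RCLONE" copy --dry-run "$src" "$dst"
  fi
}

# Example (paths are placeholders):
#   copy_folder box:org/exports gcs:dst/exports       # preview
#   copy_folder box:org/exports gcs:dst/exports run   # real transfer
```

Previewing first seems worthwhile when the source holds thousands of files, because rclone copy on a folder copies everything under it.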

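Setup

Installation

I really wanted to run rclone in a container, but I could not get the browser-based authorization with the cloud providers to work from there, so I gave up and installed it with Homebrew.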

$ brew install rclone

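Box configuration

The key points are:

  • I set name to box. This becomes the prefix used when actually running commands

  • For the storage type, choose Box, of course

  • A browser opens and asks for authorization, so approve it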

$ rclone config

No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Enter name for new remote.
name> box

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
(omitted)
8 / Box
\ (box)
(omitted)
Storage> box

Option client_id.
OAuth Client Id.
Leave blank normally.
Enter a value. Press Enter to leave empty.
client_id>

Option client_secret.
OAuth Client Secret.
Leave blank normally.
Enter a value. Press Enter to leave empty.
client_secret>

Option box_config_file.
Box App config.json location
Leave blank normally.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a value. Press Enter to leave empty.
box_config_file>

Option access_token.
Box App Primary Access Token
Leave blank normally.
Enter a value. Press Enter to leave empty.
access_token>

Option box_sub_type.
Choose a number from below, or type in your own string value.
Press Enter for the default (user).
1 / Rclone should act on behalf of a user.
\ (user)
2 / Rclone should act on behalf of a service account.
\ (enterprise)
box_sub_type>

Edit advanced config?
y) Yes
n) No (default)
y/n>

Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine

y) Yes (default)
n) No
y/n>

2022/08/12 14:51:34 NOTICE: If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=xxxxxxxxxxxx
2022/08/12 14:51:34 NOTICE: Log in and authorize rclone for access
2022/08/12 14:51:34 NOTICE: Waiting for code...
2022/08/12 14:58:36 NOTICE: Got code
Configuration complete.
Options:
- type: box
- token: {"access_token":"XXXxxxYYYyyy","token_type":"bearer","refresh_token":"AAAaaaBBBbbb","expiry":"2022-08-12T16:01:25.35476+09:00"}
Keep this "box" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d>

Current remotes:

Name Type
==== ====
box box

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
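Once the prompt exits, the new remote can be checked from the shell. rclone listremotes prints one remote per line with a trailing colon (for example box:), and rclone lsd box: lists the top-level folders. The helper below is a hypothetical sketch that reads that listing on stdin and checks for an exact name match.

```shell
#!/bin/sh
# has_remote NAME
# Hypothetical helper: reads `rclone listremotes` output on stdin and
# succeeds only if a line is exactly "NAME:".
# -F = literal string, -x = whole-line match, -q = no output.
has_remote() {
  grep -qxF "$1:"
}

# Example:
#   rclone listremotes | has_remote box && rclone lsd box:
```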

GCS configuration

The key points are:

  • I set name to gcs

  • The project number is shown on the project's top page in the GCP console

  • Set the region to match the bucket's region. Mine is Oregon, for various reasons

  • I set bucket_policy_only to true. My GCP skills are lacking.

$ rclone config
Current remotes:

Name Type
==== ====
box box

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n

Enter name for new remote.
name> gcs

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
(omitted)
17 / Google Cloud Storage (this is not Google Drive)
\ (google cloud storage)
(omitted)
Storage> 17

Option client_id.
OAuth Client Id.
Leave blank normally.
Enter a value. Press Enter to leave empty.
client_id>

Option client_secret.
OAuth Client Secret.
Leave blank normally.
Enter a value. Press Enter to leave empty.
client_secret>

Option project_number.
Project number.
Optional - needed only for list/create/delete buckets - see your developer console.
Enter a value. Press Enter to leave empty.
project_number> 123456789012

Option service_account_file.
Service Account Credentials JSON file path.
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a value. Press Enter to leave empty.
service_account_file>

Option anonymous.
Access public buckets and objects without credentials.
Set to 'true' if you just want to download files and don't configure credentials.
Enter a boolean value (true or false). Press Enter for the default (false).
anonymous>

Option object_acl.
Access Control List for new objects.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
/ Object owner gets OWNER access.
1 | All Authenticated Users get READER access.
\ (authenticatedRead)
/ Object owner gets OWNER access.
2 | Project team owners get OWNER access.
\ (bucketOwnerFullControl)
/ Object owner gets OWNER access.
3 | Project team owners get READER access.
\ (bucketOwnerRead)
/ Object owner gets OWNER access.
4 | Default if left blank.
\ (private)
/ Object owner gets OWNER access.
5 | Project team members get access according to their roles.
\ (projectPrivate)
/ Object owner gets OWNER access.
6 | All Users get READER access.
\ (publicRead)
object_acl>

Option bucket_acl.
Access Control List for new buckets.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
/ Project team owners get OWNER access.
1 | All Authenticated Users get READER access.
\ (authenticatedRead)
/ Project team owners get OWNER access.
2 | Default if left blank.
\ (private)
3 / Project team members get access according to their roles.
\ (projectPrivate)
/ Project team owners get OWNER access.
4 | All Users get READER access.
\ (publicRead)
/ Project team owners get OWNER access.
5 | All Users get WRITER access.
\ (publicReadWrite)
bucket_acl>

Option bucket_policy_only.
Access checks should use bucket-level IAM policies.
If you want to upload objects to a bucket with Bucket Policy Only set
then you will need to set this.
When it is set, rclone:
- ignores ACLs set on buckets
- ignores ACLs set on objects
- creates buckets with Bucket Policy Only set
Docs: https://cloud.google.com/storage/docs/bucket-policy-only
Enter a boolean value (true or false). Press Enter for the default (false).
bucket_policy_only> true

Option location.
Location for the newly created buckets.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
(omitted)
26 / Oregon
\ (us-west1)
(omitted)
location> 26

Option storage_class.
The storage class to use when storing objects in Google Cloud Storage.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
1 / Default
\ ()
2 / Multi-regional storage class
\ (MULTI_REGIONAL)
3 / Regional storage class
\ (REGIONAL)
4 / Nearline storage class
\ (NEARLINE)
5 / Coldline storage class
\ (COLDLINE)
6 / Archive storage class
\ (ARCHIVE)
7 / Durable reduced availability storage class
\ (DURABLE_REDUCED_AVAILABILITY)
storage_class>

Edit advanced config?
y) Yes
n) No (default)
y/n>

Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine

y) Yes (default)
n) No
y/n>

2022/08/12 15:27:52 NOTICE: If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=xxxxxxxxxxxx
2022/08/12 15:27:52 NOTICE: Log in and authorize rclone for access
2022/08/12 15:27:52 NOTICE: Waiting for code...
2022/08/12 15:27:58 NOTICE: Got code
Configuration complete.
Options:
- type: google cloud storage
- project_number: 123456789012
- location: us-west1
- token: {"access_token":"xxxXXXyyyYYYY","token_type":"Bearer","refresh_token":"AAAaaaBBBbbb","expiry":"2022-08-12T16:27:57.877108+09:00"}
Keep this "gcs" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d>

Current remotes:

Name Type
==== ====
box box
gcs google cloud storage

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
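With both remotes configured, a transfer can be followed by a comparison of source and destination. The sketch below wraps rclone check with --one-way, which only verifies that the source files exist on the destination. The helper name verify_copy and the paths are hypothetical. I have not verified which hash types Box and GCS have in common, so the comparison may end up being based on file size alone.

```shell
#!/bin/sh
# Binary to call; override RCLONE to point at a different rclone build.
RCLONE=${RCLONE:-rclone}

# verify_copy SRC DST
# Hypothetical helper: checks that every file under SRC also exists
# under DST and matches. --one-way ignores extra files that exist
# only on the destination.
verify_copy() {
  "$RCLONE" check --one-way "$1" "$2"
}

# Example (paths are placeholders):
#   verify_copy box:org/exports gcs:dst/exports
```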
