Minio¶
Minio is an object store very similar to AWS S3. It makes your data remotely accessible via the S3 protocol.
Python API¶
Steps to access data using the Python Minio client API:
- Log into Minio. If you get an error "Expecting a policy to be set for user X or one of their groups", contact Pasha, Arfath or Kohli, Armaan with the 'X' string and the names of the buckets you need, so we can give you access to these buckets.
- Create a "Service Account" with a simple access key and the recommended secret key. Copy the secret key to the clipboard on creation.
- Store your access key and secret key in a python-dotenv file.
Create .env¶
- Change permissions to the dotenv file so only you can read and write to it.
Permissions¶
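The two steps above (storing the keys and restricting the file permissions) can be sketched in Python as follows. This is a minimal sketch, not the site's official procedure: `write_env_file` is a hypothetical helper, and the variable names `ACCESS_KEY` and `SECRET_KEY` are assumed to match the names read by the client code later on this page. The resulting mode is the same as running chmod 600 on the file.

```python
import os
import stat
from pathlib import Path


def write_env_file(path, access_key, secret_key):
    """Write both keys to a dotenv file that only the owner can read and write."""
    path = Path(path)
    # Create the file with mode 0o600 so the secret is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"ACCESS_KEY={access_key}\n")
        fh.write(f"SECRET_KEY={secret_key}\n")
    # A pre-existing file keeps its old mode, so reset it explicitly.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

Calling `write_env_file(".env", "<ACCESS_KEY>", "<SECRET_KEY>")` with your own values creates the file in the current directory, which is where the client code on this page looks for it.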
- Install the required packages: pip install python-dotenv minio
- Get the full SSL certificate chain for the Minio instance.
SSL Cert¶
URL | HOST | PORT |
---|---|---|
https://tllihpcmind6/minio | tllihpcmind6 | 9000 |
https://pllimsksparky3/minio/large | pllimsksparky3 | 9006 |
https://pllimsksparky3/minio/lake | pllimsksparky3 | 9007 |
https://pllimsksparky3/minio/small | pllimsksparky3 | 9008 |
https://pllimsksparky3/minio/user | pllimsksparky3 | 9009 |
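One possible way to collect the certificate chain for a host and port from the table is to wrap the openssl command-line tool from Python. The sketch below assumes openssl is installed and on PATH; `extract_pem_certs` and `fetch_cert_chain` are hypothetical helper names, and the output file name `certificate.crt` is chosen to match the `ca_certs` value used in the client code on this page. Note that `-showcerts` prints only the certificates the server sends, which may not include the root certificate.

```python
import re
import subprocess

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def extract_pem_certs(text):
    """Return every PEM certificate block found in text, in order."""
    return PEM_RE.findall(text)


def fetch_cert_chain(host, port, out_path="certificate.crt", timeout=30):
    """Save the certificates presented by host:port to out_path; return their count."""
    result = subprocess.run(
        ["openssl", "s_client", "-showcerts", "-connect", f"{host}:{port}"],
        input="",  # empty stdin so s_client exits after the handshake
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    certs = extract_pem_certs(result.stdout)
    if not certs:
        raise RuntimeError(
            f"no certificates returned by {host}:{port}: {result.stderr.strip()}"
        )
    with open(out_path, "w") as fh:
        fh.write("\n".join(certs) + "\n")
    return len(certs)
```

For the first row of the table, the call would be `fetch_cert_chain("tllihpcmind6", 9000)`.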
- Set up Python boilerplate code for creating a Minio client object.
```python
from minio import Minio
import urllib3
import os
from dotenv import load_dotenv
from pathlib import Path

env_path = Path('.')/'.env'
load_dotenv(dotenv_path=env_path)

ACCESS_KEY = os.getenv('ACCESS_KEY')
SECRET_KEY = os.getenv('SECRET_KEY')

# required for self-signed certs
httpClient = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs='certificate.crt'
)

# Create secure client with access key and secret key
client = Minio(
    "tllihpcmind6:9000",
    access_key=ACCESS_KEY,
    secret_key=SECRET_KEY,
    secure=True,
    http_client=httpClient
)

# list objects in a bucket
for ii in client.list_objects("test"):
    print(ii.__dict__)
```
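`os.getenv` returns `None` when a variable is missing, so a typo in the dotenv file would only surface later as an authentication failure. A small guard like the one below makes that failure explicit. This is an optional sketch, and `require_env` is a hypothetical helper, not part of python-dotenv or the Minio client.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check that the .env file was loaded")
    return value
```

It could replace the two `os.getenv` calls, for example `ACCESS_KEY = require_env('ACCESS_KEY')`.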
- Try other commands like get_object and put_object from the Minio API.
```python
import pandas as pd
from io import BytesIO

# Replace the "<...>" placeholder strings with your bucket name and object paths.

# read a CSV object into a DataFrame
obj = client.get_object("<BUCKET>", "<CSV_FILE_PATH>")
df = pd.read_csv(obj)
df

# for parquet files
obj = client.get_object("<BUCKET>", "<PARQUET_FILE_PATH>")
df = pd.read_parquet(BytesIO(obj.data))

# write a DataFrame back to Minio as parquet
pq_obj = df.to_parquet()
client.put_object("<BUCKET>", "<PARQUET_FILE_PATH>", data=BytesIO(pq_obj), length=len(pq_obj))
```
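The response returned by `get_object` holds an open HTTP connection, and the Minio Python client documentation recommends closing it and releasing the connection after reading. A minimal sketch of that pattern follows; `read_object_bytes` is a hypothetical helper name, and it assumes the object is small enough to fit in memory.

```python
def read_object_bytes(client, bucket, object_name):
    """Download one object fully into memory and release its HTTP connection."""
    response = client.get_object(bucket, object_name)
    try:
        return response.read()
    finally:
        # Return the connection to the urllib3 pool even if read() fails.
        response.close()
        response.release_conn()
```

The returned bytes can be wrapped in `BytesIO` and passed to `pd.read_csv` or `pd.read_parquet` in the same way as in the examples above.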
Access via R¶
Data stored in minio can be accessed in R in the following fashion:
Troubleshooting¶
- If you get a certificate error while making a Minio client (mc) call, add '--insecure' to the call to disable certificate verification. The connection is still encrypted; only the step that verifies the certificate against a certificate authority is skipped. This is sometimes necessary for self-signed certificates. Our certificate is a self-signed certificate issued by MSK Open Systems.
- mc: Unable to initialize new alias from the provided credentials. Get "https://pllimsksparky3:9006": dial tcp: lookup pllimsksparky3 on 140.163.135.19:53: server misbehaving.
    - Solution: Ping the server to get its IP address and add it to your '/etc/hosts' file as '10.254.130.16 pllimsksparky3'.
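To check whether a host name resolves from your machine, for instance after editing '/etc/hosts', a quick lookup from Python can help. This is a small diagnostic sketch; `resolve_host` is a hypothetical helper, and it relies on the system resolver, which on most Linux setups consults '/etc/hosts' before DNS.

```python
import socket


def resolve_host(host):
    """Return the first IPv4 address that host resolves to, or None if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    except socket.gaierror:
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return infos[0][4][0]
```

If `resolve_host("pllimsksparky3")` returns `None`, the name is still not resolvable and the mc alias command will keep failing with the same lookup error.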