Leveraging Kubernetes Data-Oriented Projects with Portworx

Published June 25, 2024 2 Contributors

Other contributors: Adam Overa

Originally authored by Cameron Laird



Managing data at scale is crucial for deriving actionable insights, and an effective data platform makes those insights possible. A data platform is the technology infrastructure used for the collection, storage, transaction processing, and analysis of varied data at scale. It simplifies engineering tasks such as expanding the storage available to an application or encrypting project secrets.

Portworx handles advanced storage and data management capabilities for cloud-native environments. This guide provides step-by-step instructions for installing Portworx on an existing Kubernetes cluster. It then walks through setting up a model project to demonstrate Portworx’s capabilities.

What Is Portworx?

Portworx enables deployment and management of storage and data services specifically in containerized environments. It also handles data replication, snapshots, backups, and data recovery, allowing application systems to focus on their own specific requirements. Since Portworx itself is cloud-native, it plays a crucial role in helping other systems maximize the capabilities of the cloud.

Current cloud computing practices face several challenges, particularly the difficulty of managing Kubernetes instances in the real world. Portworx mitigates some of these challenges.

A limited version of the Portworx Storage Platform is available for free. It allows for an implementation of object storage for a single distributed cluster. This guide focuses on the free, downloadable software that you can install and run for your own educational and small-scale uses.

How Portworx Relates to Kubernetes, Kafka, and Cassandra

Portworx integrates with widely known software systems such as Kubernetes, Kafka, and Cassandra:

Kubernetes serves as the foundation of most Portworx implementations. However, Portworx is also compatible with other container orchestration systems.
Cassandra is an open source distributed database management system that emphasizes economical operation, high availability, and wide-column semantics. Portworx addresses several of the challenges involved in configuring and operating Cassandra. For example, when running Cassandra in containers managed by Kubernetes, Portworx can effectively control memory, resource quotas, and/or CPU cores per Kubernetes cluster.
Kafka is a widely used open source distributed event store and stream-processing platform. In much the same way a traditional database system manages records of data, Kafka manages events. For Kafka to perform optimally, it needs a high-performance underlying storage system, and Portworx is a good choice. Teams and individuals often initially adopt Portworx to meet requirements for hosting or upgrading Kafka. Portworx also offers white papers specifically on the operation of Kafka in a Kubernetes environment.

Before You Begin

Create a Kubernetes cluster that meets the Portworx installation prerequisites. A Shared CPU, Linode 8 GB plan is suitable. You must have kubectl configured on your local machine to interact with the cluster. See our Getting Started with Kubernetes guide for instructions. Also, take note of the Kubernetes version running on your cluster, as it is needed later.
The Portworx installation prerequisites also include a backing drive (i.e. Volume) for each of three nodes, which must be at least 8 GB. Follow our Getting Started with Block Storage guide to create and attach a 10 GB Volume to each node. Creating volumes via the Storage tab of the individual Kubernetes instances is more efficient than via Volumes, as it creates and attaches in one step.
Sign up for a personal account on Portworx Central.
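
The prerequisites above can be spot-checked from the command line. The following is a minimal sketch, not an official Portworx check: the function names count_ready_nodes and volume_is_large_enough are hypothetical helpers, and the 8 GB threshold is the minimum stated in this guide.

```shell
#!/bin/sh
# Count nodes whose STATUS column is exactly "Ready".
# Reads the output of `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Succeed when a Volume size (whole GB) meets the 8 GB minimum
# named in the prerequisites above.
volume_is_large_enough() {
  [ "$1" -ge 8 ]
}

# Usage against a live cluster (requires a configured kubectl):
#   kubectl get nodes --no-headers | count_ready_nodes
#   volume_is_large_enough 10 && echo "10 GB Volume is sufficient"
```

A result of three or more from count_ready_nodes suggests the cluster has enough nodes for the three backing drives described above.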

Note

This guide is written for a non-root user. Commands that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, see the Users and Groups guide.

Portworx Installation

To install Portworx, use the basic installation model on an existing Kubernetes cluster. This can be any existing Kubernetes cluster, whether using Linode Kubernetes Engine or a manually constructed setup. You can also use kind for an installation purely within your desktop development environment.

Portworx is not an open source system, though it supports many individual open source components, and some of its licenses involve no fee. However, installation is generally done through the Portworx website and not via standard command-line package managers such as apt or brew.

Follow the steps in the sections below to install Portworx on an existing Kubernetes cluster.

The Wizard

Open a web browser and log in to Portworx Central.
Select Get Started from the Welcome to Portworx section of the Portworx Central home page:
Choose the Portworx Essentials/Portworx CSI fee-free license for demonstration or proof-of-concept workloads:
Choose DAS/SAN as Platform and None for Distribution Name. Retain portworx as the default Namespace, but change the K8s Version to match the Kubernetes version of your cluster (e.g. 1.30.2).
Note
Use the following command to check your version of Kubernetes:
kubectl version
Select Save Spec to generate kubectl commands for Operator and StorageCluster, which reflect the specifications chosen for the Portworx installation. Copy the kubectl commands for use in the next section.
To save this configuration, fill in Spec Name and Spec Tags then click Save Spec again.
Your generated spec manifest is now available in the Spec List section of Portworx Central. You can download it at any time by clicking the three vertical dots under Actions and choosing Download.
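
When filling in the K8s Version field, it can help to extract just the server version number rather than reading it from the full kubectl version output. The sketch below assumes jq is installed alongside kubectl. The function names are hypothetical, and distribution-specific suffixes in the version string (if any) are left untouched.

```shell
#!/bin/sh
# Strip the leading "v" from a gitVersion string such as "v1.30.2".
normalize_k8s_version() {
  printf '%s\n' "${1#v}"
}

# Print the cluster's server version in the form the wizard expects.
# Assumes kubectl is configured for the cluster and jq is installed.
server_k8s_version() {
  normalize_k8s_version "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
}

# Usage against a live cluster:
#   server_k8s_version
```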

Deployment

Use the first kubectl command generated in the previous section to deploy the Operator specification. The command structure should follow that of the example command below, with PORTWORX_VERSION_NUMBER and KUBERNETES_VERSION_NUMBER matching your respective Portworx and Kubernetes versions:

kubectl apply -f 'https://install.portworx.com/PORTWORX_VERSION_NUMBER?comp=pxoperator&kbver=KUBERNETES_VERSION_NUMBER&ns=portworx'

Sample output:

namespace/portworx created
serviceaccount/portworx-operator created
clusterrole.rbac.authorization.k8s.io/portworx-operator created
clusterrolebinding.rbac.authorization.k8s.io/portworx-operator created
deployment.apps/portworx-operator created
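
If you script the deployment, the Operator URL can be assembled from variables instead of edited by hand. This is only a sketch that mirrors the URL structure of the example command above. The function name operator_spec_url is hypothetical, and the commands generated by Portworx Central remain the authoritative source.

```shell
#!/bin/sh
# Build the Operator spec URL from its parts.
#   $1: Portworx version number
#   $2: Kubernetes version number
#   $3: namespace (optional, defaults to "portworx" as in this guide)
operator_spec_url() {
  printf 'https://install.portworx.com/%s?comp=pxoperator&kbver=%s&ns=%s\n' \
    "$1" "$2" "${3:-portworx}"
}

# Usage (substitute your own version numbers for the placeholders):
#   kubectl apply -f "$(operator_spec_url PORTWORX_VERSION_NUMBER KUBERNETES_VERSION_NUMBER)"
```

Quoting the command substitution keeps the shell from interpreting the ampersands in the query string.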

Use the second kubectl command generated in the previous section to deploy the StorageCluster specification. The command structure should resemble the example command below, with PX_USER_ID and PX_CLUSTER_ID being unique to your Portworx Central account:

kubectl apply -f 'https://install.portworx.com/PORTWORX_VERSION_NUMBER?operator=true&mc=false&kbver=KUBERNETES_VERSION_NUMBER&ns=portworx&oem=esse&user=PX_USER_ID&b=true&iop=6&c=px-cluster-PX_CLUSTER_ID&stork=true&csi=true&mon=true&tel=true&st=k8s&promop=true'

Sample output:

storagecluster.core.libopenstorage.org/px-cluster-PX_CLUSTER_ID created
secret/px-essential created

Note

Should you receive any errors, you can use the Generate Spec screen to view up-to-date commands.

Verification

Monitor the status of Portworx nodes with the following command:

kubectl -n portworx get storagenodes -l name=portworx

Once the deployments finish, each Portworx node appears as Online:

NAME                            ID                                     STATUS   VERSION           AGE
lke194968-280433-369bf4810000   f2522d07-0b59-482a-a8ae-bd2854fd7bc4   Online   3.1.2.0-fb52ced   4m46s
lke194968-280433-438a8b610000   b1920afd-5326-48dc-9572-af8e638fd92b   Online   3.1.2.0-fb52ced   4m45s
lke194968-280433-527b95040000   58649b41-b4c9-4c56-b983-e27a61c9f582   Online   3.1.2.0-fb52ced   4m46s

Use the following command to monitor the status of an individual node, replacing NODE_NAME with the NAME of one of the nodes listed in the prior command’s output:
kubectl -n portworx describe storagenode NODE_NAME
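
Rather than re-running the status command by hand, you can poll until every node reports Online. The following is a rough sketch that assumes the column layout shown in the sample output above (STATUS in the third column). The function names, the ten-second interval, and the default of 60 attempts are all illustrative choices.

```shell
#!/bin/sh
# Succeed only if the table on stdin has at least one data row
# and every row's STATUS column (third field) is "Online".
all_nodes_online() {
  awk 'NR > 1 { rows++; if ($3 != "Online") bad++ }
       END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# Poll the cluster until all Portworx nodes are Online.
#   $1: maximum number of attempts (optional, defaults to 60)
wait_for_portworx() {
  attempts="${1:-60}"
  while [ "$attempts" -gt 0 ]; do
    if kubectl -n portworx get storagenodes -l name=portworx 2>/dev/null | all_nodes_online; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}

# Usage against a live cluster:
#   wait_for_portworx && echo "All Portworx nodes are Online"
```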

At this point, your working Kubernetes cluster includes a small Portworx deployment with a permanent fee-free license. You can use your cluster for educational practice, proofs-of-concept, or other demonstrations of Portworx’s capabilities.

Run a Model Portworx Project

Among the examples found in the Portworx documentation is Run Kafka on Kubernetes at Scale with Portworx. While thousands of organizations already deploy Kafka manually, Portworx can enhance the process. Replacing a manual deployment with one mediated by Portworx automates disaster recovery, application-specific high availability, backup services, and capacity management.

Using the Portworx installation from the preceding section, follow the steps below to get started:

Create a specification file named sc-kafka-rf2.yaml:

nano sc-kafka-rf2.yaml

Paste in the following contents, and save your changes:

File: sc-kafka-rf2.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-sc-kafka-repl2
provisioner: kubernetes.io/portworx-volume
allowVolumeExpansion: true
parameters:
  repl: "2"
  priority_io: "high"
  io_profile: "db_remote"
  cow_ondemand: "true"
  disable_io_profile_protection: "1"
  nodiscard: "false"
  group: "kafka-broker-rep2"
  fg: "false"

This storage specification provides several automations, including replication. The repl: "2" parameter maintains two full replicas of broker data (i.e. Kafka’s content) across the failure domains of the hosting Kubernetes cluster. This ensures that Kafka continues without downtime should a node fail.
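
Replication has a capacity cost: each replica is a full copy, so a volume with repl: "2" consumes roughly twice its size from the storage pool. The arithmetic below is a back-of-the-envelope sketch that ignores metadata and any other overhead. The function name usable_capacity_gb is hypothetical, and the inputs are the three 10 GB Volumes from the Before You Begin section.

```shell
#!/bin/sh
# Approximate usable capacity in GB: total raw capacity divided by the
# replication factor (integer division, overhead ignored).
#   $1: number of nodes
#   $2: size of each backing Volume in GB
#   $3: replication factor
usable_capacity_gb() {
  echo $(( $1 * $2 / $3 ))
}

# Three 10 GB Volumes at a replication factor of 2.
usable_capacity_gb 3 10 2   # prints 15
```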

Use the following command to apply the StorageClass:

kubectl apply -f sc-kafka-rf2.yaml

Sample output:

storageclass.storage.k8s.io/px-sc-kafka-repl2 created
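
A StorageClass does nothing until a workload requests storage from it. As a minimal sketch of that next step, the helper below writes a PersistentVolumeClaim manifest that references px-sc-kafka-repl2. The function name, claim name, file name, and requested size are hypothetical and are not taken from the Portworx Kafka example.

```shell
#!/bin/sh
# Write a PersistentVolumeClaim manifest that requests storage from the
# px-sc-kafka-repl2 StorageClass created above.
#   $1: output file
#   $2: claim name (hypothetical, choose your own)
#   $3: requested size, for example 2Gi
write_kafka_pvc() {
  cat > "$1" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $2
spec:
  storageClassName: px-sc-kafka-repl2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $3
EOF
}

# Usage:
#   write_kafka_pvc kafka-data-pvc.yaml kafka-data 2Gi
#   kubectl apply -f kafka-data-pvc.yaml
#   kubectl get pvc kafka-data
```

Once the claim is applied, kubectl get pvc shows whether it has been bound to a volume provisioned by the StorageClass.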

Storage-level replication makes it possible for Portworx to identify and re-assign healthy storage in the event of a failure. This keeps data available, with replication resuming almost immediately.

While Portworx generally requires several dozen lines of configuration, that is typically less than what administrators may use to maintain a Kubernetes cluster. When terabytes of data are involved, Portworx’s efficient data utilization can result in large cost savings.


This page was originally published on June 25, 2024.
