Deploy a Cloud-Based Electronic Document Management System

November 2, 2022

For day-to-day work, documents are typically stored using online productivity software and cloud storage. This becomes more challenging when an application needs to process, store, and retrieve larger volumes of documents. An Electronic Document Management System (EDMS) is a better solution because it is designed to store, index, and retrieve documents with high performance and availability, and some include features like customizable metadata and version control.

While there are many SaaS-based EDMS solutions available, you can deploy your own open source EDMS to maintain complete control over your data. In this post, you’ll learn how to set up a highly available Mayan EDMS deployment backed by a PostgreSQL database.

EDMS Benefits

This setup is ideal if you store and process a large number of documents and need an EDMS that is attached to a web-based application, removing the need for any client-side installations. Running an EDMS as a central hub ensures:

security, privacy, and total control of your data;
easy integration with third-party software; and
automation of document workflows for business processes.

Why PostgreSQL?

PostgreSQL is a powerful, open source object-relational database management system that is highly valued for its scalability, security, and performance. In order to support end-to-end scaling for your application, your database also needs to be highly available, so this architecture example incorporates a replication tool specifically for PostgreSQL.

Getting Started with Mayan EDMS

Mayan is a web-based open source EDMS written in Python. By design, Mayan defaults to installing and running on a single system; all of your application and database components can live on a single server or within several Docker containers. Though this is great for testing or trivial environments, a production environment calls for high availability and for the widely adopted separation of concerns (SoC) principle, a crucial best practice for building layered and scalable applications. This reference architecture demonstrates how to achieve both with Mayan.

Pros

Open source means no licensing fees
Easily store, view, and revert document versions
Full-text search of documents using customizable user-defined metadata
Flexible access controls to design effective user roles and permissions
Customizable workflows with event triggers to keep documents up to date

Cons

Complex for smaller use cases
User interface is less intuitive than other solutions
Resource heavy for CPUs running optical character recognition (OCR)

Application Reference Architecture

To optimize Mayan’s capabilities in real-world applications, our architecture utilizes:

NGINX: Web server
Prometheus & Grafana: Monitoring and observability tools
PostgreSQL: Database
Bucardo: PostgreSQL bi-directional database replication
Linode Object Storage: S3-compatible and highly available storage
keepalived: IP failover

A NodeBalancer distributes traffic to our application nodes. If one application server goes down, the load balancing service directs traffic only to the healthy node. As soon as the unhealthy node recovers, the NodeBalancer resumes balancing connections across both nodes as before. This makes it easy to add, remove, or update application servers without downtime, all while maintaining connections to the PostgreSQL database nodes.
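The health-aware behavior described above can be sketched in a few lines of Python. This is only an illustration of round-robin selection that skips unhealthy nodes, not how NodeBalancers are implemented; the class name and the node names (app-1, app-2) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        # Called when a health check fails for this node.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Called when a previously failing node passes its health check again.
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Walk the ring until a healthy node turns up (at most one full lap).
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy application nodes available")


balancer = RoundRobinBalancer(["app-1", "app-2"])  # hypothetical node names
balancer.mark_down("app-2")
print([balancer.next_node() for _ in range(3)])  # ['app-1', 'app-1', 'app-1']
balancer.mark_up("app-2")
```

Once app-2 is marked up again, it rejoins the rotation without any change on the client side, which is the property that allows application servers to be updated one at a time.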

For the “brain” of the application, Mayan and NGINX are deployed on the same virtual machines and we can leverage Mayan’s support for s3boto3 as a storage backend to upload our documents to Linode’s S3-compatible Object Storage.
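As a rough sketch of what that storage configuration might look like, the Python snippet below assembles the backend path and arguments for the django-storages S3Boto3Storage class. The bucket name, endpoint, and credential variable names are hypothetical, and the MAYAN_-prefixed setting names are an assumption that differs between Mayan releases, so check the settings reference for your version before using them.

```python
import json
import os

# Dotted path of the django-storages backend the post refers to as s3boto3.
STORAGE_BACKEND = "storages.backends.s3boto3.S3Boto3Storage"


def mayan_storage_env(bucket_name, endpoint_url, access_key, secret_key):
    """Build environment variables pointing Mayan's document storage at a bucket.

    The two variable names below are assumptions; they vary by Mayan version.
    """
    arguments = {
        "bucket_name": bucket_name,
        "endpoint_url": endpoint_url,  # S3-compatible endpoint, not AWS
        "access_key": access_key,
        "secret_key": secret_key,
        "default_acl": "private",  # keep uploaded documents non-public
    }
    return {
        "MAYAN_DOCUMENTS_STORAGE_BACKEND": STORAGE_BACKEND,
        "MAYAN_DOCUMENTS_STORAGE_BACKEND_ARGUMENTS": json.dumps(arguments),
    }


env = mayan_storage_env(
    bucket_name="mayan-documents",  # hypothetical bucket
    endpoint_url="https://us-east-1.linodeobjects.com",  # example region endpoint
    access_key=os.environ.get("OBJECT_STORAGE_ACCESS_KEY", ""),  # hypothetical names
    secret_key=os.environ.get("OBJECT_STORAGE_SECRET_KEY", ""),
)
print(sorted(env))  # print only the variable names, never the secrets
```

Reading the credentials from the environment keeps them out of version control; the same values could just as well come from a secrets manager.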

If your application is mission-critical and uses PostgreSQL as a primary backend database, incorporating Bucardo provides a better uptime guarantee and makes your database fault-tolerant.

You can also achieve high availability and replication with a managed database service that supports PostgreSQL, but keep in mind that most DBaaS offerings focus on updating PostgreSQL versions and keeping your database cluster online and available. Implementing Bucardo gives your PostgreSQL database bi-directional replication between two or more database nodes, so every node holds a current copy of the data and can take over if another fails.

In this example, all nodes are secured with Cloud Firewalls for protection from the public internet and communicate internally via a private VLAN. The application servers connect to the databases via a shared floating VLAN IP address, with keepalived facilitating failover.

keepalived, or another IP failover system like FRRouting (FRR), is implemented at the database level so that the floating IP address always points to a healthy database node and the application nodes stay connected to it.
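keepalived can run an external check script and treats a non-zero exit status as a failure, which is one way to decide when the floating IP should move. The sketch below is a minimal, assumed example of such a check in Python: it only verifies that the local PostgreSQL port accepts TCP connections, and the function names, host, port, and timeout are illustrative choices rather than part of the reference architecture.

```python
import socket


def postgres_port_open(host="127.0.0.1", port=5432, timeout=2.0):
    """Return True if a TCP connection to the PostgreSQL port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exit_code(is_healthy):
    """Map health to a process exit status: 0 means healthy, 1 means failed."""
    return 0 if is_healthy else 1


def main():
    return exit_code(postgres_port_open())

# When saved as a standalone check script, finish with: raise SystemExit(main())
```

A port check is deliberately shallow. A stricter check could run a trivial query against the database, at the cost of adding a client library and credentials to the script.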

Achieving Fault Tolerance for Critical Files

An EDMS will often serve as a central hub for day-to-day operations and host some of your organization’s most critical files. Our application is built with redundancy at every level for baseline fault tolerance and optimized performance:

Documents are stored on Linode’s highly available Object Storage.
The database runs on nodes separate from the application servers to increase performance and avoid a single point of failure.
Bucardo performs automatic database replication between the Postgres nodes.

Explore More Technical Content and Architectures

Our Solutions Engineering team shares frameworks, guides, and tools like this one to make it easier for developers to build applications that follow best practices for software architecture. Check out our Galera cluster reference architecture for a highly available MySQL/MariaDB architecture, or browse our available reference architecture examples on Linode Docs.

Comments (2)

stepwey July 21, 2023 at 8:59 am

How much does it cost to implement the Mayan EDMS in a month and in a year?
Your swift response is best appreciated

- tlambert July 21, 2023 at 1:30 pm
  
  If you’re using the Terraform script in our guide, you will deploy four 2GB compute instances ($48.00) and an Object Storage Bucket ($5.00). Additionally, as mentioned in the guide, you will want to deploy an additional node for Prometheus and Grafana ($5.00) as well as a NodeBalancer ($10.00). These services together would be roughly $68.00/month before taxes. This assumes the amount of data in your Object Storage is not more than 250GB and you stay within your Network Transfer Allowance. Again, based on these assumptions, your yearly cost would be roughly $816.00.
  
  You have the option to edit the Terraform script and change the default compute instance to a Nanode, however, I can’t guarantee the performance of the deployment with that plan.
  