Image by author inspired by Logos which are trademarks of Trino and Apache Software Foundation respectively from left
Part I: Concepts and Ideas#
Background#
As demand for data grows daily, the requirement for data security in an enterprise setup is also increasing. In the Hadoop ecosystem, Apache Ranger has been a promising framework for data security, with plugins for HDFS, Solr, Yarn, Kafka, Hive, and many more. Apache Ranger added a plugin for prestosql in version 2.1.0, but PrestoSQL was recently rebranded as Trino, and that broke the working prestosql plugin for Apache Ranger.
I have submitted a patch for this issue and there is already an open JIRA issue here, but that will not stop us from integrating Trino with Apache Ranger. For this tutorial, I have built Apache Ranger 2.1.0 with the Trino plugin. If you want to build Apache Ranger from source code including the Trino plugin, you can refer to this GitHub repository on the branch ranger-2.1.0-trino, and for the purposes of this tutorial we will use this GitHub repository.
Update: 2022-05-20
Trino plugin is now officially available in the ranger repository and it is released in Apache Ranger-2.3 https://github.com/apache/ranger/tree/ranger-2.3
Introduction to Components and Key Ideas#
Apache Ranger has three key components: ranger-admin, ranger-usersync, and ranger-audit. Let us get introduced to these components.
Note: Configuring *ranger-usersync* is out of scope for this tutorial, and we will not use any *usersync* component.
Ranger Admin#
The Ranger Admin component is a web UI through which we can create policies for the different access levels. Ranger Admin requires a backend database; in our case, we are using Postgres as the backend database for the Ranger Admin UI.
Ranger Audit#
The Ranger Audit component collects and shows logs for each access event on a resource. Ranger supports two audit methods, solr and elasticsearch. We will use elasticsearch to store Ranger audit logs, which will then be displayed in the Ranger Audit UI as well.
Trino#
Trino is a fast, distributed query engine. It can connect to several data sources such as hive, postgres, oracle, and so on. You can read more about Trino and Trino connectors in the official documentation here. For this tutorial, we will use the default catalog tpch, which comes with dummy data.
Trino-Ranger-Plugin#
Apache Ranger supports many plugins such as HDFS, Hive, Yarn, Trino, etc. Each of these plugins needs to be configured on the host that is running that process. The Trino-Ranger-Plugin is the component that communicates with Ranger Admin to check and download the access policies, which are then synced with the Trino server. The downloaded policies are stored as JSON files on the Trino server under the path /etc/ranger/<service-name>/policycache, so in this case the policy path is /etc/ranger/trino/policycache.
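To check that policies have actually been synced, one can look inside the policy cache directory on the Trino host. The following is a minimal sketch, assuming a POSIX shell inside the Trino container; the helper names policy_cache_dir and list_cached_policies are hypothetical and only encode the path pattern described above.

```shell
# Hypothetical helper: build the policy cache path for a Ranger service name,
# following the /etc/ranger/<service-name>/policycache pattern described above.
policy_cache_dir() {
  printf '/etc/ranger/%s/policycache\n' "$1"
}

# Hypothetical helper: list cached policy files, or report that none exist yet
# (for example, before the plugin has completed its first download).
list_cached_policies() {
  dir=$(policy_cache_dir "$1")
  if [ -d "$dir" ]; then
    ls -l "$dir"
  else
    echo "no policy cache found at $dir"
  fi
}
```

Running list_cached_policies trino inside the Trino container should list the cached JSON files once the trino service exists in Ranger Admin and the plugin has downloaded its policies.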
The communication between the above components is explained in the following diagram.
Image by author
The docker-compose file connects all of the above components.
Important points about docker-compose.yml
1. We have used named Docker volumes (e.g. ranger-es-data, ranger-pg-data) to persist the data of services such as Elasticsearch and Postgres even after a container restart.
2. The pre-built tar files of Ranger-Admin and the Ranger-Trino plugin are available as release assets on this demo repository here.
3. The Ranger-Admin process requires a minimum of 1.5 GB of memory. The Ranger-Admin tar file contains install.properties and setup.sh. The setup.sh script reads its configuration from install.properties. The following patch file describes the configuration changes made to install.properties, compared to the default version of install.properties, for the Ranger-Admin component.
4. The Ranger-Trino-Plugin tar file also contains install.properties and an enable-trino-plugin.sh script. One important point to note about the Trino Docker environment is that the configuration files and the plugin directory live in different locations. The configuration is read from /etc/trino, whereas plugins are loaded from /usr/lib/trino/plugins. These two directories are important when configuring install.properties for the Trino-Ranger-Plugin, and hence some extra customization of the default enable-trino-plugin.sh script that comes with the Trino-Ranger-Plugin tar file is required to make it work with dockerized Trino. These changes are highlighted in the following patch file. They introduce two new custom variables, INSTALL_ENV and COMPONENT_PLUGIN_DIR_NAME, which can be configured in install.properties.
5. The install.properties file for the Trino Ranger Plugin needs to be configured as shown in the following patch file. Please note that we are using the two newly introduced custom variables to inform the enable-trino-plugin.sh script that Trino is deployed in a Docker environment.
6. Finally, we put it all together in docker-compose.yml as shown below. This file is also available in the GitHub repository here.
Part II: Setup and Initializing#
In this part, we will deploy docker-compose services and confirm the status of each component.
Step 1: Cloning repository#
git clone https://github.com/aakashnand/trino-ranger-demo.git
Step 2: Deploy docker-compose#
$ cd trino-ranger-demo
$ docker-compose up -d
Once we deploy services using docker-compose, we should be able to see four running services. We can confirm this by running docker-compose ps.
Step 3: Confirm Services#
Let’s confirm that the Trino, Ranger-Admin, and Elasticsearch services are accessible at the following URLs:
Ranger Admin: http://localhost:6080
Trino: http://localhost:8080
Elasticsearch: http://localhost:9200
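The same check can be scripted instead of opening each URL in a browser. This is a minimal sketch, assuming curl is installed on the host and the ports are published as listed above; the function names are hypothetical.

```shell
# The three endpoints listed above, as "name url" pairs.
service_urls() {
  cat <<'EOF'
ranger-admin http://localhost:6080
trino http://localhost:8080
elasticsearch http://localhost:9200
EOF
}

# Print the HTTP status code for a URL. curl prints 000 when it cannot connect.
http_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true
}

# Print one line per service with the status code it returned.
check_services() {
  service_urls | while read -r name url; do
    printf '%-14s %s -> HTTP %s\n' "$name" "$url" "$(http_status "$url")"
  done
}
```

Calling check_services after docker-compose up -d prints one line per service; any status other than 000 means the port answered, although Ranger-Admin may need some time after startup before it responds.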
Step 4: Create Trino service from Ranger-Admin#
Let’s access the Ranger-Admin UI and log in as the admin user. We configured the admin user password as rangeradmin1 in the ranger-admin-install.properties file above. As we can see in the following screenshot, by default there is no trino service. Therefore, let’s create a service with the name trino. The service name should match the name defined in **install.properties** for Ranger-Admin.
Please note the hostname in the JDBC string. From the ranger-admin container, Trino is reachable at my-localhost-trino, hence the hostname is configured as my-localhost-trino.
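To make the hostname point concrete, here is a small sketch of how the JDBC string differs depending on where the client runs. The helper name trino_jdbc_url is hypothetical, and port 8080 is an assumption based on the Trino URL used elsewhere in this tutorial.

```shell
# Hypothetical helper: build a Trino JDBC URL of the form jdbc:trino://host:port.
# The port defaults to 8080 (assumed here to be the port Trino listens on).
trino_jdbc_url() {
  printf 'jdbc:trino://%s:%s\n' "$1" "${2:-8080}"
}

# From inside the ranger-admin container, use the container hostname:
trino_jdbc_url my-localhost-trino

# From the Docker host, the published port on localhost would be used instead:
trino_jdbc_url localhost 8080
```

Only the first form is useful in the Ranger service definition, because the Test Connection request is issued from the ranger-admin container rather than from the host.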
If we click on Test Connection we will get a Connection Failed error as shown below. This is because the Ranger-Admin process is already running and is still looking for a service with the name trino that we have not created yet. It will be created once we click Add.
So let’s add the trino service and then click Test Connection again.
Now Ranger-Admin is successfully connected to Trino.
Step 5: Confirm Ranger-Audit Logs#
To check audit logs, click Audit in the top navigation bar. We can see that audit logs are displayed, which means Ranger-Admin and Elasticsearch are working correctly.
Part III: Seeing it in Action#
Now that we have finished the setup, it is time to create actual access policies and see them in action.
- When creating the trino service we used ranger-admin as the username in the connection information. This creates default policies with this username, and thus the ranger-admin user will have super privileges.
To understand the access scenario and create an access policy, we need a test user. The Ranger user-sync service can sync users, groups, and group memberships into Ranger from various sources, such as Unix, File, or AD/LDAP, and provides a rich set of configuration properties for a wide variety of use cases. In this tutorial, however, we will manually create a test user from the Ranger-Admin UI.
Step 1: Create test-user from Ranger-Admin#
To create a user, let’s navigate to Settings → Users/Groups/Roles → Add New User.
When creating a user we can choose between different roles:
- The User role is the normal user role.
- The Admin role can create and manage policies from the Ranger Admin UI.
- The Auditor role is a read-only user role.
For the time being, let’s create a user with the Admin role.
Step 2: Confirm access for test-user and ranger-admin#
Let’s confirm access for the user ranger-admin.
As we can see, the ranger-admin user can access all the tables under the schema tpch.sf10.
Since we have not configured any policy for test-user, if we try to access any catalog or execute any query, we should see an access denied message. Let’s confirm this by executing queries from the Trino CLI.
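The kind of check performed in this step can be sketched as follows. This is an illustration under stated assumptions, not the exact commands from the screenshots: it assumes a trino CLI executable is on the PATH and the server is reachable at http://localhost:8080; the helper names run_as and probe_sql are hypothetical.

```shell
# Assumed server address; override by exporting TRINO_SERVER.
TRINO_SERVER="${TRINO_SERVER:-http://localhost:8080}"

# Hypothetical helper: build a small probe query for catalog.schema.table.
probe_sql() {
  printf 'SELECT * FROM %s.%s.%s LIMIT 5' "$1" "$2" "$3"
}

# Hypothetical helper: run one SQL statement as the given Trino user.
# --server, --user and --execute are standard Trino CLI options.
run_as() {
  trino --server "$TRINO_SERVER" --user "$1" --execute "$2"
}
```

For example, run_as ranger-admin "$(probe_sql tpch sf10 nation)" should return rows, while the same call with test-user is expected to fail with an access denied error until policies are added for that user.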
Step 3: Allow test-user access to all tables under the schema tpch.sf10#
Let’s create a policy that allows test-user access to all tables under tpch.sf10.
We can also assign specific permissions on each policy, but for the time being, let’s create a policy with all permissions. After creating this policy, we have the following active policies.
Now let’s confirm the access again.
We are still getting access-denied messages. This is because Trino Ranger policies need to be configured for each object level: for example, a catalog-level policy, a catalog+schema-level policy, a catalog+schema+table-level policy, and an information_schema policy. Let’s add a policy for the catalog level.
Let’s confirm again with the Trino CLI.
We are still getting an error, but the error message is different. Let’s navigate to the Ranger Audit section to understand more about this.
We can see an entry that denied permission to a resource called tpch.information_schema.tables.table_schema. In Trino, information_schema is the schema that contains metadata about tables and table columns, so it is necessary to add a policy for information_schema as well. Access to information_schema is required for any user to execute queries in Trino; therefore, we can use the {USER} variable in a Ranger policy to give access to all users.
Let us confirm the access from Trino CLI again.
We still get access denied if we try to execute any SQL function. In the default policies section, the all-functions policy (ID:3) is the policy that allows access to execute any SQL function. Since executing SQL functions is a requirement for all users, let’s edit the all-functions policy (ID:3) and add all users using the {USER} variable to give access to functions.
So to summarize, to give test-user access to ALL tables under sf10, we added three new policies and edited the default all-functions policy.
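The policies in this tutorial are created through the UI, but a table-level policy like the one above could also be expressed as JSON and sent to Ranger Admin over REST. The following is only a sketch under several assumptions: that the Ranger public v2 policy endpoint is used, that the Trino service definition names its resources catalog, schema, and table, and that select is a valid access type. The helper names are hypothetical; verify the resource and access type names against your own Ranger instance before relying on them.

```shell
# Hypothetical helper: print a Ranger policy document as JSON.
# Arguments: policy name, user, access type, catalog, schema, table.
# The resource keys and access type names are assumptions about the
# Trino service definition, not values taken from this tutorial.
policy_json() {
  cat <<EOF
{
  "service": "trino",
  "name": "$1",
  "isEnabled": true,
  "resources": {
    "catalog": {"values": ["$4"]},
    "schema": {"values": ["$5"]},
    "table": {"values": ["$6"]}
  },
  "policyItems": [
    {"users": ["$2"], "accesses": [{"type": "$3", "isAllowed": true}]}
  ]
}
EOF
}

# Hypothetical helper: POST a policy document read from stdin to Ranger Admin,
# using the admin credentials configured for this demo.
create_policy() {
  curl -s -u "admin:rangeradmin1" \
    -H 'Content-Type: application/json' \
    -X POST --data @- \
    "http://localhost:6080/service/public/v2/api/policy"
}
```

A call such as policy_json sf10-all-tables-policy test-user select tpch sf10 '*' | create_policy would then submit the document. Note that this helper always emits a catalog+schema+table resource, so the catalog-level and information_schema policies described above would need their own documents.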
Now we can access and execute queries for all tables in the sf10 schema.
In the next step, let’s understand how to give test-user access to a specific table under the schema sf10.
Step 4: Giving access to a specific table under sf10 schema#
In the previous step, we configured policies to give access to ALL tables under the sf10 schema, and therefore a schema-level policy was not necessary. To give access to a specific table within the schema, we first need to add a schema-level policy, and then we can configure the table-level policy. So let us add a schema-level policy for tpch.sf10.
Now let us edit sf10-all-tables-policy from all tables to specific tables. We will configure a policy that allows access to only the nation table.
So finally we have the following active policies.
Now let’s execute queries from the Trino CLI again for test-user.
test-user can now access only the nation table from the tpch.sf10 schema, as desired.
If you have followed all the steps and reached the end, congratulations! You now understand how to configure Trino and Apache Ranger.
Part IV: Key Takeaways and Conclusion#
- After the rebranding from PrestoSQL to Trino, the default plugin from Apache Ranger’s GitHub repository will NOT work with the new Trino, as it is still referencing the old io.prestosql packages. You can track this issue on JIRA here.
- The rebranded Trino plugin will not be made available in the new Ranger version 2.2.0. So meanwhile, please feel free to use this GitHub repository for building Apache Ranger from source code and this GitHub repository for getting started with Trino-Ranger integration.
- Configuring Ranger policies for Trino is not so intuitive because we need to configure access policies for each level. There is an open issue regarding this on Trino’s repository here.
- Nonetheless, it is recommended to configure some basic policies such as information_schema and all-functions with the {USER} variable, as these policies are necessary for any user to execute queries.
Due to the lack of good documentation and the not-so-intuitive nature of the integration process, integrating Apache Ranger and Trino can be painful, but I hope this article makes it a bit easier. If you are using Trino, I highly recommend you join Trino Community Slack for more detailed discussions. Thank you for reading.