Apache Ranger - NexusOne

Apache Ranger is an open-source security framework that provides centralized authorization, fine-grained access control, and auditing across modern data ecosystems. Ranger is widely used with big data and analytics platforms such as:

HDFS
Hive
HBase
Kafka
Trino
Spark
Kyuubi

Why Ranger is important

In large data platforms, data access involves the following:

Many users
Multiple services
Different tools

Without a centralized authorization system, managing security results in the following:

Inconsistency
Errors
Poor auditability
Limited scalability

Ranger solves these issues by providing a single, centralized control point for the following:

Defining data access rules
Tracking and auditing access across the platform

Key features

Ranger has the following key features:

Centralized security management: You have access to a central place when managing access across all services. Hence, you don’t need to configure security separately in each tool.
Resource-level permissions: Controls access at the following levels:
- Database level
- Table level
- Column level
- File and folder level
- Kafka topic level
- Trino catalog/schema/table level
Access control with Role-Based Access Control (RBAC): Enforces access through roles and permissions. This includes:
- Access based on users and groups
- Administrative roles control who can manage Ranger
- Clean separation between data access and UI management permissions
Access control with Attribute-Based Access Control (ABAC): Uses attributes to control access dynamically. A resource attribute, such as a tag, can have classifications such as:
- PII
- Financial
- Confidential
Centralized auditing and monitoring: A single point to monitor and audit all data access. This ensures the following:
- Ability to track all data access in one place
- Shows unusual activity, as well as successful and denied access attempts
- Used for compliance, security audits, and forensics
Near real-time policy enforcement: Applies policy changes without downtime. This ensures the following:
- Plugins pick up policy changes dynamically
- No service restarts required
- Changes take effect shortly after you save them, typically within seconds
Integration with external authentication systems: Integrates with the following authentication systems to enforce authorization:
- Keycloak
- Lightweight Directory Access Protocol (LDAP)
- Active Directory
- Kerberos
Policy versioning and history: Maintains a complete history of policy changes for accountability, rollbacks, and compliance. This allows it to record the following:
- Who modified what
- When the modification happened

Ranger components

Ranger follows a distributed architecture with the following key components working together:

Ranger Admin server: This is the central management service of Ranger that allows administrators to create and manage security rules. Some of the key responsibilities it performs include:
- Hosting the Ranger UI
- Storing all security policies
- Managing the following
  - Users
  - Groups
  - Roles
  - Permissions
- Distributing policies to service plugins
- Maintaining the policy database
Ranger plugins: These act as the security guards inside a data service, such as Hive, Kafka, or Trino. A plugin intercepts each access request before the service acts on it. During interception, the plugin does the following:
- Evaluates the request against policies downloaded from the Ranger Admin server
- Makes authorization decisions
- Logs audit events
Each plugin caches these policies locally for high performance and periodically polls the Admin server for policy updates.
Ranger policy database: This is the backend database used to store the following:
- Policies
- Users
- Groups
- Roles
- Service definitions
This policy database serves as the source of truth for all authorization policies, and it’s accessed exclusively by the Ranger Admin server. Common databases include:
- MySQL
- PostgreSQL
- Oracle
Ranger audit store: The audit store is a centralized repository that records all the data access activity enforced by Ranger. The audit store captures the following:
- Who accessed what
- From which service
- Action performed, such as a read, write, or query
- Timestamp
- Allowed or denied access
Audit data destinations include:
- Solr
- Elasticsearch
- HDFS
Ranger usersync: This periodically syncs users and groups from Keycloak into Ranger’s database. This sync ensures that organizational changes are reflected in the authorization system without manual intervention.
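The interaction between the Admin server and a plugin's local cache can be sketched as follows. This is a minimal illustration, not Ranger's implementation: the PolicyStore and PluginCache classes and the version counter are hypothetical names introduced here, and a real plugin polls on a timer rather than on demand.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Hypothetical stand-in for Ranger Admin: policies plus a version counter."""
    version: int = 0
    policies: list = field(default_factory=list)

    def update(self, policies):
        # Every change bumps the version so that plugins can detect it.
        self.policies = list(policies)
        self.version += 1


@dataclass
class PluginCache:
    """Hypothetical stand-in for a plugin's local policy cache."""
    store: PolicyStore
    cached_version: int = -1
    cached_policies: list = field(default_factory=list)

    def poll(self):
        """Refresh the cache only if the store has a newer version.

        Returns True when the cache was refreshed.
        """
        if self.store.version == self.cached_version:
            return False
        self.cached_policies = list(self.store.policies)
        self.cached_version = self.store.version
        return True


admin = PolicyStore()
plugin = PluginCache(admin)
plugin.poll()                                   # initial download
admin.update(["allow analysts SELECT on sales"])
stale = list(plugin.cached_policies)            # unchanged until the next poll
refreshed = plugin.poll()                       # picks up the new version
```

Because authorization decisions use the cached copy, a plugin in this sketch keeps answering requests even if the store is briefly unreachable, at the cost of a short delay before new policies apply.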

Core administrator concepts

This section explains the key concepts administrators need to understand to define and manage policies in Ranger.

Services: These are the external systems that Ranger secures, such as Spark or Trino. Each service has its own set of policies and a Ranger plugin deployed within. Administrators must first define a service before creating policies for it.
Resources: The objects Ranger protects. These objects might be databases, tables, columns, files, or topics. Ranger uses hierarchical resource structures that vary by service type. For example, Ranger organizes Trino resources as catalog > schema > table > column, while HDFS resources are file paths.
Users and groups: Ranger synchronizes users and groups from LDAP or Active Directory. Policies can target individual users or entire groups. Group-based policies are strongly preferred for maintainability.
Roles: Ranger roles are collections of users and groups that simplify policy management. Instead of adding multiple groups to a policy, create a role containing those groups and grant permissions to the role.
Policy: A policy is a rule that administrators create to define who can perform which actions on specific resources. Ranger supports the following policy types:
- Access policies: The most common policy type. It defines who can perform which operations on specific resources. It does this by specifying allow or deny rules, with deny rules taking precedence.
- Masking policies: These policies transform data before returning it to you. Masking functions include:
  - Redact: Replace entire value with “X” characters
  - Partial mask: Show only the last 4 characters
  - Hash: Replace with SHA-256 hash
  - Nullify: Return NULL value
  - Custom: Apply custom user-defined functions for transformation
- Row filter policies: These policies add WHERE clause conditions to queries to restrict which rows you can see. For example, a sales representative might only see rows where the region matches their assigned region.
Policy condition: These are additional rules that further refine when or how a policy applies to a user or group. Examples include:
- IP address ranges: Allows access only from corporate networks
- Time ranges: Allows access only during business hours
- Custom attributes: Enforces access based on resource or user-specific metadata
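To make the masking functions listed above concrete, the following sketch implements four of them in plain Python. It illustrates the described behavior only and is not Ranger's masking code; the function names and the sample card number are made up for the example.

```python
import hashlib


def redact(value: str) -> str:
    """Replace every character with 'X'."""
    return "X" * len(value)


def partial_mask(value: str, visible: int = 4) -> str:
    """Hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return value
    return "X" * (len(value) - visible) + value[-visible:]


def hash_mask(value: str) -> str:
    """Replace the value with its SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def nullify(value: str) -> None:
    """Return NULL (None in Python) regardless of the input."""
    return None


card = "4111111111111111"          # sample value, not real data
print(redact(card))                # XXXXXXXXXXXXXXXX
print(partial_mask(card))          # XXXXXXXXXXXX1111
print(len(hash_mask(card)))        # 64 hex characters
print(nullify(card))               # None
```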

Ranger in the NexusOne platform

Within the NexusOne ecosystem, Ranger serves as the central security enforcement layer for Trino and Spark.

Exploring the Ranger UI

The Ranger UI provides a web-based interface to manage policies. It shows the integrated apps, their reports, and more. You can launch Ranger using the following designated URL:

https://ranger.<client>.nx1cloud.com/

When you purchase NexusOne, you receive a client name. Replace client with your assigned client name.

When you launch Ranger using the previous URL, you should see a page similar to the following image.

Ranger homepage layout

Access requirements

To manage policies in the Ranger UI, you must have one of the following roles:

ranger_admin
Any other role mapped to the ranger_admin role

What Ranger controls

Ranger enforces permissions on data and resources to determine who can access them. The data and resources it controls include:

Table, view, and column-level permissions: Controls who can access specific databases, tables, views, and columns.
Row-level filtering: Restricts which rows you can see based on conditions. For example, you only see your own department’s data.
Data masking: Automatically masks or redacts sensitive column values based on user roles. For example, show only the last 4 digits of a credit card.
URI/storage-level permissions: Controls access to underlying storage locations so that users can’t bypass table security by reading data files directly.

Ranger’s integration with other apps

This section takes a closer look at how Ranger integrates with Trino and Spark within the NexusOne platform.

Trino

Ranger is Trino’s security system, which ensures that only authorized people can query specific data. The architecture comprises the following main components:

Trino cluster:
- Has one coordinator and multiple “worker” servers
- Handles user requests and queries using the coordinator
- Workers do the actual data processing
Ranger Admin: Web-based control panel where administrators set up security rules. A security rule can specify that the marketing team can only see customer names, while the finance team can see full customer records, including payment information.
Ranger plugin:
- Installed on Trino’s coordinator
- Acts as a bridge between Trino and Ranger Admin
- Checks each query against the policies it downloads from Ranger Admin: “Is this user allowed to run this query?”
Other supporting components:
- PostgreSQL database: Stores all security policies and user information
- OpenSearch audit storage: Keeps logs of who accessed what data and when

The following steps describe the flow when you query data using Trino:

You send a query to Trino
The Trino coordinator receives the query and asks the Ranger plugin, “Is this user allowed to do this?”
The Ranger plugin checks the policies it downloaded from the Ranger Admin component and asks the following questions:
- Does this user have permission to see the ‘orders’ table?
- Can they see all columns or only specific ones?
One of the following outcomes occurs:
- Allowed: Trino runs the query and returns results
- Denied: User gets an “Access Denied” message
Ranger records the access attempt in the audit log for compliance and security monitoring
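The flow above can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the Trino plugin's logic: the Rule class, the authorize function, and the group, table, and column names are all hypothetical. It checks the table and column questions from the steps, treats a missing rule as a denial, and records every attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    group: str
    table: str
    columns: frozenset   # columns the rule covers
    allow: bool          # False marks a deny rule


def authorize(user, user_groups, table, columns, rules, audit_log):
    """Allow only if no deny rule matches and every column is allowed."""
    requested = set(columns)
    matching = [r for r in rules if r.group in user_groups and r.table == table]
    # A deny rule that touches any requested column blocks the query.
    denied = any(not r.allow and requested & r.columns for r in matching)
    allowed_cols = set()
    for r in matching:
        if r.allow:
            allowed_cols |= r.columns
    decision = not denied and requested <= allowed_cols
    # Record the attempt whether it was allowed or denied.
    audit_log.append(
        (user, table, tuple(sorted(requested)), "ALLOWED" if decision else "DENIED")
    )
    return decision


rules = [
    Rule("marketing", "orders", frozenset({"order_id", "customer_name"}), True),
    Rule("finance", "orders",
         frozenset({"order_id", "customer_name", "payment_info"}), True),
]
audit = []
ok = authorize("alice", {"marketing"}, "orders", ["customer_name"], rules, audit)
blocked = authorize("alice", {"marketing"}, "orders", ["payment_info"], rules, audit)
print(ok, blocked)  # True False
```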

Spark

Ranger is Spark’s security system, which ensures that only authorized people can perform compute operations on specific data. By default, Spark doesn’t provide enterprise-grade authorization at the database, table, column, or row level. So, Ranger fills this gap by acting as Spark’s external authorization engine. The architecture comprises the following main components:

Ranger Admin server: A central control plane where all Spark authorization rules live. It’s used for the following:
- Creating policies
- Managing roles
- Auditing access
Ranger Spark plugin:
- Installed inside the Spark environment
- Runs on the Spark SQL engine
- Responsible for the following:
  - Intercepting every SQL query
  - Evaluating access requests against the policies downloaded from Ranger Admin
  - Enforcing allow or deny decisions in real time
Policy storage:
- A PostgreSQL database for storing policies
- Plugin pulls these policies using REST APIs and caches them locally
User and group sync:
- Ensures policies apply to correct identities
- User authentication uses Keycloak
- Ranger sync service pulls the following:
  - Users
  - Groups
  - Roles
Audit store:
- Stores logs of all Spark access attempts in one of the following:
  - OpenSearch
  - Object storage
- Its uses include:
  - Security monitoring
  - Compliance audits

The following steps describe the flow when you query data and Spark computes the result:

You log in to NexusOne through Keycloak, which verifies your identity and group membership.
If the verification is successful, Keycloak issues a valid access token.
You submit a Spark SQL query through Spark, a notebook, or a batch job.
The Ranger Spark plugin intercepts the query and extracts the username, database, table, column, and query type.
The plugin builds an authorization request with the user, resource, and action details and checks it against the policies it downloaded from Ranger Admin.
The plugin evaluates the user, group, deny rules, row filters, and data masking policies.
If the decision is ALLOW, the query executes. If it’s DENY, the query fails with an authorization error.
The audit store records the complete access attempt, such as the user, resource, timestamp, IP, and query type.
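The evaluation, row filtering, and audit steps above can be sketched as follows. This is an illustration under stated assumptions rather than the Spark plugin's code: the evaluate and run_query functions, the request fields, and the group names are hypothetical, and the row filter is modeled as a Python predicate instead of a SQL WHERE clause.

```python
from datetime import datetime, timezone


def evaluate(request, deny_groups, allow_groups, row_filters):
    """Return (decision, row_filter). Deny rules are checked before allow rules."""
    groups = set(request["groups"])
    if groups & deny_groups:
        return "DENY", None
    if not groups & allow_groups:
        return "DENY", None          # no matching allow rule: deny by default
    for group, predicate in row_filters.items():
        if group in groups:
            return "ALLOW", predicate
    return "ALLOW", None


def run_query(request, rows, deny_groups, allow_groups, row_filters, audit_store):
    decision, predicate = evaluate(request, deny_groups, allow_groups, row_filters)
    # Record the attempt before returning or failing.
    audit_store.append({
        "user": request["user"],
        "resource": request["table"],
        "query_type": request["query_type"],
        "ip": request["ip"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "result": decision,
    })
    if decision == "DENY":
        raise PermissionError(
            f"{request['user']} is not authorized for {request['table']}"
        )
    return [row for row in rows if predicate is None or predicate(row)]


rows = [{"region": "EMEA", "amount": 10}, {"region": "APAC", "amount": 20}]
request = {"user": "bob", "groups": ["sales_emea"], "table": "sales.orders",
           "query_type": "SELECT", "ip": "10.0.0.5"}
audit_store = []
visible = run_query(
    request, rows,
    deny_groups={"contractors"},
    allow_groups={"sales_emea"},
    row_filters={"sales_emea": lambda row: row["region"] == "EMEA"},
    audit_store=audit_store,
)
print(visible)  # [{'region': 'EMEA', 'amount': 10}]
```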

How Ranger secures NexusOne

The following capabilities show how Ranger secures the NexusOne platform:

Centralized security control: Rather than managing security separately in Trino and Spark, Ranger provides a single pane of glass for defining and enforcing policies.
Multi-layer protection: Ranger enforces security at the following multiple levels simultaneously:
- Service access, which describes who can use Trino
- Database or schema access, which describes databases that are visible
- Table access, which describes queryable tables
- Column access, which describes the returned columns
- Row access, which describes the returned rows

This layered approach ensures comprehensive data protection.

Dynamic flexibility: As NexusOne evolves with new projects, data sources, and users, Ranger adapts without requiring app changes. New policies take effect within seconds, so granting and revoking access is nearly instant.
Compliance and auditability: There are logs of every access attempt across NexusOne. This creates a complete audit trail for compliance reporting and security analysis. By querying Ranger’s audit repository, administrators can answer the following questions:
- Who accessed this sensitive table?
- When did user X last access financial data?
- Are there any unauthorized access attempts?
User experience: While providing strong security, Ranger remains transparent to end users. Data analysts, data scientists, and apps interact with Trino and Spark normally. You see only the data you’re authorized to access, and sensitive values are also automatically masked. This means you don’t need to understand or work around complex security mechanisms.
Operational efficiency: For NexusOne administrators, Ranger reduces operational burden through group-based policies, tag-based automation, and centralized management. Security teams can now spend less time on access management and more time on strategic security improvements.

Ranger hands-on examples

This section describes several hands-on examples of using Ranger.

Add a new service

Use the following steps to add a new service:

Open the Ranger Admin UI. It should default to the Resource Policies sidebar option. This is also the Service Manager.
Click the plus sign to add a service. Select a service type, such as Trino or Hive.
Provide a service name and connection details such as URLs or credentials.
Test the connection to ensure Ranger can communicate with the service.
Save the service configuration.

Add a new service
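If you prefer to script the same steps, a service definition can be expressed as a JSON payload. The sketch below only builds and prints that payload. It assumes the field names of Ranger's public REST API (`name`, `type`, `configs`, `isEnabled`), which is typically served under `/service/public/v2/api/service`; verify both against your Ranger version. The service name, lookup account, and JDBC URL are placeholders, and the exact `configs` keys depend on the service type.

```python
import json


def build_service_payload(name, service_type, configs, description=""):
    """Assemble a service definition as a plain dictionary."""
    return {
        "name": name,
        "type": service_type,
        "description": description,
        "isEnabled": True,
        "configs": dict(configs),
    }


payload = build_service_payload(
    name="nx1_trino",                       # hypothetical service name
    service_type="trino",
    configs={
        "username": "ranger_lookup",        # hypothetical lookup account
        "jdbc.url": "jdbc:trino://trino.example.internal:8080",  # placeholder
    },
    description="Trino service secured by Ranger",
)
print(json.dumps(payload, indent=2))
```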

Create a policy

Use the following steps to create a policy:

Open the Ranger Admin UI. It should default to the Resource Policies sidebar option. This is also the Service Manager.
Select the service that you want to create a policy for.
At the top-right corner of the page, click Add New Policy.
Specify a policy name.
Define a resource scope, such as which databases, tables, or columns.
Add users or groups who should receive access.
Select permissions such as SELECT, INSERT, or UPDATE.
Optionally, add policy conditions.
Click Save at the bottom of the screen to save the policy.

Create a policy
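The same policy can be described as data, which is useful for review or automation. The following sketch builds a policy dictionary in the general shape that Ranger uses for policy JSON (`resources`, `policyItems`, `accesses`). Treat the details as assumptions: the resource keys depend on the service type, and the service, policy, and group names are invented for the example.

```python
import json


def build_access_policy(service, name, resources, groups, access_types,
                        description=""):
    """Assemble an allow policy for the given groups and access types."""
    return {
        "service": service,
        "name": name,
        "description": description,
        "isEnabled": True,
        "isAuditEnabled": True,
        "resources": {
            key: {"values": list(values), "isExcludes": False, "isRecursive": False}
            for key, values in resources.items()
        },
        "policyItems": [
            {
                "groups": list(groups),
                "users": [],
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
                "delegateAdmin": False,
            }
        ],
    }


policy = build_access_policy(
    service="nx1_trino",                    # hypothetical service name
    name="finance_read_only",               # hypothetical policy name
    resources={
        "catalog": ["lakehouse"],
        "schema": ["finance"],
        "table": ["*"],
        "column": ["*"],
    },
    groups=["finance_analysts"],            # hypothetical group
    access_types=["select"],
    description="Read-only access for finance analysts",
)
print(json.dumps(policy, indent=2))
```

Granting only `select` here follows the least-privilege guidance: the group can read the finance schema but can't modify it.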

Synchronize users and groups

Use the following steps to synchronize users and groups:

Configure Keycloak connection details in Ranger usersync configuration.
Specify a base DN, search filters, and sync intervals.
Start the usersync service.
In the Ranger Admin UI, click Settings, and verify that the users and groups appear under the Users, Groups, and Roles section.

View audit logs

Use the following steps to view audit logs:

Open the Ranger Admin UI, and then click Audits.
Select one option, such as Admin or Login Sessions.
You should see a table showing the logs. You can use filters to narrow the results by user, resource, date range, or access result.
Export audit data for offline analysis or compliance reporting.
Create saved searches for frequent audit queries.
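Once audit data is exported, the same filters can be applied offline. The sketch below assumes each exported record has `user`, `resource`, `result`, and `time` fields. Those field names and the sample records are hypothetical, so adapt them to the columns in your export.

```python
from datetime import datetime


def filter_audits(records, user=None, resource=None, result=None,
                  start=None, end=None):
    """Return the records that satisfy every filter that was supplied."""
    selected = []
    for rec in records:
        if user is not None and rec["user"] != user:
            continue
        if resource is not None and rec["resource"] != resource:
            continue
        if result is not None and rec["result"] != result:
            continue
        if start is not None and rec["time"] < start:
            continue
        if end is not None and rec["time"] > end:
            continue
        selected.append(rec)
    return selected


audits = [
    {"user": "alice", "resource": "finance.ledger", "result": "ALLOWED",
     "time": datetime(2024, 5, 1, 9, 30)},
    {"user": "bob", "resource": "finance.ledger", "result": "DENIED",
     "time": datetime(2024, 5, 1, 22, 15)},
    {"user": "alice", "resource": "sales.orders", "result": "ALLOWED",
     "time": datetime(2024, 5, 2, 11, 0)},
]
denied = filter_audits(audits, resource="finance.ledger", result="DENIED")
print([rec["user"] for rec in denied])  # ['bob']
```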

Delegate an administrator

Use the following steps to delegate an administrator:

Create a policy granting administrative permissions to specific users.
Scope the administrator’s permissions.

Doing this enables distributed administration, where team leads manage their own resources.

Best practices in Ranger

This section describes best practices for creating, testing, and maintaining access policies in Ranger.

Ranger resource policies

Creating effective policies

The following best practices help you create efficient policies:

Start with groups, not users: Always create policies for groups rather than individual users. When a user changes roles, update their group membership rather than modifying dozens of policies.
Use naming conventions: Establish and enforce naming standards for policies, services, and roles. This makes finding and managing policies much easier as the system grows.
Resource specification: Determine which objects a policy applies to and use wildcards strategically, especially with sensitive data. For example, database=finance, table=*, column=* grants access to all columns of all tables in the finance database.
User or group selection: When adding multiple users or groups to a policy, understand the following logic:
- Users or groups within a policy are OR’d together, meaning a match on any listed user or group grants access.
- Multiple policies with different resources are independently evaluated.
Permission selection: Grant minimum necessary permissions, for example:
- Grant only SELECT for read-only access, not ALL.
- Grant INSERT for data loading processes, not DROP or ALTER.
Policy conditions: Add conditions to restrict access further, for example:
- IP range: 192.168.1.0/24 limits access to the corporate network.
- Time: 9:00-17:00 restricts access to business hours.
- Custom conditions: Access data only if the user’s department matches the data owner.
Policy priority: When policies conflict, Ranger uses a priority order. For example:
- In deny policies, priority 1 always wins.
- Among allow policies, the higher priority number wins.
- If no matching policy exists, then Ranger denies access by default.
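The wildcard and condition examples above can be checked with a few lines of Python. This sketch only mimics the described behavior using standard-library helpers and is not how Ranger evaluates policies internally. The function names are hypothetical, and the network and hours are the example values from this section.

```python
import ipaddress
from datetime import time
from fnmatch import fnmatchcase


def resource_matches(pattern, resource):
    """Compare database, table, and column, where '*' matches anything."""
    return all(
        fnmatchcase(resource[key], pattern[key])
        for key in ("database", "table", "column")
    )


def conditions_met(client_ip, now, network="192.168.1.0/24",
                   start=time(9, 0), end=time(17, 0)):
    """Require the caller to be on the network and inside business hours."""
    in_network = ipaddress.ip_address(client_ip) in ipaddress.ip_network(network)
    in_hours = start <= now <= end
    return in_network and in_hours


pattern = {"database": "finance", "table": "*", "column": "*"}
ledger = {"database": "finance", "table": "ledger", "column": "amount"}
staff = {"database": "hr", "table": "staff", "column": "salary"}

print(resource_matches(pattern, ledger))              # True
print(resource_matches(pattern, staff))               # False
print(conditions_met("192.168.1.20", time(10, 30)))   # True
print(conditions_met("10.0.0.5", time(10, 30)))       # False
```

The first result shows why wildcards deserve care with sensitive data: a single `table=*` pattern covers every current and future table in the finance database.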

Advanced policy patterns

The following best practices help you implement advanced policy patterns efficiently:

Tag-based policies: Instead of creating policies for each table, do the following:
- Tag tables in the metadata system with classifications such as PII, SENSITIVE, or PUBLIC.
- Create Ranger policies based on those tags so that new tables automatically inherit the policies associated with their tags.
Policy templates: For repetitive policy patterns, create a template policy. When applying it to new resources, adjust only the resource path and specific users.
Time-limited access: For contractors or temporary projects, create policies with specific validity periods. Ranger can automatically disable policies after the end date.
Break-glass access: Create disabled “emergency access” policies that can be quickly enabled during incidents when normal access channels fail.
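The inheritance effect of tag-based policies can be sketched with two dictionaries. This is a conceptual model only: the tag names come from the list above, while the group and table names are hypothetical. Requiring every tag on a table to allow a group is a choice made for this sketch, not a statement of how Ranger combines tag policies.

```python
TAG_POLICIES = {
    "PII": {"privacy_officers"},
    "SENSITIVE": {"finance_analysts", "privacy_officers"},
    "PUBLIC": {"all_employees"},
}


def groups_allowed(table, table_tags, tag_policies):
    """Return the groups allowed by every tag on the table.

    Untagged tables allow nobody in this sketch.
    """
    tags = table_tags.get(table, set())
    if not tags:
        return set()
    allowed = None
    for tag in tags:
        groups = tag_policies.get(tag, set())
        allowed = set(groups) if allowed is None else allowed & groups
    return allowed


table_tags = {"customers": {"PII"}, "price_list": {"PUBLIC"}}

# A new table only needs a tag; no new policy is written for it.
table_tags["customers_2024"] = {"PII"}
print(groups_allowed("customers_2024", table_tags, TAG_POLICIES))
# {'privacy_officers'}
```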

Policy testing and validation

The following best practices help you detect issues early and verify that policies are behaving as intended:

Test before production: Create development and staging Ranger instances that mirror production. Test new policies in these environments before deploying to production.
Policy simulator: Before deploying policies, use Ranger’s policy evaluation tools to test how Ranger handles specific access requests. If you input a user, resource, and operation, you can see which policies apply and what the decision would be.
Audit log review: After deploying new policies, monitor audit logs for unexpected denials. If Ranger blocks legitimate users, adjust the policies accordingly.
User feedback loop: Establish a process for users to request access when denied. This creates a feedback mechanism to identify missing or incorrect policies.

Policy migration and promotion

The following best practices help you migrate Ranger policies between environments and manage policy changes safely:

Environment promotion: When promoting policies from a development environment to production, do the following:
- Export policies from the development Ranger instance in JSON format.
- Review and adjust the policies for production environment differences.
- Import the policies into production during a maintenance window.
- Monitor the policies for issues and be prepared to roll back.
Version control: Store policy exports in Git or a similar version control system. This provides change history, enables code review of policy changes, and facilitates disaster recovery.
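Reviewing an export before promotion is easier with a summary of what changed. The following sketch compares two exports by policy name. It assumes an export is either a JSON list of policies or an object with a `policies` list, and that each policy has a `name` field. Check these assumptions against a real export from your Ranger version. The sample policies are invented.

```python
import json


def index_by_name(export_json):
    """Map policy name to policy dictionary for one export string."""
    data = json.loads(export_json)
    policies = data["policies"] if isinstance(data, dict) else data
    return {p["name"]: p for p in policies}


def diff_exports(old_json, new_json):
    """Summarize added, removed, and changed policies between two exports."""
    old, new = index_by_name(old_json), index_by_name(new_json)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


prod = json.dumps({"policies": [
    {"name": "finance_read_only", "isEnabled": False},
    {"name": "legacy_reports", "isEnabled": True},
]})
dev = json.dumps({"policies": [
    {"name": "finance_read_only", "isEnabled": True},
    {"name": "sales_row_filter", "isEnabled": True},
]})
summary = diff_exports(prod, dev)
print(summary)
```

Running a comparison like this against the exports stored in version control gives reviewers a short list to inspect before the import.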

Ongoing maintenance

The following best practices help you keep policies accurate, up to date, and aligned with your organization’s requirements over time:

Document policies: Use the policy description field to explain why a policy exists, who requested it, and any special considerations. This helps future administrators understand policy intent.
Regular audits: Periodically review policies to identify and remove obsolete permissions, identify overly permissive policies, and ensure policies align with the current organizational structure.

Additional resources

For more details about Ranger, refer to the Ranger official documentation.
For more details about Spark in NexusOne, refer to NexusOne’s Spark documentation.
For more details about Trino in NexusOne, refer to NexusOne’s Trino documentation.
