Tech To Do...: 2016

Friday, August 5, 2016

OpenAM 13.0.0 Radius Server Flaw

OpenAM version 13.0.0 was the first version of OpenAM that enabled it to act as a RADIUS server. It has supported being a RADIUS client for many years meaning that validation of authentication can be deferred to an external RADIUS server. With OpenAM now able to act as a RADIUS server, many of its existing authentication modules and the flexibility of its authentication chains can be leveraged by resources that can defer their authentication to a RADIUS server. VPN concentrators are but one example. Where previously my employer's VPN could only defer to an LDAP store, it can now pass authentications to OpenAM and leverage the same multi-factor mechanisms that they support for web authentications.

But inclusion of the RADIUS server functionality in version 13.0.0 of OpenAM inadvertently included a subtle flaw. If an incoming RADIUS request does not contain a username field, the internal thread of OpenAM's RADIUS listener dies. Thereafter, further incoming requests are never handled or answered. The symptoms are obvious: clients fail to get responses back from the server and, if debug logging is enabled for OpenAM's RADIUS server, that log file stops receiving traffic logs.
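To make the failure mode concrete, here is a minimal JavaScript sketch of a request loop that survives a request lacking a username. It is not OpenAM's code (which is Java) and not how ForgeRock patched it; the names handleRequest and dispatch and the packet shape are hypothetical and exist only for illustration.

```javascript
// Hypothetical packet shape:
//   { type: 'ACCESS_REQUEST', attributes: { USER_NAME: '...', USER_PASSWORD: '...' } }
function handleRequest(packet) {
  const username = packet.attributes.USER_NAME;
  // A missing or empty username is an ordinary rejection, not an error.
  if (username === undefined || username === '') {
    return { type: 'ACCESS_REJECT' };
  }
  // Real authentication would happen here; this sketch accepts any username.
  return { type: 'ACCESS_ACCEPT' };
}

// Wrap every request so that an unexpected exception rejects that one request
// instead of terminating the loop that serves all later requests.
function dispatch(packet, handler) {
  try {
    return handler(packet);
  } catch (err) {
    return { type: 'ACCESS_REJECT', error: String(err) };
  }
}
```

The point of the wrapper is that a bug in one handler costs a single response rather than the whole listener.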

Is My Server Susceptible?

You can tell if your server is susceptible by using a client that does not include a username field in the RADIUS request if the username is left empty. The Console Client expected to be included in version 14.0.0 of OpenAM behaves in that way. Until that version is available in the nightly war I've made it and the relevant jars available in a temporary git repo here. You can clone that repo and follow the steps outlined there to get that newer version of ConsoleClient running for use in the steps below.

In my previous post on Using OpenAM 13's Radius Server I showed how to use the Console Client included in OpenAM's war file to test the RADIUS server functionality. As outlined in that post, to expose what was going on in the RADIUS Server, we enabled the Radius loggers at Message level. For clarity I'm adding a bit more detail here on how to do that. If you already know how to do that and where the debug file is located, just skip the next section.

Enabling Radius Loggers

To enable Radius logging at Message level as noted in that post, I first log into OpenAM as amadmin and go to <your-openam-root-path>/Debug.jsp. For example, I deploy in tomcat running on port 8080, with a war file renamed to sso.war, and an entry in my etc/hosts file pointing ident-local.lds.org to 127.0.0.1. So Debug.jsp is found on my box as shown in the image.

In that page select the Radius category and a level of Message and press the Submit button. That leads to the next screen where you must confirm your selection.

Once confirmed, output related to the RADIUS server functionality will begin to appear in a log file having the name of Radius and located under your configuration directory in a path of <config-root>/<webapp-root>/debug. I use a configuration directory of /opt/openam which places my log file at /opt/openam/sso/debug/Radius.

Does It Kill It?

Once you have the RADIUS Server enabled, a RADIUS client defined, have created the radius.properties file and have the newer Console Client running, don't enter a username. Just hit the Enter key and move on to the password prompt. Enter some value for the password and submit. You'll correctly be denied access as shown in my example run. (Ignore that NPE. As noted on the git repo it is innocuous and goes away if a larger set of OpenAM's jars are available.)

java -jar lib/openam-radius-server-14.0.0-SNAPSHOT.jar
? Username:
? Password:

DebugConfiguration:08/05/2016 04:14:42:775 PM MDT: Thread[main,5,main]
'/debugconfig.properties' isn't valid, the default configuration will be used instead: Can't find the configuration file '/debugconfig.properties'.
DebugImpl:08/05/2016 04:14:42:786 PM MDT: Thread[main,5,main]
Can't read debug files map. '. Please check the configuration file '/debugfiles.properties'.
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)

--snipped-stack-trace--

NOTE: username is empty string. Leaving out of request to test error handling.
Packet To 127.0.0.1:1814
ACCESS_REQUEST [1]
- USER_PASSWORD : *******
- NAS_IP_ADDRESS : localhost/127.0.0.1
- NAS_PORT : 0
- NAS_IDENTIFIER : console-client

Packet From 127.0.0.1:1814
ACCESS_REJECT [1]

---> Sorry. Not Authenticated.

Now try again with a valid username or actually with any values at all in the username and password fields. The request is sent to the server, but Console Client never receives any response from the server and therefore appears to hang:

java -jar lib/openam-radius-server-14.0.0-SNAPSHOT.jar
? Username:
? Password:

DebugConfiguration:08/05/2016 04:17:28:355 PM MDT: Thread[main,5,main]
'/debugconfig.properties' isn't valid, the default configuration will be used instead: Can't find the configuration file '/debugconfig.properties'.
DebugImpl:08/05/2016 04:17:28:364 PM MDT: Thread[main,5,main]
Can't read debug files map. '. Please check the configuration file '/debugfiles.properties'.
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)

--snipped-stack-trace--

NOTE: username is empty string. Leaving out of request to test error handling.
Packet To 127.0.0.1:1814
ACCESS_REQUEST [1]
- USER_PASSWORD : *******
- NAS_IP_ADDRESS : localhost/127.0.0.1
- NAS_PORT : 0
- NAS_IDENTIFIER : console-client

If you look in the RADIUS debug log at /opt/openam/sso/debug/Radius the last 100 lines or so will include your incoming packet like so:

amRadiusServer:08/05/2016 04:14:42:795 PM MDT: Thread[pool-8-thread-1,5,main]: TransactionId[683ced09-8e8b-48dd-b05e-aca40583cf05-27]
WARNING:
Packet from local:
ACCESS_REQUEST [1]
- USER_PASSWORD : *******
- NAS_IP_ADDRESS : /127.0.0.1
- NAS_PORT : 0
- NAS_IDENTIFIER : console-client

And not far after that you'll see a NullPointerException that looks like this:

amRadiusServer:08/05/2016 04:14:42:801 PM MDT: Thread[pool-8-thread-1,5,main]: TransactionId[683ced09-8e8b-48dd-b05e-aca40583cf05-27]
ERROR: Exception occured while handling radius request for RADIUS client 'local'. Rejecting access.
java.lang.NullPointerException
at org.forgerock.openam.radius.server.spi.handlers.OpenAMAuthHandler.startAuthProcess(OpenAMAuthHandler.java:726)

--snipped-stack-trace--

And you'll see the returned RADIUS response. But no log entries will appear thereafter and the RADIUS listener thread has died.
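If you would rather not eyeball the log, a small Node.js helper can look for the signature shown above. This is a sketch that assumes the log layout matches the excerpts in this post (a header line starting with amRadiusServer: and carrying a TransactionId, followed by the exception lines); the function name findSuspectTransactions is my own.

```javascript
// Scan debug-log text for a NullPointerException thrown from
// OpenAMAuthHandler.startAuthProcess and return the transaction ids involved.
function findSuspectTransactions(logText) {
  const found = [];
  let currentTx = null;
  let sawNpe = false;
  for (const line of logText.split(/\r?\n/)) {
    const header = line.match(/^amRadiusServer:.*TransactionId\[([^\]]+)\]/);
    if (header) {
      // A new log entry starts: remember its transaction id.
      currentTx = header[1];
      sawNpe = false;
    } else if (line.includes('java.lang.NullPointerException')) {
      sawNpe = true;
    } else if (sawNpe && line.includes('OpenAMAuthHandler.startAuthProcess')) {
      if (currentTx !== null && !found.includes(currentTx)) {
        found.push(currentTx);
      }
      sawNpe = false;
    }
  }
  return found;
}
```

On my box I would feed it the file contents with something like findSuspectTransactions(require('fs').readFileSync('/opt/openam/sso/debug/Radius', 'utf8')).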

How to Fix It?

Patches to fix this issue are available from ForgeRock Support for both 13.0.0 and 13.5.0. That fix will also be included in 14.0.0 when it moves to GA. Once you've applied the patch you'll still see the NPE, but it won't damage the RADIUS listener and all further requests will continue to receive responses from the server.

Enjoy.

Thursday, February 18, 2016

Revealing OpenAM's Policy Enforcement Model

In my post on migrating OAM 10g policies to OpenAM I promised to outline how I clarified my understanding of OpenAM's policy enforcement model. The policy evaluation endpoint was the quickest way to test. But right up front you quickly learn that an authenticated user token is required. If it isn't included your results are always negative. So to test I always have an authenticated user token.

When considering policy decisions I have the following variables to consider after assuming that a request URL matches a given policy's resource URL.

subject requirement is satisfied
condition exists
condition requirement is satisfied
corresponding action is enabled
action allows access

So five variables initially. If I start with a policy that has no condition then this drops to three variables.

First Test: No Conflicting Action Outcomes

If we look at this as a truth table with zero being false and one being true, we then need to test the following combinations and see what their outcomes are. For the subject satisfied variable I used a user and group subject with user = demo, and I had two users in OpenAM with usernames of demo and test. As noted to the left of the table, the test user represents subject satisfied being false and the demo user represents it being true.

I also noted that when an action matching that of a request is not enabled the policy essentially doesn't match and makes no contribution to the outcome. What that meant for output results I didn't know prior to the test.

action allows --------------+
action enabled ---------+   |
subject satisfied --+   |   |
                    |   |   |
                    v   v   v
- - - - - - - - - +---+---+---+
                  | 0 | 0 | 0 | no match
                  +---+---+---+
test user         | 0 | 0 | 1 | no match
                  +---+---+---+
                  | 0 | 1 | 0 | deny
                  +---+---+---+
                  | 0 | 1 | 1 | allow
- - - - - - - - - +---+---+---+
                  | 1 | 0 | 0 | no match
                  +---+---+---+
demo user         | 1 | 0 | 1 | no match
                  +---+---+---+
                  | 1 | 1 | 0 | deny
                  +---+---+---+
                  | 1 | 1 | 1 | allow
- - - - - - - - - +---+---+---+

For testing deny and allow I crafted the following two policies.

URL: http://test.org/allow

subject: user = demo

actions: GET, POST, PUT enabled, allow checked.

URL: http://test.org/deny

subject: user = demo

actions: GET, PUT enabled, deny checked; POST enabled, allow checked

Then with a nodejs script that reads a file of usernames and passwords and another file that specifies a list of URLs, I evaluated access via the policy evaluation rest endpoint for test and demo against the following URLs. That third one was intentionally meant not to match any policy to see what results it provided.

http://test.org/allow
http://test.org/deny
http://test.org/no-match
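For anyone wanting to build a similar tester, the request to the policy evaluation endpoint can be sketched as below. This is not the source of PT.js. The endpoint path (/json/policies?_action=evaluate), the body fields (resources, application, subject.ssoToken) and the default application name are my recollection of the OpenAM 12 REST documentation and should be treated as assumptions to verify against your version; the function name is hypothetical.

```javascript
// Build (but do not send) a policy evaluation request.
// ASSUMPTION: endpoint path and body shape follow the OpenAM 12 REST docs as
// I remember them; check them against your deployment before relying on this.
function buildEvaluateRequest(baseUrl, cookieName, adminToken, userToken, urls,
                              application = 'iPlanetAMWebAgentService') {
  return {
    url: baseUrl + '/json/policies?_action=evaluate',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // The admin token authorizes the call itself.
      [cookieName]: adminToken
    },
    body: JSON.stringify({
      resources: urls,
      application: application,
      // The user whose access is being evaluated.
      subject: { ssoToken: userToken }
    })
  };
}
```

Keeping the builder free of network calls makes it easy to loop over the users file and the URLs file and hand each request to whatever HTTP client you prefer.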

The console output from running the script is shown below. The additional authentication is for the admin user that can hit the rest endpoint. In following output I'll only include the json results.

localhost:policy-tester boydmr$ node PT.js
loading ptConf.json
using cookie name of: iPlanetDirectoryPro
loading users file: ./users.json
loading policies file: urls.json
---> calling: http://ident-local.lds.org:8080/sso/json/authenticate
---> Acquired token.
{
  "user": "demo",
  "result": [
    {
      "http://test.org/allow": {
        "POST": true,
        "GET": true,
        "PUT": true
      }
    },
    {
      "http://test.org/no-match": null
    },
    {
      "http://test.org/deny": {
        "POST": true,
        "GET": false,
        "PUT": false
      }
    }
  ]
}
{
  "user": "test",
  "result": [
    {
      "http://test.org/allow": null
    },
    {
      "http://test.org/no-match": null
    },
    {
      "http://test.org/deny": null
    }
  ]
}

What I learned from this test is:

Only methods that are enabled are returned in the result set.
Where an action is marked to deny, a value of false for that action is returned.
Where an action is marked to allow, a value of true for that action is returned.
Where there is no policy matching a request either due to subject not satisfied or URL not matching a null object is returned.
When a null object is returned or a value of false is returned for an action the result should be to deny access to the request.
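Those rules boil down to a very small decision function. The sketch below takes the value found under a URL in the output above (either null or a map of method to boolean). Treating a method that is absent from the map as a deny is my own assumption, chosen because only an explicit true ever grants access in these results; the name isAllowed is hypothetical.

```javascript
// actions: null, or an object such as { POST: true, GET: false, PUT: false }
function isAllowed(actions, method) {
  // No policy matched (subject not satisfied or URL not matching).
  if (actions === null || actions === undefined) {
    return false;
  }
  // Only an explicit true allows; false or an absent method denies.
  return actions[method] === true;
}
```

For example, with the demo user's result for http://test.org/deny, isAllowed(actions, 'POST') is true while isAllowed(actions, 'GET') is false.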

Second Test: When Action Outcomes Conflict

I then wanted to find out what happens when two policies match on URL, subject, and enabled action but the outcomes of that action conflict: allow versus deny. I created the following policy and ran the test again. As the results show, POST on http://test.org/allow is now false for the demo user: when allow and deny conflict, deny wins.

URL: http://test.org/all*

subject: user = demo

actions: POST enabled, deny checked.

{
  "user": "test",
  "result": [
    {
      "http://test.org/allow": null
    },
    {
      "http://test.org/no-match": null
    },
    {
      "http://test.org/deny": null
    }
  ]
}
{
  "user": "demo",
  "result": [
    {
      "http://test.org/allow": {
        "POST": false,
        "GET": true,
        "PUT": true
      }
    },
    {
      "http://test.org/no-match": null
    },
    {
      "http://test.org/deny": {
        "POST": true,
        "GET": false,
        "PUT": false
      }
    }
  ]
}

Third Test: Resource Matches, Subject Doesn't

I changed the test-conflict policy to match the test user in the subject. The results now show as follows. Note that the conflict is no longer occurring and POST for the demo user is now true.

URL: http://test.org/all*

subject: user = test

actions: POST enabled, deny checked.

{
  "user": "demo",
  "result": [
    {
      "http://test.org/allow": {
        "POST": true,
        "GET": true,
        "PUT": true
      }
    },
    {
      "http://test.org/no-match": null
    },
    {
      "http://test.org/deny": {
        "POST": true,
        "GET": false,
        "PUT": false
      }
    }
  ]
}
{
  "user": "test",
  "result": [
    {
      "http://test.org/allow": {
        "POST": false
      }
    },
    {
      "http://test.org/no-match": null
    },
    {
      "http://test.org/deny": null
    }
  ]
}

Final Test: Using Conditions & Narrowing Access

Once I understood OpenAM's policy matching and enforcement model, I turned to one last hurdle that we had to solve before we could migrate our policies from OAM to OpenAM. When a narrowly scoped policy sits at a high point such as the root of a domain, like test.org/*, and a policy lower down, such as test.org/stuff/*, broadens access, the outcome is answered above: the actions are aggregated, so the broader audience does have access, but only in the lower URL space.

But what of the opposite? Can we have a broad subject set at the top such as Authenticated Users and narrow scope below to a smaller set of users perhaps via a condition? As I thought about this the deny outcome of actions seemed to suddenly jump out at me. I could define two policies at that lower location. The first would grant access to the narrowed set. That alone would make no difference since the higher policy's outcomes would be additive and everyone would have access. But if the higher policy weren't there we would need it to explicitly open up access for the target audience. The key to answering this puzzle was the second policy. It was identical except in two aspects: it would wrap the condition in a logical NOT and change the action outcome to deny.

To test this I crafted the following three policies.

URL: http://narrow.org/*

subject: Authenticated Users

actions: GET enabled, allow checked.

URL: http://narrow.org/stuff/*

subject: Authenticated Users

condition: ldap-filter: (role=admin)

actions: GET enabled, allow checked.

URL: http://narrow.org/stuff/*

subject: Authenticated Users

condition: NOT( ldap-filter: (role=admin) )

actions: GET enabled, deny checked.
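The allow/deny pairing used by the last two policies is mechanical enough to generate. Here is a sketch, assuming a purely illustrative policy object shape (url, subject, condition, actions) that is not OpenAM's policy JSON; narrowingPolicies is a hypothetical helper name.

```javascript
// Given a URL, subject, condition and action, produce the pair of policies
// that narrows access: one allows when the condition holds, the other
// denies when it does not.
function narrowingPolicies(url, subject, condition, action) {
  return [
    { url: url, subject: subject, condition: condition, actions: { [action]: 'allow' } },
    { url: url, subject: subject, condition: { not: condition }, actions: { [action]: 'deny' } }
  ];
}
```

Because deny wins over allow, the second policy overrides whatever the broader policy higher up would otherwise grant to users who fail the condition.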

The problem is I needed to have an ldap server to test these policies. Turns out this was simple to provide.

A Hand Crafted LDAP Server

I just happened to have run across a sweet nodejs library not too long ago that came to mind. It is ldapjs found at http://ldapjs.org/index.html. In about 30 minutes I had a prototype mock LDAP server whose source is included at the end of this post. And ldapsearch thought it was talking to a real ldap server. (See the comment section of the source.) The question was would OpenAM?

Activating LDAP Filter Use in Policy Evaluation

To use LDAP filters in policies in a realm OpenAM must first be told where the LDAP server is located to which it can defer evaluation of those filters. This is done on the realm in its Policy Configuration service. I configured mine with the following values based upon the LDAP server source for the bind user, object classes for users, and the search attribute.

Primary LDAP Server: 127.0.0.1:1389
User base DN: o=lds
Roles base DN: o=lds
Bind DN: cn=root
Bind Password: secret
Org search filter: (objectclass=sunismanagedorganization)
User search filter: (objectclass=inetorgperson)
User search scope: SUB
Role search scope: SUB
User search attribute: cn
all other values left as defaults.

Once it was running I was ready to try my test. When the policy evaluations executed the calls to the LDAP server appeared on its console indicating as desired that only the demo user matched the filter.

localhost:ldapjs-eval boydmr$ node server.js
users = {
  "demo": {
    "dn": "cn=demo, ou=users, o=lds",
    "attributes": {
      "cn": "demo",
      "uid": "demo",
      "role": "admin",
      "objectclass": "inetorgperson"
    }
  },
  "test": {
    "dn": "cn=test, ou=users, o=lds",
    "attributes": {
      "cn": "test",
      "uid": "test",
      "objectclass": "inetorgperson"
    }
  }
}
test LDAP server up at: ldap://0.0.0.0:1389
-- bind: cn=root
-- bind: cn=root
-- C-0 search: (&(objectclass=inetorgperson)(cn=demo)(role=admin))
-- C-0 demo = true
-- C-1 search: (&(objectclass=inetorgperson)(cn=test)(role=admin))
-- bind: cn=root

-- C-2 search: (&(objectclass=inetorgperson)(cn=test)(role=admin))
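The search lines in that console output show how the filter that reaches the LDAP server is put together: the configured user search filter, the search attribute paired with the username, and the policy's own filter, all ANDed. A sketch of that composition follows. The function names are mine, and the escaping step follows RFC 4515; whether OpenAM escapes the username this way is an assumption I have not verified.

```javascript
// Escape characters that are special inside an LDAP filter value (RFC 4515).
function escapeFilterValue(value) {
  return value.replace(/[\\*()\0]/g, function (ch) {
    return '\\' + ch.charCodeAt(0).toString(16).padStart(2, '0');
  });
}

// Combine the realm's user search filter, the search attribute/username
// pair, and the policy condition's filter into a single AND filter.
function composeSearchFilter(userSearchFilter, searchAttribute, username, policyFilter) {
  return '(&' + userSearchFilter +
    '(' + searchAttribute + '=' + escapeFilterValue(username) + ')' +
    policyFilter + ')';
}
```

With the configuration values above, composeSearchFilter('(objectclass=inetorgperson)', 'cn', 'demo', '(role=admin)') yields the same string that shows up in the C-0 search line.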

Conclusions

The results of the test are shown below. Note that we do indeed narrow access to the demo user, who has a role of admin, while the test user, who does not, is denied. By defining these tests and then gathering their results from the policy evaluation rest endpoint I now have a clearly defined path for migrating our policies from OAM to OpenAM. Two key aspects of that strategy that I have not covered are how authorization expressions are translated and how OAM's URLs are translated in view of their richer meta character patterns. I'll see if I can't cover those topics soon.

{
  "user": "demo",
  "result": [
    {
      "http://narrow.org/stuff/for/admins": {
        "GET": true
      }
    },
    {
      "http://narrow.org/something": {
        "GET": true
      }
    }
  ]
}
{
  "user": "test",
  "result": [
    {
      "http://narrow.org/stuff/for/admins": {
        "GET": false
      }
    },
    {
      "http://narrow.org/something": {
        "GET": true
      }
    }
  ]
}

LDAP Server Source

/**
 * Quick and simple ldap server with two users having cn of: demo, and test. You can search for any of
 * them with:
 *
 * ldapsearch -H ldap://localhost:1389 -x -D cn=root -w secret -LLL -b "o=lds" cn=<cn-here>
 *
 * Or search for all of them with:
 *
 * ldapsearch -H ldap://localhost:1389 -x -D cn=root -w secret -LLL -b "o=lds" objectclass=*
 *
 * Created by boydmr on 2/4/16.
 */
var ldap = require('ldapjs');
//var express = require('expressjs');
var server = ldap.createServer();

// map of user attributes by cn
var users = {};
var cn = 'demo';

function createUser(cn, role) {
  users[cn] = {
    dn: 'cn=' + cn + ', ou=users, o=lds',
    attributes: {
      cn: cn,
      uid: cn,
      role: role,
      objectclass: 'inetorgperson'
    }
  };
}

createUser('demo', 'admin');
createUser('test');

console.log("users = " + JSON.stringify(users, null, 2));

// add support for binding
server.bind('cn=root', function(req, res, next) {
  if (req.dn.toString() !== 'cn=root' || req.credentials !== 'secret') {
    // log failed attempts to bind
    console.log("!    bind: " + req.dn.toString() + " pwd: " + req.credentials);
    return next(new ldap.InvalidCredentialsError());
  }

  // log that we were bound to successfully and show username
  console.log("--   bind: " + req.dn.toString());
  res.end();
  return next();
});

// filter that ensures we only handle non-anonymous requests
function authorize(req, res, next) {
  if (!req.connection.ldap.bindDN.equals('cn=root'))
    return next(new ldap.InsufficientAccessRightsError());

  return next();
}

var connId = 0;

server.search('o=lds', authorize, function(req, res, next) {
  var cid = 'C-' + connId++;
  console.log("-- " + cid + " search: " + req.filter.toString());

  Object.keys(users).forEach(function(k) {
    if (req.filter.matches(users[k].attributes)) {
      console.log("-- " + cid + "         " + k + " = true");
      res.send(users[k]);
    }

  });

  res.end();
  return next();
});

// now start listening
server.listen(1389, function() {
  console.log('test LDAP server up at: %s', server.url);
});

Wednesday, February 17, 2016

Migrating OAM Policies to OpenAM

The policy enforcement models differ significantly between Oracle Access Manager 10g and OpenAM. To be clear, I'm referring only to using reverse proxy web gates (OAM's term) and agents (OpenAM's term) that protect access via policies to back-end application servers. In this post I'll start outlining how to migrate policies from OAM 10g to OpenAM. And for the record, I'm going to admit to some conceptual mistakes in this post and some may say, "Hey, why didn't you read the documentation?" And the answer is, I did.

Finding a Policy for a Request

Let's first discuss how a policy or policies are found to determine access for a given inbound request.

The OAM 10g Model

OAM uses policy domains, which are containers of policies. Each has associated host identifier objects that can have aliases and can include a port. Policy domains also have URL prefixes that serve as the base for the relative URLs of the policies they contain. Policies themselves have separate fields for relative URL, query string, and query parameter, of which all, some, or just one can be specified, and there is a rich set of meta character patterns as well. Furthermore, a policy has an authentication scheme. We only used two types: one that required authentication and an anonymous one that didn't, which was the root of my first big mistake in the plan for migrating our policies. All of these policy aspects combine to define the URL space that a policy domain object and its policies protect and the authentication state that is required.

When finding a policy for a given request URL, OAM first looks for policy domains that match the host and then picks the most specific of these by taking the one with the longest prefix that matches the request's URL. At that point only URLs of policies within that policy domain container will be consulted for one that matches. OAM allows you to order those policies, and the first policy to match the request's URL is the only policy that will then be used to either allow or deny access.
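That lookup can be sketched in a few lines of JavaScript. The data shapes here are hypothetical: each policy domain carries a name, a list of hosts, a prefix and an ordered list of policies, and each policy carries a matches function standing in for OAM's URL, query string and meta character matching, which I make no attempt to reproduce. What OAM does when no policy inside the chosen domain matches is left out of the sketch.

```javascript
// Returns { domain, policy } for the request, or null when no domain matches.
function findOamPolicy(domains, host, path) {
  // 1. Keep only domains for this host whose prefix matches the path.
  const candidates = domains.filter(function (d) {
    return d.hosts.includes(host) && path.startsWith(d.prefix);
  });
  if (candidates.length === 0) {
    return null;
  }
  // 2. The most specific domain is the one with the longest prefix.
  const domain = candidates.reduce(function (best, d) {
    return d.prefix.length > best.prefix.length ? d : best;
  });
  // 3. Policies are ordered; the first match is the only one consulted.
  const relative = path.slice(domain.prefix.length);
  const policy = domain.policies.find(function (p) {
    return p.matches(relative);
  });
  return { domain: domain.name, policy: policy ? policy.name : null };
}
```

The important contrast with OpenAM is step 3: exactly one policy decides the outcome.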

With that background let's now look at the corresponding parts of OpenAM's model and how my understanding of OAM's model initially tripped me up on my first approach to migrating our policies into OpenAM. Hopefully that can help someone avoid the same mistakes.

The OpenAM 12.X Model

Again to be clear, I am discussing only OpenAM version 12.X and believe that there are no changes to the policy model in version 13.X save for moving from multiple resource URLs per policy to a single resource URL per policy. So for migration we chose to have a single resource URL per policy in OpenAM so that we would have no issues when updating to 13.

OpenAM uses application objects as containers of policies. These have base URLs to which policy resource URLs must be relative. And where OAM has three fields that combine for the policy's URL space OpenAM only provides the resource URL. And OAM implicitly applies policies to requests with and without queries subject to any query string and query parameters if specified while OpenAM must have two separate policies for the exact same URL if we want both the query-less request and the same request having query strings to make it through. With this requirement alone we anticipate that our number of policies will double at a minimum. But I'll outline some mitigation mechanisms in a follow-up post.

Oh, and by the way, if you are using the very cool rest endpoints in OpenAM to manage your policies you'll find that resource URLs are not relative. They are absolute. So the "relative" nature may solely be a UI enforced feature. Have I tried setting a policy's resource URL to something that isn't relative to its containing application object? I have not. Our tool for migrating just honors that contract.

And finally, before discussing finding a policy in OpenAM for a given request URL I need to explain a minor aspect of OpenAM's Web Agents. They can only point to a single OpenAM application object. Since we have a single cluster of reverse proxy agents that share identical configuration via an agent group this means that all of our policies must be contained in a single OpenAM application object. That is very important.

To find policies for a given request URL, OpenAM first looks at all policies in the application object pointed to by the web agent, which happens to be all of them in our case, so no narrowing there. I'm hoping we can change that at some point in a straightforward, backwards compatible way as I'll explain later. Next, it finds all policies whose resource URLs match the request's URL. All of them. Not just the first, since OpenAM doesn't support ordering of policies. And access is only allowed if all policies allow the request. But it turns out that statement isn't accurate enough and this is where I made a second huge mistake.

Policy Matching

In OAM the identified policy either allows access or denies access based upon its authorization expression that I won't go into. Actually, the result of evaluating the expression can be either allow, deny, or inconclusive and can each be handled differently. But for us, inconclusive meant deny and we treated it as such.

Things are different in OpenAM. I assumed that once the set of policies matching the request's URL was identified, access was allowed or denied based upon two additional aspects of OpenAM policies that seemed to correlate to the authorization expression concept in OAM, namely subjects and conditions.

Subjects that are supported natively include Authenticated Users, Never Match (what good is that???), users and groups, jwt claims, and logical combinators NOT, AND, and OR. And that brings me to my first big mistake. Authenticated Users mapped nicely to OAM's login scheme mechanism. But where was anonymous access? Then I found in one piece of documentation that a NOT wrapping a Never Match would cause the policy to always match, which I assumed was my elusive anonymous access. But it wasn't.

In OpenAM's policy engine there is no concept of an unauthenticated user. You only get to policy evaluation if you are already authenticated even if it is with what is known as the "anonymous" module that doesn't prompt for any user credentials. The key here is OpenAM's powerful concept of authentication levels and how you map them to mean different things such as 0 is an authenticated but anonymous user while 10 is a user who authenticated with a username and password and 20 is a user who used a 2nd factor mechanism.

That is why the unenforced url list is part of agent configuration and not policy configuration. If a request's URL matches an entry on the unenforced url list then the agent lets it through. If it doesn't, the agent redirects to the login page and only after you've authenticated and been redirected back to the original request URL will the agent then confer with the policy engine to find matching policies to apply to the request.
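The order of those checks is the part that tripped me up, so here it is as a sketch. The function and field names are hypothetical, the unenforced list is compared by exact string match purely to keep the example short, and evaluatePolicies stands in for the call out to the policy engine.

```javascript
// request: { url: string, token?: string }
// evaluatePolicies: function(request) -> boolean (stand-in for the policy engine)
function agentDecision(request, unenforcedUrls, evaluatePolicies) {
  // 1. Unenforced URLs never reach authentication or the policy engine.
  if (unenforcedUrls.includes(request.url)) {
    return 'pass';
  }
  // 2. Everything else requires an authenticated session first.
  if (!request.token) {
    return 'redirect-to-login';
  }
  // 3. Only now are policies consulted.
  return evaluatePolicies(request) ? 'pass' : 'deny';
}
```

Notice that the policy engine is never asked about a user who has not authenticated, which is why there is no unauthenticated subject to write policies for.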

Now for conditions. Conditions add further requirements beyond the subject. A policy doesn't have to have any conditions in which case only the subject is considered. And this brings me to that second mistake. I assumed that if the subject or condition, if included, didn't apply to the current user that the policy denied access. And remember that if one policy denies access then the request is denied.

And this is totally wrong and why this section is titled Policy Matching. The subject and condition aspects are not used to deny and allow access. They are part of the matching to winnow down further the list of policies that apply to the request. If the user doesn't meet the requirements of the subject and condition, if included, that policy is simply dropped from the candidate list of policies to be applied to the request. And that brings us to one final aspect of policies.

Policy Actions

OAM provided a list of HTTP methods with check boxes that indicated if a policy applied to GET, POST, PUT, etc. OpenAM provides a similar list called actions, also with a checkbox for each. But it also provides a pair of radio buttons, one for Allow and one for Deny. Only when the policies have been further winnowed down via subjects and conditions does OpenAM then attempt to allow or deny a request. It does so by taking the HTTP method of the request and looking to see which of the remaining policies in the list include that action by having that action's check box checked. For each of those it then sees if that policy has Allow or Deny selected. If any have Deny selected then the request is denied. If all have Allow selected then the request is allowed.
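Putting matching and actions together, the model as I now understand it can be sketched as follows. This is my reading of the behavior, not OpenAM's implementation. The policy shape (url, subject and condition as predicate functions, actions as a map of method to 'allow' or 'deny') is hypothetical, and the wildcard handling is simplified to treating * as "any characters".

```javascript
// Convert a resource pattern with '*' wildcards into a RegExp (simplified).
function patternToRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except '*'
    .replace(/\*/g, '.*');                 // '*' matches any run of characters
  return new RegExp('^' + escaped + '$');
}

// Returns null when no policy applies, otherwise true (allow) or false (deny).
function evaluate(policies, request) {
  const applicable = policies
    // URL, subject and condition only decide whether a policy applies at all.
    .filter(function (p) { return patternToRegExp(p.url).test(request.url); })
    .filter(function (p) { return p.subject(request.user); })
    .filter(function (p) { return !p.condition || p.condition(request.user); })
    // A policy contributes only if the request's method is an enabled action.
    .filter(function (p) { return request.method in p.actions; });
  if (applicable.length === 0) {
    return null;
  }
  // Deny wins: every contributing policy must allow.
  return applicable.every(function (p) {
    return p.actions[request.method] === 'allow';
  });
}
```

A null result and a false result both end with the request being denied; keeping them distinct just makes it easier to see why.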

How am I certain about all of this? I crafted a number of test scenarios with related policies, subjects, conditions, and related users with specific characteristics that would meet or fail those conditions and tested each with the policy evaluation rest endpoint to obtain the conclusive results. I'll cover these in my next post.

Wednesday, January 20, 2016

Fixing Your Custom Auth Modules In Open AM 13

We've been kicking the tires of Open AM 13 in several different ways, particularly since it includes Radius Server support and we'll be migrating our custom version to this native version. One unrelated aspect of Open AM 13 that is proving problematic was discovered by Dave Bennet of Nulli while he was doing some investigative work for us. The gist of the problem is this:

Any custom authentication modules in Open AM 12 will be broken in Open AM 13

once you've completed your migration. But don't worry. There is a straightforward way to fix the issue. In fact, the fix is so straightforward it is surprising that it didn't make it as part of the automatic upgrade. Maybe it will make it in an early patch.

"Not found error" & "Resource XXX not found"

The characteristics of the bug can be seen in the new XUI console. I'll not be covering creation and deployment of custom authentication modules in much detail below. I'm assuming you know that stuff already. If not, that is covered in the Open AM documentation. Specifically, the developer's guide.

But suppose you've been using such custom modules or have just deployed and registered such a module. When viewing an existing module instance in a realm you'll receive an error message that isn't very helpful:

Similarly, when attempting to create new instances of your custom module you'll see this same message again but will first be presented with a different message:

This first message is much more useful for helping to identify the underlying problem. To explain why this one is more useful, you need to know that the module's service descriptor file, previously registered with Open AM via ssoadm or ssoadm.jsp, looks like the following. Every module has one of these and this module was working just fine prior to the upgrade. (To be clear, in all of these xml snippets below I've added white space to make it more comprehensible by us humans.)

<?xml version="1.0" encoding="UTF8"?>
<ServicesConfiguration>
<Service name="iPlanetAMAuthAuthLevelSetService" version="1.0">
<Schema i18nFileName="amAuthAuthLevelSet"
i18nKey="authlevelsetservicedescription" revisionNumber="10"
serviceHierarchy="/DSAMEConfig/authentication/iPlanetAMAuthAuthLevelSetService">
<Organization>

<AttributeSchema cosQualifier="default" i18nKey="a500"
isSearchable="no" listOrder="natural"
name="iplanetamauthauthlevelsetauthlevel" rangeEnd="2147483647"
rangeStart="0" syntax="number_range" type="single">
<DefaultValues><Value>1</Value></DefaultValues>
</AttributeSchema>

<AttributeSchema cosQualifier="default" i18nKey="a501"
isSearchable="no" listOrder="natural" name="upgradedAuthLevel"
syntax="string" type="single" validator="no"/>

<SubSchema inheritance="multiple" maintainPriority="no"
name="serverconfig" supportsApplicableOrganization="no"
validate="yes">

<AttributeSchema cosQualifier="default" i18nKey="a500"
isSearchable="no" listOrder="natural"
name="iplanetamauthauthlevelsetauthlevel" rangeEnd="2147483647"
rangeStart="0" syntax="number_range" type="single">
<DefaultValues><Value>1</Value></DefaultValues>
</AttributeSchema>

<AttributeSchema cosQualifier="default" i18nKey="a501"
isSearchable="no" listOrder="natural" name="upgradedAuthLevel"
syntax="string" type="single" validator="no"/>

</SubSchema>
</Organization>
</Schema>
</Service>
</ServicesConfiguration>

Note the highlighted value of the name attribute of the Service element. It holds the iPlanetAMAuthAuthLevelSetService value shown in the error. That points us to this service descriptor file as being a problem. Now it had been registered and was working previous to the upgrade. So why fail now?

An Undocumented "resourceName" Attribute

With this hint, Dave looked at the service descriptor content stored within OpenDJ for an authentication module native to OpenAM, both before the upgrade and afterward. Specifically, he used OpenAM's native DataStore module. Its descriptor in OpenAM 12 looked like the following:

<?xml version="1.0" encoding="UTF8"?>
<ServicesConfiguration>
<Service name="sunAMAuthDataStoreService" version="1.0">
<Schema i18nFileName="amAuthDataStore"
i18nKey="sunAMAuthDataStoreServiceDescription" revisionNumber="10"
serviceHierarchy="/DSAMEConfig/authentication/sunAMAuthDataStoreService">
<Organization>

<AttributeSchema cosQualifier="default" i18nKey="a500"
isSearchable="no" listOrder="natural"
name="sunAMAuthDataStoreAuthLevel" rangeEnd="2147483647"
rangeStart="0" syntax="number_range" type="single">
<DefaultValues><Value>0</Value></DefaultValues>
</AttributeSchema>

<AttributeSchema cosQualifier="default" i18nKey=""
isSearchable="no" listOrder="natural"
name="iplanetamauthldapinvalidchars" syntax="string" type="list">
<DefaultValues><Value>*|(|)|&amp;|!</Value></DefaultValues>
</AttributeSchema>

<SubSchema inheritance="multiple" maintainPriority="no"
name="serverconfig" supportsApplicableOrganization="no"
validate="yes">

<AttributeSchema cosQualifier="default" i18nKey="a500"
isSearchable="no" listOrder="natural"
name="sunAMAuthDataStoreAuthLevel" rangeEnd="2147483647"
rangeStart="0" syntax="number_range" type="single">
<DefaultValues><Value>0</Value></DefaultValues>
</AttributeSchema>

<AttributeSchema cosQualifier="default" i18nKey=""
isSearchable="no" listOrder="natural"
name="iplanetamauthldapinvalidchars" syntax="string" type="list">
<DefaultValues><Value>*|(|)|&amp;|!</Value></DefaultValues>
</AttributeSchema>

</SubSchema>
</Organization>
</Schema>
</Service>
</ServicesConfiguration>

Once the upgrade was completed, instances of this module worked just fine, and new instances could be created without any error messages. The cause became clear when we compared the content above with what was found within OpenDJ after the upgrade. The upgrade had modified it, adding a resourceName attribute in three places:

<?xml version="1.0" encoding="UTF8"?>
<ServicesConfiguration>
<Service name="sunAMAuthDataStoreService" version="1.0">
<Schema i18nFileName="amAuthDataStore"
i18nKey="sunAMAuthDataStoreServiceDescription" revisionNumber="10"
serviceHierarchy="/DSAMEConfig/authentication/sunAMAuthDataStoreService">
<Organization>

<AttributeSchema cosQualifier="default" i18nKey="a500"
isSearchable="no" listOrder="natural"
name="sunAMAuthDataStoreAuthLevel" rangeEnd="2147483647"
rangeStart="0" resourceName="authenticationLevel"
syntax="number_range" type="single">
<DefaultValues><Value>0</Value></DefaultValues>
</AttributeSchema>

<AttributeSchema cosQualifier="default" i18nKey=""
isSearchable="no" listOrder="natural"
name="iplanetamauthldapinvalidchars" syntax="string" type="list">
<DefaultValues><Value>*|(|)|&amp;|!</Value></DefaultValues>
</AttributeSchema>

<SubSchema inheritance="multiple" maintainPriority="no"
name="serverconfig" resourceName="USEPARENT"
supportsApplicableOrganization="no"
validate="yes">

<AttributeSchema cosQualifier="default" i18nKey="a500"
isSearchable="no" listOrder="natural"
name="sunAMAuthDataStoreAuthLevel" rangeEnd="2147483647"
rangeStart="0" resourceName="authenticationLevel"
syntax="number_range" type="single">
<DefaultValues><Value>0</Value></DefaultValues>
</AttributeSchema>

<AttributeSchema cosQualifier="default" i18nKey=""
isSearchable="no" listOrder="natural"
name="iplanetamauthldapinvalidchars" syntax="string" type="list">
<DefaultValues><Value>*|(|)|&amp;|!</Value></DefaultValues>
</AttributeSchema>

</SubSchema>
</Organization>
</Schema>
</Service>
</ServicesConfiguration>
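
The before-and-after comparison can also be done mechanically. The following is a minimal sketch in Python, not something we actually ran during the investigation. It assumes both versions of the descriptor have the same element structure (true for the DataStore descriptor here), and the function name added_attributes and the trimmed-down BEFORE and AFTER samples are made up for illustration.

```python
import xml.etree.ElementTree as ET


def added_attributes(before_xml, after_xml):
    """Return (tag, name, attribute, value) for attributes found only in after_xml.

    Assumption: both documents contain the same elements in the same order,
    so they can be walked side by side in document order.
    """
    before = list(ET.fromstring(before_xml).iter())
    after = list(ET.fromstring(after_xml).iter())
    if len(before) != len(after):
        raise ValueError("documents differ in element structure")
    found = []
    for old, new in zip(before, after):
        for key in sorted(set(new.attrib) - set(old.attrib)):
            found.append((new.tag, new.get("name"), key, new.attrib[key]))
    return found


# Trimmed-down stand-ins for the two descriptors shown above.
BEFORE = """<Service name="sunAMAuthDataStoreService" version="1.0">
  <AttributeSchema name="sunAMAuthDataStoreAuthLevel" syntax="number_range"/>
  <SubSchema name="serverconfig" validate="yes"/>
</Service>"""

AFTER = """<Service name="sunAMAuthDataStoreService" version="1.0">
  <AttributeSchema name="sunAMAuthDataStoreAuthLevel"
      resourceName="authenticationLevel" syntax="number_range"/>
  <SubSchema name="serverconfig" resourceName="USEPARENT" validate="yes"/>
</Service>"""

for row in added_attributes(BEFORE, AFTER):
    print(row)
```

Run against the full descriptors, a comparison like this should list only the resourceName additions, provided the upgrade changed nothing else.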

As of publishing time this new attribute is not documented anywhere that I can see. The upgrade added it only to the AttributeSchema associated with the module's authentication level and to the service's SubSchema element. Hopefully, its definition, semantics, and use will be documented by ForgeRock at some point. Until then, the way to fix your authentication modules is now apparent.
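
For a custom module, the same change can be sketched in Python. This is only an illustration under assumptions: it reuses the values the upgrade gave the native DataStore module (authenticationLevel and USEPARENT), which may or may not be right for every module since the attribute is undocumented, and the function name add_resource_names is made up. Note also that ElementTree drops the XML declaration and any DOCTYPE when serializing, so those would need to be put back by hand.

```python
import xml.etree.ElementTree as ET


def add_resource_names(descriptor_xml, auth_level_attr):
    """Add the resourceName attributes the upgrade added to the native module.

    auth_level_attr is the name of the AttributeSchema that holds the
    module's authentication level. Existing resourceName values are kept.
    Assumption: the values below are copied from the upgraded DataStore
    descriptor and are not confirmed by any documentation.
    """
    root = ET.fromstring(descriptor_xml)
    for elem in root.iter():
        if elem.tag == "AttributeSchema" and elem.get("name") == auth_level_attr:
            elem.attrib.setdefault("resourceName", "authenticationLevel")
        elif elem.tag == "SubSchema":
            elem.attrib.setdefault("resourceName", "USEPARENT")
    return ET.tostring(root, encoding="unicode")


# Abbreviated version of the custom module's descriptor shown earlier.
SAMPLE = """<ServicesConfiguration>
<Service name="iPlanetAMAuthAuthLevelSetService" version="1.0">
<Schema><Organization>
<AttributeSchema name="iplanetamauthauthlevelsetauthlevel" type="single"/>
<AttributeSchema name="upgradedAuthLevel" type="single"/>
<SubSchema name="serverconfig" validate="yes">
<AttributeSchema name="iplanetamauthauthlevelsetauthlevel" type="single"/>
<AttributeSchema name="upgradedAuthLevel" type="single"/>
</SubSchema>
</Organization></Schema>
</Service>
</ServicesConfiguration>"""

patched = add_resource_names(SAMPLE, "iplanetamauthauthlevelsetauthlevel")
print(patched)
```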

Two Fixes: Take Your Pick

There are two different ways to fix this. You can unregister the service descriptor for the authentication module, change the XML, and re-register it. The problem with this is that any existing instances of that module will then be invalid, and you'll need to recreate them and add them back into the authentication chains that use them.

The better approach is to modify the XML found within OpenDJ. You can do this by exporting via LDIF (in which case attribute values are base64 encoded), decoding, modifying the XML, re-encoding, and re-importing that LDIF.
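
The decode and re-encode steps can be sketched in Python as follows. This is a minimal sketch under assumptions, not a full LDIF parser: it handles a single-entry export, relies on the LDIF conventions that a double colon marks a base64 value and that continuation lines start with one space, and uses made-up helper names. The attribute name sunServiceSchema is the one that holds the descriptor, as described further down.

```python
import base64


def unfold(ldif_text):
    """Join LDIF continuation lines (lines that start with a single space)."""
    lines = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)
    return lines


def decode_attribute(ldif_text, attr):
    """Return the decoded text of the first base64-encoded value of attr."""
    prefix = attr + "::"
    for line in unfold(ldif_text):
        if line.startswith(prefix):
            return base64.b64decode(line[len(prefix):].strip()).decode("utf-8")
    raise KeyError(attr)


def encode_attribute(attr, value, width=76):
    """Return attr and value as a folded, base64-encoded LDIF attribute line."""
    line = attr + ":: " + base64.b64encode(value.encode("utf-8")).decode("ascii")
    chunks = [line[:width]]
    rest = line[width:]
    while rest:
        # Continuation lines carry one leading space plus width - 1 characters.
        chunks.append(" " + rest[:width - 1])
        rest = rest[width - 1:]
    return "\n".join(chunks)


# Round trip with a stand-in value; a real export would hold the whole descriptor.
xml_value = '<SubSchema name="serverconfig" resourceName="USEPARENT" validate="yes"/>'
ldif_line = encode_attribute("sunServiceSchema", xml_value)
print(ldif_line)
print(decode_attribute(ldif_line, "sunServiceSchema"))
```

Between the decode and the encode you would edit the XML, either by hand or with a small script.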

Alternatively, you can connect to OpenDJ with a tool like Apache Directory Studio and modify it directly. Making this fix is particularly important if, like us, you have written your own DataStore replacement module and are not using the native version, or if you are using any custom module as the first one in the chain used for signing into OpenAM. After upgrading you may be locked out of OpenAM regardless of whether you are using XUI or not. (Correction: see update below) We didn't test that scenario, but it seems highly likely. We might well have ended up in that state had we not discovered this problem in time to fix it before we upgrade.

So apply the change to any custom modules before upgrading to OpenAM 13. But don't change modules native to OpenAM. The upgrade will do that for you, and I don't know what will happen if those values have already been added.

To do this with Apache Directory Studio, open it and connect to your OpenDJ with the same Directory Manager bind DN and password that OpenAM uses. Expand your root DSE (dc=openam,dc=forgerock,dc=org for a default install) and expand the ou=services object.
Within this object is typically a single object whose RDN is ou= followed by the version attribute of the Service element from your service descriptor file, which is typically 1.0. Selecting this object will show its attributes in the right pane of Apache Directory Studio, as shown below for one of our custom modules, SMSOTP auth:

Right click (or two-finger tap on a Mac) on the sunServiceSchema attribute and select Edit Value from the pop-up menu. That presents a Text Editor in which you can make the above changes to your module; press the OK button to save them back to OpenDJ. If you are performing this after migration, the module will be editable and creatable in OpenAM 13 as soon as those changes are made, without a restart.

Since 13 isn't out yet, I can't say for certain that this will be necessary, because it could still get fixed by ForgeRock. But once 13 is out, if you have problems with your modules, this will hopefully get you over that hurdle.

Enjoy.

--

Update 2016.01.21: Dave has confirmed today that the modules still function when authenticating. So the severity at this point is much less. The flaw only affects their management in XUI.