
I thought it worth republishing the comments that Dave Welch of House of Brick made on my discussion of Oracle licensing costs in VMware vSphere configurations. Here are Dave's comments:

 

All,

 

Let me start by bringing all of you into the courtroom. There are three points that you will observe me, as counsel, provide to the jury as part of my allowed instruction (I am not an attorney in real life).

 

I offer the definitions in this paragraph only to make this post as self-sufficient as possible and not with intent to talk down to anyone.  Case law refers to precedents established in other court cases. Case law can supplement statutory law enacted by legislative bodies and can stand in lieu of statutory law when applicable statutory law does not exist.

 

Back to the courtroom:

 

  1. You cannot prove a negative.
  2. Just about every contract that any of our organizations enters into or sees contains a clause that essentially states that the only terms that are binding have been reduced to the fully-executed instrument itself as well as other explicitly-referenced written material, and all verbal agreements and/or representations are null and void. The language usually allows for the obvious possibility of written, fully executed amendment.
  3. As often as original documents are unavailable, courts in all fifty states recognize case law known as "best available image." This refers to copies that become recognized as legally binding as if they were the originals. "Best available image" case law has been universally recognized since 1989, when, as the Medical Records Systems Analyst for 23 hospitals and 7 clinics in 3 states, I had responsibility for beginning the architecture for imaging of medical records.


I offer the following as the exclusive binding source documentation for this post:



The OLSA is bi-laterally executed between Oracle and the customer. I have seen many of them as you can imagine in my role as the lead of House of Brick’s Oracle Enterprise License Assessment business line. The OLSA always contains:


  • A reference to other unnamed published documentation, which given the nature of the reference, refers to the Licensing Data Recovery Environments extract of the Software Investment Guide as well as the current Oracle Support policy. Because of the nature of these external OLSA references, I make it my business to occasionally archive current versions of the Software Investment Guide and Licensing Data Recovery Environments document such that any given OLSA can be clearly tied to the external terms in effect as of the OLSA’s execution.
  • A Non-Disclosure Agreement as to the OLSA’s terms. As Oracle publishes template OLSA copies, a reasonable and customary interpretation of the NDA’s coverage is pricing (discounts) as well as any other non-standard qualitative terms that the customer may have negotiated with Oracle. We have initiated for customers and seen Oracle approve non-standard qualitative OLSA terms. However, we have never bothered to initiate sub-cluster licensing terms as there is no need to do so.


For at least the last decade and to date, the sum and detail of the stock published OLSA and its external Software Investment Guide with respect to processor-based licensing is two-fold (a small worked example of the core-factor arithmetic follows the list):


  • Customers must license all physical cores or sockets on a host where Oracle executables are “installed and/or running” (with physical cores factored per Oracle’s published Core Factor Table, and potentially subject to the so-called 10-day rule [whose terms became more restrictive sometime during 2007]). Notice the tense. Oracle customers are contractually obligated for licensing the physical servers where the Oracle executables are and have been, not where they might go. To imply otherwise without explicit contractual inducement would not be unlike concluding that I am legally obligated to purchase transportation to or obtain a visa for destinations that I clearly have the capability of visiting but where I have neither ever been nor yet made a determination to even visit.
  • Furthermore, customers must pay a license to cover the use of remote mirroring at the storage unit or shared disk array layer to transmit Oracle executables to a SAN, whether or not that set of Oracle executables is "installed and/or running" on any physical host connected to the SAN.
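
For readers who want to see the processor-license arithmetic concretely, here is a minimal sketch. The host size and core factor are hypothetical; the actual factor must be taken from Oracle's published Core Factor Table for the CPU model in question.

```python
import math

# Hypothetical host: 2 sockets x 8 cores of a CPU model with a published core factor of 0.5.
physical_cores = 16
core_factor = 0.5  # taken from Oracle's published Core Factor Table for that CPU model

# Processor licenses are conventionally the factored core count, rounded up.
processor_licenses = math.ceil(physical_cores * core_factor)
print(f"Processor licenses required for this host: {processor_licenses}")  # -> 8
```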


When discussing Oracle licensing, there are two common areas of confusion:


  1. Oracle itself defines both of the terms "Hard Partitioning" and "Soft Partitioning" in its Partitioning document, published since 2002. The document makes clear that both "Hard Partitioning" and "Soft Partitioning" have to do with carving up physical cores/CPUs within a single physical server. It is Oracle's right to define the terms as they relate to the OLSA. I frequently see Oracle reps (knowingly?), customers, and others carry these terms over into a discussion of licensing physical servers within a sub-cluster. As often as unintentional term confusion occurs, I believe it important for authors to make corrective edits after the confusion has been pointed out, especially on a topic with such scaled financial ramifications.
  2. To say that Oracle has not made a statement supporting vSphere DRS Host Affinity Rules (as has been asserted in this thread) is beside the point. That is clearly in the realm of proving a negative. Whether DRS Host Affinity Rules are used as a mechanism to assure that Oracle executables are not "installed and/or running" on unlicensed physical hosts is at the customer's discretion and irrelevant to the OLSA's binding terms. I might add that methods existed to restrict VM movement to a sub-cluster of physical hosts long before the vSphere 4.1 advent of DRS Host Affinity Rules.

 

I tried to force both clarifications into Gartner Group’s Chris Wolf’s November 10, 2010 blog thread via comment posting.  I feel that I failed to get Mr. Wolf to directly address my point before he elected to close the thread.  I feel that I failed both verbally and in writing during the pre-publication interview process to preventively insert the clarifications into TechTarget’s Beth Pariseau’s May 4, 2011 article titled “Oracle licensing for vSphere 4.1 irks VMware pros.” Furthermore, the TechTarget article’s title unfortunately implies that Oracle may have licensing terms specific to vSphere version 4.1, which it does not, or licensing terms specific to VMware technologies in any way, which it does not. Thankfully the VMware Corp. November 2011 whitepaper (to which I contributed) “Understanding Oracle Certification, Support and Licensing for VMware Environments” got it right. I can confirm that the entire white paper received all the multi-disciplinary pre-publication review that you would imagine appropriate for a piece of its importance and ramifications.

 

House of Brick is occasionally invited by customers and/or partners to get on the phone with Oracle to deal with Oracle licensing concerns that Oracle has verbally conveyed to the customer. We always welcome these opportunities. The most recent call was a couple of months ago. An Oracle Corp. licensing specialist was on the call. The impetus for the call was that Oracle had previously told the customer that all nodes in a physical cluster must be licensed, not just those where Oracle executables are "installed and/or running." We asked the Oracle licensing specialist about the statement. He informed us that it was Oracle's policy. We asked where the policy was written. He replied that it is an unwritten policy. At this point, we reminded him of the binding OLSA language whereby all verbal conveyances and understandings were voided at the execution of the document, leaving only the printed OLSA terms in effect. That was the end of the discussion. The customer left the call entirely and appropriately satisfied as to their full legal compliance with their proposed sub-cluster licensing scenario. This vignette is by no means an isolated incident. The outcome is always predictable and consistent.

 

A previous post on this thread states 'I've heard Dave at HOB contend the same thing you've heard, which is essentially: "DRS/Host Affinity SHOULD be fine"...' The quote could be interpreted inappropriately out of context of my consistent, persistent statements. What I have always said is that any method (manual or automatic) to restrict movement of Oracle executables to a licensed sub-cluster of physical hosts is fully OLSA-compliant and legally defensible. At the release of vSphere 4.1, I added the statement that DRS Host Affinity Rules happen to be one of several available methods (the newest and the easiest) of restricting Oracle executables' movement.
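
For what it is worth, DRS Host Affinity Rules can be configured through the vSphere client, and they can also be scripted. The following is a rough sketch using the open-source pyVmomi SDK; the vCenter address, credentials, and cluster/host/VM names are hypothetical, and the object and method names should be verified against the SDK version in use before relying on this. The mandatory flag makes it a "must run on" rule, meaning the VMs in the group cannot be powered on or migrated onto hosts outside the licensed host group.

```python
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Hypothetical connection details; newer pyVmomi accepts disableSslCertValidation,
# older builds expect an sslContext argument instead.
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********",
                  disableSslCertValidation=True)
content = si.RetrieveContent()

def find(vimtype, name):
    """Return the first managed object of the given type with the given name.
    (A production script would also destroy the container view afterwards.)"""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

cluster = find(vim.ClusterComputeResource, "ProdCluster")
licensed_hosts = [find(vim.HostSystem, n) for n in ("esx01.example.com", "esx02.example.com")]
oracle_vms = [find(vim.VirtualMachine, n) for n in ("oradb01", "oradb02")]

# Build a VM group, a host group, and a mandatory ("must run on") VM-to-host rule.
spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.VmGroup(name="OracleVMs", vm=oracle_vms)),
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.HostGroup(name="OracleLicensedHosts",
                                                         host=licensed_hosts)),
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(operation="add",
                             info=vim.cluster.VmHostRuleInfo(
                                 name="Oracle-VMs-on-licensed-hosts",
                                 enabled=True,
                                 mandatory=True,  # "must" rather than "should" run on these hosts
                                 vmGroupName="OracleVMs",
                                 affineHostGroupName="OracleLicensedHosts")),
    ],
)
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```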

 

There are comments on this thread that I believe fall under the umbrella of attempting to prove negatives. Oracle has no legal obligation to go on paper with sub-cluster licensing clarifications specific to vSphere. In my view, Oracle has no financial incentive to do so.  Stop looking for that to happen.

 

Occasionally people actually insist that I produce an Oracle-authored document that says Oracle will not ding them for not licensing all nodes in a physical cluster. That strikes me as akin to asking me to provide an Oracle-authored document that says Oracle will not make a claim against me for choosing to drive a Ford as opposed to a GM. Not only does it ask me to prove a negative, which I cannot do, but it delves into an area that is none of Oracle's business since it is not listed in my OLSA obligations.

 

I see no need for Oracle to comment on their recognition of the validity of vSphere logs to prove vSphere VM movement. vSphere logs may well be the best available mechanism to document Oracle executables' travels. Although vSphere logs may not be identical to "best available image" case law, "best available image" is instructive to the audit log/verification concept under discussion. As of November 2010, the majority of the world's servers were virtual, and roughly 85% of those virtual machines ran on VMware. That is tenured technology, and no one is questioning the de facto authority of those logs. Oracle has the OLSA contractually granted right to present themselves for compliance inspection of their customers' physical hardware (upon 45 days' notice as of the 2001 OLA). Oracle has options to audit the current and historical location of where its executables are "installed and/or running." That those options may rely on mechanisms not provided by Oracle is beside the point.

 

As for the question on the thread as to whether vSphere logs could be maliciously manipulated: security industry experts have said that eighty percent of security breaches come from within the firewall. I see a corollary to that statistic. Oracle has plenty of financially incented prospective friends inside its customer organizations. It is sad that human integrity is such that posting large rewards has become necessary to incent individuals to turn in their employers for knowing theft of software. To me, OLSA obligations are like the U.S. tax code. I believe in paying every penny I owe. However, beyond that, it is at my discretion to whom or what I donate and in what amount. I have no patience with individuals or entities that premeditate the creation of OLSA compliance issues. I similarly have no patience with the knowing spreading of FUD by some professionals in what could be construed as extortion of funds beyond customers' executed contractual obligations. I will continue to vigorously promote and defend the legal rights of both software vendors and their customers, even if that means I induce accelerated hair loss through rapid, frequent hat swapping.

 

What Oracle verbally represents on this or any other contractual topic, for that matter, is of absolutely no relevance whatsoever. Look to the OLSA and the extant external referenced documentation as of the OLSA's execution date. Should you still be dissatisfied, look to court history of judgments against customers attempting to exercise their sub-cluster licensing rights. Let me save you some time: relevant court history does not exist. There is a simple reason: the terms of your executed OLSA are clear, sufficient, and binding. The terms explicitly invalidate any and all verbal representations.

 

Given what I have presented here, I see no legal, risk-management, or moral need for air-gap clusters strictly for Oracle licensing purposes, and I am not inclined to encourage or endorse them. I am concerned that such talk may scare parties away from investigating, understanding, and asserting their legal rights. Furthermore, in my observation, such talk gets quickly quoted out of context of the original motive. I believe such talk has a similar long-term effect on the market as paying ransom for hostages has on future hostage taking. The SMB market has every contractual right to leverage sub-cluster licensing to facilitate crossing the chasm over to Business Critical Application virtualization. That larger enterprises may find it convenient to use air-gap cluster separation of their Oracle workloads is great, but doing so has absolutely no bearing whatsoever on their legally binding contractual obligations to Oracle Corporation.

 

Some on the thread have suggested this or that course of action to minimize risk with Oracle. Occasionally people express concern to me that Oracle has far deeper legal pockets than they do. I believe those who make these points do not understand the power of their position. I cannot think of one reason that Oracle could possibly want this matter to go to court, and indeed I believe Oracle has everything to lose, at a minimum going forward, if and when it does. Once such a judgment were out, it would cause an immediate slow-down in the non-contractual cash donations that some of Oracle's customers have felt induced to make.


Errata on remote mirroring and audit notification: March 26, 2014.

Stake in the Ground

VMware has put a stake in the ground with respect to Oracle licensing of VMware VMs running Oracle. The gist of this statement regarding certification, support, and licensing is that DRS host affinity rules, combined with vCenter audit trails showing where VMs have actually run, are sufficient for Oracle licensing purposes. The summary of the document states:

 

DRS Host Affinity rules can be used to run Oracle on a subset of the hosts within a cluster. In many cases, customers can use vSphere to achieve substantial licensing savings.

 

vCenter VMotion Logging

Concerning vCenter VMotion logging, the document states:

 

With VMware vMotion and DRS technologies you can migrate a live virtual machine running Oracle software from Host A to Host B for server maintenance or load-balancing purposes. In such instances you should ensure that the migration occurs between fully licensed hosts by using vSphere capabilities such as DRS Host Affinity—that is, both Host A and Host B must be fully licensed hosts from an Oracle licensing perspective (as described in section 2.1). VMware vCenter™ Server generates several migration log files maintained at /var/log/vmware/hostd.log and /vmfs/volumes/datastore/vm/vmware.log that can be leveraged to track and record such virtual machine movements across hosts for compliance purposes. Additionally, VMware provides an extensive open API that allows compliance tools to generate user-friendly reports using this data. In particular, VMware vCenter Configuration Manager provides host-level change-tracking mechanisms that enable you to record virtual machine movements across hosts. Since this host-level change tracking leverages an open API, third-party configuration-management solutions may also provide some of this functionality for VMware environments.
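
As a practical illustration of the audit-trail idea, here is a minimal sketch that sweeps the log files named above for migration-related entries and emits them as CSV for a compliance reviewer. The keyword filter is an assumption on my part; the exact message text varies by vSphere release, so treat this as a starting point rather than a parser for any specific format.

```python
import csv
import re
import sys
from pathlib import Path

# The hostd.log path comes from the passage above; per-VM vmware.log files would be
# added per environment. Lines are only flagged, not interpreted.
LOG_FILES = [Path("/var/log/vmware/hostd.log")]
MIGRATION_HINT = re.compile(r"vmotion|migrat", re.IGNORECASE)  # crude keyword filter

def migration_lines(paths):
    """Yield (log name, raw line) pairs for lines that look migration-related."""
    for path in paths:
        if not path.exists():
            continue
        with path.open(errors="replace") as handle:
            for line in handle:
                if MIGRATION_HINT.search(line):
                    yield path.name, line.rstrip()

if __name__ == "__main__":
    writer = csv.writer(sys.stdout)
    writer.writerow(["source_log", "raw_line"])
    for row in migration_lines(LOG_FILES):
        writer.writerow(row)
```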

 

CPU Affinity

VMware's document also contains a discussion concerning CPU affinity. In that statement, VMware calls Oracle's bluff with respect to the hard partitioning statements concerning OVM on Oracle's website. Again, VMware's document states:

 

VMware enables you to pin a virtual machine to certain CPUs inside the host (using CPU pinning or CPU affinity). We believe this technology is every bit as robust and reliable as the “hard partitioned” technologies to which Oracle accords preferential subsystem pricing, and should enable customers to license only a subset of the host capacity. Unfortunately Oracle does not recognize this approach as a valid hard partitioning for its licensing mechanism. So today customers must license all the CPUs in the host and follow the “fully licensed host” approach for VMware environments.

 

Net-Net

The net-net is that VMware is endorsing the view that fully-licensed servers are required, but that Oracle servers can be part of a larger DRS / HA cluster environment, so long as VMs running Oracle are strictly limited to licensed servers, and this can be proven using audit trail documentation.

 

I cannot overstate the importance of this in terms of Oracle licensing. This is a sea change, folks.

I have received a fair number of responses to my previous post on this subject (some via comments and some via email). I thought the discussion worthwhile enough to punch it up a bit more here.

Backup

As I pointed out in the previous post, EMC can easily match NetApp's play to back up ExaData with the following:

 

[Figure: EMC backup solution for ExaData (EMCBackupSolution.png)]

As Geoff Rosser so correctly pointed out, this answer is incomplete. Yes, Data Domain is an awesome Oracle backup solution. Yes, it provides incredible deduplication rates for Oracle database environments. (Thanks, dynamox.) However, it is not the only viable solution from EMC for backing up ExaData. For example, the following would also work:

 

[Figure: EMC backup solution for ExaData using Isilon (EMCBackupSolutionIsilon.png)]

As usual, comments are welcome.

There has been a lot of material on the web recently concerning NetApp being able to back up ExaData. The purpose of this blog is to respond to that content and to explain why NetApp's offering is rather lame and actually offers nothing new.

 

The items on the web produced by NetApp are easy to find. I will not increase their Google hit rate by linking to them here. Suffice it to say, Neil Gerren's blog contains the principal content to which I will respond here. There is also NetApp technical report TR 4022, a 34-page tome, which I have read thoroughly. I think I completely understand what NetApp has produced (which they thought enough of to issue a press release about), and, believe me, there is nothing new or unique in NetApp's offering.

 

I will summarize the solution here. To avoid any issues with NetApp copyrights, I have recreated the graphics describing the solution. However, the gist is identical to that contained in TR 4022. The solution consists of two parts:

 

  • Backup
  • Disaster recovery / remote replication

Backup

Let's start with the backup component. What NetApp proposes is the following:

 

[Figure: NetApp backup solution for ExaData (NetAppBackupSolution.png)]

The fundamental gist of the solution is:

 

  • Connect ExaData to a NetApp filer using 10 GbE.
  • Back up the Oracle database on the ExaData to the NetApp filer using RMAN.

 

I ask you: Is there anything interesting or unique here? In fact, I would state for the record that it would be manifestly better to do the following:

 

[Figure: EMC backup solution for ExaData (EMCBackupSolution.png)]

In other words, an EMC Data Domain deduplication array is capable of connecting to an ExaData box and backing it up using RMAN in exactly the same manner as a NetApp filer. I find this so obvious as to be self-evident, but, again, NetApp's material on the subject makes it necessary to point it out.
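
To make that concrete, here is a minimal sketch of such an RMAN backup driven from an ExaData compute node. The mount point is hypothetical; the sketch assumes the Data Domain (or filer) export is presented over NFS and that RMAN runs as the oracle OS user with OS authentication.

```python
import subprocess
import tempfile

# Hypothetical path: the backup target export is NFS-mounted at /backup/dd on the compute node.
RMAN_SCRIPT = """
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK FORMAT '/backup/dd/%U';
  BACKUP AS BACKUPSET DATABASE PLUS ARCHIVELOG;
  RELEASE CHANNEL c1;
}
"""

# Write the RMAN command file, then invoke RMAN with OS authentication ("target /").
with tempfile.NamedTemporaryFile("w", suffix=".rman", delete=False) as cmdfile:
    cmdfile.write(RMAN_SCRIPT)

subprocess.run(["rman", "target", "/", f"cmdfile={cmdfile.name}"], check=True)
```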

 

Data Domain has many advantages over NetApp in the area of Oracle backup, but I will not belabor that point. Suffice it to say, we can do the same thing that NetApp can in the area of backing up Oracle ExaData via 10 GbE, and we certainly have a product that is well suited to backing up Oracle database data and commonly used for that purpose.

Disaster Recovery / Remote Replication

Let's turn now to the disaster recovery / remote replication solution. What NetApp proposes for a disaster recovery / remote replication solution looks like this:

[Figure: NetApp disaster recovery solution for ExaData (NetAppDRSolution.png)]

The fundamental gist of the solution is:

 

  • Replicate the production Oracle database on ExaData using Oracle Data Guard with physical standby. The target platform consists of generic Intel x86 servers running Linux. (Remember that the RAC compute nodes in the ExaData are nothing more than relatively typical x86 / Linux servers. Thus, Oracle Data Guard with physical standby to a similar non-Oracle server will work just fine, it being an identical platform from an Oracle point of view. A minimal parameter sketch follows this list.)
  • The target database server is connected to a NetApp filer using standard protocols (e.g. FC or NFS).
  • Once the database copy is on the NetApp filer, normal NetApp tools can be used to snap, clone, etc. the target database.
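
For illustration, here is a minimal sketch of the Data Guard plumbing implied by the first bullet above. The database names are hypothetical, and a real build also involves standby redo logs, password files, and network configuration, so the statements are simply printed for review rather than executed.

```python
# Hypothetical names: primary DB_UNIQUE_NAME "exaprod" on the Exadata, physical standby
# "exastby" on a commodity x86/Linux host whose storage lives on the filer/array.
PRIMARY_PARAMETERS = """
ALTER SYSTEM SET log_archive_config='DG_CONFIG=(exaprod,exastby)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_2=
  'SERVICE=exastby ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=exastby'
  SCOPE=BOTH;
"""

# RMAN (11g and later) can instantiate the physical standby over the network.
RMAN_INSTANTIATE = "DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE NOFILENAMECHECK;"

if __name__ == "__main__":
    print("-- Run on the primary (Exadata) database:")
    print(PRIMARY_PARAMETERS)
    print("-- Run from RMAN connected to the target and auxiliary instances:")
    print(RMAN_INSTANTIATE)
```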

 

Again, it seems rather patently obvious that the following is also possible:

 

[Figure: EMC disaster recovery solution for ExaData (EMCDRSolution.png)]

 

And in terms of the ability to perform snaps, clones, and so forth of the Oracle database once it is on our array, EMC's arrays contain features that easily match those offered by NetApp.

 

So, again, NetApp's highly touted ExaData offering brings nothing new or unique. Each and every feature offered by NetApp is available from EMC, and many of the features offered by EMC are superior to those provided by NetApp.

 

Comments to this post are welcome.

Recently I got involved in another customer discussion around how to replicate data between two datacenters. The suggestion was to use Oracle ASM (with normal redundancy) instead of SRDF (or other SAN/Storage based tooling).

 

Reasons I have heard why customers would choose ASM over EMC tooling:

 

a) The claim that integration with Oracle would be better

b) Performance would be higher (i.e. lower latency because of parallel writes to both mirrors where SRDF would do the remote I/O in sequence)

c) Cost (no SRDF licences, ASM is free)

 

Although these statements might be partly true, I still recommend that my customers stay away from ASM mirroring (unfortunately, they do not always follow my advice). OK, I am biased because I work for EMC, but I would still like to put things in the right perspective. So here is a list of reasons why ASM might not be the best way to replicate data between datacenters:

 

  • The Oracle host has to process every write twice, as every write to an Oracle file has to be mirrored. This adds CPU and I/O overhead and somewhat reduces the capacity to process additional workload. Expensive Oracle-licensed CPUs end up spending cycles on something other than application processing.
  • ASM can perform incremental updates after a link failure. However, this only works if the data that was disconnected has not changed in any way. If it was changed, you risk, at best, a full 100% re-sync of all data (which can take a very long time, during which you suffer a severe performance impact and have no D/R protection). At worst, you risk silent data corruption.
  • A two-datacenter setup cannot resolve split-brain scenarios. Unless you deploy a third (arbitration) site with 100% physically separate communication links to both the primary and the D/R location, you risk either split-brain scenarios (which can be a disaster for the business) or downtime in case of a failure (eliminating high availability completely, which was the reason to mirror the data in the first place). Check http://www.oracle.com/technology/products/database/clustering/pdf/thirdvoteonnfs.pdf for more information on this requirement. (Note that with storage replication, because of the sequential write, you do not have this issue, although for automated failover you need arbitration as well.)
  • The setup is complex because you need to configure ASM failure groups correctly. Getting the failure groups wrong can mean mirroring two volumes within the same local site, which can cause severe data loss in case of a disaster. Failing to set up preferred (priority) read paths correctly can cause subtle performance impact that is hard to diagnose. (A minimal configuration sketch follows this list.) Check http://download.oracle.com/docs/cd/B28359_01/server.111/b28282/configbp005.htm and www.oracle.com/technetwork/database/clustering/overview/extendedracversion11-435972.pdf for more insight.
  • Any bug in the Oracle ASM or database code can cause issues. As an example, see the footnote at the end of this post (the same holds for storage-based replication, but that tends to be much more robust and easier to monitor).
  • A failure of the storage connectivity can lead to both reads and writes being serviced over the ISL (inter-switch links between the two datacenters), again causing severe performance impact (which will get even worse after the storage connectivity has been restored, due to re-silvering).
  • No consistency is possible between application data and database data, because ASM only replicates databases. The exception is if you put all flat files on Oracle ACFS, which is a quite recent Oracle feature and has not yet been proven in the field. This very problem is the reason Oracle itself implemented EMC Recoverpoint for its internal business applications and endorsed Recoverpoint in a joint EMC/Oracle whitepaper as a viable solution.
  • No consistency is possible between multiple databases. If you have a direct transaction dependency between databases, any failover might result in slight checkpoint timing issues that cause transactions to be applied to one but not both databases.
  • Only synchronous replication is possible; there is no fallback option to asynchronous mode to mitigate performance impact during peak workloads, upgrades, stress testing, and so on.
  • During a storage failure, transactions being processed might hang until the ASM layer decides that one site has failed and continues with one failure group only. Depending on the settings, if the failure is intermittent (such as one caused by a bad but not completely broken cable), transactions will experience good performance, hang for a while, be slow during the ASM resilver, perform well again for a while, and then the cycle repeats. This can be very, very hard to diagnose.
  • Rolling disasters can cause a complete inability to fail over. For example, a fire in datacenter A breaks the remote links, but database processing continues on site A. A moment later the link comes back for a while, and resilvering of remote ASM data to site B starts. During the resilver, but before it completes, the fire breaks the remote links for good. After 30 minutes, the fire causes the servers to fail and corrupts or even destroys the data at site A, so manual failover to site B is required. However, because of the aborted re-silvering, the data at site B is completely corrupt, so a full tape restore is required, taking many hours of downtime and causing severe loss of transactions.
  • There is no well-established method to test D/R capabilities. Manually forcing link failures will immediately cause performance issues and other risks. In the real world, this makes customers reluctant to perform D/R testing after going live, so they stay in production for years without ever being able to test whether their D/R scenario works.
  • Taking storage-based snapshots will be challenging at best, because no cloning tool supports consistent snapshots taken from two separate storage boxes at the same time (which is needed because of the ASM failure groups). Although technically possible with EMC, this needs to be scripted and requires a special multi-session consistency implementation.
  • Every Oracle cluster needs to be carefully configured specifically for ASM mirroring.
  • Every Oracle cluster needs to be monitored for ASM mirroring to be in sync, including the link utilization.
  • Adding a third, fourth, or subsequent cluster node at one of the two locations is equally complex.
  • Every storage reconfiguration (e.g. adding or moving storage volumes) needs to be performed with these complexities in mind. Adding a volume without properly setting up the failure groups renders the whole environment unable to fail over.
  • Another replication method is required for pre-Oracle 10 environments, non-Oracle databases, file servers, VMware environments, email and content, and so on. This can be SAN-based, but then Oracle would be the single exception for replication. If the preference is for application-level replication, then every application type would require its own method, resulting in a very complex D/R runbook with multiple dependencies, logical replication instances, versions, and so on. It is debatable whether it would be possible to sustain a datacenter failure without suffering major downtime and/or data loss when dealing with such a complex environment, and it would be nearly impossible to perform D/R testing for more than a single application or subcomponent.
  • Nobody (not even at Oracle, I verified) seems to understand how Oracle deals with concurrent writes where one write makes it to site A, another makes it to site B, but neither completes fully when a failure (such as a power outage) happens. The Oracle cluster should be able to recover, but this may require special understanding from Oracle administrators, and the devil is in the details. Not being able to deal with this causes data corruption, possibly going undetected for an extended period.
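
To illustrate the failure-group point made earlier in this list, here is a minimal sketch of the kind of configuration involved. The disk group, failure group, and ASM instance names are hypothetical, and the statements are printed for review rather than executed; getting either statement wrong is exactly the sort of subtle mistake described above.

```python
# Hypothetical names: disk group DATA, failure groups SITE_A and SITE_B, ASM instance +ASM1 at site A.
CREATE_DISKGROUP = """
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP site_a DISK '/dev/mapper/sitea_lun*'
  FAILGROUP site_b DISK '/dev/mapper/siteb_lun*'
  ATTRIBUTE 'compatible.asm' = '11.2';
"""

# Prefer local reads so site-A instances do not read across the inter-site links.
PREFERRED_READS = (
    "ALTER SYSTEM SET asm_preferred_read_failure_groups='DATA.SITE_A' SID='+ASM1';"
)

if __name__ == "__main__":
    print("-- Run in the ASM instance:")
    print(CREATE_DISKGROUP)
    print(PREFERRED_READS)
```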

 

*) Footnote (from Oracle documentation):

 

Known Issues

If the NFS device location is not accessible,

1. Shutting down of Oracle Clusterware from any node using “crsctl stop crs”, will stop the stack on that node, but CSS reconfiguration will take longer. The extra time will be equal to the value of css misscount.

2. Starting Oracle Clusterware again with “crsctl start crs” will hang, because some of the old clusterware processes will hang on I/O to the NFS voting file. These processes will not release their allocated resources such as PORT.

 

These issues are addressed and will be fixed in future versions.

Conclusion: Before stopping or starting the Oracle Clusterware, it should be made sure that the NFS location is accessible using the “df” command for example. If the command does not hang, one may assume that the NFS location is accessible and ready for use.
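
Following that advice, a pre-flight check could look like the minimal sketch below. The mount point is hypothetical, and the timeout simply treats a hanging df as "not accessible".

```python
import subprocess
import sys

NFS_MOUNT = "/voting_disk"  # hypothetical mount point of the NFS voting file

def nfs_reachable(mount: str, timeout_s: int = 10) -> bool:
    """Run df against the mount and treat a hang or error as 'not accessible'."""
    try:
        subprocess.run(["df", mount], check=True, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False

if __name__ == "__main__":
    if not nfs_reachable(NFS_MOUNT):
        sys.exit("NFS voting location is not responding; defer 'crsctl stop/start crs'.")
    print("NFS voting location is accessible.")
```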
