Find Communities by: Category | Product



Because of the many discussions and confusion around the topic of partitioning, disk alignment and it’s brother issue, ASM disk management, hereby an explanation on how to use UDEV, and as an extra, I present a tool that manages some of this stuff for you.







The questions could be summarized as follows:

  • When do we have issues with disk alignment and why?
  • What methods are available to set alignment correctly and to verify?
  • Should we use ASMlib or are there alternatives? If so, which ones and how to manage those?

I’ve written 2 blogposts on the matter of alignment so I am not going to repeat myself on the details. The only thing you need to remember is that classic “MS-DOS” disk partitioning, by default, starts the first partition on the disk at the wrong offset (wrong in terms of optimal performance). The old partitioning scheme was invented when physical spinning rust was formatted with 63 sectors of 512 bytes per disk track each. Because you need some header information for boot block and partition table, the smart guys back then thought it was a good idea to start the first block of the first data partition on track 1 (instead of track 0). These days we have completely different physical disk geometries (and sometimes even different sector sizes, another interesting topic) but we still have the legacy of the old days.

If you’re not using an Intel X86_64 based operating system then chances are you have no alignment issues at all (the only exception I know is Solaris if you use “fdisk”, similar problem). If you use newer partition methods (GPT) then the issue is gone (but many BIOSes, boot methods and other tools cannot handle GPT). As MSDOS partitioning is limited to 2 TiB ( it will probably be a thing of the past in a few years but for now we have to deal with it.

Wrong alignment causes some reads and writes to be broken in 2 pieces causing extra IOPS. I don’t have hard numbers but a long time ago I was told it could be an overhead of up to 20%. So we need to get rid of it.


ASM storage configuration


ASM does not use OS file systems or volume managers but has its own way of managing volumes and files. It “eats” block devices and these block devices need to be read/write for the user/group that runs the ASM instance, as well as the user/group that runs Oracle database processes (a public secret is that ASM is out-of-band and databases write directly to ASM data chunks). ASM does not care what the name or device numbers are of a block device, neither does it care whether it is a full disk, a partition, or some other type of device as long as it behaves as a block device under Linux (and probably other UNIX flavors). It does not need partition tables at all but writes its own disk signatures to the volumes it gets.


Read the entire blogpost here

cost-savings.jpgLast week (during EMC world) a discussion came up on Twitter around Oracle licensing and whether Oracle would support CPU affinity as a way to license subsets of a physical server these days.

Unfortunately, the answer is NO (that is, if you run any other hypervisor than Oracle’s own Oracle VM). Enough has been said on this being anti-competitive and obviously another way for Oracle to lock in customers to their own stack. But keeping my promise, here’s the blogpost ;-)

A good writeup on that can be found here: Oracle’s reaction on the licensing discussion
And see Oracle’s own statement on this: Oracle Partitioning Policy

So let’s accept the situation and see if we can find smarter ways to run Oracle on a smaller license footprint – without having to use an inferior hypervisor from a vendor who isn’t likely to help you use it to reduce license cost savings…


Read the entire article here.

(Note: this post is unrelated to Oracle but as I cross-post all my public blog posts here, this one is no exception. Enjoy!)


With my blog audience all being experts in the IT industry (I presume), I think we are all too familiar with the problems of classic password security mechanisms.


Humans are just not good at remembering long meaningless strings of tokens, especially if they need to be changed every so many months and having to keep track of many of those at the same time.
Some security experts blame humans. They say you should create strong passwords, not use a single password for different purposes, not write them down on paper – or worse – in an unencrypted form somewhere on your computer.


I disagree. I think the fundamental problem is within information technology itself. We invented computers to make life easier for ourselves – well, actually, that’s not true, ironically we invented them primarily to break military encryption codes. But the widespread adoption of computing happened because of the promise of making our lives easier.


I myself use a password manager (KeePass) to make my life a bit easier. There are many password manager tools available, and they solve part of the problem: keeping track of what password was used for what purpose. I now only need to remember one (hopefully, strong enough) password to access the password database and from there I just use the tool to log me in to websites, corporate networks and other services (let’s refer to all of those as “cloud servers”).


The many problems with passwords

The fundamental problem remains – even when using a password manager: passwords are no good for protecting our sensitive data or identity.


(read the entire blog post here)

Another frequently asked question I get asked a lot:


Is Oracle certified on VMware?

There are plenty articles discussing this very topic, here’s a few examples:

oracle blog – is Oracle certified on VMware
vmware understanding oracle certification support licensing environments – oracle linux fully supported vmware esxi and hyper-v
longwhiteclouds – fight the fud oracle licensing and support on vmware vsphere/
oraclestorageguy – what the oracle vmware support statement really means and why
everything oracle @ emc – vmwares official support statement regarding oracle certification and licensing

…and yet it still seems to bother many people I talk to when showing the clear and present benefits of going all-virtual.

It seems there is a lot of confusion between the meaning of “certified”, “supported”, and even the term “validated” comes up every now and then. To make things worse, the context in which those words are used makes a big difference.


(read the full article here)

future-british-bus-1.jpgA public transport company in a city called Galactic City, needs to replace its aging city buses with new ones. It asks three bus vendors what they have to offer and if they can do a live test to see if their claims about performance and efficiency holds up.

The transport company uses the city buses to move people between different locations in the city. The average trip distance is about 2 km. The vendors all prepare their buses for the test. The buses are the latest and greatest, with the most efficient and powerful engines and state of the art technology.


(read the full article here)


As an advocate on database virtualization, I often challenge customers to consider if they are using their resources in an optimal way.


And so I usually claim, often in front of a skeptical audience, that physically deployed servers hardly ever reach an average utilization of more than 20 per cent (thereby wasting over 80% of the expensive database licenses, maintenance and options).







Magic is really only the utilization of the entire spectrum of the senses. Humans have cut themselves off from their senses. Now they see only a tiny portion of the visible spectrum, hear only the loudest of sounds, their sense of smell is shockingly poor and they can only distinguish the sweetest and sourest of tastes.

– Michael Scott, The Alchemyst

About one in three times, someone in the audience objects and says that they achieve much better utilization than my stake-in-the-ground 20 percent number, and so use it as a reason (valid or not) for not having to virtualize their databases, for example, with VMware.


(read the full article here)

By now, we all know Oracle is fully supported on VMware. Anyone telling you it’s not supported is either lying to you, or doesn’t know what he is talking about (I keep wondering what’s worse).

VMware support includes Oracle RAC (if it’s version or above).  However, Oracle may request to reproduce problems on physically deployed systems in case they suspect the problem is related to the hypervisor. The support note says:

Oracle will only provide support for issues that either are known to occur on the native OS, or can be demonstrated not to be as a result of running on VMware.

In case that happens, I recommend to contact VMWare support first because they might be familiar with the issue or can escalate the problem quickly. VMware support will take full ownership of the problem. Still, I have met numerous customers who are afraid of having to reproduce issues quickly and reliably on physical in case the escalation policy does not help. We need to get out of the virtual world, into reality, without making any other changes.  How do we do that?

Unfortunately, no one can be told what the Matrix is. You have to see it for yourself.
(Opens a pillbox, empties the contents into his palms, and outstretches his hands)
This is your last chance. After this, there is no turning back.
(opens his right hand, to reveal a translucent blue pill)
You take the blue pill, the story ends, you wake up in your bed and believe whatever you want to believe.
(opens his left hand, revealing a similarly translucent red pill)
You take the red pill, you stay in Wonderland, and I show you how deep the rabbit hole goes.
- Morpheus, The Matrix (1999)

Let’s stay in database Wonderland but with a real, not virtual, view of the world.


(read the full article here)

fragsmall.pngYet another customer was asking me for advice on implementing the ZFS file system on EMC storage systems. Recently I did some hands-on testing with ZFS as Oracle database file store so that I could get an opinion on the matter.


One of the frequent discussions comes up is on the fragmentation issue. ZFS uses a copy-on-write allocation mechanism which basically means, every time you write to a block on disk (whether this is a newly allocated block, or, very important, overwriting a previously allocated one) ZFS will buffer the data and write it out on a completely new location on disk. In other words, it will never overwrite data in place. Now a lot of discussions can be found in the blogosphere and on forums debating whether this is really the case, how serious this is, what the impact is on performance and what ZFS has done to either prevent, or, alternatively, to mitigate the issue (i.e. by using caching, smart disk allocation algorithms, etc).


In this post I attempt to prove how database files on ZFS file systems get fragmented on disk quickly. I will not make any comments on how this affects performance (I’ll save that for a future post). I also deliberately ignore ZFS caching and other optimizing features – the only thing I want to show right now is how much fragmentation is caused on physical disk by using ZFS for Oracle data files. Note that this is a deep technical and lengthy article so you might want to skip all the details and jump right to the conclusion at the bottom :-)


(read the full article here)

sledgehammer.PNGAs more and more customers are moving their mission-critical Oracle database workloads to virtualized infrastructure, I often get asked how to deal with Oracle’s requirement to reproduce issues on a physical environment (especially if they use VMware as virtualization platform – as mentioned in Oracle Support Note # 249212.1).


In some cases, database engineers are still reluctant to move to VMware for that specific reason. But the discussion is not new – I remember a few years ago I was speaking in Vienna to a group of customers and partners from Eastern Europe, and these were the days we still had VMware ESX 3.5 as state-of-the-art virtualization platform. Performance was a bit limited (4 virtual CPUs max, some I/O overhead and memory limitations) but for smaller workloads it was stable enough for mission critical databases. So I discussed the “reproduce on physical in case of problems” issue and I stated that I never heared of any customer who really had to do this because of some issues. Immediately someone in the audience raised his hand and said, “well, I had to do that once!” – Duh, so far for my story…


Let’s say that very often I learn as much from my audience as (hopefully) the other way around ;-)


Later I heard of a few more occasions where customers actually were asked by Oracle support to “reproduce on physical” because of suspected problems with the VMware hypervisor. In all of the cases I am aware of, the root cause turned out to be elsewhere (Operating System or configuration) but having to create a copy in case of issues is a scary thought for many database administrators – as it could take a long time and if you have strict SLAs then this might bite back at you.


So what is my take on this?


(read the full article here)

Bart Sjerps

VMware is really expensive

Posted by Bart Sjerps Dec 11, 2012


A while ago somebody forwarded me a research paper from an "independent" research firm in which the cost of VMware and Oracle VM were compared. Interesting!

Now you might wonder why, as someone working for EMC, I would care about such comparisons. Why would I be bothered by VMware in the first place?


Well, full disclosure coming up, but here a few things you might or might not be aware of:


  • EMC aquired VMware in 2004 and is the largest share holder of VMware
  • EMC and VMware are separate companies with separate Profit & Loss, etc
  • EMC and VMware have a strong partnership
  • Both companies are free to work with other partners


This means VMware will work with non-EMC storage vendors (and they should, if for nothing else, to keep us sharp) and vice versa, EMC works with other virtualization solutions (like Microsoft Hyper-V and even Oracle VM for Solaris and Intel).


So being an EMC person, I could not care less if a customer would use Oracle VM (OVM) or VMware as long as the primary objective is met: cost savings and service level improvements. Now I leave the discussion which one of the two is better, technically speaking, to the people from both respective companies. But sometimes I get the feeling Oracle positions OVM only when they are forced to (due to pressure from VMware) and not because they really want to drive cost savings forward for their customers (otherwise they would have used OVM within their Exadata machine to achieve much higher database CPU utilization ratios). It is simply not in their best interest.


(read the full article here)

oraemc.jpgEMC and Oracle have supported each others products since 1995 and both spent millions of dollars in making them work together. EMC actually became famous in the late nineties because of our “Guilty until proven innocent” support mentality. We are known for the first company to give meaning to the concept of “Remote Support / Phone Home”, and the success stories still go around that EMC field engineers sometimes surprised customers with a visit in order to repair components (mostly disk drives), often before they were broken, and if they were actually broken the customers would not even notice (needless to say that replacements were done online).


Additionally, EMC spent billions of dollars in the EMC E-lab where EMC systems are tested together with other hard- and software components. We don’t just test to see if something works, we test to see when things break. In relation to Oracle, you will see complete stacks of storage, networking, server hardware, operating systems, volume managers, file systems, cluster-ware, and database software tested and the E-lab navigator will tell you if you need to apply any specific patches, or change configuration settings or other things to make the deployment rock solid. This goes way beyond interoperability testing of other vendors of storage systems, servers or even integrated database appliances. You can get a detailed support statement for a specific HW/SW stack by using the E-lab Navigator.


To go a step further, in 2001 EMC and Oracle together formed the Joint Escalation Center (JEC) also known as Joint Service Center or Joint Support Center  (JSC). This is a virtual team of Oracle and EMC field support engineers who work together to resolve customer issues. You don’t need a special support contract for that, it’s included in standard EMC and Oracle support agreement. This team avoids finger-pointing between Oracle and EMC and both are obliged to work together to resolve issues. It is referenced here for example.


(read the full article here)


(abstract from my blog)


One of my missions is to help customers saving money (Dirty Cache Cash). So considering the average enterprise application environment, I frequently ask them where they spend most of their IT budget on. Is it servers? Networks? Middleware? Applications?

Turns out that if you look at the operating cost of an Oracle database application, a very big portion of the TCO is in database licenses. Note that I focus on Oracle (that’s my job) but for other databases the cost ratio might be similar. Or not. But it makes sense to look at Oracle as that is the most common platform for mission-critical applications. So let’s look at a database environment and forget about the application for now.


Let’s say that 50% of the operating cost of a database server is spent on Oracle licensing and maintenance (and I guess that’s not that far off). Now if we can help saving 10% on licensing (for example, by providing fast and efficient infrastructure), would that justify more expensive, but faster and more efficient infrastructure? I guess so.


(read the full article here)

Another Frequently Asked Question: Is there any disadvantage for a customer in using Oracle/SUN ZFS appliances to create database/application snapshots in comparison with EMC’s cloning/snapshot offerings?


Oracle marketing is pushing materials where they promote the ZFS storage appliance as the ultimate method for database cloning, especially when the source database is on Exadata. Essentially the idea is as follows: backup your primary DB to the ZFS appliance, then create snaps or clones off the backup for testing and development (more explanation in Oracle’s paper and video). Of course it is marketed as being much cheaper, easier and faster than using storage from an Enterprise Storage system such as those offered by EMC.


Oracle Youtube video

Oracle White paper


In order to understand the limitations of the ZFS appliance you need to know the fundamental workings of the ZFS filesystem. I recommend you look at the Wikipedia article on ZFS (here and get familiar with its basic principles and features. The ZFS appliance is based on the same filesystem but due to it being an appliance, it’s a little bit different in behaviour.

So let’s see what a customer gets when he decides to go for the Sun appliance instead of EMC infrastructure (such as the Data Domain backup deduplication  system or VNX storage system).


(read further on my blog:

(abstract from my blog)


Although EMC and Oracle have been long-time partners, the Exadata Database Machine is the exception to the rule and competes with EMC products directly. So I find myself more and more in situations where EMC offerings are compared directly with Exadata features and functions. Note that Oracle offers more competing products, including some storage offerings such as the ZFS storage appliance and the Axiom storage systems, but so far I haven’t seen a lot of pressure from those (except when these are bundled with Exadata).


Recently I have visited customers who asked me questions on how EMC technology for databases compares with, in particular, Oracle’s Hybrid Columnar Compression (HCC) on Exadata. And some of my colleagues, being storage aliens and typically not database experts, have been asking me what this Hybrid Compression thing is in the first place.


(read the full article here)

Recently I got involved in another customer discussion around how to replicate data between two datacenters. The suggestion was to use Oracle ASM (with normal redundancy) instead of SRDF (or other SAN/Storage based tooling).


Reasons I have heard why customers would choose ASM over EMC tooling:


a) The claim that integration with Oracle would be better

b) Performance would be higher (i.e. lower latency because of parallel writes to both mirrors where SRDF would do the remote I/O in sequence)

c) Cost (no SRDF licences, ASM is free)


Although these statements might be partly true, I still recommend my customers to stay away from ASM mirroring (unfortunately they do not always follow my advice). OK, I am biased because I work for EMC, but still I would like to put things in the right perspective. So here a list of reasons why ASM might not be the best way to replicate data between datacenters:


  • Oracle host has to process every write twice as every write to an Oracle file has to be mirrored. This adds some CPU and I/O overhead and reduces the ability somewhat to process more workload. Expensive Oracle-licenced CPU's are now spending cycles on other stuff than application processing.
  • ASM can perform incremental updates after a link failure. However, this only works if the data that was disconnected has not changed in any way. If it was changed, you risk, best case, a full 100% re-sync of all data (which can take a very long time during which you have a severe performance impact, and, during this time you will have no D/R protection). Worst case, you will risk silent data corruption.
  • A two-datacenter setup cannot resolve split brain scenario's. Unless you deploy a 3rd (arbitration) site with 100% physically separate communication links to both the primary and the D/R location, you risk either split-brain scenario's (which can be a disaster for the business) or you risk downtime in case of a failure (eliminating high availability completely, which was the reason in the first place to mirror the data). Check for more information on this requirement.(Note that with storage replication, because of the sequential write, you don't have this issue although for automated failover you need arbitration as well)
  • The setup is complex because you need to set up ASM failure groups correctly. Failure in correct setup means you mirror two volumes to a local site which can cause severe dataloss in case of a disaster. Failure to correctly setup priority paths can cause subtle performance impact which can be hard to diagnose. Check and for more insight.
  • Any bug in the Oracle ASM or database code can cause issues. As an example, see the footnote (similar for storage but that tends to be much more robust and easier to monitor).
  • A failure of the storage connectivity can lead to both reads and writes being serviced over ISL (inter-switch links between the two datacenters) again causing severe performance impact (which will get even worse after the storage connectivity has been restored, due to re-silvering)
  • No consistency is possible between application data and database data because ASM only replicates databases. The exception is if you put all flat files on Oracle ACFS - which is a quite recent Oracle feature and hasn't been proven in the field yet. This very problem has been the reason for Oracle themselves to implement EMC Recoverpoint for their internal business applications and to endorse Recoverpoint in a joint EMC/Oracle whitepaper as a viable solution.
  • No consistency is possible between multiple databases. If you have a direct transaction dependency between databases, any failover might result in slight checkpoint timing issues causing transactions being applied to one but not both databases.
  • Only synchronous replication is possible, there is no fallback option to async to mitigate performance impact during peak workloads, upgrades, stresstesting, etc.
  • During a storage failure, transactions being processed might hang until the ASM layer decides that one site has failed and it will continue with one failure group only. Depending on the settings, if the failure is intermittent (such as caused by a bad but not completely broken cable) the transactions will experience good performance, hang for a while, be slow during ASM resilver, perform well again for a while and the cycle repeats. This can be very, very hard to diagnose.
  • Rolling disasters can cause complete unability to do failover. For example, a fire in datacenter A causes the remote links to break but database processing continues on site A. A moment later the link comes back for a while and resilvering remote ASM data to site B starts. During the resilver but before being complete, the fire completely breaks the remote links. After 30 minutes, the fire causes the servers to fail and corrupt or even destroy data at site A so manual failover to site B is required. However, during the aborted re-silvering the data at site B is completely corrupt so a full tape restore is required, taking many hours of downtime and causing severe loss of transactions
  • There is no well established method to test D/R capabilities. Manually forcing link failures will immediately cause performance issues and other risks. In the real world this causes customers to be reluctant to perform DR testing after going live, causing them to be in production for years without ever being able to test if their D/R scenario works.
  • Taking storage-based snapshots will be challenging at best because no cloning tools supports consistent snapshots to be taken from two separate storage boxes at the same time (which is needed because of ASM failure groups). Although technically possible with EMC, this needs to be scripted and requires special multi-session consistency implementation.
  • Every Oracle cluster needs to be carefully configured specifically for ASM mirroring.
  • Every Oracle cluster needs to be monitored for ASM mirroring to be in sync, including the link utilization.
  • Adding a 3rd resp 4th cluster node and so on, on one of the two locations is equally complex.
  • Every storage reconfiguration (i.e. adding or moving storage volumes) needs to be performed with these complexities in mind. Adding a volume without properly setting up the failure groups renders the whole environment unable to failover.
  • Another replication method is required for pre-Oracle 10 environments, for non-Oracle databases, for fileservers, for VMware environments, for Email and content, etc. This can be SAN based but then Oracle would be the single exception for replication. If the preference is for application replication then every application type would require its own method, causing a very complex D/R runbook with multiple dependencies, logical replication instances, versions, etc etc. It is debatable whether it would be possible to sustain a datacenter failure without suffering major downtime and/or dataloss when dealing with such a complex environment. It would be near impossible to perform D/R testing for more than a single application or sub component.
  • Nobody (not even from Oracle, I verified) seems to understand how Oracle deals with concurrent writes, where one makes it to site A, another makes it to site B but both do not complete fully when a failure happens (such as a power outage). The Oracle cluster should be able to recover but might require special understanding from Oracle administrators and the devil is in the details. Not being able to deal with this causes data corruption, possibly without being detected for a longer period.


*) Footnote (from Oracle documentation):


Known Issues

If the NFS device location is not accessible,

1. Shutting down of Oracle Clusterware from any node using “crsctl stop crs”, will stop the stack on that node, but CSS reconfiguration will take longer. The extra time will be equal to the value of css misscount.

2. Starting Oracle Clusterware again with “crsctl start crs” will hang, because some of the old clusterware processes will hang on I/O to the NFS voting file. These processes will not release their allocated resources such as PORT.


These issues are addressed and will be fixed in future versions.

Conclusion: Before stopping or starting the Oracle Clusterware, it should be made sure that the NFS location is accessible using the “df” command for example. If the command does not hang, one may assume that the NFS location is accessible and ready for use.

Filter Blog

By date:
By tag: