SNIA Certified Storage Engineer (SCSE) book / study guide (S10-201)
Michael Boelen - rootkit.nl
1. Explain and recognize basic Storage Networking Technology Components and Concepts (9%)
1.1 Compare and contrast how the disk technologies of Fibre Channel, ATA, SATA, SCSI, and SAS operate
Define differences between serial and parallel approaches within a configuration
PATA: Master/Slave, shared bus
1.2 Describe Array Technology/Virtualization
Goal: hiding real disks from the application.
Virtualization knows several layers, including:
Describe virtualization implementation techniques and management strategies (e.g., in-band and out-of-band)
1.3 Define SAS and SATA technology
Identify a legal vs. illegal SAS topology layout
Explain the routing mechanism that occurs in a SAS expander topology
Direct routing: SAS host to directly attached devices
2. Perform Storage Networking Administration (24%)
If the number of outstanding I/Os per device is expected to exceed 32, QueueDepth needs to be increased. The storage and/or HBA vendor usually provides documentation describing how to adjust the value and how to measure which value gives the best performance. A common rule of thumb is to divide the storage array's total queue length by the number of HBAs. If QueueDepth is undersized, performance can degrade due to Storport throttling of its device queue.
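The rule of thumb above can be written down as a small calculation. This is only a sketch: the function name and the example numbers (an array accepting 2048 outstanding commands, shared by 16 HBAs) are made up for illustration, and the vendor documentation remains the authority for real values.

```python
def per_hba_queue_depth(array_queue_length: int, hba_count: int) -> int:
    """Rule of thumb: total array queue length divided by the number of HBAs."""
    if hba_count < 1:
        raise ValueError("at least one HBA is required")
    # Integer division, but never go below a queue depth of 1.
    return max(1, array_queue_length // hba_count)


# Hypothetical example: 2048 outstanding commands shared by 16 HBAs.
print(per_hba_queue_depth(2048, 16))  # 128
```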
I/O coalescing controls the number of CPU interrupts, for more efficient CPU utilization. Turn on the I/O coalesce parameter in high-performance environments. However, when adjusting the related parameters it is important to find the most suitable values for the workload: reducing the number of interrupts too far can cause poor performance.
CoalesceMsCnt is the count in milliseconds, CoalesceRspCnt is the count of pending responses.
Parameter              Alias  Values                 Notes
ConnectionOption       CO     0-3                    See note 1 below
DataRate               DR     0-3                    See note 2 below
FrameSize              FR     512,1024,2048
HardLoopID             HD     0-125
ResetDelay             RD     0-255
EnableBIOS             EB     0,1                    See note 3,6 below
EnableHardLoopID       HL     0,1                    See note 3 below
EnableFCPErrRecovery   EF     0,1                    See note 3 below
ExecutionThrottle      ET     1-65535                See note 5 below
EnableExtendedLogging  EL     0,1                    See note 3,4 below
LoginReTryCount        LR     0-255
EnableLipReset         LP     0,1                    See note 5 below
PortDownRetryCount     PD     0-255
EnableLIPFullLogin     FL     0,1                    See note 3 below
LinkDownTimeOut        LT     0-240
EnableTargetReset      TR     0,1                    See note 3,5 below
MaximumLUNsPerTarget   ML     0,8,16,32,64,128,256   See note 5 below
LinkDownError          LD     0,1                    See note 3,5 below
FastErrorReporting     FE     0,1                    See note 3,5 below
Execution Throttle (default 256, range 1-256, Windows): Specifies the maximum number of I/O commands allowed to execute on an HBA port. When a port's execution throttle is reached, no new commands are executed until the current command finishes.
Frame Size (default 2048, range 512-2048, all operating systems): Specifies the size of a Fibre Channel frame per I/O.
Fibre Channel Data Rate (default Auto, values 1 (Auto), 2 (1Gb), 3 (2Gb), 4 (4Gb), all operating systems): Specifies the HBA adapter data rate. When set to Auto, the adapter auto-negotiates the data rate with the connecting SAN device.
Maximum Queue Depth (default 32, range 1-65535, VMware ESX): Specifies the maximum number of I/O commands allowed to execute/queue on an HBA port.
Maximum Scatter Gather List Size (default 32, range 1-255, VMware ESX): Specifies the size of the list of DMA items that are reported to SCSI mid-level per I/O request.
Maximum Sectors (default 512, values 512, 1024, 2048, VMware ESX): Specifies the maximum number of disk sectors that are reported to SCSI mid-level per I/O request.
Switch zoning modifications are the most common change in a SAN, which explains the increased chance of mistakes. There is also no way to fully automate zoning, since it requires human decisions to determine initiator and target accessibility.
Host HBA issues occur almost as frequently as SAN zoning problems.
Disk zoning / LUN masking provides another layer of manual configuration that can lead to problems.
FC cabling problems
Use a clear naming and cabling convention to avoid problems and speed up debugging.
Fabric login (FLOGI): login after connecting to a fabric switch.
Related ports: F_Port to N_Port (or NL_Port)
Port login (PLOGI): two node ports establish a connection between each other (often a Fibre Channel HBA connection to a switch).
Related ports: N_Port to N_Port
Process login (PRLI): used to set up the environment between related processes on an originating N_Port and a responding N_Port.
Related ports: ULP (SCSI-3 to SCSI-3)
Monitoring the bandwidth usage is important in tracing the source of these kinds of problems.
One symptom of this kind of problem is SCSI timeout errors.
M-EOS switches: use “open” mode
Changes after activation of interoperability mode: Switch Feature
For example, with McData switches domain IDs are restricted to the range 97-127, to accommodate McData's nominal restriction to this same range. Domain IDs can be set up either statically (the Cisco MDS switch accepts only one domain ID; if it does not get that domain ID, it isolates itself from the fabric) or as preferred (if it does not get its requested domain ID, it accepts any assigned domain ID).
Note Brocade uses the cfgsave command to save fabric-wide zoning configuration. This command does not have any effect on Cisco MDS 9000 Family switches if they are part of the same fabric. You must explicitly save the configuration on each switch in the Cisco MDS 9000 Family.
TE ports and PortChannels cannot be used to connect Cisco MDS to non-Cisco MDS switches. Only E ports can be used to connect to non-Cisco MDS switches. TE ports and PortChannels can still be used to connect a Cisco MDS to other Cisco MDS switches, even when in interop mode.
IVR-enabled VSANs can be configured in any interop mode.
1. Login as root
ISL ports should be monitored. An ISL port performing at 80% capacity could indicate possible oversubscription.
Create initial Fabric configuration:
Switch1:admin> cfgcreate "Fabric1", "LinuxNode1Zone1"
Once the configuration is created, additional zones can be added with the cfgadd command:
Switch1:admin> cfgadd "Fabric1", "LinuxNode1Zone2"
Effective configuration: active set, loaded in memory. Can be saved with cfgSave.
Defined configuration: saved set on flash, can be loaded with cfgEnable.
Default zone membership includes all ports or WWNs that do not have a specific membership association. Access between default zone members is controlled by the default zone policy.
Type mismatch: Occurs when the name of a zone object in one fabric is also used for a different type of zone object in the other fabric.
Fabric A: alias: Mkt_Host 1,16
Fabric B: zone: Mkt_Host 1,16
Content mismatch: Occurs when the name and type of a zone object in one fabric is also used in the other fabric but the content or order is different.
Fabric A: alias: Eng_Stor wwn1; wwn2
Fabric B: alias: Eng_Stor wwn2; wwn1
Different time out values on E-ports can cause fabric segmentation
Segmentation errors can exist if a switch has a bigger zone database than the allowed maximum size. Usually the oldest/lightest switch determines how big the database can be within a fabric.
Different VSANs on both fabrics.
ACL/allow list on VSAN, blocking (valid) traffic.
The name of a zone in Fabric A should not be used for a different type of zone in Fabric B. For example, if you create a zone named myZone in Fabric A, you should not use the same name as an alias, zone configuration, or zoneset name in Fabric B. In this scenario, merging the fabrics will cause a zone type mismatch.
If an alias, zone, zoneset, or zone configuration name is the same on both Fabric A and Fabric B, but the content between the two fabrics is different, the fabrics will not merge.
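The two mismatch types can be checked before bringing up an ISL. The sketch below is a simplified model, not a vendor tool: each fabric's zone database is represented as a hypothetical dictionary mapping an object name to its type and its ordered member list, using the Mkt_Host and Eng_Stor examples from above.

```python
from typing import Dict, List, Tuple

# Hypothetical model: object name -> (object type, ordered member list)
ZoneDb = Dict[str, Tuple[str, List[str]]]


def merge_conflicts(fabric_a: ZoneDb, fabric_b: ZoneDb) -> List[Tuple[str, str]]:
    """Report names that exist in both fabrics but differ in type or content."""
    conflicts = []
    for name in sorted(fabric_a.keys() & fabric_b.keys()):
        type_a, members_a = fabric_a[name]
        type_b, members_b = fabric_b[name]
        if type_a != type_b:
            conflicts.append((name, "type mismatch"))
        elif members_a != members_b:
            # List comparison is order-sensitive, so a different order also counts.
            conflicts.append((name, "content mismatch"))
    return conflicts


fabric_a = {
    "Mkt_Host": ("alias", ["1,16"]),
    "Eng_Stor": ("alias", ["wwn1", "wwn2"]),
}
fabric_b = {
    "Mkt_Host": ("zone", ["1,16"]),
    "Eng_Stor": ("alias", ["wwn2", "wwn1"]),
}
print(merge_conflicts(fabric_a, fabric_b))
# [('Eng_Stor', 'content mismatch'), ('Mkt_Host', 'type mismatch')]
```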
Follow these steps as you prepare to merge SAN fabrics:
1. Check for conflicting Domain IDs on both fabrics before merging. Usually the switch with the lowest WWN will get the principal role.
2. Check for conflicting zone definitions before merging.
3. Verify that the Fabric islands have the same feature licenses before merging.
4. Verify that all switch parameters are compatible with the fabric before merging.
5. Use the same hardware as much as possible.
6. Merge the fabrics using one ISL at a time.
- pWWN and FC ID are not unique between fabrics
- Same zone name is used, but with different members or different order
Often only one zone set can be active (SAN should be idle or shutdown to change configuration).
Configures the number of buffers that are available to attached devices for frame receipt. Default: 16. Values range from 1 to 16.
Resource allocation time out value. This works with the E_D_TOV to determine switch actions when presented with an error condition
Error detect time out value. This timer is used to flag potential error condition when an expected response is not received within the set time
Hub: older devices which send incoming data to all ports
Switch: common devices which have an increased throughput compared with hubs, due to the point-to-point connections.
Director: chassis with switch blades
- Create aggregate and add disks to it
- Create volume
- Configure characteristics of volume (minimal read-ahead, snapshots etc)
NAS: file based (commonly NFS/CIFS, sometimes iSCSI)
SAN: block based (Fibre Channel, iSCSI)
SANs scale better, since they do not reach practical limits as quickly. NAS filers have a maximum number of concurrent users and a maximum data throughput, after which additional filers have to be added.
NAS filers are usually easier to manage and provide an easy access to data for Unix and Windows clients via NFS/CIFS.
NAS is often used for sharing documents, file stores, content archiving, email repositories, backups
SAN is often used for storage with low-latency demands, like databases and OLTP, and for mass storage demands including data replication.
A virtual HBA is a port presented within, for example, a virtual machine guest.
VN port: Virtual Node port, connected to a virtual node (e.g. host or storage device).
Cisco devices: clear zone database (clears zone information of VSAN)
Passwords: clear passwords
Configs: clear configuration before reusing or throwing hardware away.
Zone sets: xxx
Tape: remove from catalog (remove or 'expire' the tape media) and use the company's disposal method.
- At least 2 HBAs in each host / storage array, if possible
- Don't use too many ISLs
Increasing throughput, connecting more fabrics together.
More ISLs means a better usage of the ports (and less oversubscription needed). Also expansion of the SAN is possible.
Degraded performance, possible increased latency
While merging, the following processes happen:
- Zoneset passing
- Name server distribution
- Negotiation of (shortest) paths
- Principal switch selection/negotiation (lowest WWN usually wins)
- Lowest domain id
- Lowest worldwide name
- Use one ISL at a time
- Signal loss
- Oversubscription
(see initial reasons in 2.7)
If an Extended Fabrics port is to be installed on a SilkWorm 2000 Series switch, the fabric wide configuration parameter fabric.ops.mode.longDistance must be set to 1 on all switches operating within the fabric. Additionally, each long distance port must be set using the portCfgLongDistance command. Each of the two ports within a long distance ISL must be configured identically, otherwise fabric segmentation will occur.
Example messages on Brocade:
0x1023fc60 (tThad): Apr 3 22:11:44
WARNING FW-ABOVE, 3, eportTXPerf004 (E Port TX Performance 4) is above high boundary. current value : 95462 KB/s. (faulty)
0x1023fc60 (tThad): Apr 3 22:11:52
WARNING FW-BELOW, 3, eportTXPerf004 (E Port TX Performance 4) is below low boundary. current value : 12591 KB/s. (normal)
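Messages like these are easy to pick apart when collecting them centrally. The sketch below parses the WARNING lines shown above with a regular expression; the pattern is derived only from these two sample lines, so treat it as an assumption that other firmware versions or message types may not match.

```python
import re

# Pattern based solely on the two sample messages above (assumption).
FW_PATTERN = re.compile(
    r"WARNING (?P<event>FW-[A-Z]+), \d+, (?P<area>\w+) .*"
    r"current value : (?P<value>\d+) KB/s\. \((?P<state>\w+)\)"
)


def parse_fw_message(line: str):
    """Return (event, area, value in KB/s, state), or None if the line does not match."""
    match = FW_PATTERN.search(line)
    if match is None:
        return None
    return match["event"], match["area"], int(match["value"]), match["state"]


line = ("WARNING FW-ABOVE, 3, eportTXPerf004 (E Port TX Performance 4) "
        "is above high boundary. current value : 95462 KB/s. (faulty)")
print(parse_fw_message(line))
# ('FW-ABOVE', 'eportTXPerf004', 95462, 'faulty')
```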
       frames      enc   crc   too   too   bad   enc  disc  link  loss  loss  frjt  fbsy
        tx    rx    in   err  shrt  long   eof   out    c3  fail  sync   sig
    ====================================================================================
  4:  617m  2.8g     0     2     0     0     0  268k     0     0     2     9     0     0  << switch_one
  4:  2.8g  617m     0    29     0     0     0     1   333     0     1     5     0     0  << switch_two
- Length of cabling
- GBIC issue
- Dirty SFP
More information: Brocade portErrShow.pdf
Make daily/weekly backups of all available configurations. Most vendors have a way to download the configuration of switches and store it. If needed, adjust available tooling.
Low overhead on servers
Tape devices and backup disks could be zoned or placed in a dedicated fabric.
Use LAN-free, serverless backups, snapshot technology, or backup from a passive node.
Physical security: do not allow physical access to unauthorized people.
- Prevent physical access
- Prevent remote access through IP security measures (i.e. putting devices into a specific VLAN)
- Hard Zone the devices
- Lock Down E_port creation (Brocade: portCfgEport)
- Disable ports (Brocade: portCfgPersistentDisable)
Data encryption: store data encrypted when needed. If needed, encrypt data before putting it on the wire.
LUN masking: “exports” a LUN only to the systems which are allowed to use it.
Host isolation refers to ensuring only one initiator (host) per SAN zone, which prevents a misbehaving HBA or host driver from interfering with any of the other hosts in the SAN.
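Host isolation (single-initiator zoning) can be expressed as a simple rule: every zone contains exactly one initiator plus the targets that host may reach. A minimal sketch, using a hypothetical helper and made-up WWNs; it only builds the zone layout and does not talk to a switch.

```python
from typing import Dict, List


def single_initiator_zones(initiators: Dict[str, str],
                           targets_for: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build one zone per host: its single initiator WWN plus its allowed targets."""
    zones = {}
    for host, host_wwn in initiators.items():
        zones[host + "_zone"] = [host_wwn] + list(targets_for.get(host, []))
    return zones


# Made-up WWNs, for illustration only.
initiators = {
    "linux1": "10:00:00:00:c9:00:00:01",
    "linux2": "10:00:00:00:c9:00:00:02",
}
array_port = "50:06:01:60:00:00:00:01"
zones = single_initiator_zones(
    initiators, {"linux1": [array_port], "linux2": [array_port]}
)
for name, members in zones.items():
    print(name, members)
```

Both hosts share the same storage port, but neither host sees the other's HBA.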
Hard zoning: members of a zone are physical ports; also known as port zoning.
Soft zoning: WWNs or pWWNs are members of a zone; this happens within a fabric switch. Software zoning lets you create symbolic names for the zones and zone members.
Use protocols with encryption like SSH (instead of telnet) and HTTPS (instead of HTTP).
PCIe-to-PCIX bridges allow access for legacy devices
PCI-X uses conventional PCI technology, and is the double-wide version of PCI with up to 4 times the clock speed. It was needed for hardware like gigabit Ethernet, Fibre Channel and Ultra320 SCSI cards.
PCI-X v1.0 slot is 133 MHz
If a conventional PCI card is installed in a PCI-X slot then the clock speed of other PCI-X slots may be reduced.
PCI express is a totally new approach, so PCI Express cards can neither be installed in conventional PCI or PCI-X slots, nor can conventional PCI cards or PCI-X cards be installed in a PCI Express slot.
1x PCI-e cards will fit in 1x, 4x, 8x and 16x PCI-e slots.
4x PCI-e cards will fit in 4x, 8x and 16x PCI-e slots.
8x PCI-e cards will fit in 8x and 16x PCI-e slots.
16x PCI-e cards will fit in 16x PCI-e slots.
So a fast 16x PCI-e card will not work in an 8x (or lower) slot.
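The fit rules above boil down to one comparison: a card fits when the slot has at least as many lanes as the card. A small sketch with a hypothetical helper name; it only models the rule as stated in these notes.

```python
VALID_WIDTHS = {1, 4, 8, 16}  # the lane widths listed above


def card_fits(card_lanes: int, slot_lanes: int) -> bool:
    """A PCI-e card fits in a slot that is at least as wide as the card."""
    if card_lanes not in VALID_WIDTHS or slot_lanes not in VALID_WIDTHS:
        raise ValueError("lane width must be one of 1, 4, 8 or 16")
    return card_lanes <= slot_lanes


print(card_fits(4, 8))    # True
print(card_fits(16, 8))   # False
```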
RAID 0:
RAID 1:
RAID 2:
RAID 3:
RAID 4:
RAID 5:
RAID 6:
RAID 0+1:
RAID 1+0:
Hardware vs. software: hardware RAID has better performance and does not let the CPU do all the work.
RAID 5: slow when writing, as every write has to update the parity information in addition to the data itself. With an even number of disks, this means only half as many write actions are possible (8 disks = 8 reads or 4 writes at the same time).
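Why the parity has to be rewritten on every write becomes clearer with a small example. The notes do not spell out how parity is calculated; the sketch below assumes the usual XOR parity and uses made-up 4-byte blocks. It shows that a lost block can be rebuilt from the remaining blocks plus parity, which only works if the parity is kept up to date on each write.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (made-up content) and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the rest plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```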
Cascaded: inexpensive, easy to extend. However, low reliability and low scalability.
Ring: same as Cascaded topology, but with better reliability
Core/Edge: best flexibility and reliability. Multi-layer design. Examples: tiered hybrid
Mesh: can be full or partially crossed. Good for any-any traffic. The downside is ISLs using valuable ports.
Fan-out : ratio of storage ports to hosts (1:4)
Fan-in : ratio of hosts to storage ports (7:1)
When using SSD: ALWAYS use a single port per PCI-E HBA card. Do not attempt to use multiple ports on your HBA cards, as the SSD bandwidth will be limited by the PCI bus.
Avoid putting more HBAs on a server than the bus throughput can support.
xxx
Tape libraries can be virtualized (VTL: virtual tape library), to make applications believe they are writing to a normal tape unit. In reality these virtual tapes are disks (or parts of disks), which perform much better than conventional tape units.
Better hardware utilization, less power, more central management possible, load balancing, clustering and failover possibilities by placing VMs on different hosts.
NFS: UDP or TCP, port 2049, versions 2, 3 and 4, usually Linux/Solaris. The TCP connection is stateful, but no intervention is needed when failing over. NFS itself is stateless (versions 2 and 3), as in: a failure is transparent for client and server. Recovering does not need actions like rebooting the system to free up resources or states.
CIFS: TCP, port 445, usually Windows, stateful; intervention is required at failover, due to state recovery. With CIFS, the client maintains the connection and the open file names, directories and various other aspects of the files and directories. CIFS is a "stateful" protocol, which is also a problem when the underlying connection is lost: the client does not know when to recreate the connection. File content is cached via a cooperative process between client and server code, and this is where problems can occur. The state survives only as long as the session between the server and the client survives, and this session survives only as long as the underlying network connection (generally TCP/IP) survives.
When using file-level protocols, the NAS itself has to maintain the integrity of the file system. However, when data is served via block-based access (SAN/iSCSI), the guest system has to perform operations such as forensics or file system checks.
Switch performance: Brocade example:
switch1:admin> portPerfShow 5
     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15   Total
 ------------------------------------------------------------------------------------------------------
     0     0   21m   28m   31m     0  8.4m     0   28m   21m   31m     0  8.4m     0     0     0    178m
     0     0   20m   29m   31m     0   10m     0   29m   20m   31m     0   10m     0     0     0    182m
     0     0   18m   36m   31m     0   14m     0   36m   18m   31m     0   14m     0     0     0    201m
     0     0   17m   34m   30m     0  7.0m     0   34m   17m   31m     0  7.0m     0     0     0    179m
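Output like this can be turned into numbers for a baseline. The sketch below parses one sample row taken from the portPerfShow output above and finds the busiest port. The suffix multipliers are an assumption (decimal k/m/g); check the switch documentation for the exact units.

```python
# Assumption: suffixes are decimal multipliers of bytes per second.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_rate(token: str) -> float:
    """Convert a token such as '8.4m' or '0' into a number."""
    token = token.strip().lower()
    if token and token[-1] in SUFFIXES:
        return float(token[:-1]) * SUFFIXES[token[-1]]
    return float(token)


# First sample row from the output above: ports 0..15 followed by the total.
row = "0 0 21m 28m 31m 0 8.4m 0 28m 21m 31m 0 8.4m 0 0 0 178m".split()
ports, total = row[:-1], row[-1]

# max() returns the first port with the highest rate when there is a tie.
busiest = max(range(len(ports)), key=lambda i: parse_rate(ports[i]))
print(busiest, ports[busiest])  # 4 31m
```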
HBA performance: xxx
Use tools like MRTG, Cacti and RRDtool to create initial baselines.
Time synchronization is important for troubleshooting, when trying to debug issues and compare log events with error messages. It is also useful for security breaches and/or events, to trace back all steps in an investigation.
Brocade switches: configure time on principal switch. Other switches will use principal switch to synchronize time.
Another use for having the correct time is the discovery process happening with RSCN. When a new disk array is attached to the fabric, the HBAs registered in the notification list of the switch (ONLY the switch to which the array is connected) are notified and can start discovering new devices/LUNs.
Discovery process
SCSI discovery process
In the modern SCSI transport protocols, there is an automated process of "discovery" of the IDs. SSA initiators "walk the loop" to determine what devices are there and then assign each one a 7-bit "hop-count" value.
Serial Storage Architecture (SSA) is an IBM-developed serial interface which basically runs the SCSI-2 software protocol.
The good news about SSA compared to SCSI is:
- it is far easier configured and cabled -- no termination needed!
- it is built with HA features. The SSA loop architecture (as opposed to a SCSI bus) has no SPOF (see diagram below). If part of a loop fails, the device driver will automatically and transparently reconfigure itself to make sure all SSA devices can be accessed without any noticeable interruption.
- it uses no SCSI ID addressing which means no hassle with setting up the adapters.
- the SSA loop can transport 4 times 20 MByte/s -- two independent reads and two independent writes across each loop direction. Current actual adapter implementations allow for 35 MByte/s per adapter.
- SSA uses no bus arbitration as opposed to SCSI. Instead, a network-like scheme is used. Data is sent and received in 128 Byte packets, and all devices on the loop can request time slots independently. SCSI in turn needs bus arbitration, which can lead to performance deadlocks if an initiator doesn't release the bus in time.
- SSA allows for 25 meters between each two devices. Plus, there is a fiber-optic extender which allows for data transfers across 50 micrometer optical cables over distances up to 2.4 km. This makes it even suitable for site disaster recovery if configured properly.
- Most SSA adapters support two independent loops which makes it possible to attach mirrored disks to different loops for higher availability.
The SSA loops are symmetrical, twisted-pair, potential free. No TERMPWR potential shift problem.
FC-AL initiators use the LIP (Loop Initialization Primitive) to interrogate each device port for its WWN (World Wide Name). For iSCSI, because of the unlimited scope of the (IP) network, the process is quite complicated. These discovery processes occur at power-on/initialization time and also if the bus topology changes later, for example if an extra device is added.
Cache
Optimizing the cache usage can give a great performance gain on the storage. More data can be served quickly from the cache, instead of from the much slower disks.
While having cache memory is usually a good thing, it should be disabled if only small random reads are being used.
NetApp: sysstat -x 5
EMC Navisphere (CLI): navicli -h XXX getcache
Example:
# navicli -h 192.168.29.133 getcache -pdp -high -low
Prct Dirty Cache Pages = 51
High Watermark: 80
Low Watermark: 60
If 80% of the cache is dirty, the array flushes the cache down to 60%; currently it is at 51%.
RAID level
Using the RAID level best suited to the required safety and read and/or write speed is important. By creating several different RAID levels within the storage tiers, much of the data processing can be improved.
Monitoring logs is probably the most basic form of tracking the health of any system. Checking trends with tools like RRD and SNMP can give valuable information about the health and growth rate of the affected systems. Monitoring tools like Nagios, Zabbix etc. are also useful for responding to problems in time.
Brocade switches provide the commands portperfshow and porterrshow.
Root cause analysis (RCA): document describing the events that happened after a big issue/problem, often with additional information such as a problem description, timeline of events, problem resolution/solution and follow-up actions.
Use a proper amount of buffer-to-buffer credits. Use asynchronous replication instead of synchronous, to prevent huge (application) delays, if the RPO can be higher than zero. Set speed on both sides of the link to a fixed value (instead of auto negotiation)
The buffer-credit method is a form of storage distance extension. If the length of the fiber optic cable span exceeds the limit supported by the available credits, the throughput drops sharply; the buffer-credit method gets around this problem. Buffer credits determine how many frames can be sent unacknowledged before an acknowledgment has to arrive. This is comparable to the window size in TCP connections. The value can be increased when the link is stable (or shorter).
Brocade formula: Buffer Credits = ((Distance in km) * (Data Rate) * 1000) / 2112
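The formula is easy to apply in a few lines. In this sketch the data rate values (1.0625 for 1 Gbps and 2.125 for 2 Gbps Fibre Channel) and the rounding up to a whole credit are assumptions; they are not stated in these notes, so check them against the vendor documentation.

```python
import math

FRAME_BYTES = 2112  # the divisor used in the formula above


def buffer_credits(distance_km: float, data_rate: float) -> int:
    """Buffer Credits = (distance in km * data rate * 1000) / 2112, rounded up."""
    return math.ceil(distance_km * data_rate * 1000 / FRAME_BYTES)


# Assumed data rate values: 1.0625 for 1 Gbps, 2.125 for 2 Gbps.
print(buffer_credits(10, 1.0625))  # 6
print(buffer_credits(50, 2.125))   # 51
```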
Brocade switches can also use LD mode (Dynamic long distance mode) to automatically adjust the buffer-to-buffer credit value.
VSAN: virtual SAN or “virtual fabric”, used to achieve isolation without the need to set up a physically separated fabric. If a switch does not support VSANs, create a SAN as small as possible, but with room for growth.
LSAN: sharing (zone) information across fabrics (zones are usually prefixed with "lsan_").
Order: Compression first, then encryption.
Compression is useful for information which is text based and has a high compression ratio. Compression is not useful for encrypted links (like VPN tunnels), or compact formats like audio, video and images.
Nearline storage is used to tier storage using cheaper storage, usually with a bigger storage capacity. It can also apply to information which does not need high-performance storage at that moment and can be stored on a lower-performance (and cheaper) array. Commonly used purposes are archiving of information and additional backups.
Content Addressable Storage/Content Addressed Storage (CAS) and Fixed Content Storage (FCS) are different acronyms for storage of documents which do not change over time, as opposed to location-based addressing. If the same document is available in multiple places, it is stored only once. Information is accessed by using specific IDs, generated at the time of creation on the CAS system.
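The "stored only once" behaviour follows naturally when the ID is derived from the content itself. A minimal sketch, assuming a SHA-256 hash as the ID; these notes only say that an ID is generated at creation time, so the hash choice and the class name are illustrative.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the ID is a hash of the content (assumption)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        content_id = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same ID, so it is kept only once.
        self._objects.setdefault(content_id, content)
        return content_id

    def get(self, content_id: str) -> bytes:
        return self._objects[content_id]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
first = store.put(b"contract 2009, final version")
second = store.put(b"contract 2009, final version")
print(first == second, len(store))  # True 1
```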
DWDM or IP extenders (in combination with FCIP or iFCP).
CDP (Continuous Data Protection)
Synchronous replication: source and target both need to acknowledge the data transfer before the application is notified.
Asynchronous replication: the source acknowledges the write and notifies the application; afterwards the data gets replicated to the target device.
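The difference shows up in what has reached the target at the moment the application receives its acknowledgment. The sketch below is a toy in-memory model with hypothetical names, not a real replication API; lists stand in for the source and target devices.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a source volume replicating to a target volume."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.source = []
        self.target = []
        self._pending = deque()

    def write(self, block: str) -> str:
        self.source.append(block)
        if self.synchronous:
            # The target must hold the block before the application is notified.
            self.target.append(block)
        else:
            # Acknowledge first, replicate later.
            self._pending.append(block)
        return "ack"

    def replicate_pending(self) -> None:
        while self._pending:
            self.target.append(self._pending.popleft())

    def unreplicated(self) -> int:
        return len(self._pending)


async_volume = ReplicatedVolume(synchronous=False)
async_volume.write("block-1")
async_volume.write("block-2")
print(async_volume.unreplicated())  # 2
async_volume.replicate_pending()
print(async_volume.target == async_volume.source)  # True
```

The blocks still waiting in the queue are what would be lost if the source site failed at that moment, which is why asynchronous replication implies an RPO higher than zero.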
A blank commit is a commit operation that does not contain configuration changes, and enforces the SDV configuration of the committing switch fabric-wide. A blank commit operation resolves merge conflicts by pushing the configuration from the committing switch throughout the fabric, thereby reinitializing the conflicting virtual devices. Exercise caution while performing this operation, as it can easily take some virtual devices offline.
Merge failures resulting from a pWWN conflict can cause a failure with the device alias as well. A blank commit operation on a merge-failed VSAN within SDV should resolve the merge failure in the device alias.
are no virtual device name conflicts across VSANs in fabrics.
Zoning conflict parameters
When merging two fabrics, zoning information from the two previously separated fabrics is merged as much as possible into the new fabric. Sometimes, zoning inconsistency can occur and zoning information cannot be merged. Segmentation due to zoning will usually be flagged by an error message that says "Fabric segmented, zone conflict" appearing in the error logs. One of the solutions is to make sure zoning information on both switches is consistent before bringing up the ISL.
Upgrading firmware on Brocade switches:
The internal process will be as follows
1. The firmwareDownload -s command is entered, and you respond to the prompts.
2. Firmware is downloaded to Secondary Partition
3. Primary and Secondary boot pointers are swapped
4. CP boots from firmware in new Primary partition.
Say no to autocommit and yes to reboot after download.
After a few days of trouble-free operation, run the firmwareCommit command; the new firmware is then copied to the secondary partition as well.
Sources used:
http://www.scsita.org/aboutscsi/sas/tutorials/SAS_General_overview_public.pdf
http://www.directron.com/ncqvstcq.html