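Product Features
Editor's Recommendation
Donggua Ge (冬瓜哥) pursues technology with a dedication that borders on obsession. Compared with ten years ago, his writing analyzes its subjects more incisively and his grasp of the technology is more precise. Every article on his WeChat official account is a bellwether for the storage industry.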
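Content Summary
The book is divided into the following parts: Flexible Data Layout; Application-Aware and Visualized Storage Intelligence; Storage Chips; Gleanings from the Storage Sea; Clusters and Multiple Controllers; Traditional Storage Systems; Emerging Storage Systems; Optical Storage Systems; Architecture; I/O Protocol Stack and Performance Analysis; Storage Software; and Solid-State Storage. Each chapter contains several sections, and each section is a self-contained topic. The book keeps the author's usual style and is written entirely from the reader's point of view, in language that is both polished and penetrating, and it covers a very wide range. Besides explaining storage technology, it also comments on computer system technology and grid technology, which broadens readers' horizons, clears up long-standing confusion, and keeps them interested in reading on.
This book is suitable for everyone working in the storage field to read and study, and it can also serve as advanced follow-on material for readers of 《大話存儲*版》.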
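About the Author
Donggua Ge (冬瓜哥, Zhang Dong) is currently a system architect at a semiconductor company and the author of the 《大話存儲》 book series. He is a technical expert and evangelist in the storage field.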
Table of Contents
Chapter 1 Flexible Data Layout
1.1 Raid1.0 and Raid1.5
1.2 Raid5EE and Raid2.0
1.3 Lun2.0/SmartMotion
Chapter 2 Application-Aware and Visualized Storage Intelligence
2.1 Application-Aware Fine-Grained Automatic Storage Tiering
2.2 Application-Aware Fine-Grained SmartMotion
2.3 Application-Aware Fine-Grained QoS
2.4 Productization and Visual Presentation
2.5 Packaging the Concept and Making the PPT
2.6 A Review of the "Active" Storage Concept from 浪潮 (Inspur)
Chapter 3 Storage Chips
3.1 Channel and Raid Controller Architecture
3.2 SAS Expander Architecture
Chapter 4 Gleanings from the Storage Sea
4.1 Two High-Class Kinds of Memory You Would Never Have Imagined
4.2 What Is Inside a JBOD
4.3 The Tragedy of the Raid4 Parity Disk
4.4 Why a Raid Card Is Said to Be a Small Computer
4.5 Why Raid Card Batteries Were Replaced by Supercapacitors
4.6 What Exactly Is the Difference Between Firmware and Microcode
4.7 Is the Inside of an FC Loop Hub Really a Loop
4.8 Why SAS and FC Are Said to Cost Less CPU Than TCP/IP over Ethernet
4.9 What Traffic Runs over the Heartbeat Link Between Dual Storage Controllers
Chapter 5 Clusters and Multiple Controllers
5.1 A Brief Talk on Active-Active and Multipathing
5.2 A "Brief" Talk on Disaster Recovery and Active-Active Data Centers (Part 1)
5.3 A "Brief" Talk on Disaster Recovery and Active-Active Data Centers (Part 2)
5.4 An In-Depth Illustrated Review of the Evolution of Cluster File System Architectures
5.5 From Multi-Controller Cache Management to Cluster Locks
5.6 Shared versus Distributed Architectures
5.7 "Donggua Ge Draws PPT": Active-Active Is a Pit
Chapter 6 Traditional Storage Systems
6.1 Sharing Some Basic Topics Related to Storage Systems
6.2 Chronicles of the High-End Storage World!
6.3 Astonishing! So This Is How High-End Storage Architecture Evolved!
6.4 Traditional High-End Storage Systems Centralize and Externalize the Data Cache: Three Birds with One Stone
6.5 Traditional External Storage Is Nearing Its Twilight
6.6 Storage Veterans Take On the Young Newcomers
6.7 Traditional Storage Is Aging; Can Emerging Storage Shoulder the Load?
Chapter 7 Next-Generation Storage Systems
7.1 An Old Gun Can Still Play at Next-Generation Storage Systems
7.2 The Next-Generation Storage System with the Most Traditional-Storage Character
7.3 The Next-Generation Storage System Best Suited to Large-Scale Data Centers
7.4 The Highest-Performance Next-Generation Storage System
7.5 The Next-Generation Storage System with the Strongest Application Awareness
7.6 The Next-Generation Storage System with the Most Flexible Data Management
Chapter 8 Optical Storage Systems
8.1 Basic Principles of Optical Storage
8.2 The Mysterious Laser Head and Blu-ray Technology
8.3 Dissecting Blu-ray Storage Systems
8.4 The Optical Storage Ecosystem
8.5 Looking at the Present from the Future
Chapter 9 Architecture
9.1 A Talk on Many-Core Processor Architecture
9.2 A Salute to Loongson! Donggua Ge Hand-Designed a CPU Decoder!
9.3 NUNA Architecture Lands for the First Time in the InCloudRack Cabinet
9.4 A Review of the CloudSAN Architecture from 宏杉科技 (MacroSAN)
9.5 Memory Can Be Used Like This?!
9.6 PCIe Switching: What on Earth Is That?
9.7 A Chat about FPGA/GPCPU/PCIe/Cache-Coherency
9.8 [Popular Science] How Do Supercomputers Actually Compute?
Chapter 10 I/O Protocol Stack and Performance Analysis
10.1 The Most Complete Summary of Storage System Interfaces, Protocols, and Connection Methods
10.2 Research Trends in Cutting-Edge I/O Protocol Stack Technology
10.3 What Stripe Size Should a Raid Group Be Set To?
10.4 Concurrent I/O: The Root of System Performance!
10.5 How Long Have You Been Misled about I/O Latency?
10.6 How Do You Measure Concurrency along the Entire I/O Path?
10.7 What Exactly Is the Relationship among Queue Depth, Latency, Concurrency, and Throughput
10.8 Why Does Raid Provide No Speedup in Some Scenarios?
10.9 Why Is Performance Excellent in Testing but Dismal in Production?
10.10 What Is the Effect of a Queue Depth That Is Too Shallow?
10.11 What Queue Depth Is Ideal?
10.12 Why Does the Average Random I/O Latency of Mechanical Disks Drop Transiently?
10.13 How Exactly Does Data Layout Affect Performance?
10.14 Misconceptions about Synchronous I/O and Blocking I/O
10.15 Atomic Writes: What on Earth Are They?!
10.16 Why Not Build a USB Target?
10.17 A New Storage Technology Patent of Donggua Ge's Has Been Formally Granted
10.18 A Quick Walk through the Lower Layers of iSCSI
10.19 A Brief Analysis of the Four Login Steps in FC
Chapter 11 Storage Software
11.1 Thin Is a Pit: Whoever Uses It Is Asking for Trouble!
11.2 The Evolution of Storage System OSes
Chapter 12 Solid-State Storage
12.1 A Brief Analysis of How Solid-State Media Are Used in Storage Systems
12.2 Misconceptions about SSD Metadata and Power-Loss Protection
12.3 Misconceptions about Host Based and Device Based Flash FTL
12.4 On SSD HMB and CMB
12.5 同有科技 (Toyou) Returns with Wings Spread
12.6 Doing Crosstalk with Lao Tang: SSD Performance Testing, the "Jade" Part
12.7 How Should Raid Be Done on Solid-State Disks?
12.8 When Raid2.0 Meets All-Solid-State Storage
12.9 Upper/Lower Pages, Fast/Slow Pages, MSB/LSB: What on Earth Are They?
12.10 On the Steps for Writing 0 to MSB/LSB
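Excerpt
1.1 Raid1.0 and Raid1.5
In the era of mechanical disks, two fundamental factors determine final I/O performance. One is the top-end source: how the application issues I/O and the attributes of that I/O. The other is the bottom-end source: in what form and state the data ultimately sits, and across how many mechanical disks. How an application issues I/O is entirely outside the storage system's control, so the storage system can do nothing to solve performance problems from that end. How data is organized and laid out, however, is the most important job a storage system has.
This has been evolving continuously ever since Raid was born. The simplest example is the progression from Raid3 to Raid4 to Raid5. Raid3 was designed to maximize throughput for single-threaded, large-block I/O to contiguous addresses. To that end its stripe is very narrow, so narrow that the target addresses of nearly every I/O issued from above land on all disks. Almost every I/O therefore makes multiple disks read or write in parallel to serve it, while other I/Os must wait. That is why we say that in a Raid3 array upper-layer I/Os cannot run concurrently with one another, although a single I/O can be served by many disks in parallel. So if the system has only one thread (or user, program, or workload), and that thread does large-block, contiguous-address I/O in pursuit of throughput, Raid3 is a very good fit. Most workloads are not like this, though. They want upper-layer I/Os to execute in parallel as fully as possible, for example so that I/Os from multiple threads and users are answered concurrently. For that the stripe must be enlarged to a suitable value so that the address range of one I/O does not pull in every disk in the Raid group. There is then some probability that the group of disks serves several I/Os at once, and the more disks there are, the higher that probability. Raid4 is roughly Raid3 with an adjustable stripe, but its dedicated parity disk not only becomes a hot-spot disk with a high failure rate, it also constrains I/Os that could otherwise run concurrently. Every write I/O requires the parity block of the corresponding stripe on the parity disk to be updated, and because all parity blocks live on that one disk, upper-layer I/Os can only execute one after another rather than concurrently. Raid5 scatters the parity blocks across all disks in the Raid group and thereby achieves concurrent I/O.
Most storage vendors let you set the stripe width, for example from 32KB to 128KB. Suppose an I/O request reads 16KB from a Raid5 group built on 8 disks. If the stripe is 32KB, the segment on each disk is 4KB, and this I/O occupies at least 4 disks. Assuming a 100% concurrency probability, the Raid group can serve two 16KB I/Os concurrently, or eight 4KB I/Os. If the stripe width is raised to 128KB, then under the same 100% concurrency assumption it can serve eight I/Os of 16KB or less concurrently.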
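The stripe-width arithmetic above can be checked with a short sketch. It assumes, as the example in the text does, that the stripe is divided evenly across all disks in the group, that I/Os are segment-aligned, and that parity traffic is ignored; the function names are illustrative only.

```python
import math


def disks_per_io(io_kb, stripe_kb, num_disks):
    """Number of disks one segment-aligned I/O occupies.

    Assumption (taken from the worked example in the text): the segment
    size is simply stripe_kb / num_disks, with parity ignored.
    """
    segment_kb = stripe_kb / num_disks
    return min(num_disks, math.ceil(io_kb / segment_kb))


def max_concurrent_ios(io_kb, stripe_kb, num_disks):
    """Best-case count of such I/Os the group can serve at once,
    i.e. the '100% concurrency probability' case."""
    return num_disks // disks_per_io(io_kb, stripe_kb, num_disks)


if __name__ == "__main__":
    for stripe_kb in (32, 128):
        for io_kb in (16, 4):
            print(stripe_kb, io_kb, max_concurrent_ios(io_kb, stripe_kb, 8))
```

This is a best-case model: real arrays see unaligned I/Os and parity updates, so actual concurrency is lower.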
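At this point we can see that merely adjusting the stripe width and optimizing the parity layout already yields very different performance. But however much you tinker, I/O performance is always bounded by the pitifully few disks, a handful or a dozen or so, in the Raid group. Why a handful or a dozen? Couldn't you build one big Raid5 group from 100 disks and create all logical volumes on it to raise the performance of each volume? You would not choose to do so. The moment a disk fails and the system has to rebuild, you will regret that decision, because overall system performance drops sharply and no logical volume is spared: all 99 surviving disks are reading data at full speed while the system XORs what it reads to regenerate the lost blocks and writes them to the hot spare. You can of course throttle the rebuild to relieve the I/O performance of online workloads, but the price is a longer rebuild, and if another disk fails during the rebuild window all data is gone. So the failure domain must be kept small, which is why a Raid group is best kept to a handful or a dozen or so disks. That is awkward, so people came up with a workaround: concatenate several small Raid5/6 groups into one big Raid0, that is, Raid50/60, and spread the logical volumes across it. Storage vendors today have run out of tricks and can no longer come up with anything new, so they like to call this big Raid50/60 group a "Pool", which misleads some people into thinking that storage is innovating again and is still full of vitality.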
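The reason every surviving disk is busy during a rebuild is that, with XOR parity, a lost block can only be recovered by XORing all the other blocks of its stripe. A minimal sketch of that property follows; the helper names are hypothetical and the blocks are tiny byte strings rather than real disk segments.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_block(stripe, failed_index):
    """Regenerate the block at failed_index from all surviving blocks.

    Every surviving member of the stripe has to be read, which is why
    a rebuild in a 100-disk group keeps the other 99 disks busy.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"disk0___", b"disk1___", b"disk2___"])
    print(rebuild_block(stripe, 1))
```

The same function recovers a lost parity block as well as a lost data block, since the XOR of all members of a stripe is zero.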
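So Donggua Ge may as well go with the flow and do a little hyping here: let us call the traditional Raid group Raid1.0 and Raid50/60 Raid1.5. We can sense a pattern of cyclical ascent here. Early on, with few disks, performance for different scenarios was tuned mainly through stripe width. Later people figured it out: why not use Raid50 and spread the data straight across several hundred disks? Wouldn't that be a joy? Concurrent thread I/O from the upper layer could then be parallelized massively at the bottom layer and reach very high throughput. At this point people were dizzy with success, and no one went on to consider another frightening problem.
As these words are being written, still no one has considered this problem, at least judging from the direction of vendors' products. The reason may be another round of evolution at the bottom layer, namely solid-state media. The wheels at the bottom keep speeding up while the forms at the top go round in cycles, yet sometimes the top layer leaps straight ahead and skips a form that should have existed. That form may be fleeting, or may never appear at all, but someone will always strike a spark, however faint.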
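(大話存儲後傳——次世代數據存儲思維與技術, page 4)
This frightening problem has in fact been overshadowed by an even more frightening one: rebuilds take too long. Take a 4TB SATA disk. Even if it is written at full speed during a rebuild, its rotational speed caps throughput at roughly 80MB/s, so the rebuild takes about 14 hours. In practice, to protect the performance of online workloads, rebuilds are usually throttled to a medium speed of about 40MB/s, which takes about 28 hours, more than a full day and night. I would bet that no system administrator sleeps well during that time.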
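A rebuild-time estimate of this kind is capacity divided by sustained write rate. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate for the whole rebuild; both are simplifying assumptions, and the function name is illustrative.

```python
def rebuild_hours(capacity_tb, rate_mb_per_s):
    """Idealized hours needed to rewrite a whole disk at a constant rate.

    Assumptions: decimal units and a steady sequential write rate;
    no allowance is made for competing online I/O.
    """
    capacity_bytes = capacity_tb * 10**12
    rate_bytes_per_s = rate_mb_per_s * 10**6
    return capacity_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    for rate in (80, 40):
        print(f"4 TB at {rate} MB/s: {rebuild_hours(4, rate):.1f} h")
```

Under these assumptions a 4 TB disk needs roughly 14 hours at 80 MB/s and roughly 28 hours at 40 MB/s. These are lower bounds; a rebuild on a loaded array generally takes longer than the idealized figure.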
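1.2 Raid5EE and Raid2.0
Twenty years ago someone invented a technique called Raid5EE. It had two goals: first, to put the normally idle hot spare disk to work; second, to speed up rebuilds.
Clearly, if the space of the hot spare disk, marked "H (hot spare)" in the book's figure, is scattered across all disks in the same way as the parity disk, the result is the layout shown on the right of that figure, where every P block is followed by an H block. The Raid group then has one more disk doing useful work than before. And because the H space is scattered too, a rebuild after a disk failure ought to go faster, since writes can now go to many disks concurrently. In practice it does not. The rebuild speed of the system is not limited by that single hot spare disk; it is limited by all the disks together, because the hot spare can only be written at full rate with rebuilt data if every other disk is read at full rate and the system XORs what they return. Scattering the hot spare, or even replacing it with an SSD or with memory, makes no difference to the result.
So how can rebuilds actually be accelerated? The only way, as the book's figure shows, is to take the stripes that were crammed into 5 disks and scatter them horizontally. Note that the scattering is done at stripe granularity; scattering within a single disk is useless. Only then can rebuild speed be multiplied.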
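The effect of scattering stripes over a larger pool can be illustrated with a small simulation that counts how many blocks each surviving disk must read when one disk fails. This is only a sketch: the seeded random placement and the function name are assumptions made for illustration, not the layout algorithm of the book or of any product.

```python
import random
from collections import Counter


def rebuild_read_load(num_disks, stripe_width, num_stripes, failed_disk=0, seed=0):
    """Count blocks read per surviving disk to rebuild failed_disk.

    Assumption: each stripe is placed on stripe_width distinct disks
    chosen at random from the pool (hypothetical placement policy).
    """
    rng = random.Random(seed)
    reads = Counter()
    for _ in range(num_stripes):
        members = rng.sample(range(num_disks), stripe_width)
        if failed_disk in members:
            # Only stripes that lost a block need rebuilding, and each
            # one requires reading all of its surviving members.
            for disk in members:
                if disk != failed_disk:
                    reads[disk] += 1
    return reads


if __name__ == "__main__":
    # Same amount of data per disk in both cases: 1000 blocks.
    traditional = rebuild_read_load(num_disks=5, stripe_width=5, num_stripes=1000)
    scattered = rebuild_read_load(num_disks=20, stripe_width=5, num_stripes=4000)
    print("busiest disk, 5-disk group:", max(traditional.values()))
    print("busiest disk, 20-disk pool:", max(scattered.values()))
```

In the 5-disk group every surviving disk has to be read in full, while in the 20-disk pool the same reads are spread over 19 disks, so the busiest disk reads only a fraction as much. The speedup also requires the spare space to be distributed, otherwise the single target disk becomes the bottleneck for writes.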
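Foreword / Preface
Preface
In the blink of an eye, eight years have passed since 《大話存儲》 was published. Over these eight years Donggua Ge has kept learning, accumulating, and writing, and in May 2015 he founded the WeChat official account "大話存儲" to go on summarizing and publishing knowledge about all kinds of storage systems, all of it original. This book collects and reworks the articles Donggua Ge published over the past year or so, with an additional 30% of previously unpublished content added specially.
If the 《大話存儲》 series is a novel that systematically narrates the lower layers of storage systems, this book is more like a collection of essays: loose in form but unified in spirit, moving freely between the bottom and top layers of storage and computer systems. Each piece addresses one field, topic, or technology and builds its discussion around it. Donggua Ge has divided the book into 12 technical areas, each containing several related articles.
Some of the articles include pictures I drew by hand. To keep the original flavor I decided to leave them as they are; if they offend your sense of aesthetics, please forgive me.
Reading this book requires some knowledge of storage systems, ideally quite a lot; otherwise it will be hard going. Hard going is a good thing, though, because it shows there is room to improve. In that case, go and buy a copy of 《大話存儲終極版》 and read the main story first, then come back to this sequel. When Donggua Ge read certain documents years ago it was also hard going, but they always seemed interesting, so he kept at it.
Some may wonder whether there will be a 《大話存儲外傳》 later. Hmm, perhaps. Let it take its course!
Donggua Ge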
    
				 
				
				
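《數據洪流中的靜水流深:下一代存儲架構的解構與展望》 (Still Waters Run Deep in the Data Deluge: Deconstructing and Envisioning Next-Generation Storage Architecture)

In an age of information explosion, data has become the core engine driving social progress and business development. From records of massive user behavior, to the huge datasets that underpin scientific research, to the transaction records that keep the global economy running, we are in the midst of an unprecedented data deluge. With it come severe challenges for storing, managing, accessing, and protecting data. Faced with growing data volumes, changing application requirements, and demanding performance targets, traditional storage solutions are gradually showing their limits.

This book does not simply list and restate mature technologies. It analyzes the deep difficulties facing data storage today and, from a macro perspective, explores and sketches the direction in which next-generation storage is evolving, the core technologies driving it, and the outlook for the future. We aim to give readers a new way of thinking about how storage technology can fundamentally adapt to, and lead, the coming information age.

Chapter 1: The Predicament of Traditional Storage and the Truth about the Data Deluge

This chapter first reviews today's mainstream storage architectures, including but not limited to SAN (storage area network), NAS (network-attached storage), DAS (direct-attached storage), and the evolution of early cloud storage. We examine in detail the challenges they face with massive data volumes, hybrid multi-cloud environments, real-time data processing, security and compliance, and cost-effectiveness: for example, high hardware investment, complex management and operations, inherent performance bottlenecks, and insufficient elasticity when data grows in bursts.

We also analyze the nature of the "data deluge". It is not only growth in data volume but also diversification of data types (structured, semi-structured, unstructured), more complex access patterns (batch, real-time streaming, interactive query), and the challenges of data lifecycle management. We discuss which scenarios and applications are consuming storage resources at unprecedented speed, and we quantify and forecast these trends. We show that the current storage predicament is no accident but the inevitable product of information technology reaching a certain stage, and that it calls for fundamental change.

Chapter 2: Core Drivers of Next-Generation Storage: The Leap from "Quantity" to "Quality"

This chapter examines the main forces driving the development of next-generation storage technology. We first focus on the convergence of compute and storage. In the past the two were separate, but with the rise of AI, big-data analytics, and similar applications, data needs to be preprocessed and accelerated close to where computation happens. This has given rise to new architectures such as near-data processing and the data processing unit (DPU). We analyze how these architectures push compute capability down to the storage nodes, sharply reducing data-movement overhead and improving processing efficiency.

Next we look at the essence and evolution of software-defined storage (SDS). SDS is more than an abstraction of storage hardware. Through intelligent management, automated deployment, elastic scaling, and fine-grained quality-of-service guarantees at the software layer, it makes storage resources available on demand. We analyze how SDS decouples hardware from software to achieve a more flexible architecture, and its considerable advantages in multi-tenancy, automated operations, and cost optimization.

We then turn to emerging storage media and technologies. Beyond continually evolving SSD technology, such as the ultra-low-latency access brought by NVMe-oF (NVMe over Fabrics), we discuss the potential of emerging technologies such as MRAM and phase-change memory (PCM) in performance, energy consumption, and persistence, and how they might overturn the existing storage hierarchy. We also analyze their current stage of development, the challenges of commercialization, and potential application scenarios.

Chapter 3: The Cornerstone of Next-Generation Storage: Intelligent Data Management and Scheduling

This chapter looks closely at the key technologies for intelligent data management and scheduling in next-generation storage. We describe in detail how AI and machine learning are applied in storage. This goes beyond simple pattern recognition and includes:
Intelligent data placement and tiering: based on access frequency, value, security level, and application requirements, AI can automatically place data on the most suitable medium and tier it dynamically, making the most of the performance and cost advantages of each medium.
Predictive analysis and performance optimization: by learning historical access patterns, AI can predict future storage demand and potential performance bottlenecks, and allocate and tune resources in advance to avoid service interruptions.
Automated data migration and disaster recovery: AI can manage data migration between storage areas (local, cloud, archive) more intelligently and, based on failure prediction and risk assessment, execute disaster recovery policies automatically to keep data safe and available.
Intelligent data compression and deduplication: with more advanced algorithms and models, AI can compress and deduplicate data more efficiently, further lowering storage cost.

We also analyze how new data platforms such as the data lake and the data lakehouse are reshaping the paradigm of data storage and access, and how they break the structural constraints of the traditional data warehouse to store, manage, and analyze massive heterogeneous data in a unified way. We focus on their transformative effect in simplifying data pipelines, improving data reuse, and enabling more data-driven applications.

Chapter 4: Storage Architectures for the Future: Hybrid Cloud, Edge Computing, and Data Security

This chapter looks ahead to how next-generation storage architectures deal with hybrid-cloud and multi-cloud environments. We analyze how to design a unified storage management plane that provides data access and migration across public clouds, private clouds, and on-premises data centers. We discuss the advantages of distributed file systems and object storage in multi-cloud scenarios, and how API standardization and data interoperability make it possible to build truly flexible and elastic hybrid-cloud storage solutions.

Edge computing is another frontier of the information age, and this chapter also examines the new demands it places on storage. From real-time data collection and local processing to coordination with the central cloud, edge storage faces the challenges of low latency, high reliability, low power consumption, and limited bandwidth. We discuss lightweight storage solutions suited to the edge, distributed autonomous storage nodes, and secure, reliable data synchronization mechanisms.

Data security and privacy protection are central to any storage solution. For next-generation storage we discuss how the zero-trust security model is applied at the storage layer, and how end-to-end encryption, fine-grained access control, and the use of blockchain technology for data provenance and tamper resistance can build a stronger line of defense. We also discuss how compliance requirements (such as GDPR and CCPA) drive changes in storage architecture, and how technical means can balance the "controllability" and the "availability" of data.

Chapter 5: Practice and Challenges of Next-Generation Storage: Ecosystem, Standardization, and Talent

This chapter returns to practice and discusses the challenges and opportunities in taking next-generation storage from concept to deployment. We analyze the importance of building an open, collaborative ecosystem and the role of open-source technology in advancing next-generation storage. We discuss the importance of standardization for interoperability between vendors' products and for lowering integration cost, and we review current progress in standardizing storage protocols and API interfaces.

We also recognize that training and retaining talent is key to meeting the next-generation storage era. We analyze the talent gap the industry currently faces and how education, training, and practice can develop professionals who master emerging storage technologies, understand intelligent data management, and can handle complex hybrid-cloud environments.

Conclusion: Building a Sustainable Data Future

The ultimate goal of this book is to give readers a clear blueprint of future data storage. We believe next-generation storage is not merely an iteration of hardware but a profound shift in technical paradigm. Centered on intelligence, software definition, convergence, and security, it will build a sustainable data future that can cope with exponential data growth, support diverse application scenarios, and safeguard data security and privacy. We hope this book prompts readers to think deeply about data storage and offers useful guidance for their future technical exploration and practice.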