<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>algorithm Archives - Tsai Origami</title>
	<atom:link href="https://origami.abstreamace.com/tag/algorithm/feed/" rel="self" type="application/rss+xml" />
	<link>https://origami.abstreamace.com/tag/algorithm/</link>
	<description>Origami works, research and programming by Mu‑Tsun Tsai</description>
	<lastBuildDate>Tue, 23 Nov 2021 23:13:35 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://origami.abstreamace.com/wp-content/uploads/2021/05/cropped-20210526_190326-1-32x32.jpg</url>
	<title>algorithm Archives - Tsai Origami</title>
	<link>https://origami.abstreamace.com/tag/algorithm/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Fold a full Ryujin 3.5 with Orihime algorithm &#8211; Part 2</title>
		<link>https://origami.abstreamace.com/2021/11/01/fold-a-full-ryujin-3-5-with-orihime-algorithm-part-2/</link>
					<comments>https://origami.abstreamace.com/2021/11/01/fold-a-full-ryujin-3-5-with-orihime-algorithm-part-2/#respond</comments>
		
		<dc:creator><![CDATA[Mu-Tsun Tsai]]></dc:creator>
		<pubDate>Mon, 01 Nov 2021 13:16:36 +0000</pubDate>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[Orihime]]></category>
		<guid isPermaLink="false">https://origami.abstreamace.com/?p=888</guid>

					<description><![CDATA[<p>Using 12GB of memory to fold the full Ryujin in 44 minutes was certainly an achievement, but I had a strong feeling that I could still do better. On the one hand, both time and memory consumption could still be improved. On the other hand, the CP file I used for folding has this weird mountain fold [&#8230;]</p>
<p>The post <a href="https://origami.abstreamace.com/2021/11/01/fold-a-full-ryujin-3-5-with-orihime-algorithm-part-2/">Fold a full Ryujin 3.5 with Orihime algorithm &#8211; Part 2</a> first appeared on <a href="https://origami.abstreamace.com">Tsai Origami</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><img decoding="async" src="https://origami.abstreamace.com/wp-content/uploads/2021/10/image-1635765569290.png" alt="file" class="lightbox" /></p>
<p>Using 12GB of memory to fold the full Ryujin in 44 minutes was certainly an achievement, but I had a strong feeling that I could still do better. On the one hand, both time and memory consumption could still be improved. On the other hand, the CP file I used for folding has this weird mountain fold across the upper half of the body for flat foldability, and the legs are positioned so that the scales on them cannot be seen. I could certainly modify the file to fix those issues, but as soon as I changed either of them, the new CP file completely defeated my optimizations, and the runtime was once again practically endless. As a result, I spent another 2 weeks on the second stage of my Ryujin quest, and finally made the program capable of folding the version depicted above in 16 minutes, using only 6GB of memory. I shall again document the journey here.</p>
<h2>Memory optimization</h2>
<p>One thing that troubled me was that the memory consumption was a lot higher than I had estimated. So I started looking for what software engineers call &quot;memory leaks&quot;, where a program fails to release memory allocated for data that is no longer in use. I discovered that, in the stages where the Orihime algorithm prepares the data for the solution-searching stage, there was a matrix recording the adjacency relationships of all faces, which uses a huge amount of memory (about 5GB in the case of Ryujin) but is no longer needed after the preparation stages. Just by releasing that memory, I immediately brought the memory requirement for Ryujin down to 7GB. <sup id="fnref1:1"><a href="#fn:1" class="footnote-ref">1</a></sup></p>
<p>However, 7GB still didn't sound good enough, and I was hoping for 6; not for any particular reason, it's just that an even number sounds better (plus the memory consumption was already pretty close to 6). That motivated me to come up with another idea I call PseudoListMatrix (probably not a new invention). In one part of the algorithm, there's a fairly large matrix <code>M</code> where each entry <code>M[i][j]</code> is a list of numbers, which consumes a lot of memory if stored as a 3D array or as a 2D array of lists. The idea of PseudoListMatrix is to store, for each row <code>i</code>, the union of all numbers that appear in <code>M[i][x]</code> for any <code>x</code>, and similarly, for each column <code>j</code>, the union of all numbers that appear in <code>M[x][j]</code> for any <code>x</code>. Then, when the list <code>M[i][j]</code> is requested, we return the intersection of the row-<code>i</code> union and the column-<code>j</code> union. In general, this intersection can be larger than the actual <code>M[i][j]</code> list (it is always a superset of it), but in this particular use case the result is exactly <code>M[i][j]</code> due to the nature of the lists. With this data structure, I finally got the memory consumption down to 6GB, with almost no sacrifice in speed.</p>
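<p>A minimal Java sketch of the PseudoListMatrix idea, assuming each list holds integers. The class name comes from the text above, but the methods <code>add</code> and <code>get</code> and the use of <code>TreeSet</code> are illustrative assumptions, not the code in Oriedita. It keeps one set per row and one per column instead of one list per cell, and answers a query with their intersection, which always contains the true <code>M[i][j]</code>.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch of a "pseudo" list matrix: instead of storing a list for every cell
 * M[i][j], store the union of values per row and per column
 * (one set per row and per column rather than one list per cell).
 */
final class PseudoListMatrix {
    private final List<Set<Integer>> rowUnion = new ArrayList<>();
    private final List<Set<Integer>> colUnion = new ArrayList<>();

    PseudoListMatrix(int rows, int cols) {
        for (int i = 0; i < rows; i++) rowUnion.add(new TreeSet<>());
        for (int j = 0; j < cols; j++) colUnion.add(new TreeSet<>());
    }

    /** Records that value belongs to the list M[i][j]. */
    void add(int i, int j, int value) {
        rowUnion.get(i).add(value);
        colUnion.get(j).add(value);
    }

    /**
     * Returns (row i union) intersected with (column j union), in ascending order.
     * This always contains M[i][j]; it equals M[i][j] only when the data
     * guarantees that no value from another cell lands in both unions.
     */
    List<Integer> get(int i, int j) {
        Set<Integer> result = new TreeSet<>(rowUnion.get(i));
        result.retainAll(colUnion.get(j));
        return new ArrayList<>(result);
    }
}
```

<p>For example, if <code>M[0][0] = {1, 5}</code> and <code>M[1][1] = {1, 2}</code>, then <code>get(0, 0)</code> returns <code>[1, 5]</code> exactly, while <code>get(0, 1)</code> returns <code>[1]</code> even though <code>M[0][1]</code> is empty. This over-approximation is why the trick is only safe when the structure of the data rules it out.</p>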
<h2>The power of quad tree</h2>
<p>Up to this point, all my speed optimization efforts had focused on the searching stage only. By the time I successfully folded Ryujin for the first time, the searching stage had become so fast that most of the 44 minutes were spent in the preparation stages. As I looked into those preparation steps, I realized that many of them could be sped up dramatically by a classical data structure known as the quad tree.</p>
<p>Basically, computer programs can't really &quot;see&quot; objects on the 2D plane. All they know is a list of objects represented by numbers. If we ask the question &quot;are any two of them colliding?&quot;, the naive way to answer it is to examine every possible pair of objects and check each pair for a collision. So for <code class="katex-inline">n</code> objects, this takes <code class="katex-inline">n(n-1)/2</code> comparisons, which becomes a huge number as <code class="katex-inline">n</code> grows.</p>
<p>The idea of the quad tree is to divide the 2D plane into four quadrants and store each object in the proper quadrant. If a quadrant contains too many objects, we divide it again into four smaller quadrants and repeat the process. Then, when we ask the same question again, each object only needs to be compared against the objects contained in the same quadrant. When the objects are reasonably evenly spread out, this reduces the number of comparisons to roughly <code class="katex-inline">n\log n</code> (which is much smaller, especially for large <code class="katex-inline">n</code>) and lets us answer the question much faster.</p>
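<p>To make the idea concrete, here is a minimal Java sketch of a region quad tree that counts overlapping pairs among axis-aligned rectangles. The <code>Rect</code> type, the node capacity of 4, and the depth limit of 8 are illustrative assumptions, not Oriedita's actual classes or parameters. A rectangle that straddles a quadrant boundary stays in the enclosing node, so each rectangle is only compared with rectangles in the same node, its ancestors, or its descendants, rather than with every other rectangle.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Axis-aligned rectangle, standing in for whatever geometry the real code tests. */
final class Rect {
    final double minX, minY, maxX, maxY;

    Rect(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    /** Strict overlap: rectangles that only touch along an edge do not count. */
    boolean overlaps(Rect o) {
        return minX < o.maxX && o.minX < maxX && minY < o.maxY && o.minY < maxY;
    }

    boolean contains(Rect o) {
        return minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY;
    }
}

final class QuadTree {
    private static final int CAPACITY = 4;  // split a leaf once it holds more than this
    private static final int MAX_DEPTH = 8; // never split below this depth
    private final Rect bounds;
    private final int depth;
    private final List<Rect> items = new ArrayList<>();
    private QuadTree[] children;            // null while this node is a leaf

    QuadTree(Rect bounds) { this(bounds, 0); }

    private QuadTree(Rect bounds, int depth) { this.bounds = bounds; this.depth = depth; }

    void insert(Rect r) {
        if (children != null) {
            for (QuadTree c : children) {
                if (c.bounds.contains(r)) { c.insert(r); return; }
            }
            items.add(r); // straddles a quadrant boundary: keep it at this level
            return;
        }
        items.add(r);
        if (items.size() > CAPACITY && depth < MAX_DEPTH) split();
    }

    private void split() {
        double mx = (bounds.minX + bounds.maxX) / 2, my = (bounds.minY + bounds.maxY) / 2;
        children = new QuadTree[] {
            new QuadTree(new Rect(bounds.minX, bounds.minY, mx, my), depth + 1),
            new QuadTree(new Rect(mx, bounds.minY, bounds.maxX, my), depth + 1),
            new QuadTree(new Rect(bounds.minX, my, mx, bounds.maxY), depth + 1),
            new QuadTree(new Rect(mx, my, bounds.maxX, bounds.maxY), depth + 1) };
        List<Rect> old = new ArrayList<>(items);
        items.clear();
        for (Rect r : old) insert(r); // push items down wherever they fit
    }

    /** Counts overlapping pairs stored in this subtree, each pair exactly once. */
    int countOverlaps() {
        int count = 0;
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).overlaps(items.get(j))) count++;
            }
            if (children != null) {
                for (QuadTree c : children) count += c.countAgainst(items.get(i));
            }
        }
        if (children != null) {
            for (QuadTree c : children) count += c.countOverlaps();
        }
        return count;
    }

    /** Counts rectangles in this subtree overlapping r, skipping quadrants r cannot reach. */
    private int countAgainst(Rect r) {
        if (!bounds.overlaps(r)) return 0;
        int count = 0;
        for (Rect it : items) if (it.overlaps(r)) count++;
        if (children != null) {
            for (QuadTree c : children) count += c.countAgainst(r);
        }
        return count;
    }

    /** Naive baseline: examines all n(n-1)/2 pairs. */
    static int naiveCount(List<Rect> rects) {
        int count = 0;
        for (int i = 0; i < rects.size(); i++)
            for (int j = i + 1; j < rects.size(); j++)
                if (rects.get(i).overlaps(rects.get(j))) count++;
        return count;
    }
}
```

<p>Inserting rectangles one by one and calling <code>countOverlaps()</code> gives the same count as <code>naiveCount</code>, but for well-spread rectangles most pairs are never examined, because whole quadrants are skipped when they cannot overlap the rectangle being tested.</p>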
<p>Quad trees are so versatile that there are many places in the preparation stages where they can be applied, and each application saved about 5 minutes of total runtime. Combined with other optimization techniques (especially the indexing mentioned in Part 1), I eventually cut the total runtime of the same Ryujin CP from 44 minutes to about 5 minutes.</p>
<h2>Folding the unfoldable</h2>
<p><img decoding="async" src="https://origami.abstreamace.com/wp-content/uploads/2021/10/image-1635477634511.png" alt="" class="lightbox" /></p>
<p>However, even though the old Ryujin CP now folded so quickly, a few CPs (such as the version above) still wouldn't fold; the runtime was still practically endless. Two major obstacles arise in the search for those CPs: too many potential permutations, and swapping loops.</p>
<p>First, in Part 1, I compared the searching process of the Orihime algorithm to solving Sudoku. In Sudoku, each square has only 9 possible numbers, but here, for each subface, the number of possible permutations grows factorially with the number of overlapping faces, reaching numbers beyond imagination. Although I had already applied the idea of fixing the order of the longest chain in the given constraints, in some cases that still leaves billions of billions of possible permutations to check, and that's for just one subface, while Ryujin has thousands of them!</p>
<p>Second, I also mentioned the swapping algorithm in Part 1. Although it's one of the most powerful speed boosts so far, it has a small chance of falling into a &quot;loop&quot;, in which two or more subfaces repeatedly get swapped in front of one another, resetting each other's progress. In that case, the search would in theory never end. This is even worse than the first problem, which only makes the search run for a very, very long time but would in theory still finish eventually.</p>
<p>For two weeks I struggled with these two obstacles, trying various ideas without success. Finally, I realized that the solution to both problems lies in introducing new data into the search process. The permutation generator I designed can filter out invalid permutations based on the given constraints, so if there are still too many possible permutations left, it can only mean that I haven't given it enough constraints. Similarly, if the swapping falls into a loop, one way to break the loop immediately is to swap a subface we haven't seen before in front of the loop, since that creates a new sequence of subfaces that hasn't been considered yet.</p>
<p>However, merely introducing new subfaces into the sequence doesn't necessarily create enough new constraints to narrow down the permutation generator. To create more constraints quickly, I used the Italiano algorithm mentioned in Part 1 to infer more stacking relations after each permutation decision on a subface. This turned out to be a giant boost to search performance, comparable to the swapping algorithm. With these ideas implemented, I succeeded in folding the version above, as well as all the sample CPs that were previously unfoldable.</p>
<h2>Ryujin final version</h2>
<p>To me, the second version above was satisfying enough. But at around the same time, Avinash Karnik and Bodo Haag made another version of the Ryujin CP, which folds the entire body in half, folds down the legs, and adds the head-twist structure used in the real-world folding process. The resulting CP is even more complicated in layer stacking than all previous versions, and it again defeated my optimized algorithm. Fortunately, the cause wasn't anything major, just a previously unnoticed bug that only occurs in highly complex CPs. After I fixed the bug, their CP folded just fine in 35 minutes: a lot longer than the other versions of the Ryujin CP, but at least it was foldable.</p>
<p>Of course, I wasn't going to stop there so easily. Even though this is without doubt a more complicated CP, I was hoping to get its runtime under 20 minutes. By observing the searching process up close, I noticed that this CP in particular has several subfaces that are in fact critical but hidden at the very end of the initial subface ordering. This causes the algorithm to hit dead-ends at a very deep depth and backtrack several times, wasting dozens of minutes. To cope with this, I implemented a new strategy: &quot;if the search hits a dead-end at a deep depth, also bring a few nearby subfaces along during the swapping&quot;. This turned out to be quite effective and eventually brought the runtime down to just 16 minutes. The result is the first picture above.</p>
<p>These optimizations are now part of the v0.0.10 release of <a href="https://oriedita.github.io">Oriedita</a>.</p>
<h4>Update:</h4>
<p>The latest record for the first figure is 65 seconds using 3GB memory.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
<p>Later, with a better understanding of the algorithm, I found out that by using the quad tree, there's no need to create the face adjacency matrix at all.&#160;<a href="#fnref1:1" rev="footnote" class="footnote-backref">&#8617;</a></p>
</li>
</ol>
</div>
<p>The post <a href="https://origami.abstreamace.com/2021/11/01/fold-a-full-ryujin-3-5-with-orihime-algorithm-part-2/">Fold a full Ryujin 3.5 with Orihime algorithm &#8211; Part 2</a> first appeared on <a href="https://origami.abstreamace.com">Tsai Origami</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://origami.abstreamace.com/2021/11/01/fold-a-full-ryujin-3-5-with-orihime-algorithm-part-2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Fold a full Ryujin 3.5 with Orihime algorithm</title>
		<link>https://origami.abstreamace.com/2021/10/13/fold-a-full-ryujin-3-5-with-orihime-algorithm/</link>
					<comments>https://origami.abstreamace.com/2021/10/13/fold-a-full-ryujin-3-5-with-orihime-algorithm/#comments</comments>
		
		<dc:creator><![CDATA[Mu-Tsun Tsai]]></dc:creator>
		<pubDate>Wed, 13 Oct 2021 07:01:47 +0000</pubDate>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[Orihime]]></category>
		<guid isPermaLink="false">https://origami.abstreamace.com/?p=862</guid>

					<description><![CDATA[<p>Recently I've been spending quite some effort improving the algorithm of Orihime, a program developed by Toshiyuki Meguro. I was hoping to make it capable of folding the complete CP of Ryujin 3.5 (with shaped scales) by Satoshi Kamiya, arguably one of the most complicated models ever created. It has been a quest of [&#8230;]</p>
<p>The post <a href="https://origami.abstreamace.com/2021/10/13/fold-a-full-ryujin-3-5-with-orihime-algorithm/">Fold a full Ryujin 3.5 with Orihime algorithm</a> first appeared on <a href="https://origami.abstreamace.com">Tsai Origami</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><img decoding="async" src="https://origami.abstreamace.com/wp-content/uploads/2021/10/Ryujin.png" alt="" class="lightbox" /></p>
<p>Recently I've been spending quite some effort improving the algorithm of Orihime, a program developed by Toshiyuki Meguro. I was hoping to make it capable of folding the complete CP of Ryujin 3.5 (with shaped scales <sup id="fnref1:1"><a href="#fn:1" class="footnote-ref">1</a></sup>) by Satoshi Kamiya, arguably one of the most complicated models ever created. It has been a quest for both speed and memory efficiency, with endless challenges. After two weeks of work, I finally managed to fold the CP in its entirety on my laptop in 44 minutes using 12GB of memory. In this article, I will share the story behind this quest.</p>
<p>For those of you who would like to try this optimized version of Orihime, check out the latest release of <a href="https://oriedita.github.io/">Oriedita</a>. (Folding Ryujin requires v0.0.7 or above, and don't forget to launch it with a large memory setting.)</p>
<h2>How it began</h2>
<p>The quest began when I was about to finalize the CP of Angelfish, my latest original design. Although the model is only a 32-grid BP, I was shocked to find that Orihime took 75 minutes to fold it. I wondered whether anything in Orihime's algorithm could be optimized to speed things up. The problem was that the source code of the original Orihime mostly used Japanese names for variables, functions, etc., making it pretty difficult to read and understand.</p>
<p>Luckily, only about two months earlier, Mr. Gerben Oolbekkink (a.k.a. &quot;qurben&quot;) had made a fork of Orihime called &quot;Oriedita&quot; and converted most of the source code to English nomenclature. Looking at the source code he translated, it took me only a few minutes to spot the first place that could be made a lot more efficient:</p>
<pre><code class="language-java">public int FaceId2PermutationDigit(int im) {
    for (int i = 1; i &lt;= faceIdCount; i++) {
        if (faceIdList[getPermutation(i)] == im) {
            return i;
        }
    }
    return 0;
}</code></pre>
<p>This function is called repeatedly in the process of searching for a valid layer stacking order. Each time it is called, it performs what computer scientists call a linear search to locate the digit position of a given ID, which takes time proportional to <code>faceIdCount</code>. Given how often it is called, this is way too slow. A much more efficient way is to store the correspondence between IDs and positions in an array, and then just look up that array directly.</p>
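<pre><code class="language-java">// Inverse lookup table indexed by face ID; must be sized to hold the largest ID
int[] FaceId2PermutationDigitMap;

// Rebuild the table from the current permutation.
// Must be called again whenever the permutation changes, or lookups return stale digits.
public void reset_map() {
    for (int i = 1; i &lt;= faceIdCount; i++) {
        FaceId2PermutationDigitMap[faceIdList[getPermutation(i)]] = i;
    }
}

// Constant-time replacement for the linear search above
public int FaceId2PermutationDigit(int im) {
    return FaceId2PermutationDigitMap[im];
}</code></pre>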
<p>With this change, the runtime for my Angelfish immediately dropped from 75 minutes to just 500 seconds. That's 9 times faster from changing just one function.</p>
<h2>More optimizations</h2>
<p>After I announced my result, several people soon sent me more large CP samples that had previously taken Orihime a very long time to fold (or were practically impossible to fold). Indeed, even with the optimizations I made, some of them still wouldn't finish overnight. In the following few days, I found a few more places in the code that could be optimized, but each led to only a slightly faster runtime, and none gave a significant improvement.</p>
<p>A few days later, I discovered the most significant boost in the entire quest: the swapping algorithm. Basically, what Orihime does is a lot like solving Sudoku, where the &quot;subfaces&quot; in the model are like the squares in Sudoku, and the permutations within the subfaces are like the available numbers to fill in the squares. Orihime performs an exhaustive search over the subfaces in a certain order, trying to find for each of them a permutation that is consistent with those already chosen. Whenever the algorithm reaches a subface that has no permutation consistent with the previous subfaces, it backtracks to the previous subface, chooses a different permutation there, and continues.</p>
<p>But just like in Sudoku, it is sometimes a lot faster to solve the puzzle by first filling in some &quot;critical&quot; squares, as they quickly help narrow down the possible numbers for the other squares. The problem is that it is not obvious from the beginning which subfaces are more critical. We can, however, conclude that a subface is more critical than the few before it by observing that the search repeatedly hits a dead-end at that very subface. In that case, we dynamically swap the search order and work on that subface first.</p>
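<p>Below is a heavily simplified Java sketch of backtracking with this kind of dynamic reordering. Everything in it is an illustrative assumption rather than Orihime's or Oriedita's code: each &quot;subface&quot; just has a list of integer candidates, <code>Consistency</code> is a hypothetical pairwise check standing in for the layer-order constraints, and a subface that dead-ends <code>threshold</code> times is moved to the front. For simplicity the sketch restarts the whole search after each promotion, which is more wasteful than swapping in place. The <code>maxRestarts</code> cap is a crude guard that guarantees termination; once it is used up, the search is plain exhaustive backtracking, so it still finds a solution whenever one exists.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pairwise check: can subface a use choice ca while subface b uses cb? */
interface Consistency {
    boolean ok(int a, int ca, int b, int cb);
}

final class SwappingSearch {
    private final int[][] candidates;  // candidates[s] = possible choices for subface s
    private final Consistency check;
    private final int threshold;       // dead-ends tolerated before promoting a subface
    private int restartsLeft;          // caps reordering so the search always terminates
    private final List<Integer> order = new ArrayList<>();
    private int[] deadEnds;
    private int[] chosen;

    /** Thrown to abandon the current pass and move a critical subface to the front. */
    private static final class Promote extends RuntimeException {
        final int subface;
        Promote(int subface) { super(null, null, false, false); this.subface = subface; }
    }

    SwappingSearch(int[][] candidates, Consistency check, int threshold, int maxRestarts) {
        this.candidates = candidates;
        this.check = check;
        this.threshold = threshold;
        this.restartsLeft = maxRestarts;
        for (int s = 0; s < candidates.length; s++) order.add(s);
    }

    /** Returns chosen[s] for every subface s, or null if no consistent assignment exists. */
    int[] solve() {
        while (true) {
            deadEnds = new int[candidates.length];
            chosen = new int[candidates.length];
            try {
                return dfs(0) ? chosen.clone() : null;
            } catch (Promote p) {
                order.remove(Integer.valueOf(p.subface));
                order.add(0, p.subface); // work on the critical subface first
                restartsLeft--;
            }
        }
    }

    private boolean dfs(int depth) {
        if (depth == order.size()) return true;
        int s = order.get(depth);
        for (int c : candidates[s]) {
            if (fitsPrefix(depth, s, c)) {
                chosen[s] = c;
                if (dfs(depth + 1)) return true;
            }
        }
        // No choice for s fits the subfaces before it: a dead-end at s.
        deadEnds[s]++;
        if (deadEnds[s] >= threshold && depth > 0 && restartsLeft > 0) throw new Promote(s);
        return false;
    }

    /** Checks choice c for subface s against every subface earlier in the current order. */
    private boolean fitsPrefix(int depth, int s, int c) {
        for (int k = 0; k < depth; k++) {
            int t = order.get(k);
            if (!check.ok(t, chosen[t], s, c)) return false;
        }
        return true;
    }
}
```

<p>As a toy example, with an &quot;all choices must differ&quot; check and a last subface that has only one candidate, the sketch first dead-ends repeatedly at that last subface, promotes it to the front, and then solves the rest immediately.</p>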
<p>This technique is so powerful that it suddenly brought the runtime of my Angelfish down to just 3 or 4 seconds, and it also let all the other samples people had sent me fold in a feasible amount of time, with only one exception: Ryujin 3.5 by Satoshi Kamiya.</p>
<h2>Taming the divine dragon</h2>
<p>It was hard for me to say &quot;OK, we're done here; forget about that one model that won't fold,&quot; since I was still far from running out of optimization ideas. I used to say &quot;I'll never fold Ryujin in my lifetime&quot; (the reason being that if I'm spending that much effort folding one model, it had better be my original design), but at that moment I ironically found myself seriously considering folding one, only in the computer instead.</p>
<p>The biggest challenge with the fully shaped Ryujin is not speed, but space. It has so many creases and faces that folding it with the original data structure takes a crazy amount of memory (estimated to require at least 30GB). By using bitmap arrays instead of 32-bit integer arrays, I managed to bring the total memory usage down to 12GB or less, which is feasible on my laptop.</p>
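<p>As a rough illustration of why bitmaps help, here is a minimal Java sketch of a square boolean matrix packed into a <code>long[]</code>, assuming the relation being stored is a plain yes/no flag per pair of faces (the real data may need more states, for example two bitmaps for a three-valued relation). The class name <code>BitMatrix</code> is hypothetical. Each entry takes 1 bit instead of the 32 bits of an <code>int</code>, so a 1000 &times; 1000 matrix needs 125,000 bytes of payload instead of 4,000,000.</p>

```java
/**
 * Square n x n boolean matrix packed 64 entries per long.
 * java.util.BitSet offers the same packing; a flat long[] is used here
 * to make the index arithmetic explicit.
 */
final class BitMatrix {
    private final int n;
    private final long[] words;

    BitMatrix(int n) {
        this.n = n;
        long bits = (long) n * n;
        this.words = new long[(int) ((bits + 63) / 64)];
    }

    boolean get(int i, int j) {
        long k = (long) i * n + j; // row-major bit index
        return ((words[(int) (k >>> 6)] >>> (k & 63)) & 1L) != 0;
    }

    void set(int i, int j, boolean value) {
        long k = (long) i * n + j;
        int w = (int) (k >>> 6);      // which long holds the bit
        long mask = 1L << (k & 63);   // which bit inside that long
        if (value) words[w] |= mask; else words[w] &= ~mask;
    }

    /** Bytes used by the packed payload (object headers not included). */
    long payloadBytes() {
        return (long) words.length * Long.BYTES;
    }
}
```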
<p>Of course, that doesn't mean speed was not an issue at all. Even with the swapping algorithm, it still took about a whole day to fold just half of the Ryujin CP. To address this, I improved the permutation generator by fixing the order of the longest known chain in the given constraints, which led to a factorial speedup in iterating over possible permutations and made the half Ryujin fold in less than half an hour.</p>
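<p>The effect of fixing a chain can be illustrated with a small Java sketch, written as a plain backtracking generator of top-to-bottom face orders. The class, the method names, and the <code>mustBeAbove</code> matrix are hypothetical and not Oriedita's actual generator. If the faces of a chain of length <code class="katex-inline">k</code> are forced to appear in their known relative order, only <code class="katex-inline">n!/k!</code> of the <code class="katex-inline">n!</code> orderings remain to be generated; the other constraints are checked as each face is placed, so invalid prefixes are pruned early.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChainPermutations {
    /**
     * Enumerates top-to-bottom orders of faces 0..n-1 in which the faces of 'chain'
     * keep their given order, and where mustBeAbove[a][b] means a must come before b.
     */
    static List<int[]> generate(int n, int[] chain, boolean[][] mustBeAbove) {
        int[] chainIndex = new int[n];
        Arrays.fill(chainIndex, -1);
        for (int k = 0; k < chain.length; k++) chainIndex[chain[k]] = k;
        List<int[]> out = new ArrayList<>();
        extend(new int[n], 0, new boolean[n], chain, chainIndex, mustBeAbove, out);
        return out;
    }

    private static void extend(int[] perm, int len, boolean[] used, int[] chain,
                               int[] chainIndex, boolean[][] mustBeAbove, List<int[]> out) {
        if (len == perm.length) {
            out.add(perm.clone());
            return;
        }
        for (int f = 0; f < perm.length; f++) {
            if (used[f]) continue;
            int k = chainIndex[f];
            if (k > 0 && !used[chain[k - 1]]) continue; // chain order is fixed, never permuted
            if (violatesConstraint(f, perm, len, mustBeAbove)) continue;
            used[f] = true;
            perm[len] = f;
            extend(perm, len + 1, used, chain, chainIndex, mustBeAbove, out);
            used[f] = false;
        }
    }

    /** Placing f now puts it below every placed face; fail if f must be above one of them. */
    private static boolean violatesConstraint(int f, int[] perm, int len, boolean[][] mustBeAbove) {
        for (int i = 0; i < len; i++) {
            if (mustBeAbove[f][perm[i]]) return true;
        }
        return false;
    }
}
```

<p>For four faces with no other constraints, an empty chain yields all 4! = 24 orders, while fixing the chain 0, 1, 2 leaves only 4!/3! = 4.</p>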
<p>I also replaced the Warshall algorithm that Meguro used for finding the transitive closure with the much faster Italiano algorithm <sup id="fnref1:2"><a href="#fn:2" class="footnote-ref">2</a></sup>, but that algorithm in turn requires quite some memory to run. The last piece of the puzzle that made it work was optimizing the memory used by the Italiano algorithm. The idea is pretty simple but took me a while to figure out: all I had to do was flush out the changes more frequently instead of letting them pile up in memory.</p>
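<p>For readers unfamiliar with dynamic transitive closure, here is a minimal Java sketch of the general idea, assuming the stacking relations form a directed graph where an edge <code>u &rarr; v</code> means &quot;face u is above face v&quot;. It is deliberately simpler than Italiano's algorithm (which maintains a spanning tree of reachable vertices for each vertex), and the class and method names are hypothetical. Instead of rerunning Warshall's <code class="katex-inline">O(n^3)</code> algorithm after every new relation, each insertion only updates the rows of faces that are above <code>u</code> but not yet known to be above <code>v</code>.</p>

```java
import java.util.BitSet;

/** Incrementally maintained transitive closure of "above" relations on n faces. */
final class IncrementalClosure {
    private final BitSet[] reach; // reach[x].get(y): x is (transitively) above y

    IncrementalClosure(int n) {
        reach = new BitSet[n];
        for (int i = 0; i < n; i++) reach[i] = new BitSet(n);
    }

    boolean isAbove(int x, int y) {
        return reach[x].get(y);
    }

    /**
     * Records "u is above v" and everything it implies.
     * Returns false, changing nothing, if this contradicts known relations (a cycle).
     */
    boolean addAbove(int u, int v) {
        if (u == v || reach[v].get(u)) return false;
        if (reach[u].get(v)) return true;          // already implied
        BitSet gained = (BitSet) reach[v].clone(); // everything below v ...
        gained.set(v);                             // ... plus v itself
        for (int x = 0; x < reach.length; x++) {
            // Only faces at or above u gain anything; those already above v gain nothing new.
            if ((x == u || reach[x].get(u)) && !reach[x].get(v)) {
                reach[x].or(gained);
            }
        }
        return true;
    }
}
```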
<p>So finally, I succeeded in taming the divine dragon. For more details about the individual optimizations, check out each of the pull requests I made on qurben's <a href="https://github.com/oriedita/oriedita">repository</a>.</p>
<p><strong>Update:</strong></p>
<p>Check out also <a href="https://origami.abstreamace.com/2021/11/01/fold-a-full-ryujin-3-5-with-orihime-algorithm-part-2/">Part 2</a> of my Ryujin journey!</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
<p>The CP file used is a fan-made file based on the CP published by Kamiya in his book, with additional creases to make it flat-foldable. Since the CP is copyrighted material, I will not be sharing the file here.&#160;<a href="#fnref1:1" rev="footnote" class="footnote-backref">&#8617;</a></p>
</li>
<li id="fn:2">
<p>It is an online (i.e. dynamic) transitive closure algorithm described by G. F. Italiano. See <a href="http://dx.doi.org/10.1016/0304-3975%2886%2990098-8">http://dx.doi.org/10.1016/0304-3975%2886%2990098-8</a>.&#160;<a href="#fnref1:2" rev="footnote" class="footnote-backref">&#8617;</a></p>
</li>
</ol>
</div>
<p>The post <a href="https://origami.abstreamace.com/2021/10/13/fold-a-full-ryujin-3-5-with-orihime-algorithm/">Fold a full Ryujin 3.5 with Orihime algorithm</a> first appeared on <a href="https://origami.abstreamace.com">Tsai Origami</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://origami.abstreamace.com/2021/10/13/fold-a-full-ryujin-3-5-with-orihime-algorithm/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
	</channel>
</rss>
